* kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
@ 2019-09-25 18:40 ` Eric Wheeler
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-09-25 18:40 UTC (permalink / raw)
  To: dm-devel, linux-block, linux-bcache, lvm-devel

Hello,

We are using the 4.19.75 stable tree with dm-thin and multi-queue SCSI.
We have been running the 4.19 branch for months without issue; we just
switched to MQ and seem to have hit this BUG_ON. Whether MQ is related
to the issue I don't know; it may be coincidence:

	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
	{
	    int r;
	    enum allocation_event ev;
	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);

	    /* FIXME: we should loop round a couple of times */
	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
	    if (r)
		return r;

	    smd->begin = *b + 1;
	    r = sm_ll_inc(&smd->ll, *b, &ev);
	    if (!r) {
		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
		smd->nr_allocated_this_transaction++;
	    }

	    return r;
	}

This is a brand-new thin pool created about 12 hours ago:

  lvcreate -c 64k -L 12t --type thin-pool --thinpool data-pool --poolmetadatasize 16G data /dev/bcache0

We are using bcache, but I don't see any bcache code in the backtraces.  
The metadata is also on the bcache volume.

We were transferring data to the new thin volumes; after about 12 hours
it produced the trace below. So far it has happened only once, and I
don't have a way to reproduce it.

Any idea what this BUG_ON would indicate and how we might devise a fix?
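If it helps to confirm or rule out on-disk metadata damage, the thin-provisioning-tools can check the pool metadata offline. A sketch of the usual workflow, not taken from this thread: LV names follow the lvcreate example above, and the scratch LV name tmp_meta is made up. (Not run as a test here, since it needs root and the actual devices.)

```shell
# Take the pool offline so the metadata is quiescent.
lvchange -an data/data-pool

# Swap the pool's metadata into a plain LV so it can be read directly
# (tmp_meta is a scratch LV created just for this).
lvcreate -L 16G -n tmp_meta data
lvconvert --thinpool data/data-pool --poolmetadata data/tmp_meta

# Check, and optionally dump, the metadata.
thin_check /dev/data/tmp_meta
thin_dump /dev/data/tmp_meta > /tmp/tmeta.xml

# Swap the metadata back when done.
lvconvert --thinpool data/data-pool --poolmetadata data/tmp_meta
```

`lvconvert --repair data/data-pool` automates much of this via thin_repair, at the cost of less visibility into what was wrong.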

-Eric



[199391.677689] ------------[ cut here ]------------
[199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
[199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
[199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
[199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
[199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
[199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
[199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
[199391.690984] Call Trace:
[199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
[199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
[199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
[199391.693852]  ? sort+0x17b/0x270
[199391.694527]  ? u32_swap+0x10/0x10
[199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
[199391.695890]  process_one_work+0x171/0x370
[199391.696640]  worker_thread+0x49/0x3f0
[199391.697332]  kthread+0xf8/0x130
[199391.697988]  ? max_active_store+0x80/0x80
[199391.698659]  ? kthread_bind+0x10/0x10
[199391.699281]  ret_from_fork+0x1f/0x40
[199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
[199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
[199391.708083] ---[ end trace c31536d98046e8ec ]---
[199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
[199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
[199391.877317] Kernel panic - not syncing: Fatal exception
[199391.878006] Kernel Offset: disabled
[199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
[199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
[199392.034277] Call Trace:
[199392.034929]  <IRQ>
[199392.035576]  native_apic_msr_write+0x2e/0x40
[199392.036228]  arch_irq_work_raise+0x28/0x40
[199392.036877]  irq_work_queue_on+0x83/0xa0
[199392.037518]  irq_work_run_list+0x4c/0x70
[199392.038149]  irq_work_run+0x14/0x40
[199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
[199392.039393]  call_function_single_interrupt+0xf/0x20
[199392.040011]  </IRQ>
[199392.040624] RIP: 0010:panic+0x209/0x25c
[199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
[199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
[199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
[199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
[199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
[199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
[199392.046486]  oops_end+0xc1/0xd0
[199392.047149]  do_trap+0x13d/0x150
[199392.047795]  do_error_trap+0xd5/0x130
[199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.049048]  invalid_op+0x14/0x20
[199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199392.054807]  ? __wake_up_common_lock+0x87/0xc0
[199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
[199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
[199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
[199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
[199392.057484]  ? sort+0x17b/0x270
[199392.058016]  ? u32_swap+0x10/0x10
[199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
[199392.059060]  process_one_work+0x171/0x370
[199392.059576]  worker_thread+0x49/0x3f0
[199392.060083]  kthread+0xf8/0x130
[199392.060587]  ? max_active_store+0x80/0x80
[199392.061086]  ? kthread_bind+0x10/0x10
[199392.061569]  ret_from_fork+0x1f/0x40
[199392.062038] ------------[ cut here ]------------
[199392.062508] sched: Unexpected reschedule of offline CPU#1!
[199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
[199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
[199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
[199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
[199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
[199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
[199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
[199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
[199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
[199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
[199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
[199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
[199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
[199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
[199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
[199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
[199392.078481] Call Trace:
[199392.079117]  <IRQ>
[199392.079745]  check_preempt_curr+0x6b/0x90
[199392.080373]  ttwu_do_wakeup+0x19/0x130
[199392.080999]  try_to_wake_up+0x1e2/0x460
[199392.081623]  __wake_up_common+0x8f/0x160
[199392.082246]  ep_poll_callback+0x1af/0x300
[199392.082860]  __wake_up_common+0x8f/0x160
[199392.083470]  __wake_up_common_lock+0x7a/0xc0
[199392.084074]  irq_work_run_list+0x4c/0x70
[199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
[199392.085277]  call_function_single_interrupt+0xf/0x20
[199392.085879]  </IRQ>
[199392.086477] RIP: 0010:panic+0x209/0x25c
[199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
[199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
[199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
[199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
[199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
[199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
[199392.092255]  oops_end+0xc1/0xd0
[199392.092894]  do_trap+0x13d/0x150
[199392.093516]  do_error_trap+0xd5/0x130
[199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.094718]  invalid_op+0x14/0x20
[199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199392.100229]  ? __wake_up_common_lock+0x87/0xc0
[199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
[199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
[199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
[199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
[199392.102750]  ? sort+0x17b/0x270
[199392.103242]  ? u32_swap+0x10/0x10
[199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
[199392.104228]  process_one_work+0x171/0x370
[199392.104714]  worker_thread+0x49/0x3f0
[199392.105193]  kthread+0xf8/0x130
[199392.105665]  ? max_active_store+0x80/0x80
[199392.106132]  ? kthread_bind+0x10/0x10
[199392.106601]  ret_from_fork+0x1f/0x40
[199392.107069] ---[ end trace c31536d98046e8ed ]---
[199392.107544] ------------[ cut here ]------------
[199392.108017] sched: Unexpected reschedule of offline CPU#7!
[199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
[199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
[199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
[199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
[199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
[199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
[199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
[199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
[199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
[199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
[199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
[199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
[199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
[199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
[199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
[199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
[199392.123867] Call Trace:
[199392.124495]  <IRQ>
[199392.125116]  update_process_times+0x40/0x50
[199392.125742]  tick_sched_handle+0x25/0x60
[199392.126367]  tick_sched_timer+0x37/0x70
[199392.126987]  __hrtimer_run_queues+0xfb/0x270
[199392.127601]  hrtimer_interrupt+0x122/0x270
[199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
[199392.128819]  apic_timer_interrupt+0xf/0x20
[199392.129423]  </IRQ>
[199392.130020] RIP: 0010:panic+0x209/0x25c
[199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
[199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
[199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
[199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
[199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
[199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
[199392.135792]  oops_end+0xc1/0xd0
[199392.136444]  do_trap+0x13d/0x150
[199392.137094]  do_error_trap+0xd5/0x130
[199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.138383]  invalid_op+0x14/0x20
[199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199392.144301]  ? __wake_up_common_lock+0x87/0xc0
[199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
[199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
[199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
[199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
[199392.146949]  ? sort+0x17b/0x270
[199392.147450]  ? u32_swap+0x10/0x10
[199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
[199392.148441]  process_one_work+0x171/0x370
[199392.148937]  worker_thread+0x49/0x3f0
[199392.149430]  kthread+0xf8/0x130
[199392.149922]  ? max_active_store+0x80/0x80
[199392.150406]  ? kthread_bind+0x10/0x10
[199392.150883]  ret_from_fork+0x1f/0x40
[199392.151353] ---[ end trace c31536d98046e8ee ]---


--
Eric Wheeler

[199392.105193]  kthread+0xf8/0x130
[199392.105665]  ? max_active_store+0x80/0x80
[199392.106132]  ? kthread_bind+0x10/0x10
[199392.106601]  ret_from_fork+0x1f/0x40
[199392.107069] ---[ end trace c31536d98046e8ed ]---
[199392.107544] ------------[ cut here ]------------
[199392.108017] sched: Unexpected reschedule of offline CPU#7!
[199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
[199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
[199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
[199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
[199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
[199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
[199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
[199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
[199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
[199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
[199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
[199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
[199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
[199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
[199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
[199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
[199392.123867] Call Trace:
[199392.124495]  <IRQ>
[199392.125116]  update_process_times+0x40/0x50
[199392.125742]  tick_sched_handle+0x25/0x60
[199392.126367]  tick_sched_timer+0x37/0x70
[199392.126987]  __hrtimer_run_queues+0xfb/0x270
[199392.127601]  hrtimer_interrupt+0x122/0x270
[199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
[199392.128819]  apic_timer_interrupt+0xf/0x20
[199392.129423]  </IRQ>
[199392.130020] RIP: 0010:panic+0x209/0x25c
[199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
[199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
[199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
[199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
[199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
[199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
[199392.135792]  oops_end+0xc1/0xd0
[199392.136444]  do_trap+0x13d/0x150
[199392.137094]  do_error_trap+0xd5/0x130
[199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.138383]  invalid_op+0x14/0x20
[199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199392.144301]  ? __wake_up_common_lock+0x87/0xc0
[199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
[199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
[199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
[199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
[199392.146949]  ? sort+0x17b/0x270
[199392.147450]  ? u32_swap+0x10/0x10
[199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
[199392.148441]  process_one_work+0x171/0x370
[199392.148937]  worker_thread+0x49/0x3f0
[199392.149430]  kthread+0xf8/0x130
[199392.149922]  ? max_active_store+0x80/0x80
[199392.150406]  ? kthread_bind+0x10/0x10
[199392.150883]  ret_from_fork+0x1f/0x40
[199392.151353] ---[ end trace c31536d98046e8ee ]---


--
Eric Wheeler

^ permalink raw reply	[flat|nested] 43+ messages in thread

* kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
@ 2019-09-25 18:40 ` Eric Wheeler
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-09-25 18:40 UTC (permalink / raw)
  To: lvm-devel

Hello,

We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
We have been using the 4.19 branch for months without issue; we just 
switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
related to the issue, I don't know, maybe coincidence:

	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
	{
	    int r;
	    enum allocation_event ev;
	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);

	    /* FIXME: we should loop round a couple of times */
	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
	    if (r)
		return r;

	    smd->begin = *b + 1;
	    r = sm_ll_inc(&smd->ll, *b, &ev);
	    if (!r) {
		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
		smd->nr_allocated_this_transaction++;
	    }

	    return r;
	}

This is a brand-new thin pool created about 12 hours ago:

  lvcreate -c 64k -L 12t --type thin-pool --thinpool data-pool --poolmetadatasize 16G data /dev/bcache0

We are using bcache, but I don't see any bcache code in the backtraces.  
The metadata is also on the bcache volume.

We were transferring data to the new thin volumes and it ran for about 12 
hours and then gave the trace below.  So far it has only happened once 
and I don't have a way to reproduce it.

Any idea what this BUG_ON would indicate and how we might contrive a fix?

-Eric



[199391.677689] ------------[ cut here ]------------
[199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
[199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
[199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
[199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
[199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
[199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
[199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
[199391.690984] Call Trace:
[199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
[199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
[199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
[199391.693852]  ? sort+0x17b/0x270
[199391.694527]  ? u32_swap+0x10/0x10
[199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
[199391.695890]  process_one_work+0x171/0x370
[199391.696640]  worker_thread+0x49/0x3f0
[199391.697332]  kthread+0xf8/0x130
[199391.697988]  ? max_active_store+0x80/0x80
[199391.698659]  ? kthread_bind+0x10/0x10
[199391.699281]  ret_from_fork+0x1f/0x40
[199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
[199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
[199391.708083] ---[ end trace c31536d98046e8ec ]---
[199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
[199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
[199391.877317] Kernel panic - not syncing: Fatal exception
[199391.878006] Kernel Offset: disabled
[199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
[199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
[199392.034277] Call Trace:
[199392.034929]  <IRQ>
[199392.035576]  native_apic_msr_write+0x2e/0x40
[199392.036228]  arch_irq_work_raise+0x28/0x40
[199392.036877]  irq_work_queue_on+0x83/0xa0
[199392.037518]  irq_work_run_list+0x4c/0x70
[199392.038149]  irq_work_run+0x14/0x40
[199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
[199392.039393]  call_function_single_interrupt+0xf/0x20
[199392.040011]  </IRQ>
[199392.040624] RIP: 0010:panic+0x209/0x25c
[199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
[199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
[199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
[199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
[199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
[199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
[199392.046486]  oops_end+0xc1/0xd0
[199392.047149]  do_trap+0x13d/0x150
[199392.047795]  do_error_trap+0xd5/0x130
[199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.049048]  invalid_op+0x14/0x20
[199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199392.054807]  ? __wake_up_common_lock+0x87/0xc0
[199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
[199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
[199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
[199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
[199392.057484]  ? sort+0x17b/0x270
[199392.058016]  ? u32_swap+0x10/0x10
[199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
[199392.059060]  process_one_work+0x171/0x370
[199392.059576]  worker_thread+0x49/0x3f0
[199392.060083]  kthread+0xf8/0x130
[199392.060587]  ? max_active_store+0x80/0x80
[199392.061086]  ? kthread_bind+0x10/0x10
[199392.061569]  ret_from_fork+0x1f/0x40
[199392.062038] ------------[ cut here ]------------
[199392.062508] sched: Unexpected reschedule of offline CPU#1!
[199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
[199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
[199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
[199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
[199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
[199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
[199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
[199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
[199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
[199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
[199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
[199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
[199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
[199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
[199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
[199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
[199392.078481] Call Trace:
[199392.079117]  <IRQ>
[199392.079745]  check_preempt_curr+0x6b/0x90
[199392.080373]  ttwu_do_wakeup+0x19/0x130
[199392.080999]  try_to_wake_up+0x1e2/0x460
[199392.081623]  __wake_up_common+0x8f/0x160
[199392.082246]  ep_poll_callback+0x1af/0x300
[199392.082860]  __wake_up_common+0x8f/0x160
[199392.083470]  __wake_up_common_lock+0x7a/0xc0
[199392.084074]  irq_work_run_list+0x4c/0x70
[199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
[199392.085277]  call_function_single_interrupt+0xf/0x20
[199392.085879]  </IRQ>
[199392.086477] RIP: 0010:panic+0x209/0x25c
[199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
[199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
[199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
[199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
[199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
[199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
[199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
[199392.092255]  oops_end+0xc1/0xd0
[199392.092894]  do_trap+0x13d/0x150
[199392.093516]  do_error_trap+0xd5/0x130
[199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.094718]  invalid_op+0x14/0x20
[199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199392.100229]  ? __wake_up_common_lock+0x87/0xc0
[199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
[199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
[199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
[199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
[199392.102750]  ? sort+0x17b/0x270
[199392.103242]  ? u32_swap+0x10/0x10
[199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
[199392.104228]  process_one_work+0x171/0x370
[199392.104714]  worker_thread+0x49/0x3f0
[199392.105193]  kthread+0xf8/0x130
[199392.105665]  ? max_active_store+0x80/0x80
[199392.106132]  ? kthread_bind+0x10/0x10
[199392.106601]  ret_from_fork+0x1f/0x40
[199392.107069] ---[ end trace c31536d98046e8ed ]---
[199392.107544] ------------[ cut here ]------------
[199392.108017] sched: Unexpected reschedule of offline CPU#7!
[199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
[199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
[199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
[199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
[199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
[199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
[199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
[199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
[199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
[199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
[199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
[199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
[199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
[199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
[199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
[199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
[199392.123867] Call Trace:
[199392.124495]  <IRQ>
[199392.125116]  update_process_times+0x40/0x50
[199392.125742]  tick_sched_handle+0x25/0x60
[199392.126367]  tick_sched_timer+0x37/0x70
[199392.126987]  __hrtimer_run_queues+0xfb/0x270
[199392.127601]  hrtimer_interrupt+0x122/0x270
[199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
[199392.128819]  apic_timer_interrupt+0xf/0x20
[199392.129423]  </IRQ>
[199392.130020] RIP: 0010:panic+0x209/0x25c
[199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
[199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
[199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
[199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
[199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
[199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
[199392.135792]  oops_end+0xc1/0xd0
[199392.136444]  do_trap+0x13d/0x150
[199392.137094]  do_error_trap+0xd5/0x130
[199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.138383]  invalid_op+0x14/0x20
[199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
[199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
[199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
[199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
[199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
[199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
[199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
[199392.144301]  ? __wake_up_common_lock+0x87/0xc0
[199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
[199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
[199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
[199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
[199392.146949]  ? sort+0x17b/0x270
[199392.147450]  ? u32_swap+0x10/0x10
[199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
[199392.148441]  process_one_work+0x171/0x370
[199392.148937]  worker_thread+0x49/0x3f0
[199392.149430]  kthread+0xf8/0x130
[199392.149922]  ? max_active_store+0x80/0x80
[199392.150406]  ? kthread_bind+0x10/0x10
[199392.150883]  ret_from_fork+0x1f/0x40
[199392.151353] ---[ end trace c31536d98046e8ee ]---


--
Eric Wheeler



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
  2019-09-25 18:40 ` Eric Wheeler
  (?)
@ 2019-09-25 20:01   ` Mike Snitzer
  -1 siblings, 0 replies; 43+ messages in thread
From: Mike Snitzer @ 2019-09-25 20:01 UTC (permalink / raw)
  To: Eric Wheeler, ejt; +Cc: dm-devel, linux-block, linux-bcache, lvm-devel

On Wed, Sep 25 2019 at  2:40pm -0400,
Eric Wheeler <dm-devel@lists.ewheeler.net> wrote:

> Hello,
> 
> We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> We have been using the 4.19 branch for months without issue; we just 
> switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> related to the issue, I don't know, maybe coincidence:
> 
> 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> 	{
> 	    int r;
> 	    enum allocation_event ev;
> 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> 
> 	    /* FIXME: we should loop round a couple of times */
> 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> 	    if (r)
> 		return r;
> 
> 	    smd->begin = *b + 1;
> 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> 	    if (!r) {
> 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 		smd->nr_allocated_this_transaction++;
> 	    }
> 
> 	    return r;
> 	}
> 
> This is a brand-new thin pool created about 12 hours ago:
> 
>   lvcreate -c 64k -L 12t --type thin-pool --thinpool data-pool --poolmetadatasize 16G data /dev/bcache0
> 
> We are using bcache, but I don't see any bcache code in the backtraces.  
> The metadata is also on the bcache volume.

So bcache is being used for both data and metadata.
 
> We were transferring data to the new thin volumes and it ran for about 12 
> hours and then gave the trace below.  So far it has only happened once 
> and I don't have a way to reproduce it.
> 
> Any idea what this BUG_ON would indicate and how we might contrive a fix?
>
> [199391.677689] ------------[ cut here ]------------
> [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199391.690984] Call Trace:
> [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199391.693852]  ? sort+0x17b/0x270
> [199391.694527]  ? u32_swap+0x10/0x10
> [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199391.695890]  process_one_work+0x171/0x370
> [199391.696640]  worker_thread+0x49/0x3f0
> [199391.697332]  kthread+0xf8/0x130
> [199391.697988]  ? max_active_store+0x80/0x80
> [199391.698659]  ? kthread_bind+0x10/0x10
> [199391.699281]  ret_from_fork+0x1f/0x40

The stack shows the call to sm_disk_new_block() is due to
dm_pool_alloc_data_block().

sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
getting called without the passed 'ev' being set to SM_ALLOC.  Only
drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
SM_ALLOC, and sm_disk_new_block() calls sm_ll_mutate() indirectly.

sm_ll_mutate() only returns 0 if ll->save_ie() does; here the struct
ll_disk *ll is the disk variant, so disk_ll_save_ie()'s call to
dm_btree_insert() returned 0 -- which simply means success.  And on
success, sm_disk_new_block() assumes ev was set to SM_ALLOC (by
sm_ll_mutate()).

sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
1) ref_count wasn't set
or
2) old was identified

So all said: somehow a new data block was found to already be in use.
_WHY_ that is the case isn't clear from this stack...

But it does speak to the possibility of data block allocation racing
with other operations to the same block.  Which implies missing locking.

But that's all I've got so far... I'll review past dm-thinp changes with
all this in mind and see what turns up.  But Joe Thornber (ejt) likely
needs to have a look at this too.

But could it be that bcache is the source of the data device race (same
block used concurrently)?  And DM thinp is acting as the canary in the
coal mine?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
@ 2019-09-25 20:01   ` Mike Snitzer
  0 siblings, 0 replies; 43+ messages in thread
From: Mike Snitzer @ 2019-09-25 20:01 UTC (permalink / raw)
  To: Eric Wheeler, ejt; +Cc: linux-block, dm-devel, linux-bcache, lvm-devel

On Wed, Sep 25 2019 at  2:40pm -0400,
Eric Wheeler <dm-devel@lists.ewheeler.net> wrote:

> Hello,
> 
> We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> We have been using the 4.19 branch for months without issue; we just 
> switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> related to the issue, I don't know, maybe coincidence:
> 
> 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> 	{
> 	    int r;
> 	    enum allocation_event ev;
> 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> 
> 	    /* FIXME: we should loop round a couple of times */
> 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> 	    if (r)
> 		return r;
> 
> 	    smd->begin = *b + 1;
> 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> 	    if (!r) {
> 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 		smd->nr_allocated_this_transaction++;
> 	    }
> 
> 	    return r;
> 	}
> 
> This is a brand-new thin pool created about 12 hours ago:
> 
>   lvcreate -c 64k -L 12t --type thin-pool --thinpool data-pool --poolmetadatasize 16G data /dev/bcache0
> 
> We are using bcache, but I don't see any bcache code in the backtraces.  
> The metadata is also on the bcache volume.

So bcache is being used for both data and metadata.
 
> We were transferring data to the new thin volumes and it ran for about 12 
> hours and then gave the trace below.  So far it has only happened once 
> and I don't have a way to reproduce it.
> 
> Any idea what this BUG_ON would indicate and how we might contrive a fix?
>
> [199391.677689] ------------[ cut here ]------------
> [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199391.690984] Call Trace:
> [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199391.693852]  ? sort+0x17b/0x270
> [199391.694527]  ? u32_swap+0x10/0x10
> [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199391.695890]  process_one_work+0x171/0x370
> [199391.696640]  worker_thread+0x49/0x3f0
> [199391.697332]  kthread+0xf8/0x130
> [199391.697988]  ? max_active_store+0x80/0x80
> [199391.698659]  ? kthread_bind+0x10/0x10
> [199391.699281]  ret_from_fork+0x1f/0x40

The stack shows the call to sm_disk_new_block() is due to
dm_pool_alloc_data_block().

sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
getting called without the passed 'ev' being set to SM_ALLOC.  Only
drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
SM_ALLOC, and sm_disk_new_block() indirectly calls sm_ll_mutate().

sm_ll_mutate() will only return 0 if ll->save_ie() does; here the
struct ll_disk *ll is the disk space map's, so disk_ll_save_ie()'s call
to dm_btree_insert() returned 0 -- which simply means success.  And on
success sm_disk_new_block() assumes ev was set to SM_ALLOC (by
sm_ll_mutate()).

sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
1) ref_count wasn't set
or
2) old was identified

So all said: somehow a new data block was found to already be in use.
_WHY_ that is the case isn't clear from this stack...

But it does speak to the possibility of data block allocation racing
with other operations to the same block.  Which implies missing locking.

But that's all I've got so far... I'll review past dm-thinp changes with
all this in mind and see what turns up.  But Joe Thornber (ejt) likely
needs to have a look at this too.

But could it be that bcache is the source of the data device race (same
block used concurrently)?  And DM thinp is acting as the canary in the
coal mine?

Thanks,
Mike

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
  2019-09-25 20:01   ` Mike Snitzer
  (?)
@ 2019-09-25 20:33     ` Eric Wheeler
  -1 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-09-25 20:33 UTC (permalink / raw)
  To: Coly Li; +Cc: Mike Snitzer, ejt, dm-devel, linux-block, linux-bcache, lvm-devel

On Wed, 25 Sep 2019, Mike Snitzer wrote:
> On Wed, Sep 25 2019 at  2:40pm -0400, Eric Wheeler <dm-devel@lists.ewheeler.net> wrote:
> 
> > Hello,
> > 
> > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > We have been using the 4.19 branch for months without issue; we just 
> > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > related to the issue, I don't know, maybe coincidence:
> > 
> > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > 	{
> > 	    int r;
> > 	    enum allocation_event ev;
> > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > 
> > 	    /* FIXME: we should loop round a couple of times */
> > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > 	    if (r)
> > 		return r;
> > 
> > 	    smd->begin = *b + 1;
> > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	    if (!r) {
> > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 		smd->nr_allocated_this_transaction++;
> > 	    }
> > 
> > 	    return r;
> > 	}
> > 
> > This is a brand-new thin pool created about 12 hours ago:
> > 
> >   lvcreate -c 64k -L 12t --type thin-pool --thinpool data-pool --poolmetadatasize 16G data /dev/bcache0
> > 
> > We are using bcache, but I don't see any bcache code in the backtraces.  
> > The metadata is also on the bcache volume.
> 
> So bcache is being used for both data and metadata.

Correct.
  
> > We were transferring data to the new thin volumes and it ran for about 12 
> > hours and then gave the trace below.  So far it has only happened once 
> > and I don't have a way to reproduce it.
> > 
> > Any idea what this BUG_ON would indicate and how we might contrive a fix?
> >
> > [199391.677689] ------------[ cut here ]------------
> > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.690984] Call Trace:
> > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199391.693852]  ? sort+0x17b/0x270
> > [199391.694527]  ? u32_swap+0x10/0x10
> > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199391.695890]  process_one_work+0x171/0x370
> > [199391.696640]  worker_thread+0x49/0x3f0
> > [199391.697332]  kthread+0xf8/0x130
> > [199391.697988]  ? max_active_store+0x80/0x80
> > [199391.698659]  ? kthread_bind+0x10/0x10
> > [199391.699281]  ret_from_fork+0x1f/0x40
> 
> The stack shows the call to sm_disk_new_block() is due to
> dm_pool_alloc_data_block().
> 
> sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> getting called without the passed 'ev' being set to SM_ALLOC.  Only
> drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
> SM_ALLOC, and sm_disk_new_block() indirectly calls sm_ll_mutate().
> 
> sm_ll_mutate() will only return 0 if ll->save_ie() does; here the
> struct ll_disk *ll is the disk space map's, so disk_ll_save_ie()'s call
> to dm_btree_insert() returned 0 -- which simply means success.  And on
> success sm_disk_new_block() assumes ev was set to SM_ALLOC (by
> sm_ll_mutate()).
> 
> sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> 1) ref_count wasn't set
> or
> 2) old was identified
> 
> So all said: somehow a new data block was found to already be in use.
> _WHY_ that is the case isn't clear from this stack...
> 
> But it does speak to the possibility of data block allocation racing
> with other operations to the same block.  Which implies missing locking.
> 
> But that's all I've got so far... I'll review past dm-thinp changes with
> all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> needs to have a look at this too.
> 
> But could it be that bcache is the source of the data device race (same
> block used concurrently)?  And DM thinp is acting as the canary in the
> coal mine?

Hi Mike, thanks for the detail.  

Coly, any idea on the possible bcache interaction here?


--
Eric Wheeler


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
  2019-09-25 20:01   ` Mike Snitzer
  (?)
@ 2019-09-26 18:27     ` Eric Wheeler
  -1 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-09-26 18:27 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: ejt, Coly Li, dm-devel, linux-block, linux-bcache, lvm-devel

On Wed, 25 Sep 2019, Mike Snitzer wrote:
> On Wed, Sep 25 2019 at  2:40pm -0400,
> Eric Wheeler <dm-devel@lists.ewheeler.net> wrote:
> 
> > Hello,
> > 
> > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > We have been using the 4.19 branch for months without issue; we just 
> > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > related to the issue, I don't know, maybe coincidence:
> > 
> > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > 	{
> > 	    int r;
> > 	    enum allocation_event ev;
> > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > 
> > 	    /* FIXME: we should loop round a couple of times */
> > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > 	    if (r)
> > 		return r;
> > 
> > 	    smd->begin = *b + 1;
> > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	    if (!r) {
> > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 		smd->nr_allocated_this_transaction++;
> > 	    }
> > 
> > 	    return r;
> > 	}
> > 
> > This is a brand-new thin pool created about 12 hours ago:
> > 
> >   lvcreate -c 64k -L 12t --type thin-pool --thinpool data-pool --poolmetadatasize 16G data /dev/bcache0
> > 
> > We are using bcache, but I don't see any bcache code in the backtraces.  
> > The metadata is also on the bcache volume.
> 
> So bcache is being used for both data and metadata.

Hi Mike, 

I pvmoved the tmeta to an SSD logical volume (dm-linear) on a non-bcache 
volume and we got the same trace this morning, so while the tdata still 
passes through bcache, all metadata operations now go directly to an SSD. 
This is still using multi-queue scsi, but with dm_mod.use_blk_mq=N.
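
(For reference, the move was along the following lines; the PV names
below are placeholders, not our actual devices:)

```shell
# Move only the thin-pool metadata LV's extents from the bcache PV to a
# plain SSD PV, leaving the data LV (tdata) where it is.  Device names
# here are hypothetical.
pvmove -n data-pool_tmeta /dev/bcache0 /dev/sdb1
```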

Since bcache is no longer involved with metadata operations, and since 
this appears to be a metadata issue, are there any other reasons to 
suspect bcache?

Since we seem to hit this every night, I can try any patches that you 
would like for testing. I appreciate your help; hopefully we can solve 
this quickly. 


-Eric
  
> > We were transferring data to the new thin volumes and it ran for about 12 
> > hours and then gave the trace below.  So far it has only happened once 
> > and I don't have a way to reproduce it.
> > 
> > Any idea what this BUG_ON would indicate and how we might contrive a fix?
> >
> > [199391.677689] ------------[ cut here ]------------
> > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.690984] Call Trace:
> > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199391.693852]  ? sort+0x17b/0x270
> > [199391.694527]  ? u32_swap+0x10/0x10
> > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199391.695890]  process_one_work+0x171/0x370
> > [199391.696640]  worker_thread+0x49/0x3f0
> > [199391.697332]  kthread+0xf8/0x130
> > [199391.697988]  ? max_active_store+0x80/0x80
> > [199391.698659]  ? kthread_bind+0x10/0x10
> > [199391.699281]  ret_from_fork+0x1f/0x40
> 
> The stack shows the call to sm_disk_new_block() is due to
> dm_pool_alloc_data_block().
> 
> sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> getting called without the passed 'ev' being set to SM_ALLOC.  Only
> drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
> SM_ALLOC, and sm_disk_new_block() indirectly calls sm_ll_mutate().
> 
> sm_ll_mutate() only returns 0 if ll->save_ie() does; here the struct
> ll_disk *ll is the disk space map's, so disk_ll_save_ie()'s call to
> dm_btree_insert() returned 0 -- which simply means success.  And on
> success sm_disk_new_block() assumes ev was set to SM_ALLOC (by
> sm_ll_mutate()).
> 
> sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> 1) ref_count wasn't set
> or
> 2) old was identified
> 
> So all said: somehow a new data block was found to already be in use.
> _WHY_ that is the case isn't clear from this stack...
> 
> But it does speak to the possibility of data block allocation racing
> with other operations to the same block.  Which implies missing locking.
> 
> But that's all I've got so far... I'll review past dm-thinp changes with
> all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> needs to have a look at this too.
> 
> But could it be that bcache is the source of the data device race (same
> block used concurrently)?  And DM thinp is acting as the canary in the
> coal mine?
> 
> Thanks,
> Mike
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
  2019-09-26 18:27     ` Eric Wheeler
  (?)
@ 2019-09-27  8:32       ` Joe Thornber
  -1 siblings, 0 replies; 43+ messages in thread
From: Joe Thornber @ 2019-09-27  8:32 UTC (permalink / raw)
  To: Eric Wheeler
  Cc: Mike Snitzer, ejt, Coly Li, dm-devel, linux-block, linux-bcache,
	lvm-devel, joe.thornber

Hi Eric,

On Thu, Sep 26, 2019 at 06:27:09PM +0000, Eric Wheeler wrote:
> I pvmoved the tmeta to an SSD logical volume (dm-linear) on a non-bcache 
> volume and we got the same trace this morning, so while the tdata still 
> passes through bcache, all meta operations are direct to an SSD. This is 
> still using multi-queue scsi, but dm_mod.use_blk_mq=N.
> 
> Since bcache is no longer involved with metadata operations, and since 
> this appears to be a metadata issue, are there any other reasons to 
> suspect bcache?

Did you recreate the pool, or are you just using the existing pool but with
a different IO path?  If it's the latter then there could still be something
wrong with the metadata, introduced while bcache was in the stack.

Would it be possible to send me a copy of the metadata device please so
I can double check the space maps (I presume you've run thin_check on it)?

[Assuming you're using the existing pool] Another useful experiment would be to
thin_dump and then thin_restore the metadata, which will create totally fresh
metadata; then see if you can still reproduce the issue.

Thanks,

- Joe
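The dump-and-restore round-trip Joe suggests might look roughly like the
sketch below.  The device and file paths are examples taken from earlier in
this thread, the pool must be inactive while its metadata is copied, and in
practice the restore is often done into a spare metadata LV that is then
swapped in with lvconvert rather than written back in place:

```shell
# Illustrative sketch only -- paths are examples from this thread.
# Deactivate the pool so the metadata is quiescent.
lvchange -an data/data-pool

# Dump the existing metadata to XML, then rebuild fresh metadata
# from that dump.
thin_dump /dev/mapper/data-data--pool_tmeta -o /tmp/tmeta.xml
thin_restore -i /tmp/tmeta.xml -o /dev/mapper/data-data--pool_tmeta

# Verify before reactivating.
thin_check /dev/mapper/data-data--pool_tmeta
lvchange -ay data/data-pool
```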

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
  2019-09-27  8:32       ` Joe Thornber
  (?)
@ 2019-09-27 18:45         ` Eric Wheeler
  -1 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-09-27 18:45 UTC (permalink / raw)
  To: Joe Thornber
  Cc: Mike Snitzer, ejt, Coly Li, dm-devel, linux-block, linux-bcache,
	lvm-devel, joe.thornber

On Fri, 27 Sep 2019, Joe Thornber wrote:

> Hi Eric,
> 
> On Thu, Sep 26, 2019 at 06:27:09PM +0000, Eric Wheeler wrote:
> > I pvmoved the tmeta to an SSD logical volume (dm-linear) on a non-bcache 
> > volume and we got the same trace this morning, so while the tdata still 
> > passes through bcache, all meta operations are direct to an SSD. This is 
> > still using multi-queue scsi, but dm_mod.use_blk_mq=N.
> > 
> > Since bcache is no longer involved with metadata operations, and since 
> > this appears to be a metadata issue, are there any other reasons to 
> > suspect bcache?
> 
> Did you recreate the pool, or are you just using the existing pool but with
> a different IO path?  If it's the latter then there could still be something
> wrong with the metadata, introduced while bcache was in the stack.

We did not re-create the pool after the initial problem, though the pool
was new just before the problem first occurred.
 
> Would it be possible to send me a copy of the metadata device please so
> I can double check the space maps (I presume you've run thin_check on it)?

~]# /usr/local/bin/thin_check /dev/mapper/data-data--pool_tmeta 
examining superblock
TRANSACTION_ID=2347
METADATA_FREE_BLOCKS=4145151
examining devices tree
examining mapping tree
checking space map counts

~]# echo $?
0

~]# /usr/local/bin/thin_check -V
0.8.5

> [Assuming you're using the existing pool] Another useful experiment would be to 
> thump_dump and then thin_restore the metadata, which will create totally fresh
> metadata and see if you can still reproduce the issue.

It didn't lock up last night, but I'll keep working to reproduce the 
problem and let you know what we find.

Mike said it could be a race:

> The stack shows the call to sm_disk_new_block() is due to
> dm_pool_alloc_data_block().
> 
> sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> getting called without the passed 'ev' being set to SM_ALLOC.  Only
> drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
> SM_ALLOC, and sm_disk_new_block() indirectly calls sm_ll_mutate().
> 
> sm_ll_mutate() only returns 0 if ll->save_ie() does; here the struct
> ll_disk *ll is the disk space map's, so disk_ll_save_ie()'s call to
> dm_btree_insert() returned 0 -- which simply means success.  And on
> success sm_disk_new_block() assumes ev was set to SM_ALLOC (by
> sm_ll_mutate()).
> 
> sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> 1) ref_count wasn't set
> or
> 2) old was identified
> 
> So all said: somehow a new data block was found to already be in use.
> _WHY_ that is the case isn't clear from this stack...
>
> But it does speak to the possibility of data block allocation racing
> with other operations to the same block.  Which implies missing locking.

Would a spinlock on the block solve the issue?

Where might such a spinlock be added?


--
Eric Wheeler


> 
> Thanks,
> 
> - Joe
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
  2019-09-25 18:40 ` Eric Wheeler
  (?)
@ 2019-12-20 19:54   ` Eric Wheeler
  -1 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-12-20 19:54 UTC (permalink / raw)
  To: ejt
  Cc: Mike Snitzer, joe.thornber, dm-devel, linux-block, lvm-devel,
	markus.schade

On Wed, 25 Sep 2019, Eric Wheeler wrote:
> We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> We have been using the 4.19 branch for months without issue; we just 
> switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> related to the issue, I don't know, maybe coincidence:
> 
> 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> 	{
> 	    int r;
> 	    enum allocation_event ev;
> 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> 
> 	    /* FIXME: we should loop round a couple of times */
> 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> 	    if (r)
> 		return r;
> 
> 	    smd->begin = *b + 1;
> 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> 	    if (!r) {
> 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 		smd->nr_allocated_this_transaction++;
> 	    }

Hello all,

We hit this BUG_ON again, this time with 4.19.86 with 
scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as 
additionally reported by Markus Schade:

  https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
     and
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398

In our case, the latest trace (below) is from a different system that
had been stable for years on Linux 4.1 with tmeta direct on the SSD.
We updated to 4.19.86 a few weeks ago and just hit the bug, which Mike
Snitzer explains to be an allocator race:

On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199391.693852]  ? sort+0x17b/0x270
> > [199391.694527]  ? u32_swap+0x10/0x10
> > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199391.695890]  process_one_work+0x171/0x370
> > [199391.696640]  worker_thread+0x49/0x3f0
> > [199391.697332]  kthread+0xf8/0x130
> > [199391.697988]  ? max_active_store+0x80/0x80
> > [199391.698659]  ? kthread_bind+0x10/0x10
> > [199391.699281]  ret_from_fork+0x1f/0x40
> 
> The stack shows the call to sm_disk_new_block() is due to
> dm_pool_alloc_data_block().
> 
> sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> getting called without the passed 'ev' being set to SM_ALLOC.  Only
> drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
> SM_ALLOC.  sm_disk_new_block() indirectly calls sm_ll_mutate().
> 
> sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> returns 0 -- which simply means success.  And on success
> sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> 
> sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> 1) ref_count wasn't set
> or
> 2) old was identified
> 
> So all said: somehow a new data block was found to already be in use.
> _WHY_ that is the case isn't clear from this stack...
> 
> But it does speak to the possibility of data block allocation racing
> with other operations to the same block.  Which implies missing locking.

Where do you think such locking should be added?

> But that's all I've got so far... I'll review past dm-thinp changes with
> all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> needs to have a look at this too.
> 
> But could it be that bcache is the source of the data device race (same
> block used concurrently)?  And DM thinp is acting as the canary in the
> coal mine?

As Markus has shown, this bug triggers even without bcache.


Other questions:

1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:

+	spin_lock(&lock); /* protect smd->begin */
	smd->begin = *b + 1;
	r = sm_ll_inc(&smd->ll, *b, &ev);
	if (!r) {
-		BUG_ON(ev != SM_ALLOC); 
		smd->nr_allocated_this_transaction++;
	}
+	else {
+		r = -ENOSPC;
+		smd->begin = *b - 1;
+	}
+	spin_unlock(&lock);

The lock might protect smd->begin, but I'm not sure how &smd->ll might 
have been modified by sm_ll_inc().  However, since ll->save_ie() failed in 
sm_ll_mutate(), perhaps this is safe.  What do you think?

Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
of space, but I would take it over a BUG_ON.

2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
   taught to retry?  This bug shows up weeks or months apart, even on 
   heavily loaded systems with ~100 live thin volumes, so retrying would 
   be fine IMHO.

3) In the thread from June, Markus says:
	"Judging from the call trace, my guess is that there is somewhere 
	a race condition, when a new block needs to be allocated which has 
	still to be discarded."

Is this discard situation possible?  Wouldn't the bio prison prevent it?

--
Eric Wheeler
www.datawall.us



Here is the new trace, old trace below:

kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
Workqueue: dm-thin do_worker [dm_thin_pool]
RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
Call Trace:
 dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
 alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
 process_cell+0x2a3/0x550 [dm_thin_pool]
 ? mempool_alloc+0x6f/0x180
 ? sort+0x17b/0x270
 ? u32_swap+0x10/0x10
 process_deferred_bios+0x1af/0x870 [dm_thin_pool]
 do_worker+0x94/0xe0 [dm_thin_pool]
 process_one_work+0x171/0x370
 worker_thread+0x49/0x3f0
 kthread+0xf8/0x130
 ? max_active_store+0x80/0x80
 ? kthread_bind+0x10/0x10
 ret_from_fork+0x1f/0x40

> 
> [199391.677689] ------------[ cut here ]------------
> [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199391.690984] Call Trace:
> [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199391.693852]  ? sort+0x17b/0x270
> [199391.694527]  ? u32_swap+0x10/0x10
> [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199391.695890]  process_one_work+0x171/0x370
> [199391.696640]  worker_thread+0x49/0x3f0
> [199391.697332]  kthread+0xf8/0x130
> [199391.697988]  ? max_active_store+0x80/0x80
> [199391.698659]  ? kthread_bind+0x10/0x10
> [199391.699281]  ret_from_fork+0x1f/0x40
> [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> [199391.708083] ---[ end trace c31536d98046e8ec ]---
> [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199391.877317] Kernel panic - not syncing: Fatal exception
> [199391.878006] Kernel Offset: disabled
> [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> [199392.034277] Call Trace:
> [199392.034929]  <IRQ>
> [199392.035576]  native_apic_msr_write+0x2e/0x40
> [199392.036228]  arch_irq_work_raise+0x28/0x40
> [199392.036877]  irq_work_queue_on+0x83/0xa0
> [199392.037518]  irq_work_run_list+0x4c/0x70
> [199392.038149]  irq_work_run+0x14/0x40
> [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> [199392.039393]  call_function_single_interrupt+0xf/0x20
> [199392.040011]  </IRQ>
> [199392.040624] RIP: 0010:panic+0x209/0x25c
> [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> [199392.046486]  oops_end+0xc1/0xd0
> [199392.047149]  do_trap+0x13d/0x150
> [199392.047795]  do_error_trap+0xd5/0x130
> [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.049048]  invalid_op+0x14/0x20
> [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199392.057484]  ? sort+0x17b/0x270
> [199392.058016]  ? u32_swap+0x10/0x10
> [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199392.059060]  process_one_work+0x171/0x370
> [199392.059576]  worker_thread+0x49/0x3f0
> [199392.060083]  kthread+0xf8/0x130
> [199392.060587]  ? max_active_store+0x80/0x80
> [199392.061086]  ? kthread_bind+0x10/0x10
> [199392.061569]  ret_from_fork+0x1f/0x40
> [199392.062038] ------------[ cut here ]------------
> [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199392.078481] Call Trace:
> [199392.079117]  <IRQ>
> [199392.079745]  check_preempt_curr+0x6b/0x90
> [199392.080373]  ttwu_do_wakeup+0x19/0x130
> [199392.080999]  try_to_wake_up+0x1e2/0x460
> [199392.081623]  __wake_up_common+0x8f/0x160
> [199392.082246]  ep_poll_callback+0x1af/0x300
> [199392.082860]  __wake_up_common+0x8f/0x160
> [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> [199392.084074]  irq_work_run_list+0x4c/0x70
> [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> [199392.085277]  call_function_single_interrupt+0xf/0x20
> [199392.085879]  </IRQ>
> [199392.086477] RIP: 0010:panic+0x209/0x25c
> [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> [199392.092255]  oops_end+0xc1/0xd0
> [199392.092894]  do_trap+0x13d/0x150
> [199392.093516]  do_error_trap+0xd5/0x130
> [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.094718]  invalid_op+0x14/0x20
> [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199392.102750]  ? sort+0x17b/0x270
> [199392.103242]  ? u32_swap+0x10/0x10
> [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199392.104228]  process_one_work+0x171/0x370
> [199392.104714]  worker_thread+0x49/0x3f0
> [199392.105193]  kthread+0xf8/0x130
> [199392.105665]  ? max_active_store+0x80/0x80
> [199392.106132]  ? kthread_bind+0x10/0x10
> [199392.106601]  ret_from_fork+0x1f/0x40
> [199392.107069] ---[ end trace c31536d98046e8ed ]---
> [199392.107544] ------------[ cut here ]------------
> [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199392.123867] Call Trace:
> [199392.124495]  <IRQ>
> [199392.125116]  update_process_times+0x40/0x50
> [199392.125742]  tick_sched_handle+0x25/0x60
> [199392.126367]  tick_sched_timer+0x37/0x70
> [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> [199392.127601]  hrtimer_interrupt+0x122/0x270
> [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> [199392.128819]  apic_timer_interrupt+0xf/0x20
> [199392.129423]  </IRQ>
> [199392.130020] RIP: 0010:panic+0x209/0x25c
> [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> [199392.135792]  oops_end+0xc1/0xd0
> [199392.136444]  do_trap+0x13d/0x150
> [199392.137094]  do_error_trap+0xd5/0x130
> [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.138383]  invalid_op+0x14/0x20
> [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199392.146949]  ? sort+0x17b/0x270
> [199392.147450]  ? u32_swap+0x10/0x10
> [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199392.148441]  process_one_work+0x171/0x370
> [199392.148937]  worker_thread+0x49/0x3f0
> [199392.149430]  kthread+0xf8/0x130
> [199392.149922]  ? max_active_store+0x80/0x80
> [199392.150406]  ? kthread_bind+0x10/0x10
> [199392.150883]  ret_from_fork+0x1f/0x40
> [199392.151353] ---[ end trace c31536d98046e8ee ]---
> 
> 
> --
> Eric Wheeler
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
@ 2019-12-20 19:54   ` Eric Wheeler
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-12-20 19:54 UTC (permalink / raw)
  To: ejt
  Cc: Mike Snitzer, markus.schade, lvm-devel, linux-block, dm-devel,
	joe.thornber

On Wed, 25 Sep 2019, Eric Wheeler wrote:
> We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> We have been using the 4.19 branch for months without issue; we just 
> switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> related to the issue, I don't know, maybe coincidence:
> 
> 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> 	{
> 	    int r;
> 	    enum allocation_event ev;
> 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> 
> 	    /* FIXME: we should loop round a couple of times */
> 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> 	    if (r)
> 		return r;
> 
> 	    smd->begin = *b + 1;
> 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> 	    if (!r) {
> 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 		smd->nr_allocated_this_transaction++;
> 	    }

Hello all,

We hit this BUG_ON again, this time with 4.19.86 with 
scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as 
additionally reported by Markus Schade:

  https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
     and
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398

In our case, the most latest trace (below) is from a different system that
has been stable for years on Linux 4.1 with tmeta direct on the SSD.
We updated to 4.19.86 a few weeks ago and just hit this, what Mike
Snitzer explains to be an allocator race:

On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199391.693852]  ? sort+0x17b/0x270
> > [199391.694527]  ? u32_swap+0x10/0x10
> > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199391.695890]  process_one_work+0x171/0x370
> > [199391.696640]  worker_thread+0x49/0x3f0
> > [199391.697332]  kthread+0xf8/0x130
> > [199391.697988]  ? max_active_store+0x80/0x80
> > [199391.698659]  ? kthread_bind+0x10/0x10
> > [199391.699281]  ret_from_fork+0x1f/0x40
> 
> The stack shows the call to sm_disk_new_block() is due to
> dm_pool_alloc_data_block().
> 
> sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> getting called without the passed 'ev' being set to SM_ALLOC.  Only
> drivers/md/persistent-dat/dm-space-map-common.c:sm_ll_mutate() sets
> SM_ALLOC. sm_disk_new_block() is indirectly calling sm_ll_mutate()
> 
> sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> returns 0 -- which simply means success.  And on success
> sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> 
> sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> 1) ref_count wasn't set
> or
> 2) old was identified
> 
> So all said: somehow a new data block was found to already be in use.
> _WHY_ that is the case isn't clear from this stack...
> 
> But it does speak to the possibility of data block allocation racing
> with other operations to the same block.  Which implies missing locking.

Where would you look to add locking do you think? 

> But that's all I've got so far... I'll review past dm-thinp changes with
> all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> needs to have a look at this too.
> 
> But could it be that bcache is the source of the data device race (same
> block used concurrently)?  And DM thinp is acting as the canary in the
> coal mine?

As Marcus has shown, this bug triggers without bcache.


Other questions:

1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:

+	spin_lock(&lock); /* protect smd->begin */
	smd->begin = *b + 1;
	r = sm_ll_inc(&smd->ll, *b, &ev);
	if (!r) {
-		BUG_ON(ev != SM_ALLOC); 
		smd->nr_allocated_this_transaction++;
	}
+	else {
+		r = -ENOSPC;
+		smd->begin = *b - 1;
+	}
+	spin_unlock(&lock);

The lock might protect smd->begin, but I'm not sure how &smd->ll might 
have been modified by sm_ll_inc().  However, since ll->save_ie() failed in 
sm_ll_mutate() then perhaps this is safe.  What do you think?

Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
of space, but I would take it over a BUG_ON.

2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
   taught retry?  This bug shows up weeks or months apart, even on heavily 
   loaded systems with ~100 live thin volumes, so retrying would be fine 
   IMHO.

3) In the thread from June, Marcus says:
	"Judging from the call trace, my guess is that there is somewhere 
	a race condition, when a new block needs to be allocated which has 
	still to be discarded."

Is this discard sitation possible?  Wouldn't the bio prison prevent this?

--
Eric Wheeler
www.datawall.us



Here is the new trace, old trace below:

kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
Workqueue: dm-thin do_worker [dm_thin_pool]
RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
Call Trace:
 dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
 alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
 process_cell+0x2a3/0x550 [dm_thin_pool]
 ? mempool_alloc+0x6f/0x180
 ? sort+0x17b/0x270
 ? u32_swap+0x10/0x10
 process_deferred_bios+0x1af/0x870 [dm_thin_pool]
 do_worker+0x94/0xe0 [dm_thin_pool]
 process_one_work+0x171/0x370
 worker_thread+0x49/0x3f0
 kthread+0xf8/0x130
 ? max_active_store+0x80/0x80
 ? kthread_bind+0x10/0x10
 ret_from_fork+0x1f/0x40

> 
> [199391.677689] ------------[ cut here ]------------
> [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199391.690984] Call Trace:
> [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199391.693852]  ? sort+0x17b/0x270
> [199391.694527]  ? u32_swap+0x10/0x10
> [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199391.695890]  process_one_work+0x171/0x370
> [199391.696640]  worker_thread+0x49/0x3f0
> [199391.697332]  kthread+0xf8/0x130
> [199391.697988]  ? max_active_store+0x80/0x80
> [199391.698659]  ? kthread_bind+0x10/0x10
> [199391.699281]  ret_from_fork+0x1f/0x40
> [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> [199391.708083] ---[ end trace c31536d98046e8ec ]---
> [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199391.877317] Kernel panic - not syncing: Fatal exception
> [199391.878006] Kernel Offset: disabled
> [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> [199392.034277] Call Trace:
> [199392.034929]  <IRQ>
> [199392.035576]  native_apic_msr_write+0x2e/0x40
> [199392.036228]  arch_irq_work_raise+0x28/0x40
> [199392.036877]  irq_work_queue_on+0x83/0xa0
> [199392.037518]  irq_work_run_list+0x4c/0x70
> [199392.038149]  irq_work_run+0x14/0x40
> [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> [199392.039393]  call_function_single_interrupt+0xf/0x20
> [199392.040011]  </IRQ>
> [199392.040624] RIP: 0010:panic+0x209/0x25c
> [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> [199392.046486]  oops_end+0xc1/0xd0
> [199392.047149]  do_trap+0x13d/0x150
> [199392.047795]  do_error_trap+0xd5/0x130
> [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.049048]  invalid_op+0x14/0x20
> [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199392.057484]  ? sort+0x17b/0x270
> [199392.058016]  ? u32_swap+0x10/0x10
> [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199392.059060]  process_one_work+0x171/0x370
> [199392.059576]  worker_thread+0x49/0x3f0
> [199392.060083]  kthread+0xf8/0x130
> [199392.060587]  ? max_active_store+0x80/0x80
> [199392.061086]  ? kthread_bind+0x10/0x10
> [199392.061569]  ret_from_fork+0x1f/0x40
> [199392.062038] ------------[ cut here ]------------
> [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199392.078481] Call Trace:
> [199392.079117]  <IRQ>
> [199392.079745]  check_preempt_curr+0x6b/0x90
> [199392.080373]  ttwu_do_wakeup+0x19/0x130
> [199392.080999]  try_to_wake_up+0x1e2/0x460
> [199392.081623]  __wake_up_common+0x8f/0x160
> [199392.082246]  ep_poll_callback+0x1af/0x300
> [199392.082860]  __wake_up_common+0x8f/0x160
> [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> [199392.084074]  irq_work_run_list+0x4c/0x70
> [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> [199392.085277]  call_function_single_interrupt+0xf/0x20
> [199392.085879]  </IRQ>
> [199392.086477] RIP: 0010:panic+0x209/0x25c
> [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> [199392.092255]  oops_end+0xc1/0xd0
> [199392.092894]  do_trap+0x13d/0x150
> [199392.093516]  do_error_trap+0xd5/0x130
> [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.094718]  invalid_op+0x14/0x20
> [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199392.102750]  ? sort+0x17b/0x270
> [199392.103242]  ? u32_swap+0x10/0x10
> [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199392.104228]  process_one_work+0x171/0x370
> [199392.104714]  worker_thread+0x49/0x3f0
> [199392.105193]  kthread+0xf8/0x130
> [199392.105665]  ? max_active_store+0x80/0x80
> [199392.106132]  ? kthread_bind+0x10/0x10
> [199392.106601]  ret_from_fork+0x1f/0x40
> [199392.107069] ---[ end trace c31536d98046e8ed ]---
> [199392.107544] ------------[ cut here ]------------
> [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199392.123867] Call Trace:
> [199392.124495]  <IRQ>
> [199392.125116]  update_process_times+0x40/0x50
> [199392.125742]  tick_sched_handle+0x25/0x60
> [199392.126367]  tick_sched_timer+0x37/0x70
> [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> [199392.127601]  hrtimer_interrupt+0x122/0x270
> [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> [199392.128819]  apic_timer_interrupt+0xf/0x20
> [199392.129423]  </IRQ>
> [199392.130020] RIP: 0010:panic+0x209/0x25c
> [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> [199392.135792]  oops_end+0xc1/0xd0
> [199392.136444]  do_trap+0x13d/0x150
> [199392.137094]  do_error_trap+0xd5/0x130
> [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.138383]  invalid_op+0x14/0x20
> [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199392.146949]  ? sort+0x17b/0x270
> [199392.147450]  ? u32_swap+0x10/0x10
> [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199392.148441]  process_one_work+0x171/0x370
> [199392.148937]  worker_thread+0x49/0x3f0
> [199392.149430]  kthread+0xf8/0x130
> [199392.149922]  ? max_active_store+0x80/0x80
> [199392.150406]  ? kthread_bind+0x10/0x10
> [199392.150883]  ret_from_fork+0x1f/0x40
> [199392.151353] ---[ end trace c31536d98046e8ee ]---
> 
> 
> --
> Eric Wheeler
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 


--
lvm-devel mailing list
lvm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y
@ 2019-12-20 19:54   ` Eric Wheeler
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-12-20 19:54 UTC (permalink / raw)
  To: lvm-devel

On Wed, 25 Sep 2019, Eric Wheeler wrote:
> We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> We have been using the 4.19 branch for months without issue; we just 
> switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> related to the issue, I don't know, maybe coincidence:
> 
> 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> 	{
> 	    int r;
> 	    enum allocation_event ev;
> 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> 
> 	    /* FIXME: we should loop round a couple of times */
> 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> 	    if (r)
> 		return r;
> 
> 	    smd->begin = *b + 1;
> 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> 	    if (!r) {
> 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> 		smd->nr_allocated_this_transaction++;
> 	    }

Hello all,

We hit this BUG_ON again, this time on 4.19.86 with 
scsi_mod.use_blk_mq=y, and it is known to be present as recently as 
5.1.2, as additionally reported by Markus Schade:

  https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
     and
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398

In our case, the latest trace (below) is from a different system that
had been stable for years on Linux 4.1 with tmeta directly on the SSD.
We updated to 4.19.86 a few weeks ago and just hit this, which Mike
Snitzer explains to be an allocator race:

On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199391.693852]  ? sort+0x17b/0x270
> > [199391.694527]  ? u32_swap+0x10/0x10
> > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199391.695890]  process_one_work+0x171/0x370
> > [199391.696640]  worker_thread+0x49/0x3f0
> > [199391.697332]  kthread+0xf8/0x130
> > [199391.697988]  ? max_active_store+0x80/0x80
> > [199391.698659]  ? kthread_bind+0x10/0x10
> > [199391.699281]  ret_from_fork+0x1f/0x40
> 
> The stack shows the call to sm_disk_new_block() is due to
> dm_pool_alloc_data_block().
> 
> sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> getting called without the passed 'ev' being set to SM_ALLOC.  Only
> drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
> SM_ALLOC. sm_disk_new_block() is indirectly calling sm_ll_mutate().
> 
> sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> returns 0 -- which simply means success.  And on success
> sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> 
> sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> 1) ref_count wasn't set
> or
> 2) old was identified
> 
> So all said: somehow a new data block was found to already be in use.
> _WHY_ that is the case isn't clear from this stack...
> 
> But it does speak to the possibility of data block allocation racing
> with other operations to the same block.  Which implies missing locking.

Where would you look to add locking do you think? 

> But that's all I've got so far... I'll review past dm-thinp changes with
> all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> needs to have a look at this too.
> 
> But could it be that bcache is the source of the data device race (same
> block used concurrently)?  And DM thinp is acting as the canary in the
> coal mine?

As Markus has shown, this bug triggers without bcache.


Other questions:

1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:

+	spin_lock(&lock); /* protect smd->begin */
	smd->begin = *b + 1;
	r = sm_ll_inc(&smd->ll, *b, &ev);
	if (!r) {
-		BUG_ON(ev != SM_ALLOC);
-		smd->nr_allocated_this_transaction++;
+		if (ev != SM_ALLOC) {
+			/* block was already in use: back out */
+			r = -ENOSPC;
+			smd->begin = *b;
+		} else
+			smd->nr_allocated_this_transaction++;
	}
+	spin_unlock(&lock);

The lock might protect smd->begin, but I'm not sure how &smd->ll might 
have been modified by sm_ll_inc().  In the BUG_ON case sm_ll_mutate() 
returned 0, so its ll->save_ie() succeeded, which makes me wonder 
whether backing out smd->begin alone is enough.  What do you think?

Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
of space, but I would take it over a BUG_ON.

2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
   taught to retry?  This bug shows up weeks or months apart, even on 
   heavily loaded systems with ~100 live thin volumes, so retrying would 
   be fine IMHO.

3) In the thread from June, Markus says:
	"Judging from the call trace, my guess is that there is somewhere 
	a race condition, when a new block needs to be allocated which has 
	still to be discarded."

Is this discard situation possible?  Wouldn't the bio prison prevent this?

--
Eric Wheeler
www.datawall.us



Here is the new trace, old trace below:

kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
Workqueue: dm-thin do_worker [dm_thin_pool]
RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
Call Trace:
 dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
 alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
 process_cell+0x2a3/0x550 [dm_thin_pool]
 ? mempool_alloc+0x6f/0x180
 ? sort+0x17b/0x270
 ? u32_swap+0x10/0x10
 process_deferred_bios+0x1af/0x870 [dm_thin_pool]
 do_worker+0x94/0xe0 [dm_thin_pool]
 process_one_work+0x171/0x370
 worker_thread+0x49/0x3f0
 kthread+0xf8/0x130
 ? max_active_store+0x80/0x80
 ? kthread_bind+0x10/0x10
 ret_from_fork+0x1f/0x40

> 
> [199391.677689] ------------[ cut here ]------------
> [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199391.690984] Call Trace:
> [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199391.693852]  ? sort+0x17b/0x270
> [199391.694527]  ? u32_swap+0x10/0x10
> [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199391.695890]  process_one_work+0x171/0x370
> [199391.696640]  worker_thread+0x49/0x3f0
> [199391.697332]  kthread+0xf8/0x130
> [199391.697988]  ? max_active_store+0x80/0x80
> [199391.698659]  ? kthread_bind+0x10/0x10
> [199391.699281]  ret_from_fork+0x1f/0x40
> [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> [199391.708083] ---[ end trace c31536d98046e8ec ]---
> [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199391.877317] Kernel panic - not syncing: Fatal exception
> [199391.878006] Kernel Offset: disabled
> [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> [199392.034277] Call Trace:
> [199392.034929]  <IRQ>
> [199392.035576]  native_apic_msr_write+0x2e/0x40
> [199392.036228]  arch_irq_work_raise+0x28/0x40
> [199392.036877]  irq_work_queue_on+0x83/0xa0
> [199392.037518]  irq_work_run_list+0x4c/0x70
> [199392.038149]  irq_work_run+0x14/0x40
> [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> [199392.039393]  call_function_single_interrupt+0xf/0x20
> [199392.040011]  </IRQ>
> [199392.040624] RIP: 0010:panic+0x209/0x25c
> [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> [199392.046486]  oops_end+0xc1/0xd0
> [199392.047149]  do_trap+0x13d/0x150
> [199392.047795]  do_error_trap+0xd5/0x130
> [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.049048]  invalid_op+0x14/0x20
> [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199392.057484]  ? sort+0x17b/0x270
> [199392.058016]  ? u32_swap+0x10/0x10
> [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199392.059060]  process_one_work+0x171/0x370
> [199392.059576]  worker_thread+0x49/0x3f0
> [199392.060083]  kthread+0xf8/0x130
> [199392.060587]  ? max_active_store+0x80/0x80
> [199392.061086]  ? kthread_bind+0x10/0x10
> [199392.061569]  ret_from_fork+0x1f/0x40
> [199392.062038] ------------[ cut here ]------------
> [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199392.078481] Call Trace:
> [199392.079117]  <IRQ>
> [199392.079745]  check_preempt_curr+0x6b/0x90
> [199392.080373]  ttwu_do_wakeup+0x19/0x130
> [199392.080999]  try_to_wake_up+0x1e2/0x460
> [199392.081623]  __wake_up_common+0x8f/0x160
> [199392.082246]  ep_poll_callback+0x1af/0x300
> [199392.082860]  __wake_up_common+0x8f/0x160
> [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> [199392.084074]  irq_work_run_list+0x4c/0x70
> [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> [199392.085277]  call_function_single_interrupt+0xf/0x20
> [199392.085879]  </IRQ>
> [199392.086477] RIP: 0010:panic+0x209/0x25c
> [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> [199392.092255]  oops_end+0xc1/0xd0
> [199392.092894]  do_trap+0x13d/0x150
> [199392.093516]  do_error_trap+0xd5/0x130
> [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.094718]  invalid_op+0x14/0x20
> [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199392.102750]  ? sort+0x17b/0x270
> [199392.103242]  ? u32_swap+0x10/0x10
> [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199392.104228]  process_one_work+0x171/0x370
> [199392.104714]  worker_thread+0x49/0x3f0
> [199392.105193]  kthread+0xf8/0x130
> [199392.105665]  ? max_active_store+0x80/0x80
> [199392.106132]  ? kthread_bind+0x10/0x10
> [199392.106601]  ret_from_fork+0x1f/0x40
> [199392.107069] ---[ end trace c31536d98046e8ed ]---
> [199392.107544] ------------[ cut here ]------------
> [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> [199392.123867] Call Trace:
> [199392.124495]  <IRQ>
> [199392.125116]  update_process_times+0x40/0x50
> [199392.125742]  tick_sched_handle+0x25/0x60
> [199392.126367]  tick_sched_timer+0x37/0x70
> [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> [199392.127601]  hrtimer_interrupt+0x122/0x270
> [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> [199392.128819]  apic_timer_interrupt+0xf/0x20
> [199392.129423]  </IRQ>
> [199392.130020] RIP: 0010:panic+0x209/0x25c
> [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> [199392.135792]  oops_end+0xc1/0xd0
> [199392.136444]  do_trap+0x13d/0x150
> [199392.137094]  do_error_trap+0xd5/0x130
> [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.138383]  invalid_op+0x14/0x20
> [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> [199392.146949]  ? sort+0x17b/0x270
> [199392.147450]  ? u32_swap+0x10/0x10
> [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> [199392.148441]  process_one_work+0x171/0x370
> [199392.148937]  worker_thread+0x49/0x3f0
> [199392.149430]  kthread+0xf8/0x130
> [199392.149922]  ? max_active_store+0x80/0x80
> [199392.150406]  ? kthread_bind+0x10/0x10
> [199392.150883]  ret_from_fork+0x1f/0x40
> [199392.151353] ---[ end trace c31536d98046e8ee ]---
> 
> 
> --
> Eric Wheeler
> 
> --
> dm-devel mailing list
> dm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
  2019-12-20 19:54   ` Eric Wheeler
  (?)
@ 2019-12-27  1:47     ` Eric Wheeler
  -1 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-12-27  1:47 UTC (permalink / raw)
  To: lvm-devel
  Cc: ejt, Mike Snitzer, markus.schade, linux-block, dm-devel, joe.thornber

On Fri, 20 Dec 2019, Eric Wheeler wrote:
> On Wed, 25 Sep 2019, Eric Wheeler wrote:
> > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > We have been using the 4.19 branch for months without issue; we just 
> > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > related to the issue, I don't know, maybe coincidence:
> > 
> > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > 	{
> > 	    int r;
> > 	    enum allocation_event ev;
> > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > 
> > 	    /* FIXME: we should loop round a couple of times */
> > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > 	    if (r)
> > 		return r;
> > 
> > 	    smd->begin = *b + 1;
> > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	    if (!r) {
> > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 		smd->nr_allocated_this_transaction++;
> > 	    }
> 
> Hello all,
> 
> We hit this BUG_ON again, this time with 4.19.86 with 
> scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as 
> additionally reported by Markus Schade:
> 
>   https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
>      and
>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398
> 
> In our case, the latest trace (below) is from a different system that
> has been stable for years on Linux 4.1 with tmeta directly on the SSD.
> We updated to 4.19.86 a few weeks ago and just hit this; Mike
> Snitzer explained it as an allocator race:
> 
> On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199391.693852]  ? sort+0x17b/0x270
> > > [199391.694527]  ? u32_swap+0x10/0x10
> > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199391.695890]  process_one_work+0x171/0x370
> > > [199391.696640]  worker_thread+0x49/0x3f0
> > > [199391.697332]  kthread+0xf8/0x130
> > > [199391.697988]  ? max_active_store+0x80/0x80
> > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > [199391.699281]  ret_from_fork+0x1f/0x40
> > 
> > The stack shows the call to sm_disk_new_block() is due to
> > dm_pool_alloc_data_block().
> > 
> > sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> > getting called without the passed 'ev' being set to SM_ALLOC.  Only
> > drivers/md/persistent-dat/dm-space-map-common.c:sm_ll_mutate() sets
> > SM_ALLOC. sm_disk_new_block() is indirectly calling sm_ll_mutate()
> > 
> > sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> > should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> > returns 0 -- which simply means success.  And on success
> > sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> > 
> > sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> > 1) ref_count wasn't set
> > or
> > 2) old was identified
> > 
> > So all said: somehow a new data block was found to already be in use.
> > _WHY_ that is the case isn't clear from this stack...
> > 
> > But it does speak to the possibility of data block allocation racing
> > with other operations to the same block.  Which implies missing locking.
> 
> Where would you look to add locking do you think? 
> 
> > But that's all I've got so far... I'll review past dm-thinp changes with
> > all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> > needs to have a look at this too.
> > 
> > But could it be that bcache is the source of the data device race (same
> > block used concurrently)?  And DM thinp is acting as the canary in the
> > coal mine?
> 
> As Markus has shown, this bug triggers without bcache.
> 
> 
> Other questions:
> 
> 1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:
> 
> +	spin_lock(&lock); /* protect smd->begin */
> 	smd->begin = *b + 1;
> 	r = sm_ll_inc(&smd->ll, *b, &ev);
> 	if (!r) {
> -		BUG_ON(ev != SM_ALLOC); 
> 		smd->nr_allocated_this_transaction++;
> 	}
> +	else {
> +		r = -ENOSPC;
> +		smd->begin = *b - 1;
> +	}
> +	spin_unlock(&lock);

Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n); every 
previous occurrence was with MQ turned on. 

I'm trying the -ENOSPC hack, which flags the pool as out of space so I 
can recover more gracefully than a BUG_ON. Here's a first-draft patch; 
maybe the spinlock will even prevent the issue.

Compile tested, I'll try on a real system tomorrow.

Comments welcome:

diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
index 32adf6b..cb27a20 100644
--- a/drivers/md/persistent-data/dm-space-map-disk.c
+++ b/drivers/md/persistent-data/dm-space-map-disk.c
@@ -161,6 +161,7 @@ static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
 	return r;
 }
 
+static DEFINE_SPINLOCK(smd_lock);
 static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
 {
 	int r;
@@ -168,17 +169,30 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
 	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
 
 	/* FIXME: we should loop round a couple of times */
+	spin_lock(&smd_lock);
 	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
-	if (r)
+	if (r) {
+		spin_unlock(&smd_lock);
 		return r;
+	}
 
 	smd->begin = *b + 1;
 	r = sm_ll_inc(&smd->ll, *b, &ev);
 	if (!r) {
-		BUG_ON(ev != SM_ALLOC);
-		smd->nr_allocated_this_transaction++;
+		if (ev == SM_ALLOC)
+			smd->nr_allocated_this_transaction++;
+		else {
+			/* Not actually out of space, this is a bug:
+			 * https://lore.kernel.org/linux-block/20190925200138.GA20584@redhat.com/
+			 */
+			WARN(ev != SM_ALLOC, "Pool metadata allocation race, marking pool out-of-space.");
+			r = -ENOSPC;
+			smd->begin = *b - 1;
+		}
 	}
 
+	spin_unlock(&smd_lock);
+
 	return r;
 }
 



--
Eric Wheeler


> 
> The lock might protect smd->begin, but I'm not sure how &smd->ll might 
> have been modified by sm_ll_inc().  However, since ll->save_ie() failed in 
> sm_ll_mutate(), perhaps this is safe.  What do you think?
> 
> Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
> of space, but I would take it over a BUG_ON.
> 
> 2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
>    taught to retry?  This bug shows up weeks or months apart, even on heavily 
>    loaded systems with ~100 live thin volumes, so retrying would be fine 
>    IMHO.
> 
> 3) In the thread from June, Markus says:
> 	"Judging from the call trace, my guess is that there is somewhere 
> 	a race condition, when a new block needs to be allocated which has 
> 	still to be discarded."
> 
> Is this discard situation possible?  Wouldn't the bio prison prevent this?
> 
> --
> Eric Wheeler
> www.datawall.us
> 
> 
> 
> Here is the new trace, old trace below:
> 
> kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> invalid opcode: 0000 [#1] SMP NOPTI
> CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
> Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
> Workqueue: dm-thin do_worker [dm_thin_pool]
> RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
> RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
> RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
> R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
> R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
> FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
> Call Trace:
>  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
>  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
>  process_cell+0x2a3/0x550 [dm_thin_pool]
>  ? mempool_alloc+0x6f/0x180
>  ? sort+0x17b/0x270
>  ? u32_swap+0x10/0x10
>  process_deferred_bios+0x1af/0x870 [dm_thin_pool]
>  do_worker+0x94/0xe0 [dm_thin_pool]
>  process_one_work+0x171/0x370
>  worker_thread+0x49/0x3f0
>  kthread+0xf8/0x130
>  ? max_active_store+0x80/0x80
>  ? kthread_bind+0x10/0x10
>  ret_from_fork+0x1f/0x40
> 
> > 
> > [199391.677689] ------------[ cut here ]------------
> > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.690984] Call Trace:
> > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199391.693852]  ? sort+0x17b/0x270
> > [199391.694527]  ? u32_swap+0x10/0x10
> > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199391.695890]  process_one_work+0x171/0x370
> > [199391.696640]  worker_thread+0x49/0x3f0
> > [199391.697332]  kthread+0xf8/0x130
> > [199391.697988]  ? max_active_store+0x80/0x80
> > [199391.698659]  ? kthread_bind+0x10/0x10
> > [199391.699281]  ret_from_fork+0x1f/0x40
> > [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199391.708083] ---[ end trace c31536d98046e8ec ]---
> > [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.877317] Kernel panic - not syncing: Fatal exception
> > [199391.878006] Kernel Offset: disabled
> > [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> > [199392.034277] Call Trace:
> > [199392.034929]  <IRQ>
> > [199392.035576]  native_apic_msr_write+0x2e/0x40
> > [199392.036228]  arch_irq_work_raise+0x28/0x40
> > [199392.036877]  irq_work_queue_on+0x83/0xa0
> > [199392.037518]  irq_work_run_list+0x4c/0x70
> > [199392.038149]  irq_work_run+0x14/0x40
> > [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> > [199392.039393]  call_function_single_interrupt+0xf/0x20
> > [199392.040011]  </IRQ>
> > [199392.040624] RIP: 0010:panic+0x209/0x25c
> > [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.046486]  oops_end+0xc1/0xd0
> > [199392.047149]  do_trap+0x13d/0x150
> > [199392.047795]  do_error_trap+0xd5/0x130
> > [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.049048]  invalid_op+0x14/0x20
> > [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.057484]  ? sort+0x17b/0x270
> > [199392.058016]  ? u32_swap+0x10/0x10
> > [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.059060]  process_one_work+0x171/0x370
> > [199392.059576]  worker_thread+0x49/0x3f0
> > [199392.060083]  kthread+0xf8/0x130
> > [199392.060587]  ? max_active_store+0x80/0x80
> > [199392.061086]  ? kthread_bind+0x10/0x10
> > [199392.061569]  ret_from_fork+0x1f/0x40
> > [199392.062038] ------------[ cut here ]------------
> > [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> > [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> > [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> > [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> > [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> > [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> > [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> > [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> > [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199392.078481] Call Trace:
> > [199392.079117]  <IRQ>
> > [199392.079745]  check_preempt_curr+0x6b/0x90
> > [199392.080373]  ttwu_do_wakeup+0x19/0x130
> > [199392.080999]  try_to_wake_up+0x1e2/0x460
> > [199392.081623]  __wake_up_common+0x8f/0x160
> > [199392.082246]  ep_poll_callback+0x1af/0x300
> > [199392.082860]  __wake_up_common+0x8f/0x160
> > [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> > [199392.084074]  irq_work_run_list+0x4c/0x70
> > [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> > [199392.085277]  call_function_single_interrupt+0xf/0x20
> > [199392.085879]  </IRQ>
> > [199392.086477] RIP: 0010:panic+0x209/0x25c
> > [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.092255]  oops_end+0xc1/0xd0
> > [199392.092894]  do_trap+0x13d/0x150
> > [199392.093516]  do_error_trap+0xd5/0x130
> > [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.094718]  invalid_op+0x14/0x20
> > [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.102750]  ? sort+0x17b/0x270
> > [199392.103242]  ? u32_swap+0x10/0x10
> > [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.104228]  process_one_work+0x171/0x370
> > [199392.104714]  worker_thread+0x49/0x3f0
> > [199392.105193]  kthread+0xf8/0x130
> > [199392.105665]  ? max_active_store+0x80/0x80
> > [199392.106132]  ? kthread_bind+0x10/0x10
> > [199392.106601]  ret_from_fork+0x1f/0x40
> > [199392.107069] ---[ end trace c31536d98046e8ed ]---
> > [199392.107544] ------------[ cut here ]------------
> > [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> > [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> > [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> > [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> > [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> > [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> > [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> > [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> > [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199392.123867] Call Trace:
> > [199392.124495]  <IRQ>
> > [199392.125116]  update_process_times+0x40/0x50
> > [199392.125742]  tick_sched_handle+0x25/0x60
> > [199392.126367]  tick_sched_timer+0x37/0x70
> > [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> > [199392.127601]  hrtimer_interrupt+0x122/0x270
> > [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> > [199392.128819]  apic_timer_interrupt+0xf/0x20
> > [199392.129423]  </IRQ>
> > [199392.130020] RIP: 0010:panic+0x209/0x25c
> > [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.135792]  oops_end+0xc1/0xd0
> > [199392.136444]  do_trap+0x13d/0x150
> > [199392.137094]  do_error_trap+0xd5/0x130
> > [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.138383]  invalid_op+0x14/0x20
> > [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.146949]  ? sort+0x17b/0x270
> > [199392.147450]  ? u32_swap+0x10/0x10
> > [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.148441]  process_one_work+0x171/0x370
> > [199392.148937]  worker_thread+0x49/0x3f0
> > [199392.149430]  kthread+0xf8/0x130
> > [199392.149922]  ? max_active_store+0x80/0x80
> > [199392.150406]  ? kthread_bind+0x10/0x10
> > [199392.150883]  ret_from_fork+0x1f/0x40
> > [199392.151353] ---[ end trace c31536d98046e8ee ]---
> > 
> > 
> > --
> > Eric Wheeler
> > 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
> > 
> 
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> 

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
@ 2019-12-27  1:47     ` Eric Wheeler
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-12-27  1:47 UTC (permalink / raw)
  To: lvm-devel
  Cc: Mike Snitzer, markus.schade, dm-devel, linux-block, ejt, joe.thornber

On Fri, 20 Dec 2019, Eric Wheeler wrote:
> On Wed, 25 Sep 2019, Eric Wheeler wrote:
> > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > We have been using the 4.19 branch for months without issue; we just 
> > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > related to the issue, I don't know, maybe coincidence:
> > 
> > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > 	{
> > 	    int r;
> > 	    enum allocation_event ev;
> > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > 
> > 	    /* FIXME: we should loop round a couple of times */
> > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > 	    if (r)
> > 		return r;
> > 
> > 	    smd->begin = *b + 1;
> > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	    if (!r) {
> > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 		smd->nr_allocated_this_transaction++;
> > 	    }
> 
> Hello all,
> 
> We hit this BUG_ON again, this time with 4.19.86 with 
> scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as 
> additionally reported by Markus Schade:
> 
>   https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
>      and
>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398
> 
> In our case, the latest trace (below) is from a different system that 
> had been stable for years on Linux 4.1 with tmeta directly on the SSD. 
> We updated to 4.19.86 a few weeks ago and just hit what Mike Snitzer 
> explains to be an allocator race:
> 
> On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199391.693852]  ? sort+0x17b/0x270
> > > [199391.694527]  ? u32_swap+0x10/0x10
> > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199391.695890]  process_one_work+0x171/0x370
> > > [199391.696640]  worker_thread+0x49/0x3f0
> > > [199391.697332]  kthread+0xf8/0x130
> > > [199391.697988]  ? max_active_store+0x80/0x80
> > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > [199391.699281]  ret_from_fork+0x1f/0x40
> > 
> > The stack shows the call to sm_disk_new_block() is due to
> > dm_pool_alloc_data_block().
> > 
> > sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> > getting called without the passed 'ev' being set to SM_ALLOC.  Only
> > drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
> > SM_ALLOC, and sm_disk_new_block() indirectly calls sm_ll_mutate().
> > 
> > sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> > should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> > returns 0 -- which simply means success.  And on success
> > sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> > 
> > sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> > 1) ref_count wasn't set
> > or
> > 2) old was identified
> > 
> > So all said: somehow a new data block was found to already be in use.
> > _WHY_ that is the case isn't clear from this stack...
> > 
> > But it does speak to the possibility of data block allocation racing
> > with other operations to the same block.  Which implies missing locking.
> 
> Where would you look to add locking, do you think? 
> 
> > But that's all I've got so far... I'll review past dm-thinp changes with
> > all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> > needs to have a look at this too.
> > 
> > But could it be that bcache is the source of the data device race (same
> > block used concurrently)?  And DM thinp is acting as the canary in the
> > coal mine?
> 
> As Markus has shown, this bug triggers without bcache.
> 
> 
> Other questions:
> 
> 1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:
> 
> +	spin_lock(&lock); /* protect smd->begin */
> 	smd->begin = *b + 1;
> 	r = sm_ll_inc(&smd->ll, *b, &ev);
> 	if (!r) {
> -		BUG_ON(ev != SM_ALLOC); 
> 		smd->nr_allocated_this_transaction++;
> 	}
> +	else {
> +		r = -ENOSPC;
> +		smd->begin = *b - 1;
> +	}
> +	spin_unlock(&lock);

Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n); all 
previous occurrences were with MQ turned on. 

I'm trying the -ENOSPC hack, which flags the pool as out of space so I 
can recover more gracefully than a BUG_ON. Here's a first-draft patch; 
maybe the spinlock will even prevent the issue.

Compile tested, I'll try on a real system tomorrow.

Comments welcome:

diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
index 32adf6b..cb27a20 100644
--- a/drivers/md/persistent-data/dm-space-map-disk.c
+++ b/drivers/md/persistent-data/dm-space-map-disk.c
@@ -161,6 +161,7 @@ static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
 	return r;
 }
 
+static DEFINE_SPINLOCK(smd_lock);
 static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
 {
 	int r;
@@ -168,17 +169,30 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
 	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
 
 	/* FIXME: we should loop round a couple of times */
+	spin_lock(&smd_lock);
 	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
-	if (r)
+	if (r) {
+		spin_unlock(&smd_lock);
 		return r;
+	}
 
 	smd->begin = *b + 1;
 	r = sm_ll_inc(&smd->ll, *b, &ev);
 	if (!r) {
-		BUG_ON(ev != SM_ALLOC);
-		smd->nr_allocated_this_transaction++;
+		if (ev == SM_ALLOC)
+			smd->nr_allocated_this_transaction++;
+		else {
+			/* Not actually out of space, this is a bug:
+			 * https://lore.kernel.org/linux-block/20190925200138.GA20584@redhat.com/
+			 */
+			WARN(ev != SM_ALLOC, "Pool metadata allocation race, marking pool out-of-space.");
+			r = -ENOSPC;
+			smd->begin = *b - 1;
+		}
 	}
 
+	spin_unlock(&smd_lock);
+
 	return r;
 }
 



--
Eric Wheeler


> 
> The lock might protect smd->begin, but I'm not sure how &smd->ll might 
> have been modified by sm_ll_inc().  However, since ll->save_ie() failed 
> in sm_ll_mutate(), perhaps this is safe.  What do you think?
> 
> Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
> of space, but I would take it over a BUG_ON.
> 
> 2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
>    taught to retry?  This bug shows up weeks or months apart, even on 
>    heavily loaded systems with ~100 live thin volumes, so retrying 
>    would be fine IMHO.
> 
> 3) In the thread from June, Markus says:
> 	"Judging from the call trace, my guess is that there is somewhere 
> 	a race condition, when a new block needs to be allocated which has 
> 	still to be discarded."
> 
> Is this discard situation possible?  Wouldn't the bio prison prevent this?
> 
> --
> Eric Wheeler
> www.datawall.us
> 
> 
> 
> Here is the new trace, old trace below:
> 
> kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> invalid opcode: 0000 [#1] SMP NOPTI
> CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
> Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
> Workqueue: dm-thin do_worker [dm_thin_pool]
> RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
> RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
> RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
> R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
> R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
> FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
> Call Trace:
>  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
>  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
>  process_cell+0x2a3/0x550 [dm_thin_pool]
>  ? mempool_alloc+0x6f/0x180
>  ? sort+0x17b/0x270
>  ? u32_swap+0x10/0x10
>  process_deferred_bios+0x1af/0x870 [dm_thin_pool]
>  do_worker+0x94/0xe0 [dm_thin_pool]
>  process_one_work+0x171/0x370
>  worker_thread+0x49/0x3f0
>  kthread+0xf8/0x130
>  ? max_active_store+0x80/0x80
>  ? kthread_bind+0x10/0x10
>  ret_from_fork+0x1f/0x40
> 
> > 
> > [199391.677689] ------------[ cut here ]------------
> > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.690984] Call Trace:
> > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199391.693852]  ? sort+0x17b/0x270
> > [199391.694527]  ? u32_swap+0x10/0x10
> > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199391.695890]  process_one_work+0x171/0x370
> > [199391.696640]  worker_thread+0x49/0x3f0
> > [199391.697332]  kthread+0xf8/0x130
> > [199391.697988]  ? max_active_store+0x80/0x80
> > [199391.698659]  ? kthread_bind+0x10/0x10
> > [199391.699281]  ret_from_fork+0x1f/0x40
> > [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199391.708083] ---[ end trace c31536d98046e8ec ]---
> > [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.877317] Kernel panic - not syncing: Fatal exception
> > [199391.878006] Kernel Offset: disabled
> > [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> > [199392.034277] Call Trace:
> > [199392.034929]  <IRQ>
> > [199392.035576]  native_apic_msr_write+0x2e/0x40
> > [199392.036228]  arch_irq_work_raise+0x28/0x40
> > [199392.036877]  irq_work_queue_on+0x83/0xa0
> > [199392.037518]  irq_work_run_list+0x4c/0x70
> > [199392.038149]  irq_work_run+0x14/0x40
> > [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> > [199392.039393]  call_function_single_interrupt+0xf/0x20
> > [199392.040011]  </IRQ>
> > [199392.040624] RIP: 0010:panic+0x209/0x25c
> > [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.046486]  oops_end+0xc1/0xd0
> > [199392.047149]  do_trap+0x13d/0x150
> > [199392.047795]  do_error_trap+0xd5/0x130
> > [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.049048]  invalid_op+0x14/0x20
> > [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.057484]  ? sort+0x17b/0x270
> > [199392.058016]  ? u32_swap+0x10/0x10
> > [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.059060]  process_one_work+0x171/0x370
> > [199392.059576]  worker_thread+0x49/0x3f0
> > [199392.060083]  kthread+0xf8/0x130
> > [199392.060587]  ? max_active_store+0x80/0x80
> > [199392.061086]  ? kthread_bind+0x10/0x10
> > [199392.061569]  ret_from_fork+0x1f/0x40
> > [199392.062038] ------------[ cut here ]------------
> > [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> > [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> > [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> > [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> > [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> > [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> > [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> > [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> > [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199392.078481] Call Trace:
> > [199392.079117]  <IRQ>
> > [199392.079745]  check_preempt_curr+0x6b/0x90
> > [199392.080373]  ttwu_do_wakeup+0x19/0x130
> > [199392.080999]  try_to_wake_up+0x1e2/0x460
> > [199392.081623]  __wake_up_common+0x8f/0x160
> > [199392.082246]  ep_poll_callback+0x1af/0x300
> > [199392.082860]  __wake_up_common+0x8f/0x160
> > [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> > [199392.084074]  irq_work_run_list+0x4c/0x70
> > [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> > [199392.085277]  call_function_single_interrupt+0xf/0x20
> > [199392.085879]  </IRQ>
> > [199392.086477] RIP: 0010:panic+0x209/0x25c
> > [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.092255]  oops_end+0xc1/0xd0
> > [199392.092894]  do_trap+0x13d/0x150
> > [199392.093516]  do_error_trap+0xd5/0x130
> > [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.094718]  invalid_op+0x14/0x20
> > [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.102750]  ? sort+0x17b/0x270
> > [199392.103242]  ? u32_swap+0x10/0x10
> > [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.104228]  process_one_work+0x171/0x370
> > [199392.104714]  worker_thread+0x49/0x3f0
> > [199392.105193]  kthread+0xf8/0x130
> > [199392.105665]  ? max_active_store+0x80/0x80
> > [199392.106132]  ? kthread_bind+0x10/0x10
> > [199392.106601]  ret_from_fork+0x1f/0x40
> > [199392.107069] ---[ end trace c31536d98046e8ed ]---
> > [199392.107544] ------------[ cut here ]------------
> > [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> > [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> > [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> > [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> > [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> > [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> > [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> > [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> > [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199392.123867] Call Trace:
> > [199392.124495]  <IRQ>
> > [199392.125116]  update_process_times+0x40/0x50
> > [199392.125742]  tick_sched_handle+0x25/0x60
> > [199392.126367]  tick_sched_timer+0x37/0x70
> > [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> > [199392.127601]  hrtimer_interrupt+0x122/0x270
> > [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> > [199392.128819]  apic_timer_interrupt+0xf/0x20
> > [199392.129423]  </IRQ>
> > [199392.130020] RIP: 0010:panic+0x209/0x25c
> > [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.135792]  oops_end+0xc1/0xd0
> > [199392.136444]  do_trap+0x13d/0x150
> > [199392.137094]  do_error_trap+0xd5/0x130
> > [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.138383]  invalid_op+0x14/0x20
> > [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.146949]  ? sort+0x17b/0x270
> > [199392.147450]  ? u32_swap+0x10/0x10
> > [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.148441]  process_one_work+0x171/0x370
> > [199392.148937]  worker_thread+0x49/0x3f0
> > [199392.149430]  kthread+0xf8/0x130
> > [199392.149922]  ? max_active_store+0x80/0x80
> > [199392.150406]  ? kthread_bind+0x10/0x10
> > [199392.150883]  ret_from_fork+0x1f/0x40
> > [199392.151353] ---[ end trace c31536d98046e8ee ]---
> > 
> > 
> > --
> > Eric Wheeler
> > 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
> > 
> 
> 
> 
> 


--
lvm-devel mailing list
lvm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel


* [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
@ 2019-12-27  1:47     ` Eric Wheeler
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-12-27  1:47 UTC (permalink / raw)
  To: lvm-devel

On Fri, 20 Dec 2019, Eric Wheeler wrote:
> On Wed, 25 Sep 2019, Eric Wheeler wrote:
> > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > We have been using the 4.19 branch for months without issue; we just 
> > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > related to the issue, I don't know, maybe coincidence:
> > 
> > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > 	{
> > 	    int r;
> > 	    enum allocation_event ev;
> > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > 
> > 	    /* FIXME: we should loop round a couple of times */
> > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > 	    if (r)
> > 		return r;
> > 
> > 	    smd->begin = *b + 1;
> > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	    if (!r) {
> > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 		smd->nr_allocated_this_transaction++;
> > 	    }
> 
> Hello all,
> 
> We hit this BUG_ON again, this time with 4.19.86 with 
> scsi_mod.use_blk_mq=y; the bug is known to be present as of 5.1.2, as 
> also reported by Markus Schade:
> 
>   https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
>      and
>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398
> 
> In our case, the latest trace (below) is from a different system that
> has been stable for years on Linux 4.1 with tmeta directly on the SSD.
> We updated to 4.19.86 a few weeks ago and just hit this, which Mike
> Snitzer explains as an allocator race:
> 
> On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199391.693852]  ? sort+0x17b/0x270
> > > [199391.694527]  ? u32_swap+0x10/0x10
> > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199391.695890]  process_one_work+0x171/0x370
> > > [199391.696640]  worker_thread+0x49/0x3f0
> > > [199391.697332]  kthread+0xf8/0x130
> > > [199391.697988]  ? max_active_store+0x80/0x80
> > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > [199391.699281]  ret_from_fork+0x1f/0x40
> > 
> > The stack shows the call to sm_disk_new_block() is due to
> > dm_pool_alloc_data_block().
> > 
> > sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> > getting called without the passed 'ev' being set to SM_ALLOC.  Only
> > drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
> > SM_ALLOC.  sm_disk_new_block() is indirectly calling sm_ll_mutate().
> > 
> > sm_ll_mutate() will only return 0 if ll->save_ie() does; here the
> > ll_disk *ll is the disk variant, so disk_ll_save_ie()'s call to
> > dm_btree_insert() returns 0 -- which simply means success.  And on
> > success sm_disk_new_block() assumes ev was set to SM_ALLOC (by
> > sm_ll_mutate).
> > 
> > sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> > 1) ref_count wasn't set
> > or
> > 2) old was identified
> > 
> > So all said: somehow a new data block was found to already be in use.
> > _WHY_ that is the case isn't clear from this stack...
> > 
> > But it does speak to the possibility of data block allocation racing
> > with other operations to the same block.  Which implies missing locking.
> 
> Where would you look to add locking do you think? 
> 
> > But that's all I've got so far... I'll review past dm-thinp changes with
> > all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> > needs to have a look at this too.
> > 
> > But could it be that bcache is the source of the data device race (same
> > block used concurrently)?  And DM thinp is acting as the canary in the
> > coal mine?
> 
> As Markus has shown, this bug triggers without bcache.
> 
> 
> Other questions:
> 
> 1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:
> 
> +	spin_lock(&lock); /* protect smd->begin */
> 	smd->begin = *b + 1;
> 	r = sm_ll_inc(&smd->ll, *b, &ev);
> 	if (!r) {
> -		BUG_ON(ev != SM_ALLOC); 
> 		smd->nr_allocated_this_transaction++;
> 	}
> +	else {
> +		r = -ENOSPC;
> +		smd->begin = *b - 1;
> +	}
> +	spin_unlock(&lock);

Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n); all 
previous occurrences were with MQ turned on. 

I'm trying the -ENOSPC hack, which flags the pool as out of space so I 
can recover more gracefully than a BUG_ON. Here's a first-draft patch; 
maybe the spinlock will even prevent the issue.

Compile-tested; I'll try it on a real system tomorrow.

Comments welcome:

diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
index 32adf6b..cb27a20 100644
--- a/drivers/md/persistent-data/dm-space-map-disk.c
+++ b/drivers/md/persistent-data/dm-space-map-disk.c
@@ -161,6 +161,7 @@ static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
 	return r;
 }
 
+static DEFINE_SPINLOCK(smd_lock);
 static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
 {
 	int r;
@@ -168,17 +169,30 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
 	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
 
 	/* FIXME: we should loop round a couple of times */
+	spin_lock(&smd_lock);
 	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
-	if (r)
+	if (r) {
+		spin_unlock(&smd_lock);
 		return r;
+	}
 
 	smd->begin = *b + 1;
 	r = sm_ll_inc(&smd->ll, *b, &ev);
 	if (!r) {
-		BUG_ON(ev != SM_ALLOC);
-		smd->nr_allocated_this_transaction++;
+		if (ev == SM_ALLOC)
+			smd->nr_allocated_this_transaction++;
+		else {
+			/* Not actually out of space, this is a bug:
+	 * https://lore.kernel.org/linux-block/20190925200138.GA20584@redhat.com/
+			 */
+			WARN(1, "Pool metadata allocation race, marking pool out-of-space.");
+			r = -ENOSPC;
+			smd->begin = *b - 1;
+		}
 	}
 
+	spin_unlock(&smd_lock);
+
 	return r;
 }
 



--
Eric Wheeler


> 
> The lock might protect smd->begin, but I'm not sure how &smd->ll might 
> have been modified by sm_ll_inc().  However, since ll->save_ie() failed in 
> sm_ll_mutate(), perhaps this is safe.  What do you think?
> 
> Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
> of space, but I would take it over a BUG_ON.
> 
> 2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
>    taught to retry?  This bug shows up weeks or months apart, even on heavily 
>    loaded systems with ~100 live thin volumes, so retrying would be fine 
>    IMHO.
> 
> 3) In the thread from June, Markus says:
> 	"Judging from the call trace, my guess is that there is somewhere 
> 	a race condition, when a new block needs to be allocated which has 
> 	still to be discarded."
> 
> Is this discard situation possible?  Wouldn't the bio prison prevent this?
> 
> --
> Eric Wheeler
> www.datawall.us
> 
> 
> 
> Here is the new trace, old trace below:
> 
> kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> invalid opcode: 0000 [#1] SMP NOPTI
> CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
> Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
> Workqueue: dm-thin do_worker [dm_thin_pool]
> RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
> RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
> RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
> R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
> R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
> FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
> Call Trace:
>  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
>  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
>  process_cell+0x2a3/0x550 [dm_thin_pool]
>  ? mempool_alloc+0x6f/0x180
>  ? sort+0x17b/0x270
>  ? u32_swap+0x10/0x10
>  process_deferred_bios+0x1af/0x870 [dm_thin_pool]
>  do_worker+0x94/0xe0 [dm_thin_pool]
>  process_one_work+0x171/0x370
>  worker_thread+0x49/0x3f0
>  kthread+0xf8/0x130
>  ? max_active_store+0x80/0x80
>  ? kthread_bind+0x10/0x10
>  ret_from_fork+0x1f/0x40
> 
> > 
> > [199391.677689] ------------[ cut here ]------------
> > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.690984] Call Trace:
> > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199391.693852]  ? sort+0x17b/0x270
> > [199391.694527]  ? u32_swap+0x10/0x10
> > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199391.695890]  process_one_work+0x171/0x370
> > [199391.696640]  worker_thread+0x49/0x3f0
> > [199391.697332]  kthread+0xf8/0x130
> > [199391.697988]  ? max_active_store+0x80/0x80
> > [199391.698659]  ? kthread_bind+0x10/0x10
> > [199391.699281]  ret_from_fork+0x1f/0x40
> > [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199391.708083] ---[ end trace c31536d98046e8ec ]---
> > [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.877317] Kernel panic - not syncing: Fatal exception
> > [199391.878006] Kernel Offset: disabled
> > [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> > [199392.034277] Call Trace:
> > [199392.034929]  <IRQ>
> > [199392.035576]  native_apic_msr_write+0x2e/0x40
> > [199392.036228]  arch_irq_work_raise+0x28/0x40
> > [199392.036877]  irq_work_queue_on+0x83/0xa0
> > [199392.037518]  irq_work_run_list+0x4c/0x70
> > [199392.038149]  irq_work_run+0x14/0x40
> > [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> > [199392.039393]  call_function_single_interrupt+0xf/0x20
> > [199392.040011]  </IRQ>
> > [199392.040624] RIP: 0010:panic+0x209/0x25c
> > [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.046486]  oops_end+0xc1/0xd0
> > [199392.047149]  do_trap+0x13d/0x150
> > [199392.047795]  do_error_trap+0xd5/0x130
> > [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.049048]  invalid_op+0x14/0x20
> > [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.057484]  ? sort+0x17b/0x270
> > [199392.058016]  ? u32_swap+0x10/0x10
> > [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.059060]  process_one_work+0x171/0x370
> > [199392.059576]  worker_thread+0x49/0x3f0
> > [199392.060083]  kthread+0xf8/0x130
> > [199392.060587]  ? max_active_store+0x80/0x80
> > [199392.061086]  ? kthread_bind+0x10/0x10
> > [199392.061569]  ret_from_fork+0x1f/0x40
> > [199392.062038] ------------[ cut here ]------------
> > [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> > [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> > [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> > [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> > [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> > [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> > [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> > [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> > [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199392.078481] Call Trace:
> > [199392.079117]  <IRQ>
> > [199392.079745]  check_preempt_curr+0x6b/0x90
> > [199392.080373]  ttwu_do_wakeup+0x19/0x130
> > [199392.080999]  try_to_wake_up+0x1e2/0x460
> > [199392.081623]  __wake_up_common+0x8f/0x160
> > [199392.082246]  ep_poll_callback+0x1af/0x300
> > [199392.082860]  __wake_up_common+0x8f/0x160
> > [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> > [199392.084074]  irq_work_run_list+0x4c/0x70
> > [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> > [199392.085277]  call_function_single_interrupt+0xf/0x20
> > [199392.085879]  </IRQ>
> > [199392.086477] RIP: 0010:panic+0x209/0x25c
> > [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.092255]  oops_end+0xc1/0xd0
> > [199392.092894]  do_trap+0x13d/0x150
> > [199392.093516]  do_error_trap+0xd5/0x130
> > [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.094718]  invalid_op+0x14/0x20
> > [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.102750]  ? sort+0x17b/0x270
> > [199392.103242]  ? u32_swap+0x10/0x10
> > [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.104228]  process_one_work+0x171/0x370
> > [199392.104714]  worker_thread+0x49/0x3f0
> > [199392.105193]  kthread+0xf8/0x130
> > [199392.105665]  ? max_active_store+0x80/0x80
> > [199392.106132]  ? kthread_bind+0x10/0x10
> > [199392.106601]  ret_from_fork+0x1f/0x40
> > [199392.107069] ---[ end trace c31536d98046e8ed ]---
> > [199392.107544] ------------[ cut here ]------------
> > [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> > [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> > [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> > [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> > [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> > [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> > [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> > [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> > [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199392.123867] Call Trace:
> > [199392.124495]  <IRQ>
> > [199392.125116]  update_process_times+0x40/0x50
> > [199392.125742]  tick_sched_handle+0x25/0x60
> > [199392.126367]  tick_sched_timer+0x37/0x70
> > [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> > [199392.127601]  hrtimer_interrupt+0x122/0x270
> > [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> > [199392.128819]  apic_timer_interrupt+0xf/0x20
> > [199392.129423]  </IRQ>
> > [199392.130020] RIP: 0010:panic+0x209/0x25c
> > [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.135792]  oops_end+0xc1/0xd0
> > [199392.136444]  do_trap+0x13d/0x150
> > [199392.137094]  do_error_trap+0xd5/0x130
> > [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.138383]  invalid_op+0x14/0x20
> > [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.146949]  ? sort+0x17b/0x270
> > [199392.147450]  ? u32_swap+0x10/0x10
> > [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.148441]  process_one_work+0x171/0x370
> > [199392.148937]  worker_thread+0x49/0x3f0
> > [199392.149430]  kthread+0xf8/0x130
> > [199392.149922]  ? max_active_store+0x80/0x80
> > [199392.150406]  ? kthread_bind+0x10/0x10
> > [199392.150883]  ret_from_fork+0x1f/0x40
> > [199392.151353] ---[ end trace c31536d98046e8ee ]---
> > 
> > 
> > --
> > Eric Wheeler
> > 
> > 
> 
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> 




^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
  2019-12-27  1:47     ` Eric Wheeler
@ 2019-12-28  2:13       ` Eric Wheeler
  -1 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-12-28  2:13 UTC (permalink / raw)
  To: lvm-devel
  Cc: Mike Snitzer, markus.schade, dm-devel, linux-block, ejt, joe.thornber

On Fri, 27 Dec 2019, Eric Wheeler wrote:
> On Fri, 20 Dec 2019, Eric Wheeler wrote:
> > On Wed, 25 Sep 2019, Eric Wheeler wrote:
> > > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > > We have been using the 4.19 branch for months without issue; we just 
> > > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > > related to the issue, I don't know, maybe coincidence:
> > > 
> > > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > > 	{
> > > 	    int r;
> > > 	    enum allocation_event ev;
> > > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > > 
> > > 	    /* FIXME: we should loop round a couple of times */
> > > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > > 	    if (r)
> > > 		return r;
> > > 
> > > 	    smd->begin = *b + 1;
> > > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > > 	    if (!r) {
> > > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > > 		smd->nr_allocated_this_transaction++;
> > > 	    }
> > 
> > Hello all,
> > 
> > We hit this BUG_ON again, this time with 4.19.86 with 
> > scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as 
> > additionally reported by Markus Schade:
> > 
> >   https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
> >      and
> >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398
> > 
> > In our case, the latest trace (below) is from a different system that
> > has been stable for years on Linux 4.1 with tmeta directly on the SSD.
> > We updated to 4.19.86 a few weeks ago and just hit this bug, which Mike
> > Snitzer explains to be an allocator race:
> > 
> > On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > > [199391.693852]  ? sort+0x17b/0x270
> > > > [199391.694527]  ? u32_swap+0x10/0x10
> > > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > > [199391.695890]  process_one_work+0x171/0x370
> > > > [199391.696640]  worker_thread+0x49/0x3f0
> > > > [199391.697332]  kthread+0xf8/0x130
> > > > [199391.697988]  ? max_active_store+0x80/0x80
> > > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > > [199391.699281]  ret_from_fork+0x1f/0x40
> > > 
> > > The stack shows the call to sm_disk_new_block() is due to
> > > dm_pool_alloc_data_block().
> > > 
> > > sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> > > getting called without the passed 'ev' being set to SM_ALLOC.  Only
> > > drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
> > > SM_ALLOC. sm_disk_new_block() is indirectly calling sm_ll_mutate().
> > > 
> > > sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> > > should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> > > returns 0 -- which simply means success.  And on success
> > > sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> > > 
> > > sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> > > 1) ref_count wasn't set
> > > or
> > > 2) old was identified
> > > 
> > > So all said: somehow a new data block was found to already be in use.
> > > _WHY_ that is the case isn't clear from this stack...
> > > 
> > > But it does speak to the possibility of data block allocation racing
> > > with other operations to the same block.  Which implies missing locking.
> > 
> > Where do you think locking should be added?
> > 
> > > But that's all I've got so far... I'll review past dm-thinp changes with
> > > all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> > > needs to have a look at this too.
> > > 
> > > But could it be that bcache is the source of the data device race (same
> > > block used concurrently)?  And DM thinp is acting as the canary in the
> > > coal mine?
> > 
> > As Markus has shown, this bug triggers without bcache.
> > 
> > 
> > Other questions:
> > 
> > 1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:
> > 
> > +	spin_lock(&lock); /* protect smd->begin */
> > 	smd->begin = *b + 1;
> > 	r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	if (!r) {
> > -		BUG_ON(ev != SM_ALLOC); 
> > 		smd->nr_allocated_this_transaction++;
> > 	}
> > +	else {
> > +		r = -ENOSPC;
> > +		smd->begin = *b - 1;
> > +	}
> > +	spin_unlock(&lock);
> 
> Just hit the bug again without blk-mq SCSI (scsi_mod.use_blk_mq=n); all
> previous occurrences were with MQ turned on.
> 
> I'm trying the -ENOSPC hack, which flags the pool as out of space so I
> can recover more gracefully than with a BUG_ON.  Here's a first-draft
> patch; maybe the spinlock will even prevent the issue.
> 
> Compile tested, I'll try on a real system tomorrow.
> 
> Comments welcome:
> 
> diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
> index 32adf6b..cb27a20 100644
> --- a/drivers/md/persistent-data/dm-space-map-disk.c
> +++ b/drivers/md/persistent-data/dm-space-map-disk.c
> @@ -161,6 +161,7 @@ static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
>  	return r;
>  }
>  
> +static DEFINE_SPINLOCK(smd_lock);
>  static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
>  {
>  	int r;
> @@ -168,17 +169,30 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
>  	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
>  
>  	/* FIXME: we should loop round a couple of times */
> +	spin_lock(&smd_lock);
>  	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> -	if (r)
> +	if (r) {
> +		spin_unlock(&smd_lock);
>  		return r;
> +	}
>  
>  	smd->begin = *b + 1;
>  	r = sm_ll_inc(&smd->ll, *b, &ev);
>  	if (!r) {
> -		BUG_ON(ev != SM_ALLOC);
> -		smd->nr_allocated_this_transaction++;
> +		if (ev == SM_ALLOC)
> +			smd->nr_allocated_this_transaction++;
> +		else {
> +			/* Not actually out of space, this is a bug:
> +			 * https://lore.kernel.org/linux-block/20190925200138.GA20584@redhat.com/
> +			 */
> +			WARN(ev != SM_ALLOC, "Pool metadata allocation race, marking pool out-of-space.");
> +			r = -ENOSPC;
> +			smd->begin = *b - 1;
> +		}
>  	}
>  
> +	spin_unlock(&smd_lock);
> +
>  	return r;
>  }

So far, so good.  There are 3 systems running with the patch (1 w/ MQ=y,
2 w/ MQ=n); time will tell whether we hit the WARN or the spinlock prevents
the race.
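To make the suspected window easier to reason about outside the kernel, here is a minimal user-space sketch of the race hypothesis.  All names and structures (`space_map`, `find_free_block`, `inc_block`, `double_alloc_occurs`) are simplified stand-ins I made up for illustration, not the real dm-space-map code; it deterministically replays one interleaving of two allocators rather than using real threads:

```c
/* Hypothetical user-space model of the suspected sm_disk_new_block()
 * race: two contexts each do "find free block" then "increment
 * refcount".  Without a lock covering both steps, B's find can run
 * between A's find and A's inc, so both pick the same block and the
 * second inc does not yield SM_ALLOC -- the BUG_ON condition.
 */
#include <string.h>

#define NR_BLOCKS 8

enum allocation_event { SM_NONE, SM_ALLOC };

struct space_map {
	int ref_count[NR_BLOCKS];
	int begin;		/* allocation cursor, like smd->begin */
};

static int find_free_block(const struct space_map *sm, int from, int *b)
{
	for (int i = from; i < NR_BLOCKS; i++) {
		if (sm->ref_count[i] == 0) {
			*b = i;
			return 0;
		}
	}
	return -1;		/* no space */
}

static enum allocation_event inc_block(struct space_map *sm, int b)
{
	/* A refcount going 0 -> 1 is a fresh allocation (SM_ALLOC). */
	return ++sm->ref_count[b] == 1 ? SM_ALLOC : SM_NONE;
}

/* Replay one interleaving of two allocators A and B.
 * locked == 1: find+inc held together, as the spinlock patch does.
 * locked == 0: both finds run before either inc.
 * Returns 1 if any inc failed to yield SM_ALLOC (the BUG_ON case).
 */
int double_alloc_occurs(int locked)
{
	struct space_map sm;
	int a, b;
	enum allocation_event ev_a, ev_b;

	memset(&sm, 0, sizeof(sm));

	if (locked) {
		find_free_block(&sm, sm.begin, &a);
		sm.begin = a + 1;
		ev_a = inc_block(&sm, a);	/* block 0 */
		find_free_block(&sm, sm.begin, &b);
		sm.begin = b + 1;
		ev_b = inc_block(&sm, b);	/* block 1 */
		return ev_a != SM_ALLOC || ev_b != SM_ALLOC;
	}

	/* Unlocked interleaving: both read begin == 0 and find block 0. */
	find_free_block(&sm, sm.begin, &a);
	find_free_block(&sm, sm.begin, &b);
	ev_a = inc_block(&sm, a);	/* refcount 0 -> 1, SM_ALLOC */
	ev_b = inc_block(&sm, b);	/* refcount 1 -> 2, not SM_ALLOC */
	return ev_a != SM_ALLOC || ev_b != SM_ALLOC;
}
```

In the unlocked ordering the second increment sees a refcount that is already 1, exactly the "new data block found to already be in use" situation Mike describes; whether the real race takes this precise shape is of course not proven by a toy model.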

--
Eric Wheeler


> 
> 
> 
> --
> Eric Wheeler
> 
> 
> > 
> > The lock might protect smd->begin, but I'm not sure how &smd->ll might
> > have been modified by sm_ll_inc().  However, if ll->save_ie() failed in
> > sm_ll_mutate(), then perhaps this is safe.  What do you think?
> > 
> > Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
> > of space, but I would take it over a BUG_ON.
> > 
> > 2) If example #1 above returned -EAGAIN, how might alloc_data_block be
> >    taught to retry?  This bug shows up weeks or months apart, even on
> >    heavily loaded systems with ~100 live thin volumes, so retrying would
> >    be fine IMHO.
> > 
> > 3) In the thread from June, Markus says:
> > 	"Judging from the call trace, my guess is that there is somewhere 
> > 	a race condition, when a new block needs to be allocated which has 
> > 	still to be discarded."
> > 
> > Is this discard situation possible?  Wouldn't the bio prison prevent this?
> > 
> > --
> > Eric Wheeler
> > www.datawall.us
> > 
> > 
> > 
> > Here is the new trace, old trace below:
> > 
> > kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > invalid opcode: 0000 [#1] SMP NOPTI
> > CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
> > Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
> > Workqueue: dm-thin do_worker [dm_thin_pool]
> > RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> > RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
> > RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
> > RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
> > RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
> > R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
> > R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
> > FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
> > Call Trace:
> >  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> >  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> >  process_cell+0x2a3/0x550 [dm_thin_pool]
> >  ? mempool_alloc+0x6f/0x180
> >  ? sort+0x17b/0x270
> >  ? u32_swap+0x10/0x10
> >  process_deferred_bios+0x1af/0x870 [dm_thin_pool]
> >  do_worker+0x94/0xe0 [dm_thin_pool]
> >  process_one_work+0x171/0x370
> >  worker_thread+0x49/0x3f0
> >  kthread+0xf8/0x130
> >  ? max_active_store+0x80/0x80
> >  ? kthread_bind+0x10/0x10
> >  ret_from_fork+0x1f/0x40
> > 
> > > 
> > > [199391.677689] ------------[ cut here ]------------
> > > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199391.690984] Call Trace:
> > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199391.693852]  ? sort+0x17b/0x270
> > > [199391.694527]  ? u32_swap+0x10/0x10
> > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199391.695890]  process_one_work+0x171/0x370
> > > [199391.696640]  worker_thread+0x49/0x3f0
> > > [199391.697332]  kthread+0xf8/0x130
> > > [199391.697988]  ? max_active_store+0x80/0x80
> > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > [199391.699281]  ret_from_fork+0x1f/0x40
> > > [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199391.708083] ---[ end trace c31536d98046e8ec ]---
> > > [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199391.877317] Kernel panic - not syncing: Fatal exception
> > > [199391.878006] Kernel Offset: disabled
> > > [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> > > [199392.034277] Call Trace:
> > > [199392.034929]  <IRQ>
> > > [199392.035576]  native_apic_msr_write+0x2e/0x40
> > > [199392.036228]  arch_irq_work_raise+0x28/0x40
> > > [199392.036877]  irq_work_queue_on+0x83/0xa0
> > > [199392.037518]  irq_work_run_list+0x4c/0x70
> > > [199392.038149]  irq_work_run+0x14/0x40
> > > [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> > > [199392.039393]  call_function_single_interrupt+0xf/0x20
> > > [199392.040011]  </IRQ>
> > > [199392.040624] RIP: 0010:panic+0x209/0x25c
> > > [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > > [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.046486]  oops_end+0xc1/0xd0
> > > [199392.047149]  do_trap+0x13d/0x150
> > > [199392.047795]  do_error_trap+0xd5/0x130
> > > [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.049048]  invalid_op+0x14/0x20
> > > [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.057484]  ? sort+0x17b/0x270
> > > [199392.058016]  ? u32_swap+0x10/0x10
> > > [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.059060]  process_one_work+0x171/0x370
> > > [199392.059576]  worker_thread+0x49/0x3f0
> > > [199392.060083]  kthread+0xf8/0x130
> > > [199392.060587]  ? max_active_store+0x80/0x80
> > > [199392.061086]  ? kthread_bind+0x10/0x10
> > > [199392.061569]  ret_from_fork+0x1f/0x40
> > > [199392.062038] ------------[ cut here ]------------
> > > [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> > > [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > > [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> > > [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > > [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > > [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> > > [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> > > [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> > > [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> > > [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> > > [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> > > [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199392.078481] Call Trace:
> > > [199392.079117]  <IRQ>
> > > [199392.079745]  check_preempt_curr+0x6b/0x90
> > > [199392.080373]  ttwu_do_wakeup+0x19/0x130
> > > [199392.080999]  try_to_wake_up+0x1e2/0x460
> > > [199392.081623]  __wake_up_common+0x8f/0x160
> > > [199392.082246]  ep_poll_callback+0x1af/0x300
> > > [199392.082860]  __wake_up_common+0x8f/0x160
> > > [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> > > [199392.084074]  irq_work_run_list+0x4c/0x70
> > > [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> > > [199392.085277]  call_function_single_interrupt+0xf/0x20
> > > [199392.085879]  </IRQ>
> > > [199392.086477] RIP: 0010:panic+0x209/0x25c
> > > [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > > [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.092255]  oops_end+0xc1/0xd0
> > > [199392.092894]  do_trap+0x13d/0x150
> > > [199392.093516]  do_error_trap+0xd5/0x130
> > > [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.094718]  invalid_op+0x14/0x20
> > > [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.102750]  ? sort+0x17b/0x270
> > > [199392.103242]  ? u32_swap+0x10/0x10
> > > [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.104228]  process_one_work+0x171/0x370
> > > [199392.104714]  worker_thread+0x49/0x3f0
> > > [199392.105193]  kthread+0xf8/0x130
> > > [199392.105665]  ? max_active_store+0x80/0x80
> > > [199392.106132]  ? kthread_bind+0x10/0x10
> > > [199392.106601]  ret_from_fork+0x1f/0x40
> > > [199392.107069] ---[ end trace c31536d98046e8ed ]---
> > > [199392.107544] ------------[ cut here ]------------
> > > [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> > > [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > > [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> > > [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > > [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > > [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> > > [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> > > [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> > > [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> > > [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> > > [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> > > [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199392.123867] Call Trace:
> > > [199392.124495]  <IRQ>
> > > [199392.125116]  update_process_times+0x40/0x50
> > > [199392.125742]  tick_sched_handle+0x25/0x60
> > > [199392.126367]  tick_sched_timer+0x37/0x70
> > > [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> > > [199392.127601]  hrtimer_interrupt+0x122/0x270
> > > [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> > > [199392.128819]  apic_timer_interrupt+0xf/0x20
> > > [199392.129423]  </IRQ>
> > > [199392.130020] RIP: 0010:panic+0x209/0x25c
> > > [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.135792]  oops_end+0xc1/0xd0
> > > [199392.136444]  do_trap+0x13d/0x150
> > > [199392.137094]  do_error_trap+0xd5/0x130
> > > [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.138383]  invalid_op+0x14/0x20
> > > [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.146949]  ? sort+0x17b/0x270
> > > [199392.147450]  ? u32_swap+0x10/0x10
> > > [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.148441]  process_one_work+0x171/0x370
> > > [199392.148937]  worker_thread+0x49/0x3f0
> > > [199392.149430]  kthread+0xf8/0x130
> > > [199392.149922]  ? max_active_store+0x80/0x80
> > > [199392.150406]  ? kthread_bind+0x10/0x10
> > > [199392.150883]  ret_from_fork+0x1f/0x40
> > > [199392.151353] ---[ end trace c31536d98046e8ee ]---
> > > 
> > > 
> > > --
> > > Eric Wheeler
> > > 
> > > --
> > > dm-devel mailing list
> > > dm-devel@redhat.com
> > > https://www.redhat.com/mailman/listinfo/dm-devel
> > > 
> > 
> > 
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
> > 
> > 
> 
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> 


* Re: [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
@ 2019-12-28  2:13       ` Eric Wheeler
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-12-28  2:13 UTC (permalink / raw)
  To: lvm-devel
  Cc: Mike Snitzer, markus.schade, ejt, linux-block, dm-devel, joe.thornber

On Fri, 27 Dec 2019, Eric Wheeler wrote:
> On Fri, 20 Dec 2019, Eric Wheeler wrote:
> > On Wed, 25 Sep 2019, Eric Wheeler wrote:
> > > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > > We have been using the 4.19 branch for months without issue; we just 
> > > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > > related to the issue, I don't know, maybe coincidence:
> > > 
> > > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > > 	{
> > > 	    int r;
> > > 	    enum allocation_event ev;
> > > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > > 
> > > 	    /* FIXME: we should loop round a couple of times */
> > > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > > 	    if (r)
> > > 		return r;
> > > 
> > > 	    smd->begin = *b + 1;
> > > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > > 	    if (!r) {
> > > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > > 		smd->nr_allocated_this_transaction++;
> > > 	    }
> > 
> > Hello all,
> > 
> > We hit this BUG_ON again, this time with 4.19.86 with 
> > scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as 
> > additionally reported by Markus Schade:
> > 
> >   https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
> >      and
> >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398
> > 
> > In our case, the most latest trace (below) is from a different system that
> > has been stable for years on Linux 4.1 with tmeta direct on the SSD.
> > We updated to 4.19.86 a few weeks ago and just hit this, what Mike
> > Snitzer explains to be an allocator race:
> > 
> > On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > > [199391.693852]  ? sort+0x17b/0x270
> > > > [199391.694527]  ? u32_swap+0x10/0x10
> > > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > > [199391.695890]  process_one_work+0x171/0x370
> > > > [199391.696640]  worker_thread+0x49/0x3f0
> > > > [199391.697332]  kthread+0xf8/0x130
> > > > [199391.697988]  ? max_active_store+0x80/0x80
> > > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > > [199391.699281]  ret_from_fork+0x1f/0x40
> > > 
> > > The stack shows the call to sm_disk_new_block() is due to
> > > dm_pool_alloc_data_block().
> > > 
> > > sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> > > getting called without the passed 'ev' being set to SM_ALLOC.  Only
> > > drivers/md/persistent-data/dm-space-map-common.c:sm_ll_mutate() sets
> > > SM_ALLOC. sm_disk_new_block() is indirectly calling sm_ll_mutate().
> > > 
> > > sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> > > should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> > > returns 0 -- which simply means success.  And on success
> > > sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> > > 
> > > sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> > > 1) ref_count wasn't set
> > > or
> > > 2) old was identified
> > > 
> > > So all said: somehow a new data block was found to already be in use.
> > > _WHY_ that is the case isn't clear from this stack...
> > > 
> > > But it does speak to the possibility of data block allocation racing
> > > with other operations to the same block.  Which implies missing locking.
> > 
> > Where would you look to add locking, do you think?
> > 
> > > But that's all I've got so far... I'll review past dm-thinp changes with
> > > all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> > > needs to have a look at this too.
> > > 
> > > But could it be that bcache is the source of the data device race (same
> > > block used concurrently)?  And DM thinp is acting as the canary in the
> > > coal mine?
> > 
> > As Markus has shown, this bug triggers without bcache.
> > 
> > 
> > Other questions:
> > 
> > 1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:
> > 
> > +	spin_lock(&lock); /* protect smd->begin */
> > 	smd->begin = *b + 1;
> > 	r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	if (!r) {
> > -		BUG_ON(ev != SM_ALLOC); 
> > 		smd->nr_allocated_this_transaction++;
> > 	}
> > +	else {
> > +		r = -ENOSPC;
> > +		smd->begin = *b - 1;
> > +	}
> > +	spin_unlock(&lock);
> 
> Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n); all the 
> other times MQ has been turned on. 
> 
> I'm trying the -ENOSPC hack which will flag the pool as being out of space 
> so I can recover more gracefully than a BUG_ON. Here's a first-draft 
> patch, maybe the spinlock will even prevent the issue.
> 
> Compile tested, I'll try on a real system tomorrow.
> 
> Comments welcome:
> 
> diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
> index 32adf6b..cb27a20 100644
> --- a/drivers/md/persistent-data/dm-space-map-disk.c
> +++ b/drivers/md/persistent-data/dm-space-map-disk.c
> @@ -161,6 +161,7 @@ static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
>  	return r;
>  }
>  
> +static DEFINE_SPINLOCK(smd_lock);
>  static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
>  {
>  	int r;
> @@ -168,17 +169,30 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
>  	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
>  
>  	/* FIXME: we should loop round a couple of times */
> +	spin_lock(&smd_lock);
>  	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> -	if (r)
> +	if (r) {
> +		spin_unlock(&smd_lock);
>  		return r;
> +	}
>  
>  	smd->begin = *b + 1;
>  	r = sm_ll_inc(&smd->ll, *b, &ev);
>  	if (!r) {
> -		BUG_ON(ev != SM_ALLOC);
> -		smd->nr_allocated_this_transaction++;
> +		if (ev == SM_ALLOC)
> +			smd->nr_allocated_this_transaction++;
> +		else {
> +			/* Not actually out of space, this is a bug:
> +			 * https://lore.kernel.org/linux-block/20190925200138.GA20584@redhat.com/
> +			 */
> +			WARN(ev != SM_ALLOC, "Pool metadata allocation race, marking pool out-of-space.");
> +			r = -ENOSPC;
> +			smd->begin = *b - 1;
> +		}
>  	}
>  
> +	spin_unlock(&smd_lock);
> +
>  	return r;
>  }

So far, so good.  There are 3 systems running with the patch (1 w/ MQ=y,
2 w/ MQ=n); time will tell if we hit the WARN or if the spinlock prevents
the race.
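The serialization the patch is after can be sketched as a toy userspace allocator (names are hypothetical, and a pthread mutex stands in for the kernel spinlock; this is a model of the idea, not the dm code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy model of the locking the patch above attempts: the free-block
 * search and the "increment" happen under one lock, so two workers can
 * never be handed the same block. */
#define NR_BLOCKS 1024

static pthread_mutex_t smd_lock = PTHREAD_MUTEX_INITIALIZER;
static bool used[NR_BLOCKS];		/* stand-in for the space map */
static int begin_hint;			/* stand-in for smd->begin */

static int new_block(int *b)
{
	int i, r = -1;			/* -1 as an ENOSPC stand-in */

	pthread_mutex_lock(&smd_lock);
	for (i = begin_hint; i < NR_BLOCKS; i++) {
		if (!used[i]) {
			used[i] = true;		/* the sm_ll_inc() step */
			begin_hint = i + 1;	/* the smd->begin update */
			*b = i;
			r = 0;
			break;
		}
	}
	pthread_mutex_unlock(&smd_lock);
	return r;
}
```

Without the lock, a second caller could observe the same free index between the search and the increment — exactly the already-in-use outcome the BUG_ON detects.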

--
Eric Wheeler


> 
> 
> 
> --
> Eric Wheeler
> 
> 
> > 
> > The lock might protect smd->begin, but I'm not sure how &smd->ll might 
> > have been modified by sm_ll_inc().  However, since ll->save_ie() failed in 
> > sm_ll_mutate(), perhaps this is safe.  What do you think?
> > 
> > Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
> > of space, but I would take it over a BUG_ON.
> > 
> > 2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
> >    taught to retry?  This bug shows up weeks or months apart, even on heavily 
> >    loaded systems with ~100 live thin volumes, so retrying would be fine 
> >    IMHO.
> > 
> > 3) In the thread from June, Markus says:
> > 	"Judging from the call trace, my guess is that there is somewhere 
> > 	a race condition, when a new block needs to be allocated which has 
> > 	still to be discarded."
> > 
> > Is this discard situation possible?  Wouldn't the bio prison prevent this?
> > 
> > --
> > Eric Wheeler
> > www.datawall.us
> > 
> > 
> > 
> > Here is the new trace, old trace below:
> > 
> > kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > invalid opcode: 0000 [#1] SMP NOPTI
> > CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
> > Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
> > Workqueue: dm-thin do_worker [dm_thin_pool]
> > RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> > RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
> > RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
> > RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
> > RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
> > R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
> > R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
> > FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
> > Call Trace:
> >  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> >  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> >  process_cell+0x2a3/0x550 [dm_thin_pool]
> >  ? mempool_alloc+0x6f/0x180
> >  ? sort+0x17b/0x270
> >  ? u32_swap+0x10/0x10
> >  process_deferred_bios+0x1af/0x870 [dm_thin_pool]
> >  do_worker+0x94/0xe0 [dm_thin_pool]
> >  process_one_work+0x171/0x370
> >  worker_thread+0x49/0x3f0
> >  kthread+0xf8/0x130
> >  ? max_active_store+0x80/0x80
> >  ? kthread_bind+0x10/0x10
> >  ret_from_fork+0x1f/0x40
> > 
> > > 
> > > [199391.677689] ------------[ cut here ]------------
> > > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199391.690984] Call Trace:
> > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199391.693852]  ? sort+0x17b/0x270
> > > [199391.694527]  ? u32_swap+0x10/0x10
> > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199391.695890]  process_one_work+0x171/0x370
> > > [199391.696640]  worker_thread+0x49/0x3f0
> > > [199391.697332]  kthread+0xf8/0x130
> > > [199391.697988]  ? max_active_store+0x80/0x80
> > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > [199391.699281]  ret_from_fork+0x1f/0x40
> > > [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199391.708083] ---[ end trace c31536d98046e8ec ]---
> > > [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199391.877317] Kernel panic - not syncing: Fatal exception
> > > [199391.878006] Kernel Offset: disabled
> > > [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> > > [199392.034277] Call Trace:
> > > [199392.034929]  <IRQ>
> > > [199392.035576]  native_apic_msr_write+0x2e/0x40
> > > [199392.036228]  arch_irq_work_raise+0x28/0x40
> > > [199392.036877]  irq_work_queue_on+0x83/0xa0
> > > [199392.037518]  irq_work_run_list+0x4c/0x70
> > > [199392.038149]  irq_work_run+0x14/0x40
> > > [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> > > [199392.039393]  call_function_single_interrupt+0xf/0x20
> > > [199392.040011]  </IRQ>
> > > [199392.040624] RIP: 0010:panic+0x209/0x25c
> > > [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > > [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.046486]  oops_end+0xc1/0xd0
> > > [199392.047149]  do_trap+0x13d/0x150
> > > [199392.047795]  do_error_trap+0xd5/0x130
> > > [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.049048]  invalid_op+0x14/0x20
> > > [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.057484]  ? sort+0x17b/0x270
> > > [199392.058016]  ? u32_swap+0x10/0x10
> > > [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.059060]  process_one_work+0x171/0x370
> > > [199392.059576]  worker_thread+0x49/0x3f0
> > > [199392.060083]  kthread+0xf8/0x130
> > > [199392.060587]  ? max_active_store+0x80/0x80
> > > [199392.061086]  ? kthread_bind+0x10/0x10
> > > [199392.061569]  ret_from_fork+0x1f/0x40
> > > [199392.062038] ------------[ cut here ]------------
> > > [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> > > [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > > [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> > > [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > > [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > > [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> > > [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> > > [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> > > [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> > > [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> > > [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> > > [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199392.078481] Call Trace:
> > > [199392.079117]  <IRQ>
> > > [199392.079745]  check_preempt_curr+0x6b/0x90
> > > [199392.080373]  ttwu_do_wakeup+0x19/0x130
> > > [199392.080999]  try_to_wake_up+0x1e2/0x460
> > > [199392.081623]  __wake_up_common+0x8f/0x160
> > > [199392.082246]  ep_poll_callback+0x1af/0x300
> > > [199392.082860]  __wake_up_common+0x8f/0x160
> > > [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> > > [199392.084074]  irq_work_run_list+0x4c/0x70
> > > [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> > > [199392.085277]  call_function_single_interrupt+0xf/0x20
> > > [199392.085879]  </IRQ>
> > > [199392.086477] RIP: 0010:panic+0x209/0x25c
> > > [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > > [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.092255]  oops_end+0xc1/0xd0
> > > [199392.092894]  do_trap+0x13d/0x150
> > > [199392.093516]  do_error_trap+0xd5/0x130
> > > [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.094718]  invalid_op+0x14/0x20
> > > [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.102750]  ? sort+0x17b/0x270
> > > [199392.103242]  ? u32_swap+0x10/0x10
> > > [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.104228]  process_one_work+0x171/0x370
> > > [199392.104714]  worker_thread+0x49/0x3f0
> > > [199392.105193]  kthread+0xf8/0x130
> > > [199392.105665]  ? max_active_store+0x80/0x80
> > > [199392.106132]  ? kthread_bind+0x10/0x10
> > > [199392.106601]  ret_from_fork+0x1f/0x40
> > > [199392.107069] ---[ end trace c31536d98046e8ed ]---
> > > [199392.107544] ------------[ cut here ]------------
> > > [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> > > [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > > [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> > > [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > > [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > > [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> > > [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> > > [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> > > [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> > > [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> > > [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> > > [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199392.123867] Call Trace:
> > > [199392.124495]  <IRQ>
> > > [199392.125116]  update_process_times+0x40/0x50
> > > [199392.125742]  tick_sched_handle+0x25/0x60
> > > [199392.126367]  tick_sched_timer+0x37/0x70
> > > [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> > > [199392.127601]  hrtimer_interrupt+0x122/0x270
> > > [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> > > [199392.128819]  apic_timer_interrupt+0xf/0x20
> > > [199392.129423]  </IRQ>
> > > [199392.130020] RIP: 0010:panic+0x209/0x25c
> > > [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.135792]  oops_end+0xc1/0xd0
> > > [199392.136444]  do_trap+0x13d/0x150
> > > [199392.137094]  do_error_trap+0xd5/0x130
> > > [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.138383]  invalid_op+0x14/0x20
> > > [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.146949]  ? sort+0x17b/0x270
> > > [199392.147450]  ? u32_swap+0x10/0x10
> > > [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.148441]  process_one_work+0x171/0x370
> > > [199392.148937]  worker_thread+0x49/0x3f0
> > > [199392.149430]  kthread+0xf8/0x130
> > > [199392.149922]  ? max_active_store+0x80/0x80
> > > [199392.150406]  ? kthread_bind+0x10/0x10
> > > [199392.150883]  ret_from_fork+0x1f/0x40
> > > [199392.151353] ---[ end trace c31536d98046e8ee ]---
> > > 
> > > 
> > > --
> > > Eric Wheeler
> > > 
> > > --
> > > dm-devel mailing list
> > > dm-devel@redhat.com
> > > https://www.redhat.com/mailman/listinfo/dm-devel
> > > 
> > 
> 


--
lvm-devel mailing list
lvm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/lvm-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
@ 2019-12-28  2:13       ` Eric Wheeler
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2019-12-28  2:13 UTC (permalink / raw)
  To: lvm-devel

On Fri, 27 Dec 2019, Eric Wheeler wrote:
> On Fri, 20 Dec 2019, Eric Wheeler wrote:
> > On Wed, 25 Sep 2019, Eric Wheeler wrote:
> > > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > > We have been using the 4.19 branch for months without issue; we just 
> > > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > > related to the issue, I don't know, maybe coincidence:
> > > 
> > > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > > 	{
> > > 	    int r;
> > > 	    enum allocation_event ev;
> > > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > > 
> > > 	    /* FIXME: we should loop round a couple of times */
> > > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > > 	    if (r)
> > > 		return r;
> > > 
> > > 	    smd->begin = *b + 1;
> > > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > > 	    if (!r) {
> > > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > > 		smd->nr_allocated_this_transaction++;
> > > 	    }
> > 
> > Hello all,
> > 
> > We hit this BUG_ON again, this time with 4.19.86 with 
> > scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as 
> > additionally reported by Markus Schade:
> > 
> >   https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
> >      and
> >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398
> > 
> > In our case, the most latest trace (below) is from a different system that
> > has been stable for years on Linux 4.1 with tmeta direct on the SSD.
> > We updated to 4.19.86 a few weeks ago and just hit this, what Mike
> > Snitzer explains to be an allocator race:
> > 
> > On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > > [199391.693852]  ? sort+0x17b/0x270
> > > > [199391.694527]  ? u32_swap+0x10/0x10
> > > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > > [199391.695890]  process_one_work+0x171/0x370
> > > > [199391.696640]  worker_thread+0x49/0x3f0
> > > > [199391.697332]  kthread+0xf8/0x130
> > > > [199391.697988]  ? max_active_store+0x80/0x80
> > > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > > [199391.699281]  ret_from_fork+0x1f/0x40
> > > 
> > > The stack shows the call to sm_disk_new_block() is due to
> > > dm_pool_alloc_data_block().
> > > 
> > > sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> > > getting called without the passed 'ev' being set to SM_ALLOC.  Only
> > > drivers/md/persistent-dat/dm-space-map-common.c:sm_ll_mutate() sets
> > > SM_ALLOC. sm_disk_new_block() is indirectly calling sm_ll_mutate()
> > > 
> > > sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> > > should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> > > returns 0 -- which simply means success.  And on success
> > > sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> > > 
> > > sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> > > 1) ref_count wasn't set
> > > or
> > > 2) old was identified
> > > 
> > > So all said: somehow a new data block was found to already be in use.
> > > _WHY_ that is the case isn't clear from this stack...
> > > 
> > > But it does speak to the possibility of data block allocation racing
> > > with other operations to the same block.  Which implies missing locking.
> > 
> > Where would you look to add locking do you think? 
> > 
> > > But that's all I've got so far... I'll review past dm-thinp changes with
> > > all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> > > needs to have a look at this too.
> > > 
> > > But could it be that bcache is the source of the data device race (same
> > > block used concurrently)?  And DM thinp is acting as the canary in the
> > > coal mine?
> > 
> > As Marcus has shown, this bug triggers without bcache.
> > 
> > 
> > Other questions:
> > 
> > 1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:
> > 
> > +	spin_lock(&lock); /* protect smd->begin */
> > 	smd->begin = *b + 1;
> > 	r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	if (!r) {
> > -		BUG_ON(ev != SM_ALLOC); 
> > 		smd->nr_allocated_this_transaction++;
> > 	}
> > +	else {
> > +		r = -ENOSPC;
> > +		smd->begin = *b - 1;
> > +	}
> > +	spin_unlock(&lock);
> 
> Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n), all other 
> times MQ has been turned on. 
> 
> I'm trying the -ENOSPC hack which will flag the pool as being out of space 
> so I can recover more gracefully than a BUG_ON. Here's a first-draft 
> patch, maybe the spinlock will even prevent the issue.
> 
> Compile tested, I'll try on a real system tomorrow.
> 
> Comments welcome:
> 
> diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
> index 32adf6b..cb27a20 100644
> --- a/drivers/md/persistent-data/dm-space-map-disk.c
> +++ b/drivers/md/persistent-data/dm-space-map-disk.c
> @@ -161,6 +161,7 @@ static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
>  	return r;
>  }
>  
> +static DEFINE_SPINLOCK(smd_lock);
>  static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
>  {
>  	int r;
> @@ -168,17 +169,30 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
>  	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
>  
>  	/* FIXME: we should loop round a couple of times */
> +	spin_lock(&smd_lock);
>  	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> -	if (r)
> +	if (r) {
> +		spin_unlock(&smd_lock);
>  		return r;
> +	}
>  
>  	smd->begin = *b + 1;
>  	r = sm_ll_inc(&smd->ll, *b, &ev);
>  	if (!r) {
> -		BUG_ON(ev != SM_ALLOC);
> -		smd->nr_allocated_this_transaction++;
> +		if (ev == SM_ALLOC)
> +			smd->nr_allocated_this_transaction++;
> +		else {
> +			/* Not actually out of space, this is a bug:
> +			 * https://lore.kernel.org/linux-block/20190925200138.GA20584 at redhat.com/
> +			 */
> +			WARN(ev != SM_ALLOC, "Pool metadata allocation race, marking pool out-of-space.");
> +			r = -ENOSPC;
> +			smd->begin = *b - 1;
> +		}
>  	}
>  
> +	spin_unlock(&smd_lock);
> +
>  	return r;
>  }

So far, so good.  There are 3 systems running with the patch (1 w/ MQ=y,
2 w/ MQ=n); time will tell if we hit the WARN or if the spinlock prevents
the race.

--
Eric Wheeler


> 
> 
> 
> --
> Eric Wheeler
> 
> 
> > 
> > The lock might protect smd->begin, but I'm not sure how &smd->ll might 
> > have been modified by sm_ll_inc().  However, since ll->save_ie() failed in 
> > sm_ll_mutate() then perhaps this is safe.  What do you think?
> > 
> > Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
> > of space, but I would take it over a BUG_ON.
> > 
> > 2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
> >    taught retry?  This bug shows up weeks or months apart, even on heavily 
> >    loaded systems with ~100 live thin volumes, so retrying would be fine 
> >    IMHO.
> > 
> > 3) In the thread from June, Marcus says:
> > 	"Judging from the call trace, my guess is that there is somewhere 
> > 	a race condition, when a new block needs to be allocated which has 
> > 	still to be discarded."
> > 
> > Is this discard situation possible?  Wouldn't the bio prison prevent this?
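On question 2 above, a bounded retry loop is one way alloc_data_block() might be taught to retry a transient failure if sm_disk_new_block() returned -EAGAIN instead of hitting BUG_ON. This is a hypothetical userspace sketch with a simulated allocator, not the dm-thin implementation; the retry bound and helper names are made up.

```c
#include <assert.h>
#include <errno.h>

typedef unsigned long long dm_block_t;

#define ALLOC_MAX_RETRIES 3	/* arbitrary bound for illustration */

/* Simulated allocator: fails with -EAGAIN a fixed number of times,
 * then hands out block 42.  Purely for illustration. */
static int fake_eagain_left = 2;

static int fake_pool_alloc(dm_block_t *b)
{
	if (fake_eagain_left > 0) {
		fake_eagain_left--;
		return -EAGAIN;
	}
	*b = 42;
	return 0;
}

/* hypothetical retrying wrapper around the allocation path */
static int alloc_data_block_retrying(dm_block_t *b)
{
	int r, tries;

	for (tries = 0; tries < ALLOC_MAX_RETRIES; tries++) {
		r = fake_pool_alloc(b);
		if (r != -EAGAIN)
			return r;	/* success, or a hard error */
	}
	return -ENOSPC;	/* give up: fall back to out-of-space handling */
}
```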
> > 
> > --
> > Eric Wheeler
> > www.datawall.us
> > 
> > 
> > 
> > Here is the new trace, old trace below:
> > 
> > kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > invalid opcode: 0000 [#1] SMP NOPTI
> > CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
> > Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
> > Workqueue: dm-thin do_worker [dm_thin_pool]
> > RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> > RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
> > RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
> > RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
> > RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
> > R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
> > R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
> > FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
> > Call Trace:
> >  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> >  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> >  process_cell+0x2a3/0x550 [dm_thin_pool]
> >  ? mempool_alloc+0x6f/0x180
> >  ? sort+0x17b/0x270
> >  ? u32_swap+0x10/0x10
> >  process_deferred_bios+0x1af/0x870 [dm_thin_pool]
> >  do_worker+0x94/0xe0 [dm_thin_pool]
> >  process_one_work+0x171/0x370
> >  worker_thread+0x49/0x3f0
> >  kthread+0xf8/0x130
> >  ? max_active_store+0x80/0x80
> >  ? kthread_bind+0x10/0x10
> >  ret_from_fork+0x1f/0x40
> > 
> > > 
> > > [199391.677689] ------------[ cut here ]------------
> > > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199391.690984] Call Trace:
> > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199391.693852]  ? sort+0x17b/0x270
> > > [199391.694527]  ? u32_swap+0x10/0x10
> > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199391.695890]  process_one_work+0x171/0x370
> > > [199391.696640]  worker_thread+0x49/0x3f0
> > > [199391.697332]  kthread+0xf8/0x130
> > > [199391.697988]  ? max_active_store+0x80/0x80
> > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > [199391.699281]  ret_from_fork+0x1f/0x40
> > > [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199391.708083] ---[ end trace c31536d98046e8ec ]---
> > > [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199391.877317] Kernel panic - not syncing: Fatal exception
> > > [199391.878006] Kernel Offset: disabled
> > > [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> > > [199392.034277] Call Trace:
> > > [199392.034929]  <IRQ>
> > > [199392.035576]  native_apic_msr_write+0x2e/0x40
> > > [199392.036228]  arch_irq_work_raise+0x28/0x40
> > > [199392.036877]  irq_work_queue_on+0x83/0xa0
> > > [199392.037518]  irq_work_run_list+0x4c/0x70
> > > [199392.038149]  irq_work_run+0x14/0x40
> > > [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> > > [199392.039393]  call_function_single_interrupt+0xf/0x20
> > > [199392.040011]  </IRQ>
> > > [199392.040624] RIP: 0010:panic+0x209/0x25c
> > > [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > > [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.046486]  oops_end+0xc1/0xd0
> > > [199392.047149]  do_trap+0x13d/0x150
> > > [199392.047795]  do_error_trap+0xd5/0x130
> > > [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.049048]  invalid_op+0x14/0x20
> > > [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.057484]  ? sort+0x17b/0x270
> > > [199392.058016]  ? u32_swap+0x10/0x10
> > > [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.059060]  process_one_work+0x171/0x370
> > > [199392.059576]  worker_thread+0x49/0x3f0
> > > [199392.060083]  kthread+0xf8/0x130
> > > [199392.060587]  ? max_active_store+0x80/0x80
> > > [199392.061086]  ? kthread_bind+0x10/0x10
> > > [199392.061569]  ret_from_fork+0x1f/0x40
> > > [199392.062038] ------------[ cut here ]------------
> > > [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> > > [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > > [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> > > [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > > [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > > [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> > > [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> > > [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> > > [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> > > [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> > > [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> > > [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199392.078481] Call Trace:
> > > [199392.079117]  <IRQ>
> > > [199392.079745]  check_preempt_curr+0x6b/0x90
> > > [199392.080373]  ttwu_do_wakeup+0x19/0x130
> > > [199392.080999]  try_to_wake_up+0x1e2/0x460
> > > [199392.081623]  __wake_up_common+0x8f/0x160
> > > [199392.082246]  ep_poll_callback+0x1af/0x300
> > > [199392.082860]  __wake_up_common+0x8f/0x160
> > > [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> > > [199392.084074]  irq_work_run_list+0x4c/0x70
> > > [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> > > [199392.085277]  call_function_single_interrupt+0xf/0x20
> > > [199392.085879]  </IRQ>
> > > [199392.086477] RIP: 0010:panic+0x209/0x25c
> > > [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > > [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.092255]  oops_end+0xc1/0xd0
> > > [199392.092894]  do_trap+0x13d/0x150
> > > [199392.093516]  do_error_trap+0xd5/0x130
> > > [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.094718]  invalid_op+0x14/0x20
> > > [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.102750]  ? sort+0x17b/0x270
> > > [199392.103242]  ? u32_swap+0x10/0x10
> > > [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.104228]  process_one_work+0x171/0x370
> > > [199392.104714]  worker_thread+0x49/0x3f0
> > > [199392.105193]  kthread+0xf8/0x130
> > > [199392.105665]  ? max_active_store+0x80/0x80
> > > [199392.106132]  ? kthread_bind+0x10/0x10
> > > [199392.106601]  ret_from_fork+0x1f/0x40
> > > [199392.107069] ---[ end trace c31536d98046e8ed ]---
> > > [199392.107544] ------------[ cut here ]------------
> > > [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> > > [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > > [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> > > [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > > [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > > [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> > > [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> > > [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> > > [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> > > [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> > > [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> > > [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199392.123867] Call Trace:
> > > [199392.124495]  <IRQ>
> > > [199392.125116]  update_process_times+0x40/0x50
> > > [199392.125742]  tick_sched_handle+0x25/0x60
> > > [199392.126367]  tick_sched_timer+0x37/0x70
> > > [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> > > [199392.127601]  hrtimer_interrupt+0x122/0x270
> > > [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> > > [199392.128819]  apic_timer_interrupt+0xf/0x20
> > > [199392.129423]  </IRQ>
> > > [199392.130020] RIP: 0010:panic+0x209/0x25c
> > > [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.135792]  oops_end+0xc1/0xd0
> > > [199392.136444]  do_trap+0x13d/0x150
> > > [199392.137094]  do_error_trap+0xd5/0x130
> > > [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.138383]  invalid_op+0x14/0x20
> > > [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.146949]  ? sort+0x17b/0x270
> > > [199392.147450]  ? u32_swap+0x10/0x10
> > > [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.148441]  process_one_work+0x171/0x370
> > > [199392.148937]  worker_thread+0x49/0x3f0
> > > [199392.149430]  kthread+0xf8/0x130
> > > [199392.149922]  ? max_active_store+0x80/0x80
> > > [199392.150406]  ? kthread_bind+0x10/0x10
> > > [199392.150883]  ret_from_fork+0x1f/0x40
> > > [199392.151353] ---[ end trace c31536d98046e8ee ]---
> > > 
> > > 
> > > --
> > > Eric Wheeler
> > > 
> > > --
> > > dm-devel mailing list
> > > dm-devel@redhat.com
> > > https://www.redhat.com/mailman/listinfo/dm-devel
> > > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [lvm-devel] [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
  2019-12-28  2:13       ` Eric Wheeler
  (?)
@ 2020-01-07 10:35         ` Joe Thornber
  -1 siblings, 0 replies; 43+ messages in thread
From: Joe Thornber @ 2020-01-07 10:35 UTC (permalink / raw)
  To: LVM2 development
  Cc: Mike Snitzer, markus.schade, ejt, linux-block, dm-devel, joe.thornber

On Sat, Dec 28, 2019 at 02:13:07AM +0000, Eric Wheeler wrote:
> On Fri, 27 Dec 2019, Eric Wheeler wrote:

> > Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n), all other 
> > times MQ has been turned on. 
> > 
> > I'm trying the -ENOSPC hack which will flag the pool as being out of space 
> > so I can recover more gracefully than a BUG_ON. Here's a first-draft 
> > patch, maybe the spinlock will even prevent the issue.
> > 
> > Compile tested, I'll try on a real system tomorrow.
> > 
> > Comments welcome:

Both sm_ll_find_free_block() and sm_ll_inc() can trigger synchronous IO.  So you
absolutely cannot use a spin lock.

dm_pool_alloc_data_block() holds a big rw semaphore which should prevent anything
else trying to allocate at the same time.

- Joe


^ permalink raw reply	[flat|nested] 43+ messages in thread



* Re: [lvm-devel] [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
  2020-01-07 10:35         ` Joe Thornber
  (?)
@ 2020-01-07 10:46           ` Joe Thornber
  -1 siblings, 0 replies; 43+ messages in thread
From: Joe Thornber @ 2020-01-07 10:46 UTC (permalink / raw)
  To: LVM2 development, Mike Snitzer, markus.schade, ejt, linux-block,
	dm-devel, joe.thornber

On Tue, Jan 07, 2020 at 10:35:46AM +0000, Joe Thornber wrote:
> On Sat, Dec 28, 2019 at 02:13:07AM +0000, Eric Wheeler wrote:
> > On Fri, 27 Dec 2019, Eric Wheeler wrote:
> 
> > > Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n), all other 
> > > times MQ has been turned on. 
> > > 
> > > I'm trying the -ENOSPC hack which will flag the pool as being out of space 
> > > so I can recover more gracefully than a BUG_ON. Here's a first-draft 
> > > patch, maybe the spinlock will even prevent the issue.
> > > 
> > > Compile tested, I'll try on a real system tomorrow.
> > > 
> > > Comments welcome:
> 
> Both sm_ll_find_free_block() and sm_ll_inc() can trigger synchronous IO.  So you
> absolutely cannot use a spin lock.
> 
> dm_pool_alloc_data_block() holds a big rw semaphore which should prevent anything
> else trying to allocate at the same time.

I suspect the problem is to do with the way we search for the new block in the 
space map for the previous transaction (sm->old_ll), and then increment in the current
transaction (sm->ll).

We keep old_ll around so we can ensure we never (re) allocate a block that's used in
the previous transaction.  This gives us our crash resistance, since if anything goes
wrong we effectively rollback to the previous transaction.

What I think we should be doing is running find_free on the old_ll, then double checking
it's not actually used in the current transaction.  ATM we're relying on smd->begin being
set properly everywhere, and I suspect this isn't the case.  A quick look shows sm_disk_inc_block()
doesn't adjust it.  sm_disk_inc_block() can be called when breaking sharing of a neighbouring entry
in a leaf btree node ... and we know you use snapshots very heavily.

I'll get a patch to you later today.

- Joe
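The double-check described above, running find_free on old_ll and then verifying the candidate is also unused in the current transaction rather than trusting smd->begin, can be sketched with plain arrays standing in for the on-disk ref-count btrees. This is a toy model under stated assumptions; all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_BLOCKS 16

/* Toy space map: arrays stand in for the old and current ref-count
 * btrees (sm->old_ll and sm->ll); names are illustrative only. */
struct toy_sm {
	uint8_t old_ll[NR_BLOCKS];	/* previous transaction ref counts */
	uint8_t ll[NR_BLOCKS];		/* current transaction ref counts */
};

/* Returns the allocated block index, or -1 if no block is free in
 * BOTH views.  Checking ll as well as old_ll means a block already
 * incremented in this transaction (e.g. via sm_disk_inc_block() when
 * breaking sharing) can never be handed out again. */
static int toy_new_block(struct toy_sm *sm)
{
	for (int b = 0; b < NR_BLOCKS; b++) {
		if (sm->old_ll[b] == 0 && sm->ll[b] == 0) {
			sm->ll[b] = 1;	/* SM_ALLOC: 0 -> 1 in current txn */
			return b;
		}
	}
	return -1;
}
```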


^ permalink raw reply	[flat|nested] 43+ messages in thread


* [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
@ 2020-01-07 10:46           ` Joe Thornber
  0 siblings, 0 replies; 43+ messages in thread
From: Joe Thornber @ 2020-01-07 10:46 UTC (permalink / raw)
  To: lvm-devel

On Tue, Jan 07, 2020 at 10:35:46AM +0000, Joe Thornber wrote:
> On Sat, Dec 28, 2019 at 02:13:07AM +0000, Eric Wheeler wrote:
> > On Fri, 27 Dec 2019, Eric Wheeler wrote:
> 
> > > Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n), all other 
> > > times MQ has been turned on. 
> > > 
> > > I'm trying the -ENOSPC hack which will flag the pool as being out of space 
> > > so I can recover more gracefully than a BUG_ON. Here's a first-draft 
> > > patch, maybe the spinlock will even prevent the issue.
> > > 
> > > Compile tested, I'll try on a real system tomorrow.
> > > 
> > > Comments welcome:
> 
> Both sm_ll_find_free_block() and sm_ll_inc() can trigger synchronous IO.  So you
> absolutely cannot use a spin lock.
> 
> dm_pool_alloc_data_block() holds a big rw semaphore which should prevent anything
> else trying to allocate at the same time.

I suspect the problem is to do with the way we search for the new block in the 
space map for the previous transaction (sm->old_ll), and then increment in the current
transaction (sm->ll).

We keep old_ll around so we can ensure we never (re) allocate a block that's used in
the previous transaction.  This gives us our crash resistance, since if anything goes
wrong we effectively rollback to the previous transaction.

What I think we should be doing is running find_free on the old_ll, then double checking
it's not actually used in the current transaction.  ATM we're relying on smd->begin being
set properly everywhere, and I suspect this isn't the case.  A quick look shows sm_disk_inc_block()
doesn't adjust it.  sm_disk_inc_block() can be called when breaking sharing of a neighbouring entry
in a leaf btree node ... and we know you use snapshots very heavily.

I'll get a patch to you later today.

- Joe



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [dm-devel] [lvm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
  2020-01-07 10:46           ` Joe Thornber
@ 2020-01-07 12:28             ` Joe Thornber
  -1 siblings, 0 replies; 43+ messages in thread
From: Joe Thornber @ 2020-01-07 12:28 UTC (permalink / raw)
  To: LVM2 development, Mike Snitzer, markus.schade, ejt, linux-block,
	dm-devel, joe.thornber

On Tue, Jan 07, 2020 at 10:46:27AM +0000, Joe Thornber wrote:
> I'll get a patch to you later today.

Eric,

Patch below.  I've run it through a bunch of tests in the dm test suite.  But
obviously I have never hit your issue.  Will do more testing today.

- Joe



Author: Joe Thornber <ejt@redhat.com>
Date:   Tue Jan 7 11:58:42 2020 +0000

    [dm-thin, dm-cache] Fix bug in space-maps.
    
    The space-maps track the reference counts for disk blocks.  There are variants
    for tracking metadata blocks and data blocks.

    We implement transactionality by never touching blocks from the previous
    transaction, so we can roll back in the event of a crash.

    When allocating a new block we need to ensure the block is free (has a reference
    count of 0) in both the current and previous transactions.  Prior to this patch we
    were doing this by searching for a free block in the previous transaction, and
    relying on a 'begin' counter to track where the last allocation in the current
    transaction was.  This 'begin' field was not being updated in all code paths (e.g.,
    incrementing a data block's reference count while breaking sharing of a neighbour
    block in the same btree leaf).

    This patch keeps the 'begin' field, but now it is just a hint to speed up the search.
    Instead we search the current transaction for a free block, and then double-check
    it's free in the old transaction.  Much simpler.

diff --git a/drivers/md/persistent-data/dm-space-map-common.c b/drivers/md/persistent-data/dm-space-map-common.c
index bd68f6fef694..b4983e4022e6 100644
--- a/drivers/md/persistent-data/dm-space-map-common.c
+++ b/drivers/md/persistent-data/dm-space-map-common.c
@@ -380,6 +380,34 @@ int sm_ll_find_free_block(struct ll_disk *ll, dm_block_t begin,
 	return -ENOSPC;
 }
 
+int sm_ll_find_common_free_block(struct ll_disk *old_ll, struct ll_disk *new_ll,
+	                         dm_block_t begin, dm_block_t end, dm_block_t *b)
+{
+	int r;
+	uint32_t count;
+
+	do {
+		r = sm_ll_find_free_block(new_ll, begin, new_ll->nr_blocks, b);
+		if (r)
+			break;
+
+		/* double check this block wasn't used in the old transaction */
+		if (*b >= old_ll->nr_blocks)
+			count = 0;
+
+		else {
+			r = sm_ll_lookup(old_ll, *b, &count);
+			if (r)
+				break;
+
+			if (count)
+				begin = *b + 1;
+		}
+	} while (count);
+
+	return r;
+}
+
 static int sm_ll_mutate(struct ll_disk *ll, dm_block_t b,
 			int (*mutator)(void *context, uint32_t old, uint32_t *new),
 			void *context, enum allocation_event *ev)
diff --git a/drivers/md/persistent-data/dm-space-map-common.h b/drivers/md/persistent-data/dm-space-map-common.h
index b3078d5eda0c..8de63ce39bdd 100644
--- a/drivers/md/persistent-data/dm-space-map-common.h
+++ b/drivers/md/persistent-data/dm-space-map-common.h
@@ -109,6 +109,8 @@ int sm_ll_lookup_bitmap(struct ll_disk *ll, dm_block_t b, uint32_t *result);
 int sm_ll_lookup(struct ll_disk *ll, dm_block_t b, uint32_t *result);
 int sm_ll_find_free_block(struct ll_disk *ll, dm_block_t begin,
 			  dm_block_t end, dm_block_t *result);
+int sm_ll_find_common_free_block(struct ll_disk *old_ll, struct ll_disk *new_ll,
+	                         dm_block_t begin, dm_block_t end, dm_block_t *result);
 int sm_ll_insert(struct ll_disk *ll, dm_block_t b, uint32_t ref_count, enum allocation_event *ev);
 int sm_ll_inc(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev);
 int sm_ll_dec(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev);
diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
index 32adf6b4a9c7..bf4c5e2ccb6f 100644
--- a/drivers/md/persistent-data/dm-space-map-disk.c
+++ b/drivers/md/persistent-data/dm-space-map-disk.c
@@ -167,8 +167,10 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
 	enum allocation_event ev;
 	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
 
-	/* FIXME: we should loop round a couple of times */
-	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
+	/*
+	 * Any block we allocate has to be free in both the old and current ll.
+	 */
+	r = sm_ll_find_common_free_block(&smd->old_ll, &smd->ll, smd->begin, smd->ll.nr_blocks, b);
 	if (r)
 		return r;
 
diff --git a/drivers/md/persistent-data/dm-space-map-metadata.c b/drivers/md/persistent-data/dm-space-map-metadata.c
index 25328582cc48..9e3c64ec2026 100644
--- a/drivers/md/persistent-data/dm-space-map-metadata.c
+++ b/drivers/md/persistent-data/dm-space-map-metadata.c
@@ -448,7 +448,10 @@ static int sm_metadata_new_block_(struct dm_space_map *sm, dm_block_t *b)
 	enum allocation_event ev;
 	struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm);
 
-	r = sm_ll_find_free_block(&smm->old_ll, smm->begin, smm->old_ll.nr_blocks, b);
+	/*
+	 * Any block we allocate has to be free in both the old and current ll.
+	 */
+	r = sm_ll_find_common_free_block(&smm->old_ll, &smm->ll, smm->begin, smm->ll.nr_blocks, b);
 	if (r)
 		return r;
 



* Re: [dm-devel] [lvm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
  2020-01-07 12:28             ` [lvm-devel] " Joe Thornber
@ 2020-01-07 18:47               ` Eric Wheeler
  -1 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2020-01-07 18:47 UTC (permalink / raw)
  To: Joe Thornber
  Cc: LVM2 development, Mike Snitzer, markus.schade, ejt, linux-block,
	dm-devel, joe.thornber

On Tue, 7 Jan 2020, Joe Thornber wrote:

> On Tue, Jan 07, 2020 at 10:46:27AM +0000, Joe Thornber wrote:
> > I'll get a patch to you later today.
> 
> Eric,
> 
> Patch below.  I've run it through a bunch of tests in the dm test suite.  But
> obviously I have never hit your issue.  Will do more testing today.


Thank you Joe, I'll apply the patch and pull out the spinlock.

I'm not familiar with why doing synchronous IO means a spinlock can't be used here.
Can you give a brief explanation or point me at documentation?

--
Eric Wheeler



> [full patch quoted above; snipped]


* [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
@ 2020-01-07 18:47               ` Eric Wheeler
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Wheeler @ 2020-01-07 18:47 UTC (permalink / raw)
  To: lvm-devel

On Tue, 7 Jan 2020, Joe Thornber wrote:

> On Tue, Jan 07, 2020 at 10:46:27AM +0000, Joe Thornber wrote:
> > I'll get a patch to you later today.
> 
> Eric,
> 
> Patch below.  I've run it through a bunch of tests in the dm test suite.  But
> obviously I have never hit your issue.  Will do more testing today.


Thank you Joe, I'll apply the patch and pull out the spinlock.  

I'm not familiar with how sync IO removes the need for a spinlock.  Can 
you give a brief explanation or point me at documentation?

--
Eric Wheeler



> 
> - Joe
> 
> 
> 
> Author: Joe Thornber <ejt@redhat.com>
> Date:   Tue Jan 7 11:58:42 2020 +0000
> 
>     [dm-thin, dm-cache] Fix bug in space-maps.
>     
>     The space-maps track the reference counts for disk blocks.  There are variants
>     for tracking metadata blocks, and data blocks.
>     
>     We implement transactionality by never touching blocks from the previous
>     transaction, so we can rollback in the event of a crash.
>     
>     When allocating a new block we need to ensure the block is free (has reference
>     count of 0) in both the current and previous transaction.  Prior to this patch we
>     were doing this by searching for a free block in the previous transaction, and
>     relying on a 'begin' counter to track where the last allocation in the current
>     transaction was.  This 'begin' field was not being updated in all code paths (eg,
>     increment of a data block reference count due to breaking sharing of a neighbour
>     block in the same btree leaf).
>     
>     This patch keeps the 'begin' field, but now it's just a hint to speed up the search.
>     Instead we search the current transaction for a free block, and then double check
>     it's free in the old transaction.  Much simpler.
> 
> diff --git a/drivers/md/persistent-data/dm-space-map-common.c b/drivers/md/persistent-data/dm-space-map-common.c
> index bd68f6fef694..b4983e4022e6 100644
> --- a/drivers/md/persistent-data/dm-space-map-common.c
> +++ b/drivers/md/persistent-data/dm-space-map-common.c
> @@ -380,6 +380,34 @@ int sm_ll_find_free_block(struct ll_disk *ll, dm_block_t begin,
>  	return -ENOSPC;
>  }
>  
> +int sm_ll_find_common_free_block(struct ll_disk *old_ll, struct ll_disk *new_ll,
> +	                         dm_block_t begin, dm_block_t end, dm_block_t *b)
> +{
> +	int r;
> +	uint32_t count;
> +
> +	do {
> +		r = sm_ll_find_free_block(new_ll, begin, new_ll->nr_blocks, b);
> +		if (r)
> +			break;
> +
> +		/* double check this block wasn't used in the old transaction */
> +		if (*b >= old_ll->nr_blocks)
> +			count = 0;
> +
> +		else {
> +			r = sm_ll_lookup(old_ll, *b, &count);
> +			if (r)
> +				break;
> +
> +			if (count)
> +				begin = *b + 1;
> +		}
> +	} while (count);
> +
> +	return r;
> +}
> +
>  static int sm_ll_mutate(struct ll_disk *ll, dm_block_t b,
>  			int (*mutator)(void *context, uint32_t old, uint32_t *new),
>  			void *context, enum allocation_event *ev)
> diff --git a/drivers/md/persistent-data/dm-space-map-common.h b/drivers/md/persistent-data/dm-space-map-common.h
> index b3078d5eda0c..8de63ce39bdd 100644
> --- a/drivers/md/persistent-data/dm-space-map-common.h
> +++ b/drivers/md/persistent-data/dm-space-map-common.h
> @@ -109,6 +109,8 @@ int sm_ll_lookup_bitmap(struct ll_disk *ll, dm_block_t b, uint32_t *result);
>  int sm_ll_lookup(struct ll_disk *ll, dm_block_t b, uint32_t *result);
>  int sm_ll_find_free_block(struct ll_disk *ll, dm_block_t begin,
>  			  dm_block_t end, dm_block_t *result);
> +int sm_ll_find_common_free_block(struct ll_disk *old_ll, struct ll_disk *new_ll,
> +	                         dm_block_t begin, dm_block_t end, dm_block_t *result);
>  int sm_ll_insert(struct ll_disk *ll, dm_block_t b, uint32_t ref_count, enum allocation_event *ev);
>  int sm_ll_inc(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev);
>  int sm_ll_dec(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev);
> diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
> index 32adf6b4a9c7..bf4c5e2ccb6f 100644
> --- a/drivers/md/persistent-data/dm-space-map-disk.c
> +++ b/drivers/md/persistent-data/dm-space-map-disk.c
> @@ -167,8 +167,10 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
>  	enum allocation_event ev;
>  	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
>  
> -	/* FIXME: we should loop round a couple of times */
> -	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> +	/*
> +	 * Any block we allocate has to be free in both the old and current ll.
> +	 */
> +	r = sm_ll_find_common_free_block(&smd->old_ll, &smd->ll, smd->begin, smd->ll.nr_blocks, b);
>  	if (r)
>  		return r;
>  
> diff --git a/drivers/md/persistent-data/dm-space-map-metadata.c b/drivers/md/persistent-data/dm-space-map-metadata.c
> index 25328582cc48..9e3c64ec2026 100644
> --- a/drivers/md/persistent-data/dm-space-map-metadata.c
> +++ b/drivers/md/persistent-data/dm-space-map-metadata.c
> @@ -448,7 +448,10 @@ static int sm_metadata_new_block_(struct dm_space_map *sm, dm_block_t *b)
>  	enum allocation_event ev;
>  	struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm);
>  
> -	r = sm_ll_find_free_block(&smm->old_ll, smm->begin, smm->old_ll.nr_blocks, b);
> +	/*
> +	 * Any block we allocate has to be free in both the old and current ll.
> +	 */
> +	r = sm_ll_find_common_free_block(&smm->old_ll, &smm->ll, smm->begin, smm->ll.nr_blocks, b);
>  	if (r)
>  		return r;
>  
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> 
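[Editor's note: for readers skimming the thread, the double-check loop in
Joe's sm_ll_find_common_free_block() above can be modelled in a few lines
of Python. This is a sketch only: plain dicts stand in for the on-disk
reference-count bitmaps, and the helper names are illustrative, not the
kernel's. The point it demonstrates is the one in the commit message: a
block is only safe to allocate if it is free in BOTH the committed (old)
and in-flight (new) transactions.]

```python
# Rough model of the patched allocation path.  refcounts maps
# block number -> reference count; a missing key means count 0.

def find_free_block(refcounts, begin, end):
    """Return the first block in [begin, end) with refcount 0, else None.

    Models sm_ll_find_free_block(); None stands in for -ENOSPC."""
    for b in range(begin, end):
        if refcounts.get(b, 0) == 0:
            return b
    return None

def find_common_free_block(old, new, begin, nr_blocks):
    """Models sm_ll_find_common_free_block(): search the *current*
    transaction for a free block, then double-check it was also free
    in the *old* (committed) transaction; otherwise skip past it."""
    while True:
        b = find_free_block(new, begin, nr_blocks)
        if b is None:
            return None          # no space
        if old.get(b, 0) == 0:
            return b             # free in both transactions
        begin = b + 1            # used in the old transaction; keep looking

# Block 2 is free in the current transaction but was still referenced in
# the committed one, so it must be skipped; block 3 is free in both.
old = {0: 1, 1: 1, 2: 1}
new = {0: 1, 1: 1}
assert find_common_free_block(old, new, begin=0, nr_blocks=8) == 3
```

Before the patch, only the old transaction was searched and the `begin`
hint was relied on to avoid blocks already allocated in the current one;
the model makes it easy to see why a stale `begin` could hand out a block
twice.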




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
  2020-01-07 12:28             ` [lvm-devel] " Joe Thornber
@ 2020-01-14 21:52               ` Eric Biggers
  -1 siblings, 0 replies; 43+ messages in thread
From: Eric Biggers @ 2020-01-14 21:52 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: LVM2 development, markus.schade, ejt, linux-block, dm-devel,
	joe.thornber, dm-devel

On Tue, Jan 07, 2020 at 12:28:25PM +0000, Joe Thornber wrote:
> On Tue, Jan 07, 2020 at 10:46:27AM +0000, Joe Thornber wrote:
> > I'll get a patch to you later today.
> 
> Eric,
> 
> Patch below.  I've run it through a bunch of tests in the dm test suite.  But
> obviously I have never hit your issue.  Will do more testing today.
> 
> - Joe
> 
> 
> 
> Author: Joe Thornber <ejt@redhat.com>
> Date:   Tue Jan 7 11:58:42 2020 +0000
> 
>     [dm-thin, dm-cache] Fix bug in space-maps.
>     
>     The space-maps track the reference counts for disk blocks.  There are variants
>     for tracking metadata blocks, and data blocks.
>     
>     We implement transactionality by never touching blocks from the previous
>     transaction, so we can rollback in the event of a crash.
>     
>     When allocating a new block we need to ensure the block is free (has reference
>     count of 0) in both the current and previous transaction.  Prior to this patch we
>     were doing this by searching for a free block in the previous transaction, and
>     relying on a 'begin' counter to track where the last allocation in the current
>     transaction was.  This 'begin' field was not being updated in all code paths (eg,
>     increment of a data block reference count due to breaking sharing of a neighbour
>     block in the same btree leaf).
>     
>     This patch keeps the 'begin' field, but now it's just a hint to speed up the search.
>     Instead we search the current transaction for a free block, and then double check
>     it's free in the old transaction.  Much simpler.
> 

I happened to notice this patch is on the linux-dm/for-next branch
(https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=2137c0dcc04b24efb4c38d4b46b7194575718dd5)
and it has:

	Reported-by: Eric Biggers <ebiggers@google.com>

This is wrong, I didn't report this.  I think you meant to put:

	Reported-by: Eric Wheeler <dm-devel@lists.ewheeler.net>

- Eric (the other one)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178
  2020-01-14 21:52               ` Eric Biggers
@ 2020-01-15  1:22                 ` Mike Snitzer
  -1 siblings, 0 replies; 43+ messages in thread
From: Mike Snitzer @ 2020-01-15  1:22 UTC (permalink / raw)
  To: Eric Biggers
  Cc: LVM2 development, markus.schade, Joe Thornber, linux-block,
	device-mapper development, Joe Thornber, Eric Wheeler

On Tue, Jan 14, 2020 at 4:53 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Tue, Jan 07, 2020 at 12:28:25PM +0000, Joe Thornber wrote:
> > On Tue, Jan 07, 2020 at 10:46:27AM +0000, Joe Thornber wrote:
> > > I'll get a patch to you later today.
> >
> > Eric,
> >
> > Patch below.  I've run it through a bunch of tests in the dm test suite.  But
> > obviously I have never hit your issue.  Will do more testing today.
> >
> > - Joe
> >
> >
> >
> > Author: Joe Thornber <ejt@redhat.com>
> > Date:   Tue Jan 7 11:58:42 2020 +0000
> >
> >     [dm-thin, dm-cache] Fix bug in space-maps.
> >
> >     The space-maps track the reference counts for disk blocks.  There are variants
> >     for tracking metadata blocks, and data blocks.
> >
> >     We implement transactionality by never touching blocks from the previous
> >     transaction, so we can rollback in the event of a crash.
> >
> >     When allocating a new block we need to ensure the block is free (has reference
> >     count of 0) in both the current and previous transaction.  Prior to this patch we
> >     were doing this by searching for a free block in the previous transaction, and
> >     relying on a 'begin' counter to track where the last allocation in the current
> >     transaction was.  This 'begin' field was not being updated in all code paths (eg,
> >     increment of a data block reference count due to breaking sharing of a neighbour
> >     block in the same btree leaf).
> >
> >     This patch keeps the 'begin' field, but now it's just a hint to speed up the search.
> >     Instead we search the current transaction for a free block, and then double check
> >     it's free in the old transaction.  Much simpler.
> >
>
> I happened to notice this patch is on the linux-dm/for-next branch
> (https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=2137c0dcc04b24efb4c38d4b46b7194575718dd5)
> and it has:
>
>         Reported-by: Eric Biggers <ebiggers@google.com>
>
> This is wrong, I didn't report this.  I think you meant to put:
>
>         Reported-by: Eric Wheeler <dm-devel@lists.ewheeler.net>
>
> - Eric (the other one)

Fixed it up, not sure how that happened, sorry about that!

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2020-01-15  1:22 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-25 18:40 kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 with scsi_mod.use_blk_mq=y Eric Wheeler
2019-09-25 18:40 ` Eric Wheeler
2019-09-25 18:40 ` Eric Wheeler
2019-09-25 20:01 ` Mike Snitzer
2019-09-25 20:01   ` Mike Snitzer
2019-09-25 20:01   ` Mike Snitzer
2019-09-25 20:33   ` Eric Wheeler
2019-09-25 20:33     ` Eric Wheeler
2019-09-25 20:33     ` Eric Wheeler
2019-09-26 18:27   ` Eric Wheeler
2019-09-26 18:27     ` Eric Wheeler
2019-09-26 18:27     ` Eric Wheeler
2019-09-27  8:32     ` Joe Thornber
2019-09-27  8:32       ` Joe Thornber
2019-09-27  8:32       ` Joe Thornber
2019-09-27 18:45       ` Eric Wheeler
2019-09-27 18:45         ` Eric Wheeler
2019-09-27 18:45         ` Eric Wheeler
2019-12-20 19:54 ` [dm-devel] " Eric Wheeler
2019-12-20 19:54   ` Eric Wheeler
2019-12-20 19:54   ` Eric Wheeler
2019-12-27  1:47   ` [dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178 Eric Wheeler
2019-12-27  1:47     ` Eric Wheeler
2019-12-27  1:47     ` Eric Wheeler
2019-12-28  2:13     ` Eric Wheeler
2019-12-28  2:13       ` Eric Wheeler
2019-12-28  2:13       ` Eric Wheeler
2020-01-07 10:35       ` [lvm-devel] " Joe Thornber
2020-01-07 10:35         ` Joe Thornber
2020-01-07 10:35         ` Joe Thornber
2020-01-07 10:46         ` [lvm-devel] " Joe Thornber
2020-01-07 10:46           ` Joe Thornber
2020-01-07 10:46           ` Joe Thornber
2020-01-07 12:28           ` [dm-devel] [lvm-devel] " Joe Thornber
2020-01-07 12:28             ` [dm-devel] " Joe Thornber
2020-01-07 12:28             ` [lvm-devel] " Joe Thornber
2020-01-07 18:47             ` [dm-devel] " Eric Wheeler
2020-01-07 18:47               ` [dm-devel] " Eric Wheeler
2020-01-07 18:47               ` Eric Wheeler
2020-01-14 21:52             ` Eric Biggers
2020-01-14 21:52               ` Eric Biggers
2020-01-15  1:22               ` Mike Snitzer
2020-01-15  1:22                 ` Mike Snitzer
