All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-devel] kernel BUG at drivers/md/dm-bufio.c:1155!
@ 2022-05-02 18:03 Marc Smith
  2022-05-04 17:50 ` Marc Smith
  0 siblings, 1 reply; 3+ messages in thread
From: Marc Smith @ 2022-05-02 18:03 UTC (permalink / raw)
  To: dm-devel

Hi,

I'm using Linux 5.4.190 (kernel.org) with 'lvmcache' (LVM2 version
2.03.14). I use dm-cache (in write-back mode) and dm-writecache, both
via lvmcache for several logical volumes. I was recently performing
power-loss tests, and noticed one of the dm-cache (in write-back mode)
LV's is encountering a problem when I attempt to activate that volume
group:
[root@fusion-dvt3a ~]# vgchange -ay 10072a_1_default
Segmentation fault
[root@fusion-dvt3a ~]#

Then in the kernel log I see this:
...
[ 4528.074501] device-mapper: cache: Origin device (dm-30) discard
unsupported: Disabling discard passdown.
[ 4528.074716] device-mapper: array: array_block_check failed: blocknr
16818504669106333287 != wanted 10
[ 4528.074717] device-mapper: block manager: array validator check
failed for block 10
[ 4528.074717] device-mapper: array: get_ablock failed
[ 4528.074718] device-mapper: cache metadata: dm_array_cursor_next for
mapping failed
[ 4528.074727] ------------[ cut here ]------------
[ 4528.074728] kernel BUG at drivers/md/dm-bufio.c:1155!
[ 4528.074732] invalid opcode: 0000 [#1] SMP NOPTI
[ 4528.074757] CPU: 9 PID: 53811 Comm: vgchange Kdump: loaded Tainted:
P           OE     5.4.190-esos.prod #1
[ 4528.074797] Hardware name: Quantum H2012/H12SSW-NT, BIOS
T20201009143356 10/09/2020
[ 4528.074833] RIP: 0010:dm_bufio_release+0x19/0x75
[ 4528.074855] Code: fb ff ff 48 89 ef be 01 00 00 00 5d e9 e3 fe ff
ff 41 54 55 4c 8b 67 78 48 89 fd 4c 89 e7 e8 5b cb 26 00 8b 45 50 85
c0 75 02 <0f> 0b ff c8 89 45 50 75 48 31 c9 ba 01 00 00 00 be 03 00 00
00 49
[ 4528.074928] RSP: 0018:ffffc90009f73b78 EFLAGS: 00010246
[ 4528.074952] RAX: 0000000000000000 RBX: ffff88871f1fd340 RCX: 0000000000000007
[ 4528.074982] RDX: ffff888826ed1600 RSI: ffffc90009f73b58 RDI: ffff88880da7e800
[ 4528.075012] RBP: ffff8887482a0150 R08: 0000000000000001 R09: 0000000000037100
[ 4528.075042] R10: 0000000000000000 R11: 0000000000000068 R12: ffff88880da7e800
[ 4528.075073] R13: ffff88871f1fd478 R14: ffff88871f1fd340 R15: 00000000000001fc
[ 4528.075103] FS:  00007f48fab9b2c0(0000) GS:ffff88882b040000(0000)
knlGS:0000000000000000
[ 4528.075137] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4528.075161] CR2: 00007fae36386288 CR3: 0000000721d20000 CR4: 0000000000340ee0
[ 4528.075191] Call Trace:
[ 4528.075207]  dm_array_cursor_end+0x1c/0x27
[ 4528.075255]  dm_cache_load_mappings+0x2d0/0x322
[ 4528.075304]  ? clear_discard+0x34/0x34
[ 4528.075348]  cache_preresume+0x108/0x1e3
[ 4528.075395]  dm_table_resume_targets+0x38/0xa9
[ 4528.075443]  __dm_resume+0x33/0x45
[ 4528.075486]  dm_resume+0x8c/0xb1
[ 4528.075529]  dev_suspend+0x173/0x1d8
[ 4528.075573]  ctl_ioctl+0x2a6/0x39d
[ 4528.075617]  ? retrieve_status+0x19e/0x19e
[ 4528.075663]  dm_ctl_ioctl+0x5/0x8
[ 4528.075708]  vfs_ioctl+0x19/0x26
[ 4528.075751]  do_vfs_ioctl+0x52a/0x554
[ 4528.075797]  ? ksys_semctl.constprop.0+0xfe/0x15e
[ 4528.075847]  ? rcu_is_watching+0x5/0x9
[ 4528.075891]  ksys_ioctl+0x37/0x56
[ 4528.075935]  __x64_sys_ioctl+0x11/0x14
[ 4528.075980]  do_syscall_64+0x52/0x60
[ 4528.076027]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4528.076077] RIP: 0033:0x7f48fac5f617
[ 4528.076121] Code: c0 78 ed 4c 89 e0 5a 5b 41 5c c3 48 8b 05 71 88
0b 00 64 c7 00 26 00 00 00 48 83 c8 ff c3 0f 1f 44 00 00 b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 88 0b 00 f7 d8 64 89
01 48
[ 4528.076276] RSP: 002b:00007ffe25fa3298 EFLAGS: 00000206 ORIG_RAX:
0000000000000010
[ 4528.076362] RAX: ffffffffffffffda RBX: 0000556c551c3391 RCX: 00007f48fac5f617
[ 4528.076419] RDX: 0000556c564ff2c0 RSI: 00000000c138fd06 RDI: 0000000000000003
[ 4528.076475] RBP: 0000556c564f3d30 R08: 0000556c552b7a5b R09: 00007ffe25fa30f8
[ 4528.076532] R10: 0000556c552beb30 R11: 0000000000000206 R12: 0000556c564ff2f0
[ 4528.076589] R13: 0000556c564ff2c0 R14: 0000556c564ff370 R15: 0000000000000001
[ 4528.076646] Modules linked in: dm_cache_smq pci_stub nvmet_rdma(O)
bcache celerity16fc_cfg(O) nvmet_fc(O) nvmet(O) qla2xxx(O) nvme_fc(O)
nvme_fabrics(O) bonding ntb_transport ntb_hw_switchtec(O)
cls_switchtec(O) qede qed bna rdmavt(O) ib_umad(O) rdma_ucm(O)
ib_srp(O) rdma_cm(O) iw_cm(O) ib_cm(O) iw_cxgb4(O) iw_cxgb3(O)
ib_qib(O) mlx5_ib(O) mlx5_core(O) mlxdevm(O) pci_hyperv_intf mlxfw(O)
ib_uverbs(O) ib_core(O) auxiliary(O) mlx4_ib(O) ib_mthca(O)
bnxt_en(OE) nvme(O) nvme_core(O) mlx_compat(O) [last unloaded: scst]
[ 4528.077476] ---[ end trace 97c700627c684afd ]---
[ 4528.083087] RIP: 0010:dm_bufio_release+0x19/0x75
[ 4528.083189] Code: fb ff ff 48 89 ef be 01 00 00 00 5d e9 e3 fe ff
ff 41 54 55 4c 8b 67 78 48 89 fd 4c 89 e7 e8 5b cb 26 00 8b 45 50 85
c0 75 02 <0f> 0b ff c8 89 45 50 75 48 31 c9 ba 01 00 00 00 be 03 00 00
00 49
[ 4528.083397] RSP: 0018:ffffc90009f73b78 EFLAGS: 00010246
[ 4528.083499] RAX: 0000000000000000 RBX: ffff88871f1fd340 RCX: 0000000000000007
[ 4528.083608] RDX: ffff888826ed1600 RSI: ffffc90009f73b58 RDI: ffff88880da7e800
[ 4528.083717] RBP: ffff8887482a0150 R08: 0000000000000001 R09: 0000000000037100
[ 4528.083827] R10: 0000000000000000 R11: 0000000000000068 R12: ffff88880da7e800
[ 4528.083936] R13: ffff88871f1fd478 R14: ffff88871f1fd340 R15: 00000000000001fc
[ 4528.084051] FS:  00007f48fab9b2c0(0000) GS:ffff88882b040000(0000)
knlGS:0000000000000000
[ 4528.084194] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4528.084298] CR2: 00007fae36386288 CR3: 0000000721d20000 CR4: 0000000000340ee0
...

The "blocknr" value of "16818504669106333287" caught my eye, as that
number seems way out of bounds for the origin/cache devices in this
setup (the cache LV size in this case is 256 MiB, and the origin
device is less than 500 GiB).

At first I suspected cache device corruption, so I used the PV data
offset + PE segment sizes and offset to extract an image of the cache
LV into a separate file so I can examine it with the cache_* user land
tools, and this is what I see:
[root@fusion-dvt3a ~]# cache_dump /mnt/data/md122_C02_cvol.img
<superblock uuid="" block_size="128" nr_cache_blocks="3968"
policy="mq" hint_width="4">
  <mappings>
    <mapping cache_block="0" origin_block="7325368" dirty="false"/>
    <mapping cache_block="1" origin_block="7325369" dirty="false"/>
    <mapping cache_block="2" origin_block="7325370" dirty="false"/>
    <mapping cache_block="3" origin_block="7325371" dirty="false"/>
    <mapping cache_block="4" origin_block="7325372" dirty="false"/>
    <mapping cache_block="5" origin_block="7325373" dirty="false"/>
    <mapping cache_block="6" origin_block="7325374" dirty="false"/>
    <mapping cache_block="7" origin_block="7325375" dirty="false"/>
    <mapping cache_block="8" origin_block="0" dirty="false"/>
    <mapping cache_block="9" origin_block="7325351" dirty="false"/>
    <mapping cache_block="10" origin_block="1" dirty="false"/>
    <mapping cache_block="11" origin_block="2" dirty="false"/>
    <mapping cache_block="12" origin_block="4" dirty="false"/>
    <mapping cache_block="13" origin_block="8" dirty="false"/>
    <mapping cache_block="14" origin_block="16" dirty="false"/>
    <mapping cache_block="15" origin_block="32" dirty="false"/>
    <mapping cache_block="16" origin_block="64" dirty="false"/>
    <mapping cache_block="17" origin_block="3" dirty="false"/>
    <mapping cache_block="18" origin_block="5" dirty="false"/>
    <mapping cache_block="19" origin_block="6" dirty="false"/>
    <mapping cache_block="20" origin_block="7" dirty="false"/>
    <mapping cache_block="21" origin_block="9" dirty="false"/>
    <mapping cache_block="22" origin_block="10" dirty="false"/>
    <mapping cache_block="23" origin_block="11" dirty="false"/>
    <mapping cache_block="24" origin_block="12" dirty="false"/>
    <mapping cache_block="25" origin_block="13" dirty="false"/>
    <mapping cache_block="26" origin_block="14" dirty="false"/>
    <mapping cache_block="27" origin_block="15" dirty="false"/>
    <mapping cache_block="28" origin_block="7322946" dirty="false"/>
    <mapping cache_block="29" origin_block="7324340" dirty="false"/>
    <mapping cache_block="30" origin_block="7324341" dirty="false"/>
    <mapping cache_block="31" origin_block="7324342" dirty="false"/>
    <mapping cache_block="32" origin_block="7324343" dirty="false"/>
    <mapping cache_block="33" origin_block="7324344" dirty="false"/>
    <mapping cache_block="34" origin_block="7324345" dirty="false"/>
    <mapping cache_block="35" origin_block="7324346" dirty="false"/>
    <mapping cache_block="36" origin_block="7324347" dirty="false"/>
  </mappings>
  <hints>
    <hint cache_block="0" data="PwAAAA=="/>
    <hint cache_block="1" data="PwAAAA=="/>
    <hint cache_block="2" data="PwAAAA=="/>
    <hint cache_block="3" data="PwAAAA=="/>
    <hint cache_block="4" data="PwAAAA=="/>
    <hint cache_block="5" data="PwAAAA=="/>
    <hint cache_block="6" data="PwAAAA=="/>
    <hint cache_block="7" data="PwAAAA=="/>
    <hint cache_block="8" data="PwAAAA=="/>
    <hint cache_block="9" data="PwAAAA=="/>
    <hint cache_block="10" data="PwAAAA=="/>
    <hint cache_block="11" data="PwAAAA=="/>
    <hint cache_block="12" data="PwAAAA=="/>
    <hint cache_block="13" data="PwAAAA=="/>
    <hint cache_block="14" data="PwAAAA=="/>
    <hint cache_block="15" data="PwAAAA=="/>
    <hint cache_block="16" data="PwAAAA=="/>
    <hint cache_block="17" data="PwAAAA=="/>
    <hint cache_block="18" data="PwAAAA=="/>
    <hint cache_block="19" data="PwAAAA=="/>
    <hint cache_block="20" data="PwAAAA=="/>
    <hint cache_block="21" data="AAAAAA=="/>
    <hint cache_block="22" data="AAAAAA=="/>
    <hint cache_block="23" data="AAAAAA=="/>
    <hint cache_block="24" data="AAAAAA=="/>
    <hint cache_block="25" data="AAAAAA=="/>
    <hint cache_block="26" data="AAAAAA=="/>
    <hint cache_block="27" data="AAAAAA=="/>
    <hint cache_block="28" data="AAAAAA=="/>
    <hint cache_block="29" data="AAAAAA=="/>
    <hint cache_block="30" data="AAAAAA=="/>
    <hint cache_block="31" data="AAAAAA=="/>
    <hint cache_block="32" data="AAAAAA=="/>
    <hint cache_block="33" data="AAAAAA=="/>
    <hint cache_block="34" data="AAAAAA=="/>
    <hint cache_block="35" data="AAAAAA=="/>
    <hint cache_block="36" data="AAAAAA=="/>
  </hints>
</superblock>
[root@fusion-dvt3a ~]# cache_check /mnt/data/md122_C02_cvol.img
examining superblock
examining mapping array
examining hint array
examining discard bitset
[root@fusion-dvt3a ~]# echo $?
0
[root@fusion-dvt3a ~]#

Nothing seems out of the ordinary to me, and 'cache_check' doesn't
report any errors/problems with the metadata.

So I then used a "debug" Linux Kernel (same 5.4.190 code) that
contains additional debugging features (full preemption, lock
debugging, etc.). It also has the
CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING feature enabled, which I suspect
might be related to the persistent-data stuff used by dm-cache. And
when I boot this kernel that has the additional debug features, I do
NOT hit the BUG_ON() in dm_bufio_release() and everything works
properly -- the VG/LV's activate just fine and are usable, no kernel
errors / call traces logged.

I switched back and fourth between these two kernels several times,
just to make sure it wasn't a fluke. And indeed, I am unable to hit
this problem with the "debug" kernel.

Based on the experiment between the two kernel configs and the user
land tools not reporting an error, it sounds like my cache metadata is
sound, but hitting some other issue that I don't understand yet.

Wondering if anyone has any tips / guidance on additional debugging I
can do? I plan to circle back and take my original kernel config and
enable CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING to see if that has any
effect.


Thanks for your time.

--Marc

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [dm-devel] kernel BUG at drivers/md/dm-bufio.c:1155!
  2022-05-02 18:03 [dm-devel] kernel BUG at drivers/md/dm-bufio.c:1155! Marc Smith
@ 2022-05-04 17:50 ` Marc Smith
  2022-05-12 15:48   ` Marc Smith
  0 siblings, 1 reply; 3+ messages in thread
From: Marc Smith @ 2022-05-04 17:50 UTC (permalink / raw)
  To: dm-devel

On Mon, May 2, 2022 at 2:03 PM Marc Smith <msmith626@gmail.com> wrote:
>
> Hi,
>
> I'm using Linux 5.4.190 (kernel.org) with 'lvmcache' (LVM2 version
> 2.03.14). I use dm-cache (in write-back mode) and dm-writecache, both
> via lvmcache for several logical volumes. I was recently performing
> power-loss tests, and noticed one of the dm-cache (in write-back mode)
> LV's is encountering a problem when I attempt to activate that volume
> group:
> [root@fusion-dvt3a ~]# vgchange -ay 10072a_1_default
> Segmentation fault
> [root@fusion-dvt3a ~]#
>
> Then in the kernel log I see this:
> ...
> [ 4528.074501] device-mapper: cache: Origin device (dm-30) discard
> unsupported: Disabling discard passdown.
> [ 4528.074716] device-mapper: array: array_block_check failed: blocknr
> 16818504669106333287 != wanted 10
> [ 4528.074717] device-mapper: block manager: array validator check
> failed for block 10
> [ 4528.074717] device-mapper: array: get_ablock failed
> [ 4528.074718] device-mapper: cache metadata: dm_array_cursor_next for
> mapping failed
> [ 4528.074727] ------------[ cut here ]------------
> [ 4528.074728] kernel BUG at drivers/md/dm-bufio.c:1155!
> [ 4528.074732] invalid opcode: 0000 [#1] SMP NOPTI
> [ 4528.074757] CPU: 9 PID: 53811 Comm: vgchange Kdump: loaded Tainted:
> P           OE     5.4.190-esos.prod #1
> [ 4528.074797] Hardware name: Quantum H2012/H12SSW-NT, BIOS
> T20201009143356 10/09/2020
> [ 4528.074833] RIP: 0010:dm_bufio_release+0x19/0x75
> [ 4528.074855] Code: fb ff ff 48 89 ef be 01 00 00 00 5d e9 e3 fe ff
> ff 41 54 55 4c 8b 67 78 48 89 fd 4c 89 e7 e8 5b cb 26 00 8b 45 50 85
> c0 75 02 <0f> 0b ff c8 89 45 50 75 48 31 c9 ba 01 00 00 00 be 03 00 00
> 00 49
> [ 4528.074928] RSP: 0018:ffffc90009f73b78 EFLAGS: 00010246
> [ 4528.074952] RAX: 0000000000000000 RBX: ffff88871f1fd340 RCX: 0000000000000007
> [ 4528.074982] RDX: ffff888826ed1600 RSI: ffffc90009f73b58 RDI: ffff88880da7e800
> [ 4528.075012] RBP: ffff8887482a0150 R08: 0000000000000001 R09: 0000000000037100
> [ 4528.075042] R10: 0000000000000000 R11: 0000000000000068 R12: ffff88880da7e800
> [ 4528.075073] R13: ffff88871f1fd478 R14: ffff88871f1fd340 R15: 00000000000001fc
> [ 4528.075103] FS:  00007f48fab9b2c0(0000) GS:ffff88882b040000(0000)
> knlGS:0000000000000000
> [ 4528.075137] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4528.075161] CR2: 00007fae36386288 CR3: 0000000721d20000 CR4: 0000000000340ee0
> [ 4528.075191] Call Trace:
> [ 4528.075207]  dm_array_cursor_end+0x1c/0x27
> [ 4528.075255]  dm_cache_load_mappings+0x2d0/0x322
> [ 4528.075304]  ? clear_discard+0x34/0x34
> [ 4528.075348]  cache_preresume+0x108/0x1e3
> [ 4528.075395]  dm_table_resume_targets+0x38/0xa9
> [ 4528.075443]  __dm_resume+0x33/0x45
> [ 4528.075486]  dm_resume+0x8c/0xb1
> [ 4528.075529]  dev_suspend+0x173/0x1d8
> [ 4528.075573]  ctl_ioctl+0x2a6/0x39d
> [ 4528.075617]  ? retrieve_status+0x19e/0x19e
> [ 4528.075663]  dm_ctl_ioctl+0x5/0x8
> [ 4528.075708]  vfs_ioctl+0x19/0x26
> [ 4528.075751]  do_vfs_ioctl+0x52a/0x554
> [ 4528.075797]  ? ksys_semctl.constprop.0+0xfe/0x15e
> [ 4528.075847]  ? rcu_is_watching+0x5/0x9
> [ 4528.075891]  ksys_ioctl+0x37/0x56
> [ 4528.075935]  __x64_sys_ioctl+0x11/0x14
> [ 4528.075980]  do_syscall_64+0x52/0x60
> [ 4528.076027]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 4528.076077] RIP: 0033:0x7f48fac5f617
> [ 4528.076121] Code: c0 78 ed 4c 89 e0 5a 5b 41 5c c3 48 8b 05 71 88
> 0b 00 64 c7 00 26 00 00 00 48 83 c8 ff c3 0f 1f 44 00 00 b8 10 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 88 0b 00 f7 d8 64 89
> 01 48
> [ 4528.076276] RSP: 002b:00007ffe25fa3298 EFLAGS: 00000206 ORIG_RAX:
> 0000000000000010
> [ 4528.076362] RAX: ffffffffffffffda RBX: 0000556c551c3391 RCX: 00007f48fac5f617
> [ 4528.076419] RDX: 0000556c564ff2c0 RSI: 00000000c138fd06 RDI: 0000000000000003
> [ 4528.076475] RBP: 0000556c564f3d30 R08: 0000556c552b7a5b R09: 00007ffe25fa30f8
> [ 4528.076532] R10: 0000556c552beb30 R11: 0000000000000206 R12: 0000556c564ff2f0
> [ 4528.076589] R13: 0000556c564ff2c0 R14: 0000556c564ff370 R15: 0000000000000001
> [ 4528.076646] Modules linked in: dm_cache_smq pci_stub nvmet_rdma(O)
> bcache celerity16fc_cfg(O) nvmet_fc(O) nvmet(O) qla2xxx(O) nvme_fc(O)
> nvme_fabrics(O) bonding ntb_transport ntb_hw_switchtec(O)
> cls_switchtec(O) qede qed bna rdmavt(O) ib_umad(O) rdma_ucm(O)
> ib_srp(O) rdma_cm(O) iw_cm(O) ib_cm(O) iw_cxgb4(O) iw_cxgb3(O)
> ib_qib(O) mlx5_ib(O) mlx5_core(O) mlxdevm(O) pci_hyperv_intf mlxfw(O)
> ib_uverbs(O) ib_core(O) auxiliary(O) mlx4_ib(O) ib_mthca(O)
> bnxt_en(OE) nvme(O) nvme_core(O) mlx_compat(O) [last unloaded: scst]
> [ 4528.077476] ---[ end trace 97c700627c684afd ]---
> [ 4528.083087] RIP: 0010:dm_bufio_release+0x19/0x75
> [ 4528.083189] Code: fb ff ff 48 89 ef be 01 00 00 00 5d e9 e3 fe ff
> ff 41 54 55 4c 8b 67 78 48 89 fd 4c 89 e7 e8 5b cb 26 00 8b 45 50 85
> c0 75 02 <0f> 0b ff c8 89 45 50 75 48 31 c9 ba 01 00 00 00 be 03 00 00
> 00 49
> [ 4528.083397] RSP: 0018:ffffc90009f73b78 EFLAGS: 00010246
> [ 4528.083499] RAX: 0000000000000000 RBX: ffff88871f1fd340 RCX: 0000000000000007
> [ 4528.083608] RDX: ffff888826ed1600 RSI: ffffc90009f73b58 RDI: ffff88880da7e800
> [ 4528.083717] RBP: ffff8887482a0150 R08: 0000000000000001 R09: 0000000000037100
> [ 4528.083827] R10: 0000000000000000 R11: 0000000000000068 R12: ffff88880da7e800
> [ 4528.083936] R13: ffff88871f1fd478 R14: ffff88871f1fd340 R15: 00000000000001fc
> [ 4528.084051] FS:  00007f48fab9b2c0(0000) GS:ffff88882b040000(0000)
> knlGS:0000000000000000
> [ 4528.084194] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4528.084298] CR2: 00007fae36386288 CR3: 0000000721d20000 CR4: 0000000000340ee0
> ...
>
> The "blocknr" value of "16818504669106333287" caught my eye, as that
> number seems way out of bounds for the origin/cache devices in this
> setup (the cache LV size in this case is 256 MiB, and the origin
> device is less than 500 GiB).
>
> At first I suspected cache device corruption, so I used the PV data
> offset + PE segment sizes and offset to extract an image of the cache
> LV into a separate file so I can examine it with the cache_* user land
> tools, and this is what I see:
> [root@fusion-dvt3a ~]# cache_dump /mnt/data/md122_C02_cvol.img
> <superblock uuid="" block_size="128" nr_cache_blocks="3968"
> policy="mq" hint_width="4">
>   <mappings>
>     <mapping cache_block="0" origin_block="7325368" dirty="false"/>
>     <mapping cache_block="1" origin_block="7325369" dirty="false"/>
>     <mapping cache_block="2" origin_block="7325370" dirty="false"/>
>     <mapping cache_block="3" origin_block="7325371" dirty="false"/>
>     <mapping cache_block="4" origin_block="7325372" dirty="false"/>
>     <mapping cache_block="5" origin_block="7325373" dirty="false"/>
>     <mapping cache_block="6" origin_block="7325374" dirty="false"/>
>     <mapping cache_block="7" origin_block="7325375" dirty="false"/>
>     <mapping cache_block="8" origin_block="0" dirty="false"/>
>     <mapping cache_block="9" origin_block="7325351" dirty="false"/>
>     <mapping cache_block="10" origin_block="1" dirty="false"/>
>     <mapping cache_block="11" origin_block="2" dirty="false"/>
>     <mapping cache_block="12" origin_block="4" dirty="false"/>
>     <mapping cache_block="13" origin_block="8" dirty="false"/>
>     <mapping cache_block="14" origin_block="16" dirty="false"/>
>     <mapping cache_block="15" origin_block="32" dirty="false"/>
>     <mapping cache_block="16" origin_block="64" dirty="false"/>
>     <mapping cache_block="17" origin_block="3" dirty="false"/>
>     <mapping cache_block="18" origin_block="5" dirty="false"/>
>     <mapping cache_block="19" origin_block="6" dirty="false"/>
>     <mapping cache_block="20" origin_block="7" dirty="false"/>
>     <mapping cache_block="21" origin_block="9" dirty="false"/>
>     <mapping cache_block="22" origin_block="10" dirty="false"/>
>     <mapping cache_block="23" origin_block="11" dirty="false"/>
>     <mapping cache_block="24" origin_block="12" dirty="false"/>
>     <mapping cache_block="25" origin_block="13" dirty="false"/>
>     <mapping cache_block="26" origin_block="14" dirty="false"/>
>     <mapping cache_block="27" origin_block="15" dirty="false"/>
>     <mapping cache_block="28" origin_block="7322946" dirty="false"/>
>     <mapping cache_block="29" origin_block="7324340" dirty="false"/>
>     <mapping cache_block="30" origin_block="7324341" dirty="false"/>
>     <mapping cache_block="31" origin_block="7324342" dirty="false"/>
>     <mapping cache_block="32" origin_block="7324343" dirty="false"/>
>     <mapping cache_block="33" origin_block="7324344" dirty="false"/>
>     <mapping cache_block="34" origin_block="7324345" dirty="false"/>
>     <mapping cache_block="35" origin_block="7324346" dirty="false"/>
>     <mapping cache_block="36" origin_block="7324347" dirty="false"/>
>   </mappings>
>   <hints>
>     <hint cache_block="0" data="PwAAAA=="/>
>     <hint cache_block="1" data="PwAAAA=="/>
>     <hint cache_block="2" data="PwAAAA=="/>
>     <hint cache_block="3" data="PwAAAA=="/>
>     <hint cache_block="4" data="PwAAAA=="/>
>     <hint cache_block="5" data="PwAAAA=="/>
>     <hint cache_block="6" data="PwAAAA=="/>
>     <hint cache_block="7" data="PwAAAA=="/>
>     <hint cache_block="8" data="PwAAAA=="/>
>     <hint cache_block="9" data="PwAAAA=="/>
>     <hint cache_block="10" data="PwAAAA=="/>
>     <hint cache_block="11" data="PwAAAA=="/>
>     <hint cache_block="12" data="PwAAAA=="/>
>     <hint cache_block="13" data="PwAAAA=="/>
>     <hint cache_block="14" data="PwAAAA=="/>
>     <hint cache_block="15" data="PwAAAA=="/>
>     <hint cache_block="16" data="PwAAAA=="/>
>     <hint cache_block="17" data="PwAAAA=="/>
>     <hint cache_block="18" data="PwAAAA=="/>
>     <hint cache_block="19" data="PwAAAA=="/>
>     <hint cache_block="20" data="PwAAAA=="/>
>     <hint cache_block="21" data="AAAAAA=="/>
>     <hint cache_block="22" data="AAAAAA=="/>
>     <hint cache_block="23" data="AAAAAA=="/>
>     <hint cache_block="24" data="AAAAAA=="/>
>     <hint cache_block="25" data="AAAAAA=="/>
>     <hint cache_block="26" data="AAAAAA=="/>
>     <hint cache_block="27" data="AAAAAA=="/>
>     <hint cache_block="28" data="AAAAAA=="/>
>     <hint cache_block="29" data="AAAAAA=="/>
>     <hint cache_block="30" data="AAAAAA=="/>
>     <hint cache_block="31" data="AAAAAA=="/>
>     <hint cache_block="32" data="AAAAAA=="/>
>     <hint cache_block="33" data="AAAAAA=="/>
>     <hint cache_block="34" data="AAAAAA=="/>
>     <hint cache_block="35" data="AAAAAA=="/>
>     <hint cache_block="36" data="AAAAAA=="/>
>   </hints>
> </superblock>
> [root@fusion-dvt3a ~]# cache_check /mnt/data/md122_C02_cvol.img
> examining superblock
> examining mapping array
> examining hint array
> examining discard bitset
> [root@fusion-dvt3a ~]# echo $?
> 0
> [root@fusion-dvt3a ~]#
>
> Nothing seems out of the ordinary to me, and 'cache_check' doesn't
> report any errors/problems with the metadata.
>
> So I then used a "debug" Linux Kernel (same 5.4.190 code) that
> contains additional debugging features (full preemption, lock
> debugging, etc.). It also has the
> CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING feature enabled, which I suspect
> might be related to the persistent-data stuff used by dm-cache. And
> when I boot this kernel that has the additional debug features, I do
> NOT hit the BUG_ON() in dm_bufio_release() and everything works
> properly -- the VG/LV's activate just fine and are usable, no kernel
> errors / call traces logged.
>
> I switched back and fourth between these two kernels several times,
> just to make sure it wasn't a fluke. And indeed, I am unable to hit
> this problem with the "debug" kernel.
>
> Based on the experiment between the two kernel configs and the user
> land tools not reporting an error, it sounds like my cache metadata is
> sound, but hitting some other issue that I don't understand yet.
>
> Wondering if anyone has any tips / guidance on additional debugging I
> can do? I plan to circle back and take my original kernel config and
> enable CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING to see if that has any
> effect.

I enabled these two additional kernel config options and re-tested:
CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING=y
CONFIG_DM_DEBUG_BLOCK_STACK_TRACING=y

And then when activating the VG, the 'lvm' (vgchange) tool still seg
faults, but the kernel log and call trace is a bit different with
these options enabled:
...
[ 7134.908886] device-mapper: cache: Origin device (dm-30) discard
unsupported: Disabling discard passdown.
[ 7134.909088] device-mapper: array: array_block_check failed: blocknr
16818504669106333287 != wanted 10
[ 7134.909090] device-mapper: block manager: array validator check
failed for block 10
[ 7134.909090] device-mapper: array: get_ablock failed
[ 7134.909091] device-mapper: cache metadata: dm_array_cursor_next for
mapping failed
[ 7134.909102] ------------[ cut here ]------------
[ 7134.909102] kernel BUG at drivers/md/persistent-data/dm-block-manager.c:246!
[ 7134.909107] invalid opcode: 0000 [#1] SMP NOPTI
[ 7134.909133] CPU: 13 PID: 75361 Comm: vgchange Kdump: loaded
Tainted: P           OE     5.4.190-esos.prod #1
[ 7134.909175] Hardware name: Quantum H2012/H12SSW-NT, BIOS
T20201009143356 10/09/2020
[ 7134.909214] RIP: 0010:bl_up_read+0xf/0x43
[ 7134.909233] Code: 48 83 f8 04 75 f0 e8 69 fc ff ff 89 c0 48 c7 44
c2 18 00 00 00 00 e9 e1 fb ff ff 55 48 89 fd e8 ab ff 25 00 83 7d 04
00 7f 02 <0f> 0b 48 89 ef 65 48 8b 34 25 40 ad 01 00 e8 b1 ff ff ff 48
8b 45
[ 7134.909311] RSP: 0018:ffffc90009f1bb68 EFLAGS: 00010246
[ 7134.909335] RAX: 0000000000000000 RBX: ffff88878e2e6b40 RCX: 0000000000000007
[ 7134.909366] RDX: 0000000000000001 RSI: ffffc90009f1bb54 RDI: ffff888748f49858
[ 7134.909397] RBP: ffff888748f49858 R08: 0000000000000001 R09: 0000000000036d00
[ 7134.909428] R10: 0000000000000000 R11: 0000000000000068 R12: ffff888748f49858
[ 7134.909459] R13: ffff88878e2e6c78 R14: ffff88878e2e6b40 R15: 00000000000001fc
[ 7134.909491] FS:  00007fdfa39b72c0(0000) GS:ffff88882b140000(0000)
knlGS:0000000000000000
[ 7134.909526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7134.909552] CR2: 0000559be7639038 CR3: 00000007a5a72000 CR4: 0000000000340ee0
[ 7134.909583] Call Trace:
[ 7134.909597]  dm_bm_unlock+0x2f/0x3a
[ 7134.909642]  dm_array_cursor_end+0x1c/0x27
[ 7134.909690]  dm_cache_load_mappings+0x2d0/0x322
[ 7134.909739]  ? clear_discard+0x34/0x34
[ 7134.909784]  cache_preresume+0x108/0x1e3
[ 7134.909831]  dm_table_resume_targets+0x38/0xa9
[ 7134.909880]  __dm_resume+0x33/0x45
[ 7134.909923]  dm_resume+0x8c/0xb1
[ 7134.909967]  dev_suspend+0x173/0x1d8
[ 7134.910011]  ctl_ioctl+0x2a6/0x39d
[ 7134.910055]  ? retrieve_status+0x19e/0x19e
[ 7134.910103]  dm_ctl_ioctl+0x5/0x8
[ 7134.910148]  vfs_ioctl+0x19/0x26
[ 7134.910191]  do_vfs_ioctl+0x52a/0x554
[ 7134.910237]  ? ksys_semctl.constprop.0+0xfe/0x15e
[ 7134.910287]  ? rcu_is_watching+0x5/0x9
[ 7134.910333]  ksys_ioctl+0x37/0x56
[ 7134.910376]  __x64_sys_ioctl+0x11/0x14
[ 7134.910423]  do_syscall_64+0x52/0x60
[ 7134.910469]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7134.910520] RIP: 0033:0x7fdfa3a7b617
[ 7134.910564] Code: c0 78 ed 4c 89 e0 5a 5b 41 5c c3 48 8b 05 71 88
0b 00 64 c7 00 26 00 00 00 48 83 c8 ff c3 0f 1f 44 00 00 b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 88 0b 00 f7 d8 64 89
01 48
[ 7134.910722] RSP: 002b:00007fffcfccb488 EFLAGS: 00000206 ORIG_RAX:
0000000000000010
[ 7134.910810] RAX: ffffffffffffffda RBX: 0000559be5dae391 RCX: 00007fdfa3a7b617
[ 7134.910868] RDX: 0000559be76589c0 RSI: 00000000c138fd06 RDI: 0000000000000003
[ 7134.910925] RBP: 0000559be764c830 R08: 0000559be5ea2a5b R09: 00007fffcfccb2e8
[ 7134.910983] R10: 0000559be5ea9b30 R11: 0000000000000206 R12: 0000559be76589f0
[ 7134.911041] R13: 0000559be76589c0 R14: 0000559be7658a70 R15: 0000000000000001
[ 7134.911100] Modules linked in: dm_cache_smq pci_stub nvmet_rdma(O)
bcache celerity16fc_cfg(O) nvmet_fc(O) nvmet(O) qla2xxx(O) nvme_fc(O)
nvme_fabrics(O) bonding ntb_transport ntb_hw_switchtec(O)
cls_switchtec(O) qede qed bna rdmavt(O) ib_umad(O) rdma_ucm(O)
ib_srp(O) rdma_cm(O) iw_cm(O) ib_cm(O) iw_cxgb4(O) iw_cxgb3(O)
ib_qib(O) mlx5_ib(O) mlx5_core(O) mlxdevm(O) pci_hyperv_intf mlxfw(O)
ib_uverbs(O) ib_core(O) auxiliary(O) mlx4_ib(O) ib_mthca(O)
bnxt_en(OE) nvme(O) nvme_core(O) mlx_compat(O) [last unloaded: scst]
[ 7134.911937] ---[ end trace bf6823c5ee51ae83 ]---
[ 7134.917732] RIP: 0010:bl_up_read+0xf/0x43
[ 7134.917831] Code: 48 83 f8 04 75 f0 e8 69 fc ff ff 89 c0 48 c7 44
c2 18 00 00 00 00 e9 e1 fb ff ff 55 48 89 fd e8 ab ff 25 00 83 7d 04
00 7f 02 <0f> 0b 48 89 ef 65 48 8b 34 25 40 ad 01 00 e8 b1 ff ff ff 48
8b 45
[ 7134.918042] RSP: 0018:ffffc90009f1bb68 EFLAGS: 00010246
[ 7134.918145] RAX: 0000000000000000 RBX: ffff88878e2e6b40 RCX: 0000000000000007
[ 7134.918255] RDX: 0000000000000001 RSI: ffffc90009f1bb54 RDI: ffff888748f49858
[ 7134.918365] RBP: ffff888748f49858 R08: 0000000000000001 R09: 0000000000036d00
[ 7134.918475] R10: 0000000000000000 R11: 0000000000000068 R12: ffff888748f49858
[ 7134.918585] R13: ffff88878e2e6c78 R14: ffff88878e2e6b40 R15: 00000000000001fc
[ 7134.918699] FS:  00007fdfa39b72c0(0000) GS:ffff88882b140000(0000)
knlGS:0000000000000000
[ 7134.918843] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7134.918950] CR2: 0000559be7639038 CR3: 00000007a5a72000 CR4: 0000000000340ee0
...


--Marc


>
>
> Thanks for your time.
>
> --Marc

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [dm-devel] kernel BUG at drivers/md/dm-bufio.c:1155!
  2022-05-04 17:50 ` Marc Smith
@ 2022-05-12 15:48   ` Marc Smith
  0 siblings, 0 replies; 3+ messages in thread
From: Marc Smith @ 2022-05-12 15:48 UTC (permalink / raw)
  To: dm-devel

On Wed, May 4, 2022 at 1:50 PM Marc Smith <msmith626@gmail.com> wrote:
>
> On Mon, May 2, 2022 at 2:03 PM Marc Smith <msmith626@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm using Linux 5.4.190 (kernel.org) with 'lvmcache' (LVM2 version
> > 2.03.14). I use dm-cache (in write-back mode) and dm-writecache, both
> > via lvmcache for several logical volumes. I was recently performing
> > power-loss tests, and noticed one of the dm-cache (in write-back mode)
> > LV's is encountering a problem when I attempt to activate that volume
> > group:
> > [root@fusion-dvt3a ~]# vgchange -ay 10072a_1_default
> > Segmentation fault
> > [root@fusion-dvt3a ~]#
> >
> > Then in the kernel log I see this:
> > ...
> > [ 4528.074501] device-mapper: cache: Origin device (dm-30) discard
> > unsupported: Disabling discard passdown.
> > [ 4528.074716] device-mapper: array: array_block_check failed: blocknr
> > 16818504669106333287 != wanted 10
> > [ 4528.074717] device-mapper: block manager: array validator check
> > failed for block 10
> > [ 4528.074717] device-mapper: array: get_ablock failed
> > [ 4528.074718] device-mapper: cache metadata: dm_array_cursor_next for
> > mapping failed
> > [ 4528.074727] ------------[ cut here ]------------
> > [ 4528.074728] kernel BUG at drivers/md/dm-bufio.c:1155!
> > [ 4528.074732] invalid opcode: 0000 [#1] SMP NOPTI
> > [ 4528.074757] CPU: 9 PID: 53811 Comm: vgchange Kdump: loaded Tainted:
> > P           OE     5.4.190-esos.prod #1
> > [ 4528.074797] Hardware name: Quantum H2012/H12SSW-NT, BIOS
> > T20201009143356 10/09/2020
> > [ 4528.074833] RIP: 0010:dm_bufio_release+0x19/0x75
> > [ 4528.074855] Code: fb ff ff 48 89 ef be 01 00 00 00 5d e9 e3 fe ff
> > ff 41 54 55 4c 8b 67 78 48 89 fd 4c 89 e7 e8 5b cb 26 00 8b 45 50 85
> > c0 75 02 <0f> 0b ff c8 89 45 50 75 48 31 c9 ba 01 00 00 00 be 03 00 00
> > 00 49
> > [ 4528.074928] RSP: 0018:ffffc90009f73b78 EFLAGS: 00010246
> > [ 4528.074952] RAX: 0000000000000000 RBX: ffff88871f1fd340 RCX: 0000000000000007
> > [ 4528.074982] RDX: ffff888826ed1600 RSI: ffffc90009f73b58 RDI: ffff88880da7e800
> > [ 4528.075012] RBP: ffff8887482a0150 R08: 0000000000000001 R09: 0000000000037100
> > [ 4528.075042] R10: 0000000000000000 R11: 0000000000000068 R12: ffff88880da7e800
> > [ 4528.075073] R13: ffff88871f1fd478 R14: ffff88871f1fd340 R15: 00000000000001fc
> > [ 4528.075103] FS:  00007f48fab9b2c0(0000) GS:ffff88882b040000(0000)
> > knlGS:0000000000000000
> > [ 4528.075137] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 4528.075161] CR2: 00007fae36386288 CR3: 0000000721d20000 CR4: 0000000000340ee0
> > [ 4528.075191] Call Trace:
> > [ 4528.075207]  dm_array_cursor_end+0x1c/0x27
> > [ 4528.075255]  dm_cache_load_mappings+0x2d0/0x322
> > [ 4528.075304]  ? clear_discard+0x34/0x34
> > [ 4528.075348]  cache_preresume+0x108/0x1e3
> > [ 4528.075395]  dm_table_resume_targets+0x38/0xa9
> > [ 4528.075443]  __dm_resume+0x33/0x45
> > [ 4528.075486]  dm_resume+0x8c/0xb1
> > [ 4528.075529]  dev_suspend+0x173/0x1d8
> > [ 4528.075573]  ctl_ioctl+0x2a6/0x39d
> > [ 4528.075617]  ? retrieve_status+0x19e/0x19e
> > [ 4528.075663]  dm_ctl_ioctl+0x5/0x8
> > [ 4528.075708]  vfs_ioctl+0x19/0x26
> > [ 4528.075751]  do_vfs_ioctl+0x52a/0x554
> > [ 4528.075797]  ? ksys_semctl.constprop.0+0xfe/0x15e
> > [ 4528.075847]  ? rcu_is_watching+0x5/0x9
> > [ 4528.075891]  ksys_ioctl+0x37/0x56
> > [ 4528.075935]  __x64_sys_ioctl+0x11/0x14
> > [ 4528.075980]  do_syscall_64+0x52/0x60
> > [ 4528.076027]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [ 4528.076077] RIP: 0033:0x7f48fac5f617
> > [ 4528.076121] Code: c0 78 ed 4c 89 e0 5a 5b 41 5c c3 48 8b 05 71 88
> > 0b 00 64 c7 00 26 00 00 00 48 83 c8 ff c3 0f 1f 44 00 00 b8 10 00 00
> > 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 88 0b 00 f7 d8 64 89
> > 01 48
> > [ 4528.076276] RSP: 002b:00007ffe25fa3298 EFLAGS: 00000206 ORIG_RAX:
> > 0000000000000010
> > [ 4528.076362] RAX: ffffffffffffffda RBX: 0000556c551c3391 RCX: 00007f48fac5f617
> > [ 4528.076419] RDX: 0000556c564ff2c0 RSI: 00000000c138fd06 RDI: 0000000000000003
> > [ 4528.076475] RBP: 0000556c564f3d30 R08: 0000556c552b7a5b R09: 00007ffe25fa30f8
> > [ 4528.076532] R10: 0000556c552beb30 R11: 0000000000000206 R12: 0000556c564ff2f0
> > [ 4528.076589] R13: 0000556c564ff2c0 R14: 0000556c564ff370 R15: 0000000000000001
> > [ 4528.076646] Modules linked in: dm_cache_smq pci_stub nvmet_rdma(O)
> > bcache celerity16fc_cfg(O) nvmet_fc(O) nvmet(O) qla2xxx(O) nvme_fc(O)
> > nvme_fabrics(O) bonding ntb_transport ntb_hw_switchtec(O)
> > cls_switchtec(O) qede qed bna rdmavt(O) ib_umad(O) rdma_ucm(O)
> > ib_srp(O) rdma_cm(O) iw_cm(O) ib_cm(O) iw_cxgb4(O) iw_cxgb3(O)
> > ib_qib(O) mlx5_ib(O) mlx5_core(O) mlxdevm(O) pci_hyperv_intf mlxfw(O)
> > ib_uverbs(O) ib_core(O) auxiliary(O) mlx4_ib(O) ib_mthca(O)
> > bnxt_en(OE) nvme(O) nvme_core(O) mlx_compat(O) [last unloaded: scst]
> > [ 4528.077476] ---[ end trace 97c700627c684afd ]---
> > [ 4528.083087] RIP: 0010:dm_bufio_release+0x19/0x75
> > [ 4528.083189] Code: fb ff ff 48 89 ef be 01 00 00 00 5d e9 e3 fe ff
> > ff 41 54 55 4c 8b 67 78 48 89 fd 4c 89 e7 e8 5b cb 26 00 8b 45 50 85
> > c0 75 02 <0f> 0b ff c8 89 45 50 75 48 31 c9 ba 01 00 00 00 be 03 00 00
> > 00 49
> > [ 4528.083397] RSP: 0018:ffffc90009f73b78 EFLAGS: 00010246
> > [ 4528.083499] RAX: 0000000000000000 RBX: ffff88871f1fd340 RCX: 0000000000000007
> > [ 4528.083608] RDX: ffff888826ed1600 RSI: ffffc90009f73b58 RDI: ffff88880da7e800
> > [ 4528.083717] RBP: ffff8887482a0150 R08: 0000000000000001 R09: 0000000000037100
> > [ 4528.083827] R10: 0000000000000000 R11: 0000000000000068 R12: ffff88880da7e800
> > [ 4528.083936] R13: ffff88871f1fd478 R14: ffff88871f1fd340 R15: 00000000000001fc
> > [ 4528.084051] FS:  00007f48fab9b2c0(0000) GS:ffff88882b040000(0000)
> > knlGS:0000000000000000
> > [ 4528.084194] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 4528.084298] CR2: 00007fae36386288 CR3: 0000000721d20000 CR4: 0000000000340ee0
> > ...
> >
> > The "blocknr" value of "16818504669106333287" caught my eye, as that
> > number seems way out of bounds for the origin/cache devices in this
> > setup (the cache LV size in this case is 256 MiB, and the origin
> > device is less than 500 GiB).
> >
> > At first I suspected cache device corruption, so I used the PV data
> > offset + PE segment sizes and offset to extract an image of the cache
> > LV into a separate file so I can examine it with the cache_* user land
> > tools, and this is what I see:
> > [root@fusion-dvt3a ~]# cache_dump /mnt/data/md122_C02_cvol.img
> > <superblock uuid="" block_size="128" nr_cache_blocks="3968"
> > policy="mq" hint_width="4">
> >   <mappings>
> >     <mapping cache_block="0" origin_block="7325368" dirty="false"/>
> >     <mapping cache_block="1" origin_block="7325369" dirty="false"/>
> >     <mapping cache_block="2" origin_block="7325370" dirty="false"/>
> >     <mapping cache_block="3" origin_block="7325371" dirty="false"/>
> >     <mapping cache_block="4" origin_block="7325372" dirty="false"/>
> >     <mapping cache_block="5" origin_block="7325373" dirty="false"/>
> >     <mapping cache_block="6" origin_block="7325374" dirty="false"/>
> >     <mapping cache_block="7" origin_block="7325375" dirty="false"/>
> >     <mapping cache_block="8" origin_block="0" dirty="false"/>
> >     <mapping cache_block="9" origin_block="7325351" dirty="false"/>
> >     <mapping cache_block="10" origin_block="1" dirty="false"/>
> >     <mapping cache_block="11" origin_block="2" dirty="false"/>
> >     <mapping cache_block="12" origin_block="4" dirty="false"/>
> >     <mapping cache_block="13" origin_block="8" dirty="false"/>
> >     <mapping cache_block="14" origin_block="16" dirty="false"/>
> >     <mapping cache_block="15" origin_block="32" dirty="false"/>
> >     <mapping cache_block="16" origin_block="64" dirty="false"/>
> >     <mapping cache_block="17" origin_block="3" dirty="false"/>
> >     <mapping cache_block="18" origin_block="5" dirty="false"/>
> >     <mapping cache_block="19" origin_block="6" dirty="false"/>
> >     <mapping cache_block="20" origin_block="7" dirty="false"/>
> >     <mapping cache_block="21" origin_block="9" dirty="false"/>
> >     <mapping cache_block="22" origin_block="10" dirty="false"/>
> >     <mapping cache_block="23" origin_block="11" dirty="false"/>
> >     <mapping cache_block="24" origin_block="12" dirty="false"/>
> >     <mapping cache_block="25" origin_block="13" dirty="false"/>
> >     <mapping cache_block="26" origin_block="14" dirty="false"/>
> >     <mapping cache_block="27" origin_block="15" dirty="false"/>
> >     <mapping cache_block="28" origin_block="7322946" dirty="false"/>
> >     <mapping cache_block="29" origin_block="7324340" dirty="false"/>
> >     <mapping cache_block="30" origin_block="7324341" dirty="false"/>
> >     <mapping cache_block="31" origin_block="7324342" dirty="false"/>
> >     <mapping cache_block="32" origin_block="7324343" dirty="false"/>
> >     <mapping cache_block="33" origin_block="7324344" dirty="false"/>
> >     <mapping cache_block="34" origin_block="7324345" dirty="false"/>
> >     <mapping cache_block="35" origin_block="7324346" dirty="false"/>
> >     <mapping cache_block="36" origin_block="7324347" dirty="false"/>
> >   </mappings>
> >   <hints>
> >     <hint cache_block="0" data="PwAAAA=="/>
> >     <hint cache_block="1" data="PwAAAA=="/>
> >     <hint cache_block="2" data="PwAAAA=="/>
> >     <hint cache_block="3" data="PwAAAA=="/>
> >     <hint cache_block="4" data="PwAAAA=="/>
> >     <hint cache_block="5" data="PwAAAA=="/>
> >     <hint cache_block="6" data="PwAAAA=="/>
> >     <hint cache_block="7" data="PwAAAA=="/>
> >     <hint cache_block="8" data="PwAAAA=="/>
> >     <hint cache_block="9" data="PwAAAA=="/>
> >     <hint cache_block="10" data="PwAAAA=="/>
> >     <hint cache_block="11" data="PwAAAA=="/>
> >     <hint cache_block="12" data="PwAAAA=="/>
> >     <hint cache_block="13" data="PwAAAA=="/>
> >     <hint cache_block="14" data="PwAAAA=="/>
> >     <hint cache_block="15" data="PwAAAA=="/>
> >     <hint cache_block="16" data="PwAAAA=="/>
> >     <hint cache_block="17" data="PwAAAA=="/>
> >     <hint cache_block="18" data="PwAAAA=="/>
> >     <hint cache_block="19" data="PwAAAA=="/>
> >     <hint cache_block="20" data="PwAAAA=="/>
> >     <hint cache_block="21" data="AAAAAA=="/>
> >     <hint cache_block="22" data="AAAAAA=="/>
> >     <hint cache_block="23" data="AAAAAA=="/>
> >     <hint cache_block="24" data="AAAAAA=="/>
> >     <hint cache_block="25" data="AAAAAA=="/>
> >     <hint cache_block="26" data="AAAAAA=="/>
> >     <hint cache_block="27" data="AAAAAA=="/>
> >     <hint cache_block="28" data="AAAAAA=="/>
> >     <hint cache_block="29" data="AAAAAA=="/>
> >     <hint cache_block="30" data="AAAAAA=="/>
> >     <hint cache_block="31" data="AAAAAA=="/>
> >     <hint cache_block="32" data="AAAAAA=="/>
> >     <hint cache_block="33" data="AAAAAA=="/>
> >     <hint cache_block="34" data="AAAAAA=="/>
> >     <hint cache_block="35" data="AAAAAA=="/>
> >     <hint cache_block="36" data="AAAAAA=="/>
> >   </hints>
> > </superblock>
> > [root@fusion-dvt3a ~]# cache_check /mnt/data/md122_C02_cvol.img
> > examining superblock
> > examining mapping array
> > examining hint array
> > examining discard bitset
> > [root@fusion-dvt3a ~]# echo $?
> > 0
> > [root@fusion-dvt3a ~]#
> >
> > Nothing seems out of the ordinary to me, and 'cache_check' doesn't
> > report any errors/problems with the metadata.
> >
> > So I then used a "debug" Linux Kernel (same 5.4.190 code) that
> > contains additional debugging features (full preemption, lock
> > debugging, etc.). It also has the
> > CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING feature enabled, which I suspect
> > might be related to the persistent-data stuff used by dm-cache. And
> > when I boot this kernel that has the additional debug features, I do
> > NOT hit the BUG_ON() in dm_bufio_release() and everything works
> > properly -- the VG/LV's activate just fine and are usable, no kernel
> > errors / call traces logged.
> >
> > I switched back and fourth between these two kernels several times,
> > just to make sure it wasn't a fluke. And indeed, I am unable to hit
> > this problem with the "debug" kernel.
> >
> > Based on the experiment between the two kernel configs and the user
> > land tools not reporting an error, it sounds like my cache metadata is
> > sound, but hitting some other issue that I don't understand yet.
> >
> > Wondering if anyone has any tips / guidance on additional debugging I
> > can do? I plan to circle back and take my original kernel config and
> > enable CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING to see if that has any
> > effect.
>
> I enabled these two additional kernel config options and re-tested:
> CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING=y
> CONFIG_DM_DEBUG_BLOCK_STACK_TRACING=y
>
> And then when activating the VG, the 'lvm' (vgchange) tool still seg
> faults, but the kernel log and call trace is a bit different with
> these options enabled:
> ...
> [ 7134.908886] device-mapper: cache: Origin device (dm-30) discard
> unsupported: Disabling discard passdown.
> [ 7134.909088] device-mapper: array: array_block_check failed: blocknr
> 16818504669106333287 != wanted 10
> [ 7134.909090] device-mapper: block manager: array validator check
> failed for block 10
> [ 7134.909090] device-mapper: array: get_ablock failed
> [ 7134.909091] device-mapper: cache metadata: dm_array_cursor_next for
> mapping failed
> [ 7134.909102] ------------[ cut here ]------------
> [ 7134.909102] kernel BUG at drivers/md/persistent-data/dm-block-manager.c:246!
> [ 7134.909107] invalid opcode: 0000 [#1] SMP NOPTI
> [ 7134.909133] CPU: 13 PID: 75361 Comm: vgchange Kdump: loaded
> Tainted: P           OE     5.4.190-esos.prod #1
> [ 7134.909175] Hardware name: Quantum H2012/H12SSW-NT, BIOS
> T20201009143356 10/09/2020
> [ 7134.909214] RIP: 0010:bl_up_read+0xf/0x43
> [ 7134.909233] Code: 48 83 f8 04 75 f0 e8 69 fc ff ff 89 c0 48 c7 44
> c2 18 00 00 00 00 e9 e1 fb ff ff 55 48 89 fd e8 ab ff 25 00 83 7d 04
> 00 7f 02 <0f> 0b 48 89 ef 65 48 8b 34 25 40 ad 01 00 e8 b1 ff ff ff 48
> 8b 45
> [ 7134.909311] RSP: 0018:ffffc90009f1bb68 EFLAGS: 00010246
> [ 7134.909335] RAX: 0000000000000000 RBX: ffff88878e2e6b40 RCX: 0000000000000007
> [ 7134.909366] RDX: 0000000000000001 RSI: ffffc90009f1bb54 RDI: ffff888748f49858
> [ 7134.909397] RBP: ffff888748f49858 R08: 0000000000000001 R09: 0000000000036d00
> [ 7134.909428] R10: 0000000000000000 R11: 0000000000000068 R12: ffff888748f49858
> [ 7134.909459] R13: ffff88878e2e6c78 R14: ffff88878e2e6b40 R15: 00000000000001fc
> [ 7134.909491] FS:  00007fdfa39b72c0(0000) GS:ffff88882b140000(0000)
> knlGS:0000000000000000
> [ 7134.909526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7134.909552] CR2: 0000559be7639038 CR3: 00000007a5a72000 CR4: 0000000000340ee0
> [ 7134.909583] Call Trace:
> [ 7134.909597]  dm_bm_unlock+0x2f/0x3a
> [ 7134.909642]  dm_array_cursor_end+0x1c/0x27
> [ 7134.909690]  dm_cache_load_mappings+0x2d0/0x322
> [ 7134.909739]  ? clear_discard+0x34/0x34
> [ 7134.909784]  cache_preresume+0x108/0x1e3
> [ 7134.909831]  dm_table_resume_targets+0x38/0xa9
> [ 7134.909880]  __dm_resume+0x33/0x45
> [ 7134.909923]  dm_resume+0x8c/0xb1
> [ 7134.909967]  dev_suspend+0x173/0x1d8
> [ 7134.910011]  ctl_ioctl+0x2a6/0x39d
> [ 7134.910055]  ? retrieve_status+0x19e/0x19e
> [ 7134.910103]  dm_ctl_ioctl+0x5/0x8
> [ 7134.910148]  vfs_ioctl+0x19/0x26
> [ 7134.910191]  do_vfs_ioctl+0x52a/0x554
> [ 7134.910237]  ? ksys_semctl.constprop.0+0xfe/0x15e
> [ 7134.910287]  ? rcu_is_watching+0x5/0x9
> [ 7134.910333]  ksys_ioctl+0x37/0x56
> [ 7134.910376]  __x64_sys_ioctl+0x11/0x14
> [ 7134.910423]  do_syscall_64+0x52/0x60
> [ 7134.910469]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 7134.910520] RIP: 0033:0x7fdfa3a7b617
> [ 7134.910564] Code: c0 78 ed 4c 89 e0 5a 5b 41 5c c3 48 8b 05 71 88
> 0b 00 64 c7 00 26 00 00 00 48 83 c8 ff c3 0f 1f 44 00 00 b8 10 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 88 0b 00 f7 d8 64 89
> 01 48
> [ 7134.910722] RSP: 002b:00007fffcfccb488 EFLAGS: 00000206 ORIG_RAX:
> 0000000000000010
> [ 7134.910810] RAX: ffffffffffffffda RBX: 0000559be5dae391 RCX: 00007fdfa3a7b617
> [ 7134.910868] RDX: 0000559be76589c0 RSI: 00000000c138fd06 RDI: 0000000000000003
> [ 7134.910925] RBP: 0000559be764c830 R08: 0000559be5ea2a5b R09: 00007fffcfccb2e8
> [ 7134.910983] R10: 0000559be5ea9b30 R11: 0000000000000206 R12: 0000559be76589f0
> [ 7134.911041] R13: 0000559be76589c0 R14: 0000559be7658a70 R15: 0000000000000001
> [ 7134.911100] Modules linked in: dm_cache_smq pci_stub nvmet_rdma(O)
> bcache celerity16fc_cfg(O) nvmet_fc(O) nvmet(O) qla2xxx(O) nvme_fc(O)
> nvme_fabrics(O) bonding ntb_transport ntb_hw_switchtec(O)
> cls_switchtec(O) qede qed bna rdmavt(O) ib_umad(O) rdma_ucm(O)
> ib_srp(O) rdma_cm(O) iw_cm(O) ib_cm(O) iw_cxgb4(O) iw_cxgb3(O)
> ib_qib(O) mlx5_ib(O) mlx5_core(O) mlxdevm(O) pci_hyperv_intf mlxfw(O)
> ib_uverbs(O) ib_core(O) auxiliary(O) mlx4_ib(O) ib_mthca(O)
> bnxt_en(OE) nvme(O) nvme_core(O) mlx_compat(O) [last unloaded: scst]
> [ 7134.911937] ---[ end trace bf6823c5ee51ae83 ]---
> [ 7134.917732] RIP: 0010:bl_up_read+0xf/0x43
> [ 7134.917831] Code: 48 83 f8 04 75 f0 e8 69 fc ff ff 89 c0 48 c7 44
> c2 18 00 00 00 00 e9 e1 fb ff ff 55 48 89 fd e8 ab ff 25 00 83 7d 04
> 00 7f 02 <0f> 0b 48 89 ef 65 48 8b 34 25 40 ad 01 00 e8 b1 ff ff ff 48
> 8b 45
> [ 7134.918042] RSP: 0018:ffffc90009f1bb68 EFLAGS: 00010246
> [ 7134.918145] RAX: 0000000000000000 RBX: ffff88878e2e6b40 RCX: 0000000000000007
> [ 7134.918255] RDX: 0000000000000001 RSI: ffffc90009f1bb54 RDI: ffff888748f49858
> [ 7134.918365] RBP: ffff888748f49858 R08: 0000000000000001 R09: 0000000000036d00
> [ 7134.918475] R10: 0000000000000000 R11: 0000000000000068 R12: ffff888748f49858
> [ 7134.918585] R13: ffff88878e2e6c78 R14: ffff88878e2e6b40 R15: 00000000000001fc
> [ 7134.918699] FS:  00007fdfa39b72c0(0000) GS:ffff88882b140000(0000)
> knlGS:0000000000000000
> [ 7134.918843] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7134.918950] CR2: 0000559be7639038 CR3: 00000007a5a72000 CR4: 0000000000340ee0
> ...
>
>

Okay, I played a bit with print_hex_dump() to examine the
dm_buffer->data regions while this occurs, and I can see quite a few
blocks read do not contain the correct value (block "10" in my case
above is just the first that fails the array_block_check() validator).
I believe hitting the BUG_ON() statements is simply a consequence of
the validator check failing, but I still need to sort that out.

At least for this scenario, even when examining the buffer contents in
bio_complete() the data is wrong (verified incorrect by reading real
data from the underlying block device using 'dd' and 'od' to compare).
The incorrect data doesn't seem to match any other pattern on the
block device so I assume it's just junk.

And I suspect using my debug kernel config makes it work simply due to
timing and nothing else. I have many more machines using dm-cache, and
this is the only one where I've encountered this issue. And oddly
enough, I can reproduce it _every_ time on this box, so for a possible
timing/race condition, that feels a bit off.

I've found that the corrupted dm_buffer->data values occur when using
dm_bufio_prefetch() (invoked via dm_bm_prefetch() in
dm-block-manager.c). If I comment out the dm_bufio_prefetch() call in
dm_bm_prefetch() (no pre-fetching of dm-cache metadata) then I am
unable to reproduce the problem!

The root cause is still not clear to me, and unfortunately I'm losing
access to the hardware where this issue occurs very soon. I looked
through DM-related commits in linux-next and nothing really sounds
applicable (at least to me).


Thanks,

Marc

> --Marc
>
>
> >
> >
> > Thanks for your time.
> >
> > --Marc

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-05-12 15:48 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-02 18:03 [dm-devel] kernel BUG at drivers/md/dm-bufio.c:1155! Marc Smith
2022-05-04 17:50 ` Marc Smith
2022-05-12 15:48   ` Marc Smith

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.