* blktests failures with v5.19-rc1
@ 2022-06-09 23:53 Shinichiro Kawasaki
  2022-06-10  8:07 ` Christoph Hellwig
  2022-06-10  9:22 ` Chaitanya Kulkarni
  0 siblings, 2 replies; 23+ messages in thread
From: Shinichiro Kawasaki @ 2022-06-09 23:53 UTC (permalink / raw)
  To: linux-block, linux-nvme, linux-scsi

Hi all,

I ran the latest blktests (git hash: c17a19d) with the v5.19-rc1 kernel and
observed 6 test cases failing. I would like to call for help to fix them. Please
refer to the list of failure cases and descriptions below.

- I don't know the details of these failures, and I'm not sure whether they are
  kernel bugs. Some of them might be test script bugs or test system set-up
  issues.
- All of the test cases except scsi/007 also failed with the v5.18 kernel. The
  scsi/007 failure could be a new issue introduced with v5.19-rc1.
- I used Fedora 36 on QEMU for the test runs. Related drivers were built as
  modules.

If further information is required for fixes, please let me know.
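
For reference, the failing cases can be re-run individually with plain ./check
invocations. The snippet below only illustrates the general shape; the
TEST_DEVS path and nvme_trtype value are examples, not my exact configuration:

    $ cat config
    TEST_DEVS=(/dev/nvme0n1)    # PCI NVMe namespace, only needed here by nvme/032
    nvme_trtype=loop            # nvme/002, 016 and 017 run over the loop transport
    $ ./check block/002 scsi/007 nvme/002 nvme/016 nvme/017 nvme/032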


List of failure cases
=====================
#1: block/002
#2: scsi/007
#3: nvme/002
#4: nvme/016
#5: nvme/017
#6: nvme/032

Failure description
===================
#1: block/002: Failed 3 times with 3 trials.
    Known issue. It is suggested to remove this test case [1].
    [1] https://lore.kernel.org/linux-block/20220530134548.3108185-3-hch@lst.de/

    runtime    ...  1.250s
    --- tests/block/002.out     2022-06-02 10:18:53.526739780 +0900
    +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/block/002.out.bad   2022-06-09 09:44:02.779122029 +0900
    @@ -1,2 +1,3 @@
     Running block/002
    +debugfs directory leaked
     Test complete

#2: scsi/007: Failed 3 times with 3 trials.

    runtime  29.867s  ...  40.688s
    --- tests/scsi/007.out      2022-06-02 10:18:53.550739780 +0900
    +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/scsi/007.out.bad    2022-06-09 09:18:38.644237681 +0900
    @@ -1,3 +1,3 @@
     Running scsi/007
    -Reading from scsi_debug failed
    +Reading from scsi_debug succeeded
     Test complete

#3: nvme/002: Failed 3 times with 3 trials.

    runtime    ...  85.697s
    --- tests/nvme/002.out      2022-06-02 10:18:53.538739780 +0900
    +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/nvme/002.out.bad    2022-06-09 09:51:45.714027241 +0900
    @@ -1,3003 +1,3006 @@
     Running nvme/002
    -Discovery Log Number of Records 1000, Generation counter X
    +Discovery Log Number of Records 1001, Generation counter X
     =====Discovery Log Entry 0======
     trtype:  loop
    -subnqn:  blktests-subsystem-0
    +subnqn:  nqn.2014-08.org.nvmexpress.discovery
    ...

#4: nvme/016: Failed 3 times with 3 trials.

    runtime  36.240s  ...  35.984s
    --- tests/nvme/016.out      2022-06-02 10:18:53.541739780 +0900
    +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/nvme/016.out.bad    2022-06-09 10:06:11.797004672 +0900
    @@ -1,6 +1,9 @@
     Running nvme/016
    -Discovery Log Number of Records 1, Generation counter X
    +Discovery Log Number of Records 2, Generation counter X
     =====Discovery Log Entry 0======
     trtype:  loop
    +subnqn:  nqn.2014-08.org.nvmexpress.discovery
    +=====Discovery Log Entry 1======
    ...

#5: nvme/017: Failed 3 times with 3 trials.

    runtime  43.724s  ...  42.912s
    --- tests/nvme/017.out      2022-06-02 10:18:53.541739780 +0900
    +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/nvme/017.out.bad    2022-06-09 10:09:18.016394996 +0900
    @@ -1,6 +1,9 @@
     Running nvme/017
    -Discovery Log Number of Records 1, Generation counter X
    +Discovery Log Number of Records 2, Generation counter X
     =====Discovery Log Entry 0======
     trtype:  loop
    +subnqn:  nqn.2014-08.org.nvmexpress.discovery
    +=====Discovery Log Entry 1======
    ...

#6: nvme/032: Failed at the first run after system reboot.
              Used QEMU NVME device as TEST_DEV.

    runtime    ...  8.458s
    something found in dmesg:
    [ 1589.976481] run blktests nvme/032 at 2022-06-09 10:57:20

    [ 1597.221547] ======================================================
    [ 1597.221549] WARNING: possible circular locking dependency detected
    [ 1597.221551] 5.19.0-rc1 #1 Not tainted
    [ 1597.221554] ------------------------------------------------------
    [ 1597.221554] check/970 is trying to acquire lock:
    [ 1597.221556] ffff8881026f8cb8 (kn->active#227){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
    [ 1597.221580]
                   but task is already holding lock:
    ...

dmesg output:
[ 1589.976481] run blktests nvme/032 at 2022-06-09 10:57:20

[ 1597.221547] ======================================================
[ 1597.221549] WARNING: possible circular locking dependency detected
[ 1597.221551] 5.19.0-rc1 #1 Not tainted
[ 1597.221554] ------------------------------------------------------
[ 1597.221554] check/970 is trying to acquire lock:
[ 1597.221556] ffff8881026f8cb8 (kn->active#227){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
[ 1597.221580] 
               but task is already holding lock:
[ 1597.221580] ffffffff889c0668 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[ 1597.221590] 
               which lock already depends on the new lock.

[ 1597.221591] 
               the existing dependency chain (in reverse order) is:
[ 1597.221592] 
               -> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
[ 1597.221598]        __mutex_lock+0x14c/0x12b0
[ 1597.221604]        dev_rescan_store+0x94/0xd0
[ 1597.221607]        kernfs_fop_write_iter+0x34f/0x520
[ 1597.221610]        new_sync_write+0x2ca/0x500
[ 1597.221614]        vfs_write+0x62d/0x980
[ 1597.221616]        ksys_write+0xe7/0x1b0
[ 1597.221619]        do_syscall_64+0x37/0x80
[ 1597.221622]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1597.221625] 
               -> #0 (kn->active#227){++++}-{0:0}:
[ 1597.221629]        __lock_acquire+0x2875/0x5510
[ 1597.221634]        lock_acquire+0x194/0x4f0
[ 1597.221636]        __kernfs_remove+0x6f3/0x910
[ 1597.221638]        kernfs_remove_by_name_ns+0x90/0xe0
[ 1597.221641]        remove_files+0x8c/0x1a0
[ 1597.221644]        sysfs_remove_group+0x77/0x150
[ 1597.221646]        sysfs_remove_groups+0x4f/0x90
[ 1597.221649]        device_remove_attrs+0x19e/0x240
[ 1597.221652]        device_del+0x492/0xb60
[ 1597.221654]        pci_remove_bus_device+0x12c/0x350
[ 1597.221656]        pci_stop_and_remove_bus_device_locked+0x1e/0x30
[ 1597.221659]        remove_store+0xac/0xc0
[ 1597.221662]        kernfs_fop_write_iter+0x34f/0x520
[ 1597.221664]        new_sync_write+0x2ca/0x500
[ 1597.221666]        vfs_write+0x62d/0x980
[ 1597.221669]        ksys_write+0xe7/0x1b0
[ 1597.221671]        do_syscall_64+0x37/0x80
[ 1597.221673]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1597.221675] 
               other info that might help us debug this:

[ 1597.221676]  Possible unsafe locking scenario:

[ 1597.221677]        CPU0                    CPU1
[ 1597.221678]        ----                    ----
[ 1597.221678]   lock(pci_rescan_remove_lock);
[ 1597.221680]                                lock(kn->active#227);
[ 1597.221683]                                lock(pci_rescan_remove_lock);
[ 1597.221685]   lock(kn->active#227);
[ 1597.221687] 
                *** DEADLOCK ***

[ 1597.221688] 3 locks held by check/970:
[ 1597.221689]  #0: ffff888110106460 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xe7/0x1b0
[ 1597.221697]  #1: ffff888120f69088 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x520
[ 1597.221703]  #2: ffffffff889c0668 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[ 1597.221709] 
               stack backtrace:
[ 1597.221714] CPU: 1 PID: 970 Comm: check Not tainted 5.19.0-rc1 #1
[ 1597.221717] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 1597.221723] Call Trace:
[ 1597.221728]  <TASK>
[ 1597.221732]  dump_stack_lvl+0x5b/0x74
[ 1597.221742]  check_noncircular+0x23f/0x2e0
[ 1597.221745]  ? lock_chain_count+0x20/0x20
[ 1597.221748]  ? print_circular_bug+0x1e0/0x1e0
[ 1597.221751]  ? mark_lock+0xee/0x1610
[ 1597.221754]  ? mark_lock+0xee/0x1610
[ 1597.221759]  ? lockdep_lock+0xb8/0x1a0
[ 1597.221762]  ? call_rcu_zapped+0xb0/0xb0
[ 1597.221766]  __lock_acquire+0x2875/0x5510
[ 1597.221773]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 1597.221779]  lock_acquire+0x194/0x4f0
[ 1597.221782]  ? kernfs_remove_by_name_ns+0x90/0xe0
[ 1597.221786]  ? lock_downgrade+0x6b0/0x6b0
[ 1597.221791]  ? up_write+0x14d/0x460
[ 1597.221795]  __kernfs_remove+0x6f3/0x910
[ 1597.221798]  ? kernfs_remove_by_name_ns+0x90/0xe0
[ 1597.221803]  ? kernfs_next_descendant_post+0x280/0x280
[ 1597.221807]  ? lock_is_held_type+0xe3/0x140
[ 1597.221811]  ? kernfs_name_hash+0x16/0xc0
[ 1597.221815]  ? kernfs_find_ns+0x1e3/0x330
[ 1597.221819]  kernfs_remove_by_name_ns+0x90/0xe0
[ 1597.221822]  remove_files+0x8c/0x1a0
[ 1597.221826]  sysfs_remove_group+0x77/0x150
[ 1597.221831]  sysfs_remove_groups+0x4f/0x90
[ 1597.221835]  device_remove_attrs+0x19e/0x240
[ 1597.221838]  ? device_remove_file+0x20/0x20
[ 1597.221842]  device_del+0x492/0xb60
[ 1597.221846]  ? __device_link_del+0x350/0x350
[ 1597.221848]  ? kfree+0xc5/0x340
[ 1597.221856]  pci_remove_bus_device+0x12c/0x350
[ 1597.221860]  pci_stop_and_remove_bus_device_locked+0x1e/0x30
[ 1597.221863]  remove_store+0xac/0xc0
[ 1597.221867]  ? subordinate_bus_number_show+0xa0/0xa0
[ 1597.221870]  ? sysfs_kf_write+0x3d/0x170
[ 1597.221874]  kernfs_fop_write_iter+0x34f/0x520
[ 1597.221881]  new_sync_write+0x2ca/0x500
[ 1597.221885]  ? new_sync_read+0x500/0x500
[ 1597.221888]  ? perf_callchain_user+0x7c0/0xaa0
[ 1597.221893]  ? lock_downgrade+0x6b0/0x6b0
[ 1597.221896]  ? inode_security+0x54/0xf0
[ 1597.221903]  ? lock_is_held_type+0xe3/0x140
[ 1597.221909]  vfs_write+0x62d/0x980
[ 1597.221913]  ksys_write+0xe7/0x1b0
[ 1597.221916]  ? __ia32_sys_read+0xa0/0xa0
[ 1597.221919]  ? syscall_enter_from_user_mode+0x20/0x70
[ 1597.221925]  do_syscall_64+0x37/0x80
[ 1597.221928]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1597.221931] RIP: 0033:0x7f5608901c17
[ 1597.221939] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 1597.221942] RSP: 002b:00007fff89480c08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1597.221945] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5608901c17
[ 1597.221948] RDX: 0000000000000002 RSI: 000055572288f080 RDI: 0000000000000001
[ 1597.221949] RBP: 000055572288f080 R08: 0000000000000000 R09: 0000000000000073
[ 1597.221951] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[ 1597.221953] R13: 00007f56089f8780 R14: 0000000000000002 R15: 00007f56089f39e0
[ 1597.221960]  </TASK>
[ 1597.227490] pci 0000:00:09.0: [1b36:0010] type 00 class 0x010802
[ 1597.230489] pci 0000:00:09.0: reg 0x10: [mem 0xfebc4000-0xfebc7fff 64bit]
[ 1597.278136] pci 0000:00:09.0: BAR 0: assigned [mem 0x640000000-0x640003fff 64bit]
[ 1597.283549] nvme nvme5: pci function 0000:00:09.0
[ 1598.372141] nvme nvme5: 10/0/0 default/read/poll queues
[ 1598.375349] nvme nvme5: Ignoring bogus Namespace Identifiers
[ 1618.928153] run blktests nvme/032 at 2022-06-09 10:57:49
[ 1625.557584] pci 0000:00:09.0: [1b36:0010] type 00 class 0x010802
[ 1625.558348] pci 0000:00:09.0: reg 0x10: [mem 0x640000000-0x640003fff 64bit]
[ 1625.577856] pci 0000:00:09.0: BAR 0: assigned [mem 0x640000000-0x640003fff 64bit]
[ 1625.581168] nvme nvme5: pci function 0000:00:09.0
[ 1626.695897] nvme nvme5: 10/0/0 default/read/poll queues
[ 1626.701014] nvme nvme5: Ignoring bogus Namespace Identifiers

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-09 23:53 blktests failures with v5.19-rc1 Shinichiro Kawasaki
@ 2022-06-10  8:07 ` Christoph Hellwig
  2022-06-10  9:22 ` Chaitanya Kulkarni
  1 sibling, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-06-10  8:07 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: linux-block, linux-nvme, linux-scsi

On Thu, Jun 09, 2022 at 11:53:30PM +0000, Shinichiro Kawasaki wrote:
> Failure description
> ===================
> #1: block/002: Failed 3 times with 3 trials.
>     Known issue. It is suggested to remove this test case [1].
>     [1] https://lore.kernel.org/linux-block/20220530134548.3108185-3-hch@lst.de/

It's a little more complicated than that.  The patch mentioned there
isn't actually in 5.19-rc yet.  But even the current code doesn't remove
the debugfs directory synchronously, so it could happen (although I don't
see it).

I'll send a patch to remove this particular check; the test case itself
can mostly remain.
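
Roughly, what I have in mind is dropping a check of this shape from
tests/block/002 (just a sketch of the idea, not the actual patch; the
exact paths and variable names in blktests differ):

    # check that would go away: complain if the device's debugfs
    # directory is still present right after removal
    if [[ -d "/sys/kernel/debug/block/$dev" ]]; then
            echo "debugfs directory leaked"
    fi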

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-09 23:53 blktests failures with v5.19-rc1 Shinichiro Kawasaki
  2022-06-10  8:07 ` Christoph Hellwig
@ 2022-06-10  9:22 ` Chaitanya Kulkarni
  2022-06-10  9:32   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 23+ messages in thread
From: Chaitanya Kulkarni @ 2022-06-10  9:22 UTC (permalink / raw)
  To: Shinichiro Kawasaki, linux-block, linux-nvme, linux-scsi

Thanks a lot for running tests...

> #3: nvme/002: Failed 3 times with 3 trials.
> 
>      runtime    ...  85.697s
>      --- tests/nvme/002.out      2022-06-02 10:18:53.538739780 +0900
>      +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/nvme/002.out.bad    2022-06-09 09:51:45.714027241 +0900
>      @@ -1,3003 +1,3006 @@
>       Running nvme/002
>      -Discovery Log Number of Records 1000, Generation counter X
>      +Discovery Log Number of Records 1001, Generation counter X
>       =====Discovery Log Entry 0======
>       trtype:  loop
>      -subnqn:  blktests-subsystem-0
>      +subnqn:  nqn.2014-08.org.nvmexpress.discovery
>      ...

This needs a fix in the test case; I'll send it shortly.
I believe a similar fix has already been posted, but I'm not sure.
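
One possible direction for the test side (only a sketch, and the helper
name below is hypothetical, not an existing blktests function) is to
normalize the discovery output so the extra well-known discovery
subsystem entry does not change the golden output:

    _filter_discovery_output() {
            # hide the well-known discovery subsystem entry and make the
            # output independent of record count and entry numbering
            sed -e 's/Number of Records [0-9]*/Number of Records X/' \
                -e 's/Generation counter [0-9]*/Generation counter X/' |
                    grep -v -e 'nqn\.2014-08\.org\.nvmexpress\.discovery' \
                            -e '=====Discovery Log Entry'
    }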

> 
> #4: nvme/016: Failed 3 times with 3 trials.
> 
>      runtime  36.240s  ...  35.984s
>      --- tests/nvme/016.out      2022-06-02 10:18:53.541739780 +0900
>      +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/nvme/016.out.bad    2022-06-09 10:06:11.797004672 +0900
>      @@ -1,6 +1,9 @@
>       Running nvme/016
>      -Discovery Log Number of Records 1, Generation counter X
>      +Discovery Log Number of Records 2, Generation counter X
>       =====Discovery Log Entry 0======
>       trtype:  loop
>      +subnqn:  nqn.2014-08.org.nvmexpress.discovery
>      +=====Discovery Log Entry 1======
>      ...
> 

same as above...

> #5: nvme/017: Failed 3 times with 3 trials.
> 
>      runtime  43.724s  ...  42.912s
>      --- tests/nvme/017.out      2022-06-02 10:18:53.541739780 +0900
>      +++ /home/shin/kts/kernel-test-suite/src/blktests/results/nodev/nvme/017.out.bad    2022-06-09 10:09:18.016394996 +0900
>      @@ -1,6 +1,9 @@
>       Running nvme/017
>      -Discovery Log Number of Records 1, Generation counter X
>      +Discovery Log Number of Records 2, Generation counter X
>       =====Discovery Log Entry 0======
>       trtype:  loop
>      +subnqn:  nqn.2014-08.org.nvmexpress.discovery
>      +=====Discovery Log Entry 1======
>      ...
> 

same as above ...

> #6: nvme/032: Failed at the first run after system reboot.
>                Used QEMU NVME device as TEST_DEV.
> 
>      runtime    ...  8.458s
>      something found in dmesg:
>      [ 1589.976481] run blktests nvme/032 at 2022-06-09 10:57:20
> 
>      [ 1597.221547] ======================================================
>      [ 1597.221549] WARNING: possible circular locking dependency detected
>      [ 1597.221551] 5.19.0-rc1 #1 Not tainted
>      [ 1597.221554] ------------------------------------------------------
>      [ 1597.221554] check/970 is trying to acquire lock:
>      [ 1597.221556] ffff8881026f8cb8 (kn->active#227){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
>      [ 1597.221580]
>                     but task is already holding lock:
>      ...
> 
> dmesg output:

Now this is a real problem we need to look into.

> [ 1589.976481] run blktests nvme/032 at 2022-06-09 10:57:20
> 
> [ 1597.221547] ======================================================
> [ 1597.221549] WARNING: possible circular locking dependency detected
> [ 1597.221551] 5.19.0-rc1 #1 Not tainted
> [ 1597.221554] ------------------------------------------------------
> [ 1597.221554] check/970 is trying to acquire lock:
> [ 1597.221556] ffff8881026f8cb8 (kn->active#227){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221580]
>                 but task is already holding lock:
> [ 1597.221580] ffffffff889c0668 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
> [ 1597.221590]
>                 which lock already depends on the new lock.
> 
> [ 1597.221591]
>                 the existing dependency chain (in reverse order) is:
> [ 1597.221592]
>                 -> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
> [ 1597.221598]        __mutex_lock+0x14c/0x12b0
> [ 1597.221604]        dev_rescan_store+0x94/0xd0
> [ 1597.221607]        kernfs_fop_write_iter+0x34f/0x520
> [ 1597.221610]        new_sync_write+0x2ca/0x500
> [ 1597.221614]        vfs_write+0x62d/0x980
> [ 1597.221616]        ksys_write+0xe7/0x1b0
> [ 1597.221619]        do_syscall_64+0x37/0x80
> [ 1597.221622]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1597.221625]
>                 -> #0 (kn->active#227){++++}-{0:0}:
> [ 1597.221629]        __lock_acquire+0x2875/0x5510
> [ 1597.221634]        lock_acquire+0x194/0x4f0
> [ 1597.221636]        __kernfs_remove+0x6f3/0x910
> [ 1597.221638]        kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221641]        remove_files+0x8c/0x1a0
> [ 1597.221644]        sysfs_remove_group+0x77/0x150
> [ 1597.221646]        sysfs_remove_groups+0x4f/0x90
> [ 1597.221649]        device_remove_attrs+0x19e/0x240
> [ 1597.221652]        device_del+0x492/0xb60
> [ 1597.221654]        pci_remove_bus_device+0x12c/0x350
> [ 1597.221656]        pci_stop_and_remove_bus_device_locked+0x1e/0x30
> [ 1597.221659]        remove_store+0xac/0xc0
> [ 1597.221662]        kernfs_fop_write_iter+0x34f/0x520
> [ 1597.221664]        new_sync_write+0x2ca/0x500
> [ 1597.221666]        vfs_write+0x62d/0x980
> [ 1597.221669]        ksys_write+0xe7/0x1b0
> [ 1597.221671]        do_syscall_64+0x37/0x80
> [ 1597.221673]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1597.221675]
>                 other info that might help us debug this:
> 
> [ 1597.221676]  Possible unsafe locking scenario:
> 
> [ 1597.221677]        CPU0                    CPU1
> [ 1597.221678]        ----                    ----
> [ 1597.221678]   lock(pci_rescan_remove_lock);
> [ 1597.221680]                                lock(kn->active#227);
> [ 1597.221683]                                lock(pci_rescan_remove_lock);
> [ 1597.221685]   lock(kn->active#227);
> [ 1597.221687]
>                  *** DEADLOCK ***
> 
> [ 1597.221688] 3 locks held by check/970:
> [ 1597.221689]  #0: ffff888110106460 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xe7/0x1b0
> [ 1597.221697]  #1: ffff888120f69088 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x520
> [ 1597.221703]  #2: ffffffff889c0668 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
> [ 1597.221709]
>                 stack backtrace:
> [ 1597.221714] CPU: 1 PID: 970 Comm: check Not tainted 5.19.0-rc1 #1
> [ 1597.221717] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [ 1597.221723] Call Trace:
> [ 1597.221728]  <TASK>
> [ 1597.221732]  dump_stack_lvl+0x5b/0x74
> [ 1597.221742]  check_noncircular+0x23f/0x2e0
> [ 1597.221745]  ? lock_chain_count+0x20/0x20
> [ 1597.221748]  ? print_circular_bug+0x1e0/0x1e0
> [ 1597.221751]  ? mark_lock+0xee/0x1610
> [ 1597.221754]  ? mark_lock+0xee/0x1610
> [ 1597.221759]  ? lockdep_lock+0xb8/0x1a0
> [ 1597.221762]  ? call_rcu_zapped+0xb0/0xb0
> [ 1597.221766]  __lock_acquire+0x2875/0x5510
> [ 1597.221773]  ? lockdep_hardirqs_on_prepare+0x410/0x410
> [ 1597.221779]  lock_acquire+0x194/0x4f0
> [ 1597.221782]  ? kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221786]  ? lock_downgrade+0x6b0/0x6b0
> [ 1597.221791]  ? up_write+0x14d/0x460
> [ 1597.221795]  __kernfs_remove+0x6f3/0x910
> [ 1597.221798]  ? kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221803]  ? kernfs_next_descendant_post+0x280/0x280
> [ 1597.221807]  ? lock_is_held_type+0xe3/0x140
> [ 1597.221811]  ? kernfs_name_hash+0x16/0xc0
> [ 1597.221815]  ? kernfs_find_ns+0x1e3/0x330
> [ 1597.221819]  kernfs_remove_by_name_ns+0x90/0xe0
> [ 1597.221822]  remove_files+0x8c/0x1a0
> [ 1597.221826]  sysfs_remove_group+0x77/0x150
> [ 1597.221831]  sysfs_remove_groups+0x4f/0x90
> [ 1597.221835]  device_remove_attrs+0x19e/0x240
> [ 1597.221838]  ? device_remove_file+0x20/0x20
> [ 1597.221842]  device_del+0x492/0xb60
> [ 1597.221846]  ? __device_link_del+0x350/0x350
> [ 1597.221848]  ? kfree+0xc5/0x340
> [ 1597.221856]  pci_remove_bus_device+0x12c/0x350
> [ 1597.221860]  pci_stop_and_remove_bus_device_locked+0x1e/0x30
> [ 1597.221863]  remove_store+0xac/0xc0
> [ 1597.221867]  ? subordinate_bus_number_show+0xa0/0xa0
> [ 1597.221870]  ? sysfs_kf_write+0x3d/0x170
> [ 1597.221874]  kernfs_fop_write_iter+0x34f/0x520
> [ 1597.221881]  new_sync_write+0x2ca/0x500
> [ 1597.221885]  ? new_sync_read+0x500/0x500
> [ 1597.221888]  ? perf_callchain_user+0x7c0/0xaa0
> [ 1597.221893]  ? lock_downgrade+0x6b0/0x6b0
> [ 1597.221896]  ? inode_security+0x54/0xf0
> [ 1597.221903]  ? lock_is_held_type+0xe3/0x140
> [ 1597.221909]  vfs_write+0x62d/0x980
> [ 1597.221913]  ksys_write+0xe7/0x1b0
> [ 1597.221916]  ? __ia32_sys_read+0xa0/0xa0
> [ 1597.221919]  ? syscall_enter_from_user_mode+0x20/0x70
> [ 1597.221925]  do_syscall_64+0x37/0x80
> [ 1597.221928]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1597.221931] RIP: 0033:0x7f5608901c17
> [ 1597.221939] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [ 1597.221942] RSP: 002b:00007fff89480c08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 1597.221945] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5608901c17
> [ 1597.221948] RDX: 0000000000000002 RSI: 000055572288f080 RDI: 0000000000000001
> [ 1597.221949] RBP: 000055572288f080 R08: 0000000000000000 R09: 0000000000000073
> [ 1597.221951] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
> [ 1597.221953] R13: 00007f56089f8780 R14: 0000000000000002 R15: 00007f56089f39e0
> [ 1597.221960]  </TASK>
> [ 1597.227490] pci 0000:00:09.0: [1b36:0010] type 00 class 0x010802
> [ 1597.230489] pci 0000:00:09.0: reg 0x10: [mem 0xfebc4000-0xfebc7fff 64bit]
> [ 1597.278136] pci 0000:00:09.0: BAR 0: assigned [mem 0x640000000-0x640003fff 64bit]
> [ 1597.283549] nvme nvme5: pci function 0000:00:09.0
> [ 1598.372141] nvme nvme5: 10/0/0 default/read/poll queues
> [ 1598.375349] nvme nvme5: Ignoring bogus Namespace Identifiers
> [ 1618.928153] run blktests nvme/032 at 2022-06-09 10:57:49
> [ 1625.557584] pci 0000:00:09.0: [1b36:0010] type 00 class 0x010802
> [ 1625.558348] pci 0000:00:09.0: reg 0x10: [mem 0x640000000-0x640003fff 64bit]
> [ 1625.577856] pci 0000:00:09.0: BAR 0: assigned [mem 0x640000000-0x640003fff 64bit]
> [ 1625.581168] nvme nvme5: pci function 0000:00:09.0
> [ 1626.695897] nvme nvme5: 10/0/0 default/read/poll queues
> [ 1626.701014] nvme nvme5: Ignoring bogus Namespace Identifiers
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-10  9:22 ` Chaitanya Kulkarni
@ 2022-06-10  9:32   ` Chaitanya Kulkarni
  2022-06-10 12:25     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Chaitanya Kulkarni @ 2022-06-10  9:32 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: linux-block, linux-nvme, linux-scsi

Shinichiro,

> 
>> #6: nvme/032: Failed at the first run after system reboot.
>>                 Used QEMU NVME device as TEST_DEV.
>>

Of course we need to fix this issue, but can you also
try it with real H/W?

>>       runtime    ...  8.458s
>>       something found in dmesg:
>>       [ 1589.976481] run blktests nvme/032 at 2022-06-09 10:57:20
>>
>>       [ 1597.221547] ======================================================
>>       [ 1597.221549] WARNING: possible circular locking dependency detected
>>       [ 1597.221551] 5.19.0-rc1 #1 Not tainted
>>       [ 1597.221554] ------------------------------------------------------
>>       [ 1597.221554] check/970 is trying to acquire lock:
>>       [ 1597.221556] ffff8881026f8cb8 (kn->active#227){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
>>       [ 1597.221580]
>>                      but task is already holding lock:
>>       ...
>>


-ck



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-10  9:32   ` Chaitanya Kulkarni
@ 2022-06-10 12:25     ` Shinichiro Kawasaki
  2022-06-10 13:15       ` Yi Zhang
  2022-06-10 14:47       ` Keith Busch
  0 siblings, 2 replies; 23+ messages in thread
From: Shinichiro Kawasaki @ 2022-06-10 12:25 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, linux-scsi

On Jun 10, 2022 / 09:32, Chaitanya Kulkarni wrote:
> Shinichiro,
> 
> > 
> >> #6: nvme/032: Failed at the first run after system reboot.
> >>                 Used QEMU NVME device as TEST_DEV.
> >>
> 
> ofcourse we need to fix this issue but can you also
> try it with the real H/W ?

Hi Chaitanya, thank you for looking into the failures. I have just run the test
case nvme/032 with a real NVMe device and observed exactly the same symptom as
with the QEMU NVMe device.
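
For reference, I only had to point TEST_DEVS at the real controller's namespace
in the blktests config, roughly like this (the device path is specific to my box):

    $ cat config
    TEST_DEVS=(/dev/nvme0n1)
    $ ./check nvme/032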

FYI, here I quote the kernel WARNING observed with the real HW.

[  170.567809] run blktests nvme/032 at 2022-06-10 21:19:03
[  175.771062] nvme nvme0: 8/0/0 default/read/poll queues

[  176.017635] ======================================================
[  176.017637] WARNING: possible circular locking dependency detected
[  176.017639] 5.19.0-rc1+ #1 Not tainted
[  176.017643] ------------------------------------------------------
[  176.017645] check/1147 is trying to acquire lock:
[  176.017651] ffff888107010cb8 (kn->active#256){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
[  176.017683] 
               but task is already holding lock:
[  176.017685] ffffffff859bbcc8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[  176.017701] 
               which lock already depends on the new lock.

[  176.017702] 
               the existing dependency chain (in reverse order) is:
[  176.017704] 
               -> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
[  176.017712]        __mutex_lock+0x14c/0x12b0
[  176.017720]        dev_rescan_store+0x93/0xd0
[  176.017727]        kernfs_fop_write_iter+0x351/0x520
[  176.017732]        new_sync_write+0x2cd/0x500
[  176.017737]        vfs_write+0x62c/0x980
[  176.017742]        ksys_write+0xe7/0x1a0
[  176.017746]        do_syscall_64+0x3a/0x80
[  176.017755]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  176.017762] 
               -> #0 (kn->active#256){++++}-{0:0}:
[  176.017770]        __lock_acquire+0x2874/0x5510
[  176.017778]        lock_acquire+0x194/0x4f0
[  176.017782]        __kernfs_remove+0x6f2/0x910
[  176.017786]        kernfs_remove_by_name_ns+0x90/0xe0
[  176.017791]        remove_files+0x8c/0x1a0
[  176.017797]        sysfs_remove_group+0x77/0x150
[  176.017802]        sysfs_remove_groups+0x4f/0x90
[  176.017807]        device_remove_attrs+0x19e/0x240
[  176.017812]        device_del+0x492/0xb60
[  176.017816]        pci_remove_bus_device+0x12c/0x350
[  176.017821]        pci_stop_and_remove_bus_device_locked+0x1e/0x30
[  176.017826]        remove_store+0xab/0xc0
[  176.017831]        kernfs_fop_write_iter+0x351/0x520
[  176.017836]        new_sync_write+0x2cd/0x500
[  176.017839]        vfs_write+0x62c/0x980
[  176.017844]        ksys_write+0xe7/0x1a0
[  176.017848]        do_syscall_64+0x3a/0x80
[  176.017854]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  176.017861] 
               other info that might help us debug this:

[  176.017862]  Possible unsafe locking scenario:

[  176.017864]        CPU0                    CPU1
[  176.017865]        ----                    ----
[  176.017866]   lock(pci_rescan_remove_lock);
[  176.017870]                                lock(kn->active#256);
[  176.017875]                                lock(pci_rescan_remove_lock);
[  176.017879]   lock(kn->active#256);
[  176.017883] 
                *** DEADLOCK ***

[  176.017884] 3 locks held by check/1147:
[  176.017888]  #0: ffff888119ad8460 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xe7/0x1a0
[  176.017902]  #1: ffff8881241b1888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x215/0x520
[  176.017914]  #2: ffffffff859bbcc8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[  176.017927] 
               stack backtrace:
[  176.017929] CPU: 7 PID: 1147 Comm: check Not tainted 5.19.0-rc1+ #1
[  176.017935] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.2 11/22/2019
[  176.017938] Call Trace:
[  176.017941]  <TASK>
[  176.017945]  dump_stack_lvl+0x5b/0x73
[  176.017953]  check_noncircular+0x23e/0x2e0
[  176.017959]  ? lock_chain_count+0x20/0x20
[  176.017966]  ? print_circular_bug+0x1e0/0x1e0
[  176.017973]  ? mark_lock+0xee/0x1600
[  176.017983]  ? lockdep_lock+0xb8/0x1a0
[  176.017989]  ? call_rcu_zapped+0xb0/0xb0
[  176.017997]  __lock_acquire+0x2874/0x5510
[  176.018012]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[  176.018023]  lock_acquire+0x194/0x4f0
[  176.018029]  ? kernfs_remove_by_name_ns+0x90/0xe0
[  176.018037]  ? lock_downgrade+0x6b0/0x6b0
[  176.018048]  ? up_write+0x14d/0x460
[  176.018055]  __kernfs_remove+0x6f2/0x910
[  176.018060]  ? kernfs_remove_by_name_ns+0x90/0xe0
[  176.018071]  ? kernfs_next_descendant_post+0x280/0x280
[  176.018078]  ? lock_is_held_type+0xe3/0x140
[  176.018085]  ? kernfs_name_hash+0x16/0xc0
[  176.018092]  ? kernfs_find_ns+0x1e3/0x320
[  176.018100]  kernfs_remove_by_name_ns+0x90/0xe0
[  176.018107]  remove_files+0x8c/0x1a0
[  176.018115]  sysfs_remove_group+0x77/0x150
[  176.018123]  sysfs_remove_groups+0x4f/0x90
[  176.018131]  device_remove_attrs+0x19e/0x240
[  176.018137]  ? device_remove_file+0x20/0x20
[  176.018146]  device_del+0x492/0xb60
[  176.018154]  ? __device_link_del+0x350/0x350
[  176.018159]  ? kfree+0xc5/0x340
[  176.018171]  pci_remove_bus_device+0x12c/0x350
[  176.018179]  pci_stop_and_remove_bus_device_locked+0x1e/0x30
[  176.018186]  remove_store+0xab/0xc0
[  176.018192]  ? subordinate_bus_number_show+0xa0/0xa0
[  176.018199]  ? sysfs_kf_write+0x3d/0x170
[  176.018207]  kernfs_fop_write_iter+0x351/0x520
[  176.018216]  new_sync_write+0x2cd/0x500
[  176.018223]  ? new_sync_read+0x500/0x500
[  176.018230]  ? perf_callchain_user+0x810/0xa90
[  176.018238]  ? lock_downgrade+0x6b0/0x6b0
[  176.018244]  ? inode_security+0x54/0xf0
[  176.018254]  ? lock_is_held_type+0xe3/0x140
[  176.018264]  vfs_write+0x62c/0x980
[  176.018273]  ksys_write+0xe7/0x1a0
[  176.018279]  ? __ia32_sys_read+0xa0/0xa0
[  176.018285]  ? syscall_enter_from_user_mode+0x20/0x70
[  176.018294]  do_syscall_64+0x3a/0x80
[  176.018303]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  176.018310] RIP: 0033:0x7fd59a901c17
[  176.018317] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  176.018322] RSP: 002b:00007fffe29bbe98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  176.018328] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd59a901c17
[  176.018333] RDX: 0000000000000002 RSI: 0000555e68a93c20 RDI: 0000000000000001
[  176.018336] RBP: 0000555e68a93c20 R08: 0000000000000000 R09: 0000000000000073
[  176.018339] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[  176.018342] R13: 00007fd59a9f8780 R14: 0000000000000002 R15: 00007fd59a9f39e0
[  176.018356]  </TASK>
[  176.057966] pci 0000:04:00.0: [15b7:5002] type 00 class 0x010802
[  176.058006] pci 0000:04:00.0: reg 0x10: [mem 0xfb600000-0xfb603fff 64bit]
[  176.058059] pci 0000:04:00.0: reg 0x20: [mem 0xfb604000-0xfb6040ff 64bit]
[  176.060129] pci 0000:04:00.0: BAR 0: assigned [mem 0xfb600000-0xfb603fff 64bit]
[  176.060152] pci 0000:04:00.0: BAR 4: assigned [mem 0xfb604000-0xfb6040ff 64bit]
[  176.061638] nvme nvme0: pci function 0000:04:00.0
[  176.074114] nvme nvme0: 8/0/0 default/read/poll queues

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-10 12:25     ` Shinichiro Kawasaki
@ 2022-06-10 13:15       ` Yi Zhang
  2022-06-10 14:47       ` Keith Busch
  1 sibling, 0 replies; 23+ messages in thread
From: Yi Zhang @ 2022-06-10 13:15 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Chaitanya Kulkarni, linux-block, linux-nvme, linux-scsi

On Fri, Jun 10, 2022 at 8:26 PM Shinichiro Kawasaki
<shinichiro.kawasaki@wdc.com> wrote:
>
> On Jun 10, 2022 / 09:32, Chaitanya Kulkarni wrote:
> > Shinichiro,
> >
> > >
> > >> #6: nvme/032: Failed at the first run after system reboot.
> > >>                 Used QEMU NVME device as TEST_DEV.
> > >>
> >
> > ofcourse we need to fix this issue but can you also
> > try it with the real H/W ?
>
> Hi Chaitanya, thank you for looking into the failures. I have just run the test
> case nvme/032 with real NVME device and observed the exactly same symptom as
> QEMU NVME device.
>
> FYI, here I quote the kernel WARNING with the real HW.

I also reproduced this issue recently with an NVMe disk:

[  702.253121] run blktests nvme/032 at 2022-06-07 02:52:26
[  707.422212] pcieport 0000:40:03.1: bridge window [io
0x1000-0x0fff] to [bus 42] add_size 1000
[  707.422266] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  707.422271] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  707.422326] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  707.422328] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  708.164329] run blktests nvme/032 at 2022-06-07 02:52:32
[  713.565934] nvme nvme1: Shutdown timeout set to 16 seconds
[  713.675800] nvme nvme1: 16/0/0 default/read/poll queues

[  714.661347] ======================================================
[  714.661349] WARNING: possible circular locking dependency detected
[  714.661351] 5.19.0-rc1+ #1 Not tainted
[  714.661355] ------------------------------------------------------
[  714.661356] check/2028 is trying to acquire lock:
[  714.661359] ffff888106c270e8 (kn->active#378){++++}-{0:0}, at:
kernfs_remove_by_name_ns+0x96/0xe0
[  714.661379]
               but task is already holding lock:
[  714.661380] ffffffff89c6f010 (pci_rescan_remove_lock){+.+.}-{3:3},
at: pci_stop_and_remove_bus_device_locked+0xe/0x30
[  714.661392]
               which lock already depends on the new lock.

[  714.661393]
               the existing dependency chain (in reverse order) is:
[  714.661395]
               -> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
[  714.661402]        lock_acquire+0x1d2/0x600
[  714.661407]        __mutex_lock+0x14c/0x2350
[  714.661413]        dev_rescan_store+0xb1/0xf0
[  714.661417]        kernfs_fop_write_iter+0x2d3/0x490
[  714.661422]        new_sync_write+0x33d/0x550
[  714.661427]        vfs_write+0x617/0x910
[  714.661431]        ksys_write+0xf1/0x1c0
[  714.661435]        do_syscall_64+0x3a/0x80
[  714.661440]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  714.661444]
               -> #0 (kn->active#378){++++}-{0:0}:
[  714.661451]        check_prevs_add+0x3fd/0x2960
[  714.661455]        __lock_acquire+0x2c87/0x3f40
[  714.661459]        lock_acquire+0x1d2/0x600
[  714.661462]        __kernfs_remove+0x68f/0x880
[  714.661466]        kernfs_remove_by_name_ns+0x96/0xe0
[  714.661470]        remove_files.isra.1+0x6c/0x170
[  714.661474]        sysfs_remove_group+0x9b/0x170
[  714.661478]        sysfs_remove_groups+0x4f/0x90
[  714.661482]        device_remove_attrs+0x1ae/0x260
[  714.661488]        device_del+0x542/0xd10
[  714.661491]        pci_remove_bus_device+0x132/0x350
[  714.661494]        pci_stop_and_remove_bus_device_locked+0x1e/0x30
[  714.661499]        remove_store+0xca/0xe0
[  714.661503]        kernfs_fop_write_iter+0x2d3/0x490
[  714.661507]        new_sync_write+0x33d/0x550
[  714.661510]        vfs_write+0x617/0x910
[  714.661514]        ksys_write+0xf1/0x1c0
[  714.661518]        do_syscall_64+0x3a/0x80
[  714.661522]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  714.661526]
               other info that might help us debug this:

[  714.661528]  Possible unsafe locking scenario:

[  714.661529]        CPU0                    CPU1
[  714.661530]        ----                    ----
[  714.661531]   lock(pci_rescan_remove_lock);
[  714.661534]                                lock(kn->active#378);
[  714.661538]                                lock(pci_rescan_remove_lock);
[  714.661541]   lock(kn->active#378);
[  714.661544]
                *** DEADLOCK ***

[  714.661545] 3 locks held by check/2028:
[  714.661548]  #0: ffff888268b1c470 (sb_writers#4){.+.+}-{0:0}, at:
ksys_write+0xf1/0x1c0
[  714.661559]  #1: ffff88826b9b6490 (&of->mutex){+.+.}-{3:3}, at:
kernfs_fop_write_iter+0x220/0x490
[  714.661567]  #2: ffffffff89c6f010
(pci_rescan_remove_lock){+.+.}-{3:3}, at:
pci_stop_and_remove_bus_device_locked+0xe/0x30
[  714.661577]
               stack backtrace:
[  714.661579] CPU: 10 PID: 2028 Comm: check Kdump: loaded Not tainted
5.19.0-rc1+ #1
[  714.661584] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS
2.3.6 07/06/2021
[  714.661587] Call Trace:
[  714.661589]  <TASK>
[  714.661592]  dump_stack_lvl+0x44/0x57
[  714.661600]  check_noncircular+0x280/0x320
[  714.661606]  ? print_circular_bug.isra.44+0x430/0x430
[  714.661613]  ? mark_lock.part.49+0xa44/0x2370
[  714.661621]  ? lock_chain_count+0x20/0x20
[  714.661629]  check_prevs_add+0x3fd/0x2960
[  714.661636]  ? sched_clock_cpu+0x69/0x2e0
[  714.661641]  ? mark_lock.part.49+0x107/0x2370
[  714.661646]  ? check_irq_usage+0xe30/0xe30
[  714.661652]  ? sched_clock_cpu+0x69/0x2e0
[  714.661655]  ? sched_clock_cpu+0x69/0x2e0
[  714.661660]  __lock_acquire+0x2c87/0x3f40
[  714.661668]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[  714.661672]  ? rcu_read_lock_sched_held+0xaf/0xe0
[  714.661679]  lock_acquire+0x1d2/0x600
[  714.661682]  ? kernfs_remove_by_name_ns+0x96/0xe0
[  714.661686]  ? rcu_read_lock_sched_held+0xaf/0xe0
[  714.661689]  ? rcu_read_unlock+0x50/0x50
[  714.661692]  ? rcu_read_lock_bh_held+0xc0/0xc0
[  714.661697]  ? up_write+0x15c/0x490
[  714.661702]  __kernfs_remove+0x68f/0x880
[  714.661704]  ? kernfs_remove_by_name_ns+0x96/0xe0
[  714.661711]  ? kernfs_fop_readdir+0x8c0/0x8c0
[  714.661714]  ? lock_is_held_type+0xd9/0x130
[  714.661718]  ? kernfs_name_hash+0x18/0xc0
[  714.661724]  kernfs_remove_by_name_ns+0x96/0xe0
[  714.661728]  remove_files.isra.1+0x6c/0x170
[  714.661733]  sysfs_remove_group+0x9b/0x170
[  714.661737]  sysfs_remove_groups+0x4f/0x90
[  714.661742]  device_remove_attrs+0x1ae/0x260
[  714.661746]  ? device_remove_file+0x20/0x20
[  714.661749]  ? up_write+0x15c/0x490
[  714.661755]  device_del+0x542/0xd10
[  714.661759]  ? __device_link_del+0x340/0x340
[  714.661765]  pci_remove_bus_device+0x132/0x350
[  714.661770]  pci_stop_and_remove_bus_device_locked+0x1e/0x30
[  714.661774]  remove_store+0xca/0xe0
[  714.661777]  ? subordinate_bus_number_show+0xc0/0xc0
[  714.661783]  ? sysfs_kf_write+0x3b/0x180
[  714.661787]  kernfs_fop_write_iter+0x2d3/0x490
[  714.661793]  new_sync_write+0x33d/0x550
[  714.661798]  ? new_sync_read+0x540/0x540
[  714.661806]  ? lock_is_held_type+0xd9/0x130
[  714.661813]  ? rcu_read_lock_held+0xc0/0xc0
[  714.661820]  vfs_write+0x617/0x910
[  714.661825]  ksys_write+0xf1/0x1c0
[  714.661829]  ? __ia32_sys_read+0xb0/0xb0
[  714.661832]  ? syscall_trace_enter.isra.15+0x175/0x250
[  714.661840]  do_syscall_64+0x3a/0x80
[  714.661844]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  714.661847] RIP: 0033:0x7f0b88120868
[  714.661852] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00
00 00 f3 0f 1e fa 48 8d 05 35 4d 2a 00 8b 00 85 c0 75 17 b8 01 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89
d4 55
[  714.661855] RSP: 002b:00007fff0fceca28 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[  714.661858] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f0b88120868
[  714.661861] RDX: 0000000000000002 RSI: 000056501fc5ec60 RDI: 0000000000000001
[  714.661862] RBP: 000056501fc5ec60 R08: 000000000000000a R09: 00007f0b88180ac0
[  714.661864] R10: 000000000000000a R11: 0000000000000246 R12: 00007f0b883c16e0
[  714.661866] R13: 0000000000000002 R14: 00007f0b883bc880 R15: 0000000000000002
[  714.661873]  </TASK>
[  714.666060] pci 0000:44:00.0: Removing from iommu group 33
[  714.670580] pci 0000:44:00.0: [1e0f:0007] type 00 class 0x010802
[  714.670613] pci 0000:44:00.0: reg 0x10: [mem 0xa4900000-0xa4907fff 64bit]
[  714.776556] pci 0000:44:00.0: 31.504 Gb/s available PCIe bandwidth,
limited by 8.0 GT/s PCIe x4 link at 0000:40:03.3 (capable of 63.012
Gb/s with 16.0 GT/s PCIe x4 link)
[  714.802476] pci 0000:44:00.0: Adding to iommu group 33
[  714.802862] pcieport 0000:40:03.1: bridge window [io
0x1000-0x0fff] to [bus 42] add_size 1000
[  714.802900] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  714.802904] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  714.802939] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  714.802942] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  714.802977] pci 0000:44:00.0: BAR 0: assigned [mem
0xa4900000-0xa4907fff 64bit]
[  714.814760] nvme nvme1: pci function 0000:44:00.0
[  715.092766] nvme nvme1: Shutdown timeout set to 16 seconds
[  715.192803] nvme nvme1: 16/0/0 default/read/poll queues
[  715.238378]  nvme1n1: p1
[  722.202125] run blktests nvme/032 at 2022-06-07 02:52:46
[  727.840536] nvme nvme2: 16/0/0 default/read/poll queues
[  729.262229] pci 0000:45:00.0: Removing from iommu group 34
[  729.287030] pci 0000:45:00.0: [8086:0b60] type 00 class 0x010802
[  729.287059] pci 0000:45:00.0: reg 0x10: [mem 0xa4800000-0xa4803fff 64bit]
[  729.501326] pci 0000:45:00.0: 31.504 Gb/s available PCIe bandwidth,
limited by 8.0 GT/s PCIe x4 link at 0000:40:03.4 (capable of 63.012
Gb/s with 16.0 GT/s PCIe x4 link)
[  729.638560] pci 0000:45:00.0: Adding to iommu group 34
[  729.639120] pcieport 0000:40:03.1: bridge window [io
0x1000-0x0fff] to [bus 42] add_size 1000
[  729.639163] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  729.639167] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  729.639217] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  729.639219] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  729.639942] pci 0000:45:00.0: BAR 0: assigned [mem
0xa4800000-0xa4803fff 64bit]
[  729.746063] nvme nvme2: pci function 0000:45:00.0
[  729.839525] nvme nvme2: 16/0/0 default/read/poll queues
[  729.883784]  nvme2n1: p1

>
> [  170.567809] run blktests nvme/032 at 2022-06-10 21:19:03
> [  175.771062] nvme nvme0: 8/0/0 default/read/poll queues
>
> [  176.017635] ======================================================
> [  176.017637] WARNING: possible circular locking dependency detected
> [  176.017639] 5.19.0-rc1+ #1 Not tainted
> [  176.017643] ------------------------------------------------------
> [  176.017645] check/1147 is trying to acquire lock:
> [  176.017651] ffff888107010cb8 (kn->active#256){++++}-{0:0}, at: kernfs_remove_by_name_ns+0x90/0xe0
> [  176.017683]
>                but task is already holding lock:
> [  176.017685] ffffffff859bbcc8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
> [  176.017701]
>                which lock already depends on the new lock.
>
> [  176.017702]
>                the existing dependency chain (in reverse order) is:
> [  176.017704]
>                -> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
> [  176.017712]        __mutex_lock+0x14c/0x12b0
> [  176.017720]        dev_rescan_store+0x93/0xd0
> [  176.017727]        kernfs_fop_write_iter+0x351/0x520
> [  176.017732]        new_sync_write+0x2cd/0x500
> [  176.017737]        vfs_write+0x62c/0x980
> [  176.017742]        ksys_write+0xe7/0x1a0
> [  176.017746]        do_syscall_64+0x3a/0x80
> [  176.017755]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [  176.017762]
>                -> #0 (kn->active#256){++++}-{0:0}:
> [  176.017770]        __lock_acquire+0x2874/0x5510
> [  176.017778]        lock_acquire+0x194/0x4f0
> [  176.017782]        __kernfs_remove+0x6f2/0x910
> [  176.017786]        kernfs_remove_by_name_ns+0x90/0xe0
> [  176.017791]        remove_files+0x8c/0x1a0
> [  176.017797]        sysfs_remove_group+0x77/0x150
> [  176.017802]        sysfs_remove_groups+0x4f/0x90
> [  176.017807]        device_remove_attrs+0x19e/0x240
> [  176.017812]        device_del+0x492/0xb60
> [  176.017816]        pci_remove_bus_device+0x12c/0x350
> [  176.017821]        pci_stop_and_remove_bus_device_locked+0x1e/0x30
> [  176.017826]        remove_store+0xab/0xc0
> [  176.017831]        kernfs_fop_write_iter+0x351/0x520
> [  176.017836]        new_sync_write+0x2cd/0x500
> [  176.017839]        vfs_write+0x62c/0x980
> [  176.017844]        ksys_write+0xe7/0x1a0
> [  176.017848]        do_syscall_64+0x3a/0x80
> [  176.017854]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [  176.017861]
>                other info that might help us debug this:
>
> [  176.017862]  Possible unsafe locking scenario:
>
> [  176.017864]        CPU0                    CPU1
> [  176.017865]        ----                    ----
> [  176.017866]   lock(pci_rescan_remove_lock);
> [  176.017870]                                lock(kn->active#256);
> [  176.017875]                                lock(pci_rescan_remove_lock);
> [  176.017879]   lock(kn->active#256);
> [  176.017883]
>                 *** DEADLOCK ***
>
> [  176.017884] 3 locks held by check/1147:
> [  176.017888]  #0: ffff888119ad8460 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xe7/0x1a0
> [  176.017902]  #1: ffff8881241b1888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x215/0x520
> [  176.017914]  #2: ffffffff859bbcc8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0xe/0x30
> [  176.017927]
>                stack backtrace:
> [  176.017929] CPU: 7 PID: 1147 Comm: check Not tainted 5.19.0-rc1+ #1
> [  176.017935] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.2 11/22/2019
> [  176.017938] Call Trace:
> [  176.017941]  <TASK>
> [  176.017945]  dump_stack_lvl+0x5b/0x73
> [  176.017953]  check_noncircular+0x23e/0x2e0
> [  176.017959]  ? lock_chain_count+0x20/0x20
> [  176.017966]  ? print_circular_bug+0x1e0/0x1e0
> [  176.017973]  ? mark_lock+0xee/0x1600
> [  176.017983]  ? lockdep_lock+0xb8/0x1a0
> [  176.017989]  ? call_rcu_zapped+0xb0/0xb0
> [  176.017997]  __lock_acquire+0x2874/0x5510
> [  176.018012]  ? lockdep_hardirqs_on_prepare+0x410/0x410
> [  176.018023]  lock_acquire+0x194/0x4f0
> [  176.018029]  ? kernfs_remove_by_name_ns+0x90/0xe0
> [  176.018037]  ? lock_downgrade+0x6b0/0x6b0
> [  176.018048]  ? up_write+0x14d/0x460
> [  176.018055]  __kernfs_remove+0x6f2/0x910
> [  176.018060]  ? kernfs_remove_by_name_ns+0x90/0xe0
> [  176.018071]  ? kernfs_next_descendant_post+0x280/0x280
> [  176.018078]  ? lock_is_held_type+0xe3/0x140
> [  176.018085]  ? kernfs_name_hash+0x16/0xc0
> [  176.018092]  ? kernfs_find_ns+0x1e3/0x320
> [  176.018100]  kernfs_remove_by_name_ns+0x90/0xe0
> [  176.018107]  remove_files+0x8c/0x1a0
> [  176.018115]  sysfs_remove_group+0x77/0x150
> [  176.018123]  sysfs_remove_groups+0x4f/0x90
> [  176.018131]  device_remove_attrs+0x19e/0x240
> [  176.018137]  ? device_remove_file+0x20/0x20
> [  176.018146]  device_del+0x492/0xb60
> [  176.018154]  ? __device_link_del+0x350/0x350
> [  176.018159]  ? kfree+0xc5/0x340
> [  176.018171]  pci_remove_bus_device+0x12c/0x350
> [  176.018179]  pci_stop_and_remove_bus_device_locked+0x1e/0x30
> [  176.018186]  remove_store+0xab/0xc0
> [  176.018192]  ? subordinate_bus_number_show+0xa0/0xa0
> [  176.018199]  ? sysfs_kf_write+0x3d/0x170
> [  176.018207]  kernfs_fop_write_iter+0x351/0x520
> [  176.018216]  new_sync_write+0x2cd/0x500
> [  176.018223]  ? new_sync_read+0x500/0x500
> [  176.018230]  ? perf_callchain_user+0x810/0xa90
> [  176.018238]  ? lock_downgrade+0x6b0/0x6b0
> [  176.018244]  ? inode_security+0x54/0xf0
> [  176.018254]  ? lock_is_held_type+0xe3/0x140
> [  176.018264]  vfs_write+0x62c/0x980
> [  176.018273]  ksys_write+0xe7/0x1a0
> [  176.018279]  ? __ia32_sys_read+0xa0/0xa0
> [  176.018285]  ? syscall_enter_from_user_mode+0x20/0x70
> [  176.018294]  do_syscall_64+0x3a/0x80
> [  176.018303]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [  176.018310] RIP: 0033:0x7fd59a901c17
> [  176.018317] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [  176.018322] RSP: 002b:00007fffe29bbe98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  176.018328] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd59a901c17
> [  176.018333] RDX: 0000000000000002 RSI: 0000555e68a93c20 RDI: 0000000000000001
> [  176.018336] RBP: 0000555e68a93c20 R08: 0000000000000000 R09: 0000000000000073
> [  176.018339] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
> [  176.018342] R13: 00007fd59a9f8780 R14: 0000000000000002 R15: 00007fd59a9f39e0
> [  176.018356]  </TASK>
> [  176.057966] pci 0000:04:00.0: [15b7:5002] type 00 class 0x010802
> [  176.058006] pci 0000:04:00.0: reg 0x10: [mem 0xfb600000-0xfb603fff 64bit]
> [  176.058059] pci 0000:04:00.0: reg 0x20: [mem 0xfb604000-0xfb6040ff 64bit]
> [  176.060129] pci 0000:04:00.0: BAR 0: assigned [mem 0xfb600000-0xfb603fff 64bit]
> [  176.060152] pci 0000:04:00.0: BAR 4: assigned [mem 0xfb604000-0xfb6040ff 64bit]
> [  176.061638] nvme nvme0: pci function 0000:04:00.0
> [  176.074114] nvme nvme0: 8/0/0 default/read/poll queues
>
> --
> Shin'ichiro Kawasaki
>


-- 
Best Regards,
  Yi Zhang


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-10 12:25     ` Shinichiro Kawasaki
  2022-06-10 13:15       ` Yi Zhang
@ 2022-06-10 14:47       ` Keith Busch
  2022-06-11  8:34         ` Yi Zhang
  1 sibling, 1 reply; 23+ messages in thread
From: Keith Busch @ 2022-06-10 14:47 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Chaitanya Kulkarni, linux-block, linux-nvme, linux-scsi

On Fri, Jun 10, 2022 at 12:25:17PM +0000, Shinichiro Kawasaki wrote:
> On Jun 10, 2022 / 09:32, Chaitanya Kulkarni wrote:
> > >> #6: nvme/032: Failed at the first run after system reboot.
> > >>                 Used QEMU NVME device as TEST_DEV.
> > >>
> > 
> > ofcourse we need to fix this issue but can you also
> > try it with the real H/W ?
> 
> Hi Chaitanya, thank you for looking into the failures. I have just run the test
> case nvme/032 with real NVME device and observed the exactly same symptom as
> QEMU NVME device.

QEMU is perfectly fine for this test. There's no need to bring in "real"
hardware for this.

And I am not even sure this is real. I don't know yet why this is showing up
only now, but this should fix it:

---
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index fc804e08e3cb..bebd816c11e6 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -476,7 +476,7 @@ static ssize_t dev_rescan_store(struct device *dev,
 	}
 	return count;
 }
-static struct device_attribute dev_attr_dev_rescan = __ATTR(rescan, 0200, NULL,
+static struct device_attribute dev_attr_dev_rescan = __ATTR_IGNORE_LOCKDEP(rescan, 0200, NULL,
 							    dev_rescan_store);
 
 static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
--

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-10 14:47       ` Keith Busch
@ 2022-06-11  8:34         ` Yi Zhang
  2022-06-14  1:09           ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Yi Zhang @ 2022-06-11  8:34 UTC (permalink / raw)
  To: Keith Busch
  Cc: Shinichiro Kawasaki, Chaitanya Kulkarni, linux-block, linux-nvme,
	linux-scsi, mstowe

On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, Jun 10, 2022 at 12:25:17PM +0000, Shinichiro Kawasaki wrote:
> > On Jun 10, 2022 / 09:32, Chaitanya Kulkarni wrote:
> > > >> #6: nvme/032: Failed at the first run after system reboot.
> > > >>                 Used QEMU NVME device as TEST_DEV.
> > > >>
> > >
> > > ofcourse we need to fix this issue but can you also
> > > try it with the real H/W ?
> >
> > Hi Chaitanya, thank you for looking into the failures. I have just run the test
> > case nvme/032 with real NVME device and observed the exactly same symptom as
> > QEMU NVME device.
>
> QEMU is perfectly fine for this test. There's no need to bring in "real"
> hardware for this.
>
> And I am not even sure this is real. I don't know yet why this is showing up
> only now, but this should fix it:

Hi Keith

I confirmed the WARNING issue is fixed with the change; here is the log:

# ./check nvme/032
nvme/032 => nvme0n1 (test nvme pci adapter rescan/reset/remove during I/O) [passed]
    runtime  5.165s  ...  5.142s
nvme/032 => nvme1n1 (test nvme pci adapter rescan/reset/remove during I/O) [passed]
    runtime  6.723s  ...  6.635s
nvme/032 => nvme2n1 (test nvme pci adapter rescan/reset/remove during I/O) [passed]
    runtime  7.708s  ...  7.808s

[  307.477948] run blktests nvme/032 at 2022-06-11 04:27:46
[  312.603452] pcieport 0000:40:03.1: bridge window [io  0x1000-0x0fff] to [bus 42] add_size 1000
[  312.603599] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  312.603603] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  312.603729] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  312.603733] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  313.397440] run blktests nvme/032 at 2022-06-11 04:27:51
[  318.732273] nvme nvme1: Shutdown timeout set to 16 seconds
[  318.785945] nvme nvme1: 16/0/0 default/read/poll queues
[  319.268544] pci 0000:44:00.0: Removing from iommu group 33
[  319.326814] pci 0000:44:00.0: [1e0f:0007] type 00 class 0x010802
[  319.326866] pci 0000:44:00.0: reg 0x10: [mem 0xa4900000-0xa4907fff 64bit]
[  319.483234] pci 0000:44:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:40:03.3 (capable of 63.012 Gb/s with 16.0 GT/s PCIe x4 link)
[  319.531324] pci 0000:44:00.0: Adding to iommu group 33
[  319.547381] pcieport 0000:40:03.1: bridge window [io  0x1000-0x0fff] to [bus 42] add_size 1000
[  319.547448] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  319.547453] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  319.547547] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  319.547550] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  319.547607] pci 0000:44:00.0: BAR 0: assigned [mem 0xa4900000-0xa4907fff 64bit]
[  319.556620] nvme nvme1: pci function 0000:44:00.0
[  319.838233] nvme nvme1: Shutdown timeout set to 16 seconds
[  319.911826] nvme nvme1: 16/0/0 default/read/poll queues
[  320.900025] run blktests nvme/032 at 2022-06-11 04:27:59
[  326.311357] nvme nvme2: 16/0/0 default/read/poll queues
[  327.771945] pci 0000:45:00.0: Removing from iommu group 34
[  327.839066] pci 0000:45:00.0: [8086:0b60] type 00 class 0x010802
[  327.839106] pci 0000:45:00.0: reg 0x10: [mem 0xa4800000-0xa4803fff 64bit]
[  328.011204] pci 0000:45:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:40:03.4 (capable of 63.012 Gb/s with 16.0 GT/s PCIe x4 link)
[  328.058523] pci 0000:45:00.0: Adding to iommu group 34
[  328.072575] pcieport 0000:40:03.1: bridge window [io  0x1000-0x0fff] to [bus 42] add_size 1000
[  328.072628] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  328.072632] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  328.072685] pcieport 0000:40:03.1: BAR 13: no space for [io  size 0x1000]
[  328.072688] pcieport 0000:40:03.1: BAR 13: failed to assign [io  size 0x1000]
[  328.072741] pci 0000:45:00.0: BAR 0: assigned [mem 0xa4800000-0xa4803fff 64bit]
[  328.079857] nvme nvme2: pci function 0000:45:00.0
[  328.153256] nvme nvme2: 16/0/0 default/read/poll queues
>
> ---
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index fc804e08e3cb..bebd816c11e6 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -476,7 +476,7 @@ static ssize_t dev_rescan_store(struct device *dev,
>         }
>         return count;
>  }
> -static struct device_attribute dev_attr_dev_rescan = __ATTR(rescan, 0200, NULL,
> +static struct device_attribute dev_attr_dev_rescan = __ATTR_IGNORE_LOCKDEP(rescan, 0200, NULL,
>                                                             dev_rescan_store);
>
>  static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
> --
>


-- 
Best Regards,
  Yi Zhang


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-11  8:34         ` Yi Zhang
@ 2022-06-14  1:09           ` Shinichiro Kawasaki
  2022-06-14  2:23             ` Keith Busch
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2022-06-14  1:09 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Keith Busch, Chaitanya Kulkarni, linux-block, linux-nvme,
	linux-scsi, mstowe, linux-pci

(CC+: linux-pci)

On Jun 11, 2022 / 16:34, Yi Zhang wrote:
> On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Fri, Jun 10, 2022 at 12:25:17PM +0000, Shinichiro Kawasaki wrote:
> > > On Jun 10, 2022 / 09:32, Chaitanya Kulkarni wrote:
> > > > >> #6: nvme/032: Failed at the first run after system reboot.
> > > > >>                 Used QEMU NVME device as TEST_DEV.
> > > > >>
> > > >
> > > > ofcourse we need to fix this issue but can you also
> > > > try it with the real H/W ?
> > >
> > > Hi Chaitanya, thank you for looking into the failures. I have just run the test
> > > case nvme/032 with real NVME device and observed the exactly same symptom as
> > > QEMU NVME device.
> >
> > QEMU is perfectly fine for this test. There's no need to bring in "real"
> > hardware for this.
> >
> > And I am not even sure this is real. I don't know yet why this is showing up
> > only now, but this should fix it:
> 
> Hi Keith
> 
> Confirmed the WARNING issue was fixed with the change, here is the log:

Thanks. I also confirmed that Keith's change to add __ATTR_IGNORE_LOCKDEP to
dev_attr_dev_rescan avoids the WARN on v5.19-rc2.

I took a closer look into this issue and found that the deadlock WARN can be
recreated with the following two commands:

# echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
# echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove

And it can be recreated with PCI devices other than an NVMe controller, such
as a SCSI controller or a VGA controller, so this is not a storage sub-system
issue.

I checked the function call stacks of the two commands above. As shown below,
it looks like an ABBA deadlock possibility is detected and warned about.

echo 1 > /sys/bus/pci/devices/*/rescan
  kernfs_fop_write_iter
    kernfs_get_active
      atomic_inc_unless_negative
      rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_)  :lock L1 for "rescan" file
    dev_rescan_store
      pci_lock_rescan_remove
        mutex_lock(&pci_rescan_remove_lock)             :lock L2

echo 1 > /sys/bus/pci/devices/*/remove
  kernfs_fop_write_iter
    remove_store
      pci_stop_and_remove_bus_device_locked
        pci_lock_rescan_remove
          mutex_lock(&pci_rescan_remove_lock)           :lock L2
        pci_stop_and_remove_bus_device
          pci_remove_bus_device
            device_del
              device_remove_attrs
                sysfs_remove_attrs
                  sysfs_remove_groups
                    sysfs_remove_group
                      remove_files    .... iterates over the pci device sysfs files, including the "rescan" file?
                        kernfs_remove_by_name_ns
                          __kernfs_remove
                            kernfs_drain
                              rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_)  :lock L1

It looks to me that the deadlock possibility exists for real, even though a
race between the rescan and remove operations is a really rare use case.

I would like to hear comments on the guess above. I took the liberty of CCing
this to the linux-pci list. Is this an issue to fix?
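
For reference, a minimal sketch of exercising the race in a loop (the PCI
address is the one from the commands above; the iteration count and the
bus-level rescan used to bring the device back are just an illustration):

pdev=/sys/bus/pci/devices/0000:00:09.0

for i in $(seq 1 20); do
    # issue the two sysfs writes concurrently
    echo 1 > "$pdev/rescan" 2>/dev/null &
    echo 1 > "$pdev/remove" 2>/dev/null &
    wait
    # re-enumerate the removed device before the next iteration
    echo 1 > /sys/bus/pci/rescan
    sleep 1
done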

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-14  1:09           ` Shinichiro Kawasaki
@ 2022-06-14  2:23             ` Keith Busch
  2022-06-14  2:38               ` Chaitanya Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2022-06-14  2:23 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Yi Zhang, Chaitanya Kulkarni, linux-block, linux-nvme,
	linux-scsi, mstowe, linux-pci

On Tue, Jun 14, 2022 at 01:09:07AM +0000, Shinichiro Kawasaki wrote:
> (CC+: linux-pci)
> On Jun 11, 2022 / 16:34, Yi Zhang wrote:
> > On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@kernel.org> wrote:
> > >
> > > And I am not even sure this is real. I don't know yet why this is showing up
> > > only now, but this should fix it:
> > 
> > Hi Keith
> > 
> > Confirmed the WARNING issue was fixed with the change, here is the log:
> 
> Thanks. I also confirmed that Keith's change to add __ATTR_IGNORE_LOCKDEP to
> dev_attr_dev_rescan avoids the fix, on v5.19-rc2.
> 
> I took a closer look into this issue and found The deadlock WARN can be
> recreated with following two commands:
> 
> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove
> 
> And it can be recreated with PCI devices other than NVME controller, such as
> SCSI controller or VGA controller. Then this is not a storage sub-system issue.
> 
> I checked function call stacks of the two commands above. As shown below, it
> looks like ABBA deadlock possibility is detected and warned.

Yeah, I was mistaken on this report, so my proposal to suppress the warning is
definitely not right. If I run both 'echo' commands in parallel, I see it
deadlock frequently. I'm not familiar enough with this code to have any good
ideas on how to fix it, but I agree this is a generic pci issue.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-14  2:23             ` Keith Busch
@ 2022-06-14  2:38               ` Chaitanya Kulkarni
  2022-06-14  4:00                 ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Chaitanya Kulkarni @ 2022-06-14  2:38 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Yi Zhang, linux-block, Keith Busch, linux-nvme, linux-scsi,
	mstowe, linux-pci

Shinichiro,

On 6/13/22 19:23, Keith Busch wrote:
> On Tue, Jun 14, 2022 at 01:09:07AM +0000, Shinichiro Kawasaki wrote:
>> (CC+: linux-pci)
>> On Jun 11, 2022 / 16:34, Yi Zhang wrote:
>>> On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@kernel.org> wrote:
>>>>
>>>> And I am not even sure this is real. I don't know yet why this is showing up
>>>> only now, but this should fix it:
>>>
>>> Hi Keith
>>>
>>> Confirmed the WARNING issue was fixed with the change, here is the log:
>>
>> Thanks. I also confirmed that Keith's change to add __ATTR_IGNORE_LOCKDEP to
>> dev_attr_dev_rescan avoids the fix, on v5.19-rc2.
>>
>> I took a closer look into this issue and found The deadlock WARN can be
>> recreated with following two commands:
>>
>> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
>> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove
>>
>> And it can be recreated with PCI devices other than NVME controller, such as
>> SCSI controller or VGA controller. Then this is not a storage sub-system issue.
>>
>> I checked function call stacks of the two commands above. As shown below, it
>> looks like ABBA deadlock possibility is detected and warned.
> 
> Yeah, I was mistaken on this report, so my proposal to suppress the warning is
> definitely not right. If I run both 'echo' commands in parallel, I see it
> deadlock frequently. I'm not familiar enough with this code to any good ideas
> on how to fix, but I agree this is a generic pci issue.

I think it is worth adding a test case to blktests to make sure future
releases are tested for this.

-ck



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-14  2:38               ` Chaitanya Kulkarni
@ 2022-06-14  4:00                 ` Shinichiro Kawasaki
  2022-06-15 19:47                   ` Bjorn Helgaas
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2022-06-14  4:00 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Yi Zhang, linux-block, Keith Busch, linux-nvme, linux-scsi,
	mstowe, linux-pci

On Jun 14, 2022 / 02:38, Chaitanya Kulkarni wrote:
> Shinichiro,
> 
> On 6/13/22 19:23, Keith Busch wrote:
> > On Tue, Jun 14, 2022 at 01:09:07AM +0000, Shinichiro Kawasaki wrote:
> >> (CC+: linux-pci)
> >> On Jun 11, 2022 / 16:34, Yi Zhang wrote:
> >>> On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@kernel.org> wrote:
> >>>>
> >>>> And I am not even sure this is real. I don't know yet why this is showing up
> >>>> only now, but this should fix it:
> >>>
> >>> Hi Keith
> >>>
> >>> Confirmed the WARNING issue was fixed with the change, here is the log:
> >>
> >> Thanks. I also confirmed that Keith's change to add __ATTR_IGNORE_LOCKDEP to
> >> dev_attr_dev_rescan avoids the fix, on v5.19-rc2.
> >>
> >> I took a closer look into this issue and found The deadlock WARN can be
> >> recreated with following two commands:
> >>
> >> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
> >> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove
> >>
> >> And it can be recreated with PCI devices other than NVME controller, such as
> >> SCSI controller or VGA controller. Then this is not a storage sub-system issue.
> >>
> >> I checked function call stacks of the two commands above. As shown below, it
> >> looks like ABBA deadlock possibility is detected and warned.
> > 
> > Yeah, I was mistaken on this report, so my proposal to suppress the warning is
> > definitely not right. If I run both 'echo' commands in parallel, I see it
> > deadlock frequently. I'm not familiar enough with this code to any good ideas
> > on how to fix, but I agree this is a generic pci issue.
> 
> I think it is worth adding a testcase to blktests to make sure these
> future releases will test this.

Yeah, this WARN is confusing for us, so it would be valuable to cover it with
blktests so that it does not happen again. One point I wonder about is which
test group the test case should fall in. The nvme group is probably the one
to add it to.

Another point I wonder about is kernel test suites other than blktests. Don't
we have a more appropriate test suite to check the PCI device rescan/remove
race? Such a test sounds more like a PCI bus sub-system test than a
block/storage test.

Having said that, I still think the test case is valuable for block/storage.
Unless anyone opposes, I'm open to a patch to add it.

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-14  4:00                 ` Shinichiro Kawasaki
@ 2022-06-15 19:47                   ` Bjorn Helgaas
  2022-06-15 22:01                     ` Chaitanya Kulkarni
  2022-06-15 23:16                     ` Keith Busch
  0 siblings, 2 replies; 23+ messages in thread
From: Bjorn Helgaas @ 2022-06-15 19:47 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Chaitanya Kulkarni, Yi Zhang, linux-block, Keith Busch,
	linux-nvme, linux-scsi, mstowe, linux-pci

On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
> On Jun 14, 2022 / 02:38, Chaitanya Kulkarni wrote:
> > Shinichiro,
> > 
> > On 6/13/22 19:23, Keith Busch wrote:
> > > On Tue, Jun 14, 2022 at 01:09:07AM +0000, Shinichiro Kawasaki wrote:
> > >> (CC+: linux-pci)
> > >> On Jun 11, 2022 / 16:34, Yi Zhang wrote:
> > >>> On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@kernel.org> wrote:
> > >>>>
> > >>>> And I am not even sure this is real. I don't know yet why
> > >>>> this is showing up only now, but this should fix it:
> > >>>
> > >>> Hi Keith
> > >>>
> > >>> Confirmed the WARNING issue was fixed with the change, here is
> > >>> the log:
> > >>
> > >> Thanks. I also confirmed that Keith's change to add
> > >> __ATTR_IGNORE_LOCKDEP to dev_attr_dev_rescan avoids the fix, on
> > >> v5.19-rc2.
> > >>
> > >> I took a closer look into this issue and found The deadlock
> > >> WARN can be recreated with following two commands:
> > >>
> > >> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
> > >> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove
> > >>
> > >> And it can be recreated with PCI devices other than NVME
> > >> controller, such as SCSI controller or VGA controller. Then
> > >> this is not a storage sub-system issue.
> > >>
> > >> I checked function call stacks of the two commands above. As
> > >> shown below, it looks like ABBA deadlock possibility is
> > >> detected and warned.
> > > 
> > > Yeah, I was mistaken on this report, so my proposal to suppress
> > > the warning is definitely not right. If I run both 'echo'
> > > commands in parallel, I see it deadlock frequently. I'm not
> > > familiar enough with this code to any good ideas on how to fix,
> > > but I agree this is a generic pci issue.
> > 
> > I think it is worth adding a testcase to blktests to make sure
> > these future releases will test this.
> 
> Yeah, this WARN is confusing for us then it would be valuable to
> test by blktests not to repeat it. One point I wonder is: which test
> group the test case will it fall in? The nvme group could be the
> group to add, probably.
> 
> Another point I wonder is other kernel test suite than blktests.
> Don't we have more appropriate test suite to check PCI device
> rescan/remove race ? Such a test sounds more like a PCI bus
> sub-system test than block/storage test.

I'm not aware of such a test, but it would be nice to have one.

Can you share your qemu config so I can reproduce this locally?

Thanks for finding and reporting this!

Bjorn

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-15 19:47                   ` Bjorn Helgaas
@ 2022-06-15 22:01                     ` Chaitanya Kulkarni
  2022-06-15 23:13                       ` Yi Zhang
  2022-06-15 23:16                     ` Keith Busch
  1 sibling, 1 reply; 23+ messages in thread
From: Chaitanya Kulkarni @ 2022-06-15 22:01 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Yi Zhang, linux-block, Keith Busch, linux-nvme, linux-scsi,
	mstowe, linux-pci, Bjorn Helgaas

On 6/15/22 12:47, Bjorn Helgaas wrote:
> On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
>> On Jun 14, 2022 / 02:38, Chaitanya Kulkarni wrote:
>>> Shinichiro,
>>>
>>> On 6/13/22 19:23, Keith Busch wrote:
>>>> On Tue, Jun 14, 2022 at 01:09:07AM +0000, Shinichiro Kawasaki wrote:
>>>>> (CC+: linux-pci)
>>>>> On Jun 11, 2022 / 16:34, Yi Zhang wrote:
>>>>>> On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@kernel.org> wrote:
>>>>>>>
>>>>>>> And I am not even sure this is real. I don't know yet why
>>>>>>> this is showing up only now, but this should fix it:
>>>>>>
>>>>>> Hi Keith
>>>>>>
>>>>>> Confirmed the WARNING issue was fixed with the change, here is
>>>>>> the log:
>>>>>
>>>>> Thanks. I also confirmed that Keith's change to add
>>>>> __ATTR_IGNORE_LOCKDEP to dev_attr_dev_rescan avoids the fix, on
>>>>> v5.19-rc2.
>>>>>
>>>>> I took a closer look into this issue and found The deadlock
>>>>> WARN can be recreated with following two commands:
>>>>>
>>>>> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
>>>>> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove
>>>>>
>>>>> And it can be recreated with PCI devices other than NVME
>>>>> controller, such as SCSI controller or VGA controller. Then
>>>>> this is not a storage sub-system issue.
>>>>>
>>>>> I checked function call stacks of the two commands above. As
>>>>> shown below, it looks like ABBA deadlock possibility is
>>>>> detected and warned.
>>>>
>>>> Yeah, I was mistaken on this report, so my proposal to suppress
>>>> the warning is definitely not right. If I run both 'echo'
>>>> commands in parallel, I see it deadlock frequently. I'm not
>>>> familiar enough with this code to any good ideas on how to fix,
>>>> but I agree this is a generic pci issue.
>>>
>>> I think it is worth adding a testcase to blktests to make sure
>>> these future releases will test this.
>>
>> Yeah, this WARN is confusing for us then it would be valuable to
>> test by blktests not to repeat it. One point I wonder is: which test
>> group the test case will it fall in? The nvme group could be the
>> group to add, probably.
>>

Since this issue was discovered with nvme rescan and remove, it should be
added to the nvme category.

>> Another point I wonder is other kernel test suite than blktests.
>> Don't we have more appropriate test suite to check PCI device
>> rescan/remove race ? Such a test sounds more like a PCI bus
>> sub-system test than block/storage test.

I don't think so. We could have caught it a long time back, but we clearly
did not.

> 
> I'm not aware of such a test, but it would be nice to have one.
> 
> Can you share your qemu config so I can reproduce this locally?
> 
> Thanks for finding and reporting this!
> 
> Bjorn

-ck



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-15 22:01                     ` Chaitanya Kulkarni
@ 2022-06-15 23:13                       ` Yi Zhang
  2022-06-16  4:42                         ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Yi Zhang @ 2022-06-15 23:13 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Shinichiro Kawasaki, linux-block, Keith Busch, linux-nvme,
	linux-scsi, mstowe, linux-pci, Bjorn Helgaas

On Thu, Jun 16, 2022 at 6:01 AM Chaitanya Kulkarni
<chaitanyak@nvidia.com> wrote:
>
> On 6/15/22 12:47, Bjorn Helgaas wrote:
> > On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
> >> On Jun 14, 2022 / 02:38, Chaitanya Kulkarni wrote:
> >>> Shinichiro,
> >>>
> >>> On 6/13/22 19:23, Keith Busch wrote:
> >>>> On Tue, Jun 14, 2022 at 01:09:07AM +0000, Shinichiro Kawasaki wrote:
> >>>>> (CC+: linux-pci)
> >>>>> On Jun 11, 2022 / 16:34, Yi Zhang wrote:
> >>>>>> On Fri, Jun 10, 2022 at 10:49 PM Keith Busch <kbusch@kernel.org> wrote:
> >>>>>>>
> >>>>>>> And I am not even sure this is real. I don't know yet why
> >>>>>>> this is showing up only now, but this should fix it:
> >>>>>>
> >>>>>> Hi Keith
> >>>>>>
> >>>>>> Confirmed the WARNING issue was fixed with the change, here is
> >>>>>> the log:
> >>>>>
> >>>>> Thanks. I also confirmed that Keith's change to add
> >>>>> __ATTR_IGNORE_LOCKDEP to dev_attr_dev_rescan avoids the fix, on
> >>>>> v5.19-rc2.
> >>>>>
> >>>>> I took a closer look into this issue and found The deadlock
> >>>>> WARN can be recreated with following two commands:
> >>>>>
> >>>>> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan
> >>>>> # echo 1 > /sys/bus/pci/devices/0000\:00\:09.0/remove
> >>>>>
> >>>>> And it can be recreated with PCI devices other than NVME
> >>>>> controller, such as SCSI controller or VGA controller. Then
> >>>>> this is not a storage sub-system issue.
> >>>>>
> >>>>> I checked function call stacks of the two commands above. As
> >>>>> shown below, it looks like ABBA deadlock possibility is
> >>>>> detected and warned.
> >>>>
> >>>> Yeah, I was mistaken on this report, so my proposal to suppress
> >>>> the warning is definitely not right. If I run both 'echo'
> >>>> commands in parallel, I see it deadlock frequently. I'm not
> >>>> familiar enough with this code to any good ideas on how to fix,
> >>>> but I agree this is a generic pci issue.
> >>>
> >>> I think it is worth adding a testcase to blktests to make sure
> >>> these future releases will test this.
> >>
> >> Yeah, this WARN is confusing for us then it would be valuable to
> >> test by blktests not to repeat it. One point I wonder is: which test
> >> group the test case will it fall in? The nvme group could be the
> >> group to add, probably.
> >>
>
> since this issue been discovered with nvme rescan and revmoe,
> it should be added to the nvme category.

We already have nvme/032, which tests nvme rescan/reset/remove, and the issue
was reported by running that one; do we still need one more?

>
> >> Another point I wonder is other kernel test suite than blktests.
> >> Don't we have more appropriate test suite to check PCI device
> >> rescan/remove race ? Such a test sounds more like a PCI bus
> >> sub-system test than block/storage test.
>
> I don't think so we could have caught it long time back,
> but we clearly did not.
>
> >
> > I'm not aware of such a test, but it would be nice to have one.
> >
> > Can you share your qemu config so I can reproduce this locally?
> >
> > Thanks for finding and reporting this!
> >
> > Bjorn
>
> -ck
>
>


-- 
Best Regards,
  Yi Zhang


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-15 19:47                   ` Bjorn Helgaas
  2022-06-15 22:01                     ` Chaitanya Kulkarni
@ 2022-06-15 23:16                     ` Keith Busch
  2022-07-19  4:50                       ` Shinichiro Kawasaki
  1 sibling, 1 reply; 23+ messages in thread
From: Keith Busch @ 2022-06-15 23:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Shinichiro Kawasaki, Chaitanya Kulkarni, Yi Zhang, linux-block,
	linux-nvme, linux-scsi, mstowe, linux-pci

On Wed, Jun 15, 2022 at 02:47:27PM -0500, Bjorn Helgaas wrote:
> On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
> > 
> > Yeah, this WARN is confusing for us then it would be valuable to
> > test by blktests not to repeat it. One point I wonder is: which test
> > group the test case will it fall in? The nvme group could be the
> > group to add, probably.
> > 
> > Another point I wonder is other kernel test suite than blktests.
> > Don't we have more appropriate test suite to check PCI device
> > rescan/remove race ? Such a test sounds more like a PCI bus
> > sub-system test than block/storage test.
> 
> I'm not aware of such a test, but it would be nice to have one.
> 
> Can you share your qemu config so I can reproduce this locally?
> 
> Thanks for finding and reporting this!

Hi Bjorn,

This ought to be reproducible with any pci device that can be removed. Since
we initially observed it with nvme, you can try with such a device. A quick
way to get one appearing in qemu is to add these parameters:

        -drive id=n,if=none,file=null-co://,format=raw \
	-device nvme,serial=foobar,drive=n
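
For example, appended to an otherwise ordinary guest invocation (the disk
image, memory, and CPU options below are only placeholders for whatever you
normally boot with):

        # boot the usual guest, plus an emulated NVMe device backed by null-co
        qemu-system-x86_64 -m 2G -smp 2 \
                -drive file=guest.qcow2,if=virtio \
                -drive id=n,if=none,file=null-co://,format=raw \
                -device nvme,serial=foobar,drive=n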

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-15 23:13                       ` Yi Zhang
@ 2022-06-16  4:42                         ` Shinichiro Kawasaki
  2022-06-16 17:55                           ` Chaitanya Kulkarni
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2022-06-16  4:42 UTC (permalink / raw)
  To: Yi Zhang
  Cc: Chaitanya Kulkarni, linux-block, Keith Busch, linux-nvme,
	linux-scsi, mstowe, linux-pci, Bjorn Helgaas

On Jun 16, 2022 / 07:13, Yi Zhang wrote:
> On Thu, Jun 16, 2022 at 6:01 AM Chaitanya Kulkarni
> <chaitanyak@nvidia.com> wrote:
> >
> > On 6/15/22 12:47, Bjorn Helgaas wrote:
> > > On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
> > >> On Jun 14, 2022 / 02:38, Chaitanya Kulkarni wrote:
> > >>> Shinichiro,

[snip]

> > >>> I think it is worth adding a testcase to blktests to make sure
> > >>> these future releases will test this.
> > >>
> > >> Yeah, this WARN is confusing for us then it would be valuable to
> > >> test by blktests not to repeat it. One point I wonder is: which test
> > >> group the test case will it fall in? The nvme group could be the
> > >> group to add, probably.
> > >>
> >
> > since this issue been discovered with nvme rescan and revmoe,
> > it should be added to the nvme category.
> 
> We already have nvme/032 which tests nvme rescan/reset/remove and the
> issue was reported by running this one, do we still need one more?

That is a fair point. The current nvme/032 checks nvme pci adapter
rescan/reset/remove during I/O to catch problems in the nvme driver and the
block layer, but it can actually also catch problems in the pci sub-system.
I think Chaitanya's motivation for the new test case is to distinguish those
two.

If we add the new test case, its code will be similar to, and largely
duplicate, the nvme/032 code. To avoid such duplication, it would be good to
improve nvme/032 to have two steps, as sketched below. The 1st step checks
that nvme pci adapter rescan/reset/remove without I/O causes no kernel WARN
(or any other unexpected kernel messages); any issue found in this step is
reported as a pci sub-system issue. The 2nd step checks nvme pci adapter
rescan/reset/remove during I/O, as the current nvme/032 does. With this, we
don't need the new test case, but we can still distinguish problems in the
nvme/block sub-systems from problems in the pci sub-system.
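
A rough sketch of that two-step flow (this is not actual blktests code; the
PCI address, namespace name, and fio options are placeholders, and it assumes
the adapter exposes a "reset" sysfs attribute):

    pdev=/sys/bus/pci/devices/0000:00:09.0

    # Step 1: rescan/reset/remove with no I/O running; a WARN here points
    # at the pci sub-system rather than the nvme driver or block layer.
    echo 1 > "$pdev/rescan"
    echo 1 > "$pdev/reset"
    echo 1 > "$pdev/remove"
    echo 1 > /sys/bus/pci/rescan

    # Step 2: the same operations while fio runs against the namespace,
    # as the current nvme/032 does.
    fio --name=bg --filename=/dev/nvme0n1 --rw=randread --direct=1 \
        --time_based --runtime=10 &
    echo 1 > "$pdev/rescan"
    echo 1 > "$pdev/reset"
    echo 1 > "$pdev/remove"
    echo 1 > /sys/bus/pci/rescan
    wait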

> > >> Another point I wonder is other kernel test suite than blktests.
> > >> Don't we have more appropriate test suite to check PCI device
> > >> rescan/remove race ? Such a test sounds more like a PCI bus
> > >> sub-system test than block/storage test.
> >
> > I don't think so we could have caught it long time back,
> > but we clearly did not.

I see, then it looks like blktests is the test suite to test it with.

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-16  4:42                         ` Shinichiro Kawasaki
@ 2022-06-16 17:55                           ` Chaitanya Kulkarni
  0 siblings, 0 replies; 23+ messages in thread
From: Chaitanya Kulkarni @ 2022-06-16 17:55 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Yi Zhang
  Cc: linux-block, Keith Busch, linux-nvme, linux-scsi, mstowe,
	linux-pci, Bjorn Helgaas

On 6/15/2022 9:42 PM, Shinichiro Kawasaki wrote:
> On Jun 16, 2022 / 07:13, Yi Zhang wrote:
>> On Thu, Jun 16, 2022 at 6:01 AM Chaitanya Kulkarni
>> <chaitanyak@nvidia.com> wrote:
>>>
>>> On 6/15/22 12:47, Bjorn Helgaas wrote:
>>>> On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
>>>>> On Jun 14, 2022 / 02:38, Chaitanya Kulkarni wrote:
>>>>>> Shinichiro,
> 
> [snip]
> 
>>>>>> I think it is worth adding a testcase to blktests to make sure
>>>>>> these future releases will test this.
>>>>>
>>>>> Yeah, this WARN is confusing for us then it would be valuable to
>>>>> test by blktests not to repeat it. One point I wonder is: which test
>>>>> group the test case will it fall in? The nvme group could be the
>>>>> group to add, probably.
>>>>>
>>>
>>> since this issue been discovered with nvme rescan and revmoe,
>>> it should be added to the nvme category.
>>
>> We already have nvme/032 which tests nvme rescan/reset/remove and the
>> issue was reported by running this one, do we still need one more?
> 
> That is a point. Current nvme/032 checks nvme pci adapter rescan/reset/remove
> during I/O to catch problems in nvme driver and block layer, but actually it
> can catch the problem in pci sub-system also. I think Chaitanya's motivation
> for the new test case is to distinguish those two.
> 

Yes, exactly.

> If we have the new test case, its code will be similar and duplicated as
> nvme/032 code. To avoid such duplication, it would be good to improve nvme/032
> to have two steps. The 1st step checks that nvme pci adapter rescan/reset/remove
> without I/O causes no kernel WARN (or any other unexpected kernel messages). Any
> issue found in this step is reported as a pci sub-system issue. The 2nd step
> checks nvme pci adapter rescan/reset/remove during I/O, as the current nvme/032
> does. With this, we don't need the new test case, but still we can distinguish
> the problems in nvme/block sub-system and pci sub-system.
> 

Totally agree with this.

>>>>> Another point I wonder is other kernel test suite than blktests.
>>>>> Don't we have more appropriate test suite to check PCI device
>>>>> rescan/remove race ? Such a test sounds more like a PCI bus
>>>>> sub-system test than block/storage test.
>>>
>>> I don't think so we could have caught it long time back,
>>> but we clearly did not.
> 
> I see, then it looks that blktests is the test suite to test it.
> 

-ck



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-06-15 23:16                     ` Keith Busch
@ 2022-07-19  4:50                       ` Shinichiro Kawasaki
  2022-07-19 22:31                         ` Bjorn Helgaas
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2022-07-19  4:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Chaitanya Kulkarni, Yi Zhang, linux-block, linux-nvme,
	linux-scsi, mstowe, linux-pci, Xiao Yang, Keith Busch

On Jun 15, 2022 / 17:16, Keith Busch wrote:
> On Wed, Jun 15, 2022 at 02:47:27PM -0500, Bjorn Helgaas wrote:
> > On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
> > > 
> > > Yeah, this WARN is confusing for us then it would be valuable to
> > > test by blktests not to repeat it. One point I wonder is: which test
> > > group the test case will it fall in? The nvme group could be the
> > > group to add, probably.
> > > 
> > > Another point I wonder is other kernel test suite than blktests.
> > > Don't we have more appropriate test suite to check PCI device
> > > rescan/remove race ? Such a test sounds more like a PCI bus
> > > sub-system test than block/storage test.
> > 
> > I'm not aware of such a test, but it would be nice to have one.
> > 
> > Can you share your qemu config so I can reproduce this locally?
> > 
> > Thanks for finding and reporting this!
> 
> Hi Bjorn,
> 
> This ought to be reproducible with any pci device that can be removed. Since we
> initially observed with nvme, you can try with such a device. A quick way to
> get one appearing in qemu is to add parameters:
> 
>         -drive id=n,if=none,file=null-co://,format=raw \
> 	-device nvme,serial=foobar,drive=n

Hello Bjorn,

Did you have a chance to reproduce the WARN? Recently, it was reported
again [1] and is getting attention.

[1] https://lore.kernel.org/linux-block/4ed3028b-2d2f-8755-fec2-0b9cb5ad42d2@fujitsu.com/

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-07-19  4:50                       ` Shinichiro Kawasaki
@ 2022-07-19 22:31                         ` Bjorn Helgaas
  2022-07-20  2:27                           ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2022-07-19 22:31 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Chaitanya Kulkarni, Yi Zhang, linux-block, linux-nvme,
	linux-scsi, mstowe, linux-pci, Xiao Yang, Keith Busch

On Tue, Jul 19, 2022 at 04:50:36AM +0000, Shinichiro Kawasaki wrote:
> On Jun 15, 2022 / 17:16, Keith Busch wrote:
> > On Wed, Jun 15, 2022 at 02:47:27PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
> > > > 
> > > > Yeah, this WARN is confusing for us then it would be valuable to
> > > > test by blktests not to repeat it. One point I wonder is: which test
> > > > group the test case will it fall in? The nvme group could be the
> > > > group to add, probably.
> > > > 
> > > > Another point I wonder is other kernel test suite than blktests.
> > > > Don't we have more appropriate test suite to check PCI device
> > > > rescan/remove race ? Such a test sounds more like a PCI bus
> > > > sub-system test than block/storage test.
> > > 
> > > I'm not aware of such a test, but it would be nice to have one.
> > > 
> > > Can you share your qemu config so I can reproduce this locally?
> > > 
> > > Thanks for finding and reporting this!
> > 
> > This ought to be reproducible with any pci device that can be
> > removed. Since we initially observed with nvme, you can try with
> > such a device. A quick way to get one appearing in qemu is to add
> > parameters:
> > 
> >         -drive id=n,if=none,file=null-co://,format=raw \
> > 	-device nvme,serial=foobar,drive=n
> 
> Did you have chance to reproduce the WARN? Recently, it was reported
> again [1] and getting attention.

I have not paid any attention to this yet.  From what I can tell, the
problem was discovered by a test case (i.e., not reported by a
real-world user), it is not a recent regression, we haven't identified
a commit that introduced the problem, and we do not have a potential
fix for it.  Obviously it needs to be fixed and I'm not trying to
minimize the problem; I just want to calibrate it against everything
else.

> [1] https://lore.kernel.org/linux-block/4ed3028b-2d2f-8755-fec2-0b9cb5ad42d2@fujitsu.com/
> 
> -- 
> Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-07-19 22:31                         ` Bjorn Helgaas
@ 2022-07-20  2:27                           ` Shinichiro Kawasaki
  2022-12-19 11:27                             ` Shinichiro Kawasaki
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2022-07-20  2:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Chaitanya Kulkarni, Yi Zhang, linux-block, linux-nvme,
	linux-scsi, mstowe, linux-pci, Xiao Yang, Keith Busch

On Jul 19, 2022 / 17:31, Bjorn Helgaas wrote:
> On Tue, Jul 19, 2022 at 04:50:36AM +0000, Shinichiro Kawasaki wrote:
> > On Jun 15, 2022 / 17:16, Keith Busch wrote:
> > > On Wed, Jun 15, 2022 at 02:47:27PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Jun 14, 2022 at 04:00:45AM +0000, Shinichiro Kawasaki wrote:
> > > > > 
> > > > > Yeah, this WARN is confusing for us then it would be valuable to
> > > > > test by blktests not to repeat it. One point I wonder is: which test
> > > > > group the test case will it fall in? The nvme group could be the
> > > > > group to add, probably.
> > > > > 
> > > > > Another point I wonder is other kernel test suite than blktests.
> > > > > Don't we have more appropriate test suite to check PCI device
> > > > > rescan/remove race ? Such a test sounds more like a PCI bus
> > > > > sub-system test than block/storage test.
> > > > 
> > > > I'm not aware of such a test, but it would be nice to have one.
> > > > 
> > > > Can you share your qemu config so I can reproduce this locally?
> > > > 
> > > > Thanks for finding and reporting this!
> > > 
> > > This ought to be reproducible with any pci device that can be
> > > removed. Since we initially observed with nvme, you can try with
> > > such a device. A quick way to get one appearing in qemu is to add
> > > parameters:
> > > 
> > >         -drive id=n,if=none,file=null-co://,format=raw \
> > > 	-device nvme,serial=foobar,drive=n
> > 
> > Did you have chance to reproduce the WARN? Recently, it was reported
> > again [1] and getting attention.
> 
> I have not paid any attention to this yet.  From what I can tell, the
> problem was discovered by a test case (i.e., not reported by a
> real-world user), it is not a recent regression, we haven't identified
> a commit that introduced the problem, and we do not have a potential
> fix for it.  Obviously it needs to be fixed and I'm not trying to
> minimize the problem; I just want to calibrate it against everything
> else.

Bjorn, thank you for sharing your view, which sounds reasonable to me.

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-07-20  2:27                           ` Shinichiro Kawasaki
@ 2022-12-19 11:27                             ` Shinichiro Kawasaki
  2022-12-29 18:13                               ` Bjorn Helgaas
  0 siblings, 1 reply; 23+ messages in thread
From: Shinichiro Kawasaki @ 2022-12-19 11:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Chaitanya Kulkarni, Yi Zhang, linux-block, linux-nvme,
	linux-scsi, mstowe, linux-pci, Xiao Yang, Keith Busch

Bjorn, FYI. This lockdep warning disappeared with kernel version v6.1. I
bisected and found that the commit 2d7f9f8c1815 ("kernfs: Improve
kernfs_drain() and always call on removal.") avoided the issue. This commit
touches kernfs_drain() and __kernfs_remove(), and modifies the conditions
under which kernfs_rwsem is locked, which I think explains why the lockdep
warning disappeared. No longer any need to work on this issue :)

-- 
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: blktests failures with v5.19-rc1
  2022-12-19 11:27                             ` Shinichiro Kawasaki
@ 2022-12-29 18:13                               ` Bjorn Helgaas
  0 siblings, 0 replies; 23+ messages in thread
From: Bjorn Helgaas @ 2022-12-29 18:13 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Chaitanya Kulkarni, Yi Zhang, linux-block, linux-nvme,
	linux-scsi, mstowe, linux-pci, Xiao Yang, Keith Busch

On Mon, Dec 19, 2022 at 11:27:03AM +0000, Shinichiro Kawasaki wrote:
> Bjorn, FYI. This lockdep warning disappeared with kernel version
> v6.1. I bisected and found the commit 2d7f9f8c1815 ("kernfs: Improve
> kernfs_drain() and always call on removal.") avoided the issue. This
> commit touches kernfs_drain() and __kernfs_remove(), and modifies
> the condition to lock kernfs_rwsem. I think it explains why the
> lockdep disappeared. No longer need to work on this issue :)

Thanks for following up on this!

Bjorn

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-12-29 18:13 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-09 23:53 blktests failures with v5.19-rc1 Shinichiro Kawasaki
2022-06-10  8:07 ` Christoph Hellwig
2022-06-10  9:22 ` Chaitanya Kulkarni
2022-06-10  9:32   ` Chaitanya Kulkarni
2022-06-10 12:25     ` Shinichiro Kawasaki
2022-06-10 13:15       ` Yi Zhang
2022-06-10 14:47       ` Keith Busch
2022-06-11  8:34         ` Yi Zhang
2022-06-14  1:09           ` Shinichiro Kawasaki
2022-06-14  2:23             ` Keith Busch
2022-06-14  2:38               ` Chaitanya Kulkarni
2022-06-14  4:00                 ` Shinichiro Kawasaki
2022-06-15 19:47                   ` Bjorn Helgaas
2022-06-15 22:01                     ` Chaitanya Kulkarni
2022-06-15 23:13                       ` Yi Zhang
2022-06-16  4:42                         ` Shinichiro Kawasaki
2022-06-16 17:55                           ` Chaitanya Kulkarni
2022-06-15 23:16                     ` Keith Busch
2022-07-19  4:50                       ` Shinichiro Kawasaki
2022-07-19 22:31                         ` Bjorn Helgaas
2022-07-20  2:27                           ` Shinichiro Kawasaki
2022-12-19 11:27                             ` Shinichiro Kawasaki
2022-12-29 18:13                               ` Bjorn Helgaas
