From: Bart Van Assche <bvanassche@acm.org>
To: Bernard Metzler <BMT@zurich.ibm.com>,
	linux-rdma <linux-rdma@vger.kernel.org>
Cc: Yi Zhang <yi.zhang@redhat.com>,
	Robert Pearson <rpearsonhpe@gmail.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: Still issues with blktest/srp on 5.15-rc1 and software rdma providers
Date: Tue, 21 Sep 2021 13:16:28 -0700
Message-ID: <559a2433-da0d-2800-dd31-4d44e8fb558e@acm.org>
In-Reply-To: <OFE1CA20E9.CCEF92D5-ON00258757.006B589C-00258757.006E9A27@ibm.com>

On 9/21/21 1:08 PM, Bernard Metzler wrote:
> I further investigated srp blktest with software rdma
> drivers and I am still running into issues. These seem
> not to be specific to using rxe or siw driver, but happen
> with both occasionally. Can we run tests using hardware
> rdma drivers with that blktest tool as well?
> 
> 
> First I see some WARNINGs which relate to resources not
> created or unable to get destroyed (maybe since not created
> before):
> 
> ...
> 
> [ 1437.197989] sd 11:0:0:1: [sde] Attached SCSI disk
> [ 1437.845266] ------------[ cut here ]------------
> [ 1437.845269] WARNING: CPU: 3 PID: 26257 at block/genhd.c:537 device_add_disk+0x1cb/0x3b0
> ...
> [ 1437.845360] Call Trace:
> [ 1437.845363]  dm_setup_md_queue+0xc8/0x100
> [ 1437.845368]  table_load+0x1be/0x2d0
> [ 1437.845371]  ctl_ioctl+0x1d6/0x4c0
> [ 1437.845373]  ? retrieve_status+0x1d0/0x1d0
> [ 1437.845377]  dm_ctl_ioctl+0xe/0x20
> [ 1437.845379]  __x64_sys_ioctl+0x118/0x910
> [ 1437.845384]  ? switch_fpu_return+0x56/0xc0
> [ 1437.845388]  do_syscall_64+0x3a/0x80
> [ 1437.845391]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1437.845395] RIP: 0033:0x7f81419dbb97
> [ 1437.845398] Code: 00 00 90 48 8b 05 09 73 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 72 2c 00 f7 d8 64 89 01 48
> [ 1437.845400] RSP: 002b:00007f814363b508 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> [ 1437.845402] RAX: ffffffffffffffda RBX: 00007f81423b8d60 RCX: 00007f81419dbb97
> [ 1437.845403] RDX: 00007f812c026c30 RSI: 00000000c138fd09 RDI: 0000000000000009
> [ 1437.845403] RBP: 00007f81423f38b3 R08: 00007f8143639260 R09: 00007f81426018f8
> [ 1437.845404] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f812c026c30
> [ 1437.845405] R13: 0000000000000000 R14: 00007f812c026ce0 R15: 00007f812c00adc0
> [ 1437.845407] ---[ end trace c416dea93915334e ]---
> 
> ...
> 
> [ 1437.845411] kobject_add_internal failed for dm (error: -2 parent: dm-2)
> [ 1437.845451] ------------[ cut here ]------------
> [ 1437.845451] WARNING: CPU: 3 PID: 26257 at block/genhd.c:564 del_gendisk+0x1a4/0x1e0
> ...
> [ 1437.845516] Call Trace:
> [ 1437.845517]  dm_setup_md_queue+0xef/0x100
> [ 1437.845520]  table_load+0x1be/0x2d0
> [ 1437.845522]  ctl_ioctl+0x1d6/0x4c0
> [ 1437.845523]  ? retrieve_status+0x1d0/0x1d0
> [ 1437.845527]  dm_ctl_ioctl+0xe/0x20
> [ 1437.845528]  __x64_sys_ioctl+0x118/0x910
> [ 1437.845531]  ? switch_fpu_return+0x56/0xc0
> [ 1437.845533]  do_syscall_64+0x3a/0x80
> [ 1437.845535]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1437.845537] RIP: 0033:0x7f81419dbb97
> [ 1437.845538] Code: 00 00 90 48 8b 05 09 73 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 72 2c 00 f7 d8 64 89 01 48
> [ 1437.845540] RSP: 002b:00007f814363b508 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> [ 1437.845542] RAX: ffffffffffffffda RBX: 00007f81423b8d60 RCX: 00007f81419dbb97
> [ 1437.845543] RDX: 00007f812c026c30 RSI: 00000000c138fd09 RDI: 0000000000000009
> [ 1437.845544] RBP: 00007f81423f38b3 R08: 00007f8143639260 R09: 00007f81426018f8
> [ 1437.845545] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f812c026c30
> [ 1437.845546] R13: 0000000000000000 R14: 00007f812c026ce0 R15: 00007f812c00adc0
> [ 1437.845547] ---[ end trace c416dea93915334f ]---
> 
> 
> 
> ...
> [ 1437.845552] ------------[ cut here ]------------
> [ 1437.845553] kernfs: can not remove 'sdc', no directory
> [ 1437.845557] WARNING: CPU: 3 PID: 26257 at fs/kernfs/dir.c:1524 kernfs_remove_by_name_ns+0x88/0xa0
> [ 1437.845562] Modules linked in:
> ...
> [ 1437.845619] Call Trace:
> [ 1437.845620]  sysfs_remove_link+0x19/0x30
> [ 1437.845623]  bd_unlink_disk_holder+0x6d/0xd0
> [ 1437.845627]  dm_put_table_device+0x62/0xe0
> [ 1437.845629]  dm_put_device+0x88/0xe0
> [ 1437.845631]  ? dm_put_path_selector+0x40/0x50 [dm_multipath]
> [ 1437.845635]  free_priority_group+0x8e/0xc0 [dm_multipath]
> [ 1437.845638]  free_multipath+0x78/0xb0 [dm_multipath]
> [ 1437.845640]  multipath_dtr+0x2a/0x30 [dm_multipath]
> [ 1437.845642]  dm_table_destroy+0x67/0x130
> [ 1437.845645]  table_load+0x110/0x2d0
> [ 1437.845647]  ctl_ioctl+0x1d6/0x4c0
> [ 1437.845648]  ? retrieve_status+0x1d0/0x1d0
> [ 1437.845651]  dm_ctl_ioctl+0xe/0x20
> [ 1437.845653]  __x64_sys_ioctl+0x118/0x910
> [ 1437.845655]  ? switch_fpu_return+0x56/0xc0
> [ 1437.845657]  do_syscall_64+0x3a/0x80
> [ 1437.845659]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1437.845662] RIP: 0033:0x7f81419dbb97
> [ 1437.845663] Code: 00 00 90 48 8b 05 09 73 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 72 2c 00 f7 d8 64 89 01 48
> [ 1437.845664] RSP: 002b:00007f814363b508 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> [ 1437.845665] RAX: ffffffffffffffda RBX: 00007f81423b8d60 RCX: 00007f81419dbb97
> [ 1437.845666] RDX: 00007f812c026c30 RSI: 00000000c138fd09 RDI: 0000000000000009
> [ 1437.845667] RBP: 00007f81423f38b3 R08: 00007f8143639260 R09: 00007f81426018f8
> [ 1437.845668] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f812c026c30
> [ 1437.845669] R13: 0000000000000000 R14: 00007f812c026ce0 R15: 00007f812c00adc0
> [ 1437.845670] ---[ end trace c416dea939153350 ]---
> 
> 
> 
> and a final Oops close to blk_mq_free_rqs:
> 
> [ 1438.976875] scsi 11:0:0:1: alua: Detached
> [ 1438.980927] BUG: unable to handle page fault for address: ffffffffc0d83160
> [ 1438.980960] #PF: supervisor read access in kernel mode
> [ 1438.980978] #PF: error_code(0x0000) - not-present page
> [ 1438.980995] PGD 15f60e067 P4D 15f60e067 PUD 15f610067 PMD 1bc2e3067 PTE 0
> [ 1438.981019] Oops: 0000 [#1] SMP PTI
> [ 1438.981033] CPU: 3 PID: 26257 Comm: multipathd Tainted: G        W         5.15.0-rc1+ #1
> [ 1438.981059] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
> [ 1438.981088] RIP: 0010:scsi_mq_exit_request+0x18/0x50
> [ 1438.981107] Code: 00 00 e8 5b 14 76 00 5d c3 e8 e4 cb e1 ff 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 8b 7f 60 48 89 f3 48 8b 87 98 00 00 00 <48> 8b 40 40 48 85 c0 74 0c 48 8d b6 10 01 00 00 e8 23 14 76 00 48
> [ 1438.981160] RSP: 0018:ffffa289c0447b38 EFLAGS: 00010286
> [ 1438.981178] RAX: ffffffffc0d83120 RBX: ffff975354360000 RCX: 0000000000000000
> [ 1438.981201] RDX: 0000000000000000 RSI: ffff975354360000 RDI: ffff97534cfd1000
> [ 1438.981223] RBP: ffffa289c0447b40 R08: 0000000000009c6b R09: 0000000000009c6b
> [ 1438.981245] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
> [ 1438.981266] R13: ffff97534a34a240 R14: 0000000000000000 R15: 0000000000000000
> [ 1438.981288] FS:  00007f814363d700(0000) GS:ffff975357780000(0000) knlGS:0000000000000000
> [ 1438.981313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1438.981331] CR2: ffffffffc0d83160 CR3: 00000001b7fcc006 CR4: 00000000001706e0
> [ 1438.981354] Call Trace:
> [ 1438.981365]  blk_mq_free_rqs+0x5f/0x1b0
> [ 1438.981381]  blk_mq_free_map_and_requests+0x37/0x70
> [ 1438.981398]  blk_mq_free_tag_set+0x27/0x90
> [ 1438.981413]  scsi_mq_destroy_tags+0x15/0x20
> [ 1438.981429]  scsi_host_dev_release+0x8b/0xf0
> [ 1438.981445]  device_release+0x38/0x90
> [ 1438.981459]  kobject_put+0x87/0x190
> [ 1438.981475]  put_device+0x13/0x20
> [ 1438.981488]  scsi_target_dev_release+0x1f/0x30
> [ 1438.981504]  device_release+0x38/0x90
> [ 1438.981518]  kobject_put+0x87/0x190
> [ 1438.981532]  put_device+0x13/0x20
> [ 1438.981544]  scsi_device_dev_release_usercontext+0x2a0/0x2b0
> [ 1438.981565]  execute_in_process_context+0x25/0x70
> [ 1438.981583]  scsi_device_dev_release+0x1c/0x20
> [ 1438.981600]  device_release+0x38/0x90
> [ 1438.981613]  kobject_put+0x87/0x190
> [ 1438.981627]  put_device+0x13/0x20
> [ 1438.981639]  scsi_device_put+0x2c/0x30
> [ 1438.981653]  scsi_disk_put+0x30/0x50
> [ 1438.981668]  sd_release+0x37/0xb0
> [ 1438.981681]  blkdev_put_whole+0x30/0x50
> [ 1438.981696]  blkdev_put+0x92/0x150
> [ 1438.981710]  blkdev_close+0x27/0x30
> [ 1438.981723]  __fput+0x8b/0x240
> [ 1438.981736]  ____fput+0xe/0x10
> [ 1438.981748]  task_work_run+0x74/0xb0
> [ 1438.981762]  exit_to_user_mode_prepare+0x14e/0x150
> [ 1438.981782]  syscall_exit_to_user_mode+0x16/0x30
> [ 1438.981799]  do_syscall_64+0x46/0x80
> [ 1438.981813]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1438.981831] RIP: 0033:0x7f8142613c47
> [ 1438.981845] Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f c3 66 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 c4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 06 fc ff ff 8b 44 24
> [ 1438.981897] RSP: 002b:00007f814363b840 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> [ 1438.981920] RAX: 0000000000000000 RBX: 000000000000000a RCX: 00007f8142613c47
> [ 1438.981942] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000000a
> [ 1438.981964] RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000007
> [ 1438.981986] R10: 0000000000000000 R11: 0000000000000293 R12: 0000564949b25700
> [ 1438.982007] R13: 00007f81432a1ccf R14: 00007f812c02c710 R15: 00007f812c02c710
> [ 1438.983180] Modules linked in: ib_srpt target_core_iblock target_core_mod scsi_debug rdma_rxe ip6_udp_tunnel udp_tunnel null_blk dm_service_time configs bridge stp llc nf_nat_ftp nf_conntrack_ftp xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ib_iser ip_set nfnetlink libiscsi ebtable_nat ebtable_broute scsi_transport_iscsi ip6table_mangle ip6table_raw ip6table_security iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6table_nat ip6_tables iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rpcrdma sunrpc ib_ipoib rdma_ucm ib_umad dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua iw_cxgb4 libcxgb intel_rapl_msr intel_rapl_common ib_uverbs x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel rdma_cm iw_cm kvm ib_cm ib_core snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg irqbypass snd_hda_codec crc32_pclmul rapl snd_hwdep snd_hda_core intel_cstate intel_uncore
> [ 1438.983224]  snd_pcm snd_timer iTCO_wdt mei_me snd iTCO_vendor_support mxm_wmi mei soundcore i2c_i801 i2c_smbus lpc_ich wmi xfs i915 i2c_algo_bit ttm drm_kms_helper firewire_ohci firewire_core syscopyarea sysfillrect cxgb4 crc_itu_t sysimgblt fb_sys_fops tg3 drm ptp crc32c_intel csiostor scsi_transport_fc pps_core video [last unloaded: scsi_transport_srp]
> [ 1438.992637] CR2: ffffffffc0d83160
> [ 1438.994057] ---[ end trace c416dea939153351 ]---
> [ 1438.995476] RIP: 0010:scsi_mq_exit_request+0x18/0x50
> [ 1438.996905] Code: 00 00 e8 5b 14 76 00 5d c3 e8 e4 cb e1 ff 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 8b 7f 60 48 89 f3 48 8b 87 98 00 00 00 <48> 8b 40 40 48 85 c0 74 0c 48 8d b6 10 01 00 00 e8 23 14 76 00 48
> [ 1438.998414] RSP: 0018:ffffa289c0447b38 EFLAGS: 00010286
> [ 1438.999954] RAX: ffffffffc0d83120 RBX: ffff975354360000 RCX: 0000000000000000
> [ 1439.001513] RDX: 0000000000000000 RSI: ffff975354360000 RDI: ffff97534cfd1000
> [ 1439.003079] RBP: ffffa289c0447b40 R08: 0000000000009c6b R09: 0000000000009c6b
> [ 1439.004652] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
> [ 1439.006218] R13: ffff97534a34a240 R14: 0000000000000000 R15: 0000000000000000
> [ 1439.007777] FS:  00007f814363d700(0000) GS:ffff975357780000(0000) knlGS:0000000000000000
> [ 1439.009340] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1439.010906] CR2: ffffffffc0d83160 CR3: 00000001b7fcc006 CR4: 00000000001706e0

(+linux-block)

Hi Bernard,

If I remember correctly, all tests from the blktests suite pass on my test
setup with kernel v5.13. I think the above call traces are regressions in
the block layer that were introduced during the kernel v5.15 merge window.

Bart.

