* kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
       [not found] <1832491330.31443919.1488709276951.JavaMail.zimbra@redhat.com>
@ 2017-03-05 13:39     ` Yi Zhang
  0 siblings, 0 replies; 18+ messages in thread
From: Yi Zhang @ 2017-03-05 13:39 UTC (permalink / raw)
  To: linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hch-jcswGhMUV9g, sagi-NQWnxTmZq1alnMjI0IkVqw

Hi experts,

If I offline one CPU on the initiator side and then run 'nvmetcli clear' on the target side, it causes a kernel NULL pointer dereference on the initiator. Could you help check it? Thanks.

Steps to reproduce:
1. Set up the nvmet target with a null_blk device:
#modprobe nvmet
#modprobe nvmet-rdma
#modprobe null_blk nr_devices=1
#nvmetcli restore rdma.json

2. Connect to the target on the initiator side and offline one CPU:
#modprobe nvme-rdma
#nvme connect-all -t rdma -a 172.31.2.3 -s 1023
#echo 0 > /sys/devices/system/cpu/cpu1/online

3. Run 'nvmetcli clear' on the target side:
#nvmetcli clear

Kernel log:

[  125.039340] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.2.3:1023
[  125.160587] nvme nvme0: creating 16 I/O queues.
[  125.602244] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.2.3:1023
[  140.930343] Broke affinity for irq 16
[  140.950295] Broke affinity for irq 28
[  140.969957] Broke affinity for irq 70
[  140.986584] Broke affinity for irq 90
[  141.003160] Broke affinity for irq 93
[  141.019779] Broke affinity for irq 97
[  141.036341] Broke affinity for irq 100
[  141.053782] Broke affinity for irq 104
[  141.072860] smpboot: CPU 1 is now offline
[  154.768104] nvme nvme0: reconnecting in 10 seconds
[  165.349689] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  165.387783] IP: blk_mq_reinit_tagset+0x35/0x80
[  165.409550] PGD 0 
[  165.409550] 
[  165.427269] Oops: 0000 [#1] SMP
[  165.442876] Modules linked in: nvme_rdma nvme_fabrics nvme_core xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core intel_rapl ipmi_ssif sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf iTCO_wdt ipmi_si iTCO_vendor_support wmi hpwdt pcspkr sg ipmi_devintf hpilo
[  165.769732]  acpi_power_meter ipmi_msghandler ioatdma shpchp acpi_cpufreq lpc_ich dca nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs mlx4_en sr_mod sd_mod cdrom mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt ata_generic fb_sys_fops pata_acpi ttm bnx2x drm e1000e ata_piix mdio ptp mlx4_core i2c_core serio_raw libata pps_core hpsa libcrc32c devlink fjes scsi_transport_sas crc32c_intel dm_mirror dm_region_hash dm_log dm_mod
[  165.957288] CPU: 6 PID: 424 Comm: kworker/6:2 Not tainted 4.10.0+ #3
[  165.985856] Hardware name: HP ProLiant DL388p Gen8, BIOS P70 12/20/2013
[  166.015576] Workqueue: nvme_rdma_wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[  166.047813] task: ffff8804291f9680 task.stack: ffffc90004fa4000
[  166.074543] RIP: 0010:blk_mq_reinit_tagset+0x35/0x80
[  166.096784] RSP: 0018:ffffc90004fa7e00 EFLAGS: 00010246
[  166.120205] RAX: ffff88082a97f600 RBX: 0000000000000000 RCX: 000000018020001a
[  166.152099] RDX: 0000000000000001 RSI: ffff88042c1b5240 RDI: ffff88042c163680
[  166.183997] RBP: ffffc90004fa7e20 R08: ffff88042c388400 R09: 000000018020001a
[  166.216018] R10: 000000002c388801 R11: ffff88042c388400 R12: 0000000000000000
[  166.248248] R13: 0000000000000001 R14: ffff8804be65d018 R15: 0000000000000180
[  166.280594] FS:  0000000000000000(0000) GS:ffff88042f780000(0000) knlGS:0000000000000000
[  166.317022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  166.342821] CR2: 0000000000000000 CR3: 0000000001c09000 CR4: 00000000000406e0
[  166.374899] Call Trace:
[  166.385854]  nvme_rdma_reconnect_ctrl_work+0x60/0x1f0 [nvme_rdma]
[  166.414954]  process_one_work+0x165/0x410
[  166.434888]  worker_thread+0x137/0x4c0
[  166.453275]  kthread+0x101/0x140
[  166.469530]  ? rescuer_thread+0x3b0/0x3b0
[  166.487549]  ? kthread_park+0x90/0x90
[  166.503966]  ret_from_fork+0x2c/0x40
[  166.520071] Code: 56 49 89 fe 41 55 41 54 53 48 8b 47 08 48 83 78 40 00 74 55 8b 57 10 85 d2 74 4e 45 31 ed 49 8b 46 38 49 63 d5 31 db 4c 8b 24 d0 <41> 8b 04 24 85 c0 74 2c 49 8b 84 24 80 00 00 00 48 63 d3 48 8b 
[  166.605127] RIP: blk_mq_reinit_tagset+0x35/0x80 RSP: ffffc90004fa7e00
[  166.634093] CR2: 0000000000000000
[  166.648963] ---[ end trace cabb6f7f7f9f7187 ]---
[  166.674180] Kernel panic - not syncing: Fatal exception
[  166.697717] Kernel Offset: disabled
[  166.717719] ---[ end Kernel panic - not syncing: Fatal exception
[  166.746440] ------------[ cut here ]------------
[  166.767150] WARNING: CPU: 6 PID: 424 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x3f/0x50
[  166.808742] Modules linked in: nvme_rdma nvme_fabrics nvme_core xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core intel_rapl ipmi_ssif sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf iTCO_wdt ipmi_si iTCO_vendor_support wmi hpwdt pcspkr sg ipmi_devintf hpilo
[  167.131981]  acpi_power_meter ipmi_msghandler ioatdma shpchp acpi_cpufreq lpc_ich dca nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs mlx4_en sr_mod sd_mod cdrom mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt ata_generic fb_sys_fops pata_acpi ttm bnx2x drm e1000e ata_piix mdio ptp mlx4_core i2c_core serio_raw libata pps_core hpsa libcrc32c devlink fjes scsi_transport_sas crc32c_intel dm_mirror dm_region_hash dm_log dm_mod
[  167.315426] CPU: 6 PID: 424 Comm: kworker/6:2 Tainted: G      D         4.10.0+ #3
[  167.349430] Hardware name: HP ProLiant DL388p Gen8, BIOS P70 12/20/2013
[  167.379147] Workqueue: nvme_rdma_wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[  167.411437] Call Trace:
[  167.422486]  <IRQ>
[  167.432587]  dump_stack+0x63/0x87
[  167.449042]  __warn+0xd1/0xf0
[  167.463891]  warn_slowpath_null+0x1d/0x20
[  167.483697]  native_smp_send_reschedule+0x3f/0x50
[  167.506498]  resched_curr+0xa1/0xc0
[  167.522992]  check_preempt_curr+0x70/0x90
[  167.541625]  ttwu_do_wakeup+0x19/0xe0
[  167.559098]  ttwu_do_activate+0x6f/0x80
[  167.577357]  try_to_wake_up+0x1aa/0x3b0
[  167.594742]  ? select_idle_sibling+0x2c/0x3d0
[  167.614498]  default_wake_function+0x12/0x20
[  167.633655]  __wake_up_common+0x55/0x90
[  167.650534]  __wake_up_locked+0x13/0x20
[  167.667784]  ep_poll_callback+0xbb/0x240
[  167.685405]  __wake_up_common+0x55/0x90
[  167.702615]  __wake_up+0x39/0x50
[  167.717046]  wake_up_klogd_work_func+0x40/0x60
[  167.736993]  irq_work_run_list+0x4d/0x70
[  167.755647]  ? tick_sched_do_timer+0x70/0x70
[  167.776239]  irq_work_tick+0x40/0x50
[  167.792914]  update_process_times+0x42/0x60
[  167.812138]  tick_sched_handle.isra.18+0x25/0x60
[  167.833794]  tick_sched_timer+0x3d/0x70
[  167.851391]  __hrtimer_run_queues+0xf3/0x280
[  167.871180]  hrtimer_interrupt+0xa8/0x1a0
[  167.889854]  local_apic_timer_interrupt+0x35/0x60
[  167.912036]  smp_apic_timer_interrupt+0x38/0x50
[  167.933375]  apic_timer_interrupt+0x93/0xa0
[  167.954586] RIP: 0010:panic+0x1f5/0x239
[  167.974032] RSP: 0018:ffffc90004fa7b50 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[  168.009365] RAX: 0000000000000034 RBX: 0000000000000000 RCX: 0000000000000006
[  168.041566] RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffff88042f78e000
[  168.073801] RBP: ffffc90004fa7bc0 R08: 00000000fffffffe R09: 00000000000004d9
[  168.105833] R10: 0000000000000005 R11: 00000000000004d8 R12: ffffffff81a0e2e1
[  168.137892] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000046
[  168.170234]  </IRQ>
[  168.179603]  oops_end+0xb8/0xd0
[  168.193685]  no_context+0x19e/0x3f0
[  168.209369]  ? lock_timer_base+0xa0/0xa0
[  168.227067]  __bad_area_nosemaphore+0xee/0x1d0
[  168.246978]  bad_area_nosemaphore+0x14/0x20
[  168.266108]  __do_page_fault+0x89/0x4a0
[  168.283345]  ? __slab_free+0x9b/0x2c0
[  168.299742]  do_page_fault+0x30/0x80
[  168.315903]  page_fault+0x28/0x30
[  168.330741] RIP: 0010:blk_mq_reinit_tagset+0x35/0x80
[  168.353028] RSP: 0018:ffffc90004fa7e00 EFLAGS: 00010246
[  168.376493] RAX: ffff88082a97f600 RBX: 0000000000000000 RCX: 000000018020001a
[  168.408373] RDX: 0000000000000001 RSI: ffff88042c1b5240 RDI: ffff88042c163680
[  168.440447] RBP: ffffc90004fa7e20 R08: ffff88042c388400 R09: 000000018020001a
[  168.476491] R10: 000000002c388801 R11: ffff88042c388400 R12: 0000000000000000
[  168.510913] R13: 0000000000000001 R14: ffff8804be65d018 R15: 0000000000000180
[  168.543964]  nvme_rdma_reconnect_ctrl_work+0x60/0x1f0 [nvme_rdma]
[  168.571458]  process_one_work+0x165/0x410
[  168.589496]  worker_thread+0x137/0x4c0
[  168.606267]  kthread+0x101/0x140
[  168.620712]  ? rescuer_thread+0x3b0/0x3b0
[  168.638747]  ? kthread_park+0x90/0x90
[  168.655224]  ret_from_fork+0x2c/0x40
[  168.671278] ---[ end trace cabb6f7f7f9f7188 ]---

Best Regards,
  Yi Zhang

* Re: kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
  2017-03-05 13:39     ` Yi Zhang
@ 2017-03-06 11:25         ` Sagi Grimberg
  -1 siblings, 0 replies; 18+ messages in thread
From: Sagi Grimberg @ 2017-03-06 11:25 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hch-jcswGhMUV9g


> Hi experts,
>
> If I offline one CPU on the initiator side and then run 'nvmetcli clear' on the target side, it causes a kernel NULL pointer dereference on the initiator. Could you help check it? Thanks.
>
> Steps to reproduce:
> 1. setup nvmet target with null-blk device:
> #modprobe nvmet
> #modprobe nvmet-rdma
> #modprobe null_blk nr_devices=1
> #nvmetcli restore rdma.json
>
> 2. connect the target on initiator side and offline one cpu:
> #modprobe nvme-rdma
> #nvme connect-all -t rdma -a 172.31.2.3 -s 1023
> #echo 0 > /sys/devices/system/cpu/cpu1/online
>
> 3. nvmetcli clear on target side
> #nvmetcli clear
>
> Kernel log:
>
> [  125.039340] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.2.3:1023
> [  125.160587] nvme nvme0: creating 16 I/O queues.
> [  125.602244] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.2.3:1023
> [  140.930343] Broke affinity for irq 16
> [  140.950295] Broke affinity for irq 28
> [  140.969957] Broke affinity for irq 70
> [  140.986584] Broke affinity for irq 90
> [  141.003160] Broke affinity for irq 93
> [  141.019779] Broke affinity for irq 97
> [  141.036341] Broke affinity for irq 100
> [  141.053782] Broke affinity for irq 104
> [  141.072860] smpboot: CPU 1 is now offline
> [  154.768104] nvme nvme0: reconnecting in 10 seconds
> [  165.349689] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [  165.387783] IP: blk_mq_reinit_tagset+0x35/0x80

Looks like blk_mq_reinit_tagset is not aware that tags can go away with
cpu hotplug...

Does this fix your issue:
--
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index e48bc2c72615..9d97bfc4d465 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -295,6 +295,9 @@ int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
         for (i = 0; i < set->nr_hw_queues; i++) {
                 struct blk_mq_tags *tags = set->tags[i];

+               if (!tags)
+                       continue;
+
                 for (j = 0; j < tags->nr_tags; j++) {
                         if (!tags->static_rqs[j])
                                 continue;
--
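
For context, with that check applied the whole loop in blk_mq_reinit_tagset()
reads roughly as sketched below. This is a reconstruction from the hunk's
context lines plus memory of the 4.10-era blk-mq code (the reinit_request
callback call in particular is approximate), not a verbatim copy of the tree:

--
int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
{
	int i, j, ret = 0;

	if (!set->ops->reinit_request)
		goto out;

	for (i = 0; i < set->nr_hw_queues; i++) {
		struct blk_mq_tags *tags = set->tags[i];

		/*
		 * When a hardware queue loses all of its mapped CPUs (as
		 * happens here with CPU hotplug), blk-mq frees its tags and
		 * leaves set->tags[i] NULL; reading tags->nr_tags below is
		 * the NULL dereference reported above, so skip such queues.
		 */
		if (!tags)
			continue;

		for (j = 0; j < tags->nr_tags; j++) {
			if (!tags->static_rqs[j])
				continue;

			/* let the driver re-initialize its per-request data */
			ret = set->ops->reinit_request(set->driver_data,
						       tags->static_rqs[j]);
			if (ret)
				goto out;
		}
	}
out:
	return ret;
}
--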

* Re: kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
  2017-03-06 11:25         ` Sagi Grimberg
@ 2017-03-09  4:02             ` Yi Zhang
  -1 siblings, 0 replies; 18+ messages in thread
From: Yi Zhang @ 2017-03-09  4:02 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hch-jcswGhMUV9g

> Looks like blk_mq_reinit_tagset is not aware that tags can go away with
> cpu hotplug...
>
> Does this fix your issue:
> -- 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index e48bc2c72615..9d97bfc4d465 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -295,6 +295,9 @@ int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
>         for (i = 0; i < set->nr_hw_queues; i++) {
>                 struct blk_mq_tags *tags = set->tags[i];
>
> +               if (!tags)
> +                       continue;
> +
>                 for (j = 0; j < tags->nr_tags; j++) {
>                         if (!tags->static_rqs[j])
>                                 continue;
> -- 
Hi Sagi,
With this patch, the NULL pointer dereference is fixed now.
But from the log below, we can see it keeps trying to reconnect every 10
seconds and cannot be stopped.

[36288.963890] Broke affinity for irq 16
[36288.983090] Broke affinity for irq 28
[36289.003104] Broke affinity for irq 90
[36289.020488] Broke affinity for irq 93
[36289.036911] Broke affinity for irq 97
[36289.053344] Broke affinity for irq 100
[36289.070166] Broke affinity for irq 104
[36289.088076] smpboot: CPU 1 is now offline
[36302.371160] nvme nvme0: reconnecting in 10 seconds
[36312.953684] blk_mq_reinit_tagset: tag is null, continue
[36312.983267] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36313.017290] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36313.044937] nvme nvme0: Failed reconnect attempt, requeueing...
[36323.171983] blk_mq_reinit_tagset: tag is null, continue
[36323.200733] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36323.233820] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36323.261027] nvme nvme0: Failed reconnect attempt, requeueing...
[36333.412341] blk_mq_reinit_tagset: tag is null, continue
[36333.441346] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36333.476139] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36333.502794] nvme nvme0: Failed reconnect attempt, requeueing...
[36343.652755] blk_mq_reinit_tagset: tag is null, continue
[36343.682103] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36343.716645] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36343.743581] nvme nvme0: Failed reconnect attempt, requeueing...
[36353.893103] blk_mq_reinit_tagset: tag is null, continue
[36353.921041] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36353.953541] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36353.983528] nvme nvme0: Failed reconnect attempt, requeueing...
[36364.133544] blk_mq_reinit_tagset: tag is null, continue
[36364.162012] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36364.195002] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36364.221671] nvme nvme0: Failed reconnect attempt, requeueing...
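
For reference, the "blk_mq_reinit_tagset: tag is null, continue" lines above
are not printed by the hunk as posted; they suggest the test kernel carried
the same check with an extra debug print, along these lines (a hypothetical
reconstruction based only on the log text, not the code that was actually
run):

--
		if (!tags) {
			/* debug aid matching the dmesg line above */
			pr_info("blk_mq_reinit_tagset: tag is null, continue\n");
			continue;
		}
--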


* Re: kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
  2017-03-09  4:02             ` Yi Zhang
@ 2017-03-09 11:23                 ` Sagi Grimberg
  -1 siblings, 0 replies; 18+ messages in thread
From: Sagi Grimberg @ 2017-03-09 11:23 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hch-jcswGhMUV9g


> Hi Sagi,
> With this patch, the NULL pointer dereference is fixed now.
> But from the log below, we can see it keeps trying to reconnect every 10
> seconds and cannot be stopped.
>
> [36288.963890] Broke affinity for irq 16
> [36288.983090] Broke affinity for irq 28
> [36289.003104] Broke affinity for irq 90
> [36289.020488] Broke affinity for irq 93
> [36289.036911] Broke affinity for irq 97
> [36289.053344] Broke affinity for irq 100
> [36289.070166] Broke affinity for irq 104
> [36289.088076] smpboot: CPU 1 is now offline
> [36302.371160] nvme nvme0: reconnecting in 10 seconds
> [36312.953684] blk_mq_reinit_tagset: tag is null, continue
> [36312.983267] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36313.017290] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36313.044937] nvme nvme0: Failed reconnect attempt, requeueing...
> [36323.171983] blk_mq_reinit_tagset: tag is null, continue
> [36323.200733] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36323.233820] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36323.261027] nvme nvme0: Failed reconnect attempt, requeueing...
> [36333.412341] blk_mq_reinit_tagset: tag is null, continue
> [36333.441346] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36333.476139] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36333.502794] nvme nvme0: Failed reconnect attempt, requeueing...
> [36343.652755] blk_mq_reinit_tagset: tag is null, continue
> [36343.682103] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36343.716645] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36343.743581] nvme nvme0: Failed reconnect attempt, requeueing...
> [36353.893103] blk_mq_reinit_tagset: tag is null, continue
> [36353.921041] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36353.953541] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36353.983528] nvme nvme0: Failed reconnect attempt, requeueing...
> [36364.133544] blk_mq_reinit_tagset: tag is null, continue
> [36364.162012] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [36364.195002] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [36364.221671] nvme nvme0: Failed reconnect attempt, requeueing...
>

Yep... looks like we don't take into account that we can't use all the
queues now...

Does this patch help:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 29ac8fcb8d2c..25af3f75f6f1 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -337,8 +337,6 @@ static int __nvme_rdma_init_request(struct nvme_rdma_ctrl *ctrl,
         struct ib_device *ibdev = dev->dev;
         int ret;

-       BUG_ON(queue_idx >= ctrl->queue_count);
-
         ret = nvme_rdma_alloc_qe(ibdev, &req->sqe, sizeof(struct nvme_command),
                         DMA_TO_DEVICE);
         if (ret)
@@ -647,8 +645,22 @@ static int nvme_rdma_connect_io_queues(struct nvme_rdma_ctrl *ctrl)

  static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
  {
+       struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
+       unsigned int nr_io_queues;
         int i, ret;

+       nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
+       ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
+       if (ret)
+               return ret;
+
+       ctrl->queue_count = nr_io_queues + 1;
+       if (ctrl->queue_count < 2)
+               return 0;
+
+       dev_info(ctrl->ctrl.device,
+               "creating %d I/O queues.\n", nr_io_queues);
+
         for (i = 1; i < ctrl->queue_count; i++) {
                 ret = nvme_rdma_init_queue(ctrl, i,
                                            ctrl->ctrl.opts->queue_size);
@@ -1793,20 +1805,8 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {

  static int nvme_rdma_create_io_queues(struct nvme_rdma_ctrl *ctrl)
  {
-       struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
         int ret;

-       ret = nvme_set_queue_count(&ctrl->ctrl, &opts->nr_io_queues);
-       if (ret)
-               return ret;
-
-       ctrl->queue_count = opts->nr_io_queues + 1;
-       if (ctrl->queue_count < 2)
-               return 0;
-
-       dev_info(ctrl->ctrl.device,
-               "creating %d I/O queues.\n", opts->nr_io_queues);
-
--

* Re: kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
  2017-03-09 11:23                 ` Sagi Grimberg
@ 2017-03-10  7:59                     ` Yi Zhang
  -1 siblings, 0 replies; 18+ messages in thread
From: Yi Zhang @ 2017-03-10  7:59 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hch-jcswGhMUV9g


>>
>
> Yep... looks like we don't take into account that we can't use all the
> queues now...
>
> Does this patch help:
I can still reproduce the reconnect-every-10-seconds issue with the patch;
here is the log:

[  193.574183] nvme nvme0: new ctrl: NQN "nvme-subsystem-name", addr 172.31.2.3:1023
[  193.612039] __nvme_rdma_init_request: changing called
[  193.638723] __nvme_rdma_init_request: changing called
[  193.661767] __nvme_rdma_init_request: changing called
[  193.684579] __nvme_rdma_init_request: changing called
[  193.707327] __nvme_rdma_init_request: changing called
[  193.730071] __nvme_rdma_init_request: changing called
[  193.752896] __nvme_rdma_init_request: changing called
[  193.775699] __nvme_rdma_init_request: changing called
[  193.798813] __nvme_rdma_init_request: changing called
[  193.821257] __nvme_rdma_init_request: changing called
[  193.844090] __nvme_rdma_init_request: changing called
[  193.866472] __nvme_rdma_init_request: changing called
[  193.889375] __nvme_rdma_init_request: changing called
[  193.912094] __nvme_rdma_init_request: changing called
[  193.934942] __nvme_rdma_init_request: changing called
[  193.957688] __nvme_rdma_init_request: changing called
[  606.273376] Broke affinity for irq 16
[  606.291940] Broke affinity for irq 28
[  606.310201] Broke affinity for irq 90
[  606.328211] Broke affinity for irq 93
[  606.346263] Broke affinity for irq 97
[  606.364314] Broke affinity for irq 100
[  606.382105] Broke affinity for irq 104
[  606.400727] smpboot: CPU 1 is now offline
[  616.820505] nvme nvme0: reconnecting in 10 seconds
[  626.882747] blk_mq_reinit_tagset: tag is null, continue
[  626.914000] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[  626.947965] nvme nvme0: rdma_resolve_addr wait failed (-104).
[  626.974673] nvme nvme0: Failed reconnect attempt, requeueing...
[  637.100252] blk_mq_reinit_tagset: tag is null, continue
[  637.129200] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[  637.163578] nvme nvme0: rdma_resolve_addr wait failed (-104).
[  637.190246] nvme nvme0: Failed reconnect attempt, requeueing...
[  647.340147] blk_mq_reinit_tagset: tag is null, continue
[  647.367612] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[  647.402527] nvme nvme0: rdma_resolve_addr wait failed (-104).
[  647.430338] nvme nvme0: Failed reconnect attempt, requeueing...
[  657.579993] blk_mq_reinit_tagset: tag is null, continue
[  657.608478] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[  657.643947] nvme nvme0: rdma_resolve_addr wait failed (-104).
[  657.670579] nvme nvme0: Failed reconnect attempt, requeueing...
[  667.819897] blk_mq_reinit_tagset: tag is null, continue
[  667.848786] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[  667.881951] nvme nvme0: rdma_resolve_addr wait failed (-104).
[  667.908578] nvme nvme0: Failed reconnect attempt, requeueing...
[  678.059821] blk_mq_reinit_tagset: tag is null, continue
[  678.089295] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[  678.123602] nvme nvme0: rdma_resolve_addr wait failed (-104).
[  678.150317] nvme nvme0: Failed reconnect attempt, requeueing...
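
Similarly, the repeated "__nvme_rdma_init_request: changing called" lines at
connect time are not produced by the patch as posted; they look like a debug
print placed in __nvme_rdma_init_request() where the BUG_ON was removed,
roughly as below (again a guess based only on the log text, not the diff that
was actually tested):

--
	/* debug aid in place of the removed BUG_ON (hypothetical) */
	pr_info("__nvme_rdma_init_request: changing called\n");

	ret = nvme_rdma_alloc_qe(ibdev, &req->sqe, sizeof(struct nvme_command),
			DMA_TO_DEVICE);
--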


> -- 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 29ac8fcb8d2c..25af3f75f6f1 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -337,8 +337,6 @@ static int __nvme_rdma_init_request(struct 
> nvme_rdma_ctrl *ctrl,
>         struct ib_device *ibdev = dev->dev;
>         int ret;
>
> -       BUG_ON(queue_idx >= ctrl->queue_count);
> -
>         ret = nvme_rdma_alloc_qe(ibdev, &req->sqe, sizeof(struct 
> nvme_command),
>                         DMA_TO_DEVICE);
>         if (ret)
> @@ -647,8 +645,22 @@ static int nvme_rdma_connect_io_queues(struct 
> nvme_rdma_ctrl *ctrl)
>
>  static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
>  {
> +       struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
> +       unsigned int nr_io_queues;
>         int i, ret;
>
> +       nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> +       ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
> +       if (ret)
> +               return ret;
> +
> +       ctrl->queue_count = nr_io_queues + 1;
> +       if (ctrl->queue_count < 2)
> +               return 0;
> +
> +       dev_info(ctrl->ctrl.device,
> +               "creating %d I/O queues.\n", nr_io_queues);
> +
>         for (i = 1; i < ctrl->queue_count; i++) {
>                 ret = nvme_rdma_init_queue(ctrl, i,
> ctrl->ctrl.opts->queue_size);
> @@ -1793,20 +1805,8 @@ static const struct nvme_ctrl_ops 
> nvme_rdma_ctrl_ops = {
>
>  static int nvme_rdma_create_io_queues(struct nvme_rdma_ctrl *ctrl)
>  {
> -       struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>         int ret;
>
> -       ret = nvme_set_queue_count(&ctrl->ctrl, &opts->nr_io_queues);
> -       if (ret)
> -               return ret;
> -
> -       ctrl->queue_count = opts->nr_io_queues + 1;
> -       if (ctrl->queue_count < 2)
> -               return 0;
> -
> -       dev_info(ctrl->ctrl.device,
> -               "creating %d I/O queues.\n", opts->nr_io_queues);
> -
> -- 


* Re: kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
  2017-03-10  7:59                     ` Yi Zhang
@ 2017-03-13  8:09                         ` Sagi Grimberg
  -1 siblings, 0 replies; 18+ messages in thread
From: Sagi Grimberg @ 2017-03-13  8:09 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hch-jcswGhMUV9g


>> Yep... looks like we don't take into account that we can't use all the
>> queues now...
>>
>> Does this patch help:
> I can still reproduce the reconnect-every-10-seconds issue with the patch;
> here is the log:
>
> [  193.574183] nvme nvme0: new ctrl: NQN "nvme-subsystem-name", addr
> 172.31.2.3:1023
> [  193.612039] __nvme_rdma_init_request: changing called
> [  193.638723] __nvme_rdma_init_request: changing called
> [  193.661767] __nvme_rdma_init_request: changing called
> [  193.684579] __nvme_rdma_init_request: changing called
> [  193.707327] __nvme_rdma_init_request: changing called
> [  193.730071] __nvme_rdma_init_request: changing called
> [  193.752896] __nvme_rdma_init_request: changing called
> [  193.775699] __nvme_rdma_init_request: changing called
> [  193.798813] __nvme_rdma_init_request: changing called
> [  193.821257] __nvme_rdma_init_request: changing called
> [  193.844090] __nvme_rdma_init_request: changing called
> [  193.866472] __nvme_rdma_init_request: changing called
> [  193.889375] __nvme_rdma_init_request: changing called
> [  193.912094] __nvme_rdma_init_request: changing called
> [  193.934942] __nvme_rdma_init_request: changing called
> [  193.957688] __nvme_rdma_init_request: changing called
> [  606.273376] Broke affinity for irq 16
> [  606.291940] Broke affinity for irq 28
> [  606.310201] Broke affinity for irq 90
> [  606.328211] Broke affinity for irq 93
> [  606.346263] Broke affinity for irq 97
> [  606.364314] Broke affinity for irq 100
> [  606.382105] Broke affinity for irq 104
> [  606.400727] smpboot: CPU 1 is now offline
> [  616.820505] nvme nvme0: reconnecting in 10 seconds
> [  626.882747] blk_mq_reinit_tagset: tag is null, continue
> [  626.914000] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [  626.947965] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [  626.974673] nvme nvme0: Failed reconnect attempt, requeueing...

This is strange...

Is the target alive? I'm assuming it didn't crash here, correct?

* Re: kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
  2017-03-13  8:09                         ` Sagi Grimberg
@ 2017-03-14 13:27                             ` Yi Zhang
  -1 siblings, 0 replies; 18+ messages in thread
From: Yi Zhang @ 2017-03-14 13:27 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hch-jcswGhMUV9g



On 03/13/2017 04:09 PM, Sagi Grimberg wrote:
>
>>> Yep... looks like we don't take into account that we can't use all the
>>> queues now...
>>>
>>> Does this patch help:
>> Still can reproduce the reconnect in 10 seconds issues with the patch,
>> here is the log:
>>
>> [  193.574183] nvme nvme0: new ctrl: NQN "nvme-subsystem-name", addr
>> 172.31.2.3:1023
>> [  193.612039] __nvme_rdma_init_request: changing called
>> [  193.638723] __nvme_rdma_init_request: changing called
>> [  193.661767] __nvme_rdma_init_request: changing called
>> [  193.684579] __nvme_rdma_init_request: changing called
>> [  193.707327] __nvme_rdma_init_request: changing called
>> [  193.730071] __nvme_rdma_init_request: changing called
>> [  193.752896] __nvme_rdma_init_request: changing called
>> [  193.775699] __nvme_rdma_init_request: changing called
>> [  193.798813] __nvme_rdma_init_request: changing called
>> [  193.821257] __nvme_rdma_init_request: changing called
>> [  193.844090] __nvme_rdma_init_request: changing called
>> [  193.866472] __nvme_rdma_init_request: changing called
>> [  193.889375] __nvme_rdma_init_request: changing called
>> [  193.912094] __nvme_rdma_init_request: changing called
>> [  193.934942] __nvme_rdma_init_request: changing called
>> [  193.957688] __nvme_rdma_init_request: changing called
>> [  606.273376] Broke affinity for irq 16
>> [  606.291940] Broke affinity for irq 28
>> [  606.310201] Broke affinity for irq 90
>> [  606.328211] Broke affinity for irq 93
>> [  606.346263] Broke affinity for irq 97
>> [  606.364314] Broke affinity for irq 100
>> [  606.382105] Broke affinity for irq 104
>> [  606.400727] smpboot: CPU 1 is now offline
>> [  616.820505] nvme nvme0: reconnecting in 10 seconds
>> [  626.882747] blk_mq_reinit_tagset: tag is null, continue
>> [  626.914000] nvme nvme0: Connect rejected: status 8 (invalid 
>> service ID).
>> [  626.947965] nvme nvme0: rdma_resolve_addr wait failed (-104).
>> [  626.974673] nvme nvme0: Failed reconnect attempt, requeueing...
>
> This is strange...
>
> Is the target alive? I'm assuming it didn't crash here, correct?
The target was deleted by the 'nvmetcli clear' command.
Then on the client side, it seems the client doesn't know the target was
deleted and keeps reconnecting every 10 seconds.
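For now, a minimal sketch of how to break the loop by hand on the client
(assuming the usual nvme-cli disconnect command and the nvme core
'delete_controller' sysfs attribute are available on this kernel):

#nvme disconnect -n nvme-subsystem-name
or, for a single controller, through sysfs:
#echo 1 > /sys/class/nvme/nvme0/delete_controller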

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
  2017-03-14 13:27                             ` Yi Zhang
@ 2017-03-16 16:40                                 ` Sagi Grimberg
  -1 siblings, 0 replies; 18+ messages in thread
From: Sagi Grimberg @ 2017-03-16 16:40 UTC (permalink / raw)
  To: Yi Zhang, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hch-jcswGhMUV9g


> The target was deleted by the 'nvmetcli clear' command.
> Then on the client side, it seems the client doesn't know the target was
> deleted and keeps reconnecting every 10 seconds.

Oh, so the target doesn't come back. That makes sense. The host
doesn't know if/when the target will come back, so it attempts to
reconnect periodically, forever.

I think what you're asking for is a "dev_loss_tmo" kind of functionality,
where the host eventually gives up on the controller, correct?
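
Something like this from the host side, as a sketch only (the two options
below assume a ctrl_loss_tmo/reconnect_delay style knob wired up through
nvme-cli, which may not exist in the kernel and nvme-cli you are testing):

#nvme connect -t rdma -a 172.31.2.3 -s 1023 -n nvme-subsystem-name \
      --reconnect-delay=10 --ctrl-loss-tmo=60

i.e. keep retrying every 10 seconds, but give up and tear the controller
down after 60 seconds without a successful reconnect.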

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side
  2017-03-16 16:40                                 ` Sagi Grimberg
@ 2017-03-18 12:06                                     ` Yi Zhang
  -1 siblings, 0 replies; 18+ messages in thread
From: Yi Zhang @ 2017-03-18 12:06 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: hch-jcswGhMUV9g

On 03/17/2017 12:40 AM, Sagi Grimberg wrote:
>
>> The target was deleted by the 'nvmetcli clear' command.
>> Then on the client side, it seems the client doesn't know the target was
>> deleted and keeps reconnecting every 10 seconds.
>
> Oh, so the target doesn't come back. That makes sense. The host
> doesn't know if/when the target will come back, so it attempts to
> reconnect periodically, forever.
>
> I think what you're asking for is a "dev_loss_tmo" kind of functionality,
> where the host eventually gives up on the controller, correct?
Hi Sagi
Yes. Since the target was deleted, the client doesn't realize it and
keeps trying to reconnect.
I think it would be better to stop the reconnect attempts at some point,
or is there any other good idea?
Since I'm a newbie and not very familiar with NVMe-oF, please correct me
if my thought is not reasonable. :)

Thanks
Yi


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-03-18 12:06 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1832491330.31443919.1488709276951.JavaMail.zimbra@redhat.com>
     [not found] ` <1832491330.31443919.1488709276951.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-05 13:39   ` kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side Yi Zhang
2017-03-05 13:39     ` Yi Zhang
     [not found]     ` <1053522223.31446389.1488721184925.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-06 11:25       ` Sagi Grimberg
2017-03-06 11:25         ` Sagi Grimberg
     [not found]         ` <644fc4ab-df6b-a337-1431-bad881ef56ee-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-09  4:02           ` Yi Zhang
2017-03-09  4:02             ` Yi Zhang
     [not found]             ` <88ae146a-7510-9be0-c9b4-58e70f9d73b9-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-09 11:23               ` Sagi Grimberg
2017-03-09 11:23                 ` Sagi Grimberg
     [not found]                 ` <6ffda302-02f9-12f0-a112-ea7cd20b9ffa-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-10  7:59                   ` Yi Zhang
2017-03-10  7:59                     ` Yi Zhang
     [not found]                     ` <b0a84bcc-7dca-d342-b30e-b01eba8088cd-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-13  8:09                       ` Sagi Grimberg
2017-03-13  8:09                         ` Sagi Grimberg
     [not found]                         ` <6fe6d285-3cb4-c88c-9a7c-741fce54120c-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-14 13:27                           ` Yi Zhang
2017-03-14 13:27                             ` Yi Zhang
     [not found]                             ` <7a955472-1975-1b73-c88c-367576a56884-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-03-16 16:40                               ` Sagi Grimberg
2017-03-16 16:40                                 ` Sagi Grimberg
     [not found]                                 ` <bbb45ff5-8b61-2508-df4a-7c90eb6637de-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-03-18 12:06                                   ` Yi Zhang
2017-03-18 12:06                                     ` Yi Zhang
