From mboxrd@z Thu Jan 1 00:00:00 1970
From: swise@opengridcomputing.com (Steve Wise)
Date: Fri, 10 Jun 2016 11:22:23 -0500
Subject: nvme-fabrics: crash at nvme connect-all
In-Reply-To: <01c601d1c32a$59576ec0$0c064c40$@opengridcomputing.com>
References: <53708289.31891804.1465463883806.JavaMail.zimbra@kalray.eu>
 <20160609132459.GA5105@infradead.org>
 <1290178000.33062227.1465486654766.JavaMail.zimbra@kalray.eu>
 <04d301d1c28d$183af7b0$48b0e710$@opengridcomputing.com>
 <04e301d1c292$d6c34430$8449cc90$@opengridcomputing.com>
 <055801d1c29f$e164c000$a42e4000$@opengridcomputing.com>
 <01c601d1c32a$59576ec0$0c064c40$@opengridcomputing.com>
Message-ID: <020b01d1c334$45077f50$cf167df0$@opengridcomputing.com>

> > Add the hack into iw_cxgb4 to force alloc_mr failures after 200 allocations
> > (or whatever value you need to make it happen). Then on the same machine,
> > export a target device, load nvme-rdma and discover/connect to that target
> > device with nvme. It will crash.
> >
> > Unfortunately, with the 4.7-rc2 base I'm using, I get no vmcore dump. I'm
> > not sure why...
> >
>
> Previously I was using Doug's rdma rxe branch + sagi's rxe fixes + rebased on
> nvmf-all.2. To simplify, I have now gone to just straight nvmf-all.2. Also, I
> separated the host and target to different nodes and reproduced the problem.
> It's the host side that is crashing. Same GPF with RIP:
>
> RIP: 0010:[] []
> get_next_timer_interrupt+0x183/0x210
>
> Steve.

I enabled lots of kernel memory debugging and now hit this. Perhaps a clue?
Freeing an active timer list widget?

nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
nvme nvme1: creating 16 I/O queues.
nvme nvme1: Connect rejected, no private data.
nvme nvme1: rdma_resolve_addr wait failed (-104).
nvme nvme1: failed to initialize i/o queue: -104
------------[ cut here ]------------
WARNING: CPU: 1 PID: 10440 at lib/debugobjects.c:263 debug_print_object+0x8e/0xb0
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
Modules linked in: nvme_rdma nvme_fabrics rdma_ucm rdma_cm iw_cm configfs iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en ib_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support pcspkr mlx4_ib ib_core ipv6 mlx4_core dm_mod sg lpc_ich mfd_core i2c_i801 nvme nvme_core igb dca ptp pps_core acpi_cpufreq ext4(E) mbcache(E) jbd2(E) sd_mod(E) nouveau(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) mxm_wmi(E) video(E) ahci(E) libahci(E) wmi(E) [last unloaded: cxgb4]
CPU: 1 PID: 10440 Comm: nvme Tainted: G E 4.7.0-rc2-nvmf-all.2+ #42
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
 0000000000000000 ffff881027a13a18 ffffffff812f032d ffffffff8130e65e
 ffff881027a13a78 ffff881027a13a78 0000000000000000 ffff881027a13a68
 ffffffff8106694d 0000031800000001 000001072aad7ce8 dead000000000200
Call Trace:
 [] dump_stack+0x51/0x74
 [] ? debug_print_object+0x8e/0xb0
 [] __warn+0xfd/0x120
 [] warn_slowpath_fmt+0x49/0x50
 [] ? kfree_const+0x22/0x30
 [] debug_print_object+0x8e/0xb0
 [] ? __queue_work+0x520/0x520
 [] __debug_check_no_obj_freed+0x1ee/0x270
 [] debug_check_no_obj_freed+0x17/0x20
 [] kfree+0x9c/0x120
 [] ? kfree_const+0x22/0x30
 [] ? kobject_cleanup+0x9c/0x1b0
 [] nvme_rdma_free_ctrl+0xa6/0xc0 [nvme_rdma]
 [] nvme_free_ctrl+0x46/0x60 [nvme_core]
 [] nvme_put_ctrl+0x1b/0x20 [nvme_core]
 [] nvme_rdma_create_ctrl+0x412/0x4f0 [nvme_rdma]
 [] nvmf_create_ctrl+0x182/0x210 [nvme_fabrics]
 [] nvmf_dev_write+0xac/0x110 [nvme_fabrics]
 [] __vfs_write+0x34/0x120
 [] ? trace_event_raw_event_sys_enter+0xb5/0x130
 [] vfs_write+0xb9/0x130
 [] ? __fdget_pos+0x12/0x50
 [] SyS_write+0x59/0xc0
 [] do_syscall_64+0x6d/0x160
 [] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace 7f80ebccfc6bd15d ]---
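
For anyone following along, here is a minimal userspace sketch of the pattern the ODEBUG warning seems to describe: a structure that embeds a delayed work item gets freed while that item's timer is still armed. This is only an illustration under that assumption. It does not use kernel APIs, and every name in it (fake_delayed_work, fake_ctrl, pending_work, teardown) is hypothetical and not taken from nvme_rdma. Which work item is actually still pending in the trace above is not established here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel delayed_work: it only records
 * whether its timer is still armed. */
struct fake_delayed_work {
	bool timer_pending;
};

/* Hypothetical controller object that embeds a delayed work item. */
struct fake_ctrl {
	struct fake_delayed_work pending_work;
};

/* Arm the (simulated) timer, as scheduling a delayed work would. */
static void fake_schedule(struct fake_delayed_work *w)
{
	w->timer_pending = true;
}

/* Disarm the (simulated) timer, as a synchronous cancel would. */
static void fake_cancel_sync(struct fake_delayed_work *w)
{
	w->timer_pending = false;
}

/* Free the object. Mimics the debugobjects check by warning and
 * returning false when the embedded timer is still armed. */
static bool fake_ctrl_free(struct fake_ctrl *c)
{
	bool ok = !c->pending_work.timer_pending;

	if (!ok)
		fprintf(stderr, "free active object type: timer_list\n");
	free(c);
	return ok;
}

/* Allocate, schedule work, then tear down on a simulated error path.
 * Returns true only if the free happened with no timer armed. */
bool teardown(bool cancel_first)
{
	struct fake_ctrl *c = calloc(1, sizeof(*c));

	if (!c)
		return false;
	fake_schedule(&c->pending_work);
	if (cancel_first)
		fake_cancel_sync(&c->pending_work);
	return fake_ctrl_free(c);
}
```

Calling teardown(false) models an error path that frees without cancelling and trips the check, while teardown(true) cancels first and frees cleanly. If the real problem has this shape, the free path would need to make sure no delayed work is still pending before the kfree.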