From: swise@opengridcomputing.com (Steve Wise)
Subject: nvme-fabrics: crash at nvme connect-all
Date: Fri, 10 Jun 2016 11:22:23 -0500
Message-ID: <020b01d1c334$45077f50$cf167df0$@opengridcomputing.com>
In-Reply-To: <01c601d1c32a$59576ec0$0c064c40$@opengridcomputing.com>

> > Add the hack into iw_cxgb4 to force alloc_mr failures after 200 allocations
> > (or whatever value you need to make it happen).  Then on the same machine,
> > export a target device, load nvme-rdma and discover/connect to that target
> > device with nvme.  It will crash.
> >
> > Unfortunately, with the 4.7-rc2 base I'm using, I get no vmcore dump.  I'm
> > not sure why...
> >
> 
> Previously I was using Doug's rdma rxe branch + Sagi's rxe fixes + rebased on
> nvmf-all.2.  To simplify, I have now gone to just straight nvmf-all.2.  Also, I
> separated the host and target to different nodes and reproduced the problem.
> It's the host side that is crashing.  Same GPF with RIP:
> 
> RIP: 0010:[<ffffffff810d04c3>]  [<ffffffff810d04c3>] get_next_timer_interrupt+0x183/0x210
> 
> Steve.
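
The reproduction hack described above (force alloc_mr failures after about 200 allocations) can be sketched as a simple allocation budget. This is a userspace model under stated assumptions: fail_after, alloc_count, hack_alloc_mr() and hack_alloc_mr_reset() are hypothetical names, not iw_cxgb4 symbols, and NULL stands in for the ERR_PTR() value that the kernel verbs alloc_mr path returns on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * HYPOTHETICAL userspace model of a "fail after N allocations" hack.
 * None of these names exist in iw_cxgb4. In the kernel, alloc_mr
 * reports failure with an ERR_PTR() value; NULL stands in for it here.
 */
static unsigned int fail_after = 200; /* successes allowed before failing */
static unsigned int alloc_count;      /* successful allocations so far */

static void *hack_alloc_mr(size_t size)
{
	/* Once the budget is used up, every further call fails. */
	if (alloc_count >= fail_after)
		return NULL;

	void *mr = malloc(size);
	if (mr)
		alloc_count++;
	return mr;
}

/* Re-arm the budget, e.g. to pick "whatever value you need". */
static void hack_alloc_mr_reset(unsigned int n)
{
	fail_after = n;
	alloc_count = 0;
}
```

The counter tracks total successful allocations rather than outstanding ones, so freeing an MR does not restore the budget; that matches "fail after N allocations" rather than "fail above N live MRs".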

I enabled lots of kernel memory debugging and now hit this.  Perhaps a clue?  Freeing an active timer list widget?

nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
nvme nvme1: creating 16 I/O queues.
nvme nvme1: Connect rejected, no private data.
nvme nvme1: rdma_resolve_addr wait failed (-104).
nvme nvme1: failed to initialize i/o queue: -104
------------[ cut here ]------------
WARNING: CPU: 1 PID: 10440 at lib/debugobjects.c:263 debug_print_object+0x8e/0xb0
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
Modules linked in: nvme_rdma nvme_fabrics rdma_ucm rdma_cm iw_cm configfs iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en ib_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support pcspkr mlx4_ib ib_core ipv6 mlx4_core dm_mod sg lpc_ich mfd_core i2c_i801 nvme nvme_core igb dca ptp pps_core acpi_cpufreq ext4(E) mbcache(E) jbd2(E) sd_mod(E) nouveau(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) mxm_wmi(E) video(E) ahci(E) libahci(E) wmi(E) [last unloaded: cxgb4]
CPU: 1 PID: 10440 Comm: nvme Tainted: G            E   4.7.0-rc2-nvmf-all.2+ #42
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
 0000000000000000 ffff881027a13a18 ffffffff812f032d ffffffff8130e65e
 ffff881027a13a78 ffff881027a13a78 0000000000000000 ffff881027a13a68
 ffffffff8106694d 0000031800000001 000001072aad7ce8 dead000000000200
Call Trace:
 [<ffffffff812f032d>] dump_stack+0x51/0x74
 [<ffffffff8130e65e>] ? debug_print_object+0x8e/0xb0
 [<ffffffff8106694d>] __warn+0xfd/0x120
 [<ffffffff81066a29>] warn_slowpath_fmt+0x49/0x50
 [<ffffffff81182d72>] ? kfree_const+0x22/0x30
 [<ffffffff8130e65e>] debug_print_object+0x8e/0xb0
 [<ffffffff81080850>] ? __queue_work+0x520/0x520
 [<ffffffff8130ecbe>] __debug_check_no_obj_freed+0x1ee/0x270
 [<ffffffff8130ed57>] debug_check_no_obj_freed+0x17/0x20
 [<ffffffff811c3aac>] kfree+0x9c/0x120
 [<ffffffff81182d72>] ? kfree_const+0x22/0x30
 [<ffffffff812f2f3c>] ? kobject_cleanup+0x9c/0x1b0
 [<ffffffffa04cc696>] nvme_rdma_free_ctrl+0xa6/0xc0 [nvme_rdma]
 [<ffffffffa06fcc36>] nvme_free_ctrl+0x46/0x60 [nvme_core]
 [<ffffffffa06feb2b>] nvme_put_ctrl+0x1b/0x20 [nvme_core]
 [<ffffffffa04cf1a2>] nvme_rdma_create_ctrl+0x412/0x4f0 [nvme_rdma]
 [<ffffffffa04c5d02>] nvmf_create_ctrl+0x182/0x210 [nvme_fabrics]
 [<ffffffffa04c5e3c>] nvmf_dev_write+0xac/0x110 [nvme_fabrics]
 [<ffffffff811d9c24>] __vfs_write+0x34/0x120
 [<ffffffff81002515>] ? trace_event_raw_event_sys_enter+0xb5/0x130
 [<ffffffff811d9dc9>] vfs_write+0xb9/0x130
 [<ffffffff811f9592>] ? __fdget_pos+0x12/0x50
 [<ffffffff811da9b9>] SyS_write+0x59/0xc0
 [<ffffffff81002d6d>] do_syscall_64+0x6d/0x160
 [<ffffffff81642e7c>] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace 7f80ebccfc6bd15d ]---
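
One reading of the ODEBUG line: kfree(), called from nvme_rdma_free_ctrl() on the nvme_rdma_create_ctrl() error path, released memory that still contained an active timer_list whose callback is delayed_work_timer_fn, i.e. the timer of a delayed_work that was still queued. In kernel code the usual ordering is cancel_delayed_work_sync() on any embedded delayed work before kfree() of the containing structure. A freed timer left on the timer wheel would plausibly explain the earlier GPF in get_next_timer_interrupt(), but that is an inference, not something this trace proves. The sketch that follows is only a userspace model of the check; every fake_* name is hypothetical, and it does not identify which work item in nvme-rdma or nvme-core was left pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * HYPOTHETICAL userspace model, not kernel code. fake_dwork stands in
 * for a struct delayed_work, whose embedded timer_list is what ODEBUG
 * tracks.
 */
struct fake_dwork {
	bool timer_armed;
};

struct fake_ctrl {
	int instance;
	struct fake_dwork dwork; /* hypothetical embedded delayed work */
};

static struct fake_ctrl *fake_ctrl_alloc(int instance)
{
	struct fake_ctrl *ctrl = calloc(1, sizeof(*ctrl));

	if (ctrl)
		ctrl->instance = instance;
	return ctrl;
}

/* Models queue_delayed_work(): arms the timer. */
static void fake_queue_delayed_work(struct fake_dwork *dw)
{
	dw->timer_armed = true;
}

/* Models cancel_delayed_work_sync(): the timer is idle on return. */
static void fake_cancel_delayed_work_sync(struct fake_dwork *dw)
{
	dw->timer_armed = false;
}

/*
 * Models the debug_check_no_obj_freed() check that kfree() runs when
 * debug objects are enabled: warn if the memory being freed still
 * holds an active timer. Returns true when the warning fired.
 */
static bool fake_free_ctrl(struct fake_ctrl *ctrl)
{
	bool active = ctrl->dwork.timer_armed;

	if (active)
		fprintf(stderr, "ODEBUG-style: free active timer in ctrl %d\n",
			ctrl->instance);
	free(ctrl);
	return active;
}
```

Freeing right after queueing trips the check; cancelling synchronously first does not.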
