From: swise@opengridcomputing.com (Steve Wise)
Subject: nvmf/rdma host crash during heavy load and keep alive recovery
Date: Wed, 7 Sep 2016 16:08:50 -0500
Message-ID: <039401d2094c$084d64e0$18e82ea0$@opengridcomputing.com>
In-Reply-To: <0c159abb-24ee-21bf-09d2-9fe7d269a2eb@grimberg.me>

> Hey Steve,
> 
> > Ok, back to this issue. :)
> >
> > The same crash happens with mlx4_ib, so this isn't related to cxgb4.  To sum up:
> >
> > With pending NVME IO on the nvme-rdma host, and in the presence of kato
> > recovery/reconnect due to the target going away, some NVME requests get
> > restarted that are referencing nvmf controllers that have freed queues.  I see
> > this also with my recent v4 series that corrects the recovery problems with
> > nvme-rdma when the target is down, but without pending IO.
> >
> > So the crash in this email is yet another issue that we see when the nvme host
> > has lots of pending IO requests during kato recovery/reconnect...
> >
> > My findings to date:  the IO is not an admin queue IO.  It is not the kato
> > messages.  The io queue has been stopped, yet the request is attempted and
> > causes the crash.
> >
> > Any help is appreciated...
> 
> So in the current state, my impression is that we are seeing a request
> queued when we shouldn't (or at least assume we won't).
> 
> Given that you run heavy load to reproduce this, I can only suspect that
> this is a race condition.
> 
> Does this happen if you change the reconnect delay to be something
> different than 10 seconds? (say 30?)
>

Yes.  But I noticed something when performing this experiment that I think is an
important point: if I just bring the network interface down and leave it down, we
don't crash.  In this state, I see the host continually reconnecting after the
reconnect delay, timing out on the reconnect attempt, and retrying after another
reconnect_delay period.  I see this for all 10 targets, of course.  The crash
only happens when I bring the interface back up and the targets begin to
reconnect.  So the process of successfully reconnecting the RDMA QPs and
restarting the nvme queues is somehow triggering an nvme request to run too soon
(or perhaps on the wrong queue).
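
To make that theory concrete, here is a minimal sketch of the kind of gate I'd
expect the request path to need.  This is not the actual nvme-rdma code; the
flag, struct, and function names are invented for illustration.  The idea is
that a request arriving while the queue is mid-reconnect gets requeued by
blk-mq instead of touching freed queue resources:

/*
 * Hypothetical sketch only -- NVME_RDMA_Q_CONNECTED and the struct
 * layout are invented; the real driver tracks queue state its own way.
 */
#include <linux/bitops.h>
#include <linux/blk-mq.h>

enum {
	NVME_RDMA_Q_CONNECTED = 0,	/* QP established, queue restarted */
};

struct nvme_rdma_queue_sketch {
	unsigned long flags;
	/* ... qp, cm_id, cq, ... */
};

static int nvme_rdma_queue_rq_sketch(struct nvme_rdma_queue_sketch *queue)
{
	/*
	 * If the queue is mid-recovery (QP torn down, or reconnected but
	 * not yet fully restarted), don't touch it: have blk-mq requeue
	 * the request until the reconnect completes.
	 */
	if (!test_bit(NVME_RDMA_Q_CONNECTED, &queue->flags))
		return BLK_MQ_RQ_QUEUE_BUSY;

	/* ... map the SG list and post the RDMA send as usual ... */
	return BLK_MQ_RQ_QUEUE_OK;
}

The bit would only be set at the very end of a successful reconnect, after the
queue is restarted, which would close the window I seem to be hitting.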
 
> Can you also give patch [1] a try? It's not a solution, but I want
> to see if it hides the problem...
> 

Hmm.  I ran the experiment once with [1] and it didn't crash.  I ran it a second
time and hit a new crash.  Maybe a problem with [1]?

[  379.864950] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
[  379.874330] IP: [<ffffffffa0307ad1>] ib_destroy_qp+0x21/0x1a0 [ib_core]
[  379.882489] PGD 1002571067 PUD fa121f067 PMD 0
[  379.888561] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[  379.894526] Modules linked in: nvme_rdma nvme_fabrics brd iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg lpc_ich mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) libata(E) mgag200(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4]
[  380.005238] CPU: 30 PID: 10981 Comm: kworker/30:2 Tainted: G            E   4.8.0-rc4-block-for-linus-dbg+ #32
[  380.017596] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[  380.027002] Workqueue: nvme_rdma_wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[  380.036664] task: ffff880fd014af40 task.stack: ffff881013b38000
[  380.045082] RIP: 0010:[<ffffffffa0307ad1>]  [<ffffffffa0307ad1>] ib_destroy_qp+0x21/0x1a0 [ib_core]
[  380.056562] RSP: 0018:ffff881013b3bb88  EFLAGS: 00010286
[  380.064291] RAX: ffff880fd014af40 RBX: ffff880fba6d3a78 RCX: 0000000000000000
[  380.073733] RDX: ffff881037593d20 RSI: ffff88103758dfc8 RDI: 0000000000000000
[  380.083030] RBP: ffff881013b3bbe8 R08: 0000000000000000 R09: 0000000000000000
[  380.092230] R10: ffffffffa0030060 R11: 0000000000000000 R12: ffff880fba6d3c98
[  380.101442] R13: 00000000ffffff98 R14: ffff88101ce78008 R15: ffff88100507c968
[  380.110541] FS:  0000000000000000(0000) GS:ffff881037580000(0000) knlGS:0000000000000000
[  380.120507] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  380.128048] CR2: 0000000000000024 CR3: 0000000fa12f6000 CR4: 00000000000406e0
[  380.136990] Stack:
[  380.140714]  ffff881013b3bb98 0000000000000246 000003e800000286 ffffffffa0733fe0
[  380.149939]  ffff881013b3bbd8 ffffffffa0274588 ffff881013b3bbe8 ffff880fba6d3a78
[  380.159113]  ffff880fba6d3c98 00000000ffffff98 ffff88101ce78008 ffff88100507c968
[  380.168205] Call Trace:
[  380.172280]  [<ffffffffa0733fe0>] ? cma_work_handler+0xa0/0xa0 [rdma_cm]
[  380.180546]  [<ffffffffa0731611>] rdma_destroy_qp+0x31/0x50 [rdma_cm]
[  380.188483]  [<ffffffffa0661e92>] nvme_rdma_destroy_queue_ib+0x52/0xb0 [nvme_rdma]
[  380.197566]  [<ffffffffa0662678>] nvme_rdma_init_queue+0x128/0x180 [nvme_rdma]
[  380.206264]  [<ffffffffa0662a09>] nvme_rdma_reconnect_ctrl_work+0x79/0x220 [nvme_rdma]
[  380.215606]  [<ffffffff810a15e3>] process_one_work+0x183/0x4d0
[  380.222820]  [<ffffffff816de9f0>] ? __schedule+0x1f0/0x5b0
[  380.229680]  [<ffffffff816deeb0>] ? schedule+0x40/0xb0
[  380.236147]  [<ffffffff810a227d>] worker_thread+0x16d/0x530
[  380.243051]  [<ffffffff810b43d5>] ? trace_event_raw_event_sched_switch+0xe5/0x130
[  380.251821]  [<ffffffff816de9f0>] ? __schedule+0x1f0/0x5b0
[  380.258600]  [<ffffffff810cba86>] ? __wake_up_common+0x56/0x90
[  380.265676]  [<ffffffff810a2110>] ? maybe_create_worker+0x120/0x120
[  380.273153]  [<ffffffff816deeb0>] ? schedule+0x40/0xb0
[  380.279504]  [<ffffffff810a2110>] ? maybe_create_worker+0x120/0x120
[  380.286993]  [<ffffffff810a6dbc>] kthread+0xcc/0xf0
[  380.293078]  [<ffffffff810b177e>] ? schedule_tail+0x1e/0xc0
[  380.299857]  [<ffffffff816e2b7f>] ret_from_fork+0x1f/0x40
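
For what it's worth, RDI is 0 at the fault and CR2 is 0x24, so ib_destroy_qp()
was handed a NULL qp and dereferenced a field at offset 0x24 into it.  It looks
like the nvme_rdma_init_queue() error path on this reconnect is tearing down IB
resources that were already destroyed (or never allocated).  A guard along
these lines would rule out the double teardown -- again just a sketch, the flag
name and struct layout are invented, not the driver's real state tracking:

/*
 * Hypothetical sketch of an idempotent queue-IB teardown.
 */
#include <linux/bitops.h>
#include <rdma/rdma_cm.h>
#include <rdma/ib_verbs.h>

enum {
	NVME_RDMA_IB_ALLOCATED = 0,	/* invented for illustration */
};

struct nvme_rdma_queue_ib_sketch {
	unsigned long flags;
	struct rdma_cm_id *cm_id;
	struct ib_cq *ib_cq;
};

static void nvme_rdma_destroy_queue_ib_sketch(struct nvme_rdma_queue_ib_sketch *queue)
{
	/*
	 * Only the caller that clears the bit frees the QP/CQ, so a
	 * failed reconnect and controller teardown can't both call
	 * rdma_destroy_qp() on a queue whose QP is already gone.
	 */
	if (!test_and_clear_bit(NVME_RDMA_IB_ALLOCATED, &queue->flags))
		return;

	rdma_destroy_qp(queue->cm_id);
	ib_free_cq(queue->ib_cq);
}

With the allocation side setting the bit only after the QP and CQ are fully
created, a second teardown attempt becomes a harmless no-op instead of this oops.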
