From: swise@opengridcomputing.com (Steve Wise)
Subject: nvmf/rdma host crash during heavy load and keep alive recovery
Date: Thu, 15 Sep 2016 09:44:55 -0500
Message-ID: <011501d20f5f$b94e6c80$2beb4580$@opengridcomputing.com>
In-Reply-To: <da2e918b-0f18-e032-272d-368c6ec49c62@grimberg.me>

> > @@ -1408,6 +1412,8 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
> >
> >         WARN_ON_ONCE(rq->tag < 0);
> >
> > +       BUG_ON(hctx != queue->hctx);
> > +       BUG_ON(test_bit(BLK_MQ_S_STOPPED, &hctx->state));
> >         dev = queue->device->dev;
> >         ib_dma_sync_single_for_cpu(dev, sqe->dma,
> >                         sizeof(struct nvme_command), DMA_TO_DEVICE);
> > ---
> >
> > When I reran the test forcing reconnects, I hit the BUG_ON(hctx != queue->hctx)
> > in nvme_rdma_queue_rq() when doing the first reconnect (not when initially
> > connecting the targets).  Here is the back trace.  Is my debug logic flawed?
> > Or does this mean something is screwed up once we start reconnecting?
> 
> This is weird indeed.
> 
> The fact that you trigger this means that you successfully reconnect
> correct?
>

The controller state is NVME_CTRL_RECONNECTING.  In fact, this BUG_ON()
fired on the reconnect worker thread.  Ah, maybe this is the connect
command on the admin queue?


PID: 1819   TASK: ffff88101d0217c0  CPU: 0   COMMAND: "kworker/0:2"
 #0 [ffff8810090d34b0] machine_kexec at ffffffff8105fbd0
 #1 [ffff8810090d3520] __crash_kexec at ffffffff81116998
 #2 [ffff8810090d35f0] crash_kexec at ffffffff81116a6d
 #3 [ffff8810090d3620] oops_end at ffffffff81032bd6
 #4 [ffff8810090d3650] die at ffffffff810330cb
 #5 [ffff8810090d3680] do_trap at ffffffff8102fff1
 #6 [ffff8810090d36e0] do_error_trap at ffffffff8103032d
 #7 [ffff8810090d37a0] do_invalid_op at ffffffff81030480
 #8 [ffff8810090d37b0] invalid_op at ffffffff816e47be
    [exception RIP: nvme_rdma_queue_rq+621]
    RIP: ffffffffa065ce3d  RSP: ffff8810090d3868  RFLAGS: 00010206
    RAX: 0000000000000000  RBX: ffff880e33640000  RCX: dead000000000200
    RDX: ffff8810090d3928  RSI: ffff8810090d38f8  RDI: ffff880e315cb528
    RBP: ffff8810090d38a8   R8: ffff880e33640000   R9: 0000000000000000
    R10: 0000000000000674  R11: ffff8810090d3a18  R12: ffff880e36ab91d0
    R13: ffff880e33640170  R14: ffff880e315cb528  R15: ffff880e36bc1138
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8810090d38b0] __blk_mq_run_hw_queue at ffffffff81338b1b
#10 [ffff8810090d3a00] blk_mq_run_hw_queue at ffffffff81338ffe
#11 [ffff8810090d3a20] blk_mq_insert_request at ffffffff8133a130
#12 [ffff8810090d3a90] blk_execute_rq_nowait at ffffffff813342dd
#13 [ffff8810090d3ad0] blk_execute_rq at ffffffff8133442e
#14 [ffff8810090d3b80] __nvme_submit_sync_cmd at ffffffffa02715d5 [nvme_core]
#15 [ffff8810090d3bd0] nvmf_connect_io_queue at ffffffffa064d134 [nvme_fabrics]
#16 [ffff8810090d3c80] nvme_rdma_reconnect_ctrl_work at ffffffffa065cafb [nvme_rdma]
#17 [ffff8810090d3cb0] process_one_work at ffffffff810a1613
#18 [ffff8810090d3d90] worker_thread at ffffffff810a22ad
#19 [ffff8810090d3ec0] kthread at ffffffff810a6dec
#20 [ffff8810090d3f50] ret_from_fork at ffffffff816e3bbf

 
