From: swise@opengridcomputing.com (Steve Wise)
Subject: nvme/rdma initiator stuck on reboot
Date: Thu, 18 Aug 2016 13:50:42 -0500
Message-ID: <027d01d1f981$6c19e9b0$444dbd10$@opengridcomputing.com>
In-Reply-To: <01ee01d1f97a$4406d5c0$cc148140$@opengridcomputing.com>

> > Btw, in that case the patch is not actually correct, as even a workqueue
> > with a higher concurrency level MAY deadlock under enough memory
> > pressure.  We'll need separate workqueues to handle this case I think.
> >
> > > Yes?  And the
> > > reconnect worker was never completing?  Why is that?  Here are a few
> > > tidbits about iWARP connections:  address resolution == neighbor
> > > discovery.  So if the neighbor is unreachable, it will take a few
> > > seconds for the OS to give up and fail the resolution.  If the neigh
> > > entry is valid and the peer becomes unreachable during connection
> > > setup, it might take 60 seconds or so for a connect operation to give
> > > up and fail.  So this is probably slowing the reconnect thread down.
> > > But shouldn't the reconnect thread notice that a delete is trying to
> > > happen and bail out?
> >
> > I think we should aim for a state machine that can detect this, but
> > we'll have to see if that will end up in synchronization overkill.
> 
> Looking at the state machine I don't see why the reconnect thread would get
> stuck continually rescheduling once the controller was deleted.  Changing
> from RECONNECTING to DELETING is done by nvme_change_ctrl_state() in
> __nvme_rdma_del_ctrl(), so once that happens the thread running the
> reconnect logic should stop rescheduling, due to this check in the failure
> path of nvme_rdma_reconnect_ctrl_work():
> 
> ...
> requeue:
>         /* Make sure we are not resetting/deleting */
>         if (ctrl->ctrl.state == NVME_CTRL_RECONNECTING) {
>                 dev_info(ctrl->ctrl.device,
>                         "Failed reconnect attempt, requeueing...\n");
>                 queue_delayed_work(nvme_rdma_wq, &ctrl->reconnect_work,
>                                         ctrl->reconnect_delay * HZ);
>         }
> ...
> 
> So something isn't happening like I think it is, I guess.
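
For reference, here is roughly what the delete path looks like; I'm typing
this from memory, so the exact code (the delete_work member and the error
handling in particular) may differ slightly:

static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
{
        /* once this transition succeeds, the requeue check above sees
         * state != NVME_CTRL_RECONNECTING and stops rescheduling
         */
        if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
                return -EBUSY;

        /* the actual teardown runs from nvme_rdma_del_ctrl_work() */
        if (!queue_work(nvme_rdma_wq, &ctrl->delete_work))
                return -EBUSY;

        return 0;
}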


I see what happens.  Assume the 10 controllers are reconnecting and failing,
so they each keep rescheduling.  I then run a script to delete all 10 devices
sequentially.  Like this:

for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1; done

The first device, nvme1n1, gets a disconnect/delete command and changes the
controller state from RECONNECTING to DELETING, and then schedules
nvme_rdma_del_ctrl_work(), but that work is stuck behind the 9 other
controllers continually reconnecting, failing, and rescheduling.  I'm not sure
why the delete never gets run, though.  I would think that once it is queued,
it would get executed before the reconnect work items that get requeued after
it.  Maybe we need some round-robin mode for our workq?  And because the first
delete is stuck, none of the subsequent delete commands get executed.  Note:
if I run each disconnect command in the background, then they all get cleaned
up ok.  Like this:

for i in $(seq 1 10); do nvme disconnect -d nvme${i}n1 & done
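
If the single nvme_rdma_wq really is the problem here, maybe the simplest fix
along the lines of Christoph's "separate workqueues" comment above is a
dedicated queue for delete work, so the continually requeued reconnect work
can't starve it.  Just an untested sketch (the queue name and flags are made
up):

#include <linux/workqueue.h>

static struct workqueue_struct *nvme_rdma_delete_wq;

static int __init nvme_rdma_init_module(void)
{
        /* a separate queue so controller deletion does not have to wait
         * behind the reconnect work that keeps requeueing on nvme_rdma_wq
         */
        nvme_rdma_delete_wq = alloc_workqueue("nvme_rdma_delete_wq",
                                WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
        if (!nvme_rdma_delete_wq)
                return -ENOMEM;

        /* rest of module init unchanged */
        return 0;
}

and then __nvme_rdma_del_ctrl() would queue ctrl->delete_work on
nvme_rdma_delete_wq instead of nvme_rdma_wq.  That wouldn't by itself explain
why a queued work item never runs at all, but it would at least keep deletes
from competing with the reconnect storm.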
