From: swise@opengridcomputing.com (Steve Wise)
Subject: [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events
Date: Mon, 29 Aug 2016 14:42:19 -0500	[thread overview]
Message-ID: <01c101d2022d$740f7680$5c2e6380$@opengridcomputing.com> (raw)
In-Reply-To: <13f597d1-0dd3-9c9e-9658-209f6817600a@grimberg.me>

> >> Care to respin your client registration patch so we can judge which
> >> is better?
> >
> > FYI, I also really hate the idea of having to potentially allocate
> > resources on each device at driver load time which the client registration
> > forces us into.
> 
> The client registration doesn't force us to allocate anything.
> It's simply there for us to trigger cleanups when the device is unplugged...
> 
> static void nvme_rdma_add_one(struct ib_device *device)
> {
> 	/* Do nothing */
> }
> 
> static void nvme_rdma_remove_one(struct ib_device *device,
> 		void *cdata)
> {
> 	/*
> 	 * for each ctrl where (ctrl->dev->device == device)
> 	 * 	queue delete controller
> 	 *
> 	 * flush the workqueue
> 	 */
> }
> 
> static struct ib_client nvme_rdma_client = {
>          .name   = "nvme_rdma",
>          .add    = nvme_rdma_add_one,
>          .remove = nvme_rdma_remove_one
> };
> 
> 
> > I really think we need to take a step back and offer interfaces that don't
> > suck in the core instead of trying to work around RDMA/CM in the core.
> > Unfortunately I don't really know what it takes for that yet.  I'm pretty
> > busy this week, but I'd be happy to reserve a lot of time next week to
> > dig into it unless someone beats me to it.
> 
> I agree we have *plenty* of room to improve in the RDMA_CM interface.
> But this particular problem is the fact that we might get a device
> removal right at the moment when we have no cm_ids open, because we
> are in the middle of periodic reconnects. This is why we can't even see
> the event.
> 
> What sort of interface that would help here did you have in mind?
> 
> > I suspect a big part of that is having a queue state machine in the core,
> 
> We have a queue-pair state machine in the core, but currently it's not
> very useful for the consumers, and the silly thing is that it's not
> represented in the ib_qp struct and needs an ib_query_qp to figure it
> out (one of the reasons is that the QP states and their transitions
> are detailed in the different specs and not all of them are
> synchronous).
> 
> > and getting rid of that horrible RDMA/CM event multiplexer.
> 
> That would be a very nice improvement...
> 

So should I respin the ib_client patch to just do device removal, or am I
wasting my time?


Thread overview: 23+ messages
2016-08-26 13:53 [PATCH WIP/RFC 0/6] nvme-rdma device removal fixes Steve Wise
2016-08-25 20:49 ` [PATCH WIP/RFC 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
2016-08-28 12:42   ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
2016-08-28 12:43   ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
2016-08-26 14:38   ` Christoph Hellwig
2016-08-26 14:41     ` Steve Wise
2016-08-28 12:45   ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
2016-08-26 14:39   ` Christoph Hellwig
2016-08-26 14:42     ` Steve Wise
2016-08-28 12:44   ` Sagi Grimberg
2016-08-26 13:50 ` [PATCH WIP/RFC 5/6] nvme-rdma: add DELETING queue flag Steve Wise
2016-08-26 14:14   ` Steve Wise
2016-08-28 12:48     ` Sagi Grimberg
2016-08-26 13:52 ` [PATCH WIP/RFC 6/6] nvme-rdma: keep a cm_id around during reconnect to get events Steve Wise
2016-08-26 14:41   ` Christoph Hellwig
2016-08-26 14:48     ` Steve Wise
2016-08-28 12:56   ` Sagi Grimberg
2016-08-29  7:30     ` Christoph Hellwig
2016-08-29 14:32       ` Sagi Grimberg
2016-08-29 19:42         ` Steve Wise [this message]
