From: Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
To: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Cc: Shiraz Saleem
	<shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"hch-jcswGhMUV9g@public.gmane.org"
	<hch-jcswGhMUV9g@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-nvme
	<linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>
Subject: Re: Deadlock on device removal event for NVMeF target
Date: Thu, 29 Jun 2017 07:30:17 -0600
Message-ID: <CAANLjFr++5daZ6Vn8TYxcM0oMyU4PuMztcM5KKM6mOy7HEs7KA@mail.gmail.com>
In-Reply-To: <61858a46-ebf1-a5bd-5213-65dadaadb84d-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

Could something like this be causing the D state problem I was seeing
in iSER almost a year ago? I tried writing a patch for iSER based on
this, but it didn't help. Either the bug is not being triggered in
device removal, or I didn't line up the statuses correctly. But it
seems that things are getting stuck in the work queue and some sort of
deadlock is happening so I was hopeful that something similar may be
in iSER.

Thanks,
Robert
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Jun 28, 2017 at 12:50 AM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
>
>>> How about the (untested) alternative below:
>>> --
>>> [PATCH] nvmet-rdma: register ib_client to not deadlock in device
>>>    removal
>>>
>>> We can deadlock in case we got to a device removal
>>> event on a queue which is already in the process of
>>> destroying the cm_id, as this is blocking until all
>>> events on this cm_id will drain. On the other hand
>>> we cannot guarantee that rdma_destroy_id was invoked
>>> as we only have indication that the queue disconnect
>>> flow has been queued (the queue state is updated before
>>> the release work has been queued).
>>>
>>> So, we leave all the queue removal to a separate ib_client
>>> to avoid this deadlock as ib_client device removal is in
>>> a different context than the cm_id itself.
>>>
>>> Signed-off-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
>>> ---
>>
>>
>> Yes. This patch fixes the problem I am seeing.
>
>
> Awesome,
>
> Adding your Tested-by tag.
>
> Thanks!
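
The blocking pattern in the quoted patch description can be illustrated with a toy userspace model. This is only a sketch, not kernel code: `ToyCmId`, `removal_in_cm_handler`, and `removal_in_separate_context` are hypothetical names, and a timeout stands in for what would be an indefinite hang. It assumes, as the description states, that destroying the cm_id blocks until all events on that cm_id have drained.

```python
import threading


class ToyCmId:
    """Hypothetical stand-in for a cm_id: destroy() waits for its handler to finish."""

    def __init__(self):
        self._handler_done = threading.Event()
        self._handler_done.set()  # no event handler is running yet

    def run_handler(self, handler):
        # Mark an event as in flight for the duration of the handler.
        self._handler_done.clear()
        try:
            handler()
        finally:
            self._handler_done.set()

    def destroy(self, timeout):
        # Returns False if the handler never drained (a hang in real life).
        return self._handler_done.wait(timeout)


def removal_in_cm_handler(timeout=0.2):
    """Removal handled inside the cm_id's own event handler."""
    cm_id = ToyCmId()
    result = {}

    def release_work():
        result["destroyed"] = cm_id.destroy(timeout)

    def handler():
        worker = threading.Thread(target=release_work)
        worker.start()
        worker.join()  # handler waits for release work, which waits for the handler

    cm_id.run_handler(handler)
    return result["destroyed"]


def removal_in_separate_context(timeout=0.2):
    """Removal handled outside any cm_id event handler (the ib_client idea)."""
    cm_id = ToyCmId()
    result = {}

    def release_work():
        result["destroyed"] = cm_id.destroy(timeout)

    worker = threading.Thread(target=release_work)
    worker.start()
    worker.join()
    return result["destroyed"]
```

In this model the first function reports that the destroy never completed, because the handler and the release work wait on each other, while the second completes because no event on the cm_id is in flight when the teardown runs.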
