From: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
To: Robert LeBlanc <robert-4JaGZRWAfWbajFs6igw21g@public.gmane.org>
Cc: Shiraz Saleem
	<shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"hch-jcswGhMUV9g@public.gmane.org"
	<hch-jcswGhMUV9g@public.gmane.org>,
	"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-nvme
	<linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>
Subject: Re: Deadlock on device removal event for NVMeF target
Date: Thu, 29 Jun 2017 17:32:36 +0300
Message-ID: <3e559faf-9ea4-081e-c9cd-cb1c36b4673f@grimberg.me>
In-Reply-To: <CAANLjFr++5daZ6Vn8TYxcM0oMyU4PuMztcM5KKM6mOy7HEs7KA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hey Robert,

> Could something like this be causing the D state problem I was seeing
> in iSER almost a year ago?

No, that is a bug in the mlx5 device as far as I'm concerned (although I
couldn't prove it). I've tried to track it down, but without access to
the firmware tools I can't tell what is going on. I've seen the same
phenomenon with nvmet-rdma as well.

It looks like when we drain a QP while RDMA operations are still in
flight, the drain may never finish, meaning that the zero-length RDMA
write never generates a completion. Maybe it has something to do with the
QP moving to the error state while some RDMA operations have not completed.

> I tried writing a patch for iSER based on
> this, but it didn't help. Either the bug is not being triggered in
> device removal,

It's 100% not related to device removal.

> or I didn't line up the statuses correctly. But it
> seems that things are getting stuck in the work queue and some sort of
> deadlock is happening so I was hopeful that something similar may be
> in iSER.

The hang is the ULP code waiting for the QP drain to complete.
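
To make the suspected failure mode concrete, here is a toy userspace
model in Python. It is only a sketch and not kernel code: ToyQP,
drain_sq and the device_flushes_all switch are hypothetical names made up
for illustration, and the "missing completion" is simply assumed, since
the root cause was never proven. The model moves the queue to the error
state, posts a marker work request, and waits for the marker's completion.
The real wait has no deadline as far as I know, so the sketch uses a
timeout only to keep the example from hanging.

```python
import threading
from collections import deque


class ToyQP:
    """Toy send queue; a hypothetical stand-in, not the kernel's ib_qp."""

    def __init__(self, device_flushes_all=True):
        self.state = "RTS"
        self.pending = deque()
        # Assumption: False models a device that never completes the marker.
        self.device_flushes_all = device_flushes_all

    def post_send(self, wr_id, on_done):
        self.pending.append((wr_id, on_done))

    def modify_to_error(self):
        self.state = "ERR"

    def poll_cq(self):
        """In the error state, flush queued work requests in posting order."""
        while self.state == "ERR" and self.pending:
            wr_id, on_done = self.pending[0]
            if wr_id == "drain" and not self.device_flushes_all:
                return  # the marker's completion never shows up
            self.pending.popleft()
            on_done(wr_id, "flushed")


def drain_sq(qp, timeout):
    """Return True if the drain marker completed, False if the wait timed out."""
    done = threading.Event()
    qp.modify_to_error()
    # Stand-in for the zero-length RDMA write used as the drain marker.
    qp.post_send("drain", lambda wr_id, status: done.set())
    qp.poll_cq()
    return done.wait(timeout)


if __name__ == "__main__":
    for flushes in (True, False):
        qp = ToyQP(device_flushes_all=flushes)
        qp.post_send("io-1", lambda wr_id, status: None)
        print(flushes, drain_sq(qp, timeout=0.1))
```

With device_flushes_all=False the marker stays queued and the waiter
gives up after the timeout. Without a timeout that waiter would block
forever, which is what the hang above looks like from the ULP side.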
