From: Mark Zhang <markzhang@nvidia.com>
To: Leon Romanovsky <leon@kernel.org>, Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>,
	Patrisious Haddad <phaddad@nvidia.com>,
	Israel Rukshin <israelr@nvidia.com>,
	Linux-nvme <linux-nvme@lists.infradead.org>,
	linux-rdma@vger.kernel.org,
	Michael Guralnik <michaelgur@nvidia.com>,
	Maor Gottlieb <maorg@nvidia.com>,
	Max Gurtovoy <mgurtovoy@nvidia.com>,
	Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH rdma-next 4/4] nvme-rdma: add more error details when a QP moves to an error state
Date: Wed, 2 Nov 2022 09:56:42 +0800	[thread overview]
Message-ID: <e3d3b592-565d-0cbf-b2c8-7a36947b38f0@nvidia.com> (raw)
In-Reply-To: <681128a0-2e15-c8cb-0adc-1b5ebf57a759@nvidia.com>

On 11/1/2022 5:12 PM, Mark Zhang wrote:
> On 9/8/2022 1:39 AM, Leon Romanovsky wrote:
>> On Wed, Sep 07, 2022 at 05:18:18PM +0200, Christoph Hellwig wrote:
>>> On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
>>>>>>
>>>>>> This entire code needs to move to the rdma core instead
>>>>>> of being leaked to ulps.
>>>>>
>>>>> We can move, but you will lose connection between queue number,
>>>>> caller and error itself.
>>>>
>>>> That still doesn't explain why nvme-rdma is special.
>>>>
>>>> In any event, the ulp can log the qpn so the context can be
>>>> interrogated if that is important.
>>>
>>> I also don't see why the QP event handler can't be called
>>> from user context to start with.  I see absolutely no reason to
>>> add boilerplate code to drivers for reporting slightly more verbose
>>> errors on one specific piece of hardware.  I'd say clean up the mess
>>> that is the QP event handler first, and then once error reporting
>>> becomes trivial we can just do it.
>>
>> I don't know, Chuck documented it in 2018:
>> eb93c82ed8c7 ("RDMA/core: Document QP @event_handler function")
>>
>>    struct ib_qp_init_attr {
>>            /* Consumer's event_handler callback must not block */
>>            void                  (*event_handler)(struct ib_event *, void *);
> 
> Looks like the driver calls it from atomic context, e.g.:
>    mlx5_ib_qp_event() -> ibqp->event_handler(&event, ibqp->qp_context);
> 
> Could the driver also report it as an IB async event? WQ_CATAS_ERROR is
> defined as an async event (IB spec C11-39), and QP_FATAL is also an
> event in enum ib_event_type. Is it a problem if one event is reported
> twice?
> 
> If that is acceptable, then the ulp could register its event handler with
> ib_register_event_handler(), which is non-atomic.

Or make the QP event handler non-atomic, as Christoph suggested? That
would mean fixing the mlx4/mlx5 drivers to call it from a work queue.
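
As a rough illustration of the work-queue idea, here is a userspace C model
(not kernel code, and not how mlx4/mlx5 actually do it). The atomic path only
records the event; a later drain step, standing in for the work function,
calls the consumer's handler where blocking would be allowed. All names in it
(deferred_qp, deferred_qp_post, deferred_qp_drain, EVQ_SIZE, count_events) are
made up for this sketch.

```c
/* Userspace model only: every type and function here is hypothetical. */
#include <assert.h>
#include <stddef.h>

#define EVQ_SIZE 8	/* assumed queue depth, power of two */

struct qp_event {	/* stand-in for struct ib_event */
	int type;
	unsigned int qpn;
};

typedef void (*qp_event_fn)(const struct qp_event *ev, void *ctx);

struct deferred_qp {
	qp_event_fn handler;	/* consumer callback, allowed to block */
	void *ctx;
	struct qp_event ring[EVQ_SIZE];
	size_t head;		/* next event to deliver */
	size_t tail;		/* next free slot */
};

static void deferred_qp_init(struct deferred_qp *q, qp_event_fn fn, void *ctx)
{
	assert(fn != NULL);
	q->handler = fn;
	q->ctx = ctx;
	q->head = 0;
	q->tail = 0;
}

/*
 * Models the driver's atomic path: it only records the event and never
 * calls the handler. Returns 0 on success, -1 if the ring is full.
 */
static int deferred_qp_post(struct deferred_qp *q, int type, unsigned int qpn)
{
	if (q->tail - q->head == EVQ_SIZE)
		return -1;
	q->ring[q->tail % EVQ_SIZE] = (struct qp_event){ .type = type, .qpn = qpn };
	q->tail++;
	return 0;
}

/*
 * Models the work function: runs later and delivers every pending event
 * to the consumer. Returns the number of events delivered.
 */
static size_t deferred_qp_drain(struct deferred_qp *q)
{
	size_t delivered = 0;

	while (q->head != q->tail) {
		struct qp_event ev = q->ring[q->head % EVQ_SIZE];

		q->head++;
		q->handler(&ev, q->ctx);
		delivered++;
	}
	return delivered;
}

/* Example consumer: counts events and remembers the last QP number. */
struct seen {
	int count;
	unsigned int last_qpn;
};

static void count_events(const struct qp_event *ev, void *ctx)
{
	struct seen *s = ctx;

	s->count++;
	s->last_qpn = ev->qpn;
}
```

In a real driver the post step would roughly correspond to queue_work() from
the interrupt path and the drain step to the work function. The sketch only
shows the ordering property: the handler never runs inside the posting call.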
