* [PATCH/libmlx] Set the ibv_wc.opcode even if the wc is an error wc
@ 2011-03-09  4:26 Jason Gunthorpe
       [not found] ` <20110309042613.GA21606-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2011-03-09  4:26 UTC (permalink / raw)
  To: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

If a SQ and a RQ are sharing a CQ then the opcode must be set to
determine if the error WC applies to the SQ or RQ. This is important
for buffer tracking, etc.

Testing shows that the is_send value is correct at this point so if
the chip does not provide an accurate opcode the default statements
will produce IBV_WC_RECV for RQ WC's and IBV_WC_SEND for SQ WC's.

Tested with a UD QP causing 'local length error' on both the RQ
and SQ.

Tested with a RC QP causing 'local length error' on the SQ and RQ,
as well as 'remote invalid request error' and
'Work Request Flushed Error'

Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
---
 src/cq.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

Roland: I don't have a PRM to check if this is correct for the chip,
but it is definitely in line with what the IBA expects to happen
here. Some basic testing shows it works as expected.

For the RQ case the value of (cqe->owner_sr_opcode &
MLX4_CQE_OPCODE_MASK) is MLX4_CQE_OPCODE_ERROR, the SQ case
doesn't hit the default statement in my tests.

I noticed this while trying to figure out what to do with a
'local length error' received on a UD RQ, which is not specified
to be possible. Since it does not put the RQ into an error state
it just needs to be ignored and the buffer recycled, except you can't
tell whether it is a RQ local length error or a SQ local length error
without the opcode being set properly...

Same general patch applies to the kernel, and I didn't check other
drivers.

diff --git a/src/cq.c b/src/cq.c
index 8226b6b..c920844 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -253,13 +253,6 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 		++wq->tail;
 	}
 
-	if (is_error) {
-		mlx4_handle_error_cqe((struct mlx4_err_cqe *) cqe, wc);
-		return CQ_OK;
-	}
-
-	wc->status = IBV_WC_SUCCESS;
-
 	if (is_send) {
 		wc->wc_flags = 0;
 		switch (cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) {
@@ -311,6 +304,10 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 			wc->wc_flags = IBV_WC_WITH_IMM;
 			wc->imm_data = cqe->immed_rss_invalid;
 			break;
+		default:
+			/* assume it's a recv completion */
+			wc->opcode    = IBV_WC_RECV;
+			break;
 		}
 
 		wc->slid	   = ntohs(cqe->rlid);
@@ -322,6 +319,11 @@ static int mlx4_poll_one(struct mlx4_cq *cq,
 		wc->pkey_index     = ntohl(cqe->immed_rss_invalid) & 0x7f;
 	}
 
+	if (is_error)
+		mlx4_handle_error_cqe((struct mlx4_err_cqe *) cqe, wc);
+	else
+		wc->status = IBV_WC_SUCCESS;
+
 	return CQ_OK;
 }
 
-- 
1.7.1


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH/libmlx] Set the ibv_wc.opcode even if the wc is an error wc
       [not found] ` <20110309042613.GA21606-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2011-03-09  6:28   ` Or Gerlitz
       [not found]     ` <4D771E0F.7040402-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2011-03-09  6:28 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 3/9/2011 6:26 AM, Jason Gunthorpe wrote:
> Roland: I don't have a PRM to check if this is correct for the chip, but
> it is definitely in line with what the IBA expects to happen here.
Hi Jason, I've been taught that by IBTA if the completion isn't 
successful then the only valid WC field is the opcode, isn't that correct?

Or.

* Re: [PATCH/libmlx] Set the ibv_wc.opcode even if the wc is an error wc
       [not found]     ` <4D771E0F.7040402-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2011-03-09  7:04       ` Jason Gunthorpe
       [not found]         ` <20110309070441.GA25213-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2011-03-09  7:04 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 09, 2011 at 08:28:31AM +0200, Or Gerlitz wrote:
> On 3/9/2011 6:26 AM, Jason Gunthorpe wrote:
> >Roland: I don't have a PRM to check if this is correct for the chip, but
> >it is definitely in line with what the IBA expects to happen here.

> Hi Jason, I've been taught that by IBTA if the completion isn't
> successful then the only valid WC field is the opcode, isn't that
> correct?

Did you mean wr_id not opcode?

I'd say that 11.4.2.1 supports the view that wr_id and status are the
only valid fields. However, the whole error handling architecture that
the WC's fit into is based around the idea that you can go from an
error WC back to the RQ/SQ that caused the error, correct the
situation and resume operation. That requires the opcode indicate at
least SEND vs RECV, and that qp_num be valid.

Frankly, it makes no sense that only wr_id is valid. The wr_id was
taken from a RQ/SQ, so qp_num and opcode must be knowable.

mlx4 HW can do this right now, it looks to me like QIB does it
already, donno about mthca. I'd say even if you have the view that
IBTA says it is not portable, having the information come out is still
very useful.

Jason

* Re: [PATCH/libmlx] Set the ibv_wc.opcode even if the wc is an error wc
       [not found]         ` <20110309070441.GA25213-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2011-03-09  7:37           ` Or Gerlitz
       [not found]             ` <4D772E36.90103-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2011-03-09 11:14           ` Bart Van Assche
  1 sibling, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2011-03-09  7:37 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Jason Gunthorpe wrote:
> Did you mean wr_id not opcode?

yes, sure, I was referring to the wr_id (cookie) and status.

> I'd say that 11.4.2.1 supports the view that wr_id and status are the
> only valid fields. However, the whole error handling architecture that
> the WC's fit into is based around the idea that you can go from an
> error WC back to the RQ/SQ that caused the error, correct the
> situation and resume operation. That requires the opcode indicate at
> least SEND vs RECV, and that qp_num be valid

Indeed, the qp number is required for supporting SRQ, since in that
case many sessions (QPs) share the same RQ buffer pool. As for the
opcode, I understand what you're saying, but still, given the cookie
and the qp number the application can work out whether it was a send
or a receive that failed, and by recording the actual send opcode
(send/rdma-read/write/etc) in the buffer pointed to by the cookie it
can debug this further. Of course your approach would make debugging
easier. The question is how to educate people who write IB code:
should we tell them that they can now look at the opcode? So far I
have been telling people never to do so if the status isn't success.

Or.


* Re: [PATCH/libmlx] Set the ibv_wc.opcode even if the wc is an error wc
       [not found]         ` <20110309070441.GA25213-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2011-03-09  7:37           ` Or Gerlitz
@ 2011-03-09 11:14           ` Bart Van Assche
       [not found]             ` <AANLkTinsu1xCJr0ROfjGABWg6yHZit27eSFPZgd7GFF2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2011-03-09 11:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 9, 2011 at 8:04 AM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>
> On Wed, Mar 09, 2011 at 08:28:31AM +0200, Or Gerlitz wrote:
> > On 3/9/2011 6:26 AM, Jason Gunthorpe wrote:
> > >Roland: I don't have a PRM to check if this is correct for the chip, but
> > >it is definitely in line with what the IBA expects to happen here.
>
> > Hi Jason, I've been taught that by IBTA if the completion isn't
> > successful then the only valid WC field is the opcode, isn't that
> > correct?
>
> Did you mean wr_id not opcode?
>
> I'd say that 11.4.2.1 supports the view that wr_id and status are the
> only valid fields. However, the whole error handling architecture that
> the WC's fit into is based around the idea that you can go from an
> error WC back to the RQ/SQ that caused the error, correct the
> situation and resume operation. That requires the opcode indicate at
> least SEND vs RECV, and that qp_num be valid.
>
> Frankly, it makes no sense that only wr_id is valid. The wr_id was
> taken from a RQ/SQ, so qp_num and opcode must be knowable.
>
> mlx4 HW can do this right now, it looks to me like QIB does it
> already, donno about mthca. I'd say even if you have the view that
> IBTA says it is not portable, having the information come out is still
> very useful.

If the mlx4 hardware can do this right now then that means that the
mlx4 driver does not pass on the opcode for error completions. The
approach taken in the recently posted ib_srpt code is to encode the
opcode in the wr_id. See also the encode_wr_id(), opcode_from_wr_id()
and idx_from_wr_id() functions.
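
A minimal sketch of that kind of wr_id encoding follows. The function
names are the ones mentioned above, but the bit layout (opcode in the
upper 32 bits, buffer index in the lower 32 bits) and the app_opcode
enum are assumptions made for illustration, not a copy of the ib_srpt
code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical application-level opcodes; not part of the verbs API. */
enum app_opcode {
	APP_OP_RECV = 0,
	APP_OP_SEND = 1,
	APP_OP_RDMA_READ = 2,
	APP_OP_RDMA_WRITE = 3,
};

/* Assumed layout: opcode in bits 63..32, buffer index in bits 31..0. */
static inline uint64_t encode_wr_id(enum app_opcode opcode, uint32_t idx)
{
	assert((unsigned int)opcode <= APP_OP_RDMA_WRITE);
	return ((uint64_t)opcode << 32) | idx;
}

/* Recover the opcode that was stored when the work request was posted. */
static inline enum app_opcode opcode_from_wr_id(uint64_t wr_id)
{
	return (enum app_opcode)(wr_id >> 32);
}

/* Recover the buffer index that was stored alongside the opcode. */
static inline uint32_t idx_from_wr_id(uint64_t wr_id)
{
	return (uint32_t)wr_id;
}
```

Since wr_id is returned even in error completions, the consumer can
tell send from receive failures without reading wc->opcode at all.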

Bart.

* Re: [PATCH/libmlx] Set the ibv_wc.opcode even if the wc is an error wc
       [not found]             ` <4D772E36.90103-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2011-03-09 11:31               ` Bart Van Assche
  2011-03-09 17:40               ` Jason Gunthorpe
  1 sibling, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2011-03-09 11:31 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jason Gunthorpe, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 9, 2011 at 8:37 AM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> The question is
> how to educate people that write to IB, should we tell them now they can
> look at the opcode? so far I was telling people never do so if the status
> isn't success.

And with good reason: I know at least one HCA / driver combination for
which the opcode in error completions can differ from the opcode in
the corresponding work request. As mentioned before this behavior is
compliant with the IBTA specs.

Bart.

* Re: [PATCH/libmlx] Set the ibv_wc.opcode even if the wc is an error wc
       [not found]             ` <4D772E36.90103-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2011-03-09 11:31               ` Bart Van Assche
@ 2011-03-09 17:40               ` Jason Gunthorpe
       [not found]                 ` <20110309174044.GO22729-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2011-03-09 17:40 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 09, 2011 at 09:37:26AM +0200, Or Gerlitz wrote:

> >I'd say that 11.4.2.1 supports the view that wr_id and status are the
> >only valid fields. However, the whole error handling architecture that
> >the WC's fit into is based around the idea that you can go from an
> >error WC back to the RQ/SQ that caused the error, correct the
> >situation and resume operation. That requires the opcode indicate at
> >least SEND vs RECV, and that qp_num be valid
> 
> Indeed, the qp number is required for supporting SRQ, since in that
> case, many sessions (QPs) share the same RQ buffer pool. 

IBA says qp_num is also not available, so you have to encode that into
wr_id as well, which pretty much means wr_id has to point to allocated
memory.
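
As a rough sketch of what that looks like for an application: wr_id
carries a pointer to a per-work-request context that records the QP and
the direction. The struct wr_ctx type and both helper names are
hypothetical, invented here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-work-request context owned by the application. */
struct wr_ctx {
	uint32_t qp_num;	/* QP the work request was posted to */
	int is_recv;		/* nonzero if posted to the RQ (or SRQ) */
	void *buf;		/* buffer to recycle on completion */
};

/* Value to store in the work request's wr_id field when posting. */
static inline uint64_t wr_id_from_ctx(struct wr_ctx *ctx)
{
	assert(ctx != NULL);
	return (uint64_t)(uintptr_t)ctx;
}

/* Recover the context from the wr_id reported in a completion. */
static inline struct wr_ctx *ctx_from_wr_id(uint64_t wr_id)
{
	return (struct wr_ctx *)(uintptr_t)wr_id;
}
```

The context has to stay allocated until its completion has been polled,
which is the extra bookkeeping being objected to here.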

> As for the opcode, I understand what you're saying, but, still -
> given the cookie and the qp number the application can realize
> whether it was send or receive that failed, and with recording of
> the actual send opcode (send/rdma-read/write/etc) in the buffer
> pointed by the cookie further debug this.

Right, but why should apps have to do this extra work of allocating
more memory, etc, etc, just to keep track of something the HCA already
keeps track of?

> Of course your approach would make debugging easier. The question is
> how to educate people that write to IB, should we tell them now they
> can look at the opcode? so far I was telling people never do so if
> the status isn't success.

Well, clearly today you have to say that only wr_id and status are
valid and everything else is garbage, because that is what mlx4's
drivers do.

If we fix mlx4, and I'm correct that qib works, then you can go on to
say that 90% of the cards do correctly set opcode and qp_num, but that
relying on it is not portable.

If mthca and ehca can be made to work in the same way then we go on to
say that OFA requires that all IB drivers work this way and propose
that IBA be updated.

Like any other non-portable construct using it or not depends on what
you are trying to accomplish and how much performance you gain. But in
any event, having the information is better for diagnostics so I think
the patch should be applied :)

Jason

* Re: [PATCH/libmlx] Set the ibv_wc.opcode even if the wc is an error wc
       [not found]             ` <AANLkTinsu1xCJr0ROfjGABWg6yHZit27eSFPZgd7GFF2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-03-09 17:42               ` Jason Gunthorpe
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2011-03-09 17:42 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Or Gerlitz, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 09, 2011 at 12:14:48PM +0100, Bart Van Assche wrote:

> If the mlx4 hardware can do this right now then that means that the
> mlx4 driver does not pass on the opcode for error completions. 

Right, my patch fixes that for userspace. A kernel change would be
very similar, and it looks like mthca is basically the same too, if
the chip firmware operates similarly.

The opcode that is passed on may not be 100% accurate, but it will
reflect the IBV_WC_RECV bit correctly.
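
A sketch of how a consumer sharing one CQ between a SQ and a RQ could
use that bit. The sketch_wc_opcode enum is a stand-in so the example
compiles without libibverbs; its values are assumed to mirror enum
ibv_wc_opcode, where the receive opcodes have bit 7 set. The
wq_side_from_opcode() helper is hypothetical, and the result is only
meaningful on drivers that fill in the opcode for error completions.

```c
/*
 * Stand-ins for enum ibv_wc_opcode from <infiniband/verbs.h>; real code
 * should include that header and use IBV_WC_RECV and friends directly.
 */
enum sketch_wc_opcode {
	SKETCH_WC_SEND = 0,
	SKETCH_WC_RDMA_WRITE = 1,
	SKETCH_WC_RDMA_READ = 2,
	SKETCH_WC_RECV = 1 << 7,
	SKETCH_WC_RECV_RDMA_WITH_IMM,
};

enum wq_side { WQ_SEND, WQ_RECV };

/* Classify a completion as belonging to the SQ or the RQ. */
static inline enum wq_side wq_side_from_opcode(enum sketch_wc_opcode opcode)
{
	return (opcode & SKETCH_WC_RECV) ? WQ_RECV : WQ_SEND;
}
```

With this, a 'local length error' on a UD RQ can be routed to the
receive-buffer recycling path rather than the send path.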

> The approach taken in the recently posted ib_srpt code is to encode
> the opcode in the wr_id. See also the encode_wr_id(),
> opcode_from_wr_id() and idx_from_wr_id() functions.

Right, this is definitely required today.

Jason

* Re: [PATCH/libmlx] Set the ibv_wc.opcode even if the wc is an error wc
       [not found]                 ` <20110309174044.GO22729-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2011-03-09 19:09                   ` Bart Van Assche
  0 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2011-03-09 19:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 9, 2011 at 6:40 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> [ ... ]
> Like any other non-portable construct using it or not depends on what
> you are trying to accomplish and how much performance you gain. But in
> any event, having the information is better for diagnostics so I think
> the patch should be applied :)

Even if your patch would be applied, in portable code it wouldn't be
possible to make any assumptions about the validity of the opcode and
qp_num fields in error completions for a long time. Which is
unfortunate.

Bart.
