Re: [PATCH v3] RDMA/rxe: Generate a completion on error after getting a wqe

From: "yangx.jy@fujitsu.com" <yangx.jy@fujitsu.com>
To: Bob Pearson <rpearsonhpe@gmail.com>,
	"leon@kernel.org" <leon@kernel.org>,
	"jgg@nvidia.com" <jgg@nvidia.com>
Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH v3] RDMA/rxe: Generate a completion on error after getting a wqe
Date: Fri, 8 Apr 2022 03:14:53 +0000	[thread overview]
Message-ID: <81115ff0-a076-7292-8dad-49c9d5666b87@fujitsu.com> (raw)
In-Reply-To: <5c806f62-d258-7860-4b6b-416adc603879@gmail.com>

Hi Bob,

Thanks a lot for your detailed comments.

On 2022/4/8 5:39, Bob Pearson wrote:
> On 3/31/22 07:02, Xiao Yang wrote:
>> Current rxe_requester() doesn't generate a completion on error after
>> getting a wqe. Fix the issue by calling rxe_completer() on error.
>>
>> Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_req.c | 24 ++++++++++++------------
>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index ae5fbc79dd5c..01ae400e5481 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -648,26 +648,24 @@ int rxe_requester(void *arg)
>>   		psn_compare(qp->req.psn, (qp->comp.psn +
>>   				RXE_MAX_UNACKED_PSNS)) > 0)) {
>>   		qp->req.wait_psn = 1;
>> -		goto exit;
>> +		goto qp_op_err;
>>   	}
> This isn't an error. What is happening is the requester has advanced too far into the
> work request queue compared to what has been completed. So it sets a flag and exits the
> loop. When the completer finishes another completion it will reschedule the requester.
> RXE_MAX_UNACKED_PSNS is 128 so only 128 packets are allowed in the network without
> seeing an ack.

Got it.

>>   
>>   	/* Limit the number of inflight SKBs per QP */
>>   	if (unlikely(atomic_read(&qp->skb_out) >
>>   		     RXE_INFLIGHT_SKBS_PER_QP_HIGH)) {
>>   		qp->need_req_skb = 1;
>> -		goto exit;
>> +		goto qp_op_err;
>>   	}
> This also is not an error. Here there is a limit on the number SKBs in flight.
> The SKBs are consumed by the ethernet driver when the packet is put on the wire.
> This prevents the driver from using too much memory when the NIC is paused.

Got it.

>>   
>>   	opcode = next_opcode(qp, wqe, wqe->wr.opcode);
>> -	if (unlikely(opcode < 0)) {
>> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>> -		goto exit;
>> -	}
> This really is an error, but one that shouldn't happen under normal
> circumstances since it implies an illegal operation. Like attempting
> a write on a UD QP. Or causing an invalid opcode sequence.

Yes, this is the issue I try to resolve. Current rxe_requester() doesn't 
generate a completion when processing an unsupported/invalid opcode.
I am trying to introduce new RDMA Atomic Write opcode recently. If rxe 
driver doesn't support a new opcode (e.g. RDMA Atomic Write) and RDMA 
library supports it, an application using the new opcode can reproduce 
the issue.

>> +	if (unlikely(opcode < 0))
>> +		goto qp_op_err;
>>   
>>   	mask = rxe_opcode[opcode].mask;
>>   	if (unlikely(mask & RXE_READ_OR_ATOMIC_MASK)) {
>>   		if (check_init_depth(qp, wqe))
>> -			goto exit;
>> +			goto qp_op_err;
>>   	}
> This isn't an error. It means that someone posted a read/atomic operation
> and there are too many pending read/atomic operations pending. You just need
> to wait for one of them to complete so it is another pause.

Got it.

>>   
>>   	mtu = get_mtu(qp);
>> @@ -706,26 +704,26 @@ int rxe_requester(void *arg)
>>   	av = rxe_get_av(&pkt, &ah);
>>   	if (unlikely(!av)) {
>>   		pr_err("qp#%d Failed no address vector\n", qp_num(qp));
>> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>>   		goto err_drop_ah;
>>   	}
> This is an error. It could happen if the address handle referred to by
> the WR is not longer valid. There should always be an address vector but
> there may or may not be an address handle if the AV comes from a connected
> QP or not.

Yes, I think the original logic is enough.

>>   
>>   	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
>>   	if (unlikely(!skb)) {
>>   		pr_err("qp#%d Failed allocating skb\n", qp_num(qp));
>> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>>   		goto err_drop_ah;
>>   	}
>>   
>>   	ret = finish_packet(qp, av, wqe, &pkt, skb, payload);
>>   	if (unlikely(ret)) {
>>   		pr_debug("qp#%d Error during finish packet\n", qp_num(qp));
>> +		if (ah)
>> +			rxe_put(ah);
>>   		if (ret == -EFAULT)
>>   			wqe->status = IB_WC_LOC_PROT_ERR;
>>   		else
>>   			wqe->status = IB_WC_LOC_QP_OP_ERR;
>>   		kfree_skb(skb);
>> -		goto err_drop_ah;
>> +		goto err;
>>   	}
> Not sure the point of this.

Just keep the original logic.

>>   
>>   	if (ah)
>> @@ -751,8 +749,7 @@ int rxe_requester(void *arg)
>>   			goto exit;
>>   		}
>>   
>> -		wqe->status = IB_WC_LOC_QP_OP_ERR;
>> -		goto err;
>> +		goto qp_op_err;
>>   	}
>>   
>>   	update_state(qp, wqe, &pkt);
>> @@ -762,6 +759,9 @@ int rxe_requester(void *arg)
>>   err_drop_ah:
>>   	if (ah)
>>   		rxe_put(ah);
>> +
>> +qp_op_err:
>> +	wqe->status = IB_WC_LOC_QP_OP_ERR;
>>   err:
>>   	wqe->state = wqe_state_error;
>>   	__rxe_do_task(&qp->comp.task);
> 
> Near as I can tell all you did was turn a few pauses into errors which is wrong
> and otherwise move things around. What were you trying to solve here?

I will resend a new patch to only fix the unsupported/invalid issue.

Best Regards,
Xiao Yang

> 
> Bob