Re: ibv_req_notify_cq clarification

From: Tom Talpey <tom@talpey.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Gal Pressman <galpress@amazon.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: ibv_req_notify_cq clarification
Date: Thu, 18 Feb 2021 18:07:13 -0500
Message-ID: <4b38c6fa-0a18-9f32-4dce-af8e3e39cb8e@talpey.com>
In-Reply-To: <20210218225131.GB2643399@ziepe.ca>

On 2/18/2021 5:51 PM, Jason Gunthorpe wrote:
> On Thu, Feb 18, 2021 at 05:22:31PM -0500, Tom Talpey wrote:
>> On 2/18/2021 11:23 AM, Jason Gunthorpe wrote:
>>> On Thu, Feb 18, 2021 at 05:52:16PM +0200, Gal Pressman wrote:
>>>> On 18/02/2021 14:53, Jason Gunthorpe wrote:
>>>>> On Thu, Feb 18, 2021 at 11:13:43AM +0200, Gal Pressman wrote:
>>>>>> I'm a bit confused about the meaning of the ibv_req_notify_cq() verb:
>>>>>> "Upon the addition of a new CQ entry (CQE) to cq, a completion event will be
>>>>>> added to the completion channel associated with the CQ."
>>>>>>
>>>>>> What is considered a new CQE in this case?
>>>>>> The next CQE from the user's perspective, i.e. any new CQE that wasn't consumed
>>>>>> by the user's poll cq?
>>>>>> Or any new CQE from the device's perspective?
>>>>>
>>>>> new CQE from the device perspective.
>>>>>
>>>>>> For example, if at the time of ibv_req_notify_cq() call the CQ has received 100
>>>>>> completions, but the user hasn't polled his CQ yet, when should he be notified?
>>>>>> On the 101st completion or immediately (since there are completions waiting on the
>>>>>> CQ)?
>>>>>
>>>>> 101st completion
>>>>>
>>>>> It is only meaningful to call it when the CQ is empty.
>>>>
>>>> Thanks, so there's an inherent race between the user's CQ poll and the next arm?
>>>
>>> I think the specs or man pages talk about this, the application has to
>>> observe empty, do arm, then poll again then sleep on the cq if empty.
>>>
>>>> Do you know what's the purpose of the consumer index in the arm doorbell that's
>>>> implemented by many providers?
>>>
>>> The consumer index is needed by HW to prevent CQ overflow; presumably
>>> the drivers push it to reduce the cases where the HW has to read it
>>> over PCI
>>
>> Prevent CQ overflow? There's no such requirement that I'm aware of.
>> If the consumer doesn't provide a large-enough CQ, then it reaps the
>> consequences. Same thing for WQ depth, although I am aware that some
>> verbs implementations attempt to return a kind of EAGAIN when posting
>> to a send WQ.
>>
>> What can the provider do if the CQ is "full" anyway? Buffer the CQE
>> and go into some type of polling loop attempting to redeliver? Ouch!
> 
> QP goes to error, CQE is discarded, IIRC.

What!? There might be many QPs all sharing the same CQ. Put them
*all* into error? And for what? The CQ is trash anyway. This
sounds like optimizing the error case. Uselessly.

I don't think any of this is in the verbs API. The architecture
was designed such that posting by the consumer (WQ) and provider (CQ)
do not need to perform any reads at all. It's all about lock-free
writing.

> Wrapping and overflowing the CQ is not acceptable, it would mean
> reading CQEs could never be done reliably.

But the provider never reads the CQ, only the consumer can read.
The provider writes to head, ignoring tail. Consumer reads from
tail, and it goes empty when tail == head. And if head overruns
tail, that was the consumer's fault for posting too many WQEs.
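To make the head/tail model above concrete, here is a minimal sketch
in plain C. It is a toy ring, not any provider's actual CQ layout:
the names (struct ring, ring_produce, ring_consume, ring_overrun) and
the depth of 4 are hypothetical, and head and tail are assumed to be
free-running counters so an overrun can be told apart from empty.
The producer writes at head without ever reading tail, and the
consumer sees empty when tail == head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_DEPTH 4u /* hypothetical CQ depth */

struct ring {
    uint32_t slots[RING_DEPTH];
    uint32_t head; /* advanced only by the producer (provider) */
    uint32_t tail; /* advanced only by the consumer */
};

/* Producer side: write at head and advance. Never reads tail. */
static void ring_produce(struct ring *r, uint32_t value)
{
    r->slots[r->head % RING_DEPTH] = value;
    r->head++;
}

/* Consumer side: read from tail; the ring is empty when tail == head. */
static bool ring_consume(struct ring *r, uint32_t *out)
{
    assert(out != NULL);
    if (r->tail == r->head)
        return false;
    *out = r->slots[r->tail % RING_DEPTH];
    r->tail++;
    return true;
}

/* How many unread entries the producer has already overwritten. */
static uint32_t ring_overrun(const struct ring *r)
{
    uint32_t pending = r->head - r->tail;
    return pending > RING_DEPTH ? pending - RING_DEPTH : 0;
}
```

With a depth of 4, producing six entries while consuming only one
leaves one unread entry overwritten: the next ring_consume() returns
the newer value from that slot, and the older one is gone. Nothing on
the producer side notices, which is the situation being argued about
in this thread.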

Gal, what providers do you see that have this check?

Tom.
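
For readers following the quoted arm/poll discussion, the "observe
empty, arm, poll again, sleep only if empty" sequence can be sketched
with a toy model. This is not libibverbs code: toy_cq,
toy_device_complete, toy_req_notify, toy_poll and toy_drain_then_arm
are hypothetical stand-ins (roughly for the device, ibv_req_notify_cq()
and ibv_poll_cq()), and the model assumes the semantics stated earlier
in the thread, namely that arming only triggers an event on the next
CQE the device adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not libibverbs: a CQ that raises one event per arm. */
struct toy_cq {
    int pending; /* CQEs written by the device but not yet polled */
    bool armed;  /* set by toy_req_notify(), cleared when an event fires */
    int events;  /* completion events queued on the channel */
};

/* Device side: add a CQE; fire an event only if the CQ is armed. */
static void toy_device_complete(struct toy_cq *cq)
{
    cq->pending++;
    if (cq->armed) {
        cq->armed = false;
        cq->events++;
    }
}

/* Consumer side: request an event for the next CQE the device adds. */
static void toy_req_notify(struct toy_cq *cq)
{
    cq->armed = true;
}

/* Consumer side: reap up to max CQEs and return how many were reaped. */
static int toy_poll(struct toy_cq *cq, int max)
{
    assert(max > 0);
    int n = cq->pending < max ? cq->pending : max;
    cq->pending -= n;
    return n;
}

/*
 * Drain until empty, arm, then poll once more.
 * Returns true only if the CQ was still empty after arming, i.e. it is
 * safe to sleep on the completion channel.
 */
static bool toy_drain_then_arm(struct toy_cq *cq, int *reaped)
{
    int n;

    *reaped = 0;
    while ((n = toy_poll(cq, 16)) > 0) /* observe empty */
        *reaped += n;
    toy_req_notify(cq);                /* arm */
    n = toy_poll(cq, 16);              /* poll again */
    *reaped += n;
    return n == 0;
}
```

In this model, arming a CQ that already holds 100 unpolled CQEs
produces no event until the 101st arrives. The second poll exists for
the case where a CQE lands between the last empty poll and the arm:
no event fires for it, so the consumer has to find it by polling
before it decides to sleep.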