On 09/11/2019 17:23, Jens Axboe wrote:
> On 11/9/19 4:16 AM, Pavel Begunkov wrote:
>>> I've been struggling a bit with how to make this reliable, and I'm not
>>> so sure there's a way to do that. Let's say an application sets up a
>>> ring with 8 sq entries, which would then default to 16 cq entries. With
>>> this patch, we'd allow 16 ios inflight. But what if the application does
>>>
>>> for (i = 0; i < 32; i++) {
>>> 	sqe = get_sqe();
>>> 	prep_sqe();
>>> 	submit_sqe();
>>> }
>>>
>>> And then directly proceeds to:
>>>
>>> do {
>>> 	get_completions();
>>> } while (has_completions);
>>>
>>> As long as fewer than 16 requests complete before we start reaping,
>>> we don't lose any events. Hence there's a risk of breaking existing
>>> setups with this, even though I don't think that's a high risk.
>>>
>>
>> I think, this should be considered as an erroneous usage of the API.
>> It's better to fail ASAP than to be surprised in a production
>> system, because of non-deterministic nature of such code. Even worse
>> with trying to debug such stuff.
>>
>> As for me, cases like below are too far-fetched
>>
>> for (i = 0; i < n; i++)
>> 	submit_read_sqe()
>> for (i = 0; i < n; i++) {
>> 	device_allow_next_read()
>> 	get_single_cqe()
>> }
> 
> I can't really disagree with that, it's a use case that's bound to fail
> every now and then...
> 
> But if we agree that's the case, then we should be able to just limit
> based on the cq ring size in question.
> 
> Do we make it different fro CQ_NODROP and !CQ_NODROP or not? Because the
> above case would work with CQ_NODROP, reliably. At least CQ_NODROP is
> new so we get to set the rules for that one, they just have to make
> sense.
> 
...would work reliably... and also would let user know about
stalling (-EBUSY). I thinks it's a good idea to always do
CQ_NODROP, as in the backlogged patch update.

-- 
Pavel Begunkov