On 09/11/2019 17:23, Jens Axboe wrote: > On 11/9/19 4:16 AM, Pavel Begunkov wrote: >>> I've been struggling a bit with how to make this reliable, and I'm not >>> so sure there's a way to do that. Let's say an application sets up a >>> ring with 8 sq entries, which would then default to 16 cq entries. With >>> this patch, we'd allow 16 ios inflight. But what if the application does >>> >>> for (i = 0; i < 32; i++) { >>> sqe = get_sqe(); >>> prep_sqe(); >>> submit_sqe(); >>> } >>> >>> And then directly proceeds to: >>> >>> do { >>> get_completions(); >>> } while (has_completions); >>> >>> As long as fewer than 16 requests complete before we start reaping, >>> we don't lose any events. Hence there's a risk of breaking existing >>> setups with this, even though I don't think that's a high risk. >>> >> >> I think, this should be considered as an erroneous usage of the API. >> It's better to fail ASAP than to be surprised in a production >> system, because of non-deterministic nature of such code. Even worse >> with trying to debug such stuff. >> >> As for me, cases like below are too far-fetched >> >> for (i = 0; i < n; i++) >> submit_read_sqe() >> for (i = 0; i < n; i++) { >> device_allow_next_read() >> get_single_cqe() >> } > > I can't really disagree with that, it's a use case that's bound to fail > every now and then... > > But if we agree that's the case, then we should be able to just limit > based on the cq ring size in question. > > Do we make it different fro CQ_NODROP and !CQ_NODROP or not? Because the > above case would work with CQ_NODROP, reliably. At least CQ_NODROP is > new so we get to set the rules for that one, they just have to make > sense. > ...would work reliably... and also would let user know about stalling (-EBUSY). I thinks it's a good idea to always do CQ_NODROP, as in the backlogged patch update. -- Pavel Begunkov