Thanks guys for all your help! I now understand SPDK internals better and plan to add status-checking code before submitting further I/O requests.

I misunderstood the claim on http://www.spdk.io/doc/blob.html, which says "The blobstore is ... typically in lieu of a traditional filesystem". When it comes to writing code, using the blobstore API means taking on a lot more responsibility than using a traditional filesystem.

All the best!
Zhengyu

On 1/11/18 4:53 AM, Walker, Benjamin wrote:
> On Wed, 2018-01-10 at 19:28 +0000, Andrey Kuzmin wrote:
>> On Wed, Jan 10, 2018, 20:17 Walker, Benjamin wrote:
>>> On Wed, 2018-01-10 at 17:00 +0000, Andrey Kuzmin wrote:
>>>> It appears quite logical to start submission with a check for pending
>>>> completions, doesn't it? Or check for completions if downstream bdev returns
>>>> busy status. That would definitely meet app expectations whatever the request
>>>> pool size is.
>>>
>>> We've considered checking for completions inside the submission path if we would
>>> otherwise return ENOMEM. So far, we've decided not to go that direction for two
>>> reasons.
>>>
>>> 1) Even if we do this, there are still cases where we'll return ENOMEM. For
>>> instance, if there are no completions to reap yet.
>>
>> While theoretically possible, such a case is problematic to imagine in
>> practice.
>
> The user has 512 queue depth available and is submitting I/O in a tight loop.
> The submission path through the blobstore and into the NVMe driver probably
> takes on the order of 500ns to run. That means you can submit your full queue
> depth worth in 256us. On many NAND SSDs that's well within P99 latency
> expectations for 4KiB I/O, and it gets increasingly likely with larger I/O to
> the point where it is almost guaranteed to happen with 128KiB requests. The user
> is free to reduce the available queue depth to save memory as well.
>
>>> 2) This would result in completion callbacks in response to a submit call.
>>> Today, the expectations are set that completions are called in response to a
>>> poll call only.
>>
>> Feel free to correct me if I'm wrong, but my recollection is that completion
>> callback may be called on submission path in case of error.
>
> I just checked and for the nvme and bdev libraries an error code will be given
> to the user as the return code for the function. The callback will not be called
> because the failure is known immediately. For the blobstore library it works the
> opposite way - the functions have no return code and instead always call the
> user callback. I think this is probably a design mistake on my part. For these
> ENOMEM cases, we need to return that to the user as a return code. That makes it
> much easier to handle the situation and makes it consistent with the other
> libraries.
>
>> The case in question is, apparently, a corner one as application must check
>> for completions if bdev returns busy status. One cannot run an unlimited rate
>> client atop a rate-limited server w/o a poll enforced at some point.
>>
>> It might also be helpful to add a parameter to the poll call specifying the
>> minimum number of completions to reap before returning control to the app, to
>> deal with deadlocks like this one.
>
> There already is a parameter that limits the number of completions reaped in a
> single poll call. Even if you don't specify a limit, the drivers enforce
> sensible limits by default.
>
>> Regards,
>> Andrey
>>
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>> https://lists.01.org/mailman/listinfo/spdk
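
For anyone reading this thread later, here is a rough sketch of the pattern discussed above: the submit call reports ENOMEM as a return code (the callback is not invoked on failure), completion callbacks run only from the poll call, and the poll call takes a cap on how many completions to reap. This is a self-contained mock, not SPDK code. `QUEUE_DEPTH`, `struct mock_queue`, `mock_submit`, `mock_poll`, `count_done` and `run_workload` are all hypothetical names made up for illustration, and the mock assumes every in-flight request has already finished by the time it is polled, so it does not reproduce the "no completions to reap yet" case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical queue depth for the mock; the thread talks about 512. */
#define QUEUE_DEPTH 4

/* Hypothetical stand-in for a device queue; not an SPDK type. */
struct mock_queue {
    int in_flight; /* requests submitted but not yet reaped */
};

typedef void (*io_done_cb)(void *ctx);

/* Submit one request. Returns 0 on success, or -ENOMEM when no request
 * slot is free. On failure no callback is ever invoked. */
static int mock_submit(struct mock_queue *q)
{
    if (q->in_flight >= QUEUE_DEPTH) {
        return -ENOMEM;
    }
    q->in_flight++;
    return 0;
}

/* Reap up to max_completions requests (0 means no caller-imposed limit).
 * Completion callbacks run only from here. Returns the number reaped.
 * Simplification: every in-flight request is treated as already done. */
static int mock_poll(struct mock_queue *q, int max_completions,
                     io_done_cb cb, void *ctx)
{
    int reaped = 0;

    assert(cb != NULL);
    while (q->in_flight > 0 &&
           (max_completions == 0 || reaped < max_completions)) {
        q->in_flight--;
        reaped++;
        cb(ctx);
    }
    return reaped;
}

/* Completion callback: bump the counter passed in as context. */
static void count_done(void *ctx)
{
    (*(int *)ctx)++;
}

/* Submit `total` requests, polling whenever submission returns -ENOMEM.
 * Returns the number of completions seen; if enomem_hits is non-NULL it
 * is incremented once per -ENOMEM return. */
static int run_workload(int total, int *enomem_hits)
{
    struct mock_queue q = { 0 };
    int done = 0;
    int submitted = 0;

    while (submitted < total) {
        int rc = mock_submit(&q);

        if (rc == -ENOMEM) {
            if (enomem_hits != NULL) {
                (*enomem_hits)++;
            }
            /* Free some slots, then retry the same request. */
            mock_poll(&q, 2, count_done, &done);
            continue;
        }
        submitted++;
    }

    /* Drain whatever is still outstanding. */
    while (q.in_flight > 0) {
        mock_poll(&q, 0, count_done, &done);
    }
    return done;
}
```

Calling `run_workload(10, &hits)` with `hits` initialised to 0 submits ten requests through the four-slot mock queue. With real hardware the poll on ENOMEM may reap nothing, so an application would have to keep polling (or do other work) until a slot frees up rather than assume one poll is enough.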