From: Zhengyu Zhang
Subject: Re: [SPDK] Problem with Blobstore when writing 65MB continuously
Date: Thu, 11 Jan 2018 12:08:36 +0800
Message-ID: <5790fcd3-7650-3404-d1f2-11fe232c1bf3@gmail.com>
In-Reply-To: 1515617617.6063.79.camel@intel.com
To: spdk@lists.01.org

Thanks, guys, for all your help! I now understand the SPDK internals better, and I plan to add status-checking code before submitting further I/O requests.

I misunderstood the claim on http://www.spdk.io/doc/blob.html, which says "The blobstore is ... typically in lieu of a traditional filesystem". When it comes to writing code, using the blobstore API means taking on far more responsibility than using a traditional filesystem.

All the best!
Zhengyu

On 1/11/18 4:53 AM, Walker, Benjamin wrote:
> On Wed, 2018-01-10 at 19:28 +0000, Andrey Kuzmin wrote:
>> On Wed, Jan 10, 2018, 20:17 Walker, Benjamin wrote:
>>> On Wed, 2018-01-10 at 17:00 +0000, Andrey Kuzmin wrote:
>>>> It appears quite logical to start submission with a check for pending
>>>> completions, doesn't it? Or to check for completions if the downstream
>>>> bdev returns a busy status. That would definitely meet app expectations
>>>> whatever the request pool size is.
>>>
>>> We've considered checking for completions inside the submission path if we
>>> would otherwise return ENOMEM. So far, we've decided not to go that
>>> direction, for two reasons.
>>>
>>> 1) Even if we do this, there are still cases where we'll return ENOMEM.
>>> For instance, if there are no completions to reap yet.
>>
>> While theoretically possible, such a case is hard to imagine in
>> practice.
>
> The user has a queue depth of 512 available and is submitting I/O in a tight
> loop.
> The submission path through the blobstore and into the NVMe driver probably
> takes on the order of 500 ns to run. That means you can submit your full queue
> depth's worth in 256 us. On many NAND SSDs that's well within P99 latency
> expectations for 4 KiB I/O, and it gets increasingly likely with larger I/O,
> to the point where it is almost guaranteed to happen with 128 KiB requests.
> The user is free to reduce the available queue depth to save memory as well.
>
>>> 2) This would result in completion callbacks in response to a submit call.
>>> Today, the expectation is that completions are called in response to a
>>> poll call only.
>>
>> Feel free to correct me if I'm wrong, but my recollection is that a
>> completion callback may be called on the submission path in case of error.
>
> I just checked, and for the nvme and bdev libraries an error code will be
> given to the user as the return code of the function. The callback will not
> be called, because the failure is known immediately. For the blobstore
> library it works the opposite way - the functions have no return code and
> instead always call the user callback. I think this is probably a design
> mistake on my part. For these ENOMEM cases, we need to return that to the
> user as a return code. That makes it much easier to handle the situation and
> makes it consistent with the other libraries.
>
>> The case in question is, apparently, a corner one, as the application must
>> check for completions if the bdev returns a busy status. One cannot run an
>> unlimited-rate client atop a rate-limited server without a poll enforced at
>> some point.
>>
>> It might also be helpful to add a parameter to the poll call specifying the
>> minimum number of completions to reap before returning control to the app,
>> to deal with deadlocks like this one.
>
> There already is a parameter that limits the number of completions reaped in
> a single poll call.
> Even if you don't specify a limit, the drivers enforce
> sensible limits by default.
>
>> Regards,
>> Andrey
>>
>>> _______________________________________________
>>> SPDK mailing list
>>> SPDK(a)lists.01.org
>>> https://lists.01.org/mailman/listinfo/spdk