Hi Paul and Zhengyu,

The problem is that the app is not giving the block device a chance to complete any I/O while it submits the 520 back-to-back requests. Blobstore is passive here: it does not do any polling on the block device itself; that is up to the application.

Technically, with a malloc backend there is really no polling required, since a write is just a memcpy. But the bdev layer defers immediate completions as an event, so that bdev API users are guaranteed never to have their completion callback invoked in the context of the bdev I/O submission. So in this test case, malloc ends up behaving like an asynchronous block device backend such as NVMe.

For NVMe, just giving the app time to poll will not guarantee that completions occur fast enough to allow more submissions. The CPU will always be able to submit I/O at a faster rate than the NVMe device can complete it (even for very small I/O). Increasing the number of channel reqs would work, but at some point those will still run out. So it really depends on your application: either increase the channel reqs to the absolute maximum you will ever need, or add -ENOMEM handling.

Note that using more channels will only work if each of those channels is allocated on a separate thread. Multiple requests to allocate a Blobstore channel on the same thread will always return the same channel.

Regards,

-Jim

On 1/10/18, 9:03 AM, "SPDK on behalf of Luse, Paul E" wrote:

Hi Zhang,

I'm not suggesting changing anything right now; thanks for the point about the SSD, though. I'd wait for Ben or someone else to jump in with a bit more info on why this is happening.

Thanks!!
Paul

-----Original Message-----
From: Zhengyu Zhang [mailto:freeman.zhang1992(a)gmail.com]
Sent: Wednesday, January 10, 2018 9:00 AM
To: Luse, Paul E
Cc: Storage Performance Development Kit
Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously

Hi Paul

Thanks for your reply!
On 1/10/18 11:18 PM, Luse, Paul E wrote:
> So what's happening here is that internally, within Blobstore, when
> _spdk_blob_request_submit_op() tries to get a channel->req via
> spdk_bs_batch_open() for a write (near the end), it doesn't have any
> available, so it returns NULL, which results in a callback error of
> -ENOMEM to the hello_blob callback. The default number of channel reqs
> is 512; the hello_blob app doesn't change that, and it uses a single
> channel for submitting the 520 back-to-back write requests that you
> are issuing, so this failure happens right there towards the end.

So you are suggesting that I tune the number of channel reqs, or use multiple channels, if I want to write more?

> Ben, if I crank up the max channel reqs this works OK. I'm thinking
> this shouldn't be needed, and I'm wondering why we aren't placing
> completed channel reqs back on the list fast enough (if that's the
> real problem). Something to do with this being a malloc backend,
> maybe? I would try with NVMe, but my dev system isn't quite ready for
> prime time yet after reblasting it....

I tested on both a malloc backend and a P3700 NVMe SSD. The results are the same.

Thanks!
Zhengyu

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk