* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 20:53 Walker, Benjamin
  0 siblings, 0 replies; 16+ messages in thread
From: Walker, Benjamin @ 2018-01-10 20:53 UTC (permalink / raw)
  To: spdk

On Wed, 2018-01-10 at 19:28 +0000, Andrey Kuzmin wrote:
> On Wed, Jan 10, 2018, 20:17 Walker, Benjamin <benjamin.walker(a)intel.com>
> wrote:
> > On Wed, 2018-01-10 at 17:00 +0000, Andrey Kuzmin wrote:
> > > It appears quite logical to start submission with a check for pending
> > > completions, doesn't it? Or check for completions if downstream bdev
> > > returns busy status. That would definitely meet app expectations
> > > whatever the request pool size is.
> > 
> > We've considered checking for completions inside the submission path if we
> > would otherwise return ENOMEM. So far, we've decided not to go that
> > direction for two reasons.
> > 
> > 1) Even if we do this, there are still cases where we'll return ENOMEM. For
> > instance, if there are no completions to reap yet.
> 
> While theoretically possible, such a case is hard to imagine in practice.

The user has 512 queue depth available and is submitting I/O in a tight loop.
The submission path through the blobstore and into the NVMe driver probably
takes on the order of 500ns to run. That means you can submit your full queue
depth worth in 256us. On many NAND SSDs that's well within P99 latency
expectations for 4KiB I/O, and it gets increasingly likely with larger I/O to
the point where it is almost guaranteed to happen with 128KiB requests. The user
is free to reduce the available queue depth to save memory as well.
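
To put rough numbers on that (a back-of-the-envelope sketch in C; the 500ns
submit cost is the estimate above, and the P99 figure is a made-up stand-in
for a NAND SSD):

#include <stdio.h>

int main(void)
{
	const unsigned queue_depth = 512;     /* default channel reqs */
	const double submit_ns = 500.0;       /* estimated submit-path cost */
	const double device_p99_us = 1000.0;  /* hypothetical 4KiB P99 */

	double exhaust_us = queue_depth * submit_ns / 1000.0;

	printf("request pool exhausted after ~%.0f us\n", exhaust_us);
	if (exhaust_us < device_p99_us) {
		printf("first completion may not be back yet -> ENOMEM\n");
	}
	return 0;
}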

> > 2) This would result in completion callbacks in response to a submit call.
> > Today, the expectations are set that completions are called in response to a
> > poll call only.
> 
> Feel free to correct me if I'm wrong, but my recollection is that a
> completion callback may be called on the submission path in case of error.

I just checked and for the nvme and bdev libraries an error code will be given
to the user as the return code for the function. The callback will not be called
because the failure is known immediately. For the blobstore library it works the
opposite way - the functions have no return code and instead always call the
user callback. I think this is probably a design mistake on my part. For these
ENOMEM cases, we need to return that to the user as a return code. That makes it
much easier to handle the situation and makes it consistent with the other
libraries.
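
To make the contrast concrete, a minimal sketch of the two conventions as
they stand (the calls match my reading of the current headers, worth
verifying; the handling bodies are just comments):

#include <errno.h>
#include "spdk/bdev.h"
#include "spdk/blob.h"

/* bdev/nvme convention: immediate failures come back as the return code,
 * and the completion callback only ever fires for I/O that was queued. */
static int
try_bdev_write(struct spdk_bdev_desc *desc, struct spdk_io_channel *ch,
	       void *buf, uint64_t offset, uint64_t nbytes,
	       spdk_bdev_io_completion_cb cb, void *cb_arg)
{
	int rc = spdk_bdev_write(desc, ch, buf, offset, nbytes, cb, cb_arg);

	if (rc == -ENOMEM) {
		/* no callback will fire for this request; park it and
		 * resubmit after some outstanding I/O completes */
	}
	return rc;
}

/* blobstore convention today: no return code, so even -ENOMEM arrives
 * through the callback's bserrno argument. */
static void
blob_write_done(void *cb_arg, int bserrno)
{
	if (bserrno == -ENOMEM) {
		/* queue the request for a retry from a later completion */
		return;
	}
	/* normal completion or hard-error handling here */
}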

> The case in question is apparently a corner one, as the application must
> check for completions if the bdev returns a busy status. One cannot run an
> unlimited-rate client atop a rate-limited server without a poll enforced at
> some point.
> 
> It might also be helpful to add a parameter to the poll call specifying the
> minimum number of completions to reap before returning control to the app, to
> deal with deadlocks like this one.

There already is a parameter that limits the number of completions reaped in a
single poll call. Even if you don't specify a limit, the drivers enforce
sensible limits by default.
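
For reference, at the NVMe driver level that limit is the second argument of
the completion poll; note it is a maximum, not the minimum being asked for
(32 here is arbitrary):

#include "spdk/nvme.h"

static void
poll_once(struct spdk_nvme_qpair *qpair)
{
	/* reap at most 32 completions in this call; passing 0 instead
	 * lets the driver apply its own default cap */
	int32_t reaped = spdk_nvme_qpair_process_completions(qpair, 32);

	(void)reaped;	/* number of completion callbacks that just ran */
}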

> 
> Regards,
> Andrey
> 
> -- 
> Regards,
> Andrey

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-11  4:08 Zhengyu Zhang
  0 siblings, 0 replies; 16+ messages in thread
From: Zhengyu Zhang @ 2018-01-11  4:08 UTC (permalink / raw)
  To: spdk

Thanks, guys, for all your help! Now I understand the SPDK internals better
and plan to add status-checking code before submitting further IO requests.

I misunderstood the claim on http://www.spdk.io/doc/blob.html, which says
"The blobstore is ... typically in lieu of a traditional filesystem". When
it comes to writing code, using the blobstore API means taking on a lot more
responsibility than a traditional filesystem would require.
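
As a sketch of that status-checking plan (hypothetical scaffolding except for
spdk_bs_io_write_blob() and the hello_context fields from hello_blob.c): park
a range on -ENOMEM and resubmit it from a later successful completion, since
the -ENOMEM callback fires during the submit call itself and retrying
immediately would just fail again.

#include <errno.h>
#include <sys/queue.h>
#include "spdk/blob.h"

struct blob_write_req {
	struct hello_context_t *ctx;	/* context from hello_blob.c */
	uint64_t offset, length;	/* in pages */
	TAILQ_ENTRY(blob_write_req) link;
};

static TAILQ_HEAD(, blob_write_req) g_parked = TAILQ_HEAD_INITIALIZER(g_parked);

static void write_complete(void *arg1, int bserrno);

static void
submit_write(struct blob_write_req *req)
{
	spdk_bs_io_write_blob(req->ctx->blob, req->ctx->channel,
			      req->ctx->write_buff, req->offset, req->length,
			      write_complete, req);
}

static void
write_complete(void *arg1, int bserrno)
{
	struct blob_write_req *req = arg1;

	if (bserrno == -ENOMEM) {
		/* channel req pool was empty: park this range instead of
		 * treating it as fatal */
		TAILQ_INSERT_TAIL(&g_parked, req, link);
		return;
	}
	/* any other nonzero bserrno is a hard error; success continues */

	/* a channel req was just freed, so retry one parked range */
	if (!TAILQ_EMPTY(&g_parked)) {
		struct blob_write_req *next = TAILQ_FIRST(&g_parked);
		TAILQ_REMOVE(&g_parked, next, link);
		submit_write(next);
	}
}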


All the best!
Zhengyu

On 1/11/18 4:53 AM, Walker, Benjamin wrote:
> On Wed, 2018-01-10 at 19:28 +0000, Andrey Kuzmin wrote:
>> On Wed, Jan 10, 2018, 20:17 Walker, Benjamin <benjamin.walker(a)intel.com>
>> wrote:
>>> On Wed, 2018-01-10 at 17:00 +0000, Andrey Kuzmin wrote:
>>>> It appears quite logical to start submission with a check for pending
>>>> completions, doesn't it? Or check for completions if downstream bdev
>>>> returns busy status. That would definitely meet app expectations
>>>> whatever the request pool size is.
>>>
>>> We've considered checking for completions inside the submission path if we
>>> would otherwise return ENOMEM. So far, we've decided not to go that
>>> direction for two reasons.
>>>
>>> 1) Even if we do this, there are still cases where we'll return ENOMEM. For
>>> instance, if there are no completions to reap yet.
>>
>> While theoretically possible, such a case is hard to imagine in practice.
> 
> The user has 512 queue depth available and is submitting I/O in a tight loop.
> The submission path through the blobstore and into the NVMe driver probably
> takes on the order of 500ns to run. That means you can submit your full queue
> depth worth in 256us. On many NAND SSDs that's well within P99 latency
> expectations for 4KiB I/O, and it gets increasingly likely with larger I/O to
> the point where it is almost guaranteed to happen with 128KiB requests. The user
> is free to reduce the available queue depth to save memory as well.
> 
>>> 2) This would result in completion callbacks in response to a submit call.
>>> Today, the expectations are set that completions are called in response to a
>>> poll call only.
>>
>> Feel free to correct me if I'm wrong, but my recollection is that a
>> completion callback may be called on the submission path in case of error.
> 
> I just checked and for the nvme and bdev libraries an error code will be given
> to the user as the return code for the function. The callback will not be called
> because the failure is known immediately. For the blobstore library it works the
> opposite way - the functions have no return code and instead always call the
> user callback. I think this is probably a design mistake on my part. For these
> ENOMEM cases, we need to return that to the user as a return code. That makes it
> much easier to handle the situation and makes it consistent with the other
> libraries.
> 
>> The case in question is apparently a corner one, as the application must
>> check for completions if the bdev returns a busy status. One cannot run an
>> unlimited-rate client atop a rate-limited server without a poll enforced
>> at some point.
>>
>> It might also be helpful to add a parameter to the poll call specifying the
>> minimum number of completions to reap before returning control to the app, to
>> deal with deadlocks like this one.
> 
> There already is a parameter that limits the number of completions reaped in a
> single poll call. Even if you don't specify a limit, the drivers enforce
> sensible limits by default.
> 
>>
>> Regards,
>> Andrey
>>
>> -- 
>> Regards,
>> Andrey
> 

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 19:28 Andrey Kuzmin
  0 siblings, 0 replies; 16+ messages in thread
From: Andrey Kuzmin @ 2018-01-10 19:28 UTC (permalink / raw)
  To: spdk

On Wed, Jan 10, 2018, 20:17 Walker, Benjamin <benjamin.walker(a)intel.com>
wrote:

> On Wed, 2018-01-10 at 17:00 +0000, Andrey Kuzmin wrote:
> >
> >
> > On Wed, Jan 10, 2018, 19:47 Luse, Paul E <paul.e.luse(a)intel.com> wrote:
> > > So I'll take the easy way out and just ask... what's the most efficient
> > > way for an app designed just like this one, for whatever reason, to
> > > essentially relinquish control to the event reactor? Is there a single
> > > call he can slip into the loop to give other pending events, if any, a
> > > chance to run, and if not, continue on submitting?
> >
> > It appears quite logical to start submission with a check for pending
> > completions, doesn't it? Or check for completions if downstream bdev
> > returns busy status. That would definitely meet app expectations whatever
> > the request pool size is.
>
> We've considered checking for completions inside the submission path if we
> would otherwise return ENOMEM. So far, we've decided not to go that
> direction for two reasons.
>
> 1) Even if we do this, there are still cases where we'll return ENOMEM. For
> instance, if there are no completions to reap yet.
>

While theoretically possible, such a case is hard to imagine in practice.

> 2) This would result in completion callbacks in response to a submit call.
> Today, the expectations are set that completions are called in response to
> a poll call only.
>

Feel free to correct me if I'm wrong, but my recollection is that a
completion callback may be called on the submission path in case of error.

The case in question is apparently a corner one, as the application must
check for completions if the bdev returns a busy status. One cannot run an
unlimited-rate client atop a rate-limited server without a poll enforced at
some point.

It might also be helpful to add a parameter to the poll call specifying the
minimum number of completions to reap before returning control to the app,
to deal with deadlocks like this one.
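
As a sketch, such a helper could be layered on today's bounded poll at the
NVMe level (hypothetical wrapper; it busy-waits, so the caller must have at
least 'min' I/Os actually outstanding):

#include "spdk/nvme.h"

static void
reap_at_least(struct spdk_nvme_qpair *qpair, uint32_t min)
{
	uint32_t done = 0;

	while (done < min) {
		/* 0 = let the driver pick its per-call maximum */
		int32_t n = spdk_nvme_qpair_process_completions(qpair, 0);

		if (n > 0) {
			done += (uint32_t)n;
		}
	}
}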

Regards,
Andrey

-- 

Regards,
Andrey


* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 17:17 Walker, Benjamin
  0 siblings, 0 replies; 16+ messages in thread
From: Walker, Benjamin @ 2018-01-10 17:17 UTC (permalink / raw)
  To: spdk

On Wed, 2018-01-10 at 17:00 +0000, Andrey Kuzmin wrote:
> 
> 
> On Wed, Jan 10, 2018, 19:47 Luse, Paul E <paul.e.luse(a)intel.com> wrote:
> > So I'll take the easy way out and just ask... what's the most efficient
> > way for an app designed just like this one, for whatever reason, to
> > essentially relinquish control to the event reactor? Is there a single
> > call he can slip into the loop to give other pending events, if any, a
> > chance to run, and if not, continue on submitting?
> 
> It appears quite logical to start submission with a check for pending
> completions, doesn't it? Or check for completions if downstream bdev returns
> busy status. That would definitely meet app expectations whatever the request
> pool size is.

We've considered checking for completions inside the submission path if we would
otherwise return ENOMEM. So far, we've decided not to go that direction for two
reasons.

1) Even if we do this, there are still cases where we'll return ENOMEM. For
instance, if there are no completions to reap yet.
2) This would result in completion callbacks in response to a submit call.
Today, the expectations are set that completions are called in response to a
poll call only.

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 17:11 Walker, Benjamin
  0 siblings, 0 replies; 16+ messages in thread
From: Walker, Benjamin @ 2018-01-10 17:11 UTC (permalink / raw)
  To: spdk

On Wed, 2018-01-10 at 17:02 +0000, Luse, Paul E wrote:
> OK, yeah, that makes sense. One could experiment with batching some number
> of requests and counting through callbacks to find a good number to submit
> before returning. I was hoping for something like spdk_yield_thread() but
> of course didn't look first. Do you think that might be useful, or has this
> never come up before?

If we could implement spdk_yield_thread() in C, it would be amazing.
Unfortunately, coroutines are impossible to implement in pure C. It makes a
lot of sense to program all of this asynchronous stuff using coroutines, or,
even better, futures and promises, but those things just aren't available to
us. And sticking to pure, standards-compliant C is important for SPDK
because it gets ported to all sorts of architectures and runtime
environments.
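
In practice that means every asynchronous sequence gets written as an
explicit continuation chain; a generic sketch of what a coroutine's suspend
point becomes in plain C:

#include <stddef.h>

struct op_ctx;
typedef void (*step_fn)(struct op_ctx *ctx, int status);

/* the "stack frame" a coroutine would have kept for us */
struct op_ctx {
	step_fn next;	/* explicit resume point */
	int result;
};

static void
step_finish(struct op_ctx *ctx, int status)
{
	ctx->result = status;	/* sequence complete */
}

static void
step_start(struct op_ctx *ctx, int status)
{
	if (status != 0) {
		ctx->result = status;
		return;
	}
	ctx->next = step_finish;
	/* kick off the next async operation here; its completion callback
	 * must call ctx->next(ctx, status) to "resume" the sequence */
}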

> 
> Thx
> Paul
> 
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, Benjamin
> Sent: Wednesday, January 10, 2018 9:58 AM
> To: freeman.zhang1992(a)gmail.com; spdk(a)lists.01.org
> Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously
> 
> On Wed, 2018-01-10 at 16:47 +0000, Luse, Paul E wrote:
> > Damn, I should have known that :( Thanks guys!! 
> > 
> > So I'll take the easy way out and just ask... what's the most efficient
> > way for an app designed just like this one, for whatever reason, to
> > essentially relinquish control to the event reactor? Is there a single
> > call he can slip into the loop to give other pending events, if any, a
> > chance to run, and if not, continue on submitting?
> 
> Return from the function submitting I/O after submitting a certain number and
> resume submitting new I/O in response to completion callbacks.
> 
> > 
> > Thx
> > Paul
> > 
> > -----Original Message-----
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, 
> > Benjamin
> > Sent: Wednesday, January 10, 2018 9:33 AM
> > To: freeman.zhang1992(a)gmail.com; spdk(a)lists.01.org
> > Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously
> > 
> > On Wed, 2018-01-10 at 16:21 +0000, Harris, James R wrote:
> > > Hi Paul and Zhengyu,
> > > 
> > > The problem is that the app is not giving the block device a chance 
> > > to complete any I/O while submitting the 520 back-to-back requests.
> > > Blobstore is passive here – it does not do any polling on the block 
> > > device – that is up to the application.
> > 
> > To additionally clarify - the bdev layer will poll for completions on 
> > your behalf, but it does so on the same thread that you are submitting 
> > I/O from. If you are in a tight loop submitting I/O and never yield 
> > back to the event reactor, the polling won't have a chance to occur. 
> > You can either increase the number of reqs per channel or submit 
> > smaller batches at a time. Basically, do what Jim said below.
> > 
> > > Increasing the number of channel reqs would work – but at some point 
> > > these will still run out.  So it really depends on your application 
> > > – either increase the channel reqs to the absolutely maximum you 
> > > will ever need, or add ENOMEM handling.
> > 

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 17:02 Luse, Paul E
  0 siblings, 0 replies; 16+ messages in thread
From: Luse, Paul E @ 2018-01-10 17:02 UTC (permalink / raw)
  To: spdk

OK, yeah, that makes sense. One could experiment with batching some number of requests and counting through callbacks to find a good number to submit before returning. I was hoping for something like spdk_yield_thread() but of course didn't look first. Do you think that might be useful, or has this never come up before?

Thx
Paul

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, Benjamin
Sent: Wednesday, January 10, 2018 9:58 AM
To: freeman.zhang1992(a)gmail.com; spdk(a)lists.01.org
Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously

On Wed, 2018-01-10 at 16:47 +0000, Luse, Paul E wrote:
> Damn, I should have known that :( Thanks guys!! 
> 
> So I'll take the easy way out and just ask... what's the most efficient
> way for an app designed just like this one, for whatever reason, to
> essentially relinquish control to the event reactor? Is there a single
> call he can slip into the loop to give other pending events, if any, a
> chance to run, and if not, continue on submitting?

Return from the function submitting I/O after submitting a certain number and resume submitting new I/O in response to completion callbacks.

> 
> Thx
> Paul
> 
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, 
> Benjamin
> Sent: Wednesday, January 10, 2018 9:33 AM
> To: freeman.zhang1992(a)gmail.com; spdk(a)lists.01.org
> Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously
> 
> On Wed, 2018-01-10 at 16:21 +0000, Harris, James R wrote:
> > Hi Paul and Zhengyu,
> > 
> > The problem is that the app is not giving the block device a chance 
> > to complete any I/O while submitting the 520 back-to-back requests.
> > Blobstore is passive here – it does not do any polling on the block 
> > device – that is up to the application.
> 
> To additionally clarify - the bdev layer will poll for completions on 
> your behalf, but it does so on the same thread that you are submitting 
> I/O from. If you are in a tight loop submitting I/O and never yield 
> back to the event reactor, the polling won't have a chance to occur. 
> You can either increase the number of reqs per channel or submit 
> smaller batches at a time. Basically, do what Jim said below.
> 
> > Increasing the number of channel reqs would work – but at some point 
> > these will still run out.  So it really depends on your application 
> > – either increase the channel reqs to the absolutely maximum you 
> > will ever need, or add ENOMEM handling.
> 

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 17:00 Andrey Kuzmin
  0 siblings, 0 replies; 16+ messages in thread
From: Andrey Kuzmin @ 2018-01-10 17:00 UTC (permalink / raw)
  To: spdk

On Wed, Jan 10, 2018, 19:47 Luse, Paul E <paul.e.luse(a)intel.com> wrote:

> Damn, I should have known that :( Thanks guys!!
>
> So I'll take the easy way out and just ask... what's the most efficient way
> for an app designed just like this one, for whatever reason, to essentially
> relinquish control to the event reactor? Is there a single call he can slip
> into the loop to give other pending events, if any, a chance to run, and if
> not, continue on submitting?
>

It appears quite logical to start submission with a check for pending
completions, doesn't it? Or check for completions if downstream bdev
returns busy status. That would definitely meet app expectations whatever
the request pool size is.

Regards,
Andrey


> Thx
> Paul
>
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker,
> Benjamin
> Sent: Wednesday, January 10, 2018 9:33 AM
> To: freeman.zhang1992(a)gmail.com; spdk(a)lists.01.org
> Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously
>
> On Wed, 2018-01-10 at 16:21 +0000, Harris, James R wrote:
> > Hi Paul and Zhengyu,
> >
> > The problem is that the app is not giving the block device a chance to
> > complete any I/O while submitting the 520 back-to-back requests.
> > Blobstore is passive here – it does not do any polling on the block
> > device – that is up to the application.
>
> To additionally clarify - the bdev layer will poll for completions on your
> behalf, but it does so on the same thread that you are submitting I/O from.
> If you are in a tight loop submitting I/O and never yield back to the event
> reactor, the polling won't have a chance to occur. You can either increase
> the number of reqs per channel or submit smaller batches at a time.
> Basically, do what Jim said below.
>
> > Increasing the number of channel reqs would work – but at some point
> > these will still run out.  So it really depends on your application –
> > either increase the channel reqs to the absolutely maximum you will
> > ever need, or add ENOMEM handling.
-- 

Regards,
Andrey


* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 16:58 Walker, Benjamin
  0 siblings, 0 replies; 16+ messages in thread
From: Walker, Benjamin @ 2018-01-10 16:58 UTC (permalink / raw)
  To: spdk

On Wed, 2018-01-10 at 16:47 +0000, Luse, Paul E wrote:
> Damn, I should have known that :( Thanks guys!! 
> 
> So I'll take the easy way out and just ask... what's the most efficient way
> for an app designed just like this one, for whatever reason, to essentially
> relinquish control to the event reactor? Is there a single call he can slip
> into the loop to give other pending events, if any, a chance to run, and if
> not, continue on submitting?

Return from the function submitting I/O after submitting a certain number and
resume submitting new I/O in response to completion callbacks.
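
A sketch of that pattern for blobstore writes - windowed submission where
only the spdk_bs_io_write_blob() call and its callback signature are real
API; the rest is hypothetical scaffolding:

#include <stdint.h>
#include "spdk/blob.h"

#define WINDOW 64	/* stay well below the 512 channel reqs */

struct writer {
	struct spdk_blob *blob;
	struct spdk_io_channel *ch;
	uint8_t *buf;
	uint64_t next_page, total_pages, io_pages;
	uint32_t inflight;
};

static void write_done(void *arg, int bserrno);

static void
fill_window(struct writer *w)
{
	/* top the window up, then return so the reactor can poll */
	while (w->inflight < WINDOW && w->next_page < w->total_pages) {
		w->inflight++;
		spdk_bs_io_write_blob(w->blob, w->ch, w->buf, w->next_page,
				      w->io_pages, write_done, w);
		w->next_page += w->io_pages;
	}
}

static void
write_done(void *arg, int bserrno)
{
	struct writer *w = arg;

	w->inflight--;
	if (bserrno != 0) {
		/* with WINDOW well below 512 this should never be -ENOMEM;
		 * treat anything here as a hard error in this sketch */
		return;
	}
	fill_window(w);		/* resume submission from the completion */
}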

> 
> Thx
> Paul
> 
> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, Benjamin
> Sent: Wednesday, January 10, 2018 9:33 AM
> To: freeman.zhang1992(a)gmail.com; spdk(a)lists.01.org
> Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously
> 
> On Wed, 2018-01-10 at 16:21 +0000, Harris, James R wrote:
> > Hi Paul and Zhengyu,
> > 
> > The problem is that the app is not giving the block device a chance to 
> > complete any I/O while submitting the 520 back-to-back requests.  
> > Blobstore is passive here – it does not do any polling on the block 
> > device – that is up to the application.
> 
> To additionally clarify - the bdev layer will poll for completions on your
> behalf, but it does so on the same thread that you are submitting I/O from. If
> you are in a tight loop submitting I/O and never yield back to the event
> reactor, the polling won't have a chance to occur. You can either increase the
> number of reqs per channel or submit smaller batches at a time. Basically, do
> what Jim said below.
> 
> > Increasing the number of channel reqs would work – but at some point 
> > these will still run out.  So it really depends on your application – 
> > either increase the channel reqs to the absolutely maximum you will 
> > ever need, or add ENOMEM handling.
> 

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 16:47 Luse, Paul E
  0 siblings, 0 replies; 16+ messages in thread
From: Luse, Paul E @ 2018-01-10 16:47 UTC (permalink / raw)
  To: spdk

Damn, I should have known that :( Thanks guys!! 

So I'll take the easy way out and just ask... what's the most efficient way for an app designed just like this one, for whatever reason, to essentially relinquish control to the event reactor? Is there a single call he can slip into the loop to give other pending events, if any, a chance to run, and if not, continue on submitting?

Thx
Paul

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, Benjamin
Sent: Wednesday, January 10, 2018 9:33 AM
To: freeman.zhang1992(a)gmail.com; spdk(a)lists.01.org
Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously

On Wed, 2018-01-10 at 16:21 +0000, Harris, James R wrote:
> Hi Paul and Zhengyu,
> 
> The problem is that the app is not giving the block device a chance to 
> complete any I/O while submitting the 520 back-to-back requests.  
> Blobstore is passive here – it does not do any polling on the block 
> device – that is up to the application.

To additionally clarify - the bdev layer will poll for completions on your behalf, but it does so on the same thread that you are submitting I/O from. If you are in a tight loop submitting I/O and never yield back to the event reactor, the polling won't have a chance to occur. You can either increase the number of reqs per channel or submit smaller batches at a time. Basically, do what Jim said below.

> Increasing the number of channel reqs would work – but at some point 
> these will still run out.  So it really depends on your application – 
> either increase the channel reqs to the absolutely maximum you will 
> ever need, or add ENOMEM handling.

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 16:32 Walker, Benjamin
  0 siblings, 0 replies; 16+ messages in thread
From: Walker, Benjamin @ 2018-01-10 16:32 UTC (permalink / raw)
  To: spdk

On Wed, 2018-01-10 at 16:21 +0000, Harris, James R wrote:
> Hi Paul and Zhengyu,
> 
> The problem is that the app is not giving the block device a chance to
> complete any I/O while submitting the 520 back-to-back requests.  Blobstore is
> passive here – it does not do any polling on the block device – that is up to
> the application.

To additionally clarify - the bdev layer will poll for completions on your
behalf, but it does so on the same thread that you are submitting I/O from. If
you are in a tight loop submitting I/O and never yield back to the event
reactor, the polling won't have a chance to occur. You can either increase the
number of reqs per channel or submit smaller batches at a time. Basically, do
what Jim said below.
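
If an app wants to yield explicitly between batches, one way on the event
framework is to hand the continuation back to the reactor as an event
(a sketch; submit_one_batch() is hypothetical):

#include "spdk/env.h"
#include "spdk/event.h"

struct submit_state {
	uint64_t remaining;	/* I/Os still to be submitted */
};

static void
submit_more(void *arg1, void *arg2)
{
	struct submit_state *s = arg1;
	(void)arg2;

	/* submit_one_batch(s) would go here, submitting a bounded batch
	 * and decrementing s->remaining accordingly */

	if (s->remaining > 0) {
		/* reschedule ourselves; returning first lets the reactor
		 * run its pollers and reap completions */
		struct spdk_event *e;

		e = spdk_event_allocate(spdk_env_get_current_core(),
					submit_more, s, NULL);
		spdk_event_call(e);
	}
}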

> Increasing the number of channel reqs would work – but at some point these
> will still run out.  So it really depends on your application – either
> increase the channel reqs to the absolutely maximum you will ever need, or add
> ENOMEM handling.

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 16:21 Harris, James R
  0 siblings, 0 replies; 16+ messages in thread
From: Harris, James R @ 2018-01-10 16:21 UTC (permalink / raw)
  To: spdk

Hi Paul and Zhengyu,

The problem is that the app is not giving the block device a chance to complete any I/O while submitting the 520 back-to-back requests.  Blobstore is passive here – it does not do any polling on the block device – that is up to the application.

Technically, with a malloc backend, there is really no polling required since it’s just a memcpy – but the bdev layer defers immediate completions as an event, so bdev API users are guaranteed that their completion callback will never be invoked in the context of the bdev IO submission.  So in this test case, malloc ends up behaving similarly to an asynchronous block device backend like NVMe.

For NVMe, just giving the app time to poll will not guarantee that completions will occur fast enough to allow more submissions.  The CPU will always be able to submit I/O at a faster rate than the NVMe device can complete them (even for very small I/O).

Increasing the number of channel reqs would work – but at some point these will still run out.  So it really depends on your application – either increase the channel reqs to the absolutely maximum you will ever need, or add ENOMEM handling.

Note that using more channels will only work if those channels are each allocated on a separate thread.  Multiple requests to allocate a Blobstore channel on the same thread will always return the same channel.
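
For the "increase the channel reqs" option, the knob is set at blobstore
init time; a sketch assuming the max_channel_ops field of spdk_bs_opts
(default 512) - worth double-checking against your SPDK version:

#include "spdk/blob.h"
#include "spdk/blob_bdev.h"

static void
init_bs_with_bigger_pool(struct spdk_bs_dev *dev,
			 spdk_bs_op_with_handle_complete cb_fn, void *cb_arg)
{
	struct spdk_bs_opts opts;

	spdk_bs_opts_init(&opts);
	opts.max_channel_ops = 2048;	/* up from the default 512 */
	spdk_bs_init(dev, &opts, cb_fn, cb_arg);
}

And per the note above, extra channels only help across threads, since
spdk_bs_alloc_io_channel() on the same thread hands back the same channel.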

Regards, 

-Jim

On 1/10/18, 9:03 AM, "SPDK on behalf of Luse, Paul E" <spdk-bounces(a)lists.01.org on behalf of paul.e.luse(a)intel.com> wrote:

    Hi Zhang,
    
    I'm not suggesting changing anything right now, thanks for the point on the SSD though - I'd wait for Ben or someone else to jump in with a bit more info on why this is happening.
    
    Thanks!!
    Paul
    
    -----Original Message-----
    From: Zhengyu Zhang [mailto:freeman.zhang1992(a)gmail.com] 
    Sent: Wednesday, January 10, 2018 9:00 AM
    To: Luse, Paul E <paul.e.luse(a)intel.com>
    Cc: Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously
    
    Hi Paul
    
    Thanks for your reply!
    
    
    On 1/10/18 11:18 PM, Luse, Paul E wrote:
    > So what’s happening here is internally within Blobstore when
    > _spdk_blob_request_submit_op() tries to get a channel->req via
    > spdk_bs_batch_open() for a write (near the end) it doesn’t have any
    > available so returns NULL which results in a callback error of -ENOMEM
    > to the hello_blob callback.  The default number of channel reqs is 512 
    > and the hello_blob app doesn’t change that and uses a single channel 
    > for submitting the 520 back to back write requests that you are 
    > issuing and this failure happens right there towards the end.
    > 
    
    So you are suggesting that I tune the number of channel reqs or use multiple channels if I want to write more?
    
    > 
    > Ben, if I crank up the max channel reqs this works OK.  I’m thinking 
    > this shouldn’t be needed and wondering why we aren’t placing completed 
    > channel reqs back on the list fast enough (if that’s the real problem).
    > Something to do with this being a malloc backend maybe?  Would try 
    > w/nvme but my dev system isn’t quite ready for prime time yet after 
    > reblasting it….
    > 
    >
    
    I tested on both the malloc backend and a P3700 NVMe SSD. The results are the same.
    
    Thanks!
    Zhengyu


* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 16:03 Luse, Paul E
  0 siblings, 0 replies; 16+ messages in thread
From: Luse, Paul E @ 2018-01-10 16:03 UTC (permalink / raw)
  To: spdk

Hi Zhang,

I'm not suggesting changing anything right now, thanks for the point on the SSD though - I'd wait for Ben or someone else to jump in with a bit more info on why this is happening.

Thanks!!
Paul

-----Original Message-----
From: Zhengyu Zhang [mailto:freeman.zhang1992(a)gmail.com] 
Sent: Wednesday, January 10, 2018 9:00 AM
To: Luse, Paul E <paul.e.luse(a)intel.com>
Cc: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously

Hi Paul

Thanks for your reply!


On 1/10/18 11:18 PM, Luse, Paul E wrote:
> So what’s happening here is internally within Blobstore when
> _spdk_blob_request_submit_op() tries to get a channel->req via
> spdk_bs_batch_open() for a write (near the end) it doesn’t have any
> available so returns NULL which results in a callback error of -ENOMEM
> to the hello_blob callback.  The default number of channel reqs is 512 
> and the hello_blob app doesn’t change that and uses a single channel 
> for submitting the 520 back to back write requests that you are 
> issuing and this failure happens right there towards the end.
> 

So you are suggesting that I tune the number of channel reqs or use multiple channels if I want to write more?

> 
> Ben, if I crank up the max channel reqs this works OK.  I’m thinking 
> this shouldn’t be needed and wondering why we aren’t placing completed 
> channel reqs back on the list fast enough (if that’s the real problem).
> Something to do with this being a malloc backend maybe?  Would try 
> w/nvme but my dev system isn’t quite ready for prime time yet after 
> reblasting it….
> 
>

I tested on both the malloc backend and a P3700 NVMe SSD. The results are the same.

Thanks!
Zhengyu

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 15:59 Zhengyu Zhang
  0 siblings, 0 replies; 16+ messages in thread
From: Zhengyu Zhang @ 2018-01-10 15:59 UTC (permalink / raw)
  To: spdk

Hi Paul

Thanks for your reply!


On 1/10/18 11:18 PM, Luse, Paul E wrote:
> So what’s happening here is internally within Blobstore when
> _spdk_blob_request_submit_op() tries to get a channel->req via
> spdk_bs_batch_open() for a write (near the end) it doesn’t have any
> available so returns NULL which results in a callback error of -ENOMEM
> to the hello_blob callback.  The default number of channel reqs is 512
> and the hello_blob app doesn’t change that and uses a single channel for
> submitting the 520 back to back write requests that you are issuing and
> this failure happens right there towards the end.
> 

So you are suggesting that I tune the number of channel reqs or use
multiple channels if I want to write more?

> 
> Ben, if I crank up the max channel reqs this works OK.  I’m thinking
> this shouldn’t be needed and wondering why we aren’t placing completed
> channel reqs back on the list fast enough (if that’s the real problem).
> Something to do with this being a malloc backend maybe?  Would try
> w/nvme but my dev system isn’t quite ready for prime time yet after
> reblasting it….
> 
>

I tested on both the malloc backend and a P3700 NVMe SSD. The results are
the same.

Thanks!
Zhengyu

* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 15:18 Luse, Paul E
  0 siblings, 0 replies; 16+ messages in thread
From: Luse, Paul E @ 2018-01-10 15:18 UTC (permalink / raw)
  To: spdk

So what’s happening here is internally within Blobstore: when _spdk_blob_request_submit_op() tries to get a channel->req via spdk_bs_batch_open() for a write (near the end), it doesn’t have any available, so it returns NULL, which results in a callback error of -ENOMEM to the hello_blob callback.  The default number of channel reqs is 512; the hello_blob app doesn’t change that and uses a single channel for submitting the 520 back-to-back write requests that you are issuing, and this failure happens right there towards the end.

Ben, if I crank up the max channel reqs this works OK.  I’m thinking this shouldn’t be needed and wondering why we aren’t placing completed channel reqs back on the list fast enough (if that’s the real problem). Something to do with this being a malloc backend maybe?  Would try w/nvme but my dev system isn’t quite ready for prime time yet after reblasting it….
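
For readers following along, a heavily simplified picture of the path Paul
is describing (not the actual SPDK source; names shortened):

#include <errno.h>
#include <stddef.h>

/* each channel owns a fixed pool of requests, sized by the channel reqs
 * setting (512 by default) */
struct bs_request {
	struct bs_request *next;
};

struct bs_channel {
	struct bs_request *free_reqs;
};

typedef void (*op_complete_fn)(void *cb_arg, int bserrno);

static struct bs_request *
batch_open(struct bs_channel *ch)
{
	struct bs_request *req = ch->free_reqs;

	if (req != NULL) {
		ch->free_reqs = req->next;
	}
	return req;	/* NULL once all reqs are in flight */
}

static void
request_submit_op(struct bs_channel *ch, op_complete_fn cb_fn, void *cb_arg)
{
	struct bs_request *req = batch_open(ch);

	if (req == NULL) {
		cb_fn(cb_arg, -ENOMEM);	/* what hello_blob ends up seeing */
		return;
	}
	/* build and submit the I/O; req goes back onto free_reqs when the
	 * backing device completes it */
}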

Thx
Paul




From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Luse, Paul E
Sent: Wednesday, January 10, 2018 7:04 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously

Hi Zhang,

    Not sure off the top of my head, but I’m happy to take a quick look; will let ya know what I see on this end…

-Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Zhengyu Zhang
Sent: Tuesday, January 9, 2018 8:16 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Problem with Blobstore when write 65MB continously

Hi list!

I want to write an app with Blobstore in SPDK. I have been playing with examples/blob/hello_world/hello_blob.c for a while. I modified hello_blob to make it write more pages than its original one page:

for (i = 0; i < SOMEVAL; i++) {
	spdk_bs_io_write_blob(hello_context->blob, hello_context->channel,
			      hello_context->write_buff, offset, 32,
			      write_complete, hello_context);
	offset += 32;
}

I meant to write the blob SOMEVAL times, 32 pages per write. When the total amount of data written is below 64M (SOMEVAL <= 512), it works fine. However, when the total size is over 64M, e.g. 65M, it breaks:

hello_blob.c: 388:blob_create_complete: *NOTICE*: new blob id 4294967296
hello_blob.c: 327:open_complete: *NOTICE*: entry
hello_blob.c: 338:open_complete: *NOTICE*: blobstore has FREE clusters of 380063
hello_blob.c: 358:open_complete: *NOTICE*: resized blob now has USED clusters of 65
hello_blob.c: 295:sync_complete: *NOTICE*: entry
hello_blob.c: 253:blob_write: *NOTICE*: entry
hello_blob.c: 232:write_complete: *NOTICE*: entry
hello_blob.c: 115:unload_bs: *ERROR*: Error in write completion (err -12)
blobstore.c:2563:spdk_bs_unload: *ERROR*: Blobstore still has open blobs
hello_blob.c:  99:unload_complete: *NOTICE*: entry
hello_blob.c: 101:unload_complete: *ERROR*: Error -16 unloading the bobstore

I have no idea what is going on ...  can anyone help?


Thanks
Zhengyu


* Re: [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10 14:03 Luse, Paul E
  0 siblings, 0 replies; 16+ messages in thread
From: Luse, Paul E @ 2018-01-10 14:03 UTC (permalink / raw)
  To: spdk

Hi Zhang,

Not sure off the top of my head, but I’m happy to take a quick look; will let ya know what I see on this end…

-Paul

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Zhengyu Zhang
Sent: Tuesday, January 9, 2018 8:16 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Problem with Blobstore when write 65MB continously

Hi list!

I want to write an app with Blobstore in SPDK. I have been playing with examples/blob/hello_world/hello_blob.c for a while. I modified hello_blob to make it write more pages than its original one page:

for (i = 0; i < SOMEVAL; i++) {
	spdk_bs_io_write_blob(hello_context->blob, hello_context->channel,
			      hello_context->write_buff, offset, 32,
			      write_complete, hello_context);
	offset += 32;
}

I meant to write the blob SOMEVAL times, 32 pages per write. When the total amount of data written is below 64M (SOMEVAL <= 512), it works fine. However, when the total size is over 64M, e.g. 65M, it breaks:

hello_blob.c: 388:blob_create_complete: *NOTICE*: new blob id 4294967296
hello_blob.c: 327:open_complete: *NOTICE*: entry
hello_blob.c: 338:open_complete: *NOTICE*: blobstore has FREE clusters of 380063
hello_blob.c: 358:open_complete: *NOTICE*: resized blob now has USED clusters of 65
hello_blob.c: 295:sync_complete: *NOTICE*: entry
hello_blob.c: 253:blob_write: *NOTICE*: entry
hello_blob.c: 232:write_complete: *NOTICE*: entry
hello_blob.c: 115:unload_bs: *ERROR*: Error in write completion (err -12)
blobstore.c:2563:spdk_bs_unload: *ERROR*: Blobstore still has open blobs
hello_blob.c:  99:unload_complete: *NOTICE*: entry
hello_blob.c: 101:unload_complete: *ERROR*: Error -16 unloading the bobstore

I have no idea what is going on ...  can anyone help?


Thanks
Zhengyu


* [SPDK] Problem with Blobstore when write 65MB continously
@ 2018-01-10  3:15 Zhengyu Zhang
  0 siblings, 0 replies; 16+ messages in thread
From: Zhengyu Zhang @ 2018-01-10  3:15 UTC (permalink / raw)
  To: spdk

Hi list!

I want to write an app with Blobstore in SPDK. I have been playing with
examples/blob/hello_world/hello_blob.c for a while. I modified hello_blob
to make it write more pages than its original one page:

for (i = 0; i < SOMEVAL; i++) {
	spdk_bs_io_write_blob(hello_context->blob, hello_context->channel,
			      hello_context->write_buff, offset, 32,
			      write_complete, hello_context);
	offset += 32;
}

I meant to write the blob SOMEVAL times, 32 pages per write. When the
total amount of data written is below 64M (SOMEVAL <= 512), it works
fine. However, when the total size is over 64M, e.g. 65M, it breaks:

hello_blob.c: 388:blob_create_complete: *NOTICE*: new blob id 4294967296
hello_blob.c: 327:open_complete: *NOTICE*: entry
hello_blob.c: 338:open_complete: *NOTICE*: blobstore has FREE clusters of
380063
hello_blob.c: 358:open_complete: *NOTICE*: resized blob now has USED
clusters of 65
hello_blob.c: 295:sync_complete: *NOTICE*: entry
hello_blob.c: 253:blob_write: *NOTICE*: entry

hello_blob.c: 232:write_complete: *NOTICE*: entry
hello_blob.c: 115:unload_bs: *ERROR*: Error in write completion (err -12)
blobstore.c:2563:spdk_bs_unload: *ERROR*: Blobstore still has open blobs
hello_blob.c:  99:unload_complete: *NOTICE*: entry
hello_blob.c: 101:unload_complete: *ERROR*: Error -16 unloading the bobstore


I have no idea what is going on ...  can anyone help?


Thanks
Zhengyu

[-- Attachment #3: hello_blob.diff --]

diff --git a/examples/blob/hello_world/hello_blob.c b/examples/blob/hello_world/hello_blob.c
index f2cf2f2..0f3cebb 100644
--- a/examples/blob/hello_world/hello_blob.c
+++ b/examples/blob/hello_world/hello_blob.c
@@ -40,6 +40,27 @@
 #include "spdk/blob.h"
 #include "spdk/log.h"
 
+#include <time.h>
+
+#define TOTAL_MB    (65)
+#define IO_DEPTH_IN_BYTE    (131072ULL)
+#define IO_DEPTH_IN_PAGE    (32ULL)
+
+#define ONE (1ULL)
+#define THOUSAND    (1000ULL * ONE)
+#define MILLION (THOUSAND * THOUSAND)
+
+static inline double get_current_time(void)
+{
+    struct timespec ts;
+    clock_gettime(CLOCK_MONOTONIC, &ts);
+    return (double)ts.tv_sec * MILLION + (double)ts.tv_nsec / THOUSAND;
+}
+
+
+static uint64_t count;
+static double time_start, time_stop;
+
 /*
  * We'll use this struct to gather housekeeping hello_context to pass between
  * our events and callbacks.
@@ -51,7 +72,8 @@ struct hello_context_t {
 	struct spdk_io_channel *channel;
 	uint8_t *read_buff;
 	uint8_t *write_buff;
-	uint64_t page_size;
+	uint64_t buf_size;
+	// uint64_t page_size;
 	int rc;
 };
 
@@ -149,26 +171,34 @@ read_complete(void *arg1, int bserrno)
 {
 	struct hello_context_t *hello_context = arg1;
 	int match_res = -1;
+	static uint64_t rd_cnt;
 
-	SPDK_NOTICELOG("entry\n");
+	// SPDK_NOTICELOG("entry\n");
 	if (bserrno) {
 		unload_bs(hello_context, "Error in read completion",
 			  bserrno);
 		return;
+	} else {
+	    rd_cnt += 1;
+	    if (rd_cnt >= count) {
+		time_stop = get_current_time();
+		printf("read %ld kb from SSD in %lf sec.\n", rd_cnt * 128, (time_stop - time_start)/MILLION);
+		/* Now let's close it and delete the blob in the callback. */
+		spdk_blob_close(hello_context->blob, delete_blob, hello_context);
+	    }
+	
 	}
-
+#if 0
 	/* Now let's make sure things match. */
 	match_res = memcmp(hello_context->write_buff, hello_context->read_buff,
-			   hello_context->page_size);
+			   hello_context->buf_size);
 	if (match_res) {
 		unload_bs(hello_context, "Error in data compare", -1);
 		return;
 	} else {
 		SPDK_NOTICELOG("read SUCCESS and data matches!\n");
 	}
-
-	/* Now let's close it and delete the blob in the callback. */
-	spdk_blob_close(hello_context->blob, delete_blob, hello_context);
+#endif
 }
 
 /*
@@ -177,20 +207,17 @@ read_complete(void *arg1, int bserrno)
 static void
 read_blob(struct hello_context_t *hello_context)
 {
-	SPDK_NOTICELOG("entry\n");
+//	SPDK_NOTICELOG("entry\n");
+	uint64_t i;
 
-	hello_context->read_buff = spdk_dma_malloc(hello_context->page_size,
-				   0x1000, NULL);
-	if (hello_context->read_buff == NULL) {
-		unload_bs(hello_context, "Error in memory allocation",
-			  -ENOMEM);
-		return;
-	}
 
-	/* Issue the read and compare the results in the callback. */
+    uint64_t offset = 0;
+    for (i=0; i<count; i++) {
 	spdk_bs_io_read_blob(hello_context->blob, hello_context->channel,
-			     hello_context->read_buff, 0, 1, read_complete,
+			     hello_context->read_buff, offset, IO_DEPTH_IN_PAGE, read_complete,
 			     hello_context);
+        offset += IO_DEPTH_IN_PAGE;
+    }
 }
 
 /*
@@ -200,16 +227,21 @@ static void
 write_complete(void *arg1, int bserrno)
 {
 	struct hello_context_t *hello_context = arg1;
+	static uint64_t wr_cnt;
 
 	SPDK_NOTICELOG("entry\n");
 	if (bserrno) {
 		unload_bs(hello_context, "Error in write completion",
 			  bserrno);
 		return;
+	} else {
+	    wr_cnt += 1;
+	    if (wr_cnt >= count) {
+		printf("write %ld kb into SSD\n Begin reading ...\n", wr_cnt * 128);
+		time_start = get_current_time();
+		read_blob(hello_context);
+	    }
 	}
-
-	/* Now let's read back what we wrote and make sure it matches. */
-	read_blob(hello_context);
 }
 
 /*
@@ -219,19 +251,20 @@ static void
 blob_write(struct hello_context_t *hello_context)
 {
 	SPDK_NOTICELOG("entry\n");
+	uint64_t i;
 
 	/*
 	 * Buffers for data transfer need to be allocated via SPDK. We will
 	 * tranfer 1 page of 4K aligned data at offset 0 in the blob.
 	 */
-	hello_context->write_buff = spdk_dma_malloc(hello_context->page_size,
+	hello_context->write_buff = spdk_dma_malloc(hello_context->buf_size,
 				    0x1000, NULL);
 	if (hello_context->write_buff == NULL) {
 		unload_bs(hello_context, "Error in allocating memory",
 			  -ENOMEM);
 		return;
 	}
-	memset(hello_context->write_buff, 0x5a, hello_context->page_size);
+	memset(hello_context->write_buff, 0x5a, hello_context->buf_size);
 
 	/* Now we have to allocate a channel. */
 	hello_context->channel = spdk_bs_alloc_io_channel(hello_context->bs);
@@ -241,10 +274,15 @@ blob_write(struct hello_context_t *hello_context)
 		return;
 	}
 
-	/* Let's perform the write, 1 page at offset 0. */
-	spdk_bs_io_write_blob(hello_context->blob, hello_context->channel,
-			      hello_context->write_buff,
-			      0, 1, write_complete, hello_context);
+	uint64_t offset = 0;
+	printf("count = %ld\n", count);
+	for (i=0; i<count; i++) {
+		printf("writing %ld pages at offset %ld\n", IO_DEPTH_IN_PAGE, offset);
+		spdk_bs_io_write_blob(hello_context->blob, hello_context->channel,
+			hello_context->write_buff,
+			offset, IO_DEPTH_IN_PAGE, write_complete, hello_context);
+		offset += IO_DEPTH_IN_PAGE;
+	}
 }
 
 /*
@@ -261,6 +299,16 @@ sync_complete(void *arg1, int bserrno)
 			  bserrno);
 		return;
 	}
+	count  = TOTAL_MB * 1024  / (IO_DEPTH_IN_BYTE / 1024);
+
+
+	hello_context->read_buff = spdk_dma_malloc(hello_context->buf_size,
+				   0x1000, NULL);
+	if (hello_context->read_buff == NULL) {
+		unload_bs(hello_context, "Error in memory allocation",
+			  -ENOMEM);
+		return;
+	}
 
 	/* Blob has been created & sized & MD sync'd, let's write to it. */
 	blob_write(hello_context);
@@ -297,7 +345,9 @@ open_complete(void *cb_arg, struct spdk_blob *blob, int bserrno)
 	 * there'd usually be many blobs of various sizes. The resize
 	 * unit is a cluster.
 	 */
-	rc = spdk_blob_resize(hello_context->blob, free);
+	printf("cluster size = %ld\n", spdk_bs_get_cluster_size(hello_context->bs));
+	rc = spdk_blob_resize(hello_context->blob, TOTAL_MB);
+	//rc = spdk_blob_resize(hello_context->blob, free);
 	if (rc) {
 		unload_bs(hello_context, "Error in blob resize",
 			  bserrno);
@@ -375,7 +425,8 @@ bs_init_complete(void *cb_arg, struct spdk_blob_store *bs,
 	 * We will use the page size in allocating buffers, etc., later
 	 * so we'll just save it in out context buffer here.
 	 */
-	hello_context->page_size = spdk_bs_get_page_size(hello_context->bs);
+	// hello_context->page_size = spdk_bs_get_page_size(hello_context->bs);
+	hello_context->buf_size = IO_DEPTH_IN_BYTE;
 
 	/*
 	 * The blostore has been initialized, let's create a blob.
@@ -405,6 +456,7 @@ hello_start(void *arg1, void *arg2)
 	 * get it via its name.
 	 */
 	bdev = spdk_bdev_get_by_name("Malloc0");
+	//bdev = spdk_bdev_get_by_name("Nvme0n1");
 	if (bdev == NULL) {
 		SPDK_ERRLOG("Could not find a bdev\n");
 		spdk_app_stop(-1);

[-- Attachment #4: hello_blob.c --]

/*-
 *   BSD LICENSE
 *
 *   Copyright (c) Intel Corporation.
 *   All rights reserved.
 *
 *   Redistribution and use in source and binary forms, with or without
 *   modification, are permitted provided that the following conditions
 *   are met:
 *
 *     * Redistributions of source code must retain the above copyright
 *       notice, this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in
 *       the documentation and/or other materials provided with the
 *       distribution.
 *     * Neither the name of Intel Corporation nor the names of its
 *       contributors may be used to endorse or promote products derived
 *       from this software without specific prior written permission.
 *
 *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include "spdk/stdinc.h"

#include "spdk/bdev.h"
#include "spdk/env.h"
#include "spdk/event.h"
#include "spdk/blob_bdev.h"
#include "spdk/blob.h"
#include "spdk/log.h"

#include <time.h>

#define TOTAL_MB    (65)
#define IO_DEPTH_IN_BYTE    (131072ULL)
#define IO_DEPTH_IN_PAGE    (32ULL)

#define ONE (1ULL)
#define THOUSAND    (1000ULL * ONE)
#define MILLION (THOUSAND * THOUSAND)

static inline double get_current_time(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * MILLION + (double)ts.tv_nsec / THOUSAND;
}


static uint64_t count;
static double time_start, time_stop;

/*
 * We'll use this struct to gather housekeeping hello_context to pass between
 * our events and callbacks.
 */
struct hello_context_t {
	struct spdk_blob_store *bs;
	struct spdk_blob *blob;
	spdk_blob_id blobid;
	struct spdk_io_channel *channel;
	uint8_t *read_buff;
	uint8_t *write_buff;
	uint64_t buf_size;
	// uint64_t page_size;
	int rc;
};

/*
 * Free up memory that we allocated.
 */
static void
hello_cleanup(struct hello_context_t *hello_context)
{
	spdk_dma_free(hello_context->read_buff);
	spdk_dma_free(hello_context->write_buff);
	free(hello_context);
}

/*
 * Callback routine for the blobstore unload.
 */
static void
unload_complete(void *cb_arg, int bserrno)
{
	struct hello_context_t *hello_context = cb_arg;

	SPDK_NOTICELOG("entry\n");
	if (bserrno) {
		SPDK_ERRLOG("Error %d unloading the bobstore\n", bserrno);
		hello_context->rc = bserrno;
	}

	spdk_app_stop(hello_context->rc);
}

/*
 * Unload the blobstore, cleaning up as needed.
 */
static void
unload_bs(struct hello_context_t *hello_context, char *msg, int bserrno)
{
	if (bserrno) {
		SPDK_ERRLOG("%s (err %d)\n", msg, bserrno);
		hello_context->rc = bserrno;
	}
	if (hello_context->bs) {
		if (hello_context->channel) {
			spdk_bs_free_io_channel(hello_context->channel);
		}
		spdk_bs_unload(hello_context->bs, unload_complete, hello_context);
	} else {
		spdk_app_stop(bserrno);
	}
}

/*
 * Callback routine for the deletion of a blob.
 */
static void
delete_complete(void *arg1, int bserrno)
{
	struct hello_context_t *hello_context = arg1;

	SPDK_NOTICELOG("entry\n");
	if (bserrno) {
		unload_bs(hello_context, "Error in delete completion",
			  bserrno);
		return;
	}

	/* We're all done, we can unload the blobstore. */
	unload_bs(hello_context, "", 0);
}

/*
 * Function for deleting a blob.
 */
static void
delete_blob(void *arg1, int bserrno)
{
	struct hello_context_t *hello_context = arg1;

	SPDK_NOTICELOG("entry\n");
	if (bserrno) {
		unload_bs(hello_context, "Error in close completion",
			  bserrno);
		return;
	}

	spdk_bs_delete_blob(hello_context->bs, hello_context->blobid,
			    delete_complete, hello_context);
}

/*
 * Callback function for reading a blob.
 */
static void
read_complete(void *arg1, int bserrno)
{
	struct hello_context_t *hello_context = arg1;
	int match_res = -1;
	static uint64_t rd_cnt;

	// SPDK_NOTICELOG("entry\n");
	if (bserrno) {
		unload_bs(hello_context, "Error in read completion",
			  bserrno);
		return;
	} else {
	    rd_cnt += 1;
	    if (rd_cnt >= count) {
		time_stop = get_current_time();
		printf("read %ld kb from SSD in %lf sec.\n", rd_cnt * 128, (time_stop - time_start)/MILLION);
		/* Now let's close it and delete the blob in the callback. */
		spdk_blob_close(hello_context->blob, delete_blob, hello_context);
	    }
	
	}
#if 0
	/* Now let's make sure things match. */
	match_res = memcmp(hello_context->write_buff, hello_context->read_buff,
			   hello_context->buf_size);
	if (match_res) {
		unload_bs(hello_context, "Error in data compare", -1);
		return;
	} else {
		SPDK_NOTICELOG("read SUCCESS and data matches!\n");
	}
#endif
}

/*
 * Function for reading a blob.
 */
static void
read_blob(struct hello_context_t *hello_context)
{
//	SPDK_NOTICELOG("entry\n");
	uint64_t i;


    uint64_t offset = 0;
    for (i=0; i<count; i++) {
	spdk_bs_io_read_blob(hello_context->blob, hello_context->channel,
			     hello_context->read_buff, offset, IO_DEPTH_IN_PAGE, read_complete,
			     hello_context);
        offset += IO_DEPTH_IN_PAGE;
    }
}

/*
 * Callback function for writing a blob.
 */
static void
write_complete(void *arg1, int bserrno)
{
	struct hello_context_t *hello_context = arg1;
	static uint64_t wr_cnt;

	SPDK_NOTICELOG("entry\n");
	if (bserrno) {
		unload_bs(hello_context, "Error in write completion",
			  bserrno);
		return;
	} else {
	    wr_cnt += 1;
	    if (wr_cnt >= count) {
		printf("write %ld kb into SSD\n Begin reading ...\n", wr_cnt * 128);
		time_start = get_current_time();
		read_blob(hello_context);
	    }
	}
}

/*
 * Function for writing to a blob.
 */
static void
blob_write(struct hello_context_t *hello_context)
{
	SPDK_NOTICELOG("entry\n");
	uint64_t i;

	/*
	 * Buffers for data transfer need to be allocated via SPDK. We will
	 * tranfer 1 page of 4K aligned data at offset 0 in the blob.
	 */
	hello_context->write_buff = spdk_dma_malloc(hello_context->buf_size,
				    0x1000, NULL);
	if (hello_context->write_buff == NULL) {
		unload_bs(hello_context, "Error in allocating memory",
			  -ENOMEM);
		return;
	}
	memset(hello_context->write_buff, 0x5a, hello_context->buf_size);

	/* Now we have to allocate a channel. */
	hello_context->channel = spdk_bs_alloc_io_channel(hello_context->bs);
	if (hello_context->channel == NULL) {
		unload_bs(hello_context, "Error in allocating channel",
			  -ENOMEM);
		return;
	}

	uint64_t offset = 0;

	printf("count = %ld\n", count);
	for (i = 0; i < count; i++) {
		printf("writing %ld pages at offset %" PRIu64 "\n",
		       IO_DEPTH_IN_PAGE, offset);
		spdk_bs_io_write_blob(hello_context->blob, hello_context->channel,
				      hello_context->write_buff,
				      offset, IO_DEPTH_IN_PAGE, write_complete, hello_context);
		offset += IO_DEPTH_IN_PAGE;
	}
}
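
/*
 * Note on queue depth: blob_write() above fires all `count` writes in one
 * tight loop. If that exceeds the channel's request pool, blobstore has no
 * return code on this path and instead reports -ENOMEM through the
 * completion callback. One way to avoid that is to bound the number of
 * in-flight writes and submit the next one from each completion. A minimal
 * sketch of that pattern follows; MAX_INFLIGHT and the next_write_offset
 * context field are hypothetical additions, not part of this example as
 * posted.
 */
#if 0
#define MAX_INFLIGHT 32	/* assumed to be below the channel's request pool size */

static void
write_next(struct hello_context_t *hello_context)
{
	if (hello_context->next_write_offset >= count * IO_DEPTH_IN_PAGE) {
		return;	/* all writes have been submitted */
	}
	spdk_bs_io_write_blob(hello_context->blob, hello_context->channel,
			      hello_context->write_buff,
			      hello_context->next_write_offset, IO_DEPTH_IN_PAGE,
			      write_complete, hello_context);
	hello_context->next_write_offset += IO_DEPTH_IN_PAGE;
}

/*
 * blob_write() would then prime the pipeline by calling write_next() at
 * most MAX_INFLIGHT times, and write_complete() would call write_next()
 * once per completion instead of relying on the submit loop above.
 */
#endif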

/*
 * Callback function for sync'ing metadata.
 */
static void
sync_complete(void *arg1, int bserrno)
{
	struct hello_context_t *hello_context = arg1;

	SPDK_NOTICELOG("entry\n");
	if (bserrno) {
		unload_bs(hello_context, "Error in sync callback",
			  bserrno);
		return;
	}
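
	/*
	 * Number of I/Os needed to cover the whole blob. For example, with
	 * TOTAL_MB = 65 and 128 KiB per I/O (as the progress messages in the
	 * completion callbacks suggest): count = 65 * 1024 / 128 = 520.
	 */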
	count = TOTAL_MB * 1024 / (IO_DEPTH_IN_BYTE / 1024);

	hello_context->read_buff = spdk_dma_malloc(hello_context->buf_size,
				   0x1000, NULL);
	if (hello_context->read_buff == NULL) {
		unload_bs(hello_context, "Error in memory allocation",
			  -ENOMEM);
		return;
	}

	/* Blob has been created & sized & MD sync'd, let's write to it. */
	blob_write(hello_context);
}

/*
 * Callback function for opening a blob.
 */
static void
open_complete(void *cb_arg, struct spdk_blob *blob, int bserrno)
{
	struct hello_context_t *hello_context = cb_arg;
	uint64_t free = 0;
	uint64_t total = 0;
	int rc = 0;

	SPDK_NOTICELOG("entry\n");
	if (bserrno) {
		unload_bs(hello_context, "Error in open completion",
			  bserrno);
		return;
	}

	hello_context->blob = blob;
	free = spdk_bs_free_cluster_count(hello_context->bs);
	SPDK_NOTICELOG("blobstore has FREE clusters of %" PRIu64 "\n",
		       free);

	/*
	 * Before we can use our new blob, we have to resize it as the
	 * initial size is 0. The resize unit is a cluster, so we ask for
	 * TOTAL_MB clusters here; with the default 1 MiB cluster size that
	 * yields a TOTAL_MB megabyte blob. The commented-out line shows
	 * how to claim the blobstore's full free space instead.
	 */
	printf("cluster size = %ld\n", spdk_bs_get_cluster_size(hello_context->bs));
	rc = spdk_blob_resize(hello_context->blob, TOTAL_MB);
	/* rc = spdk_blob_resize(hello_context->blob, free); */
	if (rc) {
		unload_bs(hello_context, "Error in blob resize", rc);
		return;
	}

	total = spdk_blob_get_num_clusters(hello_context->blob);
	SPDK_NOTICELOG("resized blob now has USED clusters of %" PRIu64 "\n",
		       total);

	/*
	 * Metadata is stored in volatile memory for performance
	 * reasons and therefore needs to be synchronized with
	 * non-volatile storage to make it persistent. This can be
	 * done manually, as shown here, or if not it will be done
	 * automatically when the blob is closed. It is always a
	 * good idea to sync after making metadata changes unless
	 * it has an unacceptable impact on application performance.
	 */
	spdk_blob_sync_md(hello_context->blob, sync_complete, hello_context);
}

/*
 * Callback function for creating a blob.
 */
static void
blob_create_complete(void *arg1, spdk_blob_id blobid, int bserrno)
{
	struct hello_context_t *hello_context = arg1;

	SPDK_NOTICELOG("entry\n");
	if (bserrno) {
		unload_bs(hello_context, "Error in blob create callback",
			  bserrno);
		return;
	}

	hello_context->blobid = blobid;
	SPDK_NOTICELOG("new blob id %" PRIu64 "\n", hello_context->blobid);

	/* We have to open the blob before we can do things like resize. */
	spdk_bs_open_blob(hello_context->bs, hello_context->blobid,
			  open_complete, hello_context);
}

/*
 * Function for creating a blob.
 */
static void
create_blob(struct hello_context_t *hello_context)
{
	SPDK_NOTICELOG("entry\n");
	spdk_bs_create_blob(hello_context->bs, blob_create_complete, hello_context);
}

/*
 * Callback function for initializing the blobstore.
 */
static void
bs_init_complete(void *cb_arg, struct spdk_blob_store *bs,
		 int bserrno)
{
	struct hello_context_t *hello_context = cb_arg;

	SPDK_NOTICELOG("entry\n");
	if (bserrno) {
		unload_bs(hello_context, "Error init'ing the blobstore",
			  bserrno);
		return;
	}

	hello_context->bs = bs;
	SPDK_NOTICELOG("blobstore: %p\n", hello_context->bs);
	/*
	 * This test sizes its transfer buffers by IO_DEPTH_IN_BYTE rather
	 * than the blobstore page size, so that is what we save in our
	 * context structure here.
	 */
	/* hello_context->page_size = spdk_bs_get_page_size(hello_context->bs); */
	hello_context->buf_size = IO_DEPTH_IN_BYTE;

	/*
	 * The blobstore has been initialized, let's create a blob.
	 * Note that we could allocate an SPDK event and use
	 * spdk_event_call() to schedule it if we wanted to keep
	 * our events as limited as possible wrt the amount of
	 * work that they do; see the sketch after this function.
	 */
	create_blob(hello_context);
}
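
/*
 * A minimal sketch of the spdk_event_call() alternative mentioned above,
 * assuming this SPDK release's event API (spdk_event_allocate(),
 * spdk_event_call(), spdk_env_get_current_core()); create_blob_event() is
 * a hypothetical wrapper matching the spdk_event_fn signature.
 */
#if 0
static void
create_blob_event(void *arg1, void *arg2)
{
	create_blob(arg1);
}

/* In bs_init_complete(), instead of calling create_blob() directly: */
struct spdk_event *event = spdk_event_allocate(spdk_env_get_current_core(),
					       create_blob_event,
					       hello_context, NULL);
spdk_event_call(event);
#endif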

/*
 * Our initial event that kicks off everything from main().
 */
static void
hello_start(void *arg1, void *arg2)
{
	struct hello_context_t *hello_context = arg1;
	struct spdk_bdev *bdev = NULL;
	struct spdk_bs_dev *bs_dev = NULL;

	SPDK_NOTICELOG("entry\n");
	/*
	 * Get the bdev. For this example it is our malloc (RAM)
	 * disk configured via hello_blob.conf that was passed
	 * in when we started the SPDK app framework so we can
	 * get it via its name.
	 */
	bdev = spdk_bdev_get_by_name("Malloc0");
	/* bdev = spdk_bdev_get_by_name("Nvme0n1"); */
	if (bdev == NULL) {
		SPDK_ERRLOG("Could not find a bdev\n");
		spdk_app_stop(-1);
		return;
	}

	/*
	 * spdk_bs_init() requires us to fill out the structure
	 * spdk_bs_dev with a set of callbacks. These callbacks
	 * implement read, write, and other operations on the
	 * underlying disks. As a convenience, a utility function
	 * is provided that creates an spdk_bs_dev that implements
	 * all of the callbacks by forwarding the I/O to the
	 * SPDK bdev layer. Other helper functions are also
	 * available in the blob lib in blob_bdev.c that simply
	 * make it easier to layer blobstore on top of a bdev.
	 * However blobstore can be more tightly integrated into
	 * any lower layer, such as NVMe for example.
	 */
	bs_dev = spdk_bdev_create_bs_dev(bdev, NULL, NULL);
	if (bs_dev == NULL) {
		SPDK_ERRLOG("Could not create blob bdev!!\n");
		spdk_app_stop(-1);
		return;
	}

	spdk_bs_init(bs_dev, NULL, bs_init_complete, hello_context);
}

int
main(int argc, char **argv)
{
	struct spdk_app_opts opts = {};
	int rc = 0;
	struct hello_context_t *hello_context = NULL;

	SPDK_NOTICELOG("entry\n");

	/* Set default values in opts structure. */
	spdk_app_opts_init(&opts);

	/*
	 * Set up a few specifics before we init. For most SPDK cmd-line
	 * apps the config file is passed in as an arg, but to keep this
	 * example simple we just hardcode it here. We also need to
	 * specify a name for the app.
	 */
	opts.name = "hello_blob";
	opts.config_file = "hello_blob.conf";
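
	/*
	 * For reference, hello_blob.conf here only needs to define the
	 * Malloc0 bdev that hello_start() looks up. A minimal sketch,
	 * assuming the legacy INI-style bdev configuration of this SPDK
	 * release (check the key names against the examples shipped in
	 * etc/spdk/ before relying on them):
	 *
	 *   [Malloc]
	 *   NumberOfLuns 1
	 *   LunSizeInMB 128
	 *
	 * The LUN must be at least TOTAL_MB plus room for blobstore
	 * metadata.
	 */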


	/*
	 * Now we'll allocate and initialize the blobstore itself. We
	 * can pass in an spdk_bs_opts if we want something other than
	 * the defaults (cluster size, etc), but here we'll just take the
	 * defaults.  We'll also pass in a struct that we'll use for
	 * callbacks so we've got efficient bookkeeping of what we're
	 * creating. This is an async operation and bs_init_complete()
	 * will be called when it is complete.
	 */
	hello_context = calloc(1, sizeof(struct hello_context_t));
	if (hello_context != NULL) {
		/*
		 * spdk_app_start() will block running hello_start() until
		 * spdk_app_stop() is called by someone (not simply when
		 * hello_start() returns)
		 */
		rc = spdk_app_start(&opts, hello_start, hello_context, NULL);
		if (rc) {
			SPDK_NOTICELOG("ERROR!\n");
		} else {
			SPDK_NOTICELOG("SUCCCESS!\n");
		}
		/* Free up memory that we allocated */
		hello_cleanup(hello_context);
	} else {
		SPDK_ERRLOG("Could not alloc hello_context struct!!\n");
		rc = -ENOMEM;
	}

	/* Gracefully close out all of the SPDK subsystems. */
	spdk_app_fini();
	return rc;
}
