* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 19:35 Andrey Kuzmin
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Kuzmin @ 2016-07-06 19:35 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6183 bytes --]

On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:

> Andrey,
>
> That sounds exactly like what we are experiencing, however we’re working
> off the spdk codebase that was current as of last week and are still
> experiencing the issue.  Do you know what the resource allocation fault was
> and how we might be able to determine if that is still occurring?
>
I'll take a look at the commit logs, both SPDK's and mine, and will get back to
you.

Regards,
Andrey

> Ben,
>
> We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set
> our queue depth high enough it will fail that assertion, as would be
> expected.  Whatever other failure we’re experiencing does not seem to be
> causing spdk_nvme_ns_cmd_read() to return an error code.
>
>
>
> Also I performed some tests with the spdk perf tool and was not able to
> replicate our problem.  It ran fine at various queue depths and core
> masks.  When the qd was set too high, it failed gracefully with an error
> message.  This is all as expected.
>
>
>
> I’d like to continue down the path of investigating if some resource
> allocation or something else is failing silently for us.  Any specific
> ideas?
>
>
>
> Thanks!
>
> Will
>
>
>
> *From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <
> andrey.v.kuzmin(a)gmail.com>
> *Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Date: *Wednesday, July 6, 2016 at 12:01 PM
> *To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject: *Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being
> called sometimes
>
>
>
> On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <
> benjamin.walker(a)intel.com> wrote:
>
> Hi Will,
>
>
>
> Since I can't see the code for your application I'd like to try and
> reproduce the problem with code that I have some visibility into. Are you
> able to reproduce the problem using our perf tool (examples/nvme/perf)? If
> you aren't, this is likely a problem with your test application and not
> SPDK.
>
>
>
> I had been witnessing a similar issue with an earlier SPDK release, back
> around Feb, where the submit call was failing due to the resource
> allocation fault and neither returning an error nor invoking the callback,
> but my issue has been fixed in the recent release (I can't recall the
> actual commit, but there definitely was one dealing exactly with the
> cause).
>
>
>
>
>
> Based on the symptoms, my best guess is that your memory pool ran out of
> request objects. The first thing to check is whether spdk_nvme_ns_cmd_read
> failed. If it fails, it won't call the callback. You can check for failure
> by looking at the return value - see the documentation here
> <http://www.spdk.io/spdk/doc/nvme_8h.html#a084c6ecb53bd810fbb5051100b79bec5>.
> Your application allocates this memory pool up front - all of our examples
> allocate 8k requests (see line 1097 in examples/nvme/perf/perf.c). You need
> to allocate a large enough pool to handle the maximum number of outstanding
> requests you plan to have. We recently added a "hello_world" style example
> for the NVMe driver at
> https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with
> tons of comments. One of the comments explains this memory pool in detail.
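(As a rough, hedged sketch of the pattern described above - check the return value of
spdk_nvme_ns_cmd_read() and only treat an I/O as outstanding when submission succeeded -
something like the following. The helper names are made up, the signatures follow the
qpair-based API and may differ slightly from the tree being discussed, and the buffer is
assumed to be DMA-able memory allocated through SPDK/DPDK.)

#include "spdk/nvme.h"

static void
read_done(void *cb_arg, const struct spdk_nvme_cpl *cpl)
{
        int *outstanding = cb_arg;

        (void)cpl;
        (*outstanding)--;               /* one fewer I/O in flight */
}

static int
submit_and_drain(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
                 void *buf, uint64_t lba, uint32_t lba_count)
{
        int outstanding = 0;
        int rc;

        rc = spdk_nvme_ns_cmd_read(ns, qpair, buf, lba, lba_count,
                                   read_done, &outstanding, 0);
        if (rc != 0) {
                /* Submission failed (for example, no free request objects);
                 * the callback will never fire for this I/O, so it must not
                 * be counted as outstanding or waited for. */
                return rc;
        }
        outstanding = 1;

        while (outstanding > 0) {
                /* 0 = process everything available; a positive value caps
                 * how many completions one call may consume. */
                spdk_nvme_qpair_process_completions(qpair, 0);
        }
        return 0;
}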
>
>
>
> That memory pool allocation is a bit of a wart on our otherwise clean API.
> We're looking at different strategies to clean that up. Let me know what
> the result of the debugging is and I'll shoot you some more ideas to try if
> necessary.
>
>
>
> Are there any plans regarding the global request pool rework?
>
>
>
> Regards,
>
> Andrey
>
>
>
>
>
> Thanks,
>
> Ben
>
>
>
> On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
>
> Hello,
>
> We have written a test application that is utilizing the spdk library to
> benchmark a set of 3 Intel P3700 drives and a single 750 drive
> (concurrently).  We’ve done some testing using fio and the kernel nvme
> drivers and have had no problem achieving the claimed IOPs (4k random read)
> of all drives on our system.
>
>
>
> What we have found during our testing is that spdk will sometimes start to
> silently fail to call the callback passed to spdk_nvme_ns_cmd_read in the
> following situations:
>
> 1.       Testing a single drive and passing in 0 for max_completions to
> spdk_nvme_qpair_process_completions().  We haven’t seen any issues with
> single drive testing when max_completions was > 0.
>
> 2.       Testing all four drives at once will result in one drive failing
> to receive callbacks, seemingly regardless of what number we pass for
> max_completions (1 through 128).
>
>
>
> Here are other observations we’ve made
>
> -When the callbacks fail to be called for a drive, they fail to be called
> for the remaining duration of the test.
>
> -The drive that ‘fails’ when testing 4 drives concurrently varies from
> test to test.
>
> -‘failure’ of a drive seems to be correlated with the number of
> outstanding read operations, though it is not a strict correlation.
>
>
>
> Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for
> CPU 0 and the other 3 are on PCI slots on CPU 1.  The master/slave threads
> are on the same CPU socket as the NVMe device they are talking to.
>
>
>
> We’d like to know what is causing this issue and what we can do to help
> investigate the problem.  What other information can we provide?  Is there
> some part of the spdk code that we can look at to help determine the cause?
>
>
>
> Thanks,
>
> Will
>
>
>
> _______________________________________________
>
> SPDK mailing list
>
> SPDK(a)lists.01.org
>
> https://lists.01.org/mailman/listinfo/spdk
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 13481 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-12 14:38 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-12 14:38 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 20815 bytes --]

Ben,

Thanks for the detailed explanation; it makes sense to me.  We’re happy to help resolve any issues with SPDK.  We intend to keep using it, since it is so much more efficient than the kernel driver.
FYI my tests in the meantime have not uncovered any more issues and the fix worked as intended.
Thanks for being so responsive and helpful!

-Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of "Walker, Benjamin" <benjamin.walker(a)intel.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, July 7, 2016 at 4:03 PM
To: "spdk(a)lists.01.org" <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

Glad you found it. Here's an explanation of what was wrong in the NVMe driver:

The queue pairs are implemented as two arrays - one of 64 byte commands and the other of 16 byte completions - and two sets of head/tail indices. The driver submits commands by copying commands into the submission array and then doing an MMIO write to the "doorbell" register with the new value of the driver's submission queue tail index. The SSD completes commands by copying a completion into the completion array and toggling its phase bit, like I explained before. When the completion has been consumed, the NVMe driver writes an updated value for its completion queue head index to a doorbell register. This final step is important - it is what signals which entries in the completion array are open for new completions.

MMIO is expensive (writes less so than reads, but still expensive), so we try very hard to avoid doing any. Instead of writing a new completion queue head index after each completion, we recently added a patch that writes out the head index only after we're done processing a whole batch of completions. Unfortunately, we didn't consider the case where the entire queue depth's worth of completions was processed in one call to check for completions. We didn't originally think this was possible because the queue is 256 deep but we only allow 128 actual commands to be outstanding. It turns out it is possible under very specific circumstances, though, which is what you hit.

Here's a concrete example of one way to hit the problem. Let's say that the current completion queue head index is 0 and there is an active completion there. When the function to poll for completions checks that and finds the phase bit toggled, it will immediately call the user callback. If that callback submits I/O and then does something computationally expensive, it's possible that by the time the user callback returns, the I/O that was submitted is already complete. That means the polling function will find the next completion and repeat. It's possible that this goes on until the completion queue is entirely full (256 entries), at which point no new completions come and we exit the loop. When we exit the loop, we write the location where we expect the next completion to be placed - but that's 0 (circular queue), which is the old value. The device can't detect that we wrote the same value as was previously there, so it acts as if there was no update. This is responsible for the hang you were seeing. At high queue depths, the user callback doesn't even have to be particularly expensive to make this happen on low-latency devices.

We talked about different ways to solve this and the one we've agreed on is to limit the maximum number of completions in one batch of polling to the queue depth - 1 (255). I'm submitting a patch for this now.
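(For readers following the explanation above, a minimal self-contained sketch of the described fix:
cap one polling batch at num_entries - 1 completions so the final doorbell write always changes the
value the device last saw. The struct and helper names below are hypothetical stand-ins, not the
actual SPDK patch.)

#include <stdint.h>

struct cpl_sketch {
        uint8_t phase;                          /* stand-in for the NVMe phase bit */
};

struct qpair_sketch {
        struct cpl_sketch *cpl;                 /* completion array (circular) */
        uint32_t num_entries;
        uint32_t cq_head;
        uint8_t phase;                          /* phase value expected on this pass */
        volatile uint32_t *cq_hdbl;             /* completion queue head doorbell */
};

static void handle_completion(struct qpair_sketch *q, struct cpl_sketch *c)
{
        (void)q; (void)c;                       /* the user callback would run here */
}

uint32_t poll_sketch(struct qpair_sketch *q, uint32_t max_completions)
{
        uint32_t done = 0;

        /* The fix: never drain a full queue's worth in one batch. */
        if (max_completions == 0 || max_completions > q->num_entries - 1) {
                max_completions = q->num_entries - 1;
        }

        while (done < max_completions) {
                struct cpl_sketch *c = &q->cpl[q->cq_head];

                if (c->phase != q->phase) {
                        break;                  /* phase not toggled: no new completion */
                }
                handle_completion(q, c);
                if (++q->cq_head == q->num_entries) {
                        q->cq_head = 0;
                        q->phase = !q->phase;   /* wrapped: flip the expected phase */
                }
                done++;
        }
        if (done > 0) {
                *q->cq_hdbl = q->cq_head;       /* single batched doorbell write */
        }
        return done;
}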

Thanks for your help digging through this.

On Thu, 2016-07-07 at 18:45 +0000, Will Del Genio wrote:
Ben,

I was able to track down the bug I mentioned in the previous email to an issue in my code.  I believe SPDK is working correctly now, thanks to the change you suggested.

-Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Will Del Genio <wdelgenio(a)xeograph.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, July 7, 2016 at 10:36 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

Ben,

I have run more tests and experienced another failure (the same as before).  I wasn’t able to attach gdb to debug so I’ve been trying to replicate and haven’t had luck yet.  I will continue to try to replicate.

-Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Will Del Genio <wdelgenio(a)xeograph.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, July 7, 2016 at 10:09 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes


Ben,



I manually reverted the change from that commit and it looks like that resolved the issue.  At qd 256 I have not seen any failures.



For reference here is the relevant section of my nvme_qpair.c file now:



                if (++qpair->cq_head == qpair->num_entries) {
                        qpair->cq_head = 0;
                        qpair->phase = !qpair->phase;
                }

                spdk_mmio_write_4(qpair->cq_hdbl, qpair->cq_head);

                if (++num_completions == max_completions) {
                        break;
                }
        }

        return num_completions;
}



Let me know if you need any more help/info/testing.  Thank you very much for your help.



-Will



On 7/6/16, 6:23 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:







On Wed, 2016-07-06 at 22:56 +0000, Will Del Genio wrote:

> Ben,

>

> I tried capping my queue depth to 128 and it significantly improved the problem (I’m not entirely

> sure if it was completely eliminated).



Can you try reverting commit ff7e2122c74b09e5961cbcb2622fda9c0087f48f  and see if that fixes the

problem? I believe we see the problem and it will only occur with 256 or greater queue depth. If

reverting that commit solves the problem, we'll submit a patch to fix it the right way and I'll

explain what happened.



>

> To get the vars I had to put the qd back to 256:

>

> (gdb) p *qpair
> $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl = 0x7fffdaade000,
>   free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaabf000}, tr = 0x7fffdaa5d000,
>   queued_req = {stqh_first = 0x7fffea91a500, stqh_last = 0x7fffea8efec0}, id = 1, num_entries = 256,
>   sq_tail = 121, cq_head = 249, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,
>   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},
>   cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
> (gdb) p qpair->cpl[qpair->cq_head - 1]
> $2 = {cdw0 = 0, rsvd1 = 0, sqhd = 250, sqid = 1, cid = 97, status = {p = 0, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}
> (gdb) p *(qpair->sq_tdbl)
> $4 = 121
> (gdb) p *(qpair->cq_hdbl)
> $5 = 249
> (gdb)

>

> On 7/6/16, 4:40 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of b

> enjamin.walker(a)intel.com> wrote:

>

> On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:

> > Ben,

> >

> > Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue

> > depth 256.

>

> Can you try capping your queue depth at 128? That's the maximum I/O we allow outstanding at the

> hardware. The NVMe driver should be doing software queueing beyond that automatically, but this

> data

> point will help narrow down the problem.

>

> >  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more

> > completions to handle, then I will recheck the code I wrote to keep track of the number of

> > outstanding read requests.

> >

> > Here is the qpair:

> > (gdb) p *qpair
> > $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl = 0x7fffdaade000,
> >   free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr = 0x7fffdaa5d000,
> >   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries = 256,
> >   sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

>

> Note how sq_tail - cq_head is 128, meaning the driver believes there to be 128 commands outstanding.
> The driver's view of the world (commands outstanding) doesn't line up with us not getting any NVMe completions - there is definitely a problem here.

>

> >   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},
> >   cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
> > (gdb) p qpair->phase
> > $2 = 0 '\000'
> > (gdb) p qpair->cpl[qpair->cq_head]
> > $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}

>

> Can you print out the following 3 things:
> - qpair->cpl[qpair->cq_head - 1]
> - qpair->sq_tdbl
> - qpair->cq_hdbl

>

> >

> > -Will

> >

> > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of

> > b

> > enjamin.walker(a)intel.com> wrote:

> >

> > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:

> > > Andrey,

> > >

> > > I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found
> > > the reason it isn't returning any completions: this check is failing at line 469:
> > > if (cpl->status.p != qpair->phase)

> > >

> > > Relevant gdb info here:
> > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions
> > > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
> > > 463   in nvme_qpair.c
> > > (gdb) p qpair->cpl[qpair->cq_head]
> > > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}
> > > (gdb) p qpair->phase
> > > $12 = 1 '\001'

> > >

> > > What does this mean?  Does this information help at all?

> >

> > The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses

> > -

> > and a set of head and tail indices. The arrays are circular, so you can loop back around to the

> > beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first

> > pass

> > through the array, new entries in the queue are marked by setting their phase to 1. On the next

> > pass

> > through the array, new elements are marked by setting their phase bit to 0, etc. The current

> > iteration's expected phase value is stored in qpair->phase. So the code you are looking at on

> > lines

> > 467-470 is basically saying:

> >

> > 1) Grab the completion entry at the head of the queue

> > 2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit

> >

> > All that means is that there are no completions outstanding for any commands according to the

> > SSD,

> > which doesn't narrow much down. At the point where you are broken in, can you just dump out the

> > whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are

> > any

> > commands actually pending at the device and what state the device is in. My expectation is that

> > there aren't any commands outstanding.
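(A small, self-contained toy program - not SPDK code - that walks the expected-phase variable
across two passes of a 4-entry completion queue; the hard-coded phase array stands in for what a
device would write.)

#include <stdio.h>

int main(void)
{
        unsigned num_entries = 4, cq_head = 0, expected_phase = 1;
        /* Phase bits the device would write for 8 completions: pass 1 uses
         * phase 1, pass 2 (after the queue wraps) uses phase 0. */
        unsigned device_phase[8] = { 1, 1, 1, 1, 0, 0, 0, 0 };

        for (unsigned i = 0; i < 8; i++) {
                if (device_phase[i] != expected_phase) {
                        break;          /* would mean: no new completion yet */
                }
                printf("consumed slot %u (phase %u)\n", cq_head, device_phase[i]);
                if (++cq_head == num_entries) {
                        cq_head = 0;
                        expected_phase = !expected_phase;       /* flip on wrap */
                }
        }
        return 0;
}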

> >

> > Can you also provide some background as to what kind of I/O you're submitting (read or write,

> > size,

> > queue depth, etc.) when this occurs?

> >

> > > -Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 2:35 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:

> > >

> > > Andrey,

> > > That sounds exactly like what we are experiencing, however we’re working off the spdk codebase

> > > that was current as of last week and are still experiencing the issue.  Do you know what the

> > > resource allocation fault was and how we might be able to determine if that is still

> > > occurring?

> > >

> > >

> > >

> > > I'll take a look at commit log, both SPDK and mine, and will get back to you.

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > > Ben,

> > > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth

> > > high

> > > enough it will fail that assertion, as would be expected.  Whatever other failure we’re

> > > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.

> > >

> > > Also I performed some tests with the spdk perf tool and was not able to replicate our

> > > problem.  It

> > > ran fine at various queue depths and core masks.  When the qd was set too high, it failed

> > > gracefully with an error message.  This is all as expected.

> > >

> > > I’d like to continue down the path of investigating if some resource allocation or something

> > > else

> > > is failing silently for us.  Any specific ideas?

> > >

> > > Thanks!

> > > Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 12:01 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:

> > >

> > > Hi Will,

> > >

> > >

> > >

> > > Since I can't see the code for your application I'd like to try and reproduce the problem with

> > > code that I have some visibility into. Are you able to reproduce the problem using our perf

> > > tool

> > > (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and

> > > not

> > > SPDK.

> > >

> > >

> > >

> > >

> > > I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the

> > > submit call was failing due to the resource allocation fault and neither returning an error

> > > nor

> > > invoking the callback, but my issue has been fixed in the recent release (I can't recall the

> > > actual commit, but there definitely was one dealing exactly with the cause).

> > >

> > >

> > >

> > >

> > > Based on the symptoms, my best guess is that your memory pool ran out of request objects. The

> > > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the

> > > callback. You can check for failure by looking at the return value - see the documentation

> > > here.

> > > Your application allocates this memory pool up front - all of our examples allocate 8k

> > > requests

> > > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to

> > > handle

> > > the maximum number of outstanding requests you plan to have. We recently added a "hello_world"

> > > style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/he

> > > ll

> > > o_

> > > world with tons of comments. One of the comments explains this memory pool in detail.

> > >

> > >

> > >

> > > That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at

> > > different strategies to clean that up. Let me know what the result of the debugging is and

> > > I'll

> > > shoot you some more ideas to try if necessary.

> > >

> > >

> > >

> > >

> > > Are there any plans regarding the global request pool rework?

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > >

> > >

> > >

> > >

> > > Thanks,

> > >

> > > Ben

> > >

> > >

> > >

> > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:

> > >

> > > Hello,

> > > We have written a test application that is utilizing the spdk library to benchmark a set of 3

> > > Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio

> > > and

> > > the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of

> > > all

> > > drives on our system.

> > >

> > > What we have found during our testing is that spdk will sometimes start to silently fail to

> > > call

> > > the callback passed to spdk_nvme_ns_cmd_read in the following situations:

> > > 1.       Testing a single drive and passing in 0 for max_completions to

> > > spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing

> > > when

> > > max_completions was > 0.

> > > 2.       Testing all four drives at once will result in one drive failing to receive

> > > callbacks,

> > > seemingly regardless of what number we pass for max_completions (1 through 128).

> > >

> > > Here are other observations we’ve made

> > > -When the callbacks fail to be called for a drive, they fail to be called for the remaining

> > > duration of the test.

> > > -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.

> > > -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,

> > > though it is not a strict correlation.

> > >

> > > Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3

> > > are

> > > on PCI slots on CPU 1.  The master/slave threads are on the the same cpu socket as the nvme

> > > device

> > > they are talking to.

> > >

> > > We’d like to know what is causing this issue and what we can do to help investigate the

> > > problem.

> > > What other information can we provide?  Is there some part of the spdk code that we can look

> > > at

> > > to

> > > help determine the cause?

> > >

> > > Thanks,

> > > Will

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > > --

> > >

> > > Regards,

> > > Andrey

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> >

> >

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

>

>

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

_______________________________________________

SPDK mailing list

SPDK(a)lists.01.org

https://lists.01.org/mailman/listinfo/spdk



_______________________________________________

SPDK mailing list

SPDK(a)lists.01.org

https://lists.01.org/mailman/listinfo/spdk

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 47239 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-07 21:03 Walker, Benjamin
  0 siblings, 0 replies; 18+ messages in thread
From: Walker, Benjamin @ 2016-07-07 21:03 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 20095 bytes --]

Glad you found it. Here's an explanation of what was wrong in the NVMe driver:

The queue pairs are implemented as two arrays - one of 64 byte commands and the other of 16 byte completions - and two sets of head/tail indices. The driver submits commands by copying commands into the submission array and then doing an MMIO write to the "doorbell" register with the new value of the driver's submission queue tail index. The SSD completes commands by copying a completion into the completion array and toggling its phase bit, like I explained before. When the completion has been consumed, the NVMe driver writes an updated value for its completion queue head index to a doorbell register. This final step is important - it is what signals which entries in the completion array are open for new completions.

MMIO is expensive (writes less so than reads, but still expensive), so we try very hard to avoid doing any. Instead of writing a new completion queue head index after each completion, we recently added a patch that writes out the head index only after we're done processing a whole batch of completions. Unfortunately, we didn't consider the case where the entire queue depth's worth of completions was processed in one call to check for completions. We didn't originally think this was possible because the queue is 256 deep but we only allow 128 actual commands to be outstanding. It turns out it is possible under very specific circumstances, though, which is what you hit.

Here's a concrete example of one way to hit the problem. Let's say that the current completion queue head index is 0 and there is an active completion there. When the function to poll for completions checks that and finds the phase bit toggled, it will immediately call the user callback. If that callback submits I/O and then does something computationally expensive, it's possible that by the time the user callback returns, the I/O that was submitted is already complete. That means the polling function will find the next completion and repeat. It's possible that this goes on until the completion queue is entirely full (256 entries), at which point no new completions come and we exit the loop. When we exit the loop, we write the location where we expect the next completion to be placed - but that's 0 (circular queue), which is the old value. The device can't detect that we wrote the same value as was previously there, so it acts as if there was no update. This is responsible for the hang you were seeing. At high queue depths, the user callback doesn't even have to be particularly expensive to make this happen on low-latency devices.
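(A tiny self-contained illustration - not SPDK code - of why draining exactly one full queue's
worth of completions makes the final doorbell write a no-op; the 256-entry queue size is taken
from the description above.)

#include <stdio.h>

int main(void)
{
        unsigned num_entries = 256, cq_head = 0;
        unsigned processed = 256;       /* entire queue drained in one polling call */
        unsigned new_head = (cq_head + processed) % num_entries;

        printf("old head %u, new head %u -> the doorbell write is %s\n",
               cq_head, new_head,
               new_head == cq_head ? "a no-op" : "seen by the device");
        return 0;
}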

We talked about different ways to solve this and the one we've agreed on is to limit the maximum number of completions in one batch of polling to the queue depth - 1 (255). I'm submitting a patch for this now.

Thanks for your help digging through this.

On Thu, 2016-07-07 at 18:45 +0000, Will Del Genio wrote:
Ben,

I was able to track down the bug I mentioned in the previous email to an issue in my code.  I believe SPDK is working correctly now, thanks to the change you suggested.

-Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Will Del Genio <wdelgenio(a)xeograph.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, July 7, 2016 at 10:36 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

Ben,

I have run more tests and experienced another failure (the same as before).  I wasn’t able to attach gdb to debug so I’ve been trying to replicate and haven’t had luck yet.  I will continue to try to replicate.

-Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Will Del Genio <wdelgenio(a)xeograph.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, July 7, 2016 at 10:09 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes


Ben,



I manually reverted the change from that commit and it looks like that resolved the issue.  At qd 256 I have not seen any failures.



For reference here is the relevant section of my nvme_qpair.c file now:



                if (++qpair->cq_head == qpair->num_entries) {
                        qpair->cq_head = 0;
                        qpair->phase = !qpair->phase;
                }

                spdk_mmio_write_4(qpair->cq_hdbl, qpair->cq_head);

                if (++num_completions == max_completions) {
                        break;
                }
        }

        return num_completions;
}



Let me know if you need any more help/info/testing.  Thank you very much for your help.



-Will



On 7/6/16, 6:23 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:







On Wed, 2016-07-06 at 22:56 +0000, Will Del Genio wrote:

> Ben,

>

> I tried capping my queue depth to 128 and it significantly improved the problem (I’m not entirely

> sure if it was completely eliminated).



Can you try reverting commit ff7e2122c74b09e5961cbcb2622fda9c0087f48f  and see if that fixes the

problem? I believe we see the problem and it will only occur with 256 or greater queue depth. If

reverting that commit solves the problem, we'll submit a patch to fix it the right way and I'll

explain what happened.



>

> To get the vars I had to put the qd back to 256:

>

> (gdb) p *qpair
> $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl = 0x7fffdaade000,
>   free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaabf000}, tr = 0x7fffdaa5d000,
>   queued_req = {stqh_first = 0x7fffea91a500, stqh_last = 0x7fffea8efec0}, id = 1, num_entries = 256,
>   sq_tail = 121, cq_head = 249, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,
>   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},
>   cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
> (gdb) p qpair->cpl[qpair->cq_head - 1]
> $2 = {cdw0 = 0, rsvd1 = 0, sqhd = 250, sqid = 1, cid = 97, status = {p = 0, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}
> (gdb) p *(qpair->sq_tdbl)
> $4 = 121
> (gdb) p *(qpair->cq_hdbl)
> $5 = 249
> (gdb)

>

> On 7/6/16, 4:40 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of b

> enjamin.walker(a)intel.com> wrote:

>

> On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:

> > Ben,

> >

> > Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue

> > depth 256.

>

> Can you try capping your queue depth at 128? That's the maximum I/O we allow outstanding at the

> hardware. The NVMe driver should be doing software queueing beyond that automatically, but this

> data

> point will help narrow down the problem.

>

> >  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more

> > completions to handle, then I will recheck the code I wrote to keep track of the number of

> > outstanding read requests.

> >

> > Here is the qpair:

> > (gdb) p *qpair

> > $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =

> > 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr =

> > 0x7fffdaa5d000,

> >   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries =

> > 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

>

> Note how sq_tail - cq_head is 128, meaning the driver believes there to be 128 commands

> outstanding.

> The driver's view of the world (commands outstanding) doesn't line up with us not getting any NVMe

> completions - there is definitely a problem here.

>

> >   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},

> > cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}

> > (gdb) p qpair->phase

> > $2 = 0 '\000'

> > (gdb) p qpair->cpl[qpair->cq_head]

> > $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0,

> > rsvd2 = 0, m = 0, dnr = 0}}

>

> Can you print out the following 3 things:

> - qpair->cpl[qpair->cq_head - 1]

> - qpair->sq_tdbl

> - qpair->cq_hdbl

>

> >

> > -Will

> >

> > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of

> > b

> > enjamin.walker(a)intel.com> wrote:

> >

> > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:

> > > Andrey,

> > >

> > > I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found
> > > the reason it isn't returning any completions: this check is failing at line 469:
> > > if (cpl->status.p != qpair->phase)

> > >

> > > Relevant gdb info here:

> > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions

> > > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463

> > > 463   in nvme_qpair.c

> > > (gdb) p qpair->cpl[qpair->cq_head]

> > > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0,

> > > rsvd2

> > > = 0, m = 0, dnr = 0}}

> > > (gdb) p qpair->phase

> > > $12 = 1 '\001'

> > >

> > > What does this mean?  Does this information help at all?

> >

> > The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses

> > -

> > and a set of head and tail indices. The arrays are circular, so you can loop back around to the

> > beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first

> > pass

> > through the array, new entries in the queue are marked by setting their phase to 1. On the next

> > pass

> > through the array, new elements are marked by setting their phase bit to 0, etc. The current

> > iteration's expected phase value is stored in qpair->phase. So the code you are looking at on

> > lines

> > 467-470 is basically saying:

> >

> > 1) Grab the completion entry at the head of the queue

> > 2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit

> >

> > All that means is that there are no completions outstanding for any commands according to the

> > SSD,

> > which doesn't narrow much down. At the point where you are broken in, can you just dump out the

> > whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are

> > any

> > commands actually pending at the device and what state the device is in. My expectation is that

> > there aren't any commands outstanding.

> >

> > Can you also provide some background as to what kind of I/O you're submitting (read or write,

> > size,

> > queue depth, etc.) when this occurs?

> >

> > > -Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 2:35 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:

> > >

> > > Andrey,

> > > That sounds exactly like what we are experiencing, however we’re working off the spdk codebase

> > > that was current as of last week and are still experiencing the issue.  Do you know what the

> > > resource allocation fault was and how we might be able to determine if that is still

> > > occurring?

> > >

> > >

> > >

> > > I'll take a look at commit log, both SPDK and mine, and will get back to you.

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > > Ben,

> > > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth

> > > high

> > > enough it will fail that assertion, as would be expected.  Whatever other failure we’re

> > > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.

> > >

> > > Also I performed some tests with the spdk perf tool and was not able to replicate our

> > > problem.  It

> > > ran fine at various queue depths and core masks.  When the qd was set too high, it failed

> > > gracefully with an error message.  This is all as expected.

> > >

> > > I’d like to continue down the path of investigating if some resource allocation or something

> > > else

> > > is failing silently for us.  Any specific ideas?

> > >

> > > Thanks!

> > > Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 12:01 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:

> > >

> > > Hi Will,

> > >

> > >

> > >

> > > Since I can't see the code for your application I'd like to try and reproduce the problem with

> > > code that I have some visibility into. Are you able to reproduce the problem using our perf

> > > tool

> > > (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and

> > > not

> > > SPDK.

> > >

> > >

> > >

> > >

> > > I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the

> > > submit call was failing due to the resource allocation fault and neither returning an error

> > > nor

> > > invoking the callback, but my issue has been fixed in the recent release (I can't recall the

> > > actual commit, but there definitely was one dealing exactly with the cause).

> > >

> > >

> > >

> > >

> > > Based on the symptoms, my best guess is that your memory pool ran out of request objects. The

> > > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the

> > > callback. You can check for failure by looking at the return value - see the documentation

> > > here.

> > > Your application allocates this memory pool up front - all of our examples allocate 8k

> > > requests

> > > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to

> > > handle

> > > the maximum number of outstanding requests you plan to have. We recently added a "hello_world"

> > > style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/he

> > > ll

> > > o_

> > > world with tons of comments. One of the comments explains this memory pool in detail.

> > >

> > >

> > >

> > > That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at

> > > different strategies to clean that up. Let me know what the result of the debugging is and

> > > I'll

> > > shoot you some more ideas to try if necessary.

> > >

> > >

> > >

> > >

> > > Are there any plans regarding the global request pool rework?

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > >

> > >

> > >

> > >

> > > Thanks,

> > >

> > > Ben

> > >

> > >

> > >

> > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:

> > >

> > > Hello,

> > > We have written a test application that is utilizing the spdk library to benchmark a set of 3

> > > Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio

> > > and

> > > the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of

> > > all

> > > drives on our system.

> > >

> > > What we have found during our testing is that spdk will sometimes start to silently fail to

> > > call

> > > the callback passed to spdk_nvme_ns_cmd_read in the following situations:

> > > 1.       Testing a single drive and passing in 0 for max_completions to

> > > spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing

> > > when

> > > max_completions was > 0.

> > > 2.       Testing all four drives at once will result in one drive failing to receive

> > > callbacks,

> > > seemingly regardless of what number we pass for max_completions (1 through 128).

> > >

> > > Here are other observations we’ve made

> > > -When the callbacks fail to be called for a drive, they fail to be called for the remaining

> > > duration of the test.

> > > -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.

> > > -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,

> > > though it is not a strict correlation.

> > >

> > > Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3

> > > are

> > > on PCI slots on CPU 1.  The master/slave threads are on the the same cpu socket as the nvme

> > > device

> > > they are talking to.

> > >

> > > We’d like to know what is causing this issue and what we can do to help investigate the

> > > problem.

> > > What other information can we provide?  Is there some part of the spdk code that we can look

> > > at

> > > to

> > > help determine the cause?

> > >

> > > Thanks,

> > > Will

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > > --

> > >

> > > Regards,

> > > Andrey

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> >

> >

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

>

>

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

_______________________________________________

SPDK mailing list

SPDK(a)lists.01.org

https://lists.01.org/mailman/listinfo/spdk



_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk


[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 43592 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-07 18:45 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-07 18:45 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 16979 bytes --]

Ben,

I was able to track down the bug I mentioned in the previous email to an issue in my code.  I believe SPDK is working correctly now, thanks to the change you suggested.

-Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Will Del Genio <wdelgenio(a)xeograph.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, July 7, 2016 at 10:36 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

Ben,

I have run more tests and experienced another failure (the same as before).  I wasn’t able to attach gdb to debug so I’ve been trying to replicate and haven’t had luck yet.  I will continue to try to replicate.

-Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Will Del Genio <wdelgenio(a)xeograph.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, July 7, 2016 at 10:09 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes


Ben,



I manually reverted the change from that commit and it looks like that resolved the issue.  At qd 256 I have not seen any failures.



For reference here is the relevant section of my nvme_qpair.c file now:



                if (++qpair->cq_head == qpair->num_entries) {
                        qpair->cq_head = 0;
                        qpair->phase = !qpair->phase;
                }

                spdk_mmio_write_4(qpair->cq_hdbl, qpair->cq_head);

                if (++num_completions == max_completions) {
                        break;
                }
        }

        return num_completions;
}



Let me know if you need any more help/info/testing.  Thank you very much for your help.



-Will



On 7/6/16, 6:23 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:







On Wed, 2016-07-06 at 22:56 +0000, Will Del Genio wrote:

> Ben,

>

> I tried capping my queue depth to 128 and it significantly improved the problem (I’m not entirely

> sure if it was completely eliminated).



Can you try reverting commit ff7e2122c74b09e5961cbcb2622fda9c0087f48f  and see if that fixes the

problem? I believe we see the problem and it will only occur with 256 or greater queue depth. If

reverting that commit solves the problem, we'll submit a patch to fix it the right way and I'll

explain what happened.



>

> To get the vars I had to put the qd back to 256:

>

> (gdb) p *qpair

> $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =

> 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaabf000}, tr =

> 0x7fffdaa5d000,

>   queued_req = {stqh_first = 0x7fffea91a500, stqh_last = 0x7fffea8efec0}, id = 1, num_entries =

> 256, sq_tail = 121, cq_head = 249, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

>   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},

> cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}

> (gdb) p qpair->cpl[qpair->cq_head -1]

> $2 = {cdw0 = 0, rsvd1 = 0, sqhd = 250, sqid = 1, cid = 97, status = {p = 0, sc = 0, sct = 0, rsvd2

> = 0, m = 0, dnr = 0}}

>  (gdb) p *(qpair->sq_tdbl)

> $4 = 121

>  (gdb) p *(qpair->cq_hdbl)

> $5 = 249

> (gdb)

>

> On 7/6/16, 4:40 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of b

> enjamin.walker(a)intel.com> wrote:

>

> On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:

> > Ben,

> >

> > Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue

> > depth 256.

>

> Can you try capping your queue depth at 128? That's the maximum I/O we allow outstanding at the

> hardware. The NVMe driver should be doing software queueing beyond that automatically, but this

> data

> point will help narrow down the problem.

>

> >  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more

> > completions to handle, then I will recheck the code I wrote to keep track of the number of

> > outstanding read requests.

> >

> > Here is the qpair:

> > (gdb) p *qpair

> > $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =

> > 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr =

> > 0x7fffdaa5d000,

> >   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries =

> > 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

>

> Note how sq_tail - cq_head is 128, meaning the driver believes there to be 128 commands

> outstanding.

> The driver's view of the world (commands outstanding) doesn't line up with us not getting any NVMe

> completions - there is definitely a problem here.

>

> >   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},

> > cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}

> > (gdb) p qpair->phase

> > $2 = 0 '\000'

> > (gdb) p qpair->cpl[qpair->cq_head]

> > $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0,

> > rsvd2 = 0, m = 0, dnr = 0}}

>

> Can you print out the following 3 things:

> - qpair->cpl[qpair->cq_head - 1]

> - qpair->sq_tdbl

> - qpair->cq_hdbl

>

> >

> > -Will

> >

> > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of

> > b

> > enjamin.walker(a)intel.com> wrote:

> >

> > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:

> > > Andrey,

> > >

> > > I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found
> > > the reason it isn't returning any completions: this check is failing at line 469:
> > > if (cpl->status.p != qpair->phase)

> > >

> > > Relevant gdb info here:

> > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions

> > > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463

> > > 463   in nvme_qpair.c

> > > (gdb) p qpair->cpl[qpair->cq_head]

> > > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0,

> > > rsvd2

> > > = 0, m = 0, dnr = 0}}

> > > (gdb) p qpair->phase

> > > $12 = 1 '\001'

> > >

> > > What does this mean?  Does this information help at all?

> >

> > The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses

> > -

> > and a set of head and tail indices. The arrays are circular, so you can loop back around to the

> > beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first

> > pass

> > through the array, new entries in the queue are marked by setting their phase to 1. On the next

> > pass

> > through the array, new elements are marked by setting their phase bit to 0, etc. The current

> > iteration's expected phase value is stored in qpair->phase. So the code you are looking at on

> > lines

> > 467-470 is basically saying:

> >

> > 1) Grab the completion entry at the head of the queue

> > 2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit

> >

> > All that means is that there are no completions outstanding for any commands according to the

> > SSD,

> > which doesn't narrow much down. At the point where you are broken in, can you just dump out the

> > whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are

> > any

> > commands actually pending at the device and what state the device is in. My expectation is that

> > there aren't any commands outstanding.

> >

> > Can you also provide some background as to what kind of I/O you're submitting (read or write,

> > size,

> > queue depth, etc.) when this occurs?

> >

> > > -Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 2:35 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:

> > >

> > > Andrey,

> > > That sounds exactly like what we are experiencing, however we’re working off the spdk codebase

> > > that was current as of last week and are still experiencing the issue.  Do you know what the

> > > resource allocation fault was and how we might be able to determine if that is still

> > > occurring?

> > >

> > >

> > >

> > > I'll take a look at commit log, both SPDK and mine, and will get back to you.

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > > Ben,

> > > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth

> > > high

> > > enough it will fail that assertion, as would be expected.  Whatever other failure we’re

> > > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.

> > >

> > > Also I performed some tests with the spdk perf tool and was not able to replicate our

> > > problem.  It

> > > ran fine at various queue depths and core masks.  When the qd was set too high, it failed

> > > gracefully with an error message.  This is all as expected.

> > >

> > > I’d like to continue down the path of investigating if some resource allocation or something

> > > else

> > > is failing silently for us.  Any specific ideas?

> > >

> > > Thanks!

> > > Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 12:01 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:

> > >

> > > Hi Will,

> > >

> > >

> > >

> > > Since I can't see the code for your application I'd like to try and reproduce the problem with

> > > code that I have some visibility into. Are you able to reproduce the problem using our perf

> > > tool

> > > (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and

> > > not

> > > SPDK.

> > >

> > >

> > >

> > >

> > > I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the

> > > submit call was failing due to the resource allocation fault and neither returning an error

> > > nor

> > > invoking the callback, but my issue has been fixed in the recent release (I can't recall the

> > > actual commit, but there definitely was one dealing exactly with the cause).

> > >

> > >

> > >

> > >

> > > Based on the symptoms, my best guess is that your memory pool ran out of request objects. The

> > > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the

> > > callback. You can check for failure by looking at the return value - see the documentation

> > > here.

> > > Your application allocates this memory pool up front - all of our examples allocate 8k

> > > requests

> > > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to

> > > handle

> > > the maximum number of outstanding requests you plan to have. We recently added a "hello_world"

> > > style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world

> > > with tons of comments. One of the comments explains this memory pool in detail.
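> > >
> > > For reference, the allocation in those examples looks roughly like this (from memory of the
> > > API at the time, so treat spdk_nvme_request_size() and the request_mempool global as
> > > assumptions and double-check against perf.c in your tree):
> > >
> > >     #include <rte_mempool.h>
> > >     #include "spdk/nvme.h"
> > >
> > >     /* Assumed convention: the application defines this global and the driver
> > >      * allocates its per-I/O request objects from it. */
> > >     struct rte_mempool *request_mempool;
> > >
> > >     int init_request_pool(void)
> > >     {
> > >             /* 8192 objects, each sized for the driver's internal request struct. */
> > >             request_mempool = rte_mempool_create("nvme_request", 8192,
> > >                                                  spdk_nvme_request_size(),
> > >                                                  128, 0, NULL, NULL, NULL, NULL,
> > >                                                  SOCKET_ID_ANY, 0);
> > >             return request_mempool != NULL ? 0 : -1;
> > >     }
> > >
> > > Size it for at least the maximum number of requests you will ever have in flight across all
> > > queue pairs.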

> > >

> > >

> > >

> > > That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at

> > > different strategies to clean that up. Let me know what the result of the debugging is and

> > > I'll

> > > shoot you some more ideas to try if necessary.

> > >

> > >

> > >

> > >

> > > Are there any plans regarding the global request pool rework?

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > >

> > >

> > >

> > >

> > > Thanks,

> > >

> > > Ben

> > >

> > >

> > >

> > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:

> > >

> > > Hello,

> > > We have written a test application that is utilizing the spdk library to benchmark a set of 3

> > > Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio

> > > and

> > > the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of

> > > all

> > > drives on our system.

> > >

> > > What we have found during our testing is that spdk will sometimes start to silently fail to

> > > call

> > > the callback passed to spdk_nvme_ns_cmd_read in the following situations:

> > > 1.       Testing a single drive and passing in 0 for max_completions to

> > > spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing

> > > when

> > > max_completions was > 0.

> > > 2.       Testing all four drives at once will result in one drive failing to receive

> > > callbacks,

> > > seemingly regardless of what number we pass for max_completions (1 through 128).

> > >

> > > Here are other observations we’ve made

> > > -When the callbacks fail to be called for a drive, they fail to be called for the remaining

> > > duration of the test.

> > > -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.

> > > -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,

> > > though it is not a strict correlation.

> > >

> > > Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3

> > > are

> > > on PCI slots on CPU 1.  The master/slave threads are on the same CPU socket as the nvme

> > > device

> > > they are talking to.

> > >

> > > We’d like to know what is causing this issue and what we can do to help investigate the

> > > problem.

> > > What other information can we provide?  Is there some part of the spdk code that we can look

> > > at

> > > to

> > > help determine the cause?

> > >

> > > Thanks,

> > > Will

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > > --

> > >

> > > Regards,

> > > Andrey

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> >

> >

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

>

>

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

_______________________________________________

SPDK mailing list

SPDK(a)lists.01.org

https://lists.01.org/mailman/listinfo/spdk



[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 40192 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-07 15:36 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-07 15:36 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 16429 bytes --]

Ben,

I have run more tests and experienced another failure (the same as before).  I wasn’t able to attach gdb to debug it, so I’ve been trying to reproduce the failure, but no luck yet.  I will keep trying.

-Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Will Del Genio <wdelgenio(a)xeograph.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, July 7, 2016 at 10:09 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes


Ben,



I manually reverted the change from that commit and it looks like that resolved the issue.  At qd 256 I have not seen any failures.



For reference here is the relevant section of my nvme_qpair.c file now:



                if (++qpair->cq_head == qpair->num_entries) {
                        qpair->cq_head = 0;
                        qpair->phase = !qpair->phase;
                }

                spdk_mmio_write_4(qpair->cq_hdbl, qpair->cq_head);

                if (++num_completions == max_completions) {
                        break;
                }
        }

        return num_completions;
}



Let me know if you need any more help/info/testing.  Thank you very much for your help.



-Will



On 7/6/16, 6:23 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:







On Wed, 2016-07-06 at 22:56 +0000, Will Del Genio wrote:

> Ben,

>

> I tried capping my queue depth to 128 and it significantly improved the problem (I’m not entirely

> sure if it was completely eliminated).



Can you try reverting commit ff7e2122c74b09e5961cbcb2622fda9c0087f48f  and see if that fixes the

problem? I believe we see the problem and it will only occur with 256 or greater queue depth. If

reverting that commit solves the problem, we'll submit a patch to fix it the right way and I'll

explain what happened.



>

> To get the vars I had to put the qd back to 256:

>

> (gdb) p *qpair

> $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =

> 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaabf000}, tr =

> 0x7fffdaa5d000,

>   queued_req = {stqh_first = 0x7fffea91a500, stqh_last = 0x7fffea8efec0}, id = 1, num_entries =

> 256, sq_tail = 121, cq_head = 249, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

>   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},

> cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}

> (gdb) p qpair->cpl[qpair->cq_head -1]

> $2 = {cdw0 = 0, rsvd1 = 0, sqhd = 250, sqid = 1, cid = 97, status = {p = 0, sc = 0, sct = 0, rsvd2

> = 0, m = 0, dnr = 0}}

>  (gdb) p *(qpair->sq_tdbl)

> $4 = 121

>  (gdb) p *(qpair->cq_hdbl)

> $5 = 249

> (gdb)

>

> On 7/6/16, 4:40 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of b

> enjamin.walker(a)intel.com> wrote:

>

> On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:

> > Ben,

> >

> > Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue

> > depth 256.

>

> Can you try capping your queue depth at 128? That's the maximum I/O we allow outstanding at the

> hardware. The NVMe driver should be doing software queueing beyond that automatically, but this

> data

> point will help narrow down the problem.

>

> >  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more

> > completions to handle, then I will recheck the code I wrote to keep track of the number of

> > outstanding read requests.

> >

> > Here is the qpair:

> > (gdb) p *qpair

> > $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =

> > 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr =

> > 0x7fffdaa5d000,

> >   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries =

> > 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

>

> Note how sq_tail - cq_head is 128, meaning the driver believes there to be 128 commands

> outstanding.

> The driver's view of the world (commands outstanding) doesn't line up with us not getting any NVMe

> completions - there is definitely a problem here.

>

> >   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},

> > cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}

> > (gdb) p qpair->phase

> > $2 = 0 '\000'

> > (gdb) p qpair->cpl[qpair->cq_head]

> > $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0,

> > rsvd2 = 0, m = 0, dnr = 0}}

>

> Can you print out the following 3 things:

> - qpair->cpl[qpair->cq_head - 1]

> - qpair->sq_tdbl

> - qpair->cq_hdbl

>

> >

> > -Will

> >

> > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of

> > b

> > enjamin.walker(a)intel.com> wrote:

> >

> > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:

> > > Andrey,

> > >

> > > I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found

> > > the

> > > reason is isn’t returning any completions is because this check is failing at line 469: if

> > > (cpl-

> > > > status.p != qpair->phase)

> > >

> > > Relevant gdb info here:

> > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions

> > > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463

> > > 463   in nvme_qpair.c

> > > (gdb) p qpair->cpl[qpair->cq_head]

> > > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0,

> > > rsvd2

> > > = 0, m = 0, dnr = 0}}

> > > (gdb) p qpair->phase

> > > $12 = 1 '\001'

> > >

> > > What does this mean?  Does this information help at all?

> >

> > The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses

> > -

> > and a set of head and tail indices. The arrays are circular, so you can loop back around to the

> > beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first

> > pass

> > through the array, new entries in the queue are marked by setting their phase to 1. On the next

> > pass

> > through the array, new elements are marked by setting their phase bit to 0, etc. The current

> > iteration's expected phase value is stored in qpair->phase. So the code you are looking at on

> > lines

> > 467-470 is basically saying:

> >

> > 1) Grab the completion entry at the head of the queue

> > 2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit

> >

> > All that means is that there are no completions outstanding for any commands according to the

> > SSD,

> > which doesn't narrow much down. At the point where you are broken in, can you just dump out the

> > whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are

> > any

> > commands actually pending at the device and what state the device is in. My expectation is that

> > there aren't any commands outstanding.

> >

> > Can you also provide some background as to what kind of I/O you're submitting (read or write,

> > size,

> > queue depth, etc.) when this occurs?

> >

> > > -Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 2:35 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:

> > >

> > > Andrey,

> > > That sounds exactly like what we are experiencing, however we’re working off the spdk codebase

> > > that was current as of last week and are still experiencing the issue.  Do you know what the

> > > resource allocation fault was and how we might be able to determine if that is still

> > > occurring?

> > >

> > >

> > >

> > > I'll take a look at commit log, both SPDK and mine, and will get back to you.

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > > Ben,

> > > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth

> > > high

> > > enough it will fail that assertion, as would be expected.  Whatever other failure we’re

> > > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.

> > >

> > > Also I performed some tests with the spdk perf tool and was not able to replicate our

> > > problem.  It

> > > ran fine at various queue depths and core masks.  When the qd was set too high, it failed

> > > gracefully with an error message.  This is all as expected.

> > >

> > > I’d like to continue down the path of investigating if some resource allocation or something

> > > else

> > > is failing silently for us.  Any specific ideas?

> > >

> > > Thanks!

> > > Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 12:01 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:

> > >

> > > Hi Will,

> > >

> > >

> > >

> > > Since I can't see the code for your application I'd like to try and reproduce the problem with

> > > code that I have some visibility into. Are you able to reproduce the problem using our perf

> > > tool

> > > (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and

> > > not

> > > SPDK.

> > >

> > >

> > >

> > >

> > > I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the

> > > submit call was failing due to the resource allocation fault and neither returning an error

> > > nor

> > > invoking the callback, but my issue has been fixed in the recent release (I can't recall the

> > > actual commit, but there definitely was one dealing exactly with the cause).

> > >

> > >

> > >

> > >

> > > Based on the symptoms, my best guess is that your memory pool ran out of request objects. The

> > > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the

> > > callback. You can check for failure by looking at the return value - see the documentation

> > > here.

> > > Your application allocates this memory pool up front - all of our examples allocate 8k

> > > requests

> > > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to

> > > handle

> > > the maximum number of outstanding requests you plan to have. We recently added a "hello_world"

> > > style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world

> > > with tons of comments. One of the comments explains this memory pool in detail.

> > >

> > >

> > >

> > > That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at

> > > different strategies to clean that up. Let me know what the result of the debugging is and

> > > I'll

> > > shoot you some more ideas to try if necessary.

> > >

> > >

> > >

> > >

> > > Are there any plans regarding the global request pool rework?

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > >

> > >

> > >

> > >

> > > Thanks,

> > >

> > > Ben

> > >

> > >

> > >

> > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:

> > >

> > > Hello,

> > > We have written a test application that is utilizing the spdk library to benchmark a set of 3

> > > Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio

> > > and

> > > the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of

> > > all

> > > drives on our system.

> > >

> > > What we have found during our testing is that spdk will sometimes start to silently fail to

> > > call

> > > the callback passed to spdk_nvme_ns_cmd_read in the following situations:

> > > 1.       Testing a single drive and passing in 0 for max_completions to

> > > spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing

> > > when

> > > max_completions was > 0.

> > > 2.       Testing all four drives at once will result in one drive failing to receive

> > > callbacks,

> > > seemingly regardless of what number we pass for max_completions (1 through 128).

> > >

> > > Here are other observations we’ve made

> > > -When the callbacks fail to be called for a drive, they fail to be called for the remaining

> > > duration of the test.

> > > -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.

> > > -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,

> > > though it is not a strict correlation.

> > >

> > > Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3

> > > are

> > > on PCI slots on CPU 1.  The master/slave threads are on the same CPU socket as the nvme

> > > device

> > > they are talking to.

> > >

> > > We’d like to know what is causing this issue and what we can do to help investigate the

> > > problem.

> > > What other information can we provide?  Is there some part of the spdk code that we can look

> > > at

> > > to

> > > help determine the cause?

> > >

> > > Thanks,

> > > Will

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > > --

> > >

> > > Regards,

> > > Andrey

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> >

> >

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

>

>

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

_______________________________________________

SPDK mailing list

SPDK(a)lists.01.org

https://lists.01.org/mailman/listinfo/spdk



[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 38675 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-07 15:09 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-07 15:09 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 15828 bytes --]

Ben,



I manually reverted the change from that commit and it looks like that resolved the issue.  At qd 256 I have not seen any failures.



For reference here is the relevant section of my nvme_qpair.c file now:



                if (++qpair->cq_head == qpair->num_entries) {
                        qpair->cq_head = 0;
                        qpair->phase = !qpair->phase;
                }

                spdk_mmio_write_4(qpair->cq_hdbl, qpair->cq_head);

                if (++num_completions == max_completions) {
                        break;
                }
        }

        return num_completions;
}



Let me know if you need any more help/info/testing.  Thank you very much for your help.



-Will



On 7/6/16, 6:23 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:







On Wed, 2016-07-06 at 22:56 +0000, Will Del Genio wrote:

> Ben,

>

> I tried capping my queue depth to 128 and it significantly improved the problem (I’m not entirely

> sure if it was completely eliminated).



Can you try reverting commit ff7e2122c74b09e5961cbcb2622fda9c0087f48f  and see if that fixes the

problem? I believe we see the problem and it will only occur with 256 or greater queue depth. If

reverting that commit solves the problem, we'll submit a patch to fix it the right way and I'll

explain what happened.



>

> To get the vars I had to put the qd back to 256:

>

> (gdb) p *qpair

> $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =

> 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaabf000}, tr =

> 0x7fffdaa5d000,

>   queued_req = {stqh_first = 0x7fffea91a500, stqh_last = 0x7fffea8efec0}, id = 1, num_entries =

> 256, sq_tail = 121, cq_head = 249, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

>   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},

> cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}

> (gdb) p qpair->cpl[qpair->cq_head -1]

> $2 = {cdw0 = 0, rsvd1 = 0, sqhd = 250, sqid = 1, cid = 97, status = {p = 0, sc = 0, sct = 0, rsvd2

> = 0, m = 0, dnr = 0}}

>  (gdb) p *(qpair->sq_tdbl)

> $4 = 121

>  (gdb) p *(qpair->cq_hdbl)

> $5 = 249

> (gdb)

>

> On 7/6/16, 4:40 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of b

> enjamin.walker(a)intel.com> wrote:

>

> On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:

> > Ben,

> >

> > Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue

> > depth 256.

>

> Can you try capping your queue depth at 128? That's the maximum I/O we allow outstanding at the

> hardware. The NVMe driver should be doing software queueing beyond that automatically, but this

> data

> point will help narrow down the problem.

>

> >  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more

> > completions to handle, then I will recheck the code I wrote to keep track of the number of

> > outstanding read requests.

> >

> > Here is the qpair:

> > (gdb) p *qpair

> > $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =

> > 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr =

> > 0x7fffdaa5d000,

> >   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries =

> > 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

>

> Note how sq_tail - cq_head is 128, meaning the driver believes there to be 128 commands

> outstanding.

> The driver's view of the world (commands outstanding) doesn't line up with us not getting any NVMe

> completions - there is definitely a problem here.

>

> >   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},

> > cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}

> > (gdb) p qpair->phase

> > $2 = 0 '\000'

> > (gdb) p qpair->cpl[qpair->cq_head]

> > $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0,

> > rsvd2 = 0, m = 0, dnr = 0}}

>

> Can you print out the following 3 things:

> - qpair->cpl[qpair->cq_head - 1]

> - qpair->sq_tdbl

> - qpair->cq_hdbl

>

> >

> > -Will

> >

> > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of

> > b

> > enjamin.walker(a)intel.com> wrote:

> >

> > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:

> > > Andrey,

> > >

> > > I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found

> > > the

> > > reason is isn’t returning any completions is because this check is failing at line 469: if

> > > (cpl-

> > > > status.p != qpair->phase)

> > >

> > > Relevant gdb info here:

> > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions

> > > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463

> > > 463   in nvme_qpair.c

> > > (gdb) p qpair->cpl[qpair->cq_head]

> > > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0,

> > > rsvd2

> > > = 0, m = 0, dnr = 0}}

> > > (gdb) p qpair->phase

> > > $12 = 1 '\001'

> > >

> > > What does this mean?  Does this information help at all?

> >

> > The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses

> > -

> > and a set of head and tail indices. The arrays are circular, so you can loop back around to the

> > beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first

> > pass

> > through the array, new entries in the queue are marked by setting their phase to 1. On the next

> > pass

> > through the array, new elements are marked by setting their phase bit to 0, etc. The current

> > iteration's expected phase value is stored in qpair->phase. So the code you are looking at on

> > lines

> > 467-470 is basically saying:

> >

> > 1) Grab the completion entry at the head of the queue

> > 2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit

> >

> > All that means is that there are no completions outstanding for any commands according to the

> > SSD,

> > which doesn't narrow much down. At the point where you are broken in, can you just dump out the

> > whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are

> > any

> > commands actually pending at the device and what state the device is in. My expectation is that

> > there aren't any commands outstanding.

> >

> > Can you also provide some background as to what kind of I/O you're submitting (read or write,

> > size,

> > queue depth, etc.) when this occurs?

> >

> > > -Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 2:35 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:

> > >

> > > Andrey,

> > > That sounds exactly like what we are experiencing, however we’re working off the spdk codebase

> > > that was current as of last week and are still experiencing the issue.  Do you know what the

> > > resource allocation fault was and how we might be able to determine if that is still

> > > occurring?

> > >

> > >

> > >

> > > I'll take a look at commit log, both SPDK and mine, and will get back to you.

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > > Ben,

> > > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth

> > > high

> > > enough it will fail that assertion, as would be expected.  Whatever other failure we’re

> > > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.

> > >

> > > Also I performed some tests with the spdk perf tool and was not able to replicate our

> > > problem.  It

> > > ran fine at various queue depths and core masks.  When the qd was set too high, it failed

> > > gracefully with an error message.  This is all as expected.

> > >

> > > I’d like to continue down the path of investigating if some resource allocation or something

> > > else

> > > is failing silently for us.  Any specific ideas?

> > >

> > > Thanks!

> > > Will

> > >

> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>

> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Date: Wednesday, July 6, 2016 at 12:01 PM

> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>

> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

> > >

> > >

> > >

> > >

> > >

> > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:

> > >

> > > Hi Will,

> > >

> > >

> > >

> > > Since I can't see the code for your application I'd like to try and reproduce the problem with

> > > code that I have some visibility into. Are you able to reproduce the problem using our perf

> > > tool

> > > (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and

> > > not

> > > SPDK.

> > >

> > >

> > >

> > >

> > > I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the

> > > submit call was failing due to the resource allocation fault and neither returning an error

> > > nor

> > > invoking the callback, but my issue has been fixed in the recent release (I can't recall the

> > > actual commit, but there definitely was one dealing exactly with the cause).

> > >

> > >

> > >

> > >

> > > Based on the symptoms, my best guess is that your memory pool ran out of request objects. The

> > > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the

> > > callback. You can check for failure by looking at the return value - see the documentation

> > > here.

> > > Your application allocates this memory pool up front - all of our examples allocate 8k

> > > requests

> > > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to

> > > handle

> > > the maximum number of outstanding requests you plan to have. We recently added a "hello_world"

> > > style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world

> > > with tons of comments. One of the comments explains this memory pool in detail.

> > >

> > >

> > >

> > > That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at

> > > different strategies to clean that up. Let me know what the result of the debugging is and

> > > I'll

> > > shoot you some more ideas to try if necessary.

> > >

> > >

> > >

> > >

> > > Are there any plans regarding the global request pool rework?

> > >

> > >

> > >

> > > Regards,

> > >

> > > Andrey

> > >

> > >

> > >

> > >

> > >

> > > Thanks,

> > >

> > > Ben

> > >

> > >

> > >

> > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:

> > >

> > > Hello,

> > > We have written a test application that is utilizing the spdk library to benchmark a set of 3

> > > Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio

> > > and

> > > the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of

> > > all

> > > drives on our system.

> > >

> > > What we have found during our testing is that spdk will sometimes start to silently fail to

> > > call

> > > the callback passed to spdk_nvme_ns_cmd_read in the following situations:

> > > 1.       Testing a single drive and passing in 0 for max_completions to

> > > spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing

> > > when

> > > max_completions was > 0.

> > > 2.       Testing all four drives at once will result in one drive failing to receive

> > > callbacks,

> > > seemingly regardless of what number we pass for max_completions (1 through 128).

> > >

> > > Here are other observations we’ve made

> > > -When the callbacks fail to be called for a drive, they fail to be called for the remaining

> > > duration of the test.

> > > -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.

> > > -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,

> > > though it is not a strict correlation.

> > >

> > > Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3

> > > are

> > > on PCI slots on CPU 1.  The master/slave threads are on the same CPU socket as the nvme

> > > device

> > > they are talking to.

> > >

> > > We’d like to know what is causing this issue and what we can do to help investigate the

> > > problem.

> > > What other information can we provide?  Is there some part of the spdk code that we can look

> > > at

> > > to

> > > help determine the cause?

> > >

> > > Thanks,

> > > Will

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > >

> > > --

> > >

> > > Regards,

> > > Andrey

> > >

> > >

> > >

> > >

> > > _______________________________________________

> > > SPDK mailing list

> > > SPDK(a)lists.01.org

> > > https://lists.01.org/mailman/listinfo/spdk

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> >

> >

> > _______________________________________________

> > SPDK mailing list

> > SPDK(a)lists.01.org

> > https://lists.01.org/mailman/listinfo/spdk

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

>

>

> _______________________________________________

> SPDK mailing list

> SPDK(a)lists.01.org

> https://lists.01.org/mailman/listinfo/spdk

_______________________________________________

SPDK mailing list

SPDK(a)lists.01.org

https://lists.01.org/mailman/listinfo/spdk



[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 37057 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 23:23 Walker, Benjamin
  0 siblings, 0 replies; 18+ messages in thread
From: Walker, Benjamin @ 2016-07-06 23:23 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 14269 bytes --]



On Wed, 2016-07-06 at 22:56 +0000, Will Del Genio wrote:
> Ben,
> 
> I tried capping my queue depth to 128 and it significantly improved the problem (I’m not entirely
> sure if it was completely eliminated).

Can you try reverting commit ff7e2122c74b09e5961cbcb2622fda9c0087f48f and see if that fixes the
problem? I believe we see what the problem is, and it should only occur at a queue depth of 256 or
greater. If reverting that commit solves the problem, we'll submit a patch that fixes it the right
way and I'll explain what happened.
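
For reference, the revert itself is just the following (adjust the rebuild step to your setup; if
the commit doesn't revert cleanly, undoing the hunk by hand works too):

    git revert --no-commit ff7e2122c74b09e5961cbcb2622fda9c0087f48f
    make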

> 
> To get the vars I had to put the qd back to 256:
> 
> (gdb) p *qpair
> $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =
> 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaabf000}, tr =
> 0x7fffdaa5d000,
>   queued_req = {stqh_first = 0x7fffea91a500, stqh_last = 0x7fffea8efec0}, id = 1, num_entries =
> 256, sq_tail = 121, cq_head = 249, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,
>   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},
> cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
> (gdb) p qpair->cpl[qpair->cq_head -1]
> $2 = {cdw0 = 0, rsvd1 = 0, sqhd = 250, sqid = 1, cid = 97, status = {p = 0, sc = 0, sct = 0, rsvd2
> = 0, m = 0, dnr = 0}}
>  (gdb) p *(qpair->sq_tdbl)
> $4 = 121
>  (gdb) p *(qpair->cq_hdbl)
> $5 = 249
> (gdb)
> 
> On 7/6/16, 4:40 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of b
> enjamin.walker(a)intel.com> wrote:
> 
> On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:
> > Ben,
> > 
> > Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue
> > depth 256. 
> 
> Can you try capping your queue depth at 128? That's the maximum I/O we allow outstanding at the
> hardware. The NVMe driver should be doing software queueing beyond that automatically, but this
> data
> point will help narrow down the problem.
> 
> >  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more
> > completions to handle, then I will recheck the code I wrote to keep track of the number of
> > outstanding read requests.
> > 
> > Here is the qpair:
> > (gdb) p *qpair
> > $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =
> > 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr =
> > 0x7fffdaa5d000,
> >   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries =
> > 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,
> 
> Note how sq_tail - cq_head is 128, meaning the driver believes there to be 128 commands
> outstanding.
> The driver's view of the world (commands outstanding) doesn't line up with us not getting any NVMe
> completions - there is definitely a problem here.
> 
> >   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},
> > cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
> > (gdb) p qpair->phase
> > $2 = 0 '\000'
> > (gdb) p qpair->cpl[qpair->cq_head]
> > $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0,
> > rsvd2 = 0, m = 0, dnr = 0}}
> 
> Can you print out the following 3 things:
> - qpair->cpl[qpair->cq_head - 1]
> - qpair->sq_tdbl
> - qpair->cq_hdbl
> 
> > 
> > -Will
> > 
> > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of
> > b
> > enjamin.walker(a)intel.com> wrote:
> > 
> > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:
> > > Andrey,
> > >  
> > > I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found
> > > the
> > > reason it isn’t returning any completions is because this check is failing at line 469: if
> > > (cpl-
> > > > status.p != qpair->phase)
> > >  
> > > Relevant gdb info here:
> > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions
> > > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
> > > 463   in nvme_qpair.c
> > > (gdb) p qpair->cpl[qpair->cq_head]
> > > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0,
> > > rsvd2
> > > = 0, m = 0, dnr = 0}}
> > > (gdb) p qpair->phase
> > > $12 = 1 '\001'
> > >  
> > > What does this mean?  Does this information help at all?
> > 
> > The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses
> > -
> > and a set of head and tail indices. The arrays are circular, so you can loop back around to the
> > beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first
> > pass
> > through the array, new entries in the queue are marked by setting their phase to 1. On the next
> > pass
> > through the array, new elements are marked by setting their phase bit to 0, etc. The current
> > iteration's expected phase value is stored in qpair->phase. So the code you are looking at on
> > lines
> > 467-470 is basically saying:
> > 
> > 1) Grab the completion entry at the head of the queue
> > 2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit
> > 
> > All that means is that there are no completions outstanding for any commands according to the
> > SSD,
> > which doesn't narrow much down. At the point where you are broken in, can you just dump out the
> > whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are
> > any
> > commands actually pending at the device and what state the device is in. My expectation is that
> > there aren't any commands outstanding.
> > 
> > Can you also provide some background as to what kind of I/O you're submitting (read or write,
> > size,
> > queue depth, etc.) when this occurs?
> > 
> > > -Will
> > >  
> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Date: Wednesday, July 6, 2016 at 2:35 PM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> > > 
> > >  
> > > 
> > >  
> > > 
> > > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:
> > > 
> > > Andrey,
> > > That sounds exactly like what we are experiencing, however we’re working off the spdk codebase
> > > that was current as of last week and are still experiencing the issue.  Do you know what the
> > > resource allocation fault was and how we might be able to determine if that is still
> > > occurring?
> > > 
> > > 
> > > 
> > > I'll take a look at commit log, both SPDK and mine, and will get back to you.
> > > 
> > >  
> > > 
> > > Regards,
> > > 
> > > Andrey
> > > 
> > > Ben,
> > > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth
> > > high
> > > enough it will fail that assertion, as would be expected.  Whatever other failure we’re
> > > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.
> > >  
> > > Also I performed some tests with the spdk perf tool and was not able to replicate our
> > > problem.  It
> > > ran fine at various queue depths and core masks.  When the qd was set too high, it failed
> > > gracefully with an error message.  This is all as expected.
> > >  
> > > I’d like to continue down the path of investigating if some resource allocation or something
> > > else
> > > is failing silently for us.  Any specific ideas?
> > >  
> > > Thanks!
> > > Will
> > >  
> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Date: Wednesday, July 6, 2016 at 12:01 PM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> > > 
> > > 
> > > 
> > >  
> > > 
> > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:
> > > 
> > > Hi Will,
> > > 
> > >  
> > > 
> > > Since I can't see the code for your application I'd like to try and reproduce the problem with
> > > code that I have some visibility into. Are you able to reproduce the problem using our perf
> > > tool
> > > (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and
> > > not
> > > SPDK. 
> > > 
> > > 
> > >  
> > > 
> > > I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the
> > > submit call was failing due to the resource allocation fault and neither returning an error
> > > nor
> > > invoking the callback, but my issue has been fixed in the recent release (I can't recall the
> > > actual commit, but there definitely was one dealing exactly with the cause).
> > >  
> > > 
> > >  
> > > 
> > > Based on the symptoms, my best guess is that your memory pool ran out of request objects. The
> > > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the
> > > callback. You can check for failure by looking at the return value - see the documentation
> > > here.
> > > Your application allocates this memory pool up front - all of our examples allocate 8k
> > > requests
> > > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to
> > > handle
> > > the maximum number of outstanding requests you plan to have. We recently added a "hello_world"
> > > style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world
> > > with tons of comments. One of the comments explains this memory pool in detail.
> > > 
> > >  
> > > 
> > > That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at
> > > different strategies to clean that up. Let me know what the result of the debugging is and
> > > I'll
> > > shoot you some more ideas to try if necessary.
> > > 
> > > 
> > >  
> > > 
> > > Are there any plans regarding the global request pool rework?
> > > 
> > >  
> > > 
> > > Regards,
> > > 
> > > Andrey
> > > 
> > >  
> > > 
> > >  
> > > 
> > > Thanks,
> > > 
> > > Ben
> > > 
> > >  
> > > 
> > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
> > > 
> > > Hello,
> > > We have written a test application that is utilizing the spdk library to benchmark a set of 3
> > > Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio
> > > and
> > > the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of
> > > all
> > > drives on our system.
> > >  
> > > What we have found during our testing is that spdk will sometimes start to silently fail to
> > > call
> > > the callback passed to spdk_nvme_ns_cmd_read in the following situations:
> > > 1.       Testing a single drive and passing in 0 for max_completions to
> > > spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing
> > > when
> > > max_completions was > 0.
> > > 2.       Testing all four drives at once will result in one drive failing to receive
> > > callbacks,
> > > seemingly regardless of what number we pass for max_completions (1 through 128).
> > >  
> > > Here are other observations we’ve made
> > > -When the callbacks fail to be called for a drive, they fail to be called for the remaining
> > > duration of the test.
> > > -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
> > > -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,
> > > though it is not a strict correlation.
> > >  
> > > Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3
> > > are
> > > on PCI slots on CPU 1.  The master/slave threads are on the same CPU socket as the nvme
> > > device
> > > they are talking to.
> > >  
> > > We’d like to know what is causing this issue and what we can do to help investigate the
> > > problem. 
> > > What other information can we provide?  Is there some part of the spdk code that we can look
> > > at
> > > to
> > > help determine the cause?
> > >  
> > > Thanks,
> > > Will
> > >  
> > > 
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > > 
> > > 
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > > 
> > > 
> > >  
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > > 
> > > --
> > > 
> > > Regards,
> > > Andrey
> > > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> > 
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 22:56 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-06 22:56 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 12864 bytes --]

Ben,

I tried capping my queue depth to 128 and it significantly reduced the problem (I’m not entirely sure it was completely eliminated).

To get the vars I had to put the qd back to 256:

(gdb) p *qpair
$1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl = 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaabf000}, tr = 0x7fffdaa5d000,
  queued_req = {stqh_first = 0x7fffea91a500, stqh_last = 0x7fffea8efec0}, id = 1, num_entries = 256, sq_tail = 121, cq_head = 249, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,
  qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70}, cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
(gdb) p qpair->cpl[qpair->cq_head -1]
$2 = {cdw0 = 0, rsvd1 = 0, sqhd = 250, sqid = 1, cid = 97, status = {p = 0, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}
 (gdb) p *(qpair->sq_tdbl)
$4 = 121
 (gdb) p *(qpair->cq_hdbl)
$5 = 249
(gdb)
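
(Same arithmetic as before, if I'm reading it right: with num_entries = 256, sq_tail = 121 and
cq_head = 249, (121 - 249) mod 256 = 128, so the driver still counts 128 commands as outstanding.)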

On 7/6/16, 4:40 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:

On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:
> Ben,
> 
> Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue
> depth 256. 

Can you try capping your queue depth at 128? That's the maximum I/O we allow outstanding at the
hardware. The NVMe driver should be doing software queueing beyond that automatically, but this data
point will help narrow down the problem.
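
A sketch of what that cap can look like on the submit side (read_done, submit_reads, outstanding and
max_qd are illustrative names, not SPDK API; the spdk_nvme_ns_cmd_read() argument order is from the
then-current spdk/nvme.h, so double-check it against your tree):

    #include <stdint.h>
    #include "spdk/nvme.h"

    static uint32_t outstanding;    /* reads currently in flight on this qpair */

    static void read_done(void *cb_arg, const struct spdk_nvme_cpl *cpl)
    {
            outstanding--;
    }

    static void submit_reads(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
                             void *buf, uint64_t lba, uint32_t lba_count, uint32_t max_qd)
    {
            /* Keep at most max_qd reads outstanding; stop early if submission fails. */
            while (outstanding < max_qd &&
                   spdk_nvme_ns_cmd_read(ns, qpair, buf, lba, lba_count,
                                         read_done, NULL, 0) == 0) {
                    outstanding++;
                    lba += lba_count;
            }
    }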

>  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more
> completions to handle, then I will recheck the code I wrote to keep track of the number of
> outstanding read requests.
> 
> Here is the qpair:
> (gdb) p *qpair
> $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =
> 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr =
> 0x7fffdaa5d000,
>   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries =
> 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

Note how sq_tail - cq_head is 128, meaning the driver believes there to be 128 commands outstanding.
The driver's view of the world (commands outstanding) doesn't line up with us not getting any NVMe
completions - there is definitely a problem here.

>   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},
> cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
> (gdb) p qpair->phase
> $2 = 0 '\000'
> (gdb) p qpair->cpl[qpair->cq_head]
> $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0,
> rsvd2 = 0, m = 0, dnr = 0}}

Can you print out the following 3 things:
- qpair->cpl[qpair->cq_head - 1]
- qpair->sq_tdbl
- qpair->cq_hdbl

> 
> -Will
> 
> On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of b
> enjamin.walker(a)intel.com> wrote:
> 
> On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:
> > Andrey,
> >  
> > I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found
> > the
> > reason it isn’t returning any completions is because this check is failing at line 469: if (cpl-
> > > status.p != qpair->phase)
> >  
> > Relevant gdb info here:
> > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions
> > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
> > 463   in nvme_qpair.c
> > (gdb) p qpair->cpl[qpair->cq_head]
> > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0,
> > rsvd2
> > = 0, m = 0, dnr = 0}}
> > (gdb) p qpair->phase
> > $12 = 1 '\001'
> >  
> > What does this mean?  Does this information help at all?
> 
> The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses -
> and a set of head and tail indices. The arrays are circular, so you can loop back around to the
> beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first pass
> through the array, new entries in the queue are marked by setting their phase to 1. On the next
> pass
> through the array, new elements are marked by setting their phase bit to 0, etc. The current
> iteration's expected phase value is stored in qpair->phase. So the code you are looking at on
> lines
> 467-470 is basically saying:
> 
> 1) Grab the completion entry at the head of the queue
> 2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit
> 
> All that means is that there are no completions outstanding for any commands according to the SSD,
> which doesn't narrow much down. At the point where you are broken in, can you just dump out the
> whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are any
> commands actually pending at the device and what state the device is in. My expectation is that
> there aren't any commands outstanding.
> 
> Can you also provide some background as to what kind of I/O you're submitting (read or write,
> size,
> queue depth, etc.) when this occurs?
> 
> > -Will
> >  
> > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Date: Wednesday, July 6, 2016 at 2:35 PM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> > 
> >  
> > 
> >  
> > 
> > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:
> > 
> > Andrey,
> > That sounds exactly like what we are experiencing, however we’re working off the spdk codebase
> > that was current as of last week and are still experiencing the issue.  Do you know what the
> > resource allocation fault was and how we might be able to determine if that is still occurring?
> > 
> > 
> > 
> > I'll take a look at commit log, both SPDK and mine, and will get back to you.
> > 
> >  
> > 
> > Regards,
> > 
> > Andrey
> > 
> > Ben,
> > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth high
> > enough it will fail that assertion, as would be expected.  Whatever other failure we’re
> > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.
> >  
> > Also I performed some tests with the spdk perf tool and was not able to replicate our
> > problem.  It
> > ran fine at various queue depths and core masks.  When the qd was set too high, it failed
> > gracefully with an error message.  This is all as expected.
> >  
> > I’d like to continue down the path of investigating if some resource allocation or something
> > else
> > is failing silently for us.  Any specific ideas?
> >  
> > Thanks!
> > Will
> >  
> > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Date: Wednesday, July 6, 2016 at 12:01 PM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> > 
> > 
> > 
> >  
> > 
> > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:
> > 
> > Hi Will,
> > 
> >  
> > 
> > Since I can't see the code for your application I'd like to try and reproduce the problem with
> > code that I have some visibility into. Are you able to reproduce the problem using our perf tool
> > (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and not
> > SPDK. 
> > 
> > 
> >  
> > 
> > I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the
> > submit call was failing due to the resource allocation fault and neither returning an error nor
> > invoking the callback, but my issue has been fixed in the recent release (I can't recall the
> > actual commit, but there definitely was one dealing exactly with the cause).
> >  
> > 
> >  
> > 
> > Based on the symptoms, my best guess is that your memory pool ran out of request objects. The
> > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the
> > callback. You can check for failure by looking at the return value - see the documentation here.
> > Your application allocates this memory pool up front - all of our examples allocate 8k requests
> > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to handle
> > the maximum number of outstanding requests you plan to have. We recently added a "hello_world"
> > style example for the NVMe driver at
> > https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with tons of
> > comments. One of the comments explains this memory pool in detail.
> > 
> >  
> > 
> > That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at
> > different strategies to clean that up. Let me know what the result of the debugging is and I'll
> > shoot you some more ideas to try if necessary.
> > 
> > 
> >  
> > 
> > Are there any plans regarding the global request pool rework?
> > 
> >  
> > 
> > Regards,
> > 
> > Andrey
> > 
> >  
> > 
> >  
> > 
> > Thanks,
> > 
> > Ben
> > 
> >  
> > 
> > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
> > 
> > Hello,
> > We have written a test application that is utilizing the spdk library to benchmark a set of 3
> > Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio and
> > the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of
> > all
> > drives on our system.
> >  
> > What we have found during our testing is that spdk will sometimes start to silently fail to call
> > the callback passed to spdk_nvme_ns_cmd_read in the following situations:
> > 1.       Testing a single drive and passing in 0 for max_completions to
> > spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing
> > when
> > max_completions was > 0.
> > 2.       Testing all four drives at once will result in one drive failing to receive callbacks,
> > seemingly regardless of what number we pass for max_completions (1 through 128).
> >  
> > Here are other observations we’ve made
> > -When the callbacks fail to be called for a drive, they fail to be called for the remaining
> > duration of the test.
> > -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
> > -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,
> > though it is not a strict correlation.
> >  
> > Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3
> > are
> > on PCI slots on CPU 1.  The master/slave threads are on the same cpu socket as the nvme
> > device
> > they are talking to.
> >  
> > We’d like to know what is causing this issue and what we can do to help investigate the
> > problem. 
> > What other information can we provide?  Is there some part of the spdk code that we can look at
> > to
> > help determine the cause?
> >  
> > Thanks,
> > Will
> >  
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> > 
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> > 
> > --
> > 
> > Regards,
> > Andrey
> > 
> > 
> > 
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 21:47 Andrey Kuzmin
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Kuzmin @ 2016-07-06 21:47 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 12685 bytes --]

Will,

the issue I was referring to was resolved by the following commit:
https://github.com/spdk/spdk/commit/eb555b139e56b3c0b7f508906bcfde5bda844592
.

On Thu, Jul 7, 2016, 00:40 Walker, Benjamin <benjamin.walker(a)intel.com>
wrote:

> On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:
> > Ben,
> >
> > Thanks, you explained that very well.  I’m working with a random 4k read
> only workload of queue
> > depth 256.
>
> Can you try capping your queue depth at 128? That's the maximum I/O we
> allow outstanding at the
> hardware. The NVMe driver should be doing software queueing beyond that
> automatically, but this data
> point will help narrow down the problem.
>
> >  I’m using 4 drives and one thread per drive.  If it’s true that there
> are just no more
> > completions to handle, then I will recheck the code I wrote to keep
> track of the number of
> > outstanding read requests.
> >
> > Here is the qpair:
> > (gdb) p *qpair
> > $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd =
> 0x7fffdaae0000, cpl =
> > 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first =
> 0x7fffdaad8000}, tr =
> > 0x7fffdaa5d000,
> >   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last =
> 0x7fffea9cdf40}, id = 1, num_entries =
> > 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true,
> sq_in_cmb = false,
>
> Note how sq_tail - cq_head is 128, meaning the driver believes there to be
> 128 commands outstanding.
> The driver's view of the world (commands outstanding) doesn't line up with
> us not getting any NVMe
> completions - there is definitely a problem here.
>
> >   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0,
> tqe_prev = 0x7fffddca7b70},
> > cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
> > (gdb) p qpair->phase
> > $2 = 0 '\000'
> > (gdb) p qpair->cpl[qpair->cq_head]
> > $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p
> = 1, sc = 0, sct = 0,
> > rsvd2 = 0, m = 0, dnr = 0}}
>
> Can you print out the following 3 things:
> - qpair->cpl[qpair->cq_head - 1]
> - qpair->sq_tdbl
> - qpair->cq_hdbl
>
> >
> > -Will
> >
> > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <
> spdk-bounces(a)lists.01.org on behalf of b
> > enjamin.walker(a)intel.com> wrote:
> >
> > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:
> > > Andrey,
> > >
> > > I was able to step into the spdk_nvme_qpair_process_completions()
> function with gdb and found
> > > the
> > > reason it isn’t returning any completions is because this check is
> failing at line 469: if (cpl-
> > > > status.p != qpair->phase)
> > >
> > > Relevant gdb info here:
> > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5,
> spdk_nvme_qpair_process_completions
> > > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
> > > 463   in nvme_qpair.c
> > > (gdb) p qpair->cpl[qpair->cq_head]
> > > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p
> = 0, sc = 0, sct = 0,
> > > rsvd2
> > > = 0, m = 0, dnr = 0}}
> > > (gdb) p qpair->phase
> > > $12 = 1 '\001'
> > >
> > > What does this mean?  Does this information help at all?
> >
> > The NVMe hardware queue pairs consist of two arrays - one of commands
> and the other of responses -
> > and a set of head and tail indices. The arrays are circular, so you can
> loop back around to the
> > beginning. Each entry in the array contains a phase bit, which is either
> 1 or 0. On the first pass
> > through the array, new entries in the queue are marked by setting their
> phase to 1. On the next
> > pass
> > through the array, new elements are marked by setting their phase bit to
> 0, etc. The current
> > iteration's expected phase value is stored in qpair->phase. So the code
> you are looking at on
> > lines
> > 467-470 is basically saying:
> >
> > 1) Grab the completion entry at the head of the queue
> > 2) Check its phase bit. If it hasn't toggled, there is no new
> completion, so exit
> >
> > All that means is that there are no completions outstanding for any
> commands according to the SSD,
> > which doesn't narrow much down. At the point where you are broken in,
> can you just dump out the
> > whole qpair structure? Something like "p *qpair" should do it. That way
> I can see if there are any
> > commands actually pending at the device and what state the device is in.
> My expectation is that
> > there aren't any commands outstanding.
> >
> > Can you also provide some background as to what kind of I/O you're
> submitting (read or write,
> > size,
> > queue depth, etc.) when this occurs?
> >
> > > -Will
> > >
> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <
> andrey.v.kuzmin(a)gmail.com>
> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Date: Wednesday, July 6, 2016 at 2:35 PM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being
> called sometimes
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com>
> wrote:
> > >
> > > Andrey,
> > > That sounds exactly like what we are experiencing, however we’re
> working off the spdk codebase
> > > that was current as of last week and are still experiencing the
> issue.  Do you know what the
> > > resource allocation fault was and how we might be able to determine if
> that is still occurring?
> > >
> > >
> > >
> > > I'll take a look at commit log, both SPDK and mine, and will get back
> to you.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Andrey
> > >
> > > Ben,
> > > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I
> set our queue depth high
> > > enough it will fail that assertion, as would be expected.  Whatever
> other failure we’re
> > > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to
> return an error code.
> > >
> > > Also I performed some tests with the spdk perf tool and was not able
> to replicate our
> > > problem.  It
> > > ran fine at various queue depths and core masks.  When the qd was set
> too high, it failed
> > > gracefully with an error message.  This is all as expected.
> > >
> > > I’d like to continue down the path of investigating if some resource
> allocation or something
> > > else
> > > is failing silently for us.  Any specific ideas?
> > >
> > > Thanks!
> > > Will
> > >
> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <
> andrey.v.kuzmin(a)gmail.com>
> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Date: Wednesday, July 6, 2016 at 12:01 PM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being
> called sometimes
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <
> benjamin.walker(a)intel.com> wrote:
> > >
> > > Hi Will,
> > >
> > >
> > >
> > > Since I can't see the code for your application I'd like to try and
> reproduce the problem with
> > > code that I have some visibility into. Are you able to reproduce the
> problem using our perf tool
> > > (examples/nvme/perf)? If you aren't, this is likely a problem with
> your test application and not
> > > SPDK.
> > >
> > >
> > >
> > >
> > > I had been witnessing a similar issue with an earlier SPDK release,
> back around Feb, where the
> > > submit call was failing due to the resource allocation fault and
> neither returning an error nor
> > > invoking the callback, but my issue has been fixed in the recent
> release (I can't recall the
> > > actual commit, but there definitely was one dealing exactly with the
> cause).
> > >
> > >
> > >
> > >
> > > Based on the symptoms, my best guess is that your memory pool ran out
> of request objects. The
> > > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it
> fails, it won't call the
> > > callback. You can check for failure by looking at the return value -
> see the documentation here.
> > > Your application allocates this memory pool up front - all of our
> examples allocate 8k requests
> > > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a
> large enough pool to handle
> > > the maximum number of outstanding requests you plan to have. We
> recently added a "hello_world"
> > > style example for the NVMe driver at
> > > https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with tons
> > > of comments. One of the comments explains this memory pool in detail.
> > >
> > >
> > >
> > > That memory pool allocation is a bit of a wart on our otherwise clean
> API. We're looking at
> > > different strategies to clean that up. Let me know what the result of
> the debugging is and I'll
> > > shoot you some more ideas to try if necessary.
> > >
> > >
> > >
> > >
> > > Are there any plans regarding the global request pool rework?
> > >
> > >
> > >
> > > Regards,
> > >
> > > Andrey
> > >
> > >
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Ben
> > >
> > >
> > >
> > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
> > >
> > > Hello,
> > > We have written a test application that is utilizing the spdk library
> to benchmark a set of 3
> > > Intel P3700 drives and a single 750 drive (concurrently).  We’ve done
> some testing using fio and
> > > the kernel nvme drivers and have had no problem achieving the claimed
> IOPs (4k random read) of
> > > all
> > > drives on our system.
> > >
> > > What we have found during our testing is that spdk will sometimes
> start to silently fail to call
> > > the callback passed to spdk_nvme_ns_cmd_read in the following
> situations:
> > > 1.       Testing a single drive and passing in 0 for max_completions to
> > > spdk_nvme_qpair_process_completions().  We haven’t seen any issues
> with single drive testing
> > > when
> > > max_completions was > 0.
> > > 2.       Testing all four drives at once will result in one drive
> failing to receive callbacks,
> > > seemingly regardless of what number we pass for max_completions (1
> through 128).
> > >
> > > Here are other observations we’ve made
> > > -When the callbacks fail to be called for a drive, they fail to be
> called for the remaining
> > > duration of the test.
> > > -The drive that ‘fails’ when testing 4 drives concurrently varies from
> test to test.
> > > -‘failure’ of a drive seems to be correlated with the number of
> outstanding read operations,
> > > though it is not a strict correlation.
> > >
> > > Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot
> for CPU 0 and the other 3
> > > are
> > > on PCI slots on CPU 1.  The master/slave threads are on the same
> cpu socket as the nvme
> > > device
> > > they are talking to.
> > >
> > > We’d like to know what is causing this issue and what we can do to
> help investigate the
> > > problem.
> > > What other information can we provide?  Is there some part of the spdk
> code that we can look at
> > > to
> > > help determine the cause?
> > >
> > > Thanks,
> > > Will
> > >
> > >
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > >
> > >
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > >
> > > --
> > >
> > > Regards,
> > > Andrey
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> >
> >
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 16815 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 21:40 Walker, Benjamin
  0 siblings, 0 replies; 18+ messages in thread
From: Walker, Benjamin @ 2016-07-06 21:40 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 11667 bytes --]

On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:
> Ben,
> 
> Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue
> depth 256. 

Can you try capping your queue depth at 128? That's the maximum I/O we allow outstanding at the
hardware. The NVMe driver should be doing software queueing beyond that automatically, but this data
point will help narrow down the problem.

>  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more
> completions to handle, then I will recheck the code I wrote to keep track of the number of
> outstanding read requests.
> 
> Here is the qpair:
> (gdb) p *qpair
> $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl =
> 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr =
> 0x7fffdaa5d000,
>   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries =
> 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,

Note how sq_tail - cq_head is 128, meaning the driver believes there are 128 commands outstanding.
The driver's view of the world (128 commands outstanding) doesn't line up with the fact that we
aren't getting any NVMe completions - there is definitely a problem here.

>   qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},
> cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
> (gdb) p qpair->phase
> $2 = 0 '\000'
> (gdb) p qpair->cpl[qpair->cq_head]
> $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0,
> rsvd2 = 0, m = 0, dnr = 0}}

Can you print out the following 3 things:
- qpair->cpl[qpair->cq_head - 1]
- qpair->sq_tdbl
- qpair->cq_hdbl

> 
> -Will
> 
> On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of b
> enjamin.walker(a)intel.com> wrote:
> 
> On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:
> > Andrey,
> >  
> > I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found
> > the
> > reason it isn’t returning any completions is because this check is failing at line 469: if (cpl-
> > > status.p != qpair->phase)
> >  
> > Relevant gdb info here:
> > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions
> > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
> > 463   in nvme_qpair.c
> > (gdb) p qpair->cpl[qpair->cq_head]
> > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0,
> > rsvd2
> > = 0, m = 0, dnr = 0}}
> > (gdb) p qpair->phase
> > $12 = 1 '\001'
> >  
> > What does this mean?  Does this information help at all?
> 
> The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses -
> and a set of head and tail indices. The arrays are circular, so you can loop back around to the
> beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first pass
> through the array, new entries in the queue are marked by setting their phase to 1. On the next
> pass
> through the array, new elements are marked by setting their phase bit to 0, etc. The current
> iteration's expected phase value is stored in qpair->phase. So the code you are looking at on
> lines
> 467-470 is basically saying:
> 
> 1) Grab the completion entry at the head of the queue
> 2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit
> 
> All that means is that there are no completions outstanding for any commands according to the SSD,
> which doesn't narrow much down. At the point where you are broken in, can you just dump out the
> whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are any
> commands actually pending at the device and what state the device is in. My expectation is that
> there aren't any commands outstanding.
> 
> Can you also provide some background as to what kind of I/O you're submitting (read or write,
> size,
> queue depth, etc.) when this occurs?
> 
> > -Will
> >  
> > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Date: Wednesday, July 6, 2016 at 2:35 PM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> > 
> >  
> > 
> >  
> > 
> > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:
> > 
> > Andrey,
> > That sounds exactly like what we are experiencing, however we’re working off the spdk codebase
> > that was current as of last week and are still experiencing the issue.  Do you know what the
> > resource allocation fault was and how we might be able to determine if that is still occurring?
> > 
> > 
> > 
> > I'll take a look at commit log, both SPDK and mine, and will get back to you.
> > 
> >  
> > 
> > Regards,
> > 
> > Andrey
> > 
> > Ben,
> > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth high
> > enough it will fail that assertion, as would be expected.  Whatever other failure we’re
> > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.
> >  
> > Also I performed some tests with the spdk perf tool and was not able to replicate our
> > problem.  It
> > ran fine at various queue depths and core masks.  When the qd was set too high, it failed
> > gracefully with an error message.  This is all as expected.
> >  
> > I’d like to continue down the path of investigating if some resource allocation or something
> > else
> > is failing silently for us.  Any specific ideas?
> >  
> > Thanks!
> > Will
> >  
> > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Date: Wednesday, July 6, 2016 at 12:01 PM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> > 
> > 
> > 
> >  
> > 
> > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:
> > 
> > Hi Will,
> > 
> >  
> > 
> > Since I can't see the code for your application I'd like to try and reproduce the problem with
> > code that I have some visibility into. Are you able to reproduce the problem using our perf tool
> > (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and not
> > SPDK. 
> > 
> > 
> >  
> > 
> > I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the
> > submit call was failing due to the resource allocation fault and neither returning an error nor
> > invoking the callback, but my issue has been fixed in the recent release (I can't recall the
> > actual commit, but there definitely was one dealing exactly with the cause).
> >  
> > 
> >  
> > 
> > Based on the symptoms, my best guess is that your memory pool ran out of request objects. The
> > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the
> > callback. You can check for failure by looking at the return value - see the documentation here.
> > Your application allocates this memory pool up front - all of our examples allocate 8k requests
> > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to handle
> > the maximum number of outstanding requests you plan to have. We recently added a "hello_world"
> > style example for the NVMe driver at
> > https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with tons of
> > comments. One of the comments explains this memory pool in detail.
> > 
> >  
> > 
> > That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at
> > different strategies to clean that up. Let me know what the result of the debugging is and I'll
> > shoot you some more ideas to try if necessary.
> > 
> > 
> >  
> > 
> > Are there any plans regarding the global request pool rework?
> > 
> >  
> > 
> > Regards,
> > 
> > Andrey
> > 
> >  
> > 
> >  
> > 
> > Thanks,
> > 
> > Ben
> > 
> >  
> > 
> > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
> > 
> > Hello,
> > We have written a test application that is utilizing the spdk library to benchmark a set of 3
> > Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio and
> > the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of
> > all
> > drives on our system.
> >  
> > What we have found during our testing is that spdk will sometimes start to silently fail to call
> > the callback passed to spdk_nvme_ns_cmd_read in the following situations:
> > 1.       Testing a single drive and passing in 0 for max_completions to
> > spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing
> > when
> > max_completions was > 0.
> > 2.       Testing all four drives at once will result in one drive failing to receive callbacks,
> > seemingly regardless of what number we pass for max_completions (1 through 128).
> >  
> > Here are other observations we’ve made
> > -When the callbacks fail to be called for a drive, they fail to be called for the remaining
> > duration of the test.
> > -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
> > -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,
> > though it is not a strict correlation.
> >  
> > Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3
> > are
> > on PCI slots on CPU 1.  The master/slave threads are on the same cpu socket as the nvme
> > device
> > they are talking to.
> >  
> > We’d like to know what is causing this issue and what we can do to help investigate the
> > problem. 
> > What other information can we provide?  Is there some part of the spdk code that we can look at
> > to
> > help determine the cause?
> >  
> > Thanks,
> > Will
> >  
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> > 
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> > 
> > --
> > 
> > Regards,
> > Andrey
> > 
> > 
> > 
> > 
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 21:21 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-06 21:21 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 10962 bytes --]

Ben,

I have an int that I increment immediately before each spdk_nvme_ns_cmd_read() call to keep track of the number of submitted reads, and another int that I increment by the return value of spdk_nvme_qpair_process_completions(). When the failure occurs, the number of submitted reads equals the number of completed reads + 256. To me this is strong evidence that the drive should actually have 256 completions to process, yet the qpair seems to indicate there are no completions left.
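
Roughly, that bookkeeping would look like the sketch below (placeholder names only; read_cb,
ns, qpair, buf and lba stand in for whatever the real harness uses):

#include <assert.h>
#include "spdk/nvme.h"

static uint64_t g_submitted, g_completed;

/* one submit plus one poll pass, with the two counters described above */
static void
submit_and_poll(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
                void *buf, uint64_t lba, spdk_nvme_cmd_cb read_cb)
{
        g_submitted++;                    /* counted just before the submit */
        int rc = spdk_nvme_ns_cmd_read(ns, qpair, buf, lba, 1,
                                        read_cb, NULL, 0);
        assert(rc == 0);                  /* the ASSERT mentioned earlier in the thread */

        int32_t n = spdk_nvme_qpair_process_completions(qpair, 0);
        assert(n >= 0);
        g_completed += n;                 /* counted from the return value */

        /* in the failing runs, g_submitted stays stuck at g_completed + 256 */
}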

-Will

On 7/6/16, 4:00 PM, "SPDK on behalf of Will Del Genio" <spdk-bounces(a)lists.01.org on behalf of wdelgenio(a)xeograph.com> wrote:

Ben,

Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue depth 256.  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more completions to handle, then I will recheck the code I wrote to keep track of the number of outstanding read requests.

Here is the qpair:
(gdb) p *qpair
$1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl = 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr = 0x7fffdaa5d000,
  queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries = 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,
  qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70}, cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
(gdb) p qpair->phase
$2 = 0 '\000'
(gdb) p qpair->cpl[qpair->cq_head]
$3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}

-Will

On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:

On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:
> Andrey,
>  
> I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found the
> reason is isn’t returning any completions is because this check is failing at line 469: if (cpl-
> >status.p != qpair->phase)
>  
> Relevant gdb info here:
> Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions
> (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
> 463   in nvme_qpair.c
> (gdb) p qpair->cpl[qpair->cq_head]
> $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0, rsvd2
> = 0, m = 0, dnr = 0}}
> (gdb) p qpair->phase
> $12 = 1 '\001'
>  
> What does this mean?  Does this information help at all?

The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses -
and a set of head and tail indices. The arrays are circular, so you can loop back around to the
beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first pass
through the array, new entries in the queue are marked by setting their phase to 1. On the next pass
through the array, new elements are marked by setting their phase bit to 0, etc. The current
iteration's expected phase value is stored in qpair->phase. So the code you are looking at on lines
467-470 is basically saying:

1) Grab the completion entry at the head of the queue
2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit

All that means is that there are no completions outstanding for any commands according to the SSD,
which doesn't narrow much down. At the point where you are broken in, can you just dump out the
whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are any
commands actually pending at the device and what state the device is in. My expectation is that
there aren't any commands outstanding.

Can you also provide some background as to what kind of I/O you're submitting (read or write, size,
queue depth, etc.) when this occurs?

> -Will
>  
> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Date: Wednesday, July 6, 2016 at 2:35 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> 
>  
> 
>  
> 
> On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:
> 
> Andrey,
> That sounds exactly like what we are experiencing, however we’re working off the spdk codebase
> that was current as of last week and are still experiencing the issue.  Do you know what the
> resource allocation fault was and how we might be able to determine if that is still occurring?
> 
> 
> 
> I'll take a look at commit log, both SPDK and mine, and will get back to you.
> 
>  
> 
> Regards,
> 
> Andrey
> 
> Ben,
> We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth high
> enough it will fail that assertion, as would be expected.  Whatever other failure we’re
> experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.
>  
> Also I performed some tests with the spdk perf tool and was not able to replicate our problem.  It
> ran fine at various queue depths and core masks.  When the qd was set too high, it failed
> gracefully with an error message.  This is all as expected.
>  
> I’d like to continue down the path of investigating if some resource allocation or something else
> is failing silently for us.  Any specific ideas?
>  
> Thanks!
> Will
>  
> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Date: Wednesday, July 6, 2016 at 12:01 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> 
> 
> 
>  
> 
> On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:
> 
> Hi Will,
> 
>  
> 
> Since I can't see the code for your application I'd like to try and reproduce the problem with
> code that I have some visibility into. Are you able to reproduce the problem using our perf tool
> (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and not
> SPDK. 
> 
> 
>  
> 
> I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the
> submit call was failing due to the resource allocation fault and neither returning an error nor
> invoking the callback, but my issue has been fixed in the recent release (I can't recall the
> actual commit, but there definitely was one dealing exactly with the cause).
>  
> 
>  
> 
> Based on the symptoms, my best guess is that your memory pool ran out of request objects. The
> first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the
> callback. You can check for failure by looking at the return value - see the documentation here.
> Your application allocates this memory pool up front - all of our examples allocate 8k requests
> (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to handle
> the maximum number of outstanding requests you plan to have. We recently added a "hello_world"
> style example for the NVMe driver at
> https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with tons of
> comments. One of the comments explains this memory pool in detail.
> 
>  
> 
> That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at
> different strategies to clean that up. Let me know what the result of the debugging is and I'll
> shoot you some more ideas to try if necessary.
> 
> 
>  
> 
> Are there any plans regarding the global request pool rework?
> 
>  
> 
> Regards,
> 
> Andrey
> 
>  
> 
>  
> 
> Thanks,
> 
> Ben
> 
>  
> 
> On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
> 
> Hello,
> We have written a test application that is utilizing the spdk library to benchmark a set of 3
> Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio and
> the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of all
> drives on our system.
>  
> What we have found during our testing is that spdk will sometimes start to silently fail to call
> the callback passed to spdk_nvme_ns_cmd_read in the following situations:
> 1.       Testing a single drive and passing in 0 for max_completions to
> spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing when
> max_completions was > 0.
> 2.       Testing all four drives at once will result in one drive failing to receive callbacks,
> seemingly regardless of what number we pass for max_completions (1 through 128).
>  
> Here are other observations we’ve made
> -When the callbacks fail to be called for a drive, they fail to be called for the remaining
> duration of the test.
> -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
> -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,
> though it is not a strict correlation.
>  
> Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3 are
> on PCI slots on CPU 1.  The master/slave threads are on the same cpu socket as the nvme device
> they are talking to.
>  
> We’d like to know what is causing this issue and what we can do to help investigate the problem. 
> What other information can we provide?  Is there some part of the spdk code that we can look at to
> help determine the cause?
>  
> Thanks,
> Will
>  
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> 
>  
> 
> 
> 
> 
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> --
> 
> Regards,
> Andrey
> 
> 
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 21:00 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-06 21:00 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 10177 bytes --]

Ben,

Thanks, you explained that very well.  I’m working with a random 4k read only workload of queue depth 256.  I’m using 4 drives and one thread per drive.  If it’s true that there are just no more completions to handle, then I will recheck the code I wrote to keep track of the number of outstanding read requests.

Here is the qpair:
(gdb) p *qpair
$1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c, cmd = 0x7fffdaae0000, cpl = 0x7fffdaade000, free_tr = {lh_first = 0x0}, outstanding_tr = {lh_first = 0x7fffdaad8000}, tr = 0x7fffdaa5d000,
  queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40}, id = 1, num_entries = 256, sq_tail = 249, cq_head = 121, phase = 0 '\000', is_enabled = true, sq_in_cmb = false,
  qprio = 0 '\000', ctrlr = 0x7fffddca6740, tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70}, cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
(gdb) p qpair->phase
$2 = 0 '\000'
(gdb) p qpair->cpl[qpair->cq_head]
$3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112, status = {p = 1, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}

-Will

On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:

On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:
> Andrey,
>  
> I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found the
> reason is isn’t returning any completions is because this check is failing at line 469: if (cpl-
> >status.p != qpair->phase)
>  
> Relevant gdb info here:
> Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions
> (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
> 463   in nvme_qpair.c
> (gdb) p qpair->cpl[qpair->cq_head]
> $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0, rsvd2
> = 0, m = 0, dnr = 0}}
> (gdb) p qpair->phase
> $12 = 1 '\001'
>  
> What does this mean?  Does this information help at all?

The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses -
and a set of head and tail indices. The arrays are circular, so you can loop back around to the
beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first pass
through the array, new entries in the queue are marked by setting their phase to 1. On the next pass
through the array, new elements are marked by setting their phase bit to 0, etc. The current
iteration's expected phase value is stored in qpair->phase. So the code you are looking at on lines
467-470 is basically saying:

1) Grab the completion entry at the head of the queue
2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit

All that means is that there are no completions outstanding for any commands according to the SSD,
which doesn't narrow much down. At the point where you are broken in, can you just dump out the
whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are any
commands actually pending at the device and what state the device is in. My expectation is that
there aren't any commands outstanding.

Can you also provide some background as to what kind of I/O you're submitting (read or write, size,
queue depth, etc.) when this occurs?

> -Will
>  
> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Date: Wednesday, July 6, 2016 at 2:35 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> 
>  
> 
>  
> 
> On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:
> 
> Andrey,
> That sounds exactly like what we are experiencing, however we’re working off the spdk codebase
> that was current as of last week and are still experiencing the issue.  Do you know what the
> resource allocation fault was and how we might be able to determine if that is still occurring?
> 
> 
> 
> I'll take a look at commit log, both SPDK and mine, and will get back to you.
> 
>  
> 
> Regards,
> 
> Andrey
> 
> Ben,
> We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth high
> enough it will fail that assertion, as would be expected.  Whatever other failure we’re
> experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.
>  
> Also I performed some tests with the spdk perf tool and was not able to replicate our problem.  It
> ran fine at various queue depths and core masks.  When the qd was set too high, it failed
> gracefully with an error message.  This is all as expected.
>  
> I’d like to continue down the path of investigating if some resource allocation or something else
> is failing silently for us.  Any specific ideas?
>  
> Thanks!
> Will
>  
> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Date: Wednesday, July 6, 2016 at 12:01 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> 
> 
> 
>  
> 
> On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:
> 
> Hi Will,
> 
>  
> 
> Since I can't see the code for your application I'd like to try and reproduce the problem with
> code that I have some visibility into. Are you able to reproduce the problem using our perf tool
> (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and not
> SPDK. 
> 
> 
>  
> 
> I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the
> submit call was failing due to the resource allocation fault and neither returning an error nor
> invoking the callback, but my issue has been fixed in the recent release (I can't recall the
> actual commit, but there definitely was one dealing exactly with the cause).
>  
> 
>  
> 
> Based on the symptoms, my best guess is that your memory pool ran out of request objects. The
> first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the
> callback. You can check for failure by looking at the return value - see the documentation here.
> Your application allocates this memory pool up front - all of our examples allocate 8k requests
> (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to handle
> the maximum number of outstanding requests you plan to have. We recently added a "hello_world"
> style example for the NVMe driver at
> https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with tons of
> comments. One of the comments explains this memory pool in detail.
> 
>  
> 
> That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at
> different strategies to clean that up. Let me know what the result of the debugging is and I'll
> shoot you some more ideas to try if necessary.
> 
> 
>  
> 
> Are there any plans regarding the global request pool rework?
> 
>  
> 
> Regards,
> 
> Andrey
> 
>  
> 
>  
> 
> Thanks,
> 
> Ben
> 
>  
> 
> On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
> 
> Hello,
> We have written a test application that is utilizing the spdk library to benchmark a set of 3
> Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio and
> the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of all
> drives on our system.
>  
> What we have found during our testing is that spdk will sometimes start to silently fail to call
> the callback passed to spdk_nvme_ns_cmd_read in the following situations:
> 1.       Testing a single drive and passing in 0 for max_completions to
> spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing when
> max_completions was > 0.
> 2.       Testing all four drives at once will result in one drive failing to receive callbacks,
> seemingly regardless of what number we pass for max_completions (1 through 128).
>  
> Here are other observations we’ve made
> -When the callbacks fail to be called for a drive, they fail to be called for the remaining
> duration of the test.
> -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
> -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,
> though it is not a strict correlation.
>  
> Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3 are
> on PCI slots on CPU 1.  The master/slave threads are on the same cpu socket as the nvme device
> they are talking to.
>  
> We’d like to know what is causing this issue and what we can do to help investigate the problem. 
> What other information can we provide?  Is there some part of the spdk code that we can look at to
> help determine the cause?
>  
> Thanks,
> Will
>  
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> 
>  
> 
> 
> 
> 
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> --
> 
> Regards,
> Andrey
> 
> 
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 20:50 Walker, Benjamin
  0 siblings, 0 replies; 18+ messages in thread
From: Walker, Benjamin @ 2016-07-06 20:50 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 8848 bytes --]

On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:
> Andrey,
>  
> I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found the
> reason is isn’t returning any completions is because this check is failing at line 469: if (cpl-
> >status.p != qpair->phase)
>  
> Relevant gdb info here:
> Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions
> (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
> 463   in nvme_qpair.c
> (gdb) p qpair->cpl[qpair->cq_head]
> $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0, rsvd2
> = 0, m = 0, dnr = 0}}
> (gdb) p qpair->phase
> $12 = 1 '\001'
>  
> What does this mean?  Does this information help at all?

The NVMe hardware queue pairs consist of two arrays - one of commands and the other of responses -
and a set of head and tail indices. The arrays are circular, so you can loop back around to the
beginning. Each entry in the array contains a phase bit, which is either 1 or 0. On the first pass
through the array, new entries in the queue are marked by setting their phase to 1. On the next pass
through the array, new elements are marked by setting their phase bit to 0, etc. The current
iteration's expected phase value is stored in qpair->phase. So the code you are looking at on lines
467-470 is basically saying:

1) Grab the completion entry at the head of the queue
2) Check its phase bit. If it hasn't toggled, there is no new completion, so exit
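
As a very rough sketch of that logic (the real loop in nvme_qpair.c also rings the completion
doorbell and honors max_completions, and the queue pair fields below are internal to the driver):

static void
poll_completions_sketch(struct spdk_nvme_qpair *qpair)
{
        for (;;) {
                struct spdk_nvme_cpl *cpl = &qpair->cpl[qpair->cq_head];

                if (cpl->status.p != qpair->phase) {
                        break;                /* phase bit not toggled: nothing new */
                }

                /* ... look up the tracker for cpl->cid and invoke its callback ... */

                if (++qpair->cq_head == qpair->num_entries) {
                        qpair->cq_head = 0;           /* wrap around the ring */
                        qpair->phase = !qpair->phase; /* expect the opposite phase next pass */
                }
        }
}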

All that means is that there are no completions outstanding for any commands according to the SSD,
which doesn't narrow much down. At the point where you are broken in, can you just dump out the
whole qpair structure? Something like "p *qpair" should do it. That way I can see if there are any
commands actually pending at the device and what state the device is in. My expectation is that
there aren't any commands outstanding.

Can you also provide some background as to what kind of I/O you're submitting (read or write, size,
queue depth, etc.) when this occurs?

> -Will
>  
> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Date: Wednesday, July 6, 2016 at 2:35 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> 
>  
> 
>  
> 
> On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:
> 
> Andrey,
> That sounds exactly like what we are experiencing, however we’re working off the spdk codebase
> that was current as of last week and are still experiencing the issue.  Do you know what the
> resource allocation fault was and how we might be able to determine if that is still occurring?
> 
> 
> 
> I'll take a look at commit log, both SPDK and mine, and will get back to you.
> 
>  
> 
> Regards,
> 
> Andrey
> 
> Ben,
> We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth high
> enough it will fail that assertion, as would be expected.  Whatever other failure we’re
> experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.
>  
> Also I performed some tests with the spdk perf tool and was not able to replicate our problem.  It
> ran fine at various queue depths and core masks.  When the qd was set too high, it failed
> gracefully with an error message.  This is all as expected.
>  
> I’d like to continue down the path of investigating if some resource allocation or something else
> is failing silently for us.  Any specific ideas?
>  
> Thanks!
> Will
>  
> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Date: Wednesday, July 6, 2016 at 12:01 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> 
> 
> 
>  
> 
> On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:
> 
> Hi Will,
> 
>  
> 
> Since I can't see the code for your application I'd like to try and reproduce the problem with
> code that I have some visibility into. Are you able to reproduce the problem using our perf tool
> (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and not
> SPDK. 
> 
> 
>  
> 
> I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the
> submit call was failing due to the resource allocation fault and neither returning an error nor
> invoking the callback, but my issue has been fixed in the recent release (I can't recall the
> actual commit, but there definitely was one dealing exactly with the cause).
>  
> 
>  
> 
> Based on the symptoms, my best guess is that your memory pool ran out of request objects. The
> first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the
> callback. You can check for failure by looking at the return value - see the documentation here.
> Your application allocates this memory pool up front - all of our examples allocate 8k requests
> (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to handle
> the maximum number of outstanding requests you plan to have. We recently added a "hello_world"
> style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/hello_
> world with tons of comments. One of the comments explains this memory pool in detail.
> 
>  
> 
> That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at
> different strategies to clean that up. Let me know what the result of the debugging is and I'll
> shoot you some more ideas to try if necessary.
> 
> 
>  
> 
> Are there any plans regarding the global request pool rework?
> 
>  
> 
> Regards,
> 
> Andrey
> 
>  
> 
>  
> 
> Thanks,
> 
> Ben
> 
>  
> 
> On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
> 
> Hello,
> We have written a test application that is utilizing the spdk library to benchmark a set of 3
> Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio and
> the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of all
> drives on our system.
>  
> What we have found during our testing is that spdk will sometimes start to silently fail to call
> the callback passed to spdk_nvme_ns_cmd_read in the following situations:
> 1.       Testing a single drive and passing in 0 for max_completions to
> spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing when
> max_completions was > 0.
> 2.       Testing all four drives at once will result in one drive failing to receive callbacks,
> seemingly regardless of what number we pass for max_completions (1 through 128).
>  
> Here are other observations we’ve made
> -When the callbacks fail to be called for a drive, they fail to be called for the remaining
> duration of the test.
> -The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
> -‘failure’ of a drive seems to be correlated with the number of outstanding read operations,
> though it is not a strict correlation.
>  
> Our system is a dual-socket E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3 are
> on PCI slots on CPU 1.  The master/slave threads are on the same cpu socket as the nvme device
> they are talking to.
>  
> We’d like to know what is causing this issue and what we can do to help investigate the problem. 
> What other information can we provide?  Is there some part of the spdk code that we can look at to
> help determine the cause?
>  
> Thanks,
> Will
>  
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> 
>  
> 
> 
> 
> 
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> 
> --
> 
> Regards,
> Andrey
> 
> 
> 
> 
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 20:33 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-06 20:33 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 7041 bytes --]

Andrey,

I was able to step into the spdk_nvme_qpair_process_completions() function with gdb and found the reason it isn’t returning any completions is that this check is failing at line 469: if (cpl->status.p != qpair->phase)

Relevant gdb info here:
Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
463   in nvme_qpair.c
(gdb) p qpair->cpl[qpair->cq_head]
$11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27, status = {p = 0, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}
(gdb) p qpair->phase
$12 = 1 '\001'

What does this mean?  Does this information help at all?
-Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Wednesday, July 6, 2016 at 2:35 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes


On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com<mailto:wdelgenio(a)xeograph.com>> wrote:
Andrey,
That sounds exactly like what we are experiencing, however we’re working off the spdk codebase that was current as of last week and are still experiencing the issue.  Do you know what the resource allocation fault was and how we might be able to determine if that is still occurring?
I'll take a look at the commit logs, both SPDK's and mine, and will get back to you.

Regards,
Andrey
Ben,
We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth high enough it will fail that assertion, as would be expected.  Whatever other failure we’re experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.

Also I performed some tests with the spdk perf tool and was not able to replicate our problem.  It ran fine at various queue depths and core masks.  When the qd was set too high, it failed gracefully with an error message.  This is all as expected.

I’d like to continue down the path of investigating if some resource allocation or something else is failing silently for us.  Any specific ideas?

Thanks!
Will

From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com<mailto:andrey.v.kuzmin(a)gmail.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Wednesday, July 6, 2016 at 12:01 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com<mailto:benjamin.walker(a)intel.com>> wrote:
Hi Will,

Since I can't see the code for your application I'd like to try and reproduce the problem with code that I have some visibility into. Are you able to reproduce the problem using our perf tool (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and not SPDK.

I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the submit call was failing due to the resource allocation fault and neither returning an error nor invoking the callback, but my issue has been fixed in the recent release (I can't recall the actual commit, but there definitely was one dealing exactly with the cause).


Based on the symptoms, my best guess is that your memory pool ran out of request objects. The first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the callback. You can check for failure by looking at the return value - see the documentation here<http://www.spdk.io/spdk/doc/nvme_8h.html#a084c6ecb53bd810fbb5051100b79bec5>. Your application allocates this memory pool up front - all of our examples allocate 8k requests (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to handle the maximum number of outstanding requests you plan to have. We recently added a "hello_world" style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with tons of comments. One of the comments explains this memory pool in detail.

That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at different strategies to clean that up. Let me know what the result of the debugging is and I'll shoot you some more ideas to try if necessary.

Are there any plans regarding the global request pool rework?

Regards,
Andrey


Thanks,
Ben

On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
Hello,
We have written a test application that is utilizing the spdk library to benchmark a set of 3 Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio and the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of all drives on our system.

What we have found during our testing is that spdk will sometimes start to silently fail to call the callback passed to spdk_nvme_ns_cmd_read in the following situations:
1.       Testing a single drive and passing in 0 for max_completions to spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing when max_completions was > 0.
2.       Testing all four drives at once will result in one drive failing to receive callbacks, seemingly regardless of what number we pass for max_completions (1 through 128).

Here are other observations we’ve made
-When the callbacks fail to be called for a drive, they fail to be called for the remaining duration of the test.
-The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
-‘failure’ of a drive seems to be correlated with the number of outstanding read operations, though it is not a strict correlation.

Our system is a dual-socket E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3 are on PCI slots on CPU 1.  The master/slave threads are on the same cpu socket as the nvme device they are talking to.

We’d like to know what is causing this issue and what we can do to help investigate the problem.  What other information can we provide?  Is there some part of the spdk code that we can look at to help determine the cause?

Thanks,
Will


_______________________________________________

SPDK mailing list

SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>

https://lists.01.org/mailman/listinfo/spdk

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk
--

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 22182 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 17:56 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-06 17:56 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 5465 bytes --]

Andrey,
That sounds exactly like what we are experiencing, however we’re working off the spdk codebase that was current as of last week and are still experiencing the issue.  Do you know what the resource allocation fault was and how we might be able to determine if that is still occurring?

Ben,
We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set our queue depth high enough it will fail that assertion, as would be expected.  Whatever other failure we’re experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to return an error code.

Also I performed some tests with the spdk perf tool and was not able to replicate our problem.  It ran fine at various queue depths and core masks.  When the qd was set too high, it failed gracefully with an error message.  This is all as expected.

I’d like to continue down the path of investigating if some resource allocation or something else is failing silently for us.  Any specific ideas?

Thanks!
Will

From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Wednesday, July 6, 2016 at 12:01 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com<mailto:benjamin.walker(a)intel.com>> wrote:
Hi Will,

Since I can't see the code for your application I'd like to try and reproduce the problem with code that I have some visibility into. Are you able to reproduce the problem using our perf tool (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and not SPDK.

I had been witnessing a similar issue with an earlier SPDK release, back around Feb, where the submit call was failing due to the resource allocation fault and neither returning an error nor invoking the callback, but my issue has been fixed in the recent release (I can't recall the actual commit, but there definitely was one dealing exactly with the cause).


Based on the symptoms, my best guess is that your memory pool ran out of request objects. The first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the callback. You can check for failure by looking at the return value - see the documentation here<http://www.spdk.io/spdk/doc/nvme_8h.html#a084c6ecb53bd810fbb5051100b79bec5>. Your application allocates this memory pool up front - all of our examples allocate 8k requests (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a large enough pool to handle the maximum number of outstanding requests you plan to have. We recently added a "hello_world" style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with tons of comments. One of the comments explains this memory pool in detail.

That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at different strategies to clean that up. Let me know what the result of the debugging is and I'll shoot you some more ideas to try if necessary.

Are there any plans regarding the global request pool rework?

Regards,
Andrey


Thanks,
Ben

On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
Hello,
We have written a test application that is utilizing the spdk library to benchmark a set of 3 Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio and the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of all drives on our system.

What we have found during our testing is that spdk will sometimes start to silently fail to call the callback passed to spdk_nvme_ns_cmd_read in the following situations:
1.       Testing a single drive and passing in 0 for max_completions to spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing when max_completions was > 0.
2.       Testing all four drives at once will result in one drive failing to receive callbacks, seemingly regardless of what number we pass for max_completions (1 through 128).

Here are other observations we’ve made
-When the callbacks fail to be called for a drive, they fail to be called for the remaining duration of the test.
-The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
-‘failure’ of a drive seems to be correlated with the number of outstanding read operations, though it is not a strict correlation.

Our system is a dual-socket E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3 are on PCI slots on CPU 1.  The master/slave threads are on the same cpu socket as the nvme device they are talking to.

We’d like to know what is causing this issue and what we can do to help investigate the problem.  What other information can we provide?  Is there some part of the spdk code that we can look at to help determine the cause?

Thanks,
Will


_______________________________________________

SPDK mailing list

SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>

https://lists.01.org/mailman/listinfo/spdk

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 15107 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 17:01 Andrey Kuzmin
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Kuzmin @ 2016-07-06 17:01 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4244 bytes --]

On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com>
wrote:

> Hi Will,
>
> Since I can't see the code for your application I'd like to try and
> reproduce the problem with code that I have some visibility into. Are you
> able to reproduce the problem using our perf tool (examples/nvme/perf)? If
> you aren't, this is likely a problem with your test application and not
> SPDK.
>

I had been witnessing a similar issue with an earlier SPDK release, back
around Feb, where the submit call was failing due to the resource
allocation fault and neither returning an error nor invoking the callback,
but my issue has been fixed in the recent release (I can't recall the
actual commit, but there definitely was one dealing exactly with the cause).


>
> Based on the symptoms, my best guess is that your memory pool ran out of
> request objects. The first thing to check is whether spdk_nvme_ns_cmd_read
> failed. If it fails, it won't call the callback. You can check for failure
> by looking at the return value - see the documentation here
> <http://www.spdk.io/spdk/doc/nvme_8h.html#a084c6ecb53bd810fbb5051100b79bec5>.
> Your application allocates this memory pool up front - all of our examples
> allocate 8k requests (see line 1097 in examples/nvme/perf/perf.c) You need
> to allocate a large enough pool to handle the maximum number of outstanding
> requests you plan to have. We recently added a "hello_world" style example
> for the NVMe driver at
> https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with
> tons of comments. One of the comments explains this memory pool in detail.
>
> That memory pool allocation is a bit of a wart on our otherwise clean API.
> We're looking at different strategies to clean that up. Let me know what
> the result of the debugging is and I'll shoot you some more ideas to try if
> necessary.
>

Are there any plans regarding the global request pool rework?

Regards,
Andrey


>
> Thanks,
> Ben
>
> On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
>
> Hello,
>
> We have written a test application that is utilizing the spdk library to
> benchmark a set of 3 Intel P3700 drives and a single 750 drive
> (concurrently).  We’ve done some testing using fio and the kernel nvme
> drivers and have had no problem achieving the claimed IOPs (4k random read)
> of all drives on our system.
>
>
>
> What we have found during our testing is that spdk will sometimes start to
> silently fail to call the callback passed to spdk_nvme_ns_cmd_read in the
> following situations:
>
> 1.       Testing a single drive and passing in 0 for max_completions to
> spdk_nvme_qpair_process_completions().  We haven’t seen any issues with
> single drive testing when max_completions was > 0.
>
> 2.       Testing all four drives at once will result in one drive failing
> to receive callbacks, seemingly regardless of what number we pass for
> max_completions (1 through 128).
>
>
>
> Here are other observations we’ve made
>
> -When the callbacks fail to be called for a drive, they fail to be called
> for the remaining duration of the test.
>
> -The drive that ‘fails’ when testing 4 drives concurrently varies from
> test to test.
>
> -‘failure’ of a drive seems to be correlated with the number of
> outstanding read operations, though it is not a strict correlation.
>
>
>
> Our system is a dual-socket E5-2630 v3.  One drive is on a PCI slot for
> CPU 0 and the other 3 are on PCI slots on CPU 1.  The master/slave threads
> are on the same cpu socket as the nvme device they are talking to.
>
>
>
> We’d like to know what is causing this issue and what we can do to help
> investigate the problem.  What other information can we provide?  Is there
> some part of the spdk code that we can look at to help determine the cause?
>
>
>
> Thanks,
>
> Will
>
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 8533 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 15:35 Walker, Benjamin
  0 siblings, 0 replies; 18+ messages in thread
From: Walker, Benjamin @ 2016-07-06 15:35 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3349 bytes --]

Hi Will,

Since I can't see the code for your application I'd like to try and reproduce the problem with code that I have some visibility into. Are you able to reproduce the problem using our perf tool (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and not SPDK.

Based on the symptoms, my best guess is that your memory pool ran out of request objects. The first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the callback. You can check for failure by looking at the return value - see the documentation here<http://www.spdk.io/spdk/doc/nvme_8h.html#a084c6ecb53bd810fbb5051100b79bec5>. Your application allocates this memory pool up front - all of our examples allocate 8k requests (see line 1097 in examples/nvme/perf/perf.c). You need to allocate a large enough pool to handle the maximum number of outstanding requests you plan to have. We recently added a "hello_world" style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with tons of comments. One of the comments explains this memory pool in detail.
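
As a concrete illustration, a submit path that accounts for this might look like the sketch below
(helper and variable names are made up; double-check the exact spdk_nvme_ns_cmd_read() parameter
list against your copy of spdk/nvme.h):

#include <stdio.h>
#include <stdint.h>
#include "spdk/nvme.h"

/* Submit one read and only count it as outstanding if the driver accepted it. */
static int submit_read(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
                       void *buf, uint64_t lba, uint32_t lba_count,
                       spdk_nvme_cmd_cb cb_fn, void *cb_arg, uint64_t *outstanding)
{
    int rc = spdk_nvme_ns_cmd_read(ns, qpair, buf, lba, lba_count, cb_fn, cb_arg, 0);

    if (rc != 0) {
        /*
         * Submission failed, e.g. because no request objects were left in the pool.
         * The callback will never fire for this I/O, so it must not be counted as
         * outstanding; drain some completions and retry, or report the error.
         */
        fprintf(stderr, "spdk_nvme_ns_cmd_read() failed: %d\n", rc);
        return rc;
    }

    (*outstanding)++;
    return 0;
}

The key point is that a non-zero return means the I/O was never queued, so a loop that waits for a
fixed number of callbacks will wait forever if it counts rejected submissions.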

That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at different strategies to clean that up. Let me know what the result of the debugging is and I'll shoot you some more ideas to try if necessary.

Thanks,
Ben

On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
Hello,
We have written a test application that is utilizing the spdk library to benchmark a set of 3 Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio and the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of all drives on our system.

What we have found during our testing is that spdk will sometimes start to silently fail to call the callback passed to spdk_nvme_ns_cmd_read in the following situations:
1.       Testing a single drive and passing in 0 for max_completions to spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing when max_completions was > 0.
2.       Testing all four drives at once will result in one drive failing to receive callbacks, seemingly regardless of what number we pass for max_completions (1 through 128).

Here are other observations we’ve made
-When the callbacks fail to be called for a drive, they fail to be called for the remaining duration of the test.
-The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
-‘failure’ of a drive seems to be correlated with the number of outstanding read operations, though it is not a strict correlation.

Our system is a dual-socket E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3 are on PCI slots on CPU 1.  The master/slave threads are on the same cpu socket as the nvme device they are talking to.

We’d like to know what is causing this issue and what we can do to help investigate the problem.  What other information can we provide?  Is there some part of the spdk code that we can look at to help determine the cause?

Thanks,
Will


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk


[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 9524 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-05 21:03 Will Del Genio
  0 siblings, 0 replies; 18+ messages in thread
From: Will Del Genio @ 2016-07-05 21:03 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1728 bytes --]

Hello,
We have written a test application that is utilizing the spdk library to benchmark a set of 3 Intel P3700 drives and a single 750 drive (concurrently).  We’ve done some testing using fio and the kernel nvme drivers and have had no problem achieving the claimed IOPs (4k random read) of all drives on our system.

What we have found during our testing is that spdk will sometimes start to silently fail to call the callback passed to spdk_nvme_ns_cmd_read in the following situations:
1.       Testing a single drive and passing in 0 for max_completions to spdk_nvme_qpair_process_completions().  We haven’t seen any issues with single drive testing when max_completions was > 0.
2.       Testing all four drives at once will result in one drive failing to receive callbacks, seemingly regardless of what number we pass for max_completions (1 through 128).

Here are other observations we’ve made
-When the callbacks fail to be called for a drive, they fail to be called for the remaining duration of the test.
-The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
-‘failure’ of a drive seems to be correlated with the number of outstanding read operations, though it is not a strict correlation.

Our system is a dual-socket E5-2630 v3.  One drive is on a PCI slot for CPU 0 and the other 3 are on PCI slots on CPU 1.  The master/slave threads are on the same cpu socket as the nvme device they are talking to.

We’d like to know what is causing this issue and what we can do to help investigate the problem.  What other information can we provide?  Is there some part of the spdk code that we can look at to help determine the cause?
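
For context, the per-drive submit/poll pattern in question looks roughly like the following (a
simplified sketch with made-up names, not our actual code; it also reuses one buffer for every
I/O, which real code would not do):

#include <stdbool.h>
#include <stdint.h>
#include "spdk/nvme.h"

static uint64_t g_outstanding;   /* decremented by the completion callback */

static void read_done(void *cb_arg, const struct spdk_nvme_cpl *cpl)
{
    (void)cb_arg;
    (void)cpl;
    g_outstanding--;             /* this is the callback that stops arriving */
}

static void io_loop(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
                    void *buf, uint32_t lba_count, uint32_t queue_depth,
                    volatile bool *done)
{
    uint64_t lba = 0;

    while (!*done) {
        /* Keep the queue full, but never count an I/O the driver refused. */
        while (g_outstanding < queue_depth) {
            if (spdk_nvme_ns_cmd_read(ns, qpair, buf, lba, lba_count,
                                      read_done, NULL, 0) != 0) {
                break;
            }
            g_outstanding++;
            lba += lba_count;
        }

        /* max_completions == 0 asks the driver to process everything available. */
        spdk_nvme_qpair_process_completions(qpair, 0);
    }
}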

Thanks,
Will


[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 7538 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-07-12 14:38 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-06 19:35 [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes Andrey Kuzmin
  -- strict thread matches above, loose matches on Subject: below --
2016-07-12 14:38 Will Del Genio
2016-07-07 21:03 Walker, Benjamin
2016-07-07 18:45 Will Del Genio
2016-07-07 15:36 Will Del Genio
2016-07-07 15:09 Will Del Genio
2016-07-06 23:23 Walker, Benjamin
2016-07-06 22:56 Will Del Genio
2016-07-06 21:47 Andrey Kuzmin
2016-07-06 21:40 Walker, Benjamin
2016-07-06 21:21 Will Del Genio
2016-07-06 21:00 Will Del Genio
2016-07-06 20:50 Walker, Benjamin
2016-07-06 20:33 Will Del Genio
2016-07-06 17:56 Will Del Genio
2016-07-06 17:01 Andrey Kuzmin
2016-07-06 15:35 Walker, Benjamin
2016-07-05 21:03 Will Del Genio

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.