Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes

* Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
@ 2016-07-06 19:35 Andrey Kuzmin
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Kuzmin @ 2016-07-06 19:35 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6183 bytes --]

On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:

> Andrey,
>
> That sounds exactly like what we are experiencing, however we’re working
> off the spdk codebase that was current as of last week and are still
> experiencing the issue.  Do you know what the resource allocation fault was
> and how we might be able to determine if that is still occurring?
>
I'll take a look at commit log, both SPDK and mine, and will get back to
you.

Regards,
Andrey

> Ben,
>
> We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I set
> our queue depth high enough it will fail that assertion, as would be
> expected.  Whatever other failure we’re experiencing does not seem to be
> causing spdk_nvme_ns_cmd_read() to return an error code.
>
>
>
> Also I performed some tests with the spdk perf tool and was not able to
> replicate our problem.  It ran fine at various queue depths and core
> masks.  When the qd was set too high, it failed gracefully with an error
> message.  This is all as expected.
>
>
>
> I’d like to continue down the path of investigating if some resource
> allocation or something else is failing silently for us.  Any specific
> ideas?
>
>
>
> Thanks!
>
> Will
>
>
>
> *From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <
> andrey.v.kuzmin(a)gmail.com>
> *Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Date: *Wednesday, July 6, 2016 at 12:01 PM
> *To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject: *Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being
> called sometimes
>
>
>
> On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <
> benjamin.walker(a)intel.com> wrote:
>
> Hi Will,
>
>
>
> Since I can't see the code for your application I'd like to try and
> reproduce the problem with code that I have some visibility into. Are you
> able to reproduce the problem using our perf tool (examples/nvme/perf)? If
> you aren't, this is likely a problem with your test application and not
> SPDK.
>
>
>
> I had been witnessing a similar issue with an earlier SPDK release, back
> around Feb, where the submit call was failing due to the resource
> allocation fault and neither returning an error nor invoking the callback,
> but my issue has been fixed in the recent release (I can't recall the
> actual commit, but there definitely was one dealing exactly with the
> cause).
>
>
>
>
>
> Based on the symptoms, my best guess is that your memory pool ran out of
> request objects. The first thing to check is whether spdk_nvme_ns_cmd_read
> failed. If it fails, it won't call the callback. You can check for failure
> by looking at the return value - see the documentation here
> <http://www.spdk.io/spdk/doc/nvme_8h.html#a084c6ecb53bd810fbb5051100b79bec5>.
> Your application allocates this memory pool up front - all of our examples
> allocate 8k requests (see line 1097 in examples/nvme/perf/perf.c) You need
> to allocate a large enough pool to handle the maximum number of outstanding
> requests you plan to have. We recently added a "hello_world" style example
> for the NVMe driver at
> https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with
> tons of comments. One of the comments explains this memory pool in detail.
>
>
>
> That memory pool allocation is a bit of a wart on our otherwise clean API.
> We're looking at different strategies to clean that up. Let me know what
> the result of the debugging is and I'll shoot you some more ideas to try if
> necessary.
>
>
>
> Are there any plans regarding the global request pool rework?
>
>
>
> Regards,
>
> Andrey
>
>
>
>
>
> Thanks,
>
> Ben
>
>
>
> On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
>
> Hello,
>
> We have written a test application that is utilizing the spdk library to
> benchmark a set of 3 Intel P3700 drives and a single 750 drive
> (concurrently).  We’ve done some testing using fio and the kernel nvme
> drivers and have had no problem achieving the claimed IOPs (4k random read)
> of all drives on our system.
>
>
>
> What we have found during our testing is that spdk will sometimes start to
> silently fail to call the callback passed to spdk_nvme_ns_cmd_read in the
> following situations:
>
> 1.       Testing a single drive and passing in 0 for max_completions to
> spdk_nvme_qpair_process_completions().  We haven’t seen any issues with
> single drive testing when max_completions was > 0.
>
> 2.       Testing all four drives at once will result in one drive failing
> to receive callbacks, seemingly regardless of what number we pass for
> max_completions (1 through 128).
>
>
>
> Here are other observations we’ve made
>
> -When the callbacks fail to be called for a drive, they fail to be called
> for the remaining duration of the test.
>
> -The drive that ‘fails’ when testing 4 drives concurrently varies from
> test to test.
>
> -‘failure’ of a drive seems to be correlated with the number of
> outstanding read operations, though it is not a strict correlation.
>
>
>
> Our system is a dual socket  E5-2630 v3.  One drive is on a PCI slot for
> CPU 0 and the other 3 are on PCI slots on CPU 1.  The master/slave threads
> are on the the same cpu socket as the nvme device they are talking to.
>
>
>
> We’d like to know what is causing this issue and what we can do to help
> investigate the problem.  What other information can we provide?  Is there
> some part of the spdk code that we can look at to help determine the cause?
>
>
>
> Thanks,
>
> Will
>
>
>
> _______________________________________________
>
> SPDK mailing list
>
> SPDK(a)lists.01.org
>
> https://lists.01.org/mailman/listinfo/spdk
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 

Regards,
Andrey

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 13481 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread