From: Andrey Kuzmin <andrey.v.kuzmin at gmail.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
Date: Wed, 06 Jul 2016 21:47:05 +0000
Message-ID: <CANvN+enZPuZL0Xk+AVhwEemkTeANr4ef7nMyA=eyVXAhvC6R+w@mail.gmail.com>
In-Reply-To: 1467841201.5999.222.camel@intel.com


Will,

The issue I was referring to was resolved by the following commit:
https://github.com/spdk/spdk/commit/eb555b139e56b3c0b7f508906bcfde5bda844592

On Thu, Jul 7, 2016, 00:40 Walker, Benjamin <benjamin.walker(a)intel.com> wrote:

> On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote:
> > Ben,
> >
> > Thanks, you explained that very well.  I’m working with a random 4k
> > read-only workload at queue depth 256.
>
> Can you try capping your queue depth at 128? That's the maximum number of
> I/Os we allow outstanding at the hardware level. The NVMe driver should be
> queueing anything beyond that in software automatically, but this data
> point will help narrow down the problem.
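>
> For illustration, capping in-flight I/O on the application side can be as
> simple as the sketch below. The names MAX_QD, outstanding, and
> submit_one_read are made up for the example; only spdk_nvme_ns_cmd_read and
> the completion callback type are SPDK API.
>
>     #include <stdint.h>
>     #include <spdk/nvme.h>
>
>     #define MAX_QD 128              /* stay at or below the hardware queue depth */
>
>     static uint32_t outstanding;    /* reads submitted but not yet completed */
>
>     static void read_done(void *cb_arg, const struct spdk_nvme_cpl *cpl)
>     {
>         outstanding--;              /* completion callback: one less in flight */
>     }
>
>     static void submit_one_read(struct spdk_nvme_ns *ns,
>                                 struct spdk_nvme_qpair *qpair,
>                                 void *buf, uint64_t lba, uint32_t lba_count)
>     {
>         if (outstanding >= MAX_QD) {
>             return;                 /* at the cap: wait for completions first */
>         }
>         /* a failed submission never invokes the callback, so only count
>          * successful submissions */
>         if (spdk_nvme_ns_cmd_read(ns, qpair, buf, lba, lba_count,
>                                   read_done, NULL, 0) == 0) {
>             outstanding++;
>         }
>     }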
>
> > I’m using 4 drives and one thread per drive.  If it’s true that there are
> > just no more completions to handle, then I will recheck the code I wrote
> > to keep track of the number of outstanding read requests.
> >
> > Here is the qpair:
> > (gdb) p *qpair
> > $1 = {sq_tdbl = 0x7ffff7ff2008, cq_hdbl = 0x7ffff7ff200c,
> >   cmd = 0x7fffdaae0000, cpl = 0x7fffdaade000, free_tr = {lh_first = 0x0},
> >   outstanding_tr = {lh_first = 0x7fffdaad8000}, tr = 0x7fffdaa5d000,
> >   queued_req = {stqh_first = 0x7fffea9a1780, stqh_last = 0x7fffea9cdf40},
> >   id = 1, num_entries = 256, sq_tail = 249, cq_head = 121,
> >   phase = 0 '\000', is_enabled = true, sq_in_cmb = false,
>
> Note how sq_tail - cq_head is 128 (249 - 121), meaning the driver believes
> there are 128 commands outstanding. The driver's view of the world
> (commands outstanding) doesn't line up with us not getting any NVMe
> completions - there is definitely a problem here.
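>
> With wraparound of the circular queue taken into account, that bookkeeping
> is just the following - shown purely as an illustration of the arithmetic,
> not driver code:
>
>     uint16_t sq_tail = 249, cq_head = 121, num_entries = 256;  /* values from the dump */
>     /* commands the driver believes are still in flight on this qpair */
>     uint16_t inflight = (sq_tail - cq_head + num_entries) % num_entries;   /* == 128 */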
>
> >   qprio = 0 '\000', ctrlr = 0x7fffddca6740,
> >   tailq = {tqe_next = 0x0, tqe_prev = 0x7fffddca7b70},
> >   cmd_bus_addr = 35430203392, cpl_bus_addr = 35430195200}
> > (gdb) p qpair->phase
> > $2 = 0 '\000'
> > (gdb) p qpair->cpl[qpair->cq_head]
> > $3 = {cdw0 = 0, rsvd1 = 0, sqhd = 132, sqid = 1, cid = 112,
> >   status = {p = 1, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}
>
> Can you print out the following 3 things:
> - qpair->cpl[qpair->cq_head - 1]
> - qpair->sq_tdbl
> - qpair->cq_hdbl
>
> >
> > -Will
> >
> > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin"
> > <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:
> >
> > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote:
> > > Andrey,
> > >
> > > I was able to step into the spdk_nvme_qpair_process_completions()
> > > function with gdb and found that the reason it isn't returning any
> > > completions is that this check is failing at line 469:
> > > if (cpl->status.p != qpair->phase)
> > >
> > > Relevant gdb info here:
> > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_completions
> > > (qpair=0x7ffff0a5baa0, max_completions=0) at nvme_qpair.c:463
> > > 463   in nvme_qpair.c
> > > (gdb) p qpair->cpl[qpair->cq_head]
> > > $11 = {cdw0 = 0, rsvd1 = 0, sqhd = 51, sqid = 1, cid = 27,
> > >   status = {p = 0, sc = 0, sct = 0, rsvd2 = 0, m = 0, dnr = 0}}
> > > (gdb) p qpair->phase
> > > $12 = 1 '\001'
> > >
> > > What does this mean?  Does this information help at all?
> >
> > The NVMe hardware queue pairs consist of two arrays - one of commands and
> > the other of responses - and a set of head and tail indices. The arrays
> > are circular, so you can loop back around to the beginning. Each entry in
> > the array contains a phase bit, which is either 1 or 0. On the first pass
> > through the array, new entries in the queue are marked by setting their
> > phase to 1. On the next pass through the array, new elements are marked
> > by setting their phase bit to 0, and so on. The current iteration's
> > expected phase value is stored in qpair->phase. So the code you are
> > looking at on lines 467-470 is basically saying:
> >
> > 1) Grab the completion entry at the head of the queue
> > 2) Check its phase bit. If it hasn't toggled, there is no new completion,
> >    so exit (see the sketch below)
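> >
> > Roughly, in code (a simplified sketch of that logic, not the exact
> > nvme_qpair.c source):
> >
> >     struct spdk_nvme_cpl *cpl = &qpair->cpl[qpair->cq_head];
> >
> >     if (cpl->status.p != qpair->phase) {
> >         /* The phase bit at cq_head hasn't toggled, so no new completion
> >          * has arrived; there is nothing to process on this call. */
> >         return 0;
> >     }
> >
> >     /* ...otherwise: invoke the callback for the completed command,
> >      * advance cq_head, and flip qpair->phase whenever cq_head wraps
> >      * back around to 0. */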
> >
> > All that means is that there are no completions outstanding for any
> > commands according to the SSD, which doesn't narrow much down. At the
> > point where you are broken in, can you just dump out the whole qpair
> > structure? Something like "p *qpair" should do it. That way I can see if
> > there are any commands actually pending at the device and what state the
> > device is in. My expectation is that there aren't any commands
> > outstanding.
> >
> > Can you also provide some background as to what kind of I/O you're
> > submitting (read or write, size, queue depth, etc.) when this occurs?
> >
> > > -Will
> > >
> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Date: Wednesday, July 6, 2016 at 2:35 PM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> > >
> > > On Wed, Jul 6, 2016, 20:56 Will Del Genio <wdelgenio(a)xeograph.com> wrote:
> > >
> > > Andrey,
> > > That sounds exactly like what we are experiencing; however, we’re working
> > > off the spdk codebase that was current as of last week and are still
> > > experiencing the issue.  Do you know what the resource allocation fault
> > > was and how we might be able to determine if that is still occurring?
> > >
> > >
> > >
> > > I'll take a look at the commit logs, both SPDK's and mine, and will get
> > > back to you.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Andrey
> > >
> > > Ben,
> > > We’re ASSERTing that the result of spdk_nvme_ns_cmd_read() == 0.  If I
> > > set our queue depth high enough it will fail that assertion, as would be
> > > expected.  Whatever other failure we’re experiencing does not seem to be
> > > causing spdk_nvme_ns_cmd_read() to return an error code.
> > >
> > > Also, I performed some tests with the spdk perf tool and was not able to
> > > replicate our problem.  It ran fine at various queue depths and core
> > > masks.  When the qd was set too high, it failed gracefully with an error
> > > message.  This is all as expected.
> > >
> > > I’d like to continue down the path of investigating whether some resource
> > > allocation or something else is failing silently for us.  Any specific
> > > ideas?
> > >
> > > Thanks!
> > > Will
> > >
> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Date: Wednesday, July 6, 2016 at 12:01 PM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
> > >
> > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin <benjamin.walker(a)intel.com> wrote:
> > >
> > > Hi Will,
> > >
> > >
> > >
> > > Since I can't see the code for your application, I'd like to try to
> > > reproduce the problem with code that I have some visibility into. Are
> > > you able to reproduce the problem using our perf tool
> > > (examples/nvme/perf)? If you aren't, this is likely a problem with your
> > > test application and not SPDK.
> > >
> > > I had been witnessing a similar issue with an earlier SPDK release, back
> > > around February, where the submit call was failing due to a resource
> > > allocation fault and neither returning an error nor invoking the
> > > callback, but my issue has been fixed in a recent release (I can't recall
> > > the actual commit, but there definitely was one dealing exactly with this
> > > cause).
> > >
> > > Based on the symptoms, my best guess is that your memory pool ran out of
> > > request objects. The first thing to check is whether
> > > spdk_nvme_ns_cmd_read failed. If it fails, it won't call the callback.
> > > You can check for failure by looking at the return value - see the
> > > documentation here. Your application allocates this memory pool up front
> > > - all of our examples allocate 8k requests (see line 1097 in
> > > examples/nvme/perf/perf.c). You need to allocate a large enough pool to
> > > handle the maximum number of outstanding requests you plan to have. We
> > > recently added a "hello_world" style example for the NVMe driver at
> > > https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world
> > > with tons of comments. One of the comments explains this memory pool in
> > > detail.
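> > >
> > > For reference, a rough sketch of that up-front allocation, along the
> > > lines of what the examples do (the 8192 count and 128 cache size here
> > > are illustrative, not required values - size the pool to your own
> > > maximum outstanding I/O):
> > >
> > >     #include <rte_config.h>
> > >     #include <rte_memory.h>
> > >     #include <rte_mempool.h>
> > >     #include <spdk/nvme.h>
> > >
> > >     /* The driver draws its request objects from this pool, so it must
> > >      * hold at least as many entries as the maximum total outstanding
> > >      * I/O across all queue pairs. */
> > >     struct rte_mempool *request_mempool;
> > >
> > >     static int alloc_request_pool(void)
> > >     {
> > >         request_mempool = rte_mempool_create("nvme_request", 8192,
> > >                                              spdk_nvme_request_size(),
> > >                                              128, 0, NULL, NULL, NULL,
> > >                                              NULL, SOCKET_ID_ANY, 0);
> > >         return request_mempool != NULL ? 0 : -1;
> > >     }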
> > >
> > >
> > >
> > > That memory pool allocation is a bit of a wart on our otherwise clean
> > > API. We're looking at different strategies to clean that up. Let me know
> > > what the result of the debugging is and I'll shoot you some more ideas
> > > to try if necessary.
> > >
> > > Are there any plans regarding the global request pool rework?
> > >
> > >
> > >
> > > Regards,
> > >
> > > Andrey
> > >
> > > Thanks,
> > >
> > > Ben
> > >
> > >
> > >
> > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
> > >
> > > Hello,
> > > We have written a test application that is utilizing the spdk library to
> > > benchmark a set of 3 Intel P3700 drives and a single 750 drive
> > > (concurrently).  We’ve done some testing using fio and the kernel nvme
> > > drivers and have had no problem achieving the claimed IOPS (4k random
> > > read) of all drives on our system.
> > >
> > > What we have found during our testing is that spdk will sometimes start
> > > to silently fail to call the callback passed to spdk_nvme_ns_cmd_read in
> > > the following situations:
> > > 1. Testing a single drive and passing in 0 for max_completions to
> > >    spdk_nvme_qpair_process_completions().  We haven’t seen any issues
> > >    with single-drive testing when max_completions was > 0.
> > > 2. Testing all four drives at once will result in one drive failing to
> > >    receive callbacks, seemingly regardless of what number we pass for
> > >    max_completions (1 through 128).
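> > >
> > > (A minimal sketch of the kind of per-drive polling loop in question -
> > > this is not our actual code, and submit_reads_up_to_qd is a made-up
> > > helper named only for illustration; the second argument to
> > > spdk_nvme_qpair_process_completions is max_completions, where 0 means
> > > "process everything currently available":)
> > >
> > >     #include <stdbool.h>
> > >     #include <spdk/nvme.h>
> > >
> > >     /* made-up helper: top the qpair back up to the target queue depth */
> > >     void submit_reads_up_to_qd(struct spdk_nvme_ns *ns,
> > >                                struct spdk_nvme_qpair *qpair);
> > >
> > >     static void poll_drive(struct spdk_nvme_ns *ns,
> > >                            struct spdk_nvme_qpair *qpair,
> > >                            volatile bool *done)
> > >     {
> > >         while (!*done) {
> > >             submit_reads_up_to_qd(ns, qpair);
> > >             /* reap completions; 0 = no cap on how many per call */
> > >             spdk_nvme_qpair_process_completions(qpair, 0);
> > >         }
> > >     }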
> > >
> > > Here are other observations we’ve made:
> > > - When the callbacks fail to be called for a drive, they fail to be
> > >   called for the remaining duration of the test.
> > > - The drive that ‘fails’ when testing 4 drives concurrently varies from
> > >   test to test.
> > > - ‘Failure’ of a drive seems to be correlated with the number of
> > >   outstanding read operations, though it is not a strict correlation.
> > >
> > > Our system is a dual-socket E5-2630 v3.  One drive is on a PCI slot for
> > > CPU 0 and the other 3 are on PCI slots on CPU 1.  The master/slave
> > > threads are on the same cpu socket as the nvme device they are talking
> > > to.
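> > >
> > > (As an aside, the socket a given NVMe device hangs off can be read from
> > > sysfs - a small illustrative helper, not part of our application:)
> > >
> > >     #include <stdio.h>
> > >
> > >     /* Returns the NUMA node of a PCI device (e.g. "0000:81:00.0"), or
> > >      * -1 if it can't be determined. The polling thread and its buffers
> > >      * should live on the same node. */
> > >     static int pci_numa_node(const char *bdf)
> > >     {
> > >         char path[256];
> > >         int node = -1;
> > >         snprintf(path, sizeof(path),
> > >                  "/sys/bus/pci/devices/%s/numa_node", bdf);
> > >         FILE *f = fopen(path, "r");
> > >         if (f != NULL) {
> > >             if (fscanf(f, "%d", &node) != 1) {
> > >                 node = -1;
> > >             }
> > >             fclose(f);
> > >         }
> > >         return node;
> > >     }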
> > >
> > > We’d like to know what is causing this issue and what we can do to help
> > > investigate the problem.  What other information can we provide?  Is
> > > there some part of the spdk code that we can look at to help determine
> > > the cause?
> > >
> > > Thanks,
> > > Will
> > >
> > > --
> > > Regards,
> > > Andrey
> >
>
-- 

Regards,
Andrey
