* Re: [SPDK] Handling of physical disk removals
@ 2018-06-01  8:25 Baruch Even
  0 siblings, 0 replies; 17+ messages in thread
From: Baruch Even @ 2018-06-01  8:25 UTC (permalink / raw)
  To: spdk


On Thu, May 31, 2018 at 7:24 PM Harris, James R <james.r.harris(a)intel.com>
wrote:

>
>
>
>
> *From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <
> baruch(a)weka.io>
>
>
> *Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
>
> *Date: *Thursday, May 31, 2018 at 12:37 AM
>
>
> *To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject: *Re: [SPDK] Handling of physical disk removals
>
>
>
> […]
>
>
>
> I understand your point, but the main advantage of doing the extra
> effort is the ability to better integrate the system and avoid doing
> redundant work along the way. The proposed separation may also at times
> cause opposing things to be done that make no sense. For example, we found
> that a large sequential IO works far better than several smaller ones, and
> we also found that if we combine some unrelated IOs in our queue that will
> be nearly sequential, with a scratch buffer in reads, we improve overall
> performance. If we do that and down the stack you break the IO at that
> exact junction, then we have just added wasteful reads. As such we need
> this combination logic to know about the specific drive and specific IO
> that it needs to handle and the various constraints, and it makes a lot of
> sense to do it at our level rather than at the SPDK level. For this reason
> I would like to have a layer in SPDK that does the least amount of work and
> just handles the NVMe protocol, exposing the device/hw constraints, and
> lets the application above do the needed smarts and adapt as the landscape
> changes.
>
> The main thing to remember is that I am writing a storage system, not a
> random application. As such I *need* to know the device characteristics in
> order to make the best use of it. I *cannot* just let spdk hide these
> details from me. I fully understand that there are other users that write
> an application and don't want to bother with the device details at this
> level and are fully content to get the first and major performance
> improvement by using spdk over the kernel driver but for my use case that
> is not enough.
>
>
>
> Hi Baruch,
>
> I’d like to understand this a bit more.  It sounds like you’d like to see a
> mode that completely ignores things like MDTS and PRP violations, or device
> quirks related to performance (i.e. 128KB stripe boundaries on a number of
> Intel NVMe SSDs).  Maybe this mode is a lower-level API, maybe it’s a
> compile-time option that removes some of the checks in the driver, maybe
> something else.  Is that accurate?
>
Yes. I want to be able to just send an IO request without most of the
safety nets, and I will take responsibility for that. In addition it would
be nice if I could forgo the additional tracking that is done and embed
what is needed for it in my own IO tracking. I simply want a bare ability
to send IO commands and get the replies, even to the level of skipping the
callback and just having a loop that returns the IO identifier. This could
just be a low-level part of the existing code that does the higher-level
tracking, splitting and verification that the IO is valid for the specific
NVMe device.
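
A rough sketch of the caller-managed tracking being described here, using
only the existing public API: cb_arg already lets the application embed its
own per-IO context, so nothing extra has to be allocated on the application
side. The struct and helper names are hypothetical, not SPDK APIs.

    #include <spdk/nvme.h>

    /* Hypothetical application-side per-IO context; the driver never looks
     * inside it, it just hands it back on completion via cb_arg. */
    struct my_io {
        uint64_t id;        /* the application's own IO identifier */
        void    *buf;
        /* intrusive links, state, etc. */
    };

    static void
    my_io_done(void *cb_arg, const struct spdk_nvme_cpl *cpl)
    {
        struct my_io *io = cb_arg;      /* no lookup, no allocation */

        if (spdk_nvme_cpl_is_error(cpl)) {
            /* record failure for io->id */
        }
        /* mark io->id complete */
    }

    static int
    my_submit_read(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
                   struct my_io *io, uint64_t lba, uint32_t lba_count)
    {
        /* The driver still keeps its own internal tracker for the request;
         * the application context simply rides along in cb_arg. */
        return spdk_nvme_ns_cmd_read(ns, qpair, io->buf, lba, lba_count,
                                     my_io_done, io, 0);
    }

    /* completion loop, polled from the owning thread:
     *     spdk_nvme_qpair_process_completions(qpair, 0);
     */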


> Note that the SPDK nvme driver never splits an I/O arbitrarily – it is
> always based on conforming to the NVMe protocol or device-specific quirks
> that dramatically affect performance.  If the MDTS for the device is 256KB,
> then submitting an I/O with a size larger than 256KB *must* be split.  If an
> I/O spans a 128KB boundary on an SSD like the Intel P3700, the driver
> splits the I/O on that boundary to avoid a long delay (up to 1ms) for
> handling those types of I/O in SSDs that exhibit this type of striping
> phenomenon.  For scattered payloads, the driver will only split the I/O if
> the payload vectors would violate PRP otherwise.  You said “[SPDK] will
> break the IO at that exact junction” – SPDK should only be breaking the I/O
> to meet one of these three cases.  If you’re seeing something different,
> please advise.
>
What I said is that there are places where my code combines IOs, and if I
were not aware of the NVMe-mandated splitting then the system would be
wasteful. I follow all the NVMe and device-specific rules to avoid this
waste. I have not seen SPDK do anything that is incorrect.
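
As a sketch of the kind of device-limit check an application can make before
merging IOs, using spdk_nvme_ns_get_optimal_io_boundary() (named later in the
thread) together with the namespace accessors for sector size and max
transfer size; the helper itself is hypothetical:

    #include <stdbool.h>
    #include <spdk/nvme.h>

    /* Returns true if a read/write of lba/lba_count would be split by the
     * driver, so the application can avoid merging across that point. */
    static bool
    io_would_be_split(struct spdk_nvme_ns *ns, uint64_t lba, uint32_t lba_count)
    {
        uint32_t sector   = spdk_nvme_ns_get_sector_size(ns);
        uint32_t max_blks = spdk_nvme_ns_get_max_io_xfer_size(ns) / sector;
        uint32_t boundary = spdk_nvme_ns_get_optimal_io_boundary(ns); /* blocks, 0 = none */

        if (lba_count > max_blks) {
            return true;    /* exceeds the MDTS-derived transfer limit */
        }
        if (boundary != 0 &&
            (lba / boundary) != ((lba + lba_count - 1) / boundary)) {
            return true;    /* crosses a stripe boundary, e.g. 128KB on a P3700 */
        }
        /* PRP-driven splits depend on the payload layout, not just LBAs. */
        return false;
    }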

> In regards to your concerns on I/O splitting - I think step one is making
> sure the driver has APIs to expose any device-specific characteristics such
> as sectors per stripe.  Step two is measuring the overhead of the splitting
> logic when splitting is not required.  Based on those measurements, we can
> consider optimizations and/or evaluate a bypass mode.  The bypass mode
> seems important to Weka.io – would you like to submit an RFC for what this
> bypass mode might look like in more detail?
>
What is needed for the RFC?

Baruch
-- 



*Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
www.weka.io <http://www.weka.io>*


* Re: [SPDK] Handling of physical disk removals
@ 2018-06-03  8:35 Baruch Even
  0 siblings, 0 replies; 17+ messages in thread
From: Baruch Even @ 2018-06-03  8:35 UTC (permalink / raw)
  To: spdk


On Fri, Jun 1, 2018 at 7:54 PM Harris, James R <james.r.harris(a)intel.com>
wrote:

>
> […]
>
>
>
> In regards to your concerns on I/O splitting - I think step one is making
> sure the driver has APIs to expose any device-specific characteristics such
> as sectors per stripe.  Step two is measuring the overhead of the splitting
> logic when splitting is not required.  Based on those measurements, we can
> consider optimizations and/or evaluate a bypass mode.  The bypass mode
> seems important to Weka.io – would you like to submit an RFC for what this
> bypass mode might look like in more detail?
>
> What is needed for the RFC?
>
>
>
> It doesn’t need to be especially formal.  Basically, what your new API
> would look like and how it would impact the existing code at a reasonable
> level of detail.  Also, what tests you would plan to add to automate
> testing of the new API.  An RFC typically comes with an assumption that the
> author plans to carry the work forward if and when a consensus has been
> reached.  If that assumption is not correct, it is best to clarify that up
> front.
>
>
>
> Any kind of empirical performance data is critical.  I’m really interested
> in seeing it.  The existing driver can get >4M IO/s on a single Xeon core,
> so the extra complexity and maintenance burden of the new API has to show
> demonstrable improvements.  Hopefully there would be no changes to the
> existing API, but if there are, that’s another data point that would need
> to be considered.
>
>
>

That will take some time, as it will require me to review the code and at
least flesh out a basic implementation to be able to show what changes are
needed and any resulting performance impacts. Since I have other,
higher-priority topics to resolve at work at this time, that will not happen
soon, but I take from this that there is an openness to consider such a
change.

Baruch
-- 



*Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
www.weka.io <http://www.weka.io>*


* Re: [SPDK] Handling of physical disk removals
@ 2018-06-01 16:54 Harris, James R
  0 siblings, 0 replies; 17+ messages in thread
From: Harris, James R @ 2018-06-01 16:54 UTC (permalink / raw)
  To: spdk




From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <baruch(a)weka.io>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Friday, June 1, 2018 at 1:25 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Handling of physical disk removals

[…]

In regards to your concerns on I/O splitting - I think step one is making sure the driver has APIs to expose any device-specific characteristics such as sectors per stripe.  Step two is measuring the overhead of the splitting logic when splitting is not required.  Based on those measurements, we can consider optimizations and/or evaluate a bypass mode.  The bypass mode seems important to Weka.io – would you like to submit an RFC for what this bypass mode might look like in more detail?
What is needed for the RFC?

It doesn’t need to be especially formal.  Basically, what your new API would look like and how it would impact the existing code at a reasonable level of detail.  Also, what tests you would plan to add to automate testing of the new API.  An RFC typically comes with an assumption that the author plans to carry the work forward if and when a consensus has been reached.  If that assumption is not correct, it is best to clarify that up front.

Any kind of empirical performance data is critical.  I’m really interested in seeing it.  The existing driver can get >4M IO/s on a single Xeon core, so the extra complexity and maintenance burden of the new API has to show demonstrable improvements.  Hopefully there would be no changes to the existing API, but if there are, that’s another data point that would need to be considered.


Baruch
--
Baruch Even, Software Developer

E  baruch(a)weka.io<mailto:liran(a)weka.io>
www.weka.io<http://www.weka.io>


* Re: [SPDK] Handling of physical disk removals
@ 2018-06-01  8:27 Baruch Even
  0 siblings, 0 replies; 17+ messages in thread
From: Baruch Even @ 2018-06-01  8:27 UTC (permalink / raw)
  To: spdk


On Thu, May 31, 2018 at 10:02 PM Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
wrote:

>
>
> On Thu, May 31, 2018, 19:24 Harris, James R <james.r.harris(a)intel.com>
> wrote:
>
>>
>>
>>
>>
>> *From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <
>> baruch(a)weka.io>
>> *Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Date: *Thursday, May 31, 2018 at 12:37 AM
>> *To: *Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Subject: *Re: [SPDK] Handling of physical disk removals
>>
>>
>>
>> […]
>>
>>
>>
>> I understand your point, but the main advantage of doing the extra
>> effort is the ability to better integrate the system and avoid doing
>> redundant work along the way. The proposed separation may also at times
>> cause opposing things to be done that make no sense. For example, we found
>> that a large sequential IO works far better than several smaller ones, and
>> we also found that if we combine some unrelated IOs in our queue that will
>> be nearly sequential, with a scratch buffer in reads, we improve overall
>> performance. If we do that and down the stack you break the IO at that
>> exact junction, then we have just added wasteful reads. As such we need
>> this combination logic to know about the specific drive and specific IO
>> that it needs to handle and the various constraints, and it makes a lot of
>> sense to do it at our level rather than at the SPDK level. For this reason
>> I would like to have a layer in SPDK that does the least amount of work and
>> just handles the NVMe protocol, exposing the device/hw constraints, and
>> lets the application above do the needed smarts and adapt as the landscape
>> changes.
>>
>> The main thing to remember is that I am writing a storage system, not a
>> random application. As such I *need* to know the device characteristics in
>> order to make the best use of it. I *cannot* just let spdk hide these
>> details from me. I fully understand that there are other users that write
>> an application and don't want to bother with the device details at this
>> level and are fully content to get the first and major performance
>> improvement by using spdk over the kernel driver but for my use case that
>> is not enough.
>>
>>
>>
>> Hi Baruch,
>>
>> I’d like to understand this a bit more.  It sounds like you’d like to see a
>> mode that completely ignores things like MDTS and PRP violations, or device
>> quirks related to performance (i.e. 128KB stripe boundaries on a number of
>> Intel NVMe SSDs).  Maybe this mode is a lower-level API, maybe it’s a
>> compile-time option that removes some of the checks in the driver, maybe
>> something else.  Is that accurate?
>>
>> Note that the SPDK nvme driver never splits an I/O arbitrarily – it is
>> always based on conforming to the NVMe protocol or device-specific quirks
>> that dramatically affect performance.  If the MDTS for the device is 256KB,
>> then submitting an I/O with a size larger than 256KB *must* be split.  If
>> an I/O spans a 128KB boundary on an SSD like the Intel P3700, the driver
>> splits the I/O on that boundary to avoid a long delay (up to 1ms) for
>> handling those types of I/O in SSDs that exhibit this type of striping
>> phenomenon.  For scattered payloads, the driver will only split the I/O if
>> the payload vectors would violate PRP otherwise.  You said “[SPDK] will
>> break the IO at that exact junction” – SPDK should only be breaking the I/O
>> to meet one of these three cases.  If you’re seeing something different,
>> please advise.
>>
>> Adding an API to expose a namespace’s sectors per stripe is a good idea.
>> This is certainly something that should be added to give an application a
>> better idea of the device characteristics.  NOIOB was added to the spec to
>> cover this case recently, but older SSDs have vendor-specific mechanisms so
>> just examining the namespace’s identify data is not sufficient.
>>
>> In regards to your concerns on I/O splitting - I think step one is making
>> sure the driver has APIs to expose any device-specific characteristics such
>> as sectors per stripe.  Step two is measuring the overhead of the splitting
>> logic when splitting is not required.  Based on those measurements, we can
>> consider optimizations and/or evaluate a bypass mode.  The bypass mode
>> seems important to Weka.io – would you like to submit an RFC for what this
>> bypass mode might look like in more detail?
>>
>
> Adding spdk_nvme_{admin,nvm}_cmd_raw sounds like a good idea to me
> irrespective of whether a specific use case has already been presented.
>
> After all, SPDK is first and foremost about ultrafast user-space NVMe, so
> adding a low-level passthrough API which would let users do whatever I/O
> optimization they want at their own risk sounds pretty much mandatory.
>

The submission of the command is an important step; I also care about
polling and the tracking that is currently associated with it. I'm a big
fan of intrusive data structures, and embedding the minimal tracking needed
will also help performance, IMO.
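
For context, a sketch of what a raw submission can look like with the
raw-command entry point that already exists in the driver (referred to
elsewhere in this thread as spdk_nvme_ctrlr_io_cmd_raw(); the header spells
it spdk_nvme_ctrlr_cmd_io_raw()). The cdw10-12 layout follows the NVMe Read
command, and the 512-byte sector size is an assumption for illustration:

    #include <string.h>
    #include <spdk/nvme.h>

    static int
    raw_read(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_qpair *qpair,
             uint32_t nsid, void *buf, uint64_t lba, uint32_t lba_count,
             spdk_nvme_cmd_cb cb_fn, void *cb_arg)
    {
        struct spdk_nvme_cmd cmd;

        memset(&cmd, 0, sizeof(cmd));
        cmd.opc   = SPDK_NVME_OPC_READ;
        cmd.nsid  = nsid;
        cmd.cdw10 = (uint32_t)lba;          /* starting LBA, low 32 bits  */
        cmd.cdw11 = (uint32_t)(lba >> 32);  /* starting LBA, high 32 bits */
        cmd.cdw12 = lba_count - 1;          /* 0's-based number of blocks */

        /* The raw path still has the driver build the PRP for 'buf' and
         * track the request internally; what it skips is the splitting and
         * validation that spdk_nvme_ns_cmd_read() performs. */
        return spdk_nvme_ctrlr_cmd_io_raw(ctrlr, qpair, &cmd, buf,
                                          lba_count * 512, /* assumes 512B sectors */
                                          cb_fn, cb_arg);
    }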

Baruch
-- 



*Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
www.weka.io <http://www.weka.io>*


* Re: [SPDK] Handling of physical disk removals
@ 2018-05-31 19:01 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-05-31 19:01 UTC (permalink / raw)
  To: spdk


On Thu, May 31, 2018, 19:24 Harris, James R <james.r.harris(a)intel.com>
wrote:

>
>
>
>
> *From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <
> baruch(a)weka.io>
> *Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Date: *Thursday, May 31, 2018 at 12:37 AM
> *To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject: *Re: [SPDK] Handling of physical disk removals
>
>
>
> […]
>
>
>
> I understand your point, but the main advantage of doing the extra
> effort is the ability to better integrate the system and avoid doing
> redundant work along the way. The proposed separation may also at times
> cause opposing things to be done that make no sense. For example, we found
> that a large sequential IO works far better than several smaller ones, and
> we also found that if we combine some unrelated IOs in our queue that will
> be nearly sequential, with a scratch buffer in reads, we improve overall
> performance. If we do that and down the stack you break the IO at that
> exact junction, then we have just added wasteful reads. As such we need
> this combination logic to know about the specific drive and specific IO
> that it needs to handle and the various constraints, and it makes a lot of
> sense to do it at our level rather than at the SPDK level. For this reason
> I would like to have a layer in SPDK that does the least amount of work and
> just handles the NVMe protocol, exposing the device/hw constraints, and
> lets the application above do the needed smarts and adapt as the landscape
> changes.
>
> The main thing to remember is that I am writing a storage system, not a
> random application. As such I *need* to know the device characteristics in
> order to make the best use of it. I *cannot* just let spdk hide these
> details from me. I fully understand that there are other users that write
> an application and don't want to bother with the device details at this
> level and are fully content to get the first and major performance
> improvement by using spdk over the kernel driver but for my use case that
> is not enough.
>
>
>
> Hi Baruch,
>
> I’d like to understand this a bit more.  It sounds like you’d like to see a
> mode that completely ignores things like MDTS and PRP violations, or device
> quirks related to performance (i.e. 128KB stripe boundaries on a number of
> Intel NVMe SSDs).  Maybe this mode is a lower-level API, maybe it’s a
> compile-time option that removes some of the checks in the driver, maybe
> something else.  Is that accurate?
>
> Note that the SPDK nvme driver never splits an I/O arbitrarily – it is
> always based on conforming to the NVMe protocol or device-specific quirks
> that dramatically affect performance.  If the MDTS for the device is 256KB,
> then submitting an I/O with a size larger than 256KB *must* be split.  If an
> I/O spans a 128KB boundary on an SSD like the Intel P3700, the driver
> splits the I/O on that boundary to avoid a long delay (up to 1ms) for
> handling those types of I/O in SSDs that exhibit this type of striping
> phenomenon.  For scattered payloads, the driver will only split the I/O if
> the payload vectors would violate PRP otherwise.  You said “[SPDK] will
> break the IO at that exact junction” – SPDK should only be breaking the I/O
> to meet one of these three cases.  If you’re seeing something different,
> please advise.
>
> Adding an API to expose a namespace’s sectors per stripe is a good idea.
> This is certainly something that should be added to give an application a
> better idea of the device characteristics.  NOIOB was added to the spec to
> cover this case recently, but older SSDs have vendor-specific mechanisms so
> just examining the namespace’s identify data is not sufficient.
>
> In regards to your concerns on I/O splitting - I think step one is making
> sure the driver has APIs to expose any device-specific characteristics such
> as sectors per stripe.  Step two is measuring the overhead of the splitting
> logic when splitting is not required.  Based on those measurements, we can
> consider optimizations and/or evaluate a bypass mode.  The bypass mode
> seems important to Weka.io – would you like to submit an RFC for what this
> bypass mode might look like in more detail?
>

Adding spdk_nvme_{admin,nvm}_cmd_raw sounds like a good idea to me
irrespective of whether a specific use case has already been presented.

After all, SPDK is first and foremost about ultrafast user-space NVMe, so
adding a low-level passthrough API which would let users do whatever I/O
optimization they want at their own risk sounds pretty much mandatory.

Regards,
Andrey

> Thanks,
>
> -Jim
>
> > There is also the issue that we track our IOs, and SPDK, because it does
> > the extra work, has IO tracking of its own, which means there is an extra
> > memory allocation on the data path for no good reason from our point of
> > view.
>
> The memory that SPDK uses for this tracking is allocated when the queue
> pair is
> allocated, so while certainly an extra cache line is accessed, it isn't a
> memory
> allocation.
>
>
>
> That extra cache line access shows up in our perf reports. It's not the
> top line item so far but it is there. I would like to be able to eliminate
> that.
>
>
>
> > When I'm building my command in the queue I need to keep the list of
> > buffers in some data structure, currently I keep it in a vector and then
> > this gets translated to PRP/SGL when the command is submitted. I could
> > save that step if I could just maintain the PRP/SGL myself and just pass
> > the ready-made list to the command. This will save me some memory and
> > time.
>
> Building valid PRP/SGL data structures is challenging, but for those up to
> the
> task I'd be willing to consider an API that let the user provide them
> (we'll see
> if the rest of the community agrees). I'm always in favor of improvements
> for
> performance, but that seems like a lot of extra work to save what is
> probably
> just a handful of CPU instructions.
>
>
>
> I'm perfectly aware of the extra layers of complexity that I will add to
> my system and I'm also aware that there are only very few users who will
> want this complexity but it is something that I would like to be able to
> do. It's not the extra CPU instructions, it is the memory bandwidth and
> cache lines that need to be touched.
>
> Baruch
>
>
>
> --
>
>
> * Baruch Even, **Software Developer  *
>
> *E * baruch(a)weka.io <liran(a)weka.io>
> www.weka.io
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>


* Re: [SPDK] Handling of physical disk removals
@ 2018-05-31 16:33 Verkamp, Daniel
  0 siblings, 0 replies; 17+ messages in thread
From: Verkamp, Daniel @ 2018-05-31 16:33 UTC (permalink / raw)
  To: spdk


Just to add to what Jim mentioned below, we do have an API to retrieve the “sectors per stripe” in a generic way: spdk_nvme_ns_get_optimal_io_boundary() returns this information (in blocks) based on either the recently-standardized NOIOB field or the Intel vendor-specific quirk that Jim mentioned below.

Thanks,
-- Daniel
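
A minimal usage sketch of that accessor (the return value is in blocks; 0
means the namespace reports no boundary), with 'ns' assumed to be an
already-attached namespace handle:

    #include <stdio.h>
    #include <spdk/nvme.h>

    static void
    print_io_boundary(struct spdk_nvme_ns *ns)
    {
        uint32_t boundary = spdk_nvme_ns_get_optimal_io_boundary(ns);

        if (boundary == 0) {
            printf("no optimal I/O boundary reported\n");
        } else {
            printf("optimal I/O boundary: %u blocks (%u bytes)\n",
                   boundary,
                   boundary * spdk_nvme_ns_get_sector_size(ns));
        }
    }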

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Harris, James R
Sent: Thursday, May 31, 2018 9:24 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Handling of physical disk removals



From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Baruch Even <baruch(a)weka.io<mailto:baruch(a)weka.io>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, May 31, 2018 at 12:37 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Handling of physical disk removals

[…]

I understand your point, but the main advantage of doing the extra effort is the ability to better integrate the system and avoid doing redundant work along the way. The proposed separation may also at times cause opposing things to be done that make no sense. For example, we found that a large sequential IO works far better than several smaller ones, and we also found that if we combine some unrelated IOs in our queue that will be nearly sequential, with a scratch buffer in reads, we improve overall performance. If we do that and down the stack you break the IO at that exact junction, then we have just added wasteful reads. As such we need this combination logic to know about the specific drive and specific IO that it needs to handle and the various constraints, and it makes a lot of sense to do it at our level rather than at the SPDK level. For this reason I would like to have a layer in SPDK that does the least amount of work and just handles the NVMe protocol, exposing the device/hw constraints, and lets the application above do the needed smarts and adapt as the landscape changes.
The main thing to remember is that I am writing a storage system, not a random application. As such I *need* to know the device characteristics in order to make the best use of it. I *cannot* just let spdk hide these details from me. I fully understand that there are other users that write an application and don't want to bother with the device details at this level and are fully content to get the first and major performance improvement by using spdk over the kernel driver but for my use case that is not enough.

Hi Baruch,
I’d like to understand this a bit more.  It sounds like you’d like to see a mode that completely ignores things like MDTS and PRP violations, or device quirks related to performance (i.e. 128KB stripe boundaries on a number of Intel NVMe SSDs).  Maybe this mode is a lower-level API, maybe it’s a compile-time option that removes some of the checks in the driver, maybe something else.  Is that accurate?
Note that the SPDK nvme driver never splits an I/O arbitrarily – it is always based on conforming to the NVMe protocol or device-specific quirks that dramatically affect performance.  If the MDTS for the device is 256KB, then submitting an I/O with a size larger than 256KB *must* be split.  If an I/O spans a 128KB boundary on an SSD like the Intel P3700, the driver splits the I/O on that boundary to avoid a long delay (up to 1ms) for handling those types of I/O in SSDs that exhibit this type of striping phenomenon.  For scattered payloads, the driver will only split the I/O if the payload vectors would violate PRP otherwise.  You said “[SPDK] will break the IO at that exact junction” – SPDK should only be breaking the I/O to meet one of these three cases.  If you’re seeing something different, please advise.
Adding an API to expose a namespace’s sectors per stripe is a good idea.  This is certainly something that should be added to give an application a better idea of the device characteristics.  NOIOB was added to the spec to cover this case recently, but older SSDs have vendor-specific mechanisms so just examining the namespace’s identify data is not sufficient.
In regards to your concerns on I/O splitting - I think step one is making sure the driver has APIs to expose any device-specific characteristics such as sectors per stripe.  Step two is measuring the overhead of the splitting logic when splitting is not required.  Based on those measurements, we can consider optimizations and/or evaluate a bypass mode.  The bypass mode seems important to Weka.io – would you like to submit an RFC for what this bypass mode might look like in more detail?
Thanks,
-Jim
> There is also the issue that we track our IOs, and SPDK, because it does the
> extra work, has IO tracking of its own, which means there is an extra memory
> allocation on the data path for no good reason from our point of view.

The memory that SPDK uses for this tracking is allocated when the queue pair is
allocated, so while certainly an extra cache line is accessed, it isn't a memory
allocation.

That extra cache line access shows up in our perf reports. It's not the top line item so far but it is there. I would like to be able to eliminate that.

> When I'm building my command in the queue I need to keep the list of buffers
> in some data structure, currently I keep it in a vector and then this gets
> translated to PRP/SGL when the command is submitted. I could save that step if
> I could just maintain the PRP/SGL myself and just pass the ready-made list to
> the command. This will save me some memory and time.

Building valid PRP/SGL data structures is challenging, but for those up to the
task I'd be willing to consider an API that let the user provide them (we'll see
if the rest of the community agrees). I'm always in favor of improvements for
performance, but that seems like a lot of extra work to save what is probably
just a handful of CPU instructions.

I'm perfectly aware of the extra layers of complexity that I will add to my system and I'm also aware that there are only very few users who will want this complexity but it is something that I would like to be able to do. It's not the extra CPU instructions, it is the memory bandwidth and cache lines that need to be touched.
Baruch

--
Baruch Even, Software Developer
E  baruch(a)weka.io<mailto:liran(a)weka.io>
www.weka.io<http://www.weka.io>


* Re: [SPDK] Handling of physical disk removals
@ 2018-05-31 16:24 Harris, James R
  0 siblings, 0 replies; 17+ messages in thread
From: Harris, James R @ 2018-05-31 16:24 UTC (permalink / raw)
  To: spdk




From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <baruch(a)weka.io>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, May 31, 2018 at 12:37 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Handling of physical disk removals

[…]

I understand your point, but the main advantage of doing the extra effort is the ability to better integrate the system and avoid doing redundant work along the way. The proposed separation may also at times cause opposing things to be done that make no sense. For example, we found that a large sequential IO works far better than several smaller ones, and we also found that if we combine some unrelated IOs in our queue that will be nearly sequential, with a scratch buffer in reads, we improve overall performance. If we do that and down the stack you break the IO at that exact junction, then we have just added wasteful reads. As such we need this combination logic to know about the specific drive and specific IO that it needs to handle and the various constraints, and it makes a lot of sense to do it at our level rather than at the SPDK level. For this reason I would like to have a layer in SPDK that does the least amount of work and just handles the NVMe protocol, exposing the device/hw constraints, and lets the application above do the needed smarts and adapt as the landscape changes.
The main thing to remember is that I am writing a storage system, not a random application. As such I *need* to know the device characteristics in order to make the best use of it. I *cannot* just let spdk hide these details from me. I fully understand that there are other users that write an application and don't want to bother with the device details at this level and are fully content to get the first and major performance improvement by using spdk over the kernel driver but for my use case that is not enough.

Hi Baruch,
I’d like to understand this a bit more.  It sounds like you’d like to see a mode that completely ignores things like MDTS and PRP violations, or device quirks related to performance (i.e. 128KB stripe boundaries on a number of Intel NVMe SSDs).  Maybe this mode is a lower-level API, maybe it’s a compile-time option that removes some of the checks in the driver, maybe something else.  Is that accurate?
Note that the SPDK nvme driver never splits an I/O arbitrarily – it is always based on conforming to the NVMe protocol or device-specific quirks that dramatically affect performance.  If the MDTS for the device is 256KB, then submitting an I/O with a size larger than 256KB *must* be split.  If an I/O spans a 128KB boundary on an SSD like the Intel P3700, the driver splits the I/O on that boundary to avoid a long delay (up to 1ms) for handling those types of I/O in SSDs that exhibit this type of striping phenomenon.  For scattered payloads, the driver will only split the I/O if the payload vectors would violate PRP otherwise.  You said “[SPDK] will break the IO at that exact junction” – SPDK should only be breaking the I/O to meet one of these three cases.  If you’re seeing something different, please advise.
Adding an API to expose a namespace’s sectors per stripe is a good idea.  This is certainly something that should be added to give an application a better idea of the device characteristics.  NOIOB was added to the spec to cover this case recently, but older SSDs have vendor-specific mechanisms so just examining the namespace’s identify data is not sufficient.
In regards to your concerns on I/O splitting - I think step one is making sure the driver has APIs to expose any device-specific characteristics such as sectors per stripe.  Step two is measuring the overhead of the splitting logic when splitting is not required.  Based on those measurements, we can consider optimizations and/or evaluate a bypass mode.  The bypass mode seems important to Weka.io – would you like to submit an RFC for what this bypass mode might look like in more detail?
Thanks,
-Jim
> There is also the issue that we track our IOs, and SPDK, because it does the
> extra work, has IO tracking of its own, which means there is an extra memory
> allocation on the data path for no good reason from our point of view.

The memory that SPDK uses for this tracking is allocated when the queue pair is
allocated, so while certainly an extra cache line is accessed, it isn't a memory
allocation.

That extra cache line access shows up in our perf reports. It's not the top line item so far but it is there. I would like to be able to eliminate that.

> When I'm building my command in the queue I need to keep the list of buffers
> in some data structure, currently I keep it in a vector and then this gets
> translated to PRP/SGL when the command is submitted. I could save that step if
> I could just maintain the PRP/SGL myself and just pass the ready-made list to
> the command. This will save me some memory and time.

Building valid PRP/SGL data structures is challenging, but for those up to the
task I'd be willing to consider an API that let the user provide them (we'll see
if the rest of the community agrees). I'm always in favor of improvements for
performance, but that seems like a lot of extra work to save what is probably
just a handful of CPU instructions.

I'm perfectly aware of the extra layers of complexity that I will add to my system and I'm also aware that there are only very few users who will want this complexity but it is something that I would like to be able to do. It's not the extra CPU instructions, it is the memory bandwidth and cache lines that need to be touched.
Baruch

--
Baruch Even, Software Developer

E  baruch(a)weka.io<mailto:liran(a)weka.io>
www.weka.io<http://www.weka.io>


* Re: [SPDK] Handling of physical disk removals
@ 2018-05-31  7:54 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-05-31  7:54 UTC (permalink / raw)
  To: spdk


On Thu, May 31, 2018, 10:25 Baruch Even <baruch(a)weka.io> wrote:

>
> On Wed, May 30, 2018 at 8:49 PM Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
> wrote:
>
>> On Wed, May 30, 2018 at 2:46 PM, Baruch Even <baruch(a)weka.io> wrote:
>>
>>> As for threading, the only thing I need to make things integrate better
>>> into my green-threads is to use a function call to do a context switch if
>>> needed (spdk_yield() of sorts), that is empty/no-op for most users and can
>>> be a compile time option (weak symbol? define?) to use some user-provided
>>> function. This will immediately integrate into any green thread system by
>>> switching to another user-thread and returning for a poll when other
>>> actions are taken. This way the posix-thread will not block and there is no
>>> special async logic that needs to happen.
>>>
>>> As for the low level api, my application already has its own queue for
>>> each device, and we split our requests as we need to and combine them as
>>> possible to improve performance. It took us by surprise to find that SPDK
>>> internally breaks out the requests to smaller chunks, queues them
>>> internally and completions may not really complete the full request. There
>>> was even a scenario where we had a deadlock because we overflowed the spdk
>>> queues. Once we found these out we made sure to expose to our application
>>> all the device constraints and break the IOs with the device limitations in
>>> mind and now all the extra work and checks that SPDK does are a waste of
>>> cpu cycles as far as we are concerned.
>>>
>>> There is also the issue that we track our IOs, and SPDK, because it does
>>> the extra work, has IO tracking of its own, which means there is an extra
>>> memory allocation on the data path for no good reason from our point of
>>> view.
>>>
>>
>> A small but mighty step in this direction would be to make
>> spdk_bdev_submit_io
>> <https://github.com/spdk/spdk/blob/d34bd0a60bace1b7e4022f12d0892f028b7f2377/lib/bdev/bdev.c#L911>
>> public. If the caller wishes to manage the setup and completion chaining
>> (in the stacked case, it is trivial to do as the caller has per I/O context
>> already allocated at this point), thus avoiding the overhead of the extra
>> I/O allocation in spdk_bdev_{read,write}*, that would be the least
>> resistance path. And the code would be much simpler, avoiding the
>> (often useless) io_type switch.
>>
>
> I am not using bdev at all so such an approach might be useful to others
> but not for me.
>

>
> When I'm building my command in the queue I need to keep the list of
>>> buffers in some data structure, currently I keep it in a vector and then
>>> this gets translated to PRP/SGL when the command is submitted. I could save
>>> that step if I could just maintain the PRP/SGL myself and just pass the
>>> ready-made list to the command. This will save me some memory and time.
>>>
>>
>> If your use case is limited to NVMe hardware, you might want to consider
>> NVMe passthrough which avoids the buffer setup overhead, and also gives you
>> complete control over completion status.
>>
>
> I'm not sure what NVMe passthrough you are referring to; I can only assume
> that you think I'm using bdev. I am in fact simply calling the spdk_nvme_*
> functions directly and working with that.
>

Passthrough in storage typically refers to submitting a well-formed
protocol command directly to the hardware. That sounds very much like what
you're looking for, judging by the discussion.

Regards,
Andrey

>
> Baruch
>
> --
>
>
>
> *Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
> www.weka.io <http://www.weka.io>*
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>


* Re: [SPDK] Handling of physical disk removals
@ 2018-05-31  7:37 Baruch Even
  0 siblings, 0 replies; 17+ messages in thread
From: Baruch Even @ 2018-05-31  7:37 UTC (permalink / raw)
  To: spdk


On Thu, May 31, 2018 at 12:27 AM Walker, Benjamin <benjamin.walker(a)intel.com>
wrote:

> On Wed, 2018-05-30 at 14:46 +0300, Baruch Even wrote:
> > As for threading, the only thing I need to make things integrate better
> > into my green-threads is to use a function call to do a context switch if
> > needed (spdk_yield() of sorts), that is empty/no-op for most users and can
> > be a compile time option (weak symbol? define?) to use some user-provided
> > function. This will immediately integrate into any green thread system by
> > switching to another user-thread and returning for a poll when other
> > actions are taken. This way the posix-thread will not block and there is
> > no special async logic that needs to happen.
>
> Understood and we'd like to accommodate frameworks like you are using. See
> the
> ongoing thread titled SPDK Dynamic Threading Model. We're just getting
> started
> on that effort. In parallel, any time you see code that performs a
> blocking or
> excessively long operation, please do let us know (or help us remove it).
>

For now I have resorted to moving such function calls to a service thread;
it is a little less clean, but since this is not the direct data path it is
less critical. With the recent patch that extracted all of these wait loops
into a single function, I'm sure it will be easier to modify to enable my
use case.
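
A minimal sketch of the weak-symbol hook described in the paragraph quoted
above -- spdk_yield() is not an existing SPDK API, and wait_for_ready() is a
made-up stand-in for a driver wait loop:

    #include <stdbool.h>

    /* Weak default: a no-op for applications without green threads. An
     * application that uses user-space threads links in its own strong
     * spdk_yield() that switches to another fiber instead. */
    __attribute__((weak)) void
    spdk_yield(void)
    {
    }

    /* What a driver-internal wait loop could look like with the hook. */
    static void
    wait_for_ready(volatile bool *ready)
    {
        while (!*ready) {
            spdk_yield();   /* never blocks the posix thread */
            /* ... re-check controller/hardware state ... */
        }
    }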


> > As for the low level api, my application already has its own queue for
> > each device, and we split our requests as we need to and combine them as
> > possible to improve performance. It took us by surprise to find that SPDK
> > internally breaks out the requests to smaller chunks, queues them
> > internally and completions may not really complete the full request. There
> > was even a scenario where we had a deadlock because we overflowed the spdk
> > queues. Once we found these out we made sure to expose to our application
> > all the device constraints and break the IOs with the device limitations
> > in mind, and now all the extra work and checks that SPDK does are a waste
> > of cpu cycles as far as we are concerned.
>
> SPDK will indeed split I/O automatically based on device characteristics.
> Splitting is non-trivial because I/O may need to be split for a large
> number of
> reasons (max I/O size, device-internal striping, PRP/SGL restrictions,
> etc.).
> Getting this right is very difficult and I'd highly recommend that people
> avoid
> re-implementing it (see the top half of lib/nvme/nvme_ns_cmd.c). SPDK will
> also
> queue I/O sent beyond the available queue depth. These were both done to
> make
> code using the NVMe driver not have to deal with device-specific concerns
> (i.e.
> the splitting required on one device is different than the splitting on
> another,
> and the queue depth available on one device is different than the queue
> depth
> available on another).
>
> The splitting and queueing should all be entirely transparent to the user
> though. There shouldn't ever be a case where a user completion callback is
> called prior to all split fragments of the I/O completing - you should get
> your
> completion callback called just one time for each time you called to
> submit a
> new I/O at the public NVMe API level.
>

I understand your point, but the main advantage of doing the extra effort
is the ability to better integrate the system and avoid doing redundant
work along the way. The proposed separation may also at times cause
opposing things to be done that make no sense. For example, we found that
a large sequential IO works far better than several smaller ones, and we
also found that if we combine some unrelated IOs in our queue that will be
nearly sequential, with a scratch buffer in reads, we improve overall
performance. If we do that and down the stack you break the IO at that
exact junction, then we have just added wasteful reads. As such we need
this combination logic to know about the specific drive and specific IO
that it needs to handle and the various constraints, and it makes a lot of
sense to do it at our level rather than at the SPDK level. For this reason
I would like to have a layer in SPDK that does the least amount of work and
just handles the NVMe protocol, exposing the device/hw constraints, and
lets the application above do the needed smarts and adapt as the landscape
changes.

The main thing to remember is that I am writing a storage system, not a
random application. As such I *need* to know the device characteristics in
order to make the best use of it. I *cannot* just let spdk hide these
details from me. I fully understand that there are other users that write
an application and don't want to bother with the device details at this
level and are fully content to get the first and major performance
improvement by using spdk over the kernel driver but for my use case that
is not enough.


> > There is also the issue that we track our IOs, and SPDK, because it does
> > the extra work, has IO tracking of its own, which means there is an extra
> > memory allocation on the data path for no good reason from our point of
> > view.
>
> The memory that SPDK uses for this tracking is allocated when the queue
> pair is
> allocated, so while certainly an extra cache line is accessed, it isn't a
> memory
> allocation.
>

That extra cache line access shows up in our perf reports. It's not the top
line item so far but it is there. I would like to be able to eliminate
that.


> > When I'm building my command in the queue I need to keep the list of
> > buffers in some data structure, currently I keep it in a vector and then
> > this gets translated to PRP/SGL when the command is submitted. I could
> > save that step if I could just maintain the PRP/SGL myself and just pass
> > the ready-made list to the command. This will save me some memory and
> > time.
>
> Building valid PRP/SGL data structures is challenging, but for those up to
> the
> task I'd be willing to consider an API that let the user provide them
> (we'll see
> if the rest of the community agrees). I'm always in favor of improvements
> for
> performance, but that seems like a lot of extra work to save what is
> probably
> just a handful of CPU instructions.
>

I'm perfectly aware of the extra layers of complexity that I will add to my
system and I'm also aware that there are only very few users who will want
this complexity but it is something that I would like to be able to do.
It's not the extra CPU instructions, it is the memory bandwidth and cache
lines that need to be touched.

Baruch

-- 



*Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
www.weka.io <http://www.weka.io>*


* Re: [SPDK] Handling of physical disk removals
@ 2018-05-31  7:24 Baruch Even
  0 siblings, 0 replies; 17+ messages in thread
From: Baruch Even @ 2018-05-31  7:24 UTC (permalink / raw)
  To: spdk


On Wed, May 30, 2018 at 8:49 PM Andrey Kuzmin <andrey.v.kuzmin(a)gmail.com>
wrote:

> On Wed, May 30, 2018 at 2:46 PM, Baruch Even <baruch(a)weka.io> wrote:
>
>> As for threading, the only thing I need to make things integrate better
>> into my green-threads is to use a function call to do a context switch if
>> needed (spdk_yield() of sorts), that is empty/no-op for most users and can
>> be a compile time option (weak symbol? define?) to use some user-provided
>> function. This will immediately integrate into any green thread system by
>> switching to another user-thread and returning for a poll when other
>> actions are taken. This way the posix-thread will not block and there is no
>> special async logic that needs to happen.
>>
>> As for the low level api, my application already has its own queue for
>> each device, and we split our requests as we need to and combine them as
>> possible to improve performance. It took us by surprise to find that SPDK
>> internally breaks out the requests to smaller chunks, queues them
>> internally and completions may not really complete the full request. There
>> was even a scenario where we had a deadlock because we overflowed the spdk
>> queues. Once we found these out we made sure to expose to our application
>> all the device constraints and break the IOs with the device limitations in
>> mind and now all the extra work and checks that SPDK does are a waste of
>> cpu cycles as far as we are concerned.
>>
>> There is also the issue that we track our IOs, and SPDK, because it does
>> the extra work, has IO tracking of its own, which means there is an extra
>> memory allocation on the data path for no good reason from our point of
>> view.
>>
>
> A small but mighty step in this direction would be to make
> spdk_bdev_submit_io
> <https://github.com/spdk/spdk/blob/d34bd0a60bace1b7e4022f12d0892f028b7f2377/lib/bdev/bdev.c#L911>
> public. If the caller wishes to manage the setup and completion chaining
> (in the stacked case, it is trivial to do as the caller has per I/O context
> already allocated at this point), thus avoiding the overhead of the extra
> I/O allocation in spdk_bdev_{read,write}*, that would be the least
> resistance path. And the code would be much simpler, avoiding the
> (often useless) io_type switch.
>

I am not using bdev at all so such an approach might be useful to others
but not for me.


When I'm building my command in the queue I need to keep the list of
>> buffers in some data structure, currently I keep it in a vector and then
>> this gets translated to PRP/SGL when the command is submitted. I could save
>> that step if I could just maintain the PRP/SGL myself and just pass the
>> ready-made list to the command. This will save me some memory and time.
>>
>
> If your use case is limited to NVMe hardware, you might want to consider
> NVMe passthrough which avoids the buffer setup overhead, and also gives you
> complete control over completion status.
>

I'm not sure what NVMe passthrough you are referring to; I can only assume
that you think I'm using bdev. I am in fact simply calling the spdk_nvme_*
functions directly and working with that.

Baruch

-- 



*Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
www.weka.io <http://www.weka.io>*


* Re: [SPDK] Handling of physical disk removals
@ 2018-05-30 21:27 Walker, Benjamin
  0 siblings, 0 replies; 17+ messages in thread
From: Walker, Benjamin @ 2018-05-30 21:27 UTC (permalink / raw)
  To: spdk


On Wed, 2018-05-30 at 14:46 +0300, Baruch Even wrote:
> As for threading, the only thing I need to make things integrate better into
> my green-threads is to use a function call to do a context switch if needed
> (spdk_yield() of sorts), that is empty/no-op for most users and can be a
> compile time option (weak symbol? define?) to use some user-provided function.
> This will immediately integrate into any green thread system by switching to
> another user-thread and returning for a poll when other actions are taken.
> This way the posix-thread will not block and there is no special async logic
> that needs to happen.

Understood and we'd like to accommodate frameworks like you are using. See the
ongoing thread titled SPDK Dynamic Threading Model. We're just getting started
on that effort. In parallel, any time you see code that performs a blocking or
excessively long operation, please do let us know (or help us remove it).

> 
> As for the low level api, my application already has its own queue for each
> device, and we split our requests as we need to and combine them as possible
> to improve performance. It took us by surprise to find that SPDK internally
> breaks out the requests to smaller chunks, queues them internally and
> completions may not really complete the full request. There was even was
> scenario that we had a deadlock because we overflowed the spdk queues. Once we
> found these out we made sure to expose to our application all the device
> constraints and break the IOs with the device limitations in mind and now all
> the extra work and checks that SPDK does are a waste of cpu cycles as far as
> we are concerned.

SPDK will indeed split I/O automatically based on device characteristics.
Splitting is non-trivial because I/O may need to be split for a large number of
reasons (max I/O size, device-internal striping, PRP/SGL restrictions, etc.).
Getting this right is very difficult and I'd highly recommend that people avoid
re-implementing it (see the top half of lib/nvme/nvme_ns_cmd.c). SPDK will also
queue I/O sent beyond the available queue depth. These were both done to make
code using the NVMe driver not have to deal with device-specific concerns (i.e.
the splitting required on one device is different than the splitting on another,
and the queue depth available on one device is different than the queue depth
available on another).

The splitting and queueing should all be entirely transparent to the user
though. There shouldn't ever be a case where a user completion callback is
called prior to all split fragments of the I/O completing - you should get your
completion callback called just one time for each time you called to submit a
new I/O at the public NVMe API level.

> 
> There is also the issue that we track our IOs, and SPDK, because it does the
> extra work, has IO tracking of its own, which means there is an extra memory
> allocation on the data path for no good reason from our point of view.

The memory that SPDK uses for this tracking is allocated when the queue pair is
allocated, so while certainly an extra cache line is accessed, it isn't a memory
allocation.

> 
> When I'm building my command in the queue I need to keep the list of buffers
> in some data structure, currently I keep it in a vector and then this gets
> translated to PRP/SGL when the command is submitted. I could save that step if
> I could just maintain the PRP/SGL myself and just pass the ready-made list to
> the command. This will save me some memory and time.

Building valid PRP/SGL data structures is challenging, but for those up to the
task I'd be willing to consider an API that let the user provide them (we'll see
if the rest of the community agrees). I'm always in favor of improvements for
performance, but that seems like a lot of extra work to save what is probably
just a handful of CPU instructions.
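
For comparison, a sketch of how a scattered payload is handed to the driver
today: the application keeps its own buffer list and the driver walks it
through two callbacks to build the PRP/SGL itself. The buf_list structure is
hypothetical application code; a user-provided PRP/SGL API as discussed above
would bypass this translation step.

    #include <spdk/nvme.h>

    struct buf_list {
        struct { void *base; uint32_t len; } vec[16];   /* app-owned */
        int count;
        int cur;
    };

    static void
    reset_sgl(void *cb_arg, uint32_t offset)
    {
        struct buf_list *bl = cb_arg;

        (void)offset;   /* simplified: assumes the request starts at offset 0 */
        bl->cur = 0;
    }

    static int
    next_sge(void *cb_arg, void **address, uint32_t *length)
    {
        struct buf_list *bl = cb_arg;

        *address = bl->vec[bl->cur].base;
        *length  = bl->vec[bl->cur].len;
        bl->cur++;
        return 0;
    }

    static int
    submit_scattered_read(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
                          struct buf_list *bl, uint64_t lba, uint32_t lba_count,
                          spdk_nvme_cmd_cb cb_fn)
    {
        /* the same cb_arg is passed to the SGL callbacks and to cb_fn */
        return spdk_nvme_ns_cmd_readv(ns, qpair, lba, lba_count,
                                      cb_fn, bl, 0, reset_sgl, next_sge);
    }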

> 
> All of this just means that spdk is wasting some cpu cycles on things we don't
> want it to do and after all the main reason to use spdk and dpdk is to eke out
> all the possible performance from the system (cpu/nvme devices) by a more
> integrated design. I can understand that other users of spdk do want the
> higher levels and do need the features provided but for us we have better
> places to do these actions.
> 
> Baruch
> 
> On Thu, May 24, 2018 at 7:44 PM Walker, Benjamin <benjamin.walker(a)intel.com>
> wrote:
> > Hi Baruch,
> >  
> > Regarding blocking threads – only the NVMe initialization path does that
> > today. We considered doing a fully asynchronous initialization path for NVMe
> > devices, but we received a lot of feedback that it makes it very difficult
> > to use. I’d personally be open to adding a second initialization path that
> > was entirely asynchronous and required the user to explicitly poll some
> > initialization context to advance the state machine until the devices come
> > online. This clearly makes much more sense in the context of hot plug since
> > the device initialization may occur on a thread that is also processing I/O.
> > Contributions are always very welcome!
> >  
> > I do expect to see some fairly major movement this year regarding
> > abstractions around threading models and I’d love to define some abstraction
> > that allows SPDK to nicely integrate into applications using green threads
> > or coroutines without tying SPDK to any specific implementation. This is
> > definitely at the front of mind currently (as is dynamic memory management –
> > the other major hurdle for integrating with existing applications). I think
> > over the coming months there are going to be several lively discussions on
> > the mailing list and in IRC as we try to sort all of this out.
> >  
> > Regarding a lower level API – we have some lower level interfaces, although
> > not as low-level as you’re talking about. See spdk_nvme_ctrlr_io_cmd_raw(),
> > for example, which lets you build an arbitrary NVMe command and send it. We
> > haven’t exposed any knobs to control internal queueing or to allow a user to
> > build their own SGL/PRP. Can you outline what the use cases for those are?
> > Why would you want to deviate from what SPDK does by default today? We’re
> > always amenable to providing more control, if that control has value.
> >  
> > Thanks,
> > Ben
> >  
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Baruch Even
> > Sent: Thursday, May 24, 2018 1:17 AM
> > 
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: Re: [SPDK] Handling of physical disk removals
> >  
> > Hi,
> > 
> > I logged the issue as https://github.com/spdk/spdk/issues/310
> > 
> > I do call detach but I do not use the empty probe call, instead I have
> > another part of my application monitor the same uevent information and
> > trigger a larger process to remove the drive. I also have this triggered by
> > the timeout callback. I had to remove the timeout callback or I'd get
> > bombarded with the callback until it actually happened, looks like the
> > timeout callback assumes that the reset call will happen from inside it and
> > complete before it returns but that doesn't work very well with our
> > application.
> > 
> > I have an issue with the threading model you are using as we are using user-
> > space-threads (aka green threads or coroutines) and spdk will block the
> > entire application for no good reason. It would have been nice  if there was
> > a yield callback I can provide you so you won't block the main thread. For
> > now I need to resort to send the spdk admin calls to a thread to be done
> > there since spdk may block.
> > 
> > If I am on a ranting roll I also would have preferred a lower level
> > interface than currently provided that doesn't implement internal queuing
> > and internal allocations so much and doesn't hide all the nvme details (prp,
> > sgl), though by that stage I'm probably better of with unvme instead of
> > spdk. If they had support uio_pci_generic in addition to vfio-pci it would
> > probably be a better direction for me.
> >  
> > Baruch
> >  
> > On Wed, May 23, 2018 at 8:18 PM Harris, James R <james.r.harris(a)intel.com>
> > wrote:
> > > Hi Baruch,
> > >  
> > > Thanks for raising this issue – there are absolutely changes that SPDK
> > > needs to make here.
> > >  
> > > Can you describe your code path a bit more?  Or let me try to guess and
> > > you can tell me where I’m wrong.
> > >  
> > > 1)       You’re using spdk_nvme_probe() with a NULL trid and a remove_cb
> > > handler to detect physically removed devices.
> > > 2)       In your remove_cb handler, you call spdk_nvme_detach().
> > >  
> > > The SPDK bdev nvme module doesn’t call spdk_nvme_detach() in its remove_cb
> > > which is why the SPDK automated tests don’t run into this issue.  But I
> > > don’t this is correct – we should be calling spdk_nvme_detach() at some
> > > point to clean up any allocated resources.  It needs to make sure any
> > > associate IO channels are freed up first (to avoid racing between the
> > > remove callback and different threads submitting IO to that removed
> > > controller).
> > >  
> > > Could you file this as a bug in github?  Please add any additional details
> > > on how you’re hitting this issue if it’s different than what I’ve guessed
> > > above.
> > >  
> > > https://github.com/spdk/spdk/issues
> > >  
> > > Thanks,
> > >  
> > > Jim
> > >  
> > >  
> > > From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <baruch(a)we
> > > ka.io>
> > > Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Date: Wednesday, May 23, 2018 at 1:44 AM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: [SPDK] Handling of physical disk removals
> > >  
> > > Hi,
> > > 
> > > I'm using spdk for local nvme through the nvme interface, I find that
> > > physical disk removals are not handled properly for my use case and wonder
> > > if others see it that way as well and if there is an intention to fix
> > > this.
> > > 
> > > Our system uses long running processes that control one or more disks at a
> > > time, if a disk fails it may drop completely from the pcie bus and it will
> > > also look like that if the disk is physically removed (say a technician
> > > mistakes the disk that he should replace).
> > > 
> > > The problem that I see is that spdk doesnt consider a device completely
> > > disappearing from the bus and will try to release the io qpair by sending
> > > the delete io sq and delete io cq commands, both of these will never get
> > > an answer (the device is not on the pcie device anymore) and there is no
> > > timeout logic in that code path. This means two things, the process will
> > > halt forever and there is an effective memory leak which currently means
> > > that we need to restart the process. Now, our system is resilient enough
> > > that restarting the process is not a big deal but it is a very messy way
> > > to go about handlign a physical drive removal.
> > >  
> > > Have others seen this behavior? Does it bother others?
> > > 
> > > For my own use I put a timeout in there of a few seconds and that solves
> > > it for me.
> > >  
> > > Baruch Even
> > >  
> > > --
> > > 
> > > Baruch Even, Software Developer 
> > > E  baruch(a)weka.io 
> > > www.weka.io
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk
> > 
> > --
> > 
> > Baruch Even, Software Developer 
> > E  baruch(a)weka.io 
> > www.weka.io
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> 
> -- 
> 
> Baruch Even, Software Developer  
> E  baruch(a)weka.io 
> www.weka.io
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Handling of physical disk removals
@ 2018-05-30 17:49 Andrey Kuzmin
  0 siblings, 0 replies; 17+ messages in thread
From: Andrey Kuzmin @ 2018-05-30 17:49 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 11172 bytes --]

On Wed, May 30, 2018 at 2:46 PM, Baruch Even <baruch(a)weka.io> wrote:

> As for threading, the only thing I need to make things integrate better
> into my green-threads is to use a function call to do a context switch if
> needed (spdk_yield() of sorts), that is empty/no-op for most users and can
> be a compile time option (weak symbol? define?) to use some user-provided
> function. This will immediately integrate into any green thread system by
> switching to another user-thread and returning for a poll when other
> actions are taken. This way the posix-thread will not block and there is no
> special async logic that needs to happen.
>
> As for the low level api, my application already has its own queue for
> each device, and we split our requests as we need to and combine them as
> possible to improve performance. It took us by surprise to find that SPDK
> internally breaks out the requests to smaller chunks, queues them
> internally and completions may not really complete the full request. There
> was even a scenario where we had a deadlock because we overflowed the spdk
> queues. Once we found these out we made sure to expose to our application
> all the device constraints and break the IOs with the device limitations in
> mind and now all the extra work and checks that SPDK does are a waste of
> cpu cycles as far as we are concerned.
>
> There is also the issue that we track our IOs, and SPDK, because it does the
> extra work, has IO tracking of its own; that means an extra memory
> allocation on the data path for no good reason from our point of view.
>

A small but mighty step in this direction would be to make
spdk_bdev_submit_io
<https://github.com/spdk/spdk/blob/d34bd0a60bace1b7e4022f12d0892f028b7f2377/lib/bdev/bdev.c#L911>
public. If the caller wishes to manage the setup and completion chaining
(in the stacked case this is trivial to do, as the caller already has a per
I/O context allocated at this point), that would be the path of least
resistance, avoiding the overhead of the extra I/O allocation in
spdk_bdev_{read,write}*. And the code would be much simpler, avoiding the
(often useless) io_type switch.


>
> When I'm building my command in the queue I need to keep the list of
> buffers in some data structure; currently I keep it in a vector and then
> this gets translated to a PRP/SGL when the command is submitted. I could
> skip that step if I could maintain the PRP/SGL myself and pass the
> ready-made list to the command. This would save me some memory and time.
>

If your use case is limited to NVMe hardware, you might want to consider
NVMe passthrough which avoids the buffer setup overhead, and also gives you
complete control over completion status.
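
A rough sketch of that route for a plain read, assuming an already
initialized controller/qpair and a DMA-able buffer (this is the raw command
interface referenced elsewhere in the thread; in nvme.h the name is
spdk_nvme_ctrlr_cmd_io_raw()). Field packing follows the NVMe spec
directly, with no SPDK helper in between:

#include "spdk/nvme.h"

static int
passthru_read(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_qpair *qpair,
              uint32_t nsid, void *buf, uint64_t lba, uint32_t lba_count,
              uint32_t sector_size, spdk_nvme_cmd_cb cb_fn, void *cb_arg)
{
    struct spdk_nvme_cmd cmd = {};

    cmd.opc = SPDK_NVME_OPC_READ;
    cmd.nsid = nsid;
    cmd.cdw10 = (uint32_t)lba;          /* starting LBA, low dword */
    cmd.cdw11 = (uint32_t)(lba >> 32);  /* starting LBA, high dword */
    cmd.cdw12 = lba_count - 1;          /* number of blocks, 0's based */

    return spdk_nvme_ctrlr_cmd_io_raw(ctrlr, qpair, &cmd, buf,
                                      lba_count * sector_size,
                                      cb_fn, cb_arg);
}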

Regards,
Andrey


>
> All of this just means that spdk is wasting some cpu cycles on things we
> don't want it to do and after all the main reason to use spdk and dpdk is
> to eke out all the possible performance from the system (cpu/nvme devices)
> by a more integrated design. I can understand that other users of spdk do
> want the higher levels and do need the features provided but for us we have
> better places to do these actions.
>
> Baruch
>
> On Thu, May 24, 2018 at 7:44 PM Walker, Benjamin <
> benjamin.walker(a)intel.com> wrote:
>
>> Hi Baruch,
>>
>>
>>
>> Regarding blocking threads – only the NVMe initialization path does that
>> today. We considered doing a fully asynchronous initialization path for
>> NVMe devices, but we received a lot of feedback that it makes it very
>> difficult to use. I’d personally be open to adding a second initialization
>> path that was entirely asynchronous and required the user to explicitly
>> poll some initialization context to advance the state machine until the
>> devices come online. This clearly makes much more sense in the context of
>> hot plug since the device initialization may occur on a thread that is also
>> processing I/O. Contributions are always very welcome!
>>
>>
>>
>> I do expect to see some fairly major movement this year regarding
>> abstractions around threading models and I’d love to define some
>> abstraction that allows SPDK to nicely integrate into applications using
>> green threads or coroutines without tying SPDK to any specific
>> implementation. This is definitely at the front of mind currently (as is
>> dynamic memory management – the other major hurdle for integrating with
>> existing applications). I think over the coming months there are going to
>> be several lively discussions on the mailing list and in IRC as we try to
>> sort all of this out.
>>
>>
>>
>> Regarding a lower level API – we have some lower level interfaces,
>> although not as low-level as you’re talking about. See
>> spdk_nvme_ctrlr_io_cmd_raw()
>> <http://www.spdk.io/doc/nvme_8h.html#a1e3def668122e76abbfb74305f118291>,
>> for example, which lets you build an arbitrary NVMe command and send it. We
>> haven’t exposed any knobs to control internal queueing or to allow a user
>> to build their own SGL/PRP. Can you outline what the use cases for those
>> are? Why would you want to deviate from what SPDK does by default today?
>> We’re always amenable to providing more control, if that control has value.
>>
>>
>>
>> Thanks,
>>
>> Ben
>>
>>
>>
>> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Baruch
>> Even
>> *Sent:* Thursday, May 24, 2018 1:17 AM
>>
>>
>> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
>>
>> *Subject:* Re: [SPDK] Handling of physical disk removals
>>
>>
>>
>> Hi,
>>
>> I logged the issue as https://github.com/spdk/spdk/issues/310
>>
>> I do call detach but I do not use the empty probe call, instead I have
>> another part of my application monitor the same uevent information and
>> trigger a larger process to remove the drive. I also have this triggered by
>> the timeout callback. I had to remove the timeout callback or I'd get
>> bombarded with the callback until it actually happened, looks like the
>> timeout callback assumes that the reset call will happen from inside it and
>> complete before it returns but that doesn't work very well with our
>> application.
>>
>> I have an issue with the threading model you are using as we are using
>> user-space-threads (aka green threads or coroutines) and spdk will block
>> the entire application for no good reason. It would have been nice if there
>> was a yield callback I can provide you so you won't block the main thread.
>> For now I need to resort to send the spdk admin calls to a thread to be
>> done there since spdk may block.
>>
>> If I am on a ranting roll I also would have preferred a lower level
>> interface than currently provided that doesn't implement internal queuing
>> and internal allocations so much and doesn't hide all the nvme details
>> (prp, sgl), though by that stage I'm probably better of with unvme instead
>> of spdk. If they had support uio_pci_generic in addition to vfio-pci it
>> would probably be a better direction for me.
>>
>>
>>
>> Baruch
>>
>>
>>
>> On Wed, May 23, 2018 at 8:18 PM Harris, James R <james.r.harris(a)intel.com>
>> wrote:
>>
>> Hi Baruch,
>>
>>
>>
>> Thanks for raising this issue – there are absolutely changes that SPDK
>> needs to make here.
>>
>>
>>
>> Can you describe your code path a bit more?  Or let me try to guess and
>> you can tell me where I’m wrong.
>>
>>
>>
>> 1)       You’re using spdk_nvme_probe() with a NULL trid and a remove_cb
>> handler to detect physically removed devices.
>>
>> 2)       In your remove_cb handler, you call spdk_nvme_detach().
>>
>>
>>
>> The SPDK bdev nvme module doesn’t call spdk_nvme_detach() in its
>> remove_cb which is why the SPDK automated tests don’t run into this issue.
>> But I don’t this is correct – we should be calling spdk_nvme_detach() at
>> some point to clean up any allocated resources.  It needs to make sure any
>> associate IO channels are freed up first (to avoid racing between the
>> remove callback and different threads submitting IO to that removed
>> controller).
>>
>>
>>
>> Could you file this as a bug in github?  Please add any additional
>> details on how you’re hitting this issue if it’s different than what I’ve
>> guessed above.
>>
>>
>>
>> https://github.com/spdk/spdk/issues
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Jim
>>
>>
>>
>>
>>
>> *From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <
>> baruch(a)weka.io>
>> *Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Date: *Wednesday, May 23, 2018 at 1:44 AM
>> *To: *Storage Performance Development Kit <spdk(a)lists.01.org>
>> *Subject: *[SPDK] Handling of physical disk removals
>>
>>
>>
>> Hi,
>>
>> I'm using spdk for local nvme through the nvme interface, I find that
>> physical disk removals are not handled properly for my use case and wonder
>> if others see it that way as well and if there is an intention to fix this.
>>
>> Our system uses long running processes that control one or more disks at
>> a time, if a disk fails it may drop completely from the pcie bus and it
>> will also look like that if the disk is physically removed (say a
>> technician mistakes the disk that he should replace).
>>
>> The problem that I see is that spdk doesnt consider a device completely
>> disappearing from the bus and will try to release the io qpair by sending
>> the delete io sq and delete io cq commands, both of these will never get an
>> answer (the device is not on the pcie device anymore) and there is no
>> timeout logic in that code path. This means two things, the process will
>> halt forever and there is an effective memory leak which currently means
>> that we need to restart the process. Now, our system is resilient enough
>> that restarting the process is not a big deal but it is a very messy way to
>> go about handlign a physical drive removal.
>>
>>
>>
>> Have others seen this behavior? Does it bother others?
>>
>> For my own use I put a timeout in there of a few seconds and that solves
>> it for me.
>>
>>
>>
>> Baruch Even
>>
>>
>>
>> --
>>
>> [image: image001.jpg]
>> * Baruch Even, **Software Developer  *
>>
>> *E * baruch(a)weka.io <liran(a)weka.io>
>> www.weka.io
>>
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
>> --
>>
>> [image: ~WRD000.jpg]
>> * Baruch Even, **Software Developer  *
>>
>> *E * baruch(a)weka.io <liran(a)weka.io>
>> www.weka.io
>> _______________________________________________
>> SPDK mailing list
>> SPDK(a)lists.01.org
>> https://lists.01.org/mailman/listinfo/spdk
>>
> --
>
>
>
> *Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
> www.weka.io <http://www.weka.io>*
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 26783 bytes --]

[-- Attachment #3: image001.jpg --]
[-- Type: image/jpeg, Size: 476 bytes --]

[-- Attachment #4: WRD000.jpg --]
[-- Type: image/jpeg, Size: 823 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Handling of physical disk removals
@ 2018-05-30 11:46 Baruch Even
  0 siblings, 0 replies; 17+ messages in thread
From: Baruch Even @ 2018-05-30 11:46 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 9843 bytes --]

As for threading, the only thing I need to make things integrate better
into my green threads is a function call that does a context switch if
needed (a spdk_yield() of sorts); it is an empty no-op for most users and
can be a compile-time option (weak symbol? define?) to use some
user-provided function. This would immediately integrate with any green
thread system by switching to another user thread and coming back to poll
when other actions are taken. This way the posix thread will not block and
no special async logic needs to happen.
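
A hypothetical sketch of that hook (none of this exists in SPDK today): the
library ships a no-op weak default and calls it from its wait loops, and an
application using green threads overrides it with a strong symbol at link
time. my_fiber_yield() stands in for whatever the scheduler provides:

/* in the SPDK library */
void __attribute__((weak))
spdk_yield(void)
{
    /* no-op: ordinary users keep today's busy-wait behaviour */
}

/* in the application, in a separate object file */
extern void my_fiber_yield(void);   /* placeholder for the app's scheduler */

void
spdk_yield(void)
{
    /* switch to another user-space thread; come back later to poll again */
    my_fiber_yield();
}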

As for the low level api, my application already has its own queue for each
device, and we split our requests as we need to and combine them where
possible to improve performance. It took us by surprise to find that SPDK
internally breaks out the requests to smaller chunks, queues them
internally and completions may not really complete the full request. There
was even was scenario that we had a deadlock because we overflowed the spdk
queues. Once we found these out we made sure to expose to our application
all the device constraints and break the IOs with the device limitations in
mind and now all the extra work and checks that SPDK does are a waste of
cpu cycles as far as we are concerned.
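
The constraints themselves can be queried from SPDK - a small sketch,
assuming the ns/ctrlr helpers and the io_qpair opts structure available in
current releases:

#include <stdio.h>
#include "spdk/nvme.h"

static void
dump_limits(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
{
    struct spdk_nvme_io_qpair_opts opts;

    spdk_nvme_ctrlr_get_default_io_qpair_opts(ctrlr, &opts, sizeof(opts));

    printf("max transfer size: %u bytes\n",
           spdk_nvme_ns_get_max_io_xfer_size(ns));
    printf("sector size:       %u bytes\n",
           spdk_nvme_ns_get_sector_size(ns));
    printf("io queue size:     %u entries, %u request objects\n",
           opts.io_queue_size, opts.io_queue_requests);
}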

There is also the issue that we track our IOs, and SPDK, because it does
the extra work, has IO tracking of its own; that means an extra memory
allocation on the data path for no good reason from our point of view.

When I'm building my command in the queue I need to keep the list of
buffers in some data structure; currently I keep it in a vector and then
this gets translated to a PRP/SGL when the command is submitted. I could
skip that step if I could maintain the PRP/SGL myself and pass the
ready-made list to the command. This would save me some memory and time.

All of this just means that spdk is wasting some cpu cycles on things we
don't want it to do and after all the main reason to use spdk and dpdk is
to eke out all the possible performance from the system (cpu/nvme devices)
by a more integrated design. I can understand that other users of spdk do
want the higher levels and do need the features provided but for us we have
better places to do these actions.

Baruch

On Thu, May 24, 2018 at 7:44 PM Walker, Benjamin <benjamin.walker(a)intel.com>
wrote:

> Hi Baruch,
>
>
>
> Regarding blocking threads – only the NVMe initialization path does that
> today. We considered doing a fully asynchronous initialization path for
> NVMe devices, but we received a lot of feedback that it makes it very
> difficult to use. I’d personally be open to adding a second initialization
> path that was entirely asynchronous and required the user to explicitly
> poll some initialization context to advance the state machine until the
> devices come online. This clearly makes much more sense in the context of
> hot plug since the device initialization may occur on a thread that is also
> processing I/O. Contributions are always very welcome!
>
>
>
> I do expect to see some fairly major movement this year regarding
> abstractions around threading models and I’d love to define some
> abstraction that allows SPDK to nicely integrate into applications using
> green threads or coroutines without tying SPDK to any specific
> implementation. This is definitely at the front of mind currently (as is
> dynamic memory management – the other major hurdle for integrating with
> existing applications). I think over the coming months there are going to
> be several lively discussions on the mailing list and in IRC as we try to
> sort all of this out.
>
>
>
> Regarding a lower level API – we have some lower level interfaces,
> although not as low-level as you’re talking about. See
> spdk_nvme_ctrlr_io_cmd_raw()
> <http://www.spdk.io/doc/nvme_8h.html#a1e3def668122e76abbfb74305f118291>,
> for example, which lets you build an arbitrary NVMe command and send it. We
> haven’t exposed any knobs to control internal queueing or to allow a user
> to build their own SGL/PRP. Can you outline what the use cases for those
> are? Why would you want to deviate from what SPDK does by default today?
> We’re always amenable to providing more control, if that control has value.
>
>
>
> Thanks,
>
> Ben
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Baruch Even
> *Sent:* Thursday, May 24, 2018 1:17 AM
>
>
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
>
> *Subject:* Re: [SPDK] Handling of physical disk removals
>
>
>
> Hi,
>
> I logged the issue as https://github.com/spdk/spdk/issues/310
>
> I do call detach but I do not use the empty probe call, instead I have
> another part of my application monitor the same uevent information and
> trigger a larger process to remove the drive. I also have this triggered by
> the timeout callback. I had to remove the timeout callback or I'd get
> bombarded with the callback until it actually happened, looks like the
> timeout callback assumes that the reset call will happen from inside it and
> complete before it returns but that doesn't work very well with our
> application.
>
> I have an issue with the threading model you are using as we are using
> user-space-threads (aka green threads or coroutines) and spdk will block
> the entire application for no good reason. It would have been nice if there
> was a yield callback I can provide you so you won't block the main thread.
> For now I need to resort to send the spdk admin calls to a thread to be
> done there since spdk may block.
>
> If I am on a ranting roll I also would have preferred a lower level
> interface than currently provided that doesn't implement internal queuing
> and internal allocations so much and doesn't hide all the nvme details
> (prp, sgl), though by that stage I'm probably better of with unvme instead
> of spdk. If they had support uio_pci_generic in addition to vfio-pci it
> would probably be a better direction for me.
>
>
>
> Baruch
>
>
>
> On Wed, May 23, 2018 at 8:18 PM Harris, James R <james.r.harris(a)intel.com>
> wrote:
>
> Hi Baruch,
>
>
>
> Thanks for raising this issue – there are absolutely changes that SPDK
> needs to make here.
>
>
>
> Can you describe your code path a bit more?  Or let me try to guess and
> you can tell me where I’m wrong.
>
>
>
> 1)       You’re using spdk_nvme_probe() with a NULL trid and a remove_cb
> handler to detect physically removed devices.
>
> 2)       In your remove_cb handler, you call spdk_nvme_detach().
>
>
>
> The SPDK bdev nvme module doesn’t call spdk_nvme_detach() in its remove_cb
> which is why the SPDK automated tests don’t run into this issue.  But I
> don’t this is correct – we should be calling spdk_nvme_detach() at some
> point to clean up any allocated resources.  It needs to make sure any
> associate IO channels are freed up first (to avoid racing between the
> remove callback and different threads submitting IO to that removed
> controller).
>
>
>
> Could you file this as a bug in github?  Please add any additional details
> on how you’re hitting this issue if it’s different than what I’ve guessed
> above.
>
>
>
> https://github.com/spdk/spdk/issues
>
>
>
> Thanks,
>
>
>
> Jim
>
>
>
>
>
> *From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <
> baruch(a)weka.io>
> *Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Date: *Wednesday, May 23, 2018 at 1:44 AM
> *To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject: *[SPDK] Handling of physical disk removals
>
>
>
> Hi,
>
> I'm using spdk for local nvme through the nvme interface, I find that
> physical disk removals are not handled properly for my use case and wonder
> if others see it that way as well and if there is an intention to fix this.
>
> Our system uses long running processes that control one or more disks at a
> time, if a disk fails it may drop completely from the pcie bus and it will
> also look like that if the disk is physically removed (say a technician
> mistakes the disk that he should replace).
>
> The problem that I see is that spdk doesnt consider a device completely
> disappearing from the bus and will try to release the io qpair by sending
> the delete io sq and delete io cq commands, both of these will never get an
> answer (the device is not on the pcie device anymore) and there is no
> timeout logic in that code path. This means two things, the process will
> halt forever and there is an effective memory leak which currently means
> that we need to restart the process. Now, our system is resilient enough
> that restarting the process is not a big deal but it is a very messy way to
> go about handlign a physical drive removal.
>
>
>
> Have others seen this behavior? Does it bother others?
>
> For my own use I put a timeout in there of a few seconds and that solves
> it for me.
>
>
>
> Baruch Even
>
>
>
> --
>
> [image: image001.jpg]
> * Baruch Even, **Software Developer  *
>
> *E * baruch(a)weka.io <liran(a)weka.io>
> www.weka.io
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
> --
>
> [image: ~WRD000.jpg]
> * Baruch Even, **Software Developer  *
>
> *E * baruch(a)weka.io <liran(a)weka.io>
> www.weka.io
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 



*Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
www.weka.io <http://www.weka.io>*

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 23934 bytes --]

[-- Attachment #3: WRD000.jpg --]
[-- Type: image/jpeg, Size: 823 bytes --]

[-- Attachment #4: image001.jpg --]
[-- Type: image/jpeg, Size: 476 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Handling of physical disk removals
@ 2018-05-24 16:44 Walker, Benjamin
  0 siblings, 0 replies; 17+ messages in thread
From: Walker, Benjamin @ 2018-05-24 16:44 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6880 bytes --]

Hi Baruch,

Regarding blocking threads – only the NVMe initialization path does that today. We considered doing a fully asynchronous initialization path for NVMe devices, but we received a lot of feedback that it makes it very difficult to use. I’d personally be open to adding a second initialization path that was entirely asynchronous and required the user to explicitly poll some initialization context to advance the state machine until the devices come online. This clearly makes much more sense in the context of hot plug since the device initialization may occur on a thread that is also processing I/O. Contributions are always very welcome!

I do expect to see some fairly major movement this year regarding abstractions around threading models and I’d love to define some abstraction that allows SPDK to nicely integrate into applications using green threads or coroutines without tying SPDK to any specific implementation. This is definitely at the front of mind currently (as is dynamic memory management – the other major hurdle for integrating with existing applications). I think over the coming months there are going to be several lively discussions on the mailing list and in IRC as we try to sort all of this out.

Regarding a lower level API – we have some lower level interfaces, although not as low-level as you’re talking about. See spdk_nvme_ctrlr_io_cmd_raw()<http://www.spdk.io/doc/nvme_8h.html#a1e3def668122e76abbfb74305f118291>, for example, which lets you build an arbitrary NVMe command and send it. We haven’t exposed any knobs to control internal queueing or to allow a user to build their own SGL/PRP. Can you outline what the use cases for those are? Why would you want to deviate from what SPDK does by default today? We’re always amenable to providing more control, if that control has value.

Thanks,
Ben

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Baruch Even
Sent: Thursday, May 24, 2018 1:17 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Handling of physical disk removals

Hi,
I logged the issue as https://github.com/spdk/spdk/issues/310
I do call detach but I do not use the empty probe call, instead I have another part of my application monitor the same uevent information and trigger a larger process to remove the drive. I also have this triggered by the timeout callback. I had to remove the timeout callback or I'd get bombarded with the callback until it actually happened, looks like the timeout callback assumes that the reset call will happen from inside it and complete before it returns but that doesn't work very well with our application.
I have an issue with the threading model you are using as we are using user-space-threads (aka green threads or coroutines) and spdk will block the entire application for no good reason. It would have been nice if there was a yield callback I can provide you so you won't block the main thread. For now I need to resort to send the spdk admin calls to a thread to be done there since spdk may block.
If I am on a ranting roll I also would have preferred a lower level interface than currently provided that doesn't implement internal queuing and internal allocations so much and doesn't hide all the nvme details (prp, sgl), though by that stage I'm probably better of with unvme instead of spdk. If they had support uio_pci_generic in addition to vfio-pci it would probably be a better direction for me.

Baruch

On Wed, May 23, 2018 at 8:18 PM Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>> wrote:
Hi Baruch,

Thanks for raising this issue – there are absolutely changes that SPDK needs to make here.

Can you describe your code path a bit more?  Or let me try to guess and you can tell me where I’m wrong.


1)       You’re using spdk_nvme_probe() with a NULL trid and a remove_cb handler to detect physically removed devices.

2)       In your remove_cb handler, you call spdk_nvme_detach().

The SPDK bdev nvme module doesn’t call spdk_nvme_detach() in its remove_cb which is why the SPDK automated tests don’t run into this issue.  But I don’t this is correct – we should be calling spdk_nvme_detach() at some point to clean up any allocated resources.  It needs to make sure any associate IO channels are freed up first (to avoid racing between the remove callback and different threads submitting IO to that removed controller).

Could you file this as a bug in github?  Please add any additional details on how you’re hitting this issue if it’s different than what I’ve guessed above.

https://github.com/spdk/spdk/issues

Thanks,

Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Baruch Even <baruch(a)weka.io<mailto:baruch(a)weka.io>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Wednesday, May 23, 2018 at 1:44 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Handling of physical disk removals

Hi,
I'm using spdk for local nvme through the nvme interface, I find that physical disk removals are not handled properly for my use case and wonder if others see it that way as well and if there is an intention to fix this.
Our system uses long running processes that control one or more disks at a time, if a disk fails it may drop completely from the pcie bus and it will also look like that if the disk is physically removed (say a technician mistakes the disk that he should replace).
The problem that I see is that spdk doesnt consider a device completely disappearing from the bus and will try to release the io qpair by sending the delete io sq and delete io cq commands, both of these will never get an answer (the device is not on the pcie device anymore) and there is no timeout logic in that code path. This means two things, the process will halt forever and there is an effective memory leak which currently means that we need to restart the process. Now, our system is resilient enough that restarting the process is not a big deal but it is a very messy way to go about handlign a physical drive removal.

Have others seen this behavior? Does it bother others?
For my own use I put a timeout in there of a few seconds and that solves it for me.

Baruch Even

--
Baruch Even, Software Developer
E  baruch(a)weka.io<mailto:liran(a)weka.io>
www.weka.io<http://www.weka.io>
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk
--
Baruch Even, Software Developer
E  baruch(a)weka.io<mailto:liran(a)weka.io>
www.weka.io<http://www.weka.io>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 20578 bytes --]

[-- Attachment #3: WRD000.jpg --]
[-- Type: image/jpeg, Size: 823 bytes --]

[-- Attachment #4: image001.jpg --]
[-- Type: image/jpeg, Size: 476 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Handling of physical disk removals
@ 2018-05-24  8:16 Baruch Even
  0 siblings, 0 replies; 17+ messages in thread
From: Baruch Even @ 2018-05-24  8:16 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4822 bytes --]

Hi,

I logged the issue as https://github.com/spdk/spdk/issues/310

I do call detach, but I do not use the empty probe call; instead I have
another part of my application monitor the same uevent information and
trigger a larger process to remove the drive. I also have this triggered by
the timeout callback. I had to remove the timeout callback or I'd get
bombarded with the callback until the removal actually happened; it looks
like the timeout callback assumes that the reset call will happen from
inside it and complete before it returns, but that doesn't work very well
with our application.

I have an issue with the threading model you are using, as we are using
user-space threads (aka green threads or coroutines) and spdk will block
the entire application for no good reason. It would have been nice if there
were a yield callback I could provide so you won't block the main thread.
For now I have to resort to sending the spdk admin calls to a separate
thread, since spdk may block.
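
A minimal sketch of that workaround (the admin_queue helpers here are
illustrative application code, not SPDK APIs): blocking SPDK admin
operations get handed to a dedicated pthread so the user-space threads are
never stalled.

#include <pthread.h>
#include <stdbool.h>

struct admin_job {
    void (*fn)(void *arg);   /* e.g. a wrapper around spdk_nvme_probe() */
    void  *arg;
};

/* hypothetical MPSC queue owned by the application */
struct admin_queue;
bool admin_queue_pop(struct admin_queue *q, struct admin_job *job);

static void *
admin_thread(void *ctx)
{
    struct admin_queue *q = ctx;
    struct admin_job job;

    while (admin_queue_pop(q, &job)) {  /* blocks until work arrives */
        job.fn(job.arg);                /* may block inside SPDK; fine here */
    }
    return NULL;
}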

If I am on a ranting roll, I also would have preferred a lower level
interface than the one currently provided, one that doesn't implement so
much internal queuing and internal allocation and doesn't hide all the nvme
details (prp, sgl), though at that stage I'm probably better off with unvme
instead of spdk. If it supported uio_pci_generic in addition to vfio-pci it
would probably be a better direction for me.

Baruch

On Wed, May 23, 2018 at 8:18 PM Harris, James R <james.r.harris(a)intel.com>
wrote:

> Hi Baruch,
>
>
>
> Thanks for raising this issue – there are absolutely changes that SPDK
> needs to make here.
>
>
>
> Can you describe your code path a bit more?  Or let me try to guess and
> you can tell me where I’m wrong.
>
>
>
> 1)       You’re using spdk_nvme_probe() with a NULL trid and a remove_cb
> handler to detect physically removed devices.
>
> 2)       In your remove_cb handler, you call spdk_nvme_detach().
>
>
>
> The SPDK bdev nvme module doesn’t call spdk_nvme_detach() in its remove_cb
> which is why the SPDK automated tests don’t run into this issue.  But I
> don’t this is correct – we should be calling spdk_nvme_detach() at some
> point to clean up any allocated resources.  It needs to make sure any
> associate IO channels are freed up first (to avoid racing between the
> remove callback and different threads submitting IO to that removed
> controller).
>
>
>
> Could you file this as a bug in github?  Please add any additional details
> on how you’re hitting this issue if it’s different than what I’ve guessed
> above.
>
>
>
> https://github.com/spdk/spdk/issues
>
>
>
> Thanks,
>
>
>
> Jim
>
>
>
>
>
> *From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <
> baruch(a)weka.io>
> *Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Date: *Wednesday, May 23, 2018 at 1:44 AM
> *To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject: *[SPDK] Handling of physical disk removals
>
>
>
> Hi,
>
> I'm using spdk for local nvme through the nvme interface, I find that
> physical disk removals are not handled properly for my use case and wonder
> if others see it that way as well and if there is an intention to fix this.
>
> Our system uses long running processes that control one or more disks at a
> time, if a disk fails it may drop completely from the pcie bus and it will
> also look like that if the disk is physically removed (say a technician
> mistakes the disk that he should replace).
>
> The problem that I see is that spdk doesnt consider a device completely
> disappearing from the bus and will try to release the io qpair by sending
> the delete io sq and delete io cq commands, both of these will never get an
> answer (the device is not on the pcie device anymore) and there is no
> timeout logic in that code path. This means two things, the process will
> halt forever and there is an effective memory leak which currently means
> that we need to restart the process. Now, our system is resilient enough
> that restarting the process is not a big deal but it is a very messy way to
> go about handlign a physical drive removal.
>
>
>
> Have others seen this behavior? Does it bother others?
>
> For my own use I put a timeout in there of a few seconds and that solves
> it for me.
>
>
>
> Baruch Even
>
>
>
> --
>
>
> * Baruch Even, **Software Developer  *
>
> *E * baruch(a)weka.io <liran(a)weka.io>
> www.weka.io
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
>
-- 



*Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
www.weka.io <http://www.weka.io>*

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 11700 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [SPDK] Handling of physical disk removals
@ 2018-05-23 17:18 Harris, James R
  0 siblings, 0 replies; 17+ messages in thread
From: Harris, James R @ 2018-05-23 17:18 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 2946 bytes --]

Hi Baruch,

Thanks for raising this issue – there are absolutely changes that SPDK needs to make here.

Can you describe your code path a bit more?  Or let me try to guess and you can tell me where I’m wrong.


1)       You’re using spdk_nvme_probe() with a NULL trid and a remove_cb handler to detect physically removed devices.

2)       In your remove_cb handler, you call spdk_nvme_detach().

The SPDK bdev nvme module doesn’t call spdk_nvme_detach() in its remove_cb which is why the SPDK automated tests don’t run into this issue.  But I don’t think this is correct – we should be calling spdk_nvme_detach() at some point to clean up any allocated resources.  It needs to make sure any associated IO channels are freed up first (to avoid racing between the remove callback and different threads submitting IO to that removed controller).
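
An illustrative sketch only (not the bdev module’s code) of a remove_cb that respects that ordering; the disk lookup and the cross-thread qpair teardown are hypothetical application pieces:

#include <stdbool.h>
#include "spdk/nvme.h"

struct my_disk {
    volatile bool removed;
    /* ... application bookkeeping ... */
};

/* hypothetical application helpers */
struct my_disk *my_lookup_disk(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr);
void my_wait_for_io_threads(struct my_disk *disk);

static void
my_remove_cb(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr)
{
    struct my_disk *disk = my_lookup_disk(cb_ctx, ctrlr);

    disk->removed = true;            /* stop submitting new I/O */

    /* Each I/O thread frees its own qpair with
     * spdk_nvme_ctrlr_free_io_qpair(); only after the last one is gone: */
    my_wait_for_io_threads(disk);

    spdk_nvme_detach(ctrlr);         /* now release controller resources */
}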

Could you file this as a bug in github?  Please add any additional details on how you’re hitting this issue if it’s different than what I’ve guessed above.

https://github.com/spdk/spdk/issues

Thanks,

Jim


From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Baruch Even <baruch(a)weka.io>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Wednesday, May 23, 2018 at 1:44 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Handling of physical disk removals

Hi,
I'm using spdk for local nvme through the nvme interface, I find that physical disk removals are not handled properly for my use case and wonder if others see it that way as well and if there is an intention to fix this.
Our system uses long running processes that control one or more disks at a time, if a disk fails it may drop completely from the pcie bus and it will also look like that if the disk is physically removed (say a technician mistakes the disk that he should replace).
The problem that I see is that spdk doesnt consider a device completely disappearing from the bus and will try to release the io qpair by sending the delete io sq and delete io cq commands, both of these will never get an answer (the device is not on the pcie device anymore) and there is no timeout logic in that code path. This means two things, the process will halt forever and there is an effective memory leak which currently means that we need to restart the process. Now, our system is resilient enough that restarting the process is not a big deal but it is a very messy way to go about handlign a physical drive removal.

Have others seen this behavior? Does it bother others?
For my own use I put a timeout in there of a few seconds and that solves it for me.

Baruch Even

--
Baruch Even, Software Developer

E  baruch(a)weka.io<mailto:liran(a)weka.io>
www.weka.io<http://www.weka.io>

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 11409 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [SPDK] Handling of physical disk removals
@ 2018-05-23  8:44 Baruch Even
  0 siblings, 0 replies; 17+ messages in thread
From: Baruch Even @ 2018-05-23  8:44 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 1426 bytes --]

Hi,

I'm using spdk for local nvme through the nvme interface. I find that
physical disk removals are not handled properly for my use case and wonder
if others see it that way as well, and if there is an intention to fix this.

Our system uses long-running processes that control one or more disks at a
time. If a disk fails it may drop completely from the pcie bus, and it will
look the same if the disk is physically removed (say a technician mistakes
which disk he should replace).

The problem that I see is that spdk doesn't consider a device completely
disappearing from the bus and will try to release the io qpair by sending
the delete io sq and delete io cq commands; both of these will never get an
answer (the device is not on the pcie bus anymore) and there is no timeout
logic in that code path. This means two things: the process will halt
forever, and there is an effective memory leak, which currently means that
we need to restart the process. Now, our system is resilient enough that
restarting the process is not a big deal, but it is a very messy way to go
about handling a physical drive removal.

Have others seen this behavior? Does it bother others?

For my own use I put a timeout in there of a few seconds and that solves it
for me.

Baruch Even

-- 



*Baruch Even, Software Developer  E  baruch(a)weka.io <liran(a)weka.io>
www.weka.io <http://www.weka.io>*

[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 2775 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-06-03  8:35 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-01  8:25 [SPDK] Handling of physical disk removals Baruch Even
  -- strict thread matches above, loose matches on Subject: below --
2018-06-03  8:35 Baruch Even
2018-06-01 16:54 Harris, James R
2018-06-01  8:27 Baruch Even
2018-05-31 19:01 Andrey Kuzmin
2018-05-31 16:33 Verkamp, Daniel
2018-05-31 16:24 Harris, James R
2018-05-31  7:54 Andrey Kuzmin
2018-05-31  7:37 Baruch Even
2018-05-31  7:24 Baruch Even
2018-05-30 21:27 Walker, Benjamin
2018-05-30 17:49 Andrey Kuzmin
2018-05-30 11:46 Baruch Even
2018-05-24 16:44 Walker, Benjamin
2018-05-24  8:16 Baruch Even
2018-05-23 17:18 Harris, James R
2018-05-23  8:44 Baruch Even
