Hi Shahar,


First I apologize I didn't look the patch in detail before reply.


I know updating SPDK from 17.07 to 18.01 is not easy task.
However creating own framework only on the DPDK lockless ring looks cumbersome for me.

Thanks for your interesting proposal again and I'm sorry that I can't do enough effort to the communitization in this direction immediately. 

Thanks,
Shuhei


差出人: SPDK <spdk-bounces@lists.01.org> が Shahar Salzman <shahar.salzman@kaminario.com> の代理で送信
送信日時: 2018年5月10日 4:45
宛先: spdk@lists.01.org
件名: [!]Re: [SPDK] SPDK + user space appliance
 

Hi Ben,


You make a good point.

We are looking for a stable API, so given a stable API which is not under bdev_internal, I can port the entire bdev_user module into my glue API, and use it as my spdk gateway.

But there is a benefit to pushing this to the community, since it would seem that the semantics (e.g. register_device, submit/poll for IO, complete IO) are very similar to those attempting to integrate spdk to a storage appliance (I would love to hear from Shuhei Matsumoto or  Eyal Harari which seem to be attempting a similar integration of spdk to their products).


Using this sort of API is much easier than having to understand an elaborate bdev API, spdk threading concerns, reactor issues etc.


In addition, since the semantics are similar, I would imagine that if each user writes their own integration module, we end up with multple versions of the same code.

All parties eventually benefit from the "communitization" since we can openly discuss the interface, and code quality of a common module should be much better than any "private" copy.


I also agree, that if this module goes upstream it needs to be paired with a good unit test running in CI so that as the internal interface evolves and as bdev changes, module breakages are easily detected, otherwise issues need to be solved upon integrating a new version.


Shahar




From: SPDK <spdk-bounces@lists.01.org> on behalf of Walker, Benjamin <benjamin.walker@intel.com>
Sent: Wednesday, May 9, 2018 7:52:38 PM
To: spdk@lists.01.org
Subject: Re: [SPDK] SPDK + user space appliance
 
On Wed, 2018-05-09 at 07:48 +0000, Shahar Salzman wrote:
> Hi Ben,
>
> To answer your question first, the main reason to have a generic spdk IO API
> for storage appliance is being able to maintain functionality over versions.
> We started with our module in version 17.07, but when we started integrating
> it into 17.07.01, and the 18.X version we started running into issues. Since
> the bdev layer is internal to spdk, it makes sense that it will change both in
> functionality and in interface from time to time especially in NVMeF where the
> specifications are so young, which is the reason that we would like to have an
> API which is more stable over versions.
> As a storage appliance, the requirements are fairly stable, and at least in
> the case of SCSI, a thin "glue API" had been enough to get the core appliance
> work with multiple frontend targets (e.g. both SCSI and internal replication).
> I believe that the same method can be applied to NVMEF.
> In addition, I think that our use case is not unique to the way we are
> integrating spdk, which means that the work we put in can benefit other users,
> and hopefully via their experience, the code can become more robust and
> useful.

Thank you for this - now everything is much clearer. I totally agree with what
you're saying about the API being unstable and internal. That makes it quite
difficult to write a bdev module that continues to function across releases.
However, I don't think trading one API (the bdev module API) for another (the
proposed bdev user API) fixes anything at a fundamental level. This is because
the API changes we've made to the bdev module API are necessary to correctly
implement some types of bdev modules (although probably not yours currently).
Over time, you'll just end up propagating all of those changes into the newly
proposed bdev user API in order to make it suitable for general use and we won't
be in any better of a spot than we are today.

I do think this is a problem that needs to be solved as soon as possible though.
 The solution, in my mind, is to start the work of defining a public bdev module
API in include/spdk/bdev_module.h. It will take some work to define the API in a
way that as many of the implementation details are hidden as possible, but it is
worth the effort. Most of the module API currently resides in
include/spdk_internal/bdev.h, but there is a lot of stuff in there that doesn't
need to become public.

Once we have a well defined public interface, we can take steps to avoid
breaking it going forward. I'm giving a talk on API stability at the SPDK summit
next week that will touch on some of the strategies we plan to employ to keep
APIs stable over time.

I'll wait for more feedback to see if we can build some consensus about the best
path forward here. I'm interested to see if there are any other perspectives
that may view this a bit differently.

>
> Shahar
> From: SPDK <spdk-bounces@lists.01.org> on behalf of Walker, Benjamin <benjamin
> .walker@intel.com>
> Sent: Tuesday, May 8, 2018 8:30:20 PM
> To: spdk@lists.01.org
> Subject: Re: [SPDK] SPDK + user space appliance

> On Tue, 2018-05-08 at 07:36 +0000, Shahar Salzman wrote:
> > Hi Jim and Ben,
> >
> > For the threading issue, I agree that there is something not very clean in
> the
> > interface, as there is an assumption on how the user implements it. As I did
> > in the bdev_user_example, we also use a ring in order to place all the
> > incoming IO without delaying the reactor, and then use multiple pollers to
> > actually handle the IO (deduplication, compression, HA etc.). This is why
> > there are 2 distinct interfaces - submit_io callback, and
> > the bdev_user_submit_completion interface which (normally) is called on
> > another thread (not the original poller), and passed back to the reactor via
> > the completion queue on the bdev_user_io_channel, and the registered poller
> > thread which takes from the user completion queue.
> > Do you think that a cleaner interface would be modifying the submit_io
> > callback to a poll_io interface which checks a bdev_user internal ring for
> IO?
> > Or do you think that the current interface is OK provided good
> documentation?
> >
> > Regarding the spdk_call_unaffinitized, I am currently using spdk_event_call
> in
> > order to register my volumes, I don't really like this since it forces me to
> > (eventually) add another async callback in my app to verify that device
> > registration was successful (and this just adds more conditions, futures
> etc.
> > in the application). Is there a way to call spdk interfaces directly with a
> > "non-spdk" thread (i.e. TLS is not initialized)?
>
> I'm not so much concerned yet with the interface you've defined, but rather
> understanding the whole approach at a high level. The SPDK bdev layer is
> designed for custom bdev modules to be added, so my primary question is why
> write a generic bdev_user module as opposed to writing a "your custom storage
> backend" module? I think this is the key piece, and understanding the process
> you went through as you designed this will probably yield a whole bunch of
> good
> improvements to the current bdev module system.
>
> Thanks,
> Ben
>
>
> >
> > Hope this answers the questions,
> > Shahar
>
>
>
> > From: SPDK <spdk-bounces@lists.01.org> on behalf of Harris, James R
> > <james.r.harris@intel.com>
> > Sent: Monday, May 7, 2018 9:18:20 PM
> > To: Storage Performance Development Kit
> > Subject: Re: [SPDK] SPDK + user space appliance
> > 
> > There are also calls such as spdk_call_unaffinitized() and
> > spdk_unaffinitize_thread() which have been added to enable cases where a
> bdev
> > module may need to spawn non-polling threads and don’t want those threads to
> > inherit the affinity of the calling thread.  The SPDK rbd module currently
> > uses these (see git commit fa5206c4) since rbd_open is a blocking call.
> (Note
> > that librbd does now support rbd_aio_open which is better suited for SPDK.)
> >
> > -Jim
> >
> >
> > On 5/7/18, 11:02 AM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces@list
> s.
> > 01.org on behalf of benjamin.walker@intel.com> wrote:
> >
> >     Hi Shahar,
> >    
> >     Thank you for submitting the patch. I've looked through it in detail and
> I
> > think
> >     I understand the purpose of this code, but I'm going to explain it back
> to
> > you
> >     so you can correct me where I'm wrong.
> >    
> >     I think this code solves two distinct problems:
> >    
> >     1) You need to forward I/O out of the bdev layer to some custom backend,
> > and you
> >     want the code that does that to live outside of the SPDK repository.
> >    
> >     2) Your custom back-end library isn't suitable for use in a run-to-
> > completion
> >     model. By that I mean that you can't just call your library directly on
> > the
> >     thread that originally receives the spdk_bdev_io request because your
> > library
> >     either blocks or generally takes too long to return from the submission
> > call
> >     (maybe it is doing inline compression or something). Instead, you need
> to
> >     shuttle those requests off to separate threads for handling.
> >    
> >     As far as point #1, today the SPDK build system does not nicely
> > accommodate bdev
> >     modules whose code lives outside of SPDK. SPDK expects them to be in
> >     lib/bdev/<module_name>. However, that's a fairly straightforward change
> to
> > the
> >     build system and it's one we've been intending to make for some time.
> >    
> >     For point #2, this is likely the case for a large number of storage
> back-
> > ends,
> >     but I think the proper way to solve it is probably back-end specific and
> > not
> >     general purpose. As a counter-point, SPDK already integrates with a
> number
> > of
> >     third-party storage back-ends today (Ceph RBD, libiscsi, libaio, etc.)
> and
> > none
> >     of those ended up needing to pass messages to other threads. They all
> > support
> >     asynchronous operations, though. I could imagine writing a bdev module
> > that
> >     ultimately makes POSIX preadv calls, for instance. That would need to be
> >     implemented with a thread pool and each bdev_io gets funneled off to a
> > thread in
> >     the pool to perform the blocking operation.
> >    
> >     Ok - I explained what I think I'm understanding. Now tell me where I
> went
> > wrong
> >     :)
> >    
> >     Thanks,
> >     Ben
> >    
> >     On Sun, 2018-05-06 at 10:32 +0000, Shahar Salzman wrote:
> >     > Hi,
> >     >
> >     > I pushed the code for review, thanks Daniel for the help.
> >     >
> >     > In a nutshell:
> >     > - bdev_user - an API for a user appliance to use spdk as an
> iSCSI/NVMeF
> > target
> >     > - bdev_user_example - reference application
> >     > - The API relies on rings in order to submit/complete IOs
> >     > - User appliance registers callbacks for submit_io (should we have
> >     > read/write/other instead?)
> >     > - User appliance registers its devices so that they may be added to an
> >     > existing namespace (I am using RPC to do the management)
> >     >
> >     > Thanks,
> >     > Shahar
> >     >
> >     >
> >     > From: SPDK <spdk-bounces@lists.01.org> on behalf of Verkamp, Daniel
> > <daniel.ve
> >     > rkamp@intel.com>
> >     > Sent: Thursday, May 3, 2018 8:50 PM
> >     > To: Storage Performance Development Kit
> >     > Subject: Re: [SPDK] SPDK + user space appliance
> >     > 
> >     > Hi Shahar,
> >     > 
> >     > The target branch for the push should be ‘refs/for/master’, not
> ‘master’
> > – if
> >     > you configured a remote as specified in http://www.spdk.io/development
www.spdk.io
The Storage Performance Development Kit (SPDK) provides a set of tools and libraries for writing high performance, scalable, user-mode storage applications. It achieves high performance by moving all of the necessary drivers into userspace and operating in a polled mode instead of relying on interrupts, which avoids kernel context switches and ...


> /
> > it
> >     > should look like:
> >     > 
> >     > [remote "review"]
> >     >   url = https://review.gerrithub.io/spdk/spdk
> >     >   push = HEAD:refs/for/master
> >     > 
> >     > From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Shahar
> > Salzman
> >     > Sent: Thursday, May 3, 2018 1:00 AM
> >     > To: Storage Performance Development Kit <spdk@lists.01.org>
> >     > Subject: Re: [SPDK] SPDK + user space appliance
> >     > 
> >     > Hi Ben,
> >     > 
> >     > I have the code ready for review (spdk/master on dpdk/18.02), but I do
> > not
> >     > have push rights for gerrithub:
> >     > shahar.salzman@shahars-vm:~/Kaminario/git/spdk$ git push spdk-review
> >     > HEAD:master
> >     > Password for 'https://ShaharSalzman-K@review.gerrithub.io':
> >     > Counting objects: 109, done.
> >     > Compressing objects: 100% (22/22), done.
> >     > Writing objects: 100% (22/22), 8.70 KiB | 0 bytes/s, done.
> >     > Total 22 (delta 14), reused 0 (delta 0)
> >     > remote: Resolving deltas: 100% (14/14)
> >     > remote: Branch refs/heads/master:
> >     > remote: You are not allowed to perform this operation.
> >     > remote: To push into this reference you need 'Push' rights.
> >     > remote: User: ShaharSalzman-K
> >     > remote: Please read the documentation and contact an administrator
> >     > remote: if you feel the configuration is incorrect
> >     > remote: Processing changes: refs: 1, done   
> >     > To https://ShaharSalzman-K@review.gerrithub.io/a/spdk/spdk
> >     >  ! [remote rejected] HEAD -> master (prohibited by Gerrit: ref update
> > access
> >     > denied)
> >     > error: failed to push some refs to 'https://clicktime.symantec.com/a/1/VcP2v4HJWwYtLlvp3_LcgSsqhmYOOUAZscJQHE8uNvI=?d=ESGMfx885Mon-FYabmhD7ls3wtWmBZW9kWgryeq1SISrgpYjCOb4IS6lwYKSCrvtHc7CEvBRgFQdII_hRkKdEHtIPw-_l-2vbpSf4mgn8dWJRwfogT1dkkd4aplrKU00CzI0rDhCgZT67GHdJ7NowVIQ_12RIr45uXe3dXDgV1slShko1C3-ayHaghRMgJQoGQz5ewaurDC42wc7_w1BEs8oj1QoSKsDNxm9-hFs4L9a-XshONMr1SxTXIt9EIfUvWoaFlCtJeJikXq2XnYc-tunDEHEykqgMCz_AWlpwA4UE5RVCmkekk-Fuxjm4UYnc6ggHScc1xqmGn-WwsISBpjmD_tGzsoKqRuVhiV01CId-_wCmWsv-uXcoIhW0e_JPIuV6LLTZhTPZ8JMljeezHgswSjkLaVkX1U3lDxg25b69Q%3D%3D&u=https%3A%2F%2FShaharSalzman-K%40review.ger
> ri
> > thub.i
> >     > o/a/spdk/spdk'
> >     > 
> >     > Am I doing something incorrect, or is this just a permission issue?
> >     > 
> >     > Thanks,
> >     > Shahar
> >     > From: SPDK <spdk-bounces@lists.01.org> on behalf of Shahar Salzman
> > <shahar.sal
> >     > zman@kaminario.com>
> >     > Sent: Wednesday, April 25, 2018 9:02:38 AM
> >     > To: Storage Performance Development Kit
> >     > Subject: Re: [SPDK] SPDK + user space appliance
> >     > 
> >     > Hi Ben,
> >     > 
> >     > The code is currently working on v17.07, we are planning on bumping
> the
> >     > version to one of the latest stable versions (18.01?) + master.
> >     > It will take me (hopefully) a few days to update the code and have our
> >     > internal CI start running on this version, not sure it would be
> useful,
> > but I
> >     > can get our working 17.07 code (+ reference application) for review
> much
> >     > faster.
> >     > What is the best course of action?
> >     > 
> >     > Shahar
> >     > From: SPDK <spdk-bounces@lists.01.org> on behalf of Walker, Benjamin
> > <benjamin
> >     > .walker@intel.com>
> >     > Sent: Tuesday, April 24, 2018 7:19:12 PM
> >     > To: Storage Performance Development Kit
> >     > Subject: Re: [SPDK] SPDK + user space appliance
> >     > 
> >     > Hi Shahar,
> >     > 
> >     > Would you be willing to submit your bdev module as a patch on
> GerritHub?
> > That
> >     > way everyone can take a look and provide feedback. If you don’t want
> it
> > to run
> >     > the tests, you can put [RFC] and the beginning of the commit message.
> >     > 
> >     > Thanks,
> >     > Ben
> >     > 
> >     > From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Shahar
> > Salzman
> >     > Sent: Monday, April 23, 2018 8:45 AM
> >     > To: spdk@lists.01.org
> >     > Subject: Re: [SPDK] SPDK + user space appliance
> >     > 
> >     > Hi Ben,
> >     > 
> >     > Bumping this thread since I've been having some new thoughts on the
> > issue now
> >     > that we are starting integration with newer spdk versions.
> >     > Unfortunately the merge isn't as smooth as I'd like it to be since the
> > bdev
> >     > module is pretty tightly integrated into spdk, perhaps we made some
> > false
> >     > assumptions writing the module, but it seems some of the newer spdk
> > features
> >     > are complicating the integration.
> >     > My question is, if this passthrough module is useful, wouldn't it be
> > better to
> >     > maintain it as part of spdk so that we can catch issues as soon as
> they
> > show
> >     > up?
> >     > We would be happy to help with maintaining this module, the module
> with
> > is
> >     > currently part of our CI with our "frozen" spdk version, but once
> > integrated
> >     > into the newer version we choose, I'll add it to the CI our CI as
> well.
> >     > 
> >     > Shahar
> >     > From: SPDK <spdk-bounces@lists.01.org> on behalf of Walker, Benjamin
> > <benjamin
> >     > .walker@intel.com>
> >     > Sent: Friday, February 2, 2018 11:43:58 PM
> >     > To: spdk@lists.01.org
> >     > Subject: Re: [SPDK] SPDK + user space appliance
> >     > 
> >     > On Thu, 2018-02-01 at 08:29 +0000, Shahar Salzman wrote:
> >     > > Hi Ben,
> >     > >
> >     > > Would you also like to take a look at the bdev_user module?
> >     > > It still needs some patching (as some of the stuff is still hard
> > coded), but
> >     > I
> >     > > think we can get most of it cleaned up in a couple of days.
> >     > >
> >     > > In any case, is it the intention that the user write his own bdev
> > module, or
> >     > > would this user appliance glue be a useful generic module?
> >     >
> >     > For existing storage stacks that serve block I/O, like the internals
> of
> > a SAN,
> >     > the idea is that you write your own bdev module to forward I/O coming
> > out of
> >     > the
> >     > SPDK bdev layer. Then you can use the SPDK iSCSI/NVMe-oF/vhost targets
> > mostly
> >     > as-is.
> >     >
> >     > In some cases, the actual iSCSI/NVMe-oF/vhost target applications
> won't
> >     > integrate nicely directly into an existing storage application because
> > they
> >     > spawn their own threads and allocate their own memory. To support
> that,
> > the
> >     > libraries may be consumed directly instead of the applications
> > (lib/iscsi,
> >     > lib/scsi, lib/nvmf, etc.). The libraries don't spawn any of their own
> > threads,
> >     > but instead rely on SPDK's abstractions in include/spdk/io_channel.h.
> > See
> >     >
> >     > http://www.spdk.io/doc/concurrency.html
> >     >
> >     > We don't currently have a way to write a custom bdev module that
> resides
> >     > outside
> >     > of the SPDK source tree, but it's very possible to add support for
> that.
> > But
> >     > beyond that inconvenience (just drop your module in lib/bdev for now),
> > writing
> >     > a
> >     > bdev module is the recommended way of interacting with the bottom end
> of
> > the
> >     > SPDK bdev layer. I think that's what you really want to be doing in
> your
> > code,
> >     > from what I can tell.
> >     >
> >     > I hope that helps!
> >     > _______________________________________________
> >     > SPDK mailing list
> >     > SPDK@lists.01.org
> >     > https://lists.01.org/mailman/listinfo/spdk
> >     >
> >     > _______________________________________________
> >     > SPDK mailing list
> >     > SPDK@lists.01.org
> >     > https://lists.01.org/mailman/listinfo/spdk
> >     _______________________________________________
> >     SPDK mailing list
> >     SPDK@lists.01.org
> >     https://lists.01.org/mailman/listinfo/spdk
> >    
> >
> > _______________________________________________
> > SPDK mailing list
> > SPDK@lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> > _______________________________________________
> > SPDK mailing list
> > SPDK@lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK@lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
> _______________________________________________
> SPDK mailing list
> SPDK@lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK@lists.01.org
https://lists.01.org/mailman/listinfo/spdk