From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============1448628841847064941==" MIME-Version: 1.0 From: Walker, Benjamin Subject: Re: [SPDK] SPDK + user space appliance Date: Wed, 09 May 2018 16:52:38 +0000 Message-ID: <1525884757.2784.54.camel@intel.com> In-Reply-To: AM3PR04MB370F7E8E080D5F9A6CFCF0889990@AM3PR04MB370.eurprd04.prod.outlook.com List-ID: To: spdk@lists.01.org --===============1448628841847064941== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable On Wed, 2018-05-09 at 07:48 +0000, Shahar Salzman wrote: > Hi Ben, > = > To answer your question first, the main reason to have a generic spdk IO = API > for storage appliance is being able to maintain functionality over versio= ns. > We started with our module in version 17.07, but when we started integrat= ing > it into 17.07.01, and the 18.X version we started running into issues. Si= nce > the bdev layer is internal to spdk, it makes sense that it will change bo= th in > functionality and in interface from time to time especially in NVMeF wher= e the > specifications are so young, which is the reason that we would like to ha= ve an > API which is more stable over versions. > As a storage appliance, the requirements are fairly stable, and at least = in > the case of SCSI, a thin "glue API" had been enough to get the core appli= ance > work with multiple frontend targets (e.g. both SCSI and internal replicat= ion). > I believe that the same method can be applied to NVMEF. > In addition, I think that our use case is not unique to the way we are > integrating spdk, which means that the work we put in can benefit other u= sers, > and hopefully via their experience, the code can become more robust and > useful. Thank you for this - now everything is much clearer. I totally agree with w= hat you're saying about the API being unstable and internal. That makes it quite difficult to write a bdev module that continues to function across releases. However, I don't think trading one API (the bdev module API) for another (t= he proposed bdev user API) fixes anything at a fundamental level. This is beca= use the API changes we've made to the bdev module API are necessary to correctly implement some types of bdev modules (although probably not yours currently= ). Over time, you'll just end up propagating all of those changes into the new= ly proposed bdev user API in order to make it suitable for general use and we = won't be in any better of a spot than we are today. I do think this is a problem that needs to be solved as soon as possible th= ough. The solution, in my mind, is to start the work of defining a public bdev m= odule API in include/spdk/bdev_module.h. It will take some work to define the API= in a way that as many of the implementation details are hidden as possible, but = it is worth the effort. Most of the module API currently resides in include/spdk_internal/bdev.h, but there is a lot of stuff in there that doe= sn't need to become public. Once we have a well defined public interface, we can take steps to avoid breaking it going forward. I'm giving a talk on API stability at the SPDK s= ummit next week that will touch on some of the strategies we plan to employ to ke= ep APIs stable over time. I'll wait for more feedback to see if we can build some consensus about the= best path forward here. I'm interested to see if there are any other perspectives that may view this a bit differently. > = > Shahar > From: SPDK on behalf of Walker, Benjamin .walker(a)intel.com> > Sent: Tuesday, May 8, 2018 8:30:20 PM > To: spdk(a)lists.01.org > Subject: Re: [SPDK] SPDK + user space appliance > = > On Tue, 2018-05-08 at 07:36 +0000, Shahar Salzman wrote: > > Hi Jim and Ben, > > = > > For the threading issue, I agree that there is something not very clean= in > the > > interface, as there is an assumption on how the user implements it. As = I did > > in the bdev_user_example, we also use a ring in order to place all the > > incoming IO without delaying the reactor, and then use multiple pollers= to > > actually handle the IO (deduplication, compression, HA etc.). This is w= hy > > there are 2 distinct interfaces - submit_io callback, and > > the bdev_user_submit_completion interface which (normally) is called on > > another thread (not the original poller), and passed back to the reacto= r via > > the completion queue on the bdev_user_io_channel, and the registered po= ller > > thread which takes from the user completion queue. > > Do you think that a cleaner interface would be modifying the submit_io > > callback to a poll_io interface which checks a bdev_user internal ring = for > IO? > > Or do you think that the current interface is OK provided good > documentation? > > = > > Regarding the spdk_call_unaffinitized, I am currently using spdk_event_= call > in > > order to register my volumes, I don't really like this since it forces = me to > > (eventually) add another async callback in my app to verify that device > > registration was successful (and this just adds more conditions, futures > etc. > > in the application). Is there a way to call spdk interfaces directly wi= th a > > "non-spdk" thread (i.e. TLS is not initialized)? > = > I'm not so much concerned yet with the interface you've defined, but rath= er > understanding the whole approach at a high level. The SPDK bdev layer is > designed for custom bdev modules to be added, so my primary question is w= hy > write a generic bdev_user module as opposed to writing a "your custom sto= rage > backend" module? I think this is the key piece, and understanding the pro= cess > you went through as you designed this will probably yield a whole bunch of > good > improvements to the current bdev module system. > = > Thanks, > Ben > = > = > > = > > Hope this answers the questions, > > Shahar > = > = > = > > From: SPDK on behalf of Harris, James R > > > > Sent: Monday, May 7, 2018 9:18:20 PM > > To: Storage Performance Development Kit > > Subject: Re: [SPDK] SPDK + user space appliance > > = > > There are also calls such as spdk_call_unaffinitized() and > > spdk_unaffinitize_thread() which have been added to enable cases where a > bdev > > module may need to spawn non-polling threads and don=E2=80=99t want tho= se threads to > > inherit the affinity of the calling thread. The SPDK rbd module curren= tly > > uses these (see git commit fa5206c4) since rbd_open is a blocking call. = > (Note > > that librbd does now support rbd_aio_open which is better suited for SP= DK.) > > = > > -Jim > > = > > = > > On 5/7/18, 11:02 AM, "SPDK on behalf of Walker, Benjamin" s. > > 01.org on behalf of benjamin.walker(a)intel.com> wrote: > > = > > Hi Shahar, > > = > > Thank you for submitting the patch. I've looked through it in detai= l and > I > > think > > I understand the purpose of this code, but I'm going to explain it = back > to > > you > > so you can correct me where I'm wrong. > > = > > I think this code solves two distinct problems: > > = > > 1) You need to forward I/O out of the bdev layer to some custom bac= kend, > > and you > > want the code that does that to live outside of the SPDK repository. > > = > > 2) Your custom back-end library isn't suitable for use in a run-to- > > completion > > model. By that I mean that you can't just call your library directl= y on > > the > > thread that originally receives the spdk_bdev_io request because yo= ur > > library > > either blocks or generally takes too long to return from the submis= sion > > call > > (maybe it is doing inline compression or something). Instead, you n= eed > to > > shuttle those requests off to separate threads for handling. > > = > > As far as point #1, today the SPDK build system does not nicely > > accommodate bdev > > modules whose code lives outside of SPDK. SPDK expects them to be in > > lib/bdev/. However, that's a fairly straightforward ch= ange > to > > the > > build system and it's one we've been intending to make for some tim= e. > > = > > For point #2, this is likely the case for a large number of storage > back- > > ends, > > but I think the proper way to solve it is probably back-end specifi= c and > > not > > general purpose. As a counter-point, SPDK already integrates with a > number > > of > > third-party storage back-ends today (Ceph RBD, libiscsi, libaio, et= c.) > and > > none > > of those ended up needing to pass messages to other threads. They a= ll > > support > > asynchronous operations, though. I could imagine writing a bdev mod= ule > > that > > ultimately makes POSIX preadv calls, for instance. That would need = to be > > implemented with a thread pool and each bdev_io gets funneled off t= o a > > thread in > > the pool to perform the blocking operation. > > = > > Ok - I explained what I think I'm understanding. Now tell me where I > went > > wrong > > :) > > = > > Thanks, > > Ben > > = > > On Sun, 2018-05-06 at 10:32 +0000, Shahar Salzman wrote: > > > Hi, > > > = > > > I pushed the code for review, thanks Daniel for the help. > > > = > > > In a nutshell: > > > - bdev_user - an API for a user appliance to use spdk as an > iSCSI/NVMeF > > target > > > - bdev_user_example - reference application > > > - The API relies on rings in order to submit/complete IOs > > > - User appliance registers callbacks for submit_io (should we have > > > read/write/other instead?) > > > - User appliance registers its devices so that they may be added = to an > > > existing namespace (I am using RPC to do the management) > > > = > > > Thanks, > > > Shahar > > > = > > > = > > > From: SPDK on behalf of Verkamp, Da= niel > > > > rkamp(a)intel.com> > > > Sent: Thursday, May 3, 2018 8:50 PM > > > To: Storage Performance Development Kit > > > Subject: Re: [SPDK] SPDK + user space appliance > > > = > > > Hi Shahar, > > > = > > > The target branch for the push should be =E2=80=98refs/for/master= =E2=80=99, not > =E2=80=98master=E2=80=99 > > =E2=80=93 if > > > you configured a remote as specified in http://www.spdk.io/develo= pment > / > > it > > > should look like: > > > = > > > [remote "review"] > > > url =3D https://review.gerrithub.io/spdk/spdk > > > push =3D HEAD:refs/for/master > > > = > > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Shah= ar > > Salzman > > > Sent: Thursday, May 3, 2018 1:00 AM > > > To: Storage Performance Development Kit > > > Subject: Re: [SPDK] SPDK + user space appliance > > > = > > > Hi Ben, > > > = > > > I have the code ready for review (spdk/master on dpdk/18.02), but= I do > > not > > > have push rights for gerrithub: > > > shahar.salzman(a)shahars-vm:~/Kaminario/git/spdk$ git push spdk-r= eview > > > HEAD:master > > > Password for 'https://ShaharSalzman-K(a)review.gerrithub.io': = > > > Counting objects: 109, done. > > > Compressing objects: 100% (22/22), done. > > > Writing objects: 100% (22/22), 8.70 KiB | 0 bytes/s, done. > > > Total 22 (delta 14), reused 0 (delta 0) > > > remote: Resolving deltas: 100% (14/14) > > > remote: Branch refs/heads/master: > > > remote: You are not allowed to perform this operation. > > > remote: To push into this reference you need 'Push' rights. > > > remote: User: ShaharSalzman-K > > > remote: Please read the documentation and contact an administrator > > > remote: if you feel the configuration is incorrect > > > remote: Processing changes: refs: 1, done = > > > To https://ShaharSalzman-K(a)review.gerrithub.io/a/spdk/spdk > > > ! [remote rejected] HEAD -> master (prohibited by Gerrit: ref up= date > > access > > > denied) > > > error: failed to push some refs to 'https://ShaharSalzman-K(a)rev= iew.ger > ri > > thub.i > > > o/a/spdk/spdk' > > > = > > > Am I doing something incorrect, or is this just a permission issu= e? > > > = > > > Thanks, > > > Shahar > > > From: SPDK on behalf of Shahar Salz= man > > > > zman(a)kaminario.com> > > > Sent: Wednesday, April 25, 2018 9:02:38 AM > > > To: Storage Performance Development Kit > > > Subject: Re: [SPDK] SPDK + user space appliance > > > = > > > Hi Ben, > > > = > > > The code is currently working on v17.07, we are planning on bumpi= ng > the > > > version to one of the latest stable versions (18.01?) + master. > > > It will take me (hopefully) a few days to update the code and hav= e our > > > internal CI start running on this version, not sure it would be > useful, > > but I > > > can get our working 17.07 code (+ reference application) for revi= ew > much > > > faster. > > > What is the best course of action? > > > = > > > Shahar > > > From: SPDK on behalf of Walker, Ben= jamin > > > > .walker(a)intel.com> > > > Sent: Tuesday, April 24, 2018 7:19:12 PM > > > To: Storage Performance Development Kit > > > Subject: Re: [SPDK] SPDK + user space appliance > > > = > > > Hi Shahar, > > > = > > > Would you be willing to submit your bdev module as a patch on > GerritHub? > > That > > > way everyone can take a look and provide feedback. If you don=E2= =80=99t want > it > > to run > > > the tests, you can put [RFC] and the beginning of the commit mess= age. > > > = > > > Thanks, > > > Ben > > > = > > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Shah= ar > > Salzman > > > Sent: Monday, April 23, 2018 8:45 AM > > > To: spdk(a)lists.01.org > > > Subject: Re: [SPDK] SPDK + user space appliance > > > = > > > Hi Ben, > > > = > > > Bumping this thread since I've been having some new thoughts on t= he > > issue now > > > that we are starting integration with newer spdk versions. > > > Unfortunately the merge isn't as smooth as I'd like it to be sinc= e the > > bdev > > > module is pretty tightly integrated into spdk, perhaps we made so= me > > false > > > assumptions writing the module, but it seems some of the newer sp= dk > > features > > > are complicating the integration. > > > My question is, if this passthrough module is useful, wouldn't it= be > > better to > > > maintain it as part of spdk so that we can catch issues as soon as > they > > show > > > up? > > > We would be happy to help with maintaining this module, the module > with > > is > > > currently part of our CI with our "frozen" spdk version, but once > > integrated > > > into the newer version we choose, I'll add it to the CI our CI as > well. > > > = > > > Shahar > > > From: SPDK on behalf of Walker, Ben= jamin > > > > .walker(a)intel.com> > > > Sent: Friday, February 2, 2018 11:43:58 PM > > > To: spdk(a)lists.01.org > > > Subject: Re: [SPDK] SPDK + user space appliance > > > = > > > On Thu, 2018-02-01 at 08:29 +0000, Shahar Salzman wrote: > > > > Hi Ben, > > > > = > > > > Would you also like to take a look at the bdev_user module? > > > > It still needs some patching (as some of the stuff is still hard > > coded), but > > > I > > > > think we can get most of it cleaned up in a couple of days. > > > > = > > > > In any case, is it the intention that the user write his own bd= ev > > module, or > > > > would this user appliance glue be a useful generic module? > > > = > > > For existing storage stacks that serve block I/O, like the intern= als > of > > a SAN, > > > the idea is that you write your own bdev module to forward I/O co= ming > > out of > > > the > > > SPDK bdev layer. Then you can use the SPDK iSCSI/NVMe-oF/vhost ta= rgets > > mostly > > > as-is. > > > = > > > In some cases, the actual iSCSI/NVMe-oF/vhost target applications > won't > > > integrate nicely directly into an existing storage application be= cause > > they > > > spawn their own threads and allocate their own memory. To support > that, > > the > > > libraries may be consumed directly instead of the applications > > (lib/iscsi, > > > lib/scsi, lib/nvmf, etc.). The libraries don't spawn any of their= own > > threads, > > > but instead rely on SPDK's abstractions in include/spdk/io_channe= l.h. > > See > > > = > > > http://www.spdk.io/doc/concurrency.html > > > = > > > We don't currently have a way to write a custom bdev module that > resides > > > outside > > > of the SPDK source tree, but it's very possible to add support for > that. > > But > > > beyond that inconvenience (just drop your module in lib/bdev for = now), > > writing > > > a > > > bdev module is the recommended way of interacting with the bottom= end > of > > the > > > SPDK bdev layer. I think that's what you really want to be doing = in > your > > code, > > > from what I can tell. > > > = > > > I hope that helps! > > > _______________________________________________ > > > SPDK mailing list > > > SPDK(a)lists.01.org > > > https://lists.01.org/mailman/listinfo/spdk > > > = > > > _______________________________________________ > > > SPDK mailing list > > > SPDK(a)lists.01.org > > > https://lists.01.org/mailman/listinfo/spdk > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > > = > > = > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > _______________________________________________ > SPDK mailing list > SPDK(a)lists.01.org > https://lists.01.org/mailman/listinfo/spdk > _______________________________________________ > SPDK mailing list > SPDK(a)lists.01.org > https://lists.01.org/mailman/listinfo/spdk --===============1448628841847064941==--