From: Luse, Paul E <paul.e.luse at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] Zero Copy (was BDEV-IO Lifecycle - Need your input.)
Date: Thu, 12 Jul 2018 01:16:17 +0000	[thread overview]
Message-ID: <82C9F782B054C94B9FC04A331649C77AAD4C1354@fmsmsx104.amr.corp.intel.com> (raw)
In-Reply-To: 20fe60ce3eb2ea4d25636aa8132e2e2fa38d07f4.camel@intel.com


This is a great explanation Ben, thanks.  I'll start looking through the patches as well.  A few quick questions for additional context and to make sure I understand your terminology:

* Can you expand on "The populate flag determines whether the 'live' data is present in the given buffer", maybe with an example? (What do you mean by "live"? You use the term a few times.)
* You mention "I believe I've addressed all known use cases with this design", but in this email chain, at least, you only cover the NVMe-oF use case. Can you list the known use cases and say a few words about each?

Thanks
Paul

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Walker, Benjamin
Sent: Wednesday, July 11, 2018 2:53 PM
To: John.Meneghini(a)netapp.com; Sriram.Popuri(a)netapp.com; Srikanth.Kaligotla(a)netapp.com
Cc: raju.gottumukkala(a)broadcom.com; spdk(a)lists.01.org
Subject: Re: [SPDK] Zero Copy (was BDEV-IO Lifecycle - Need your input.)

I've done a bunch of work here to formalize the zero copy API in the bdev layer.
The patch series begins here:

https://review.gerrithub.io/#/c/spdk/spdk/+/386166/

There has been some good discussion on the patches themselves, but I wanted to circle back to the mailing list for a bit. The basic idea is to add two new functions to the bdev layer:

int spdk_bdev_zcopy_start(struct spdk_bdev_desc *desc,
			  struct spdk_io_channel *ch,
			  uint64_t offset_blocks, uint64_t num_blocks,
			  bool populate,
			  spdk_bdev_io_completion_cb cb, void *cb_arg);

int spdk_bdev_zcopy_end(struct spdk_bdev_io *bdev_io, bool commit,
			spdk_bdev_io_completion_cb cb, void *cb_arg);

The zcopy start call makes a request to the backing block device to provide a region of memory suitable for performing a zero copy update to the region described by the offset_blocks and num_blocks parameters. The populate flag determines whether the "live" data is present in the given buffer. When this request completes, it returns a bdev_io in the completion callback. That bdev_io will have a scatter gather list filled out describing the memory region.

When the user has finished transferring data into or out of the data buffer, the user calls spdk_bdev_zcopy_end with the originally provided bdev_io. The commit flag indicates whether the data in the buffer needs to be made "live".
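
To make the flow concrete, here's a rough sketch of how a consumer might chain the two calls for a write. This is illustrative only - the helper fill_from_rdma() and the exact location of the scatter gather list on the bdev_io are assumptions for the sketch, not something my patches define:

static void
zcopy_end_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
	/* The data has (or, on failure, has not) been made "live"; release the bdev_io. */
	spdk_bdev_free_io(bdev_io);
}

static void
zcopy_start_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
	if (!success) {
		spdk_bdev_free_io(bdev_io);
		return;
	}

	/* The bdev_io now carries an iovec describing the zero copy region
	 * (assumed here to be exposed as bdev_io->u.bdev.iovs / iovcnt). A real
	 * target would post an RDMA transfer at this point and call zcopy_end
	 * from the transfer's completion rather than inline. */
	fill_from_rdma(bdev_io->u.bdev.iovs, bdev_io->u.bdev.iovcnt, cb_arg);

	/* Commit the buffer so its contents become "live" on the device. */
	spdk_bdev_zcopy_end(bdev_io, true, zcopy_end_done, cb_arg);
}

/* Request a writable region for the target blocks. populate=false because the
 * whole region will be overwritten rather than read-modify-written. */
spdk_bdev_zcopy_start(desc, ch, offset_blocks, num_blocks, false,
		      zcopy_start_done, req);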

The NVMe-oF target can take advantage of these new APIs by altering the state machine in the RDMA layer to call spdk_bdev_zcopy_start to obtain data buffers after parsing the command, then perform RDMA transfers into or out of the provided region, and finally complete the command with spdk_bdev_zcopy_end.
That work has not been done yet, and I expect it to be a significant amount of effort.

As many bdev modules as possible should attempt to support zero copy semantics, but for those bdev modules that aren't able to, one of the patches in my series will automatically emulate the zcopy commands using regular reads and writes.
This allows the higher level users of the bdev API to always make the zcopy calls for simplicity. The emulation path should be no worse, in terms of performance, than doing a regular read or write.
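
The emulation would have roughly this shape (a sketch of the idea only, not the patch itself; desc, ch, block_size, read_done and buffer_ready are placeholders invented for illustration):

static void
emulated_zcopy_start(bool populate, uint64_t offset_blocks, uint64_t num_blocks)
{
	/* No native zcopy support: fall back to a regular DMA-safe buffer. */
	void *buf = spdk_dma_zmalloc(num_blocks * block_size, 0x1000, NULL);

	if (populate) {
		/* Caller wants the "live" data, so fill the buffer with a normal read. */
		spdk_bdev_read_blocks(desc, ch, buf, offset_blocks, num_blocks,
				      read_done, buf);
	} else {
		/* Caller will overwrite the whole region: hand back the empty buffer now. */
		buffer_ready(buf);
	}

	/* Later, the matching zcopy_end(commit=true) issues spdk_bdev_write_blocks()
	 * with the same buffer, and commit=false simply frees it with spdk_dma_free(). */
}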

I believe I've addressed all known use cases with this design, and while there will certainly be some future patches to fix problems that come up, I'm confident that we're on the right track here. If there is any feedback in this area at all, now is the time to provide it.

I think this also leads into some really awesome future work that I'll suggest in the hope that someone in the community is inspired. One idea is that the bdev_nvme module could support zero copy semantics by returning a pointer to a portion of a Controller Memory Buffer or Persistent Memory Buffer suitable for data. There aren't many devices for sale today that can do this, but it's in the spec and I imagine they're coming. This would allow the NVMe-oF target, for example, to perform RDMA transfers peer-to-peer between the NIC and the NVMe SSD. Similarly, for the bdev_pmem module, direct pointers into the persistent memory could be returned. The NVMe-oF target could then perform RDMA transfers into or out of the persistent memory directly, and the zcopy end call with commit set to true would perform the required flush instructions.
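
For the pmem case the commit could boil down to flushing the region that the start call handed out - something like the following sketch, written under the assumption that bdev_pmem could expose raw persistent memory pointers (which is exactly the open question above); pmem_commit_region is an invented name:

#include <sys/uio.h>
#include <libpmem.h>

/* Inside a hypothetical bdev_pmem zcopy_end(commit=true) handler: flush every
 * segment of the region that zcopy_start handed out. */
static void
pmem_commit_region(struct iovec *iovs, int iovcnt)
{
	for (int i = 0; i < iovcnt; i++) {
		/* pmem_persist() issues the cache-line flush + fence sequence
		 * needed to make the stores durable. */
		pmem_persist(iovs[i].iov_base, iovs[i].iov_len);
	}
}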

Thanks,
Ben


On Mon, 2018-07-09 at 19:19 +0000, Popuri, Sriram wrote:
> I would like to jot down what requirements we are looking at. Wherever 
> there is "====wait" for something, there is a potential context switch.
> 
> For a write request, the following is the workflow for zero copy:
> 1) Host submits Write request
> 2) Protocol validation of request
> 3) Get iovs
> 4) R2T/RDMA_READ
> ====wait for host====
> 5) Host transfers data (data_out/RDMA_READ_COMPLETE)
> 6) Perform Write (spdk_bdev_writev_blocks) to bdev_module
> ====wait for bdev module to respond====
> 7) on bdev_module response, send rsp/status to host
> 8) free req resources
> 
> Read path:
> 1) Host submits Read request
> 2) Protocol validation of request
> 3) Submit read to bdev_module
> ====wait for data blocks from bdev_module====
> 4) respond iov (data_in/RDMA_WRITE)
> ====wait for host====
> 5) host ack/RDMA_WRITE_COMPLETE
> 6) free req resources
> 
> In the future we will have Persistent Reservations, where there can be
> read/modify/update cases depending on how the bdev module handles it.
> So the zero copy APIs need to address future requirements as well.
> 
> The above workflows suggest we require a state to track what step the
> request is at. A state will help to clean up resources properly when
> handling aborts, disconnects, namespace destroy workflows, etc.
> So the requirement is to figure out a way to clean up resources
> according to the state of the bdev_io in its life cycle and to stitch the
> zero copy workflows into existing SPDK code, which assumes buffers are pre-allocated.
> 
> I think Srikanth's proposed changes are made in the target layer to
> populate and commit to the bdev module. However, his suggestion changes
> the bdev_io life cycle, which I understand from John is not agreeable to others.
> 
> With the new zcopy APIs there is also no clarity on the bdev_io life
> cycle. A bdev_io is acquired during zcopy_start, and it's not clear how
> to deal with the bdev_io allocated in the read/write paths.
> 
> The orchestration of zero copy can be done either at the transport layer,
> the target layer, or the bdev module layer.
> 
> We don't like the orchestration being done from the transport layer because:
> 1) There is no clear way to deal with a bdev_io acquired from zcopy versus a
> bdev_io acquired from the read/write code paths.
> 2) If I understand correctly, by the time a read/write request is
> submitted to the target, the data_out would have happened. So the protocol
> validation happens after the data_out phase. For example, an LBA out of
> range request or invalid SGL length is returned only after a data_out is
> performed. We want to run through the protocol validation before the data_out phase.
> 
> I feel the bdev module is the ideal place to deal with zero copy
> buffers. The bdev module knows how to get buffers and when to claim
> them. All it needs is proper cleanup, and by the time the request hits the
> bdev module, a bdev_io has been acquired, so there is no need to change the bdev_io life cycle or introduce a new I/O type.
> 
> I propose the following:
> 1) Host submits Write request
> 2) Protocol validation of request
>    spdk_nvmf_request_exec
> 3) submit request to bdev module
>    spdk_bdev_writev_blocks(). bdev_io is acquired
> 4) bdev module determines the buffers that need to be allocated.
>    It will perform something like what is proposed in the patch:
>    spdk_bdev_zcopy_start(populate=false)
> 5) New spdk_nvmf_request_exec_status
>    (SPDK_NVMF_REQUEST_EXEC_STATUS_BUFF_READY)
> 6) Transport does R2T/RDMA_READ
> ====wait for host====
> 7) Host transfers data (data_out/RDMA_READ_COMPLETE)
> 8) Perform Write again (spdk_nvmf_request_exec/spdk_bdev_writev_blocks)
>    to bdev_module.
> ====wait for bdev module to respond====
> 9) on bdev_module response, send rsp/status to host
> 10) free req resources
>     free bdev_io via spdk_bdev_free_io.
>     Here we need a hook to free resources allocated by the bdev module:
>     spdk_bdev_free_io_driver_ctx(bdev_io->driver_ctx)
> 
> Read path:
> 1) Host submits Read request
> 2) Protocol validation of request
> 3) Submit the request (spdk_bdev_readv_blocks) to bdev_module
> ====wait for data blocks from bdev_module====
> 4) respond iov (data_in/RDMA_WRITE)
> ====wait for host====
> 5) host ack/RDMA_WRITE_COMPLETE
> 6) free req resources
>    free bdev_io via spdk_bdev_free_io.
>    Here we need a hook to free resources allocated by the bdev module:
>    spdk_bdev_free_io_driver_ctx(bdev_io->driver_ctx)
> 
> If it makes sense we can work on a patch.
> 
> Regards,
> ~Sriram
> 
> -----Original Message-----
> From: Walker, Benjamin <benjamin.walker(a)intel.com>
> Sent: Monday, July 9, 2018 11:22 PM
> To: Meneghini, John <John.Meneghini(a)netapp.com>; Popuri, Sriram 
> <Sriram.Popuri @netapp.com>; Kaligotla, Srikanth 
> <Srikanth.Kaligotla(a)netapp.com>
> Cc: raju.gottumukkala(a)broadcom.com; Rodriguez, Edwin 
> <Ed.Rodriguez(a)netapp.com> ; spdk(a)lists.01.org; Pai, Madhu 
> <Madhusudan.Pai(a)netapp.com>; NGC-john.barnard- broadcom.com 
> <john.barnard(a)broadcom.com>
> Subject: Re: BDEV-IO Lifecycle - Need your input.
> 
> On Mon, 2018-07-09 at 16:18 +0000, Meneghini, John wrote:
> > Hi Jim.
> > 
> > This is the patch series I believe Ben is proposing to replace
> > Srikanth's: https://review.gerrithub.io/#/c/spdk/spdk/+/415860/
> > 
> > pick a21bbcf2 nvmf: Move data buffer pool to generic layer
> > pick 09586e54 bdev: Add a zero copy I/O path
> > pick 1cc9fe95 bdev: Make malloc bdev use the new zero copy mechanism
> > pick 40a1f62b bdevperf: Use new zcopy API for reads
> > pick eba6f35a bdev: Emulate zero copy support when necessary
> > 
> > Is this correct?
> 
> 
> My patch series doesn't entirely replace Srikanth's set of patches. 
> There are two separate things necessary to implement zero copy. First 
> is the infrastructure within the bdev layer to make requests of a bdev 
> module to obtain  or release a suitable buffer for a zero copy 
> operation. Second is actually making the NVMe-oF target use that 
> mechanism, which includes extending the bdev_io lifetime. My patch series only addresses the first part.
> 
> Thanks,
> Ben
> 
> 
> 
> > 
> > /John
> > 
> > On 7/9/18, 11:31 AM, "Harris, James R" <james.r.harris(a)intel.com> wrote:
> > 
> >     Hi Sriram:
> >     
> >     Ben has some later patches in this series with some examples:
> >     
> >     https://review.gerrithub.io/#/c/spdk/spdk/+/386167/ - implements 
> > this zero copy API in the simple ‘malloc’ bdev module
> >     https://review.gerrithub.io/#/c/spdk/spdk/+/416579/ - uses the 
> > zero copy API in the SPDK bdevperf utility
> >     
> >     -Jim
> >     
> >     
> >     On 7/9/18, 3:34 AM, "Popuri, Sriram" <Sriram.Popuri(a)netapp.com> wrote:
> >     
> >         Sorry, I was not focusing on this change. Just give me a day or
> > two to get back.
> >         From a quick glance I didn't understand how zcopy start/end
> > fits into a read/write life cycle. Is there an example of how zcopy
> > start/end is consumed, or can someone give me a quick dump on how
> > it's envisioned to be used?
> >         
> >         Regards,
> >         ~Sriram
> >         
> >         -----Original Message-----
> >         From: Meneghini, John 
> >         Sent: Friday, July 6, 2018 10:16 PM
> >         To: Walker, Benjamin <benjamin.walker(a)intel.com>; Verkamp, 
> > Daniel <dan iel.verkamp(a)intel.com>; Harris, James R 
> > <james.r.harris(a)intel.com>; Kaligotla, Srikanth 
> > <Srikanth.Kaligotla(a)netapp.c
> > om>
> >         Cc: raju.gottumukkala(a)broadcom.com; spdk(a)lists.01.org; 
> > Rodriguez, Edwin <Ed.Rodriguez(a)netapp.com>; Pai, Madhu 
> > <Madhusudan.Pai(a)netapp.com>; NGC- john.barnard-broadcom.com 
> > <john.barnard(a)br oadcom.com>; Popuri, Sriram <Sriram.
> > Popuri(a)netapp.com>
> >         Subject: Re: BDEV-IO Lifecycle - Need your input.
> >         
> >         I'm adding Sriram to this thread.  Sriram is another NetApp 
> > engineer who is working on this stuff internally.
> >         
> >         /John
> >         
> >         On 7/6/18, 12:41 PM, "Walker, Benjamin" 
> > <benjamin.walker(a)intel.com>
> > wrote:
> >         
> >             Hi Srikanth,
> >             
> >             I wanted to check in to see if you had any additional 
> > feedback on the zero copy
> >             operations in the bdev layer here:
> >             
> >             https://review.gerrithub.io/#/c/spdk/spdk/+/386166/
> >             
> >             This does not address extending the lifetime of the 
> > bdev_io in the NVMe-oF
> >             target, but I think this is looking like the right 
> > mechanism for the bdev layer.
> >             
> >             Thanks,
> >             Ben
> >             
> >             On Thu, 2018-06-21 at 18:11 -0700, Harris, James R wrote:
> >             > Thanks Srikanth.  Sounds like the spdk_bdev_io pool 
> > sizing along with
> >             > spdk_bdev_queue_io_wait() meets your needs then 
> > regarding the spdk_bdev_io
> >             > memory.
> >             >  
> >             > Regarding zcopy_start/zcopy_end – it looks like you’ve 
> > already added a bunch
> >             > of comments to Ben’s patch on GerritHub.  For now I’d 
> > say let’s continue our
> >             > discussion there.  I’ve responded to a couple of 
> > similar questions there and
> >             > I’m sure Ben will have more replies tomorrow.
> >             >  
> >             > -Jim
> >             >  
> >             >  
> >             > From: "Kaligotla, Srikanth" <Srikanth.Kaligotla(a)netapp.com>
> >             > Date: Thursday, June 21, 2018 at 5:01 PM
> >             > To: James Harris <james.r.harris(a)intel.com>, "Walker, 
> > Benjamin"
> > <benjamin.walk
> >             > er(a)intel.com>, Daniel Verkamp <daniel.verkamp(a)intel.com>
> >             > Cc: "raju.gottumukkala(a)broadcom.com" 
> > <raju.gottumukkala(a)broadcom .com>,
> >             > "Meneghini, John" <John.Meneghini(a)netapp.com>, 
> > "Rodriguez, Edwin" <Ed.Rodrigue
> >             > z(a)netapp.com>, "Pai, Madhu" 
> > <Madhusudan.Pai(a)netapp.com>,
> > "NGC-
> > john.barnard-
> >             > broadcom.com" <john.barnard(a)broadcom.com>, 
> > "spdk(a)lists.01.org" < spdk(a)lists.01.
> >             > org>
> >             > Subject: RE: BDEV-IO Lifecycle - Need your input.
> >             >  
> >             > Hi Jim,
> >             >  
> >             > I wish I joined the community meeting, I also missed the
> >             > spdk_bdev_queue_io_wait().
> >             >  
> >             > So, there are 2 issues I am intending to solve,
> >             >  
> >             > 1.       We want to ensure that an instance of bdev-io is
> > acquired prior to or
> >             > along with I/O data buffer. This allows for better 
> > error handling when bdev_io
> >             > pool is exhausted. Yes, we can solve it via sizing the 
> > pool right. The changes
> >             > queue-io-wait also addresses the problem.
> >             > 2.       Most importantly, by extending the life of bdev-io
> > (Same life span as
> >             > nvmf_request structure), abort use case and other use 
> > cases that involve
> >             > cleaning up of IO data buffer are accomplished 
> > effectively. Let me elaborate;
> >             > bdev-io is the fabric that connects the nvme command 
> > and operation with the
> >             > backend. The driver context and data buffer context is 
> > stored in bdev-io. The
> >             > freeing up of bdev_io resource is pushed to the end so 
> > the I/O cleanup can
> >             > happen after controller has transmitted the data to 
> > the host. In absence of
> >             > bdev_io, we will have end up adding more and more void 
> > context in request
> >             > structure. Hence the push to extend the life of bdev_io.
> >             >  
> >             > I see Ben’s recent patch “zcopy_start” and 
> > “zcopy_end”; We can make that work
> >             > as long as the bdev-io allocated/acquired stays till the end.
> > One of the
> >             > challenges I see with that patch is defining a 
> > callback for the I/O
> >             > submission. For instance, zopy_start will allocate 
> > bdev_io and submits the I/O
> >             > to the bdev device. Underlying implementation can be 
> > synchronous or can be
> >             > asynchronous. The callback for this submission should 
> > check to see if it is a
> >             > BUFFER-READY or PENDING and accordingly relay it back 
> > to transport. Next phase
> >             > is actual I/O submission. Let’s say, it reuses the 
> > bdev-io obtained in ZCOPY-
> >             > START, now the callback should determine if it is a 
> > success or failure.
> >             > Finally when ZCOPY-END is invoked the supplied bdev_io 
> > will have all necessary
> >             > data to release the WRITE buffer or unlock the read 
> > buffer based on the
> >             > operation performed.
> >             >  
> >             > I hope I’m making sense. I guess, my effort is to 
> > extend the lifespan of bdev-
> >             > io and let it continue to host the driver context and 
> > buffer context populated
> >             > during the BEGIN phase.
> >             >  
> >             > Regards,
> >             > Srikanth
> >             >  
> >             > From: Harris, James R <james.r.harris(a)intel.com> 
> >             > Sent: Thursday, June 21, 2018 2:21 PM
> >             > To: Kaligotla, Srikanth 
> > <Srikanth.Kaligotla(a)netapp.com>; Walker, Benjamin <ben
> >             > jamin.walker(a)intel.com>; Verkamp, Daniel 
> > <daniel.verkamp(a)intel.c
> > om>
> >             > Cc: raju.gottumukkala(a)broadcom.com; Meneghini, John 
> > <John.Menegh ini(a)netapp.com
> >             > >; Rodriguez, Edwin <Ed.Rodriguez(a)netapp.com>; Pai, 
> > Madhu <Madhu sudan.Pai(a)neta
> >             > pp.com>; NGC-john.barnard-broadcom.com 
> > <john.barnard(a)broadcom.co
> > m>; spdk(a)lists
> >             > .01.org
> >             > Subject: Re: BDEV-IO Lifecycle - Need your input.
> >             >  
> >             > Hi Srikanth,
> >             >  
> >             > Following up on this thread and the discussion in 
> > yesterday’s community
> >             > meeting.
> >             >  
> >             > The recent spdk_bdev_queue_io_wait() changes allow an 
> > SPDK application to work
> >             > better in general when the spdk_bdev_io buffer pool is 
> > exhausted.  We still
> >             > need to make changes to the NVMe-oF target to use this 
> > new API but that should
> >             > be done soon-ish.
> >             >  
> >             > With that in place, the spdk_bdev_io buffer pool 
> > itself can be configured up
> >             > or down when the application starts.  Currently the 
> > default is 64K
> >             > spdk_bdev_io buffers.  sizeof(struct spdk_bdev_io) ==
> > 216 plus the per-IO
> >             > context size allocated for the bdev module.  This can 
> > be up to
> > 192 bytes
> >             > (virtio bdev module) but is likely much smaller for 
> > you depending on the
> >             > context size for your ontap bdev module.
> >             >  
> >             > Let’s assume your per-IO context size is 64 bytes.  
> > 64K x (192 +
> > 64) = 16MB.
> >             >  
> >             > I’m not sure how many spdk_bdev_io you need in flight 
> > at any given time.  64K
> >             > seems like a lot but I’d be curious to hear your 
> > thoughts on this.  If this is
> >             > the right number, then worst case, there would be 
> > about 16MB of DRAM that
> >             > would sit unused if the NVMe-oF target in your system 
> > was not active.  Is that
> >             > too burdensome for your application?
> >             >  
> >             > Thanks,
> >             >  
> >             > -Jim
> >             >  
> >             >  
> >             >  
> >             > From: "Kaligotla, Srikanth" <Srikanth.Kaligotla(a)netapp.com>
> >             > Date: Wednesday, June 20, 2018 at 12:47 PM
> >             > To: "Walker, Benjamin" <benjamin.walker(a)intel.com>, 
> > James Harris <james.r.harr
> >             > is(a)intel.com>, Daniel Verkamp <daniel.verkamp(a)intel.com>
> >             > Cc: "raju.gottumukkala(a)broadcom.com" 
> > <raju.gottumukkala(a)broadcom .com>,
> >             > "Meneghini, John" <John.Meneghini(a)netapp.com>, 
> > "Rodriguez, Edwin" <Ed.Rodrigue
> >             > z(a)netapp.com>, "Pai, Madhu" 
> > <Madhusudan.Pai(a)netapp.com>,
> > "NGC-
> > john.barnard-
> >             > broadcom.com" <john.barnard(a)broadcom.com>, 
> > "spdk(a)lists.01.org" < spdk(a)lists.01.
> >             > org>
> >             > Subject: RE: BDEV-IO Lifecycle - Need your input.
> >             >  
> >             > Hello,
> >             >  
> >             > First revision of changes to extend the lifecycle of 
> > bdev_io are available for
> >             > review. I would like to solicit your input on the 
> > proposed API/Code flow
> >             > changes.
> >             >  
> >             > https://review.gerrithub.io/c/spdk/spdk/+/415860
> >             >  
> >             > Thanks,
> >             > Srikanth
> >             >  
> >             >  
> >             > From: "Kaligotla, Srikanth" <Srikanth.Kaligotla(a)netapp.com>
> >             > Date: Friday, May 11, 2018 at 2:27 PM
> >             > To: "Walker, Benjamin" <benjamin.walker(a)intel.com>, 
> > "Harris, James R" <james.r
> >             > .harris(a)intel.com>
> >             > Cc: "raju.gottumukkala(a)broadcom.com" 
> > <raju.gottumukkala(a)broadcom .com>,
> >             > "Meneghini, John" <John.Meneghini(a)netapp.com>, 
> > "Rodriguez, Edwin" <Ed.Rodrigue
> >             > z(a)netapp.com>, "Pai, Madhu" 
> > <Madhusudan.Pai(a)netapp.com>,
> > "NGC-
> > john.barnard-
> >             > broadcom.com" <john.barnard(a)broadcom.com>, 
> > "spdk(a)lists.01.org" < spdk(a)lists.01.
> >             > org>
> >             > Subject: RE: BDEV-IO Lifecycle - Need your input.
> >             >  
> >             > CC: List
> >             >  
> >             > Hi Ben,
> >             >  
> >             > Your proposal to interface with the backend to acquire 
> > and release buffers is
> >             > good. You have accurately stated that the challenge is 
> > in developing intuitive
> >             > semantics. And that has been my struggle. To me, there 
> > are two problem
> >             > statements;
> >             >  
> >             > 1.       It is expected that bdev_io pool is sized correctly
> > so
> > the call to
> >             > get bdev_io succeeds. Failure to acquire bdev_io will 
> > result in DEVICE-ERROR.
> >             > The transport is already capable of handling temporary 
> > memory failures by
> >             > moving the request to PENDING-Queue. Hence the 
> > proposal to change the bdev_io
> >             > lifecycle and perhaps connect it with the 
> > spdk_nvmf_request object. Thus all
> >             > buffer needs are addressed at the beginning of I/O request.
> >             > 2.       I/O Data buffers are sourced and managed by the
> > backend. One of the
> >             > challenges I see with your proposed interface is the 
> > lack of details like if
> >             > resource is being acquired for a READ operation or 
> > WRITE operation. The
> >             > handling is quite different in each case. Since 
> > bdev_io->type is overloaded
> >             > the type of I/O operation is lost. I suppose one can 
> > cast the cb_arg
> >             > (nvmf_request) and then proceed. Also, the bdev_io 
> > should be present to
> >             > RELEASE the buffer. Zero copy semantics warrants that 
> > data buffer stays until
> >             > controller-to-host has occurred. In other words, 
> > bdev_io lives till REQUEST
> >             > come to COMPLETE state.
> >             >  
> >             > What are your thoughts in introducing spdk_bdev_init() 
> > and
> > spdk_bdev_fini() as
> >             > an alternative approach to extend the lifecyle of 
> > bdev_io and allow data
> >             > buffer management via bdev fn_table ?
> >             >  
> >             > I hope I’m making sense…
> >             >  
> >             > Thanks,
> >             > Srikanth
> >             >  
> >             > From: Walker, Benjamin <benjamin.walker(a)intel.com> 
> >             > Sent: Friday, May 11, 2018 12:28 PM
> >             > To: Harris, James R <james.r.harris(a)intel.com>; 
> > Kaligotla, Srikanth <Srikanth.
> >             > Kaligotla(a)netapp.com>
> >             > Cc: raju.gottumukkala(a)broadcom.com; Meneghini, John 
> > <John.Menegh ini(a)netapp.com
> >             > >; Rodriguez, Edwin <Ed.Rodriguez(a)netapp.com>; Pai, 
> > Madhu <Madhu sudan.Pai(a)neta
> >             > pp.com>; NGC-john.barnard-broadcom.com 
> > <john.barnard(a)broadcom.co
> > m>
> >             > Subject: Re: BDEV-IO Lifecycle - Need your input.
> >             >  
> >             > Hi Srikanth,
> >             >  
> >             > Yes - we'll need to introduce some way to acquire and 
> > release buffers from the
> >             > bdev layer earlier on in the state machine that 
> > processes an NVMe-oF request.
> >             > I've had this patch out for review for several months 
> > as a proposal for this
> >             > scenario:
> >             >  
> >             > https://review.gerrithub.io/#/c/spdk/spdk/+/386166/
> >             >  
> >             > It doesn't pass the tests - it's just a proposal for 
> > the interface. 90% of the
> >             > challenge here is in developing intuitive semantics.
> >             >  
> >             > Thanks,
> >             > Ben
> >             >  
> >             > P.S. This is the kind of discussion that would fit 
> > perfectly on the mailing
> >             > list.
> >             >  
> >             > On Wed, 2018-05-09 at 20:35 +0000, Kaligotla, Srikanth wrote:
> >             > > Hi Ben, Hi James,
> >             > >  
> >             > > I would like to solicit opinions on the lifecycle 
> > for bdev-io resource
> >             > > object. Attached is the image of RDMA state machine 
> > in its current
> >             > > implementation. When the REQUEST enters the 
> > NEED-BUFFER state, the buffers
> >             > > necessary for carrying out I/O operation are 
> > allocated/acquired from the
> >             > > memory pool. An instance of BDEV-IO comes into 
> > existence after REQUEST
> >             > > reaches READY-TO-EXECUTE state. The BDEV-IO is 
> >             > > torn down as soon as the
> >             > > backend returns. From a BDEV perspective, BDEV-IO is 
> > simply a translation
> >             > > unit that facilitates I/O buffers from the backend. 
> > The driver context
> >             > > embedded within bdev_io holds great deal of 
> > information pertaining to the
> >             > > I/O under execution. It assists in error handling, 
> > dereferencing the buffers
> >             > > upon I/O completion and in abort handling. In 
> > summary, the bdev_io stays
> >             > > alive until request has come to COMPLETE state. I’d 
> > like to hear peoples
> >             > > thoughts in introducing the plumbing to acquire 
> > BDEV-IO resource in REQUEST-
> >             > > NEED-BUFFER state and release it in REQUEST-COMPLETE 
> > state.  I will shortly
> >             > > have patch available for review that introduces 
> > spdk_bdev_init and
> >             > > spdk_bdev_fini which in turn invokes the 
> > corresponding bdev fn_table to
> >             > >  initialize/cleanup.  
> >             > >  
> >             > > I wanted to use this email to communicate our intent 
> > and solicit your
> >             > > feedback. We have a working implementation of the 
> > above proposal and prior
> >             > > to pushing it upstream for review would like to hear 
> > your thoughts. These
> >             > > proposed changes to upstream are a result of FC 
> > transport work in
> >             > > collaboration with the Broadcom team who are also 
> > copied to this mail.
> >             > > Myself and John will be at SPDK Dev conference and 
> > if required we can
> >             > > elaborate further on this proposal.
> >             > >  
> >             > > Thanks,
> >             > > Srikanth
> >             > 
> >             >
> >             
> >         
> >         
> >     
> >     
> > 
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
