From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============5390339317543613310==" MIME-Version: 1.0 From: Walker, Benjamin Subject: Re: [SPDK] spdk_malloc vs. malloc Date: Tue, 08 May 2018 18:10:33 +0000 Message-ID: <1525802914.2784.36.camel@intel.com> In-Reply-To: 8B0ABB2F-D73F-4309-811B-E84F3A51B989@netapp.com List-ID: To: spdk@lists.01.org --===============5390339317543613310== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable On Tue, 2018-05-08 at 17:57 +0000, Rodriguez, Edwin wrote: > The problem currently with the dpdk based implementation of spdk_malloc is > that it ignores the flag altogether so somebody could pass in 0 without it > ever being noticed. https://review.gerrithub.io/#/c/spdk/spdk/+/410479/ Hopefully this check will prevent people from using spdk_malloc in a way th= at isn't implementable for your env. > = > From: "Walker, Benjamin" > Date: Tuesday, May 8, 2018 at 1:54 PM > To: Ed Rodriguez , "spdk(a)lists.01.org" .org>, "z.khatami88(a)gmail.com" , "Harris, Jame= s R" es.r.harris(a)intel.com> > Subject: Re: [SPDK] spdk_malloc vs. malloc > = > On Tue, 2018-05-08 at 17:35 +0000, Rodriguez, Edwin wrote: > > Something like this would work for NetApp: > > = > > enum spdk_malloc_flags { > > SPDK_MALLOC_DMA, > > SPDK_MALLOC_SHARE, > > SPDK_MALLOC_DMA_SHARE > > }; > > void *spdk_malloc(size_t size, size_t align, uint64_t *phys_addr, int > > socket_id, enum spdk_malloc_flags flags); > = > = > This is very close to the code that is already merged. The code today uses > #defines instead of an enum, and DMA is 1, SHARE is 2, and DMA+SHARE is 3= , but > it's otherwise the same thing. Although in the current code it is technic= ally > possible to pass no flags while in your definition it is not possible, it= is > already the case that passing no flags is not valid and the call should f= ail. > = > The flags are definitely intended to be prohibited from being 0, which I = think > is the key requirement for you guys. We can make that clearer in the > documentation for sure. > = > > = > > Ed R > > = > > = > > From: SPDK on behalf of Ed Rodriguez > > > @netapp.com> > > Reply-To: Storage Performance Development Kit > > Date: Monday, May 7, 2018 at 1:39 PM > > To: "Walker, Benjamin" , "Harris, James R" > > > .harris(a)intel.com>, "spdk(a)lists.01.org" , "z.k= hatami88(a)gm > > ai > > l.com" > > Subject: Re: [SPDK] spdk_malloc vs. malloc > > = > > spdk_malloc could work for cases 2-4 if the flags param were prohibited= from > > being 0 - in our implementation we don't do anything special for shared > > memory > > so spdk_free can continue to call the equivalent of spdk_dma_free. > > = > > Ed R > > = > > On 5/7/18, 1:19 PM, "Walker, Benjamin" wr= ote: > > = > > SPDK needs four different types of memory allocation operations to= day: > > = > > 1) Standard virtual memory > > 2) Process-shared memory > > 3) DMA-safe memory ("pinned") > > 4) DMA-safe and process-shared memory > > = > > For #1, SPDK is using POSIX to do that allocation and free. That o= ne is > > a > > very > > separate discussion than the other 3, so I'll leave that alone. > > = > > For numbers 2-4, SPDK could either have 3 separate functions or one > > function > > with flags to do the allocations. I don't think the allocation side > > choice > > makes > > much of a difference. On the free side, however, we need to decide= if > > we're > > going to do one generic spdk_free(void *buf), an spdk_free with fl= ags, > > or > > 3 > > separate free functions corresponding to 3 separate allocation > > functions. > > We're > > currently moving toward spdk_malloc with flags and spdk_free witho= ut, > > but > > that > > is up for debate. > > = > > There are at least 3 known implementations of the SPDK env layer t= oday, > > so > > I'll > > attempt to outline what each of those needs so we have some concre= te > > examples. > > = > > The primary implementation of the env layer, which ships with SPDK= , is > > the > > one > > based on DPDK. DPDK internally always allocates all memory as both= DMA- > > safe and > > shared, so the distinction there isn't necessary. Essentially, > > allocation > > types > > 2-4 are all degenerate and map to a single DPDK call. On the free = side, > > DPDK > > does not need to know what type of allocation it was (again, becau= se > > they're all > > degenerate). > > = > > One known alternative implementation (Oracle) has three separate > > operations > > corresponding to memory allocation types 2, 3, and 4. However, my > > understanding > > is that they also have internal tracking structures such that they= can > > figure > > out which type of allocation an address came from originally, so t= hey > > are > > able > > to implement spdk_free(void *buf) by doing internal look-ups and d= on't > > need the > > user to pass them flags. > > = > > Another known alternative implementation (NetApp) has two memory > > allocation > > operations internally. One corresponds to both types 1 and 2, and = the > > other > > corresponds to types 3 and 4. Further, on the free side there are > > different free > > functions for types 1 and 2 compared to types 3 and 4. The current > > abstraction > > in SPDK has a single call for types 2-4 with no flags, so it's val= id > > criticism > > that this API is difficult for NetApp to implement. We need to fig= ure > > out > > a way > > to address this. > > = > > The easiest solution today, in my opinion, is for the NetApp > > implementation to > > always allocate DMA-safe memory in response to calls to spdk_mallo= c, > > even > > if > > only the shared flag was specified (case #2). Note that SPDK doesn= 't > > call > > spdk_malloc for case #1 already, so that's a separate code path. T= hen > > always do > > the DMA-safe memory free in response to spdk_free. That will defin= itely > > make it > > work correctly today, and then the NetApp implementation doesn't n= eed > > the > > flags > > on free either. > > = > > I'd love to have as many people weigh in here as possible. The more > > diverse > > opinions we get, the better the ultimate design will be. We absolu= tely > > have to > > make sure this memory abstraction is solid and meets everyone's ne= eds, > > while > > still being reasonably simple to use. > > = > > Thanks, > > Ben > > = > > = > > On Fri, 2018-05-04 at 17:17 -0700, Harris, James R wrote: > > > You=E2=80=99re right Ed. We=E2=80=99ll need to hold off on addi= ng usages of this new > > API > > > until we get this sorted out. = > > > > > > What if we just add an spdk_dma_shared_malloc() and got rid of t= he > > flags > > > completely? And then just keep all of the existing malloc/free = calls > > where > > > DMA/shared memory is not required. This should alleviate the im= pact > > to > > your > > > code base and I *think* would work for Oracle =E2=80=93 but woul= d need Zahra > > to > > > confirm. This adds an explicit assumption of course that we won= =E2=80=99t > > have > > a need > > > for some other kind of malloc =E2=80=98flag=E2=80=99 in the futu= re. > > > > > > Sorry John =E2=80=93 I know you wanted to end this thread but I= =E2=80=99ve decided to > > continue > > > it. ( > > > > > > -Jim > > > > > > > > > > > > On 5/4/18, 3:37 PM, "Rodriguez, Edwin" > > wrote: > > > > > > The problem I have with the new code is that it makes > > distinguishing > > > between dma and non-dma free difficult. In the kernel, dma memo= ry > > comes > > from > > > a different pool than regular allocations, i.e. > > contigmalloc/contigfree > > in > > > FreeBsd. How does spdk_free distinguish between a block allocat= ed > > with > > > SPDK_MALLOC_DMA and a regular block? > > > = > > > Ed R > > > = > > > On 5/4/18, 6:30 PM, "SPDK on behalf of Meneghini, John" > bounce > > s(a)list > > > s.01.org on behalf of John.Meneghini(a)netapp.com> wrote: > > > = > > > Thanks, Ben, et al, for your reply. > > > = > > > > you may want to avoid calling them at run-time because= they > > can > > > fail, but SPDK generally already does > > > > that for performance reasons anyway. Can you outline w= hat > > you > > mean? > > > = > > > What is run-time? We are using the SPDK libraries insi= de of > > a > > kernel > > > kmod. So, all time is run-time. > > > = > > > I think we can end this thread. I tried to win this arg= ument > > once > > > before. I guess calloc() and malloc() are here to stay... so > > > it doesn't matter what we do with spdk_malloc(). > > > = > > > :-) > > > = > > > If anyone wants to ask me about my views on this topic, = I'll > > be > > at the > > > SPDK Developer's Summit in two weeks. > > > = > > > See you then. > > > = > > > /John > > > = > > > On 5/3/18, 7:24 PM, "Walker, Benjamin" > .co > > m> > > > wrote: > > > = > > > On Thu, 2018-05-03 at 20:14 +0000, Meneghini, John w= rote: > > > > The issue is: spdk_malloc and spdk_calloc are logi= cal > > APIs > > to > > > abstract the > > > > POSIX malloc and calloc calls. We chose to NOT use > > those > > API > > > names originally > > > > and went with the spdk_dma_{} api names. My concer= n is > > that > > > taking those API > > > > names now will eliminate the possibility of any ot= her > > use > > in the > > > future. > > > > > > > > I am not asking to "fully abstract POSIX". I am s= imply > > saying > > > that the > > > > malloc(), calloc() and free() POSIX APIs suffer f= rom > > some > > > problems. This is > > > > why DPDK implemented rte_malloc() and rte_calloc()= . = > > As > > we move > > > forward to > > > > build production quality products out of SPDK we w= ill > > want > > to > > > abstract malloc, > > > > calloc and free; especially in the libraries. I d= on't > > care > > > about what the > > > > applications do. I want a better memory management > > APIs > > in the > > > libraries. > > > = > > > DPDK implemented rte_malloc and rte_calloc to alloca= te > > DMA- > > safe > > > and shared > > > memory. They still call POSIX malloc and calloc for = all > > other > > > types of > > > allocations. This is the same thing that SPDK has do= ne. > > > = > > > I've never heard of malloc, calloc, and free having > > problems > > to > > > the extent that > > > you'd want to reimplement them in your application s= ome > > other way. > > > Certainly, > > > you may want to avoid calling them at run-time becau= se > > they > > can > > > fail, but SPDK > > > generally already does that for performance reasons > > anyway. > > Can > > > you outline what > > > you mean? > > > = > > > > > > > > I'd be happy to simply map spdk_malloc/calloc() to > > > rte_malloc/calloc() and I > > > > don't see the point introducing an spdk_malloc() t= hat > > only > > > extends > > > > spdk_dma_malloc(). I'd rather absorb an API chang= e to > > > spdk_dma_malloc(). We > > > > change many APIs in SPDK from release to release. I > > don't > > see > > > why this API > > > > can't be changed - with agreement from the SPDK > > community. > > > > > > > > /* > > > > * Allocate memory on default heap. > > > > */ > > > > void * > > > > rte_malloc(const char *type, size_t size, unsigned > > align) > > > > { > > > > return rte_malloc_socket(type, size, align, > > > SOCKET_ID_ANY); > > > > } > > > > > > > > /* = > > > > * Allocate zero'd memory on default heap. > > > > */ > > > > void * > > > > rte_calloc(const char *type, size_t num, size_t si= ze, > > unsigned > > > align) > > > > { > > > > return rte_zmalloc(type, num * size, align= ); > > > > } > > > > > > > > /John > > > > > > > > On 5/3/18, 3:21 PM, "Walker, Benjamin" > @in > > tel.co > > > m> wrote: > > > > > > > > On Thu, 2018-05-03 at 18:23 +0000, Meneghini, = John > > wrote: > > > > > > If the user passes flags =3D=3D 0 to the = new > > spdk_malloc() > > > call, this > > > > could be > > > > > implemented by malloc() or equivalent behind= the > > scenes, > > > > > = > > > > > So, does this mean you=E2=80=99re willing to= change all > > calls to > > > malloc(size) > > > > with > > > > > spdk_malloc(size, 0, NULL, > > SPDK_ENV_SOCKET_ID_ANY, > > > SPDK_MALLOC_SHARE)? > > > > > = > > > > > Is that the plan? > > > > > > > > > = > > > > Hi John, > > > > = > > > > SPDK explicitly depends on POSIX throughout the > > code > > base. > > > We do have an > > > > environment abstraction layer designed to make > > porting > > SPDK > > > to various > > > > environments (i.e. away from DPDK) easier, but= this > > only > > > abstracts > > > > operations > > > > that are not available through standard POSIX > > calls. > > We're > > > concerned that > > > > fully > > > > abstracting POSIX would introduce a significant > > barrier to > > > entry for new > > > > contributors to the project, while only enabli= ng > > one > > > additional use case > > > > that > > > > we're aware of. I understand that this one use= case > > happens > > > to be yours, > > > > and so > > > > this is a challenge for you. > > > > = > > > > In your code today, I assume you carry patches= to > > replace > > > the POSIX calls > > > > with > > > > your available alternatives. We've attempted to > > make > > > carrying these > > > > patches > > > > reasonably easy by moving all of the POSIX inc= ludes > > into a > > > single header > > > > (include/stdinc.h). Since you're already carry= ing > > those > > > patches, can you > > > > simply > > > > choose a different name for your replacement f= or > > > malloc/calloc? > > > > = > > > > Thanks, > > > > Ben > > > > = > > > > > > > = > > > = > > > _______________________________________________ > > > SPDK mailing list > > > SPDK(a)lists.01.org > > > https://lists.01.org/mailman/listinfo/spdk > > > = > > > = > > > = > > > > > = > > = > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > > = > = > =20 --===============5390339317543613310==--