Re: [SPDK] spdk_malloc vs. malloc

From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] spdk_malloc vs. malloc
Date: Mon, 07 May 2018 17:18:48 +0000	[thread overview]
Message-ID: <1525713527.46960.38.camel@intel.com> (raw)
In-Reply-To: F776E011-11D4-4AAE-BB1A-B04CD4FE88C5@intel.com

[-- Attachment #1: Type: text/plain, Size: 10972 bytes --]

SPDK needs four different types of memory allocation operations today:

1) Standard virtual memory
2) Process-shared memory
3) DMA-safe memory ("pinned")
4) DMA-safe and process-shared memory

For #1, SPDK is using POSIX to do that allocation and free. That one is a very
separate discussion than the other 3, so I'll leave that alone.

For numbers 2-4, SPDK could either have 3 separate functions or one function
with flags to do the allocations. I don't think the allocation side choice makes
much of a difference. On the free side, however, we need to decide if we're
going to do one generic spdk_free(void *buf), an spdk_free with flags, or 3
separate free functions corresponding to 3 separate allocation functions. We're
currently moving toward spdk_malloc with flags and spdk_free without, but that
is up for debate.

There are at least 3 known implementations of the SPDK env layer today, so I'll
attempt to outline what each of those needs so we have some concrete examples.

The primary implementation of the env layer, which ships with SPDK, is the one
based on DPDK. DPDK internally always allocates all memory as both DMA-safe and
shared, so the distinction there isn't necessary. Essentially, allocation types
2-4 are all degenerate and map to a single DPDK call. On the free side, DPDK
does not need to know what type of allocation it was (again, because they're all
degenerate).

One known alternative implementation (Oracle) has three separate operations
corresponding to memory allocation types 2, 3, and 4. However, my understanding
is that they also have internal tracking structures such that they can figure
out which type of allocation an address came from originally, so they are able
to implement spdk_free(void *buf) by doing internal look-ups and don't need the
user to pass them flags.

Another known alternative implementation (NetApp) has two memory allocation
operations internally. One corresponds to both types 1 and 2, and the other
corresponds to types 3 and 4. Further, on the free side there are different free
functions for types 1 and 2 compared to types 3 and 4. The current abstraction
in SPDK has a single call for types 2-4 with no flags, so it's valid criticism
that this API is difficult for NetApp to implement. We need to figure out a way
to address this.

The easiest solution today, in my opinion, is for the NetApp implementation to
always allocate DMA-safe memory in response to calls to spdk_malloc, even if
only the shared flag was specified (case #2). Note that SPDK doesn't call
spdk_malloc for case #1 already, so that's a separate code path. Then always do
the DMA-safe memory free in response to spdk_free. That will definitely make it
work correctly today, and then the NetApp implementation doesn't need the flags
on free either.

I'd love to have as many people weigh in here as possible. The more diverse
opinions we get, the better the ultimate design will be. We absolutely have to
make sure this memory abstraction is solid and meets everyone's needs, while
still being reasonably simple to use.

Thanks,
Ben

On Fri, 2018-05-04 at 17:17 -0700, Harris, James R wrote:
> You’re right Ed.  We’ll need to hold off on adding usages of this new API
> until we get this sorted out.  
> 
> What if we just add an spdk_dma_shared_malloc() and got rid of the flags
> completely?  And then just keep all of the existing malloc/free calls where
> DMA/shared memory is not required.  This should alleviate the impact to your
> code base and I *think* would work for Oracle – but would need Zahra to
> confirm.  This adds an explicit assumption of course that we won’t have a need
> for some other kind of malloc ‘flag’ in the future.
> 
> Sorry John – I know you wanted to end this thread but I’ve decided to continue
> it.  (
> 
> -Jim
> 
> 
> 
> On 5/4/18, 3:37 PM, "Rodriguez, Edwin" <Ed.Rodriguez(a)netapp.com> wrote:
> 
>     The problem I have with the new code is that it makes distinguishing
> between dma and non-dma free difficult.  In the kernel, dma memory comes from
> a different pool than regular allocations, i.e. contigmalloc/contigfree in
> FreeBsd.  How does spdk_free distinguish between a block allocated with
> SPDK_MALLOC_DMA and a regular block?
>     
>     Ed R
>     
>     On 5/4/18, 6:30 PM, "SPDK on behalf of Meneghini, John" <spdk-bounces(a)list
> s.01.org on behalf of John.Meneghini(a)netapp.com> wrote:
>     
>         Thanks, Ben, et al, for your reply.
>         
>         > you may want to avoid calling them at run-time because they can
> fail, but SPDK generally already does
>         > that for performance reasons anyway. Can you outline what you mean?
>         
>         What is run-time?   We are using the SPDK libraries inside of a kernel
> kmod.  So, all time is run-time.
>         
>         I think we can end this thread.  I tried to win this argument once
> before. I guess calloc() and malloc() are here to stay... so
>         it doesn't matter what we do with spdk_malloc().
>         
>         :-)
>         
>         If anyone wants to ask me about my views on this topic, I'll be at the
> SPDK Developer's Summit in two weeks.
>         
>         See you then.
>         
>         /John 
>         
>         On 5/3/18, 7:24 PM, "Walker, Benjamin" <benjamin.walker(a)intel.com>
> wrote:
>         
>             On Thu, 2018-05-03 at 20:14 +0000, Meneghini, John wrote:
>             > The issue is: spdk_malloc and spdk_calloc are logical APIs to
> abstract the
>             > POSIX malloc and calloc calls.  We chose to NOT use those API
> names originally
>             > and went with the spdk_dma_{} api names. My concern is that
> taking those API
>             > names now will eliminate the possibility of any other use in the
> future.
>             > 
>             > I am not asking to "fully abstract POSIX".  I am simply saying
> that the
>             > malloc(), calloc() and free()  POSIX APIs suffer from some
> problems.  This is
>             > why DPDK implemented rte_malloc() and rte_calloc().   As we move
> forward to
>             > build production quality products out of SPDK we will want to
> abstract malloc,
>             > calloc and free; especially in the libraries.  I don't care
> about what the
>             > applications do.  I want a better memory management APIs in the
> libraries.
>             
>             DPDK implemented rte_malloc and rte_calloc to allocate DMA-safe
> and shared
>             memory. They still call POSIX malloc and calloc for all other
> types of
>             allocations. This is the same thing that SPDK has done.
>             
>             I've never heard of malloc, calloc, and free having problems to
> the extent that
>             you'd want to reimplement them in your application some other way.
> Certainly,
>             you may want to avoid calling them at run-time because they can
> fail, but SPDK
>             generally already does that for performance reasons anyway. Can
> you outline what
>             you mean?
>             
>             > 
>             > I'd be happy to simply map spdk_malloc/calloc() to
> rte_malloc/calloc() and I
>             > don't see the point introducing an spdk_malloc() that only
> extends
>             > spdk_dma_malloc().  I'd rather absorb an API change to
> spdk_dma_malloc().  We
>             > change many APIs in SPDK from release to release. I don't see
> why this API
>             > can't be changed - with agreement from the SPDK community.
>             > 
>             > /*
>             >  * Allocate memory on default heap.
>             >  */
>             > void *
>             > rte_malloc(const char *type, size_t size, unsigned align)
>             > {
>             >         return rte_malloc_socket(type, size, align,
> SOCKET_ID_ANY);
>             > }
>             > 
>             > /*     
>             >  * Allocate zero'd memory on default heap.
>             >  */
>             > void *
>             > rte_calloc(const char *type, size_t num, size_t size, unsigned
> align)
>             > {
>             >         return rte_zmalloc(type, num * size, align);
>             > }
>             > 
>             > /John
>             > 
>             > On 5/3/18, 3:21 PM, "Walker, Benjamin" <benjamin.walker(a)intel.co
> m> wrote:
>             > 
>             >     On Thu, 2018-05-03 at 18:23 +0000, Meneghini, John wrote:
>             >     > >  If the user passes flags == 0 to the new spdk_malloc()
> call, this
>             > could be
>             >     > implemented by malloc() or equivalent behind the scenes,
>             >     >  
>             >     > So, does this mean you’re willing to change all calls to
> malloc(size)
>             > with
>             >     > spdk_malloc(size, 0, NULL, SPDK_ENV_SOCKET_ID_ANY,
> SPDK_MALLOC_SHARE)?
>             >     >  
>             >     > Is that the plan?
>             >     > 
>             >     
>             >     Hi John,
>             >         
>             >     SPDK explicitly depends on POSIX throughout the code base.
> We do have an
>             >     environment abstraction layer designed to make porting SPDK
> to various
>             >     environments (i.e. away from DPDK) easier, but this only
> abstracts
>             > operations
>             >     that are not available through standard POSIX calls. We're
> concerned that
>             > fully
>             >     abstracting POSIX would introduce a significant barrier to
> entry for new
>             >     contributors to the project, while only enabling one
> additional use case
>             > that
>             >     we're aware of. I understand that this one use case happens
> to be yours,
>             > and so
>             >     this is a challenge for you.
>             >         
>             >     In your code today, I assume you carry patches to replace
> the POSIX calls
>             > with
>             >     your available alternatives. We've attempted to make
> carrying these
>             > patches
>             >     reasonably easy by moving all of the POSIX includes into a
> single header
>             >     (include/stdinc.h). Since you're already carrying those
> patches, can you
>             > simply
>             >     choose a different name for your replacement for
> malloc/calloc?
>             >     
>             >     Thanks,
>             >     Ben
>             >     
>             > 
>             
>         
>         _______________________________________________
>         SPDK mailing list
>         SPDK(a)lists.01.org
>         https://lists.01.org/mailman/listinfo/spdk
>         
>     
>     
>