All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rodriguez, Edwin <Ed.Rodriguez at netapp.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] spdk_malloc vs. malloc
Date: Tue, 08 May 2018 17:57:44 +0000	[thread overview]
Message-ID: <8B0ABB2F-D73F-4309-811B-E84F3A51B989@netapp.com> (raw)
In-Reply-To: 1525802004.2784.33.camel@intel.com

[-- Attachment #1: Type: text/plain, Size: 14776 bytes --]

The problem currently with the dpdk based implementation of spdk_malloc is that it ignores the flag altogether so somebody could pass in 0 without it ever being noticed.

From: "Walker, Benjamin" <benjamin.walker(a)intel.com>
Date: Tuesday, May 8, 2018 at 1:54 PM
To: Ed Rodriguez <Ed.Rodriguez(a)netapp.com>, "spdk(a)lists.01.org" <spdk(a)lists.01.org>, "z.khatami88(a)gmail.com" <z.khatami88(a)gmail.com>, "Harris, James R" <james.r.harris(a)intel.com>
Subject: Re: [SPDK] spdk_malloc vs. malloc

On Tue, 2018-05-08 at 17:35 +0000, Rodriguez, Edwin wrote:
Something like this would work for NetApp:

enum spdk_malloc_flags {
     SPDK_MALLOC_DMA,
     SPDK_MALLOC_SHARE,
     SPDK_MALLOC_DMA_SHARE
};
void *spdk_malloc(size_t size, size_t align, uint64_t *phys_addr, int
socket_id, enum spdk_malloc_flags flags);

This is very close to the code that is already merged. The code today uses
#defines instead of an enum, and DMA is 1, SHARE is 2, and DMA+SHARE is 3, but
it's otherwise the same thing. Although in the current code it is technically
possible to pass no flags while in your definition it is not possible, it is
already the case that passing no flags is not valid and the call should fail.

The flags are definitely intended to be prohibited from being 0, which I think
is the key requirement for you guys. We can make that clearer in the
documentation for sure.


Ed R


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Ed Rodriguez <Ed.Rodriguez
@netapp.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Monday, May 7, 2018 at 1:39 PM
To: "Walker, Benjamin" <benjamin.walker(a)intel.com<mailto:benjamin.walker(a)intel.com>>, "Harris, James R" <james.r
.harris(a)intel.com<mailto:.harris(a)intel.com>>, "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>, "z.khatami88(a)gmai
l.com" <z.khatami88(a)gmail.com<mailto:z.khatami88(a)gmail.com>>
Subject: Re: [SPDK] spdk_malloc vs. malloc

spdk_malloc could work for cases 2-4 if the flags param were prohibited from
being 0 - in our implementation we don't do anything special for shared memory
so spdk_free can continue to call the equivalent of spdk_dma_free.

Ed R

On 5/7/18, 1:19 PM, "Walker, Benjamin" <benjamin.walker(a)intel.com<mailto:benjamin.walker(a)intel.com>> wrote:

     SPDK needs four different types of memory allocation operations today:

     1) Standard virtual memory
     2) Process-shared memory
     3) DMA-safe memory ("pinned")
     4) DMA-safe and process-shared memory

     For #1, SPDK is using POSIX to do that allocation and free. That one is a
very
     separate discussion than the other 3, so I'll leave that alone.

     For numbers 2-4, SPDK could either have 3 separate functions or one
function
     with flags to do the allocations. I don't think the allocation side choice
makes
     much of a difference. On the free side, however, we need to decide if
we're
     going to do one generic spdk_free(void *buf), an spdk_free with flags, or
3
     separate free functions corresponding to 3 separate allocation functions.
We're
     currently moving toward spdk_malloc with flags and spdk_free without, but
that
     is up for debate.

     There are at least 3 known implementations of the SPDK env layer today, so
I'll
     attempt to outline what each of those needs so we have some concrete
examples.

     The primary implementation of the env layer, which ships with SPDK, is the
one
     based on DPDK. DPDK internally always allocates all memory as both DMA-
safe and
     shared, so the distinction there isn't necessary. Essentially, allocation
types
     2-4 are all degenerate and map to a single DPDK call. On the free side,
DPDK
     does not need to know what type of allocation it was (again, because
they're all
     degenerate).

     One known alternative implementation (Oracle) has three separate
operations
     corresponding to memory allocation types 2, 3, and 4. However, my
understanding
     is that they also have internal tracking structures such that they can
figure
     out which type of allocation an address came from originally, so they are
able
     to implement spdk_free(void *buf) by doing internal look-ups and don't
need the
     user to pass them flags.

     Another known alternative implementation (NetApp) has two memory
allocation
     operations internally. One corresponds to both types 1 and 2, and the
other
     corresponds to types 3 and 4. Further, on the free side there are
different free
     functions for types 1 and 2 compared to types 3 and 4. The current
abstraction
     in SPDK has a single call for types 2-4 with no flags, so it's valid
criticism
     that this API is difficult for NetApp to implement. We need to figure out
a way
     to address this.

     The easiest solution today, in my opinion, is for the NetApp
implementation to
     always allocate DMA-safe memory in response to calls to spdk_malloc, even
if
     only the shared flag was specified (case #2). Note that SPDK doesn't call
     spdk_malloc for case #1 already, so that's a separate code path. Then
always do
     the DMA-safe memory free in response to spdk_free. That will definitely
make it
     work correctly today, and then the NetApp implementation doesn't need the
flags
     on free either.

     I'd love to have as many people weigh in here as possible. The more
diverse
     opinions we get, the better the ultimate design will be. We absolutely
have to
     make sure this memory abstraction is solid and meets everyone's needs,
while
     still being reasonably simple to use.

     Thanks,
     Ben


     On Fri, 2018-05-04 at 17:17 -0700, Harris, James R wrote:
     > You’re right Ed.  We’ll need to hold off on adding usages of this new
API
     > until we get this sorted out.
     >
     > What if we just add an spdk_dma_shared_malloc() and got rid of the flags
     > completely?  And then just keep all of the existing malloc/free calls
where
     > DMA/shared memory is not required.  This should alleviate the impact to
your
     > code base and I *think* would work for Oracle – but would need Zahra to
     > confirm.  This adds an explicit assumption of course that we won’t have
a need
     > for some other kind of malloc ‘flag’ in the future.
     >
     > Sorry John – I know you wanted to end this thread but I’ve decided to
continue
     > it.  (
     >
     > -Jim
     >
     >
     >
     > On 5/4/18, 3:37 PM, "Rodriguez, Edwin" <Ed.Rodriguez(a)netapp.com<mailto:Ed.Rodriguez(a)netapp.com>> wrote:
     >
     >     The problem I have with the new code is that it makes distinguishing
     > between dma and non-dma free difficult.  In the kernel, dma memory comes
from
     > a different pool than regular allocations, i.e. contigmalloc/contigfree
in
     > FreeBsd.  How does spdk_free distinguish between a block allocated with
     > SPDK_MALLOC_DMA and a regular block?
     >
     >     Ed R
     >
     >     On 5/4/18, 6:30 PM, "SPDK on behalf of Meneghini, John" <spdk-bounce
s(a)list
     > s.01.org on behalf of John.Meneghini(a)netapp.com<mailto:John.Meneghini(a)netapp.com>> wrote:
     >
     >         Thanks, Ben, et al, for your reply.
     >
     >         > you may want to avoid calling them at run-time because they
can
     > fail, but SPDK generally already does
     >         > that for performance reasons anyway. Can you outline what you
mean?
     >
     >         What is run-time?   We are using the SPDK libraries inside of a
kernel
     > kmod.  So, all time is run-time.
     >
     >         I think we can end this thread.  I tried to win this argument
once
     > before. I guess calloc() and malloc() are here to stay... so
     >         it doesn't matter what we do with spdk_malloc().
     >
     >         :-)
     >
     >         If anyone wants to ask me about my views on this topic, I'll be
at the
     > SPDK Developer's Summit in two weeks.
     >
     >         See you then.
     >
     >         /John
     >
     >         On 5/3/18, 7:24 PM, "Walker, Benjamin" <benjamin.walker(a)intel.co<mailto:benjamin.walker(a)intel.co>
m>
     > wrote:
     >
     >             On Thu, 2018-05-03 at 20:14 +0000, Meneghini, John wrote:
     >             > The issue is: spdk_malloc and spdk_calloc are logical APIs
to
     > abstract the
     >             > POSIX malloc and calloc calls.  We chose to NOT use those
API
     > names originally
     >             > and went with the spdk_dma_{} api names. My concern is
that
     > taking those API
     >             > names now will eliminate the possibility of any other use
in the
     > future.
     >             >
     >             > I am not asking to "fully abstract POSIX".  I am simply
saying
     > that the
     >             > malloc(), calloc() and free()  POSIX APIs suffer from some
     > problems.  This is
     >             > why DPDK implemented rte_malloc() and rte_calloc().   As
we move
     > forward to
     >             > build production quality products out of SPDK we will want
to
     > abstract malloc,
     >             > calloc and free; especially in the libraries.  I don't
care
     > about what the
     >             > applications do.  I want a better memory management APIs
in the
     > libraries.
     >
     >             DPDK implemented rte_malloc and rte_calloc to allocate DMA-
safe
     > and shared
     >             memory. They still call POSIX malloc and calloc for all
other
     > types of
     >             allocations. This is the same thing that SPDK has done.
     >
     >             I've never heard of malloc, calloc, and free having problems
to
     > the extent that
     >             you'd want to reimplement them in your application some
other way.
     > Certainly,
     >             you may want to avoid calling them at run-time because they
can
     > fail, but SPDK
     >             generally already does that for performance reasons anyway.
Can
     > you outline what
     >             you mean?
     >
     >             >
     >             > I'd be happy to simply map spdk_malloc/calloc() to
     > rte_malloc/calloc() and I
     >             > don't see the point introducing an spdk_malloc() that only
     > extends
     >             > spdk_dma_malloc().  I'd rather absorb an API change to
     > spdk_dma_malloc().  We
     >             > change many APIs in SPDK from release to release. I don't
see
     > why this API
     >             > can't be changed - with agreement from the SPDK community.
     >             >
     >             > /*
     >             >  * Allocate memory on default heap.
     >             >  */
     >             > void *
     >             > rte_malloc(const char *type, size_t size, unsigned align)
     >             > {
     >             >         return rte_malloc_socket(type, size, align,
     > SOCKET_ID_ANY);
     >             > }
     >             >
     >             > /*
     >             >  * Allocate zero'd memory on default heap.
     >             >  */
     >             > void *
     >             > rte_calloc(const char *type, size_t num, size_t size,
unsigned
     > align)
     >             > {
     >             >         return rte_zmalloc(type, num * size, align);
     >             > }
     >             >
     >             > /John
     >             >
     >             > On 5/3/18, 3:21 PM, "Walker, Benjamin" <benjamin.walker(a)in
tel.co
     > m> wrote:
     >             >
     >             >     On Thu, 2018-05-03 at 18:23 +0000, Meneghini, John
wrote:
     >             >     > >  If the user passes flags == 0 to the new
spdk_malloc()
     > call, this
     >             > could be
     >             >     > implemented by malloc() or equivalent behind the
scenes,
     >             >     >
     >             >     > So, does this mean you’re willing to change all
calls to
     > malloc(size)
     >             > with
     >             >     > spdk_malloc(size, 0, NULL, SPDK_ENV_SOCKET_ID_ANY,
     > SPDK_MALLOC_SHARE)?
     >             >     >
     >             >     > Is that the plan?
     >             >     >
     >             >
     >             >     Hi John,
     >             >
     >             >     SPDK explicitly depends on POSIX throughout the code
base.
     > We do have an
     >             >     environment abstraction layer designed to make porting
SPDK
     > to various
     >             >     environments (i.e. away from DPDK) easier, but this
only
     > abstracts
     >             > operations
     >             >     that are not available through standard POSIX calls.
We're
     > concerned that
     >             > fully
     >             >     abstracting POSIX would introduce a significant
barrier to
     > entry for new
     >             >     contributors to the project, while only enabling one
     > additional use case
     >             > that
     >             >     we're aware of. I understand that this one use case
happens
     > to be yours,
     >             > and so
     >             >     this is a challenge for you.
     >             >
     >             >     In your code today, I assume you carry patches to
replace
     > the POSIX calls
     >             > with
     >             >     your available alternatives. We've attempted to make
     > carrying these
     >             > patches
     >             >     reasonably easy by moving all of the POSIX includes
into a
     > single header
     >             >     (include/stdinc.h). Since you're already carrying
those
     > patches, can you
     >             > simply
     >             >     choose a different name for your replacement for
     > malloc/calloc?
     >             >
     >             >     Thanks,
     >             >     Ben
     >             >
     >             >
     >
     >
     >         _______________________________________________
     >         SPDK mailing list
     >         SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
     >         https://lists.01.org/mailman/listinfo/spdk
     >
     >
     >
     >


_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org<mailto:SPDK(a)lists.01.org>
https://lists.01.org/mailman/listinfo/spdk



[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 51417 bytes --]

             reply	other threads:[~2018-05-08 17:57 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-08 17:57 Rodriguez, Edwin [this message]
  -- strict thread matches above, loose matches on Subject: below --
2018-05-08 19:01 [SPDK] spdk_malloc vs. malloc Rodriguez, Edwin
2018-05-08 18:10 Walker, Benjamin
2018-05-08 17:53 Walker, Benjamin
2018-05-08 17:35 Rodriguez, Edwin
2018-05-07 17:38 Rodriguez, Edwin
2018-05-07 17:18 Walker, Benjamin
2018-05-05  0:17 Harris, James R
2018-05-04 22:37 Rodriguez, Edwin
2018-05-04 22:29 Meneghini, John
2018-05-03 23:24 Walker, Benjamin
2018-05-03 20:14 Meneghini, John
2018-05-03 19:21 Walker, Benjamin
2018-05-03 18:23 Meneghini, John
2018-05-03 18:09 Harris, James R
2018-05-03 18:05 Verkamp, Daniel
2018-05-03 17:53 Meneghini, John

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8B0ABB2F-D73F-4309-811B-E84F3A51B989@netapp.com \
    --to=spdk@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.