All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: David Turner <dturner@twopensource.com>
Cc: Remi Galan Alfonso <remi.galan-alfonso@ensimag.grenoble-inp.fr>,
	William Duclot <william.duclot@ensimag.grenoble-inp.fr>,
	git@vger.kernel.org,
	simon rabourg <simon.rabourg@ensimag.grenoble-inp.fr>,
	francois beutin <francois.beutin@ensimag.grenoble-inp.fr>,
	antoine queru <antoine.queru@ensimag.grenoble-inp.fr>,
	matthieu moy <matthieu.moy@grenoble-inp.fr>,
	mhagger@alum.mit.edu
Subject: Re: [PATCH 0/2] strbuf: improve API
Date: Wed, 1 Jun 2016 16:09:59 -0400	[thread overview]
Message-ID: <20160601200959.GA13061@sigill.intra.peff.net> (raw)
In-Reply-To: <1464810629.3988.11.camel@twopensource.com>

On Wed, Jun 01, 2016 at 03:50:29PM -0400, David Turner wrote:

> On Wed, 2016-06-01 at 03:42 -0400, Jeff King wrote:
> >   2. Do caching tricks for strbufs used in tight loops. For example,
> >      have strbuf_release() throw its buffer into a last-used cache,
> > and
> >      let the next strbuf_grow() use that cache entry. This cuts
> > malloc()
> >      out of the loop.
> > 
> >      You'd probably want to protect the cache with a mutex, though.
> 
> 
> ... or make the last-used cache be thread-local.

Good idea.

I almost didn't mention threading at all, because I'd be surprised if
malloc lock contention is a serious bottleneck for us anywhere.

It's hard to really say much concrete because it's not clear to me where
people think strbuf allocation _is_ a bottleneck (or again, _would be_
if we were to use it more).

If we imagine a loop like this:

  for (i = 0; i < n; i++) {
	struct strbuf buf = STRBUF_INIT;
	do_something_with(&buf);
	strbuf_release(&buf);
  }

then yes, we'll call malloc() and free() a lot of times. But unless your
libc malloc is terrible, those calls are all probably just taking a
mutex and reusing the same block from a free list. The strbuf case is
actually really friendly to allocators because our blocks tend to be
predictable sizes (they're sized by ALLOC_GROW, which has a
deterministic set of block sizes).

In practice, I'd imagine loops get more complicated than that. They call
sub-functions which allocate a strbuf, and sometimes don't release it
until other strbufs have been allocated, etc. The proposal in this
thread is basically using the call stack to mirror the allocation
patterns. So when the pattern of allocations matches an actual stack,
that's good, but I'd expect a reasonably smart allocator to perform
similarly. And when it's not a true stack, the stack-based strbufs are
going to end up with wasted stack space hanging around even after we've
free()'d the memory and could reuse it. And I'd still expect an
allocator based off a linked list or something to handle such cases
pretty well.

There are also other non-big-O factors at play in modern systems, like
CPU cache footprints. Is heap memory cheaper or more expensive to access
than stack? I can imagine stack is kept in the cache better, but if
you're reusing the same heap block over and over, it probably stays in
cache, too. But if you have unused stack buffers that _could_ be reused
(but won't be because you'd have to manually feed them to a new strbuf),
that hurts your footprint. And on top of that, the stack proposals I've
seen generally involve over-sizing the stack buffers out of a fear of
actually calling malloc.

And then on top of that, there is the question of whether anything
involving a strbuf allocation is actually a tight enough loop to even
_care_ about cache dynamics.

So again, I'm slightly negative on anything that makes the code even
slightly more complex, especially the call sites, if we can't show
actual measurable improvement numbers.

Sorry, this turned into a mini-rant, and it's not really directed at
you, David. Your email was just what spurred me to put some of these
thoughts into words. :)

-Peff

  reply	other threads:[~2016-06-01 20:10 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-30 10:36 [PATCH 0/2] strbuf: improve API William Duclot
2016-05-30 10:36 ` [PATCH 1/2] strbuf: add tests William Duclot
2016-05-30 11:26   ` Johannes Schindelin
2016-05-30 13:42     ` Simon Rabourg
2016-05-30 11:56   ` Matthieu Moy
2016-05-31  2:04   ` Michael Haggerty
2016-05-31  9:48     ` Simon Rabourg
2016-05-30 10:36 ` [PATCH 2/2] strbuf: allow to use preallocated memory William Duclot
2016-05-30 12:13   ` Johannes Schindelin
2016-05-30 13:20     ` William Duclot
2016-05-31  6:21       ` Johannes Schindelin
2016-05-31  3:05     ` Michael Haggerty
2016-05-31  6:41       ` Johannes Schindelin
2016-05-31  8:25         ` Michael Haggerty
2016-05-30 12:52   ` Matthieu Moy
2016-05-30 14:15     ` William Duclot
2016-05-30 14:34       ` Matthieu Moy
2016-05-30 15:16         ` William Duclot
2016-05-31  4:05     ` Michael Haggerty
2016-05-31 15:59       ` William Duclot
2016-06-03 14:04       ` William Duclot
2016-05-30 21:56   ` Mike Hommey
2016-05-30 22:46     ` William Duclot
2016-05-30 22:50       ` Mike Hommey
2016-05-31  6:34   ` Junio C Hamano
2016-05-31 15:45     ` William
2016-05-31 15:54       ` Matthieu Moy
2016-05-31 16:08         ` William Duclot
2016-05-30 11:32 ` [PATCH 0/2] strbuf: improve API Remi Galan Alfonso
2016-06-01  7:42   ` Jeff King
2016-06-01 19:50     ` David Turner
2016-06-01 20:09       ` Jeff King [this message]
2016-06-01 20:22         ` David Turner
2016-06-01 21:07     ` Jeff King
2016-06-02 11:11       ` Michael Haggerty
2016-06-02 12:58         ` Matthieu Moy
2016-06-02 14:22           ` William Duclot
2016-06-24 17:20         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160601200959.GA13061@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=antoine.queru@ensimag.grenoble-inp.fr \
    --cc=dturner@twopensource.com \
    --cc=francois.beutin@ensimag.grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    --cc=matthieu.moy@grenoble-inp.fr \
    --cc=mhagger@alum.mit.edu \
    --cc=remi.galan-alfonso@ensimag.grenoble-inp.fr \
    --cc=simon.rabourg@ensimag.grenoble-inp.fr \
    --cc=william.duclot@ensimag.grenoble-inp.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.