All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@jeffhostetler.com
Cc: git@vger.kernel.org, gitster@pobox.com,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v4] unpack-trees: avoid duplicate ODB lookups during checkout
Date: Fri, 14 Apr 2017 15:52:19 -0400	[thread overview]
Message-ID: <20170414195219.qmc4w46t7t6brlp4@sigill.intra.peff.net> (raw)
In-Reply-To: <20170414192554.26683-2-git@jeffhostetler.com>

On Fri, Apr 14, 2017 at 07:25:54PM +0000, git@jeffhostetler.com wrote:

>  	for (i = 0; i < n; i++, dirmask >>= 1) {
> -		const unsigned char *sha1 = NULL;
> -		if (dirmask & 1)
> -			sha1 = names[i].oid->hash;
> -		buf[i] = fill_tree_descriptor(t+i, sha1);
> +		if (i > 0 && are_same_oid(&names[i], &names[i - 1]))
> +			t[i] = t[i - 1];
> +		else if (i > 1 && are_same_oid(&names[i], &names[i - 2]))
> +			t[i] = t[i - 2];
> +		else {
> +			const unsigned char *sha1 = NULL;
> +			if (dirmask & 1)
> +				sha1 = names[i].oid->hash;
> +			buf[nr_buf++] = fill_tree_descriptor(t+i, sha1);
> +		}

This looks fine to me.

Just musing (and I do not think we need to go further than your patch),
we're slowly walking towards an actual object-content cache. The "buf"
array is now essentially a cache of all oids we've loaded, but it
doesn't know its sha1s. So we could actually encapsulate all of the
caching:

  struct object_cache {
	  int nr_entries;
	  struct object_cache_entry {
		  struct object_id oid;
		  void *data;
	  } cache[MAX_UNPACK_TREES];
  };

and then ask it "have you seen oid X" rather than playing games with
looking at "i - 1". Of course it would have to do a linear search, so
the next step is to replace its array with a hashmap.

And now suddenly we have a reusable object-content cache that you could
use like:

  struct object_cache = {0};
  for (...) {
    /* maybe reads fresh, or maybe gets it from the cache */
    void *data = read_object_data_cached(&oid, &cache);
  }
  /* operation done, release the cache */
  clear_object_cache(&cache);

which would work anywhere you expect to load N objects and see some
overlap.

Of course it would be nicer still if this all just happened
automatically behind the scenes of read_object_data(). But it would have
to keep an _extra_ copy of each object, since the caller expects to be
able to free it. We'd probably have to return instead a struct with
buffer/size in it along with a reference counter.

I don't think any of that is worth it unless there are spots where we
really expect there to be a lot of cases where we hit the same objects
in rapid succession. I don't think there should be, though. Our usual
"caching" mechanism is to create a "struct object", which is enough to
perform most operations (and has a much smaller memory footprint).

So again, just musing. I think your patch is fine as-is.

-Peff

  reply	other threads:[~2017-04-14 19:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-14 19:25 [PATCH v4] unpack-trees: avoid duplicate ODB lookups during checkout git
2017-04-14 19:25 ` git
2017-04-14 19:52   ` Jeff King [this message]
2017-04-14 21:06     ` Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170414195219.qmc4w46t7t6brlp4@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.