All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [RFC PATCH 2/5] Add delta-islands.{c,h}
Date: Tue, 24 Jul 2018 09:47:29 -0700	[thread overview]
Message-ID: <xmqqpnzczila.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20180722054836.28935-3-chriscool@tuxfamily.org> (Christian Couder's message of "Sun, 22 Jul 2018 07:48:33 +0200")

Christian Couder <christian.couder@gmail.com> writes:

> From: Jeff King <peff@peff.net>
>
> Hosting providers that allow users to "fork" existing
> repos want as much as possible those forks to use a
> small amount of disk space.

"... want those forks to consume as little amount of disk space as
possible?"  Or perhaps "... want those forks to share as much disk
space as possible"?

> Using alternates to keep all the objects from all the forks into a
> unique central repo is way to do that, but ...

s/is way to/is a way to/, I guess, but that makes it sound as if the
series invented a way to share objects without using alternates.

> it can have some
> drawbacks. Especially when packing the central repo, deltas will
> be created between objects from different forks.
>
> This can make cloning or fetching a fork much slower and much more
> CPU intensive as Git might have to compute new deltas for many
> objects to avoid sending objects from a different fork.
>
> Delta islands is a way to store objects from different forks in
> the same repo and packfile without having deltas between objects
> from different forks.

I think another paragraph between the above two is needed to briefly
outline the central idea (which would make it clear why that is
called "delta islands") is called for.  Perhaps

	Because the inefficiency primarily arises when an object is
	delitified against another object that does not exist in the
	same fork, we partion objects into sets that appear in the
	same fork, and define "delta islands".  When finding delta
	base, we do not allow an object outside the same island to
	be considered as its base.

or something along that line.  Perhaps that would even make the last
paragraph unnecessary.

> +struct island_bitmap {
> +	uint32_t refcount;
> +	uint32_t bits[];
> +};
> +
> +static uint32_t island_bitmap_size;
> +
> +/*
> + * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
> + * of "old". Otherwise, the new bitmap is empty.
> + */
> +static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old)
> +{
> +	size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
> +	struct island_bitmap *b = xcalloc(1, size);

Is one of the variants of flex array macros applicable here?

> +	if (old)
> +		memcpy(b, old, size);
> +
> +	b->refcount = 1;
> +	return b;
> +}
> +
> +static void island_bitmap_or(struct island_bitmap *a, const struct island_bitmap *b)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < island_bitmap_size; ++i)
> +		a->bits[i] |= b->bits[i];
> +}
> +
> +static int island_bitmap_is_subset(struct island_bitmap *self,
> +		struct island_bitmap *super)
> +{
> +	uint32_t i;
> +
> +	if (self == super)
> +		return 1;
> +
> +	for (i = 0; i < island_bitmap_size; ++i) {
> +		if ((self->bits[i] & super->bits[i]) != self->bits[i])
> +			return 0;
> +	}
> +
> +	return 1;
> +}
> +#define ISLAND_BITMAP_BLOCK(x) (x / 32)
> +#define ISLAND_BITMAP_MASK(x) (1 << (x % 32))
> +
> +static void island_bitmap_set(struct island_bitmap *self, uint32_t i)
> +{
> +	self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i);
> +}
> +
> +static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
> +{
> +	return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 0;
> +}
> +

Not necessarily a complaint, but do we need another implementation
of bitmap here, or the compressed bitmap used for the pack bitmap is
unsuited for the purpose of this thing (e.g. perhaps it is overkill,
as we won't be shooting for saving disk footprint of a bitmap that
we are not going to save on disk anyway)?

> +static int cmp_tree_depth(const void *va, const void *vb)
> +{
> +	struct object_entry *a = *(struct object_entry **)va;
> +	struct object_entry *b = *(struct object_entry **)vb;
> +	return a->tree_depth - b->tree_depth;
> +}
> +
> +void resolve_tree_islands(int progress, struct packing_data *to_pack)
> +{
> +	struct progress *progress_state = NULL;
> +	struct object_entry **todo;
> +	int nr = 0;
> +	int i;
> +
> +	if (!island_marks)
> +		return;
> +
> +	/*
> +	 * We process only trees, as commits and tags have already been handled
> +	 * (and passed their marks on to root trees, as well. We must make sure
> +	 * to process them in descending tree-depth order so that marks
> +	 * propagate down the tree properly, even if a sub-tree is found in
> +	 * multiple parent trees.
> +	 */
> +	todo = xmalloc(to_pack->nr_objects * sizeof(*todo));
> +	for (i = 0; i < to_pack->nr_objects; i++) {
> +		if (oe_type(&to_pack->objects[i]) == OBJ_TREE)
> +			todo[nr++] = &to_pack->objects[i];
> +	}
> +	qsort(todo, nr, sizeof(*todo), cmp_tree_depth);

Hmph, at this stage nobody actually seems to set tree_depth; I am
wondering how a tree that is at the root in one commit but appears
as a subdirectory in another commit is handled by this code.

> +static regex_t *island_regexes;
> +static unsigned int island_regexes_alloc, island_regexes_nr;
> +static const char *core_island_name;

Are these (and the bitmap & hashtable) something that "everything in
the_repository" folks would come in and nuke as global variables?  I
haven't thought deeply about it, but these smell like that there
would be one set of such in-core variables per one in-core object
store plus refs (i.e. repository instance).


  parent reply	other threads:[~2018-07-24 16:47 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-22  5:48 [RFC PATCH 0/5] Add delta islands support Christian Couder
2018-07-22  5:48 ` [RFC PATCH 1/5] packfile: make get_delta_base() non static Christian Couder
2018-07-24 16:19   ` Junio C Hamano
2018-07-27 11:29     ` Jeff King
2018-07-22  5:48 ` [RFC PATCH 2/5] Add delta-islands.{c,h} Christian Couder
2018-07-22  8:50   ` Duy Nguyen
2018-07-22 13:57     ` Christian Couder
2018-08-05 18:53     ` Christian Couder
2018-08-06 14:17       ` Jeff King
2018-08-06 15:53       ` Duy Nguyen
2018-08-06 18:54         ` Christian Couder
2018-08-06 19:21           ` Duy Nguyen
2018-07-24 16:47   ` Junio C Hamano [this message]
2018-07-27 13:02     ` Jeff King
2018-07-27  9:40   ` Jeff King
2018-07-22  5:48 ` [RFC PATCH 3/5] pack-objects: add delta-islands support Christian Couder
2018-07-22  8:55   ` Duy Nguyen
2018-08-05 17:28     ` Christian Couder
2018-07-23 18:52   ` Stefan Beller
2018-07-24  9:58     ` Jeff King
2018-07-24 17:20       ` Stefan Beller
2018-07-27 13:13         ` Jeff King
2018-07-27 17:22           ` Stefan Beller
2018-07-28  9:00             ` Jeff King
2018-07-28 12:12               ` Christian Couder
2018-07-24 17:03   ` Junio C Hamano
2018-07-24 17:11     ` Junio C Hamano
2018-08-05 17:40       ` Christian Couder
2018-08-06  8:44         ` Christian Couder
2018-08-06 13:58           ` Jeff King
2018-07-22  5:48 ` [RFC PATCH 4/5] repack: " Christian Couder
2018-07-22  5:48 ` [RFC PATCH 5/5] t: add t9930-delta-islands.sh Christian Couder
2018-07-24 10:24   ` Jeff King
2018-07-24 10:16 ` [RFC PATCH 0/5] Add delta islands support Jeff King
2018-07-24 17:18   ` Junio C Hamano
2018-07-24 21:14     ` Jeff King
2018-07-26 13:34 ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqpnzczila.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=chriscool@tuxfamily.org \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.