From: Duy Nguyen <pclouds@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH] pack-objects: fix performance issues on packing large deltas
Date: Sat, 21 Jul 2018 06:47:11 +0200
Message-ID: <20180721044711.GA31376@duynguyen.home>
In-Reply-To: <CABPp-BGJAWXOCPsej=fWWDJkh-7BAV9m8yEDiy2NVkGTRCmk4A@mail.gmail.com>

On Fri, Jul 20, 2018 at 10:43:25AM -0700, Elijah Newren wrote:
> > This patch provides a better fallback that is
> >
> > - cheaper in terms of cpu and io because we won't have to read
> >   existing pack files as much
> >
> > - better in terms of pack size because the pack heuristics is back to
> >   2.17.0 time, we do not drop large deltas at all
> >
> > If we encounter any delta (on-disk or created during try_delta phase)
> > that is larger than the 2MB limit, we stop using delta_size_ field for
> > this because it can't contain such size anyway. A new array of delta
> > size is dynamically allocated and can hold all the deltas that 2.17.0
> > can [1]. All current delta sizes are migrated over.
> >
> > With this, we do not have to drop deltas in try_delta() anymore. Of
> > course the downside is we use slightly more memory, even compared to
> > 2.17.0. But since this is considered an uncommon case, a bit more
> > memory consumption should not be a problem.
> 
> Out of curiosity, would it be possible to use the delta_size_ field
> for deltas that are small enough, and only use an external data
> structure (perhaps a hash rather than an array) for the few deltas
> that are large?

We could. And since repack time is still a bit higher in your
linux.git case, let's try this: no locking in the common case and a
very small locked region when we hit large deltas.

-- 8< --
diff --git a/pack-objects.c b/pack-objects.c
index eef344b7c1..e3c32bbfc2 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -88,23 +88,6 @@ struct object_entry *packlist_find(struct packing_data *pdata,
 	return &pdata->objects[pdata->index[i] - 1];
 }
 
-uint32_t *new_delta_size_array(struct packing_data *pack)
-{
-	uint32_t *delta_size;
-	uint32_t i;
-
-	/*
-	 * nr_alloc, not nr_objects to align with realloc() strategy in
-	 * packlist_alloc()
-	 */
-	ALLOC_ARRAY(delta_size, pack->nr_alloc);
-
-	for (i = 0; i < pack->nr_objects; i++)
-		delta_size[i] = pack->objects[i].delta_size_;
-
-	return delta_size;
-}
-
 static void prepare_in_pack_by_idx(struct packing_data *pdata)
 {
 	struct packed_git **mapping, *p;
diff --git a/pack-objects.h b/pack-objects.h
index 9f977ae800..11890e7217 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -15,7 +15,7 @@
  * above this limit. Don't lower it too much.
  */
 #define OE_SIZE_BITS		31
-#define OE_DELTA_SIZE_BITS	24
+#define OE_DELTA_SIZE_BITS	23
 
 /*
  * State flags for depth-first search used for analyzing delta cycles.
@@ -94,6 +94,7 @@ struct object_entry {
 				     * uses the same base as me
 				     */
 	unsigned delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
+	unsigned delta_size_valid:1;
 	unsigned char in_pack_header_size;
 	unsigned in_pack_idx:OE_IN_PACK_BITS;	/* already in pack */
 	unsigned z_delta_size:OE_Z_DELTA_BITS;
@@ -353,37 +354,26 @@ static inline void oe_set_size(struct packing_data *pack,
 static inline unsigned long oe_delta_size(struct packing_data *pack,
 					  const struct object_entry *e)
 {
-	unsigned long size;
-
-	packing_data_lock(pack);
-	if (pack->delta_size)
-		size = pack->delta_size[e - pack->objects];
+	if (e->delta_size_valid)
+		return e->delta_size_;
 	else
-		size = e->delta_size_;
-	packing_data_unlock(pack);
-	return size;
+		return pack->delta_size[e - pack->objects];
 }
 
-uint32_t *new_delta_size_array(struct packing_data *pdata);
 static inline void oe_set_delta_size(struct packing_data *pack,
 				     struct object_entry *e,
 				     unsigned long size)
 {
-	packing_data_lock(pack);
-	if (!pack->delta_size && size < pack->oe_delta_size_limit) {
-		packing_data_unlock(pack);
+	if (size < pack->oe_delta_size_limit) {
 		e->delta_size_ = size;
-		return;
+		e->delta_size_valid = 1;
+	} else {
+		packing_data_lock(pack);
+		if (!pack->delta_size)
+			ALLOC_ARRAY(pack->delta_size, pack->nr_alloc);
+		packing_data_unlock(pack);
+		pack->delta_size[e - pack->objects] = size;
 	}
-	/*
-	 * We have had at least one delta size exceeding OE_DELTA_SIZE_BITS
-	 * limit. delta_size_ will not be used anymore. All delta sizes are
-	 * now from the delta_size[] array.
-	 */
-	if (!pack->delta_size)
-		pack->delta_size = new_delta_size_array(pack);
-	pack->delta_size[e - pack->objects] = size;
-	packing_data_unlock(pack);
 }
 
 #endif
-- 8< --
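
To make the idea easier to follow outside the diff, here is a minimal,
self-contained sketch of the same scheme: small sizes live in the
bitfield and are flagged valid without any locking, and only sizes at
or above the limit touch a lazily allocated side array, with the lock
held just for the allocation. The names struct entry, struct pack,
pack_init(), get_delta_size() and set_delta_size() are hypothetical
stand-ins, not git's actual API. The sketch also clears the valid flag
on the overflow path, which the diff above does not do; that is an
assumption about what is wanted when an entry's size is set more than
once.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DELTA_SIZE_BITS 23	/* mirrors OE_DELTA_SIZE_BITS in the patch */

/* Hypothetical, trimmed-down stand-ins for git's structs. */
struct entry {
	unsigned delta_size_:DELTA_SIZE_BITS;	/* in-struct size, if it fits */
	unsigned delta_size_valid:1;		/* 1: delta_size_ holds the size */
};

struct pack {
	struct entry *objects;
	uint32_t nr_alloc;
	uint32_t *delta_size;		/* overflow array, allocated on demand */
	unsigned long delta_size_limit;
	pthread_mutex_t lock;
};

static int pack_init(struct pack *p, uint32_t nr)
{
	p->objects = calloc(nr, sizeof(*p->objects));
	if (!p->objects)
		return -1;
	p->nr_alloc = nr;
	p->delta_size = NULL;
	p->delta_size_limit = 1UL << DELTA_SIZE_BITS;
	return pthread_mutex_init(&p->lock, NULL);
}

static unsigned long get_delta_size(struct pack *p, const struct entry *e)
{
	assert((size_t)(e - p->objects) < p->nr_alloc);
	if (e->delta_size_valid)
		return e->delta_size_;		/* common case: no lock taken */
	/* Assumption: an entry that was never set reads back as 0. */
	return p->delta_size ? p->delta_size[e - p->objects] : 0;
}

static void set_delta_size(struct pack *p, struct entry *e, unsigned long size)
{
	assert((size_t)(e - p->objects) < p->nr_alloc);
	if (size < p->delta_size_limit) {
		e->delta_size_ = size;		/* common case: no lock taken */
		e->delta_size_valid = 1;
		return;
	}
	/* Rare case: lock only around the one-time allocation. */
	pthread_mutex_lock(&p->lock);
	if (!p->delta_size)
		p->delta_size = calloc(p->nr_alloc, sizeof(*p->delta_size));
	pthread_mutex_unlock(&p->lock);
	if (!p->delta_size)
		abort();
	/* Assumes each entry is written by only one thread at a time. */
	e->delta_size_valid = 0;
	p->delta_size[e - p->objects] = (uint32_t)size;
}
```

The sketch ignores growth of the object array; in the real code the
side array would have to be reallocated alongside it.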

> > --
> > 2.18.0.rc2.476.g39500d3211
> 
> Missing the 2.18.0 tag?  ;-)

Hehe, I have been a bit busy lately and have not rebased my branch on
the latest and greatest version.
--
Duy
