git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Multithread index-pack
@ 2012-04-11  5:49 Nguyễn Thái Ngọc Duy
  2012-04-11  5:49 ` [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-04-11  5:49 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy

This round hopefully fixes MinGW brekages for real. I also added a
perf test but not so sure if this is a correct way to do.

Nguyễn Thái Ngọc Duy (2):
  index-pack: split second pass obj handling into own function
  index-pack: support multithreaded delta resolving

Ramsay Allan Jones (1):
  compat/win32/pthread.h: Add an pthread_key_delete() implementation

 Documentation/git-index-pack.txt |   10 ++
 Makefile                         |    2 +-
 builtin/index-pack.c             |  229 +++++++++++++++++++++++++++++++++-----
 compat/win32/pthread.h           |    5 +
 t/perf/p5302-pack-index.sh       |   40 +++++++
 5 files changed, 257 insertions(+), 29 deletions(-)
 create mode 100755 t/perf/p5302-pack-index.sh

-- 
1.7.3.1.256.g2539c.dirty

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation
  2012-04-11  5:49 [PATCH v3 0/3] Multithread index-pack Nguyễn Thái Ngọc Duy
@ 2012-04-11  5:49 ` Nguyễn Thái Ngọc Duy
  2012-04-11  5:49 ` [PATCH v3 2/3] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
  2012-04-11  5:49 ` [PATCH v3 3/3] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-04-11  5:49 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy

From: Ramsay Jones <ramsay@ramsay1.demon.co.uk>

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 compat/win32/pthread.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 2e20548..8ad1873 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -86,6 +86,11 @@ static inline int pthread_key_create(pthread_key_t *keyp, void (*destructor)(voi
 	return (*keyp = TlsAlloc()) == TLS_OUT_OF_INDEXES ? EAGAIN : 0;
 }
 
+static inline int pthread_key_delete(pthread_key_t key)
+{
+	return TlsFree(key) ? 0 : EINVAL;
+}
+
 static inline int pthread_setspecific(pthread_key_t key, const void *value)
 {
 	return TlsSetValue(key, (void *)value) ? 0 : EINVAL;
-- 
1.7.3.1.256.g2539c.dirty

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/3] index-pack: split second pass obj handling into own function
  2012-04-11  5:49 [PATCH v3 0/3] Multithread index-pack Nguyễn Thái Ngọc Duy
  2012-04-11  5:49 ` [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
@ 2012-04-11  5:49 ` Nguyễn Thái Ngọc Duy
  2012-04-11  5:49 ` [PATCH v3 3/3] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-04-11  5:49 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c |   31 ++++++++++++++++++-------------
 1 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dd1c5c9..918684f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -682,6 +682,23 @@ static int compare_delta_entry(const void *a, const void *b)
 				   objects[delta_b->obj_no].type);
 }
 
+/*
+ * Second pass:
+ * - for all non-delta objects, look if it is used as a base for
+ *   deltas;
+ * - if used as a base, uncompress the object and apply all deltas,
+ *   recursively checking if the resulting object is used as a base
+ *   for some more deltas.
+ */
+static void second_pass(struct object_entry *obj)
+{
+	struct base_data *base_obj = alloc_base_data();
+	base_obj->obj = obj;
+	base_obj->data = NULL;
+	find_unresolved_deltas(base_obj);
+	display_progress(progress, nr_resolved_deltas);
+}
+
 /* Parse all objects and return the pack content SHA1 hash */
 static void parse_pack_objects(unsigned char *sha1)
 {
@@ -736,26 +753,14 @@ static void parse_pack_objects(unsigned char *sha1)
 	qsort(deltas, nr_deltas, sizeof(struct delta_entry),
 	      compare_delta_entry);
 
-	/*
-	 * Second pass:
-	 * - for all non-delta objects, look if it is used as a base for
-	 *   deltas;
-	 * - if used as a base, uncompress the object and apply all deltas,
-	 *   recursively checking if the resulting object is used as a base
-	 *   for some more deltas.
-	 */
 	if (verbose)
 		progress = start_progress("Resolving deltas", nr_deltas);
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
-		struct base_data *base_obj = alloc_base_data();
 
 		if (is_delta_type(obj->type))
 			continue;
-		base_obj->obj = obj;
-		base_obj->data = NULL;
-		find_unresolved_deltas(base_obj);
-		display_progress(progress, nr_resolved_deltas);
+		second_pass(obj);
 	}
 }
 
-- 
1.7.3.1.256.g2539c.dirty

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/3] index-pack: support multithreaded delta resolving
  2012-04-11  5:49 [PATCH v3 0/3] Multithread index-pack Nguyễn Thái Ngọc Duy
  2012-04-11  5:49 ` [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
  2012-04-11  5:49 ` [PATCH v3 2/3] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
@ 2012-04-11  5:49 ` Nguyễn Thái Ngọc Duy
  2012-05-03 22:10   ` Junio C Hamano
  2 siblings, 1 reply; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-04-11  5:49 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy

This puts delta resolving on each base on a separate thread, one base
cache per thread. Per-thread data is grouped in struct thread_local.
When running with nr_threads == 1, no pthreads calls are made. The
system essentially runs in non-thread mode.

An experiment on a Xeon 24 core machine with git.git shows that
performance does not increase proportional to the number of cores. So
by default, we use maximum 3 cores. Some numbers with --threads from 1
to 16:

1..4
real    0m8.003s  0m5.307s  0m4.321s  0m3.830s
user    0m7.720s  0m8.009s  0m8.133s  0m8.305s
sys     0m0.224s  0m0.372s  0m0.360s  0m0.360s

5..8
real    0m3.727s  0m3.604s  0m3.332s  0m3.369s
user    0m9.361s  0m9.817s  0m9.525s  0m9.769s
sys     0m0.584s  0m0.624s  0m0.540s  0m0.560s

9..12
real    0m3.036s  0m3.139s  0m3.177s  0m2.961s
user    0m8.977s  0m10.205s 0m9.737s  0m10.073s
sys     0m0.596s  0m0.680s  0m0.684s  0m0.680s

13..16
real    0m2.985s  0m2.894s  0m2.975s  0m2.971s
user    0m9.825s  0m10.573s 0m10.833s 0m11.361s
sys     0m0.788s  0m0.732s  0m0.904s  0m1.016s

On an Intel dual core and linux-2.6.git

1..4
real    2m37.789s 2m7.963s  2m0.920s  1m58.213s
user    2m28.415s 2m52.325s 2m50.176s 2m41.187s
sys     0m7.808s  0m11.181s 0m11.224s 0m10.731s

Thanks Ramsay Jones for troubleshooting and support on MinGW platform.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-index-pack.txt |   10 ++
 Makefile                         |    2 +-
 builtin/index-pack.c             |  202 ++++++++++++++++++++++++++++++++++---
 t/perf/p5302-pack-index.sh       |   40 ++++++++
 4 files changed, 236 insertions(+), 18 deletions(-)
 create mode 100755 t/perf/p5302-pack-index.sh

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..39e6d0d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,16 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
+--threads=<n>::
+	Specifies the number of threads to spawn when resolving
+	deltas. This requires that index-pack be compiled with
+	pthreads otherwise this option is ignored with a warning.
+	This is meant to reduce packing time on multiprocessor
+	machines. The required amount of memory for the delta search
+	window is however multiplied by the number of threads.
+	Specifying 0 will cause git to auto-detect the number of CPU's
+	and use maximum 3 threads.
+
 
 Note
 ----
diff --git a/Makefile b/Makefile
index be1957a..4d7a3de 100644
--- a/Makefile
+++ b/Makefile
@@ -2160,7 +2160,7 @@ builtin/branch.o builtin/checkout.o builtin/clone.o builtin/reset.o branch.o tra
 builtin/bundle.o bundle.o transport.o: bundle.h
 builtin/bisect--helper.o builtin/rev-list.o bisect.o: bisect.h
 builtin/clone.o builtin/fetch-pack.o transport.o: fetch-pack.h
-builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
+builtin/index-pack.o builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
 builtin/send-pack.o transport.o: send-pack.h
 builtin/log.o builtin/shortlog.o: shortlog.h
 builtin/prune.o builtin/reflog.o reachable.o: reachable.h
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 918684f..45bc5db 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -9,6 +9,7 @@
 #include "progress.h"
 #include "fsck.h"
 #include "exec_cmd.h"
+#include "thread-utils.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -38,6 +39,14 @@ struct base_data {
 	int ofs_first, ofs_last;
 };
 
+struct thread_local {
+#ifndef NO_PTHREADS
+	pthread_t thread;
+#endif
+	struct base_data *base_cache;
+	size_t base_cache_used;
+};
+
 /*
  * Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
  * to memcmp() only the first 20 bytes.
@@ -54,11 +63,13 @@ struct delta_entry {
 
 static struct object_entry *objects;
 static struct delta_entry *deltas;
-static struct base_data *base_cache;
-static size_t base_cache_used;
+static struct thread_local nothread_data;
 static int nr_objects;
+static int nr_processed;
 static int nr_deltas;
 static int nr_resolved_deltas;
+static int nr_threads;
+static int threads_active;
 
 static int from_stdin;
 static int strict;
@@ -75,6 +86,75 @@ static git_SHA_CTX input_ctx;
 static uint32_t input_crc32;
 static int input_fd, output_fd, pack_fd;
 
+#ifndef NO_PTHREADS
+
+static struct thread_local *thread_data;
+
+static pthread_mutex_t read_mutex;
+#define read_lock()		lock_mutex(&read_mutex)
+#define read_unlock()		unlock_mutex(&read_mutex)
+
+static pthread_mutex_t counter_mutex;
+#define counter_lock()		lock_mutex(&counter_mutex)
+#define counter_unlock()	unlock_mutex(&counter_mutex)
+
+static pthread_mutex_t work_mutex;
+#define work_lock()		lock_mutex(&work_mutex)
+#define work_unlock()		unlock_mutex(&work_mutex)
+
+static pthread_key_t key;
+
+static inline void lock_mutex(pthread_mutex_t *mutex)
+{
+	if (threads_active)
+		pthread_mutex_lock(mutex);
+}
+
+static inline void unlock_mutex(pthread_mutex_t *mutex)
+{
+	if (threads_active)
+		pthread_mutex_unlock(mutex);
+}
+
+/*
+ * Mutex and conditional variable can't be statically-initialized on Windows.
+ */
+static void init_thread(void)
+{
+	init_recursive_mutex(&read_mutex);
+	pthread_mutex_init(&counter_mutex, NULL);
+	pthread_mutex_init(&work_mutex, NULL);
+	pthread_key_create(&key, NULL);
+	thread_data = xcalloc(nr_threads, sizeof(*thread_data));
+	threads_active = 1;
+}
+
+static void cleanup_thread(void)
+{
+	if (!threads_active)
+		return;
+	threads_active = 0;
+	pthread_mutex_destroy(&read_mutex);
+	pthread_mutex_destroy(&counter_mutex);
+	pthread_mutex_destroy(&work_mutex);
+	pthread_key_delete(key);
+	free(thread_data);
+}
+
+#else
+
+#define read_lock()
+#define read_unlock()
+
+#define counter_lock()
+#define counter_unlock()
+
+#define work_lock()
+#define work_unlock()
+
+#endif
+
+
 static int mark_link(struct object *obj, int type, void *data)
 {
 	if (!obj)
@@ -223,6 +303,17 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
 	die("pack has bad object at offset %lu: %s", offset, buf);
 }
 
+static struct thread_local *get_thread_data()
+{
+#ifndef NO_PTHREADS
+	if (threads_active)
+		return pthread_getspecific(key);
+#endif
+	assert(!threads_active &&
+	       "This should only be reached when all threads are gone");
+	return &nothread_data;
+}
+
 static struct base_data *alloc_base_data(void)
 {
 	struct base_data *base = xmalloc(sizeof(struct base_data));
@@ -237,15 +328,16 @@ static void free_base_data(struct base_data *c)
 	if (c->data) {
 		free(c->data);
 		c->data = NULL;
-		base_cache_used -= c->size;
+		get_thread_data()->base_cache_used -= c->size;
 	}
 }
 
 static void prune_base_data(struct base_data *retain)
 {
 	struct base_data *b;
-	for (b = base_cache;
-	     base_cache_used > delta_base_cache_limit && b;
+	struct thread_local *data = get_thread_data();
+	for (b = data->base_cache;
+	     data->base_cache_used > delta_base_cache_limit && b;
 	     b = b->child) {
 		if (b->data && b != retain)
 			free_base_data(b);
@@ -257,12 +349,12 @@ static void link_base_data(struct base_data *base, struct base_data *c)
 	if (base)
 		base->child = c;
 	else
-		base_cache = c;
+		get_thread_data()->base_cache = c;
 
 	c->base = base;
 	c->child = NULL;
 	if (c->data)
-		base_cache_used += c->size;
+		get_thread_data()->base_cache_used += c->size;
 	prune_base_data(c);
 }
 
@@ -272,7 +364,7 @@ static void unlink_base_data(struct base_data *c)
 	if (base)
 		base->child = NULL;
 	else
-		base_cache = NULL;
+		get_thread_data()->base_cache = NULL;
 	free_base_data(c);
 }
 
@@ -461,19 +553,24 @@ static void sha1_object(const void *data, unsigned long size,
 			enum object_type type, unsigned char *sha1)
 {
 	hash_sha1_file(data, size, typename(type), sha1);
+	read_lock();
 	if (has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
 		unsigned long has_size;
 		has_data = read_sha1_file(sha1, &has_type, &has_size);
+		read_unlock();
 		if (!has_data)
 			die("cannot read existing object %s", sha1_to_hex(sha1));
 		if (size != has_size || type != has_type ||
 		    memcmp(data, has_data, size) != 0)
 			die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
 		free(has_data);
-	}
+	} else
+		read_unlock();
+
 	if (strict) {
+		read_lock();
 		if (type == OBJ_BLOB) {
 			struct blob *blob = lookup_blob(sha1);
 			if (blob)
@@ -507,6 +604,7 @@ static void sha1_object(const void *data, unsigned long size,
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
+		read_unlock();
 	}
 }
 
@@ -552,7 +650,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -568,7 +666,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, "failed to apply delta");
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -596,7 +694,9 @@ static void resolve_delta(struct object_entry *delta_obj,
 		bad_object(delta_obj->idx.offset, "failed to apply delta");
 	sha1_object(result->data, result->size, delta_obj->real_type,
 		    delta_obj->idx.sha1);
+	counter_lock();
 	nr_resolved_deltas++;
+	counter_unlock();
 }
 
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
@@ -696,7 +796,31 @@ static void second_pass(struct object_entry *obj)
 	base_obj->obj = obj;
 	base_obj->data = NULL;
 	find_unresolved_deltas(base_obj);
-	display_progress(progress, nr_resolved_deltas);
+}
+
+static void *threaded_second_pass(void *arg)
+{
+#ifndef NO_PTHREADS
+	if (threads_active)
+		pthread_setspecific(key, arg);
+#endif
+	for (;;) {
+		int i;
+		work_lock();
+		display_progress(progress, nr_resolved_deltas);
+		while (nr_processed < nr_objects &&
+		       is_delta_type(objects[nr_processed].type))
+			nr_processed++;
+		if (nr_processed >= nr_objects) {
+			work_unlock();
+			break;
+		}
+		i = nr_processed++;
+		work_unlock();
+
+		second_pass(&objects[i]);
+	}
+	return NULL;
 }
 
 /* Parse all objects and return the pack content SHA1 hash */
@@ -755,13 +879,25 @@ static void parse_pack_objects(unsigned char *sha1)
 
 	if (verbose)
 		progress = start_progress("Resolving deltas", nr_deltas);
-	for (i = 0; i < nr_objects; i++) {
-		struct object_entry *obj = &objects[i];
 
-		if (is_delta_type(obj->type))
-			continue;
-		second_pass(obj);
+	nr_processed = 0;
+#ifndef NO_PTHREADS
+	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
+		init_thread();
+		for (i = 0; i < nr_threads; i++) {
+			int ret = pthread_create(&thread_data[i].thread, NULL,
+						 threaded_second_pass, thread_data + i);
+			if (ret)
+				die("unable to create thread: %s", strerror(ret));
+		}
+		for (i = 0; i < nr_threads; i++)
+			pthread_join(thread_data[i].thread, NULL);
+
+		cleanup_thread();
+		return;
 	}
+#endif
+	threaded_second_pass(NULL);
 }
 
 static int write_compressed(struct sha1file *f, void *in, unsigned int size)
@@ -967,6 +1103,18 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
 			die("bad pack.indexversion=%"PRIu32, opts->version);
 		return 0;
 	}
+	if (!strcmp(k, "pack.threads")) {
+		nr_threads = git_config_int(k, v);
+		if (nr_threads < 0)
+			die("invalid number of threads specified (%d)",
+			    nr_threads);
+#ifdef NO_PTHREADS
+		if (nr_threads != 1)
+			warning("no threads support, ignoring %s", k);
+		nr_threads = 1;
+#endif
+		return 0;
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -1125,6 +1273,17 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				keep_msg = "";
 			} else if (!prefixcmp(arg, "--keep=")) {
 				keep_msg = arg + 7;
+			} else if (!prefixcmp(arg, "--threads=")) {
+				char *end;
+				nr_threads = strtoul(arg+10, &end, 0);
+				if (!arg[10] || *end || nr_threads < 0)
+					usage(index_pack_usage);
+#ifdef NO_PTHREADS
+				if (nr_threads != 1)
+					warning("no threads support, "
+						"ignoring %s", arg);
+				nr_threads = 1;
+#endif
 			} else if (!prefixcmp(arg, "--pack_header=")) {
 				struct pack_header *hdr;
 				char *c;
@@ -1196,6 +1355,15 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (strict)
 		opts.flags |= WRITE_IDX_STRICT;
 
+#ifndef NO_PTHREADS
+	if (!nr_threads) {
+		nr_threads = online_cpus();
+		/* An experiment showed that more threads does not mean faster */
+		if (nr_threads > 3)
+			nr_threads = 3;
+	}
+#endif
+
 	curr_pack = open_pack_file(pack_name);
 	parse_pack_header();
 	objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh
new file mode 100755
index 0000000..6cb5b0d
--- /dev/null
+++ b/t/perf/p5302-pack-index.sh
@@ -0,0 +1,40 @@
+#!/bin/sh
+
+test_description="Tests index-pack performance"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'repack' '
+	git repack -ad &&
+	PACK=`ls .git/objects/pack/*.pack | head -n1` &&
+	test -f "$PACK" &&
+	export PACK
+'
+
+test_perf 'index-pack 0 threads' '
+	GIT_DIR=t1 git index-pack --threads=1 --stdin < $PACK
+'
+
+test_perf 'index-pack 1 thread ' '
+	GIT_DIR=t2 GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK
+'
+
+test_perf 'index-pack 2 threads' '
+	GIT_DIR=t3 git index-pack --threads=2 --stdin < $PACK
+'
+
+test_perf 'index-pack 4 threads' '
+	GIT_DIR=t4 git index-pack --threads=4 --stdin < $PACK
+'
+
+test_perf 'index-pack 8 threads' '
+	GIT_DIR=t5 git index-pack --threads=8 --stdin < $PACK
+'
+
+test_perf 'index-pack default number of threads' '
+	GIT_DIR=t6 git index-pack --stdin < $PACK
+'
+
+test_done
-- 
1.7.3.1.256.g2539c.dirty

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] index-pack: support multithreaded delta resolving
  2012-04-11  5:49 ` [PATCH v3 3/3] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
@ 2012-05-03 22:10   ` Junio C Hamano
  2012-05-04  6:21     ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-05-03 22:10 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Ramsay Jones

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> @@ -696,7 +796,31 @@ static void second_pass(struct object_entry *obj)
>  	base_obj->obj = obj;
>  	base_obj->data = NULL;
>  	find_unresolved_deltas(base_obj);
> -	display_progress(progress, nr_resolved_deltas);
> +}
> +
> +static void *threaded_second_pass(void *arg)
> +{
> +#ifndef NO_PTHREADS
> +	if (threads_active)
> +		pthread_setspecific(key, arg);
> +#endif
> +	for (;;) {
> +		int i;
> +		work_lock();
> +		display_progress(progress, nr_resolved_deltas);
> +		while (nr_processed < nr_objects &&
> +		       is_delta_type(objects[nr_processed].type))
> +			nr_processed++;
> +		if (nr_processed >= nr_objects) {
> +			work_unlock();
> +			break;
> +		}
> +		i = nr_processed++;
> +		work_unlock();
> +
> +		second_pass(&objects[i]);
> +	}
> +	return NULL;
>  }

It may be just the matter of naming, but the above is taking the
abstraction backwards, I think.  Shouldn't it be structured in such a way
that the caller may call second_pass() and its implementation may turn out
to be threaded (or not)?

The naming of "arg" made things worse.  I wasted 5 minutes scratching my
head thinking "arg" was a single specific object that was to be given to
second_pass(), and wondered why it is made into thread-local data.  Name
it "thread_data" or something.

And I think the root cause of this confusion is the way "second_pass" was
split out in the earlier patch.  It is not the entire second-pass, but is
merely a single step of it (the whole "for (i = 0; i < nr_objects; i++)"
is the second-pass, in other words), and its implementation detail might
change to either thread (i.e. instead of a single line of control
iterating from 0 to nr_objects, each thread grab the next available task
and work on it until everything is exhausted) or not.

By the way, if one object is very heavy and takes a lot of time until
completion, could it be possible that objects[0] is still being processed
for its base data but objects[1] has already completed and an available
thread could work on objects[2]?  How does it learn to process objects[2]
in such a case, or does it wait until the thread working on objects[0] is
done?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] index-pack: support multithreaded delta resolving
  2012-05-03 22:10   ` Junio C Hamano
@ 2012-05-04  6:21     ` Junio C Hamano
  2012-05-04 12:50       ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-05-04  6:21 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Ramsay Jones

Junio C Hamano <gitster@pobox.com> writes:

> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> @@ -696,7 +796,31 @@ static void second_pass(struct object_entry *obj)
>>  	base_obj->obj = obj;
>>  	base_obj->data = NULL;
>>  	find_unresolved_deltas(base_obj);
>> -	display_progress(progress, nr_resolved_deltas);
>> +}
>> +
>> +static void *threaded_second_pass(void *arg)
>> +{
>> +#ifndef NO_PTHREADS
>> +	if (threads_active)
>> +		pthread_setspecific(key, arg);
>> +#endif
>> +	for (;;) {
>> +		int i;
>> +		work_lock();
>> +		display_progress(progress, nr_resolved_deltas);
>> +		while (nr_processed < nr_objects &&
>> +		       is_delta_type(objects[nr_processed].type))
>> +			nr_processed++;
>> +		if (nr_processed >= nr_objects) {
>> +			work_unlock();
>> +			break;
>> +		}
>> +		i = nr_processed++;
>> +		work_unlock();
>> +
>> +		second_pass(&objects[i]);
>> +	}
>> +	return NULL;
>>  }
>
> It may be just the matter of naming, but the above is taking the
> abstraction backwards, I think.  Shouldn't it be structured in such a way
> that the caller may call second_pass() and its implementation may turn out
> to be threaded (or not)?
>
> The naming of "arg" made things worse.  I wasted 5 minutes scratching my
> head thinking "arg" was a single specific object that was to be given to
> second_pass(), and wondered why it is made into thread-local data.  Name
> it "thread_data" or something.
>
> And I think the root cause of this confusion is the way "second_pass" was
> split out in the earlier patch.  It is not the entire second-pass, but is
> merely a single step of it (the whole "for (i = 0; i < nr_objects; i++)"
> is the second-pass, in other words), and its implementation detail might
> change to either thread (i.e. instead of a single line of control
> iterating from 0 to nr_objects, each thread grab the next available task
> and work on it until everything is exhausted) or not.
>
> By the way, if one object is very heavy and takes a lot of time until
> completion, could it be possible that objects[0] is still being processed
> for its base data but objects[1] has already completed and an available
> thread could work on objects[2]?  How does it learn to process objects[2]
> in such a case, or does it wait until the thread working on objects[0] is
> done?

Please disregard the "By the way" part, except that my confusion that led
to the "By the way" comment was caused by another misnaming, namely,
"nr_processed".  It is not counting "How many of them have we already
processed?"---it merely counts "How many of them have we dispatched?" and
completion of the task does not matter in this critical section, which I
missed.  If it were named "nr_dispatched", I wouldn't have wasted my time
wondering about the loop and writing the "By the way" review comment.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] index-pack: support multithreaded delta resolving
  2012-05-04  6:21     ` Junio C Hamano
@ 2012-05-04 12:50       ` Nguyen Thai Ngoc Duy
  2012-05-04 15:23         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-05-04 12:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramsay Jones

Junio, can you make a patch (or update current ones) with better
naming? I obviously did not see why these names were bad. You may
provide more convincing explanation than me in the commit message.

On Fri, May 4, 2012 at 1:21 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>>
>>> @@ -696,7 +796,31 @@ static void second_pass(struct object_entry *obj)
>>>      base_obj->obj = obj;
>>>      base_obj->data = NULL;
>>>      find_unresolved_deltas(base_obj);
>>> -    display_progress(progress, nr_resolved_deltas);
>>> +}
>>> +
>>> +static void *threaded_second_pass(void *arg)
>>> +{
>>> +#ifndef NO_PTHREADS
>>> +    if (threads_active)
>>> +            pthread_setspecific(key, arg);
>>> +#endif
>>> +    for (;;) {
>>> +            int i;
>>> +            work_lock();
>>> +            display_progress(progress, nr_resolved_deltas);
>>> +            while (nr_processed < nr_objects &&
>>> +                   is_delta_type(objects[nr_processed].type))
>>> +                    nr_processed++;
>>> +            if (nr_processed >= nr_objects) {
>>> +                    work_unlock();
>>> +                    break;
>>> +            }
>>> +            i = nr_processed++;
>>> +            work_unlock();
>>> +
>>> +            second_pass(&objects[i]);
>>> +    }
>>> +    return NULL;
>>>  }
>>
>> It may be just the matter of naming, but the above is taking the
>> abstraction backwards, I think.  Shouldn't it be structured in such a way
>> that the caller may call second_pass() and its implementation may turn out
>> to be threaded (or not)?
>>
>> The naming of "arg" made things worse.  I wasted 5 minutes scratching my
>> head thinking "arg" was a single specific object that was to be given to
>> second_pass(), and wondered why it is made into thread-local data.  Name
>> it "thread_data" or something.
>>
>> And I think the root cause of this confusion is the way "second_pass" was
>> split out in the earlier patch.  It is not the entire second-pass, but is
>> merely a single step of it (the whole "for (i = 0; i < nr_objects; i++)"
>> is the second-pass, in other words), and its implementation detail might
>> change to either thread (i.e. instead of a single line of control
>> iterating from 0 to nr_objects, each thread grab the next available task
>> and work on it until everything is exhausted) or not.
>>
>> By the way, if one object is very heavy and takes a lot of time until
>> completion, could it be possible that objects[0] is still being processed
>> for its base data but objects[1] has already completed and an available
>> thread could work on objects[2]?  How does it learn to process objects[2]
>> in such a case, or does it wait until the thread working on objects[0] is
>> done?
>
> Please disregard the "By the way" part, except that my confusion that led
> to the "By the way" comment was caused by another misnaming, namely,
> "nr_processed".  It is not counting "How many of them have we already
> processed?"---it merely counts "How many of them have we dispatched?" and
> completion of the task does not matter in this critical section, which I
> missed.  If it were named "nr_dispatched", I wouldn't have wasted my time
> wondering about the loop and writing the "By the way" review comment.
>



-- 
Duy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] index-pack: support multithreaded delta resolving
  2012-05-04 12:50       ` Nguyen Thai Ngoc Duy
@ 2012-05-04 15:23         ` Junio C Hamano
  2012-05-06 12:31           ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-05-04 15:23 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Ramsay Jones

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> Junio, can you make a patch (or update current ones) with better
> naming? I obviously did not see why these names were bad. You may
> provide more convincing explanation than me in the commit message.

I have already explained s/nr_processed/nr_dispatched/.

I do not think second_pass vs threaded_second_pass is merely a naming
issue. The code structure ("taking the abstraction backwards") is the
bigger issue.

The original before the series has:

	parse_pack_objects()
	{
		... a lot of code for the first pass ...
		
		... a lot of code for the second pass which looks like
		for (all object entries) {
			set global state base_obj->obj to it
			find unresolved deltas for that single base_obj
			display progress
		}
	}

and you first split one iteration of the second pass, resulting in

	parse_pack_objects()
	{
		... a lot of code for the first pass ...
		
		for (all object entries) {
			set global state base_obj->obj to it
			second_pass()
		}
	}

which I think is a wrong way to do it, because here is what your next step
ends up with because of that:

	parse_pack_objects()
	{
		... a lot of code for the first pass ...
		
	#if WE SUPPORT THREADING
		for (number of threads we are going to spawn) {
			spawn thread and let it run threaded_second_pass
		}
		for (all threads) {
			cull them when they are done
		}
		return
	#endif
		... when not threading ...
		run threaded_second_pass() in the main process
	}

It could (and should) be like this instead from the beginning, I think:

	parse_pack_objects()
	{
		... a lot of code for the first pass ...
		second_pass()
	}

to encapsulate the whole second_pass() logic in the refactored function,
whose implementation (i.e. how the objects to be processed are picked up
and worked on) will change in the threaded version.  In the first
refactoring patch, second_pass() would still iterate over object entries:

	second_pass()
	{
		for (all object entries) {
			resolve_deltas(base_obj)
		}
	}

And at this point, introduce a helper function "resolve_deltas(base)"
or something that is your "second_pass()".  This deals with only one
family of objects that use the given object as their (recursive) base
objects, and use it in the above loop.

And then multi-threading support can turn that into something like

	second_pass()
	{
	#if WE SUPPORT THREADING
		for (number of threads we are going to spawn) {
			spawn thread and let it run threaded_second_pass
		}
		for (all threads) {
			cull them when they are done
		}
		return
	#endif
		... when not threading ...
		for (all object entries) {
			resolve_deltas(base_obj)
		}
	}

to add support for threaded case.  The threaded_second_pass function does
only one thing and one thing well:

	threaded_second_pass()
	{
		set thread local
		loop forever {
			take work item under critical section
			break if there is no more work
			resolve_deltas(the work item)
		}
	}

Wouldn't the result be much more readable that way?

You may have saved a few lines by making unthreaded code call your
threaded_second_pass() and pretend that the single main process still has
to pick up a work item and work on it in a loop like you did, but I think
it made the logic unnecessarily harder to follow.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation
  2012-05-04 15:23         ` Junio C Hamano
@ 2012-05-06 12:31           ` Nguyễn Thái Ngọc Duy
  2012-05-06 12:31             ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
                               ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-05-06 12:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy

From: Ramsay Jones <ramsay@ramsay1.demon.co.uk>

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 compat/win32/pthread.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 2e20548..8ad1873 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -86,6 +86,11 @@ static inline int pthread_key_create(pthread_key_t *keyp, void (*destructor)(voi
 	return (*keyp = TlsAlloc()) == TLS_OUT_OF_INDEXES ? EAGAIN : 0;
 }
 
+static inline int pthread_key_delete(pthread_key_t key)
+{
+	return TlsFree(key) ? 0 : EINVAL;
+}
+
 static inline int pthread_setspecific(pthread_key_t key, const void *value)
 {
 	return TlsSetValue(key, (void *)value) ? 0 : EINVAL;
-- 
1.7.8.36.g69ee2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] index-pack: restructure pack processing into three main functions
  2012-05-06 12:31           ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
@ 2012-05-06 12:31             ` Nguyễn Thái Ngọc Duy
  2012-05-08  0:19               ` Junio C Hamano
  2012-05-06 12:31             ` [PATCH 3/4] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
  2012-05-06 12:31             ` [PATCH 4/4] index-pack: disable threading if NO_PREAD is defined Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-05-06 12:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

The second pass in parse_pack_objects() are split into
resolve_deltas(). The final phase, fixing thin pack or just seal the
pack, is now in conclude_pack() function. Main pack processing is now
a sequence of these functions:

 - parse_pack_objects() reads through the input pack
 - resolve_deltas()     makes sure all deltas can be resolved
 - conclude_pack()      seals the output pack
 - write_idx_file()     writes companion index file
 - final()              moves the pack/index to proper place

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c |  128 +++++++++++++++++++++++++++++---------------------
 1 files changed, 75 insertions(+), 53 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dd1c5c9..a4be4a6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -682,19 +682,26 @@ static int compare_delta_entry(const void *a, const void *b)
 				   objects[delta_b->obj_no].type);
 }
 
-/* Parse all objects and return the pack content SHA1 hash */
+static void resolve_base(struct object_entry *obj)
+{
+	struct base_data *base_obj = alloc_base_data();
+	base_obj->obj = obj;
+	base_obj->data = NULL;
+	find_unresolved_deltas(base_obj);
+}
+
+/*
+ * First pass:
+ * - find locations of all objects;
+ * - calculate SHA1 of all non-delta objects;
+ * - remember base (SHA1 or offset) for all deltas.
+ */
 static void parse_pack_objects(unsigned char *sha1)
 {
 	int i;
 	struct delta_entry *delta = deltas;
 	struct stat st;
 
-	/*
-	 * First pass:
-	 * - find locations of all objects;
-	 * - calculate SHA1 of all non-delta objects;
-	 * - remember base (SHA1 or offset) for all deltas.
-	 */
 	if (verbose)
 		progress = start_progress(
 				from_stdin ? "Receiving objects" : "Indexing objects",
@@ -728,6 +735,19 @@ static void parse_pack_objects(unsigned char *sha1)
 	if (S_ISREG(st.st_mode) &&
 			lseek(input_fd, 0, SEEK_CUR) - input_len != st.st_size)
 		die("pack has junk at the end");
+}
+
+/*
+ * Second pass:
+ * - for all non-delta objects, look if it is used as a base for
+ *   deltas;
+ * - if used as a base, uncompress the object and apply all deltas,
+ *   recursively checking if the resulting object is used as a base
+ *   for some more deltas.
+ */
+static void resolve_deltas(void)
+{
+	int i;
 
 	if (!nr_deltas)
 		return;
@@ -736,29 +756,63 @@ static void parse_pack_objects(unsigned char *sha1)
 	qsort(deltas, nr_deltas, sizeof(struct delta_entry),
 	      compare_delta_entry);
 
-	/*
-	 * Second pass:
-	 * - for all non-delta objects, look if it is used as a base for
-	 *   deltas;
-	 * - if used as a base, uncompress the object and apply all deltas,
-	 *   recursively checking if the resulting object is used as a base
-	 *   for some more deltas.
-	 */
 	if (verbose)
 		progress = start_progress("Resolving deltas", nr_deltas);
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
-		struct base_data *base_obj = alloc_base_data();
 
 		if (is_delta_type(obj->type))
 			continue;
-		base_obj->obj = obj;
-		base_obj->data = NULL;
-		find_unresolved_deltas(base_obj);
+		resolve_base(obj);
 		display_progress(progress, nr_resolved_deltas);
 	}
 }
 
+/*
+ * Third pass:
+ * - append objects to convert thin pack to full pack if required
+ * - write the final 20-byte SHA-1
+ */
+static void fix_unresolved_deltas(struct sha1file *f, int nr_unresolved);
+static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned char *pack_sha1)
+{
+	if (nr_deltas == nr_resolved_deltas) {
+		stop_progress(&progress);
+		/* Flush remaining pack final 20-byte SHA1. */
+		flush();
+		return;
+	}
+
+	if (fix_thin_pack) {
+		struct sha1file *f;
+		unsigned char read_sha1[20], tail_sha1[20];
+		char msg[48];
+		int nr_unresolved = nr_deltas - nr_resolved_deltas;
+		int nr_objects_initial = nr_objects;
+		if (nr_unresolved <= 0)
+			die("confusion beyond insanity");
+		objects = xrealloc(objects,
+				   (nr_objects + nr_unresolved + 1)
+				   * sizeof(*objects));
+		f = sha1fd(output_fd, curr_pack);
+		fix_unresolved_deltas(f, nr_unresolved);
+		sprintf(msg, "completed with %d local objects",
+			nr_objects - nr_objects_initial);
+		stop_progress_msg(&progress, msg);
+		sha1close(f, tail_sha1, 0);
+		hashcpy(read_sha1, pack_sha1);
+		fixup_pack_header_footer(output_fd, pack_sha1,
+					 curr_pack, nr_objects,
+					 read_sha1, consumed_bytes-20);
+		if (hashcmp(read_sha1, tail_sha1) != 0)
+			die("Unexpected tail checksum for %s "
+			    "(disk corruption?)", curr_pack);
+	}
+	if (nr_deltas != nr_resolved_deltas)
+		die("pack has %d unresolved deltas",
+		    nr_deltas - nr_resolved_deltas);
+}
+
 static int write_compressed(struct sha1file *f, void *in, unsigned int size)
 {
 	git_zstream stream;
@@ -1196,40 +1250,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
 	deltas = xcalloc(nr_objects, sizeof(struct delta_entry));
 	parse_pack_objects(pack_sha1);
-	if (nr_deltas == nr_resolved_deltas) {
-		stop_progress(&progress);
-		/* Flush remaining pack final 20-byte SHA1. */
-		flush();
-	} else {
-		if (fix_thin_pack) {
-			struct sha1file *f;
-			unsigned char read_sha1[20], tail_sha1[20];
-			char msg[48];
-			int nr_unresolved = nr_deltas - nr_resolved_deltas;
-			int nr_objects_initial = nr_objects;
-			if (nr_unresolved <= 0)
-				die("confusion beyond insanity");
-			objects = xrealloc(objects,
-					   (nr_objects + nr_unresolved + 1)
-					   * sizeof(*objects));
-			f = sha1fd(output_fd, curr_pack);
-			fix_unresolved_deltas(f, nr_unresolved);
-			sprintf(msg, "completed with %d local objects",
-				nr_objects - nr_objects_initial);
-			stop_progress_msg(&progress, msg);
-			sha1close(f, tail_sha1, 0);
-			hashcpy(read_sha1, pack_sha1);
-			fixup_pack_header_footer(output_fd, pack_sha1,
-						 curr_pack, nr_objects,
-						 read_sha1, consumed_bytes-20);
-			if (hashcmp(read_sha1, tail_sha1) != 0)
-				die("Unexpected tail checksum for %s "
-				    "(disk corruption?)", curr_pack);
-		}
-		if (nr_deltas != nr_resolved_deltas)
-			die("pack has %d unresolved deltas",
-			    nr_deltas - nr_resolved_deltas);
-	}
+	resolve_deltas();
+	conclude_pack(fix_thin_pack, curr_pack, pack_sha1);
 	free(deltas);
 	if (strict)
 		check_objects();
-- 
1.7.8.36.g69ee2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] index-pack: support multithreaded delta resolving
  2012-05-06 12:31           ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
  2012-05-06 12:31             ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
@ 2012-05-06 12:31             ` Nguyễn Thái Ngọc Duy
  2012-05-06 12:31             ` [PATCH 4/4] index-pack: disable threading if NO_PREAD is defined Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-05-06 12:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This puts delta resolving on each base on a separate thread, one base
cache per thread. Per-thread data is grouped in struct thread_local.
When running with nr_threads == 1, no pthreads calls are made. The
system essentially runs in non-thread mode.

An experiment on a Xeon 24 core machine with git.git shows that
performance does not increase proportional to the number of cores. So
by default, we use maximum 3 cores. Some numbers with --threads from 1
to 16:

1..4
real    0m8.003s  0m5.307s  0m4.321s  0m3.830s
user    0m7.720s  0m8.009s  0m8.133s  0m8.305s
sys     0m0.224s  0m0.372s  0m0.360s  0m0.360s

5..8
real    0m3.727s  0m3.604s  0m3.332s  0m3.369s
user    0m9.361s  0m9.817s  0m9.525s  0m9.769s
sys     0m0.584s  0m0.624s  0m0.540s  0m0.560s

9..12
real    0m3.036s  0m3.139s  0m3.177s  0m2.961s
user    0m8.977s  0m10.205s 0m9.737s  0m10.073s
sys     0m0.596s  0m0.680s  0m0.684s  0m0.680s

13..16
real    0m2.985s  0m2.894s  0m2.975s  0m2.971s
user    0m9.825s  0m10.573s 0m10.833s 0m11.361s
sys     0m0.788s  0m0.732s  0m0.904s  0m1.016s

On an Intel dual core and linux-2.6.git

1..4
real    2m37.789s 2m7.963s  2m0.920s  1m58.213s
user    2m28.415s 2m52.325s 2m50.176s 2m41.187s
sys     0m7.808s  0m11.181s 0m11.224s 0m10.731s

Thanks Ramsay Jones for troubleshooting and support on MinGW platform.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Changes against 'pu':

 - nr_processed -> nr_dispatched
 - both nr_dispatched threads_active are moved to #ifndef NO_PTHREADS block
 - old second_pass() -> resolve_base()

 Documentation/git-index-pack.txt |   10 ++
 Makefile                         |    2 +-
 builtin/index-pack.c             |  204 +++++++++++++++++++++++++++++++++++--
 t/perf/p5302-pack-index.sh       |   40 ++++++++
 4 files changed, 244 insertions(+), 12 deletions(-)
 create mode 100755 t/perf/p5302-pack-index.sh

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..39e6d0d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,16 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
+--threads=<n>::
+	Specifies the number of threads to spawn when resolving
+	deltas. This requires that index-pack be compiled with
+	pthreads otherwise this option is ignored with a warning.
+	This is meant to reduce packing time on multiprocessor
+	machines. The required amount of memory for the delta search
+	window is however multiplied by the number of threads.
+	Specifying 0 will cause git to auto-detect the number of CPU's
+	and use maximum 3 threads.
+
 
 Note
 ----
diff --git a/Makefile b/Makefile
index cf2c40b..e41955f 100644
--- a/Makefile
+++ b/Makefile
@@ -2160,7 +2160,7 @@ builtin/branch.o builtin/checkout.o builtin/clone.o builtin/reset.o branch.o tra
 builtin/bundle.o bundle.o transport.o: bundle.h
 builtin/bisect--helper.o builtin/rev-list.o bisect.o: bisect.h
 builtin/clone.o builtin/fetch-pack.o transport.o: fetch-pack.h
-builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
+builtin/index-pack.o builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
 builtin/send-pack.o transport.o: send-pack.h
 builtin/log.o builtin/shortlog.o: shortlog.h
 builtin/prune.o builtin/reflog.o reachable.o: reachable.h
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index a4be4a6..d4685c5 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -9,6 +9,7 @@
 #include "progress.h"
 #include "fsck.h"
 #include "exec_cmd.h"
+#include "thread-utils.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -38,6 +39,14 @@ struct base_data {
 	int ofs_first, ofs_last;
 };
 
+struct thread_local {
+#ifndef NO_PTHREADS
+	pthread_t thread;
+#endif
+	struct base_data *base_cache;
+	size_t base_cache_used;
+};
+
 /*
  * Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
  * to memcmp() only the first 20 bytes.
@@ -54,11 +63,11 @@ struct delta_entry {
 
 static struct object_entry *objects;
 static struct delta_entry *deltas;
-static struct base_data *base_cache;
-static size_t base_cache_used;
+static struct thread_local nothread_data;
 static int nr_objects;
 static int nr_deltas;
 static int nr_resolved_deltas;
+static int nr_threads;
 
 static int from_stdin;
 static int strict;
@@ -75,6 +84,77 @@ static git_SHA_CTX input_ctx;
 static uint32_t input_crc32;
 static int input_fd, output_fd, pack_fd;
 
+#ifndef NO_PTHREADS
+
+static struct thread_local *thread_data;
+static int nr_dispatched;
+static int threads_active;
+
+static pthread_mutex_t read_mutex;
+#define read_lock()		lock_mutex(&read_mutex)
+#define read_unlock()		unlock_mutex(&read_mutex)
+
+static pthread_mutex_t counter_mutex;
+#define counter_lock()		lock_mutex(&counter_mutex)
+#define counter_unlock()	unlock_mutex(&counter_mutex)
+
+static pthread_mutex_t work_mutex;
+#define work_lock()		lock_mutex(&work_mutex)
+#define work_unlock()		unlock_mutex(&work_mutex)
+
+static pthread_key_t key;
+
+static inline void lock_mutex(pthread_mutex_t *mutex)
+{
+	if (threads_active)
+		pthread_mutex_lock(mutex);
+}
+
+static inline void unlock_mutex(pthread_mutex_t *mutex)
+{
+	if (threads_active)
+		pthread_mutex_unlock(mutex);
+}
+
+/*
+ * Mutex and conditional variable can't be statically-initialized on Windows.
+ */
+static void init_thread(void)
+{
+	init_recursive_mutex(&read_mutex);
+	pthread_mutex_init(&counter_mutex, NULL);
+	pthread_mutex_init(&work_mutex, NULL);
+	pthread_key_create(&key, NULL);
+	thread_data = xcalloc(nr_threads, sizeof(*thread_data));
+	threads_active = 1;
+}
+
+static void cleanup_thread(void)
+{
+	if (!threads_active)
+		return;
+	threads_active = 0;
+	pthread_mutex_destroy(&read_mutex);
+	pthread_mutex_destroy(&counter_mutex);
+	pthread_mutex_destroy(&work_mutex);
+	pthread_key_delete(key);
+	free(thread_data);
+}
+
+#else
+
+#define read_lock()
+#define read_unlock()
+
+#define counter_lock()
+#define counter_unlock()
+
+#define work_lock()
+#define work_unlock()
+
+#endif
+
+
 static int mark_link(struct object *obj, int type, void *data)
 {
 	if (!obj)
@@ -223,6 +303,25 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
 	die("pack has bad object at offset %lu: %s", offset, buf);
 }
 
+static inline struct thread_local *get_thread_data(void)
+{
+#ifndef NO_PTHREADS
+	if (threads_active)
+		return pthread_getspecific(key);
+	assert(!threads_active &&
+	       "This should only be reached when all threads are gone");
+#endif
+	return &nothread_data;
+}
+
+#ifndef NO_PTHREADS
+static void set_thread_data(struct thread_local *data)
+{
+	if (threads_active)
+		pthread_setspecific(key, data);
+}
+#endif
+
 static struct base_data *alloc_base_data(void)
 {
 	struct base_data *base = xmalloc(sizeof(struct base_data));
@@ -237,15 +336,16 @@ static void free_base_data(struct base_data *c)
 	if (c->data) {
 		free(c->data);
 		c->data = NULL;
-		base_cache_used -= c->size;
+		get_thread_data()->base_cache_used -= c->size;
 	}
 }
 
 static void prune_base_data(struct base_data *retain)
 {
 	struct base_data *b;
-	for (b = base_cache;
-	     base_cache_used > delta_base_cache_limit && b;
+	struct thread_local *data = get_thread_data();
+	for (b = data->base_cache;
+	     data->base_cache_used > delta_base_cache_limit && b;
 	     b = b->child) {
 		if (b->data && b != retain)
 			free_base_data(b);
@@ -257,12 +357,12 @@ static void link_base_data(struct base_data *base, struct base_data *c)
 	if (base)
 		base->child = c;
 	else
-		base_cache = c;
+		get_thread_data()->base_cache = c;
 
 	c->base = base;
 	c->child = NULL;
 	if (c->data)
-		base_cache_used += c->size;
+		get_thread_data()->base_cache_used += c->size;
 	prune_base_data(c);
 }
 
@@ -272,7 +372,7 @@ static void unlink_base_data(struct base_data *c)
 	if (base)
 		base->child = NULL;
 	else
-		base_cache = NULL;
+		get_thread_data()->base_cache = NULL;
 	free_base_data(c);
 }
 
@@ -461,19 +561,24 @@ static void sha1_object(const void *data, unsigned long size,
 			enum object_type type, unsigned char *sha1)
 {
 	hash_sha1_file(data, size, typename(type), sha1);
+	read_lock();
 	if (has_sha1_file(sha1)) {
 		void *has_data;
 		enum object_type has_type;
 		unsigned long has_size;
 		has_data = read_sha1_file(sha1, &has_type, &has_size);
+		read_unlock();
 		if (!has_data)
 			die("cannot read existing object %s", sha1_to_hex(sha1));
 		if (size != has_size || type != has_type ||
 		    memcmp(data, has_data, size) != 0)
 			die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
 		free(has_data);
-	}
+	} else
+		read_unlock();
+
 	if (strict) {
+		read_lock();
 		if (type == OBJ_BLOB) {
 			struct blob *blob = lookup_blob(sha1);
 			if (blob)
@@ -507,6 +612,7 @@ static void sha1_object(const void *data, unsigned long size,
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
+		read_unlock();
 	}
 }
 
@@ -552,7 +658,7 @@ static void *get_base_data(struct base_data *c)
 		if (!delta_nr) {
 			c->data = get_data_from_pack(obj);
 			c->size = obj->size;
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		for (; delta_nr > 0; delta_nr--) {
@@ -568,7 +674,7 @@ static void *get_base_data(struct base_data *c)
 			free(raw);
 			if (!c->data)
 				bad_object(obj->idx.offset, "failed to apply delta");
-			base_cache_used += c->size;
+			get_thread_data()->base_cache_used += c->size;
 			prune_base_data(c);
 		}
 		free(delta);
@@ -596,7 +702,9 @@ static void resolve_delta(struct object_entry *delta_obj,
 		bad_object(delta_obj->idx.offset, "failed to apply delta");
 	sha1_object(result->data, result->size, delta_obj->real_type,
 		    delta_obj->idx.sha1);
+	counter_lock();
 	nr_resolved_deltas++;
+	counter_unlock();
 }
 
 static struct base_data *find_unresolved_deltas_1(struct base_data *base,
@@ -690,6 +798,30 @@ static void resolve_base(struct object_entry *obj)
 	find_unresolved_deltas(base_obj);
 }
 
+#ifndef NO_PTHREADS
+static void *threaded_second_pass(void *data)
+{
+	set_thread_data(data);
+	for (;;) {
+		int i;
+		work_lock();
+		display_progress(progress, nr_resolved_deltas);
+		while (nr_dispatched < nr_objects &&
+		       is_delta_type(objects[nr_dispatched].type))
+			nr_dispatched++;
+		if (nr_dispatched >= nr_objects) {
+			work_unlock();
+			break;
+		}
+		i = nr_dispatched++;
+		work_unlock();
+
+		resolve_base(&objects[i]);
+	}
+	return NULL;
+}
+#endif
+
 /*
  * First pass:
  * - find locations of all objects;
@@ -758,6 +890,24 @@ static void resolve_deltas(void)
 
 	if (verbose)
 		progress = start_progress("Resolving deltas", nr_deltas);
+
+#ifndef NO_PTHREADS
+	nr_dispatched = 0;
+	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
+		init_thread();
+		for (i = 0; i < nr_threads; i++) {
+			int ret = pthread_create(&thread_data[i].thread, NULL,
+						 threaded_second_pass, thread_data + i);
+			if (ret)
+				die("unable to create thread: %s", strerror(ret));
+		}
+		for (i = 0; i < nr_threads; i++)
+			pthread_join(thread_data[i].thread, NULL);
+		cleanup_thread();
+		return;
+	}
+#endif
+
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *obj = &objects[i];
 
@@ -1016,6 +1166,18 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
 			die("bad pack.indexversion=%"PRIu32, opts->version);
 		return 0;
 	}
+	if (!strcmp(k, "pack.threads")) {
+		nr_threads = git_config_int(k, v);
+		if (nr_threads < 0)
+			die("invalid number of threads specified (%d)",
+			    nr_threads);
+#ifdef NO_PTHREADS
+		if (nr_threads != 1)
+			warning("no threads support, ignoring %s", k);
+		nr_threads = 1;
+#endif
+		return 0;
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -1174,6 +1336,17 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				keep_msg = "";
 			} else if (!prefixcmp(arg, "--keep=")) {
 				keep_msg = arg + 7;
+			} else if (!prefixcmp(arg, "--threads=")) {
+				char *end;
+				nr_threads = strtoul(arg+10, &end, 0);
+				if (!arg[10] || *end || nr_threads < 0)
+					usage(index_pack_usage);
+#ifdef NO_PTHREADS
+				if (nr_threads != 1)
+					warning("no threads support, "
+						"ignoring %s", arg);
+				nr_threads = 1;
+#endif
 			} else if (!prefixcmp(arg, "--pack_header=")) {
 				struct pack_header *hdr;
 				char *c;
@@ -1245,6 +1418,15 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (strict)
 		opts.flags |= WRITE_IDX_STRICT;
 
+#ifndef NO_PTHREADS
+	if (!nr_threads) {
+		nr_threads = online_cpus();
+		/* An experiment showed that more threads does not mean faster */
+		if (nr_threads > 3)
+			nr_threads = 3;
+	}
+#endif
+
 	curr_pack = open_pack_file(pack_name);
 	parse_pack_header();
 	objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh
new file mode 100755
index 0000000..6cb5b0d
--- /dev/null
+++ b/t/perf/p5302-pack-index.sh
@@ -0,0 +1,40 @@
+#!/bin/sh
+
+test_description="Tests index-pack performance"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'repack' '
+	git repack -ad &&
+	PACK=`ls .git/objects/pack/*.pack | head -n1` &&
+	test -f "$PACK" &&
+	export PACK
+'
+
+test_perf 'index-pack 0 threads' '
+	GIT_DIR=t1 git index-pack --threads=1 --stdin < $PACK
+'
+
+test_perf 'index-pack 1 thread ' '
+	GIT_DIR=t2 GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK
+'
+
+test_perf 'index-pack 2 threads' '
+	GIT_DIR=t3 git index-pack --threads=2 --stdin < $PACK
+'
+
+test_perf 'index-pack 4 threads' '
+	GIT_DIR=t4 git index-pack --threads=4 --stdin < $PACK
+'
+
+test_perf 'index-pack 8 threads' '
+	GIT_DIR=t5 git index-pack --threads=8 --stdin < $PACK
+'
+
+test_perf 'index-pack default number of threads' '
+	GIT_DIR=t6 git index-pack --stdin < $PACK
+'
+
+test_done
-- 
1.7.8.36.g69ee2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] index-pack: disable threading if NO_PREAD is defined
  2012-05-06 12:31           ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
  2012-05-06 12:31             ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
  2012-05-06 12:31             ` [PATCH 3/4] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
@ 2012-05-06 12:31             ` Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-05-06 12:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

NO_PREAD simulates pread() as a sequence of seek, read, seek in
compat/pread.c. The simulation is not thread-safe because another
thread could move the file offset away in the middle of pread
operation. Do not allow threading in that case.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index d4685c5..807ee56 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -39,6 +39,11 @@ struct base_data {
 	int ofs_first, ofs_last;
 };
 
+#if !defined(NO_PTHREADS) && defined(NO_PREAD)
+/* NO_PREAD uses compat/pread.c, which is not thread-safe. Disable threading. */
+#define NO_PTHREADS
+#endif
+
 struct thread_local {
 #ifndef NO_PTHREADS
 	pthread_t thread;
-- 
1.7.8.36.g69ee2

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/4] index-pack: restructure pack processing into three main functions
  2012-05-06 12:31             ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
@ 2012-05-08  0:19               ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2012-05-08  0:19 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> The second pass in parse_pack_objects() are split into
> resolve_deltas(). The final phase, fixing thin pack or just seal the
> pack, is now in conclude_pack() function. Main pack processing is now a
> sequence of these functions:
>
>  - parse_pack_objects() reads through the input pack
>  - resolve_deltas()     makes sure all deltas can be resolved
>  - conclude_pack()      seals the output pack
>  - write_idx_file()     writes companion index file
>  - final()              moves the pack/index to proper place

The resulting code looks much more streamlined, instead of just splitting
out the second phase alone.

Very nice.  Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-05-08  0:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-11  5:49 [PATCH v3 0/3] Multithread index-pack Nguyễn Thái Ngọc Duy
2012-04-11  5:49 ` [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
2012-04-11  5:49 ` [PATCH v3 2/3] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
2012-04-11  5:49 ` [PATCH v3 3/3] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
2012-05-03 22:10   ` Junio C Hamano
2012-05-04  6:21     ` Junio C Hamano
2012-05-04 12:50       ` Nguyen Thai Ngoc Duy
2012-05-04 15:23         ` Junio C Hamano
2012-05-06 12:31           ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
2012-05-06 12:31             ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
2012-05-08  0:19               ` Junio C Hamano
2012-05-06 12:31             ` [PATCH 3/4] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
2012-05-06 12:31             ` [PATCH 4/4] index-pack: disable threading if NO_PREAD is defined Nguyễn Thái Ngọc Duy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).