* [PATCH v3 0/3] Multithread index-pack
@ 2012-04-11 5:49 Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-04-11 5:49 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy
This round hopefully fixes MinGW brekages for real. I also added a
perf test but not so sure if this is a correct way to do.
Nguyễn Thái Ngọc Duy (2):
index-pack: split second pass obj handling into own function
index-pack: support multithreaded delta resolving
Ramsay Allan Jones (1):
compat/win32/pthread.h: Add an pthread_key_delete() implementation
Documentation/git-index-pack.txt | 10 ++
Makefile | 2 +-
builtin/index-pack.c | 229 +++++++++++++++++++++++++++++++++-----
compat/win32/pthread.h | 5 +
t/perf/p5302-pack-index.sh | 40 +++++++
5 files changed, 257 insertions(+), 29 deletions(-)
create mode 100755 t/perf/p5302-pack-index.sh
--
1.7.3.1.256.g2539c.dirty
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation
2012-04-11 5:49 [PATCH v3 0/3] Multithread index-pack Nguyễn Thái Ngọc Duy
@ 2012-04-11 5:49 ` Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 2/3] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 3/3] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
2 siblings, 0 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-04-11 5:49 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy
From: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
compat/win32/pthread.h | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 2e20548..8ad1873 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -86,6 +86,11 @@ static inline int pthread_key_create(pthread_key_t *keyp, void (*destructor)(voi
return (*keyp = TlsAlloc()) == TLS_OUT_OF_INDEXES ? EAGAIN : 0;
}
+static inline int pthread_key_delete(pthread_key_t key)
+{
+ return TlsFree(key) ? 0 : EINVAL;
+}
+
static inline int pthread_setspecific(pthread_key_t key, const void *value)
{
return TlsSetValue(key, (void *)value) ? 0 : EINVAL;
--
1.7.3.1.256.g2539c.dirty
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 2/3] index-pack: split second pass obj handling into own function
2012-04-11 5:49 [PATCH v3 0/3] Multithread index-pack Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
@ 2012-04-11 5:49 ` Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 3/3] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
2 siblings, 0 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-04-11 5:49 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/index-pack.c | 31 ++++++++++++++++++-------------
1 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dd1c5c9..918684f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -682,6 +682,23 @@ static int compare_delta_entry(const void *a, const void *b)
objects[delta_b->obj_no].type);
}
+/*
+ * Second pass:
+ * - for all non-delta objects, look if it is used as a base for
+ * deltas;
+ * - if used as a base, uncompress the object and apply all deltas,
+ * recursively checking if the resulting object is used as a base
+ * for some more deltas.
+ */
+static void second_pass(struct object_entry *obj)
+{
+ struct base_data *base_obj = alloc_base_data();
+ base_obj->obj = obj;
+ base_obj->data = NULL;
+ find_unresolved_deltas(base_obj);
+ display_progress(progress, nr_resolved_deltas);
+}
+
/* Parse all objects and return the pack content SHA1 hash */
static void parse_pack_objects(unsigned char *sha1)
{
@@ -736,26 +753,14 @@ static void parse_pack_objects(unsigned char *sha1)
qsort(deltas, nr_deltas, sizeof(struct delta_entry),
compare_delta_entry);
- /*
- * Second pass:
- * - for all non-delta objects, look if it is used as a base for
- * deltas;
- * - if used as a base, uncompress the object and apply all deltas,
- * recursively checking if the resulting object is used as a base
- * for some more deltas.
- */
if (verbose)
progress = start_progress("Resolving deltas", nr_deltas);
for (i = 0; i < nr_objects; i++) {
struct object_entry *obj = &objects[i];
- struct base_data *base_obj = alloc_base_data();
if (is_delta_type(obj->type))
continue;
- base_obj->obj = obj;
- base_obj->data = NULL;
- find_unresolved_deltas(base_obj);
- display_progress(progress, nr_resolved_deltas);
+ second_pass(obj);
}
}
--
1.7.3.1.256.g2539c.dirty
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 3/3] index-pack: support multithreaded delta resolving
2012-04-11 5:49 [PATCH v3 0/3] Multithread index-pack Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 2/3] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
@ 2012-04-11 5:49 ` Nguyễn Thái Ngọc Duy
2012-05-03 22:10 ` Junio C Hamano
2 siblings, 1 reply; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-04-11 5:49 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy
This puts delta resolving on each base on a separate thread, one base
cache per thread. Per-thread data is grouped in struct thread_local.
When running with nr_threads == 1, no pthreads calls are made. The
system essentially runs in non-thread mode.
An experiment on a Xeon 24 core machine with git.git shows that
performance does not increase proportional to the number of cores. So
by default, we use maximum 3 cores. Some numbers with --threads from 1
to 16:
1..4
real 0m8.003s 0m5.307s 0m4.321s 0m3.830s
user 0m7.720s 0m8.009s 0m8.133s 0m8.305s
sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s
5..8
real 0m3.727s 0m3.604s 0m3.332s 0m3.369s
user 0m9.361s 0m9.817s 0m9.525s 0m9.769s
sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s
9..12
real 0m3.036s 0m3.139s 0m3.177s 0m2.961s
user 0m8.977s 0m10.205s 0m9.737s 0m10.073s
sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s
13..16
real 0m2.985s 0m2.894s 0m2.975s 0m2.971s
user 0m9.825s 0m10.573s 0m10.833s 0m11.361s
sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s
On an Intel dual core and linux-2.6.git
1..4
real 2m37.789s 2m7.963s 2m0.920s 1m58.213s
user 2m28.415s 2m52.325s 2m50.176s 2m41.187s
sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s
Thanks Ramsay Jones for troubleshooting and support on MinGW platform.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/git-index-pack.txt | 10 ++
Makefile | 2 +-
builtin/index-pack.c | 202 ++++++++++++++++++++++++++++++++++---
t/perf/p5302-pack-index.sh | 40 ++++++++
4 files changed, 236 insertions(+), 18 deletions(-)
create mode 100755 t/perf/p5302-pack-index.sh
diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..39e6d0d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,16 @@ OPTIONS
--strict::
Die, if the pack contains broken objects or links.
+--threads=<n>::
+ Specifies the number of threads to spawn when resolving
+ deltas. This requires that index-pack be compiled with
+ pthreads otherwise this option is ignored with a warning.
+ This is meant to reduce packing time on multiprocessor
+ machines. The required amount of memory for the delta search
+ window is however multiplied by the number of threads.
+ Specifying 0 will cause git to auto-detect the number of CPU's
+ and use maximum 3 threads.
+
Note
----
diff --git a/Makefile b/Makefile
index be1957a..4d7a3de 100644
--- a/Makefile
+++ b/Makefile
@@ -2160,7 +2160,7 @@ builtin/branch.o builtin/checkout.o builtin/clone.o builtin/reset.o branch.o tra
builtin/bundle.o bundle.o transport.o: bundle.h
builtin/bisect--helper.o builtin/rev-list.o bisect.o: bisect.h
builtin/clone.o builtin/fetch-pack.o transport.o: fetch-pack.h
-builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
+builtin/index-pack.o builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
builtin/send-pack.o transport.o: send-pack.h
builtin/log.o builtin/shortlog.o: shortlog.h
builtin/prune.o builtin/reflog.o reachable.o: reachable.h
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 918684f..45bc5db 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -9,6 +9,7 @@
#include "progress.h"
#include "fsck.h"
#include "exec_cmd.h"
+#include "thread-utils.h"
static const char index_pack_usage[] =
"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -38,6 +39,14 @@ struct base_data {
int ofs_first, ofs_last;
};
+struct thread_local {
+#ifndef NO_PTHREADS
+ pthread_t thread;
+#endif
+ struct base_data *base_cache;
+ size_t base_cache_used;
+};
+
/*
* Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
* to memcmp() only the first 20 bytes.
@@ -54,11 +63,13 @@ struct delta_entry {
static struct object_entry *objects;
static struct delta_entry *deltas;
-static struct base_data *base_cache;
-static size_t base_cache_used;
+static struct thread_local nothread_data;
static int nr_objects;
+static int nr_processed;
static int nr_deltas;
static int nr_resolved_deltas;
+static int nr_threads;
+static int threads_active;
static int from_stdin;
static int strict;
@@ -75,6 +86,75 @@ static git_SHA_CTX input_ctx;
static uint32_t input_crc32;
static int input_fd, output_fd, pack_fd;
+#ifndef NO_PTHREADS
+
+static struct thread_local *thread_data;
+
+static pthread_mutex_t read_mutex;
+#define read_lock() lock_mutex(&read_mutex)
+#define read_unlock() unlock_mutex(&read_mutex)
+
+static pthread_mutex_t counter_mutex;
+#define counter_lock() lock_mutex(&counter_mutex)
+#define counter_unlock() unlock_mutex(&counter_mutex)
+
+static pthread_mutex_t work_mutex;
+#define work_lock() lock_mutex(&work_mutex)
+#define work_unlock() unlock_mutex(&work_mutex)
+
+static pthread_key_t key;
+
+static inline void lock_mutex(pthread_mutex_t *mutex)
+{
+ if (threads_active)
+ pthread_mutex_lock(mutex);
+}
+
+static inline void unlock_mutex(pthread_mutex_t *mutex)
+{
+ if (threads_active)
+ pthread_mutex_unlock(mutex);
+}
+
+/*
+ * Mutex and conditional variable can't be statically-initialized on Windows.
+ */
+static void init_thread(void)
+{
+ init_recursive_mutex(&read_mutex);
+ pthread_mutex_init(&counter_mutex, NULL);
+ pthread_mutex_init(&work_mutex, NULL);
+ pthread_key_create(&key, NULL);
+ thread_data = xcalloc(nr_threads, sizeof(*thread_data));
+ threads_active = 1;
+}
+
+static void cleanup_thread(void)
+{
+ if (!threads_active)
+ return;
+ threads_active = 0;
+ pthread_mutex_destroy(&read_mutex);
+ pthread_mutex_destroy(&counter_mutex);
+ pthread_mutex_destroy(&work_mutex);
+ pthread_key_delete(key);
+ free(thread_data);
+}
+
+#else
+
+#define read_lock()
+#define read_unlock()
+
+#define counter_lock()
+#define counter_unlock()
+
+#define work_lock()
+#define work_unlock()
+
+#endif
+
+
static int mark_link(struct object *obj, int type, void *data)
{
if (!obj)
@@ -223,6 +303,17 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
die("pack has bad object at offset %lu: %s", offset, buf);
}
+static struct thread_local *get_thread_data()
+{
+#ifndef NO_PTHREADS
+ if (threads_active)
+ return pthread_getspecific(key);
+#endif
+ assert(!threads_active &&
+ "This should only be reached when all threads are gone");
+ return ¬hread_data;
+}
+
static struct base_data *alloc_base_data(void)
{
struct base_data *base = xmalloc(sizeof(struct base_data));
@@ -237,15 +328,16 @@ static void free_base_data(struct base_data *c)
if (c->data) {
free(c->data);
c->data = NULL;
- base_cache_used -= c->size;
+ get_thread_data()->base_cache_used -= c->size;
}
}
static void prune_base_data(struct base_data *retain)
{
struct base_data *b;
- for (b = base_cache;
- base_cache_used > delta_base_cache_limit && b;
+ struct thread_local *data = get_thread_data();
+ for (b = data->base_cache;
+ data->base_cache_used > delta_base_cache_limit && b;
b = b->child) {
if (b->data && b != retain)
free_base_data(b);
@@ -257,12 +349,12 @@ static void link_base_data(struct base_data *base, struct base_data *c)
if (base)
base->child = c;
else
- base_cache = c;
+ get_thread_data()->base_cache = c;
c->base = base;
c->child = NULL;
if (c->data)
- base_cache_used += c->size;
+ get_thread_data()->base_cache_used += c->size;
prune_base_data(c);
}
@@ -272,7 +364,7 @@ static void unlink_base_data(struct base_data *c)
if (base)
base->child = NULL;
else
- base_cache = NULL;
+ get_thread_data()->base_cache = NULL;
free_base_data(c);
}
@@ -461,19 +553,24 @@ static void sha1_object(const void *data, unsigned long size,
enum object_type type, unsigned char *sha1)
{
hash_sha1_file(data, size, typename(type), sha1);
+ read_lock();
if (has_sha1_file(sha1)) {
void *has_data;
enum object_type has_type;
unsigned long has_size;
has_data = read_sha1_file(sha1, &has_type, &has_size);
+ read_unlock();
if (!has_data)
die("cannot read existing object %s", sha1_to_hex(sha1));
if (size != has_size || type != has_type ||
memcmp(data, has_data, size) != 0)
die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
free(has_data);
- }
+ } else
+ read_unlock();
+
if (strict) {
+ read_lock();
if (type == OBJ_BLOB) {
struct blob *blob = lookup_blob(sha1);
if (blob)
@@ -507,6 +604,7 @@ static void sha1_object(const void *data, unsigned long size,
}
obj->flags |= FLAG_CHECKED;
}
+ read_unlock();
}
}
@@ -552,7 +650,7 @@ static void *get_base_data(struct base_data *c)
if (!delta_nr) {
c->data = get_data_from_pack(obj);
c->size = obj->size;
- base_cache_used += c->size;
+ get_thread_data()->base_cache_used += c->size;
prune_base_data(c);
}
for (; delta_nr > 0; delta_nr--) {
@@ -568,7 +666,7 @@ static void *get_base_data(struct base_data *c)
free(raw);
if (!c->data)
bad_object(obj->idx.offset, "failed to apply delta");
- base_cache_used += c->size;
+ get_thread_data()->base_cache_used += c->size;
prune_base_data(c);
}
free(delta);
@@ -596,7 +694,9 @@ static void resolve_delta(struct object_entry *delta_obj,
bad_object(delta_obj->idx.offset, "failed to apply delta");
sha1_object(result->data, result->size, delta_obj->real_type,
delta_obj->idx.sha1);
+ counter_lock();
nr_resolved_deltas++;
+ counter_unlock();
}
static struct base_data *find_unresolved_deltas_1(struct base_data *base,
@@ -696,7 +796,31 @@ static void second_pass(struct object_entry *obj)
base_obj->obj = obj;
base_obj->data = NULL;
find_unresolved_deltas(base_obj);
- display_progress(progress, nr_resolved_deltas);
+}
+
+static void *threaded_second_pass(void *arg)
+{
+#ifndef NO_PTHREADS
+ if (threads_active)
+ pthread_setspecific(key, arg);
+#endif
+ for (;;) {
+ int i;
+ work_lock();
+ display_progress(progress, nr_resolved_deltas);
+ while (nr_processed < nr_objects &&
+ is_delta_type(objects[nr_processed].type))
+ nr_processed++;
+ if (nr_processed >= nr_objects) {
+ work_unlock();
+ break;
+ }
+ i = nr_processed++;
+ work_unlock();
+
+ second_pass(&objects[i]);
+ }
+ return NULL;
}
/* Parse all objects and return the pack content SHA1 hash */
@@ -755,13 +879,25 @@ static void parse_pack_objects(unsigned char *sha1)
if (verbose)
progress = start_progress("Resolving deltas", nr_deltas);
- for (i = 0; i < nr_objects; i++) {
- struct object_entry *obj = &objects[i];
- if (is_delta_type(obj->type))
- continue;
- second_pass(obj);
+ nr_processed = 0;
+#ifndef NO_PTHREADS
+ if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
+ init_thread();
+ for (i = 0; i < nr_threads; i++) {
+ int ret = pthread_create(&thread_data[i].thread, NULL,
+ threaded_second_pass, thread_data + i);
+ if (ret)
+ die("unable to create thread: %s", strerror(ret));
+ }
+ for (i = 0; i < nr_threads; i++)
+ pthread_join(thread_data[i].thread, NULL);
+
+ cleanup_thread();
+ return;
}
+#endif
+ threaded_second_pass(NULL);
}
static int write_compressed(struct sha1file *f, void *in, unsigned int size)
@@ -967,6 +1103,18 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
die("bad pack.indexversion=%"PRIu32, opts->version);
return 0;
}
+ if (!strcmp(k, "pack.threads")) {
+ nr_threads = git_config_int(k, v);
+ if (nr_threads < 0)
+ die("invalid number of threads specified (%d)",
+ nr_threads);
+#ifdef NO_PTHREADS
+ if (nr_threads != 1)
+ warning("no threads support, ignoring %s", k);
+ nr_threads = 1;
+#endif
+ return 0;
+ }
return git_default_config(k, v, cb);
}
@@ -1125,6 +1273,17 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
keep_msg = "";
} else if (!prefixcmp(arg, "--keep=")) {
keep_msg = arg + 7;
+ } else if (!prefixcmp(arg, "--threads=")) {
+ char *end;
+ nr_threads = strtoul(arg+10, &end, 0);
+ if (!arg[10] || *end || nr_threads < 0)
+ usage(index_pack_usage);
+#ifdef NO_PTHREADS
+ if (nr_threads != 1)
+ warning("no threads support, "
+ "ignoring %s", arg);
+ nr_threads = 1;
+#endif
} else if (!prefixcmp(arg, "--pack_header=")) {
struct pack_header *hdr;
char *c;
@@ -1196,6 +1355,15 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
if (strict)
opts.flags |= WRITE_IDX_STRICT;
+#ifndef NO_PTHREADS
+ if (!nr_threads) {
+ nr_threads = online_cpus();
+ /* An experiment showed that more threads does not mean faster */
+ if (nr_threads > 3)
+ nr_threads = 3;
+ }
+#endif
+
curr_pack = open_pack_file(pack_name);
parse_pack_header();
objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh
new file mode 100755
index 0000000..6cb5b0d
--- /dev/null
+++ b/t/perf/p5302-pack-index.sh
@@ -0,0 +1,40 @@
+#!/bin/sh
+
+test_description="Tests index-pack performance"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'repack' '
+ git repack -ad &&
+ PACK=`ls .git/objects/pack/*.pack | head -n1` &&
+ test -f "$PACK" &&
+ export PACK
+'
+
+test_perf 'index-pack 0 threads' '
+ GIT_DIR=t1 git index-pack --threads=1 --stdin < $PACK
+'
+
+test_perf 'index-pack 1 thread ' '
+ GIT_DIR=t2 GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK
+'
+
+test_perf 'index-pack 2 threads' '
+ GIT_DIR=t3 git index-pack --threads=2 --stdin < $PACK
+'
+
+test_perf 'index-pack 4 threads' '
+ GIT_DIR=t4 git index-pack --threads=4 --stdin < $PACK
+'
+
+test_perf 'index-pack 8 threads' '
+ GIT_DIR=t5 git index-pack --threads=8 --stdin < $PACK
+'
+
+test_perf 'index-pack default number of threads' '
+ GIT_DIR=t6 git index-pack --stdin < $PACK
+'
+
+test_done
--
1.7.3.1.256.g2539c.dirty
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 3/3] index-pack: support multithreaded delta resolving
2012-04-11 5:49 ` [PATCH v3 3/3] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
@ 2012-05-03 22:10 ` Junio C Hamano
2012-05-04 6:21 ` Junio C Hamano
0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-05-03 22:10 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, Ramsay Jones
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> @@ -696,7 +796,31 @@ static void second_pass(struct object_entry *obj)
> base_obj->obj = obj;
> base_obj->data = NULL;
> find_unresolved_deltas(base_obj);
> - display_progress(progress, nr_resolved_deltas);
> +}
> +
> +static void *threaded_second_pass(void *arg)
> +{
> +#ifndef NO_PTHREADS
> + if (threads_active)
> + pthread_setspecific(key, arg);
> +#endif
> + for (;;) {
> + int i;
> + work_lock();
> + display_progress(progress, nr_resolved_deltas);
> + while (nr_processed < nr_objects &&
> + is_delta_type(objects[nr_processed].type))
> + nr_processed++;
> + if (nr_processed >= nr_objects) {
> + work_unlock();
> + break;
> + }
> + i = nr_processed++;
> + work_unlock();
> +
> + second_pass(&objects[i]);
> + }
> + return NULL;
> }
It may be just the matter of naming, but the above is taking the
abstraction backwards, I think. Shouldn't it be structured in such a way
that the caller may call second_pass() and its implementation may turn out
to be threaded (or not)?
The naming of "arg" made things worse. I wasted 5 minutes scratching my
head thinking "arg" was a single specific object that was to be given to
second_pass(), and wondered why it is made into thread-local data. Name
it "thread_data" or something.
And I think the root cause of this confusion is the way "second_pass" was
split out in the earlier patch. It is not the entire second-pass, but is
merely a single step of it (the whole "for (i = 0; i < nr_objects; i++)"
is the second-pass, in other words), and its implementation detail might
change to either thread (i.e. instead of a single line of control
iterating from 0 to nr_objects, each thread grab the next available task
and work on it until everything is exhausted) or not.
By the way, if one object is very heavy and takes a lot of time until
completion, could it be possible that objects[0] is still being processed
for its base data but objects[1] has already completed and an available
thread could work on objects[2]? How does it learn to process objects[2]
in such a case, or does it wait until the thread working on objects[0] is
done?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 3/3] index-pack: support multithreaded delta resolving
2012-05-03 22:10 ` Junio C Hamano
@ 2012-05-04 6:21 ` Junio C Hamano
2012-05-04 12:50 ` Nguyen Thai Ngoc Duy
0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-05-04 6:21 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, Ramsay Jones
Junio C Hamano <gitster@pobox.com> writes:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> @@ -696,7 +796,31 @@ static void second_pass(struct object_entry *obj)
>> base_obj->obj = obj;
>> base_obj->data = NULL;
>> find_unresolved_deltas(base_obj);
>> - display_progress(progress, nr_resolved_deltas);
>> +}
>> +
>> +static void *threaded_second_pass(void *arg)
>> +{
>> +#ifndef NO_PTHREADS
>> + if (threads_active)
>> + pthread_setspecific(key, arg);
>> +#endif
>> + for (;;) {
>> + int i;
>> + work_lock();
>> + display_progress(progress, nr_resolved_deltas);
>> + while (nr_processed < nr_objects &&
>> + is_delta_type(objects[nr_processed].type))
>> + nr_processed++;
>> + if (nr_processed >= nr_objects) {
>> + work_unlock();
>> + break;
>> + }
>> + i = nr_processed++;
>> + work_unlock();
>> +
>> + second_pass(&objects[i]);
>> + }
>> + return NULL;
>> }
>
> It may be just the matter of naming, but the above is taking the
> abstraction backwards, I think. Shouldn't it be structured in such a way
> that the caller may call second_pass() and its implementation may turn out
> to be threaded (or not)?
>
> The naming of "arg" made things worse. I wasted 5 minutes scratching my
> head thinking "arg" was a single specific object that was to be given to
> second_pass(), and wondered why it is made into thread-local data. Name
> it "thread_data" or something.
>
> And I think the root cause of this confusion is the way "second_pass" was
> split out in the earlier patch. It is not the entire second-pass, but is
> merely a single step of it (the whole "for (i = 0; i < nr_objects; i++)"
> is the second-pass, in other words), and its implementation detail might
> change to either thread (i.e. instead of a single line of control
> iterating from 0 to nr_objects, each thread grab the next available task
> and work on it until everything is exhausted) or not.
>
> By the way, if one object is very heavy and takes a lot of time until
> completion, could it be possible that objects[0] is still being processed
> for its base data but objects[1] has already completed and an available
> thread could work on objects[2]? How does it learn to process objects[2]
> in such a case, or does it wait until the thread working on objects[0] is
> done?
Please disregard the "By the way" part, except that my confusion that led
to the "By the way" comment was caused by another misnaming, namely,
"nr_processed". It is not counting "How many of them have we already
processed?"---it merely counts "How many of them have we dispatched?" and
completion of the task does not matter in this critical section, which I
missed. If it were named "nr_dispatched", I wouldn't have wasted my time
wondering about the loop and writing the "By the way" review comment.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 3/3] index-pack: support multithreaded delta resolving
2012-05-04 6:21 ` Junio C Hamano
@ 2012-05-04 12:50 ` Nguyen Thai Ngoc Duy
2012-05-04 15:23 ` Junio C Hamano
0 siblings, 1 reply; 13+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-05-04 12:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Ramsay Jones
Junio, can you make a patch (or update current ones) with better
naming? I obviously did not see why these names were bad. You may
provide more convincing explanation than me in the commit message.
On Fri, May 4, 2012 at 1:21 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>>
>>> @@ -696,7 +796,31 @@ static void second_pass(struct object_entry *obj)
>>> base_obj->obj = obj;
>>> base_obj->data = NULL;
>>> find_unresolved_deltas(base_obj);
>>> - display_progress(progress, nr_resolved_deltas);
>>> +}
>>> +
>>> +static void *threaded_second_pass(void *arg)
>>> +{
>>> +#ifndef NO_PTHREADS
>>> + if (threads_active)
>>> + pthread_setspecific(key, arg);
>>> +#endif
>>> + for (;;) {
>>> + int i;
>>> + work_lock();
>>> + display_progress(progress, nr_resolved_deltas);
>>> + while (nr_processed < nr_objects &&
>>> + is_delta_type(objects[nr_processed].type))
>>> + nr_processed++;
>>> + if (nr_processed >= nr_objects) {
>>> + work_unlock();
>>> + break;
>>> + }
>>> + i = nr_processed++;
>>> + work_unlock();
>>> +
>>> + second_pass(&objects[i]);
>>> + }
>>> + return NULL;
>>> }
>>
>> It may be just the matter of naming, but the above is taking the
>> abstraction backwards, I think. Shouldn't it be structured in such a way
>> that the caller may call second_pass() and its implementation may turn out
>> to be threaded (or not)?
>>
>> The naming of "arg" made things worse. I wasted 5 minutes scratching my
>> head thinking "arg" was a single specific object that was to be given to
>> second_pass(), and wondered why it is made into thread-local data. Name
>> it "thread_data" or something.
>>
>> And I think the root cause of this confusion is the way "second_pass" was
>> split out in the earlier patch. It is not the entire second-pass, but is
>> merely a single step of it (the whole "for (i = 0; i < nr_objects; i++)"
>> is the second-pass, in other words), and its implementation detail might
>> change to either thread (i.e. instead of a single line of control
>> iterating from 0 to nr_objects, each thread grab the next available task
>> and work on it until everything is exhausted) or not.
>>
>> By the way, if one object is very heavy and takes a lot of time until
>> completion, could it be possible that objects[0] is still being processed
>> for its base data but objects[1] has already completed and an available
>> thread could work on objects[2]? How does it learn to process objects[2]
>> in such a case, or does it wait until the thread working on objects[0] is
>> done?
>
> Please disregard the "By the way" part, except that my confusion that led
> to the "By the way" comment was caused by another misnaming, namely,
> "nr_processed". It is not counting "How many of them have we already
> processed?"---it merely counts "How many of them have we dispatched?" and
> completion of the task does not matter in this critical section, which I
> missed. If it were named "nr_dispatched", I wouldn't have wasted my time
> wondering about the loop and writing the "By the way" review comment.
>
--
Duy
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 3/3] index-pack: support multithreaded delta resolving
2012-05-04 12:50 ` Nguyen Thai Ngoc Duy
@ 2012-05-04 15:23 ` Junio C Hamano
2012-05-06 12:31 ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-05-04 15:23 UTC (permalink / raw)
To: Nguyen Thai Ngoc Duy; +Cc: git, Ramsay Jones
Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> Junio, can you make a patch (or update current ones) with better
> naming? I obviously did not see why these names were bad. You may
> provide more convincing explanation than me in the commit message.
I have already explained s/nr_processed/nr_dispatched/.
I do not think second_pass vs threaded_second_pass is merely a naming
issue. The code structure ("taking the abstraction backwards") is the
bigger issue.
The original before the series has:
parse_pack_objects()
{
... a lot of code for the first pass ...
... a lot of code for the second pass which looks like
for (all object entries) {
set global state base_obj->obj to it
find unresolved deltas for that single base_obj
display progress
}
}
and you first split one iteration of the second pass, resulting in
parse_pack_objects()
{
... a lot of code for the first pass ...
for (all object entries) {
set global state base_obj->obj to it
second_pass()
}
}
which I think is a wrong way to do it, because here is what your next step
ends up with because of that:
parse_pack_objects()
{
... a lot of code for the first pass ...
#if WE SUPPORT THREADING
for (number of threads we are going to spawn) {
spawn thread and let it run threaded_second_pass
}
for (all threads) {
cull them when they are done
}
return
#endif
... when not threading ...
run threaded_second_pass() in the main process
}
It could (and should) be like this instead from the beginning, I think:
parse_pack_objects()
{
... a lot of code for the first pass ...
second_pass()
}
to encapsulate the whole second_pass() logic in the refactored function,
whose implementation (i.e. how the objects to be processed are picked up
and worked on) will change in the threaded version. In the first
refactoring patch, second_pass() would still iterate over object entries:
second_pass()
{
for (all object entries) {
resolve_deltas(base_obj)
}
}
And at this point, introduce a helper function "resolve_deltas(base)"
or something that is your "second_pass()". This deals with only one
family of objects that use the given object as their (recursive) base
objects, and use it in the above loop.
And then multi-threading support can turn that into something like
second_pass()
{
#if WE SUPPORT THREADING
for (number of threads we are going to spawn) {
spawn thread and let it run threaded_second_pass
}
for (all threads) {
cull them when they are done
}
return
#endif
... when not threading ...
for (all object entries) {
resolve_deltas(base_obj)
}
}
to add support for threaded case. The threaded_second_pass function does
only one thing and one thing well:
threaded_second_pass()
{
set thread local
loop forever {
take work item under critical section
break if there is no more work
resolve_deltas(the work item)
}
}
Wouldn't the result be much more readable that way?
You may have saved a few lines by making unthreaded code call your
threaded_second_pass() and pretend that the single main process still has
to pick up a work item and work on it in a loop like you did, but I think
it made the logic unnecessarily harder to follow.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation
2012-05-04 15:23 ` Junio C Hamano
@ 2012-05-06 12:31 ` Nguyễn Thái Ngọc Duy
2012-05-06 12:31 ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-05-06 12:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Ramsay Jones, Nguyễn Thái Ngọc Duy
From: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
compat/win32/pthread.h | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 2e20548..8ad1873 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -86,6 +86,11 @@ static inline int pthread_key_create(pthread_key_t *keyp, void (*destructor)(voi
return (*keyp = TlsAlloc()) == TLS_OUT_OF_INDEXES ? EAGAIN : 0;
}
+static inline int pthread_key_delete(pthread_key_t key)
+{
+ return TlsFree(key) ? 0 : EINVAL;
+}
+
static inline int pthread_setspecific(pthread_key_t key, const void *value)
{
return TlsSetValue(key, (void *)value) ? 0 : EINVAL;
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/4] index-pack: restructure pack processing into three main functions
2012-05-06 12:31 ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
@ 2012-05-06 12:31 ` Nguyễn Thái Ngọc Duy
2012-05-08 0:19 ` Junio C Hamano
2012-05-06 12:31 ` [PATCH 3/4] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
2012-05-06 12:31 ` [PATCH 4/4] index-pack: disable threading if NO_PREAD is defined Nguyễn Thái Ngọc Duy
2 siblings, 1 reply; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-05-06 12:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
The second pass in parse_pack_objects() are split into
resolve_deltas(). The final phase, fixing thin pack or just seal the
pack, is now in conclude_pack() function. Main pack processing is now
a sequence of these functions:
- parse_pack_objects() reads through the input pack
- resolve_deltas() makes sure all deltas can be resolved
- conclude_pack() seals the output pack
- write_idx_file() writes companion index file
- final() moves the pack/index to proper place
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/index-pack.c | 128 +++++++++++++++++++++++++++++---------------------
1 files changed, 75 insertions(+), 53 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index dd1c5c9..a4be4a6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -682,19 +682,26 @@ static int compare_delta_entry(const void *a, const void *b)
objects[delta_b->obj_no].type);
}
-/* Parse all objects and return the pack content SHA1 hash */
+static void resolve_base(struct object_entry *obj)
+{
+ struct base_data *base_obj = alloc_base_data();
+ base_obj->obj = obj;
+ base_obj->data = NULL;
+ find_unresolved_deltas(base_obj);
+}
+
+/*
+ * First pass:
+ * - find locations of all objects;
+ * - calculate SHA1 of all non-delta objects;
+ * - remember base (SHA1 or offset) for all deltas.
+ */
static void parse_pack_objects(unsigned char *sha1)
{
int i;
struct delta_entry *delta = deltas;
struct stat st;
- /*
- * First pass:
- * - find locations of all objects;
- * - calculate SHA1 of all non-delta objects;
- * - remember base (SHA1 or offset) for all deltas.
- */
if (verbose)
progress = start_progress(
from_stdin ? "Receiving objects" : "Indexing objects",
@@ -728,6 +735,19 @@ static void parse_pack_objects(unsigned char *sha1)
if (S_ISREG(st.st_mode) &&
lseek(input_fd, 0, SEEK_CUR) - input_len != st.st_size)
die("pack has junk at the end");
+}
+
+/*
+ * Second pass:
+ * - for all non-delta objects, look if it is used as a base for
+ * deltas;
+ * - if used as a base, uncompress the object and apply all deltas,
+ * recursively checking if the resulting object is used as a base
+ * for some more deltas.
+ */
+static void resolve_deltas(void)
+{
+ int i;
if (!nr_deltas)
return;
@@ -736,29 +756,63 @@ static void parse_pack_objects(unsigned char *sha1)
qsort(deltas, nr_deltas, sizeof(struct delta_entry),
compare_delta_entry);
- /*
- * Second pass:
- * - for all non-delta objects, look if it is used as a base for
- * deltas;
- * - if used as a base, uncompress the object and apply all deltas,
- * recursively checking if the resulting object is used as a base
- * for some more deltas.
- */
if (verbose)
progress = start_progress("Resolving deltas", nr_deltas);
for (i = 0; i < nr_objects; i++) {
struct object_entry *obj = &objects[i];
- struct base_data *base_obj = alloc_base_data();
if (is_delta_type(obj->type))
continue;
- base_obj->obj = obj;
- base_obj->data = NULL;
- find_unresolved_deltas(base_obj);
+ resolve_base(obj);
display_progress(progress, nr_resolved_deltas);
}
}
+/*
+ * Third pass:
+ * - append objects to convert thin pack to full pack if required
+ * - write the final 20-byte SHA-1
+ */
+static void fix_unresolved_deltas(struct sha1file *f, int nr_unresolved);
+static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned char *pack_sha1)
+{
+ if (nr_deltas == nr_resolved_deltas) {
+ stop_progress(&progress);
+ /* Flush remaining pack final 20-byte SHA1. */
+ flush();
+ return;
+ }
+
+ if (fix_thin_pack) {
+ struct sha1file *f;
+ unsigned char read_sha1[20], tail_sha1[20];
+ char msg[48];
+ int nr_unresolved = nr_deltas - nr_resolved_deltas;
+ int nr_objects_initial = nr_objects;
+ if (nr_unresolved <= 0)
+ die("confusion beyond insanity");
+ objects = xrealloc(objects,
+ (nr_objects + nr_unresolved + 1)
+ * sizeof(*objects));
+ f = sha1fd(output_fd, curr_pack);
+ fix_unresolved_deltas(f, nr_unresolved);
+ sprintf(msg, "completed with %d local objects",
+ nr_objects - nr_objects_initial);
+ stop_progress_msg(&progress, msg);
+ sha1close(f, tail_sha1, 0);
+ hashcpy(read_sha1, pack_sha1);
+ fixup_pack_header_footer(output_fd, pack_sha1,
+ curr_pack, nr_objects,
+ read_sha1, consumed_bytes-20);
+ if (hashcmp(read_sha1, tail_sha1) != 0)
+ die("Unexpected tail checksum for %s "
+ "(disk corruption?)", curr_pack);
+ }
+ if (nr_deltas != nr_resolved_deltas)
+ die("pack has %d unresolved deltas",
+ nr_deltas - nr_resolved_deltas);
+}
+
static int write_compressed(struct sha1file *f, void *in, unsigned int size)
{
git_zstream stream;
@@ -1196,40 +1250,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
deltas = xcalloc(nr_objects, sizeof(struct delta_entry));
parse_pack_objects(pack_sha1);
- if (nr_deltas == nr_resolved_deltas) {
- stop_progress(&progress);
- /* Flush remaining pack final 20-byte SHA1. */
- flush();
- } else {
- if (fix_thin_pack) {
- struct sha1file *f;
- unsigned char read_sha1[20], tail_sha1[20];
- char msg[48];
- int nr_unresolved = nr_deltas - nr_resolved_deltas;
- int nr_objects_initial = nr_objects;
- if (nr_unresolved <= 0)
- die("confusion beyond insanity");
- objects = xrealloc(objects,
- (nr_objects + nr_unresolved + 1)
- * sizeof(*objects));
- f = sha1fd(output_fd, curr_pack);
- fix_unresolved_deltas(f, nr_unresolved);
- sprintf(msg, "completed with %d local objects",
- nr_objects - nr_objects_initial);
- stop_progress_msg(&progress, msg);
- sha1close(f, tail_sha1, 0);
- hashcpy(read_sha1, pack_sha1);
- fixup_pack_header_footer(output_fd, pack_sha1,
- curr_pack, nr_objects,
- read_sha1, consumed_bytes-20);
- if (hashcmp(read_sha1, tail_sha1) != 0)
- die("Unexpected tail checksum for %s "
- "(disk corruption?)", curr_pack);
- }
- if (nr_deltas != nr_resolved_deltas)
- die("pack has %d unresolved deltas",
- nr_deltas - nr_resolved_deltas);
- }
+ resolve_deltas();
+ conclude_pack(fix_thin_pack, curr_pack, pack_sha1);
free(deltas);
if (strict)
check_objects();
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/4] index-pack: support multithreaded delta resolving
2012-05-06 12:31 ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
2012-05-06 12:31 ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
@ 2012-05-06 12:31 ` Nguyễn Thái Ngọc Duy
2012-05-06 12:31 ` [PATCH 4/4] index-pack: disable threading if NO_PREAD is defined Nguyễn Thái Ngọc Duy
2 siblings, 0 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-05-06 12:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
This puts delta resolving on each base on a separate thread, one base
cache per thread. Per-thread data is grouped in struct thread_local.
When running with nr_threads == 1, no pthreads calls are made. The
system essentially runs in non-thread mode.
An experiment on a Xeon 24 core machine with git.git shows that
performance does not increase proportional to the number of cores. So
by default, we use maximum 3 cores. Some numbers with --threads from 1
to 16:
1..4
real 0m8.003s 0m5.307s 0m4.321s 0m3.830s
user 0m7.720s 0m8.009s 0m8.133s 0m8.305s
sys 0m0.224s 0m0.372s 0m0.360s 0m0.360s
5..8
real 0m3.727s 0m3.604s 0m3.332s 0m3.369s
user 0m9.361s 0m9.817s 0m9.525s 0m9.769s
sys 0m0.584s 0m0.624s 0m0.540s 0m0.560s
9..12
real 0m3.036s 0m3.139s 0m3.177s 0m2.961s
user 0m8.977s 0m10.205s 0m9.737s 0m10.073s
sys 0m0.596s 0m0.680s 0m0.684s 0m0.680s
13..16
real 0m2.985s 0m2.894s 0m2.975s 0m2.971s
user 0m9.825s 0m10.573s 0m10.833s 0m11.361s
sys 0m0.788s 0m0.732s 0m0.904s 0m1.016s
On an Intel dual core and linux-2.6.git
1..4
real 2m37.789s 2m7.963s 2m0.920s 1m58.213s
user 2m28.415s 2m52.325s 2m50.176s 2m41.187s
sys 0m7.808s 0m11.181s 0m11.224s 0m10.731s
Thanks Ramsay Jones for troubleshooting and support on MinGW platform.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Changes against 'pu':
- nr_processed -> nr_dispatched
- both nr_dispatched threads_active are moved to #ifndef NO_PTHREADS block
- old second_pass() -> resolve_base()
Documentation/git-index-pack.txt | 10 ++
Makefile | 2 +-
builtin/index-pack.c | 204 +++++++++++++++++++++++++++++++++++--
t/perf/p5302-pack-index.sh | 40 ++++++++
4 files changed, 244 insertions(+), 12 deletions(-)
create mode 100755 t/perf/p5302-pack-index.sh
diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 909687f..39e6d0d 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,16 @@ OPTIONS
--strict::
Die, if the pack contains broken objects or links.
+--threads=<n>::
+ Specifies the number of threads to spawn when resolving
+ deltas. This requires that index-pack be compiled with
+ pthreads otherwise this option is ignored with a warning.
+ This is meant to reduce packing time on multiprocessor
+ machines. The required amount of memory for the delta search
+ window is however multiplied by the number of threads.
+ Specifying 0 will cause git to auto-detect the number of CPU's
+ and use maximum 3 threads.
+
Note
----
diff --git a/Makefile b/Makefile
index cf2c40b..e41955f 100644
--- a/Makefile
+++ b/Makefile
@@ -2160,7 +2160,7 @@ builtin/branch.o builtin/checkout.o builtin/clone.o builtin/reset.o branch.o tra
builtin/bundle.o bundle.o transport.o: bundle.h
builtin/bisect--helper.o builtin/rev-list.o bisect.o: bisect.h
builtin/clone.o builtin/fetch-pack.o transport.o: fetch-pack.h
-builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
+builtin/index-pack.o builtin/grep.o builtin/pack-objects.o transport-helper.o thread-utils.o: thread-utils.h
builtin/send-pack.o transport.o: send-pack.h
builtin/log.o builtin/shortlog.o: shortlog.h
builtin/prune.o builtin/reflog.o reachable.o: reachable.h
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index a4be4a6..d4685c5 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -9,6 +9,7 @@
#include "progress.h"
#include "fsck.h"
#include "exec_cmd.h"
+#include "thread-utils.h"
static const char index_pack_usage[] =
"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
@@ -38,6 +39,14 @@ struct base_data {
int ofs_first, ofs_last;
};
+struct thread_local {
+#ifndef NO_PTHREADS
+ pthread_t thread;
+#endif
+ struct base_data *base_cache;
+ size_t base_cache_used;
+};
+
/*
* Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
* to memcmp() only the first 20 bytes.
@@ -54,11 +63,11 @@ struct delta_entry {
static struct object_entry *objects;
static struct delta_entry *deltas;
-static struct base_data *base_cache;
-static size_t base_cache_used;
+static struct thread_local nothread_data;
static int nr_objects;
static int nr_deltas;
static int nr_resolved_deltas;
+static int nr_threads;
static int from_stdin;
static int strict;
@@ -75,6 +84,77 @@ static git_SHA_CTX input_ctx;
static uint32_t input_crc32;
static int input_fd, output_fd, pack_fd;
+#ifndef NO_PTHREADS
+
+static struct thread_local *thread_data;
+static int nr_dispatched;
+static int threads_active;
+
+static pthread_mutex_t read_mutex;
+#define read_lock() lock_mutex(&read_mutex)
+#define read_unlock() unlock_mutex(&read_mutex)
+
+static pthread_mutex_t counter_mutex;
+#define counter_lock() lock_mutex(&counter_mutex)
+#define counter_unlock() unlock_mutex(&counter_mutex)
+
+static pthread_mutex_t work_mutex;
+#define work_lock() lock_mutex(&work_mutex)
+#define work_unlock() unlock_mutex(&work_mutex)
+
+static pthread_key_t key;
+
+static inline void lock_mutex(pthread_mutex_t *mutex)
+{
+ if (threads_active)
+ pthread_mutex_lock(mutex);
+}
+
+static inline void unlock_mutex(pthread_mutex_t *mutex)
+{
+ if (threads_active)
+ pthread_mutex_unlock(mutex);
+}
+
+/*
+ * Mutex and conditional variable can't be statically-initialized on Windows.
+ */
+static void init_thread(void)
+{
+ init_recursive_mutex(&read_mutex);
+ pthread_mutex_init(&counter_mutex, NULL);
+ pthread_mutex_init(&work_mutex, NULL);
+ pthread_key_create(&key, NULL);
+ thread_data = xcalloc(nr_threads, sizeof(*thread_data));
+ threads_active = 1;
+}
+
+static void cleanup_thread(void)
+{
+ if (!threads_active)
+ return;
+ threads_active = 0;
+ pthread_mutex_destroy(&read_mutex);
+ pthread_mutex_destroy(&counter_mutex);
+ pthread_mutex_destroy(&work_mutex);
+ pthread_key_delete(key);
+ free(thread_data);
+}
+
+#else
+
+#define read_lock()
+#define read_unlock()
+
+#define counter_lock()
+#define counter_unlock()
+
+#define work_lock()
+#define work_unlock()
+
+#endif
+
+
static int mark_link(struct object *obj, int type, void *data)
{
if (!obj)
@@ -223,6 +303,25 @@ static NORETURN void bad_object(unsigned long offset, const char *format, ...)
die("pack has bad object at offset %lu: %s", offset, buf);
}
+static inline struct thread_local *get_thread_data(void)
+{
+#ifndef NO_PTHREADS
+ if (threads_active)
+ return pthread_getspecific(key);
+ assert(!threads_active &&
+ "This should only be reached when all threads are gone");
+#endif
+ return ¬hread_data;
+}
+
+#ifndef NO_PTHREADS
+static void set_thread_data(struct thread_local *data)
+{
+ if (threads_active)
+ pthread_setspecific(key, data);
+}
+#endif
+
static struct base_data *alloc_base_data(void)
{
struct base_data *base = xmalloc(sizeof(struct base_data));
@@ -237,15 +336,16 @@ static void free_base_data(struct base_data *c)
if (c->data) {
free(c->data);
c->data = NULL;
- base_cache_used -= c->size;
+ get_thread_data()->base_cache_used -= c->size;
}
}
static void prune_base_data(struct base_data *retain)
{
struct base_data *b;
- for (b = base_cache;
- base_cache_used > delta_base_cache_limit && b;
+ struct thread_local *data = get_thread_data();
+ for (b = data->base_cache;
+ data->base_cache_used > delta_base_cache_limit && b;
b = b->child) {
if (b->data && b != retain)
free_base_data(b);
@@ -257,12 +357,12 @@ static void link_base_data(struct base_data *base, struct base_data *c)
if (base)
base->child = c;
else
- base_cache = c;
+ get_thread_data()->base_cache = c;
c->base = base;
c->child = NULL;
if (c->data)
- base_cache_used += c->size;
+ get_thread_data()->base_cache_used += c->size;
prune_base_data(c);
}
@@ -272,7 +372,7 @@ static void unlink_base_data(struct base_data *c)
if (base)
base->child = NULL;
else
- base_cache = NULL;
+ get_thread_data()->base_cache = NULL;
free_base_data(c);
}
@@ -461,19 +561,24 @@ static void sha1_object(const void *data, unsigned long size,
enum object_type type, unsigned char *sha1)
{
hash_sha1_file(data, size, typename(type), sha1);
+ read_lock();
if (has_sha1_file(sha1)) {
void *has_data;
enum object_type has_type;
unsigned long has_size;
has_data = read_sha1_file(sha1, &has_type, &has_size);
+ read_unlock();
if (!has_data)
die("cannot read existing object %s", sha1_to_hex(sha1));
if (size != has_size || type != has_type ||
memcmp(data, has_data, size) != 0)
die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
free(has_data);
- }
+ } else
+ read_unlock();
+
if (strict) {
+ read_lock();
if (type == OBJ_BLOB) {
struct blob *blob = lookup_blob(sha1);
if (blob)
@@ -507,6 +612,7 @@ static void sha1_object(const void *data, unsigned long size,
}
obj->flags |= FLAG_CHECKED;
}
+ read_unlock();
}
}
@@ -552,7 +658,7 @@ static void *get_base_data(struct base_data *c)
if (!delta_nr) {
c->data = get_data_from_pack(obj);
c->size = obj->size;
- base_cache_used += c->size;
+ get_thread_data()->base_cache_used += c->size;
prune_base_data(c);
}
for (; delta_nr > 0; delta_nr--) {
@@ -568,7 +674,7 @@ static void *get_base_data(struct base_data *c)
free(raw);
if (!c->data)
bad_object(obj->idx.offset, "failed to apply delta");
- base_cache_used += c->size;
+ get_thread_data()->base_cache_used += c->size;
prune_base_data(c);
}
free(delta);
@@ -596,7 +702,9 @@ static void resolve_delta(struct object_entry *delta_obj,
bad_object(delta_obj->idx.offset, "failed to apply delta");
sha1_object(result->data, result->size, delta_obj->real_type,
delta_obj->idx.sha1);
+ counter_lock();
nr_resolved_deltas++;
+ counter_unlock();
}
static struct base_data *find_unresolved_deltas_1(struct base_data *base,
@@ -690,6 +798,30 @@ static void resolve_base(struct object_entry *obj)
find_unresolved_deltas(base_obj);
}
+#ifndef NO_PTHREADS
+static void *threaded_second_pass(void *data)
+{
+ set_thread_data(data);
+ for (;;) {
+ int i;
+ work_lock();
+ display_progress(progress, nr_resolved_deltas);
+ while (nr_dispatched < nr_objects &&
+ is_delta_type(objects[nr_dispatched].type))
+ nr_dispatched++;
+ if (nr_dispatched >= nr_objects) {
+ work_unlock();
+ break;
+ }
+ i = nr_dispatched++;
+ work_unlock();
+
+ resolve_base(&objects[i]);
+ }
+ return NULL;
+}
+#endif
+
/*
* First pass:
* - find locations of all objects;
@@ -758,6 +890,24 @@ static void resolve_deltas(void)
if (verbose)
progress = start_progress("Resolving deltas", nr_deltas);
+
+#ifndef NO_PTHREADS
+ nr_dispatched = 0;
+ if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
+ init_thread();
+ for (i = 0; i < nr_threads; i++) {
+ int ret = pthread_create(&thread_data[i].thread, NULL,
+ threaded_second_pass, thread_data + i);
+ if (ret)
+ die("unable to create thread: %s", strerror(ret));
+ }
+ for (i = 0; i < nr_threads; i++)
+ pthread_join(thread_data[i].thread, NULL);
+ cleanup_thread();
+ return;
+ }
+#endif
+
for (i = 0; i < nr_objects; i++) {
struct object_entry *obj = &objects[i];
@@ -1016,6 +1166,18 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
die("bad pack.indexversion=%"PRIu32, opts->version);
return 0;
}
+ if (!strcmp(k, "pack.threads")) {
+ nr_threads = git_config_int(k, v);
+ if (nr_threads < 0)
+ die("invalid number of threads specified (%d)",
+ nr_threads);
+#ifdef NO_PTHREADS
+ if (nr_threads != 1)
+ warning("no threads support, ignoring %s", k);
+ nr_threads = 1;
+#endif
+ return 0;
+ }
return git_default_config(k, v, cb);
}
@@ -1174,6 +1336,17 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
keep_msg = "";
} else if (!prefixcmp(arg, "--keep=")) {
keep_msg = arg + 7;
+ } else if (!prefixcmp(arg, "--threads=")) {
+ char *end;
+ nr_threads = strtoul(arg+10, &end, 0);
+ if (!arg[10] || *end || nr_threads < 0)
+ usage(index_pack_usage);
+#ifdef NO_PTHREADS
+ if (nr_threads != 1)
+ warning("no threads support, "
+ "ignoring %s", arg);
+ nr_threads = 1;
+#endif
} else if (!prefixcmp(arg, "--pack_header=")) {
struct pack_header *hdr;
char *c;
@@ -1245,6 +1418,15 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
if (strict)
opts.flags |= WRITE_IDX_STRICT;
+#ifndef NO_PTHREADS
+ if (!nr_threads) {
+ nr_threads = online_cpus();
+ /* An experiment showed that more threads does not mean faster */
+ if (nr_threads > 3)
+ nr_threads = 3;
+ }
+#endif
+
curr_pack = open_pack_file(pack_name);
parse_pack_header();
objects = xcalloc(nr_objects + 1, sizeof(struct object_entry));
diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh
new file mode 100755
index 0000000..6cb5b0d
--- /dev/null
+++ b/t/perf/p5302-pack-index.sh
@@ -0,0 +1,40 @@
+#!/bin/sh
+
+test_description="Tests index-pack performance"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'repack' '
+ git repack -ad &&
+ PACK=`ls .git/objects/pack/*.pack | head -n1` &&
+ test -f "$PACK" &&
+ export PACK
+'
+
+test_perf 'index-pack 0 threads' '
+ GIT_DIR=t1 git index-pack --threads=1 --stdin < $PACK
+'
+
+test_perf 'index-pack 1 thread ' '
+ GIT_DIR=t2 GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK
+'
+
+test_perf 'index-pack 2 threads' '
+ GIT_DIR=t3 git index-pack --threads=2 --stdin < $PACK
+'
+
+test_perf 'index-pack 4 threads' '
+ GIT_DIR=t4 git index-pack --threads=4 --stdin < $PACK
+'
+
+test_perf 'index-pack 8 threads' '
+ GIT_DIR=t5 git index-pack --threads=8 --stdin < $PACK
+'
+
+test_perf 'index-pack default number of threads' '
+ GIT_DIR=t6 git index-pack --stdin < $PACK
+'
+
+test_done
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/4] index-pack: disable threading if NO_PREAD is defined
2012-05-06 12:31 ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
2012-05-06 12:31 ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
2012-05-06 12:31 ` [PATCH 3/4] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
@ 2012-05-06 12:31 ` Nguyễn Thái Ngọc Duy
2 siblings, 0 replies; 13+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-05-06 12:31 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
NO_PREAD simulates pread() as a sequence of seek, read, seek in
compat/pread.c. The simulation is not thread-safe because another
thread could move the file offset away in the middle of pread
operation. Do not allow threading in that case.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/index-pack.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index d4685c5..807ee56 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -39,6 +39,11 @@ struct base_data {
int ofs_first, ofs_last;
};
+#if !defined(NO_PTHREADS) && defined(NO_PREAD)
+/* NO_PREAD uses compat/pread.c, which is not thread-safe. Disable threading. */
+#define NO_PTHREADS
+#endif
+
struct thread_local {
#ifndef NO_PTHREADS
pthread_t thread;
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] index-pack: restructure pack processing into three main functions
2012-05-06 12:31 ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
@ 2012-05-08 0:19 ` Junio C Hamano
0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2012-05-08 0:19 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> The second pass in parse_pack_objects() are split into
> resolve_deltas(). The final phase, fixing thin pack or just seal the
> pack, is now in conclude_pack() function. Main pack processing is now a
> sequence of these functions:
>
> - parse_pack_objects() reads through the input pack
> - resolve_deltas() makes sure all deltas can be resolved
> - conclude_pack() seals the output pack
> - write_idx_file() writes companion index file
> - final() moves the pack/index to proper place
The resulting code looks much more streamlined, instead of just splitting
out the second phase alone.
Very nice. Thanks.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2012-05-08 0:20 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-11 5:49 [PATCH v3 0/3] Multithread index-pack Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 1/3] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 2/3] index-pack: split second pass obj handling into own function Nguyễn Thái Ngọc Duy
2012-04-11 5:49 ` [PATCH v3 3/3] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
2012-05-03 22:10 ` Junio C Hamano
2012-05-04 6:21 ` Junio C Hamano
2012-05-04 12:50 ` Nguyen Thai Ngoc Duy
2012-05-04 15:23 ` Junio C Hamano
2012-05-06 12:31 ` [PATCH 1/4] compat/win32/pthread.h: Add an pthread_key_delete() implementation Nguyễn Thái Ngọc Duy
2012-05-06 12:31 ` [PATCH 2/4] index-pack: restructure pack processing into three main functions Nguyễn Thái Ngọc Duy
2012-05-08 0:19 ` Junio C Hamano
2012-05-06 12:31 ` [PATCH 3/4] index-pack: support multithreaded delta resolving Nguyễn Thái Ngọc Duy
2012-05-06 12:31 ` [PATCH 4/4] index-pack: disable threading if NO_PREAD is defined Nguyễn Thái Ngọc Duy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).