* [PATCH 1/9] fsck: do not reuse child_process structs
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
@ 2018-11-12 14:46 ` Jeff King
2018-11-12 15:26 ` Derrick Stolee
2018-11-12 14:47 ` [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with() Jeff King
` (8 subsequent siblings)
9 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 14:46 UTC (permalink / raw)
To: Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
The run-command API makes no promises about what is left in a struct
child_process after a command finishes, and it's not safe to simply
reuse it again for a similar command. In particular:

  - if you use child->args or child->env_array, they are cleared after
    finish_command()

  - likewise, start_command() may point child->argv at child->args->argv;
    reusing that would lead to accessing freed memory

  - the in/out/err may hold pipe descriptors from the previous run
These two calls are _probably_ OK because they do not use any of those
features. But it's only by chance, and may break in the future; let's
reinitialize our struct for each program we run.
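The reinitialize-per-run pattern can be sketched with a tiny stand-in
struct; this is illustrative only, not the real run-command API (field
names here mimic git's struct child_process, but the type is a mock):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for git's struct child_process; illustrative only. */
struct child_process {
	const char **argv; /* may point at storage freed by finish_command() */
	int in, out, err;  /* may hold pipe descriptors from a previous run */
	int git_cmd;
};

/*
 * Reset every field to a known-clean state before reuse, rather than
 * trusting whatever the previous run left behind.
 */
static void child_process_init(struct child_process *child)
{
	memset(child, 0, sizeof(*child));
}
```

After child_process_init(), the caller re-sets argv and git_cmd for each
run, which is exactly what the hunks below do inside the loop.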
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/fsck.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 06eb421720..b10f2b154c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -841,6 +841,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
prepare_alt_odb(the_repository);
for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+ child_process_init(&commit_graph_verify);
+ commit_graph_verify.argv = verify_argv;
+ commit_graph_verify.git_cmd = 1;
verify_argv[2] = "--object-dir";
verify_argv[3] = alt->path;
if (run_command(&commit_graph_verify))
@@ -859,6 +862,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
prepare_alt_odb(the_repository);
for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+ child_process_init(&midx_verify);
+ midx_verify.argv = midx_argv;
+ midx_verify.git_cmd = 1;
midx_argv[2] = "--object-dir";
midx_argv[3] = alt->path;
if (run_command(&midx_verify))
--
2.19.1.1577.g2c5b293d4f
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH 1/9] fsck: do not reuse child_process structs
2018-11-12 14:46 ` [PATCH 1/9] fsck: do not reuse child_process structs Jeff King
@ 2018-11-12 15:26 ` Derrick Stolee
0 siblings, 0 replies; 99+ messages in thread
From: Derrick Stolee @ 2018-11-12 15:26 UTC (permalink / raw)
To: Jeff King, Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
On 11/12/2018 9:46 AM, Jeff King wrote:
> The run-command API makes no promises about what is left in a struct
> child_process after a command finishes, and it's not safe to simply
> reuse it again for a similar command. In particular:
>
> - if you use child->args or child->env_array, they are cleared after
> finish_command()
>
> - likewise, start_command() may point child->argv at child->args->argv;
> reusing that would lead to accessing freed memory
>
> - the in/out/err may hold pipe descriptors from the previous run
Thanks! This is helpful information.
> These two calls are _probably_ OK because they do not use any of those
> features. But it's only by chance, and may break in the future; let's
> reinitialize our struct for each program we run.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> builtin/fsck.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 06eb421720..b10f2b154c 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -841,6 +841,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>
> prepare_alt_odb(the_repository);
> for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
> + child_process_init(&commit_graph_verify);
> + commit_graph_verify.argv = verify_argv;
> + commit_graph_verify.git_cmd = 1;
> verify_argv[2] = "--object-dir";
> verify_argv[3] = alt->path;
> if (run_command(&commit_graph_verify))
> @@ -859,6 +862,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>
> prepare_alt_odb(the_repository);
> for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
> + child_process_init(&midx_verify);
> + midx_verify.argv = midx_argv;
> + midx_verify.git_cmd = 1;
> midx_argv[2] = "--object-dir";
> midx_argv[3] = alt->path;
> if (run_command(&midx_verify))
Looks good to me.
-Stolee
* [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with()
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
2018-11-12 14:46 ` [PATCH 1/9] fsck: do not reuse child_process structs Jeff King
@ 2018-11-12 14:47 ` Jeff King
2018-11-12 18:23 ` Stefan Beller
2018-11-12 14:48 ` [PATCH 3/9] rename "alternate_object_database" to "object_directory" Jeff King
` (7 subsequent siblings)
9 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 14:47 UTC (permalink / raw)
To: Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
Using strip_suffix() lets us avoid repeating ourselves. It also makes
the handling of "/" a bit less subtle (we strip one less character than
we matched in order to leave it in place, but we can just as easily
include the "/" when we add more path components).
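The contract being relied on can be sketched with a stand-in that has
the same interface as git's strip_suffix() (this is a sketch, not the
actual implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in with the same contract as git's strip_suffix(): return 1 if
 * "str" ends with "suffix", storing the length of the remaining prefix
 * in *len; otherwise return 0 and leave *len untouched.
 */
static int strip_suffix(const char *str, const char *suffix, size_t *len)
{
	size_t slen = strlen(str), sufflen = strlen(suffix);

	if (slen < sufflen || strcmp(str + slen - sufflen, suffix))
		return 0;
	*len = slen - sufflen;
	return 1;
}
```

Matching "/objects" (rather than "objects") means the returned length
excludes the slash, so the caller adds it back explicitly when appending
"/modules/<name>/", as the hunk below does.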
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/submodule--helper.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 676175b9be..28b9449e82 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1268,16 +1268,17 @@ static int add_possible_reference_from_superproject(
struct alternate_object_database *alt, void *sas_cb)
{
struct submodule_alternate_setup *sas = sas_cb;
+ size_t len;
/*
* If the alternate object store is another repository, try the
* standard layout with .git/(modules/<name>)+/objects
*/
- if (ends_with(alt->path, "/objects")) {
+ if (strip_suffix(alt->path, "/objects", &len)) {
char *sm_alternate;
struct strbuf sb = STRBUF_INIT;
struct strbuf err = STRBUF_INIT;
- strbuf_add(&sb, alt->path, strlen(alt->path) - strlen("objects"));
+ strbuf_add(&sb, alt->path, len);
/*
* We need to end the new path with '/' to mark it as a dir,
@@ -1285,7 +1286,7 @@ static int add_possible_reference_from_superproject(
* as the last part of a missing submodule reference would
* be taken as a file name.
*/
- strbuf_addf(&sb, "modules/%s/", sas->submodule_name);
+ strbuf_addf(&sb, "/modules/%s/", sas->submodule_name);
sm_alternate = compute_alternate_path(sb.buf, &err);
if (sm_alternate) {
--
2.19.1.1577.g2c5b293d4f
* Re: [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with()
2018-11-12 14:47 ` [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with() Jeff King
@ 2018-11-12 18:23 ` Stefan Beller
0 siblings, 0 replies; 99+ messages in thread
From: Stefan Beller @ 2018-11-12 18:23 UTC (permalink / raw)
To: Jeff King
Cc: gerardu, Ævar Arnfjörð Bjarmason, Junio C Hamano,
git, René Scharfe, tikuta
On Mon, Nov 12, 2018 at 6:47 AM Jeff King <peff@peff.net> wrote:
>
> Using strip_suffix() lets us avoid repeating ourselves. It also makes
> the handling of "/" a bit less subtle (we strip one less character than
> we matched in order to leave it in place, but we can just as easily
> include the "/" when we add more path components).
>
> Signed-off-by: Jeff King <peff@peff.net>
This makes sense. Thanks!
(This patch caught my attention as it's a submodule thing,
but now looking at the rest of the series)
* [PATCH 3/9] rename "alternate_object_database" to "object_directory"
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
2018-11-12 14:46 ` [PATCH 1/9] fsck: do not reuse child_process structs Jeff King
2018-11-12 14:47 ` [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with() Jeff King
@ 2018-11-12 14:48 ` Jeff King
2018-11-12 15:30 ` Derrick Stolee
2018-11-12 14:48 ` [PATCH 4/9] sha1_file_name(): overwrite buffer instead of appending Jeff King
` (6 subsequent siblings)
9 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 14:48 UTC (permalink / raw)
To: Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
In preparation for unifying the handling of alt odb's and the normal
repo object directory, let's use a more neutral name. This patch is
purely mechanical, swapping the type name, and converting any variables
named "alt" to "odb". There should be no functional change, but it will
reduce the noise in subsequent diffs.
Signed-off-by: Jeff King <peff@peff.net>
---
I waffled on calling this object_database instead of object_directory.
But really, it is very specifically about the directory (packed
storage, including packs from alternates, is handled elsewhere).
builtin/count-objects.c | 4 ++--
builtin/fsck.c | 16 ++++++-------
builtin/submodule--helper.c | 6 ++---
commit-graph.c | 10 ++++----
object-store.h | 14 +++++------
object.c | 10 ++++----
packfile.c | 8 +++----
sha1-file.c | 48 ++++++++++++++++++-------------------
sha1-name.c | 20 ++++++++--------
transport.c | 2 +-
10 files changed, 69 insertions(+), 69 deletions(-)
diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index a7cad052c6..3fae474f6f 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -78,10 +78,10 @@ static int count_cruft(const char *basename, const char *path, void *data)
return 0;
}
-static int print_alternate(struct alternate_object_database *alt, void *data)
+static int print_alternate(struct object_directory *odb, void *data)
{
printf("alternate: ");
- quote_c_style(alt->path, NULL, stdout, 0);
+ quote_c_style(odb->path, NULL, stdout, 0);
putchar('\n');
return 0;
}
diff --git a/builtin/fsck.c b/builtin/fsck.c
index b10f2b154c..55153cf92a 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -688,7 +688,7 @@ static struct option fsck_opts[] = {
int cmd_fsck(int argc, const char **argv, const char *prefix)
{
int i;
- struct alternate_object_database *alt;
+ struct object_directory *odb;
/* fsck knows how to handle missing promisor objects */
fetch_if_missing = 0;
@@ -725,14 +725,14 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
} else {
- struct alternate_object_database *alt_odb_list;
+ struct object_directory *alt_odb_list;
fsck_object_dir(get_object_directory());
prepare_alt_odb(the_repository);
alt_odb_list = the_repository->objects->alt_odb_list;
- for (alt = alt_odb_list; alt; alt = alt->next)
- fsck_object_dir(alt->path);
+ for (odb = alt_odb_list; odb; odb = odb->next)
+ fsck_object_dir(odb->path);
if (check_full) {
struct packed_git *p;
@@ -840,12 +840,12 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
errors_found |= ERROR_COMMIT_GRAPH;
prepare_alt_odb(the_repository);
- for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+ for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
child_process_init(&commit_graph_verify);
commit_graph_verify.argv = verify_argv;
commit_graph_verify.git_cmd = 1;
verify_argv[2] = "--object-dir";
- verify_argv[3] = alt->path;
+ verify_argv[3] = odb->path;
if (run_command(&commit_graph_verify))
errors_found |= ERROR_COMMIT_GRAPH;
}
@@ -861,12 +861,12 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
errors_found |= ERROR_COMMIT_GRAPH;
prepare_alt_odb(the_repository);
- for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+ for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
child_process_init(&midx_verify);
midx_verify.argv = midx_argv;
midx_verify.git_cmd = 1;
midx_argv[2] = "--object-dir";
- midx_argv[3] = alt->path;
+ midx_argv[3] = odb->path;
if (run_command(&midx_verify))
errors_found |= ERROR_COMMIT_GRAPH;
}
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 28b9449e82..3ae451bc46 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1265,7 +1265,7 @@ struct submodule_alternate_setup {
SUBMODULE_ALTERNATE_ERROR_IGNORE, NULL }
static int add_possible_reference_from_superproject(
- struct alternate_object_database *alt, void *sas_cb)
+ struct object_directory *odb, void *sas_cb)
{
struct submodule_alternate_setup *sas = sas_cb;
size_t len;
@@ -1274,11 +1274,11 @@ static int add_possible_reference_from_superproject(
* If the alternate object store is another repository, try the
* standard layout with .git/(modules/<name>)+/objects
*/
- if (strip_suffix(alt->path, "/objects", &len)) {
+ if (strip_suffix(odb->path, "/objects", &len)) {
char *sm_alternate;
struct strbuf sb = STRBUF_INIT;
struct strbuf err = STRBUF_INIT;
- strbuf_add(&sb, alt->path, len);
+ strbuf_add(&sb, odb->path, len);
/*
* We need to end the new path with '/' to mark it as a dir,
diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..5dd3f5b15c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -230,7 +230,7 @@ static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
*/
static int prepare_commit_graph(struct repository *r)
{
- struct alternate_object_database *alt;
+ struct object_directory *odb;
char *obj_dir;
int config_value;
@@ -255,10 +255,10 @@ static int prepare_commit_graph(struct repository *r)
obj_dir = r->objects->objectdir;
prepare_commit_graph_one(r, obj_dir);
prepare_alt_odb(r);
- for (alt = r->objects->alt_odb_list;
- !r->objects->commit_graph && alt;
- alt = alt->next)
- prepare_commit_graph_one(r, alt->path);
+ for (odb = r->objects->alt_odb_list;
+ !r->objects->commit_graph && odb;
+ odb = odb->next)
+ prepare_commit_graph_one(r, odb->path);
return !!r->objects->commit_graph;
}
diff --git a/object-store.h b/object-store.h
index 63b7605a3e..122d5f75e2 100644
--- a/object-store.h
+++ b/object-store.h
@@ -7,8 +7,8 @@
#include "sha1-array.h"
#include "strbuf.h"
-struct alternate_object_database {
- struct alternate_object_database *next;
+struct object_directory {
+ struct object_directory *next;
/* see alt_scratch_buf() */
struct strbuf scratch;
@@ -32,14 +32,14 @@ struct alternate_object_database {
};
void prepare_alt_odb(struct repository *r);
char *compute_alternate_path(const char *path, struct strbuf *err);
-typedef int alt_odb_fn(struct alternate_object_database *, void *);
+typedef int alt_odb_fn(struct object_directory *, void *);
int foreach_alt_odb(alt_odb_fn, void*);
/*
* Allocate a "struct alternate_object_database" but do _not_ actually
* add it to the list of alternates.
*/
-struct alternate_object_database *alloc_alt_odb(const char *dir);
+struct object_directory *alloc_alt_odb(const char *dir);
/*
* Add the directory to the on-disk alternates file; the new entry will also
@@ -60,7 +60,7 @@ void add_to_alternates_memory(const char *dir);
* alternate. Always use this over direct access to alt->scratch, as it
* cleans up any previous use of the scratch buffer.
*/
-struct strbuf *alt_scratch_buf(struct alternate_object_database *alt);
+struct strbuf *alt_scratch_buf(struct object_directory *odb);
struct packed_git {
struct packed_git *next;
@@ -100,8 +100,8 @@ struct raw_object_store {
/* Path to extra alternate object database if not NULL */
char *alternate_db;
- struct alternate_object_database *alt_odb_list;
- struct alternate_object_database **alt_odb_tail;
+ struct object_directory *alt_odb_list;
+ struct object_directory **alt_odb_tail;
/*
* Objects that should be substituted by other objects
diff --git a/object.c b/object.c
index e54160550c..6af8e908bb 100644
--- a/object.c
+++ b/object.c
@@ -482,17 +482,17 @@ struct raw_object_store *raw_object_store_new(void)
return o;
}
-static void free_alt_odb(struct alternate_object_database *alt)
+static void free_alt_odb(struct object_directory *odb)
{
- strbuf_release(&alt->scratch);
- oid_array_clear(&alt->loose_objects_cache);
- free(alt);
+ strbuf_release(&odb->scratch);
+ oid_array_clear(&odb->loose_objects_cache);
+ free(odb);
}
static void free_alt_odbs(struct raw_object_store *o)
{
while (o->alt_odb_list) {
- struct alternate_object_database *next;
+ struct object_directory *next;
next = o->alt_odb_list->next;
free_alt_odb(o->alt_odb_list);
diff --git a/packfile.c b/packfile.c
index f2850a00b5..d6d511cfd2 100644
--- a/packfile.c
+++ b/packfile.c
@@ -966,16 +966,16 @@ static void prepare_packed_git_mru(struct repository *r)
static void prepare_packed_git(struct repository *r)
{
- struct alternate_object_database *alt;
+ struct object_directory *odb;
if (r->objects->packed_git_initialized)
return;
prepare_multi_pack_index_one(r, r->objects->objectdir, 1);
prepare_packed_git_one(r, r->objects->objectdir, 1);
prepare_alt_odb(r);
- for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
- prepare_multi_pack_index_one(r, alt->path, 0);
- prepare_packed_git_one(r, alt->path, 0);
+ for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
+ prepare_multi_pack_index_one(r, odb->path, 0);
+ prepare_packed_git_one(r, odb->path, 0);
}
rearrange_packed_git(r);
diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..a3cc650a0a 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -353,16 +353,16 @@ void sha1_file_name(struct repository *r, struct strbuf *buf, const unsigned cha
fill_sha1_path(buf, sha1);
}
-struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
+struct strbuf *alt_scratch_buf(struct object_directory *odb)
{
- strbuf_setlen(&alt->scratch, alt->base_len);
- return &alt->scratch;
+ strbuf_setlen(&odb->scratch, odb->base_len);
+ return &odb->scratch;
}
-static const char *alt_sha1_path(struct alternate_object_database *alt,
+static const char *alt_sha1_path(struct object_directory *odb,
const unsigned char *sha1)
{
- struct strbuf *buf = alt_scratch_buf(alt);
+ struct strbuf *buf = alt_scratch_buf(odb);
fill_sha1_path(buf, sha1);
return buf->buf;
}
@@ -374,7 +374,7 @@ static int alt_odb_usable(struct raw_object_store *o,
struct strbuf *path,
const char *normalized_objdir)
{
- struct alternate_object_database *alt;
+ struct object_directory *odb;
/* Detect cases where alternate disappeared */
if (!is_directory(path->buf)) {
@@ -388,8 +388,8 @@ static int alt_odb_usable(struct raw_object_store *o,
* Prevent the common mistake of listing the same
* thing twice, or object directory itself.
*/
- for (alt = o->alt_odb_list; alt; alt = alt->next) {
- if (!fspathcmp(path->buf, alt->path))
+ for (odb = o->alt_odb_list; odb; odb = odb->next) {
+ if (!fspathcmp(path->buf, odb->path))
return 0;
}
if (!fspathcmp(path->buf, normalized_objdir))
@@ -402,7 +402,7 @@ static int alt_odb_usable(struct raw_object_store *o,
* Prepare alternate object database registry.
*
* The variable alt_odb_list points at the list of struct
- * alternate_object_database. The elements on this list come from
+ * object_directory. The elements on this list come from
* non-empty elements from colon separated ALTERNATE_DB_ENVIRONMENT
* environment variable, and $GIT_OBJECT_DIRECTORY/info/alternates,
* whose contents is similar to that environment variable but can be
@@ -419,7 +419,7 @@ static void read_info_alternates(struct repository *r,
static int link_alt_odb_entry(struct repository *r, const char *entry,
const char *relative_base, int depth, const char *normalized_objdir)
{
- struct alternate_object_database *ent;
+ struct object_directory *ent;
struct strbuf pathbuf = STRBUF_INIT;
if (!is_absolute_path(entry) && relative_base) {
@@ -540,9 +540,9 @@ static void read_info_alternates(struct repository *r,
free(path);
}
-struct alternate_object_database *alloc_alt_odb(const char *dir)
+struct object_directory *alloc_alt_odb(const char *dir)
{
- struct alternate_object_database *ent;
+ struct object_directory *ent;
FLEX_ALLOC_STR(ent, path, dir);
strbuf_init(&ent->scratch, 0);
@@ -684,7 +684,7 @@ char *compute_alternate_path(const char *path, struct strbuf *err)
int foreach_alt_odb(alt_odb_fn fn, void *cb)
{
- struct alternate_object_database *ent;
+ struct object_directory *ent;
int r = 0;
prepare_alt_odb(the_repository);
@@ -743,10 +743,10 @@ static int check_and_freshen_local(const struct object_id *oid, int freshen)
static int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
{
- struct alternate_object_database *alt;
+ struct object_directory *odb;
prepare_alt_odb(the_repository);
- for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
- const char *path = alt_sha1_path(alt, oid->hash);
+ for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
+ const char *path = alt_sha1_path(odb, oid->hash);
if (check_and_freshen_file(path, freshen))
return 1;
}
@@ -893,7 +893,7 @@ int git_open_cloexec(const char *name, int flags)
static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
struct stat *st, const char **path)
{
- struct alternate_object_database *alt;
+ struct object_directory *odb;
static struct strbuf buf = STRBUF_INIT;
strbuf_reset(&buf);
@@ -905,8 +905,8 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
prepare_alt_odb(r);
errno = ENOENT;
- for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
- *path = alt_sha1_path(alt, sha1);
+ for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
+ *path = alt_sha1_path(odb, sha1);
if (!lstat(*path, st))
return 0;
}
@@ -922,7 +922,7 @@ static int open_sha1_file(struct repository *r,
const unsigned char *sha1, const char **path)
{
int fd;
- struct alternate_object_database *alt;
+ struct object_directory *odb;
int most_interesting_errno;
static struct strbuf buf = STRBUF_INIT;
@@ -936,8 +936,8 @@ static int open_sha1_file(struct repository *r,
most_interesting_errno = errno;
prepare_alt_odb(r);
- for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
- *path = alt_sha1_path(alt, sha1);
+ for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
+ *path = alt_sha1_path(odb, sha1);
fd = git_open(*path);
if (fd >= 0)
return fd;
@@ -2139,14 +2139,14 @@ struct loose_alt_odb_data {
void *data;
};
-static int loose_from_alt_odb(struct alternate_object_database *alt,
+static int loose_from_alt_odb(struct object_directory *odb,
void *vdata)
{
struct loose_alt_odb_data *data = vdata;
struct strbuf buf = STRBUF_INIT;
int r;
- strbuf_addstr(&buf, alt->path);
+ strbuf_addstr(&buf, odb->path);
r = for_each_loose_file_in_objdir_buf(&buf,
data->cb, NULL, NULL,
data->data);
diff --git a/sha1-name.c b/sha1-name.c
index faa60f69e3..2594aa79f8 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -95,8 +95,8 @@ static int match_sha(unsigned, const unsigned char *, const unsigned char *);
static void find_short_object_filename(struct disambiguate_state *ds)
{
int subdir_nr = ds->bin_pfx.hash[0];
- struct alternate_object_database *alt;
- static struct alternate_object_database *fakeent;
+ struct object_directory *odb;
+ static struct object_directory *fakeent;
if (!fakeent) {
/*
@@ -110,24 +110,24 @@ static void find_short_object_filename(struct disambiguate_state *ds)
}
fakeent->next = the_repository->objects->alt_odb_list;
- for (alt = fakeent; alt && !ds->ambiguous; alt = alt->next) {
+ for (odb = fakeent; odb && !ds->ambiguous; odb = odb->next) {
int pos;
- if (!alt->loose_objects_subdir_seen[subdir_nr]) {
- struct strbuf *buf = alt_scratch_buf(alt);
+ if (!odb->loose_objects_subdir_seen[subdir_nr]) {
+ struct strbuf *buf = alt_scratch_buf(odb);
for_each_file_in_obj_subdir(subdir_nr, buf,
append_loose_object,
NULL, NULL,
- &alt->loose_objects_cache);
- alt->loose_objects_subdir_seen[subdir_nr] = 1;
+ &odb->loose_objects_cache);
+ odb->loose_objects_subdir_seen[subdir_nr] = 1;
}
- pos = oid_array_lookup(&alt->loose_objects_cache, &ds->bin_pfx);
+ pos = oid_array_lookup(&odb->loose_objects_cache, &ds->bin_pfx);
if (pos < 0)
pos = -1 - pos;
- while (!ds->ambiguous && pos < alt->loose_objects_cache.nr) {
+ while (!ds->ambiguous && pos < odb->loose_objects_cache.nr) {
const struct object_id *oid;
- oid = alt->loose_objects_cache.oid + pos;
+ oid = odb->loose_objects_cache.oid + pos;
if (!match_sha(ds->len, ds->bin_pfx.hash, oid->hash))
break;
update_candidates(ds, oid);
diff --git a/transport.c b/transport.c
index 5a74b609ff..040e92c134 100644
--- a/transport.c
+++ b/transport.c
@@ -1433,7 +1433,7 @@ struct alternate_refs_data {
void *data;
};
-static int refs_from_alternate_cb(struct alternate_object_database *e,
+static int refs_from_alternate_cb(struct object_directory *e,
void *data)
{
struct strbuf path = STRBUF_INIT;
--
2.19.1.1577.g2c5b293d4f
* Re: [PATCH 3/9] rename "alternate_object_database" to "object_directory"
2018-11-12 14:48 ` [PATCH 3/9] rename "alternate_object_database" to "object_directory" Jeff King
@ 2018-11-12 15:30 ` Derrick Stolee
2018-11-12 15:36 ` Jeff King
0 siblings, 1 reply; 99+ messages in thread
From: Derrick Stolee @ 2018-11-12 15:30 UTC (permalink / raw)
To: Jeff King, Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
On 11/12/2018 9:48 AM, Jeff King wrote:
> In preparation for unifying the handling of alt odb's and the normal
> repo object directory, let's use a more neutral name. This patch is
> purely mechanical, swapping the type name, and converting any variables
> named "alt" to "odb". There should be no functional change, but it will
> reduce the noise in subsequent diffs.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I waffled on calling this object_database instead of object_directory.
> But really, it is very specifically about the directory (packed
> storage, including packs from alternates, is handled elsewhere).
That makes sense. Each alternate makes its own object directory, but is
part of a larger object database. It also helps clarify a difference
from the object_store.
My only complaint is that you have a lot of variable names with "odb"
which are now object_directory pointers. Perhaps "odb" -> "objdir"? Or
is that just too much change?
>
> builtin/count-objects.c | 4 ++--
> builtin/fsck.c | 16 ++++++-------
> builtin/submodule--helper.c | 6 ++---
> commit-graph.c | 10 ++++----
> object-store.h | 14 +++++------
> object.c | 10 ++++----
> packfile.c | 8 +++----
> sha1-file.c | 48 ++++++++++++++++++-------------------
> sha1-name.c | 20 ++++++++--------
> transport.c | 2 +-
> 10 files changed, 69 insertions(+), 69 deletions(-)
>
> diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> index a7cad052c6..3fae474f6f 100644
> --- a/builtin/count-objects.c
> +++ b/builtin/count-objects.c
> @@ -78,10 +78,10 @@ static int count_cruft(const char *basename, const char *path, void *data)
> return 0;
> }
>
> -static int print_alternate(struct alternate_object_database *alt, void *data)
> +static int print_alternate(struct object_directory *odb, void *data)
> {
> printf("alternate: ");
> - quote_c_style(alt->path, NULL, stdout, 0);
> + quote_c_style(odb->path, NULL, stdout, 0);
> putchar('\n');
> return 0;
> }
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index b10f2b154c..55153cf92a 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -688,7 +688,7 @@ static struct option fsck_opts[] = {
> int cmd_fsck(int argc, const char **argv, const char *prefix)
> {
> int i;
> - struct alternate_object_database *alt;
> + struct object_directory *odb;
>
> /* fsck knows how to handle missing promisor objects */
> fetch_if_missing = 0;
> @@ -725,14 +725,14 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
> for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
> } else {
> - struct alternate_object_database *alt_odb_list;
> + struct object_directory *alt_odb_list;
>
> fsck_object_dir(get_object_directory());
>
> prepare_alt_odb(the_repository);
> alt_odb_list = the_repository->objects->alt_odb_list;
> - for (alt = alt_odb_list; alt; alt = alt->next)
> - fsck_object_dir(alt->path);
> + for (odb = alt_odb_list; odb; odb = odb->next)
> + fsck_object_dir(odb->path);
>
> if (check_full) {
> struct packed_git *p;
> @@ -840,12 +840,12 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> errors_found |= ERROR_COMMIT_GRAPH;
>
> prepare_alt_odb(the_repository);
> - for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
> + for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
> child_process_init(&commit_graph_verify);
> commit_graph_verify.argv = verify_argv;
> commit_graph_verify.git_cmd = 1;
> verify_argv[2] = "--object-dir";
> - verify_argv[3] = alt->path;
> + verify_argv[3] = odb->path;
> if (run_command(&commit_graph_verify))
> errors_found |= ERROR_COMMIT_GRAPH;
> }
> @@ -861,12 +861,12 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> errors_found |= ERROR_COMMIT_GRAPH;
>
> prepare_alt_odb(the_repository);
> - for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
> + for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
> child_process_init(&midx_verify);
> midx_verify.argv = midx_argv;
> midx_verify.git_cmd = 1;
> midx_argv[2] = "--object-dir";
> - midx_argv[3] = alt->path;
> + midx_argv[3] = odb->path;
> if (run_command(&midx_verify))
> errors_found |= ERROR_COMMIT_GRAPH;
> }
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 28b9449e82..3ae451bc46 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1265,7 +1265,7 @@ struct submodule_alternate_setup {
> SUBMODULE_ALTERNATE_ERROR_IGNORE, NULL }
>
> static int add_possible_reference_from_superproject(
> - struct alternate_object_database *alt, void *sas_cb)
> + struct object_directory *odb, void *sas_cb)
> {
> struct submodule_alternate_setup *sas = sas_cb;
> size_t len;
> @@ -1274,11 +1274,11 @@ static int add_possible_reference_from_superproject(
> * If the alternate object store is another repository, try the
> * standard layout with .git/(modules/<name>)+/objects
> */
> - if (strip_suffix(alt->path, "/objects", &len)) {
> + if (strip_suffix(odb->path, "/objects", &len)) {
> char *sm_alternate;
> struct strbuf sb = STRBUF_INIT;
> struct strbuf err = STRBUF_INIT;
> - strbuf_add(&sb, alt->path, len);
> + strbuf_add(&sb, odb->path, len);
>
> /*
> * We need to end the new path with '/' to mark it as a dir,
> diff --git a/commit-graph.c b/commit-graph.c
> index 40c855f185..5dd3f5b15c 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -230,7 +230,7 @@ static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
> */
> static int prepare_commit_graph(struct repository *r)
> {
> - struct alternate_object_database *alt;
> + struct object_directory *odb;
> char *obj_dir;
> int config_value;
>
> @@ -255,10 +255,10 @@ static int prepare_commit_graph(struct repository *r)
> obj_dir = r->objects->objectdir;
> prepare_commit_graph_one(r, obj_dir);
> prepare_alt_odb(r);
> - for (alt = r->objects->alt_odb_list;
> - !r->objects->commit_graph && alt;
> - alt = alt->next)
> - prepare_commit_graph_one(r, alt->path);
> + for (odb = r->objects->alt_odb_list;
> + !r->objects->commit_graph && odb;
> + odb = odb->next)
> + prepare_commit_graph_one(r, odb->path);
> return !!r->objects->commit_graph;
> }
>
> diff --git a/object-store.h b/object-store.h
> index 63b7605a3e..122d5f75e2 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -7,8 +7,8 @@
> #include "sha1-array.h"
> #include "strbuf.h"
>
> -struct alternate_object_database {
> - struct alternate_object_database *next;
> +struct object_directory {
> + struct object_directory *next;
>
> /* see alt_scratch_buf() */
> struct strbuf scratch;
> @@ -32,14 +32,14 @@ struct alternate_object_database {
> };
> void prepare_alt_odb(struct repository *r);
> char *compute_alternate_path(const char *path, struct strbuf *err);
> -typedef int alt_odb_fn(struct alternate_object_database *, void *);
> +typedef int alt_odb_fn(struct object_directory *, void *);
> int foreach_alt_odb(alt_odb_fn, void*);
>
> /*
> * Allocate a "struct alternate_object_database" but do _not_ actually
> * add it to the list of alternates.
> */
> -struct alternate_object_database *alloc_alt_odb(const char *dir);
> +struct object_directory *alloc_alt_odb(const char *dir);
>
> /*
> * Add the directory to the on-disk alternates file; the new entry will also
> @@ -60,7 +60,7 @@ void add_to_alternates_memory(const char *dir);
> * alternate. Always use this over direct access to alt->scratch, as it
> * cleans up any previous use of the scratch buffer.
> */
> -struct strbuf *alt_scratch_buf(struct alternate_object_database *alt);
> +struct strbuf *alt_scratch_buf(struct object_directory *odb);
>
> struct packed_git {
> struct packed_git *next;
> @@ -100,8 +100,8 @@ struct raw_object_store {
> /* Path to extra alternate object database if not NULL */
> char *alternate_db;
>
> - struct alternate_object_database *alt_odb_list;
> - struct alternate_object_database **alt_odb_tail;
> + struct object_directory *alt_odb_list;
> + struct object_directory **alt_odb_tail;
>
> /*
> * Objects that should be substituted by other objects
> diff --git a/object.c b/object.c
> index e54160550c..6af8e908bb 100644
> --- a/object.c
> +++ b/object.c
> @@ -482,17 +482,17 @@ struct raw_object_store *raw_object_store_new(void)
> return o;
> }
>
> -static void free_alt_odb(struct alternate_object_database *alt)
> +static void free_alt_odb(struct object_directory *odb)
> {
> - strbuf_release(&alt->scratch);
> - oid_array_clear(&alt->loose_objects_cache);
> - free(alt);
> + strbuf_release(&odb->scratch);
> + oid_array_clear(&odb->loose_objects_cache);
> + free(odb);
> }
>
> static void free_alt_odbs(struct raw_object_store *o)
> {
> while (o->alt_odb_list) {
> - struct alternate_object_database *next;
> + struct object_directory *next;
>
> next = o->alt_odb_list->next;
> free_alt_odb(o->alt_odb_list);
> diff --git a/packfile.c b/packfile.c
> index f2850a00b5..d6d511cfd2 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -966,16 +966,16 @@ static void prepare_packed_git_mru(struct repository *r)
>
> static void prepare_packed_git(struct repository *r)
> {
> - struct alternate_object_database *alt;
> + struct object_directory *odb;
>
> if (r->objects->packed_git_initialized)
> return;
> prepare_multi_pack_index_one(r, r->objects->objectdir, 1);
> prepare_packed_git_one(r, r->objects->objectdir, 1);
> prepare_alt_odb(r);
> - for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
> - prepare_multi_pack_index_one(r, alt->path, 0);
> - prepare_packed_git_one(r, alt->path, 0);
> + for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> + prepare_multi_pack_index_one(r, odb->path, 0);
> + prepare_packed_git_one(r, odb->path, 0);
> }
> rearrange_packed_git(r);
>
> diff --git a/sha1-file.c b/sha1-file.c
> index dd0b6aa873..a3cc650a0a 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -353,16 +353,16 @@ void sha1_file_name(struct repository *r, struct strbuf *buf, const unsigned cha
> fill_sha1_path(buf, sha1);
> }
>
> -struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
> +struct strbuf *alt_scratch_buf(struct object_directory *odb)
> {
> - strbuf_setlen(&alt->scratch, alt->base_len);
> - return &alt->scratch;
> + strbuf_setlen(&odb->scratch, odb->base_len);
> + return &odb->scratch;
> }
>
> -static const char *alt_sha1_path(struct alternate_object_database *alt,
> +static const char *alt_sha1_path(struct object_directory *odb,
> const unsigned char *sha1)
> {
> - struct strbuf *buf = alt_scratch_buf(alt);
> + struct strbuf *buf = alt_scratch_buf(odb);
> fill_sha1_path(buf, sha1);
> return buf->buf;
> }
> @@ -374,7 +374,7 @@ static int alt_odb_usable(struct raw_object_store *o,
> struct strbuf *path,
> const char *normalized_objdir)
> {
> - struct alternate_object_database *alt;
> + struct object_directory *odb;
>
> /* Detect cases where alternate disappeared */
> if (!is_directory(path->buf)) {
> @@ -388,8 +388,8 @@ static int alt_odb_usable(struct raw_object_store *o,
> * Prevent the common mistake of listing the same
> * thing twice, or object directory itself.
> */
> - for (alt = o->alt_odb_list; alt; alt = alt->next) {
> - if (!fspathcmp(path->buf, alt->path))
> + for (odb = o->alt_odb_list; odb; odb = odb->next) {
> + if (!fspathcmp(path->buf, odb->path))
> return 0;
> }
> if (!fspathcmp(path->buf, normalized_objdir))
> @@ -402,7 +402,7 @@ static int alt_odb_usable(struct raw_object_store *o,
> * Prepare alternate object database registry.
> *
> * The variable alt_odb_list points at the list of struct
> - * alternate_object_database. The elements on this list come from
> + * object_directory. The elements on this list come from
> * non-empty elements from colon separated ALTERNATE_DB_ENVIRONMENT
> * environment variable, and $GIT_OBJECT_DIRECTORY/info/alternates,
> * whose contents is similar to that environment variable but can be
> @@ -419,7 +419,7 @@ static void read_info_alternates(struct repository *r,
> static int link_alt_odb_entry(struct repository *r, const char *entry,
> const char *relative_base, int depth, const char *normalized_objdir)
> {
> - struct alternate_object_database *ent;
> + struct object_directory *ent;
> struct strbuf pathbuf = STRBUF_INIT;
>
> if (!is_absolute_path(entry) && relative_base) {
> @@ -540,9 +540,9 @@ static void read_info_alternates(struct repository *r,
> free(path);
> }
>
> -struct alternate_object_database *alloc_alt_odb(const char *dir)
> +struct object_directory *alloc_alt_odb(const char *dir)
> {
> - struct alternate_object_database *ent;
> + struct object_directory *ent;
>
> FLEX_ALLOC_STR(ent, path, dir);
> strbuf_init(&ent->scratch, 0);
> @@ -684,7 +684,7 @@ char *compute_alternate_path(const char *path, struct strbuf *err)
>
> int foreach_alt_odb(alt_odb_fn fn, void *cb)
> {
> - struct alternate_object_database *ent;
> + struct object_directory *ent;
> int r = 0;
>
> prepare_alt_odb(the_repository);
> @@ -743,10 +743,10 @@ static int check_and_freshen_local(const struct object_id *oid, int freshen)
>
> static int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
> {
> - struct alternate_object_database *alt;
> + struct object_directory *odb;
> prepare_alt_odb(the_repository);
> - for (alt = the_repository->objects->alt_odb_list; alt; alt = alt->next) {
> - const char *path = alt_sha1_path(alt, oid->hash);
> + for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
> + const char *path = alt_sha1_path(odb, oid->hash);
> if (check_and_freshen_file(path, freshen))
> return 1;
> }
> @@ -893,7 +893,7 @@ int git_open_cloexec(const char *name, int flags)
> static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> struct stat *st, const char **path)
> {
> - struct alternate_object_database *alt;
> + struct object_directory *odb;
> static struct strbuf buf = STRBUF_INIT;
>
> strbuf_reset(&buf);
> @@ -905,8 +905,8 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
>
> prepare_alt_odb(r);
> errno = ENOENT;
> - for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
> - *path = alt_sha1_path(alt, sha1);
> + for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> + *path = alt_sha1_path(odb, sha1);
> if (!lstat(*path, st))
> return 0;
> }
> @@ -922,7 +922,7 @@ static int open_sha1_file(struct repository *r,
> const unsigned char *sha1, const char **path)
> {
> int fd;
> - struct alternate_object_database *alt;
> + struct object_directory *odb;
> int most_interesting_errno;
> static struct strbuf buf = STRBUF_INIT;
>
> @@ -936,8 +936,8 @@ static int open_sha1_file(struct repository *r,
> most_interesting_errno = errno;
>
> prepare_alt_odb(r);
> - for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
> - *path = alt_sha1_path(alt, sha1);
> + for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> + *path = alt_sha1_path(odb, sha1);
> fd = git_open(*path);
> if (fd >= 0)
> return fd;
> @@ -2139,14 +2139,14 @@ struct loose_alt_odb_data {
> void *data;
> };
>
> -static int loose_from_alt_odb(struct alternate_object_database *alt,
> +static int loose_from_alt_odb(struct object_directory *odb,
> void *vdata)
> {
> struct loose_alt_odb_data *data = vdata;
> struct strbuf buf = STRBUF_INIT;
> int r;
>
> - strbuf_addstr(&buf, alt->path);
> + strbuf_addstr(&buf, odb->path);
> r = for_each_loose_file_in_objdir_buf(&buf,
> data->cb, NULL, NULL,
> data->data);
> diff --git a/sha1-name.c b/sha1-name.c
> index faa60f69e3..2594aa79f8 100644
> --- a/sha1-name.c
> +++ b/sha1-name.c
> @@ -95,8 +95,8 @@ static int match_sha(unsigned, const unsigned char *, const unsigned char *);
> static void find_short_object_filename(struct disambiguate_state *ds)
> {
> int subdir_nr = ds->bin_pfx.hash[0];
> - struct alternate_object_database *alt;
> - static struct alternate_object_database *fakeent;
> + struct object_directory *odb;
> + static struct object_directory *fakeent;
>
> if (!fakeent) {
> /*
> @@ -110,24 +110,24 @@ static void find_short_object_filename(struct disambiguate_state *ds)
> }
> fakeent->next = the_repository->objects->alt_odb_list;
>
> - for (alt = fakeent; alt && !ds->ambiguous; alt = alt->next) {
> + for (odb = fakeent; odb && !ds->ambiguous; odb = odb->next) {
> int pos;
>
> - if (!alt->loose_objects_subdir_seen[subdir_nr]) {
> - struct strbuf *buf = alt_scratch_buf(alt);
> + if (!odb->loose_objects_subdir_seen[subdir_nr]) {
> + struct strbuf *buf = alt_scratch_buf(odb);
> for_each_file_in_obj_subdir(subdir_nr, buf,
> append_loose_object,
> NULL, NULL,
> - &alt->loose_objects_cache);
> - alt->loose_objects_subdir_seen[subdir_nr] = 1;
> + &odb->loose_objects_cache);
> + odb->loose_objects_subdir_seen[subdir_nr] = 1;
> }
>
> - pos = oid_array_lookup(&alt->loose_objects_cache, &ds->bin_pfx);
> + pos = oid_array_lookup(&odb->loose_objects_cache, &ds->bin_pfx);
> if (pos < 0)
> pos = -1 - pos;
> - while (!ds->ambiguous && pos < alt->loose_objects_cache.nr) {
> + while (!ds->ambiguous && pos < odb->loose_objects_cache.nr) {
> const struct object_id *oid;
> - oid = alt->loose_objects_cache.oid + pos;
> + oid = odb->loose_objects_cache.oid + pos;
> if (!match_sha(ds->len, ds->bin_pfx.hash, oid->hash))
> break;
> update_candidates(ds, oid);
> diff --git a/transport.c b/transport.c
> index 5a74b609ff..040e92c134 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -1433,7 +1433,7 @@ struct alternate_refs_data {
> void *data;
> };
>
> -static int refs_from_alternate_cb(struct alternate_object_database *e,
> +static int refs_from_alternate_cb(struct object_directory *e,
> void *data)
> {
> struct strbuf path = STRBUF_INIT;
* Re: [PATCH 3/9] rename "alternate_object_database" to "object_directory"
2018-11-12 15:30 ` Derrick Stolee
@ 2018-11-12 15:36 ` Jeff King
2018-11-12 19:41 ` Ramsay Jones
0 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 15:36 UTC (permalink / raw)
To: Derrick Stolee
Cc: Geert Jansen, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, René Scharfe, Takuto Ikuta
On Mon, Nov 12, 2018 at 10:30:55AM -0500, Derrick Stolee wrote:
> On 11/12/2018 9:48 AM, Jeff King wrote:
> > In preparation for unifying the handling of alt odb's and the normal
> > repo object directory, let's use a more neutral name. This patch is
> > purely mechanical, swapping the type name, and converting any variables
> > named "alt" to "odb". There should be no functional change, but it will
> > reduce the noise in subsequent diffs.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > I waffled on calling this object_database instead of object_directory.
> > But really, it is very specifically about the directory (packed
> > storage, including packs from alternates, is handled elsewhere).
>
> That makes sense. Each alternate makes its own object directory, but is part
> of a larger object database. It also helps clarify a difference from the
> object_store.
>
> My only complaint is that you have a lot of variable names with "odb" which
> are now object_directory pointers. Perhaps "odb" -> "objdir"? Or is that
> just too much change?
Yeah, that was part of my waffling. ;)
From my conversions, usually "objdir" is a string holding the pathname,
though that's not set in stone. I also like that "odb" is the same short
length as "alt", which helps with conversion.
But I dunno.
-Peff
* Re: [PATCH 3/9] rename "alternate_object_database" to "object_directory"
2018-11-12 15:36 ` Jeff King
@ 2018-11-12 19:41 ` Ramsay Jones
0 siblings, 0 replies; 99+ messages in thread
From: Ramsay Jones @ 2018-11-12 19:41 UTC (permalink / raw)
To: Jeff King, Derrick Stolee
Cc: Geert Jansen, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, René Scharfe, Takuto Ikuta
On 12/11/2018 15:36, Jeff King wrote:
> On Mon, Nov 12, 2018 at 10:30:55AM -0500, Derrick Stolee wrote:
>
>> On 11/12/2018 9:48 AM, Jeff King wrote:
>>> In preparation for unifying the handling of alt odb's and the normal
>>> repo object directory, let's use a more neutral name. This patch is
>>> purely mechanical, swapping the type name, and converting any variables
>>> named "alt" to "odb". There should be no functional change, but it will
>>> reduce the noise in subsequent diffs.
>>>
>>> Signed-off-by: Jeff King <peff@peff.net>
>>> ---
>>> I waffled on calling this object_database instead of object_directory.
>>> But really, it is very specifically about the directory (packed
>>> storage, including packs from alternates, is handled elsewhere).
>>
>> That makes sense. Each alternate makes its own object directory, but is part
>> of a larger object database. It also helps clarify a difference from the
>> object_store.
>>
>> My only complaint is that you have a lot of variable names with "odb" which
>> are now object_directory pointers. Perhaps "odb" -> "objdir"? Or is that
>> just too much change?
>
> Yeah, that was part of my waffling. ;)
>
> From my conversions, usually "objdir" is a string holding the pathname,
> though that's not set in stone. I also like that "odb" is the same short
> length as "alt", which helps with conversion.
While reading the patch, I keep thinking it should be 'obd' for
OBject Directory. ;-)
[Given my track record in naming things, please take with a _huge_
pinch of salt!]
ATB,
Ramsay Jones
* [PATCH 4/9] sha1_file_name(): overwrite buffer instead of appending
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
` (2 preceding siblings ...)
2018-11-12 14:48 ` [PATCH 3/9] rename "alternate_object_database" to "object_directory" Jeff King
@ 2018-11-12 14:48 ` Jeff King
2018-11-12 15:32 ` Derrick Stolee
2018-11-12 14:49 ` [PATCH 5/9] handle alternates paths the same as the main object dir Jeff King
` (5 subsequent siblings)
9 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 14:48 UTC (permalink / raw)
To: Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
The sha1_file_name() function is used to generate the path to a loose
object in the object directory. It doesn't make much sense for it to
append, since the path we write may be absolute (i.e., you cannot
reliably build up a path with it). Because many callers use it with a
static buffer, they have to strbuf_reset() manually before each call
(and the other callers always use an empty buffer, so they don't care
either way). Let's handle this automatically.
Since we're changing the semantics, let's take the opportunity to give
it a more hash-neutral name (which will also catch any callers from
topics in flight).
Signed-off-by: Jeff King <peff@peff.net>
---
http-walker.c | 2 +-
http.c | 4 ++--
object-store.h | 2 +-
sha1-file.c | 18 ++++++++----------
4 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/http-walker.c b/http-walker.c
index b3334bf657..0a392c85b6 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -547,7 +547,7 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
ret = error("File %s has bad hash", hex);
} else if (req->rename < 0) {
struct strbuf buf = STRBUF_INIT;
- sha1_file_name(the_repository, &buf, req->sha1);
+ loose_object_path(the_repository, &buf, req->sha1);
ret = error("unable to write sha1 filename %s", buf.buf);
strbuf_release(&buf);
}
diff --git a/http.c b/http.c
index 3dc8c560d6..46c2e7a275 100644
--- a/http.c
+++ b/http.c
@@ -2314,7 +2314,7 @@ struct http_object_request *new_http_object_request(const char *base_url,
hashcpy(freq->sha1, sha1);
freq->localfile = -1;
- sha1_file_name(the_repository, &filename, sha1);
+ loose_object_path(the_repository, &filename, sha1);
strbuf_addf(&freq->tmpfile, "%s.temp", filename.buf);
strbuf_addf(&prevfile, "%s.prev", filename.buf);
@@ -2465,7 +2465,7 @@ int finish_http_object_request(struct http_object_request *freq)
unlink_or_warn(freq->tmpfile.buf);
return -1;
}
- sha1_file_name(the_repository, &filename, freq->sha1);
+ loose_object_path(the_repository, &filename, freq->sha1);
freq->rename = finalize_object_file(freq->tmpfile.buf, filename.buf);
strbuf_release(&filename);
diff --git a/object-store.h b/object-store.h
index 122d5f75e2..fefa17e380 100644
--- a/object-store.h
+++ b/object-store.h
@@ -157,7 +157,7 @@ void raw_object_store_clear(struct raw_object_store *o);
* Put in `buf` the name of the file in the local object database that
* would be used to store a loose object with the specified sha1.
*/
-void sha1_file_name(struct repository *r, struct strbuf *buf, const unsigned char *sha1);
+void loose_object_path(struct repository *r, struct strbuf *buf, const unsigned char *sha1);
void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned long *size);
diff --git a/sha1-file.c b/sha1-file.c
index a3cc650a0a..478eac326b 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -346,8 +346,10 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
}
}
-void sha1_file_name(struct repository *r, struct strbuf *buf, const unsigned char *sha1)
+void loose_object_path(struct repository *r, struct strbuf *buf,
+ const unsigned char *sha1)
{
+ strbuf_reset(buf);
strbuf_addstr(buf, r->objects->objectdir);
strbuf_addch(buf, '/');
fill_sha1_path(buf, sha1);
@@ -735,8 +737,7 @@ static int check_and_freshen_local(const struct object_id *oid, int freshen)
{
static struct strbuf buf = STRBUF_INIT;
- strbuf_reset(&buf);
- sha1_file_name(the_repository, &buf, oid->hash);
+ loose_object_path(the_repository, &buf, oid->hash);
return check_and_freshen_file(buf.buf, freshen);
}
@@ -888,7 +889,7 @@ int git_open_cloexec(const char *name, int flags)
*
* The "path" out-parameter will give the path of the object we found (if any).
* Note that it may point to static storage and is only valid until another
- * call to sha1_file_name(), etc.
+ * call to loose_object_path(), etc.
*/
static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
struct stat *st, const char **path)
@@ -896,8 +897,7 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
struct object_directory *odb;
static struct strbuf buf = STRBUF_INIT;
- strbuf_reset(&buf);
- sha1_file_name(r, &buf, sha1);
+ loose_object_path(r, &buf, sha1);
*path = buf.buf;
if (!lstat(*path, st))
@@ -926,8 +926,7 @@ static int open_sha1_file(struct repository *r,
int most_interesting_errno;
static struct strbuf buf = STRBUF_INIT;
- strbuf_reset(&buf);
- sha1_file_name(r, &buf, sha1);
+ loose_object_path(r, &buf, sha1);
*path = buf.buf;
fd = git_open(*path);
@@ -1626,8 +1625,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
static struct strbuf tmp_file = STRBUF_INIT;
static struct strbuf filename = STRBUF_INIT;
- strbuf_reset(&filename);
- sha1_file_name(the_repository, &filename, oid->hash);
+ loose_object_path(the_repository, &filename, oid->hash);
fd = create_tmpfile(&tmp_file, filename.buf);
if (fd < 0) {
--
2.19.1.1577.g2c5b293d4f
* [PATCH 5/9] handle alternates paths the same as the main object dir
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
` (3 preceding siblings ...)
2018-11-12 14:48 ` [PATCH 4/9] sha1_file_name(): overwrite buffer instead of appending Jeff King
@ 2018-11-12 14:49 ` Jeff King
2018-11-12 15:38 ` Derrick Stolee
2018-11-12 14:50 ` [PATCH 6/9] sha1-file: use an object_directory for " Jeff King
` (4 subsequent siblings)
9 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 14:49 UTC (permalink / raw)
To: Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
When we generate loose file paths for the main object directory, the
caller provides a buffer to loose_object_path (formerly sha1_file_name).
The callers generally keep their own static buffer to avoid excessive
reallocations.
But for alternate directories, each struct carries its own scratch
buffer. This is needlessly different; let's unify them.
We could go either direction here, but this patch moves the alternates
struct over to the main directory style (rather than vice-versa).
Technically the alternates style is more efficient, as it avoids
rewriting the object directory name on each call. But this is unlikely
to matter in practice, as we avoid reallocations either way (and nobody
has ever noticed or complained that the main object directory is copying
a few extra bytes before making a much more expensive system call).
And this has the advantage that the reusable buffers are tied to
particular calls, which makes the invalidation rules simpler (for
example, the return value from stat_sha1_file() used to be invalidated
by basically any other object call, but now it is affected only by other
calls to stat_sha1_file()).
We do steal the trick from alt_sha1_path() of returning a pointer to the
filled buffer, which makes a few conversions more convenient.
Signed-off-by: Jeff King <peff@peff.net>
---
object-store.h | 14 +-------------
object.c | 1 -
sha1-file.c | 44 ++++++++++++++++----------------------------
sha1-name.c | 8 ++++++--
4 files changed, 23 insertions(+), 44 deletions(-)
diff --git a/object-store.h b/object-store.h
index fefa17e380..b2fa0d0df0 100644
--- a/object-store.h
+++ b/object-store.h
@@ -10,10 +10,6 @@
struct object_directory {
struct object_directory *next;
- /* see alt_scratch_buf() */
- struct strbuf scratch;
- size_t base_len;
-
/*
* Used to store the results of readdir(3) calls when searching
* for unique abbreviated hashes. This cache is never
@@ -54,14 +50,6 @@ void add_to_alternates_file(const char *dir);
*/
void add_to_alternates_memory(const char *dir);
-/*
- * Returns a scratch strbuf pre-filled with the alternate object directory,
- * including a trailing slash, which can be used to access paths in the
- * alternate. Always use this over direct access to alt->scratch, as it
- * cleans up any previous use of the scratch buffer.
- */
-struct strbuf *alt_scratch_buf(struct object_directory *odb);
-
struct packed_git {
struct packed_git *next;
struct list_head mru;
@@ -157,7 +145,7 @@ void raw_object_store_clear(struct raw_object_store *o);
* Put in `buf` the name of the file in the local object database that
* would be used to store a loose object with the specified sha1.
*/
-void loose_object_path(struct repository *r, struct strbuf *buf, const unsigned char *sha1);
+const char *loose_object_path(struct repository *r, struct strbuf *buf, const unsigned char *sha1);
void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned long *size);
diff --git a/object.c b/object.c
index 6af8e908bb..dd485ac629 100644
--- a/object.c
+++ b/object.c
@@ -484,7 +484,6 @@ struct raw_object_store *raw_object_store_new(void)
static void free_alt_odb(struct object_directory *odb)
{
- strbuf_release(&odb->scratch);
oid_array_clear(&odb->loose_objects_cache);
free(odb);
}
diff --git a/sha1-file.c b/sha1-file.c
index 478eac326b..15db6b61a9 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -346,27 +346,20 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
}
}
-void loose_object_path(struct repository *r, struct strbuf *buf,
- const unsigned char *sha1)
+static const char *odb_loose_path(const char *path, struct strbuf *buf,
+ const unsigned char *sha1)
{
strbuf_reset(buf);
- strbuf_addstr(buf, r->objects->objectdir);
+ strbuf_addstr(buf, path);
strbuf_addch(buf, '/');
fill_sha1_path(buf, sha1);
+ return buf->buf;
}
-struct strbuf *alt_scratch_buf(struct object_directory *odb)
+const char *loose_object_path(struct repository *r, struct strbuf *buf,
+ const unsigned char *sha1)
{
- strbuf_setlen(&odb->scratch, odb->base_len);
- return &odb->scratch;
-}
-
-static const char *alt_sha1_path(struct object_directory *odb,
- const unsigned char *sha1)
-{
- struct strbuf *buf = alt_scratch_buf(odb);
- fill_sha1_path(buf, sha1);
- return buf->buf;
+ return odb_loose_path(r->objects->objectdir, buf, sha1);
}
/*
@@ -547,9 +540,6 @@ struct object_directory *alloc_alt_odb(const char *dir)
struct object_directory *ent;
FLEX_ALLOC_STR(ent, path, dir);
- strbuf_init(&ent->scratch, 0);
- strbuf_addf(&ent->scratch, "%s/", dir);
- ent->base_len = ent->scratch.len;
return ent;
}
@@ -745,10 +735,12 @@ static int check_and_freshen_local(const struct object_id *oid, int freshen)
static int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
{
struct object_directory *odb;
+ static struct strbuf path = STRBUF_INIT;
+
prepare_alt_odb(the_repository);
for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
- const char *path = alt_sha1_path(odb, oid->hash);
- if (check_and_freshen_file(path, freshen))
+ odb_loose_path(odb->path, &path, oid->hash);
+ if (check_and_freshen_file(path.buf, freshen))
return 1;
}
return 0;
@@ -889,7 +881,7 @@ int git_open_cloexec(const char *name, int flags)
*
* The "path" out-parameter will give the path of the object we found (if any).
* Note that it may point to static storage and is only valid until another
- * call to loose_object_path(), etc.
+ * call to stat_sha1_file().
*/
static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
struct stat *st, const char **path)
@@ -897,16 +889,14 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
struct object_directory *odb;
static struct strbuf buf = STRBUF_INIT;
- loose_object_path(r, &buf, sha1);
- *path = buf.buf;
-
+ *path = loose_object_path(r, &buf, sha1);
if (!lstat(*path, st))
return 0;
prepare_alt_odb(r);
errno = ENOENT;
for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
- *path = alt_sha1_path(odb, sha1);
+ *path = odb_loose_path(odb->path, &buf, sha1);
if (!lstat(*path, st))
return 0;
}
@@ -926,9 +916,7 @@ static int open_sha1_file(struct repository *r,
int most_interesting_errno;
static struct strbuf buf = STRBUF_INIT;
- loose_object_path(r, &buf, sha1);
- *path = buf.buf;
-
+ *path = loose_object_path(r, &buf, sha1);
fd = git_open(*path);
if (fd >= 0)
return fd;
@@ -936,7 +924,7 @@ static int open_sha1_file(struct repository *r,
prepare_alt_odb(r);
for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
- *path = alt_sha1_path(odb, sha1);
+ *path = odb_loose_path(odb->path, &buf, sha1);
fd = git_open(*path);
if (fd >= 0)
return fd;
diff --git a/sha1-name.c b/sha1-name.c
index 2594aa79f8..96a8e71482 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -97,6 +97,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
int subdir_nr = ds->bin_pfx.hash[0];
struct object_directory *odb;
static struct object_directory *fakeent;
+ struct strbuf buf = STRBUF_INIT;
if (!fakeent) {
/*
@@ -114,8 +115,9 @@ static void find_short_object_filename(struct disambiguate_state *ds)
int pos;
if (!odb->loose_objects_subdir_seen[subdir_nr]) {
- struct strbuf *buf = alt_scratch_buf(odb);
- for_each_file_in_obj_subdir(subdir_nr, buf,
+ strbuf_reset(&buf);
+ strbuf_addstr(&buf, odb->path);
+ for_each_file_in_obj_subdir(subdir_nr, &buf,
append_loose_object,
NULL, NULL,
&odb->loose_objects_cache);
@@ -134,6 +136,8 @@ static void find_short_object_filename(struct disambiguate_state *ds)
pos++;
}
}
+
+ strbuf_release(&buf);
}
static int match_sha(unsigned len, const unsigned char *a, const unsigned char *b)
--
2.19.1.1577.g2c5b293d4f
* Re: [PATCH 5/9] handle alternates paths the same as the main object dir
2018-11-12 14:49 ` [PATCH 5/9] handle alternates paths the same as the main object dir Jeff King
@ 2018-11-12 15:38 ` Derrick Stolee
2018-11-12 15:46 ` Jeff King
0 siblings, 1 reply; 99+ messages in thread
From: Derrick Stolee @ 2018-11-12 15:38 UTC (permalink / raw)
To: Jeff King, Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
On 11/12/2018 9:49 AM, Jeff King wrote:
> When we generate loose file paths for the main object directory, the
> caller provides a buffer to loose_object_path (formerly sha1_file_name).
> The callers generally keep their own static buffer to avoid excessive
> reallocations.
>
> But for alternate directories, each struct carries its own scratch
> buffer. This is needlessly different; let's unify them.
>
> We could go either direction here, but this patch moves the alternates
> struct over to the main directory style (rather than vice-versa).
> Technically the alternates style is more efficient, as it avoids
> rewriting the object directory name on each call. But this is unlikely
> to matter in practice, as we avoid reallocations either way (and nobody
> has ever noticed or complained that the main object directory is copying
> a few extra bytes before making a much more expensive system call).
Hm. I've complained in the past [1] about a simple method like
strbuf_addf() over loose objects, but that was during abbreviation
checks so we were adding the string for every loose object but not
actually reading the objects.
[1]
https://public-inbox.org/git/20171201174956.143245-1-dstolee@microsoft.com/
The other concern I have is for alternates that may have long-ish paths
to their object directories.
So, this is worth keeping an eye on, but is likely to be fine.
> And this has the advantage that the reusable buffers are tied to
> particular calls, which makes the invalidation rules simpler (for
> example, the return value from stat_sha1_file() used to be invalidated
> by basically any other object call, but now it is affected only by other
> calls to stat_sha1_file()).
>
> We do steal the trick from alt_sha1_path() of returning a pointer to the
> filled buffer, which makes a few conversions more convenient.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> object-store.h | 14 +-------------
> object.c | 1 -
> sha1-file.c | 44 ++++++++++++++++----------------------------
> sha1-name.c | 8 ++++++--
> 4 files changed, 23 insertions(+), 44 deletions(-)
>
> diff --git a/object-store.h b/object-store.h
> index fefa17e380..b2fa0d0df0 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -10,10 +10,6 @@
> struct object_directory {
> struct object_directory *next;
>
> - /* see alt_scratch_buf() */
> - struct strbuf scratch;
> - size_t base_len;
> -
> /*
> * Used to store the results of readdir(3) calls when searching
> * for unique abbreviated hashes. This cache is never
> @@ -54,14 +50,6 @@ void add_to_alternates_file(const char *dir);
> */
> void add_to_alternates_memory(const char *dir);
>
> -/*
> - * Returns a scratch strbuf pre-filled with the alternate object directory,
> - * including a trailing slash, which can be used to access paths in the
> - * alternate. Always use this over direct access to alt->scratch, as it
> - * cleans up any previous use of the scratch buffer.
> - */
> -struct strbuf *alt_scratch_buf(struct object_directory *odb);
> -
> struct packed_git {
> struct packed_git *next;
> struct list_head mru;
> @@ -157,7 +145,7 @@ void raw_object_store_clear(struct raw_object_store *o);
> * Put in `buf` the name of the file in the local object database that
> * would be used to store a loose object with the specified sha1.
> */
> -void loose_object_path(struct repository *r, struct strbuf *buf, const unsigned char *sha1);
> +const char *loose_object_path(struct repository *r, struct strbuf *buf, const unsigned char *sha1);
>
> void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned long *size);
>
> diff --git a/object.c b/object.c
> index 6af8e908bb..dd485ac629 100644
> --- a/object.c
> +++ b/object.c
> @@ -484,7 +484,6 @@ struct raw_object_store *raw_object_store_new(void)
>
> static void free_alt_odb(struct object_directory *odb)
> {
> - strbuf_release(&odb->scratch);
> oid_array_clear(&odb->loose_objects_cache);
> free(odb);
> }
> diff --git a/sha1-file.c b/sha1-file.c
> index 478eac326b..15db6b61a9 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -346,27 +346,20 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
> }
> }
>
> -void loose_object_path(struct repository *r, struct strbuf *buf,
> - const unsigned char *sha1)
> +static const char *odb_loose_path(const char *path, struct strbuf *buf,
> + const unsigned char *sha1)
> {
> strbuf_reset(buf);
> - strbuf_addstr(buf, r->objects->objectdir);
> + strbuf_addstr(buf, path);
> strbuf_addch(buf, '/');
> fill_sha1_path(buf, sha1);
> + return buf->buf;
> }
>
> -struct strbuf *alt_scratch_buf(struct object_directory *odb)
> +const char *loose_object_path(struct repository *r, struct strbuf *buf,
> + const unsigned char *sha1)
> {
> - strbuf_setlen(&odb->scratch, odb->base_len);
> - return &odb->scratch;
> -}
> -
> -static const char *alt_sha1_path(struct object_directory *odb,
> - const unsigned char *sha1)
> -{
> - struct strbuf *buf = alt_scratch_buf(odb);
> - fill_sha1_path(buf, sha1);
> - return buf->buf;
> + return odb_loose_path(r->objects->objectdir, buf, sha1);
> }
>
> /*
> @@ -547,9 +540,6 @@ struct object_directory *alloc_alt_odb(const char *dir)
> struct object_directory *ent;
>
> FLEX_ALLOC_STR(ent, path, dir);
> - strbuf_init(&ent->scratch, 0);
> - strbuf_addf(&ent->scratch, "%s/", dir);
> - ent->base_len = ent->scratch.len;
>
> return ent;
> }
> @@ -745,10 +735,12 @@ static int check_and_freshen_local(const struct object_id *oid, int freshen)
> static int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
> {
> struct object_directory *odb;
> + static struct strbuf path = STRBUF_INIT;
> +
> prepare_alt_odb(the_repository);
> for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
> - const char *path = alt_sha1_path(odb, oid->hash);
> - if (check_and_freshen_file(path, freshen))
> + odb_loose_path(odb->path, &path, oid->hash);
> + if (check_and_freshen_file(path.buf, freshen))
> return 1;
> }
> return 0;
> @@ -889,7 +881,7 @@ int git_open_cloexec(const char *name, int flags)
> *
> * The "path" out-parameter will give the path of the object we found (if any).
> * Note that it may point to static storage and is only valid until another
> - * call to loose_object_path(), etc.
> + * call to stat_sha1_file().
> */
> static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> struct stat *st, const char **path)
> @@ -897,16 +889,14 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> struct object_directory *odb;
> static struct strbuf buf = STRBUF_INIT;
>
> - loose_object_path(r, &buf, sha1);
> - *path = buf.buf;
> -
> + *path = loose_object_path(r, &buf, sha1);
> if (!lstat(*path, st))
> return 0;
>
> prepare_alt_odb(r);
> errno = ENOENT;
> for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> - *path = alt_sha1_path(odb, sha1);
> + *path = odb_loose_path(odb->path, &buf, sha1);
> if (!lstat(*path, st))
> return 0;
> }
> @@ -926,9 +916,7 @@ static int open_sha1_file(struct repository *r,
> int most_interesting_errno;
> static struct strbuf buf = STRBUF_INIT;
>
> - loose_object_path(r, &buf, sha1);
> - *path = buf.buf;
> -
> + *path = loose_object_path(r, &buf, sha1);
> fd = git_open(*path);
> if (fd >= 0)
> return fd;
> @@ -936,7 +924,7 @@ static int open_sha1_file(struct repository *r,
>
> prepare_alt_odb(r);
> for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> - *path = alt_sha1_path(odb, sha1);
> + *path = odb_loose_path(odb->path, &buf, sha1);
> fd = git_open(*path);
> if (fd >= 0)
> return fd;
> diff --git a/sha1-name.c b/sha1-name.c
> index 2594aa79f8..96a8e71482 100644
> --- a/sha1-name.c
> +++ b/sha1-name.c
> @@ -97,6 +97,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
> int subdir_nr = ds->bin_pfx.hash[0];
> struct object_directory *odb;
> static struct object_directory *fakeent;
> + struct strbuf buf = STRBUF_INIT;
>
> if (!fakeent) {
> /*
> @@ -114,8 +115,9 @@ static void find_short_object_filename(struct disambiguate_state *ds)
> int pos;
>
> if (!odb->loose_objects_subdir_seen[subdir_nr]) {
> - struct strbuf *buf = alt_scratch_buf(odb);
> - for_each_file_in_obj_subdir(subdir_nr, buf,
> + strbuf_reset(&buf);
> + strbuf_addstr(&buf, odb->path);
> + for_each_file_in_obj_subdir(subdir_nr, &buf,
> append_loose_object,
> NULL, NULL,
> &odb->loose_objects_cache);
> @@ -134,6 +136,8 @@ static void find_short_object_filename(struct disambiguate_state *ds)
> pos++;
> }
> }
> +
> + strbuf_release(&buf);
> }
>
> static int match_sha(unsigned len, const unsigned char *a, const unsigned char *b)
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 5/9] handle alternates paths the same as the main object dir
2018-11-12 15:38 ` Derrick Stolee
@ 2018-11-12 15:46 ` Jeff King
2018-11-12 15:50 ` Derrick Stolee
0 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 15:46 UTC (permalink / raw)
To: Derrick Stolee
Cc: Geert Jansen, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, René Scharfe, Takuto Ikuta
On Mon, Nov 12, 2018 at 10:38:28AM -0500, Derrick Stolee wrote:
> > We could go either direction here, but this patch moves the alternates
> > struct over to the main directory style (rather than vice-versa).
> > Technically the alternates style is more efficient, as it avoids
> > rewriting the object directory name on each call. But this is unlikely
> > to matter in practice, as we avoid reallocations either way (and nobody
> > has ever noticed or complained that the main object directory is copying
> > a few extra bytes before making a much more expensive system call).
>
> Hm. I've complained in the past [1] about a simple method like strbuf_addf()
> over loose objects, but that was during abbreviation checks so we were
> adding the string for every loose object but not actually reading the
> objects.
>
> [1]
> https://public-inbox.org/git/20171201174956.143245-1-dstolee@microsoft.com/
I suspect that had more to do with the cost of snprintf() than the extra
bytes being copied. And here we'd still be using addstr and addch
exclusively. I'm open to numeric arguments to the contrary, though. :)
There's actually a lot of low-hanging fruit there for pre-sizing, too.
E.g., fill_sha1_path() calls strbuf_addch() in a loop, but it could
quite easily grow the 41 bytes it needs ahead of time. I wouldn't want
to change that without finding a measurable improvement, though. It
might not be a big deal due to fec501dae8 (strbuf_addch: avoid calling
strbuf_grow, 2015-04-16).
-Peff
* Re: [PATCH 5/9] handle alternates paths the same as the main object dir
2018-11-12 15:46 ` Jeff King
@ 2018-11-12 15:50 ` Derrick Stolee
0 siblings, 0 replies; 99+ messages in thread
From: Derrick Stolee @ 2018-11-12 15:50 UTC (permalink / raw)
To: Jeff King
Cc: Geert Jansen, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, René Scharfe, Takuto Ikuta
On 11/12/2018 10:46 AM, Jeff King wrote:
> On Mon, Nov 12, 2018 at 10:38:28AM -0500, Derrick Stolee wrote:
>
>>> We could go either direction here, but this patch moves the alternates
>>> struct over to the main directory style (rather than vice-versa).
>>> Technically the alternates style is more efficient, as it avoids
>>> rewriting the object directory name on each call. But this is unlikely
>>> to matter in practice, as we avoid reallocations either way (and nobody
>>> has ever noticed or complained that the main object directory is copying
>>> a few extra bytes before making a much more expensive system call).
>> Hm. I've complained in the past [1] about a simple method like strbuf_addf()
>> over loose objects, but that was during abbreviation checks so we were
>> adding the string for every loose object but not actually reading the
>> objects.
>>
>> [1]
>> https://public-inbox.org/git/20171201174956.143245-1-dstolee@microsoft.com/
> I suspect that had more to do with the cost of snprintf() than the extra
> bytes being copied. And here we'd still be using addstr and addch
> exclusively. I'm open to numeric arguments to the contrary, though. :)
I agree. I don't think it is worth investigating now, as the performance
difference should be moot. I am making a mental note to take a look here
if I notice a performance regression later. ;)
> There's actually a lot of low-hanging fruit there for pre-sizing, too.
> E.g., fill_sha1_path() calls strbuf_addch() in a loop, but it could
> quite easily grow the 41 bytes it needs ahead of time. I wouldn't want
> to change that without finding a measurable improvement, though. It
> might not be a big deal due to fec501dae8 (strbuf_addch: avoid calling
> strbuf_grow, 2015-04-16).
>
> -Peff
* [PATCH 6/9] sha1-file: use an object_directory for the main object dir
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
` (4 preceding siblings ...)
2018-11-12 14:49 ` [PATCH 5/9] handle alternates paths the same as the main object dir Jeff King
@ 2018-11-12 14:50 ` Jeff King
2018-11-12 15:48 ` Derrick Stolee
2018-11-12 14:50 ` [PATCH 7/9] object-store: provide helpers for loose_objects_cache Jeff King
` (3 subsequent siblings)
9 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 14:50 UTC (permalink / raw)
To: Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
Our handling of alternate object directories is needlessly different
from the main object directory. As a result, many places in the code
basically look like this:
do_something(r->objects->objdir);
for (odb = r->objects->alt_odb_list; odb; odb = odb->next)
do_something(odb->path);
That gets annoying when do_something() is non-trivial, and we've
resorted to gross hacks like creating fake alternates (see
find_short_object_filename()).
Instead, let's give each raw_object_store a unified list of
object_directory structs. The first will be the main store, and
everything after is an alternate. Very few callers even care about the
distinction, and can just loop over the whole list (and those who care
can just treat the first element differently).
A few observations:
- we don't need r->objects->objectdir anymore, and can just
mechanically convert that to r->objects->odb->path
- object_directory's path field needs to become a real pointer rather
than a FLEX_ARRAY, in order to fill it with expand_base_dir()
- we'll call prepare_alt_odb() earlier in many functions (i.e.,
outside of the loop). This may result in us calling it even when our
function would be satisfied looking only at the main odb.
But this doesn't matter in practice. It's not a very expensive
operation in the first place, and in the majority of cases it will
be a noop. We call it already (and cache its results) in
prepare_packed_git(), and we'll generally check packs before loose
objects. So essentially every program is going to call it
immediately once per program.
Arguably we should just prepare_alt_odb() immediately upon setting
up the repository's object directory, which would save us sprinkling
calls throughout the code base (and forgetting to do so has been a
source of subtle bugs in the past). But I've stopped short of that
here, since there are already a lot of other moving parts in this
patch.
- Most call sites just get shorter. The check_and_freshen() functions
are an exception, because they have entry points to handle local and
nonlocal directories separately.
Signed-off-by: Jeff King <peff@peff.net>
---
If the "the first one is the main store, the rest are alternates" bit is
too subtle, we could mark each "struct object_directory" with a bit for
"is_local".
builtin/fsck.c | 21 ++-------
builtin/grep.c | 2 +-
commit-graph.c | 5 +-
environment.c | 4 +-
object-store.h | 27 ++++++-----
object.c | 19 ++++----
packfile.c | 10 ++--
path.c | 2 +-
repository.c | 8 +++-
sha1-file.c | 122 ++++++++++++++++++-------------------------------
sha1-name.c | 17 ++-----
11 files changed, 90 insertions(+), 147 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 55153cf92a..15338bd178 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -725,13 +725,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
} else {
- struct object_directory *alt_odb_list;
-
- fsck_object_dir(get_object_directory());
-
prepare_alt_odb(the_repository);
- alt_odb_list = the_repository->objects->alt_odb_list;
- for (odb = alt_odb_list; odb; odb = odb->next)
+ for (odb = the_repository->objects->odb; odb; odb = odb->next)
fsck_object_dir(odb->path);
if (check_full) {
@@ -834,13 +829,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL };
- commit_graph_verify.argv = verify_argv;
- commit_graph_verify.git_cmd = 1;
- if (run_command(&commit_graph_verify))
- errors_found |= ERROR_COMMIT_GRAPH;
-
prepare_alt_odb(the_repository);
- for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
+ for (odb = the_repository->objects->odb; odb; odb = odb->next) {
child_process_init(&commit_graph_verify);
commit_graph_verify.argv = verify_argv;
commit_graph_verify.git_cmd = 1;
@@ -855,13 +845,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
struct child_process midx_verify = CHILD_PROCESS_INIT;
const char *midx_argv[] = { "multi-pack-index", "verify", NULL, NULL, NULL };
- midx_verify.argv = midx_argv;
- midx_verify.git_cmd = 1;
- if (run_command(&midx_verify))
- errors_found |= ERROR_COMMIT_GRAPH;
-
prepare_alt_odb(the_repository);
- for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
+ for (odb = the_repository->objects->odb; odb; odb = odb->next) {
child_process_init(&midx_verify);
midx_verify.argv = midx_argv;
midx_verify.git_cmd = 1;
diff --git a/builtin/grep.c b/builtin/grep.c
index d8508ddf79..714c8d91ba 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -441,7 +441,7 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
* object.
*/
grep_read_lock();
- add_to_alternates_memory(submodule.objects->objectdir);
+ add_to_alternates_memory(submodule.objects->odb->path);
grep_read_unlock();
if (oid) {
diff --git a/commit-graph.c b/commit-graph.c
index 5dd3f5b15c..99163c244b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -231,7 +231,6 @@ static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
static int prepare_commit_graph(struct repository *r)
{
struct object_directory *odb;
- char *obj_dir;
int config_value;
if (r->objects->commit_graph_attempted)
@@ -252,10 +251,8 @@ static int prepare_commit_graph(struct repository *r)
if (!commit_graph_compatible(r))
return 0;
- obj_dir = r->objects->objectdir;
- prepare_commit_graph_one(r, obj_dir);
prepare_alt_odb(r);
- for (odb = r->objects->alt_odb_list;
+ for (odb = r->objects->odb;
!r->objects->commit_graph && odb;
odb = odb->next)
prepare_commit_graph_one(r, odb->path);
diff --git a/environment.c b/environment.c
index 3f3c8746c2..441ce56690 100644
--- a/environment.c
+++ b/environment.c
@@ -274,9 +274,9 @@ const char *get_git_work_tree(void)
char *get_object_directory(void)
{
- if (!the_repository->objects->objectdir)
+ if (!the_repository->objects->odb)
BUG("git environment hasn't been setup");
- return the_repository->objects->objectdir;
+ return the_repository->objects->odb->path;
}
int odb_mkstemp(struct strbuf *temp_filename, const char *pattern)
diff --git a/object-store.h b/object-store.h
index b2fa0d0df0..30faf7b391 100644
--- a/object-store.h
+++ b/object-store.h
@@ -24,19 +24,14 @@ struct object_directory {
* Path to the alternative object store. If this is a relative path,
* it is relative to the current working directory.
*/
- char path[FLEX_ARRAY];
+ char *path;
};
+
void prepare_alt_odb(struct repository *r);
char *compute_alternate_path(const char *path, struct strbuf *err);
typedef int alt_odb_fn(struct object_directory *, void *);
int foreach_alt_odb(alt_odb_fn, void*);
-/*
- * Allocate a "struct alternate_object_database" but do _not_ actually
- * add it to the list of alternates.
- */
-struct object_directory *alloc_alt_odb(const char *dir);
-
/*
* Add the directory to the on-disk alternates file; the new entry will also
* take effect in the current process.
@@ -80,17 +75,21 @@ struct multi_pack_index;
struct raw_object_store {
/*
- * Path to the repository's object store.
- * Cannot be NULL after initialization.
+ * Set of all object directories; the main directory is first (and
+ * cannot be NULL after initialization). Subsequent directories are
+ * alternates.
*/
- char *objectdir;
+ struct object_directory *odb;
+ struct object_directory **odb_tail;
+ int loaded_alternates;
- /* Path to extra alternate object database if not NULL */
+ /*
+ * A list of alternate object directories loaded from the environment;
+ * this should not generally need to be accessed directly, but will
+ * populate the "odb" list when prepare_alt_odb() is run.
+ */
char *alternate_db;
- struct object_directory *alt_odb_list;
- struct object_directory **alt_odb_tail;
-
/*
* Objects that should be substituted by other objects
* (see git-replace(1)).
diff --git a/object.c b/object.c
index dd485ac629..79d636091c 100644
--- a/object.c
+++ b/object.c
@@ -482,26 +482,26 @@ struct raw_object_store *raw_object_store_new(void)
return o;
}
-static void free_alt_odb(struct object_directory *odb)
+static void free_object_directory(struct object_directory *odb)
{
+ free(odb->path);
oid_array_clear(&odb->loose_objects_cache);
free(odb);
}
-static void free_alt_odbs(struct raw_object_store *o)
+static void free_object_directories(struct raw_object_store *o)
{
- while (o->alt_odb_list) {
+ while (o->odb) {
struct object_directory *next;
- next = o->alt_odb_list->next;
- free_alt_odb(o->alt_odb_list);
- o->alt_odb_list = next;
+ next = o->odb->next;
+ free_object_directory(o->odb);
+ o->odb = next;
}
}
void raw_object_store_clear(struct raw_object_store *o)
{
- FREE_AND_NULL(o->objectdir);
FREE_AND_NULL(o->alternate_db);
oidmap_free(o->replace_map, 1);
@@ -511,8 +511,9 @@ void raw_object_store_clear(struct raw_object_store *o)
o->commit_graph = NULL;
o->commit_graph_attempted = 0;
- free_alt_odbs(o);
- o->alt_odb_tail = NULL;
+ free_object_directories(o);
+ o->odb_tail = NULL;
+ o->loaded_alternates = 0;
INIT_LIST_HEAD(&o->packed_git_mru);
close_all_packs(o);
diff --git a/packfile.c b/packfile.c
index d6d511cfd2..1eda33247f 100644
--- a/packfile.c
+++ b/packfile.c
@@ -970,12 +970,12 @@ static void prepare_packed_git(struct repository *r)
if (r->objects->packed_git_initialized)
return;
- prepare_multi_pack_index_one(r, r->objects->objectdir, 1);
- prepare_packed_git_one(r, r->objects->objectdir, 1);
+
prepare_alt_odb(r);
- for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
- prepare_multi_pack_index_one(r, odb->path, 0);
- prepare_packed_git_one(r, odb->path, 0);
+ for (odb = r->objects->odb; odb; odb = odb->next) {
+ int local = (odb == r->objects->odb);
+ prepare_multi_pack_index_one(r, odb->path, local);
+ prepare_packed_git_one(r, odb->path, local);
}
rearrange_packed_git(r);
diff --git a/path.c b/path.c
index ba06ec5b2d..e8609cf56d 100644
--- a/path.c
+++ b/path.c
@@ -383,7 +383,7 @@ static void adjust_git_path(const struct repository *repo,
strbuf_splice(buf, 0, buf->len,
repo->index_file, strlen(repo->index_file));
else if (dir_prefix(base, "objects"))
- replace_dir(buf, git_dir_len + 7, repo->objects->objectdir);
+ replace_dir(buf, git_dir_len + 7, repo->objects->odb->path);
else if (git_hooks_path && dir_prefix(base, "hooks"))
replace_dir(buf, git_dir_len + 5, git_hooks_path);
else if (repo->different_commondir)
diff --git a/repository.c b/repository.c
index 5dd1486718..7b02e1dffa 100644
--- a/repository.c
+++ b/repository.c
@@ -63,8 +63,14 @@ void repo_set_gitdir(struct repository *repo,
free(old_gitdir);
repo_set_commondir(repo, o->commondir);
- expand_base_dir(&repo->objects->objectdir, o->object_dir,
+
+ if (!repo->objects->odb) {
+ repo->objects->odb = xcalloc(1, sizeof(*repo->objects->odb));
+ repo->objects->odb_tail = &repo->objects->odb->next;
+ }
+ expand_base_dir(&repo->objects->odb->path, o->object_dir,
repo->commondir, "objects");
+
free(repo->objects->alternate_db);
repo->objects->alternate_db = xstrdup_or_null(o->alternate_db);
expand_base_dir(&repo->graft_file, o->graft_file,
diff --git a/sha1-file.c b/sha1-file.c
index 15db6b61a9..503262edd2 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -346,11 +346,12 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
}
}
-static const char *odb_loose_path(const char *path, struct strbuf *buf,
+static const char *odb_loose_path(struct object_directory *odb,
+ struct strbuf *buf,
const unsigned char *sha1)
{
strbuf_reset(buf);
- strbuf_addstr(buf, path);
+ strbuf_addstr(buf, odb->path);
strbuf_addch(buf, '/');
fill_sha1_path(buf, sha1);
return buf->buf;
@@ -359,7 +360,7 @@ static const char *odb_loose_path(const char *path, struct strbuf *buf,
const char *loose_object_path(struct repository *r, struct strbuf *buf,
const unsigned char *sha1)
{
- return odb_loose_path(r->objects->objectdir, buf, sha1);
+ return odb_loose_path(r->objects->odb, buf, sha1);
}
/*
@@ -383,7 +384,7 @@ static int alt_odb_usable(struct raw_object_store *o,
* Prevent the common mistake of listing the same
* thing twice, or object directory itself.
*/
- for (odb = o->alt_odb_list; odb; odb = odb->next) {
+ for (odb = o->odb; odb; odb = odb->next) {
if (!fspathcmp(path->buf, odb->path))
return 0;
}
@@ -442,11 +443,12 @@ static int link_alt_odb_entry(struct repository *r, const char *entry,
return -1;
}
- ent = alloc_alt_odb(pathbuf.buf);
+ ent = xcalloc(1, sizeof(*ent));
+ ent->path = xstrdup(pathbuf.buf);
/* add the alternate entry */
- *r->objects->alt_odb_tail = ent;
- r->objects->alt_odb_tail = &(ent->next);
+ *r->objects->odb_tail = ent;
+ r->objects->odb_tail = &(ent->next);
ent->next = NULL;
/* recursively add alternates */
@@ -500,7 +502,7 @@ static void link_alt_odb_entries(struct repository *r, const char *alt,
return;
}
- strbuf_add_absolute_path(&objdirbuf, r->objects->objectdir);
+ strbuf_add_absolute_path(&objdirbuf, r->objects->odb->path);
if (strbuf_normalize_path(&objdirbuf) < 0)
die(_("unable to normalize object directory: %s"),
objdirbuf.buf);
@@ -535,15 +537,6 @@ static void read_info_alternates(struct repository *r,
free(path);
}
-struct object_directory *alloc_alt_odb(const char *dir)
-{
- struct object_directory *ent;
-
- FLEX_ALLOC_STR(ent, path, dir);
-
- return ent;
-}
-
void add_to_alternates_file(const char *reference)
{
struct lock_file lock = LOCK_INIT;
@@ -580,7 +573,7 @@ void add_to_alternates_file(const char *reference)
fprintf_or_die(out, "%s\n", reference);
if (commit_lock_file(&lock))
die_errno(_("unable to move new alternates file into place"));
- if (the_repository->objects->alt_odb_tail)
+ if (the_repository->objects->loaded_alternates)
link_alt_odb_entries(the_repository, reference,
'\n', NULL, 0);
}
@@ -680,7 +673,7 @@ int foreach_alt_odb(alt_odb_fn fn, void *cb)
int r = 0;
prepare_alt_odb(the_repository);
- for (ent = the_repository->objects->alt_odb_list; ent; ent = ent->next) {
+ for (ent = the_repository->objects->odb->next; ent; ent = ent->next) {
r = fn(ent, cb);
if (r)
break;
@@ -690,13 +683,13 @@ int foreach_alt_odb(alt_odb_fn fn, void *cb)
void prepare_alt_odb(struct repository *r)
{
- if (r->objects->alt_odb_tail)
+ if (r->objects->loaded_alternates)
return;
- r->objects->alt_odb_tail = &r->objects->alt_odb_list;
link_alt_odb_entries(r, r->objects->alternate_db, PATH_SEP, NULL, 0);
- read_info_alternates(r, r->objects->objectdir, 0);
+ read_info_alternates(r, r->objects->odb->path, 0);
+ r->objects->loaded_alternates = 1;
}
/* Returns 1 if we have successfully freshened the file, 0 otherwise. */
@@ -723,24 +716,27 @@ int check_and_freshen_file(const char *fn, int freshen)
return 1;
}
-static int check_and_freshen_local(const struct object_id *oid, int freshen)
+static int check_and_freshen_odb(struct object_directory *odb,
+ const struct object_id *oid,
+ int freshen)
{
- static struct strbuf buf = STRBUF_INIT;
-
- loose_object_path(the_repository, &buf, oid->hash);
+ static struct strbuf path = STRBUF_INIT;
+ odb_loose_path(odb, &path, oid->hash);
+ return check_and_freshen_file(path.buf, freshen);
+}
- return check_and_freshen_file(buf.buf, freshen);
+static int check_and_freshen_local(const struct object_id *oid, int freshen)
+{
+ return check_and_freshen_odb(the_repository->objects->odb, oid, freshen);
}
static int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
{
struct object_directory *odb;
- static struct strbuf path = STRBUF_INIT;
prepare_alt_odb(the_repository);
- for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
- odb_loose_path(odb->path, &path, oid->hash);
- if (check_and_freshen_file(path.buf, freshen))
+ for (odb = the_repository->objects->odb->next; odb; odb = odb->next) {
+ if (check_and_freshen_odb(odb, oid, freshen))
return 1;
}
return 0;
@@ -889,14 +885,9 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
struct object_directory *odb;
static struct strbuf buf = STRBUF_INIT;
- *path = loose_object_path(r, &buf, sha1);
- if (!lstat(*path, st))
- return 0;
-
prepare_alt_odb(r);
- errno = ENOENT;
- for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
- *path = odb_loose_path(odb->path, &buf, sha1);
+ for (odb = r->objects->odb; odb; odb = odb->next) {
+ *path = odb_loose_path(odb, &buf, sha1);
if (!lstat(*path, st))
return 0;
}
@@ -913,21 +904,16 @@ static int open_sha1_file(struct repository *r,
{
int fd;
struct object_directory *odb;
- int most_interesting_errno;
+ int most_interesting_errno = ENOENT;
static struct strbuf buf = STRBUF_INIT;
- *path = loose_object_path(r, &buf, sha1);
- fd = git_open(*path);
- if (fd >= 0)
- return fd;
- most_interesting_errno = errno;
-
prepare_alt_odb(r);
- for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
- *path = odb_loose_path(odb->path, &buf, sha1);
+ for (odb = r->objects->odb; odb; odb = odb->next) {
+ *path = odb_loose_path(odb, &buf, sha1);
fd = git_open(*path);
if (fd >= 0)
return fd;
+
if (most_interesting_errno == ENOENT)
most_interesting_errno = errno;
}
@@ -2120,43 +2106,23 @@ int for_each_loose_file_in_objdir(const char *path,
return r;
}
-struct loose_alt_odb_data {
- each_loose_object_fn *cb;
- void *data;
-};
-
-static int loose_from_alt_odb(struct object_directory *odb,
- void *vdata)
-{
- struct loose_alt_odb_data *data = vdata;
- struct strbuf buf = STRBUF_INIT;
- int r;
-
- strbuf_addstr(&buf, odb->path);
- r = for_each_loose_file_in_objdir_buf(&buf,
- data->cb, NULL, NULL,
- data->data);
- strbuf_release(&buf);
- return r;
-}
-
int for_each_loose_object(each_loose_object_fn cb, void *data,
enum for_each_object_flags flags)
{
- struct loose_alt_odb_data alt;
- int r;
+ struct object_directory *odb;
- r = for_each_loose_file_in_objdir(get_object_directory(),
- cb, NULL, NULL, data);
- if (r)
- return r;
+ prepare_alt_odb(the_repository);
+ for (odb = the_repository->objects->odb; odb; odb = odb->next) {
+ int r = for_each_loose_file_in_objdir(odb->path, cb, NULL,
+ NULL, data);
+ if (r)
+ return r;
- if (flags & FOR_EACH_OBJECT_LOCAL_ONLY)
- return 0;
+ if (flags & FOR_EACH_OBJECT_LOCAL_ONLY)
+ break;
+ }
- alt.cb = cb;
- alt.data = data;
- return foreach_alt_odb(loose_from_alt_odb, &alt);
+ return 0;
}
static int check_stream_sha1(git_zstream *stream,
diff --git a/sha1-name.c b/sha1-name.c
index 96a8e71482..358ca5e288 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -96,22 +96,11 @@ static void find_short_object_filename(struct disambiguate_state *ds)
{
int subdir_nr = ds->bin_pfx.hash[0];
struct object_directory *odb;
- static struct object_directory *fakeent;
struct strbuf buf = STRBUF_INIT;
- if (!fakeent) {
- /*
- * Create a "fake" alternate object database that
- * points to our own object database, to make it
- * easier to get a temporary working space in
- * alt->name/alt->base while iterating over the
- * object databases including our own.
- */
- fakeent = alloc_alt_odb(get_object_directory());
- }
- fakeent->next = the_repository->objects->alt_odb_list;
-
- for (odb = fakeent; odb && !ds->ambiguous; odb = odb->next) {
+ for (odb = the_repository->objects->odb;
+ odb && !ds->ambiguous;
+ odb = odb->next) {
int pos;
if (!odb->loose_objects_subdir_seen[subdir_nr]) {
--
2.19.1.1577.g2c5b293d4f
* Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir
2018-11-12 14:50 ` [PATCH 6/9] sha1-file: use an object_directory for " Jeff King
@ 2018-11-12 15:48 ` Derrick Stolee
2018-11-12 16:09 ` Jeff King
2018-11-12 18:48 ` Stefan Beller
0 siblings, 2 replies; 99+ messages in thread
From: Derrick Stolee @ 2018-11-12 15:48 UTC (permalink / raw)
To: Jeff King, Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
On 11/12/2018 9:50 AM, Jeff King wrote:
> Our handling of alternate object directories is needlessly different
> from the main object directory. As a result, many places in the code
> basically look like this:
>
> do_something(r->objects->objdir);
>
> for (odb = r->objects->alt_odb_list; odb; odb = odb->next)
> do_something(odb->path);
>
> That gets annoying when do_something() is non-trivial, and we've
> resorted to gross hacks like creating fake alternates (see
> find_short_object_filename()).
>
> Instead, let's give each raw_object_store a unified list of
> object_directory structs. The first will be the main store, and
> everything after is an alternate. Very few callers even care about the
> distinction, and can just loop over the whole list (and those who care
> can just treat the first element differently).
>
> A few observations:
>
> - we don't need r->objects->objectdir anymore, and can just
> mechanically convert that to r->objects->odb->path
>
> - object_directory's path field needs to become a real pointer rather
> than a FLEX_ARRAY, in order to fill it with expand_base_dir()
>
> - we'll call prepare_alt_odb() earlier in many functions (i.e.,
> outside of the loop). This may result in us calling it even when our
> function would be satisfied looking only at the main odb.
>
> But this doesn't matter in practice. It's not a very expensive
> operation in the first place, and in the majority of cases it will
> be a noop. We call it already (and cache its results) in
> prepare_packed_git(), and we'll generally check packs before loose
> objects. So essentially every program is going to call it
> immediately once per program.
>
> Arguably we should just prepare_alt_odb() immediately upon setting
> up the repository's object directory, which would save us sprinkling
> calls throughout the code base (and forgetting to do so has been a
> source of subtle bugs in the past). But I've stopped short of that
> here, since there are already a lot of other moving parts in this
> patch.
>
> - Most call sites just get shorter. The check_and_freshen() functions
> are an exception, because they have entry points to handle local and
> nonlocal directories separately.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> If the "the first one is the main store, the rest are alternates" bit is
> too subtle, we could mark each "struct object_directory" with a bit for
> "is_local".
This is probably a good thing to do proactively. We have the equivalent
in the packed_git struct, but that's also because they get out of order.
At the moment, I can't think of a read-only action that needs to treat
the local object directory more carefully. The closest I know about is
'git pack-objects --local', but that also writes a pack-file.
I assume that when we write a pack-file to the "default location" we use
get_object_directory() instead of referring to the default object_directory?
>
> builtin/fsck.c | 21 ++-------
> builtin/grep.c | 2 +-
> commit-graph.c | 5 +-
> environment.c | 4 +-
> object-store.h | 27 ++++++-----
> object.c | 19 ++++----
> packfile.c | 10 ++--
> path.c | 2 +-
> repository.c | 8 +++-
> sha1-file.c | 122 ++++++++++++++++++-------------------------------
> sha1-name.c | 17 ++-----
> 11 files changed, 90 insertions(+), 147 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 55153cf92a..15338bd178 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -725,13 +725,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
> for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
> } else {
> - struct object_directory *alt_odb_list;
> -
> - fsck_object_dir(get_object_directory());
> -
> prepare_alt_odb(the_repository);
> - alt_odb_list = the_repository->objects->alt_odb_list;
> - for (odb = alt_odb_list; odb; odb = odb->next)
> + for (odb = the_repository->objects->odb; odb; odb = odb->next)
> fsck_object_dir(odb->path);
>
> if (check_full) {
> @@ -834,13 +829,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
> const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL };
>
> - commit_graph_verify.argv = verify_argv;
> - commit_graph_verify.git_cmd = 1;
> - if (run_command(&commit_graph_verify))
> - errors_found |= ERROR_COMMIT_GRAPH;
> -
> prepare_alt_odb(the_repository);
> - for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
> + for (odb = the_repository->objects->odb; odb; odb = odb->next) {
> child_process_init(&commit_graph_verify);
> commit_graph_verify.argv = verify_argv;
> commit_graph_verify.git_cmd = 1;
> @@ -855,13 +845,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
> struct child_process midx_verify = CHILD_PROCESS_INIT;
> const char *midx_argv[] = { "multi-pack-index", "verify", NULL, NULL, NULL };
>
> - midx_verify.argv = midx_argv;
> - midx_verify.git_cmd = 1;
> - if (run_command(&midx_verify))
> - errors_found |= ERROR_COMMIT_GRAPH;
> -
> prepare_alt_odb(the_repository);
> - for (odb = the_repository->objects->alt_odb_list; odb; odb = odb->next) {
> + for (odb = the_repository->objects->odb; odb; odb = odb->next) {
> child_process_init(&midx_verify);
> midx_verify.argv = midx_argv;
> midx_verify.git_cmd = 1;
> diff --git a/builtin/grep.c b/builtin/grep.c
> index d8508ddf79..714c8d91ba 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -441,7 +441,7 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject,
> * object.
> */
> grep_read_lock();
> - add_to_alternates_memory(submodule.objects->objectdir);
> + add_to_alternates_memory(submodule.objects->odb->path);
> grep_read_unlock();
>
> if (oid) {
> diff --git a/commit-graph.c b/commit-graph.c
> index 5dd3f5b15c..99163c244b 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -231,7 +231,6 @@ static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
> static int prepare_commit_graph(struct repository *r)
> {
> struct object_directory *odb;
> - char *obj_dir;
> int config_value;
>
> if (r->objects->commit_graph_attempted)
> @@ -252,10 +251,8 @@ static int prepare_commit_graph(struct repository *r)
> if (!commit_graph_compatible(r))
> return 0;
>
> - obj_dir = r->objects->objectdir;
> - prepare_commit_graph_one(r, obj_dir);
> prepare_alt_odb(r);
> - for (odb = r->objects->alt_odb_list;
> + for (odb = r->objects->odb;
> !r->objects->commit_graph && odb;
> odb = odb->next)
> prepare_commit_graph_one(r, odb->path);
> diff --git a/environment.c b/environment.c
> index 3f3c8746c2..441ce56690 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -274,9 +274,9 @@ const char *get_git_work_tree(void)
>
> char *get_object_directory(void)
> {
> - if (!the_repository->objects->objectdir)
> + if (!the_repository->objects->odb)
> BUG("git environment hasn't been setup");
> - return the_repository->objects->objectdir;
> + return the_repository->objects->odb->path;
> }
>
> int odb_mkstemp(struct strbuf *temp_filename, const char *pattern)
> diff --git a/object-store.h b/object-store.h
> index b2fa0d0df0..30faf7b391 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -24,19 +24,14 @@ struct object_directory {
> * Path to the alternative object store. If this is a relative path,
> * it is relative to the current working directory.
> */
> - char path[FLEX_ARRAY];
> + char *path;
> };
> +
> void prepare_alt_odb(struct repository *r);
> char *compute_alternate_path(const char *path, struct strbuf *err);
> typedef int alt_odb_fn(struct object_directory *, void *);
> int foreach_alt_odb(alt_odb_fn, void*);
>
> -/*
> - * Allocate a "struct alternate_object_database" but do _not_ actually
> - * add it to the list of alternates.
> - */
> -struct object_directory *alloc_alt_odb(const char *dir);
> -
> /*
> * Add the directory to the on-disk alternates file; the new entry will also
> * take effect in the current process.
> @@ -80,17 +75,21 @@ struct multi_pack_index;
>
> struct raw_object_store {
> /*
> - * Path to the repository's object store.
> - * Cannot be NULL after initialization.
> + * Set of all object directories; the main directory is first (and
> + * cannot be NULL after initialization). Subsequent directories are
> + * alternates.
> */
> - char *objectdir;
> + struct object_directory *odb;
> + struct object_directory **odb_tail;
> + int loaded_alternates;
>
> - /* Path to extra alternate object database if not NULL */
> + /*
> + * A list of alternate object directories loaded from the environment;
> + * this should not generally need to be accessed directly, but will
> + * populate the "odb" list when prepare_alt_odb() is run.
> + */
> char *alternate_db;
>
> - struct object_directory *alt_odb_list;
> - struct object_directory **alt_odb_tail;
> -
> /*
> * Objects that should be substituted by other objects
> * (see git-replace(1)).
> diff --git a/object.c b/object.c
> index dd485ac629..79d636091c 100644
> --- a/object.c
> +++ b/object.c
> @@ -482,26 +482,26 @@ struct raw_object_store *raw_object_store_new(void)
> return o;
> }
>
> -static void free_alt_odb(struct object_directory *odb)
> +static void free_object_directory(struct object_directory *odb)
> {
> + free(odb->path);
> oid_array_clear(&odb->loose_objects_cache);
> free(odb);
> }
>
> -static void free_alt_odbs(struct raw_object_store *o)
> +static void free_object_directories(struct raw_object_store *o)
> {
> - while (o->alt_odb_list) {
> + while (o->odb) {
> struct object_directory *next;
>
> - next = o->alt_odb_list->next;
> - free_alt_odb(o->alt_odb_list);
> - o->alt_odb_list = next;
> + next = o->odb->next;
> + free_object_directory(o->odb);
> + o->odb = next;
> }
> }
>
> void raw_object_store_clear(struct raw_object_store *o)
> {
> - FREE_AND_NULL(o->objectdir);
> FREE_AND_NULL(o->alternate_db);
>
> oidmap_free(o->replace_map, 1);
> @@ -511,8 +511,9 @@ void raw_object_store_clear(struct raw_object_store *o)
> o->commit_graph = NULL;
> o->commit_graph_attempted = 0;
>
> - free_alt_odbs(o);
> - o->alt_odb_tail = NULL;
> + free_object_directories(o);
> + o->odb_tail = NULL;
> + o->loaded_alternates = 0;
>
> INIT_LIST_HEAD(&o->packed_git_mru);
> close_all_packs(o);
> diff --git a/packfile.c b/packfile.c
> index d6d511cfd2..1eda33247f 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -970,12 +970,12 @@ static void prepare_packed_git(struct repository *r)
>
> if (r->objects->packed_git_initialized)
> return;
> - prepare_multi_pack_index_one(r, r->objects->objectdir, 1);
> - prepare_packed_git_one(r, r->objects->objectdir, 1);
> +
> prepare_alt_odb(r);
> - for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> - prepare_multi_pack_index_one(r, odb->path, 0);
> - prepare_packed_git_one(r, odb->path, 0);
> + for (odb = r->objects->odb; odb; odb = odb->next) {
> + int local = (odb == r->objects->odb);
Here seems to be a place where `odb->is_local` would help.
> + prepare_multi_pack_index_one(r, odb->path, local);
> + prepare_packed_git_one(r, odb->path, local);
> }
> rearrange_packed_git(r);
>
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir
2018-11-12 15:48 ` Derrick Stolee
@ 2018-11-12 16:09 ` Jeff King
2018-11-12 19:04 ` Stefan Beller
2018-11-12 18:48 ` Stefan Beller
1 sibling, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 16:09 UTC (permalink / raw)
To: Derrick Stolee
Cc: Geert Jansen, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, René Scharfe, Takuto Ikuta
On Mon, Nov 12, 2018 at 10:48:36AM -0500, Derrick Stolee wrote:
> > If the "the first one is the main store, the rest are alternates" bit is
> > too subtle, we could mark each "struct object_directory" with a bit for
> > "is_local".
>
> This is probably a good thing to do proactively. We have the equivalent in
> the packed_git struct, but that's also because they get out of order. At the
> moment, I can't think of a read-only action that needs to treat the local
> object directory more carefully. The closest I know about is 'git
> pack-objects --local', but that also writes a pack-file.
>
> I assume that when we write a pack-file to the "default location" we use
> get_object_directory() instead of referring to the default object_directory?
Generally, yes, though that should eventually be going away in favor of
accessing it via a "struct repository". And after my series,
get_object_directory() is just returning the_repository->objects->odb->path
(i.e., using the "first one is main" rule).
One thing that makes me nervous about a "local" flag in each struct is
that it implies that it's the source of truth for where to write to. So
what does git_object_directory() look like after that? Do we leave it
with the "first one is main" rule? Or does it become:
	for (odb = the_repository->objects->odb; odb; odb = odb->next) {
		if (odb->local)
			return odb->path;
	}
	return NULL; /* yikes? */
? That feels like it's making things more complicated, not less.
> > diff --git a/packfile.c b/packfile.c
> > index d6d511cfd2..1eda33247f 100644
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -970,12 +970,12 @@ static void prepare_packed_git(struct repository *r)
> > if (r->objects->packed_git_initialized)
> > return;
> > - prepare_multi_pack_index_one(r, r->objects->objectdir, 1);
> > - prepare_packed_git_one(r, r->objects->objectdir, 1);
> > +
> > prepare_alt_odb(r);
> > - for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> > - prepare_multi_pack_index_one(r, odb->path, 0);
> > - prepare_packed_git_one(r, odb->path, 0);
> > + for (odb = r->objects->odb; odb; odb = odb->next) {
> > + int local = (odb == r->objects->odb);
>
> Here seems to be a place where `odb->is_local` would help.
Yes, though I don't mind this spot in particular, as the check is pretty
straight-forward.
I think an example that would benefit more is the check_and_freshen()
stuff. There we have two almost-the-same wrappers, one of which operates
on just the first element of the list, and the other of which operates
on all of the elements after the first.
It could become:
static int check_and_freshen_odb(struct object_directory *odb_list,
				 const struct object_id *oid,
				 int freshen,
				 int local)
{
	struct object_directory *odb;

	for (odb = odb_list; odb; odb = odb->next) {
		static struct strbuf path = STRBUF_INIT;

		if (odb->local != local)
			continue;

		odb_loose_path(odb, &path, oid->hash);
		if (check_and_freshen_file(path.buf, freshen))
			return 1;
	}
	return 0;
}

int check_and_freshen_local(const struct object_id *oid, int freshen)
{
	return check_and_freshen_odb(the_repository->objects->odb, oid,
				     freshen, 1);
}

int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
{
	return check_and_freshen_odb(the_repository->objects->odb, oid,
				     freshen, 0);
}
I'm not sure that is a big improvement over the patch we're replying to,
though.
-Peff
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir
2018-11-12 16:09 ` Jeff King
@ 2018-11-12 19:04 ` Stefan Beller
2018-11-22 17:42 ` Jeff King
0 siblings, 1 reply; 99+ messages in thread
From: Stefan Beller @ 2018-11-12 19:04 UTC (permalink / raw)
To: Jeff King
Cc: Derrick Stolee, gerardu, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, René Scharfe, tikuta
On Mon, Nov 12, 2018 at 8:09 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Nov 12, 2018 at 10:48:36AM -0500, Derrick Stolee wrote:
>
> > > If the "the first one is the main store, the rest are alternates" bit is
> > > too subtle, we could mark each "struct object_directory" with a bit for
> > > "is_local".
> >
> > This is probably a good thing to do proactively. We have the equivalent in
> > the packed_git struct, but that's also because they get out of order. At the
> > moment, I can't think of a read-only action that needs to treat the local
> > object directory more carefully. The closest I know about is 'git
> > pack-objects --local', but that also writes a pack-file.
> >
> > I assume that when we write a pack-file to the "default location" we use
> > get_object_directory() instead of referring to the default object_directory?
>
> Generally, yes, though that should eventually be going away in favor of
> accessing it via a "struct repository". And after my series,
> get_object_directory() is just returning the_repository->objects->odb->path
> (i.e., using the "first one is main" rule).
>
> One thing that makes me nervous about a "local" flag in each struct is
> that it implies that it's the source of truth for where to write to. So
> what does git_object_directory() look like after that? Do we leave it
> with the "first one is main" rule? Or does it become:
s/git/get/ ;-) get_object_directory is very old and was introduced in
e1b10391ea (Use git config file for committer name and email info,
2005-10-11) by Linus.
I would argue that we might want to get rid of that function now,
actually, as it doesn't seem to add value to the code (assuming the
BUG never triggers), and using a_repo->objects->objectdir
(or, after this series, a_repo->objects->odb->path) is just as short.
$ git grep get_object_directory |wc -l
30
$ git grep -- "->objects->objectdir" |wc -l
10
Ah well, we're not there yet.
> 	for (odb = the_repository->objects->odb; odb; odb = odb->next) {
> 		if (odb->local)
> 			return odb->path;
> 	}
> 	return NULL; /* yikes? */
>
> ? That feels like it's making things more complicated, not less.
It depends on whether the caller cares about the local flag.
I'd think we can have more than one local, eventually?
Just think of the partial clone stuff that may have a local
set of promised stuff and another set of actual objects,
which may be stored in different local odbs.
If the caller cares about the distinction, they would need
to write out this loop as above themselves.
If they don't care, we could migrate them to not
use this function, so we can get rid of it?
> > > - for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> > > - prepare_multi_pack_index_one(r, odb->path, 0);
> > > - prepare_packed_git_one(r, odb->path, 0);
> > > + for (odb = r->objects->odb; odb; odb = odb->next) {
> > > + int local = (odb == r->objects->odb);
> >
> > Here seems to be a place where `odb->is_local` would help.
>
> Yes, though I don't mind this spot in particular, as the check is pretty
> straight-forward.
>
> I think an example that would benefit more is the check_and_freshen()
> stuff. There we have two almost-the-same wrappers, one of which operates
> on just the first element of the list, and the other of which operates
> on all of the elements after the first.
>
> It could become:
>
> static int check_and_freshen_odb(struct object_directory *odb_list,
> 				 const struct object_id *oid,
> 				 int freshen,
> 				 int local)
> {
> 	struct object_directory *odb;
>
> 	for (odb = odb_list; odb; odb = odb->next) {
> 		static struct strbuf path = STRBUF_INIT;
>
> 		if (odb->local != local)
> 			continue;
>
> 		odb_loose_path(odb, &path, oid->hash);
> 		if (check_and_freshen_file(path.buf, freshen))
> 			return 1;
> 	}
> 	return 0;
> }
>
> int check_and_freshen_local(const struct object_id *oid, int freshen)
> {
> 	return check_and_freshen_odb(the_repository->objects->odb, oid,
> 				     freshen, 1);
> }
>
> int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
> {
> 	return check_and_freshen_odb(the_repository->objects->odb, oid,
> 				     freshen, 0);
> }
>
I am fine with (a maybe better documented) "first is local" rule, but
the code above looks intriguing, except that it is a little wasteful
(we need two full loops in check_and_freshen(), when ideally we
could do it with just one loop).
What does the local flag mean anyway in a world where we
have many odbs in a repository that are not distinguishable
except by their order? AFAICT it is actually to be used for differentiating
how much we care in fsck/cat-file/packing, as an object may be borrowed
from an alternate, so maybe the flag should rather be named
after ownership and not so much after locality?
(I think "borrowed" or "owned" or even just "important"
or "external" or "alternate" may work)
Stefan
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir
2018-11-12 19:04 ` Stefan Beller
@ 2018-11-22 17:42 ` Jeff King
0 siblings, 0 replies; 99+ messages in thread
From: Jeff King @ 2018-11-22 17:42 UTC (permalink / raw)
To: Stefan Beller
Cc: Derrick Stolee, gerardu, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, René Scharfe, tikuta
On Mon, Nov 12, 2018 at 11:04:52AM -0800, Stefan Beller wrote:
> > 	for (odb = the_repository->objects->odb; odb; odb = odb->next) {
> > 		if (odb->local)
> > 			return odb->path;
> > 	}
> > 	return NULL; /* yikes? */
> >
> > ? That feels like it's making things more complicated, not less.
>
> It depends if the caller cares about the local flag.
>
> I'd think we can have more than one local, eventually?
> Just think of the partial clone stuff that may have a local
> set of promised stuff and another set of actual objects,
> which may be stored in different local odbs.
Yeah, but I think the definition of "local" gets very tricky there, and
we'll have to think about what it means. So I'd actually prefer to punt
on doing anything too clever at this point.
> If the caller cares about the distinction, they would need
> to write out this loop as above themselves.
> If they don't care, we could migrate them to not
> use this function, so we can get rid of it?
Yes, I do think in the long run we'd want to get rid of most calls to
get_object_directory(). Not only because it uses the_repository, but
because most callers should be asking for a specific action: I want to
write an object, or I want to read an object.
-Peff
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir
2018-11-12 15:48 ` Derrick Stolee
2018-11-12 16:09 ` Jeff King
@ 2018-11-12 18:48 ` Stefan Beller
1 sibling, 0 replies; 99+ messages in thread
From: Stefan Beller @ 2018-11-12 18:48 UTC (permalink / raw)
To: Derrick Stolee
Cc: Jeff King, gerardu, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, René Scharfe, tikuta
On Mon, Nov 12, 2018 at 7:48 AM Derrick Stolee <stolee@gmail.com> wrote:
>
[... lots of quoted text...]
Some email readers are very good at recognizing unchanged quoted
text and collapsing it; not so the one at
https://public-inbox.org/git/421d3b43-3425-72c9-218e-facd86e28267@gmail.com/
which I use to read through this series. It would help if you'd cut most
of the (con)text that is not near your reply, as I read the context
email just before your reply.
Thanks,
Stefan
^ permalink raw reply [flat|nested] 99+ messages in thread
* [PATCH 7/9] object-store: provide helpers for loose_objects_cache
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
` (5 preceding siblings ...)
2018-11-12 14:50 ` [PATCH 6/9] sha1-file: use an object_directory for " Jeff King
@ 2018-11-12 14:50 ` Jeff King
2018-11-12 19:24 ` René Scharfe
2018-11-12 14:54 ` [PATCH 8/9] sha1-file: use loose object cache for quick existence check Jeff King
` (2 subsequent siblings)
9 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 14:50 UTC (permalink / raw)
To: Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
Our object_directory struct has a loose objects cache that all users of
the struct can see. But the only one that knows how to load the cache is
find_short_object_filename(). Let's extract that logic into a reusable
function.
While we're at it, let's also reset the cache when we re-read the object
directories. This shouldn't have an impact on performance, as re-reads
are meant to be rare (and are already expensive, so we avoid them with
things like OBJECT_INFO_QUICK).
Since the cache is already meant to be an approximation, it's tempting
to skip even this bit of safety. But it's necessary to allow more code
to use it. For instance, fetch-pack explicitly re-reads the object
directory after performing its fetch, and would be confused if we didn't
clear the cache.
Signed-off-by: Jeff King <peff@peff.net>
---
object-store.h | 18 +++++++++++++-----
packfile.c | 8 ++++++++
sha1-file.c | 26 ++++++++++++++++++++++++++
sha1-name.c | 21 +--------------------
4 files changed, 48 insertions(+), 25 deletions(-)
diff --git a/object-store.h b/object-store.h
index 30faf7b391..bf1e0cb761 100644
--- a/object-store.h
+++ b/object-store.h
@@ -11,11 +11,12 @@ struct object_directory {
struct object_directory *next;
/*
- * Used to store the results of readdir(3) calls when searching
- * for unique abbreviated hashes. This cache is never
- * invalidated, thus it's racy and not necessarily accurate.
- * That's fine for its purpose; don't use it for tasks requiring
- * greater accuracy!
+ * Used to store the results of readdir(3) calls when we are OK
+ * sacrificing accuracy due to races for speed. That includes
+ * our search for unique abbreviated hashes. Don't use it for tasks
+ * requiring greater accuracy!
+ *
+ * Be sure to call odb_load_loose_cache() before using.
*/
char loose_objects_subdir_seen[256];
struct oid_array loose_objects_cache;
@@ -45,6 +46,13 @@ void add_to_alternates_file(const char *dir);
*/
void add_to_alternates_memory(const char *dir);
+/*
+ * Populate an odb's loose object cache for one particular subdirectory (i.e.,
+ * the one that corresponds to the first byte of objects you're interested in,
+ * from 0 to 255 inclusive).
+ */
+void odb_load_loose_cache(struct object_directory *odb, int subdir_nr);
+
struct packed_git {
struct packed_git *next;
struct list_head mru;
diff --git a/packfile.c b/packfile.c
index 1eda33247f..91fd40efb0 100644
--- a/packfile.c
+++ b/packfile.c
@@ -987,6 +987,14 @@ static void prepare_packed_git(struct repository *r)
void reprepare_packed_git(struct repository *r)
{
+ struct object_directory *odb;
+
+ for (odb = r->objects->odb; odb; odb = odb->next) {
+ oid_array_clear(&odb->loose_objects_cache);
+ memset(&odb->loose_objects_subdir_seen, 0,
+ sizeof(odb->loose_objects_subdir_seen));
+ }
+
r->objects->approximate_object_count_valid = 0;
r->objects->packed_git_initialized = 0;
prepare_packed_git(r);
diff --git a/sha1-file.c b/sha1-file.c
index 503262edd2..4aae716a37 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2125,6 +2125,32 @@ int for_each_loose_object(each_loose_object_fn cb, void *data,
return 0;
}
+static int append_loose_object(const struct object_id *oid, const char *path,
+ void *data)
+{
+ oid_array_append(data, oid);
+ return 0;
+}
+
+void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
+{
+ struct strbuf buf = STRBUF_INIT;
+
+ if (subdir_nr < 0 ||
+ subdir_nr >= ARRAY_SIZE(odb->loose_objects_subdir_seen))
+ BUG("subdir_nr out of range");
+
+ if (odb->loose_objects_subdir_seen[subdir_nr])
+ return;
+
+ strbuf_addstr(&buf, odb->path);
+ for_each_file_in_obj_subdir(subdir_nr, &buf,
+ append_loose_object,
+ NULL, NULL,
+ &odb->loose_objects_cache);
+ odb->loose_objects_subdir_seen[subdir_nr] = 1;
+}
+
static int check_stream_sha1(git_zstream *stream,
const char *hdr,
unsigned long size,
diff --git a/sha1-name.c b/sha1-name.c
index 358ca5e288..b24502811b 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -83,36 +83,19 @@ static void update_candidates(struct disambiguate_state *ds, const struct object
/* otherwise, current can be discarded and candidate is still good */
}
-static int append_loose_object(const struct object_id *oid, const char *path,
- void *data)
-{
- oid_array_append(data, oid);
- return 0;
-}
-
static int match_sha(unsigned, const unsigned char *, const unsigned char *);
static void find_short_object_filename(struct disambiguate_state *ds)
{
int subdir_nr = ds->bin_pfx.hash[0];
struct object_directory *odb;
- struct strbuf buf = STRBUF_INIT;
for (odb = the_repository->objects->odb;
odb && !ds->ambiguous;
odb = odb->next) {
int pos;
- if (!odb->loose_objects_subdir_seen[subdir_nr]) {
- strbuf_reset(&buf);
- strbuf_addstr(&buf, odb->path);
- for_each_file_in_obj_subdir(subdir_nr, &buf,
- append_loose_object,
- NULL, NULL,
- &odb->loose_objects_cache);
- odb->loose_objects_subdir_seen[subdir_nr] = 1;
- }
-
+ odb_load_loose_cache(odb, subdir_nr);
pos = oid_array_lookup(&odb->loose_objects_cache, &ds->bin_pfx);
if (pos < 0)
pos = -1 - pos;
@@ -125,8 +108,6 @@ static void find_short_object_filename(struct disambiguate_state *ds)
pos++;
}
}
-
- strbuf_release(&buf);
}
static int match_sha(unsigned len, const unsigned char *a, const unsigned char *b)
--
2.19.1.1577.g2c5b293d4f
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH 7/9] object-store: provide helpers for loose_objects_cache
2018-11-12 14:50 ` [PATCH 7/9] object-store: provide helpers for loose_objects_cache Jeff King
@ 2018-11-12 19:24 ` René Scharfe
2018-11-12 20:16 ` Jeff King
0 siblings, 1 reply; 99+ messages in thread
From: René Scharfe @ 2018-11-12 19:24 UTC (permalink / raw)
To: Jeff King, Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
Takuto Ikuta
On 12.11.2018 at 15:50, Jeff King wrote:
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -2125,6 +2125,32 @@ int for_each_loose_object(each_loose_object_fn cb, void *data,
> return 0;
> }
>
> +static int append_loose_object(const struct object_id *oid, const char *path,
> + void *data)
> +{
> + oid_array_append(data, oid);
> + return 0;
> +}
> +
> +void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
> +{
> + struct strbuf buf = STRBUF_INIT;
> +
> + if (subdir_nr < 0 ||
Why not make subdir_nr unsigned (like in for_each_file_in_obj_subdir()), and
get rid of this first check?
> + subdir_nr >= ARRAY_SIZE(odb->loose_objects_subdir_seen))
Using unsigned char for subdir_nr would allow removing the second check as
well, but might hide invalid values in implicit conversions, I guess.
> + BUG("subdir_nr out of range");
Showing the invalid value (like in for_each_file_in_obj_subdir()) would make
debugging easier in case the impossible actually happens.
> +
> + if (odb->loose_objects_subdir_seen[subdir_nr])
> + return;
> +
> + strbuf_addstr(&buf, odb->path);
> + for_each_file_in_obj_subdir(subdir_nr, &buf,
> + append_loose_object,
> + NULL, NULL,
> + &odb->loose_objects_cache);
> + odb->loose_objects_subdir_seen[subdir_nr] = 1;
About here would be the ideal new home for ...
> +}
> +
> static int check_stream_sha1(git_zstream *stream,
> const char *hdr,
> unsigned long size,
> diff --git a/sha1-name.c b/sha1-name.c
> index 358ca5e288..b24502811b 100644
> --- a/sha1-name.c
> +++ b/sha1-name.c
> @@ -83,36 +83,19 @@ static void update_candidates(struct disambiguate_state *ds, const struct object
> /* otherwise, current can be discarded and candidate is still good */
> }
>
> -static int append_loose_object(const struct object_id *oid, const char *path,
> - void *data)
> -{
> - oid_array_append(data, oid);
> - return 0;
> -}
> -
> static int match_sha(unsigned, const unsigned char *, const unsigned char *);
>
> static void find_short_object_filename(struct disambiguate_state *ds)
> {
> int subdir_nr = ds->bin_pfx.hash[0];
> struct object_directory *odb;
> - struct strbuf buf = STRBUF_INIT;
>
> for (odb = the_repository->objects->odb;
> odb && !ds->ambiguous;
> odb = odb->next) {
> int pos;
>
> - if (!odb->loose_objects_subdir_seen[subdir_nr]) {
> - strbuf_reset(&buf);
> - strbuf_addstr(&buf, odb->path);
> - for_each_file_in_obj_subdir(subdir_nr, &buf,
> - append_loose_object,
> - NULL, NULL,
> - &odb->loose_objects_cache);
> - odb->loose_objects_subdir_seen[subdir_nr] = 1;
> - }
> -
> + odb_load_loose_cache(odb, subdir_nr);
> pos = oid_array_lookup(&odb->loose_objects_cache, &ds->bin_pfx);
> if (pos < 0)
> pos = -1 - pos;
> @@ -125,8 +108,6 @@ static void find_short_object_filename(struct disambiguate_state *ds)
> pos++;
> }
> }
> -
> - strbuf_release(&buf);
... this line.
> }
>
> static int match_sha(unsigned len, const unsigned char *a, const unsigned char *b)
>
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 7/9] object-store: provide helpers for loose_objects_cache
2018-11-12 19:24 ` René Scharfe
@ 2018-11-12 20:16 ` Jeff King
0 siblings, 0 replies; 99+ messages in thread
From: Jeff King @ 2018-11-12 20:16 UTC (permalink / raw)
To: René Scharfe
Cc: Geert Jansen, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, Takuto Ikuta
On Mon, Nov 12, 2018 at 08:24:59PM +0100, René Scharfe wrote:
> > +void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
> > +{
> > + struct strbuf buf = STRBUF_INIT;
> > +
> > + if (subdir_nr < 0 ||
>
> Why not make subdir_nr unsigned (like in for_each_file_in_obj_subdir()), and
> get rid of this first check?
I stole the use of "int" from your code. ;)
More seriously, though, I wondered if callers might have sign issues
assigning from a "signed char". Usually we hold object ids in an
"unsigned char", but what happens if I do:
	signed char foo[] = { 1, 2, 3, 4 };
	odb_load_loose_cache(foo[0]);
when the parameter is "unsigned"?
I'll admit I get lost in all of the integer promotion rules there, but
are we sure there's no way we can end up with a funky truncation?
If the answer is no, then I agree that your suggestion is a strict
improvement.
> > + subdir_nr >= ARRAY_SIZE(odb->loose_objects_subdir_seen))
>
> Using unsigned char for subdir_nr would allow removing the second check as
> well, but might hide invalid values in implicit conversions, I guess.
Yeah, I know that one could be a dangerous truncation.
I also considered just taking an object_id, which would make the
function "load the cache such that this oid would be valid". And it's
not necessarily the caller's business how much we load.
That's OK for OBJECT_INFO_QUICK, but it's pretty darn subtle for the
abbrev code. That code doesn't care about just one object, but wants all
objects that share its prefix. That works now because we know that the
prefix is always at least 2 hex chars, so it's OK to load just that
subset.
> > + BUG("subdir_nr out of range");
>
> Showing the invalid value (like in for_each_file_in_obj_subdir()) would make
> debugging easier in case the impossible actually happens.
Good suggestion.
> > + strbuf_addstr(&buf, odb->path);
> > + for_each_file_in_obj_subdir(subdir_nr, &buf,
> > + append_loose_object,
> > + NULL, NULL,
> > + &odb->loose_objects_cache);
> > + odb->loose_objects_subdir_seen[subdir_nr] = 1;
>
> About here would be the ideal new home for ...
> [...]
> > -
> > - strbuf_release(&buf);
>
> ... this line.
Oops, thanks. I toyed with making the strbuf here static, which is why I
dropped the release. But since we only use it on a cache miss, I decided
it was better to avoid the hidden global (and then of course forgot to
re-add the release).
-Peff
* [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
` (6 preceding siblings ...)
2018-11-12 14:50 ` [PATCH 7/9] object-store: provide helpers for loose_objects_cache Jeff King
@ 2018-11-12 14:54 ` Jeff King
2018-11-12 16:00 ` Derrick Stolee
` (2 more replies)
2018-11-12 14:55 ` [PATCH 9/9] fetch-pack: drop custom loose object cache Jeff King
2018-11-12 16:02 ` [PATCH 0/9] caching loose objects Derrick Stolee
9 siblings, 3 replies; 99+ messages in thread
From: Jeff King @ 2018-11-12 14:54 UTC (permalink / raw)
To: Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
In cases where we expect to ask has_sha1_file() about a lot of objects
that we are not likely to have (e.g., during fetch negotiation), we
already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
a simultaneous write or repack) for speed (we avoid re-scanning the pack
directory).
However, even checking for loose objects can be expensive, as we will
stat() each one. On many systems this cost isn't too noticeable, but
stat() can be particularly slow on some operating systems, or due to
network filesystems.
Since the QUICK flag already tells us that we're OK with a slightly
stale answer, we can use that as a cue to look in our in-memory cache of
each object directory. That basically trades an in-memory binary search
for a stat() call.
Note that it is possible for this to actually be _slower_. We'll do a
full readdir() to fill the cache, so if you have a very large number of
loose objects and a very small number of lookups, that readdir() may end
up more expensive.
This shouldn't be a big deal in practice. If you have a large number of
reachable loose objects, you'll already run into performance problems
(which you should remedy by repacking). You may have unreachable objects
which wouldn't otherwise impact performance. Usually these would go away
with the prune step of "git gc", but they may be held for up to 2 weeks
in the default configuration.
So it comes down to how many such objects you might reasonably expect to
have, and how much slower readdir() on N entries is versus M stat() calls
(and here we really care about the syscall backing readdir(), like
getdents() on Linux, but I'll just call this readdir() below).
If N is much smaller than M (a typical packed repo), we know this is a
big win (few readdirs() followed by many uses of the resulting cache).
When N and M are similar in size, it's also a win. We care about the
latency of making a syscall, and readdir() should be giving us many
values in a single call. How many?
On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
entries per call (which is 64 bytes per entry; the name itself is 38
bytes, plus there are some other fields). So we can imagine that this is
always a win as long as the number of loose objects in the repository is
a factor of 500 less than the number of lookups you make. It's hard to
auto-tune this because we don't generally know up front how many lookups
we're going to do. But it's unlikely for this to perform significantly
worse.
Signed-off-by: Jeff King <peff@peff.net>
---
There's some obvious hand-waving in the paragraphs above. I would love
it if somebody with an NFS system could do some before/after timings
with various numbers of loose objects, to get a sense of where the
breakeven point is.
My gut is that we do not need the complexity of a cache-size limit, nor
of a config option to disable this. But it would be nice to have a real
number where "reasonable" ends and "pathological" begins. :)
object-store.h | 1 +
sha1-file.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/object-store.h b/object-store.h
index bf1e0cb761..60758efad8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -13,6 +13,7 @@ struct object_directory {
/*
* Used to store the results of readdir(3) calls when we are OK
* sacrificing accuracy due to races for speed. That includes
+ * object existence with OBJECT_INFO_QUICK, as well as
* our search for unique abbreviated hashes. Don't use it for tasks
* requiring greater accuracy!
*
diff --git a/sha1-file.c b/sha1-file.c
index 4aae716a37..e53da0b701 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -921,6 +921,24 @@ static int open_sha1_file(struct repository *r,
return -1;
}
+static int quick_has_loose(struct repository *r,
+ const unsigned char *sha1)
+{
+ int subdir_nr = sha1[0];
+ struct object_id oid;
+ struct object_directory *odb;
+
+ hashcpy(oid.hash, sha1);
+
+ prepare_alt_odb(r);
+ for (odb = r->objects->odb; odb; odb = odb->next) {
+ odb_load_loose_cache(odb, subdir_nr);
+ if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
+ return 1;
+ }
+ return 0;
+}
+
/*
* Map the loose object at "path" if it is not NULL, or the path found by
* searching for a loose object named "sha1".
@@ -1171,6 +1189,8 @@ static int sha1_loose_object_info(struct repository *r,
if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
const char *path;
struct stat st;
+ if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
+ return quick_has_loose(r, sha1) ? 0 : -1;
if (stat_sha1_file(r, sha1, &st, &path) < 0)
return -1;
if (oi->disk_sizep)
--
2.19.1.1577.g2c5b293d4f
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-12 14:54 ` [PATCH 8/9] sha1-file: use loose object cache for quick existence check Jeff King
@ 2018-11-12 16:00 ` Derrick Stolee
2018-11-12 16:01 ` Ævar Arnfjörð Bjarmason
2018-11-27 20:48 ` René Scharfe
2 siblings, 0 replies; 99+ messages in thread
From: Derrick Stolee @ 2018-11-12 16:00 UTC (permalink / raw)
To: Jeff King, Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
On 11/12/2018 9:54 AM, Jeff King wrote:
> In cases where we expect to ask has_sha1_file() about a lot of objects
> that we are not likely to have (e.g., during fetch negotiation), we
> already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
> a simultaneous write or repack) for speed (we avoid re-scanning the pack
> directory).
>
> However, even checking for loose objects can be expensive, as we will
> stat() each one. On many systems this cost isn't too noticeable, but
> stat() can be particularly slow on some operating systems, or due to
> network filesystems.
>
> Since the QUICK flag already tells us that we're OK with a slightly
> stale answer, we can use that as a cue to look in our in-memory cache of
> each object directory. That basically trades an in-memory binary search
> for a stat() call.
>
> Note that it is possible for this to actually be _slower_. We'll do a
> full readdir() to fill the cache, so if you have a very large number of
> loose objects and a very small number of lookups, that readdir() may end
> up more expensive.
>
> This shouldn't be a big deal in practice. If you have a large number of
> reachable loose objects, you'll already run into performance problems
> (which you should remedy by repacking). You may have unreachable objects
> which wouldn't otherwise impact performance. Usually these would go away
> with the prune step of "git gc", but they may be held for up to 2 weeks
> in the default configuration.
>
> So it comes down to how many such objects you might reasonably expect to
> have, how much slower is readdir() on N entries versus M stat() calls
> (and here we really care about the syscall backing readdir(), like
> getdents() on Linux, but I'll just call this readdir() below).
>
> If N is much smaller than M (a typical packed repo), we know this is a
> big win (few readdirs() followed by many uses of the resulting cache).
> When N and M are similar in size, it's also a win. We care about the
> latency of making a syscall, and readdir() should be giving us many
> values in a single call. How many?
>
> On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
> entries per call (which is 64 bytes per entry; the name itself is 38
> bytes, plus there are some other fields). So we can imagine that this is
> always a win as long as the number of loose objects in the repository is
> a factor of 500 less than the number of lookups you make. It's hard to
> auto-tune this because we don't generally know up front how many lookups
> we're going to do. But it's unlikely for this to perform significantly
> worse.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> There's some obvious hand-waving in the paragraphs above. I would love
> it if somebody with an NFS system could do some before/after timings
> with various numbers of loose objects, to get a sense of where the
> breakeven point is.
>
> My gut is that we do not need the complexity of a cache-size limit, nor
> of a config option to disable this. But it would be nice to have a real
> number where "reasonable" ends and "pathological" begins. :)
I'm interested in such numbers, but do not have the appropriate setup to
test.
I think the tradeoffs you mention above are reasonable. There's also
some chance that this isn't "extra" work but is just "earlier" work, as
the abbreviation code would load these loose object directories.
>
> object-store.h | 1 +
> sha1-file.c | 20 ++++++++++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/object-store.h b/object-store.h
> index bf1e0cb761..60758efad8 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -13,6 +13,7 @@ struct object_directory {
> /*
> * Used to store the results of readdir(3) calls when we are OK
> * sacrificing accuracy due to races for speed. That includes
> + * object existence with OBJECT_INFO_QUICK, as well as
> * our search for unique abbreviated hashes. Don't use it for tasks
> * requiring greater accuracy!
> *
> diff --git a/sha1-file.c b/sha1-file.c
> index 4aae716a37..e53da0b701 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -921,6 +921,24 @@ static int open_sha1_file(struct repository *r,
> return -1;
> }
>
> +static int quick_has_loose(struct repository *r,
> + const unsigned char *sha1)
> +{
> + int subdir_nr = sha1[0];
> + struct object_id oid;
> + struct object_directory *odb;
> +
> + hashcpy(oid.hash, sha1);
> +
> + prepare_alt_odb(r);
> + for (odb = r->objects->odb; odb; odb = odb->next) {
> + odb_load_loose_cache(odb, subdir_nr);
> + if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
> + return 1;
> + }
> + return 0;
> +}
> +
> /*
> * Map the loose object at "path" if it is not NULL, or the path found by
> * searching for a loose object named "sha1".
> @@ -1171,6 +1189,8 @@ static int sha1_loose_object_info(struct repository *r,
> if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
> const char *path;
> struct stat st;
> + if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
> + return quick_has_loose(r, sha1) ? 0 : -1;
> if (stat_sha1_file(r, sha1, &st, &path) < 0)
> return -1;
> if (oi->disk_sizep)
LGTM.
Thanks,
-Stolee
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-12 14:54 ` [PATCH 8/9] sha1-file: use loose object cache for quick existence check Jeff King
2018-11-12 16:00 ` Derrick Stolee
@ 2018-11-12 16:01 ` Ævar Arnfjörð Bjarmason
2018-11-12 16:21 ` Jeff King
2018-11-27 20:48 ` René Scharfe
2 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-12 16:01 UTC (permalink / raw)
To: Jeff King
Cc: Geert Jansen, Junio C Hamano, git, René Scharfe, Takuto Ikuta
On Mon, Nov 12 2018, Jeff King wrote:
> In cases where we expect to ask has_sha1_file() about a lot of objects
> that we are not likely to have (e.g., during fetch negotiation), we
> already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
> a simultaneous write or repack) for speed (we avoid re-scanning the pack
> directory).
>
> However, even checking for loose objects can be expensive, as we will
> stat() each one. On many systems this cost isn't too noticeable, but
> stat() can be particularly slow on some operating systems, or due to
> network filesystems.
>
> Since the QUICK flag already tells us that we're OK with a slightly
> stale answer, we can use that as a cue to look in our in-memory cache of
> each object directory. That basically trades an in-memory binary search
> for a stat() call.
>
> Note that it is possible for this to actually be _slower_. We'll do a
> full readdir() to fill the cache, so if you have a very large number of
> loose objects and a very small number of lookups, that readdir() may end
> up more expensive.
>
> This shouldn't be a big deal in practice. If you have a large number of
> reachable loose objects, you'll already run into performance problems
> (which you should remedy by repacking). You may have unreachable objects
> which wouldn't otherwise impact performance. Usually these would go away
> with the prune step of "git gc", but they may be held for up to 2 weeks
> in the default configuration.
>
> So it comes down to how many such objects you might reasonably expect to
> have, how much slower is readdir() on N entries versus M stat() calls
> (and here we really care about the syscall backing readdir(), like
> getdents() on Linux, but I'll just call this readdir() below).
>
> If N is much smaller than M (a typical packed repo), we know this is a
> big win (few readdirs() followed by many uses of the resulting cache).
> When N and M are similar in size, it's also a win. We care about the
> latency of making a syscall, and readdir() should be giving us many
> values in a single call. How many?
>
> On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
> entries per call (which is 64 bytes per entry; the name itself is 38
> bytes, plus there are some other fields). So we can imagine that this is
> always a win as long as the number of loose objects in the repository is
> a factor of 500 less than the number of lookups you make. It's hard to
> auto-tune this because we don't generally know up front how many lookups
> we're going to do. But it's unlikely for this to perform significantly
> worse.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> There's some obvious hand-waving in the paragraphs above. I would love
> it if somebody with an NFS system could do some before/after timings
> with various numbers of loose objects, to get a sense of where the
> breakeven point is.
>
> My gut is that we do not need the complexity of a cache-size limit, nor
> of a config option to disable this. But it would be nice to have a real
> number where "reasonable" ends and "pathological" begins. :)
I'm happy to test this on some of the NFS we have locally, and started
out with a plan to write some for-loop using the low-level API (so it
would look up all 256), fake populate .git/objects/?? with N number of
objects etc, but ran out of time.
Do you have something ready that you think would be representative and I
could just run? If not I'll try to pick this up again...
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-12 16:01 ` Ævar Arnfjörð Bjarmason
@ 2018-11-12 16:21 ` Jeff King
2018-11-12 22:18 ` Ævar Arnfjörð Bjarmason
2018-11-12 22:44 ` Geert Jansen
0 siblings, 2 replies; 99+ messages in thread
From: Jeff King @ 2018-11-12 16:21 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Geert Jansen, Junio C Hamano, git, René Scharfe, Takuto Ikuta
On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > There's some obvious hand-waving in the paragraphs above. I would love
> > it if somebody with an NFS system could do some before/after timings
> > with various numbers of loose objects, to get a sense of where the
> > breakeven point is.
> >
> > My gut is that we do not need the complexity of a cache-size limit, nor
> > of a config option to disable this. But it would be nice to have a real
> > number where "reasonable" ends and "pathological" begins. :)
>
> I'm happy to test this on some of the NFS we have locally, and started
> out with a plan to write some for-loop using the low-level API (so it
> would look up all 256), fake populate .git/objects/?? with N number of
> objects etc, but ran out of time.
>
> Do you have something ready that you think would be representative and I
> could just run? If not I'll try to pick this up again...
No, but they don't even really need to be actual objects. So I suspect
something like:
git init
for i in $(seq 256); do
i=$(printf %02x $i)
mkdir -p .git/objects/$i
for j in $(seq --format=%038g 1000); do
echo foo >.git/objects/$i/$j
done
done
git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
might work (for various values of 1000). The shell loop would probably
be faster as perl, too. :)
Make sure you clear the object directory between runs, though (otherwise
the subsequent index-pack's really do find collisions and spend time
accessing the objects).
If you want real objects, you could probably just dump a bunch of
sequential blobs to fast-import, and then pipe the result to
unpack-objects.
-Peff
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-12 16:21 ` Jeff King
@ 2018-11-12 22:18 ` Ævar Arnfjörð Bjarmason
2018-11-12 22:30 ` Ævar Arnfjörð Bjarmason
2018-11-12 22:44 ` Geert Jansen
1 sibling, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-12 22:18 UTC (permalink / raw)
To: Jeff King
Cc: Geert Jansen, Junio C Hamano, git, René Scharfe, Takuto Ikuta
On Mon, Nov 12 2018, Jeff King wrote:
> On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> > There's some obvious hand-waving in the paragraphs above. I would love
>> > it if somebody with an NFS system could do some before/after timings
>> > with various numbers of loose objects, to get a sense of where the
>> > breakeven point is.
>> >
>> > My gut is that we do not need the complexity of a cache-size limit, nor
>> > of a config option to disable this. But it would be nice to have a real
>> > number where "reasonable" ends and "pathological" begins. :)
>>
>> I'm happy to test this on some of the NFS we have locally, and started
>> out with a plan to write some for-loop using the low-level API (so it
>> would look up all 256), fake populate .git/objects/?? with N number of
>> objects etc, but ran out of time.
>>
>> Do you have something ready that you think would be representative and I
>> could just run? If not I'll try to pick this up again...
>
> No, but they don't even really need to be actual objects. So I suspect
> something like:
>
> git init
> for i in $(seq 256); do
> i=$(printf %02x $i)
> mkdir -p .git/objects/$i
> for j in $(seq --format=%038g 1000); do
> echo foo >.git/objects/$i/$j
> done
> done
> git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>
> might work (for various values of 1000). The shell loop would probably
> be faster as perl, too. :)
>
> Make sure you clear the object directory between runs, though (otherwise
> the subsequent index-pack's really do find collisions and spend time
> accessing the objects).
>
> If you want real objects, you could probably just dump a bunch of
> sequential blobs to fast-import, and then pipe the result to
> unpack-objects.
>
> -Peff
I did a very ad-hoc test against a NetApp filer using the test script
quoted at the end of this E-Mail. The test compared origin/master, this
branch of yours, and my core.checkCollisions=false branch.
When run with DBD-mysql.git (just some random ~1k commit repo I had):
$ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run origin/master peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
I get:
Test origin/master peff/jk/loose-cache avar/check-collisions-config
------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects 4.31(0.55+0.18) 0.41(0.40+0.02) -90.5% 0.23(0.36+0.01) -94.7%
0008.3: index-pack with 256*10 loose objects 4.37(0.45+0.21) 0.45(0.40+0.02) -89.7% 0.25(0.38+0.01) -94.3%
0008.4: index-pack with 256*100 loose objects 4.47(0.53+0.23) 0.67(0.63+0.02) -85.0% 0.24(0.38+0.01) -94.6%
0008.5: index-pack with 256*250 loose objects 5.01(0.67+0.30) 1.04(0.98+0.06) -79.2% 0.24(0.37+0.01) -95.2%
0008.6: index-pack with 256*500 loose objects 5.11(0.57+0.21) 1.81(1.70+0.09) -64.6% 0.25(0.38+0.01) -95.1%
0008.7: index-pack with 256*750 loose objects 5.12(0.60+0.22) 2.54(2.38+0.14) -50.4% 0.24(0.38+0.01) -95.3%
0008.8: index-pack with 256*1000 loose objects 4.52(0.52+0.21) 3.36(3.17+0.17) -25.7% 0.23(0.36+0.01) -94.9%
I then hacked it to test against git.git, but skipped origin/master for
that one because it takes *ages*. So just mine vs. yours:
$ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
[...]
Test peff/jk/loose-cache avar/check-collisions-config
---------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects 12.57(28.72+0.61) 12.68(29.36+0.62) +0.9%
0008.3: index-pack with 256*10 loose objects 12.77(28.75+0.61) 12.50(28.88+0.56) -2.1%
0008.4: index-pack with 256*100 loose objects 13.20(29.49+0.66) 12.38(28.58+0.60) -6.2%
0008.5: index-pack with 256*250 loose objects 14.10(30.59+0.64) 12.54(28.22+0.57) -11.1%
0008.6: index-pack with 256*500 loose objects 14.48(31.06+0.74) 12.43(28.59+0.60) -14.2%
0008.7: index-pack with 256*750 loose objects 15.31(31.91+0.74) 12.67(29.23+0.64) -17.2%
0008.8: index-pack with 256*1000 loose objects 16.34(32.84+0.76) 13.11(30.19+0.68) -19.8%
So not much of a practical difference perhaps. But then again this isn't
a very realistic test case of anything. Rarely are you going to push a
history of something the size of git.git into a repo with this many
loose objects.
Using sha1collisiondetection.git is I think the most realistic scenario,
i.e. you'll often end up fetching/pushing something roughly the size of
its entire history on a big repo, and with it:
Test peff/jk/loose-cache avar/check-collisions-config
---------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects 0.16(0.04+0.01) 0.05(0.03+0.00) -68.8%
0008.3: index-pack with 256*10 loose objects 0.19(0.04+0.02) 0.05(0.02+0.00) -73.7%
0008.4: index-pack with 256*100 loose objects 0.32(0.17+0.02) 0.04(0.02+0.00) -87.5%
0008.5: index-pack with 256*250 loose objects 0.57(0.41+0.03) 0.04(0.02+0.00) -93.0%
0008.6: index-pack with 256*500 loose objects 1.02(0.83+0.06) 0.04(0.03+0.00) -96.1%
0008.7: index-pack with 256*750 loose objects 1.47(1.24+0.10) 0.04(0.02+0.00) -97.3%
0008.8: index-pack with 256*1000 loose objects 1.94(1.70+0.10) 0.04(0.02+0.00) -97.9%
As noted in previous threads I have an in-house monorepo where (due to
expiry policies) loose objects hover around the 256*250 mark.
The script is below; it's hacky as hell and takes shortcuts to avoid
re-creating the huge fake loose object collection every time (that takes
ages). Perhaps you're interested in incorporating some version of this
into a v2. To be useful it should take some target path as an env
variable.
$ cat t/perf/p0008-index-pack.sh
#!/bin/sh
test_description="Tests performance of index-pack with loose objects"
. ./perf-lib.sh
test_perf_fresh_repo
test_expect_success 'setup tests' '
for count in 1 10 100 250 500 750 1000
do
if test -d /mnt/ontap_githackers/repo-$count.git
then
rm -rf /mnt/ontap_githackers/repo-$count.git/objects/pack
else
git init --bare /mnt/ontap_githackers/repo-$count.git &&
(
cd /mnt/ontap_githackers/repo-$count.git &&
for i in $(seq 0 255)
do
i=$(printf %02x $i) &&
mkdir objects/$i &&
for j in $(seq --format=%038g $count)
do
>objects/$i/$j
done
done
)
fi
done
'
for count in 1 10 100 250 500 750 1000
do
echo 3 | sudo tee /proc/sys/vm/drop_caches
test_perf "index-pack with 256*$count loose objects" "
(
cd /mnt/ontap_githackers/repo-$count.git &&
rm -fv objects/pack/*;
git -c core.checkCollisions=false index-pack -v --stdin </home/aearnfjord/g/DBD-mysql/.git/objects/pack/pack-*.pack
)
"
done
test_done
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-12 22:18 ` Ævar Arnfjörð Bjarmason
@ 2018-11-12 22:30 ` Ævar Arnfjörð Bjarmason
2018-11-13 10:02 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-12 22:30 UTC (permalink / raw)
To: Jeff King
Cc: Geert Jansen, Junio C Hamano, git, René Scharfe, Takuto Ikuta
On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Nov 12 2018, Jeff King wrote:
>
>> On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>>> > There's some obvious hand-waving in the paragraphs above. I would love
>>> > it if somebody with an NFS system could do some before/after timings
>>> > with various numbers of loose objects, to get a sense of where the
>>> > breakeven point is.
>>> >
>>> > My gut is that we do not need the complexity of a cache-size limit, nor
>>> > of a config option to disable this. But it would be nice to have a real
>>> > number where "reasonable" ends and "pathological" begins. :)
>>>
>>> I'm happy to test this on some of the NFS we have locally, and started
>>> out with a plan to write some for-loop using the low-level API (so it
>>> would look up all 256), fake populate .git/objects/?? with N number of
>>> objects etc, but ran out of time.
>>>
>>> Do you have something ready that you think would be representative and I
>>> could just run? If not I'll try to pick this up again...
>>
>> No, but they don't even really need to be actual objects. So I suspect
>> something like:
>>
>> git init
>> for i in $(seq 256); do
>> i=$(printf %02x $i)
>> mkdir -p .git/objects/$i
>> for j in $(seq --format=%038g 1000); do
>> echo foo >.git/objects/$i/$j
>> done
>> done
>> git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>>
>> might work (for various values of 1000). The shell loop would probably
>> be faster as perl, too. :)
>>
>> Make sure you clear the object directory between runs, though (otherwise
>> the subsequent index-pack's really do find collisions and spend time
>> accessing the objects).
>>
>> If you want real objects, you could probably just dump a bunch of
>> sequential blobs to fast-import, and then pipe the result to
>> unpack-objects.
>>
>> -Peff
>
> I did a very ad-hoc test against a NetApp filer using the test script
> quoted at the end of this E-Mail. The test compared origin/master, this
> branch of yours, and my core.checkCollisions=false branch.
>
> When run with DBD-mysql.git (just some random ~1k commit repo I had):
>
> $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run origin/master peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>
> I get:
>
> Test origin/master peff/jk/loose-cache avar/check-collisions-config
> ------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects 4.31(0.55+0.18) 0.41(0.40+0.02) -90.5% 0.23(0.36+0.01) -94.7%
> 0008.3: index-pack with 256*10 loose objects 4.37(0.45+0.21) 0.45(0.40+0.02) -89.7% 0.25(0.38+0.01) -94.3%
> 0008.4: index-pack with 256*100 loose objects 4.47(0.53+0.23) 0.67(0.63+0.02) -85.0% 0.24(0.38+0.01) -94.6%
> 0008.5: index-pack with 256*250 loose objects 5.01(0.67+0.30) 1.04(0.98+0.06) -79.2% 0.24(0.37+0.01) -95.2%
> 0008.6: index-pack with 256*500 loose objects 5.11(0.57+0.21) 1.81(1.70+0.09) -64.6% 0.25(0.38+0.01) -95.1%
> 0008.7: index-pack with 256*750 loose objects 5.12(0.60+0.22) 2.54(2.38+0.14) -50.4% 0.24(0.38+0.01) -95.3%
> 0008.8: index-pack with 256*1000 loose objects 4.52(0.52+0.21) 3.36(3.17+0.17) -25.7% 0.23(0.36+0.01) -94.9%
>
> I then hacked it to test against git.git, but skipped origin/master for
> that one because it takes *ages*. So just mine v.s. yours:
>
> $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
> [...]
> Test peff/jk/loose-cache avar/check-collisions-config
> ---------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects 12.57(28.72+0.61) 12.68(29.36+0.62) +0.9%
> 0008.3: index-pack with 256*10 loose objects 12.77(28.75+0.61) 12.50(28.88+0.56) -2.1%
> 0008.4: index-pack with 256*100 loose objects 13.20(29.49+0.66) 12.38(28.58+0.60) -6.2%
> 0008.5: index-pack with 256*250 loose objects 14.10(30.59+0.64) 12.54(28.22+0.57) -11.1%
> 0008.6: index-pack with 256*500 loose objects 14.48(31.06+0.74) 12.43(28.59+0.60) -14.2%
> 0008.7: index-pack with 256*750 loose objects 15.31(31.91+0.74) 12.67(29.23+0.64) -17.2%
> 0008.8: index-pack with 256*1000 loose objects 16.34(32.84+0.76) 13.11(30.19+0.68) -19.8%
>
> So not much of a practical difference perhaps. But then again this isn't
> a very realistic test case of anything. Rarely are you going to push a
> history of something the size of git.git into a repo with this many
> loose objects.
>
> Using sha1collisiondetection.git is I think the most realistic scenario,
> i.e. you'll often end up fetching/pushing something roughly the size of
> its entire history on a big repo, and with it:
>
> Test peff/jk/loose-cache avar/check-collisions-config
> ---------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects 0.16(0.04+0.01) 0.05(0.03+0.00) -68.8%
> 0008.3: index-pack with 256*10 loose objects 0.19(0.04+0.02) 0.05(0.02+0.00) -73.7%
> 0008.4: index-pack with 256*100 loose objects 0.32(0.17+0.02) 0.04(0.02+0.00) -87.5%
> 0008.5: index-pack with 256*250 loose objects 0.57(0.41+0.03) 0.04(0.02+0.00) -93.0%
> 0008.6: index-pack with 256*500 loose objects 1.02(0.83+0.06) 0.04(0.03+0.00) -96.1%
> 0008.7: index-pack with 256*750 loose objects 1.47(1.24+0.10) 0.04(0.02+0.00) -97.3%
> 0008.8: index-pack with 256*1000 loose objects 1.94(1.70+0.10) 0.04(0.02+0.00) -97.9%
>
> As noted in previous threads I have an in-house monorepo where (due to
> expiry policies) loose objects hover around the 256*250 mark.
>
> The script, which is hacky as hell and takes shortcuts not to re-create
> the huge fake loose object collection every time (takes ages). Perhaps
> you're interested in incorporating some version of this into a v2. To be
> useful it should take some target path as an env variable.
I forgot perhaps the most useful metric. Testing against origin/master
too on the sha1collisiondetection.git repo, which as noted above I think
is a good stand-in for making a medium sized push to a big repo. This
shows when the loose cache becomes counterproductive:

Test origin/master peff/jk/loose-cache avar/check-collisions-config
-------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects 0.42(0.04+0.03) 0.17(0.04+0.00) -59.5% 0.04(0.03+0.00) -90.5%
0008.3: index-pack with 256*10 loose objects 0.49(0.04+0.03) 0.19(0.04+0.01) -61.2% 0.04(0.02+0.00) -91.8%
0008.4: index-pack with 256*100 loose objects 0.49(0.04+0.04) 0.33(0.18+0.01) -32.7% 0.05(0.02+0.00) -89.8%
0008.5: index-pack with 256*250 loose objects 0.54(0.03+0.04) 0.59(0.43+0.02) +9.3% 0.04(0.02+0.01) -92.6%
0008.6: index-pack with 256*500 loose objects 0.49(0.04+0.03) 1.04(0.83+0.07) +112.2% 0.04(0.02+0.00) -91.8%
0008.7: index-pack with 256*750 loose objects 0.56(0.04+0.05) 1.50(1.28+0.08) +167.9% 0.04(0.02+0.00) -92.9%
0008.8: index-pack with 256*1000 loose objects 0.54(0.05+0.03) 1.95(1.68+0.13) +261.1% 0.04(0.02+0.00) -92.6%

I still think it's best to take this patch series, since it's unlikely
we're making anything worse in practice; the >50k objects case is a
really high number, which I don't think is worth worrying about.

But I am somewhat paranoid about the potential performance
regression. I.e. this is me testing against a really expensive and
relatively well-performing NetApp NFS device where the ping stats are:

rtt min/avg/max/mdev = 0.155/0.396/1.387/0.349 ms

So I suspect this might get a lot worse for setups which don't enjoy the
same performance or network locality.
> $ cat t/perf/p0008-index-pack.sh
> #!/bin/sh
>
> test_description="Tests performance of index-pack with loose objects"
>
> . ./perf-lib.sh
>
> test_perf_fresh_repo
>
> test_expect_success 'setup tests' '
> 	for count in 1 10 100 250 500 750 1000
> 	do
> 		if test -d /mnt/ontap_githackers/repo-$count.git
> 		then
> 			rm -rf /mnt/ontap_githackers/repo-$count.git/objects/pack
> 		else
> 			git init --bare /mnt/ontap_githackers/repo-$count.git &&
> 			(
> 				cd /mnt/ontap_githackers/repo-$count.git &&
> 				for i in $(seq 0 255)
> 				do
> 					i=$(printf %02x $i) &&
> 					mkdir objects/$i &&
> 					for j in $(seq --format=%038g $count)
> 					do
> 						>objects/$i/$j
> 					done
> 				done
> 			)
> 		fi
> 	done
> '
>
> for count in 1 10 100 250 500 750 1000
> do
> 	echo 3 | sudo tee /proc/sys/vm/drop_caches
> 	test_perf "index-pack with 256*$count loose objects" "
> 		(
> 			cd /mnt/ontap_githackers/repo-$count.git &&
> 			rm -fv objects/pack/*;
> 			git -c core.checkCollisions=false index-pack -v --stdin </home/aearnfjord/g/DBD-mysql/.git/objects/pack/pack-*.pack
> 		)
> 	"
> done
>
> test_done
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-12 22:30 ` Ævar Arnfjörð Bjarmason
@ 2018-11-13 10:02 ` Ævar Arnfjörð Bjarmason
2018-11-14 18:21 ` René Scharfe
2018-12-02 10:52 ` René Scharfe
0 siblings, 2 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-13 10:02 UTC (permalink / raw)
To: Jeff King
Cc: Geert Jansen, Junio C Hamano, git, René Scharfe, Takuto Ikuta
On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Nov 12 2018, Jeff King wrote:
>>
>>> On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>>
>>>> > There's some obvious hand-waving in the paragraphs above. I would love
>>>> > it if somebody with an NFS system could do some before/after timings
>>>> > with various numbers of loose objects, to get a sense of where the
>>>> > breakeven point is.
>>>> >
>>>> > My gut is that we do not need the complexity of a cache-size limit, nor
>>>> > of a config option to disable this. But it would be nice to have a real
>>>> > number where "reasonable" ends and "pathological" begins. :)
>>>>
>>>> I'm happy to test this on some of the NFS we have locally, and started
>>>> out with a plan to write some for-loop using the low-level API (so it
>>>> would look up all 256), fake populate .git/objects/?? with N number of
>>>> objects etc, but ran out of time.
>>>>
>>>> Do you have something ready that you think would be representative and I
>>>> could just run? If not I'll try to pick this up again...
>>>
>>> No, but they don't even really need to be actual objects. So I suspect
>>> something like:
>>>
>>> git init
>>> for i in $(seq 256); do
>>> 	i=$(printf %02x $i)
>>> 	mkdir -p .git/objects/$i
>>> 	for j in $(seq --format=%038g 1000); do
>>> 		echo foo >.git/objects/$i/$j
>>> 	done
>>> done
>>> git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>>>
>>> might work (for various values of 1000). The shell loop would probably
>>> be faster as perl, too. :)
>>>
>>> Make sure you clear the object directory between runs, though (otherwise
>>> the subsequent index-pack's really do find collisions and spend time
>>> accessing the objects).
>>>
>>> If you want real objects, you could probably just dump a bunch of
>>> sequential blobs to fast-import, and then pipe the result to
>>> unpack-objects.
>>>
>>> -Peff
>>
>> I did a very ad-hoc test against a NetApp filer using the test script
>> quoted at the end of this E-Mail. The test compared origin/master, this
>> branch of yours, and my core.checkCollisions=false branch.
>>
>> When run with DBD-mysql.git (just some random ~1k commit repo I had):
>>
>> $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run origin/master peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>>
>> I get:
>>
>> Test origin/master peff/jk/loose-cache avar/check-collisions-config
>> ------------------------------------------------------------------------------------------------------------------------
>> 0008.2: index-pack with 256*1 loose objects 4.31(0.55+0.18) 0.41(0.40+0.02) -90.5% 0.23(0.36+0.01) -94.7%
>> 0008.3: index-pack with 256*10 loose objects 4.37(0.45+0.21) 0.45(0.40+0.02) -89.7% 0.25(0.38+0.01) -94.3%
>> 0008.4: index-pack with 256*100 loose objects 4.47(0.53+0.23) 0.67(0.63+0.02) -85.0% 0.24(0.38+0.01) -94.6%
>> 0008.5: index-pack with 256*250 loose objects 5.01(0.67+0.30) 1.04(0.98+0.06) -79.2% 0.24(0.37+0.01) -95.2%
>> 0008.6: index-pack with 256*500 loose objects 5.11(0.57+0.21) 1.81(1.70+0.09) -64.6% 0.25(0.38+0.01) -95.1%
>> 0008.7: index-pack with 256*750 loose objects 5.12(0.60+0.22) 2.54(2.38+0.14) -50.4% 0.24(0.38+0.01) -95.3%
>> 0008.8: index-pack with 256*1000 loose objects 4.52(0.52+0.21) 3.36(3.17+0.17) -25.7% 0.23(0.36+0.01) -94.9%
>>
>> I then hacked it to test against git.git, but skipped origin/master for
>> that one because it takes *ages*. So just mine v.s. yours:
>>
>> $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>> [...]
>> Test peff/jk/loose-cache avar/check-collisions-config
>> ---------------------------------------------------------------------------------------------------
>> 0008.2: index-pack with 256*1 loose objects 12.57(28.72+0.61) 12.68(29.36+0.62) +0.9%
>> 0008.3: index-pack with 256*10 loose objects 12.77(28.75+0.61) 12.50(28.88+0.56) -2.1%
>> 0008.4: index-pack with 256*100 loose objects 13.20(29.49+0.66) 12.38(28.58+0.60) -6.2%
>> 0008.5: index-pack with 256*250 loose objects 14.10(30.59+0.64) 12.54(28.22+0.57) -11.1%
>> 0008.6: index-pack with 256*500 loose objects 14.48(31.06+0.74) 12.43(28.59+0.60) -14.2%
>> 0008.7: index-pack with 256*750 loose objects 15.31(31.91+0.74) 12.67(29.23+0.64) -17.2%
>> 0008.8: index-pack with 256*1000 loose objects 16.34(32.84+0.76) 13.11(30.19+0.68) -19.8%
>>
>> So not much of a practical difference perhaps. But then again this isn't
>> a very realistic test case of anything. Rarely are you going to push a
>> history of something the size of git.git into a repo with this many
>> loose objects.
>>
>> Using sha1collisiondetection.git is I think the most realistic scenario,
>> i.e. you'll often end up fetching/pushing something roughly the size of
>> its entire history on a big repo, and with it:
>>
>> Test peff/jk/loose-cache avar/check-collisions-config
>> ---------------------------------------------------------------------------------------------------
>> 0008.2: index-pack with 256*1 loose objects 0.16(0.04+0.01) 0.05(0.03+0.00) -68.8%
>> 0008.3: index-pack with 256*10 loose objects 0.19(0.04+0.02) 0.05(0.02+0.00) -73.7%
>> 0008.4: index-pack with 256*100 loose objects 0.32(0.17+0.02) 0.04(0.02+0.00) -87.5%
>> 0008.5: index-pack with 256*250 loose objects 0.57(0.41+0.03) 0.04(0.02+0.00) -93.0%
>> 0008.6: index-pack with 256*500 loose objects 1.02(0.83+0.06) 0.04(0.03+0.00) -96.1%
>> 0008.7: index-pack with 256*750 loose objects 1.47(1.24+0.10) 0.04(0.02+0.00) -97.3%
>> 0008.8: index-pack with 256*1000 loose objects 1.94(1.70+0.10) 0.04(0.02+0.00) -97.9%
>>
>> As noted in previous threads I have an in-house monorepo where (due to
>> expiry policies) loose objects hover around the 256*250 mark.
>>
>> The script, which is hacky as hell and takes shortcuts not to re-create
>> the huge fake loose object collection every time (takes ages). Perhaps
>> you're interested in incorporating some version of this into a v2. To be
>> useful it should take some target path as an env variable.
>
> I forgot perhaps the most useful metric. Testing against origin/master
> too on the sha1collisiondetection.git repo, which as noted above I think
> is a good stand-in for making a medium sized push to a big repo. This
> shows when the loose cache becomes counterproductive:
>
> Test origin/master peff/jk/loose-cache avar/check-collisions-config
> -------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects 0.42(0.04+0.03) 0.17(0.04+0.00) -59.5% 0.04(0.03+0.00) -90.5%
> 0008.3: index-pack with 256*10 loose objects 0.49(0.04+0.03) 0.19(0.04+0.01) -61.2% 0.04(0.02+0.00) -91.8%
> 0008.4: index-pack with 256*100 loose objects 0.49(0.04+0.04) 0.33(0.18+0.01) -32.7% 0.05(0.02+0.00) -89.8%
> 0008.5: index-pack with 256*250 loose objects 0.54(0.03+0.04) 0.59(0.43+0.02) +9.3% 0.04(0.02+0.01) -92.6%
> 0008.6: index-pack with 256*500 loose objects 0.49(0.04+0.03) 1.04(0.83+0.07) +112.2% 0.04(0.02+0.00) -91.8%
> 0008.7: index-pack with 256*750 loose objects 0.56(0.04+0.05) 1.50(1.28+0.08) +167.9% 0.04(0.02+0.00) -92.9%
> 0008.8: index-pack with 256*1000 loose objects 0.54(0.05+0.03) 1.95(1.68+0.13) +261.1% 0.04(0.02+0.00) -92.6%
>
> I still think it's best to take this patch series since it's unlikely
> we're making anything worse in practice, the >50k objects case is a
> really high number, which I don't think is worth worrying about.
>
> But I am somewhat paranoid about the potential performance
> regression. I.e. this is me testing against a really expensive and
> relatively well performing NetApp NFS device where the ping stats are:
>
> rtt min/avg/max/mdev = 0.155/0.396/1.387/0.349 ms
>
> So I suspect this might get a lot worse for setups which don't enjoy the
> same performance or network locality.

I tried this with the same filer mounted from another DC with ~10x the
RTT:

rtt min/avg/max/mdev = 11.553/11.618/11.739/0.121 ms

But otherwise the same setup (same machine type/specs mounting it). It
had the opposite result from what I was expecting:

Test origin/master peff/jk/loose-cache avar/check-collisions-config
------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects 7.78(0.04+0.03) 2.75(0.03+0.01) -64.7% 0.40(0.02+0.00) -94.9%
0008.3: index-pack with 256*10 loose objects 7.75(0.04+0.04) 2.77(0.05+0.01) -64.3% 0.40(0.02+0.00) -94.8%
0008.4: index-pack with 256*100 loose objects 7.75(0.05+0.02) 2.91(0.18+0.01) -62.5% 0.40(0.02+0.00) -94.8%
0008.5: index-pack with 256*250 loose objects 7.73(0.04+0.04) 3.19(0.43+0.02) -58.7% 0.40(0.02+0.00) -94.8%
0008.6: index-pack with 256*500 loose objects 7.73(0.04+0.04) 3.64(0.83+0.05) -52.9% 0.40(0.02+0.00) -94.8%
0008.7: index-pack with 256*750 loose objects 7.73(0.04+0.02) 4.14(1.29+0.07) -46.4% 0.40(0.02+0.00) -94.8%
0008.8: index-pack with 256*1000 loose objects 7.73(0.04+0.03) 4.55(1.72+0.09) -41.1% 0.40(0.02+0.01) -94.8%

I.e. there the cliff of where the cache becomes counterproductive comes
much later, not earlier. The sha1collisiondetection.git repo has 418
objects.

So is it cheaper to fill a huge cache than to look up those 418? I don't
know; I haven't dug into it. But so far this suggests that we're helping
slow FSs to the detriment of faster ones.

So here's the same test not against NFS, but the local ext4 fs (CO7;
Linux 3.10) for sha1collisiondetection.git:

Test origin/master peff/jk/loose-cache avar/check-collisions-config
--------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects 0.02(0.02+0.00) 0.02(0.02+0.01) +0.0% 0.02(0.02+0.00) +0.0%
0008.3: index-pack with 256*10 loose objects 0.02(0.02+0.00) 0.03(0.03+0.00) +50.0% 0.02(0.02+0.00) +0.0%
0008.4: index-pack with 256*100 loose objects 0.02(0.02+0.00) 0.17(0.16+0.01) +750.0% 0.02(0.02+0.00) +0.0%
0008.5: index-pack with 256*250 loose objects 0.02(0.02+0.00) 0.43(0.40+0.03) +2050.0% 0.02(0.02+0.00) +0.0%
0008.6: index-pack with 256*500 loose objects 0.02(0.02+0.00) 0.88(0.80+0.09) +4300.0% 0.02(0.02+0.00) +0.0%
0008.7: index-pack with 256*750 loose objects 0.02(0.02+0.00) 1.35(1.27+0.09) +6650.0% 0.02(0.02+0.00) +0.0%
0008.8: index-pack with 256*1000 loose objects 0.02(0.02+0.00) 1.83(1.70+0.14) +9050.0% 0.02(0.02+0.00) +0.0%

And for mu.git, a ~20k object repo:

Test origin/master peff/jk/loose-cache avar/check-collisions-config
-------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects 0.59(0.91+0.06) 0.58(0.93+0.03) -1.7% 0.57(0.89+0.04) -3.4%
0008.3: index-pack with 256*10 loose objects 0.59(0.91+0.07) 0.59(0.92+0.03) +0.0% 0.57(0.89+0.03) -3.4%
0008.4: index-pack with 256*100 loose objects 0.59(0.91+0.05) 0.81(1.13+0.04) +37.3% 0.58(0.91+0.04) -1.7%
0008.5: index-pack with 256*250 loose objects 0.59(0.91+0.05) 1.23(1.51+0.08) +108.5% 0.58(0.91+0.04) -1.7%
0008.6: index-pack with 256*500 loose objects 0.59(0.90+0.06) 1.96(2.20+0.12) +232.2% 0.58(0.91+0.04) -1.7%
0008.7: index-pack with 256*750 loose objects 0.59(0.92+0.05) 2.72(2.92+0.17) +361.0% 0.58(0.90+0.04) -1.7%
0008.8: index-pack with 256*1000 loose objects 0.59(0.90+0.06) 3.50(3.67+0.21) +493.2% 0.57(0.90+0.04) -3.4%

All of which is to say that I think it definitely makes sense to re-roll
this with a perf test, and a switch to toggle it + docs explaining the
caveats & pointing to the perf test. It's a clear win in some scenarios,
but a big loss in others.
>> $ cat t/perf/p0008-index-pack.sh
>> #!/bin/sh
>>
>> test_description="Tests performance of index-pack with loose objects"
>>
>> . ./perf-lib.sh
>>
>> test_perf_fresh_repo
>>
>> test_expect_success 'setup tests' '
>> 	for count in 1 10 100 250 500 750 1000
>> 	do
>> 		if test -d /mnt/ontap_githackers/repo-$count.git
>> 		then
>> 			rm -rf /mnt/ontap_githackers/repo-$count.git/objects/pack
>> 		else
>> 			git init --bare /mnt/ontap_githackers/repo-$count.git &&
>> 			(
>> 				cd /mnt/ontap_githackers/repo-$count.git &&
>> 				for i in $(seq 0 255)
>> 				do
>> 					i=$(printf %02x $i) &&
>> 					mkdir objects/$i &&
>> 					for j in $(seq --format=%038g $count)
>> 					do
>> 						>objects/$i/$j
>> 					done
>> 				done
>> 			)
>> 		fi
>> 	done
>> '
>>
>> for count in 1 10 100 250 500 750 1000
>> do
>> 	echo 3 | sudo tee /proc/sys/vm/drop_caches
>> 	test_perf "index-pack with 256*$count loose objects" "
>> 		(
>> 			cd /mnt/ontap_githackers/repo-$count.git &&
>> 			rm -fv objects/pack/*;
>> 			git -c core.checkCollisions=false index-pack -v --stdin </home/aearnfjord/g/DBD-mysql/.git/objects/pack/pack-*.pack
>> 		)
>> 	"
>> done
>>
>> test_done
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-13 10:02 ` Ævar Arnfjörð Bjarmason
@ 2018-11-14 18:21 ` René Scharfe
2018-12-02 10:52 ` René Scharfe
1 sibling, 0 replies; 99+ messages in thread
From: René Scharfe @ 2018-11-14 18:21 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, Jeff King
Cc: Geert Jansen, Junio C Hamano, git, Takuto Ikuta
On 13.11.2018 at 11:02, Ævar Arnfjörð Bjarmason wrote:
> So here's the same test not against NFS, but the local ext4 fs (CO7;
> Linux 3.10) for sha1collisiondetection.git:
>
> Test origin/master peff/jk/loose-cache avar/check-collisions-config
> --------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects 0.02(0.02+0.00) 0.02(0.02+0.01) +0.0% 0.02(0.02+0.00) +0.0%
> 0008.3: index-pack with 256*10 loose objects 0.02(0.02+0.00) 0.03(0.03+0.00) +50.0% 0.02(0.02+0.00) +0.0%
> 0008.4: index-pack with 256*100 loose objects 0.02(0.02+0.00) 0.17(0.16+0.01) +750.0% 0.02(0.02+0.00) +0.0%
> 0008.5: index-pack with 256*250 loose objects 0.02(0.02+0.00) 0.43(0.40+0.03) +2050.0% 0.02(0.02+0.00) +0.0%
> 0008.6: index-pack with 256*500 loose objects 0.02(0.02+0.00) 0.88(0.80+0.09) +4300.0% 0.02(0.02+0.00) +0.0%
> 0008.7: index-pack with 256*750 loose objects 0.02(0.02+0.00) 1.35(1.27+0.09) +6650.0% 0.02(0.02+0.00) +0.0%
> 0008.8: index-pack with 256*1000 loose objects 0.02(0.02+0.00) 1.83(1.70+0.14) +9050.0% 0.02(0.02+0.00) +0.0%

Ouch.

> And for mu.git, a ~20k object repo:
>
> Test origin/master peff/jk/loose-cache avar/check-collisions-config
> -------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects 0.59(0.91+0.06) 0.58(0.93+0.03) -1.7% 0.57(0.89+0.04) -3.4%
> 0008.3: index-pack with 256*10 loose objects 0.59(0.91+0.07) 0.59(0.92+0.03) +0.0% 0.57(0.89+0.03) -3.4%
> 0008.4: index-pack with 256*100 loose objects 0.59(0.91+0.05) 0.81(1.13+0.04) +37.3% 0.58(0.91+0.04) -1.7%
> 0008.5: index-pack with 256*250 loose objects 0.59(0.91+0.05) 1.23(1.51+0.08) +108.5% 0.58(0.91+0.04) -1.7%
> 0008.6: index-pack with 256*500 loose objects 0.59(0.90+0.06) 1.96(2.20+0.12) +232.2% 0.58(0.91+0.04) -1.7%
> 0008.7: index-pack with 256*750 loose objects 0.59(0.92+0.05) 2.72(2.92+0.17) +361.0% 0.58(0.90+0.04) -1.7%
> 0008.8: index-pack with 256*1000 loose objects 0.59(0.90+0.06) 3.50(3.67+0.21) +493.2% 0.57(0.90+0.04) -3.4%
>
> All of which is to say that I think it definitely makes sense to re-roll
> this with a perf test, and a switch to toggle it + docs explaining the
> caveats & pointing to the perf test. It's a clear win in some scenarios,
> but a big loss in others.

Right, but can we perhaps find a way to toggle it automatically, like
the special fetch-pack cache tried to do?

So the code needs to decide between using lstat() on individual loose
object files or opendir()+readdir() to load the names in a whole
fan-out directory. Intuitively I'd try to solve it using red tape, by
measuring the duration of both kinds of calls, and then try to find a
heuristic based on those numbers. Is the overhead worth it?
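
That measuring idea could look something like the following sketch (hypothetical, and in Python rather than git's C purely for brevity): time a few negative stat() probes and one readdir() of the same fan-out directory, and fill the cache only when the readdir() is cheaper than the probes it would replace:

```python
import os
import tempfile
import time

def cache_pays_off(fanout_dir, expected_probes, samples=8):
    """Hypothetical heuristic (not in git): compare the measured cost
    of one readdir() of a fan-out directory against the stat() probes
    the loose-object cache would save for this directory."""
    t0 = time.perf_counter()
    for i in range(samples):
        # Negative lookups, like collision checks for new objects;
        # os.path.exists() boils down to a stat() call.
        os.path.exists(os.path.join(fanout_dir, "probe-%d" % i))
    stat_cost = (time.perf_counter() - t0) / samples

    t0 = time.perf_counter()
    os.listdir(fanout_dir)  # one opendir()+readdir() pass
    readdir_cost = time.perf_counter() - t0

    return readdir_cost < stat_cost * expected_probes

# Demo against a throwaway directory with 1000 fake loose objects:
with tempfile.TemporaryDirectory() as d:
    sub = os.path.join(d, "00")
    os.mkdir(sub)
    for j in range(1000):
        open(os.path.join(sub, "%038d" % j), "w").close()
    print(cache_pays_off(sub, expected_probes=2))
```

The obvious catch is the one raised above: the measurement itself costs round trips, so it would only pay off when many lookups follow it.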

But first, may I interest you in some further complication? We can
also use access(2) to check for the existence of files. It doesn't
need to fill in struct stat, so may have a slight advantage if we
don't need any of that information. The following patch is a
replacement for patch 8 and improves performance by ca. 3% with
git.git on an SSD for me; I'm curious to see how it does on NFS:
---
 sha1-file.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index b77dacedc7..5315c37cbc 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -888,8 +888,13 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
 		*path = odb_loose_path(odb, &buf, sha1);
-		if (!lstat(*path, st))
-			return 0;
+		if (st) {
+			if (!lstat(*path, st))
+				return 0;
+		} else {
+			if (!access(*path, F_OK))
+				return 0;
+		}
 	}
 
 	return -1;
@@ -1171,7 +1176,8 @@ static int sha1_loose_object_info(struct repository *r,
 	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
 		const char *path;
 		struct stat st;
-		if (stat_sha1_file(r, sha1, &st, &path) < 0)
+		struct stat *stp = oi->disk_sizep ? &st : NULL;
+		if (stat_sha1_file(r, sha1, stp, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)
 			*oi->disk_sizep = st.st_size;
@@ -1382,7 +1388,6 @@ void *read_object_file_extended(const struct object_id *oid,
 	void *data;
 	const struct packed_git *p;
 	const char *path;
-	struct stat st;
 	const struct object_id *repl = lookup_replace ?
 		lookup_replace_object(the_repository, oid) : oid;
 
@@ -1399,7 +1404,7 @@ void *read_object_file_extended(const struct object_id *oid,
 		die(_("replacement %s not found for %s"),
 		    oid_to_hex(repl), oid_to_hex(oid));
 
-	if (!stat_sha1_file(the_repository, repl->hash, &st, &path))
+	if (!stat_sha1_file(the_repository, repl->hash, NULL, &path))
 		die(_("loose object %s (stored in %s) is corrupt"),
 		    oid_to_hex(repl), path);
--
2.19.1
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-13 10:02 ` Ævar Arnfjörð Bjarmason
2018-11-14 18:21 ` René Scharfe
@ 2018-12-02 10:52 ` René Scharfe
2018-12-03 22:04 ` Jeff King
1 sibling, 1 reply; 99+ messages in thread
From: René Scharfe @ 2018-12-02 10:52 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, Jeff King
Cc: Geert Jansen, Junio C Hamano, git, Takuto Ikuta
On 13.11.2018 at 11:02, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
>>
>>> I get:
>>>
>>> Test origin/master peff/jk/loose-cache avar/check-collisions-config
>>> ------------------------------------------------------------------------------------------------------------------------
>>> 0008.2: index-pack with 256*1 loose objects 4.31(0.55+0.18) 0.41(0.40+0.02) -90.5% 0.23(0.36+0.01) -94.7%
>>> 0008.3: index-pack with 256*10 loose objects 4.37(0.45+0.21) 0.45(0.40+0.02) -89.7% 0.25(0.38+0.01) -94.3%
>>> 0008.4: index-pack with 256*100 loose objects 4.47(0.53+0.23) 0.67(0.63+0.02) -85.0% 0.24(0.38+0.01) -94.6%
>>> 0008.5: index-pack with 256*250 loose objects 5.01(0.67+0.30) 1.04(0.98+0.06) -79.2% 0.24(0.37+0.01) -95.2%
>>> 0008.6: index-pack with 256*500 loose objects 5.11(0.57+0.21) 1.81(1.70+0.09) -64.6% 0.25(0.38+0.01) -95.1%
>>> 0008.7: index-pack with 256*750 loose objects 5.12(0.60+0.22) 2.54(2.38+0.14) -50.4% 0.24(0.38+0.01) -95.3%
>>> 0008.8: index-pack with 256*1000 loose objects 4.52(0.52+0.21) 3.36(3.17+0.17) -25.7% 0.23(0.36+0.01) -94.9%
>>>
>>> I then hacked it to test against git.git, but skipped origin/master for
>>> that one because it takes *ages*. So just mine v.s. yours:
>>>
>>> $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>>> [...]
>>> Test peff/jk/loose-cache avar/check-collisions-config
>>> ---------------------------------------------------------------------------------------------------
>>> 0008.2: index-pack with 256*1 loose objects 12.57(28.72+0.61) 12.68(29.36+0.62) +0.9%
>>> 0008.3: index-pack with 256*10 loose objects 12.77(28.75+0.61) 12.50(28.88+0.56) -2.1%
>>> 0008.4: index-pack with 256*100 loose objects 13.20(29.49+0.66) 12.38(28.58+0.60) -6.2%
>>> 0008.5: index-pack with 256*250 loose objects 14.10(30.59+0.64) 12.54(28.22+0.57) -11.1%
>>> 0008.6: index-pack with 256*500 loose objects 14.48(31.06+0.74) 12.43(28.59+0.60) -14.2%
>>> 0008.7: index-pack with 256*750 loose objects 15.31(31.91+0.74) 12.67(29.23+0.64) -17.2%
>>> 0008.8: index-pack with 256*1000 loose objects 16.34(32.84+0.76) 13.11(30.19+0.68) -19.8%
>>>
>>> So not much of a practical difference perhaps. But then again this isn't
>>> a very realistic test case of anything. Rarely are you going to push a
>>> history of something the size of git.git into a repo with this many
>>> loose objects.
>>>
>>> Using sha1collisiondetection.git is I think the most realistic scenario,
>>> i.e. you'll often end up fetching/pushing something roughly the size of
>>> its entire history on a big repo, and with it:
>>>
>>> Test peff/jk/loose-cache avar/check-collisions-config
>>> ---------------------------------------------------------------------------------------------------
>>> 0008.2: index-pack with 256*1 loose objects 0.16(0.04+0.01) 0.05(0.03+0.00) -68.8%
>>> 0008.3: index-pack with 256*10 loose objects 0.19(0.04+0.02) 0.05(0.02+0.00) -73.7%
>>> 0008.4: index-pack with 256*100 loose objects 0.32(0.17+0.02) 0.04(0.02+0.00) -87.5%
>>> 0008.5: index-pack with 256*250 loose objects 0.57(0.41+0.03) 0.04(0.02+0.00) -93.0%
>>> 0008.6: index-pack with 256*500 loose objects 1.02(0.83+0.06) 0.04(0.03+0.00) -96.1%
>>> 0008.7: index-pack with 256*750 loose objects 1.47(1.24+0.10) 0.04(0.02+0.00) -97.3%
>>> 0008.8: index-pack with 256*1000 loose objects 1.94(1.70+0.10) 0.04(0.02+0.00) -97.9%
>>>
>>> As noted in previous threads I have an in-house monorepo where (due to
>>> expiry policies) loose objects hover around the 256*250 mark.
>>>
>>> The script, which is hacky as hell and takes shortcuts not to re-create
>>> the huge fake loose object collection every time (takes ages). Perhaps
>>> you're interested in incorporating some version of this into a v2. To be
>>> useful it should take some target path as an env variable.
>>
>> I forgot perhaps the most useful metric. Testing against origin/master
>> too on the sha1collisiondetection.git repo, which as noted above I think
>> is a good stand-in for making a medium sized push to a big repo. This
>> shows when the loose cache becomes counterproductive:
>>
>> Test origin/master peff/jk/loose-cache avar/check-collisions-config
>> -------------------------------------------------------------------------------------------------------------------------
>> 0008.2: index-pack with 256*1 loose objects 0.42(0.04+0.03) 0.17(0.04+0.00) -59.5% 0.04(0.03+0.00) -90.5%
>> 0008.3: index-pack with 256*10 loose objects 0.49(0.04+0.03) 0.19(0.04+0.01) -61.2% 0.04(0.02+0.00) -91.8%
>> 0008.4: index-pack with 256*100 loose objects 0.49(0.04+0.04) 0.33(0.18+0.01) -32.7% 0.05(0.02+0.00) -89.8%
>> 0008.5: index-pack with 256*250 loose objects 0.54(0.03+0.04) 0.59(0.43+0.02) +9.3% 0.04(0.02+0.01) -92.6%
>> 0008.6: index-pack with 256*500 loose objects 0.49(0.04+0.03) 1.04(0.83+0.07) +112.2% 0.04(0.02+0.00) -91.8%
>> 0008.7: index-pack with 256*750 loose objects 0.56(0.04+0.05) 1.50(1.28+0.08) +167.9% 0.04(0.02+0.00) -92.9%
>> 0008.8: index-pack with 256*1000 loose objects 0.54(0.05+0.03) 1.95(1.68+0.13) +261.1% 0.04(0.02+0.00) -92.6%
>>
>> I still think it's best to take this patch series since it's unlikely
>> we're making anything worse in practice, the >50k objects case is a
>> really high number, which I don't think is worth worrying about.
>>
>> But I am somewhat paranoid about the potential performance
>> regression. I.e. this is me testing against a really expensive and
>> relatively well performing NetApp NFS device where the ping stats are:
>>
>> rtt min/avg/max/mdev = 0.155/0.396/1.387/0.349 ms
>>
>> So I suspect this might get a lot worse for setups which don't enjoy the
>> same performance or network locality.
>
> I tried this with the same filer mounted from another DC with ~10x the
> RTT:
>
> rtt min/avg/max/mdev = 11.553/11.618/11.739/0.121 ms
>
> But otherwise the same setup (same machine type/specs mounting it). It
> had the opposite results of what I was expecting:
>
> Test origin/master peff/jk/loose-cache avar/check-collisions-config
> ------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects 7.78(0.04+0.03) 2.75(0.03+0.01) -64.7% 0.40(0.02+0.00) -94.9%
> 0008.3: index-pack with 256*10 loose objects 7.75(0.04+0.04) 2.77(0.05+0.01) -64.3% 0.40(0.02+0.00) -94.8%
> 0008.4: index-pack with 256*100 loose objects 7.75(0.05+0.02) 2.91(0.18+0.01) -62.5% 0.40(0.02+0.00) -94.8%
> 0008.5: index-pack with 256*250 loose objects 7.73(0.04+0.04) 3.19(0.43+0.02) -58.7% 0.40(0.02+0.00) -94.8%
> 0008.6: index-pack with 256*500 loose objects 7.73(0.04+0.04) 3.64(0.83+0.05) -52.9% 0.40(0.02+0.00) -94.8%
> 0008.7: index-pack with 256*750 loose objects 7.73(0.04+0.02) 4.14(1.29+0.07) -46.4% 0.40(0.02+0.00) -94.8%
> 0008.8: index-pack with 256*1000 loose objects 7.73(0.04+0.03) 4.55(1.72+0.09) -41.1% 0.40(0.02+0.01) -94.8%
>
> I.e. there the cliff of where the cache becomes counterproductive comes
> much later, not earlier. The sha1collisiondetection.git repo has 418
> objects.
>
> So is it cheaper to fill a huge cache than look up those 418? I don't
> know, haven't dug. But so far what this suggests is that we're helping
> slow FSs to the detriment of faster ones.
>
> So here's the same test not against NFS, but the local ext4 fs (CO7;
> Linux 3.10) for sha1collisiondetection.git:
>
> Test origin/master peff/jk/loose-cache avar/check-collisions-config
> --------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects 0.02(0.02+0.00) 0.02(0.02+0.01) +0.0% 0.02(0.02+0.00) +0.0%
> 0008.3: index-pack with 256*10 loose objects 0.02(0.02+0.00) 0.03(0.03+0.00) +50.0% 0.02(0.02+0.00) +0.0%
> 0008.4: index-pack with 256*100 loose objects 0.02(0.02+0.00) 0.17(0.16+0.01) +750.0% 0.02(0.02+0.00) +0.0%
> 0008.5: index-pack with 256*250 loose objects 0.02(0.02+0.00) 0.43(0.40+0.03) +2050.0% 0.02(0.02+0.00) +0.0%
> 0008.6: index-pack with 256*500 loose objects 0.02(0.02+0.00) 0.88(0.80+0.09) +4300.0% 0.02(0.02+0.00) +0.0%
> 0008.7: index-pack with 256*750 loose objects 0.02(0.02+0.00) 1.35(1.27+0.09) +6650.0% 0.02(0.02+0.00) +0.0%
> 0008.8: index-pack with 256*1000 loose objects 0.02(0.02+0.00) 1.83(1.70+0.14) +9050.0% 0.02(0.02+0.00) +0.0%
>
> And for mu.git, a ~20k object repo:
>
> Test origin/master peff/jk/loose-cache avar/check-collisions-config
> -------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects 0.59(0.91+0.06) 0.58(0.93+0.03) -1.7% 0.57(0.89+0.04) -3.4%
> 0008.3: index-pack with 256*10 loose objects 0.59(0.91+0.07) 0.59(0.92+0.03) +0.0% 0.57(0.89+0.03) -3.4%
> 0008.4: index-pack with 256*100 loose objects 0.59(0.91+0.05) 0.81(1.13+0.04) +37.3% 0.58(0.91+0.04) -1.7%
> 0008.5: index-pack with 256*250 loose objects 0.59(0.91+0.05) 1.23(1.51+0.08) +108.5% 0.58(0.91+0.04) -1.7%
> 0008.6: index-pack with 256*500 loose objects 0.59(0.90+0.06) 1.96(2.20+0.12) +232.2% 0.58(0.91+0.04) -1.7%
> 0008.7: index-pack with 256*750 loose objects 0.59(0.92+0.05) 2.72(2.92+0.17) +361.0% 0.58(0.90+0.04) -1.7%
> 0008.8: index-pack with 256*1000 loose objects 0.59(0.90+0.06) 3.50(3.67+0.21) +493.2% 0.57(0.90+0.04) -3.4%

OK, here's another theory: The cache scales badly with increasing
numbers of loose objects because it sorts the array 256 times as it is
filled. Loading it fully and sorting once would help, as would using
one array per subdirectory.
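
The difference between those two strategies can be sketched in Python; this is a stand-in of my own for oid_array, not git code:

```python
def fill_resorting(per_dir, dirs=256):
    # One shared array, re-sorted after every subdirectory is loaded,
    # mimicking oid_array's sort-before-lookup as the cache grows.
    arr = []
    for d in range(dirs):
        arr.extend("%02x%038d" % (d, i) for i in range(per_dir))
        arr.sort()
    return arr

def fill_sort_once(per_dir, dirs=256):
    # Load everything first (or keep one array per subdirectory);
    # every entry then participates in exactly one sort.
    arr = []
    for d in range(dirs):
        arr.extend("%02x%038d" % (d, i) for i in range(per_dir))
    arr.sort()
    return arr

# Both strategies produce the same cache contents.
assert fill_resorting(10, dirs=4) == fill_sort_once(10, dirs=4)
```

CPython's Timsort happens to detect the appended sorted runs, so the repeated sort is cheaper here than with the qsort() behind oid_array; the shape of the work, not the constants, is the point.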

We can simulate the oid_array operations with test-sha1-array. It has
no explicit sort command, but we can use for_each_unique for that; we
just have to add 127.5 extra calls (that don't sort) to get the same
amount of output in the two latter cases, so that we can compare just
the sort time:

for command in '
	foreach (0..255) {
		$subdir = sprintf("%02x", $_);
		foreach (1..$ARGV[0]) {
			printf("append %s%038d\n", $subdir, $_);
		}
		# intermediate sort
		print "for_each_unique\n";
	}
' '
	foreach (0..255) {
		$subdir = sprintf("%02x", $_);
		foreach (1..$ARGV[0]) {
			printf("append %s%038d\n", $subdir, $_);
		}
	}
	# sort once at the end
	print "for_each_unique\n";
	# ... and generate roughly the same amount of output
	foreach (0..127) {
		print "for_each_unique\n";
	}
' '
	foreach (0..255) {
		$subdir = sprintf("%02x", $_);
		foreach (1..$ARGV[0]) {
			printf("append %s%038d\n", $subdir, $_);
		}
		# sort each subdirectory separately
		print "for_each_unique\n";
		# ... and generate roughly the same amount of output
		foreach (0..127) {
			print "for_each_unique\n";
		}
		print "clear\n";
	}
'
do
	time perl -e "$command" 1000 | t/helper/test-tool sha1-array | wc -l
done
My results:
32896000
real 0m6.521s
user 0m5.269s
sys 0m2.234s
33024000
real 0m3.464s
user 0m2.178s
sys 0m2.251s
33024000
real 0m3.353s
user 0m2.179s
sys 0m1.939s
So this adds up to a significant amount of time spent on sorting. Here's
a patch, on top of next:
-- >8 --
Subject: [PATCH] object-store: use one oid_array per subdirectory for loose cache
The loose objects cache is filled one subdirectory at a time as needed.
It is stored in an oid_array, which has to be resorted after each add
operation. So when querying a wide range of objects the array needs to
be resorted up to 256 times -- once for each subdirectory. This is not
efficient.
Use one oid_array for each subdirectory. This ensures that entries
only have to be sorted once.
It speeds up cache operations in a repository with ca. 100 loose
objects per subdirectory (the cache is used for collision checks for
the placeholders %h, %t and %p):
$ git count-objects
25805 objects, 302452 kilobytes
$ (cd t/perf && ./run HEAD^ HEAD ./p4205-log-pretty-formats.sh)
[...]
Test HEAD^ HEAD
--------------------------------------------------------------------
4205.1: log with %H 0.56(0.52+0.04) 0.57(0.54+0.02) +1.8%
4205.2: log with %h 0.92(0.86+0.06) 0.66(0.62+0.04) -28.3%
4205.3: log with %T 0.56(0.52+0.04) 0.57(0.55+0.01) +1.8%
4205.4: log with %t 0.92(0.88+0.04) 0.67(0.62+0.05) -27.2%
4205.5: log with %P 0.57(0.54+0.02) 0.57(0.54+0.03) +0.0%
4205.6: log with %p 0.92(0.86+0.05) 0.64(0.60+0.04) -30.4%
4205.7: log with %h-%h-%h 1.02(0.98+0.04) 0.72(0.69+0.03) -29.4%
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
This patch goes on top of next. p4205 is better suited to show the
cost of sorting in the loose objects cache than the slooow index-pack.
This change does fix the index-pack regression on ext4 for me as well,
though. Not sure it warrants adding a loose objects test to p5302.
object-store.h | 2 +-
object.c | 5 ++++-
packfile.c | 4 +++-
sha1-file.c | 5 +++--
sha1-name.c | 8 +++++---
5 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/object-store.h b/object-store.h
index 8dceed0f31..ee67a50980 100644
--- a/object-store.h
+++ b/object-store.h
@@ -20,7 +20,7 @@ struct object_directory {
* Be sure to call odb_load_loose_cache() before using.
*/
char loose_objects_subdir_seen[256];
- struct oid_array loose_objects_cache;
+ struct oid_array loose_objects_cache[256];
/*
* Path to the alternative object store. If this is a relative path,
diff --git a/object.c b/object.c
index c29a97a7e9..965493ba76 100644
--- a/object.c
+++ b/object.c
@@ -484,8 +484,11 @@ struct raw_object_store *raw_object_store_new(void)
static void free_object_directory(struct object_directory *odb)
{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(odb->loose_objects_cache); i++)
+ oid_array_clear(&odb->loose_objects_cache[i]);
free(odb->path);
- oid_array_clear(&odb->loose_objects_cache);
free(odb);
}
diff --git a/packfile.c b/packfile.c
index 56496bb425..3ef6a241b7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -995,7 +995,9 @@ void reprepare_packed_git(struct repository *r)
struct object_directory *odb;
for (odb = r->objects->odb; odb; odb = odb->next) {
- oid_array_clear(&odb->loose_objects_cache);
+ int i;
+ for (i = 0; i < ARRAY_SIZE(odb->loose_objects_cache); i++)
+ oid_array_clear(&odb->loose_objects_cache[i]);
memset(&odb->loose_objects_subdir_seen, 0,
sizeof(odb->loose_objects_subdir_seen));
}
diff --git a/sha1-file.c b/sha1-file.c
index 05f63dfd4e..d2f5e65865 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -933,7 +933,8 @@ static int quick_has_loose(struct repository *r,
prepare_alt_odb(r);
for (odb = r->objects->odb; odb; odb = odb->next) {
odb_load_loose_cache(odb, subdir_nr);
- if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
+ if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
+ &oid) >= 0)
return 1;
}
return 0;
@@ -2173,7 +2174,7 @@ void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
for_each_file_in_obj_subdir(subdir_nr, &buf,
append_loose_object,
NULL, NULL,
- &odb->loose_objects_cache);
+ &odb->loose_objects_cache[subdir_nr]);
odb->loose_objects_subdir_seen[subdir_nr] = 1;
strbuf_release(&buf);
}
diff --git a/sha1-name.c b/sha1-name.c
index b24502811b..fdb22147b2 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -94,14 +94,16 @@ static void find_short_object_filename(struct disambiguate_state *ds)
odb && !ds->ambiguous;
odb = odb->next) {
int pos;
+ struct oid_array *loose_subdir_objects;
odb_load_loose_cache(odb, subdir_nr);
- pos = oid_array_lookup(&odb->loose_objects_cache, &ds->bin_pfx);
+ loose_subdir_objects = &odb->loose_objects_cache[subdir_nr];
+ pos = oid_array_lookup(loose_subdir_objects, &ds->bin_pfx);
if (pos < 0)
pos = -1 - pos;
- while (!ds->ambiguous && pos < odb->loose_objects_cache.nr) {
+ while (!ds->ambiguous && pos < loose_subdir_objects->nr) {
const struct object_id *oid;
- oid = odb->loose_objects_cache.oid + pos;
+ oid = loose_subdir_objects->oid + pos;
if (!match_sha(ds->len, ds->bin_pfx.hash, oid->hash))
break;
update_candidates(ds, oid);
--
2.19.2
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-12-02 10:52 ` René Scharfe
@ 2018-12-03 22:04 ` Jeff King
2018-12-04 21:45 ` René Scharfe
0 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-12-03 22:04 UTC (permalink / raw)
To: René Scharfe
Cc: Ævar Arnfjörð Bjarmason, Geert Jansen,
Junio C Hamano, git, Takuto Ikuta
On Sun, Dec 02, 2018 at 11:52:50AM +0100, René Scharfe wrote:
> > And for mu.git, a ~20k object repo:
> >
> > Test origin/master peff/jk/loose-cache avar/check-collisions-config
> > -------------------------------------------------------------------------------------------------------------------------
> > 0008.2: index-pack with 256*1 loose objects 0.59(0.91+0.06) 0.58(0.93+0.03) -1.7% 0.57(0.89+0.04) -3.4%
> > 0008.3: index-pack with 256*10 loose objects 0.59(0.91+0.07) 0.59(0.92+0.03) +0.0% 0.57(0.89+0.03) -3.4%
> > 0008.4: index-pack with 256*100 loose objects 0.59(0.91+0.05) 0.81(1.13+0.04) +37.3% 0.58(0.91+0.04) -1.7%
> > 0008.5: index-pack with 256*250 loose objects 0.59(0.91+0.05) 1.23(1.51+0.08) +108.5% 0.58(0.91+0.04) -1.7%
> > 0008.6: index-pack with 256*500 loose objects 0.59(0.90+0.06) 1.96(2.20+0.12) +232.2% 0.58(0.91+0.04) -1.7%
> > 0008.7: index-pack with 256*750 loose objects 0.59(0.92+0.05) 2.72(2.92+0.17) +361.0% 0.58(0.90+0.04) -1.7%
> > 0008.8: index-pack with 256*1000 loose objects 0.59(0.90+0.06) 3.50(3.67+0.21) +493.2% 0.57(0.90+0.04) -3.4%
>
> OK, here's another theory: The cache scales badly with increasing
> numbers of loose objects because it sorts the array 256 times as it is
> filled. Loading it fully and sorting once would help, as would using
> one array per subdirectory.
Yeah, that makes sense. This was actually how I had planned to do it
originally, but then I ended up just reusing the existing single-array
approach from the abbrev code.
I hadn't actually thought about the repeated sortings (but that
definitely makes sense that they would hurt in these pathological
cases), but more just that we get a 256x reduction in N for our binary
search (in fact we already do this first-byte lookup-table trick for
pack index lookups).
Your patch looks good to me. We may want to do one thing on top:
> diff --git a/object-store.h b/object-store.h
> index 8dceed0f31..ee67a50980 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -20,7 +20,7 @@ struct object_directory {
> * Be sure to call odb_load_loose_cache() before using.
> */
> char loose_objects_subdir_seen[256];
> - struct oid_array loose_objects_cache;
> + struct oid_array loose_objects_cache[256];
The comment in the context there is warning callers to remember to load
the cache first. Now that we have individual caches, might it make sense
to change the interface a bit and make these members private? I.e.,
something like:
struct oid_array *odb_loose_cache(struct object_directory *odb,
int subdir_nr)
{
if (!odb->loose_objects_subdir_seen[subdir_nr])
odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
return &odb->loose_objects_cache[subdir_nr];
}
That's harder to get wrong, and this:
> diff --git a/sha1-file.c b/sha1-file.c
> index 05f63dfd4e..d2f5e65865 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -933,7 +933,8 @@ static int quick_has_loose(struct repository *r,
> prepare_alt_odb(r);
> for (odb = r->objects->odb; odb; odb = odb->next) {
> odb_load_loose_cache(odb, subdir_nr);
> - if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
> + if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
> + &oid) >= 0)
> return 1;
> }
becomes:
struct oid_array *cache = odb_loose_cache(odb, subdir_nr);
if (oid_array_lookup(cache, &oid) >= 0)
return 1;
(An even simpler interface would be a single function that computes
subdir_nr and does the lookup itself, but that would not be enough for
find_short_object_filename()).
-Peff
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-12-03 22:04 ` Jeff King
@ 2018-12-04 21:45 ` René Scharfe
2018-12-05 4:46 ` Jeff King
0 siblings, 1 reply; 99+ messages in thread
From: René Scharfe @ 2018-12-04 21:45 UTC (permalink / raw)
To: Jeff King
Cc: Ævar Arnfjörð Bjarmason, Geert Jansen,
Junio C Hamano, git, Takuto Ikuta
Am 03.12.2018 um 23:04 schrieb Jeff King:
> On Sun, Dec 02, 2018 at 11:52:50AM +0100, René Scharfe wrote:
>
>>> And for mu.git, a ~20k object repo:
>>>
>>> Test origin/master peff/jk/loose-cache avar/check-collisions-config
>>> -------------------------------------------------------------------------------------------------------------------------
>>> 0008.2: index-pack with 256*1 loose objects 0.59(0.91+0.06) 0.58(0.93+0.03) -1.7% 0.57(0.89+0.04) -3.4%
>>> 0008.3: index-pack with 256*10 loose objects 0.59(0.91+0.07) 0.59(0.92+0.03) +0.0% 0.57(0.89+0.03) -3.4%
>>> 0008.4: index-pack with 256*100 loose objects 0.59(0.91+0.05) 0.81(1.13+0.04) +37.3% 0.58(0.91+0.04) -1.7%
>>> 0008.5: index-pack with 256*250 loose objects 0.59(0.91+0.05) 1.23(1.51+0.08) +108.5% 0.58(0.91+0.04) -1.7%
>>> 0008.6: index-pack with 256*500 loose objects 0.59(0.90+0.06) 1.96(2.20+0.12) +232.2% 0.58(0.91+0.04) -1.7%
>>> 0008.7: index-pack with 256*750 loose objects 0.59(0.92+0.05) 2.72(2.92+0.17) +361.0% 0.58(0.90+0.04) -1.7%
>>> 0008.8: index-pack with 256*1000 loose objects 0.59(0.90+0.06) 3.50(3.67+0.21) +493.2% 0.57(0.90+0.04) -3.4%
>>
>> OK, here's another theory: The cache scales badly with increasing
>> numbers of loose objects because it sorts the array 256 times as it is
>> filled. Loading it fully and sorting once would help, as would using
>> one array per subdirectory.
>
> Yeah, that makes sense. This was actually how I had planned to do it
> originally, but then I ended up just reusing the existing single-array
> approach from the abbrev code.
>
> I hadn't actually thought about the repeated sortings (but that
> definitely makes sense that they would hurt in these pathological
> cases), but more just that we get a 256x reduction in N for our binary
> search (in fact we already do this first-byte lookup-table trick for
> pack index lookups).
Skipping eight steps in a binary search is something, but it's faster
even without that.
Just realized that the demo code can use "lookup" instead of the much
more expensive "for_each_unique" to sort. D'oh! With that change:
for command in '
foreach (0..255) {
$subdir = sprintf("%02x", $_);
foreach (1..$ARGV[0]) {
printf("append %s%038d\n", $subdir, $_);
}
# intermediate sort
print "lookup " . "0" x 40 . "\n";
}
' '
foreach (0..255) {
$subdir = sprintf("%02x", $_);
foreach (1..$ARGV[0]) {
printf("append %s%038d\n", $subdir, $_);
}
}
# sort once at the end
print "lookup " . "0" x 40 . "\n";
' '
foreach (0..255) {
$subdir = sprintf("%02x", $_);
foreach (1..$ARGV[0]) {
printf("append %s%038d\n", $subdir, $_);
}
# sort each subdirectory separately
print "lookup " . "0" x 40 . "\n";
print "clear\n";
}
'
do
time perl -e "$command" 1000 | t/helper/test-tool sha1-array | wc -l
done
And the results make the scale of the improvement more obvious:
256
real 0m3.476s
user 0m3.466s
sys 0m0.099s
1
real 0m0.157s
user 0m0.148s
sys 0m0.046s
256
real 0m0.117s
user 0m0.116s
sys 0m0.051s
> Your patch looks good to me. We may want to do one thing on top:
>
>> diff --git a/object-store.h b/object-store.h
>> index 8dceed0f31..ee67a50980 100644
>> --- a/object-store.h
>> +++ b/object-store.h
>> @@ -20,7 +20,7 @@ struct object_directory {
>> * Be sure to call odb_load_loose_cache() before using.
>> */
>> char loose_objects_subdir_seen[256];
>> - struct oid_array loose_objects_cache;
>> + struct oid_array loose_objects_cache[256];
>
> The comment in the context there is warning callers to remember to load
> the cache first. Now that we have individual caches, might it make sense
> to change the interface a bit and make these members private? I.e.,
> something like:
>
> struct oid_array *odb_loose_cache(struct object_directory *odb,
> int subdir_nr)
> {
> if (!odb->loose_objects_subdir_seen[subdir_nr])
> odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
>
> return &odb->loose_objects_cache[subdir_nr];
> }
Sure. And it should take an object_id pointer instead of a subdir_nr --
less duplication, nicer interface (patch below).
It would be nice if it could return a const pointer to discourage
messing up the cache, but that's not compatible with oid_array_lookup().
And quick_has_loose() should be converted to object_id as well -- adding
a function that takes a SHA-1 is a regression. :D
René
---
object-store.h | 8 ++++----
sha1-file.c | 19 ++++++++-----------
sha1-name.c | 4 +---
3 files changed, 13 insertions(+), 18 deletions(-)
diff --git a/object-store.h b/object-store.h
index ee67a50980..dd9efdd276 100644
--- a/object-store.h
+++ b/object-store.h
@@ -48,11 +48,11 @@ void add_to_alternates_file(const char *dir);
void add_to_alternates_memory(const char *dir);
/*
- * Populate an odb's loose object cache for one particular subdirectory (i.e.,
- * the one that corresponds to the first byte of objects you're interested in,
- * from 0 to 255 inclusive).
+ * Populate and return the loose object cache array corresponding to the
+ * given object ID.
*/
-void odb_load_loose_cache(struct object_directory *odb, int subdir_nr);
+struct oid_array *odb_loose_cache(struct object_directory *odb,
+ const struct object_id *oid);
struct packed_git {
struct packed_git *next;
diff --git a/sha1-file.c b/sha1-file.c
index d2f5e65865..38af6d5d0b 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -924,7 +924,6 @@ static int open_sha1_file(struct repository *r,
static int quick_has_loose(struct repository *r,
const unsigned char *sha1)
{
- int subdir_nr = sha1[0];
struct object_id oid;
struct object_directory *odb;
@@ -932,9 +931,7 @@ static int quick_has_loose(struct repository *r,
prepare_alt_odb(r);
for (odb = r->objects->odb; odb; odb = odb->next) {
- odb_load_loose_cache(odb, subdir_nr);
- if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
- &oid) >= 0)
+ if (oid_array_lookup(odb_loose_cache(odb, &oid), &oid) >= 0)
return 1;
}
return 0;
@@ -2159,24 +2156,24 @@ static int append_loose_object(const struct object_id *oid, const char *path,
return 0;
}
-void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
+struct oid_array *odb_loose_cache(struct object_directory *odb,
+ const struct object_id *oid)
{
+ int subdir_nr = oid->hash[0];
+ struct oid_array *subdir_array = &odb->loose_objects_cache[subdir_nr];
struct strbuf buf = STRBUF_INIT;
- if (subdir_nr < 0 ||
- subdir_nr >= ARRAY_SIZE(odb->loose_objects_subdir_seen))
- BUG("subdir_nr out of range");
-
if (odb->loose_objects_subdir_seen[subdir_nr])
- return;
+ return subdir_array;
strbuf_addstr(&buf, odb->path);
for_each_file_in_obj_subdir(subdir_nr, &buf,
append_loose_object,
NULL, NULL,
- &odb->loose_objects_cache[subdir_nr]);
+ subdir_array);
odb->loose_objects_subdir_seen[subdir_nr] = 1;
strbuf_release(&buf);
+ return subdir_array;
}
static int check_stream_sha1(git_zstream *stream,
diff --git a/sha1-name.c b/sha1-name.c
index fdb22147b2..4fc6368ce5 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -87,7 +87,6 @@ static int match_sha(unsigned, const unsigned char *, const unsigned char *);
static void find_short_object_filename(struct disambiguate_state *ds)
{
- int subdir_nr = ds->bin_pfx.hash[0];
struct object_directory *odb;
for (odb = the_repository->objects->odb;
@@ -96,8 +95,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
int pos;
struct oid_array *loose_subdir_objects;
- odb_load_loose_cache(odb, subdir_nr);
- loose_subdir_objects = &odb->loose_objects_cache[subdir_nr];
+ loose_subdir_objects = odb_loose_cache(odb, &ds->bin_pfx);
pos = oid_array_lookup(loose_subdir_objects, &ds->bin_pfx);
if (pos < 0)
pos = -1 - pos;
--
2.19.2
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-12-04 21:45 ` René Scharfe
@ 2018-12-05 4:46 ` Jeff King
2018-12-05 6:02 ` René Scharfe
0 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-12-05 4:46 UTC (permalink / raw)
To: René Scharfe
Cc: Ævar Arnfjörð Bjarmason, Geert Jansen,
Junio C Hamano, git, Takuto Ikuta
On Tue, Dec 04, 2018 at 10:45:13PM +0100, René Scharfe wrote:
> > The comment in the context there is warning callers to remember to load
> > the cache first. Now that we have individual caches, might it make sense
> > to change the interface a bit and make these members private? I.e.,
> > something like:
> >
> > struct oid_array *odb_loose_cache(struct object_directory *odb,
> > int subdir_nr)
> > {
> > if (!odb->loose_objects_subdir_seen[subdir_nr])
> > odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
> >
> > return &odb->loose_objects_cache[subdir_nr];
> > }
>
> Sure. And it should take an object_id pointer instead of a subdir_nr --
> less duplication, nicer interface (patch below).
I had considered that initially, but it's a little less flexible if a
caller doesn't actually have an oid. Though both of the proposed callers
do, so it's probably OK to worry about that if and when it ever comes up
(the most plausible thing in my mind is doing some abbrev-like analysis
without looking to abbreviate a _particular_ oid).
> It would be nice if it could return a const pointer to discourage
> messing up the cache, but that's not compatible with oid_array_lookup().
Yeah.
> And quick_has_loose() should be converted to object_id as well -- adding
> a function that takes a SHA-1 is a regression. :D
I actually wrote it that way initially, but doing the hashcpy() in the
caller is a bit more awkward. My thought was to punt on that until the
rest of the surrounding code starts handling oids.
> ---
> object-store.h | 8 ++++----
> sha1-file.c | 19 ++++++++-----------
> sha1-name.c | 4 +---
> 3 files changed, 13 insertions(+), 18 deletions(-)
This patch looks sane. How do you want to handle it? I'd assumed your
earlier one would be for applying on top, but this one doesn't have a
commit message. Did you want me to squash down the individual hunks?
-Peff
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-12-05 4:46 ` Jeff King
@ 2018-12-05 6:02 ` René Scharfe
2018-12-05 6:51 ` Jeff King
0 siblings, 1 reply; 99+ messages in thread
From: René Scharfe @ 2018-12-05 6:02 UTC (permalink / raw)
To: Jeff King, Ævar Arnfjörð Bjarmason
Cc: Geert Jansen, Junio C Hamano, git, Takuto Ikuta
Am 05.12.2018 um 05:46 schrieb Jeff King:
> On Tue, Dec 04, 2018 at 10:45:13PM +0100, René Scharfe wrote:
>
>>> The comment in the context there is warning callers to remember to load
>>> the cache first. Now that we have individual caches, might it make sense
>>> to change the interface a bit and make these members private? I.e.,
>>> something like:
>>>
>>> struct oid_array *odb_loose_cache(struct object_directory *odb,
>>> int subdir_nr)
>>> {
>>> if (!odb->loose_objects_subdir_seen[subdir_nr])
>>> odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
>>>
>>> return &odb->loose_objects_cache[subdir_nr];
>>> }
>>
>> Sure. And it should take an object_id pointer instead of a subdir_nr --
>> less duplication, nicer interface (patch below).
>
> I had considered that initially, but it's a little less flexible if a
> caller doesn't actually have an oid. Though both of the proposed callers
> do, so it's probably OK to worry about that if and when it ever comes up
> (the most plausible thing in my mind is doing some abbrev-like analysis
> without looking to abbreviate a _particular_ oid).
Right, let's focus on the concrete requirements of current callers.
YAGNI... :)
>> And quick_has_loose() should be converted to object_id as well -- adding
>> a function that takes a SHA-1 is a regression. :D
>
> I actually wrote it that way initially, but doing the hashcpy() in the
> caller is a bit more awkward. My thought was to punt on that until the
> rest of the surrounding code starts handling oids.
The level of awkwardness is the same to me, but sha1_loose_object_info()
is much longer already, so it might feel worse to add it there. This
function is easily converted to struct object_id, though, as its single
caller can pass one on -- this makes the copy unnecessary.
> This patch looks sane. How do you want to handle it? I'd assumed your
> earlier one would be for applying on top, but this one doesn't have a
> commit message. Did you want me to squash down the individual hunks?
I'm waiting for the first one (object-store: use one oid_array per
subdirectory for loose cache) to be accepted, as it aims to solve a
user-visible performance regression, i.e. that's where the meat is.
I'm particularly interested in performance numbers from Ævar for it.
I can send the last one properly later, and add patches for converting
quick_has_loose() to struct object_id. Those are just cosmetic..
René
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-12-05 6:02 ` René Scharfe
@ 2018-12-05 6:51 ` Jeff King
2018-12-05 8:15 ` Jeff King
0 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-12-05 6:51 UTC (permalink / raw)
To: René Scharfe
Cc: Ævar Arnfjörð Bjarmason, Geert Jansen,
Junio C Hamano, git, Takuto Ikuta
On Wed, Dec 05, 2018 at 07:02:17AM +0100, René Scharfe wrote:
> > I actually wrote it that way initially, but doing the hashcpy() in the
> > caller is a bit more awkward. My thought was to punt on that until the
> > rest of the surrounding code starts handling oids.
>
> The level of awkwardness is the same to me, but sha1_loose_object_info()
> is much longer already, so it might feel worse to add it there.
Right, what I meant was that in sha1_loose_object_info():
if (special_case)
handle_special_case();
is easier to follow than a block setting up the special case and then
calling the function.
> This
> function is easily converted to struct object_id, though, as its single
> caller can pass one on -- this makes the copy unnecessary.
If you mean modifying sha1_loose_object_info() to take an oid, then
sure, I agree that is a good step forward (and that is exactly the "punt
until" moment I meant).
> > This patch looks sane. How do you want to handle it? I'd assumed your
> > earlier one would be for applying on top, but this one doesn't have a
> > commit message. Did you want me to squash down the individual hunks?
>
> I'm waiting for the first one (object-store: use one oid_array per
> subdirectory for loose cache) to be accepted, as it aims to solve a
> user-visible performance regression, i.e. that's where the meat is.
> I'm particularly interested in performance numbers from Ævar for it.
>
> I can send the last one properly later, and add patches for converting
> quick_has_loose() to struct object_id. Those are just cosmetic..
Great, thanks. I just wanted to make sure these improvements weren't
going to slip through the cracks.
-Peff
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-12-05 6:51 ` Jeff King
@ 2018-12-05 8:15 ` Jeff King
2018-12-05 18:41 ` René Scharfe
0 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-12-05 8:15 UTC (permalink / raw)
To: René Scharfe
Cc: Ævar Arnfjörð Bjarmason, Geert Jansen,
Junio C Hamano, git, Takuto Ikuta
On Wed, Dec 05, 2018 at 01:51:36AM -0500, Jeff King wrote:
> > This
> > function is easily converted to struct object_id, though, as its single
> > caller can pass one on -- this makes the copy unnecessary.
>
> If you mean modifying sha1_loose_object_info() to take an oid, then
> sure, I agree that is a good step forward (and that is exactly the "punt
> until" moment I meant).
So the simple thing is to do that, and then have it pass oid->hash to
the other functions it uses. If we start to convert those, there's a
little bit of a rabbit hole, but it's actually not too bad.
Most of the spill-over is into the dumb-http code. Note that it actually
uses sha1 itself! That probably needs to be the_hash_algo (though I'm
not even sure how we'd negotiate the algorithm across a dumb fetch). At
any rate, I don't think this patch makes anything _worse_ in that
respect.
diff --git a/http-push.c b/http-push.c
index cd48590912..0141b0ad53 100644
--- a/http-push.c
+++ b/http-push.c
@@ -255,7 +255,7 @@ static void start_fetch_loose(struct transfer_request *request)
struct active_request_slot *slot;
struct http_object_request *obj_req;
- obj_req = new_http_object_request(repo->url, request->obj->oid.hash);
+ obj_req = new_http_object_request(repo->url, &request->obj->oid);
if (obj_req == NULL) {
request->state = ABORTED;
return;
diff --git a/http-walker.c b/http-walker.c
index 0a392c85b6..29b59e2fe0 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -58,7 +58,7 @@ static void start_object_request(struct walker *walker,
struct active_request_slot *slot;
struct http_object_request *req;
- req = new_http_object_request(obj_req->repo->base, obj_req->oid.hash);
+ req = new_http_object_request(obj_req->repo->base, &obj_req->oid);
if (req == NULL) {
obj_req->state = ABORTED;
return;
@@ -543,11 +543,11 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
} else if (req->zret != Z_STREAM_END) {
walker->corrupt_object_found++;
ret = error("File %s (%s) corrupt", hex, req->url);
- } else if (!hasheq(obj_req->oid.hash, req->real_sha1)) {
+ } else if (!oideq(&obj_req->oid, &req->real_oid)) {
ret = error("File %s has bad hash", hex);
} else if (req->rename < 0) {
struct strbuf buf = STRBUF_INIT;
- loose_object_path(the_repository, &buf, req->sha1);
+ loose_object_path(the_repository, &buf, &req->oid);
ret = error("unable to write sha1 filename %s", buf.buf);
strbuf_release(&buf);
}
diff --git a/http.c b/http.c
index 7cfa7a16e0..e95b5b9be0 100644
--- a/http.c
+++ b/http.c
@@ -2298,9 +2298,9 @@ static size_t fwrite_sha1_file(char *ptr, size_t eltsize, size_t nmemb,
}
struct http_object_request *new_http_object_request(const char *base_url,
- unsigned char *sha1)
+ const struct object_id *oid)
{
- char *hex = sha1_to_hex(sha1);
+ char *hex = oid_to_hex(oid);
struct strbuf filename = STRBUF_INIT;
struct strbuf prevfile = STRBUF_INIT;
int prevlocal;
@@ -2311,10 +2311,10 @@ struct http_object_request *new_http_object_request(const char *base_url,
freq = xcalloc(1, sizeof(*freq));
strbuf_init(&freq->tmpfile, 0);
- hashcpy(freq->sha1, sha1);
+ oidcpy(&freq->oid, oid);
freq->localfile = -1;
- loose_object_path(the_repository, &filename, sha1);
+ loose_object_path(the_repository, &filename, oid);
strbuf_addf(&freq->tmpfile, "%s.temp", filename.buf);
strbuf_addf(&prevfile, "%s.prev", filename.buf);
@@ -2456,16 +2456,16 @@ int finish_http_object_request(struct http_object_request *freq)
}
git_inflate_end(&freq->stream);
- git_SHA1_Final(freq->real_sha1, &freq->c);
+ git_SHA1_Final(freq->real_oid.hash, &freq->c);
if (freq->zret != Z_STREAM_END) {
unlink_or_warn(freq->tmpfile.buf);
return -1;
}
- if (!hasheq(freq->sha1, freq->real_sha1)) {
+ if (!oideq(&freq->oid, &freq->real_oid)) {
unlink_or_warn(freq->tmpfile.buf);
return -1;
}
- loose_object_path(the_repository, &filename, freq->sha1);
+ loose_object_path(the_repository, &filename, &freq->oid);
freq->rename = finalize_object_file(freq->tmpfile.buf, filename.buf);
strbuf_release(&filename);
diff --git a/http.h b/http.h
index d305ca1dc7..66c52b2e1e 100644
--- a/http.h
+++ b/http.h
@@ -224,8 +224,8 @@ struct http_object_request {
CURLcode curl_result;
char errorstr[CURL_ERROR_SIZE];
long http_code;
- unsigned char sha1[20];
- unsigned char real_sha1[20];
+ struct object_id oid;
+ struct object_id real_oid;
git_SHA_CTX c;
git_zstream stream;
int zret;
@@ -234,7 +234,7 @@ struct http_object_request {
};
extern struct http_object_request *new_http_object_request(
- const char *base_url, unsigned char *sha1);
+ const char *base_url, const struct object_id *oid);
extern void process_http_object_request(struct http_object_request *freq);
extern int finish_http_object_request(struct http_object_request *freq);
extern void abort_http_object_request(struct http_object_request *freq);
diff --git a/object-store.h b/object-store.h
index fecbb7e094..265d0d8e1f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -151,11 +151,13 @@ void raw_object_store_clear(struct raw_object_store *o);
/*
* Put in `buf` the name of the file in the local object database that
- * would be used to store a loose object with the specified sha1.
+ * would be used to store a loose object with the specified oid.
*/
-const char *loose_object_path(struct repository *r, struct strbuf *buf, const unsigned char *sha1);
+const char *loose_object_path(struct repository *r, struct strbuf *buf,
+ const struct object_id *oid);
-void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned long *size);
+void *map_loose_object(struct repository *r, const struct object_id *oid,
+ unsigned long *size);
extern void *read_object_file_extended(const struct object_id *oid,
enum object_type *type,
diff --git a/sha1-file.c b/sha1-file.c
index 3ddf4c9426..0705709036 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -333,12 +333,12 @@ int raceproof_create_file(const char *path, create_file_fn fn, void *cb)
return ret;
}
-static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
+static void fill_loose_path(struct strbuf *buf, const struct object_id *oid)
{
int i;
for (i = 0; i < the_hash_algo->rawsz; i++) {
static char hex[] = "0123456789abcdef";
- unsigned int val = sha1[i];
+ unsigned int val = oid->hash[i];
strbuf_addch(buf, hex[val >> 4]);
strbuf_addch(buf, hex[val & 0xf]);
if (!i)
@@ -348,19 +348,19 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
static const char *odb_loose_path(struct object_directory *odb,
struct strbuf *buf,
- const unsigned char *sha1)
+ const struct object_id *oid)
{
strbuf_reset(buf);
strbuf_addstr(buf, odb->path);
strbuf_addch(buf, '/');
- fill_sha1_path(buf, sha1);
+ fill_loose_path(buf, oid);
return buf->buf;
}
const char *loose_object_path(struct repository *r, struct strbuf *buf,
- const unsigned char *sha1)
+ const struct object_id *oid)
{
- return odb_loose_path(r->objects->odb, buf, sha1);
+ return odb_loose_path(r->objects->odb, buf, oid);
}
/*
@@ -721,7 +721,7 @@ static int check_and_freshen_odb(struct object_directory *odb,
int freshen)
{
static struct strbuf path = STRBUF_INIT;
- odb_loose_path(odb, &path, oid->hash);
+ odb_loose_path(odb, &path, oid);
return check_and_freshen_file(path.buf, freshen);
}
@@ -879,15 +879,15 @@ int git_open_cloexec(const char *name, int flags)
* Note that it may point to static storage and is only valid until another
* call to stat_sha1_file().
*/
-static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
- struct stat *st, const char **path)
+static int stat_loose_object(struct repository *r, const struct object_id *oid,
+ struct stat *st, const char **path)
{
struct object_directory *odb;
static struct strbuf buf = STRBUF_INIT;
prepare_alt_odb(r);
for (odb = r->objects->odb; odb; odb = odb->next) {
- *path = odb_loose_path(odb, &buf, sha1);
+ *path = odb_loose_path(odb, &buf, oid);
if (!lstat(*path, st))
return 0;
}
@@ -900,7 +900,7 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
* descriptor. See the caveats on the "path" parameter above.
*/
static int open_sha1_file(struct repository *r,
- const unsigned char *sha1, const char **path)
+ const struct object_id *oid, const char **path)
{
int fd;
struct object_directory *odb;
@@ -909,7 +909,7 @@ static int open_sha1_file(struct repository *r,
prepare_alt_odb(r);
for (odb = r->objects->odb; odb; odb = odb->next) {
- *path = odb_loose_path(odb, &buf, sha1);
+ *path = odb_loose_path(odb, &buf, oid);
fd = git_open(*path);
if (fd >= 0)
return fd;
@@ -922,19 +922,16 @@ static int open_sha1_file(struct repository *r,
}
static int quick_has_loose(struct repository *r,
- const unsigned char *sha1)
+ const struct object_id *oid)
{
- int subdir_nr = sha1[0];
- struct object_id oid;
+ int subdir_nr = oid->hash[0];
struct object_directory *odb;
- hashcpy(oid.hash, sha1);
-
prepare_alt_odb(r);
for (odb = r->objects->odb; odb; odb = odb->next) {
odb_load_loose_cache(odb, subdir_nr);
if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
- &oid) >= 0)
+ oid) >= 0)
return 1;
}
return 0;
@@ -944,8 +941,8 @@ static int quick_has_loose(struct repository *r,
* Map the loose object at "path" if it is not NULL, or the path found by
* searching for a loose object named "sha1".
*/
-static void *map_sha1_file_1(struct repository *r, const char *path,
- const unsigned char *sha1, unsigned long *size)
+static void *map_loose_object_1(struct repository *r, const char *path,
+ const struct object_id *oid, unsigned long *size)
{
void *map;
int fd;
@@ -953,7 +950,7 @@ static void *map_sha1_file_1(struct repository *r, const char *path,
if (path)
fd = git_open(path);
else
- fd = open_sha1_file(r, sha1, &path);
+ fd = open_sha1_file(r, oid, &path);
map = NULL;
if (fd >= 0) {
struct stat st;
@@ -972,10 +969,11 @@ static void *map_sha1_file_1(struct repository *r, const char *path,
return map;
}
-void *map_sha1_file(struct repository *r,
- const unsigned char *sha1, unsigned long *size)
+void *map_loose_object(struct repository *r,
+ const struct object_id *oid,
+ unsigned long *size)
{
- return map_sha1_file_1(r, NULL, sha1, size);
+ return map_loose_object_1(r, NULL, oid, size);
}
static int unpack_sha1_short_header(git_zstream *stream,
@@ -1045,7 +1043,9 @@ static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
return -1;
}
-static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)
+static void *unpack_loose_rest(git_zstream *stream,
+ void *buffer, unsigned long size,
+ const struct object_id *oid)
{
int bytes = strlen(buffer) + 1;
unsigned char *buf = xmallocz(size);
@@ -1082,10 +1082,10 @@ static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long s
}
if (status < 0)
- error(_("corrupt loose object '%s'"), sha1_to_hex(sha1));
+ error(_("corrupt loose object '%s'"), oid_to_hex(oid));
else if (stream->avail_in)
error(_("garbage at end of loose object '%s'"),
- sha1_to_hex(sha1));
+ oid_to_hex(oid));
free(buf);
return NULL;
}
@@ -1164,9 +1164,9 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep)
return parse_sha1_header_extended(hdr, &oi, 0);
}
-static int sha1_loose_object_info(struct repository *r,
- const unsigned char *sha1,
- struct object_info *oi, int flags)
+static int loose_object_info(struct repository *r,
+ const struct object_id *oid,
+ struct object_info *oi, int flags)
{
int status = 0;
unsigned long mapsize;
@@ -1191,15 +1191,15 @@ static int sha1_loose_object_info(struct repository *r,
const char *path;
struct stat st;
if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
- return quick_has_loose(r, sha1) ? 0 : -1;
- if (stat_sha1_file(r, sha1, &st, &path) < 0)
+ return quick_has_loose(r, oid) ? 0 : -1;
+ if (stat_loose_object(r, oid, &st, &path) < 0)
return -1;
if (oi->disk_sizep)
*oi->disk_sizep = st.st_size;
return 0;
}
- map = map_sha1_file(r, sha1, &mapsize);
+ map = map_loose_object(r, oid, &mapsize);
if (!map)
return -1;
@@ -1211,22 +1211,22 @@ static int sha1_loose_object_info(struct repository *r,
if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
status = error(_("unable to unpack %s header with --allow-unknown-type"),
- sha1_to_hex(sha1));
+ oid_to_hex(oid));
} else if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
status = error(_("unable to unpack %s header"),
- sha1_to_hex(sha1));
+ oid_to_hex(oid));
if (status < 0)
; /* Do nothing */
else if (hdrbuf.len) {
if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
- sha1_to_hex(sha1));
+ oid_to_hex(oid));
} else if ((status = parse_sha1_header_extended(hdr, oi, flags)) < 0)
- status = error(_("unable to parse %s header"), sha1_to_hex(sha1));
+ status = error(_("unable to parse %s header"), oid_to_hex(oid));
if (status >= 0 && oi->contentp) {
- *oi->contentp = unpack_sha1_rest(&stream, hdr,
- *oi->sizep, sha1);
+ *oi->contentp = unpack_loose_rest(&stream, hdr,
+ *oi->sizep, oid);
if (!*oi->contentp) {
git_inflate_end(&stream);
status = -1;
@@ -1292,7 +1292,7 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
return -1;
/* Most likely it's a loose object. */
- if (!sha1_loose_object_info(r, real->hash, oi, flags))
+ if (!loose_object_info(r, real, oi, flags))
return 0;
/* Not a loose object; someone else may have just packed it. */
@@ -1420,7 +1420,7 @@ void *read_object_file_extended(const struct object_id *oid,
die(_("replacement %s not found for %s"),
oid_to_hex(repl), oid_to_hex(oid));
- if (!stat_sha1_file(the_repository, repl->hash, &st, &path))
+ if (!stat_loose_object(the_repository, repl, &st, &path))
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(repl), path);
@@ -1620,7 +1620,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
static struct strbuf tmp_file = STRBUF_INIT;
static struct strbuf filename = STRBUF_INIT;
- loose_object_path(the_repository, &filename, oid->hash);
+ loose_object_path(the_repository, &filename, oid);
fd = create_tmpfile(&tmp_file, filename.buf);
if (fd < 0) {
@@ -2196,7 +2196,7 @@ static int check_stream_sha1(git_zstream *stream,
/*
* This size comparison must be "<=" to read the final zlib packets;
- * see the comment in unpack_sha1_rest for details.
+ * see the comment in unpack_loose_rest for details.
*/
while (total_read <= size &&
(status == Z_OK ||
@@ -2245,7 +2245,7 @@ int read_loose_object(const char *path,
*contents = NULL;
- map = map_sha1_file_1(the_repository, path, NULL, &mapsize);
+ map = map_loose_object_1(the_repository, path, NULL, &mapsize);
if (!map) {
error_errno(_("unable to mmap %s"), path);
goto out;
@@ -2267,7 +2267,7 @@ int read_loose_object(const char *path,
if (check_stream_sha1(&stream, hdr, *size, path, expected_oid->hash) < 0)
goto out;
} else {
- *contents = unpack_sha1_rest(&stream, hdr, *size, expected_oid->hash);
+ *contents = unpack_loose_rest(&stream, hdr, *size, expected_oid);
if (!*contents) {
error(_("unable to unpack contents of %s"), path);
git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index ac7c7a22f9..9049146bc1 100644
--- a/streaming.c
+++ b/streaming.c
@@ -338,8 +338,8 @@ static struct stream_vtbl loose_vtbl = {
static open_method_decl(loose)
{
- st->u.loose.mapped = map_sha1_file(the_repository,
- oid->hash, &st->u.loose.mapsize);
+ st->u.loose.mapped = map_loose_object(the_repository,
+ oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
if ((unpack_sha1_header(&st->z,
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-12-05 8:15 ` Jeff King
@ 2018-12-05 18:41 ` René Scharfe
2018-12-05 20:17 ` Jeff King
0 siblings, 1 reply; 99+ messages in thread
From: René Scharfe @ 2018-12-05 18:41 UTC (permalink / raw)
To: Jeff King
Cc: Ævar Arnfjörð Bjarmason, Geert Jansen,
Junio C Hamano, git, Takuto Ikuta
Am 05.12.2018 um 09:15 schrieb Jeff King:
> On Wed, Dec 05, 2018 at 01:51:36AM -0500, Jeff King wrote:
>
>>> This
>>> function is easily converted to struct object_id, though, as its single
>>> caller can pass one on -- this makes the copy unnecessary.
>>
>> If you mean modifying sha1_loose_object_info() to take an oid, then
>> sure, I agree that is a good step forward (and that is exactly the "punt
>> until" moment I meant).
>
> So the simple thing is to do that, and then have it pass oid->hash to
> the other functions it uses.
Yes.
> If we start to convert those, there's a
> little bit of a rabbit hole, but it's actually not too bad.
You don't need to crawl in just for quick_has_loose(), but eventually
everything has to be converted. It seems a bit much for one patch, but
perhaps that's just my ever-decreasing attention span speaking.
Converting one function prototype or struct member at a time seems
about the right amount of change per patch to me. That's not always
possible due to entanglement, of course.
> Most of the spill-over is into the dumb-http code. Note that it actually
> uses sha1 itself! That probably needs to be the_hash_algo (though I'm
> not even sure how we'd negotiate the algorithm across a dumb fetch). At
> any rate, I don't think this patch makes anything _worse_ in that
> respect.
Right.
> diff --git a/sha1-file.c b/sha1-file.c
> index 3ddf4c9426..0705709036 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -333,12 +333,12 @@ int raceproof_create_file(const char *path, create_file_fn fn, void *cb)
> return ret;
> }
>
> -static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
> +static void fill_loose_path(struct strbuf *buf, const struct object_id *oid)
The new name fits.
> @@ -879,15 +879,15 @@ int git_open_cloexec(const char *name, int flags)
> * Note that it may point to static storage and is only valid until another
> * call to stat_sha1_file().
> */
> -static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> - struct stat *st, const char **path)
> +static int stat_loose_object(struct repository *r, const struct object_id *oid,
> + struct stat *st, const char **path)
Hmm, read_sha1_file() was renamed to read_object_file() earlier this
year, and I'd have expected this to become stat_object_file(). Names..
Anyway, the comment above and one a few lines below should be updated
as well.
> {
> struct object_directory *odb;
> static struct strbuf buf = STRBUF_INIT;
>
> prepare_alt_odb(r);
> for (odb = r->objects->odb; odb; odb = odb->next) {
> - *path = odb_loose_path(odb, &buf, sha1);
> + *path = odb_loose_path(odb, &buf, oid);
> if (!lstat(*path, st))
> return 0;
> }
> @@ -900,7 +900,7 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> * descriptor. See the caveats on the "path" parameter above.
> */
> static int open_sha1_file(struct repository *r,
> - const unsigned char *sha1, const char **path)
> + const struct object_id *oid, const char **path)
That function should lose the "sha1" in its name as well.
> -static void *map_sha1_file_1(struct repository *r, const char *path,
> - const unsigned char *sha1, unsigned long *size)
> +static void *map_loose_object_1(struct repository *r, const char *path,
> + const struct object_id *oid, unsigned long *size)
Similarly, map_object_file_1()?
> -void *map_sha1_file(struct repository *r,
> - const unsigned char *sha1, unsigned long *size)
> +void *map_loose_object(struct repository *r,
> + const struct object_id *oid,
> + unsigned long *size)
Similar.
> @@ -1045,7 +1043,9 @@ static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
> return -1;
> }
>
> -static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)
> +static void *unpack_loose_rest(git_zstream *stream,
> + void *buffer, unsigned long size,
> + const struct object_id *oid)
Hmm, both old and new name here look weird to me at this point.
> -static int sha1_loose_object_info(struct repository *r,
> - const unsigned char *sha1,
> - struct object_info *oi, int flags)
> +static int loose_object_info(struct repository *r,
> + const struct object_id *oid,
> + struct object_info *oi, int flags)
And nothing of value was lost. :)
René
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-12-05 18:41 ` René Scharfe
@ 2018-12-05 20:17 ` Jeff King
0 siblings, 0 replies; 99+ messages in thread
From: Jeff King @ 2018-12-05 20:17 UTC (permalink / raw)
To: René Scharfe
Cc: Ævar Arnfjörð Bjarmason, Geert Jansen,
Junio C Hamano, git, Takuto Ikuta
On Wed, Dec 05, 2018 at 07:41:44PM +0100, René Scharfe wrote:
> > If we start to convert those, there's a
> > little bit of a rabbit hole, but it's actually not too bad.
>
> You don't need to crawl in just for quick_has_loose(), but eventually
> everything has to be converted. It seems a bit much for one patch, but
> perhaps that's just my ever-decreasing attention span speaking.
Yeah, my normal process here is to dig to the bottom of the rabbit hole,
and then break it into actual patches. I just shared the middle state
here. ;)
I suspect the http bits could be split off into their own thing. The
bits in sha1-file.c I'd plan to mostly do all together, as they are
a family of related functions.
Mostly I wasn't sure how to wrap this up with the other changes. It's
obviously pretty invasive, and I don't want it to get in the way of
actual functional changes we've been discussing.
> > @@ -879,15 +879,15 @@ int git_open_cloexec(const char *name, int flags)
> > * Note that it may point to static storage and is only valid until another
> > * call to stat_sha1_file().
> > */
> > -static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> > - struct stat *st, const char **path)
> > +static int stat_loose_object(struct repository *r, const struct object_id *oid,
> > + struct stat *st, const char **path)
>
> Hmm, read_sha1_file() was renamed to read_object_file() earlier this
> year, and I'd have expected this to become stat_object_file(). Names..
read_object_file() is about reading an object from _any_ source. These
are specifically about loose objects, and I think that distinction is
important (both here and for map_loose_object, etc).
I'd actually argue that read_object_file() should just be read_object(),
but that already exists. Sigh. (I think it's fixable, but obviously
orthogonal to this topic).
> Anyway, the comment above and one a few lines below should be updated
> as well.
Thanks, fixed.
> > static int open_sha1_file(struct repository *r,
> > - const unsigned char *sha1, const char **path)
> > + const struct object_id *oid, const char **path)
>
> That function should lose the "sha1" in its name as well.
Yep, fixed.
> > -static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)
> > +static void *unpack_loose_rest(git_zstream *stream,
> > + void *buffer, unsigned long size,
> > + const struct object_id *oid)
>
> Hmm, both old and new name here look weird to me at this point.
It makes more sense in the pairing of unpack_sha1_header() and
unpack_sha1_rest(). Maybe "body" would be better than "rest".
At any rate, it probably makes sense to rename them together (but I
didn't touch the "header" one here). Maybe the name changes should come
as a separate patch. I was mostly changing them here because I was
changing the signatures anyway, and had to touch all of the callers.
-Peff
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-12 16:21 ` Jeff King
2018-11-12 22:18 ` Ævar Arnfjörð Bjarmason
@ 2018-11-12 22:44 ` Geert Jansen
1 sibling, 0 replies; 99+ messages in thread
From: Geert Jansen @ 2018-11-12 22:44 UTC (permalink / raw)
To: Jeff King
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
On Mon, Nov 12, 2018 at 11:21:51AM -0500, Jeff King wrote:
> No, but they don't even really need to be actual objects. So I suspect
> something like:
>
> git init
> for i in $(seq 256); do
> i=$(printf %02x $i)
> mkdir -p .git/objects/$i
> for j in $(seq --format=%038g 1000); do
> echo foo >.git/objects/$i/$j
> done
> done
> git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>
> might work (for various values of 1000). The shell loop would probably
> be faster as perl, too. :)
>
> Make sure you clear the object directory between runs, though (otherwise
> the subsequent index-pack's really do find collisions and spend time
> accessing the objects).
Below are my results. They are not as comprehensive as Ævar's tests. Similarly, I
kept the loose objects between tests and removed the packs instead. And I also
used the "echo 3 | sudo tee /proc/sys/vm/drop_caches" trick :)
This is with git.git:
                    origin/master   jk/loose-object-cache
  256*100 objects   520s            13.5s  (-97%)
  256*1000 objects  826s            59s    (-93%)
I've started a 256*10K setup but that's still creating the 2.5M loose objects.
I'll post the results when it's done. I would expect that jk/loose-object-cache
is still marginally faster than origin/master based on a simple linear
extrapolation.
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-12 14:54 ` [PATCH 8/9] sha1-file: use loose object cache for quick existence check Jeff King
2018-11-12 16:00 ` Derrick Stolee
2018-11-12 16:01 ` Ævar Arnfjörð Bjarmason
@ 2018-11-27 20:48 ` René Scharfe
2018-12-01 19:49 ` Jeff King
2 siblings, 1 reply; 99+ messages in thread
From: René Scharfe @ 2018-11-27 20:48 UTC (permalink / raw)
To: Jeff King, Geert Jansen, Ævar Arnfjörð Bjarmason
Cc: Junio C Hamano, git, Takuto Ikuta
Am 12.11.2018 um 15:54 schrieb Jeff King:
> diff --git a/sha1-file.c b/sha1-file.c
> index 4aae716a37..e53da0b701 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -921,6 +921,24 @@ static int open_sha1_file(struct repository *r,
> return -1;
> }
>
> +static int quick_has_loose(struct repository *r,
> + const unsigned char *sha1)
> +{
> + int subdir_nr = sha1[0];
> + struct object_id oid;
> + struct object_directory *odb;
> +
> + hashcpy(oid.hash, sha1);
> +
> + prepare_alt_odb(r);
> + for (odb = r->objects->odb; odb; odb = odb->next) {
> + odb_load_loose_cache(odb, subdir_nr);
Is this thread-safe? What happens if e.g. one index-pack thread resizes
the array while another one sorts it?
Loading the cache explicitly up-front would avoid that, and improves
performance a bit in my (very limited) tests on an SSD. Demo patch for
next at the bottom. How does it do against your test cases?
> + if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
> + return 1;
> + }
> + return 0;
> +}
> +
> /*
> * Map the loose object at "path" if it is not NULL, or the path found by
> * searching for a loose object named "sha1".
> @@ -1171,6 +1189,8 @@ static int sha1_loose_object_info(struct repository *r,
> if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
> const char *path;
> struct stat st;
> + if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
> + return quick_has_loose(r, sha1) ? 0 : -1;
> if (stat_sha1_file(r, sha1, &st, &path) < 0)
> return -1;
> if (oi->disk_sizep)
>
builtin/fetch.c | 2 ++
builtin/index-pack.c | 2 ++
fetch-pack.c | 2 ++
object-store.h | 1 +
sha1-file.c | 30 +++++++++++++++++++++++++++---
5 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index e0140327aa..4b031f5da5 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -301,6 +301,8 @@ static void find_non_local_tags(const struct ref *refs,
refname_hash_init(&existing_refs);
refname_hash_init(&remote_refs);
+ repo_load_loose_cache(the_repository);
+
for_each_ref(add_one_refname, &existing_refs);
for (ref = refs; ref; ref = ref->next) {
if (!starts_with(ref->name, "refs/tags/"))
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ac1f4ea9a7..7fc6321c77 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1772,6 +1772,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
if (show_stat)
obj_stat = xcalloc(st_add(nr_objects, 1), sizeof(struct object_stat));
ofs_deltas = xcalloc(nr_objects, sizeof(struct ofs_delta_entry));
+ if (startup_info->have_repository)
+ repo_load_loose_cache(the_repository);
parse_pack_objects(pack_hash);
if (report_end_of_input)
write_in_full(2, "\0", 1);
diff --git a/fetch-pack.c b/fetch-pack.c
index dd6700bda9..96c4624d9e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -656,6 +656,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
save_commit_buffer = 0;
+ repo_load_loose_cache(the_repository);
+
for (ref = *refs; ref; ref = ref->next) {
struct object *o;
diff --git a/object-store.h b/object-store.h
index 8dceed0f31..f98dd3c857 100644
--- a/object-store.h
+++ b/object-store.h
@@ -53,6 +53,7 @@ void add_to_alternates_memory(const char *dir);
* from 0 to 255 inclusive).
*/
void odb_load_loose_cache(struct object_directory *odb, int subdir_nr);
+void repo_load_loose_cache(struct repository *r);
struct packed_git {
struct packed_git *next;
diff --git a/sha1-file.c b/sha1-file.c
index 05f63dfd4e..ae12f0a198 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -921,10 +921,19 @@ static int open_sha1_file(struct repository *r,
return -1;
}
+static int quick_has_loose_odb(struct object_directory *odb,
+ const struct object_id *oid)
+{
+ int subdir_nr = oid->hash[0];
+
+ if (odb->loose_objects_subdir_seen[subdir_nr])
+ return oid_array_lookup(&odb->loose_objects_cache, oid) >= 0;
+ return check_and_freshen_odb(odb, oid, 0);
+}
+
static int quick_has_loose(struct repository *r,
const unsigned char *sha1)
{
- int subdir_nr = sha1[0];
struct object_id oid;
struct object_directory *odb;
@@ -932,8 +941,7 @@ static int quick_has_loose(struct repository *r,
prepare_alt_odb(r);
for (odb = r->objects->odb; odb; odb = odb->next) {
- odb_load_loose_cache(odb, subdir_nr);
- if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
+ if (quick_has_loose_odb(odb, &oid))
return 1;
}
return 0;
@@ -2178,6 +2186,22 @@ void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
strbuf_release(&buf);
}
+void repo_load_loose_cache(struct repository *r)
+{
+ struct object_directory *odb;
+
+ prepare_alt_odb(r);
+ for (odb = r->objects->odb; odb; odb = odb->next) {
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(odb->loose_objects_subdir_seen); i++)
+ odb_load_loose_cache(odb, i);
+
+ /* Sort as a side-effect, only read the cache from here on. */
+ oid_array_lookup(&odb->loose_objects_cache, &null_oid);
+ }
+}
+
static int check_stream_sha1(git_zstream *stream,
const char *hdr,
unsigned long size,
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check
2018-11-27 20:48 ` René Scharfe
@ 2018-12-01 19:49 ` Jeff King
0 siblings, 0 replies; 99+ messages in thread
From: Jeff King @ 2018-12-01 19:49 UTC (permalink / raw)
To: René Scharfe
Cc: Geert Jansen, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, Takuto Ikuta
On Tue, Nov 27, 2018 at 09:48:57PM +0100, René Scharfe wrote:
> > +static int quick_has_loose(struct repository *r,
> > + const unsigned char *sha1)
> > +{
> > + int subdir_nr = sha1[0];
> > + struct object_id oid;
> > + struct object_directory *odb;
> > +
> > + hashcpy(oid.hash, sha1);
> > +
> > + prepare_alt_odb(r);
> > + for (odb = r->objects->odb; odb; odb = odb->next) {
> > + odb_load_loose_cache(odb, subdir_nr);
>
> Is this thread-safe? What happens if e.g. one index-pack thread resizes
> the array while another one sorts it?
No, but neither is any of the object_info / has_object_file path, which
may use static function-local buffers, or (before my series) alt scratch
bufs, or even call reprepare_packed_git().
In the long run, I think the solution is probably going to be pushing
some mutexes into the right places, and putting one around the cache
fill is an obvious place.
> Loading the cache explicitly up-front would avoid that, and improves
> performance a bit in my (very limited) tests on an SSD. Demo patch for
> next at the bottom. How does it do against your test cases?
It's going to do badly on corner cases where we don't need to load every
object subdirectory, and one or more of them are big. I.e., if I look up
"1234abcd", the current code only needs to fault in $GIT_DIR/objects/12.
Pre-loading means we'd hit them all. Even without a lot of objects, on
NFS that's 256 latencies instead of 1.
-Peff
^ permalink raw reply [flat|nested] 99+ messages in thread
* [PATCH 9/9] fetch-pack: drop custom loose object cache
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
` (7 preceding siblings ...)
2018-11-12 14:54 ` [PATCH 8/9] sha1-file: use loose object cache for quick existence check Jeff King
@ 2018-11-12 14:55 ` Jeff King
2018-11-12 19:25 ` René Scharfe
2018-11-12 16:02 ` [PATCH 0/9] caching loose objects Derrick Stolee
9 siblings, 1 reply; 99+ messages in thread
From: Jeff King @ 2018-11-12 14:55 UTC (permalink / raw)
To: Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
loose objects we don't have.
Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
custom solution.
Note that this might perform slightly differently, as the original code
stopped calling readdir() when we saw more loose objects than there were
refs. So:
1. The old code might have spent work on readdir() to fill the cache,
but then decided there were too many loose objects, wasting that
effort.
2. The new code might spend a lot of time on readdir() if you have a
lot of loose objects, even though there are very few objects to
ask about.
In practice it probably won't matter either way; see the previous commit
for some discussion of the tradeoff.
Signed-off-by: Jeff King <peff@peff.net>
---
fetch-pack.c | 39 ++-------------------------------------
1 file changed, 2 insertions(+), 37 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index b3ed7121bc..25a88f4eb2 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -636,23 +636,6 @@ struct loose_object_iter {
struct ref *refs;
};
-/*
- * If the number of refs is not larger than the number of loose objects,
- * this function stops inserting.
- */
-static int add_loose_objects_to_set(const struct object_id *oid,
- const char *path,
- void *data)
-{
- struct loose_object_iter *iter = data;
- oidset_insert(iter->loose_object_set, oid);
- if (iter->refs == NULL)
- return 1;
-
- iter->refs = iter->refs->next;
- return 0;
-}
-
/*
* Mark recent commits available locally and reachable from a local ref as
* COMPLETE. If args->no_dependents is false, also mark COMPLETE remote refs as
@@ -670,30 +653,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
struct ref *ref;
int old_save_commit_buffer = save_commit_buffer;
timestamp_t cutoff = 0;
- struct oidset loose_oid_set = OIDSET_INIT;
- int use_oidset = 0;
- struct loose_object_iter iter = {&loose_oid_set, *refs};
-
- /* Enumerate all loose objects or know refs are not so many. */
- use_oidset = !for_each_loose_object(add_loose_objects_to_set,
- &iter, 0);
save_commit_buffer = 0;
for (ref = *refs; ref; ref = ref->next) {
struct object *o;
- unsigned int flags = OBJECT_INFO_QUICK;
- if (use_oidset &&
- !oidset_contains(&loose_oid_set, &ref->old_oid)) {
- /*
- * I know this does not exist in the loose form,
- * so check if it exists in a non-loose form.
- */
- flags |= OBJECT_INFO_IGNORE_LOOSE;
- }
-
- if (!has_object_file_with_flags(&ref->old_oid, flags))
+ if (!has_object_file_with_flags(&ref->old_oid,
+ OBJECT_INFO_QUICK))
continue;
o = parse_object(the_repository, &ref->old_oid);
if (!o)
@@ -710,8 +677,6 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
}
}
- oidset_clear(&loose_oid_set);
-
if (!args->deepen) {
for_each_ref(mark_complete_oid, NULL);
for_each_cached_alternate(NULL, mark_alternate_complete);
--
2.19.1.1577.g2c5b293d4f
^ permalink raw reply related [flat|nested] 99+ messages in thread
* Re: [PATCH 9/9] fetch-pack: drop custom loose object cache
2018-11-12 14:55 ` [PATCH 9/9] fetch-pack: drop custom loose object cache Jeff King
@ 2018-11-12 19:25 ` René Scharfe
2018-11-12 19:32 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 99+ messages in thread
From: René Scharfe @ 2018-11-12 19:25 UTC (permalink / raw)
To: Jeff King, Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
Takuto Ikuta
Am 12.11.2018 um 15:55 schrieb Jeff King:
> Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
> object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
> loose objects we don't have.
>
> Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
> custom solution.
>
> Note that this might perform slightly differently, as the original code
> stopped calling readdir() when we saw more loose objects than there were
> refs. So:
>
> 1. The old code might have spent work on readdir() to fill the cache,
> but then decided there were too many loose objects, wasting that
> effort.
>
> 2. The new code might spend a lot of time on readdir() if you have a
> lot of loose objects, even though there are very few objects to
> ask about.
Plus the old code used an oidset while the new one uses an oid_array.
> In practice it probably won't matter either way; see the previous commit
> for some discussion of the tradeoff.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> fetch-pack.c | 39 ++-------------------------------------
> 1 file changed, 2 insertions(+), 37 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index b3ed7121bc..25a88f4eb2 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -636,23 +636,6 @@ struct loose_object_iter {
> struct ref *refs;
> };
>
> -/*
> - * If the number of refs is not larger than the number of loose objects,
> - * this function stops inserting.
> - */
> -static int add_loose_objects_to_set(const struct object_id *oid,
> - const char *path,
> - void *data)
> -{
> - struct loose_object_iter *iter = data;
> - oidset_insert(iter->loose_object_set, oid);
> - if (iter->refs == NULL)
> - return 1;
> -
> - iter->refs = iter->refs->next;
> - return 0;
> -}
> -
> /*
> * Mark recent commits available locally and reachable from a local ref as
> * COMPLETE. If args->no_dependents is false, also mark COMPLETE remote refs as
> @@ -670,30 +653,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
> struct ref *ref;
> int old_save_commit_buffer = save_commit_buffer;
> timestamp_t cutoff = 0;
> - struct oidset loose_oid_set = OIDSET_INIT;
> - int use_oidset = 0;
> - struct loose_object_iter iter = {&loose_oid_set, *refs};
> -
> - /* Enumerate all loose objects or know refs are not so many. */
> - use_oidset = !for_each_loose_object(add_loose_objects_to_set,
> - &iter, 0);
>
> save_commit_buffer = 0;
>
> for (ref = *refs; ref; ref = ref->next) {
> struct object *o;
> - unsigned int flags = OBJECT_INFO_QUICK;
>
> - if (use_oidset &&
> - !oidset_contains(&loose_oid_set, &ref->old_oid)) {
> - /*
> - * I know this does not exist in the loose form,
> - * so check if it exists in a non-loose form.
> - */
> - flags |= OBJECT_INFO_IGNORE_LOOSE;
This removes the only user of OBJECT_INFO_IGNORE_LOOSE. #leftoverbits
> - }
> -
> - if (!has_object_file_with_flags(&ref->old_oid, flags))
> + if (!has_object_file_with_flags(&ref->old_oid,
> + OBJECT_INFO_QUICK))
> continue;
> o = parse_object(the_repository, &ref->old_oid);
> if (!o)
> @@ -710,8 +677,6 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
> }
> }
>
> - oidset_clear(&loose_oid_set);
> -
> if (!args->deepen) {
> for_each_ref(mark_complete_oid, NULL);
> for_each_cached_alternate(NULL, mark_alternate_complete);
>
^ permalink raw reply [flat|nested] 99+ messages in thread
* Re: [PATCH 9/9] fetch-pack: drop custom loose object cache
2018-11-12 19:25 ` René Scharfe
@ 2018-11-12 19:32 ` Ævar Arnfjörð Bjarmason
2018-11-12 20:07 ` Jeff King
2018-11-12 20:13 ` René Scharfe
0 siblings, 2 replies; 99+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-11-12 19:32 UTC (permalink / raw)
To: René Scharfe
Cc: Jeff King, Geert Jansen, Junio C Hamano, git, Takuto Ikuta
On Mon, Nov 12 2018, René Scharfe wrote:
> On 12.11.2018 at 15:55, Jeff King wrote:
>> Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
>> object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
>> loose objects we don't have.
>>
>> Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
>> custom solution.
>>
>> Note that this might perform slightly differently, as the original code
>> stopped calling readdir() when we saw more loose objects than there were
>> refs. So:
>>
>> 1. The old code might have spent work on readdir() to fill the cache,
>> but then decided there were too many loose objects, wasting that
>> effort.
>>
>> 2. The new code might spend a lot of time on readdir() if you have a
>> lot of loose objects, even though there are very few objects to
>> ask about.
>
> Plus the old code used an oidset while the new one uses an oid_array.
>
>> In practice it probably won't matter either way; see the previous commit
>> for some discussion of the tradeoff.
>>
>> Signed-off-by: Jeff King <peff@peff.net>
>> ---
>> fetch-pack.c | 39 ++-------------------------------------
>> 1 file changed, 2 insertions(+), 37 deletions(-)
>>
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index b3ed7121bc..25a88f4eb2 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -636,23 +636,6 @@ struct loose_object_iter {
>> struct ref *refs;
>> };
>>
>> -/*
>> - * If the number of refs is not larger than the number of loose objects,
>> - * this function stops inserting.
>> - */
>> -static int add_loose_objects_to_set(const struct object_id *oid,
>> - const char *path,
>> - void *data)
>> -{
>> - struct loose_object_iter *iter = data;
>> - oidset_insert(iter->loose_object_set, oid);
>> - if (iter->refs == NULL)
>> - return 1;
>> -
>> - iter->refs = iter->refs->next;
>> - return 0;
>> -}
>> -
>> /*
>> * Mark recent commits available locally and reachable from a local ref as
>> * COMPLETE. If args->no_dependents is false, also mark COMPLETE remote refs as
>> @@ -670,30 +653,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>> struct ref *ref;
>> int old_save_commit_buffer = save_commit_buffer;
>> timestamp_t cutoff = 0;
>> - struct oidset loose_oid_set = OIDSET_INIT;
>> - int use_oidset = 0;
>> - struct loose_object_iter iter = {&loose_oid_set, *refs};
>> -
>> - /* Enumerate all loose objects or know refs are not so many. */
>> - use_oidset = !for_each_loose_object(add_loose_objects_to_set,
>> - &iter, 0);
>>
>> save_commit_buffer = 0;
>>
>> for (ref = *refs; ref; ref = ref->next) {
>> struct object *o;
>> - unsigned int flags = OBJECT_INFO_QUICK;
>>
>> - if (use_oidset &&
>> - !oidset_contains(&loose_oid_set, &ref->old_oid)) {
>> - /*
>> - * I know this does not exist in the loose form,
>> - * so check if it exists in a non-loose form.
>> - */
>> - flags |= OBJECT_INFO_IGNORE_LOOSE;
>
> This removes the only user of OBJECT_INFO_IGNORE_LOOSE. #leftoverbits
With this series applied there's still a use of it left in
oid_object_info_extended()
* Re: [PATCH 9/9] fetch-pack: drop custom loose object cache
2018-11-12 19:32 ` Ævar Arnfjörð Bjarmason
@ 2018-11-12 20:07 ` Jeff King
2018-11-12 20:13 ` René Scharfe
1 sibling, 0 replies; 99+ messages in thread
From: Jeff King @ 2018-11-12 20:07 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: René Scharfe, Geert Jansen, Junio C Hamano, git, Takuto Ikuta
On Mon, Nov 12, 2018 at 08:32:43PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> for (ref = *refs; ref; ref = ref->next) {
> >> struct object *o;
> >> - unsigned int flags = OBJECT_INFO_QUICK;
> >>
> >> - if (use_oidset &&
> >> - !oidset_contains(&loose_oid_set, &ref->old_oid)) {
> >> - /*
> >> - * I know this does not exist in the loose form,
> >> - * so check if it exists in a non-loose form.
> >> - */
> >> - flags |= OBJECT_INFO_IGNORE_LOOSE;
> >
> > This removes the only user of OBJECT_INFO_IGNORE_LOOSE. #leftoverbits
>
> With this series applied there's still a use of it left in
> oid_object_info_extended()
That's just the code that does something with the flag. No callers pass
it in anymore, so we could drop the flag _and_ that code.
-Peff
* Re: [PATCH 9/9] fetch-pack: drop custom loose object cache
2018-11-12 19:32 ` Ævar Arnfjörð Bjarmason
2018-11-12 20:07 ` Jeff King
@ 2018-11-12 20:13 ` René Scharfe
1 sibling, 0 replies; 99+ messages in thread
From: René Scharfe @ 2018-11-12 20:13 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Jeff King, Geert Jansen, Junio C Hamano, git, Takuto Ikuta
On 12.11.2018 at 20:32, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Nov 12 2018, René Scharfe wrote:
>> This removes the only user of OBJECT_INFO_IGNORE_LOOSE. #leftoverbits
>
> With this series applied there's still a use of it left in
> oid_object_info_extended()
OK, rephrasing: With that patch, OBJECT_INFO_IGNORE_LOOSE is never set
anymore, and its check in oid_object_info_extended() as well as its
definition can be removed.
René
* Re: [PATCH 0/9] caching loose objects
2018-11-12 14:46 ` [PATCH 0/9] caching loose objects Jeff King
` (8 preceding siblings ...)
2018-11-12 14:55 ` [PATCH 9/9] fetch-pack: drop custom loose object cache Jeff King
@ 2018-11-12 16:02 ` Derrick Stolee
2018-11-12 19:10 ` Stefan Beller
9 siblings, 1 reply; 99+ messages in thread
From: Derrick Stolee @ 2018-11-12 16:02 UTC (permalink / raw)
To: Jeff King, Geert Jansen
Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, git,
René Scharfe, Takuto Ikuta
On 11/12/2018 9:46 AM, Jeff King wrote:
> Here's the series I mentioned earlier in the thread to cache loose
> objects when answering has_object_file(..., OBJECT_INFO_QUICK). For
> those just joining us, this makes operations that look up a lot of
> missing objects (like "index-pack" looking for collisions) faster. This
> is mostly targeted at systems where stat() is slow, like over NFS, but
> it seems to give a 2% speedup indexing a full git.git packfile into an
> empty repository (i.e., what you'd see on a clone).
>
> I'm adding René Scharfe and Takuto Ikuta to the cc for their previous
> work in loose-object caching.
>
> The interesting bit is patch 8. The rest of it is cleanup to let us
> treat alternates and the main object directory similarly.
This cleanup is actually really valuable, and affects much more than
this application.
I really think it is a good idea, and hope it doesn't cause too much
trouble as the topic is cooking.
Thanks,
-Stolee
* Re: [PATCH 0/9] caching loose objects
2018-11-12 16:02 ` [PATCH 0/9] caching loose objects Derrick Stolee
@ 2018-11-12 19:10 ` Stefan Beller
0 siblings, 0 replies; 99+ messages in thread
From: Stefan Beller @ 2018-11-12 19:10 UTC (permalink / raw)
To: Derrick Stolee
Cc: Jeff King, gerardu, Ævar Arnfjörð Bjarmason,
Junio C Hamano, git, René Scharfe, tikuta
On Mon, Nov 12, 2018 at 8:02 AM Derrick Stolee <stolee@gmail.com> wrote:
> This cleanup is actually really valuable, and affects much more than
> this application.
I second this. I'd value this series more for the cleanup than its
application. ;-)