All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jansen, Geert" <gerardu@amazon.com>
To: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: [RFC PATCH] index-pack: improve performance on NFS
Date: Thu, 25 Oct 2018 18:38:09 +0000	[thread overview]
Message-ID: <ED25E182-C296-4D08-8170-340567D8964A@amazon.com> (raw)

The index-pack command determines if a sha1 collision test is needed by
checking the existence of a loose object with the given hash. In my tests, I
can improve performance of “git clone” on Amazon EFS by 8x when used with a
non-default mount option (lookupcache=pos) that's required for a Gitlab HA
setup. My assumption is that this check is unnecessary when cloning into a new
repository because the repository will be empty.

By default, the Linux NFS client will cache directory entries as well as the
non-existence of directory entries. The latter means that when client c1 does
stat() on a file that does not exist, the non-existence will be cached and any
subsequent stat() operation on the file will return -ENOENT until the cache
expires or is invalidated, even if the file was created on client c2 in the
mean time. This leads to errors in a Gitlab HA setup when it distributes jobs
over multiple worker nodes assuming each worker node has the same view of the
shared file system.

The recommended workaround by Gitlab is to use the “lookupcache=pos” NFS mount
option which disables the negative lookup cache. This option has a high
performance impact. Cloning the gitlab-ce repository
(https://gitlab.com/gitlab-org/gitlib-ce.git) into an NFS mounted directory
gives the following results:

  lookupcache=all (default): 624 seconds
  lookupcache=pos: 4957 seconds

The reason for the poor performance is that index-pack will issue a stat()
call for every object in the repo when checking if a collision test is needed.
These stat() calls result in the following NFS operations:

  LOOKUP dirfh=".git/objects", name="01" -> NFS4ERR_ENOENT

With lookupcache=all, the non-existence of the .git/objects/XX directories is
cached, so that there will be at most 256 LOOKUP calls. With lookupcache=pos,
there will be one LOOKUP operation for every object in the repository, which
in case of the gitlab-ce repo is about 1.3 million times.

The attached patch removes the collision check when cloning into a new
repository. The performance of git clone with this patch is:

  lookupcache=pos (with patch): 577 seconds

I'd welcome feedback on the attached patch and whether my assumption that the
sha1 collision check can be safely omitted when cloning into a new repository
is correct.

Signed-off-by: Geert Jansen <gerardu@amazon.com>
---
 builtin/index-pack.c | 5 ++++-
 fetch-pack.c         | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2004e25da..22b3d40fb 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -84,6 +84,7 @@ static int verbose;
 static int show_resolving_progress;
 static int show_stat;
 static int check_self_contained_and_connected;
+static int cloning;
 
 static struct progress *progress;
 
@@ -794,7 +795,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 
 	assert(data || obj_entry);
 
-	if (startup_info->have_repository) {
+	if (startup_info->have_repository && !cloning) {
 		read_lock();
 		collision_test_needed =
 			has_sha1_file_with_flags(oid->hash, OBJECT_INFO_QUICK);
@@ -1705,6 +1706,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				check_self_contained_and_connected = 1;
 			} else if (!strcmp(arg, "--fsck-objects")) {
 				do_fsck_object = 1;
+			} else if (!strcmp(arg, "--cloning")) {
+				cloning = 1;
 			} else if (!strcmp(arg, "--verify")) {
 				verify = 1;
 			} else if (!strcmp(arg, "--verify-stat")) {
diff --git a/fetch-pack.c b/fetch-pack.c
index b3ed7121b..c75bfb8aa 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -843,6 +843,8 @@ static int get_pack(struct fetch_pack_args *args,
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
 		if (args->from_promisor)
 			argv_array_push(&cmd.args, "--promisor");
+		if (args->cloning)
+			argv_array_pushf(&cmd.args, "--cloning");
 	}
 	else {
 		cmd_name = "unpack-objects";
-- 
2.19.1.328.g5a0cc8aca.dirty



             reply	other threads:[~2018-10-25 18:38 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-25 18:38 Jansen, Geert [this message]
2018-10-26  0:21 ` [RFC PATCH] index-pack: improve performance on NFS Junio C Hamano
2018-10-26 20:38   ` Ævar Arnfjörð Bjarmason
2018-10-27  7:26     ` Junio C Hamano
2018-10-27  9:33       ` Jeff King
2018-10-27 11:22         ` Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking Ævar Arnfjörð Bjarmason
2018-10-30  2:49             ` Geert Jansen
2018-10-30  9:04               ` Junio C Hamano
2018-10-30 18:43             ` [PATCH v2 0/3] index-pack: test updates Ævar Arnfjörð Bjarmason
2018-11-13 20:19               ` [PATCH v3] index-pack: add ability to disable SHA-1 collision check Ævar Arnfjörð Bjarmason
2018-11-14  7:09                 ` Junio C Hamano
2018-11-14 12:40                   ` Ævar Arnfjörð Bjarmason
2018-10-30 18:43             ` [PATCH v2 1/3] pack-objects test: modernize style Ævar Arnfjörð Bjarmason
2018-10-30 18:43             ` [PATCH v2 2/3] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-10-30 18:43             ` [PATCH v2 3/3] index-pack tests: don't leave test repo dirty " Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 1/4] pack-objects test: modernize style Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 2/4] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 3/4] index-pack tests: don't leave test repo dirty " Ævar Arnfjörð Bjarmason
2018-10-28 22:50           ` [PATCH 4/4] index-pack: add ability to disable SHA-1 collision check Ævar Arnfjörð Bjarmason
2018-10-29 15:04           ` [RFC PATCH] index-pack: improve performance on NFS Jeff King
2018-10-29 15:09             ` Jeff King
2018-10-29 19:36             ` Ævar Arnfjörð Bjarmason
2018-10-29 23:27               ` Jeff King
2018-11-07 22:55                 ` Geert Jansen
2018-11-08 12:02                   ` Jeff King
2018-11-08 20:58                     ` Geert Jansen
2018-11-08 21:18                       ` Jeff King
2018-11-08 21:55                         ` Geert Jansen
2018-11-08 22:20                     ` Ævar Arnfjörð Bjarmason
2018-11-09 10:11                       ` Ævar Arnfjörð Bjarmason
2018-11-12 14:31                       ` Jeff King
2018-11-12 14:46                     ` [PATCH 0/9] caching loose objects Jeff King
2018-11-12 14:46                       ` [PATCH 1/9] fsck: do not reuse child_process structs Jeff King
2018-11-12 15:26                         ` Derrick Stolee
2018-11-12 14:47                       ` [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with() Jeff King
2018-11-12 18:23                         ` Stefan Beller
2018-11-12 14:48                       ` [PATCH 3/9] rename "alternate_object_database" to "object_directory" Jeff King
2018-11-12 15:30                         ` Derrick Stolee
2018-11-12 15:36                           ` Jeff King
2018-11-12 19:41                             ` Ramsay Jones
2018-11-12 14:48                       ` [PATCH 4/9] sha1_file_name(): overwrite buffer instead of appending Jeff King
2018-11-12 15:32                         ` Derrick Stolee
2018-11-12 14:49                       ` [PATCH 5/9] handle alternates paths the same as the main object dir Jeff King
2018-11-12 15:38                         ` Derrick Stolee
2018-11-12 15:46                           ` Jeff King
2018-11-12 15:50                             ` Derrick Stolee
2018-11-12 14:50                       ` [PATCH 6/9] sha1-file: use an object_directory for " Jeff King
2018-11-12 15:48                         ` Derrick Stolee
2018-11-12 16:09                           ` Jeff King
2018-11-12 19:04                             ` Stefan Beller
2018-11-22 17:42                               ` Jeff King
2018-11-12 18:48                           ` Stefan Beller
2018-11-12 14:50                       ` [PATCH 7/9] object-store: provide helpers for loose_objects_cache Jeff King
2018-11-12 19:24                         ` René Scharfe
2018-11-12 20:16                           ` Jeff King
2018-11-12 14:54                       ` [PATCH 8/9] sha1-file: use loose object cache for quick existence check Jeff King
2018-11-12 16:00                         ` Derrick Stolee
2018-11-12 16:01                         ` Ævar Arnfjörð Bjarmason
2018-11-12 16:21                           ` Jeff King
2018-11-12 22:18                             ` Ævar Arnfjörð Bjarmason
2018-11-12 22:30                               ` Ævar Arnfjörð Bjarmason
2018-11-13 10:02                                 ` Ævar Arnfjörð Bjarmason
2018-11-14 18:21                                   ` René Scharfe
2018-12-02 10:52                                   ` René Scharfe
2018-12-03 22:04                                     ` Jeff King
2018-12-04 21:45                                       ` René Scharfe
2018-12-05  4:46                                         ` Jeff King
2018-12-05  6:02                                           ` René Scharfe
2018-12-05  6:51                                             ` Jeff King
2018-12-05  8:15                                               ` Jeff King
2018-12-05 18:41                                                 ` René Scharfe
2018-12-05 20:17                                                   ` Jeff King
2018-11-12 22:44                             ` Geert Jansen
2018-11-27 20:48                         ` René Scharfe
2018-12-01 19:49                           ` Jeff King
2018-11-12 14:55                       ` [PATCH 9/9] fetch-pack: drop custom loose object cache Jeff King
2018-11-12 19:25                         ` René Scharfe
2018-11-12 19:32                           ` Ævar Arnfjörð Bjarmason
2018-11-12 20:07                             ` Jeff King
2018-11-12 20:13                             ` René Scharfe
2018-11-12 16:02                       ` [PATCH 0/9] caching loose objects Derrick Stolee
2018-11-12 19:10                         ` Stefan Beller
2018-11-09 13:43                   ` [RFC PATCH] index-pack: improve performance on NFS Ævar Arnfjörð Bjarmason
2018-11-09 16:08                     ` Duy Nguyen
2018-11-10 14:04                       ` Ævar Arnfjörð Bjarmason
2018-11-12 14:34                         ` Jeff King
2018-11-12 22:58                     ` Geert Jansen
2018-10-27 14:04         ` Duy Nguyen
2018-10-29 15:18           ` Jeff King
2018-10-29  0:48         ` Junio C Hamano
2018-10-29 15:20           ` Jeff King
2018-10-29 18:43             ` Ævar Arnfjörð Bjarmason
2018-10-29 21:34           ` Geert Jansen
2018-10-29 21:50             ` Jeff King
2018-10-29 22:21               ` Geert Jansen
2018-10-29 22:27             ` Jeff King
2018-10-29 22:35               ` Stefan Beller
2018-10-29 23:29                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ED25E182-C296-4D08-8170-340567D8964A@amazon.com \
    --to=gerardu@amazon.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.