All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/4] read-cache: call verify_hdr() in a background thread
@ 2017-04-03 18:53 git
  2017-04-03 18:53 ` [PATCH v4 1/4] read-cache: core.checksumindex git
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: git @ 2017-04-03 18:53 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Version 4 of this patch series incorporates the cleanup made by Junio
to my 3rd veresion, updates fsck to always verify the checksum, and a
new simplified perf test using p0002 as discussed on the mailing list.

================
Version 3 of this patch series simplifies this effort to just turn
on/off the hash verification using a "core.checksumindex" config variable.

I've preserved the original checksum validation code so that we can
force it on in fsck if desired.

It eliminates the original threading model completely.

Jeff Hostetler (4):
  read-cache: core.checksumindex
  fsck: force core.checksumindex=1
  t1450-fsck: test core.checksumindex
  p0002-read-cache: test core.checksumindex

 Documentation/config.txt   |  8 ++++++++
 builtin/fsck.c             |  1 +
 cache.h                    |  7 +++++++
 read-cache.c               | 18 ++++++++++++++++++
 t/perf/p0002-read-cache.sh |  8 +++++++-
 t/t1450-fsck.sh            | 27 +++++++++++++++++++++++++++
 6 files changed, 68 insertions(+), 1 deletion(-)

-- 
2.9.3


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/4] read-cache: core.checksumindex
  2017-04-03 18:53 [PATCH v4 0/4] read-cache: call verify_hdr() in a background thread git
@ 2017-04-03 18:53 ` git
  2017-04-03 18:53 ` [PATCH v4 2/4] fsck: force core.checksumindex=1 git
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: git @ 2017-04-03 18:53 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach git to skip verification of the SHA-1 checksum at the end of
the index file in verify_hdr() called from read_index() when the
core.checksumIndex configuration variable is set to false.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

On the Linux kernel repository, the effect is rather trivial.
The time to reading its index with 58k entries drops from 0.0284 sec
down to 0.0155 sec.

On my Windows source tree (450MB index), I'm seeing a savings of 0.6
seconds -- read_index() went from 1.2 to 0.6 seconds.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt |  8 ++++++++
 read-cache.c             | 12 ++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 47603f5..0f72bdd 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -267,6 +267,14 @@ advice.*::
 		show directions on how to proceed from the current state.
 --
 
+core.checksumIndex::
+	Tell Git to validate the checksum at the end of the index
+	file to detect corruption.  Defaults to `true`.  Those who
+	work on a project with too many files may want to set this
+	variable to `false` to make it faster to load the index (in
+	exchange for reliability, but in general modern disks are
+	reliable enough for most people).
+
 core.fileMode::
 	Tells Git if the executable bit of files in the working tree
 	is to be honored.
diff --git a/read-cache.c b/read-cache.c
index 9054369..dd64cde 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1376,12 +1376,24 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	git_SHA_CTX c;
 	unsigned char sha1[20];
 	int hdr_version;
+	int do_checksum = 1; /* default to true for now */
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error("bad signature");
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
+
+	/*
+	 * Since we run very early in command startup, git_config()
+	 * may not have been called yet and the various "core_*"
+	 * global variables haven't been set.  So look it up
+	 * explicitly.
+	 */
+	git_config_get_bool("core.checksumindex", &do_checksum);
+	if (!do_checksum)
+		return 0;
+
 	git_SHA1_Init(&c);
 	git_SHA1_Update(&c, hdr, size - 20);
 	git_SHA1_Final(sha1, &c);
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/4] fsck: force core.checksumindex=1
  2017-04-03 18:53 [PATCH v4 0/4] read-cache: call verify_hdr() in a background thread git
  2017-04-03 18:53 ` [PATCH v4 1/4] read-cache: core.checksumindex git
@ 2017-04-03 18:53 ` git
  2017-04-03 20:31   ` Ævar Arnfjörð Bjarmason
  2017-04-03 18:53 ` [PATCH v4 3/4] t1450-fsck: test core.checksumindex git
  2017-04-03 18:53 ` [PATCH v4 4/4] p0002-read-cache: " git
  3 siblings, 1 reply; 11+ messages in thread
From: git @ 2017-04-03 18:53 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach fsck to override core.checksumindex and always verify
the index checksum when reading the index.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fsck.c |  1 +
 cache.h        |  7 +++++++
 read-cache.c   | 20 +++++++++++++-------
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 1a5cacc..6913233 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -771,6 +771,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	if (keep_cache_objects) {
+		force_core_checksum_index = 1;
 		read_cache();
 		for (i = 0; i < active_nr; i++) {
 			unsigned int mode;
diff --git a/cache.h b/cache.h
index 80b6372..3ebda0a 100644
--- a/cache.h
+++ b/cache.h
@@ -685,6 +685,13 @@ extern void update_index_if_able(struct index_state *, struct lock_file *);
 extern int hold_locked_index(struct lock_file *, int);
 extern void set_alternate_index_output(const char *);
 
+/*
+ * Override "core.checksumindex" config settings.  Allows commands
+ * like "fsck" to force it without altering on-disk settings in case
+ * routines call die() before it can be reset.
+ */
+extern int force_core_checksum_index;
+
 /* Environment bits from configuration mechanism */
 extern int trust_executable_bit;
 extern int trust_ctime;
diff --git a/read-cache.c b/read-cache.c
index dd64cde..36fdc2a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1371,6 +1371,8 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
+int force_core_checksum_index;
+
 static int verify_hdr(struct cache_header *hdr, unsigned long size)
 {
 	git_SHA_CTX c;
@@ -1384,13 +1386,17 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
 
-	/*
-	 * Since we run very early in command startup, git_config()
-	 * may not have been called yet and the various "core_*"
-	 * global variables haven't been set.  So look it up
-	 * explicitly.
-	 */
-	git_config_get_bool("core.checksumindex", &do_checksum);
+	if (force_core_checksum_index)
+		do_checksum = 1;
+	else {
+		/*
+		 * Since we run very early in command startup, git_config()
+		 * may not have been called yet and the various "core_*"
+		 * global variables haven't been set.  So look it up
+		 * explicitly.
+		 */
+		git_config_get_bool("core.checksumindex", &do_checksum);
+	}
 	if (!do_checksum)
 		return 0;
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/4] t1450-fsck: test core.checksumindex
  2017-04-03 18:53 [PATCH v4 0/4] read-cache: call verify_hdr() in a background thread git
  2017-04-03 18:53 ` [PATCH v4 1/4] read-cache: core.checksumindex git
  2017-04-03 18:53 ` [PATCH v4 2/4] fsck: force core.checksumindex=1 git
@ 2017-04-03 18:53 ` git
  2017-04-03 18:53 ` [PATCH v4 4/4] p0002-read-cache: " git
  3 siblings, 0 replies; 11+ messages in thread
From: git @ 2017-04-03 18:53 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Add test for fsck to force verify the index checksum.
Add test to demonstrate that status does ignore the checksum
when reading the index.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/t1450-fsck.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 33a51c9..a33d542 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,4 +689,31 @@ test_expect_success 'bogus head does not fallback to all heads' '
 	! grep $blob out
 '
 
+test_expect_success 'validate force_core_checksum_index=1' '
+	git fsck --cache
+'
+
+test_expect_success PERL 'detect index file corrupt in fsck' '
+	cp .git/index .git/index.backup &&
+	echo zzzzzzzz >zzzzzzzz &&
+	git add zzzzzzzz &&
+	perl -pi -e "s/zzzzzzzz/yyyyyyyy/" .git/index &&
+	test_must_fail git fsck --cache &&
+	rm .git/index &&
+	mv .git/index.backup .git/index &&
+	rm zzzzzzzz
+'
+
+test_expect_success PERL 'verify status ignores corrupt checksum' '
+	cp .git/index .git/index.backup &&
+	echo zzzzzzzz >zzzzzzzz &&
+	git add zzzzzzzz &&
+	perl -pi -e "s/zzzzzzzz/yyyyyyyy/" .git/index &&
+	git -c core.checksumindex=0 status &&
+	# Status may fix the checksum
+	rm .git/index &&
+	mv .git/index.backup .git/index &&
+	rm zzzzzzzz
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 4/4] p0002-read-cache: test core.checksumindex
  2017-04-03 18:53 [PATCH v4 0/4] read-cache: call verify_hdr() in a background thread git
                   ` (2 preceding siblings ...)
  2017-04-03 18:53 ` [PATCH v4 3/4] t1450-fsck: test core.checksumindex git
@ 2017-04-03 18:53 ` git
  3 siblings, 0 replies; 11+ messages in thread
From: git @ 2017-04-03 18:53 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach t/perf/p0002-read-cache to time read_cache() with
and without the index checksum calculation.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/perf/p0002-read-cache.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/perf/p0002-read-cache.sh b/t/perf/p0002-read-cache.sh
index 9180ae9..71feacd 100755
--- a/t/perf/p0002-read-cache.sh
+++ b/t/perf/p0002-read-cache.sh
@@ -7,7 +7,13 @@ test_description="Tests performance of reading the index"
 test_perf_default_repo
 
 count=1000
-test_perf "read_cache/discard_cache $count times" "
+test_perf "read_cache/discard_cache checksum=1 $count times" "
+	git config --local core.checksumindex 1 &&
+	test-read-cache $count
+"
+
+test_perf "read_cache/discard_cache checksum=0 $count times" "
+	git config --local core.checksumindex 0 &&
 	test-read-cache $count
 "
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/4] fsck: force core.checksumindex=1
  2017-04-03 18:53 ` [PATCH v4 2/4] fsck: force core.checksumindex=1 git
@ 2017-04-03 20:31   ` Ævar Arnfjörð Bjarmason
  2017-04-04  2:29     ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-03 20:31 UTC (permalink / raw)
  To: git; +Cc: Git Mailing List, Junio C Hamano, Jeff King, Jeff Hostetler

On Mon, Apr 3, 2017 at 8:53 PM,  <git@jeffhostetler.com> wrote:
> Teach fsck to override core.checksumindex and always verify
> the index checksum when reading the index.

Sorry to only chime in about this at v4.

I think this patch & the documentation you added for
core.checksumindex in 1/4 would be much less confusing if you made
this a on-by-default command-line option like e.g. "full".

With this patch nothing amends the documentation to indicate that the
core.checksumindex is magically overridden when fsck runs, I think
something like this (needs amending to integrate) on top would make
this clearer:

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..19b45b1b6f 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -12,7 +12,7 @@ SYNOPSIS
 'git fsck' [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]
         [--[no-]full] [--strict] [--verbose] [--lost-found]
         [--[no-]dangling] [--[no-]progress] [--connectivity-only]
-        [--[no-]name-objects] [<object>*]
+        [--[no-]name-objects] [--[no-]checksum-index] [<object>*]

 DESCRIPTION
 -----------
@@ -61,6 +61,11 @@ index file, all SHA-1 references in `refs`
namespace, and all reflogs
        object pools.  This is now default; you can turn it off
        with --no-full.

+--[no-]checksum-index:
+       Validate the checksum at the end of the index file, on by
+       default, locally overrides any "core.checksumIndex" setting
+       unless negated. See linkgit:git-config[1].
+
 --connectivity-only::
        Check only the connectivity of tags, commits and tree objects. By
        avoiding to unpack blobs, this speeds up the operation, at the
diff --git a/builtin/fsck.c b/builtin/fsck.c
index f76e4163ab..8fe8ec1775 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -24,6 +24,7 @@ static int show_tags;
 static int show_unreachable;
 static int include_reflogs = 1;
 static int check_full = 1;
+static int fsck_opt_checksum_index = 1;
 static int connectivity_only;
 static int check_strict;
 static int keep_cache_objects;
@@ -656,6 +657,8 @@ static struct option fsck_opts[] = {
        OPT_BOOL(0, "cache", &keep_cache_objects, N_("make index
objects head nodes")),
        OPT_BOOL(0, "reflogs", &include_reflogs, N_("make reflogs head
nodes (default)")),
        OPT_BOOL(0, "full", &check_full, N_("also consider packs and
alternate objects")),
+       OPT_BOOL(0, "checksum-index", &fsck_opt_checksum_index,
+                N_("validate the checksum at the end of the index file")),
        OPT_BOOL(0, "connectivity-only", &connectivity_only, N_("check
only connectivity")),
        OPT_BOOL(0, "strict", &check_strict, N_("enable more strict checking")),
        OPT_BOOL(0, "lost-found", &write_lost_and_found,
@@ -681,6 +684,9 @@ int cmd_fsck(int argc, const char **argv, const
char *prefix)
        if (check_strict)
                fsck_obj_options.strict = 1;

+       if (fsck_opt_checksum_index)
+               force_core_checksum_index = 1;
+
        if (show_progress == -1)
                show_progress = isatty(2);
        if (verbose)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/4] fsck: force core.checksumindex=1
  2017-04-03 20:31   ` Ævar Arnfjörð Bjarmason
@ 2017-04-04  2:29     ` Jeff King
  2017-04-04  8:23       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2017-04-04  2:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Git Mailing List, Junio C Hamano, Jeff Hostetler

On Mon, Apr 03, 2017 at 10:31:03PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Apr 3, 2017 at 8:53 PM,  <git@jeffhostetler.com> wrote:
> > Teach fsck to override core.checksumindex and always verify
> > the index checksum when reading the index.
> 
> Sorry to only chime in about this at v4.
> 
> I think this patch & the documentation you added for
> core.checksumindex in 1/4 would be much less confusing if you made
> this a on-by-default command-line option like e.g. "full".
> 
> With this patch nothing amends the documentation to indicate that the
> core.checksumindex is magically overridden when fsck runs, I think
> something like this (needs amending to integrate) on top would make
> this clearer:

I think that is the wrong direction to reduce confusion. We don't need
more options to twiddle this flag, we need fewer. For instance, in your
proposed documentation:

> @@ -61,6 +61,11 @@ index file, all SHA-1 references in `refs`
> namespace, and all reflogs
>         object pools.  This is now default; you can turn it off
>         with --no-full.
> 
> +--[no-]checksum-index:
> +       Validate the checksum at the end of the index file, on by
> +       default, locally overrides any "core.checksumIndex" setting
> +       unless negated. See linkgit:git-config[1].

That tells us _what_ it does, but I'm left scratching my head with
"why".

I don't think there is any reason you would want fsck not to compute
that checksum (just like there is no option to ask fsck not to check
pack sha1 trailers).

I would go so far as to say that the config option itself is unnecessary
in this iteration of the series. I only asked for it so that we could
test the verification code paths (both for correctness and performance).
But if fsck can exercise the code path, we can check correctness that
way. And for performance, it's probably enough to test two separate
builds (which Jeff has already done).

Junio also asked for the usual "add a config, and then later we'll flip
the default" procedure. But IMHO that isn't necessary here. Nobody
should ever care about this flag. It was largely useless to check it on
every read in the first place. And if you suspect there's corruption in
your repository, you should run "git fsck".

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/4] fsck: force core.checksumindex=1
  2017-04-04  2:29     ` Jeff King
@ 2017-04-04  8:23       ` Ævar Arnfjörð Bjarmason
  2017-04-04  8:27         ` Jeff King
  2017-04-04 13:13         ` Jeff Hostetler
  0 siblings, 2 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-04-04  8:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Git Mailing List, Junio C Hamano, Jeff Hostetler

On Tue, Apr 4, 2017 at 4:29 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Apr 03, 2017 at 10:31:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Apr 3, 2017 at 8:53 PM,  <git@jeffhostetler.com> wrote:
>> > Teach fsck to override core.checksumindex and always verify
>> > the index checksum when reading the index.
>>
>> Sorry to only chime in about this at v4.
>>
>> I think this patch & the documentation you added for
>> core.checksumindex in 1/4 would be much less confusing if you made
>> this a on-by-default command-line option like e.g. "full".
>>
>> With this patch nothing amends the documentation to indicate that the
>> core.checksumindex is magically overridden when fsck runs, I think
>> something like this (needs amending to integrate) on top would make
>> this clearer:
>
> I think that is the wrong direction to reduce confusion. We don't need
> more options to twiddle this flag, we need fewer. For instance, in your
> proposed documentation:
>
>> @@ -61,6 +61,11 @@ index file, all SHA-1 references in `refs`
>> namespace, and all reflogs
>>         object pools.  This is now default; you can turn it off
>>         with --no-full.
>>
>> +--[no-]checksum-index:
>> +       Validate the checksum at the end of the index file, on by
>> +       default, locally overrides any "core.checksumIndex" setting
>> +       unless negated. See linkgit:git-config[1].
>
> That tells us _what_ it does, but I'm left scratching my head with
> "why".
>
> I don't think there is any reason you would want fsck not to compute
> that checksum (just like there is no option to ask fsck not to check
> pack sha1 trailers).
>
> I would go so far as to say that the config option itself is unnecessary
> in this iteration of the series. I only asked for it so that we could
> test the verification code paths (both for correctness and performance).
> But if fsck can exercise the code path, we can check correctness that
> way. And for performance, it's probably enough to test two separate
> builds (which Jeff has already done).
>
> Junio also asked for the usual "add a config, and then later we'll flip
> the default" procedure. But IMHO that isn't necessary here. Nobody
> should ever care about this flag. It was largely useless to check it on
> every read in the first place. And if you suspect there's corruption in
> your repository, you should run "git fsck".

The part that confused my & I found unintuitive is that there's a new
core.WHATEVER config that'll get silently overridden by a specific
command, git-fsck.

Nothing else I can think of in core.* works like this, i.e. it's a
namespace for "applies to all of git", core.editor, core.ignoreCase
etc.

Having git-fsck have a command-line option that's on by default as I
suggested is one way to get out of that confusion. It makes it a
special case of a CLI option overriding some config.

But yeah, another way to resolve this is to get rid of the config
option altogether, or document in git-config.txt that
core.checksumIndex is obeyed by everything except git-fsck.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/4] fsck: force core.checksumindex=1
  2017-04-04  8:23       ` Ævar Arnfjörð Bjarmason
@ 2017-04-04  8:27         ` Jeff King
  2017-04-04 13:13         ` Jeff Hostetler
  1 sibling, 0 replies; 11+ messages in thread
From: Jeff King @ 2017-04-04  8:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Git Mailing List, Junio C Hamano, Jeff Hostetler

On Tue, Apr 04, 2017 at 10:23:52AM +0200, Ævar Arnfjörð Bjarmason wrote:

> The part that confused my & I found unintuitive is that there's a new
> core.WHATEVER config that'll get silently overridden by a specific
> command, git-fsck.
> 
> Nothing else I can think of in core.* works like this, i.e. it's a
> namespace for "applies to all of git", core.editor, core.ignoreCase
> etc.
> 
> Having git-fsck have a command-line option that's on by default as I
> suggested is one way to get out of that confusion. It makes it a
> special case of a CLI option overriding some config.

Yeah, I do agree your suggestion makes it slightly less confusing.

I mostly just think we can avoid the situation altogether, which IMHO is
preferable.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/4] fsck: force core.checksumindex=1
  2017-04-04  8:23       ` Ævar Arnfjörð Bjarmason
  2017-04-04  8:27         ` Jeff King
@ 2017-04-04 13:13         ` Jeff Hostetler
  2017-04-04 20:18           ` Jeff King
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff Hostetler @ 2017-04-04 13:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: Git Mailing List, Junio C Hamano, Jeff Hostetler



On 4/4/2017 4:23 AM, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Apr 4, 2017 at 4:29 AM, Jeff King <peff@peff.net> wrote:
>> On Mon, Apr 03, 2017 at 10:31:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> On Mon, Apr 3, 2017 at 8:53 PM,  <git@jeffhostetler.com> wrote:
>>>> Teach fsck to override core.checksumindex and always verify
>>>> the index checksum when reading the index.
>>>
>>> Sorry to only chime in about this at v4.
>>>
>>> I think this patch & the documentation you added for
>>> core.checksumindex in 1/4 would be much less confusing if you made
>>> this a on-by-default command-line option like e.g. "full".
>>>
>>> With this patch nothing amends the documentation to indicate that the
>>> core.checksumindex is magically overridden when fsck runs, I think
>>> something like this (needs amending to integrate) on top would make
>>> this clearer:
>>
>> I think that is the wrong direction to reduce confusion. We don't need
>> more options to twiddle this flag, we need fewer. For instance, in your
>> proposed documentation:
>>
>>> @@ -61,6 +61,11 @@ index file, all SHA-1 references in `refs`
>>> namespace, and all reflogs
>>>         object pools.  This is now default; you can turn it off
>>>         with --no-full.
>>>
>>> +--[no-]checksum-index:
>>> +       Validate the checksum at the end of the index file, on by
>>> +       default, locally overrides any "core.checksumIndex" setting
>>> +       unless negated. See linkgit:git-config[1].
>>
>> That tells us _what_ it does, but I'm left scratching my head with
>> "why".
>>
>> I don't think there is any reason you would want fsck not to compute
>> that checksum (just like there is no option to ask fsck not to check
>> pack sha1 trailers).
>>
>> I would go so far as to say that the config option itself is unnecessary
>> in this iteration of the series. I only asked for it so that we could
>> test the verification code paths (both for correctness and performance).
>> But if fsck can exercise the code path, we can check correctness that
>> way. And for performance, it's probably enough to test two separate
>> builds (which Jeff has already done).
>>
>> Junio also asked for the usual "add a config, and then later we'll flip
>> the default" procedure. But IMHO that isn't necessary here. Nobody
>> should ever care about this flag. It was largely useless to check it on
>> every read in the first place. And if you suspect there's corruption in
>> your repository, you should run "git fsck".
>
> The part that confused my & I found unintuitive is that there's a new
> core.WHATEVER config that'll get silently overridden by a specific
> command, git-fsck.
>
> Nothing else I can think of in core.* works like this, i.e. it's a
> namespace for "applies to all of git", core.editor, core.ignoreCase
> etc.

My "force_core_checksum_index" global override was a bit of a hack.
I looked at having fsck explicitly set the "core.checksumindex" config
value (which would write to the disk in the code paths I followed)
before it loaded the index, but if we ever find an invalid checksum,
read_cache() would call die() and fsck would not have a chance to set
it back in the user's config.  So I introduced the global override.

>
> Having git-fsck have a command-line option that's on by default as I
> suggested is one way to get out of that confusion. It makes it a
> special case of a CLI option overriding some config.

I looked at doing that, but thought it would be overkill since
no one is likely to care about turning it off -- or rather, fsck
should always do it whenever it reads the index.

>
> But yeah, another way to resolve this is to get rid of the config
> option altogether, or document in git-config.txt that
> core.checksumIndex is obeyed by everything except git-fsck.
>

If there's no objections then, I'll just remove the config
setting and keep the force_ global for fsck.

Jeff


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/4] fsck: force core.checksumindex=1
  2017-04-04 13:13         ` Jeff Hostetler
@ 2017-04-04 20:18           ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2017-04-04 20:18 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	Junio C Hamano, Jeff Hostetler

On Tue, Apr 04, 2017 at 09:13:33AM -0400, Jeff Hostetler wrote:

> > But yeah, another way to resolve this is to get rid of the config
> > option altogether, or document in git-config.txt that
> > core.checksumIndex is obeyed by everything except git-fsck.
> 
> If there's no objections then, I'll just remove the config
> setting and keep the force_ global for fsck.

That makes the most sense to me. It doesn't fill the "remain careful by
default, and we can flip the config later" criterion that Junio set out.
But I really just don't think "careful" here is buying us anything.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-04-04 20:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-03 18:53 [PATCH v4 0/4] read-cache: call verify_hdr() in a background thread git
2017-04-03 18:53 ` [PATCH v4 1/4] read-cache: core.checksumindex git
2017-04-03 18:53 ` [PATCH v4 2/4] fsck: force core.checksumindex=1 git
2017-04-03 20:31   ` Ævar Arnfjörð Bjarmason
2017-04-04  2:29     ` Jeff King
2017-04-04  8:23       ` Ævar Arnfjörð Bjarmason
2017-04-04  8:27         ` Jeff King
2017-04-04 13:13         ` Jeff Hostetler
2017-04-04 20:18           ` Jeff King
2017-04-03 18:53 ` [PATCH v4 3/4] t1450-fsck: test core.checksumindex git
2017-04-03 18:53 ` [PATCH v4 4/4] p0002-read-cache: " git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.