git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support
@ 2018-08-27 19:43 Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 1/7] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 19:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

On Mon, Aug 27 2018, René Scharfe wrote:

> Am 27.08.2018 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
>> 
>> On Sat, Aug 25 2018, René Scharfe wrote:
>> [...]
>> Now, I like yours much better. I'm just saying that currently the
>> patch/commit message combo is confusing about *what* it's
>> doing. I.e. let's not mix up the correction of docs that were always
>> wrong with a non-change in the user facing implementation, and some
>> internal optimization all in one patch.
>
> Now you have me confused.  Unsorted lists were always accepted, but the
> documentation asked for a sorted one anyway, probably to avoid sorting
> it with every use.  Switching the underlying data structure makes that a
> moot point -- sortedness is no longer a concern at all -- not in the
> code, and not for users.
>
> Inviting users to run the array implementation with unsorted lists is
> not incorrect, but it also doesn't help anyone.  We could clarify that
> sorted lists are preferred or recommended instead of required.  I don't
> see value in perfecting the documentation of a quirk just before
> removing it, though.

I think it's easier to clarify step-by-step with code, so here's an my
version of a v3 which implements what I was suggesting, but then of
course while doing it I found other stuff to fix, changes since your
v2:

René Scharfe (2):
  fsck: use strbuf_getline() to read skiplist file

None to the code.

I changed some docs I add in earlier patches to now describe behavior
in a past tense (and the s/sorted // change is earlier), and to change
the docs to say that sorting the list on-disk is now pointless for
optimization purposes, but did something in the past.

  fsck: use oidset for skiplist

There is now a test for the bug you were fixing in your 1/2.

Ævar Arnfjörð Bjarmason (5):
  fsck tests: setup of bogus commit object

Fixing some test redundancies while I'm at it.

  fsck tests: add a test for no skipList input

We didn't test the most basic skipList invocation, fixed while I was
at it...

  fsck: document and test sorted skipList input

Test that sorted & unsorted skipList input, and document that in the
past we said this was required, but it never was, but what (ever so
slight) optimization this gives us.

  fsck: document and test commented & empty line skipList input

We don't support comments or empty lines. Add tests for this not
working, and explicitly document that it doesn't work (I for one tried
it).

  fsck: support comments & empty lines in skipList

Ignoring comments and empty lines is very useful for a file format
that might be human-edited (I maintain one such file). Support that,
and document & test for it.

 Documentation/config.txt        | 22 ++++++++++----
 fsck.c                          | 48 ++++++++++-------------------
 fsck.h                          |  8 +++--
 t/t5504-fetch-receive-strict.sh | 53 ++++++++++++++++++++++++++++++---
 4 files changed, 86 insertions(+), 45 deletions(-)

-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 1/7] fsck tests: setup of bogus commit object
  2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
@ 2018-08-27 19:43 ` Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 2/7] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 19:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

Several fsck tests used the exact same git-hash-object output, but had
copy/pasted that part of the setup code. Let's instead do that setup
once and use it in subsequent tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5504-fetch-receive-strict.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 62f3569891..6d268f3327 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -133,6 +133,10 @@ committer Bugs Bunny <bugs@bun.ni> 1234567890 +0000
 This commit object intentionally broken
 EOF
 
+test_expect_success 'setup bogus commit' '
+	commit="$(git hash-object -t commit -w --stdin <bogus-commit)"
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
 	git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
 	test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
@@ -142,7 +146,6 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
 '
 
 test_expect_success 'push with receive.fsck.skipList' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
 	git init dst &&
@@ -169,7 +172,6 @@ test_expect_success 'push with receive.fsck.skipList' '
 '
 
 test_expect_success 'fetch with fetch.fsck.skipList' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	refspec=refs/heads/bogus:refs/heads/bogus &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
@@ -204,7 +206,6 @@ test_expect_success 'fsck.<unknownmsg-id> dies' '
 '
 
 test_expect_success 'push with receive.fsck.missingEmail=warn' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
 	git init dst &&
@@ -232,7 +233,6 @@ test_expect_success 'push with receive.fsck.missingEmail=warn' '
 '
 
 test_expect_success 'fetch with fetch.fsck.missingEmail=warn' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	refspec=refs/heads/bogus:refs/heads/bogus &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 2/7] fsck tests: add a test for no skipList input
  2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 1/7] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
@ 2018-08-27 19:43 ` Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 3/7] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 19:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

The recent 65a836fa6b ("fsck: add stress tests for fsck.skipList",
2018-07-27) added various stress tests for odd invocations of
fsck.skipList, but didn't tests for some very simple ones, such as
asserting that providing to skipList with a bad commit causes fsck to
exit with a non-zero exit code. Add such a test.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5504-fetch-receive-strict.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 6d268f3327..cbae31f330 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -137,6 +137,11 @@ test_expect_success 'setup bogus commit' '
 	commit="$(git hash-object -t commit -w --stdin <bogus-commit)"
 '
 
+test_expect_success 'fsck with no skipList input' '
+	test_must_fail git fsck 2>err &&
+	test_i18ngrep "missingEmail" err
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
 	git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
 	test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 3/7] fsck: document and test sorted skipList input
  2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 1/7] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 2/7] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
@ 2018-08-27 19:43 ` Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 4/7] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 19:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

Ever since the skipList support was first added in cd94c6f91 ("fsck:
git receive-pack: support excluding objects from fsck'ing",
2015-06-22) the documentation for the format has that the file is a
sorted list of object names.

Thus, anyone using the feature would have thought the list needed to
be sorted. E.g. I recently in conjunction with my fetch.fsck.*
implementation in 1362df0d41 ("fetch: implement fetch.fsck.*",
2018-07-27) wrote some code to ship a skipList, and went out of my way
to sort it.

Doing so seems intuitive, since it contains fixed-width records, and
has no support for comments, so one might expect it to be binary
searched in-place on-disk.

However, as documented here this was never a requirement, so let's
change the documentation. Since this is a file format change let's
also document what was said about this in the past, so e.g. someone
like myself reading the new docs can see this never needed to be
sorted ("why do I have all this code to sort this thing...").

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        | 10 +++++++++-
 t/t5504-fetch-receive-strict.sh | 19 +++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c42364988..9359fb534e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1708,7 +1708,7 @@ doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
 will only cause git to warn.
 
 fsck.skipList::
-	The path to a sorted list of object names (i.e. one SHA-1 per
+	The path to a list of object names (i.e. one SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
 	be ignored. This feature is useful when an established project
 	should be accepted despite early commits containing errors that
@@ -1723,6 +1723,14 @@ Unlike variables like `color.ui` and `core.editor` the
 fall back on the `fsck.skipList` configuration if they aren't set. To
 uniformly configure the same fsck settings in different circumstances
 all three of them they must all set to the same values.
++
+Older versions of Git (before 2.20) documented that the object names
+list should be sorted. This was never a requirement, the object names
+can appear in any order, but when reading the list we track whether
+the list is sorted for the purposes of an internal binary search
+implementation, which can save itself some work with an already sorted
+list.  Unless you have a humongous list there's no reason to go out of
+your way to pre-sort the list.
 
 gc.aggressiveDepth::
 	The depth parameter used in the delta compression
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index cbae31f330..fa56052f0f 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -142,6 +142,25 @@ test_expect_success 'fsck with no skipList input' '
 	test_i18ngrep "missingEmail" err
 '
 
+test_expect_success 'setup sorted and unsorted skipLists' '
+	cat >SKIP.unsorted <<-EOF &&
+	0000000000000000000000000000000000000004
+	0000000000000000000000000000000000000002
+	$commit
+	0000000000000000000000000000000000000001
+	0000000000000000000000000000000000000003
+	EOF
+	sort SKIP.unsorted >SKIP.sorted
+'
+
+test_expect_success 'fsck with sorted skipList' '
+	git -c fsck.skipList=SKIP.sorted fsck
+'
+
+test_expect_success 'fsck with unsorted skipList' '
+	git -c fsck.skipList=SKIP.unsorted fsck
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
 	git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
 	test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 4/7] fsck: document and test commented & empty line skipList input
  2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2018-08-27 19:43 ` [PATCH v3 3/7] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
@ 2018-08-27 19:43 ` Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 19:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

There is currently no comment syntax for the fsck.skipList, this isn't
really by design, and it would be nice to have support for comments.

Document that this doesn't work, and test for how this errors
out. These tests reveal a current bug, if there's invalid input the
output will emit some of the next line, and then go into uninitialized
memory. This is fixed in a subsequent change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        | 11 +++++++----
 t/t5504-fetch-receive-strict.sh | 21 +++++++++++++++++++++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9359fb534e..a8dfafa61d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1710,10 +1710,13 @@ will only cause git to warn.
 fsck.skipList::
 	The path to a list of object names (i.e. one SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
-	be ignored. This feature is useful when an established project
-	should be accepted despite early commits containing errors that
-	can be safely ignored such as invalid committer email addresses.
-	Note: corrupt objects cannot be skipped with this setting.
+	be ignored. Comments ('#') and empty lines are not supported, and
+	will error out.
++
+This feature is useful when an established project should be accepted
+despite early commits containing errors that can be safely ignored
+such as invalid committer email addresses.  Note: corrupt objects
+cannot be skipped with this setting.
 +
 Like `fsck.<msg-id>` this variable has corresponding
 `receive.fsck.skipList` and `fetch.fsck.skipList` variants.
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index fa56052f0f..38aaf3b928 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -169,6 +169,27 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
 	test_i18ngrep "Invalid SHA-1: \[core\]" err
 '
 
+test_expect_success 'fsck with invalid or bogus skipList input (comments & empty lines)' '
+	cat >SKIP.with-comment <<-EOF &&
+	# Some bad commit
+	0000000000000000000000000000000000000001
+	EOF
+	test_must_fail git -c fsck.skipList=SKIP.with-comment fsck 2>err-with-comment &&
+	test_i18ngrep "^fatal: Invalid SHA-1: # Some bad commit$" err-with-comment &&
+	cat >SKIP.with-empty-line <<-EOF &&
+	0000000000000000000000000000000000000001
+
+	0000000000000000000000000000000000000002
+	EOF
+	test_must_fail git -c fsck.skipList=SKIP.with-empty-line fsck 2>err-with-empty-line &&
+	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
+'
+
+test_expect_failure 'fsck no garbage output from comments & empty lines errors' '
+	test_line_count = 1 err-with-comment &&
+	test_line_count = 1 err-with-empty-line
+'
+
 test_expect_success 'push with receive.fsck.skipList' '
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file
  2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2018-08-27 19:43 ` [PATCH v3 4/7] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
@ 2018-08-27 19:43 ` Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 6/7] fsck: use oidset for skiplist Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 19:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

From: René Scharfe <l.s.r@web.de>

buffer is unlikely to contain a NUL character, so printing its contents
using %s in a die() format is unsafe (detected with ASan).

Use an idiomatic strbuf_getline() loop instead, which ensures the buffer
is always NUL-terminated, supports CRLF files as well, accepts files
without a newline after the last line, supports any hash length
automatically, and is shorter.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c                          | 25 ++++++++++++-------------
 t/t5504-fetch-receive-strict.sh |  2 +-
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fsck.c b/fsck.c
index a0cee0be59..972a26b9ba 100644
--- a/fsck.c
+++ b/fsck.c
@@ -183,8 +183,9 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 static void init_skiplist(struct fsck_options *options, const char *path)
 {
 	static struct oid_array skiplist = OID_ARRAY_INIT;
-	int sorted, fd;
-	char buffer[GIT_MAX_HEXSZ + 1];
+	int sorted;
+	FILE *fp;
+	struct strbuf sb = STRBUF_INIT;
 	struct object_id oid;
 
 	if (options->skiplist)
@@ -194,25 +195,23 @@ static void init_skiplist(struct fsck_options *options, const char *path)
 		options->skiplist = &skiplist;
 	}
 
-	fd = open(path, O_RDONLY);
-	if (fd < 0)
+	fp = fopen(path, "r");
+	if (!fp)
 		die("Could not open skip list: %s", path);
-	for (;;) {
+	while (!strbuf_getline(&sb, fp)) {
 		const char *p;
-		int result = read_in_full(fd, buffer, sizeof(buffer));
-		if (result < 0)
-			die_errno("Could not read '%s'", path);
-		if (!result)
-			break;
-		if (parse_oid_hex(buffer, &oid, &p) || *p != '\n')
-			die("Invalid SHA-1: %s", buffer);
+		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
+			die("Invalid SHA-1: %s", sb.buf);
 		oid_array_append(&skiplist, &oid);
 		if (sorted && skiplist.nr > 1 &&
 				oidcmp(&skiplist.oid[skiplist.nr - 2],
 				       &oid) > 0)
 			sorted = 0;
 	}
-	close(fd);
+	if (ferror(fp))
+		die_errno("Could not read '%s'", path);
+	fclose(fp);
+	strbuf_release(&sb);
 
 	if (sorted)
 		skiplist.sorted = 1;
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 38aaf3b928..c7224db3bb 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -185,7 +185,7 @@ test_expect_success 'fsck with invalid or bogus skipList input (comments & empty
 	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
 '
 
-test_expect_failure 'fsck no garbage output from comments & empty lines errors' '
+test_expect_success 'fsck no garbage output from comments & empty lines errors' '
 	test_line_count = 1 err-with-comment &&
 	test_line_count = 1 err-with-empty-line
 '
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 6/7] fsck: use oidset for skiplist
  2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2018-08-27 19:43 ` [PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
@ 2018-08-27 19:43 ` Ævar Arnfjörð Bjarmason
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
  2018-08-27 19:43 ` [PATCH v3 7/7] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
  2018-08-27 19:46 ` [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
  7 siblings, 1 reply; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 19:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

From: René Scharfe <l.s.r@web.de>

Object IDs to skip are stored in a shared static oid_array.  Lookups do
a binary search on the sorted array.  The code checks if the object IDs
are already in the correct order while loading and skips sorting in that
case.  Lookups are done before reporting a (non-fatal) corruption and
before checking .gitmodules files.

Simplify the code by using an oidset instead.  Memory usage is a bit
higher, but we don't need to worry about any sort order anymore.  Embed
the oidset into struct fsck_options to make its ownership clear (no
hidden sharing) and avoid unnecessary pointer indirection.

Performance on repositories with a low number of reported issues and
.gitmodules files (i.e. the usual case) won't be affected much.  The
oidset should be a bit quicker with higher numbers of bad objects in
the skipList.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt | 11 ++++++-----
 fsck.c                   | 23 ++---------------------
 fsck.h                   |  8 +++++---
 3 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a8dfafa61d..3d0556e85d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1729,11 +1729,12 @@ all three of them they must all set to the same values.
 +
 Older versions of Git (before 2.20) documented that the object names
 list should be sorted. This was never a requirement, the object names
-can appear in any order, but when reading the list we track whether
-the list is sorted for the purposes of an internal binary search
-implementation, which can save itself some work with an already sorted
-list.  Unless you have a humongous list there's no reason to go out of
-your way to pre-sort the list.
+could appear in any order, but when reading the list we tracked whether
+the list was sorted for the purposes of an internal binary search
+implementation, which could save itself some work with an already sorted
+list. Unless you had a humongous list there was no reason to go out of
+your way to pre-sort the list. After Git version 2.20 a hash implementation
+is used instead, so there's now no reason to pre-sort the list.
 
 gc.aggressiveDepth::
 	The depth parameter used in the delta compression
diff --git a/fsck.c b/fsck.c
index 972a26b9ba..4c643f1d40 100644
--- a/fsck.c
+++ b/fsck.c
@@ -10,7 +10,6 @@
 #include "fsck.h"
 #include "refs.h"
 #include "utf8.h"
-#include "sha1-array.h"
 #include "decorate.h"
 #include "oidset.h"
 #include "packfile.h"
@@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 
 static void init_skiplist(struct fsck_options *options, const char *path)
 {
-	static struct oid_array skiplist = OID_ARRAY_INIT;
-	int sorted;
 	FILE *fp;
 	struct strbuf sb = STRBUF_INIT;
 	struct object_id oid;
 
-	if (options->skiplist)
-		sorted = options->skiplist->sorted;
-	else {
-		sorted = 1;
-		options->skiplist = &skiplist;
-	}
-
 	fp = fopen(path, "r");
 	if (!fp)
 		die("Could not open skip list: %s", path);
@@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, const char *path)
 		const char *p;
 		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
 			die("Invalid SHA-1: %s", sb.buf);
-		oid_array_append(&skiplist, &oid);
-		if (sorted && skiplist.nr > 1 &&
-				oidcmp(&skiplist.oid[skiplist.nr - 2],
-				       &oid) > 0)
-			sorted = 0;
+		oidset_insert(&options->skiplist, &oid);
 	}
 	if (ferror(fp))
 		die_errno("Could not read '%s'", path);
 	fclose(fp);
 	strbuf_release(&sb);
-
-	if (sorted)
-		skiplist.sorted = 1;
 }
 
 static int parse_msg_type(const char *str)
@@ -319,9 +302,7 @@ static void append_msg_id(struct strbuf *sb, const char *msg_id)
 
 static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
 {
-	if (opts && opts->skiplist && obj)
-		return oid_array_lookup(opts->skiplist, &obj->oid) >= 0;
-	return 0;
+	return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
 }
 
 __attribute__((format (printf, 4, 5)))
diff --git a/fsck.h b/fsck.h
index 0c7e8c9428..b95595ae5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -1,6 +1,8 @@
 #ifndef GIT_FSCK_H
 #define GIT_FSCK_H
 
+#include "oidset.h"
+
 #define FSCK_ERROR 1
 #define FSCK_WARN 2
 #define FSCK_IGNORE 3
@@ -35,12 +37,12 @@ struct fsck_options {
 	fsck_error error_func;
 	unsigned strict:1;
 	int *msg_type;
-	struct oid_array *skiplist;
+	struct oidset skiplist;
 	struct decoration *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 7/7] fsck: support comments & empty lines in skipList
  2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
                   ` (5 preceding siblings ...)
  2018-08-27 19:43 ` [PATCH v3 6/7] fsck: use oidset for skiplist Ævar Arnfjörð Bjarmason
@ 2018-08-27 19:43 ` Ævar Arnfjörð Bjarmason
  2018-08-27 19:46 ` [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 19:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

It's annoying not to be able to put comments and empty lines in the
skipList, when e.g. keeping a big central list of commits to skip in
/etc/gitconfig, which was my motivation for 1362df0d41 ("fetch:
implement fetch.fsck.*", 2018-07-27).

Implement that, and document what version of Git this was changed in,
since this on-disk format can be expected to be used by multiple
versions of git.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        | 4 ++--
 fsck.c                          | 2 ++
 t/t5504-fetch-receive-strict.sh | 6 +++---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3d0556e85d..e6f95a7fb2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1710,8 +1710,8 @@ will only cause git to warn.
 fsck.skipList::
 	The path to a list of object names (i.e. one SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
-	be ignored. Comments ('#') and empty lines are not supported, and
-	will error out.
+	be ignored. On versions of Git 2.20 and later comments ('#') and empty
+	lines are ignored, but will error out on older versions.
 +
 This feature is useful when an established project should be accepted
 despite early commits containing errors that can be safely ignored
diff --git a/fsck.c b/fsck.c
index 4c643f1d40..589548308a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -190,6 +190,8 @@ static void init_skiplist(struct fsck_options *options, const char *path)
 		die("Could not open skip list: %s", path);
 	while (!strbuf_getline(&sb, fp)) {
 		const char *p;
+		if (!strcmp(sb.buf, "") || starts_with(sb.buf, "#"))
+			continue;
 		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
 			die("Invalid SHA-1: %s", sb.buf);
 		oidset_insert(&options->skiplist, &oid);
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index c7224db3bb..a1bac164d1 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -169,20 +169,20 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
 	test_i18ngrep "Invalid SHA-1: \[core\]" err
 '
 
-test_expect_success 'fsck with invalid or bogus skipList input (comments & empty lines)' '
+test_expect_success 'fsck with other accepted skipList input (comments & empty lines)' '
 	cat >SKIP.with-comment <<-EOF &&
 	# Some bad commit
 	0000000000000000000000000000000000000001
 	EOF
 	test_must_fail git -c fsck.skipList=SKIP.with-comment fsck 2>err-with-comment &&
-	test_i18ngrep "^fatal: Invalid SHA-1: # Some bad commit$" err-with-comment &&
+	test_i18ngrep "missingEmail" err-with-comment &&
 	cat >SKIP.with-empty-line <<-EOF &&
 	0000000000000000000000000000000000000001
 
 	0000000000000000000000000000000000000002
 	EOF
 	test_must_fail git -c fsck.skipList=SKIP.with-empty-line fsck 2>err-with-empty-line &&
-	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
+	test_i18ngrep "missingEmail" err-with-empty-line
 '
 
 test_expect_success 'fsck no garbage output from comments & empty lines errors' '
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support
  2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
                   ` (6 preceding siblings ...)
  2018-08-27 19:43 ` [PATCH v3 7/7] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
@ 2018-08-27 19:46 ` Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 19:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King


On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Aug 27 2018, René Scharfe wrote:
>
>> Am 27.08.2018 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> On Sat, Aug 25 2018, René Scharfe wrote:
>>> [...]
>>> Now, I like yours much better. I'm just saying that currently the
>>> patch/commit message combo is confusing about *what* it's
>>> doing. I.e. let's not mix up the correction of docs that were always
>>> wrong with a non-change in the user facing implementation, and some
>>> internal optimization all in one patch.
>>
>> Now you have me confused.  Unsorted lists were always accepted, but the
>> documentation asked for a sorted one anyway, probably to avoid sorting
>> it with every use.  Switching the underlying data structure makes that a
>> moot point -- sortedness is no longer a concern at all -- not in the
>> code, and not for users.
>>
>> Inviting users to run the array implementation with unsorted lists is
>> not incorrect, but it also doesn't help anyone.  We could clarify that
>> sorted lists are preferred or recommended instead of required.  I don't
>> see value in perfecting the documentation of a quirk just before
>> removing it, though.
>
> I think it's easier to clarify step-by-step with code, so here's an my
> version of a v3 which implements what I was suggesting, but then of
> course while doing it I found other stuff to fix, changes since your
> v2:

Sorry for breaking threading, forgot In-Reply-To, which should be
https://public-inbox.org/git/c7cbc289-d86e-09dc-bdb3-b5496cf0b7ce@web.de/

> René Scharfe (2):
>   fsck: use strbuf_getline() to read skiplist file
>
> None to the code.
>
> I changed some docs I add in earlier patches to now describe behavior
> in a past tense (and the s/sorted // change is earlier), and to change
> the docs to say that sorting the list on-disk is now pointless for
> optimization purposes, but did something in the past.
>
>   fsck: use oidset for skiplist
>
> There is now a test for the bug you were fixing in your 1/2.
>
> Ævar Arnfjörð Bjarmason (5):
>   fsck tests: setup of bogus commit object
>
> Fixing some test redundancies while I'm at it.
>
>   fsck tests: add a test for no skipList input
>
> We didn't test the most basic skipList invocation, fixed while I was
> at it...
>
>   fsck: document and test sorted skipList input
>
> Test that sorted & unsorted skipList input, and document that in the
> past we said this was required, but it never was, but what (ever so
> slight) optimization this gives us.
>
>   fsck: document and test commented & empty line skipList input
>
> We don't support comments or empty lines. Add tests for this not
> working, and explicitly document that it doesn't work (I for one tried
> it).
>
>   fsck: support comments & empty lines in skipList
>
> Ignoring comments and empty lines is very useful for a file format
> that might be human-edited (I maintain one such file). Support that,
> and document & test for it.
>
>  Documentation/config.txt        | 22 ++++++++++----
>  fsck.c                          | 48 ++++++++++-------------------
>  fsck.h                          |  8 +++--
>  t/t5504-fetch-receive-strict.sh | 53 ++++++++++++++++++++++++++++++---
>  4 files changed, 86 insertions(+), 45 deletions(-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 6/7] fsck: use oidset for skiplist
  2018-08-27 19:43 ` [PATCH v3 6/7] fsck: use oidset for skiplist Ævar Arnfjörð Bjarmason
@ 2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
                       ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-27 20:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King


On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:

> From: René Scharfe <l.s.r@web.de>
>
> Object IDs to skip are stored in a shared static oid_array.  Lookups do
> a binary search on the sorted array.  The code checks if the object IDs
> are already in the correct order while loading and skips sorting in that
> case.  Lookups are done before reporting a (non-fatal) corruption and
> before checking .gitmodules files.
>
> Simplify the code by using an oidset instead.  Memory usage is a bit
> higher, but we don't need to worry about any sort order anymore.  Embed
> the oidset into struct fsck_options to make its ownership clear (no
> hidden sharing) and avoid unnecessary pointer indirection.
>
> Performance on repositories with a low number of reported issues and
> .gitmodules files (i.e. the usual case) won't be affected much.  The
> oidset should be a bit quicker with higher numbers of bad objects in
> the skipList.

I didn't change this commit message at all, but FWIW this still has me
somewhat confused. What is the interaction with .gitmodules being
described here? You also mentioned it in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
(but I don't get that either)

Does that just mean that when cloning with --recursive with
transfer.fsckObjects=true we'll re-read the file for each "clone"
invocation, both for the main project and everything listed in
.gitmodules?

If so, I think something like this commit message would be clearer:

    fsck: use oidset instead of oid_array for skipList

    Change the implementation of the skipList feature to use oidset
    instead of oid_array to store SHA-1s for later lookup.

    This list is parsed once on startup by fsck, fetch-pack or
    receive-pack depending on the *.skipList config in use, so for fetch
    it's currently re-parsed for each submodule fetch.

    Memory usage is a bit higher, but we don't need to keep track of the
    sort order anymore. Embed the oidset into struct fsck_options to
    make its ownership clear (no hidden sharing) and avoid unnecessary
    pointer indirection.

Then let's just attach the test program you wrote in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
and note the results and let them speak for themselves.

I can do all that if that seems fine to you. I also notice that the test
case only sets up a case where all the items on the skip list are bad
commits in the repo, it would be interesting to test with a few needles
in a large haystack (I can modify it to do that...).

> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  Documentation/config.txt | 11 ++++++-----
>  fsck.c                   | 23 ++---------------------
>  fsck.h                   |  8 +++++---
>  3 files changed, 13 insertions(+), 29 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index a8dfafa61d..3d0556e85d 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1729,11 +1729,12 @@ all three of them they must all set to the same values.
>  +
>  Older versions of Git (before 2.20) documented that the object names
>  list should be sorted. This was never a requirement, the object names
> -can appear in any order, but when reading the list we track whether
> -the list is sorted for the purposes of an internal binary search
> -implementation, which can save itself some work with an already sorted
> -list.  Unless you have a humongous list there's no reason to go out of
> -your way to pre-sort the list.
> +could appear in any order, but when reading the list we tracked whether
> +the list was sorted for the purposes of an internal binary search
> +implementation, which could save itself some work with an already sorted
> +list. Unless you had a humongous list there was no reason to go out of
> +your way to pre-sort the list. After Git version 2.20 a hash implementation
> +is used instead, so there's now no reason to pre-sort the list.
>
>  gc.aggressiveDepth::
>  	The depth parameter used in the delta compression
> diff --git a/fsck.c b/fsck.c
> index 972a26b9ba..4c643f1d40 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -10,7 +10,6 @@
>  #include "fsck.h"
>  #include "refs.h"
>  #include "utf8.h"
> -#include "sha1-array.h"
>  #include "decorate.h"
>  #include "oidset.h"
>  #include "packfile.h"
> @@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
>
>  static void init_skiplist(struct fsck_options *options, const char *path)
>  {
> -	static struct oid_array skiplist = OID_ARRAY_INIT;
> -	int sorted;
>  	FILE *fp;
>  	struct strbuf sb = STRBUF_INIT;
>  	struct object_id oid;
>
> -	if (options->skiplist)
> -		sorted = options->skiplist->sorted;
> -	else {
> -		sorted = 1;
> -		options->skiplist = &skiplist;
> -	}
> -
>  	fp = fopen(path, "r");
>  	if (!fp)
>  		die("Could not open skip list: %s", path);
> @@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, const char *path)
>  		const char *p;
>  		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
>  			die("Invalid SHA-1: %s", sb.buf);
> -		oid_array_append(&skiplist, &oid);
> -		if (sorted && skiplist.nr > 1 &&
> -				oidcmp(&skiplist.oid[skiplist.nr - 2],
> -				       &oid) > 0)
> -			sorted = 0;
> +		oidset_insert(&options->skiplist, &oid);
>  	}
>  	if (ferror(fp))
>  		die_errno("Could not read '%s'", path);
>  	fclose(fp);
>  	strbuf_release(&sb);
> -
> -	if (sorted)
> -		skiplist.sorted = 1;
>  }
>
>  static int parse_msg_type(const char *str)
> @@ -319,9 +302,7 @@ static void append_msg_id(struct strbuf *sb, const char *msg_id)
>
>  static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
>  {
> -	if (opts && opts->skiplist && obj)
> -		return oid_array_lookup(opts->skiplist, &obj->oid) >= 0;
> -	return 0;
> +	return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
>  }
>
>  __attribute__((format (printf, 4, 5)))
> diff --git a/fsck.h b/fsck.h
> index 0c7e8c9428..b95595ae5f 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -1,6 +1,8 @@
>  #ifndef GIT_FSCK_H
>  #define GIT_FSCK_H
>
> +#include "oidset.h"
> +
>  #define FSCK_ERROR 1
>  #define FSCK_WARN 2
>  #define FSCK_IGNORE 3
> @@ -35,12 +37,12 @@ struct fsck_options {
>  	fsck_error error_func;
>  	unsigned strict:1;
>  	int *msg_type;
> -	struct oid_array *skiplist;
> +	struct oidset skiplist;
>  	struct decoration *object_names;
>  };
>
> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL }
> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL }
> +#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
> +#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
>
>  /* descend in all linked child objects
>   * the return value is:

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
@ 2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 1/8] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:
> Does that just mean that when cloning with --recursive with
> transfer.fsckObjects=true we'll re-read the file for each "clone"
> invocation, both for the main project and everything listed in
> .gitmodules?
>
> If so, I think something like this commit message would be clearer:

Here's a version that does that, I also integrated the perf test René
wrote and used it in the commit messages. The range-diff with v3
follows:

1:  09b52b0eae = 1:  ba9ef73304 fsck tests: setup of bogus commit object
2:  294a4dfd69 = 2:  1d20e37d3f fsck tests: add a test for no skipList input
3:  15a6e16dc1 = 3:  953ad29f6d fsck: document and test sorted skipList input
4:  b93fe502c3 = 4:  c8a0bd1555 fsck: document and test commented & empty line skipList input
-:  ---------- > 5:  cbbb0326d3 fsck: add a performance test for skipList
5:  3cf7d4e9eb ! 6:  ca6aca2994 fsck: use strbuf_getline() to read skiplist file
    @@ -2,14 +2,33 @@
     
         fsck: use strbuf_getline() to read skiplist file
     
    -    buffer is unlikely to contain a NUL character, so printing its contents
    -    using %s in a die() format is unsafe (detected with ASan).
    +    The buffer is unlikely to contain a NUL character, so printing its
    +    contents using %s in a die() format is unsafe (detected with ASan).
     
         Use an idiomatic strbuf_getline() loop instead, which ensures the buffer
         is always NUL-terminated, supports CRLF files as well, accepts files
         without a newline after the last line, supports any hash length
         automatically, and is shorter.
     
    +    This fixes a bug where emitting an error about an invalid line on say
    +    line 1 would continue printing subsequent lines, and usually continue
    +    into uninitialized memory.
    +
    +    The performance impact of this, on a CentOS 7 box with RedHat GCC
    +    4.8.5-28:
    +
    +        $ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1450-fsck-skip-list.sh
    +        Test                                             HEAD~             HEAD
    +        ----------------------------------------------------------------------------------------
    +        1450.3: fsck with 0 skipped bad commits          7.75(7.39+0.35)   7.68(7.29+0.39) -0.9%
    +        1450.5: fsck with 1 skipped bad commits          7.70(7.30+0.40)   7.80(7.42+0.37) +1.3%
    +        1450.7: fsck with 10 skipped bad commits         7.77(7.37+0.40)   7.87(7.47+0.40) +1.3%
    +        1450.9: fsck with 100 skipped bad commits        7.82(7.41+0.40)   7.88(7.43+0.44) +0.8%
    +        1450.11: fsck with 1000 skipped bad commits      7.88(7.49+0.39)   7.84(7.43+0.40) -0.5%
    +        1450.13: fsck with 10000 skipped bad commits     8.02(7.63+0.39)   8.07(7.67+0.39) +0.6%
    +        1450.15: fsck with 100000 skipped bad commits    8.01(7.60+0.41)   8.08(7.70+0.38) +0.9%
    +        1450.17: fsck with 1000000 skipped bad commits   7.60(7.10+0.50)   7.37(7.18+0.19) -3.0%
    +
         Helped-by: Jeff King <peff@peff.net>
         Signed-off-by: Rene Scharfe <l.s.r@web.de>
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
6:  bd51de7c14 ! 7:  58bfd04d96 fsck: use oidset for skiplist
    @@ -1,22 +1,33 @@
     Author: René Scharfe <l.s.r@web.de>
     
    -    fsck: use oidset for skiplist
    +    fsck: use oidset instead of oid_array for skipList
     
    -    Object IDs to skip are stored in a shared static oid_array.  Lookups do
    -    a binary search on the sorted array.  The code checks if the object IDs
    -    are already in the correct order while loading and skips sorting in that
    -    case.  Lookups are done before reporting a (non-fatal) corruption and
    -    before checking .gitmodules files.
    +    Change the implementation of the skipList feature to use oidset
    +    instead of oid_array to store SHA-1s for later lookup.
     
    -    Simplify the code by using an oidset instead.  Memory usage is a bit
    -    higher, but we don't need to worry about any sort order anymore.  Embed
    -    the oidset into struct fsck_options to make its ownership clear (no
    -    hidden sharing) and avoid unnecessary pointer indirection.
    +    This list is parsed once on startup by fsck, fetch-pack or
    +    receive-pack depending on the *.skipList config in use. I.e. only once
    +    per invocation, but note that for "clone --recurse-submodules" each
    +    submodule will re-parse the list, in addition to the main project.
     
    -    Performance on repositories with a low number of reported issues and
    -    .gitmodules files (i.e. the usual case) won't be affected much.  The
    -    oidset should be a bit quicker with higher numbers of bad objects in
    -    the skipList.
    +    Memory usage is a bit higher, but we don't need to keep track of the
    +    sort order anymore. Embed the oidset into struct fsck_options to make
    +    its ownership clear (no hidden sharing) and avoid unnecessary pointer
    +    indirection.
    +
    +    The cumulative impact on performance of this & the preceding change,
    +    using the test setup described in the previous commit:
    +
    +        Test                                             HEAD~2            HEAD~                   HEAD
    +        ----------------------------------------------------------------------------------------------------------------
    +        1450.3: fsck with 0 skipped bad commits          7.70(7.31+0.38)   7.72(7.33+0.38) +0.3%   7.70(7.30+0.40) +0.0%
    +        1450.5: fsck with 1 skipped bad commits          7.84(7.47+0.37)   7.69(7.32+0.36) -1.9%   7.71(7.29+0.41) -1.7%
    +        1450.7: fsck with 10 skipped bad commits         7.81(7.40+0.40)   7.94(7.57+0.36) +1.7%   7.92(7.55+0.37) +1.4%
    +        1450.9: fsck with 100 skipped bad commits        7.81(7.42+0.38)   7.95(7.53+0.41) +1.8%   7.83(7.42+0.41) +0.3%
    +        1450.11: fsck with 1000 skipped bad commits      7.99(7.62+0.36)   7.90(7.50+0.40) -1.1%   7.86(7.49+0.37) -1.6%
    +        1450.13: fsck with 10000 skipped bad commits     7.98(7.57+0.40)   7.94(7.53+0.40) -0.5%   7.90(7.45+0.44) -1.0%
    +        1450.15: fsck with 100000 skipped bad commits    7.97(7.57+0.39)   8.03(7.67+0.36) +0.8%   7.84(7.43+0.41) -1.6%
    +        1450.17: fsck with 1000000 skipped bad commits   7.72(7.22+0.50)   7.28(7.07+0.20) -5.7%   7.13(6.87+0.25) -7.6%
     
         Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
         Signed-off-by: Rene Scharfe <l.s.r@web.de>
7:  17838cf68c ! 8:  0125461f13 fsck: support comments & empty lines in skipList
    @@ -11,6 +11,20 @@
         since this on-disk format can be expected to be used by multiple
         versions of git.
     
    +    There is no notable performance impact from this change, using the
    +    test setup described a couple of commist back:
    +
    +        Test                                             HEAD~             HEAD
    +        ----------------------------------------------------------------------------------------
    +        1450.3: fsck with 0 skipped bad commits          7.81(7.42+0.39)   7.72(7.34+0.38) -1.2%
    +        1450.5: fsck with 1 skipped bad commits          7.75(7.36+0.38)   7.66(7.26+0.39) -1.2%
    +        1450.7: fsck with 10 skipped bad commits         7.81(7.43+0.38)   7.70(7.30+0.39) -1.4%
    +        1450.9: fsck with 100 skipped bad commits        7.85(7.42+0.42)   7.73(7.31+0.41) -1.5%
    +        1450.11: fsck with 1000 skipped bad commits      7.81(7.43+0.38)   7.84(7.46+0.38) +0.4%
    +        1450.13: fsck with 10000 skipped bad commits     7.87(7.47+0.40)   7.86(7.46+0.40) -0.1%
    +        1450.15: fsck with 100000 skipped bad commits    7.77(7.39+0.38)   7.83(7.48+0.34) +0.8%
    +        1450.17: fsck with 1000000 skipped bad commits   7.17(6.92+0.24)   7.11(6.85+0.26) -0.8%
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
     diff --git a/Documentation/config.txt b/Documentation/config.txt

René Scharfe (3):
  fsck: add a performance test for skipList
  fsck: use strbuf_getline() to read skiplist file
  fsck: use oidset instead of oid_array for skipList

Ævar Arnfjörð Bjarmason (5):
  fsck tests: setup of bogus commit object
  fsck tests: add a test for no skipList input
  fsck: document and test sorted skipList input
  fsck: document and test commented & empty line skipList input
  fsck: support comments & empty lines in skipList

 Documentation/config.txt        | 22 ++++++++++----
 fsck.c                          | 48 ++++++++++-------------------
 fsck.h                          |  8 +++--
 t/perf/p1450-fsck-skip-list.sh  | 40 +++++++++++++++++++++++++
 t/t5504-fetch-receive-strict.sh | 53 ++++++++++++++++++++++++++++++---
 5 files changed, 126 insertions(+), 45 deletions(-)
 create mode 100755 t/perf/p1450-fsck-skip-list.sh

-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 1/8] fsck tests: setup of bogus commit object
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
@ 2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 2/8] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
                       ` (7 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

Several fsck tests used the exact same git-hash-object output, but had
copy/pasted that part of the setup code. Let's instead do that setup
once and use it in subsequent tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5504-fetch-receive-strict.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 62f3569891..6d268f3327 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -133,6 +133,10 @@ committer Bugs Bunny <bugs@bun.ni> 1234567890 +0000
 This commit object intentionally broken
 EOF
 
+test_expect_success 'setup bogus commit' '
+	commit="$(git hash-object -t commit -w --stdin <bogus-commit)"
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
 	git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
 	test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
@@ -142,7 +146,6 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
 '
 
 test_expect_success 'push with receive.fsck.skipList' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
 	git init dst &&
@@ -169,7 +172,6 @@ test_expect_success 'push with receive.fsck.skipList' '
 '
 
 test_expect_success 'fetch with fetch.fsck.skipList' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	refspec=refs/heads/bogus:refs/heads/bogus &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
@@ -204,7 +206,6 @@ test_expect_success 'fsck.<unknownmsg-id> dies' '
 '
 
 test_expect_success 'push with receive.fsck.missingEmail=warn' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
 	git init dst &&
@@ -232,7 +233,6 @@ test_expect_success 'push with receive.fsck.missingEmail=warn' '
 '
 
 test_expect_success 'fetch with fetch.fsck.missingEmail=warn' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	refspec=refs/heads/bogus:refs/heads/bogus &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 2/8] fsck tests: add a test for no skipList input
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 1/8] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
@ 2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 3/8] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

The recent 65a836fa6b ("fsck: add stress tests for fsck.skipList",
2018-07-27) added various stress tests for odd invocations of
fsck.skipList, but didn't tests for some very simple ones, such as
asserting that providing to skipList with a bad commit causes fsck to
exit with a non-zero exit code. Add such a test.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5504-fetch-receive-strict.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 6d268f3327..cbae31f330 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -137,6 +137,11 @@ test_expect_success 'setup bogus commit' '
 	commit="$(git hash-object -t commit -w --stdin <bogus-commit)"
 '
 
+test_expect_success 'fsck with no skipList input' '
+	test_must_fail git fsck 2>err &&
+	test_i18ngrep "missingEmail" err
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
 	git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
 	test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 3/8] fsck: document and test sorted skipList input
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
                       ` (2 preceding siblings ...)
  2018-08-28  9:52     ` [PATCH v4 2/8] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
@ 2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 4/8] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

Ever since the skipList support was first added in cd94c6f91 ("fsck:
git receive-pack: support excluding objects from fsck'ing",
2015-06-22) the documentation for the format has that the file is a
sorted list of object names.

Thus, anyone using the feature would have thought the list needed to
be sorted. E.g. I recently in conjunction with my fetch.fsck.*
implementation in 1362df0d41 ("fetch: implement fetch.fsck.*",
2018-07-27) wrote some code to ship a skipList, and went out of my way
to sort it.

Doing so seems intuitive, since it contains fixed-width records, and
has no support for comments, so one might expect it to be binary
searched in-place on-disk.

However, as documented here this was never a requirement, so let's
change the documentation. Since this is a file format change let's
also document what was said about this in the past, so e.g. someone
like myself reading the new docs can see this never needed to be
sorted ("why do I have all this code to sort this thing...").

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        | 10 +++++++++-
 t/t5504-fetch-receive-strict.sh | 19 +++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index eb66a11975..b2fdbc6764 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1710,7 +1710,7 @@ doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
 will only cause git to warn.
 
 fsck.skipList::
-	The path to a sorted list of object names (i.e. one SHA-1 per
+	The path to a list of object names (i.e. one SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
 	be ignored. This feature is useful when an established project
 	should be accepted despite early commits containing errors that
@@ -1725,6 +1725,14 @@ Unlike variables like `color.ui` and `core.editor` the
 fall back on the `fsck.skipList` configuration if they aren't set. To
 uniformly configure the same fsck settings in different circumstances
 all three of them they must all set to the same values.
++
+Older versions of Git (before 2.20) documented that the object names
+list should be sorted. This was never a requirement, the object names
+can appear in any order, but when reading the list we track whether
+the list is sorted for the purposes of an internal binary search
+implementation, which can save itself some work with an already sorted
+list.  Unless you have a humongous list there's no reason to go out of
+your way to pre-sort the list.
 
 gc.aggressiveDepth::
 	The depth parameter used in the delta compression
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index cbae31f330..fa56052f0f 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -142,6 +142,25 @@ test_expect_success 'fsck with no skipList input' '
 	test_i18ngrep "missingEmail" err
 '
 
+test_expect_success 'setup sorted and unsorted skipLists' '
+	cat >SKIP.unsorted <<-EOF &&
+	0000000000000000000000000000000000000004
+	0000000000000000000000000000000000000002
+	$commit
+	0000000000000000000000000000000000000001
+	0000000000000000000000000000000000000003
+	EOF
+	sort SKIP.unsorted >SKIP.sorted
+'
+
+test_expect_success 'fsck with sorted skipList' '
+	git -c fsck.skipList=SKIP.sorted fsck
+'
+
+test_expect_success 'fsck with unsorted skipList' '
+	git -c fsck.skipList=SKIP.unsorted fsck
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
 	git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
 	test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 4/8] fsck: document and test commented & empty line skipList input
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
                       ` (3 preceding siblings ...)
  2018-08-28  9:52     ` [PATCH v4 3/8] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
@ 2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 5/8] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

There is currently no comment syntax for the fsck.skipList, this isn't
really by design, and it would be nice to have support for comments.

Document that this doesn't work, and test for how this errors
out. These tests reveal a current bug, if there's invalid input the
output will emit some of the next line, and then go into uninitialized
memory. This is fixed in a subsequent change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        | 11 +++++++----
 t/t5504-fetch-receive-strict.sh | 21 +++++++++++++++++++++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index b2fdbc6764..c7f033f036 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1712,10 +1712,13 @@ will only cause git to warn.
 fsck.skipList::
 	The path to a list of object names (i.e. one SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
-	be ignored. This feature is useful when an established project
-	should be accepted despite early commits containing errors that
-	can be safely ignored such as invalid committer email addresses.
-	Note: corrupt objects cannot be skipped with this setting.
+	be ignored. Comments ('#') and empty lines are not supported, and
+	will error out.
++
+This feature is useful when an established project should be accepted
+despite early commits containing errors that can be safely ignored
+such as invalid committer email addresses.  Note: corrupt objects
+cannot be skipped with this setting.
 +
 Like `fsck.<msg-id>` this variable has corresponding
 `receive.fsck.skipList` and `fetch.fsck.skipList` variants.
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index fa56052f0f..38aaf3b928 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -169,6 +169,27 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
 	test_i18ngrep "Invalid SHA-1: \[core\]" err
 '
 
+test_expect_success 'fsck with invalid or bogus skipList input (comments & empty lines)' '
+	cat >SKIP.with-comment <<-EOF &&
+	# Some bad commit
+	0000000000000000000000000000000000000001
+	EOF
+	test_must_fail git -c fsck.skipList=SKIP.with-comment fsck 2>err-with-comment &&
+	test_i18ngrep "^fatal: Invalid SHA-1: # Some bad commit$" err-with-comment &&
+	cat >SKIP.with-empty-line <<-EOF &&
+	0000000000000000000000000000000000000001
+
+	0000000000000000000000000000000000000002
+	EOF
+	test_must_fail git -c fsck.skipList=SKIP.with-empty-line fsck 2>err-with-empty-line &&
+	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
+'
+
+test_expect_failure 'fsck no garbage output from comments & empty lines errors' '
+	test_line_count = 1 err-with-comment &&
+	test_line_count = 1 err-with-empty-line
+'
+
 test_expect_success 'push with receive.fsck.skipList' '
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 5/8] fsck: add a performance test for skipList
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
                       ` (4 preceding siblings ...)
  2018-08-28  9:52     ` [PATCH v4 4/8] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
@ 2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 6/8] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

From: René Scharfe <l.s.r@web.de>

Create a performance test to see how the skipList implementation
performs. First we setup N bad commits, then we see how progressively
working our way up to 0..N in increments of 10x does. I.e. the
needle(s) in the haystack get progressively more numerous.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/p1450-fsck-skip-list.sh | 40 ++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100755 t/perf/p1450-fsck-skip-list.sh

diff --git a/t/perf/p1450-fsck-skip-list.sh b/t/perf/p1450-fsck-skip-list.sh
new file mode 100755
index 0000000000..c2b97d2487
--- /dev/null
+++ b/t/perf/p1450-fsck-skip-list.sh
@@ -0,0 +1,40 @@
+#!/bin/sh
+
+test_description='Test fsck skipList performance'
+
+. ./perf-lib.sh
+
+test_perf_fresh_repo
+
+n=1000000
+
+test_expect_success "setup $n bad commits" '
+	for i in $(test_seq 1 $n)
+	do
+		echo "commit refs/heads/master" &&
+		echo "committer C <c@example.com> 1234567890 +0000" &&
+		echo "data <<EOF" &&
+		echo "$i.Q." &&
+		echo "EOF"
+	done | q_to_nul | git fast-import
+'
+
+skip=0
+while test $skip -le $n
+do
+	test_expect_success "create skipList for $skip bad commits" '
+		git log --format=%H --max-count=$skip |
+		sort >skiplist
+	'
+
+	test_perf "fsck with $skip skipped bad commits" '
+		git -c fsck.skipList=skiplist fsck
+	'
+
+	case $skip in
+	0) skip=1 ;;
+	*) skip=${skip}0 ;;
+	esac
+done
+
+test_done
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 6/8] fsck: use strbuf_getline() to read skiplist file
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
                       ` (5 preceding siblings ...)
  2018-08-28  9:52     ` [PATCH v4 5/8] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
@ 2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 7/8] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

From: René Scharfe <l.s.r@web.de>

The buffer is unlikely to contain a NUL character, so printing its
contents using %s in a die() format is unsafe (detected with ASan).

Use an idiomatic strbuf_getline() loop instead, which ensures the buffer
is always NUL-terminated, supports CRLF files as well, accepts files
without a newline after the last line, supports any hash length
automatically, and is shorter.

This fixes a bug where emitting an error about an invalid line on say
line 1 would continue printing subsequent lines, and usually continue
into uninitialized memory.

The performance impact of this, on a CentOS 7 box with RedHat GCC
4.8.5-28:

    $ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1450-fsck-skip-list.sh
    Test                                             HEAD~             HEAD
    ----------------------------------------------------------------------------------------
    1450.3: fsck with 0 skipped bad commits          7.75(7.39+0.35)   7.68(7.29+0.39) -0.9%
    1450.5: fsck with 1 skipped bad commits          7.70(7.30+0.40)   7.80(7.42+0.37) +1.3%
    1450.7: fsck with 10 skipped bad commits         7.77(7.37+0.40)   7.87(7.47+0.40) +1.3%
    1450.9: fsck with 100 skipped bad commits        7.82(7.41+0.40)   7.88(7.43+0.44) +0.8%
    1450.11: fsck with 1000 skipped bad commits      7.88(7.49+0.39)   7.84(7.43+0.40) -0.5%
    1450.13: fsck with 10000 skipped bad commits     8.02(7.63+0.39)   8.07(7.67+0.39) +0.6%
    1450.15: fsck with 100000 skipped bad commits    8.01(7.60+0.41)   8.08(7.70+0.38) +0.9%
    1450.17: fsck with 1000000 skipped bad commits   7.60(7.10+0.50)   7.37(7.18+0.19) -3.0%

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c                          | 25 ++++++++++++-------------
 t/t5504-fetch-receive-strict.sh |  2 +-
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fsck.c b/fsck.c
index a0cee0be59..972a26b9ba 100644
--- a/fsck.c
+++ b/fsck.c
@@ -183,8 +183,9 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 static void init_skiplist(struct fsck_options *options, const char *path)
 {
 	static struct oid_array skiplist = OID_ARRAY_INIT;
-	int sorted, fd;
-	char buffer[GIT_MAX_HEXSZ + 1];
+	int sorted;
+	FILE *fp;
+	struct strbuf sb = STRBUF_INIT;
 	struct object_id oid;
 
 	if (options->skiplist)
@@ -194,25 +195,23 @@ static void init_skiplist(struct fsck_options *options, const char *path)
 		options->skiplist = &skiplist;
 	}
 
-	fd = open(path, O_RDONLY);
-	if (fd < 0)
+	fp = fopen(path, "r");
+	if (!fp)
 		die("Could not open skip list: %s", path);
-	for (;;) {
+	while (!strbuf_getline(&sb, fp)) {
 		const char *p;
-		int result = read_in_full(fd, buffer, sizeof(buffer));
-		if (result < 0)
-			die_errno("Could not read '%s'", path);
-		if (!result)
-			break;
-		if (parse_oid_hex(buffer, &oid, &p) || *p != '\n')
-			die("Invalid SHA-1: %s", buffer);
+		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
+			die("Invalid SHA-1: %s", sb.buf);
 		oid_array_append(&skiplist, &oid);
 		if (sorted && skiplist.nr > 1 &&
 				oidcmp(&skiplist.oid[skiplist.nr - 2],
 				       &oid) > 0)
 			sorted = 0;
 	}
-	close(fd);
+	if (ferror(fp))
+		die_errno("Could not read '%s'", path);
+	fclose(fp);
+	strbuf_release(&sb);
 
 	if (sorted)
 		skiplist.sorted = 1;
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 38aaf3b928..c7224db3bb 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -185,7 +185,7 @@ test_expect_success 'fsck with invalid or bogus skipList input (comments & empty
 	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
 '
 
-test_expect_failure 'fsck no garbage output from comments & empty lines errors' '
+test_expect_success 'fsck no garbage output from comments & empty lines errors' '
 	test_line_count = 1 err-with-comment &&
 	test_line_count = 1 err-with-empty-line
 '
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 7/8] fsck: use oidset instead of oid_array for skipList
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
                       ` (6 preceding siblings ...)
  2018-08-28  9:52     ` [PATCH v4 6/8] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
@ 2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason
  2018-08-28  9:52     ` [PATCH v4 8/8] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
  2018-08-28 14:29     ` [PATCH v3 6/7] fsck: use oidset for skiplist René Scharfe
  9 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

From: René Scharfe <l.s.r@web.de>

Change the implementation of the skipList feature to use oidset
instead of oid_array to store SHA-1s for later lookup.

This list is parsed once on startup by fsck, fetch-pack or
receive-pack depending on the *.skipList config in use. I.e. only once
per invocation, but note that for "clone --recurse-submodules" each
submodule will re-parse the list, in addition to the main project.

Memory usage is a bit higher, but we don't need to keep track of the
sort order anymore. Embed the oidset into struct fsck_options to make
its ownership clear (no hidden sharing) and avoid unnecessary pointer
indirection.

The cumulative impact on performance of this & the preceding change,
using the test setup described in the previous commit:

    Test                                             HEAD~2            HEAD~                   HEAD
    ----------------------------------------------------------------------------------------------------------------
    1450.3: fsck with 0 skipped bad commits          7.70(7.31+0.38)   7.72(7.33+0.38) +0.3%   7.70(7.30+0.40) +0.0%
    1450.5: fsck with 1 skipped bad commits          7.84(7.47+0.37)   7.69(7.32+0.36) -1.9%   7.71(7.29+0.41) -1.7%
    1450.7: fsck with 10 skipped bad commits         7.81(7.40+0.40)   7.94(7.57+0.36) +1.7%   7.92(7.55+0.37) +1.4%
    1450.9: fsck with 100 skipped bad commits        7.81(7.42+0.38)   7.95(7.53+0.41) +1.8%   7.83(7.42+0.41) +0.3%
    1450.11: fsck with 1000 skipped bad commits      7.99(7.62+0.36)   7.90(7.50+0.40) -1.1%   7.86(7.49+0.37) -1.6%
    1450.13: fsck with 10000 skipped bad commits     7.98(7.57+0.40)   7.94(7.53+0.40) -0.5%   7.90(7.45+0.44) -1.0%
    1450.15: fsck with 100000 skipped bad commits    7.97(7.57+0.39)   8.03(7.67+0.36) +0.8%   7.84(7.43+0.41) -1.6%
    1450.17: fsck with 1000000 skipped bad commits   7.72(7.22+0.50)   7.28(7.07+0.20) -5.7%   7.13(6.87+0.25) -7.6%

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt | 11 ++++++-----
 fsck.c                   | 23 ++---------------------
 fsck.h                   |  8 +++++---
 3 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c7f033f036..ebaa044689 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1731,11 +1731,12 @@ all three of them they must all set to the same values.
 +
 Older versions of Git (before 2.20) documented that the object names
 list should be sorted. This was never a requirement, the object names
-can appear in any order, but when reading the list we track whether
-the list is sorted for the purposes of an internal binary search
-implementation, which can save itself some work with an already sorted
-list.  Unless you have a humongous list there's no reason to go out of
-your way to pre-sort the list.
+could appear in any order, but when reading the list we tracked whether
+the list was sorted for the purposes of an internal binary search
+implementation, which could save itself some work with an already sorted
+list. Unless you had a humongous list there was no reason to go out of
+your way to pre-sort the list. After Git version 2.20 a hash implementation
+is used instead, so there's now no reason to pre-sort the list.
 
 gc.aggressiveDepth::
 	The depth parameter used in the delta compression
diff --git a/fsck.c b/fsck.c
index 972a26b9ba..4c643f1d40 100644
--- a/fsck.c
+++ b/fsck.c
@@ -10,7 +10,6 @@
 #include "fsck.h"
 #include "refs.h"
 #include "utf8.h"
-#include "sha1-array.h"
 #include "decorate.h"
 #include "oidset.h"
 #include "packfile.h"
@@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 
 static void init_skiplist(struct fsck_options *options, const char *path)
 {
-	static struct oid_array skiplist = OID_ARRAY_INIT;
-	int sorted;
 	FILE *fp;
 	struct strbuf sb = STRBUF_INIT;
 	struct object_id oid;
 
-	if (options->skiplist)
-		sorted = options->skiplist->sorted;
-	else {
-		sorted = 1;
-		options->skiplist = &skiplist;
-	}
-
 	fp = fopen(path, "r");
 	if (!fp)
 		die("Could not open skip list: %s", path);
@@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, const char *path)
 		const char *p;
 		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
 			die("Invalid SHA-1: %s", sb.buf);
-		oid_array_append(&skiplist, &oid);
-		if (sorted && skiplist.nr > 1 &&
-				oidcmp(&skiplist.oid[skiplist.nr - 2],
-				       &oid) > 0)
-			sorted = 0;
+		oidset_insert(&options->skiplist, &oid);
 	}
 	if (ferror(fp))
 		die_errno("Could not read '%s'", path);
 	fclose(fp);
 	strbuf_release(&sb);
-
-	if (sorted)
-		skiplist.sorted = 1;
 }
 
 static int parse_msg_type(const char *str)
@@ -319,9 +302,7 @@ static void append_msg_id(struct strbuf *sb, const char *msg_id)
 
 static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
 {
-	if (opts && opts->skiplist && obj)
-		return oid_array_lookup(opts->skiplist, &obj->oid) >= 0;
-	return 0;
+	return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
 }
 
 __attribute__((format (printf, 4, 5)))
diff --git a/fsck.h b/fsck.h
index 0c7e8c9428..b95595ae5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -1,6 +1,8 @@
 #ifndef GIT_FSCK_H
 #define GIT_FSCK_H
 
+#include "oidset.h"
+
 #define FSCK_ERROR 1
 #define FSCK_WARN 2
 #define FSCK_IGNORE 3
@@ -35,12 +37,12 @@ struct fsck_options {
 	fsck_error error_func;
 	unsigned strict:1;
 	int *msg_type;
-	struct oid_array *skiplist;
+	struct oidset skiplist;
 	struct decoration *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 8/8] fsck: support comments & empty lines in skipList
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
                       ` (7 preceding siblings ...)
  2018-08-28  9:52     ` [PATCH v4 7/8] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
@ 2018-08-28  9:52     ` Ævar Arnfjörð Bjarmason
  2018-08-28 14:59       ` René Scharfe
  2018-08-28 14:29     ` [PATCH v3 6/7] fsck: use oidset for skiplist René Scharfe
  9 siblings, 1 reply; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-28  9:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

It's annoying not to be able to put comments and empty lines in the
skipList, when e.g. keeping a big central list of commits to skip in
/etc/gitconfig, which was my motivation for 1362df0d41 ("fetch:
implement fetch.fsck.*", 2018-07-27).

Implement that, and document what version of Git this was changed in,
since this on-disk format can be expected to be used by multiple
versions of git.

There is no notable performance impact from this change, using the
test setup described a couple of commist back:

    Test                                             HEAD~             HEAD
    ----------------------------------------------------------------------------------------
    1450.3: fsck with 0 skipped bad commits          7.81(7.42+0.39)   7.72(7.34+0.38) -1.2%
    1450.5: fsck with 1 skipped bad commits          7.75(7.36+0.38)   7.66(7.26+0.39) -1.2%
    1450.7: fsck with 10 skipped bad commits         7.81(7.43+0.38)   7.70(7.30+0.39) -1.4%
    1450.9: fsck with 100 skipped bad commits        7.85(7.42+0.42)   7.73(7.31+0.41) -1.5%
    1450.11: fsck with 1000 skipped bad commits      7.81(7.43+0.38)   7.84(7.46+0.38) +0.4%
    1450.13: fsck with 10000 skipped bad commits     7.87(7.47+0.40)   7.86(7.46+0.40) -0.1%
    1450.15: fsck with 100000 skipped bad commits    7.77(7.39+0.38)   7.83(7.48+0.34) +0.8%
    1450.17: fsck with 1000000 skipped bad commits   7.17(6.92+0.24)   7.11(6.85+0.26) -0.8%

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        | 4 ++--
 fsck.c                          | 2 ++
 t/t5504-fetch-receive-strict.sh | 6 +++---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ebaa044689..824634c412 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1712,8 +1712,8 @@ will only cause git to warn.
 fsck.skipList::
 	The path to a list of object names (i.e. one SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
-	be ignored. Comments ('#') and empty lines are not supported, and
-	will error out.
+	be ignored. On versions of Git 2.20 and later comments ('#') and empty
+	lines are ignored, but will error out on older versions.
 +
 This feature is useful when an established project should be accepted
 despite early commits containing errors that can be safely ignored
diff --git a/fsck.c b/fsck.c
index 4c643f1d40..589548308a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -190,6 +190,8 @@ static void init_skiplist(struct fsck_options *options, const char *path)
 		die("Could not open skip list: %s", path);
 	while (!strbuf_getline(&sb, fp)) {
 		const char *p;
+		if (!strcmp(sb.buf, "") || starts_with(sb.buf, "#"))
+			continue;
 		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
 			die("Invalid SHA-1: %s", sb.buf);
 		oidset_insert(&options->skiplist, &oid);
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index c7224db3bb..a1bac164d1 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -169,20 +169,20 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
 	test_i18ngrep "Invalid SHA-1: \[core\]" err
 '
 
-test_expect_success 'fsck with invalid or bogus skipList input (comments & empty lines)' '
+test_expect_success 'fsck with other accepted skipList input (comments & empty lines)' '
 	cat >SKIP.with-comment <<-EOF &&
 	# Some bad commit
 	0000000000000000000000000000000000000001
 	EOF
 	test_must_fail git -c fsck.skipList=SKIP.with-comment fsck 2>err-with-comment &&
-	test_i18ngrep "^fatal: Invalid SHA-1: # Some bad commit$" err-with-comment &&
+	test_i18ngrep "missingEmail" err-with-comment &&
 	cat >SKIP.with-empty-line <<-EOF &&
 	0000000000000000000000000000000000000001
 
 	0000000000000000000000000000000000000002
 	EOF
 	test_must_fail git -c fsck.skipList=SKIP.with-empty-line fsck 2>err-with-empty-line &&
-	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
+	test_i18ngrep "missingEmail" err-with-empty-line
 '
 
 test_expect_success 'fsck no garbage output from comments & empty lines errors' '
-- 
2.19.0.rc0.228.g281dcd1b4d0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 6/7] fsck: use oidset for skiplist
  2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
                       ` (8 preceding siblings ...)
  2018-08-28  9:52     ` [PATCH v4 8/8] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
@ 2018-08-28 14:29     ` René Scharfe
  9 siblings, 0 replies; 32+ messages in thread
From: René Scharfe @ 2018-08-28 14:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Ramsay Jones, Johannes Schindelin, Jeff King

Am 27.08.2018 um 22:15 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:
> 
>> From: René Scharfe <l.s.r@web.de>
>>
>> Object IDs to skip are stored in a shared static oid_array.  Lookups do
>> a binary search on the sorted array.  The code checks if the object IDs
>> are already in the correct order while loading and skips sorting in that
>> case.  Lookups are done before reporting a (non-fatal) corruption and
>> before checking .gitmodules files.
>>
>> Simplify the code by using an oidset instead.  Memory usage is a bit
>> higher, but we don't need to worry about any sort order anymore.  Embed
>> the oidset into struct fsck_options to make its ownership clear (no
>> hidden sharing) and avoid unnecessary pointer indirection.
>>
>> Performance on repositories with a low number of reported issues and
>> .gitmodules files (i.e. the usual case) won't be affected much.  The
>> oidset should be a bit quicker with higher numbers of bad objects in
>> the skipList.
> 
> I didn't change this commit message at all, but FWIW this still has me
> somewhat confused. What is the interaction with .gitmodules being
> described here? You also mentioned it in
> https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
> (but I don't get that either)

The skipList is consulted before checking .gitmodules blobs, since
fb16287719 (fsck: check skiplist for object in fsck_blob()).

> Does that just mean that when cloning with --recursive with
> transfer.fsckObjects=true we'll re-read the file for each "clone"
> invocation, both for the main project and everything listed in
> .gitmodules?

That is probably true, but I didn't mean that.  I was only talking
about when an object is looked up in the skiplist (oid_array/oidset).

> If so, I think something like this commit message would be clearer:
> 
>     fsck: use oidset instead of oid_array for skipList
> 
>     Change the implementation of the skipList feature to use oidset
>     instead of oid_array to store SHA-1s for later lookup.
> 
>     This list is parsed once on startup by fsck, fetch-pack or
>     receive-pack depending on the *.skipList config in use, so for fetch
>     it's currently re-parsed for each submodule fetch.

OK; the patch doesn't change that, though.  Mentioning it sound like
a good idea if the load takes a significant amount of time -- I
didn't measure this separately, so perhaps this needs to be explored
further if people use big skipLists.

>     Memory usage is a bit higher, but we don't need to keep track of the
>     sort order anymore. Embed the oidset into struct fsck_options to
>     make its ownership clear (no hidden sharing) and avoid unnecessary
>     pointer indirection.
> 
> Then let's just attach the test program you wrote in
> https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b2df@web.de/
> and note the results and let them speak for themselves.

I was surprised that such a big repo can be mocked up relatively
quickly, but the numbers are a bit underwhelming -- there is not a
lot of change.  The script is not too expensive, so I don't mind
adding it.

> I can do all that if that seems fine to you. I also notice that the test
> case only sets up a case where all the items on the skip list are bad
> commits in the repo, it would be interesting to test with a few needles
> in a large haystack (I can modify it to do that...).

I tried something like this first, and its results are even more
boring.  Since skipList lookups are done only for bad objects (and
.gitmodules) you won't see  any difference at all.  Having only bad
objects is the edge case, this test being designed to highlight the
performance of the skipList implementation.

René

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 8/8] fsck: support comments & empty lines in skipList
  2018-08-28  9:52     ` [PATCH v4 8/8] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
@ 2018-08-28 14:59       ` René Scharfe
  2018-09-03 14:49         ` [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
                           ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: René Scharfe @ 2018-08-28 14:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Ramsay Jones, Johannes Schindelin, Jeff King

Am 28.08.2018 um 11:52 schrieb Ævar Arnfjörð Bjarmason:
> It's annoying not to be able to put comments and empty lines in the
> skipList, when e.g. keeping a big central list of commits to skip in
> /etc/gitconfig, which was my motivation for 1362df0d41 ("fetch:
> implement fetch.fsck.*", 2018-07-27).
> 
> Implement that, and document what version of Git this was changed in,
> since this on-disk format can be expected to be used by multiple
> versions of git.
> 
> There is no notable performance impact from this change, using the
> test setup described a couple of commist back:
> 
>     Test                                             HEAD~             HEAD
>     ----------------------------------------------------------------------------------------
>     1450.3: fsck with 0 skipped bad commits          7.81(7.42+0.39)   7.72(7.34+0.38) -1.2%
>     1450.5: fsck with 1 skipped bad commits          7.75(7.36+0.38)   7.66(7.26+0.39) -1.2%
>     1450.7: fsck with 10 skipped bad commits         7.81(7.43+0.38)   7.70(7.30+0.39) -1.4%
>     1450.9: fsck with 100 skipped bad commits        7.85(7.42+0.42)   7.73(7.31+0.41) -1.5%
>     1450.11: fsck with 1000 skipped bad commits      7.81(7.43+0.38)   7.84(7.46+0.38) +0.4%
>     1450.13: fsck with 10000 skipped bad commits     7.87(7.47+0.40)   7.86(7.46+0.40) -0.1%
>     1450.15: fsck with 100000 skipped bad commits    7.77(7.39+0.38)   7.83(7.48+0.34) +0.8%
>     1450.17: fsck with 1000000 skipped bad commits   7.17(6.92+0.24)   7.11(6.85+0.26) -0.8%
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  Documentation/config.txt        | 4 ++--
>  fsck.c                          | 2 ++
>  t/t5504-fetch-receive-strict.sh | 6 +++---
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index ebaa044689..824634c412 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1712,8 +1712,8 @@ will only cause git to warn.
>  fsck.skipList::
>  	The path to a list of object names (i.e. one SHA-1 per
>  	line) that are known to be broken in a non-fatal way and should
> -	be ignored. Comments ('#') and empty lines are not supported, and
> -	will error out.
> +	be ignored. On versions of Git 2.20 and later comments ('#') and empty
> +	lines are ignored, but will error out on older versions.
>  +
>  This feature is useful when an established project should be accepted
>  despite early commits containing errors that can be safely ignored
> diff --git a/fsck.c b/fsck.c
> index 4c643f1d40..589548308a 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -190,6 +190,8 @@ static void init_skiplist(struct fsck_options *options, const char *path)
>  		die("Could not open skip list: %s", path);
>  	while (!strbuf_getline(&sb, fp)) {
>  		const char *p;
> +		if (!strcmp(sb.buf, "") || starts_with(sb.buf, "#"))
> +			continue;

Checking sb.len == 0 is simpler and might be slightly quicker.

But what is an empty line?  Shouldn't whitespace-only lines qualify?
And why not allow trailing comments, while we're at it?  I.e. what
about something like this instead?

		const char *hash = strchr(sb.buf, '#);
		if (hash)
			strbuf_setlen(&sb, hash - sb.buf);
		strbuf_rtrim(&sb);
		if (!sb.len)
			continue;

Too much?

>  		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
>  			die("Invalid SHA-1: %s", sb.buf);
>  		oidset_insert(&options->skiplist, &oid);
> diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
> index c7224db3bb..a1bac164d1 100755
> --- a/t/t5504-fetch-receive-strict.sh
> +++ b/t/t5504-fetch-receive-strict.sh
> @@ -169,20 +169,20 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
>  	test_i18ngrep "Invalid SHA-1: \[core\]" err
>  '
>  
> -test_expect_success 'fsck with invalid or bogus skipList input (comments & empty lines)' '
> +test_expect_success 'fsck with other accepted skipList input (comments & empty lines)' '
>  	cat >SKIP.with-comment <<-EOF &&
>  	# Some bad commit
>  	0000000000000000000000000000000000000001
>  	EOF
>  	test_must_fail git -c fsck.skipList=SKIP.with-comment fsck 2>err-with-comment &&
> -	test_i18ngrep "^fatal: Invalid SHA-1: # Some bad commit$" err-with-comment &&
> +	test_i18ngrep "missingEmail" err-with-comment &&
>  	cat >SKIP.with-empty-line <<-EOF &&
>  	0000000000000000000000000000000000000001
>  
>  	0000000000000000000000000000000000000002
>  	EOF
>  	test_must_fail git -c fsck.skipList=SKIP.with-empty-line fsck 2>err-with-empty-line &&
> -	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
> +	test_i18ngrep "missingEmail" err-with-empty-line
>  '
>  
>  test_expect_success 'fsck no garbage output from comments & empty lines errors' '
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support
  2018-08-28 14:59       ` René Scharfe
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 01/10] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
                           ` (9 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

Addresses feedback René had on v4. Notes below:

René Scharfe (3):
  fsck: add a performance test for skipList
  fsck: use strbuf_getline() to read skiplist file

No changes.

  fsck: use oidset instead of oid_array for skipList

Now we note in the commit message that the skipList is re-consulted
for .gitmodules blobs, as René pointed out.

Ævar Arnfjörð Bjarmason (7):
  fsck tests: setup of bogus commit object
  fsck tests: add a test for no skipList input

No changes.

  fsck: document and test sorted skipList input

Trivial whitespace change in new docs paragraph.

  fsck: document and test commented & empty line skipList input

No changes.

  fsck: document that skipList input must be unabbreviated

NEW: Now we document & test for the SHA-1s on the skip list being
unabbreviated. It was always like this, but let's say so / test for
it.

  fsck: add a performance test

NEW: Add a generic new fsck perf test while I'm at it, in addition to
the more specific one René wrote.

  fsck: support comments & empty lines in skipList

The major change this time around, we now support trailing comments,
leading whitespace before comments, whitespace-only lines etc. I went
one step further than what René suggested and decided to just call
strbuf_trim() so now we also support SHA-1s with leading spaces. I
don't see a good reason to go out of our way to ban those, and it's
consistent with other formats we use (and perhaps people will use this
to align them in some way).

 Documentation/config.txt        | 23 ++++++++---
 fsck.c                          | 60 +++++++++++++--------------
 fsck.h                          |  8 ++--
 t/perf/p1450-fsck.sh            | 13 ++++++
 t/perf/p1451-fsck-skip-list.sh  | 40 ++++++++++++++++++
 t/t5504-fetch-receive-strict.sh | 72 +++++++++++++++++++++++++++++++--
 6 files changed, 171 insertions(+), 45 deletions(-)
 create mode 100755 t/perf/p1450-fsck.sh
 create mode 100755 t/perf/p1451-fsck-skip-list.sh

-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v5 01/10] fsck tests: setup of bogus commit object
  2018-08-28 14:59       ` René Scharfe
  2018-09-03 14:49         ` [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 02/10] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
                           ` (8 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

Several fsck tests used the exact same git-hash-object output, but had
copy/pasted that part of the setup code. Let's instead do that setup
once and use it in subsequent tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5504-fetch-receive-strict.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 62f3569891..6d268f3327 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -133,6 +133,10 @@ committer Bugs Bunny <bugs@bun.ni> 1234567890 +0000
 This commit object intentionally broken
 EOF
 
+test_expect_success 'setup bogus commit' '
+	commit="$(git hash-object -t commit -w --stdin <bogus-commit)"
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
 	git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
 	test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
@@ -142,7 +146,6 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
 '
 
 test_expect_success 'push with receive.fsck.skipList' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
 	git init dst &&
@@ -169,7 +172,6 @@ test_expect_success 'push with receive.fsck.skipList' '
 '
 
 test_expect_success 'fetch with fetch.fsck.skipList' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	refspec=refs/heads/bogus:refs/heads/bogus &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
@@ -204,7 +206,6 @@ test_expect_success 'fsck.<unknownmsg-id> dies' '
 '
 
 test_expect_success 'push with receive.fsck.missingEmail=warn' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
 	git init dst &&
@@ -232,7 +233,6 @@ test_expect_success 'push with receive.fsck.missingEmail=warn' '
 '
 
 test_expect_success 'fetch with fetch.fsck.missingEmail=warn' '
-	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
 	refspec=refs/heads/bogus:refs/heads/bogus &&
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 02/10] fsck tests: add a test for no skipList input
  2018-08-28 14:59       ` René Scharfe
  2018-09-03 14:49         ` [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 01/10] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 03/10] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
                           ` (7 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

The recent 65a836fa6b ("fsck: add stress tests for fsck.skipList",
2018-07-27) added various stress tests for odd invocations of
fsck.skipList, but didn't tests for some very simple ones, such as
asserting that providing to skipList with a bad commit causes fsck to
exit with a non-zero exit code. Add such a test.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5504-fetch-receive-strict.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 6d268f3327..cbae31f330 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -137,6 +137,11 @@ test_expect_success 'setup bogus commit' '
 	commit="$(git hash-object -t commit -w --stdin <bogus-commit)"
 '
 
+test_expect_success 'fsck with no skipList input' '
+	test_must_fail git fsck 2>err &&
+	test_i18ngrep "missingEmail" err
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
 	git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
 	test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 03/10] fsck: document and test sorted skipList input
  2018-08-28 14:59       ` René Scharfe
                           ` (2 preceding siblings ...)
  2018-09-03 14:49         ` [PATCH v5 02/10] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 04/10] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
                           ` (6 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

Ever since the skipList support was first added in cd94c6f91 ("fsck:
git receive-pack: support excluding objects from fsck'ing",
2015-06-22) the documentation for the format has that the file is a
sorted list of object names.

Thus, anyone using the feature would have thought the list needed to
be sorted. E.g. I recently in conjunction with my fetch.fsck.*
implementation in 1362df0d41 ("fetch: implement fetch.fsck.*",
2018-07-27) wrote some code to ship a skipList, and went out of my way
to sort it.

Doing so seems intuitive, since it contains fixed-width records, and
has no support for comments, so one might expect it to be binary
searched in-place on-disk.

However, as documented here this was never a requirement, so let's
change the documentation. Since this is a file format change let's
also document what was said about this in the past, so e.g. someone
like myself reading the new docs can see this never needed to be
sorted ("why do I have all this code to sort this thing...").

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        | 10 +++++++++-
 t/t5504-fetch-receive-strict.sh | 19 +++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index eb66a11975..fd1b5837d0 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1710,7 +1710,7 @@ doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
 will only cause git to warn.
 
 fsck.skipList::
-	The path to a sorted list of object names (i.e. one SHA-1 per
+	The path to a list of object names (i.e. one SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
 	be ignored. This feature is useful when an established project
 	should be accepted despite early commits containing errors that
@@ -1725,6 +1725,14 @@ Unlike variables like `color.ui` and `core.editor` the
 fall back on the `fsck.skipList` configuration if they aren't set. To
 uniformly configure the same fsck settings in different circumstances
 all three of them they must all set to the same values.
++
+Older versions of Git (before 2.20) documented that the object names
+list should be sorted. This was never a requirement, the object names
+can appear in any order, but when reading the list we track whether
+the list is sorted for the purposes of an internal binary search
+implementation, which can save itself some work with an already sorted
+list. Unless you have a humongous list there's no reason to go out of
+your way to pre-sort the list.
 
 gc.aggressiveDepth::
 	The depth parameter used in the delta compression
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index cbae31f330..fa56052f0f 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -142,6 +142,25 @@ test_expect_success 'fsck with no skipList input' '
 	test_i18ngrep "missingEmail" err
 '
 
+test_expect_success 'setup sorted and unsorted skipLists' '
+	cat >SKIP.unsorted <<-EOF &&
+	0000000000000000000000000000000000000004
+	0000000000000000000000000000000000000002
+	$commit
+	0000000000000000000000000000000000000001
+	0000000000000000000000000000000000000003
+	EOF
+	sort SKIP.unsorted >SKIP.sorted
+'
+
+test_expect_success 'fsck with sorted skipList' '
+	git -c fsck.skipList=SKIP.sorted fsck
+'
+
+test_expect_success 'fsck with unsorted skipList' '
+	git -c fsck.skipList=SKIP.unsorted fsck
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
 	git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
 	test_must_fail git -c fsck.skipList=does-not-exist -c fsck.missingEmail=ignore fsck 2>err &&
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 04/10] fsck: document and test commented & empty line skipList input
  2018-08-28 14:59       ` René Scharfe
                           ` (3 preceding siblings ...)
  2018-09-03 14:49         ` [PATCH v5 03/10] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 05/10] fsck: document that skipList input must be unabbreviated Ævar Arnfjörð Bjarmason
                           ` (5 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

There is currently no comment syntax for the fsck.skipList, this isn't
really by design, and it would be nice to have support for comments.

Document that this doesn't work, and test for how this errors
out. These tests reveal a current bug, if there's invalid input the
output will emit some of the next line, and then go into uninitialized
memory. This is fixed in a subsequent change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        | 11 +++++++----
 t/t5504-fetch-receive-strict.sh | 21 +++++++++++++++++++++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fd1b5837d0..0e1ce7de8b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1712,10 +1712,13 @@ will only cause git to warn.
 fsck.skipList::
 	The path to a list of object names (i.e. one SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
-	be ignored. This feature is useful when an established project
-	should be accepted despite early commits containing errors that
-	can be safely ignored such as invalid committer email addresses.
-	Note: corrupt objects cannot be skipped with this setting.
+	be ignored. Comments ('#') and empty lines are not supported, and
+	will error out.
++
+This feature is useful when an established project should be accepted
+despite early commits containing errors that can be safely ignored
+such as invalid committer email addresses.  Note: corrupt objects
+cannot be skipped with this setting.
 +
 Like `fsck.<msg-id>` this variable has corresponding
 `receive.fsck.skipList` and `fetch.fsck.skipList` variants.
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index fa56052f0f..38aaf3b928 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -169,6 +169,27 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
 	test_i18ngrep "Invalid SHA-1: \[core\]" err
 '
 
+test_expect_success 'fsck with invalid or bogus skipList input (comments & empty lines)' '
+	cat >SKIP.with-comment <<-EOF &&
+	# Some bad commit
+	0000000000000000000000000000000000000001
+	EOF
+	test_must_fail git -c fsck.skipList=SKIP.with-comment fsck 2>err-with-comment &&
+	test_i18ngrep "^fatal: Invalid SHA-1: # Some bad commit$" err-with-comment &&
+	cat >SKIP.with-empty-line <<-EOF &&
+	0000000000000000000000000000000000000001
+
+	0000000000000000000000000000000000000002
+	EOF
+	test_must_fail git -c fsck.skipList=SKIP.with-empty-line fsck 2>err-with-empty-line &&
+	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
+'
+
+test_expect_failure 'fsck no garbage output from comments & empty lines errors' '
+	test_line_count = 1 err-with-comment &&
+	test_line_count = 1 err-with-empty-line
+'
+
 test_expect_success 'push with receive.fsck.skipList' '
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 05/10] fsck: document that skipList input must be unabbreviated
  2018-08-28 14:59       ` René Scharfe
                           ` (4 preceding siblings ...)
  2018-09-03 14:49         ` [PATCH v5 04/10] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 06/10] fsck: add a performance test Ævar Arnfjörð Bjarmason
                           ` (4 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

Abbreviating the SHA-1s in the skipList input has never worked, but
the documentation hasn't unambiguously stated that this is an error,
and there was no test for it.

Let's fix both since it would be easy for some later refactoring
e.g. switch to accidentally switch to a looser OID parsing function,
causing the tests before this change to pass, but for older versions
of git to be incompatible with the new skipList format.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        | 2 +-
 t/t5504-fetch-receive-strict.sh | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e1ce7de8b..3287c7ef8a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1710,7 +1710,7 @@ doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
 will only cause git to warn.
 
 fsck.skipList::
-	The path to a list of object names (i.e. one SHA-1 per
+	The path to a list of object names (i.e. one unabbreviated SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
 	be ignored. Comments ('#') and empty lines are not supported, and
 	will error out.
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 38aaf3b928..96bf9facbd 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -190,6 +190,12 @@ test_expect_failure 'fsck no garbage output from comments & empty lines errors'
 	test_line_count = 1 err-with-empty-line
 '
 
+test_expect_success 'fsck with invalid abbreviated skipList input' '
+	echo $commit | test_copy_bytes 20 >SKIP.abbreviated &&
+	test_must_fail git -c fsck.skipList=SKIP.abbreviated fsck 2>err-abbreviated &&
+	test_i18ngrep "^fatal: Invalid SHA-1: " err-abbreviated
+'
+
 test_expect_success 'push with receive.fsck.skipList' '
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 06/10] fsck: add a performance test
  2018-08-28 14:59       ` René Scharfe
                           ` (5 preceding siblings ...)
  2018-09-03 14:49         ` [PATCH v5 05/10] fsck: document that skipList input must be unabbreviated Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 07/10] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
                           ` (3 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

Add a plain performance test for "fsck". This test will not be used to
/ referred to in any upcoming commit of mine in this series, but
having a simple test for fsck performance is valuable, so let's add it
while we're at it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/p1450-fsck.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)
 create mode 100755 t/perf/p1450-fsck.sh

diff --git a/t/perf/p1450-fsck.sh b/t/perf/p1450-fsck.sh
new file mode 100755
index 0000000000..ae1b84198b
--- /dev/null
+++ b/t/perf/p1450-fsck.sh
@@ -0,0 +1,13 @@
+#!/bin/sh
+
+test_description='Test fsck performance'
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_perf 'fsck' '
+	git fsck
+'
+
+test_done
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 07/10] fsck: add a performance test for skipList
  2018-08-28 14:59       ` René Scharfe
                           ` (6 preceding siblings ...)
  2018-09-03 14:49         ` [PATCH v5 06/10] fsck: add a performance test Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 08/10] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
                           ` (2 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

From: René Scharfe <l.s.r@web.de>

Create a performance test to see how the skipList implementation
performs. First we setup N bad commits, then we see how progressively
working our way up to 0..N in increments of 10x does. I.e. the
needle(s) in the haystack get progressively more numerous.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/p1451-fsck-skip-list.sh | 40 ++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100755 t/perf/p1451-fsck-skip-list.sh

diff --git a/t/perf/p1451-fsck-skip-list.sh b/t/perf/p1451-fsck-skip-list.sh
new file mode 100755
index 0000000000..c2b97d2487
--- /dev/null
+++ b/t/perf/p1451-fsck-skip-list.sh
@@ -0,0 +1,40 @@
+#!/bin/sh
+
+test_description='Test fsck skipList performance'
+
+. ./perf-lib.sh
+
+test_perf_fresh_repo
+
+n=1000000
+
+test_expect_success "setup $n bad commits" '
+	for i in $(test_seq 1 $n)
+	do
+		echo "commit refs/heads/master" &&
+		echo "committer C <c@example.com> 1234567890 +0000" &&
+		echo "data <<EOF" &&
+		echo "$i.Q." &&
+		echo "EOF"
+	done | q_to_nul | git fast-import
+'
+
+skip=0
+while test $skip -le $n
+do
+	test_expect_success "create skipList for $skip bad commits" '
+		git log --format=%H --max-count=$skip |
+		sort >skiplist
+	'
+
+	test_perf "fsck with $skip skipped bad commits" '
+		git -c fsck.skipList=skiplist fsck
+	'
+
+	case $skip in
+	0) skip=1 ;;
+	*) skip=${skip}0 ;;
+	esac
+done
+
+test_done
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 08/10] fsck: use strbuf_getline() to read skiplist file
  2018-08-28 14:59       ` René Scharfe
                           ` (7 preceding siblings ...)
  2018-09-03 14:49         ` [PATCH v5 07/10] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 09/10] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 10/10] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

From: René Scharfe <l.s.r@web.de>

The buffer is unlikely to contain a NUL character, so printing its
contents using %s in a die() format is unsafe (detected with ASan).

Use an idiomatic strbuf_getline() loop instead, which ensures the buffer
is always NUL-terminated, supports CRLF files as well, accepts files
without a newline after the last line, supports any hash length
automatically, and is shorter.

This fixes a bug where emitting an error about an invalid line on say
line 1 would continue printing subsequent lines, and usually continue
into uninitialized memory.

The performance impact of this, on a CentOS 7 box with RedHat GCC
4.8.5-28:

    $ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1451-fsck-skip-list.sh
    Test                                             HEAD~             HEAD
    ----------------------------------------------------------------------------------------
    1450.3: fsck with 0 skipped bad commits          7.75(7.39+0.35)   7.68(7.29+0.39) -0.9%
    1450.5: fsck with 1 skipped bad commits          7.70(7.30+0.40)   7.80(7.42+0.37) +1.3%
    1450.7: fsck with 10 skipped bad commits         7.77(7.37+0.40)   7.87(7.47+0.40) +1.3%
    1450.9: fsck with 100 skipped bad commits        7.82(7.41+0.40)   7.88(7.43+0.44) +0.8%
    1450.11: fsck with 1000 skipped bad commits      7.88(7.49+0.39)   7.84(7.43+0.40) -0.5%
    1450.13: fsck with 10000 skipped bad commits     8.02(7.63+0.39)   8.07(7.67+0.39) +0.6%
    1450.15: fsck with 100000 skipped bad commits    8.01(7.60+0.41)   8.08(7.70+0.38) +0.9%
    1450.17: fsck with 1000000 skipped bad commits   7.60(7.10+0.50)   7.37(7.18+0.19) -3.0%

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c                          | 25 ++++++++++++-------------
 t/t5504-fetch-receive-strict.sh |  2 +-
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fsck.c b/fsck.c
index a0cee0be59..972a26b9ba 100644
--- a/fsck.c
+++ b/fsck.c
@@ -183,8 +183,9 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 static void init_skiplist(struct fsck_options *options, const char *path)
 {
 	static struct oid_array skiplist = OID_ARRAY_INIT;
-	int sorted, fd;
-	char buffer[GIT_MAX_HEXSZ + 1];
+	int sorted;
+	FILE *fp;
+	struct strbuf sb = STRBUF_INIT;
 	struct object_id oid;
 
 	if (options->skiplist)
@@ -194,25 +195,23 @@ static void init_skiplist(struct fsck_options *options, const char *path)
 		options->skiplist = &skiplist;
 	}
 
-	fd = open(path, O_RDONLY);
-	if (fd < 0)
+	fp = fopen(path, "r");
+	if (!fp)
 		die("Could not open skip list: %s", path);
-	for (;;) {
+	while (!strbuf_getline(&sb, fp)) {
 		const char *p;
-		int result = read_in_full(fd, buffer, sizeof(buffer));
-		if (result < 0)
-			die_errno("Could not read '%s'", path);
-		if (!result)
-			break;
-		if (parse_oid_hex(buffer, &oid, &p) || *p != '\n')
-			die("Invalid SHA-1: %s", buffer);
+		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
+			die("Invalid SHA-1: %s", sb.buf);
 		oid_array_append(&skiplist, &oid);
 		if (sorted && skiplist.nr > 1 &&
 				oidcmp(&skiplist.oid[skiplist.nr - 2],
 				       &oid) > 0)
 			sorted = 0;
 	}
-	close(fd);
+	if (ferror(fp))
+		die_errno("Could not read '%s'", path);
+	fclose(fp);
+	strbuf_release(&sb);
 
 	if (sorted)
 		skiplist.sorted = 1;
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 96bf9facbd..d67ab37321 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -185,7 +185,7 @@ test_expect_success 'fsck with invalid or bogus skipList input (comments & empty
 	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
 '
 
-test_expect_failure 'fsck no garbage output from comments & empty lines errors' '
+test_expect_success 'fsck no garbage output from comments & empty lines errors' '
 	test_line_count = 1 err-with-comment &&
 	test_line_count = 1 err-with-empty-line
 '
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 09/10] fsck: use oidset instead of oid_array for skipList
  2018-08-28 14:59       ` René Scharfe
                           ` (8 preceding siblings ...)
  2018-09-03 14:49         ` [PATCH v5 08/10] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  2018-09-03 14:49         ` [PATCH v5 10/10] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

From: René Scharfe <l.s.r@web.de>

Change the implementation of the skipList feature to use oidset
instead of oid_array to store SHA-1s for later lookup.

This list is parsed once on startup by fsck, fetch-pack or
receive-pack depending on the *.skipList config in use. I.e. only once
per invocation, but note that for "clone --recurse-submodules" each
submodule will re-parse the list, in addition to the main project, and
it will be re-parsed when checking .gitmodules blobs, see
fb16287719 ("fsck: check skiplist for object in fsck_blob()",
2018-06-27).

Memory usage is a bit higher, but we don't need to keep track of the
sort order anymore. Embed the oidset into struct fsck_options to make
its ownership clear (no hidden sharing) and avoid unnecessary pointer
indirection.

The cumulative impact on performance of this & the preceding change,
using the test setup described in the previous commit:

    Test                                             HEAD~2            HEAD~                   HEAD
    ----------------------------------------------------------------------------------------------------------------
    1450.3: fsck with 0 skipped bad commits          7.70(7.31+0.38)   7.72(7.33+0.38) +0.3%   7.70(7.30+0.40) +0.0%
    1450.5: fsck with 1 skipped bad commits          7.84(7.47+0.37)   7.69(7.32+0.36) -1.9%   7.71(7.29+0.41) -1.7%
    1450.7: fsck with 10 skipped bad commits         7.81(7.40+0.40)   7.94(7.57+0.36) +1.7%   7.92(7.55+0.37) +1.4%
    1450.9: fsck with 100 skipped bad commits        7.81(7.42+0.38)   7.95(7.53+0.41) +1.8%   7.83(7.42+0.41) +0.3%
    1450.11: fsck with 1000 skipped bad commits      7.99(7.62+0.36)   7.90(7.50+0.40) -1.1%   7.86(7.49+0.37) -1.6%
    1450.13: fsck with 10000 skipped bad commits     7.98(7.57+0.40)   7.94(7.53+0.40) -0.5%   7.90(7.45+0.44) -1.0%
    1450.15: fsck with 100000 skipped bad commits    7.97(7.57+0.39)   8.03(7.67+0.36) +0.8%   7.84(7.43+0.41) -1.6%
    1450.17: fsck with 1000000 skipped bad commits   7.72(7.22+0.50)   7.28(7.07+0.20) -5.7%   7.13(6.87+0.25) -7.6%

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt | 11 ++++++-----
 fsck.c                   | 23 ++---------------------
 fsck.h                   |  8 +++++---
 3 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3287c7ef8a..161ffe259e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1731,11 +1731,12 @@ all three of them they must all set to the same values.
 +
 Older versions of Git (before 2.20) documented that the object names
 list should be sorted. This was never a requirement, the object names
-can appear in any order, but when reading the list we track whether
-the list is sorted for the purposes of an internal binary search
-implementation, which can save itself some work with an already sorted
-list. Unless you have a humongous list there's no reason to go out of
-your way to pre-sort the list.
+could appear in any order, but when reading the list we tracked whether
+the list was sorted for the purposes of an internal binary search
+implementation, which could save itself some work with an already sorted
+list. Unless you had a humongous list there was no reason to go out of
+your way to pre-sort the list. After Git version 2.20 a hash implementation
+is used instead, so there's now no reason to pre-sort the list.
 
 gc.aggressiveDepth::
 	The depth parameter used in the delta compression
diff --git a/fsck.c b/fsck.c
index 972a26b9ba..4c643f1d40 100644
--- a/fsck.c
+++ b/fsck.c
@@ -10,7 +10,6 @@
 #include "fsck.h"
 #include "refs.h"
 #include "utf8.h"
-#include "sha1-array.h"
 #include "decorate.h"
 #include "oidset.h"
 #include "packfile.h"
@@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 
 static void init_skiplist(struct fsck_options *options, const char *path)
 {
-	static struct oid_array skiplist = OID_ARRAY_INIT;
-	int sorted;
 	FILE *fp;
 	struct strbuf sb = STRBUF_INIT;
 	struct object_id oid;
 
-	if (options->skiplist)
-		sorted = options->skiplist->sorted;
-	else {
-		sorted = 1;
-		options->skiplist = &skiplist;
-	}
-
 	fp = fopen(path, "r");
 	if (!fp)
 		die("Could not open skip list: %s", path);
@@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, const char *path)
 		const char *p;
 		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
 			die("Invalid SHA-1: %s", sb.buf);
-		oid_array_append(&skiplist, &oid);
-		if (sorted && skiplist.nr > 1 &&
-				oidcmp(&skiplist.oid[skiplist.nr - 2],
-				       &oid) > 0)
-			sorted = 0;
+		oidset_insert(&options->skiplist, &oid);
 	}
 	if (ferror(fp))
 		die_errno("Could not read '%s'", path);
 	fclose(fp);
 	strbuf_release(&sb);
-
-	if (sorted)
-		skiplist.sorted = 1;
 }
 
 static int parse_msg_type(const char *str)
@@ -319,9 +302,7 @@ static void append_msg_id(struct strbuf *sb, const char *msg_id)
 
 static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
 {
-	if (opts && opts->skiplist && obj)
-		return oid_array_lookup(opts->skiplist, &obj->oid) >= 0;
-	return 0;
+	return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
 }
 
 __attribute__((format (printf, 4, 5)))
diff --git a/fsck.h b/fsck.h
index 0c7e8c9428..b95595ae5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -1,6 +1,8 @@
 #ifndef GIT_FSCK_H
 #define GIT_FSCK_H
 
+#include "oidset.h"
+
 #define FSCK_ERROR 1
 #define FSCK_WARN 2
 #define FSCK_IGNORE 3
@@ -35,12 +37,12 @@ struct fsck_options {
 	fsck_error error_func;
 	unsigned strict:1;
 	int *msg_type;
-	struct oid_array *skiplist;
+	struct oidset skiplist;
 	struct decoration *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 10/10] fsck: support comments & empty lines in skipList
  2018-08-28 14:59       ` René Scharfe
                           ` (9 preceding siblings ...)
  2018-09-03 14:49         ` [PATCH v5 09/10] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
@ 2018-09-03 14:49         ` Ævar Arnfjörð Bjarmason
  10 siblings, 0 replies; 32+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 14:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Ramsay Jones,
	Johannes Schindelin, Jeff King,
	Ævar Arnfjörð Bjarmason

It's annoying not to be able to put comments and empty lines in the
skipList, when e.g. keeping a big central list of commits to skip in
/etc/gitconfig, which was my motivation for 1362df0d41 ("fetch:
implement fetch.fsck.*", 2018-07-27).

Implement that, and document what version of Git this was changed in,
since this on-disk format can be expected to be used by multiple
versions of git.

There is no notable performance impact from this change, using the
test setup described a couple of commits back:

    Test                                             HEAD~             HEAD
    ----------------------------------------------------------------------------------------
    1450.3: fsck with 0 skipped bad commits          7.69(7.27+0.42)   7.86(7.48+0.37) +2.2%
    1450.5: fsck with 1 skipped bad commits          7.69(7.30+0.38)   7.83(7.47+0.36) +1.8%
    1450.7: fsck with 10 skipped bad commits         7.76(7.38+0.38)   7.79(7.38+0.41) +0.4%
    1450.9: fsck with 100 skipped bad commits        7.76(7.38+0.38)   7.74(7.36+0.38) -0.3%
    1450.11: fsck with 1000 skipped bad commits      7.71(7.30+0.41)   7.72(7.34+0.38) +0.1%
    1450.13: fsck with 10000 skipped bad commits     7.74(7.34+0.40)   7.72(7.34+0.38) -0.3%
    1450.15: fsck with 100000 skipped bad commits    7.75(7.40+0.35)   7.70(7.29+0.40) -0.6%
    1450.17: fsck with 1000000 skipped bad commits   7.12(6.86+0.26)   7.13(6.87+0.26) +0.1%

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt        |  5 +++--
 fsck.c                          | 14 ++++++++++++++
 t/t5504-fetch-receive-strict.sh | 19 ++++++++++++++++---
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 161ffe259e..0906db3a99 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1712,8 +1712,9 @@ will only cause git to warn.
 fsck.skipList::
 	The path to a list of object names (i.e. one unabbreviated SHA-1 per
 	line) that are known to be broken in a non-fatal way and should
-	be ignored. Comments ('#') and empty lines are not supported, and
-	will error out.
+	be ignored. On versions of Git 2.20 and later comments ('#'), empty
+	lines, and any leading and trailing whitespace is ignored. Everything
+	but a SHA-1 per line will error out on older versions.
 +
 This feature is useful when an established project should be accepted
 despite early commits containing errors that can be safely ignored
diff --git a/fsck.c b/fsck.c
index 4c643f1d40..859b050b05 100644
--- a/fsck.c
+++ b/fsck.c
@@ -190,6 +190,20 @@ static void init_skiplist(struct fsck_options *options, const char *path)
 		die("Could not open skip list: %s", path);
 	while (!strbuf_getline(&sb, fp)) {
 		const char *p;
+		const char *hash;
+
+		/*
+		 * Allow trailing comments, leading whitespace
+		 * (including before commits), and empty or whitespace
+		 * only lines.
+		 */
+		hash = strchr(sb.buf, '#');
+		if (hash)
+			strbuf_setlen(&sb, hash - sb.buf);
+		strbuf_trim(&sb);
+		if (!sb.len)
+			continue;
+
 		if (parse_oid_hex(sb.buf, &oid, &p) || *p != '\0')
 			die("Invalid SHA-1: %s", sb.buf);
 		oidset_insert(&options->skiplist, &oid);
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index d67ab37321..7bc706873c 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -169,20 +169,20 @@ test_expect_success 'fsck with invalid or bogus skipList input' '
 	test_i18ngrep "Invalid SHA-1: \[core\]" err
 '
 
-test_expect_success 'fsck with invalid or bogus skipList input (comments & empty lines)' '
+test_expect_success 'fsck with other accepted skipList input (comments & empty lines)' '
 	cat >SKIP.with-comment <<-EOF &&
 	# Some bad commit
 	0000000000000000000000000000000000000001
 	EOF
 	test_must_fail git -c fsck.skipList=SKIP.with-comment fsck 2>err-with-comment &&
-	test_i18ngrep "^fatal: Invalid SHA-1: # Some bad commit$" err-with-comment &&
+	test_i18ngrep "missingEmail" err-with-comment &&
 	cat >SKIP.with-empty-line <<-EOF &&
 	0000000000000000000000000000000000000001
 
 	0000000000000000000000000000000000000002
 	EOF
 	test_must_fail git -c fsck.skipList=SKIP.with-empty-line fsck 2>err-with-empty-line &&
-	test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
+	test_i18ngrep "missingEmail" err-with-empty-line
 '
 
 test_expect_success 'fsck no garbage output from comments & empty lines errors' '
@@ -196,6 +196,19 @@ test_expect_success 'fsck with invalid abbreviated skipList input' '
 	test_i18ngrep "^fatal: Invalid SHA-1: " err-abbreviated
 '
 
+test_expect_success 'fsck with exhaustive accepted skipList input (various types of comments etc.)' '
+	>SKIP.exhaustive &&
+	echo "# A commented line" >>SKIP.exhaustive &&
+	echo "" >>SKIP.exhaustive &&
+	echo " " >>SKIP.exhaustive &&
+	echo " # Comment after whitespace" >>SKIP.exhaustive &&
+	echo "$commit # Our bad commit (with leading whitespace and trailing comment)" >>SKIP.exhaustive &&
+	echo "# Some bad commit (leading whitespace)" >>SKIP.exhaustive &&
+	echo "  0000000000000000000000000000000000000001" >>SKIP.exhaustive &&
+	git -c fsck.skipList=SKIP.exhaustive fsck 2>err &&
+	test_must_be_empty err
+'
+
 test_expect_success 'push with receive.fsck.skipList' '
 	git push . $commit:refs/heads/bogus &&
 	rm -rf dst &&
-- 
2.19.0.rc1.350.ge57e33dbd1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-09-03 14:50 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-27 19:43 [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 1/7] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 2/7] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 3/7] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 4/7] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-27 19:43 ` [PATCH v3 6/7] fsck: use oidset for skiplist Ævar Arnfjörð Bjarmason
2018-08-27 20:15   ` Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 0/8] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 1/8] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 2/8] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 3/8] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 4/8] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 5/8] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 6/8] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 7/8] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-08-28  9:52     ` [PATCH v4 8/8] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:59       ` René Scharfe
2018-09-03 14:49         ` [PATCH v5 00/10] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 01/10] fsck tests: setup of bogus commit object Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 02/10] fsck tests: add a test for no skipList input Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 03/10] fsck: document and test sorted " Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 04/10] fsck: document and test commented & empty line " Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 05/10] fsck: document that skipList input must be unabbreviated Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 06/10] fsck: add a performance test Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 07/10] fsck: add a performance test for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 08/10] fsck: use strbuf_getline() to read skiplist file Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 09/10] fsck: use oidset instead of oid_array for skipList Ævar Arnfjörð Bjarmason
2018-09-03 14:49         ` [PATCH v5 10/10] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-28 14:29     ` [PATCH v3 6/7] fsck: use oidset for skiplist René Scharfe
2018-08-27 19:43 ` [PATCH v3 7/7] fsck: support comments & empty lines in skipList Ævar Arnfjörð Bjarmason
2018-08-27 19:46 ` [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support Ævar Arnfjörð Bjarmason

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).