* [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes
@ 2021-03-28  2:25 Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 1/4] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
                   ` (4 more replies)
  0 siblings, 5 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:25 UTC
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

Fix up some of the technical docs around error handling and add a
bug() function. As explained in 4/4, this is for use in some
fsck-related code.

This series does not make use of it, but I'll follow up with one that
does. I wanted to peel off this small cleanup series from that.

I noticed some semi-related bugs in these error tracing functions
having to do with the trace2 integration. They are noted in
https://lore.kernel.org/git/87mtuoo4ym.fsf@evledraar.gmail.com/ and
this series doesn't attempt to fix them.

Ævar Arnfjörð Bjarmason (4):
  usage.c: don't copy/paste the same comment three times
  api docs: document BUG() in api-error-handling.txt
  api docs: document that BUG() emits a trace2 error event
  usage.c: add a non-fatal bug() function to go with BUG()

 .../technical/api-error-handling.txt          | 16 ++++++-
 Documentation/technical/api-trace2.txt        |  4 +-
 git-compat-util.h                             |  3 ++
 run-command.c                                 | 11 +++++
 t/helper/test-trace2.c                        | 14 +++++-
 t/t0210-trace2-normal.sh                      | 19 ++++++++
 usage.c                                       | 46 ++++++++++++++-----
 7 files changed, 95 insertions(+), 18 deletions(-)

-- 
2.31.1.445.g91d8e479b0a



* [PATCH 1/4] usage.c: don't copy/paste the same comment three times
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:26 ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:32   ` Eric Sunshine
  2021-03-28  2:26 ` [PATCH 2/4] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:26 UTC
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

In ee4512ed481 (trace2: create new combined trace facility,
2019-02-22) we started with two copies of this comment, and
0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
a third. Let's instead add a single comment above these
mostly-the-same functions that applies to all of them.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 usage.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/usage.c b/usage.c
index 1b206de36d6..c7d233b0de9 100644
--- a/usage.c
+++ b/usage.c
@@ -55,12 +55,13 @@ static NORETURN void usage_builtin(const char *err, va_list params)
 	exit(129);
 }
 
+/*
+ * We call trace2_cmd_error_va() in the below functions first and
+ * expect it to va_copy 'params' before using it (because an 'ap' can
+ * only be walked once).
+ */
 static NORETURN void die_builtin(const char *err, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(err, params);
 
 	vreportf("fatal: ", err, params);
@@ -70,10 +71,6 @@ static NORETURN void die_builtin(const char *err, va_list params)
 
 static void error_builtin(const char *err, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(err, params);
 
 	vreportf("error: ", err, params);
@@ -81,10 +78,6 @@ static void error_builtin(const char *err, va_list params)
 
 static void warn_builtin(const char *warn, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(warn, params);
 
 	vreportf("warning: ", warn, params);
-- 
2.31.1.445.g91d8e479b0a
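To illustrate why the hoisted comment matters: a va_list can only be walked once, so a hook that runs before the real reporter has to format from a va_copy() of it. A minimal sketch under that assumption; trace_error_va(), error_report_va(), sketch_error() and the two buffers are hypothetical stand-ins for trace2_cmd_error_va(), error_builtin() and error(), not git's actual code:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sinks standing in for the trace2 log and for stderr. */
static char trace_log[256];
static char user_out[256];

/*
 * Stand-in for trace2_cmd_error_va(): it walks a va_copy() of 'ap',
 * leaving the caller's 'ap' unconsumed for the next reader.
 */
static void trace_error_va(const char *fmt, va_list ap)
{
	va_list copy;

	va_copy(copy, ap);
	vsnprintf(trace_log, sizeof(trace_log), fmt, copy);
	va_end(copy);
}

/* Modelled on error_builtin(): trace first, then report to the user. */
static void error_report_va(const char *fmt, va_list ap)
{
	static const char prefix[] = "error: ";
	size_t len = strlen(prefix);

	trace_error_va(fmt, ap);
	memcpy(user_out, prefix, len);
	/* 'ap' is still unwalked here because the hook used a copy. */
	vsnprintf(user_out + len, sizeof(user_out) - len, fmt, ap);
}

/* Like error(): report, then return -1 for the caller's convenience. */
static int sketch_error(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	error_report_va(fmt, ap);
	va_end(ap);
	return -1;
}
```

Had trace_error_va() formatted directly from 'ap', the second vsnprintf() would read an already-consumed argument list, which is undefined behavior.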



* [PATCH 2/4] api docs: document BUG() in api-error-handling.txt
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 1/4] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:26 ` Ævar Arnfjörð Bjarmason
  2021-03-29  5:37   ` Bagas Sanjaya
  2021-03-28  2:26 ` [PATCH 3/4] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:26 UTC
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

When the BUG() function was added in d8193743e08 (usage.c: add BUG()
function, 2017-05-12), the docs added in 1f23cfe0ef5 (doc: document
error handling functions and conventions, 2014-12-03) were not
updated. Let's do that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/api-error-handling.txt | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index ceeedd485c9..71486abb2f0 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -1,8 +1,11 @@
 Error reporting in git
 ======================
 
-`die`, `usage`, `error`, and `warning` report errors of various
-kinds.
+`BUG`, `die`, `usage`, `error`, and `warning` report errors of
+various kinds.
+
+- `BUG` is for failed internal assertions that should never happen,
+  i.e. a bug in git itself.
 
 - `die` is for fatal application errors.  It prints a message to
   the user and exits with status 128.
-- 
2.31.1.445.g91d8e479b0a



* [PATCH 3/4] api docs: document that BUG() emits a trace2 error event
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 1/4] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 2/4] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:26 ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
  4 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:26 UTC
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

Correct documentation added in e544221d97a (trace2:
Documentation/technical/api-trace2.txt, 2019-02-22) to state that
calling BUG() also emits an "error" event. See ee4512ed481 (trace2:
create new combined trace facility, 2019-02-22) for the initial
implementation.

The BUG() function did not emit an event then, however. That was only
changed later in 0a9dde4a04c (usage: trace2 BUG() invocations,
2021-02-05), which changed the code but didn't update any of the
docs.

Let's also add a cross-reference from api-error-handling.txt.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/api-error-handling.txt | 3 +++
 Documentation/technical/api-trace2.txt         | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index 71486abb2f0..8be4f4d0d6a 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -23,6 +23,9 @@ various kinds.
   without running into too many problems.  Like `error`, it
   returns -1 after reporting the situation to the caller.
 
+These reports will be logged via the trace2 facility. See the "error"
+event in link:api-trace2.txt[trace2 API].
+
 Customizable error handlers
 ---------------------------
 
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index c65ffafc485..3f52f981a2d 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -465,7 +465,7 @@ completed.)
 ------------
 
 `"error"`::
-	This event is emitted when one of the `error()`, `die()`,
+	This event is emitted when one of the `BUG()`, `error()`, `die()`,
 	`warning()`, or `usage()` functions are called.
 +
 ------------
-- 
2.31.1.445.g91d8e479b0a



* [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2021-03-28  2:26 ` [PATCH 3/4] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:26 ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
  2021-03-28  6:12   ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Junio C Hamano
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
  4 siblings, 2 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:26 UTC
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

Add a bug() function that works like error() except the message is
prefixed with "bug:" instead of "error:".

This is needed for e.g. the fsck code. If we encounter what we'd
consider a BUG() in the middle of an fsck traversal, we'd still like
to try as hard as possible to get past that object and complete the
fsck instead of dying outright. A follow-up commit will introduce
such a use in object-file.c.
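A minimal sketch of the intended calling pattern, assuming a report routine held in a function pointer as in usage.c. bug_sketch(), check_object() and check_all() are hypothetical names for illustration, not the API added by this patch or the follow-up fsck code:

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*report_fn)(const char *, va_list);

/* Counters so the behaviour of the sketch can be checked. */
static int bug_reports;
static int objects_checked;

/* Default routine, modelled on bug_builtin(): a "bug: " prefix, no exit. */
static void bug_builtin_sketch(const char *fmt, va_list ap)
{
	bug_reports++;
	fputs("bug: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
}

static report_fn bug_routine_sketch = bug_builtin_sketch;

/* Like error(): report through the current routine, then return -1. */
static int bug_sketch(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	bug_routine_sketch(fmt, ap);
	va_end(ap);
	return -1;
}

/* Hypothetical per-object check: object 2 is in an "impossible" state. */
static int check_object(int id)
{
	if (id == 2)
		return bug_sketch("object %d has an impossible type", id);
	return 0;
}

/* fsck-style walk: remember that something failed, but keep going. */
static int check_all(int nr)
{
	int i, errors_found = 0;

	for (i = 0; i < nr; i++) {
		objects_checked++;
		if (check_object(i) < 0)
			errors_found = 1;
	}
	return errors_found;
}
```

Because bug_sketch() returns -1 like error(), a caller can write "return bug_sketch(...)" and let the traversal record the failure and move on to the next object, where BUG() would have aborted the whole walk.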

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 .../technical/api-error-handling.txt          |  8 ++++-
 Documentation/technical/api-trace2.txt        |  4 +--
 git-compat-util.h                             |  3 ++
 run-command.c                                 | 11 +++++++
 t/helper/test-trace2.c                        | 14 +++++++--
 t/t0210-trace2-normal.sh                      | 19 ++++++++++++
 usage.c                                       | 29 +++++++++++++++++++
 7 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index 8be4f4d0d6a..9d6ac6f6649 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -1,7 +1,7 @@
 Error reporting in git
 ======================
 
-`BUG`, `die`, `usage`, `error`, and `warning` report errors of
+`BUG`, `bug`, `die`, `usage`, `error`, and `warning` report errors of
 various kinds.
 
 - `BUG` is for failed internal assertions that should never happen,
@@ -18,6 +18,12 @@ various kinds.
   to the user and returns -1 for convenience in signaling the error
   to the caller.
 
+- `bug` (lower-case, not `BUG`) is supposed to be used like `BUG` but
+  has the same non-fatal semantics as `error`. It's meant to signal an
+  internal bug in a library whose caller might still want to attempt
+  some amount of graceful recovery, or to append other error output of
+  their own.
+
 - `warning` is for reporting situations that probably should not
   occur but which the user (and Git) can continue to work around
   without running into too many problems.  Like `error`, it
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 3f52f981a2d..cafe373f405 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -465,8 +465,8 @@ completed.)
 ------------
 
 `"error"`::
-	This event is emitted when one of the `BUG()`, `error()`, `die()`,
-	`warning()`, or `usage()` functions are called.
+	This event is emitted when one of the `BUG()`, `bug()`, `error()`,
+	`die()`, `warning()`, or `usage()` functions are called.
 +
 ------------
 {
diff --git a/git-compat-util.h b/git-compat-util.h
index 9ddf9d7044b..13c1dcf9dcc 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -463,6 +463,7 @@ NORETURN void usage(const char *err);
 NORETURN void usagef(const char *err, ...) __attribute__((format (printf, 1, 2)));
 NORETURN void die(const char *err, ...) __attribute__((format (printf, 1, 2)));
 NORETURN void die_errno(const char *err, ...) __attribute__((format (printf, 1, 2)));
+int bug(const char *err, ...) __attribute__((format (printf, 1, 2)));
 int error(const char *err, ...) __attribute__((format (printf, 1, 2)));
 int error_errno(const char *err, ...) __attribute__((format (printf, 1, 2)));
 void warning(const char *err, ...) __attribute__((format (printf, 1, 2)));
@@ -497,6 +498,8 @@ static inline int const_error(void)
 typedef void (*report_fn)(const char *, va_list params);
 
 void set_die_routine(NORETURN_PTR report_fn routine);
+void set_bug_routine(report_fn routine);
+report_fn get_bug_routine(void);
 void set_error_routine(report_fn routine);
 report_fn get_error_routine(void);
 void set_warn_routine(report_fn routine);
diff --git a/run-command.c b/run-command.c
index be6bc128cd9..8b818b063ff 100644
--- a/run-command.c
+++ b/run-command.c
@@ -348,6 +348,12 @@ static void fake_fatal(const char *err, va_list params)
 	vreportf("fatal: ", err, params);
 }
 
+static void child_bug_fn(const char *err, va_list params)
+{
+	const char msg[] = "bug() should not be called in child\n";
+	xwrite(2, msg, sizeof(msg) - 1);
+}
+
 static void child_error_fn(const char *err, va_list params)
 {
 	const char msg[] = "error() should not be called in child\n";
@@ -371,9 +377,12 @@ static void NORETURN child_die_fn(const char *err, va_list params)
 static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
 {
 	static void (*old_errfn)(const char *err, va_list params);
+	static void (*old_bugfn)(const char *err, va_list params);
 
 	old_errfn = get_error_routine();
 	set_error_routine(fake_fatal);
+	old_bugfn = get_bug_routine();
+	set_bug_routine(fake_fatal);
 	errno = cerr->syserr;
 
 	switch (cerr->err) {
@@ -399,6 +408,7 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
 		error_errno("cannot exec '%s'", cmd->argv[0]);
 		break;
 	}
+	set_bug_routine(old_bugfn);
 	set_error_routine(old_errfn);
 }
 
@@ -789,6 +799,7 @@ int start_command(struct child_process *cmd)
 		 * called, they can take stdio locks and malloc.
 		 */
 		set_die_routine(child_die_fn);
+		set_bug_routine(child_bug_fn);
 		set_error_routine(child_error_fn);
 		set_warn_routine(child_warn_fn);
 
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index f93633f895a..6248427e4bf 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -198,7 +198,7 @@ static int ut_006data(int argc, const char **argv)
 	return 0;
 }
 
-static int ut_007bug(int argc, const char **argv)
+static int ut_007BUG(int argc, const char **argv)
 {
 	/*
 	 * Exercise BUG() to ensure that the message is printed to trace2.
@@ -206,6 +206,15 @@ static int ut_007bug(int argc, const char **argv)
 	BUG("the bug message");
 }
 
+static int ut_008bug(int argc, const char **argv)
+{
+	/*
+	 * Exercise bug() to ensure that the message is printed to trace2.
+	 */
+	bug("the bug message");
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -222,7 +231,8 @@ static struct unit_test ut_table[] = {
 	{ ut_004child,    "004child",  "[<child_command_line>]" },
 	{ ut_005exec,     "005exec",   "<git_command_args>" },
 	{ ut_006data,     "006data",   "[<category> <key> <value>]+" },
-	{ ut_007bug,      "007bug",    "" },
+	{ ut_007BUG,      "007bug",    "" },
+	{ ut_008bug,      "008bug",    "" },
 };
 /* clang-format on */
 
diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index 0cf3a63b75b..9c866af971f 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -166,6 +166,25 @@ test_expect_success 'BUG messages are written to trace2' '
 	test_cmp expect actual
 '
 
+# Verb 008bug
+#
+# Check that bug() writes to trace2
+
+test_expect_success 'bug messages are written to trace2' '
+	test_when_finished "rm trace.normal actual expect" &&
+	GIT_TRACE2="$(pwd)/trace.normal" test-tool trace2 008bug &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <trace.normal >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 008bug
+		cmd_name trace2 (trace2)
+		error the bug message
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
 sane_unset GIT_TRACE2_BRIEF
 
 # Now test without environment variables and get all Trace2 settings
diff --git a/usage.c b/usage.c
index c7d233b0de9..34bd3abf048 100644
--- a/usage.c
+++ b/usage.c
@@ -69,6 +69,13 @@ static NORETURN void die_builtin(const char *err, va_list params)
 	exit(128);
 }
 
+static void bug_builtin(const char *err, va_list params)
+{
+	trace2_cmd_error_va(err, params);
+
+	vreportf("bug: ", err, params);
+}
+
 static void error_builtin(const char *err, va_list params)
 {
 	trace2_cmd_error_va(err, params);
@@ -109,6 +116,7 @@ static int die_is_recursing_builtin(void)
  * (ugh), so keep things static. */
 static NORETURN_PTR report_fn usage_routine = usage_builtin;
 static NORETURN_PTR report_fn die_routine = die_builtin;
+static report_fn bug_routine = bug_builtin;
 static report_fn error_routine = error_builtin;
 static report_fn warn_routine = warn_builtin;
 static int (*die_is_recursing)(void) = die_is_recursing_builtin;
@@ -118,11 +126,22 @@ void set_die_routine(NORETURN_PTR report_fn routine)
 	die_routine = routine;
 }
 
+
+void set_bug_routine(report_fn routine)
+{
+	bug_routine = routine;
+}
+
 void set_error_routine(report_fn routine)
 {
 	error_routine = routine;
 }
 
+report_fn get_bug_routine(void)
+{
+	return bug_routine;
+}
+
 report_fn get_error_routine(void)
 {
 	return error_routine;
@@ -223,6 +242,16 @@ int error_errno(const char *fmt, ...)
 	return -1;
 }
 
+int bug(const char *err, ...)
+{
+	va_list params;
+
+	va_start(params, err);
+	bug_routine(err, params);
+	va_end(params);
+	return -1;
+}
+
 #undef error
 int error(const char *err, ...)
 {
-- 
2.31.1.445.g91d8e479b0a
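child_err_spew() above saves the current bug and error routines, swaps in fake_fatal() and restores the old ones afterwards. A minimal sketch of that save/swap/restore pattern for a single routine; all names here are hypothetical stand-ins rather than git's functions:

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*report_fn)(const char *, va_list);

/* Counters so the effect of each routine can be checked. */
static int bug_count, fatal_count;

static void bug_default(const char *fmt, va_list ap)
{
	bug_count++;
	fputs("bug: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
}

/* Stand-in for fake_fatal(): same message, reported as "fatal: ". */
static void fake_fatal_sketch(const char *fmt, va_list ap)
{
	fatal_count++;
	fputs("fatal: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
}

static report_fn bug_routine = bug_default;

static void set_bug_routine_sketch(report_fn fn) { bug_routine = fn; }
static report_fn get_bug_routine_sketch(void) { return bug_routine; }

static int bug_sketch(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	bug_routine(fmt, ap);
	va_end(ap);
	return -1;
}

/* Save the current routine, swap in another for one report, restore it. */
static void report_as_fatal(const char *cmd)
{
	report_fn old = get_bug_routine_sketch();

	set_bug_routine_sketch(fake_fatal_sketch);
	bug_sketch("cannot exec '%s'", cmd);
	set_bug_routine_sketch(old);
}
```

Restoring the saved pointer, rather than resetting to the default, keeps any routine a caller installed earlier in effect once the scope ends.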



* Re: [PATCH 1/4] usage.c: don't copy/paste the same comment three times
  2021-03-28  2:26 ` [PATCH 1/4] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:32   ` Eric Sunshine
  0 siblings, 0 replies; 245+ messages in thread
From: Eric Sunshine @ 2021-03-28  2:32 UTC
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King

On Sat, Mar 27, 2021 at 10:28 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> In gee4512ed481 (trace2: create new combined trace facility,

s/gee/ee/

> 2019-02-22) we started with two copies of this comment,
> 0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
> a third. Let's instead add an earlier comment that applies to all
> these mostly-the-same functions.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>


* [PATCH 0/5] fsck: improve error reporting
  2021-03-28  2:26 ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58   ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 1/5] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                       ` (6 more replies)
  2021-03-28  6:12   ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Junio C Hamano
  1 sibling, 7 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

This improves fsck error reporting in a rather obscure edge case, and
fixes up some object APIs along the way as needed.

This is based on my series to add a bug() function:
https://lore.kernel.org/git/cover-0.5-00000000000-20210328T022343Z-avarab@gmail.com/

The use of the new bug() function is in 5/5.

Ævar Arnfjörð Bjarmason (5):
  cache.h: move object functions to object-store.h
  fsck tests: refactor one test to use a sub-repo
  fsck: don't hard die on invalid object types
  fsck: improve the error on invalid object types
  fsck: improve error on loose object hash mismatch

 builtin/cat-file.c    |  7 +++--
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 28 ++++++++++++++---
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 +-
 cache.h               | 10 ------
 object-file.c         | 73 +++++++++++++++++++++++--------------------
 object-store.h        | 19 +++++++++--
 object.c              |  4 +--
 pack-check.c          |  3 +-
 streaming.c           |  5 ++-
 t/t1450-fsck.sh       | 64 +++++++++++++++++++++++++++----------
 12 files changed, 143 insertions(+), 77 deletions(-)

-- 
2.31.1.445.g91d8e479b0a



* [PATCH 1/5] cache.h: move object functions to object-store.h
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 2/5] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 6fda8091f11..0e387f84f67 100644
--- a/cache.h
+++ b/cache.h
@@ -1279,16 +1279,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb5..9117115a50c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.31.1.445.g91d8e479b0a



* [PATCH 2/5] fsck tests: refactor one test to use a sub-repo
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 1/5] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 3/5] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so that we don't leave corrupt content behind for
the next test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose and easier to debug, since a failing
test can have its state left behind under -d without damaging the
state for other tests.

But let's punt on that general refactoring and just change this one
test, since I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..1563b35f88c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	test_create_repo hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.31.1.445.g91d8e479b0a



* [PATCH 3/5] fsck: don't hard die on invalid object types
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 1/5] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 2/5] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 4/5] fsck: improve the error " Ævar Arnfjörð Bjarmason
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Change builtin/fsck.c to pass down the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag. This changes this very ungraceful
error:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>
    $ git fsck
    fatal: invalid object type
    $

Into:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).

But why are we complaining about a "hash mismatch" for an object of a
type we don't know about? We shouldn't. This is the bare minimum
change needed to not make fsck hard die on a repository that's been
corrupted in this manner. In subsequent commits we'll teach fsck to
recognize this particular type of corruption and emit a better error
message.

The parse_loose_header() function being changed here is only used in
builtin/fsck.c, see f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for its introduction.
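For reference, a loose object's inflated header is "<type> <size>" followed by a NUL. A minimal sketch of how an allow-unknown-type flag can turn an unrecognized type name from a hard failure into a tolerated result; parse_header_sketch(), the sketch_type enum and SKETCH_ALLOW_UNKNOWN_TYPE are hypothetical and only loosely modelled on parse_loose_header(), with size overflow checks omitted:

```c
#include <string.h>

#define SKETCH_ALLOW_UNKNOWN_TYPE 1 /* hypothetical flag bit */

enum sketch_type {
	SKETCH_BAD = -1,
	SKETCH_UNKNOWN = 0, /* name not recognized, but tolerated */
	SKETCH_COMMIT = 1,
	SKETCH_TREE = 2,
	SKETCH_BLOB = 3,
	SKETCH_TAG = 4
};

static int type_from_name(const char *name, size_t len)
{
	static const char *const names[] = { NULL, "commit", "tree", "blob", "tag" };
	size_t i;

	for (i = 1; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (int)i;
	return SKETCH_UNKNOWN;
}

/*
 * Parse "<type> <size>" (the header up to its terminating NUL). Returns
 * the type, or SKETCH_BAD for a malformed header or, unless
 * SKETCH_ALLOW_UNKNOWN_TYPE is set, an unrecognized type name.
 */
static int parse_header_sketch(const char *hdr, unsigned long *sizep,
			       unsigned int flags)
{
	const char *sp = strchr(hdr, ' ');
	unsigned long size = 0;
	int type;

	if (!sp || sp == hdr)
		return SKETCH_BAD;
	type = type_from_name(hdr, (size_t)(sp - hdr));
	if (type == SKETCH_UNKNOWN && !(flags & SKETCH_ALLOW_UNKNOWN_TYPE))
		return SKETCH_BAD;

	/* The size must be decimal digits with nothing after them. */
	hdr = sp + 1;
	if (*hdr < '0' || *hdr > '9')
		return SKETCH_BAD;
	for (; *hdr >= '0' && *hdr <= '9'; hdr++)
		size = size * 10 + (unsigned long)(*hdr - '0');
	if (*hdr)
		return SKETCH_BAD;

	*sizep = size;
	return type;
}
```

With the flag set, a header like "garbage 0" parses to SKETCH_UNKNOWN instead of failing, so a caller can keep going and report the oddity itself.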

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 24 ++++++++++--------------
 object-store.h  |  6 ++++--
 streaming.c     |  5 ++++-
 t/t1450-fsck.sh |  9 +++++++++
 5 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c70..d92c530863d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -602,7 +602,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 624af408cdc..26560a6281c 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1294,8 +1294,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1355,14 +1356,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1417,10 +1410,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2497,13 +2490,16 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2518,7 +2514,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, oi_flags);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50c..ab86c8bf32c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
@@ -480,7 +481,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 800f07a52cc..e5d4dd2f654 100644
--- a/streaming.c
+++ b/streaming.c
@@ -341,6 +341,9 @@ static struct stream_vtbl loose_vtbl = {
 
 static open_method_decl(loose)
 {
+	struct object_info oi2 = OBJECT_INFO_INIT;
+	oi2.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -349,7 +352,7 @@ static open_method_decl(loose)
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi2, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1563b35f88c..025dd1b491a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,13 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck error and recovery on invalid object type' '
+	test_create_repo garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
+'
+
 test_done
-- 
2.31.1.445.g91d8e479b0a


* [PATCH 4/5] fsck: improve the error on invalid object types
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
                       ` (2 preceding siblings ...)
  2021-03-28  2:58     ` [PATCH 3/5] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-03-28  8:56       ` Johannes Sixt
  2021-03-28  2:58     ` [PATCH 5/5] fsck: improve error on loose object hash mismatch Ævar Arnfjörð Bjarmason
                       ` (2 subsequent siblings)
  6 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall through from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.
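
For illustration, here is a minimal standalone sketch of the header parse this depends on. A loose object header is "<type> <size>" followed by a NUL. The names below (parse_header_sketch(), SKETCH_*) are hypothetical, and this is not git's parse_loose_header(). It only shows why an allow-unknown-type mode has to hand back the raw type name: without it, the caller can't report 'garbage' and falls through to a misleading error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type codes; "unknown" and "malformed" are kept distinct. */
enum sketch_type {
	SKETCH_MALFORMED = -2,
	SKETCH_UNKNOWN = -1,
	SKETCH_COMMIT = 1,
	SKETCH_TREE = 2,
	SKETCH_BLOB = 3,
	SKETCH_TAG = 4,
};

static int type_from_name(const char *name, size_t len)
{
	static const char *names[] = { NULL, "commit", "tree", "blob", "tag" };

	for (int i = 1; i < 5; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return i;
	return SKETCH_UNKNOWN;
}

/*
 * Parse "<type> <size>" from a NUL-terminated header. The raw type name
 * is always copied to name_buf so an unknown type can be reported by
 * name. Without allow_unknown, an unknown type counts as malformed.
 * (Overflow checks on the size are omitted for brevity.)
 */
static int parse_header_sketch(const char *hdr, char *name_buf, size_t name_sz,
			       unsigned long *sizep, int allow_unknown)
{
	const char *sp = strchr(hdr, ' ');
	unsigned long size = 0;
	size_t len;
	int type;

	assert(name_sz > 0);
	if (!sp)
		return SKETCH_MALFORMED;
	len = (size_t)(sp - hdr);
	if (len >= name_sz)
		return SKETCH_MALFORMED;
	memcpy(name_buf, hdr, len);
	name_buf[len] = '\0';

	type = type_from_name(hdr, len);
	if (type == SKETCH_UNKNOWN && !allow_unknown)
		return SKETCH_MALFORMED;

	/* The size must be one or more decimal digits running up to the NUL. */
	if (!isdigit((unsigned char)sp[1]))
		return SKETCH_MALFORMED;
	for (const char *p = sp + 1; *p; p++) {
		if (!isdigit((unsigned char)*p))
			return SKETCH_MALFORMED;
		size = size * 10 + (unsigned long)(*p - '0');
	}
	*sizep = size;
	return type;
}
```

In the patch itself the unknown-type case instead surfaces as a negative type that fsck_loose() checks after read_loose_object() returns.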

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/cat-file.c |  7 +++++--
 builtin/fsck.c     | 22 ++++++++++++++++++----
 object-file.c      | 31 +++++++++++++++----------------
 object-store.h     |  4 ++--
 t/t1450-fsck.sh    | 23 ++++++++++++++++++++++-
 5 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5ebf13359e8..1063576a982 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -74,6 +74,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct strbuf sb = STRBUF_INIT;
 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
 	const char *path = force_path;
+	int ret;
 
 	if (unknown_type)
 		flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
@@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	switch (opt) {
 	case 't':
 		oi.type_name = &sb;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		if (sb.len) {
 			printf("%s\n", sb.buf);
@@ -103,7 +105,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 	case 's':
 		oi.sizep = &size;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		printf("%"PRIuMAX"\n", (uintmax_t)size);
 		return 0;
diff --git a/builtin/fsck.c b/builtin/fsck.c
index d92c530863d..c8ab14d1545 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -601,12 +601,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 26560a6281c..e744a06637b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1323,9 +1323,7 @@ int parse_loose_header(const char *hdr,
 	 * we're obtaining the type using '--allow-unknown-type'
 	 * option.
 	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
+	if (type < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
 		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
@@ -1407,14 +1405,17 @@ static int loose_object_info(struct repository *r,
 	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
+	if (status < 0) {
 		; /* Do nothing */
-	else if (hdrbuf.len) {
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	} else {
+		status = parse_loose_header(hdr, oi, flags);
+		if (status < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
+			error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -2488,9 +2489,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2498,8 +2498,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2514,9 +2514,9 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, oi_flags);
-	if (*type < 0) {
-		error(_("unable to parse header of %s"), path);
+	*type = parse_loose_header(hdr, oi, oi_flags);
+	if (*type < 0 && !(oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
+		error(_("unable to parse header %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
@@ -2532,8 +2532,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index ab86c8bf32c..786c5c34704 100644
--- a/object-store.h
+++ b/object-store.h
@@ -241,11 +241,11 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  *
  * Returns 0 on success, negative on error (details may be written to stderr).
  */
+struct object_info;
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 025dd1b491a..214278e134a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	test_create_repo hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -868,7 +887,9 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+	grep error: err >err.errors &&
+	test_line_count = 1 err.errors &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.31.1.445.g91d8e479b0a


* [PATCH 5/5] fsck: improve error on loose object hash mismatch
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
                       ` (3 preceding siblings ...)
  2021-03-28  2:58     ` [PATCH 4/5] fsck: improve the error " Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    $ git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
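
The "hash-path mismatch" check rests on the fact that a loose object's path encodes its OID: the first two hex digits name the directory and the rest name the file. Below is a minimal standalone sketch of that comparison, assuming SHA-1 (40 hex digits). hex_from_loose_path() and is_hash_path_mismatch() are hypothetical helpers, not git functions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define HEXSZ 40 /* SHA-1; a SHA-256 repository would use 64 */

/*
 * Rebuild the OID implied by a loose object path such as
 * "./objects/e7/9de2...": "<2 hex>/<38 chars>" -> 40 chars.
 * Returns 0 on success, -1 if the path does not have that shape.
 * (The file part's digits are not validated here.)
 */
static int hex_from_loose_path(const char *path, char *hex, size_t hexsz)
{
	const char *slash = strrchr(path, '/');
	const char *dir;

	if (!slash || slash - path < 2 || hexsz < HEXSZ + 1)
		return -1;
	dir = slash - 2;
	if ((dir > path && dir[-1] != '/') ||
	    !isxdigit((unsigned char)dir[0]) || !isxdigit((unsigned char)dir[1]))
		return -1;
	if (strlen(slash + 1) != HEXSZ - 2)
		return -1;
	memcpy(hex, dir, 2);
	memcpy(hex + 2, slash + 1, HEXSZ - 2);
	hex[HEXSZ] = '\0';
	return 0;
}

/*
 * 1 if the hash computed from the object's contents differs from the
 * OID its location implies, 0 if they agree, -1 for an unparsable path.
 */
static int is_hash_path_mismatch(const char *real_hex, const char *path)
{
	char implied[HEXSZ + 1];

	assert(real_hex);
	if (hex_from_loose_path(path, implied, sizeof(implied)) < 0)
		return -1;
	return strcmp(real_hex, implied) != 0;
}
```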

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commit, we'd simply die early in those cases
until the preceding commits fixed the hard die on invalid object types:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or, worse,
misreport whatever issue we encountered. So let's use the new bug()
function to ferry an error up to fsck_loose() in that case.
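
The null_oid sentinel and the optional real_oidp out-parameter can be modeled in isolation. What follows is a minimal sketch under stated assumptions: hex strings stand in for struct object_id, a toy 64-bit FNV-1a hash stands in for the real object hash, and a NULL data pointer simulates failing before any hash is computed (e.g. an open_istream() failure). All names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_HEXSZ 16

/* Stand-in for the object hash: 64-bit FNV-1a, NOT what git uses. */
static uint64_t toy_hash(const char *data, size_t len)
{
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Modeled on the patched check_object_signature(): returns 0 if the hash
 * of data matches expected_hex, -1 otherwise. The computed hash goes to
 * real_out if given, else to a local temporary. data == NULL simulates
 * failing before hashing, which leaves real_out untouched.
 */
static int check_signature_sketch(const char *expected_hex,
				  const char *data, size_t len,
				  char *real_out)
{
	char tmp[TOY_HEXSZ + 1];
	char *real = real_out ? real_out : tmp;

	assert(expected_hex);
	if (!data)
		return -1;
	snprintf(real, TOY_HEXSZ + 1, "%016" PRIx64, toy_hash(data, len));
	return strcmp(expected_hex, real) ? -1 : 0;
}

/* All zeros, playing the role of null_oid. */
static const char null_hex[TOY_HEXSZ + 1] = "0000000000000000";

/* Classify a failed check using the null sentinel. */
static const char *classify(const char *expected_hex,
			    const char *data, size_t len)
{
	char real[TOY_HEXSZ + 1];

	memcpy(real, null_hex, sizeof(real));
	if (!check_signature_sketch(expected_hex, data, len, real))
		return "ok";
	if (!strcmp(real, null_hex))
		return "no hash computed"; /* the case reported via bug() */
	return "hash-path mismatch";
}
```

Without the sentinel both failures return the same -1, and the caller could not tell a genuine mismatch from a check that never got as far as hashing.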

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 28 +++++++++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1450-fsck.sh       |  8 +++++---
 9 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 85a76e0ef8b..bf0e266d83a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index c8ab14d1545..365b9124bdc 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -604,20 +604,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = null_oid;
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 21899687e2c..93044e9e618 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1420,7 +1420,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e4..cfecbcd664e 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -65,7 +65,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index e744a06637b..7aa80701aa7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -993,9 +993,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1003,8 +1005,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1029,9 +1031,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_fn(real_oid.hash, &c);
+	r->hash_algo->final_fn(real_oid->hash, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2489,6 +2491,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2532,9 +2535,16 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, &null_oid))
+				/*
+				 * Not a plain BUG() because if it
+				 * does happen we're in the middle of
+				 * an fsck we'd like to see to the
+				 * end.
+				 */
+				bug("BUG trying to compute hash for object at %s (expected %s)",
+				    path, oid_to_hex(expected_oid));
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index 786c5c34704..340b0f51f08 100644
--- a/object-store.h
+++ b/object-store.h
@@ -244,6 +244,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
 struct object_info;
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -484,7 +485,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 78343781ae7..1cb4b30acd7 100644
--- a/object.c
+++ b/object.c
@@ -262,7 +262,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -273,7 +273,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec0..e6aa4442c90 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 214278e134a..c7b084364b7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.31.1.445.g91d8e479b0a


* Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-28  2:26 ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
@ 2021-03-28  6:12   ` Junio C Hamano
  2021-03-28  7:17     ` Jeff King
  1 sibling, 1 reply; 245+ messages in thread
From: Junio C Hamano @ 2021-03-28  6:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff Hostetler, Jonathan Tan, Jeff King

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add a bug() function that works like error() except the message is
> prefixed with "bug:" instead of "error:".
>
> The reason this is needed is for e.g. the fsck code. If we encounter
> what we'd consider a BUG() in the middle of fsck traversal we'd still
> like to try as hard as possible to go past that object and complete
> the fsck, instead of hard dying. A follow-up commit will introduce
> such a use in object-file.c.

Reading the description above, i.e. "to go past that object", the
assumed use case seems to be to deal with a data error, not a
program bug (which is where we use BUG()---e.g. one helper function
in the fsck code detected that the caller wasn't careful enough to
vet the data it has and called it with incoherent data).  If we find
a tree entry whose mode bits implies that the object recorded in the
entry ought to be a blob, and later find out that the object turns
out to be a tree, that is a corrupt repository and the code that
detected is not buggy (and we shouldn't use BUG(), of course).

So, ... I am skeptical.  If the code is prepared to handle breakage,
we would not want to die, but then I am not sure why it has to be
different from error().
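
To make the distinction concrete, here is a minimal sketch of the three reporting styles as described in this thread. error() prints and returns -1. The proposed bug() differs only in its "bug: " prefix. BUG() aborts. The *_sketch names and the last_report buffer are hypothetical. The real functions in usage.c also handle trace2 and, for BUG(), the file and line, none of which is modeled here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Most recent report, kept so the behaviour is easy to inspect. */
static char last_report[256];

static void vreport(const char *prefix, const char *fmt, va_list ap)
{
	size_t n = strlen(prefix);

	assert(n < sizeof(last_report));
	memcpy(last_report, prefix, n);
	vsnprintf(last_report + n, sizeof(last_report) - n, fmt, ap);
	fprintf(stderr, "%s\n", last_report);
}

/* error(): report and return -1, so callers can "return error(...)". */
static int error_sketch(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vreport("error: ", fmt, ap);
	va_end(ap);
	return -1;
}

/* Proposed bug(): same as error() except for the prefix; never exits. */
static int bug_sketch(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vreport("bug: ", fmt, ap);
	va_end(ap);
	return -1;
}

/* BUG(): an internal invariant was violated; report, then abort(). */
void BUG_sketch(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vreport("BUG: ", fmt, ap);
	va_end(ap);
	abort();
}
```

From the caller's side bug_sketch() and error_sketch() are interchangeable; only the message text differs.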

* Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-28  6:12   ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Junio C Hamano
@ 2021-03-28  7:17     ` Jeff King
  2021-03-29 13:25       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 245+ messages in thread
From: Jeff King @ 2021-03-28  7:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff Hostetler,
	Jonathan Tan

On Sat, Mar 27, 2021 at 11:12:40PM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > Add a bug() function that works like error() except the message is
> > prefixed with "bug:" instead of "error:".
> >
> > The reason this is needed is for e.g. the fsck code. If we encounter
> > what we'd consider a BUG() in the middle of fsck traversal we'd still
> > like to try as hard as possible to go past that object and complete
> > the fsck, instead of hard dying. A follow-up commit will introduce
> > such a use in object-file.c.
> 
> Reading the description above, i.e. "to go past that object", the
> assumed use case seems to be to deal with a data error, not a
> program bug (which is where we use BUG()---e.g. one helper function
> in the fsck code detected that the caller wasn't careful enough to
> vet the data it has and called it with incoherent data).  If we find
> a tree entry whose mode bits implies that the object recorded in the
> entry ought to be a blob, and later find out that the object turns
> out to be a tree, that is a corrupt repository and the code that
> detected is not buggy (and we shouldn't use BUG(), of course).
> 
> So, ... I am skeptical.  If the code is prepared to handle breakage,
> we would not want to die, but then I am not sure why it has to be
> different from error().

Yeah, this seems like it is missing the point of BUG() completely.  I
took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
like it should really be an error() return or similar. The root cause
would be open_istream() on a loose object failing (which might be
corruption, or might even be a transient OS error!).

-Peff

* Re: [PATCH 4/5] fsck: improve the error on invalid object types
  2021-03-28  2:58     ` [PATCH 4/5] fsck: improve the error " Ævar Arnfjörð Bjarmason
@ 2021-03-28  8:56       ` Johannes Sixt
  0 siblings, 0 replies; 245+ messages in thread
From: Johannes Sixt @ 2021-03-28  8:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Jeff King

A note on the subject line: we are all pretty sure that you want to
improve the code, not disimprove it. Therefore, the word "improve"
carries no meaning and only takes away space that could be used better.
Perhaps

   fsck: report invalid types recorded in objects

or something. Ditto for 5/5.

-- Hannes

* Re: [PATCH 2/4] api docs: document BUG() in api-error-handling.txt
  2021-03-28  2:26 ` [PATCH 2/4] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
@ 2021-03-29  5:37   ` Bagas Sanjaya
  0 siblings, 0 replies; 245+ messages in thread
From: Bagas Sanjaya @ 2021-03-29  5:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King, git

On 28/03/21 09.26, Ævar Arnfjörð Bjarmason wrote:
> -`die`, `usage`, `error`, and `warning` report errors of various
> -kinds.
> +`BUG`, `die`, `usage`, `error`, and `warning` report errors of
> +various kinds.
> +
> +- `BUG` is for failed internal assertions that should never happen,
> +  i.e. a bug in git itself.
>   
>   - `die` is for fatal application errors.  It prints a message to
>     the user and exits with status 128.
> 
Documentation looks OK.

-- 
An old man doll... just what I always wanted! - Clara

* Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-28  7:17     ` Jeff King
@ 2021-03-29 13:25       ` Ævar Arnfjörð Bjarmason
  2021-03-31 11:06         ` Jeff King
  0 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-29 13:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Jeff Hostetler, Jonathan Tan


On Sun, Mar 28 2021, Jeff King wrote:

> On Sat, Mar 27, 2021 at 11:12:40PM -0700, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>> 
>> > Add a bug() function that works like error() except the message is
>> > prefixed with "bug:" instead of "error:".
>> >
>> > The reason this is needed is for e.g. the fsck code. If we encounter
>> > what we'd consider a BUG() in the middle of fsck traversal we'd still
>> > like to try as hard as possible to go past that object and complete
>> > the fsck, instead of hard dying. A follow-up commit will introduce
>> > such a use in object-file.c.
>> 
>> Reading the description above, i.e. "to go past that object", the
>> assumed use case seems to be to deal with a data error, not a
>> program bug (which is where we use BUG()---e.g. one helper function
>> in the fsck code detected that the caller wasn't careful enough to
>> vet the data it has and called it with incoherent data).  If we find
>> a tree entry whose mode bits implies that the object recorded in the
>> entry ought to be a blob, and later find out that the object turns
>> out to be a tree, that is a corrupt repository and the code that
>> detected is not buggy (and we shouldn't use BUG(), of course).
>> 
>> So, ... I am skeptical.  If the code is prepared to handle breakage,
>> we would not want to die, but then I am not sure why it has to be
>> different from error().
>
> Yeah, this seems like it is missing the point of BUG() completely.  I
> took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
> like it should really be an error() return or similar. The root cause
> would be open_istream() on a loose object failing (which might be
> corruption, or might even be a transient OS error!).

I don't feel strongly about this bug() thing; I'll drop it if you two
don't like it.

But that's not why I added it. Yes, you can carefully read the code
and reason that this code is currently unreachable, as I think it is.

But it may not stay that way. Refactoring how we handle I/O errors
etc. further down the stack is the sort of change that, without this
bug(), could cause us to silently lose the error. I.e. does
check_object_signature() always promise to return non-zero *only* if
the signature isn't OK?

So maybe we are happy to just make that promise, in which case, yes,
this should (or could) be an error().

But isn't this also useful for multi-threaded code? E.g. let's say fsck
learns to map-reduce its fsck-ing of objects across threads. One of them
encounters a BUG(). Do we want to hard kill the whole thing or try to
limp ahead and report partial results from the other thread(s)?

We have that now with pack-objects/grep, but I'm struggling to find a
use-case for a partial grep result if e.g. PCRE fails with a BUG(...)
...



* Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-29 13:25       ` Ævar Arnfjörð Bjarmason
@ 2021-03-31 11:06         ` Jeff King
  0 siblings, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-03-31 11:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Jeff Hostetler, Jonathan Tan

On Mon, Mar 29, 2021 at 03:25:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Yeah, this seems like it is missing the point of BUG() completely.  I
> > took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
> > like it should really be an error() return or similar. The root cause
> > would be open_istream() on a loose object failing (which might be
> > corruption, or might even be a transient OS error!).
> 
> I don't feel strongly about this bug() thing; I'll drop it if you two
> don't like it.
> 
> But that's not why I added it. Yes, you can carefully read the code
> and reason that this code is currently unreachable, as I think it is.
> 
> But it may not stay that way. Refactoring how we handle I/O errors
> etc. further down the stack is the sort of change that, without this
> bug(), could cause us to silently lose the error. I.e. does
> check_object_signature() always promise to return non-zero *only* if
> the signature isn't OK?
> 
> So maybe we are happy to just make that promise, in which case, yes,
> this should (or could) be an error().

I didn't dig into what check_object_signature() promises, but I don't
think it matters for my argument. If the case you are looking at can be
triggered by bad on-disk data, transient OS errors, etc, then it should
be an error() or a die(), or whatever is appropriate for the code. If it
is meant to be an invariant of the code that it should never trigger,
then it should be a BUG(), so that we loudly inform people that the
code's assumption has been violated.

But I do not see any point in a bug() that does not abort(). The point
of BUG() is that nobody is supposed to see it, and we should be as loud
as possible if we do.

And if there is a call site that is in doubt about what it may be fed,
then it should just be an error() or die().

> But isn't this also useful for multi-threaded code? E.g. let's say fsck
> learns to map-reduce its fsck-ing of objects across threads. One of them
> encounters a BUG(). Do we want to hard kill the whole thing or try to
> limp ahead and report partial results from the other thread(s)?

Yes, we want to hard kill. The point of BUG() is that it is not supposed
to happen, and there is no point in limping further.

-Peff
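The rule of thumb above (bad input from disk or the OS gets error() or die(); a violated code invariant gets a loud, aborting BUG()) can be sketched in standalone C. Everything here is hypothetical: `sketch_error()`, `sketch_BUG()`, `parse_mode_from_disk()` and `mode_is_dir()` are illustrative stand-ins, not git's API, and treating a zero mode as "never parsed" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error()-like helper: report, then return -1 to the caller. */
static int sketch_error(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fputs("error: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	va_end(ap);
	return -1;
}

/* Hypothetical BUG()-like helper: be as loud as possible, never limp on. */
static void sketch_BUG(const char *msg)
{
	fprintf(stderr, "BUG: %s\n", msg);
	abort();
}

/*
 * The string comes from disk, so a bad value is a data problem, not a
 * bug in our code: report it and let the caller decide how to continue.
 */
static int parse_mode_from_disk(const char *s, unsigned int *mode)
{
	char *end;
	unsigned long v = strtoul(s, &end, 8);

	if (end == s || *end || v == 0 || v > 0177777)
		return sketch_error("bad mode '%s'", s);
	*mode = (unsigned int)v;
	return 0;
}

/*
 * Callers promise to pass a mode that parse_mode_from_disk() accepted,
 * so it can never be 0 here; breaking that promise is a bug in the
 * calling code, not bad data, and is reported with sketch_BUG().
 */
static int mode_is_dir(unsigned int mode)
{
	if (!mode)
		sketch_BUG("mode_is_dir() called with an unparsed mode");
	return (mode & 0170000) == 0040000;
}
```

With this split, a corrupt entry costs one error line and a -1 that the caller can act on, while a caller that skips validation trips abort() immediately instead of limping on.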


* [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2021-03-28  2:26 ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:08 ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:08   ` [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
                     ` (3 more replies)
  4 siblings, 4 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Eric Sunshine, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

A trivial update to the trace2 docs to fix an omission
with "BUG()" not being listed alongside error(), die() etc.

v1 of this[1] added a non-fatal-but-logging bug() function; per the
discussion on v1, that's now gone.

1. http://lore.kernel.org/git/cover-0.6-00000000000-20210328T025618Z-avarab@gmail.com

Ævar Arnfjörð Bjarmason (3):
  usage.c: don't copy/paste the same comment three times
  api docs: document BUG() in api-error-handling.txt
  api docs: document that BUG() emits a trace2 error event

 Documentation/technical/api-error-handling.txt | 10 ++++++++--
 Documentation/technical/api-trace2.txt         |  2 +-
 usage.c                                        | 17 +++++------------
 3 files changed, 14 insertions(+), 15 deletions(-)

Range-diff against v1:
1:  a7b329c21cf ! 1:  2e4665b625b usage.c: don't copy/paste the same comment three times
    @@ Metadata
      ## Commit message ##
         usage.c: don't copy/paste the same comment three times
     
    -    In gee4512ed481 (trace2: create new combined trace facility,
    +    In ee4512ed481 (trace2: create new combined trace facility,
         2019-02-22) we started with two copies of this comment,
         0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
         a third. Let's instead add an earlier comment that applies to all
2:  8c8b1dfd184 = 2:  ce78c79c9ac api docs: document BUG() in api-error-handling.txt
3:  f0e0d0daa6e = 3:  982f72345f1 api docs: document that BUG() emits a trace2 error event
4:  515d146cac8 < -:  ----------- usage.c: add a non-fatal bug() function to go with BUG()
-- 
2.31.1.645.g989d83ea6a6



* [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:08   ` Ævar Arnfjörð Bjarmason
  2021-04-15 10:09     ` Jeff King
  2021-04-13  9:08   ` [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Eric Sunshine, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

In ee4512ed481 (trace2: create new combined trace facility,
2019-02-22) we started with two copies of this comment, and
0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
a third. Let's instead add a single comment above the first of these
mostly-the-same functions that applies to all of them.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 usage.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/usage.c b/usage.c
index 1b206de36d6..c7d233b0de9 100644
--- a/usage.c
+++ b/usage.c
@@ -55,12 +55,13 @@ static NORETURN void usage_builtin(const char *err, va_list params)
 	exit(129);
 }
 
+/*
+ * We call trace2_cmd_error_va() in the below functions first and
+ * expect it to va_copy 'params' before using it (because an 'ap' can
+ * only be walked once).
+ */
 static NORETURN void die_builtin(const char *err, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(err, params);
 
 	vreportf("fatal: ", err, params);
@@ -70,10 +71,6 @@ static NORETURN void die_builtin(const char *err, va_list params)
 
 static void error_builtin(const char *err, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(err, params);
 
 	vreportf("error: ", err, params);
@@ -81,10 +78,6 @@ static void error_builtin(const char *err, va_list params)
 
 static void warn_builtin(const char *warn, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(warn, params);
 
 	vreportf("warning: ", warn, params);
-- 
2.31.1.645.g989d83ea6a6
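The comment's point, that an 'ap' can only be walked once, so the trace2 hook must va_copy() it before use, can be shown with a small standalone sketch. The names `sketch_trace_error_va()`, `sketch_report()` and `last_trace` are hypothetical stand-ins for trace2_cmd_error_va() and the *_builtin() reporters, not git's code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trace sink: remembers the last formatted message. */
static char last_trace[256];

static void sketch_trace_error_va(const char *fmt, va_list ap)
{
	va_list copy;

	/* Walk a copy, leaving the caller's 'ap' unconsumed. */
	va_copy(copy, ap);
	vsnprintf(last_trace, sizeof(last_trace), fmt, copy);
	va_end(copy);
}

/* Like the reporters in usage.c: trace first, then format the same 'ap'. */
static int sketch_report(char *out, size_t outlen, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	sketch_trace_error_va(fmt, ap);
	n = vsnprintf(out, outlen, fmt, ap);
	va_end(ap);
	return n;
}
```

If the tracer formatted 'ap' directly instead of a copy, the caller's second vsnprintf() would be walking a list that had already been consumed, which is undefined behavior in C.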



* [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
  2021-04-13  9:08   ` [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:08   ` Ævar Arnfjörð Bjarmason
  2021-04-15 10:00     ` Jeff King
  2021-04-13  9:08   ` [PATCH v2 3/3] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
  2021-04-15 10:10   ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Jeff King
  3 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Eric Sunshine, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

When the BUG() function was added in d8193743e08 (usage.c: add BUG()
function, 2017-05-12), the docs added in 1f23cfe0ef5 (doc: document
error handling functions and conventions, 2014-12-03) were not
updated. Let's do that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/api-error-handling.txt | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index ceeedd485c9..71486abb2f0 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -1,8 +1,11 @@
 Error reporting in git
 ======================
 
-`die`, `usage`, `error`, and `warning` report errors of various
-kinds.
+`BUG`, `die`, `usage`, `error`, and `warning` report errors of
+various kinds.
+
+- `BUG` is for failed internal assertions that should never happen,
+  i.e. a bug in git itself.
 
 - `die` is for fatal application errors.  It prints a message to
   the user and exits with status 128.
-- 
2.31.1.645.g989d83ea6a6
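The documented list amounts to a small contract per reporting function. A hypothetical model of it follows; the enum, struct and `contract_for()` are illustrative only. The 128 status comes from the doc text, 129 from the `exit(129)` in usage_builtin(), and BUG() is assumed here to end the process via abort() rather than exit(), so it has no exit status of its own.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the reporting contract; not git's code. */
enum severity { SEV_BUG, SEV_DIE, SEV_USAGE, SEV_ERROR, SEV_WARNING };

struct report_contract {
	const char *prefix;	/* printed before the message */
	int terminates;		/* 1 if control never returns to the caller */
	int exit_status;	/* exit() status for exiting reporters, else 0 */
};

static struct report_contract contract_for(enum severity sev)
{
	struct report_contract c = { "", 0, 0 };

	switch (sev) {
	case SEV_BUG:		/* failed internal assertion; assumed to abort() */
		c = (struct report_contract){ "BUG: ", 1, 0 };
		break;
	case SEV_DIE:		/* fatal application error */
		c = (struct report_contract){ "fatal: ", 1, 128 };
		break;
	case SEV_USAGE:		/* bad command-line usage */
		c = (struct report_contract){ "usage: ", 1, 129 };
		break;
	case SEV_ERROR:		/* non-fatal; error() returns -1 to the caller */
		c = (struct report_contract){ "error: ", 0, 0 };
		break;
	case SEV_WARNING:	/* non-fatal; the caller carries on */
		c = (struct report_contract){ "warning: ", 0, 0 };
		break;
	}
	return c;
}
```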



* [PATCH v2 3/3] api docs: document that BUG() emits a trace2 error event
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
  2021-04-13  9:08   ` [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
  2021-04-13  9:08   ` [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:08   ` Ævar Arnfjörð Bjarmason
  2021-04-15 10:10   ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Jeff King
  3 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Eric Sunshine, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Correct documentation added in e544221d97a (trace2:
Documentation/technical/api-trace2.txt, 2019-02-22) to state that
calling BUG() also emits an "error" event. See ee4512ed481 (trace2:
create new combined trace facility, 2019-02-22) for the initial
implementation.

The BUG() function did not emit an event then, however; that only
changed later in 0a9dde4a04c (usage: trace2 BUG() invocations,
2021-02-05). That commit changed the code, but didn't update any of
the docs.

Let's also add a cross-reference from api-error-handling.txt.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/api-error-handling.txt | 3 +++
 Documentation/technical/api-trace2.txt         | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index 71486abb2f0..8be4f4d0d6a 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -23,6 +23,9 @@ various kinds.
   without running into too many problems.  Like `error`, it
   returns -1 after reporting the situation to the caller.
 
+These reports will be logged via the trace2 facility. See the "error"
+event in link:api-trace2.txt[trace2 API].
+
 Customizable error handlers
 ---------------------------
 
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index c65ffafc485..3f52f981a2d 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -465,7 +465,7 @@ completed.)
 ------------
 
 `"error"`::
-	This event is emitted when one of the `error()`, `die()`,
+	This event is emitted when one of the `BUG()`, `error()`, `die()`,
 	`warning()`, or `usage()` functions are called.
 +
 ------------
-- 
2.31.1.645.g989d83ea6a6
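The documentation change boils down to this: every reporter, BUG() included, emits the same "error" event before doing its own reporting. A hypothetical sketch of that shape follows. The `sketch_*` functions and the `error_events` counter stand in for git's reporters and for trace2_cmd_error_va(); no real trace2 API is used.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the trace2 "error" event sink. */
static int error_events;

static void sketch_trace_error(const char *msg)
{
	/* A real tracer would serialize an "error" event carrying msg. */
	(void)msg;
	error_events++;
}

/* Shared path: emit the trace event first, then print for the user. */
static void sketch_report(const char *prefix, const char *msg)
{
	sketch_trace_error(msg);
	fprintf(stderr, "%s%s\n", prefix, msg);
}

int sketch_error(const char *msg)
{
	sketch_report("error: ", msg);
	return -1;
}

void sketch_warning(const char *msg)
{
	sketch_report("warning: ", msg);
}

void sketch_die(const char *msg)
{
	sketch_report("fatal: ", msg);
	exit(128);
}

/* BUG() goes through the same path, so it too yields an "error" event. */
void sketch_BUG(const char *msg)
{
	sketch_report("BUG: ", msg);
	abort();
}
```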



* [PATCH v2 0/6] fsck: better "invalid object" error reporting
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
                       ` (4 preceding siblings ...)
  2021-03-28  2:58     ` [PATCH 5/5] fsck: improve error on loose object hash mismatch Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43     ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 1/6] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                         ` (5 more replies)
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  6 siblings, 6 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

A re-roll of improved error reporting for fsck-ing bad loose
objects. See [1] for v1.

This is no longer based on the series to add a bug() function since,
as noted in the re-roll of that series[2], that function is gone. This
version uses a plain BUG() for that condition.

Other than that, the only changes are improved commit messages, a
trivial new patch that moves read_loose_object() around in
object-store.h so that no forward declaration is needed, and an
updated comment for that function.

1. https://lore.kernel.org/git/cover-0.6-00000000000-20210328T025618Z-avarab@gmail.com/
2. https://lore.kernel.org/git/cover-0.3-00000000000-20210413T090603Z-avarab@gmail.com


Ævar Arnfjörð Bjarmason (6):
  cache.h: move object functions to object-store.h
  fsck tests: refactor one test to use a sub-repo
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/cat-file.c    |  7 +++--
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 28 +++++++++++++++---
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 +-
 cache.h               | 10 -------
 object-file.c         | 66 +++++++++++++++++++++----------------------
 object-store.h        | 39 ++++++++++++++++---------
 object.c              |  4 +--
 pack-check.c          |  3 +-
 streaming.c           |  5 +++-
 t/t1450-fsck.sh       | 64 ++++++++++++++++++++++++++++++-----------
 12 files changed, 146 insertions(+), 87 deletions(-)

Range-diff against v1:
1:  f8f00db8d31 = 1:  37c323a2410 cache.h: move object functions to object-store.h
2:  3e547289408 = 2:  5a2cd6cca9c fsck tests: refactor one test to use a sub-repo
3:  74654a01ba3 = 3:  d0d9cb33315 fsck: don't hard die on invalid object types
-:  ----------- > 4:  81fffefcf99 object-store.h: move read_loose_object() below 'struct object_info'
4:  d23fb5cd039 ! 5:  5fb6ac4faee fsck: improve the error on invalid object types
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck: improve the error on invalid object types
    +    fsck: report invalid types recorded in objects
     
         Continue the work in the preceding commit and improve the error on:
     
    @@ object-file.c: int read_loose_object(const char *path,
      			free(*contents);
     
      ## object-store.h ##
    -@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
    +@@ object-store.h: int oid_object_info_extended(struct repository *r,
    + 
    + /*
    +  * Open the loose object at path, check its hash, and return the contents,
    ++ * use the "oi" argument to assert things about the object, or e.g. populate its
    +  * type, and size. If the object is a blob, then "contents" may return NULL,
    +  * to allow streaming of large blobs.
       *
    -  * Returns 0 on success, negative on error (details may be written to stderr).
    +@@ object-store.h: int oid_object_info_extended(struct repository *r,
       */
    -+struct object_info;
      int read_loose_object(const char *path,
      		      const struct object_id *expected_oid,
     -		      enum object_type *type,
    @@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime
     +		      struct object_info *oi,
      		      unsigned int oi_flags);
      
    - /* Retry packed storage after checking packed and loose storage */
    + /*
     
      ## t/t1450-fsck.sh ##
     @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
5:  bcec536b0f6 ! 6:  226d2031bcf fsck: improve error on loose object hash mismatch
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    +    fsck: report invalid object type-path combinations
    +
         fsck: improve error on loose object hash mismatch
     
         Improve the error that's emitted in cases where we find a loose object
    @@ object-file.c: int read_loose_object(const char *path,
     -			      oid_to_hex(expected_oid));
     +					   *contents, *size, oi->type_name->buf, real_oid)) {
     +			if (oideq(real_oid, &null_oid))
    -+				/*
    -+				 * Not a plain BUG() because if it
    -+				 * does happen we're in the middle of
    -+				 * an fsck we'd like to see to the
    -+				 * end.
    -+				 */
    -+				bug("BUG trying to compute hash for object at %s (expected %s)",
    -+				    path, oid_to_hex(expected_oid));
    ++				BUG("should only get OID mismatch errors with mapped contents");
      			free(*contents);
      			goto out;
      		}
     
      ## object-store.h ##
    -@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
    - struct object_info;
    +@@ object-store.h: int oid_object_info_extended(struct repository *r,
    +  */
      int read_loose_object(const char *path,
      		      const struct object_id *expected_oid,
     +		      struct object_id *real_oid,
-- 
2.31.1.645.g989d83ea6a6



* [PATCH v2 1/6] cache.h: move object functions to object-store.h
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 2/6] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 148d9ab5f18..f2e7fc615ba 100644
--- a/cache.h
+++ b/cache.h
@@ -1279,16 +1279,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb5..9117115a50c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.31.1.645.g989d83ea6a6



* [PATCH v2 2/6] fsck tests: refactor one test to use a sub-repo
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 1/6] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 3/6] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose and makes things easier to debug, as a
failing test can have its state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test; I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..1563b35f88c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	test_create_repo hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.31.1.645.g989d83ea6a6



* [PATCH v2 3/6] fsck: don't hard die on invalid object types
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 1/6] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 2/6] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-23 14:26         ` Jeff King
  2021-04-13  9:43       ` [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                         ` (2 subsequent siblings)
  5 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Change builtin/fsck.c to pass down the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag. This changes this very ungraceful
error:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>
    $ git fsck
    fatal: invalid object type
    $

Into:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with a non-zero status, but now we'll finish the rest
of the traversal. The test that's being added here asserts that we'll
still complain about other fsck issues (e.g. an unrelated dangling blob).

But why are we complaining about a "hash mismatch" for an object of a
type we don't know about? We shouldn't. This is the bare minimum
change needed to keep fsck from hard dying on a repository that's been
corrupted in this manner. In subsequent commits we'll teach fsck to
recognize this particular type of corruption and emit a better error
message.

The read_loose_object() function being changed here is only used in
builtin/fsck.c; see f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for its introduction.
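For reference, a loose object's inflated header has the form "<type> <size>" followed by a NUL. The simplified, hypothetical parser below shows what an allow-unknown-type flag changes. Without the flag, an unrecognized type is a parse failure; with it, the raw type name is kept so the caller can report it. The names, the flag value, and returning 0 for an unknown type are assumptions of this sketch, not git's parse_loose_header(). The `sizep` field only mimics the `oi.sizep = size` out-parameter pattern in the diff.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ALLOW_UNKNOWN_TYPE 1	/* hypothetical flag */

struct sketch_info {
	unsigned long *sizep;	/* optional out-parameter for the size */
	char type_name[32];	/* raw type string, kept for unknown types */
};

/* Returns 1..4 for commit/tree/blob/tag (git's OBJ_* numbering), else -1. */
static int type_from_name(const char *s, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(s, names[i], len))
			return (int)i + 1;
	return -1;
}

/*
 * Parse a NUL-terminated "<type> <size>" header. Returns the type,
 * 0 for an unknown type when SKETCH_ALLOW_UNKNOWN_TYPE is set, or -1
 * if the header is malformed (or the type is unknown without the flag).
 */
static int sketch_parse_loose_header(const char *hdr,
				     struct sketch_info *info,
				     unsigned int flags)
{
	const char *sp = strchr(hdr, ' ');
	size_t type_len;
	unsigned long size;
	char *end;
	int type;

	if (!sp || sp == hdr)
		return -1;
	type_len = (size_t)(sp - hdr);
	type = type_from_name(hdr, type_len);
	if (type < 0) {
		if (!(flags & SKETCH_ALLOW_UNKNOWN_TYPE) ||
		    type_len >= sizeof(info->type_name))
			return -1;
		memcpy(info->type_name, hdr, type_len);
		info->type_name[type_len] = '\0';
		type = 0;
	}

	/* Require a leading digit; strtoul() alone would accept " -1". */
	if (!isdigit((unsigned char)sp[1]))
		return -1;
	size = strtoul(sp + 1, &end, 10);
	if (*end)
		return -1;
	if (info->sizep)
		*info->sizep = size;
	return type;
}
```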

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 24 ++++++++++--------------
 object-store.h  |  6 ++++--
 streaming.c     |  5 ++++-
 t/t1450-fsck.sh |  9 +++++++++
 5 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 70ff95837ae..686f7cdfea0 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,7 +600,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 624af408cdc..26560a6281c 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1294,8 +1294,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1355,14 +1356,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1417,10 +1410,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2497,13 +2490,16 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2518,7 +2514,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, oi_flags);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50c..ab86c8bf32c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
@@ -480,7 +481,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 800f07a52cc..e5d4dd2f654 100644
--- a/streaming.c
+++ b/streaming.c
@@ -341,6 +341,9 @@ static struct stream_vtbl loose_vtbl = {
 
 static open_method_decl(loose)
 {
+	struct object_info oi2 = OBJECT_INFO_INIT;
+	oi2.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -349,7 +352,7 @@ static open_method_decl(loose)
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi2, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1563b35f88c..025dd1b491a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,13 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck error and recovery on invalid object type' '
+	test_create_repo garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
+'
+
 test_done
-- 
2.31.1.645.g989d83ea6a6


* [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info'
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (2 preceding siblings ...)
  2021-04-13  9:43       ` [PATCH v2 3/6] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-23 14:27         ` Jeff King
  2021-04-13  9:43       ` [PATCH v2 5/6] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 6/6] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it. Moving it now avoids the need for a forward
declaration of the struct.
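
To illustrate the ordering issue, here is a minimal sketch with a hypothetical `struct info` standing in for `struct object_info`. Once the struct is defined, a prototype that takes a pointer to it needs no separate `struct info;` forward declaration:

```c
#include <stddef.h>

/* Hypothetical stand-in for git's "struct object_info". */
struct info {
	unsigned long *sizep;
};

/*
 * Declared below the struct definition, so no forward declaration
 * ("struct info;") is needed. Had this prototype come first, the
 * "struct info" in its parameter list would only be visible inside
 * the prototype itself, and compilers warn about that.
 */
int read_thing(const char *path, struct info *oi);

int read_thing(const char *path, struct info *oi)
{
	/* Toy behavior: report the length of the path as the "size". */
	unsigned long n = 0;

	while (path[n])
		n++;
	if (oi && oi->sizep)
		*oi->sizep = n;
	return 0;
}
```

For example, calling `read_thing("objects/e6/9d", &oi)` with `oi.sizep` pointing at a local variable stores 13 there.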

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index ab86c8bf32c..4680dc68ee4 100644
--- a/object-store.h
+++ b/object-store.h
@@ -234,20 +234,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -388,6 +374,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.31.1.645.g989d83ea6a6


* [PATCH v2 5/6] fsck: report invalid types recorded in objects
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (3 preceding siblings ...)
  2021-04-13  9:43       ` [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-23 14:37         ` Jeff King
  2021-04-13  9:43       ` [PATCH v2 6/6] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall through from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.
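
As a rough sketch of where the type name in the new message comes from: a loose object's inflated header has the form `<type> <size>` followed by a NUL byte, so an unrecognized name such as "garbage" can still be captured even though it maps to no known type. The names below (`toy_parse_header()`, `TOY_ALLOW_UNKNOWN_TYPE`, the `toy_type` enum) are hypothetical simplifications, not git's API. Unlike the patched parse_loose_header(), which still die()s when the flag is absent, this sketch just returns -1:

```c
#include <stddef.h>
#include <string.h>

/* Simplified mirror of git's object type numbering (OBJ_BAD is -1). */
enum toy_type {
	TOY_BAD = -1,
	TOY_COMMIT = 1,
	TOY_TREE = 2,
	TOY_BLOB = 3,
	TOY_TAG = 4
};

/* Hypothetical flag, in the spirit of OBJECT_INFO_ALLOW_UNKNOWN_TYPE. */
#define TOY_ALLOW_UNKNOWN_TYPE 1u

/*
 * Parse "<type> <size>". Always copies the type name into type_name so
 * the caller can report it. Returns 0 on success, -1 on a malformed
 * header or on an unknown type when TOY_ALLOW_UNKNOWN_TYPE is not set.
 */
int toy_parse_header(const char *hdr, int *typep, char *type_name,
		     size_t name_size, unsigned long *sizep, unsigned flags)
{
	static const char *const names[] = { NULL, "commit", "tree", "blob", "tag" };
	const char *sp = strchr(hdr, ' ');
	unsigned long size = 0;
	size_t len;
	int type = TOY_BAD;
	int i;

	/* The type name runs up to the first space and must fit in type_name. */
	if (!sp)
		return -1;
	len = (size_t)(sp - hdr);
	if (len >= name_size)
		return -1;
	memcpy(type_name, hdr, len);
	type_name[len] = '\0';

	for (i = TOY_COMMIT; i <= TOY_TAG; i++)
		if (!strcmp(type_name, names[i]))
			type = i;

	/* The size is a non-empty run of decimal digits ending the header. */
	hdr = sp + 1;
	if (*hdr < '0' || *hdr > '9')
		return -1;
	while (*hdr >= '0' && *hdr <= '9')
		size = size * 10 + (unsigned long)(*hdr++ - '0');
	if (*hdr)
		return -1;

	*sizep = size;
	*typep = type;
	/* Unknown type names are only acceptable if the caller opted in. */
	if (type == TOY_BAD && !(flags & TOY_ALLOW_UNKNOWN_TYPE))
		return -1;
	return 0;
}
```

With the flag set, `"garbage 0"` parses successfully with a type of `TOY_BAD` and a type name of "garbage", which is exactly what a caller needs to print "object is of unknown type 'garbage'".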

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/cat-file.c |  7 +++++--
 builtin/fsck.c     | 22 ++++++++++++++++++----
 object-file.c      | 31 +++++++++++++++----------------
 object-store.h     |  4 ++--
 t/t1450-fsck.sh    | 23 ++++++++++++++++++++++-
 5 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5ebf13359e8..1063576a982 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -74,6 +74,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct strbuf sb = STRBUF_INIT;
 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
 	const char *path = force_path;
+	int ret;
 
 	if (unknown_type)
 		flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
@@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	switch (opt) {
 	case 't':
 		oi.type_name = &sb;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		if (sb.len) {
 			printf("%s\n", sb.buf);
@@ -103,7 +105,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 	case 's':
 		oi.sizep = &size;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		printf("%"PRIuMAX"\n", (uintmax_t)size);
 		return 0;
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 686f7cdfea0..878191e53ca 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -599,12 +599,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 26560a6281c..e744a06637b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1323,9 +1323,7 @@ int parse_loose_header(const char *hdr,
 	 * we're obtaining the type using '--allow-unknown-type'
 	 * option.
 	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
+	if (type < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
 		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
@@ -1407,14 +1405,17 @@ static int loose_object_info(struct repository *r,
 	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
+	if (status < 0) {
 		; /* Do nothing */
-	else if (hdrbuf.len) {
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	} else {
+		status = parse_loose_header(hdr, oi, flags);
+		if (status < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
+			error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -2488,9 +2489,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2498,8 +2498,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2514,9 +2514,9 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, oi_flags);
-	if (*type < 0) {
-		error(_("unable to parse header of %s"), path);
+	*type = parse_loose_header(hdr, oi, oi_flags);
+	if (*type < 0 && !(oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
+		error(_("unable to parse header %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
@@ -2532,8 +2532,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index 4680dc68ee4..3d88b8a7cd3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 025dd1b491a..214278e134a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	test_create_repo hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -868,7 +887,9 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+	grep error: err >err.errors &&
+	test_line_count = 1 err.errors &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.31.1.645.g989d83ea6a6


* [PATCH v2 6/6] fsck: report invalid object type-path combinations
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (4 preceding siblings ...)
  2021-04-13  9:43       ` [PATCH v2 5/6] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  5 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

fsck: improve error on loose object hash mismatch

Improve the error that's emitted when we find and parse a loose
object that isn't at the location we expect it to be.

Before this change we'd prefix the error with something that wasn't an
OID at all, but was derived from the path at which the object was
found, due to an emergent behavior in how we'd end up with an "OID" in
these codepaths.

Now we'll instead say which object we hashed, and what path it was
found at. Before this patch series, e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    $ git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
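
The old not-an-OID comes from the path layout: a loose object lives at `objects/<first two hex digits>/<remaining hex digits>`, so moving `e6/` to `e7` makes the path imply `e79d[...]` while the contents still hash to `e69d[...]`. A minimal sketch, using a hypothetical `toy_path_to_hex()` helper, of recovering the name that a path implies:

```c
#include <string.h>

/*
 * Hypothetical helper: rebuild the hex object name implied by a loose
 * object path ".../<xx>/<rest>". Returns 0 on success, -1 if the path
 * doesn't end in a two-character directory plus a file name, or if
 * the result wouldn't fit in out.
 */
int toy_path_to_hex(const char *path, char *out, size_t out_size)
{
	const char *file = strrchr(path, '/');
	const char *dir;
	size_t rest;

	if (!file || file - path < 2)
		return -1;
	dir = file - 2; /* the two hex digits of the fan-out directory */
	if (dir != path && dir[-1] != '/')
		return -1;
	file++;
	rest = strlen(file);
	if (2 + rest + 1 > out_size)
		return -1;
	memcpy(out, dir, 2);
	memcpy(out + 2, file, rest + 1); /* includes the trailing NUL */
	return 0;
}
```

Applied to `./objects/e7/9de29bb2d1d6434b8b29ae775ad8c2e48c5391` this yields `e79de29...`, which is not the `e69de29...` that the contents hash to.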

Furthermore, we'll do the right thing when both the object type and
its location are bad, i.e. in this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits, we'd simply die early in those cases
until the preceding commits fixed the hard die on invalid object types:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
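
A minimal sketch of the sentinel idea, with hypothetical `toy_*` types in place of git's `struct object_id` and `null_oid`. The "real" OID starts out all-zero and is only overwritten once the contents have been hashed, so the caller can distinguish "hashed, but it didn't match" from "failed before hashing". This sketch checks the sentinel explicitly before comparing:

```c
#include <string.h>

/* Hypothetical simplified object ID: 20 raw bytes, all-zero is "null". */
struct toy_oid {
	unsigned char hash[20];
};

static const struct toy_oid toy_null_oid; /* zero-initialized sentinel */

enum toy_verdict { TOY_OK, TOY_HASH_PATH_MISMATCH, TOY_CORRUPT };

/*
 * Decide which error to report after a read. If "real" is still the
 * null sentinel, the reader never got as far as hashing, so we can
 * only report generic corruption.
 */
enum toy_verdict toy_classify(int read_failed,
			      const struct toy_oid *expected,
			      const struct toy_oid *real)
{
	if (!read_failed)
		return TOY_OK;
	if (memcmp(real->hash, toy_null_oid.hash, sizeof(real->hash)) &&
	    memcmp(real->hash, expected->hash, sizeof(real->hash)))
		return TOY_HASH_PATH_MISMATCH;
	return TOY_CORRUPT;
}
```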

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or worse,
misreport whatever issue we encountered. So in that case let's use
BUG() to assert that check_object_signature() did hand us back the
OID it computed, instead of leaving it as the null_oid sentinel.
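
The `real_oidp ? real_oidp : &tmp` change to check_object_signature() in the diff below is a common C idiom for an optional out-parameter: callers that don't care pass NULL, and the function writes into a local temporary instead. A minimal sketch of the same pattern, with a hypothetical `toy_check_signature()` and a byte sum standing in for the real hash:

```c
#include <stddef.h>

/* Hypothetical toy "hash": a byte sum standing in for a real digest. */
struct toy_id {
	unsigned long value;
};

/*
 * Compare the hash of buf against *expected. Callers that want the
 * computed value pass real_out; everyone else passes NULL, and the
 * result goes into a local temporary instead.
 */
int toy_check_signature(const struct toy_id *expected,
			const unsigned char *buf, size_t len,
			struct toy_id *real_out)
{
	struct toy_id tmp;
	struct toy_id *real = real_out ? real_out : &tmp;
	size_t i;

	real->value = 0;
	for (i = 0; i < len; i++)
		real->value += buf[i];
	return real->value == expected->value ? 0 : -1;
}
```

This keeps every existing caller a one-argument change (passing NULL), which is what the fast-export, index-pack, mktag, object.c and pack-check.c hunks do.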

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1450-fsck.sh       |  8 +++++---
 9 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 85a76e0ef8b..bf0e266d83a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 878191e53ca..7713d992960 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -602,20 +602,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = null_oid;
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 15507b5cff0..d5fd81ebf39 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1421,7 +1421,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index e744a06637b..ca54e76fda2 100644
--- a/object-file.c
+++ b/object-file.c
@@ -993,9 +993,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1003,8 +1005,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1029,9 +1031,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_fn(real_oid.hash, &c);
+	r->hash_algo->final_fn(real_oid->hash, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2489,6 +2491,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2532,9 +2535,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, &null_oid))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index 3d88b8a7cd3..c9c4d211de3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -484,7 +485,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 78343781ae7..1cb4b30acd7 100644
--- a/object.c
+++ b/object.c
@@ -262,7 +262,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -273,7 +273,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec0..e6aa4442c90 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 214278e134a..c7b084364b7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.31.1.645.g989d83ea6a6


* Re: [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt
  2021-04-13  9:08   ` [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
@ 2021-04-15 10:00     ` Jeff King
  0 siblings, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-04-15 10:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff Hostetler, Jonathan Tan, Eric Sunshine,
	Bagas Sanjaya

On Tue, Apr 13, 2021 at 11:08:20AM +0200, Ævar Arnfjörð Bjarmason wrote:

> When the BUG() function was added in d8193743e08 (usage.c: add BUG()
> function, 2017-05-12) these docs added in 1f23cfe0ef5 (doc: document
> error handling functions and conventions, 2014-12-03) were not
> updated. Let's do that.

Wow, I had no idea this file even existed (most of the time I looked at
technical/api-* the contents were along the lines of "somebody should
write this").

IMHO this is more evidence that this stuff should just go into header
files, where people are more likely to see and update it.

> diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
> index ceeedd485c9..71486abb2f0 100644
> --- a/Documentation/technical/api-error-handling.txt
> +++ b/Documentation/technical/api-error-handling.txt
> @@ -1,8 +1,11 @@
>  Error reporting in git
>  ======================
>  
> -`die`, `usage`, `error`, and `warning` report errors of various
> -kinds.
> +`BUG`, `die`, `usage`, `error`, and `warning` report errors of
> +various kinds.
> +
> +- `BUG` is for failed internal assertions that should never happen,
> +  i.e. a bug in git itself.

Your change looks obviously correct, of course.
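
As a rough illustration of that distinction, here is a hypothetical BUG()-style helper (`TOY_BUG()` and `toy_format_bug()` are illustrative names, not git's implementation). It prefixes the message with the source location and then aborts, which suits "can't happen" conditions rather than user-facing errors:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format "BUG: file:line: msg" into buf. Returns the length written,
 * or -1 if even the location prefix doesn't fit.
 */
int toy_format_bug(char *buf, size_t size, const char *file, int line,
		   const char *fmt, ...)
{
	va_list ap;
	int n = snprintf(buf, size, "BUG: %s:%d: ", file, line);

	if (n < 0 || (size_t)n >= size)
		return -1;
	va_start(ap, fmt);
	n += vsnprintf(buf + n, size - (size_t)n, fmt, ap);
	va_end(ap);
	return n;
}

/* Report the failed assertion with its location, then abort. */
#define TOY_BUG(...) do { \
	char msg_[256] = ""; \
	toy_format_bug(msg_, sizeof(msg_), __FILE__, __LINE__, __VA_ARGS__); \
	fputs(msg_, stderr); \
	fputc('\n', stderr); \
	abort(); \
} while (0)
```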

-Peff

* Re: [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times
  2021-04-13  9:08   ` [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
@ 2021-04-15 10:09     ` Jeff King
  0 siblings, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-04-15 10:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff Hostetler, Jonathan Tan, Eric Sunshine,
	Bagas Sanjaya

On Tue, Apr 13, 2021 at 11:08:19AM +0200, Ævar Arnfjörð Bjarmason wrote:

> In ee4512ed481 (trace2: create new combined trace facility,
> 2019-02-22) we started with two copies of this comment, and
> 0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
> a third. Let's instead add an earlier comment that applies to all
> these mostly-the-same functions.

I'm sometimes wary of aggregating comments like this, because
somebody who is just reading the third function may not think to look
further up to find the comment.

But this comment in particular does not seem dangerous if somebody
misses it (unlike comments that are warning people about bad things
happening).

> + */
>  static NORETURN void die_builtin(const char *err, va_list params)
>  {
> -	/*
> -	 * We call this trace2 function first and expect it to va_copy 'params'
> -	 * before using it (because an 'ap' can only be walked once).
> -	 */
>  	trace2_cmd_error_va(err, params);
>  
>  	vreportf("fatal: ", err, params);

TBH, I am not sure it adds all that much value in the first place. It is
only telling the reader that the code is not broken. But I kind of
wonder if it should simply be doing the defensive thing anyway:

  va_list cp;
  va_copy(cp, params);
  trace2_cmd_error_va(err, cp);
  va_end(cp);

We are relying on a subtle contract with the trace2 code, and there is
no compile-time check that it will be upheld (and indeed, on many
platforms we might not even notice if it isn't, depending on how va_list
is implemented). Looking at the trace2 code, we do not even enforce this
centrally. It is up to each target to remember to va_copy()!
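
The defensive pattern above can be sketched as a self-contained example. `toy_trace_va()` and `toy_report()` are hypothetical stand-ins for trace2_cmd_error_va() and die_builtin(). The tracer consumes a copy made with `va_copy()`, so the original `va_list` is still intact for formatting the real message:

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical trace consumer: it walks the va_list it is given, so
 * that list must not be reused after this returns.
 */
static int toy_trace_va(char *out, size_t size, const char *fmt, va_list ap)
{
	return vsnprintf(out, size, fmt, ap);
}

/*
 * Defensive caller: hand the tracer a copy, then format the real
 * message from the untouched original.
 */
int toy_report(char *trace_buf, char *msg_buf, size_t size,
	       const char *fmt, ...)
{
	va_list params, cp;
	int n;

	va_start(params, fmt);
	va_copy(cp, params);
	toy_trace_va(trace_buf, size, fmt, cp);
	va_end(cp);
	n = vsnprintf(msg_buf, size, fmt, params);
	va_end(params);
	return n;
}
```

With this shape the caller no longer depends on every trace2 target remembering to va_copy() internally.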

Anyway, that is somewhat outside the scope of your series (though I
would not be sad to see this comment go away entirely in favor of the
more defensive code above).

-Peff

* Re: [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
                     ` (2 preceding siblings ...)
  2021-04-13  9:08   ` [PATCH v2 3/3] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
@ 2021-04-15 10:10   ` Jeff King
  3 siblings, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-04-15 10:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff Hostetler, Jonathan Tan, Eric Sunshine,
	Bagas Sanjaya

On Tue, Apr 13, 2021 at 11:08:18AM +0200, Ævar Arnfjörð Bjarmason wrote:

> A trivial update to the trace2 docs to fix an omission
> with "BUG()" not being listed alongside error(), die() etc.
> 
> v1 of this[1] added a non-fatal-but-logging bug() function, per the
> discussion on v1 that's now gone.

All three look reasonable to me. I left a few comments, but you may want
to just ignore them. ;)

-Peff

* Re: [PATCH v2 3/6] fsck: don't hard die on invalid object types
  2021-04-13  9:43       ` [PATCH v2 3/6] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-04-23 14:26         ` Jeff King
  0 siblings, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-04-23 14:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Johannes Sixt

On Tue, Apr 13, 2021 at 11:43:06AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Change builtin/fsck.c to pass down a
> OBJECT_INFO_ALLOW_UNKNOWN_TYPE. This changes this very ungraceful
> error:
> 
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     <OID>
>     $ git fsck
>     fatal: invalid object type
>     $
> 
> Into:
> 
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ the rest of the fsck output here, i.e. it didn't hard die ]
> 
> We'll still exit with non-zero, but now we'll finish the rest of the
> traversal. The test that's being added here asserts that we'll still
> complain about other fsck issues (e.g. an unrelated dangling blob).
> 
> But why are we complaining about a "hash mismatch" for an object of a
> type we don't know about? We shouldn't. This is the bare minimal
> change needed to not make fsck hard die on a repository that's been
> corrupted in this manner. In subsequent commits we'll teach fsck to
> recognize this particular type of corruption and emit a better error
> message.

OK. The overall goal makes sense.
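
The "report it and keep going" behavior can be sketched as below. `struct toy_object`, `check_all()` and the `ERROR_OBJECT` value are hypothetical; builtin/fsck.c accumulates an `errors_found` bitmask in a similar spirit, but this is not its code.

```c
#include <stddef.h>
#include <stdio.h>

#define ERROR_OBJECT 01 /* hypothetical error-class bit */

struct toy_object {
	const char *name;
	int corrupt; /* stand-in for "reading this object failed" */
};

/*
 * Visit every object; on a bad one, report it, remember that we saw
 * an error, and carry on. The caller turns the bitmask into an exit
 * code once the whole traversal is done.
 */
static int check_all(const struct toy_object *objs, size_t nr,
		     size_t *nr_bad)
{
	int errors_found = 0;
	size_t i;

	*nr_bad = 0;
	for (i = 0; i < nr; i++) {
		if (!objs[i].corrupt)
			continue;
		fprintf(stderr, "error: %s: object corrupt or missing\n",
			objs[i].name);
		errors_found |= ERROR_OBJECT; /* no die(): keep traversing */
		(*nr_bad)++;
	}
	return errors_found;
}
```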

> The parse_loose_header() function being changed here is only used in
> builtin/fsck.c, see f6371f92104 (sha1_file: add read_loose_object()
> function, 2017-01-13) for its introduction.

This left me scratching my head for a long time. Did you mean
read_loose_object() in the beginning of the sentence?

> -static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
> -				       unsigned int flags)
> +int parse_loose_header(const char *hdr,
> +		       struct object_info *oi,
> +		       unsigned int flags)

So we are getting rid of the "extended" form and just making the
non-extended way take an OI. That seems kind of orthogonal...

> --- a/streaming.c
> +++ b/streaming.c
> @@ -341,6 +341,9 @@ static struct stream_vtbl loose_vtbl = {
>  
>  static open_method_decl(loose)
>  {
> +	struct object_info oi2 = OBJECT_INFO_INIT;
> +	oi2.sizep = &st->size;
> +

...and is what leads us to having to touch this otherwise unrelated
function.

I don't mind _too_ much getting rid of a helper function that would have
only one caller remaining (though "oi2" is a bit mysterious here). But
it seems like the patch would have been a lot easier to understand if
that were separately done (and explained). AFAICT the functional change
here is just passing the flag to read_loose_object(), which could be
calling parse_loose_header_extended(). I guess that would have to become
public, but that seems reasonable.
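
The object_info-style interface the patch moves to can be sketched like this: the caller points only the fields it wants at its own storage (as with `oi2.sizep = &st->size` above) and the parser fills just those. `struct header_info` and `parse_header()` are hypothetical simplifications, not Git's structs; the sketch assumes the header text is "<type> <size>" and already NUL-terminated.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified analogue of struct object_info. */
struct header_info {
	unsigned long *sizep; /* filled if non-NULL */
	char *type_name;      /* filled if non-NULL... */
	size_t type_name_len; /* ...up to this many bytes, incl. NUL */
};

#define HEADER_INFO_INIT { NULL, NULL, 0 }

/*
 * Parse a "<type> <size>" header. Only the out-pointers the caller
 * set up are written. Returns 0 on success, -1 if malformed.
 */
static int parse_header(const char *hdr, struct header_info *hi)
{
	const char *sp = strchr(hdr, ' ');
	char *end;
	unsigned long size;

	if (!sp || sp == hdr || !isdigit((unsigned char)sp[1]))
		return -1;
	size = strtoul(sp + 1, &end, 10);
	if (*end != '\0')
		return -1;

	if (hi->sizep)
		*hi->sizep = size;
	if (hi->type_name && hi->type_name_len) {
		size_t len = (size_t)(sp - hdr);

		if (len >= hi->type_name_len)
			len = hi->type_name_len - 1; /* truncate */
		memcpy(hi->type_name, hdr, len);
		hi->type_name[len] = '\0';
	}
	return 0;
}
```

A streaming caller that only needs the size would set `hi.sizep` and leave `type_name` NULL; fsck-like callers can also ask for the type name.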

-Peff

* Re: [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info'
  2021-04-13  9:43       ` [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-04-23 14:27         ` Jeff King
  0 siblings, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-04-23 14:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Johannes Sixt

On Tue, Apr 13, 2021 at 11:43:07AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Move the definition of read_loose_object() below "struct
> object_info". In the next commit we'll add a "struct object_info *"
> parameter to it, moving it will avoid a forward declaration of the
> struct.

This is a declaration, not a definition, no?

Not a huge deal, I just expected to see the function body moving when I
read the patch, but didn't.
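
For anyone skimming, the distinction in a toy example (hypothetical `struct info` and `read_thing()`, not the real object-store.h contents): moving a prototype only moves a declaration; the definition with the body stays where it is.

```c
#include <string.h>

/* Simplified, hypothetical stand-in for struct object_info. */
struct info {
	unsigned long *sizep;
};

/*
 * Declaration (the kind of thing a header holds): a prototype with no
 * body. Because it comes after 'struct info' is defined, no
 * "struct info;" forward declaration is needed for its parameter.
 */
int read_thing(const char *path, struct info *oi);

/* Definition: the same function, now with a body. */
int read_thing(const char *path, struct info *oi)
{
	if (!path)
		return -1;
	if (oi->sizep)
		*oi->sizep = strlen(path); /* toy "size" for illustration */
	return 0;
}
```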

-Peff

* Re: [PATCH v2 5/6] fsck: report invalid types recorded in objects
  2021-04-13  9:43       ` [PATCH v2 5/6] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-04-23 14:37         ` Jeff King
  2021-04-26 14:28           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 245+ messages in thread
From: Jeff King @ 2021-04-23 14:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Johannes Sixt

On Tue, Apr 13, 2021 at 11:43:08AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Continue the work in the preceding commit and improve the error on:
> 
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ other fsck output ]
> 
> To instead emit:
> 
>     $ git fsck
>     error: <OID>: object is of unknown type 'garbage': <OID_PATH>
>     [ other fsck output ]

Sounds good.
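
Roughly the shape of that check, as a hypothetical sketch: the toy type table and `describe_type_problem()` are illustrative only (Git's real enum object_type values differ), but the message format follows the one quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical type enum; only the four real object types are known. */
enum toy_type { TOY_BAD = -1, TOY_BLOB, TOY_TREE, TOY_COMMIT, TOY_TAG };

static enum toy_type toy_type_from_name(const char *name)
{
	static const char *names[] = { "blob", "tree", "commit", "tag" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(*names); i++)
		if (!strcmp(name, names[i]))
			return (enum toy_type)i;
	return TOY_BAD;
}

/*
 * Name the unknown type in the error instead of falling through to a
 * generic "hash mismatch"/"corrupt or missing" message.
 * Returns 1 if a message was written, 0 if the type is known.
 */
static int describe_type_problem(const char *oid, const char *type_name,
				 const char *path, char *out, size_t outlen)
{
	if (toy_type_from_name(type_name) != TOY_BAD)
		return 0;
	snprintf(out, outlen, "%s: object is of unknown type '%s': %s",
		 oid, type_name, path);
	return 1;
}
```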

> @@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>  	switch (opt) {
>  	case 't':
>  		oi.type_name = &sb;
> -		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
> +		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
> +		if (!unknown_type && ret < 0)
>  			die("git cat-file: could not get object info");
>  		if (sb.len) {
>  			printf("%s\n", sb.buf);

Surprised to see changes to cat-file here, since the commit message is
all about fsck. Did the semantics of oid_object_info_extended() change?
I.e., this hunk implies to me that it is now returning -1 when we said
unknown types were OK, and we got one. But in that case, how do we
distinguish that from a real error?

Or more concretely, this patch causes this:

  $ git cat-file -t 1234567890123456789012345678901234567890
  fatal: git cat-file: could not get object info

  $ git.compile cat-file --allow-unknown-type -t 1234567890123456789012345678901234567890
  fatal: git cat-file 1234567890123456789012345678901234567890: bad file

Or much worse, from the next hunk:

  $ git cat-file -s 1234567890123456789012345678901234567890
  fatal: git cat-file: could not get object info

  $ git cat-file --allow-unknown-type -s 1234567890123456789012345678901234567890
  140732113568960

That seems wrong (so I think my "this hunk implies" is not true, but
then I am left with: what is the point of this hunk?).

-Peff

* Re: [PATCH v2 5/6] fsck: report invalid types recorded in objects
  2021-04-23 14:37         ` Jeff King
@ 2021-04-26 14:28           ` Ævar Arnfjörð Bjarmason
  2021-04-26 15:45             ` Jeff King
  0 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-26 14:28 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Johannes Sixt


On Fri, Apr 23 2021, Jeff King wrote:

> On Tue, Apr 13, 2021 at 11:43:08AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Continue the work in the preceding commit and improve the error on:
>> 
>>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>>     $ git fsck
>>     error: hash mismatch for <OID_PATH> (expected <OID>)
>>     error: <OID>: object corrupt or missing: <OID_PATH>
>>     [ other fsck output ]
>> 
>> To instead emit:
>> 
>>     $ git fsck
>>     error: <OID>: object is of unknown type 'garbage': <OID_PATH>
>>     [ other fsck output ]
>
> Sounds good.
>
>> @@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>>  	switch (opt) {
>>  	case 't':
>>  		oi.type_name = &sb;
>> -		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
>> +		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
>> +		if (!unknown_type && ret < 0)
>>  			die("git cat-file: could not get object info");
>>  		if (sb.len) {
>>  			printf("%s\n", sb.buf);
>
> Surprised to see changes to cat-file here, since the commit message is
> all about fsck. Did the semantics of oid_object_info_extended() change?
> I.e., this hunk implies to me that it is now returning -1 when we said
> unknown types were OK, and we got one. But in that case, how do we
> distinguish that from a real error?
>
> Or more concretely, this patch causes this:
>
>   $ git cat-file -t 1234567890123456789012345678901234567890
>   fatal: git cat-file: could not get object info
>
>   $ git.compile cat-file --allow-unknown-type -t 1234567890123456789012345678901234567890
>   fatal: git cat-file 1234567890123456789012345678901234567890: bad file
>
> Or much worse, from the next hunk:
>
>   $ git cat-file -s 1234567890123456789012345678901234567890
>   fatal: git cat-file: could not get object info
>
>   $ git cat-file --allow-unknown-type -s 1234567890123456789012345678901234567890
>   140732113568960
>
> That seems wrong (so I think my "this hunk implies" is not true, but
> then I am left with: what is the point of this hunk?).

That's very well spotted.

I started re-rolling this today but ran out of time. For what it's worth
the combination of this and 6/6 "makes sense" in the sense that all
tests pass at the end of this series.

But the cases you're pointing out are ones we don't have tests for,
i.e. the combination of "allow unknown" and a non-existing object, as
opposed to a garbage one.

Hence the bug with passing up an invalid (uninitialized) size in that
case. It's fallout from other partial lib-ification changes of these
APIs, i.e. making them return bad values upstream instead of dying right
away.

I'll sort that out in some sensible way. Starting with adding meaningful
test coverage for the existing behavior.


* Re: [PATCH v2 5/6] fsck: report invalid types recorded in objects
  2021-04-26 14:28           ` Ævar Arnfjörð Bjarmason
@ 2021-04-26 15:45             ` Jeff King
  0 siblings, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-04-26 15:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Johannes Sixt

On Mon, Apr 26, 2021 at 04:28:30PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> @@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
> >>  	switch (opt) {
> >>  	case 't':
> >>  		oi.type_name = &sb;
> >> -		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
> >> +		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
> >> +		if (!unknown_type && ret < 0)
> >>  			die("git cat-file: could not get object info");
> >>  		if (sb.len) {
> >>  			printf("%s\n", sb.buf);
> >
> > Surprised to see changes to cat-file here, since the commit message is
> > all about fsck. Did the semantics of oid_object_info_extended() change?
> > I.e., this hunk implies to me that it is now returning -1 when we said
> > unknown types were OK, and we got one. But in that case, how do we
> > distinguish that from a real error?
> >
> > Or more concretely, this patch causes this:
> >
> >   $ git cat-file -t 1234567890123456789012345678901234567890
> >   fatal: git cat-file: could not get object info
> >
> >   $ git.compile cat-file --allow-unknown-type -t 1234567890123456789012345678901234567890
> >   fatal: git cat-file 1234567890123456789012345678901234567890: bad file
> >
> > Or much worse, from the next hunk:
> >
> >   $ git cat-file -s 1234567890123456789012345678901234567890
> >   fatal: git cat-file: could not get object info
> >
> >   $ git cat-file --allow-unknown-type -s 1234567890123456789012345678901234567890
> >   140732113568960
> >
> > That seems wrong (so I think my "this hunk implies" is not true, but
> > then I am left with: what is the point of this hunk?).
> 
> That's very well spotted.
> 
> I started re-rolling this today but ran out of time. For what it's worth
> the combination of this and 6/6 "makes sense" in the sense that all
> tests pass at the end of this series.
> 
> But the cases you're pointing out are ones we don't have tests for,
> i.e. the combination of "allow unknown" and a non-existing object, as
> opposed to a garbage one.
> 
> Hence the bug with passing up an invalid (uninitialized) size in that
> case. It's fallout from other partial lib-ification changes of these
> APIs, i.e. making them return bad values upstream instead of dying right
> away.

I'm not sure I understand. The problem seems to be solely in the hunk above.
Before, if we got an error from oid_object_info_extended(), we stopped
immediately. But after, we look at the results even though it told us
there was an error. In general, I would think that a "-1" return value
from oid_object_info_extended() is "all bets are off" (remember that
unlike oid_object_info(), this is a strict error return, and not trying
to force the object type into the return value).

And that's independent of what the other patches in the series are
doing, I think.
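
To spell out that contract: a sketch with a hypothetical `get_size()` that, like oid_object_info_extended() as described above, makes no promise about its out-parameters when it returns -1. `print_size()` is a hypothetical "-s" handler, not cat-file's code.

```c
#include <stdio.h>

/* Hypothetical lookup: on failure it returns -1 and leaves *sizep alone. */
static int get_size(int exists, unsigned long *sizep)
{
	if (!exists)
		return -1;
	*sizep = 42;
	return 0;
}

/*
 * The error check comes first and is not gated on any
 * allow-unknown style option, so we never print a size that the
 * lookup did not actually produce.
 */
static int print_size(int exists, char *out, size_t outlen)
{
	unsigned long size; /* only read after a successful lookup */

	if (get_size(exists, &size) < 0) {
		snprintf(out, outlen, "fatal: could not get object info");
		return -1;
	}
	snprintf(out, outlen, "%lu", size);
	return 0;
}
```

Gating the check on something like `!unknown_type` would let the missing-object case fall through to printing `size` uninitialized, which matches the garbage number in the example above.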

> I'll sort that out in some sensible way. Starting with adding meaningful
> test coverage for the existing behavior.

Yeah, that sounds fine. I think the current behavior there is perfectly
reasonable (fail with "could not get object info").

-Peff

* [PATCH v3 00/17] fsck: better "invalid object" error reporting
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
                       ` (5 preceding siblings ...)
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22     ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 01/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                         ` (18 more replies)
  6 siblings, 19 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

A re-roll of v2's 6-patch series[1], which has now grown to 17. That's
less scary than one might think, though: it's mostly added test coverage
plus splitting existing commits into more incremental chunks (see the
range-diff). This should address all the feedback on v2 and more.

A brief recap of what this is about: we now gracefully recover
instead of dying when fsck encounters types that aren't
blob/commit/tree/tag. Those types don't exist in the wild, but you can
manually create them with "git hash-object -t garbage --literally".

So in some sense this matters to nobody, but I'm doing this as part of
general changes I've been pushing to make fsck/gc error reporting more
graceful, and errors more recoverable. We now have a few more places
in object-file.c where we don't just die(), but properly return
API-like return codes/data to the caller instead.
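
A sketch of that "return, don't die()" shape, with hypothetical names: `unpack_header()` and its -1/-2 split are loosely modeled on the "return -2 on 'header too long'" patch in this series, not copied from object-file.c. The helper reports; the caller decides whether that is fatal.

```c
#include <string.h>

enum unpack_status {
	UNPACK_OK = 0,
	UNPACK_ERROR = -1,    /* generic failure, e.g. truncated input */
	UNPACK_TOO_LONG = -2, /* header did not fit in the buffer */
};

/*
 * Copy a NUL-terminated header from 'src' into 'buf'. Never dies:
 * every failure is handed back to the caller as a status code.
 */
static int unpack_header(const char *src, size_t srclen,
			 char *buf, size_t buflen)
{
	const char *nul = memchr(src, '\0', srclen);
	size_t len;

	if (!nul)
		return srclen >= buflen ? UNPACK_TOO_LONG : UNPACK_ERROR;
	len = (size_t)(nul - src);
	if (len >= buflen)
		return UNPACK_TOO_LONG;
	memcpy(buf, src, len + 1); /* include the NUL */
	return UNPACK_OK;
}
```

An fsck-style caller can turn UNPACK_TOO_LONG into a specific error message and move on, while a plumbing command may still choose to die().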

This does not contain any changes to how cat-file's --allow-unknown-type,
hash-object's --literally etc. work, as I suggested we could do in
[2]. Any such changes will need the API changes here, but these are
just the narrow fsck fixes.

1. https://lore.kernel.org/git/cover-0.6-00000000000-20210413T093734Z-avarab@gmail.com
2. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason (17):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  cache.h: move object functions to object-store.h
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  object-file.c: return -1, not "status" from unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 156 ++++++++++++++++++-------------------
 object-store.h         |  60 +++++++++++----
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  10 ++-
 t/t1006-cat-file.sh    | 169 +++++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++-----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 387 insertions(+), 135 deletions(-)

Range-diff against v2:
 2:  5a2cd6cca9 =  1:  aa38b2bf9e fsck tests: refactor one test to use a sub-repo
 -:  ---------- >  2:  82b64abd25 fsck tests: add test for fsck-ing an unknown type
 -:  ---------- >  3:  7c3c2fe25d cat-file tests: test for missing object with -t and -s
 -:  ---------- >  4:  871b820003 cat-file tests: test that --allow-unknown-type isn't on by default
 -:  ---------- >  5:  b98da9cc89 rev-list tests: test for behavior with invalid object types
 -:  ---------- >  6:  04cc1d20f6 cat-file tests: add corrupt loose object test
 -:  ---------- >  7:  9217320888 cat-file tests: test for current --allow-unknown-type behavior
 1:  37c323a241 =  8:  12dd453879 cache.h: move object functions to object-store.h
 3:  d0d9cb3331 !  9:  6a5b78dcad fsck: don't hard die on invalid object types
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck: don't hard die on invalid object types
    +    object-file.c: make parse_loose_header_extended() public
     
    -    Change builtin/fsck.c to pass down a
    -    OBJECT_INFO_ALLOW_UNKNOWN_TYPE. This changes this very ungraceful
    -    error:
    +    Make the parse_loose_header_extended() function public and remove the
    +    parse_loose_header() wrapper. The only direct user of it outside of
    +    object-file.c itself was in streaming.c, that caller can simply pass
    +    the required "struct object-info *" instead.
     
    -        $ git hash-object --stdin -w -t garbage --literally </dev/null
    -        <OID>
    -        $ git fsck
    -        fatal: invalid object type
    -        $
    -
    -    Into:
    -
    -        $ git fsck
    -        error: hash mismatch for <OID_PATH> (expected <OID>)
    -        error: <OID>: object corrupt or missing: <OID_PATH>
    -        [ the rest of the fsck output here, i.e. it didn't hard die ]
    -
    -    We'll still exit with non-zero, but now we'll finish the rest of the
    -    traversal. The tests that's being added here asserts that we'll still
    -    complain about other fsck issues (e.g. an unrelated dangling blob).
    -
    -    But why are we complaining about a "hash mismatch" for an object of a
    -    type we don't know about? We shouldn't. This is the bare minimal
    -    change needed to not make fsck hard die on a repository that's been
    -    corrupted in this manner. In subsequent commits we'll teach fsck to
    -    recognize this particular type of corruption and emit a better error
    -    message.
    -
    -    The parse_loose_header() function being changed here is only used in
    -    builtin/fsck.c, see f6371f92104 (sha1_file: add read_loose_object()
    -    function, 2017-01-13) for its introduction.
    +    This change is being done in preparation for teaching
    +    read_loose_object() to accept a flag to pass to
    +    parse_loose_header(). It isn't strictly necessary for that change, we
    +    could simply use parse_loose_header_extended() there, but will leave
    +    the API in a better end state.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## builtin/fsck.c ##
    -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    - 	void *contents;
    - 	int eaten;
    - 
    --	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
    -+	if (read_loose_object(path, oid, &type, &size, &contents,
    -+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
    - 		errors_found |= ERROR_OBJECT;
    - 		error(_("%s: object corrupt or missing: %s"),
    - 		      oid_to_hex(oid), path);
    -
      ## object-file.c ##
     @@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
       * too permissive for what we want to check. So do an anal
    @@ object-file.c: static int loose_object_info(struct repository *r,
      
      	if (status >= 0 && oi->contentp) {
     @@ object-file.c: int read_loose_object(const char *path,
    - 		      const struct object_id *expected_oid,
    - 		      enum object_type *type,
    - 		      unsigned long *size,
    --		      void **contents)
    -+		      void **contents,
    -+		      unsigned int oi_flags)
    - {
    - 	int ret = -1;
    - 	void *map = NULL;
      	unsigned long mapsize;
      	git_zstream stream;
      	char hdr[MAX_HEADER_LEN];
    @@ object-file.c: int read_loose_object(const char *path,
      	}
      
     -	*type = parse_loose_header(hdr, size);
    -+	*type = parse_loose_header(hdr, &oi, oi_flags);
    ++	*type = parse_loose_header(hdr, &oi, 0);
      	if (*type < 0) {
      		error(_("unable to parse header of %s"), path);
      		git_inflate_end(&stream);
     
      ## object-store.h ##
    -@@ object-store.h: int read_loose_object(const char *path,
    - 		      const struct object_id *expected_oid,
    - 		      enum object_type *type,
    - 		      unsigned long *size,
    --		      void **contents);
    -+		      void **contents,
    -+		      unsigned int oi_flags);
    - 
    - /* Retry packed storage after checking packed and loose storage */
    - #define HAS_OBJECT_RECHECK_PACKED 1
     @@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
      int unpack_loose_header(git_zstream *stream, unsigned char *map,
      			unsigned long mapsize, void *buffer,
    @@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
      int finalize_object_file(const char *tmpfile, const char *filename);
     
      ## streaming.c ##
    -@@ streaming.c: static struct stream_vtbl loose_vtbl = {
    - 
    - static open_method_decl(loose)
    +@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
    + 			      const struct object_id *oid,
    + 			      enum object_type *type)
      {
    -+	struct object_info oi2 = OBJECT_INFO_INIT;
    -+	oi2.sizep = &st->size;
    ++	struct object_info oi = OBJECT_INFO_INIT;
    ++	oi.sizep = &st->size;
     +
      	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
      	if (!st->u.loose.mapped)
      		return -1;
    -@@ streaming.c: static open_method_decl(loose)
    +@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
      				 st->u.loose.mapsize,
      				 st->u.loose.hdr,
      				 sizeof(st->u.loose.hdr)) < 0) ||
     -	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
    -+	    (parse_loose_header(st->u.loose.hdr, &oi2, 0) < 0)) {
    ++	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
      		git_inflate_end(&st->z);
      		munmap(st->u.loose.mapped, st->u.loose.mapsize);
      		return -1;
    -
    - ## t/t1450-fsck.sh ##
    -@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
    - 	test_i18ngrep "bad index file" errors
    - '
    - 
    -+test_expect_success 'fsck error and recovery on invalid object type' '
    -+	test_create_repo garbage-type &&
    -+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
    -+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
    -+	test_must_fail git -C garbage-type fsck >out 2>err &&
    -+	grep "$garbage_blob: object corrupt or missing:" err &&
    -+	grep "dangling blob $empty_blob" out
    -+'
    -+
    - test_done
 -:  ---------- > 10:  5d31d7e1a5 object-file.c: add missing braces to loose_object_info()
 -:  ---------- > 11:  ee28089219 object-file.c: stop dying in parse_loose_header()
 -:  ---------- > 12:  77f2cd439c object-file.c: return -2 on "header too long" in unpack_loose_header()
 -:  ---------- > 13:  d22d5b8b85 object-file.c: return -1, not "status" from unpack_loose_header()
 -:  ---------- > 14:  260e9888a3 fsck: don't hard die on invalid object types
 4:  81fffefcf9 ! 15:  e2afb813b2 object-store.h: move read_loose_object() below 'struct object_info'
    @@ Metadata
      ## Commit message ##
         object-store.h: move read_loose_object() below 'struct object_info'
     
    -    Move the definition of read_loose_object() below "struct
    +    Move the declaration of read_loose_object() below "struct
         object_info". In the next commit we'll add a "struct object_info *"
         parameter to it, moving it will avoid a forward declaration of the
         struct.
 5:  5fb6ac4fae ! 16:  328f05c51b fsck: report invalid types recorded in objects
    @@ Commit message
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## builtin/cat-file.c ##
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 	struct strbuf sb = STRBUF_INIT;
    - 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
    - 	const char *path = force_path;
    -+	int ret;
    - 
    - 	if (unknown_type)
    - 		flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 	switch (opt) {
    - 	case 't':
    - 		oi.type_name = &sb;
    --		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
    -+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
    -+		if (!unknown_type && ret < 0)
    - 			die("git cat-file: could not get object info");
    - 		if (sb.len) {
    - 			printf("%s\n", sb.buf);
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 
    - 	case 's':
    - 		oi.sizep = &size;
    --		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
    -+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
    -+		if (!unknown_type && ret < 0)
    - 			die("git cat-file: could not get object info");
    - 		printf("%"PRIuMAX"\n", (uintmax_t)size);
    - 		return 0;
    -
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
      	unsigned long size;
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
      
     
      ## object-file.c ##
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	 * we're obtaining the type using '--allow-unknown-type'
    - 	 * option.
    - 	 */
    --	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
    --		type = 0;
    --	else if (type < 0)
    -+	if (type < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
    - 		die(_("invalid object type"));
    - 	if (oi->typep)
    - 		*oi->typep = type;
    -@@ object-file.c: static int loose_object_info(struct repository *r,
    - 	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
    - 		status = error(_("unable to unpack %s header"),
    - 			       oid_to_hex(oid));
    --	if (status < 0)
    -+	if (status < 0) {
    - 		; /* Do nothing */
    --	else if (hdrbuf.len) {
    -+	} else if (hdrbuf.len) {
    - 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
    - 			status = error(_("unable to parse %s header with --allow-unknown-type"),
    - 				       oid_to_hex(oid));
    --	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
    --		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    -+	} else {
    -+		status = parse_loose_header(hdr, oi, flags);
    -+		if (status < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
    -+			error(_("unable to parse %s header"), oid_to_hex(oid));
    -+	}
    - 
    - 	if (status >= 0 && oi->contentp) {
    - 		*oi->contentp = unpack_loose_rest(&stream, hdr,
     @@ object-file.c: static int check_stream_oid(git_zstream *stream,
      
      int read_loose_object(const char *path,
    @@ object-file.c: int read_loose_object(const char *path,
      	git_zstream stream;
      	char hdr[MAX_HEADER_LEN];
     -	struct object_info oi = OBJECT_INFO_INIT;
    + 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
    +-	oi.typep = type;
     -	oi.sizep = size;
     +	enum object_type *type = oi->typep;
     +	unsigned long *size = oi->sizep;
    @@ object-file.c: int read_loose_object(const char *path,
      		goto out;
      	}
      
    --	*type = parse_loose_header(hdr, &oi, oi_flags);
    --	if (*type < 0) {
    --		error(_("unable to parse header of %s"), path);
    -+	*type = parse_loose_header(hdr, oi, oi_flags);
    -+	if (*type < 0 && !(oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
    -+		error(_("unable to parse header %s"), path);
    +-	if (parse_loose_header(hdr, &oi) < 0) {
    ++	if (parse_loose_header(hdr, oi) < 0) {
    + 		error(_("unable to parse header of %s"), path);
      		git_inflate_end(&stream);
      		goto out;
    - 	}
     @@ object-file.c: int read_loose_object(const char *path,
      			goto out;
      		}
    @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
      	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
      	test_when_finished "git update-ref -d refs/heads/invalid" &&
     @@ t/t1450-fsck.sh: test_expect_success 'fsck error and recovery on invalid object type' '
    - 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
      	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
      	test_must_fail git -C garbage-type fsck >out 2>err &&
    + 	grep -e "^error" -e "^fatal" err >errors &&
    +-	test_line_count = 2 errors &&
    +-	grep "error: hash mismatch for" err &&
     -	grep "$garbage_blob: object corrupt or missing:" err &&
    ++	test_line_count = 1 errors &&
     +	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
    -+	grep error: err >err.errors &&
    -+	test_line_count = 1 err.errors &&
      	grep "dangling blob $empty_blob" out
      '
      
 6:  226d2031bc ! 17:  c5e6686765 fsck: report invalid object type-path combinations
    @@ Metadata
      ## Commit message ##
         fsck: report invalid object type-path combinations
     
    -    fsck: improve error on loose object hash mismatch
    -
         Improve the error that's emitted in cases where we find a loose object
         we parse, but which isn't at the location we expect it to be.
     
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
      	struct strbuf sb = STRBUF_INIT;
      	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
      	struct object_info oi;
    -+	struct object_id real_oid = null_oid;
    ++	struct object_id real_oid = *null_oid();
      	int found = 0;
      	oi.type_name = &sb;
      	oi.sizep = &size;
    @@ object-file.c: int check_object_signature(struct repository *r, const struct obj
      			break;
      		r->hash_algo->update_fn(&c, buf, readlen);
      	}
    --	r->hash_algo->final_fn(real_oid.hash, &c);
    -+	r->hash_algo->final_fn(real_oid->hash, &c);
    +-	r->hash_algo->final_oid_fn(&real_oid, &c);
    ++	r->hash_algo->final_oid_fn(real_oid, &c);
      	close_istream(st);
     -	return !oideq(oid, &real_oid) ? -1 : 0;
     +	return !oideq(oid, real_oid) ? -1 : 0;
    @@ object-file.c: int read_loose_object(const char *path,
     -			error(_("hash mismatch for %s (expected %s)"), path,
     -			      oid_to_hex(expected_oid));
     +					   *contents, *size, oi->type_name->buf, real_oid)) {
    -+			if (oideq(real_oid, &null_oid))
    ++			if (oideq(real_oid, null_oid()))
     +				BUG("should only get OID mismatch errors with mapped contents");
      			free(*contents);
      			goto out;
    @@ object-store.h: int oid_object_info_extended(struct repository *r,
      		      struct object_info *oi,
      		      unsigned int oi_flags);
     @@ object-store.h: int unpack_loose_header(git_zstream *stream, unsigned char *map,
    - int parse_loose_header(const char *hdr, struct object_info *oi,
    - 		       unsigned int flags);
    + int parse_loose_header(const char *hdr, struct object_info *oi);
    + 
      int check_object_signature(struct repository *r, const struct object_id *oid,
     -			   void *buf, unsigned long size, const char *type);
     +			   void *buf, unsigned long size, const char *type,
    @@ pack-check.c: static int verify_packfile(struct repository *r,
      				    oid_to_hex(&oid), p->pack_name);
      		else if (fn) {
     
    + ## t/t1006-cat-file.sh ##
    +@@ t/t1006-cat-file.sh: test_expect_success 'cat-file -t and -s on corrupt loose object' '
    + 		# Swap the two to corrupt the repository
    + 		mv -v "$other_path" "$empty_path" &&
    + 		test_must_fail git fsck 2>err.fsck &&
    +-		grep "hash mismatch" err.fsck &&
    ++		grep "hash-path mismatch" err.fsck &&
    + 
    + 		# confirm that cat-file is reading the new swapped-in
    + 		# blob...
    +
      ## t/t1450-fsck.sh ##
     @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
      	(
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 245+ messages in thread

* [PATCH v3 01/17] fsck tests: refactor one test to use a sub-repo
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 02/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                         ` (17 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose and easier to debug, as a failing
test can have its state left behind under -d without damaging the
state for other tests.

But let's punt on that general refactoring and just change this one
test; I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5..1563b35f88 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	test_create_repo hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 02/17] fsck tests: add test for fsck-ing an unknown type
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 01/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 03/17] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
                         ` (16 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which will be done in subsequent
patches; for now, let's test for the current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1563b35f88..f36ec1e2f4 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	test_create_repo garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_done
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 03/17] cat-file tests: test for missing object with -t and -s
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 01/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 02/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
                         ` (15 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Test for what happens when the -t and -s flags are asked to operate on
a missing object. This extends tests added in 3e370f9faf0 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5d2dc99b74..b71ef94329 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,33 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
+missing_oid=$(test_oid deadbeef)
+test_expect_success 'error on type of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -t $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
+test_expect_success 'error on size of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -s $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
 bogus_type="bogus"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (2 preceding siblings ...)
  2021-05-20 11:22       ` [PATCH v3 03/17] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-27 21:17         ` Jonathan Nieder
  2021-05-20 11:22       ` [PATCH v3 05/17] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
                         ` (14 subsequent siblings)
  18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for the --allow-unknown-type feature,
added in 39e4ae38804 (cat-file: teach cat-file a
'--allow-unknown-type' option, 2015-05-03).

Before this change all the tests would succeed even if
--allow-unknown-type were on by default. Let's fix that by asserting
that -t and -s die on a bogus type without --allow-unknown-type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index b71ef94329..dc01d7c4a9 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -347,6 +347,20 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -363,6 +377,21 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-EOF &&
+	error: unable to unpack $bogus_sha1 header
+	fatal: git cat-file: could not get object info
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 05/17] rev-list tests: test for behavior with invalid object types
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (3 preceding siblings ...)
  2021-05-20 11:22       ` [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 06/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                         ` (13 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for the "rev-list --disk-usage" feature
added in 16950f8384a (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) by testing what happens when it's
asked to calculate the disk usage of invalid object types.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6115-rev-list-du.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b71..edb2ed5584 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,15 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+test_expect_success 'setup garbage repository' '
+	git clone --bare . garbage.git &&
+	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
+	git -C garbage.git rev-list --objects --all --disk-usage &&
+
+	# Manually create a ref because "update-ref", "tag" etc. have
+	# no corresponding --literally option.
+	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
+	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
+'
+
 test_done
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 06/17] cat-file tests: add corrupt loose object test
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (4 preceding siblings ...)
  2021-05-20 11:22       ` [PATCH v3 05/17] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 07/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                         ` (12 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index dc01d7c4a9..4a76ff024d 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -404,6 +404,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -v "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -v other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 07/17] cat-file tests: test for current --allow-unknown-type behavior
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (5 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 06/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 08/17] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                         ` (11 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 4a76ff024d..d3d3fd733a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -361,6 +361,46 @@ test_expect_success 'die on broken object under -t and -s without --allow-unknow
 	test_must_be_empty out.actual
 '
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -372,6 +412,27 @@ test_expect_success "Size of broken object is correct" '
 	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
 	test_cmp expect actual
 '
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual
+'
+
 bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 08/17] cache.h: move object functions to object-store.h
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (6 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 07/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 09/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                         ` (10 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index ba04ff8bd3..32ea1ea047 100644
--- a/cache.h
+++ b/cache.h
@@ -1302,16 +1302,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb..9117115a50 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 09/17] object-file.c: make parse_loose_header_extended() public
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (7 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 08/17] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 10/17] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
                         ` (9 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public under the
parse_loose_header() name, and remove the old parse_loose_header()
wrapper. The only direct user of it outside of object-file.c itself
was in streaming.c; that caller can simply pass the required "struct
object_info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change, as
we could simply use parse_loose_header_extended() there, but this
change leaves the API in a better end state.
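For readers unfamiliar with this API, here is a minimal sketch of the calling convention after the change. It assumes the well-known inflated loose object header layout of "<type> <size>" followed by a NUL. Every name in it (struct demo_object_info, DEMO_ALLOW_UNKNOWN_TYPE, demo_parse_loose_header(), demo_header_size()) is hypothetical and only loosely modelled on git's struct object_info and OBJECT_INFO_ALLOW_UNKNOWN_TYPE. It is not git's implementation: git at this point dies with "invalid object type" on an unknown type, whereas the sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for git's struct object_info. */
struct demo_object_info {
	unsigned long *sizep; /* where to store the parsed size, or NULL */
};
#define DEMO_OBJECT_INFO_INIT { NULL }

/* Hypothetical flag, modelled on OBJECT_INFO_ALLOW_UNKNOWN_TYPE. */
#define DEMO_ALLOW_UNKNOWN_TYPE 1u

/* Hypothetical type codes; 0 means "unknown, but allowed by the flag". */
enum { DEMO_UNKNOWN = 0, DEMO_COMMIT = 1, DEMO_TREE = 2, DEMO_BLOB = 3, DEMO_TAG = 4 };

static int demo_type_from_string(const char *s, size_t len)
{
	static const char *names[] = { NULL, "commit", "tree", "blob", "tag" };

	for (int i = 1; i <= 4; i++)
		if (strlen(names[i]) == len && !strncmp(s, names[i], len))
			return i;
	return -1;
}

/*
 * Parse a NUL-terminated "<type> <size>" header. Returns the type code,
 * or -1 on a malformed header or, without DEMO_ALLOW_UNKNOWN_TYPE, on an
 * unknown type. The size goes wherever the caller pointed oi->sizep.
 */
static int demo_parse_loose_header(const char *hdr, struct demo_object_info *oi,
				   unsigned int flags)
{
	const char *sp;
	unsigned long size = 0;
	int type;

	assert(hdr && oi);
	sp = strchr(hdr, ' ');
	if (!sp)
		return -1;
	type = demo_type_from_string(hdr, (size_t)(sp - hdr));
	if (type < 0) {
		if (!(flags & DEMO_ALLOW_UNKNOWN_TYPE))
			return -1;
		type = DEMO_UNKNOWN;
	}

	hdr = sp + 1;
	if (*hdr < '0' || *hdr > '9')
		return -1;
	while (*hdr >= '0' && *hdr <= '9')
		size = size * 10 + (unsigned long)(*hdr++ - '0');

	if (oi->sizep)
		*oi->sizep = size;
	/* Trailing garbage after the size makes the header invalid. */
	return *hdr ? -1 : type;
}

/*
 * Caller-side pattern replacing the removed size-only wrapper: the
 * caller fills in oi.sizep itself, much as the streaming.c hunk does.
 */
static int demo_header_size(const char *hdr, unsigned long *sizep)
{
	struct demo_object_info oi = DEMO_OBJECT_INFO_INIT;

	oi.sizep = sizep;
	return demo_parse_loose_header(hdr, &oi, 0);
}
```

Passing the whole struct rather than a bare size pointer lets one function serve both size-only callers and callers that also want flags such as "allow unknown type".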

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 21 ++++++++-------------
 object-store.h |  3 ++-
 streaming.c    |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/object-file.c b/object-file.c
index f233b440b2..527f435381 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1340,8 +1340,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1401,14 +1402,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1463,10 +1456,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2549,6 +2542,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2563,7 +2558,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50..d443964447 100644
--- a/object-store.h
+++ b/object-store.h
@@ -480,7 +480,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c..8beac62cbb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 10/17] object-file.c: add missing braces to loose_object_info()
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (8 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 09/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
                         ` (8 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Change the formatting in loose_object_info() to conform with our usual
coding style:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency -- Documentation/CodingGuidelines

This formatting-only change makes a subsequent commit easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 527f435381..115054389c 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1450,17 +1450,20 @@ static int loose_object_info(struct repository *r,
 		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1469,8 +1472,9 @@ static int loose_object_info(struct repository *r,
 			git_inflate_end(&stream);
 			status = -1;
 		}
-	} else
+	} else {
 		git_inflate_end(&stream);
+	}
 
 	munmap(map, mapsize);
 	if (status && oi->typep)
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header()
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (9 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 10/17] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 17:50         ` Jonathan Tan
  2021-05-20 11:23       ` [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
                         ` (7 subsequent siblings)
  18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Start the libification of parse_loose_header() by making it return
error codes and data instead of invoking die() by itself. For now
we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller, but in subsequent
commits we'll also libify those.

The reason this makes sense is that, with the refactoring of
parse_loose_header_extended() in an earlier commit, the public
interface for parse_loose_header() no longer just accepts an "unsigned
long *sizep". Rather, it accepts a "struct object_info *", which it
populates with information about the object.

It thus makes sense to further libify the interface so that it stops
calling die() when it encounters OBJ_BAD, and instead rely on its
callers to check the populated "oi->typep".
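To illustrate the new contract, here is a minimal, self-contained sketch. It is not git's code. The hypothetical parse_header_sketch() returns -1 only when the "<type> <len>\0" format is broken. An unrecognized type is reported through the optional typep field as SKETCH_BAD, and the caller decides whether that is fatal. The names sketch_type, sketch_info, type_from_name() and parse_header_sketch() are made up for this example. The enum values are assumed to mirror git's OBJ_* numbering. Unlike the real parser, this sketch neither rejects leading zeros nor guards against size overflow.

```c
#include <stddef.h>
#include <string.h>

/* Assumed to mirror git's OBJ_BAD, OBJ_COMMIT, ... numbering. */
enum sketch_type {
	SKETCH_BAD = -1,
	SKETCH_COMMIT = 1,
	SKETCH_TREE = 2,
	SKETCH_BLOB = 3,
	SKETCH_TAG = 4
};

struct sketch_info {
	enum sketch_type *typep;   /* optional: receives the parsed type */
	unsigned long *sizep;      /* optional: receives the parsed size */
};

static enum sketch_type type_from_name(const char *name, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (enum sketch_type)(i + 1);
	return SKETCH_BAD;  /* unknown, but not a *format* error */
}

/*
 * Parse "<type> <len>\0". Returns -1 only if the format is broken.
 * An unknown type is stored as SKETCH_BAD in *typep and 0 is returned.
 */
static int parse_header_sketch(const char *hdr, struct sketch_info *oi)
{
	const char *sp = strchr(hdr, ' ');
	const char *p;
	unsigned long size = 0;

	if (!sp || sp == hdr)
		return -1;
	if (oi->typep)
		*oi->typep = type_from_name(hdr, (size_t)(sp - hdr));

	p = sp + 1;
	if (*p < '0' || *p > '9')
		return -1;
	while (*p >= '0' && *p <= '9')  /* no overflow check in this sketch */
		size = size * 10 + (unsigned long)(*p++ - '0');
	if (oi->sizep)
		*oi->sizep = size;

	/* The length must be followed by the terminating NUL. */
	return *p ? -1 : 0;
}
```

A caller in the style of loose_object_info() would then check for `SKETCH_BAD`. Depending on whether something like --allow-unknown-type is in effect, it would either die or carry on.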

This also allows us to simplify away the
unpack_loose_header_to_strbuf() function added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03). Most of its code duplicated what
unpack_loose_header() and unpack_loose_short_header() already did. We
now have a single unpack_loose_header() function, which accepts an
optional "struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be further
simplified. We're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN. We could instead just say
"we found a garbage type <first 32 bytes>...", but let's leave this in
place for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 105 ++++++++++++++++++++-----------------------------
 object-store.h |  25 ++++++++++--
 streaming.c    |   7 +++-
 3 files changed, 70 insertions(+), 67 deletions(-)

diff --git a/object-file.c b/object-file.c
index 115054389c..d4bdf86657 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1210,11 +1210,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1225,44 +1226,25 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
 	if (status < Z_OK)
 		return status;
 
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
-	if (status < Z_OK)
-		return -1;
-
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. We abort early
+	 * unless under we're running as e.g. "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1340,9 +1322,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr,
-		       struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1364,15 +1344,6 @@ int parse_loose_header(const char *hdr,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1399,7 +1370,14 @@ int parse_loose_header(const char *hdr,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1410,9 +1388,12 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
+	int hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1443,27 +1424,23 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+
+	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				      allow_unknown ? &hdrbuf : NULL);
+	if (hdr_ret < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
-
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
+	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
+	if (!allow_unknown && *oi->typep < 0)
+		die(_("invalid object type"));
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1481,7 +1458,8 @@ static int loose_object_info(struct repository *r,
 		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
-	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
 	return (status < 0) ? status : 0;
 }
@@ -2547,6 +2525,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2557,17 +2536,19 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index d443964447..740edcac30 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,11 +477,30 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+			unsigned long bufsiz, struct strbuf *hdrbuf);
+
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown.
+ */
+int parse_loose_header(const char *hdr, struct object_info *oi);
+
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 8beac62cbb..c3dc241d6a 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -233,8 +234,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
+	    *type < 0) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (10 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 17:54         ` Jonathan Tan
  2021-05-20 11:23       ` [PATCH v3 13/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                         ` (6 subsequent siblings)
  18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series to t1006-cat-file.sh shows,
zlib.c already emits the relevant zlib errors in this case. We therefore
have no need to pass those return codes along to our callers. Instead,
let's just return -2 to say that we ran into the MAX_HEADER_LEN limit,
and other negative values for "unable to unpack <OID> header".
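The split between -2 and other negative values can be sketched without zlib, as a hypothetical model rather than the patched function. unpack_header_sketch() treats "src" as already-inflated data. It returns -2 when the header does not fit in the fixed buffer and the caller passed no overflow buffer. When an overflow buffer is given, it copies the full header out and returns 0. This is our reading of what the existing strbuf-copying loop in unpack_loose_header() does once it finds the terminating NUL. SKETCH_MAX_HEADER_LEN assumes the 32-byte limit implied by the test's "exceeds 32 bytes" message. describe_unpack_error() is likewise a made-up helper.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed to match git's MAX_HEADER_LEN (the test expects 32 bytes). */
#define SKETCH_MAX_HEADER_LEN 32

/*
 * Hypothetical model of unpack_loose_header()'s return codes. "src"
 * stands in for the inflated stream. Returns:
 *    0  if the NUL-terminated header fits in buf, or if it didn't but
 *       the full header was copied into a malloc()'d *full;
 *   -2  if the header is too long and no "full" buffer was requested,
 *       or if the input ends before the header's terminating NUL;
 *   -1  on allocation failure. (The real function also returns a
 *       negative value when inflating fails; that is not modeled here.)
 */
static int unpack_header_sketch(const char *src, size_t srclen,
				char *buf, size_t bufsiz, char **full)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	const char *nul;

	memcpy(buf, src, n);
	if (memchr(buf, '\0', n))
		return 0;  /* the whole header fit in the first chunk */

	if (!full)
		return -2; /* header too long, caller didn't ask for more */

	nul = memchr(src, '\0', srclen);
	if (!nul)
		return -2; /* ran out of input without finding the NUL */

	*full = malloc((size_t)(nul - src) + 1);
	if (!*full)
		return -1;
	memcpy(*full, src, (size_t)(nul - src) + 1);
	return 0;
}

/* Map return codes to the two kinds of error() the caller reports. */
static const char *describe_unpack_error(int ret)
{
	if (ret == -2)
		return "header too long";
	if (ret < 0)
		return "unable to unpack header";
	return NULL;  /* success */
}
```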

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c       | 15 ++++++++++-----
 object-store.h      |  6 ++++--
 t/t1006-cat-file.sh |  2 +-
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/object-file.c b/object-file.c
index d4bdf86657..7623ada1aa 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1240,10 +1240,10 @@ int unpack_loose_header(git_zstream *stream,
 	/*
 	 * We have a header longer than MAX_HEADER_LEN. We abort early
 	 * unless under we're running as e.g. "cat-file
-	 * --allow-unknown-type".
+	 * --allow-unknown-type". A -2 is "header too long"
 	 */
 	if (!header)
-		return -1;
+		return -2;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1264,7 +1264,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return -2;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1433,9 +1433,14 @@ static int loose_object_info(struct repository *r,
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
 	if (hdr_ret < 0) {
-		status = error(_("unable to unpack %s header"),
-			       oid_to_hex(oid));
+		if (hdr_ret == -2)
+			status = error(_("header for %s too long, exceeds %d bytes"),
+				       oid_to_hex(oid), MAX_HEADER_LEN);
+		else
+			status = error(_("unable to unpack %s header"),
+				       oid_to_hex(oid));
 	}
+
 	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
diff --git a/object-store.h b/object-store.h
index 740edcac30..9accb614fc 100644
--- a/object-store.h
+++ b/object-store.h
@@ -481,13 +481,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns 0 on success. Returns negative values on error. If the
+ * header exceeds MAX_HEADER_LEN -2 will be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), -2 will still be returned from this
+ * function to indicate that the header was too long.
  */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index d3d3fd733a..f12b06150e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
 
 test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
 	cat >err.expect <<-EOF &&
-	error: unable to unpack $bogus_sha1 header
+	error: header for $bogus_sha1 too long, exceeds 32 bytes
 	fatal: git cat-file: could not get object info
 	EOF
 
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 13/17] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (11 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 14/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                         ` (5 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written, there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's make the minor cleanup of having this function return -1
as well.
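Here is a minimal sketch of the normalization. It assumes stand-in constants with the same values as zlib's Z_OK (0), Z_STREAM_END (1), Z_STREAM_ERROR (-2), Z_DATA_ERROR (-3) and Z_BUF_ERROR (-5), so that it builds without <zlib.h>. normalize_inflate_status() is a hypothetical name. One side effect is worth noting, as our own observation rather than a claim from the commit message. Because Z_STREAM_ERROR is -2, passing zlib's status through unchanged would make it indistinguishable from the "header too long" -2 introduced in the previous patch.

```c
/*
 * Stand-ins with the same values as the macros in zlib's <zlib.h>,
 * so this sketch builds without zlib.
 */
enum {
	SK_Z_OK = 0,
	SK_Z_STREAM_END = 1,
	SK_Z_STREAM_ERROR = -2,
	SK_Z_DATA_ERROR = -3,
	SK_Z_BUF_ERROR = -5
};

/*
 * Any zlib failure (a status below Z_OK) becomes a plain -1. A
 * non-negative status means "keep going" and maps to 0 here.
 */
static int normalize_inflate_status(int zstatus)
{
	return zstatus < SK_Z_OK ? -1 : 0;
}
```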

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 7623ada1aa..0de699de98 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1229,7 +1229,7 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 14/17] fsck: don't hard die on invalid object types
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (12 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 13/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 18:18         ` Jonathan Tan
  2021-05-20 11:23       ` [PATCH v3 15/17] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                         ` (4 subsequent siblings)
  18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with a non-zero status, but now we'll finish the rest
of the traversal. The updated test asserts that we'll still complain
about other fsck issues (e.g. an unrelated dangling blob).

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it. See f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for the introduction of read_loose_object().

Why are we complaining about a "hash mismatch" for an object of a type
we don't know about? We shouldn't. This is the bare minimum change
needed to keep fsck from hard dying on a repository that's been
corrupted in this manner. In subsequent commits, we'll teach fsck to
recognize this particular type of corruption and emit a better error
message.
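The control-flow change can be sketched as a hypothetical model, not the actual builtin/fsck.c code. read_object_sketch() and SKETCH_ALLOW_UNKNOWN_TYPE stand in for read_loose_object() and OBJECT_INFO_ALLOW_UNKNOWN_TYPE, and a negative value represents an unknown type. The point is that each failure is counted and the loop moves on to the next object, instead of the whole process exiting as it did with die().

```c
#include <stdio.h>

/* Hypothetical stand-in for OBJECT_INFO_ALLOW_UNKNOWN_TYPE. */
#define SKETCH_ALLOW_UNKNOWN_TYPE (1u << 0)

/*
 * Stand-in for read_loose_object(). "type" is what the header declared,
 * and a negative value means unknown. Without the flag, an unknown type
 * is reported and returned as an error, never turned into an exit().
 */
static int read_object_sketch(int type, unsigned int flags)
{
	if (type < 0 && !(flags & SKETCH_ALLOW_UNKNOWN_TYPE)) {
		fprintf(stderr, "error: header declares an unknown type\n");
		return -1;
	}
	return 0;
}

/*
 * fsck-style traversal: count every broken object and keep going.
 * With the flag set, read_object_sketch() succeeds on unknown types,
 * so the caller checks the type itself.
 */
static int count_broken(const int *types, int n, unsigned int flags)
{
	int errors = 0;

	for (int i = 0; i < n; i++) {
		if (read_object_sketch(types[i], flags) < 0 || types[i] < 0)
			errors++;
	}
	return errors;
}
```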

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 11 ++++++++---
 object-store.h  |  3 ++-
 t/t1450-fsck.sh | 14 +++++++-------
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 87a99b0108..38b515deb6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,7 +600,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 0de699de98..0e8a024eb3 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2522,7 +2522,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
@@ -2530,6 +2531,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	oi.typep = type;
 	oi.sizep = size;
 
@@ -2552,8 +2554,11 @@ int read_loose_object(const char *path,
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
+	if (!allow_unknown && *type < 0) {
+		error(_("header for %s declares an unknown type"), path);
+		git_inflate_end(&stream);
+		goto out;
+	}
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 9accb614fc..790e8b1798 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f36ec1e2f4..e7e8decebb 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,16 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	test_create_repo garbage-type &&
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-	cat >err.expect <<-\EOF &&
-	fatal: invalid object type
-	EOF
-	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-	test_cmp err.expect err.actual &&
-	test_must_be_empty out.actual
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep -e "^error" -e "^fatal" err >errors &&
+	test_line_count = 2 errors &&
+	grep "error: hash mismatch for" err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
 '
 
 test_done
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 15/17] object-store.h: move read_loose_object() below 'struct object_info'
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (13 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 14/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 16/17] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
                         ` (3 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it. Moving it avoids the need for a forward declaration
of the struct.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index 790e8b1798..698a701d70 100644
--- a/object-store.h
+++ b/object-store.h
@@ -234,20 +234,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -388,6 +374,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 16/17] fsck: report invalid types recorded in objects
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (14 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 15/17] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 18:24         ` Jonathan Tan
  2021-05-20 11:23       ` [PATCH v3 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
                         ` (2 subsequent siblings)
  18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall through from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.
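To print the bogus type, the raw "<type>" token has to be captured even when it isn't a known type. Passing oi.type_name to read_loose_object() achieves that here. The sketch below uses a fixed-size buffer instead of a strbuf. header_type_name(), is_known_type() and format_unknown_type() are hypothetical helpers. The truncation in header_type_name() is a simplification that a growable strbuf avoids.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the raw "<type>" token (everything before the first space) into
 * "name", whether or not it is a known type. Returns -1 on malformed
 * input or a zero-sized buffer. Truncates to fit "namesz".
 */
static int header_type_name(const char *hdr, char *name, size_t namesz)
{
	const char *sp = strchr(hdr, ' ');
	size_t len;

	if (!sp || sp == hdr || namesz == 0)
		return -1;
	len = (size_t)(sp - hdr);
	if (len >= namesz)
		len = namesz - 1;
	memcpy(name, hdr, len);
	name[len] = '\0';
	return 0;
}

static int is_known_type(const char *name)
{
	return !strcmp(name, "commit") || !strcmp(name, "tree") ||
	       !strcmp(name, "blob") || !strcmp(name, "tag");
}

/* Format an fsck-style message. Returns the snprintf() result. */
static int format_unknown_type(char *out, size_t outsz, const char *oid_hex,
			       const char *name, const char *path)
{
	return snprintf(out, outsz,
			"error: %s: object is of unknown type '%s': %s",
			oid_hex, name, path);
}
```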

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 22 ++++++++++++++++++----
 object-file.c   | 13 +++++--------
 object-store.h  |  4 ++--
 t/t1450-fsck.sh | 24 +++++++++++++++++++++---
 4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 38b515deb6..32f11dc1fe 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -599,12 +599,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 0e8a024eb3..f24fa2fe4a 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2520,9 +2520,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2530,10 +2529,9 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
-	oi.typep = type;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2549,7 +2547,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
@@ -2571,8 +2569,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index 698a701d70..ba6e5d76c0 100644
--- a/object-store.h
+++ b/object-store.h
@@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index e7e8decebb..bc541af2cf 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	test_create_repo hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -869,9 +888,8 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
 	grep -e "^error" -e "^fatal" err >errors &&
-	test_line_count = 2 errors &&
-	grep "error: hash mismatch for" err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	test_line_count = 1 errors &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v3 17/17] fsck: report invalid object type-path combinations
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (15 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 16/17] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 18:28         ` Jonathan Tan
  2021-05-27 17:08       ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Jonathan Tan
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
  18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    $ git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
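
In the example above the empty blob e69d[...] sits under objects/e7/,
so its path implies e79d[...] while its content hashes to e69d[...].
A rough sketch of that comparison follows. toy_oid_from_path() and
toy_hash_path_mismatch() are hypothetical helpers, the OIDs are
shortened for illustration, and the patch itself gets the "real" OID
from hashing the contents via check_object_signature(), not from
string handling like this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Recover the hex object name implied by a loose object path such as
 * "./objects/e7/9d...": the two-character fan-out directory followed
 * by the file name. Returns 0 on success, -1 if the path does not
 * have that shape.
 */
static int toy_oid_from_path(const char *path, char *out, size_t outlen)
{
	const char *file = strrchr(path, '/');

	if (!file || file - path < 2 || (file - path > 2 && file[-3] != '/'))
		return -1;
	if (snprintf(out, outlen, "%.2s%s", file - 2, file + 1) >= (int)outlen)
		return -1;
	return 0;
}

/* Nonzero if the object at "path" hashed to something other than its name. */
static int toy_hash_path_mismatch(const char *path, const char *real_hex)
{
	char path_hex[128];

	if (toy_oid_from_path(path, path_hex, sizeof(path_hex)) < 0)
		return 1;
	return strcmp(path_hex, real_hex) != 0;
}
```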

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits, we'd simply die early in those cases
until preceding commits fixed the hard die on an invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
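
A minimal sketch of the sentinel idea, combined with the optional
out-parameter shape the patch gives check_object_signature()
("real_oidp ? real_oidp : &tmp"). Everything here is hypothetical:
toy_oid is 4 bytes, toy_hash() is a 32-bit FNV-1a placeholder rather
than SHA-1/SHA-256, and a NULL buffer stands in for "failed before
anything was hashed".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy 4-byte object id; all zeros is the sentinel, like null_oid(). */
struct toy_oid { unsigned char hash[4]; };

static int toy_oid_is_null(const struct toy_oid *oid)
{
	static const struct toy_oid null_oid;

	return !memcmp(oid->hash, null_oid.hash, sizeof(oid->hash));
}

/* Placeholder hash (32-bit FNV-1a), not a real object hash. */
static void toy_hash(const unsigned char *buf, size_t len, struct toy_oid *out)
{
	unsigned long h = 2166136261UL;

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h = (h * 16777619UL) & 0xffffffffUL;
	}
	for (int i = 0; i < 4; i++)
		out->hash[i] = (unsigned char)(h >> (8 * i));
}

/*
 * Write the computed id to *real_oidp if the caller asked for it,
 * otherwise to a local temporary. -1 on mismatch or unreadable data.
 */
static int toy_check_signature(const struct toy_oid *expect,
			       const unsigned char *map, size_t len,
			       struct toy_oid *real_oidp)
{
	struct toy_oid tmp;
	struct toy_oid *real_oid = real_oidp ? real_oidp : &tmp;

	if (!map)
		return -1; /* leaves *real_oidp untouched */
	toy_hash(map, len, real_oid);
	return memcmp(expect->hash, real_oid->hash, sizeof(expect->hash)) ? -1 : 0;
}

enum toy_verdict { TOY_OK, TOY_HASH_PATH_MISMATCH, TOY_CORRUPT };

static enum toy_verdict toy_classify(const struct toy_oid *expect,
				     const unsigned char *map, size_t len)
{
	struct toy_oid real = { { 0 } }; /* sentinel: "never hashed" */

	if (!toy_check_signature(expect, map, len, &real))
		return TOY_OK;
	/* Still the sentinel means we failed before hashing anything. */
	return toy_oid_is_null(&real) ? TOY_CORRUPT : TOY_HASH_PATH_MISMATCH;
}
```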

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or, worse,
misreport whatever issue we encountered. So let's use the new bug()
function to ferry a return code up to fsck_loose() in that case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 10 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3c20f164f0..48a3b6a7f8 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 32f11dc1fe..96df1aadbf 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -602,20 +602,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = *null_oid();
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3fbc5d7077..bf860b6555 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1421,7 +1421,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd36..3b2dbbb37e 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index f24fa2fe4a..bf9ac38d73 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1039,9 +1039,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1049,8 +1051,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1075,9 +1077,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2520,6 +2522,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2569,9 +2572,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, null_oid()))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index ba6e5d76c0..60b566a63b 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -505,7 +506,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 14188453c5..5467ead328 100644
--- a/object.c
+++ b/object.c
@@ -261,7 +261,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -272,7 +272,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec..e6aa4442c9 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index f12b06150e..0bb5ee97de 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -490,7 +490,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -v "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bc541af2cf..d76293c495 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* Re: [PATCH v3 00/17] fsck: better "invalid object" error reporting
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (16 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-05-27 17:08       ` Jonathan Tan
  2021-05-28  0:18         ` Junio C Hamano
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
  18 siblings, 1 reply; 245+ messages in thread
From: Jonathan Tan @ 2021-05-27 17:08 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> So in some sense this matters to nobody, but I'm doing this as part of
> general changes I've been pushing to make fsck/gc error reporting more
> graceful, and errors more recoverable. We now have a few more places
> in object-file.c where we don't just die(), but properly return
> API-like return codes/data to the caller instead.

Well, I guess it's useful if somehow your repository got corrupted, and
you want to pinpoint where it occurred.

> Ævar Arnfjörð Bjarmason (17):
>   fsck tests: refactor one test to use a sub-repo
>   fsck tests: add test for fsck-ing an unknown type
>   cat-file tests: test for missing object with -t and -s
>   cat-file tests: test that --allow-unknown-type isn't on by default
>   rev-list tests: test for behavior with invalid object types
>   cat-file tests: add corrupt loose object test
>   cat-file tests: test for current --allow-unknown-type behavior
>   cache.h: move object functions to object-store.h
>   object-file.c: make parse_loose_header_extended() public
>   object-file.c: add missing braces to loose_object_info()
>   object-file.c: stop dying in parse_loose_header()
>   object-file.c: return -2 on "header too long" in unpack_loose_header()
>   object-file.c: return -1, not "status" from unpack_loose_header()
>   fsck: don't hard die on invalid object types
>   object-store.h: move read_loose_object() below 'struct object_info'
>   fsck: report invalid types recorded in objects
>   fsck: report invalid object type-path combinations

My main comment as a reviewer is that I think there are a lot of
unrelated changes in this patch set - in particular, the first 7
patches, which only touch tests (1 fsck test that refactors something
unrelated, 1 fsck test that I presume will be overridden later, and 5
tests for other commands unrelated to fsck). I'll also give comments
on individual patches.

^ permalink raw reply	[flat|nested] 245+ messages in thread

* Re: [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header()
  2021-05-20 11:23       ` [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-27 17:50         ` Jonathan Tan
  0 siblings, 0 replies; 245+ messages in thread
From: Jonathan Tan @ 2021-05-27 17:50 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> Start the libification of parse_loose_header() by making it return
> error codes and data instead of invoking die() by itself. For now
> we'll move the relevant die() call to loose_object_info() and
> read_loose_object() to keep this change smaller, but in subsequent
> commits we'll also libify those.
> 
> The reason this makes sense is that with the refactoring of
> parse_loose_header_extended() in an earlier commit the public
> interface for parse_loose_header() no longer just accepts a "unsigned
> long *sizep". Rather it accepts a "struct object_info *", that
> structure will be populated with information about the object.
> 
> It thus makes sense to further libify the interface so that it stops
> calling die() when it encounters OBJ_BAD, and instead rely on its
> callers to check the populated "oi->typep".
> 
> This also allows us to simplify away the
> unpack_loose_header_to_strbuf() function added in
> 46f034483eb (sha1_file: support reading from a loose object of unknown
> type, 2015-05-03). Its code was mostly copy/pasted between it and both
> of unpack_loose_header() and unpack_loose_short_header(). We now have
> a single unpack_loose_header() function which accepts an optional
> "struct strbuf *" instead.
> 
> I think the remaining unpack_loose_header() function could be further
> simplified, we're carrying some complexity just to be able to emit a
> garbage type longer than MAX_HEADER_LEN, we could alternatively just
> say "we found a garbage type <first 32 bytes>..." instead, but let's
> leave this in place for now.

Looking at the patch itself, this patch:
 1. Combines unpack_loose_header(), unpack_loose_short_header(), and
    unpack_loose_header_to_strbuf() into one function. It does different
    things depending on whether the struct strbuf * is provided.
 2. Updates parse_loose_header() to:
    a. never die upon invalid object type
    b. not accept a flags argument (which was used solely to control
       whether it died or not upon invalid object type)
 3. Updates the callers of these functions.

I think 2b should have been mentioned in the commit message. (And
overall I think that the commit message could be shorter, but what
information to include and exclude is a subjective matter, so I won't
concern myself so much about that.)

Also, I think that 1 should be split into its own commit. As it is, I'm
not even sure if it's a good idea - for me, it is confusing for a
function to consume or not consume more of the stream depending on
whether an argument is NULL or not.

>  	/*
>  	 * Check if entire header is unpacked in the first iteration.
>  	 */
>  	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
>  		return 0;
>  
> +	/*
> +	 * We have a header longer than MAX_HEADER_LEN. We abort early
> +	 * unless under we're running as e.g. "cat-file
> +	 * --allow-unknown-type".
> +	 */
> +	if (!header)
> +		return -1;

What do you mean by "unless under we're running as"? And how would we
know at this point that we're running as "cat-file --allow-unknown-type"
just by checking "header"?

> -	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
> -		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
> -			status = error(_("unable to unpack %s header with --allow-unknown-type"),
> -				       oid_to_hex(oid));
> -	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
> +
> +	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
> +				      allow_unknown ? &hdrbuf : NULL);
> +	if (hdr_ret < 0) {
>  		status = error(_("unable to unpack %s header"),
>  			       oid_to_hex(oid));
>  	}

This hunk would go into the split commit (for the unpack_loose_header()
refactoring) I suggested above.

> -
> -	if (status < 0) {
> -		/* Do nothing */
> -	} else if (hdrbuf.len) {
> -		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
> -			status = error(_("unable to parse %s header with --allow-unknown-type"),
> -				       oid_to_hex(oid));
> -	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
> +	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
>  		status = error(_("unable to parse %s header"), oid_to_hex(oid));
>  	}
> +	if (!allow_unknown && *oi->typep < 0)
> +		die(_("invalid object type"));

I think this change doesn't need to be so big? The oi->typep check could
just go in the "else if" branch that happens if --allow-unknown-type is
not set.

> diff --git a/object-store.h b/object-store.h
> index d443964447..740edcac30 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -477,11 +477,30 @@ int for_each_object_in_pack(struct packed_git *p,
>  int for_each_packed_object(each_packed_object_fn, void *,
>  			   enum for_each_object_flags flags);
>  
> +/**
> + * unpack_loose_header() initializes the data stream needed to unpack
> + * a loose object header.
> + *
> + * Returns 0 on success. Returns negative values on error.
> + *
> + * It will only parse up to MAX_HEADER_LEN bytes unless an optional
> + * "hdrbuf" argument is non-NULL. This is intended for use with
> + * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
> + * reporting. The full header will be extracted to "hdrbuf" for use
> + * with parse_loose_header().
> + */
>  int unpack_loose_header(git_zstream *stream, unsigned char *map,
>  			unsigned long mapsize, void *buffer,
> +			unsigned long bufsiz, struct strbuf *hdrbuf);

Parsing up to MAX_HEADER_LEN only occasionally is confusing. Could we
always make it parse up to MAX_HEADER_LEN, and then put a truncated
header in hdrbuf if it is too big (and therefore invalid)?
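
A rough sketch of that suggestion, over a flat buffer rather than a
zlib stream: the function never looks past the limit, and its
behaviour does not depend on whether an optional argument is NULL.
toy_unpack_header() and TOY_MAX_HEADER_LEN are hypothetical; 32
matches the "exceeds 32 bytes" message in the t1006 change in 12/17,
and the 0 / -1 / -2 values follow that patch's documented convention.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_HEADER_LEN 32

/*
 * Never examine more than TOY_MAX_HEADER_LEN bytes. On success copy
 * the NUL-terminated header into hdr (which must hold
 * TOY_MAX_HEADER_LEN bytes) and return 0. If no NUL appears within
 * the limit, store a truncated, NUL-terminated prefix in hdr for
 * error reporting and return -2. Return -1 if the data ends first.
 */
static int toy_unpack_header(const char *data, size_t len, char *hdr)
{
	size_t n = len < TOY_MAX_HEADER_LEN ? len : TOY_MAX_HEADER_LEN;
	const char *nul = memchr(data, '\0', n);

	if (nul) {
		memcpy(hdr, data, (size_t)(nul - data) + 1);
		return 0;
	}
	if (n < TOY_MAX_HEADER_LEN) {
		hdr[0] = '\0';
		return -1;
	}
	memcpy(hdr, data, TOY_MAX_HEADER_LEN - 1);
	hdr[TOY_MAX_HEADER_LEN - 1] = '\0';
	return -2;
}
```

The truncated prefix is enough for a "we found a garbage type <first
bytes>..." style message without ever growing a strbuf.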

> +/**
> + * parse_loose_header() parses the starting "<type> <len>\0" of an
> + * object. If it doesn't follow that format -1 is returned. To check
> + * the validity of the <type> populate the "typep" in the "struct
> + * object_info". It will be OBJ_BAD if the object type is unknown.
> + */
> +int parse_loose_header(const char *hdr, struct object_info *oi);

Also mention what happens if the size is invalid.
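
As an illustration of the kind of size rules such a comment could
spell out, here is a sketch of parsing "<type> <size>". The rules
(one or more decimal digits, no overflow, nothing after the digits)
are an assumption for illustration, not necessarily what git
enforces, and toy_parse_header() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/*
 * Parse a NUL-terminated "<type> <size>" header. Returns 0 and fills
 * type_name and *sizep on success, -1 on any malformed header. An
 * unknown type is NOT an error at this layer.
 */
static int toy_parse_header(const char *hdr, char *type_name, size_t type_len,
			    unsigned long *sizep)
{
	const char *sp = strchr(hdr, ' ');
	unsigned long size = 0;

	if (!sp || sp == hdr || (size_t)(sp - hdr) >= type_len)
		return -1;
	memcpy(type_name, hdr, (size_t)(sp - hdr));
	type_name[sp - hdr] = '\0';

	hdr = sp + 1;
	if (*hdr < '0' || *hdr > '9')
		return -1; /* empty or non-numeric size */
	for (; *hdr >= '0' && *hdr <= '9'; hdr++) {
		unsigned long digit = (unsigned long)(*hdr - '0');

		if (size > (ULONG_MAX - digit) / 10)
			return -1; /* size would overflow */
		size = size * 10 + digit;
	}
	if (*hdr)
		return -1; /* trailing junk after the size */
	*sizep = size;
	return 0;
}
```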

^ permalink raw reply	[flat|nested] 245+ messages in thread

* Re: [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-05-20 11:23       ` [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-27 17:54         ` Jonathan Tan
  0 siblings, 0 replies; 245+ messages in thread
From: Jonathan Tan @ 2021-05-27 17:54 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> diff --git a/object-store.h b/object-store.h
> index 740edcac30..9accb614fc 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -481,13 +481,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
>   * unpack_loose_header() initializes the data stream needed to unpack
>   * a loose object header.
>   *
> - * Returns 0 on success. Returns negative values on error.
> + * Returns 0 on success. Returns negative values on error. If the
> + * header exceeds MAX_HEADER_LEN -2 will be returned.
>   *
>   * It will only parse up to MAX_HEADER_LEN bytes unless an optional
>   * "hdrbuf" argument is non-NULL. This is intended for use with
>   * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
>   * reporting. The full header will be extracted to "hdrbuf" for use
> - * with parse_loose_header().
> + * with parse_loose_header(), -2 will still be returned from this
> + * function to indicate that the header was too long.
>   */
>  int unpack_loose_header(git_zstream *stream, unsigned char *map,
>  			unsigned long mapsize, void *buffer,

Can the return type be an enum in this case?
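
A sketch of what an enum return type could look like, reusing the two
messages from the t1006 hunk. The enum and function names are
hypothetical. One benefit of a switch over a named enum without a
default case is that compilers can warn (e.g. -Wswitch) when a new
result is added but not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for the 0 / -1 / -2 results documented above. */
enum toy_ulh_result {
	TOY_ULH_OK = 0,
	TOY_ULH_BAD = -1,
	TOY_ULH_TOO_LONG = -2,
};

/* Format the caller-facing message for a result; returns snprintf's count. */
static int toy_ulh_message(enum toy_ulh_result r, const char *oid_hex,
			   char *buf, size_t buflen)
{
	switch (r) {
	case TOY_ULH_OK:
		return snprintf(buf, buflen, "ok");
	case TOY_ULH_BAD:
		return snprintf(buf, buflen, "unable to unpack %s header", oid_hex);
	case TOY_ULH_TOO_LONG:
		return snprintf(buf, buflen,
				"header for %s too long, exceeds %d bytes", oid_hex, 32);
	}
	return snprintf(buf, buflen, "unknown result %d", (int)r);
}
```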

> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index d3d3fd733a..f12b06150e 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
>  
>  test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
>  	cat >err.expect <<-EOF &&
> -	error: unable to unpack $bogus_sha1 header
> +	error: header for $bogus_sha1 too long, exceeds 32 bytes
>  	fatal: git cat-file: could not get object info
>  	EOF

Ah, the error message is much more informative - thanks.

^ permalink raw reply	[flat|nested] 245+ messages in thread

* Re: [PATCH v3 14/17] fsck: don't hard die on invalid object types
  2021-05-20 11:23       ` [PATCH v3 14/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-05-27 18:18         ` Jonathan Tan
  0 siblings, 0 replies; 245+ messages in thread
From: Jonathan Tan @ 2021-05-27 18:18 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> diff --git a/object-file.c b/object-file.c
> index 0de699de98..0e8a024eb3 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2522,7 +2522,8 @@ int read_loose_object(const char *path,
>  		      const struct object_id *expected_oid,
>  		      enum object_type *type,
>  		      unsigned long *size,
> -		      void **contents)
> +		      void **contents,
> +		      unsigned int oi_flags)
>  {
>  	int ret = -1;
>  	void *map = NULL;
> @@ -2530,6 +2531,7 @@ int read_loose_object(const char *path,
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
>  	struct object_info oi = OBJECT_INFO_INIT;
> +	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
>  	oi.typep = type;
>  	oi.sizep = size;
>  
> @@ -2552,8 +2554,11 @@ int read_loose_object(const char *path,
>  		git_inflate_end(&stream);
>  		goto out;
>  	}
> -	if (*type < 0)
> -		die(_("invalid object type"));
> +	if (!allow_unknown && *type < 0) {
> +		error(_("header for %s declares an unknown type"), path);
> +		git_inflate_end(&stream);
> +		goto out;
> +	}
>  
>  	if (*type == OBJ_BLOB && *size > big_file_threshold) {
>  		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)

So instead of dying, we print an error and behave as if the object was
invalid for some other reason. Makes sense.
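
A small sketch of the pattern: git's error() prints its message and
returns -1, so a check can report a problem and hand control back to
a caller that keeps going, instead of die()-ing. toy_error(),
toy_check_type() and toy_check_all() are hypothetical stand-ins, not
the real functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Minimal stand-in for error(): print "error: ..." and return -1. */
static int toy_error(const char *fmt, ...)
{
	va_list ap;

	fputs("error: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/* Report an unknown type instead of dying, unless it is explicitly allowed. */
static int toy_check_type(int type, int allow_unknown, const char *path)
{
	if (!allow_unknown && type < 0)
		return toy_error("header for %s declares an unknown type", path);
	return 0;
}

/* Check every object, remembering failures instead of stopping early. */
static int toy_check_all(const int *types, const char *const *paths, int n)
{
	int errors_found = 0;

	for (int i = 0; i < n; i++)
		if (toy_check_type(types[i], 0, paths[i]) < 0)
			errors_found = 1; /* keep checking other objects */
	return errors_found;
}
```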

^ permalink raw reply	[flat|nested] 245+ messages in thread

* Re: [PATCH v3 16/17] fsck: report invalid types recorded in objects
  2021-05-20 11:23       ` [PATCH v3 16/17] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-05-27 18:24         ` Jonathan Tan
  0 siblings, 0 replies; 245+ messages in thread
From: Jonathan Tan @ 2021-05-27 18:24 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> diff --git a/object-store.h b/object-store.h
> index 698a701d70..ba6e5d76c0 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
>  
>  /*
>   * Open the loose object at path, check its hash, and return the contents,
> + * use the "oi" argument to assert things about the object, or e.g. populate its
>   * type, and size. If the object is a blob, then "contents" may return NULL,
>   * to allow streaming of large blobs.
>   *
> @@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
>   */
>  int read_loose_object(const char *path,
>  		      const struct object_id *expected_oid,
> -		      enum object_type *type,
> -		      unsigned long *size,
>  		      void **contents,
> +		      struct object_info *oi,
>  		      unsigned int oi_flags);
>  
>  /*

What do you mean by using "oi" to assert things? As far as I know, it's
just a way to specify which information you want (by assigning pointers
to the appropriate fields).
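
A minimal sketch of that "request by pointer" convention, with
hypothetical names (toy_info, TOY_INFO_INIT, toy_lookup()) rather than
git's struct object_info: the caller points only the fields it wants
at its own variables, and the callee fills in whichever pointers are
non-NULL.

```c
#include <assert.h>
#include <stddef.h>

struct toy_info {
	int *typep;
	unsigned long *sizep;
};

#define TOY_INFO_INIT { NULL, NULL }

static void toy_lookup(struct toy_info *oi)
{
	/* Pretend we parsed "blob 5" from some object header. */
	if (oi->typep)
		*oi->typep = 3;
	if (oi->sizep)
		*oi->sizep = 5;
}
```

Starting from an initializer that sets every pointer to NULL matters
with this convention, since the callee writes through any pointer
that is not NULL.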

> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index e7e8decebb..bc541af2cf 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
>  	)
>  '
>  
> +test_expect_success 'object with hash and type mismatch' '
> +	test_create_repo hash-type-mismatch &&
> +	(
> +		cd hash-type-mismatch &&
> +		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
> +		old=$(test_oid_to_path "$oid") &&
> +		new=$(dirname $old)/$(test_oid ff_2) &&
> +		oid="$(dirname $new)$(basename $new)" &&
> +		mv .git/objects/$old .git/objects/$new &&
> +		git update-index --add --cacheinfo 100644 $oid foo &&
> +		tree=$(git write-tree) &&
> +		cmt=$(echo bogus | git commit-tree $tree) &&
> +		git update-ref refs/heads/bogus $cmt &&
> +		test_must_fail git fsck 2>out &&
> +		grep "^error: hash mismatch for " out &&
> +		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out

I don't think we need to know the precise quote used - it's simpler to
just put "." where we need the single quote. Same for the other case
below.


* Re: [PATCH v3 17/17] fsck: report invalid object type-path combinations
  2021-05-20 11:23       ` [PATCH v3 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-05-27 18:28         ` Jonathan Tan
  0 siblings, 0 replies; 245+ messages in thread
From: Jonathan Tan @ 2021-05-27 18:28 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> Improve the error that's emitted in cases where we find a loose object
> we parse, but which isn't at the location we expect it to be.

I'll hold off reviewing this for 2 reasons - firstly, I don't think it's
related to the other patches that are about reporting a wrong object
type, and secondly, in the case of repository corruption, I don't see
how useful computing (and reporting) the hash of a corrupt object is.

If someone else wants to take a look at this, that would be great, but
otherwise I would suggest splitting this into its own patch set.


* Re: [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-05-20 11:22       ` [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-05-27 21:17         ` Jonathan Nieder
  2021-05-28  3:10           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 245+ messages in thread
From: Jonathan Nieder @ 2021-05-27 21:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Johannes Sixt

Hi,

Ævar Arnfjörð Bjarmason wrote:

> Fix a blindspot in the tests added in the tests for the
> --allow-unknown-type feature, added in 39e4ae38804 (cat-file: teach
> cat-file a '--allow-unknown-type' option, 2015-05-03).
>
> Before this change all the tests would succeed if --allow-unknown-type
> was on by default, let's fix that by asserting that -t and -s die on a
> "garbage" type without --allow-unknown-type.

nit: "tests added in the tests" seems oddly repetitive.

More importantly, I'm curious about the desired behavior here.  The
idea behind cat-file --allow-unknown-type is that I can use it to
inspect an invalid object, for example after it has been reported by
git fsck.  The commit that introduced it (39e4ae3880, "cat-file: teach
cat-file a '--allow-unknown-type' option", 2015-05-03) gives the hint
"query broken/corrupt objects" in the documentation, so I figure
that's what it's for, and I'm sympathetic.

But: why is that an option instead of something that we always do?

In other words, is there some situation where I would not want the
more permissive behavior from cat-file against a bad object?

Thanks,
Jonathan


* Re: [PATCH v3 00/17] fsck: better "invalid object" error reporting
  2021-05-27 17:08       ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Jonathan Tan
@ 2021-05-28  0:18         ` Junio C Hamano
  2021-05-28  5:41           ` Felipe Contreras
  0 siblings, 1 reply; 245+ messages in thread
From: Junio C Hamano @ 2021-05-28  0:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: avarab, git, peff, j6t

Jonathan Tan <jonathantanmy@google.com> writes:

> My main comment as a reviewer is I think that there are a lot of
> unrelated changes in this patch set ...

Thanks for reviewing.  I share the same feeling, not specifically
about this series, but I find that "doing too many while-at-it
changes" is shared among the topics by the same author, and I often
wish each topic were more focused.



* Re: [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-05-27 21:17         ` Jonathan Nieder
@ 2021-05-28  3:10           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-28  3:10 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Junio C Hamano, Jeff King, Johannes Sixt


On Thu, May 27 2021, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> Fix a blindspot in the tests added in the tests for the
>> --allow-unknown-type feature, added in 39e4ae38804 (cat-file: teach
>> cat-file a '--allow-unknown-type' option, 2015-05-03).
>>
>> Before this change all the tests would succeed if --allow-unknown-type
>> was on by default, let's fix that by asserting that -t and -s die on a
>> "garbage" type without --allow-unknown-type.
>
> nit: "tests added in the tests" seems oddly repetitive.
>
> More importantly, I'm curious about the desired behavior here.  The
> idea behind cat-file --allow-unknown-type is that I can use it to
> inspect an invalid object, for example after it has been reported by
> git fsck.  The commit that introduced it (39e4ae3880, "cat-file: teach
> cat-file a '--allow-unknown-type' option", 2015-05-03) gives the hint
> "query broken/corrupt objects" in the documentation, so I figure
> that's what it's for, and I'm sympathetic.
>
> But: why is that an option instead of something that we always do?
>
> In other words, is there some situation where I would not want the
> more permissive behavior from cat-file against a bad object?

Yes. I suggested as much in
https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

For this series though I'm sticking to testing for the existing behavior
+ fixing the immediate fsck issues. I've got some local patches queued
up for after this topic lands (after I re-roll it, re-submit etc.) that
do that.


* Re: [PATCH v3 00/17] fsck: better "invalid object" error reporting
  2021-05-28  0:18         ` Junio C Hamano
@ 2021-05-28  5:41           ` Felipe Contreras
  0 siblings, 0 replies; 245+ messages in thread
From: Felipe Contreras @ 2021-05-28  5:41 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Tan; +Cc: avarab, git, peff, j6t

Junio C Hamano wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > My main comment as a reviewer is I think that there are a lot of
> > unrelated changes in this patch set ...
> 
> Thanks for reviewing.  I share the same feeling, not specifically
> about this series, but I find that "doing too many while-at-it
> changes" is shared among the topics by the same author, and I often
> wish each topic were more focused.

The author has a name.

I understand why as a reviewer you want a small patch series, but as a
patch-writer you want your code to land on master.

Perhaps if there was an actual incentive to split a patch series more
people would do so, but in my personal experience that has not been the
case.

If I had to name the reason why some of my patches have landed on master,
I would say it's *arbitrary*. Maybe you catch reviewers in a good mood, or
the maintainer in between release candidates. But regardless of the
actual reason, patch-series' size doesn't seem to be a huge factor.

As an exhibit, I can give two patch series of mine:

  1. https://lore.kernel.org/git/20201223144845.143039-1-felipe.contreras@gmail.com/
  2. https://lore.kernel.org/git/20210426161458.49860-1-felipe.contreras@gmail.com/

The first one is 4 patches. The second one is 43.

The second one received feedback from the maintainer. The first one was
completely ignored. Neither was accepted.

This is not intended to point fingers at anyone, merely to state a
mathematical fact.


Splitting a patch series is usually more work. If there's no real
incentive for a submitter to do so, why would she/he?

Cheers.

-- 
Felipe Contreras


* [PATCH v4 00/21] fsck: better "invalid object" error reporting
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (17 preceding siblings ...)
  2021-05-27 17:08       ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Jonathan Tan
@ 2021-06-24 19:23       ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                           ` (21 more replies)
  18 siblings, 22 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

A late re-roll of the v3[1] that's in "seen" and is causing a couple of
CI errors; both are fixed in this version. For a recap of what this
is about, see v3's summary[1].

One was a "mv -v" issue on OSX: it asks interactively and defaults to
"no". The tests now use "mv -f" instead (as other tests that move
.git/objects/* files directly do).

The other happened on one of the docker32 Linux boxes, and turned out
to be an uninitialized variable bug: the variable happened to hold a
negative value there, and zero (or a positive value) everywhere else.

I also went through all the commentary on v3 (particularly the good
feedback from Jonathan Tan) and hopefully addressed everything in one
way or another. In particular, 12/21 is new here, split off from
what's now 14/21; that change was the most complex one in the previous
series.

1. https://lore.kernel.org/git/cover-00.17-0000000000-20210520T111610Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (21):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  cache.h: move object functions to object-store.h
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: split up ternary in parse_loose_header()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: guard against future bugs in loose_object_info()
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 178 +++++++++++++++++++++--------------------
 object-store.h         |  62 +++++++++++---
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  10 ++-
 t/t1006-cat-file.sh    | 169 ++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 407 insertions(+), 139 deletions(-)

Range-diff against v3:
 1:  aa38b2bf9e7 =  1:  2e37971c016 fsck tests: refactor one test to use a sub-repo
 2:  82b64abd250 =  2:  79630a99433 fsck tests: add test for fsck-ing an unknown type
 3:  7c3c2fe25d9 =  3:  2b5366bfb9d cat-file tests: test for missing object with -t and -s
 4:  871b8200035 !  4:  ea9a5ef0920 cat-file tests: test that --allow-unknown-type isn't on by default
    @@ Metadata
      ## Commit message ##
         cat-file tests: test that --allow-unknown-type isn't on by default
     
    -    Fix a blindspot in the tests added in the tests for the
    -    --allow-unknown-type feature, added in 39e4ae38804 (cat-file: teach
    -    cat-file a '--allow-unknown-type' option, 2015-05-03).
    +    Fix a blindspot in the tests for the --allow-unknown-type feature
    +    added in 39e4ae38804 (cat-file: teach cat-file a
    +    '--allow-unknown-type' option, 2015-05-03). We should check that
    +    --allow-unknown-type isn't on by default.
     
         Before this change all the tests would succeed if --allow-unknown-type
         was on by default, let's fix that by asserting that -t and -s die on a
 5:  b98da9cc89e =  5:  8eaf0e6ddda rev-list tests: test for behavior with invalid object types
 6:  04cc1d20f62 !  6:  f0e9d92414e cat-file tests: add corrupt loose object test
    @@ t/t1006-cat-file.sh: test_expect_success "Size of large broken object is correct
     +		test_cmp out.expect out.actual &&
     +
     +		# Swap the two to corrupt the repository
    -+		mv -v "$other_path" "$empty_path" &&
    ++		mv -f "$other_path" "$empty_path" &&
     +		test_must_fail git fsck 2>err.fsck &&
     +		grep "hash mismatch" err.fsck &&
     +
    @@ t/t1006-cat-file.sh: test_expect_success "Size of large broken object is correct
     +
     +		# So far "cat-file" has been happy to spew the found
     +		# content out as-is. Try to make it zlib-invalid.
    -+		mv -v other.blob "$empty_path" &&
    ++		mv -f other.blob "$empty_path" &&
     +		test_must_fail git fsck 2>err.fsck &&
     +		grep "^error: inflate: data stream error (" err.fsck
     +	)
 7:  9217320888f =  7:  d797d2e8e9d cat-file tests: test for current --allow-unknown-type behavior
 8:  12dd4538794 =  8:  96310a0bb59 cache.h: move object functions to object-store.h
 -:  ----------- >  9:  54fb9189408 object-file.c: don't set "typep" when returning non-zero
 9:  6a5b78dcad8 = 10:  9d36fcbc44a object-file.c: make parse_loose_header_extended() public
10:  5d31d7e1a54 ! 11:  74c308adc19 object-file.c: add missing braces to loose_object_info()
    @@ object-file.c: static int loose_object_info(struct repository *r,
     +	}
      
      	munmap(map, mapsize);
    - 	if (status && oi->typep)
    + 	if (oi->sizep == &size_scratch)
11:  ee28089219f ! 12:  3f52149bfde object-file.c: stop dying in parse_loose_header()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    object-file.c: stop dying in parse_loose_header()
    +    object-file.c: simplify unpack_loose_short_header()
     
    -    Start the libification of parse_loose_header() by making it return
    -    error codes and data instead of invoking die() by itself. For now
    -    we'll move the relevant die() call to loose_object_info() and
    -    read_loose_object() to keep this change smaller, but in subsequent
    -    commits we'll also libify those.
    +    Combine the unpack_loose_short_header(),
    +    unpack_loose_header_to_strbuf() and unpack_loose_header() functions
    +    into one.
     
    -    The reason this makes sense is that with the refactoring of
    -    parse_loose_header_extended() in an earlier commit the public
    -    interface for parse_loose_header() no longer just accepts a "unsigned
    -    long *sizep". Rather it accepts a "struct object_info *", that
    -    structure will be populated with information about the object.
    -
    -    It thus makes sense to further libify the interface so that it stops
    -    calling die() when it encounters OBJ_BAD, and instead rely on its
    -    callers to check the populated "oi->typep".
    -
    -    This also allows us to simplify away the
    -    unpack_loose_header_to_strbuf() function added in
    +    The unpack_loose_header_to_strbuf() function was added in
         46f034483eb (sha1_file: support reading from a loose object of unknown
    -    type, 2015-05-03). Its code was mostly copy/pasted between it and both
    -    of unpack_loose_header() and unpack_loose_short_header(). We now have
    -    a single unpack_loose_header() function which accepts an optional
    +    type, 2015-05-03).
    +
    +    Its code was mostly copy/pasted between it and both of
    +    unpack_loose_header() and unpack_loose_short_header(). We now have a
    +    single unpack_loose_header() function which accepts an optional
         "struct strbuf *" instead.
     
         I think the remaining unpack_loose_header() function could be further
         simplified, we're carrying some complexity just to be able to emit a
         garbage type longer than MAX_HEADER_LEN, we could alternatively just
    -    say "we found a garbage type <first 32 bytes>..." instead, but let's
    -    leave this in place for now.
    +    say "we found a garbage type <first 32 bytes>..." instead. But let's
    +    leave the current behavior in place for now.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
      		return 0;
      
     +	/*
    -+	 * We have a header longer than MAX_HEADER_LEN. We abort early
    -+	 * unless under we're running as e.g. "cat-file
    ++	 * We have a header longer than MAX_HEADER_LEN. The "header"
    ++	 * here is only non-NULL when we run "cat-file
     +	 * --allow-unknown-type".
     +	 */
     +	if (!header)
    @@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
      	/*
      	 * buffer[0..bufsiz] was not large enough.  Copy the partial
      	 * result out to header, and then append the result of further
    -@@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
    -  * too permissive for what we want to check. So do an anal
    -  * object header parse by hand.
    -  */
    --int parse_loose_header(const char *hdr,
    --		       struct object_info *oi,
    --		       unsigned int flags)
    -+int parse_loose_header(const char *hdr, struct object_info *oi)
    - {
    - 	const char *type_buf = hdr;
    - 	unsigned long size;
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	type = type_from_string_gently(type_buf, type_len, 1);
    - 	if (oi->type_name)
    - 		strbuf_add(oi->type_name, type_buf, type_len);
    --	/*
    --	 * Set type to 0 if its an unknown object and
    --	 * we're obtaining the type using '--allow-unknown-type'
    --	 * option.
    --	 */
    --	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
    --		type = 0;
    --	else if (type < 0)
    --		die(_("invalid object type"));
    - 	if (oi->typep)
    - 		*oi->typep = type;
    - 
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	/*
    - 	 * The length must be followed by a zero byte
    - 	 */
    --	return *hdr ? -1 : type;
    -+	if (*hdr)
    -+		return -1;
    -+
    -+	/*
    -+	 * The format is valid, but the type may still be bogus. The
    -+	 * Caller needs to check its oi->typep.
    -+	 */
    -+	return 0;
    - }
    - 
    - static int loose_object_info(struct repository *r,
     @@ object-file.c: static int loose_object_info(struct repository *r,
      	unsigned long mapsize;
      	void *map;
    @@ object-file.c: static int loose_object_info(struct repository *r,
      	char hdr[MAX_HEADER_LEN];
      	struct strbuf hdrbuf = STRBUF_INIT;
      	unsigned long size_scratch;
    -+	enum object_type type_scratch;
     +	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
      
      	if (oi->delta_base_oid)
      		oidclr(oi->delta_base_oid);
     @@ object-file.c: static int loose_object_info(struct repository *r,
      
    - 	if (!oi->sizep)
    - 		oi->sizep = &size_scratch;
    -+	if (!oi->typep)
    -+		oi->typep = &type_scratch;
    - 
      	if (oi->disk_sizep)
      		*oi->disk_sizep = mapsize;
     -	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
    @@ object-file.c: static int loose_object_info(struct repository *r,
      		status = error(_("unable to unpack %s header"),
      			       oid_to_hex(oid));
      	}
    --
    --	if (status < 0) {
    --		/* Do nothing */
    --	} else if (hdrbuf.len) {
    --		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
    --			status = error(_("unable to parse %s header with --allow-unknown-type"),
    --				       oid_to_hex(oid));
    --	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
    -+	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
    - 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    - 	}
    -+	if (!allow_unknown && *oi->typep < 0)
    -+		die(_("invalid object type"));
    - 
    - 	if (status >= 0 && oi->contentp) {
    - 		*oi->contentp = unpack_loose_rest(&stream, hdr,
    -@@ object-file.c: static int loose_object_info(struct repository *r,
    - 		*oi->typep = status;
    - 	if (oi->sizep == &size_scratch)
    - 		oi->sizep = NULL;
    --	strbuf_release(&hdrbuf);
    -+	if (oi->typep == &type_scratch)
    -+		oi->typep = NULL;
    - 	oi->whence = OI_LOOSE;
    - 	return (status < 0) ? status : 0;
    - }
    -@@ object-file.c: int read_loose_object(const char *path,
    - 	git_zstream stream;
    - 	char hdr[MAX_HEADER_LEN];
    - 	struct object_info oi = OBJECT_INFO_INIT;
    -+	oi.typep = type;
    - 	oi.sizep = size;
    - 
    - 	*contents = NULL;
     @@ object-file.c: int read_loose_object(const char *path,
      		goto out;
      	}
    @@ object-file.c: int read_loose_object(const char *path,
      		error(_("unable to unpack header of %s"), path);
      		goto out;
      	}
    - 
    --	*type = parse_loose_header(hdr, &oi, 0);
    --	if (*type < 0) {
    -+	if (parse_loose_header(hdr, &oi) < 0) {
    - 		error(_("unable to parse header of %s"), path);
    - 		git_inflate_end(&stream);
    - 		goto out;
    - 	}
    -+	if (*type < 0)
    -+		die(_("invalid object type"));
    - 
    - 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
    - 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
     
      ## object-store.h ##
     @@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
    @@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
      int unpack_loose_header(git_zstream *stream, unsigned char *map,
      			unsigned long mapsize, void *buffer,
     -			unsigned long bufsiz);
    --int parse_loose_header(const char *hdr, struct object_info *oi,
    --		       unsigned int flags);
     +			unsigned long bufsiz, struct strbuf *hdrbuf);
    -+
    -+/**
    -+ * parse_loose_header() parses the starting "<type> <len>\0" of an
    -+ * object. If it doesn't follow that format -1 is returned. To check
    -+ * the validity of the <type> populate the "typep" in the "struct
    -+ * object_info". It will be OBJ_BAD if the object type is unknown.
    -+ */
    -+int parse_loose_header(const char *hdr, struct object_info *oi);
    -+
    + int parse_loose_header(const char *hdr, struct object_info *oi,
    + 		       unsigned int flags);
      int check_object_signature(struct repository *r, const struct object_id *oid,
    - 			   void *buf, unsigned long size, const char *type);
    - int finalize_object_file(const char *tmpfile, const char *filename);
     
      ## streaming.c ##
    -@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
    - {
    - 	struct object_info oi = OBJECT_INFO_INIT;
    - 	oi.sizep = &st->size;
    -+	oi.typep = type;
    - 
    - 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
    - 	if (!st->u.loose.mapped)
     @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
      				 st->u.loose.mapped,
      				 st->u.loose.mapsize,
      				 st->u.loose.hdr,
     -				 sizeof(st->u.loose.hdr)) < 0) ||
    --	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
     +				 sizeof(st->u.loose.hdr),
     +				 NULL) < 0) ||
    -+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
    -+	    *type < 0) {
    + 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
      		git_inflate_end(&st->z);
      		munmap(st->u.loose.mapped, st->u.loose.mapsize);
    - 		return -1;
 -:  ----------- > 13:  ba632be1520 object-file.c: split up ternary in parse_loose_header()
 -:  ----------- > 14:  ea4f446f5b1 object-file.c: stop dying in parse_loose_header()
 -:  ----------- > 15:  aacef784eab object-file.c: guard against future bugs in loose_object_info()
13:  d22d5b8b85e = 16:  050cfc7808c object-file.c: return -1, not "status" from unpack_loose_header()
12:  77f2cd439c6 ! 17:  78e3152fd94 object-file.c: return -2 on "header too long" in unpack_loose_header()
    @@ Commit message
         MAX_HEADER_LEN limit, or other negative values for "unable to unpack
         <OID> header".
     
    +    I tried setting up an enum just for these three return values, but I
    +    think the result was less readable. Let's consider doing that if we
    +    gain even more return values. For now let's do the next best thing and
    +    enumerate our known return values, and BUG() if we encounter one we
    +    don't know about.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## object-file.c ##
     @@ object-file.c: int unpack_loose_header(git_zstream *stream,
    - 	/*
    - 	 * We have a header longer than MAX_HEADER_LEN. We abort early
    - 	 * unless under we're running as e.g. "cat-file
    --	 * --allow-unknown-type".
    -+	 * --allow-unknown-type". A -2 is "header too long"
    + 	 * --allow-unknown-type".
      	 */
      	if (!header)
     -		return -1;
    @@ object-file.c: int unpack_loose_header(git_zstream *stream,
      
      static void *unpack_loose_rest(git_zstream *stream,
     @@ object-file.c: static int loose_object_info(struct repository *r,
    + 
      	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
      				      allow_unknown ? &hdrbuf : NULL);
    - 	if (hdr_ret < 0) {
    --		status = error(_("unable to unpack %s header"),
    --			       oid_to_hex(oid));
    -+		if (hdr_ret == -2)
    -+			status = error(_("header for %s too long, exceeds %d bytes"),
    -+				       oid_to_hex(oid), MAX_HEADER_LEN);
    -+		else
    -+			status = error(_("unable to unpack %s header"),
    -+				       oid_to_hex(oid));
    - 	}
    -+
    - 	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
    - 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    +-	if (hdr_ret < 0) {
    ++	switch (hdr_ret) {
    ++	case 0:
    ++		break;
    ++	case -1:
    + 		status = error(_("unable to unpack %s header"),
    + 			       oid_to_hex(oid));
    ++		break;
    ++	case -2:
    ++		status = error(_("header for %s too long, exceeds %d bytes"),
    ++			       oid_to_hex(oid), MAX_HEADER_LEN);
    ++		break;
    ++	default:
    ++		BUG("unknown hdr_ret value %d", hdr_ret);
      	}
    + 	if (!status) {
    + 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
     
      ## object-store.h ##
     @@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
14:  260e9888a3e = 18:  f9bb1b799ac fsck: don't hard die on invalid object types
15:  e2afb813b28 = 19:  acbea7e2a2a object-store.h: move read_loose_object() below 'struct object_info'
16:  328f05c51b3 = 20:  edc28de229d fsck: report invalid types recorded in objects
17:  c5e6686765d ! 21:  e588c05f461 fsck: report invalid object type-path combinations
    @@ pack-check.c: static int verify_packfile(struct repository *r,
      ## t/t1006-cat-file.sh ##
     @@ t/t1006-cat-file.sh: test_expect_success 'cat-file -t and -s on corrupt loose object' '
      		# Swap the two to corrupt the repository
    - 		mv -v "$other_path" "$empty_path" &&
    + 		mv -f "$other_path" "$empty_path" &&
      		test_must_fail git fsck 2>err.fsck &&
     -		grep "hash mismatch" err.fsck &&
     +		grep "hash-path mismatch" err.fsck &&
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 22:00           ` Andrei Rybak
  2021-06-24 19:23         ` [PATCH v4 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                           ` (20 subsequent siblings)
  21 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose and easier to debug, as a failing
test can have its state left behind under -d without damaging the
state for other tests.

But let's punt on that general refactoring and just change this one
test; I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..1563b35f88c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	test_create_repo hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 02/21] fsck tests: add test for fsck-ing an unknown type
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
                           ` (19 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which will be done in subsequent
patches, but for now let's test for the current behavior.
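
For context, here is a minimal sketch of the kind of type-name lookup such a check depends on. It assumes git's numbering, where OBJ_BAD is -1 and commit, tree, blob and tag are 1 to 4. The names `sketch_object_type` and `sketch_type_from_string()` are hypothetical illustrations, not git's API. A loose object written with `hash-object --literally -t garbage` carries a type name that maps to no known type:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of git's object type numbering (OBJ_BAD is -1). */
enum sketch_object_type {
	SKETCH_OBJ_BAD = -1,
	SKETCH_OBJ_COMMIT = 1,
	SKETCH_OBJ_TREE = 2,
	SKETCH_OBJ_BLOB = 3,
	SKETCH_OBJ_TAG = 4
};

/*
 * Map the first len bytes of name to a known type, or to
 * SKETCH_OBJ_BAD for anything else (e.g. "garbage").
 */
static enum sketch_object_type sketch_type_from_string(const char *name,
							size_t len)
{
	static const struct {
		const char *name;
		enum sketch_object_type type;
	} known[] = {
		{ "commit", SKETCH_OBJ_COMMIT },
		{ "tree", SKETCH_OBJ_TREE },
		{ "blob", SKETCH_OBJ_BLOB },
		{ "tag", SKETCH_OBJ_TAG },
	};
	size_t i;

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (strlen(known[i].name) == len &&
		    !memcmp(known[i].name, name, len))
			return known[i].type;
	return SKETCH_OBJ_BAD;
}
```

Under this sketch, a type read back as "garbage" yields SKETCH_OBJ_BAD. A caller that has not opted in to unknown types would have to treat that as a hard error.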

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1563b35f88c..f36ec1e2f4a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	test_create_repo garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_done
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 03/21] cat-file tests: test for missing object with -t and -s
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
                           ` (18 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Test what happens when the -t and -s flags are asked to operate on
a missing object. This extends tests added in 3e370f9faf0 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5d2dc99b74a..b71ef94329e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,33 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
+missing_oid=$(test_oid deadbeef)
+test_expect_success 'error on type of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -t $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
+test_expect_success 'error on size of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -s $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
 bogus_type="bogus"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 04/21] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (2 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
                           ` (17 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for the --allow-unknown-type feature
added in 39e4ae38804 (cat-file: teach cat-file a
'--allow-unknown-type' option, 2015-05-03). We should check that
--allow-unknown-type isn't on by default.

Before this change all the tests would have succeeded even if
--allow-unknown-type were on by default. Let's fix that by asserting
that -t and -s die on a "garbage" type without --allow-unknown-type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index b71ef94329e..dc01d7c4a9a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -347,6 +347,20 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -363,6 +377,21 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-EOF &&
+	error: unable to unpack $bogus_sha1 header
+	fatal: git cat-file: could not get object info
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 05/21] rev-list tests: test for behavior with invalid object types
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (3 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                           ` (16 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for the "rev-list --disk-usage" feature
added in 16950f8384a (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) by testing what happens when it's
asked to calculate the disk usage of invalid object types.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6115-rev-list-du.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..edb2ed55846 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,15 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+test_expect_success 'setup garbage repository' '
+	git clone --bare . garbage.git &&
+	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
+	git -C garbage.git rev-list --objects --all --disk-usage &&
+
+	# Manually create a ref because "update-ref", "tag" etc. have
+	# no corresponding --literally option.
+	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
+	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
+'
+
 test_done
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 06/21] cat-file tests: add corrupt loose object test
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (4 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                           ` (15 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for "cat-file" (and, by proxy, the guts
of object-file.c) by testing that when we can't decode a loose object
with zlib, we emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index dc01d7c4a9a..7f10a92f0e4 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -404,6 +404,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 07/21] cat-file tests: test for current --allow-unknown-type behavior
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (5 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                           ` (14 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7f10a92f0e4..86fd2a90ca7 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -361,6 +361,46 @@ test_expect_success 'die on broken object under -t and -s without --allow-unknow
 	test_must_be_empty out.actual
 '
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -372,6 +412,27 @@ test_expect_success "Size of broken object is correct" '
 	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
 	test_cmp expect actual
 '
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual
+'
+
 bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 08/21] cache.h: move object functions to object-store.h
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (6 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
                           ` (13 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index ba04ff8bd36..32ea1ea0474 100644
--- a/cache.h
+++ b/cache.h
@@ -1302,16 +1302,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb5..9117115a50c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 09/21] object-file.c: don't set "typep" when returning non-zero
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (7 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                           ` (12 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error, stop faking
up "oi->typep" as OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having carefully read the code paths involved, I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(), which will in turn go on to return -1
when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type",
which in the case of -1 would be OBJ_BAD.

Having read the code for all the callers of these functions, I don't
believe any such bug is being introduced here. In any case, we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
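
The contract this commit relies on can be sketched in isolation: callers consult the return value before reading any out-parameter. In the sketch, `struct toy_info` and `toy_object_info()` are hypothetical stand-ins for `struct object_info` and loose_object_info(), and the single "known" header is an arbitrary assumption. The only point is that `*typep` and `*sizep` are written on success and left untouched on failure:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's struct object_info. */
struct toy_info {
	int *typep;           /* optional, written only on success */
	unsigned long *sizep; /* optional, written only on success */
};

/*
 * Toy lookup that only "finds" the header "blob 6". Returns 0 on
 * success and -1 on error. On error it does not write through
 * oi->typep or oi->sizep (no faking up of OBJ_BAD).
 */
static int toy_object_info(const char *hdr, struct toy_info *oi)
{
	if (strcmp(hdr, "blob 6") != 0)
		return -1;
	if (oi->typep)
		*oi->typep = 3; /* OBJ_BLOB in git's numbering */
	if (oi->sizep)
		*oi->sizep = 6;
	return 0;
}

/* Caller pattern: check the return value before reading *typep. */
static int toy_type_or(const char *hdr, int fallback)
{
	int type;
	struct toy_info oi = { &type, NULL };

	if (toy_object_info(hdr, &oi) < 0)
		return fallback; /* "type" was never written; don't read it */
	return type;
}
```

A caller written this way never depends on what the callee leaves in "typep" after an error. That independence is what makes dropping the OBJ_BAD assignment safe.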

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index f233b440b22..9210e2e6fe4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1480,8 +1480,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 10/21] object-file.c: make parse_loose_header_extended() public
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (8 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
                           ` (11 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public under the
parse_loose_header() name, removing the old parse_loose_header()
wrapper. The only direct user of it outside of object-file.c itself
was in streaming.c; that caller can simply pass the required
"struct object_info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change,
since we could simply use parse_loose_header_extended() there, but it
will leave the API in a better end state.
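
The following is a minimal sketch of a header parse with the shape this patch settles on: a "<type> <size>" header, an object_info-style struct of optional out-parameters, and a flags word. The names `sketch_parse_loose_header()`, `struct sketch_oi` and `SKETCH_ALLOW_UNKNOWN_TYPE` are hypothetical. This is not git's implementation. It assumes git's type numbering, and when the flag is unset it simply returns -1 for an unknown type rather than modelling what git does at that point:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit, analogous to OBJECT_INFO_ALLOW_UNKNOWN_TYPE. */
#define SKETCH_ALLOW_UNKNOWN_TYPE 1u

/* Hypothetical analogue of struct object_info: optional out-params. */
struct sketch_oi {
	int *typep;
	unsigned long *sizep;
};

/* Known names map to git's numbering (commit=1 .. tag=4); -1 is OBJ_BAD. */
static int sketch_type_from_name(const char *name, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };
	size_t i;

	for (i = 0; i < 4; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (int)i + 1;
	return -1;
}

/*
 * Parse a NUL-terminated "<type> <size>" loose object header.
 * Returns the type (0 for an unknown type permitted by flags) or -1.
 * Out-params are written only on success.
 */
static int sketch_parse_loose_header(const char *hdr, struct sketch_oi *oi,
				     unsigned int flags)
{
	const char *type_buf = hdr;
	unsigned long size;
	int type;

	/* The type name runs up to the first space. */
	while (*hdr && *hdr != ' ')
		hdr++;
	if (*hdr != ' ')
		return -1;
	type = sketch_type_from_name(type_buf, (size_t)(hdr - type_buf));
	hdr++;
	if (type < 0) {
		if (!(flags & SKETCH_ALLOW_UNKNOWN_TYPE))
			return -1;
		type = 0;
	}

	/* The size must be canonical decimal: "010" is rejected. */
	size = (unsigned long)(*hdr++ - '0');
	if (size > 9)
		return -1;
	if (size) {
		for (;;) {
			unsigned long c = (unsigned long)(*hdr - '0');
			if (c > 9)
				break;
			hdr++;
			size = size * 10 + c;
		}
	}

	/* Nothing may follow the size. */
	if (*hdr)
		return -1;
	if (oi->typep)
		*oi->typep = type;
	if (oi->sizep)
		*oi->sizep = size;
	return type;
}
```

Passing the struct plus flags means a caller like read_loose_object() only fills in the out-parameters it cares about. Permitting unknown types then becomes a flag bit rather than a separate wrapper function.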

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 21 ++++++++-------------
 object-store.h |  3 ++-
 streaming.c    |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/object-file.c b/object-file.c
index 9210e2e6fe4..e0ba1842272 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1340,8 +1340,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1401,14 +1402,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1463,10 +1456,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2547,6 +2540,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2561,7 +2556,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50c..d443964447c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -480,7 +480,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 11/21] object-file.c: add missing braces to loose_object_info()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (9 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
                           ` (10 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Change the formatting in loose_object_info() to conform with our usual
coding style:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency -- Documentation/CodingGuidelines

This formatting-only change makes a subsequent commit easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index e0ba1842272..646ca7f85d6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1450,17 +1450,20 @@ static int loose_object_info(struct repository *r,
 		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1469,8 +1472,9 @@ static int loose_object_info(struct repository *r,
 			git_inflate_end(&stream);
 			status = -1;
 		}
-	} else
+	} else {
 		git_inflate_end(&stream);
+	}
 
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 12/21] object-file.c: simplify unpack_loose_short_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (10 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
                           ` (9 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.
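
To illustrate the "optional out-parameter" shape of the merged
function, here is a minimal, self-contained sketch. It is not git's
code: it assumes a plain in-memory buffer instead of a zlib stream,
and toy_read_header() is a hypothetical name. A NULL "full" means
"fail if the header doesn't fit", mirroring how a NULL "struct strbuf *"
is used by the merged unpack_loose_header().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch, not git's code: copy the NUL-terminated header
 * at the start of "src" into the fixed-size "buf". If the terminator
 * isn't within "bufsiz" bytes, fail with -1 unless the caller passed a
 * non-NULL "full", in which case the whole header is returned in a
 * malloc()'d string that the caller must free().
 */
static int toy_read_header(const char *src, size_t srclen,
			   char *buf, size_t bufsiz, char **full)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	const char *nul;

	assert(src && buf && bufsiz > 0);
	memcpy(buf, src, n);
	if (memchr(buf, '\0', n))
		return 0; /* the whole header fit into "buf" */

	if (!full)
		return -1; /* too long, and the caller didn't ask for the rest */

	nul = memchr(src, '\0', srclen);
	if (!nul)
		return -1; /* no terminator at all: malformed */
	*full = malloc((size_t)(nul - src) + 1);
	if (!*full)
		return -1;
	memcpy(*full, src, (size_t)(nul - src) + 1);
	return 0;
}
```

Callers that only care about well-formed objects pass NULL; only a
--allow-unknown-type style caller would pass a pointer and free() the
result afterwards.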

I think the remaining unpack_loose_header() function could be further
simplified: we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN, and we could instead just
say "we found a garbage type <first 32 bytes>...". But let's
leave the current behavior in place for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 60 ++++++++++++++++++--------------------------------
 object-store.h | 14 +++++++++++-
 streaming.c    |  3 ++-
 3 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/object-file.c b/object-file.c
index 646ca7f85d6..ef3a1517fed 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1210,11 +1210,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1225,44 +1226,25 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
 	if (status < Z_OK)
 		return status;
 
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
-	if (status < Z_OK)
-		return -1;
-
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1410,9 +1392,11 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
+	int hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1446,11 +1430,10 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+
+	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				      allow_unknown ? &hdrbuf : NULL);
+	if (hdr_ret < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
@@ -2555,7 +2538,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/object-store.h b/object-store.h
index d443964447c..31327a7f6c3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,9 +477,21 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz);
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 13/21] object-file.c: split up ternary in parse_loose_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (11 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
                           ` (8 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

This minor formatting change serves to make a subsequent patch easier
to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index ef3a1517fed..e51cf2ca33e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1381,7 +1381,10 @@ int parse_loose_header(const char *hdr,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	return type;
 }
 
 static int loose_object_info(struct repository *r,
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 14/21] object-file.c: stop dying in parse_loose_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (12 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
                           ` (7 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Start the libification of parse_loose_header() by making it return
error codes and data instead of invoking die() by itself. For now
we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller, but in subsequent
commits we'll also libify those.

Since the refactoring of parse_loose_header_extended() into
parse_loose_header() in an earlier commit, its interface no longer
accepts an "unsigned long *sizep". Rather, it accepts a "struct
object_info *", and that structure will be populated with information
about the object.

It thus makes sense to further libify the interface so that it stops
calling die() when it encounters OBJ_BAD, and instead relies on its
callers to check the populated "oi->typep".

Because of this we no longer need to pass in the "unsigned int flags"
we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can instead do that
check in loose_object_info().
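
A simplified sketch of the contract described above, assuming
hypothetical toy_* names and only one known type: the parser reports
format errors through its return value and leaves the judgment about an
unknown type to the caller via the out-pointer. Unlike git's parser it
doesn't reject leading zeros or check the length for overflow.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for git's enum object_type. */
enum toy_type { TOY_BAD = -1, TOY_BLOB = 3 };

struct toy_info {
	enum toy_type *typep;   /* optional: filled in if non-NULL */
	unsigned long *sizep;   /* optional: filled in if non-NULL */
};

static enum toy_type toy_type_from_string(const char *s, size_t len)
{
	if (len == 4 && !memcmp(s, "blob", 4))
		return TOY_BLOB;
	return TOY_BAD; /* unknown type, but not a format error */
}

/*
 * Parse "<type> <len>\0". Returns -1 only if the format is wrong; an
 * unknown <type> is reported via *oi->typep == TOY_BAD, never by dying.
 */
static int toy_parse_header(const char *hdr, struct toy_info *oi)
{
	const char *type_buf = hdr;
	unsigned long size = 0;

	assert(oi);
	while (*hdr && *hdr != ' ')
		hdr++;
	if (*hdr != ' ' || hdr == type_buf)
		return -1;
	if (oi->typep)
		*oi->typep = toy_type_from_string(type_buf, hdr - type_buf);
	hdr++;

	if (*hdr < '0' || *hdr > '9')
		return -1;
	while (*hdr >= '0' && *hdr <= '9')
		size = size * 10 + (*hdr++ - '0');
	if (oi->sizep)
		*oi->sizep = size;

	/* The length must be followed by the terminating NUL. */
	return *hdr ? -1 : 0;
}
```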

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

In another case, added in c84a1f3ed4d (sha1_file: refactor read_object,
2017-06-21) (though the behavior predates that), we checked
"status >= 0", because at that point "status" had become the return
value of parse_loose_header(), i.e. a non-negative "enum object_type"
(unless it returned -1, a.k.a. OBJ_BAD).

Now that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.
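
A minimal sketch of the resulting caller-side pattern, with a stub
parser and hypothetical toy_* names. Where loose_object_info() in this
patch still calls die() on an unknown type, the sketch returns -1
instead so that both outcomes can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum toy_type { TOY_BAD = -1, TOY_BLOB = 3 };

/* Stub parser: 0/-1 describes the format; the type is reported separately. */
static int toy_parse(const char *hdr, enum toy_type *typep)
{
	if (!strchr(hdr, ' '))
		return -1;
	*typep = strncmp(hdr, "blob ", 5) ? TOY_BAD : TOY_BLOB;
	return 0;
}

/*
 * Caller: the return value only ever means success (0) or failure (<0),
 * and the decision about unknown types is made here, not in the parser.
 */
static int toy_object_info(const char *hdr, int allow_unknown,
			   enum toy_type *typep)
{
	assert(hdr && typep);
	if (toy_parse(hdr, typep) < 0) {
		fprintf(stderr, "error: unable to parse header\n");
		return -1;
	}
	if (!allow_unknown && *typep < 0) {
		fprintf(stderr, "error: invalid object type\n");
		return -1;
	}
	return 0;
}
```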

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 53 ++++++++++++++++++++++++++------------------------
 object-store.h | 13 +++++++++++--
 streaming.c    |  4 +++-
 3 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/object-file.c b/object-file.c
index e51cf2ca33e..31263335af9 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1322,9 +1322,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr,
-		       struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1346,15 +1344,6 @@ int parse_loose_header(const char *hdr,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1384,7 +1373,11 @@ int parse_loose_header(const char *hdr,
 	if (*hdr)
 		return -1;
 
-	return type;
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1399,6 +1392,8 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
+	int parsed_header = 0;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1430,6 +1425,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1440,18 +1437,20 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
-
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	if (!status) {
+		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
+			/*
+			 * oi->{sizep,typep} are meaningless unless
+			 * parse_loose_header() returns >= 0.
+			 */
+			parsed_header = 1;
+		else
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
+	if (!allow_unknown && parsed_header && *oi->typep < 0)
+		die(_("invalid object type"));
 
-	if (status >= 0 && oi->contentp) {
+	if (parsed_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
 						  *oi->sizep, oid);
 		if (!*oi->contentp) {
@@ -1466,6 +1465,8 @@ static int loose_object_info(struct repository *r,
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
 	return (status < 0) ? status : 0;
 }
@@ -2531,6 +2532,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2547,12 +2549,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 31327a7f6c3..65a8e4dc6a8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -492,8 +492,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz, struct strbuf *hdrbuf);
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
+int parse_loose_header(const char *hdr, struct object_info *oi);
+
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..c3dc241d6a5 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -235,7 +236,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr),
 				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
+	    *type < 0) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 15/21] object-file.c: guard against future bugs in loose_object_info()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (13 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                           ` (6 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

An earlier version of the preceding commit had a subtle bug where our
"type_scratch" (later assigned to "oi->typep") would be uninitialized
and used in the "!allow_unknown" case, at which point it would contain
a nonsensical value if we'd failed to call parse_loose_header().

The preceding commit introduced a "parsed_header" variable to check for
this case, but I think we can do better. Let's carry an "oi_header"
variable initially set to NULL, and only set it to "oi" once we're
past parse_loose_header().

This is functionally the same thing, but hopefully makes it even more
obvious in the future that we must not access the "typep" and
"sizep" (or "type_name") fields unless parse_loose_header() succeeds,
whereas accessing fields set before that point (such as "disk_sizep")
is OK.
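
A reduced sketch of the "oi_header" idiom, using hypothetical toy_*
names and a pretend parser: the alias is NULL on every failure path, so
any dereference of the parsed fields has to be guarded by it, while
fields filled in before parsing can be read unconditionally.

```c
#include <assert.h>
#include <stddef.h>

struct toy_info {
	unsigned long *sizep;      /* only valid after a successful parse */
	unsigned long *disk_sizep; /* set before parsing, always valid */
};

/* Pretend parser: succeeds (0) and fills *sizep only when "ok" is set. */
static int toy_parse(int ok, struct toy_info *oi)
{
	if (!ok)
		return -1;
	*oi->sizep = 42;
	return 0;
}

static int toy_object_info(int parse_ok, struct toy_info *oi,
			   unsigned long *out)
{
	struct toy_info *oi_header = NULL; /* alias for "oi" once parsed */

	assert(oi && out);
	*oi->disk_sizep = 100; /* known before parsing: safe to read later */
	if (!toy_parse(parse_ok, oi))
		oi_header = oi;

	if (!oi_header)
		return -1;
	*out = *oi_header->sizep; /* only reachable after a successful parse */
	return 0;
}
```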

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 31263335af9..d41f444e6cc 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1393,7 +1393,7 @@ static int loose_object_info(struct repository *r,
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
 	enum object_type type_scratch;
-	int parsed_header = 0;
+	struct object_info *oi_header = NULL;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1441,18 +1441,20 @@ static int loose_object_info(struct repository *r,
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
 			/*
 			 * oi->{sizep,typep} are meaningless unless
-			 * parse_loose_header() returns >= 0.
+			 * parse_loose_header() returns >= 0. Let's
+			 * access them as "oi_header" (just an alias
+			 * for "oi") below to make that intent clear.
 			 */
-			parsed_header = 1;
+			oi_header = oi;
 		else
 			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
-	if (!allow_unknown && parsed_header && *oi->typep < 0)
+	if (!allow_unknown && oi_header && *oi_header->typep < 0)
 		die(_("invalid object type"));
 
-	if (parsed_header && oi->contentp) {
+	if (oi_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
+						  *oi_header->sizep, oid);
 		if (!*oi->contentp) {
 			git_inflate_end(&stream);
 			status = -1;
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 16/21] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (14 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
                           ` (5 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index d41f444e6cc..956ca260518 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1229,7 +1229,7 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (15 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                           ` (4 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows,
we already correctly emit zlib errors from zlib.c in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return -2 to say we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".

I tried setting up an enum just for these three return values, but I
think the result was less readable. Let's consider doing that if we
gain even more return values. For now let's do the next best thing and
enumerate our known return values, and BUG() if we encounter one we
don't know about.
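
A standalone sketch of the "enumerate the known values, BUG() on the
rest" approach, assuming hypothetical names. git's BUG() is
approximated here with abort(), and TOY_MAX_HEADER_LEN mirrors the
32-byte limit that the updated t1006 test expects.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_MAX_HEADER_LEN 32

/* Map each known return value to its own message; anything else is a bug. */
static int toy_report(int hdr_ret, const char *oid_hex)
{
	assert(oid_hex);
	switch (hdr_ret) {
	case 0:
		return 0;
	case -1:
		fprintf(stderr, "error: unable to unpack %s header\n", oid_hex);
		return -1;
	case -2:
		fprintf(stderr, "error: header for %s too long, exceeds %d bytes\n",
			oid_hex, TOY_MAX_HEADER_LEN);
		return -1;
	default:
		/* Stand-in for BUG(): an unexpected value is a programming error. */
		fprintf(stderr, "BUG: unknown hdr_ret value %d\n", hdr_ret);
		abort();
	}
}
```

The design choice is the one the commit message describes: without an
enum, the default case is what keeps a future new return value from
being silently treated as one of the existing ones.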

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c       | 16 +++++++++++++---
 object-store.h      |  6 ++++--
 t/t1006-cat-file.sh |  2 +-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 956ca260518..1866115a1c5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1243,7 +1243,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return -2;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1264,7 +1264,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return -2;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1433,9 +1433,19 @@ static int loose_object_info(struct repository *r,
 
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
-	if (hdr_ret < 0) {
+	switch (hdr_ret) {
+	case 0:
+		break;
+	case -1:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
+		break;
+	case -2:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
+	default:
+		BUG("unknown hdr_ret value %d", hdr_ret);
 	}
 	if (!status) {
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index 65a8e4dc6a8..1151ce8e820 100644
--- a/object-store.h
+++ b/object-store.h
@@ -481,13 +481,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns 0 on success. Returns negative values on error. If the
+ * header exceeds MAX_HEADER_LEN -2 will be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), -2 will still be returned from this
+ * function to indicate that the header was too long.
  */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 86fd2a90ca7..06d38e1fae6 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
 
 test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
 	cat >err.expect <<-EOF &&
-	error: unable to unpack $bogus_sha1 header
+	error: header for $bogus_sha1 too long, exceeds 32 bytes
 	fatal: git cat-file: could not get object info
 	EOF
 
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 18/21] fsck: don't hard die on invalid object types
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (16 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                           ` (3 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
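
The change in behavior amounts to the usual "report and continue"
loop, sketched here with hypothetical names and a toy list of type
names rather than real objects: each bad object is reported, the
traversal keeps going, and the final result is still non-zero.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-object check: 0 for a known type, -1 otherwise. */
static int toy_check_object(const char *type)
{
	static const char *const known[] = { "blob", "tree", "commit", "tag" };

	for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (!strcmp(type, known[i]))
			return 0;
	return -1;
}

/* Check every object, report each failure, and keep going. */
static int toy_fsck(const char **types, int n, int *nr_errors)
{
	int errors_found = 0;

	assert(types && nr_errors);
	*nr_errors = 0;
	for (int i = 0; i < n; i++) {
		if (toy_check_object(types[i]) < 0) {
			fprintf(stderr, "error: object %d: unknown type '%s'\n",
				i, types[i]);
			(*nr_errors)++;
			errors_found = 1; /* remember, but don't stop */
		}
	}
	return errors_found;
}
```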

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it. See f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for the introduction of read_loose_object().

Why are we complaining about a "hash mismatch" for an object of a type
we don't know about? We shouldn't. This is the bare minimal change
needed to not make fsck hard die on a repository that's been corrupted
in this manner. In subsequent commits we'll teach fsck to recognize
this particular type of corruption and emit a better error message.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 11 ++++++++---
 object-store.h  |  3 ++-
 t/t1450-fsck.sh | 14 +++++++-------
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..082dadd5629 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 1866115a1c5..8fb55fc6f58 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2536,7 +2536,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
@@ -2544,6 +2545,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	oi.typep = type;
 	oi.sizep = size;
 
@@ -2566,8 +2568,11 @@ int read_loose_object(const char *path,
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
+	if (!allow_unknown && *type < 0) {
+		error(_("header for %s declares an unknown type"), path);
+		git_inflate_end(&stream);
+		goto out;
+	}
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 1151ce8e820..94ff03072c1 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f36ec1e2f4a..e7e8decebbd 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,16 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	test_create_repo garbage-type &&
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-	cat >err.expect <<-\EOF &&
-	fatal: invalid object type
-	EOF
-	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-	test_cmp err.expect err.actual &&
-	test_must_be_empty out.actual
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep -e "^error" -e "^fatal" err >errors &&
+	test_line_count = 2 errors &&
+	grep "error: hash mismatch for" err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
 '
 
 test_done
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 19/21] object-store.h: move read_loose_object() below 'struct object_info'
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (17 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
                           ` (2 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it; moving it avoids the need for a forward declaration
of the struct.
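In C, a prototype that names `struct object_info *` before the struct has been declared gives the tag prototype scope only. The later definition is then a different type, and compilers warn that the struct is "declared inside parameter list". The usual fixes are a `struct object_info;` forward declaration or placing the prototype after the definition. The minimal sketch below shows the second option. The struct members and `read_object_sketch()` are hypothetical simplifications, not git's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for git's struct object_info. */
struct object_info {
	int *typep;
	unsigned long *sizep;
};

/*
 * The struct is defined above, so this prototype can name
 * "struct object_info *" without a separate forward declaration.
 */
int read_object_sketch(const char *path, struct object_info *oi);

int read_object_sketch(const char *path, struct object_info *oi)
{
	assert(oi != NULL);
	if (!path)
		return -1;
	/* Fill only the fields the caller asked for (placeholder values). */
	if (oi->typep)
		*oi->typep = 3;
	if (oi->sizep)
		*oi->sizep = 0;
	return 0;
}
```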

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index 94ff03072c1..72d668b1674 100644
--- a/object-store.h
+++ b/object-store.h
@@ -234,20 +234,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -388,6 +374,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 20/21] fsck: report invalid types recorded in objects
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (18 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall through from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.
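After inflation, a loose object starts with a "<type> <size>" header terminated by a NUL byte, so `hash-object -t garbage --literally` stores "garbage 0". The sketch below illustrates the idea behind passing a type-name buffer through `struct object_info`. It assumes the parser always records the type name it saw, even when that name is not a known type, which lets the caller report "unknown type 'garbage'" instead of falling through to a hash mismatch. `parse_header_sketch()` and `enum sketch_type` are hypothetical and not git's `parse_loose_header()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified copy of git's object type values; OBJ_BAD marks "unknown". */
enum sketch_type { OBJ_BAD = -1, OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4 };

static enum sketch_type type_from_name(const char *name, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (size_t i = 0; i < 4; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (enum sketch_type)(i + 1);
	return OBJ_BAD;
}

/*
 * Parse a NUL-terminated "<type> <size>" header. The type name is always
 * copied into type_name (truncated if needed), so an unknown type can
 * still be reported. Only a malformed header makes this return -1.
 */
static int parse_header_sketch(const char *hdr, char *type_name, size_t name_sz,
			       enum sketch_type *type, unsigned long *size)
{
	const char *sp = strchr(hdr, ' ');
	size_t len, copy;
	char *end;

	assert(name_sz > 0);
	if (!sp || sp == hdr)
		return -1;
	len = (size_t)(sp - hdr);
	copy = len < name_sz - 1 ? len : name_sz - 1;
	memcpy(type_name, hdr, copy);
	type_name[copy] = '\0';
	*type = type_from_name(hdr, len);

	/* The size must be a plain decimal number running to the end. */
	if (sp[1] < '0' || sp[1] > '9')
		return -1;
	*size = strtoul(sp + 1, &end, 10);
	return *end == '\0' ? 0 : -1;
}
```

In this sketch an unknown type is reported through `*type == OBJ_BAD` rather than through the return value. That mirrors the split in fsck_loose() above between the "corrupt or missing" branch and the `type < 0` branch.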

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 22 ++++++++++++++++++----
 object-file.c   | 13 +++++--------
 object-store.h  |  4 ++--
 t/t1450-fsck.sh | 24 +++++++++++++++++++++---
 4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 082dadd5629..07af0434db6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 8fb55fc6f58..e550ea0c7cf 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2534,9 +2534,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2544,10 +2543,9 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
-	oi.typep = type;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2563,7 +2561,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
@@ -2585,8 +2583,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index 72d668b1674..96a5970f314 100644
--- a/object-store.h
+++ b/object-store.h
@@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index e7e8decebbd..bc541af2cfc 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	test_create_repo hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -869,9 +888,8 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
 	grep -e "^error" -e "^fatal" err >errors &&
-	test_line_count = 2 errors &&
-	grep "error: hash mismatch for" err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	test_line_count = 1 errors &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.32.0.606.g2e440ee2c94



* [PATCH v4 21/21] fsck: report invalid object type-path combinations
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (19 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.
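Loose objects live at `objects/<first two hex digits>/<remaining hex digits>`. Joining those two path components therefore yields the name the object *should* have. That name says where the file was found, not what its contents hash to. The minimal sketch below derives that name; `hex_from_loose_path()` is a hypothetical helper and not a git function.

```c
#include <assert.h>
#include <string.h>

/*
 * Rebuild the hex object name implied by a loose-object path such as
 * "./objects/e7/9de29b...": the two-character fan-out directory followed
 * by the file name. Returns 0 on success, or -1 if the path does not look
 * like ".../xx/<rest>" or the output buffer is too small.
 */
static int hex_from_loose_path(const char *path, char *out, size_t out_sz)
{
	const char *file = strrchr(path, '/');
	size_t rest_len;

	assert(out_sz > 0);
	if (!file || file - path < 3 || file[-3] != '/')
		return -1;
	rest_len = strlen(file + 1);
	if (2 + rest_len + 1 > out_sz)
		return -1;
	memcpy(out, file - 2, 2);                 /* the "xx" directory */
	memcpy(out + 2, file + 1, rest_len + 1);  /* file name plus NUL */
	return 0;
}
```

For the example that follows, a file moved to `objects/e7/` yields a name starting with "e79d" even though its contents still hash to "e69d...". That discrepancy is the mismatch the new error message reports.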

Now we'll instead say which object we hashed, and at what path it was
found. For example, before this patch series:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    $ git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when both the object type and
its location are bad, i.e. in this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits, we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object types:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or, worse,
misreport whatever issue we encountered. So let's use the new bug()
function to ferry a return code up to fsck_loose() in that case.
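The sketch below illustrates two ideas from this patch under simplifying assumptions. The first is an optional out-parameter that falls back to a local temporary (`real_oidp ? real_oidp : &tmp`). The second is an all-zero "null" OID used as a sentinel for "we never got as far as hashing". It uses a toy 32-bit FNV-1a hash in place of git's SHA-1/SHA-256 `struct object_id`. All `toy_*` names are hypothetical, and the classification logic is illustrative only, not the patch's exact code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 32-bit "object id"; real git uses SHA-1/SHA-256 object_ids. */
struct toy_oid { uint32_t h; };

static const struct toy_oid toy_null_oid = { 0 };

/* FNV-1a, standing in for the real object hash (illustration only). */
static void toy_hash(const void *buf, size_t len, struct toy_oid *out)
{
	const unsigned char *p = buf;
	uint32_t h = 2166136261u;

	assert(buf != NULL || len == 0);
	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	out->h = h;
}

/* The caller may pass NULL for real_oidp; a local temporary is used then. */
static int toy_check_signature(const struct toy_oid *expected,
			       const void *buf, size_t len,
			       struct toy_oid *real_oidp)
{
	struct toy_oid tmp;
	struct toy_oid *real = real_oidp ? real_oidp : &tmp;

	toy_hash(buf, len, real);
	return real->h == expected->h ? 0 : -1;
}

/*
 * Hypothetical reader: buf == NULL simulates a failure that happens before
 * anything is hashed, so *real_oid keeps its null sentinel value.
 */
static int toy_read_object(const struct toy_oid *expected, const void *buf,
			   size_t len, struct toy_oid *real_oid)
{
	if (!buf)
		return -1;
	return toy_check_signature(expected, buf, len, real_oid);
}

/* 0: ok, 1: hash-path mismatch, 2: corrupt (never got as far as hashing). */
static int toy_classify(const struct toy_oid *expected, const void *buf, size_t len)
{
	struct toy_oid real = toy_null_oid;

	if (!toy_read_object(expected, buf, len, &real))
		return 0;
	if (real.h != toy_null_oid.h && real.h != expected->h)
		return 1;
	return 2;
}
```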

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 10 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3c20f164f0f..48a3b6a7f8f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 07af0434db6..158b9dac9b3 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = *null_oid();
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3fbc5d70777..bf860b6555e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1421,7 +1421,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index e550ea0c7cf..923ff759e19 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1039,9 +1039,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1049,8 +1051,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1075,9 +1077,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2534,6 +2536,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2583,9 +2586,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, null_oid()))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index 96a5970f314..9fc69016361 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -507,7 +508,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 14188453c56..5467ead3285 100644
--- a/object.c
+++ b/object.c
@@ -261,7 +261,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -272,7 +272,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec0..e6aa4442c90 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 06d38e1fae6..72386cfec0e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -490,7 +490,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bc541af2cfc..d76293c495a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.32.0.606.g2e440ee2c94



* Re: [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo
  2021-06-24 19:23         ` [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-06-24 22:00           ` Andrei Rybak
  2021-06-24 22:34             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 245+ messages in thread
From: Andrei Rybak @ 2021-06-24 22:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan, Felipe Contreras

On 24/06/2021 21:23, Ævar Arnfjörð Bjarmason wrote:
> Refactor one of the fsck tests to use a throwaway repository. It's a
> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
> teardown of a tests so we're not leaving corrupt content for the next
> test.
> 
> We should instead simply use something like this test_create_repo
> pattern. It's both less verbose, and makes things easier to debug as a
> failing test can have their state left behind under -d without
> damaging the state for other tests.
> 
> But let's punt on that general refactoring and just change this one
> test, I'm going to change it further in subsequent commits.
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>   t/t1450-fsck.sh | 34 ++++++++++++++++------------------
>   1 file changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 5071ac63a5b..1563b35f88c 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -48,24 +48,22 @@ remove_object () {
>   	rm "$(sha1_file "$1")"
>   }
>   
> -test_expect_success 'object with bad sha1' '
> -	sha=$(echo blob | git hash-object -w --stdin) &&
> -	old=$(test_oid_to_path "$sha") &&
> -	new=$(dirname $old)/$(test_oid ff_2) &&
> -	sha="$(dirname $new)$(basename $new)" &&
> -	mv .git/objects/$old .git/objects/$new &&
> -	test_when_finished "remove_object $sha" &&
> -	git update-index --add --cacheinfo 100644 $sha foo &&
> -	test_when_finished "git read-tree -u --reset HEAD" &&
> -	tree=$(git write-tree) &&
> -	test_when_finished "remove_object $tree" &&
> -	cmt=$(echo bogus | git commit-tree $tree) &&
> -	test_when_finished "remove_object $cmt" &&
> -	git update-ref refs/heads/bogus $cmt &&
> -	test_when_finished "git update-ref -d refs/heads/bogus" &&
> -
> -	test_must_fail git fsck 2>out &&
> -	test_i18ngrep "$sha.*corrupt" out
> +test_expect_success 'object with hash mismatch' '
> +	test_create_repo hash-mismatch &&

This patch was originally sent to ML on 2021-03-28:
	https://lore.kernel.org/git/patch-2.6-3e547289408-20210328T025618Z-avarab@gmail.com/

since then, however, commit f0d4d398e2 (test-lib: split up and deprecate
test_create_repo(), 2021-05-10) has been merged ;-) so this line should
be:

	git init hash-mismatch &&


> +	(
> +		cd hash-mismatch &&
> +		oid=$(echo blob | git hash-object -w --stdin) &&
> +		old=$(test_oid_to_path "$oid") &&
> +		new=$(dirname $old)/$(test_oid ff_2) &&
> +		oid="$(dirname $new)$(basename $new)" &&
> +		mv .git/objects/$old .git/objects/$new &&
> +		git update-index --add --cacheinfo 100644 $oid foo &&
> +		tree=$(git write-tree) &&
> +		cmt=$(echo bogus | git commit-tree $tree) &&
> +		git update-ref refs/heads/bogus $cmt &&
> +		test_must_fail git fsck 2>out &&
> +		test_i18ngrep "$oid.*corrupt" out
> +	)
>   '
>   
>   test_expect_success 'branch pointing to non-commit' '
> 



* Re: [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo
  2021-06-24 22:00           ` Andrei Rybak
@ 2021-06-24 22:34             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 22:34 UTC (permalink / raw)
  To: Andrei Rybak
  Cc: git, Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras


On Fri, Jun 25 2021, Andrei Rybak wrote:

> On 24/06/2021 21:23, Ævar Arnfjörð Bjarmason wrote:
>> Refactor one of the fsck tests to use a throwaway repository. It's a
>> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
>> teardown of a tests so we're not leaving corrupt content for the next
>> test.
>> We should instead simply use something like this test_create_repo
>> pattern. It's both less verbose, and makes things easier to debug as a
>> failing test can have their state left behind under -d without
>> damaging the state for other tests.
>> But let's punt on that general refactoring and just change this one
>> test, I'm going to change it further in subsequent commits.
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>   t/t1450-fsck.sh | 34 ++++++++++++++++------------------
>>   1 file changed, 16 insertions(+), 18 deletions(-)
>> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
>> index 5071ac63a5b..1563b35f88c 100755
>> --- a/t/t1450-fsck.sh
>> +++ b/t/t1450-fsck.sh
>> @@ -48,24 +48,22 @@ remove_object () {
>>   	rm "$(sha1_file "$1")"
>>   }
>>   -test_expect_success 'object with bad sha1' '
>> -	sha=$(echo blob | git hash-object -w --stdin) &&
>> -	old=$(test_oid_to_path "$sha") &&
>> -	new=$(dirname $old)/$(test_oid ff_2) &&
>> -	sha="$(dirname $new)$(basename $new)" &&
>> -	mv .git/objects/$old .git/objects/$new &&
>> -	test_when_finished "remove_object $sha" &&
>> -	git update-index --add --cacheinfo 100644 $sha foo &&
>> -	test_when_finished "git read-tree -u --reset HEAD" &&
>> -	tree=$(git write-tree) &&
>> -	test_when_finished "remove_object $tree" &&
>> -	cmt=$(echo bogus | git commit-tree $tree) &&
>> -	test_when_finished "remove_object $cmt" &&
>> -	git update-ref refs/heads/bogus $cmt &&
>> -	test_when_finished "git update-ref -d refs/heads/bogus" &&
>> -
>> -	test_must_fail git fsck 2>out &&
>> -	test_i18ngrep "$sha.*corrupt" out
>> +test_expect_success 'object with hash mismatch' '
>> +	test_create_repo hash-mismatch &&
>
> This patch was originally sent to ML on 2021-03-28:
> 	https://lore.kernel.org/git/patch-2.6-3e547289408-20210328T025618Z-avarab@gmail.com/
>
> since then, however, commit f0d4d398e2 (test-lib: split up and deprecate
> test_create_repo(), 2021-05-10) has been merged ;-) so this line should
> be:
>
> 	git init hash-mismatch &&

Thanks, you'd think that the author of that code would be in a better
position than most to get this right, but apparently not :)

This series originally predates that work. I see I have a couple more
test_create_repo() calls in other patches here; I'll fix them for a v5
pending more discussion on v4, thanks!


* [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (20 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37         ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                             ` (21 more replies)
  21 siblings, 22 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

This improves fsck error reporting; see the examples in the commit
messages of 18/21, 20/21 and 21/21. To get there I've lib-ified more
things in object-file.c and the general object APIs, i.e. we'll now
return error codes instead of calling die() in these cases.

The fsck improvements are rather obscure & trivial in the grand scheme
of things, but the object API improvements make it easier to work with
in general.
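"Lib-ifying" here means reporting a problem and returning an error code instead of exiting the process. That lets a caller such as fsck record the failure and keep checking other objects. The minimal sketch below shows that pattern. `sketch_error()` is a hypothetical stand-in that follows git's convention of `error()` printing "error: ..." and returning -1. `check_type_name()` and `count_bad()` are likewise hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Print "error: <msg>" to stderr and return -1, like git's error(). */
static int sketch_error(const char *fmt, ...)
{
	va_list ap;

	fputs("error: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/* Library-style check: report and return instead of calling exit()/die(). */
static int check_type_name(const char *name)
{
	static const char *const known[] = { "commit", "tree", "blob", "tag" };

	assert(name != NULL);
	for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (!strcmp(name, known[i]))
			return 0;
	return sketch_error("invalid object type '%s'", name);
}

/* An fsck-like caller can count failures and keep checking the rest. */
static int count_bad(const char **names, size_t n)
{
	int bad = 0;

	for (size_t i = 0; i < n; i++)
		if (check_type_name(names[i]) < 0)
			bad++;
	return bad;
}
```

With a die()-style check, the first "garbage" entry would end the process. Here every bad name is reported and counted.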

This is a trivial re-roll of v4 to s/test_create_repo/git init/g, as
pointed out by Andrei Rybak; I changed them to "git init --bare" while
I was at it. For v4 see:

https://lore.kernel.org/git/cover-00.21-00000000000-20210624T191754Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (21):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  cache.h: move object functions to object-store.h
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: split up ternary in parse_loose_header()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: guard against future bugs in loose_object_info()
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 178 +++++++++++++++++++++--------------------
 object-store.h         |  62 +++++++++++---
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  10 ++-
 t/t1006-cat-file.sh    | 169 ++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 407 insertions(+), 139 deletions(-)

Range-diff against v4:
 1:  2e37971c016 !  1:  a1259cdedcb fsck tests: refactor one test to use a sub-repo
    @@ t/t1450-fsck.sh: remove_object () {
     -	test_must_fail git fsck 2>out &&
     -	test_i18ngrep "$sha.*corrupt" out
     +test_expect_success 'object with hash mismatch' '
    -+	test_create_repo hash-mismatch &&
    ++	git init --bare hash-mismatch &&
     +	(
     +		cd hash-mismatch &&
     +		oid=$(echo blob | git hash-object -w --stdin) &&
     +		old=$(test_oid_to_path "$oid") &&
     +		new=$(dirname $old)/$(test_oid ff_2) &&
     +		oid="$(dirname $new)$(basename $new)" &&
    -+		mv .git/objects/$old .git/objects/$new &&
    ++		mv objects/$old objects/$new &&
     +		git update-index --add --cacheinfo 100644 $oid foo &&
     +		tree=$(git write-tree) &&
     +		cmt=$(echo bogus | git commit-tree $tree) &&
 2:  79630a99433 !  2:  634f991d7c6 fsck tests: add test for fsck-ing an unknown type
    @@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
      '
      
     +test_expect_success 'fsck hard errors on an invalid object type' '
    -+	test_create_repo garbage-type &&
    ++	git init --bare garbage-type &&
     +	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
     +	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
     +	cat >err.expect <<-\EOF &&
 3:  2b5366bfb9d =  3:  ce9dcc423e9 cat-file tests: test for missing object with -t and -s
 4:  ea9a5ef0920 =  4:  50a20741e86 cat-file tests: test that --allow-unknown-type isn't on by default
 5:  8eaf0e6ddda =  5:  f8d0b630d0a rev-list tests: test for behavior with invalid object types
 6:  f0e9d92414e =  6:  43335e653b8 cat-file tests: add corrupt loose object test
 7:  d797d2e8e9d =  7:  a00dfea3fb8 cat-file tests: test for current --allow-unknown-type behavior
 8:  96310a0bb59 =  8:  387d7f08e61 cache.h: move object functions to object-store.h
 9:  54fb9189408 =  9:  e9520953956 object-file.c: don't set "typep" when returning non-zero
10:  9d36fcbc44a = 10:  a8b408eefe6 object-file.c: make parse_loose_header_extended() public
11:  74c308adc19 = 11:  31eee4da0e1 object-file.c: add missing braces to loose_object_info()
12:  3f52149bfde = 12:  dae5cfabd57 object-file.c: simplify unpack_loose_short_header()
13:  ba632be1520 = 13:  0d8385d8d12 object-file.c: split up ternary in parse_loose_header()
14:  ea4f446f5b1 = 14:  d1522291aee object-file.c: stop dying in parse_loose_header()
15:  aacef784eab = 15:  13d4141a21b object-file.c: guard against future bugs in loose_object_info()
16:  050cfc7808c = 16:  912c9edf362 object-file.c: return -1, not "status" from unpack_loose_header()
17:  78e3152fd94 = 17:  7e101f97646 object-file.c: return -2 on "header too long" in unpack_loose_header()
18:  f9bb1b799ac ! 18:  3c04065b0b0 fsck: don't hard die on invalid object types
    @@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
      
     -test_expect_success 'fsck hard errors on an invalid object type' '
     +test_expect_success 'fsck error and recovery on invalid object type' '
    - 	test_create_repo garbage-type &&
    + 	git init --bare garbage-type &&
      	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
      	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
     -	cat >err.expect <<-\EOF &&
19:  acbea7e2a2a = 19:  ad920362594 object-store.h: move read_loose_object() below 'struct object_info'
20:  edc28de229d ! 20:  02a148af5cf fsck: report invalid types recorded in objects
    @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
      '
      
     +test_expect_success 'object with hash and type mismatch' '
    -+	test_create_repo hash-type-mismatch &&
    ++	git init --bare hash-type-mismatch &&
     +	(
     +		cd hash-type-mismatch &&
     +		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
     +		old=$(test_oid_to_path "$oid") &&
     +		new=$(dirname $old)/$(test_oid ff_2) &&
     +		oid="$(dirname $new)$(basename $new)" &&
    -+		mv .git/objects/$old .git/objects/$new &&
    ++		mv objects/$old objects/$new &&
     +		git update-index --add --cacheinfo 100644 $oid foo &&
     +		tree=$(git write-tree) &&
     +		cmt=$(echo bogus | git commit-tree $tree) &&
21:  e588c05f461 = 21:  730e0a6f805 fsck: report invalid object type-path combinations
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 245+ messages in thread

* [PATCH v5 01/21] fsck tests: refactor one test to use a sub-repo
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                             ` (20 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.

We should instead simply use something like this "git init"
pattern. It's both less verbose, and makes things easier to debug, as a
failing test can have its state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test; I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..7becab5ba1e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	git init --bare hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 02/21] fsck tests: add test for fsck-ing an unknown type
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
                             ` (19 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which will be done in subsequent
patches, but for now let's test for the current behavior.
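For readers unfamiliar with why a "garbage" type is a problem: the type name stored in a loose object header has to be mapped onto one of git's four object types. A minimal sketch of that mapping, assuming hypothetical names throughout (it is loosely modelled on the idea behind git's type_from_string(), not copied from it), shows where an unknown name falls out as a "bad" value that the caller must then handle, either by dying as fsck does today or by reporting an error and continuing.

```c
#include <string.h>

/*
 * Hypothetical sketch: map a type name from a loose object header to an
 * enum, returning a negative "bad" value for names such as "garbage"
 * that "hash-object --literally" can write but git does not know about.
 * The numeric values are intended to mirror git's enum object_type, but
 * treat that as an assumption.
 */
enum sketch_object_type {
	SKETCH_OBJ_BAD = -1,
	SKETCH_OBJ_COMMIT = 1,
	SKETCH_OBJ_TREE = 2,
	SKETCH_OBJ_BLOB = 3,
	SKETCH_OBJ_TAG = 4
};

static enum sketch_object_type sketch_type_from_string(const char *name)
{
	static const struct {
		const char *name;
		enum sketch_object_type type;
	} known[] = {
		{ "commit", SKETCH_OBJ_COMMIT },
		{ "tree", SKETCH_OBJ_TREE },
		{ "blob", SKETCH_OBJ_BLOB },
		{ "tag", SKETCH_OBJ_TAG },
	};
	size_t i;

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (!strcmp(name, known[i].name))
			return known[i].type;
	return SKETCH_OBJ_BAD; /* unknown type: let the caller decide */
}
```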

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 7becab5ba1e..f10d6f7b7e8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	git init --bare garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_done
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 03/21] cat-file tests: test for missing object with -t and -s
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
                             ` (18 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Test for what happens when the -t and -s flags are asked to operate on
a missing object. This extends tests added in 3e370f9faf0 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5d2dc99b74a..b71ef94329e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,33 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
+missing_oid=$(test_oid deadbeef)
+test_expect_success 'error on type of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -t $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
+test_expect_success 'error on size of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -s $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
 bogus_type="bogus"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 04/21] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (2 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
                             ` (17 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for the --allow-unknown-type feature
added in 39e4ae38804 (cat-file: teach cat-file a
'--allow-unknown-type' option, 2015-05-03). We should check that
--allow-unknown-type isn't on by default.

Before this change all the tests would succeed if --allow-unknown-type
was on by default. Let's fix that by asserting that -t and -s die on a
"garbage" type without --allow-unknown-type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index b71ef94329e..dc01d7c4a9a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -347,6 +347,20 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -363,6 +377,21 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-EOF &&
+	error: unable to unpack $bogus_sha1 header
+	fatal: git cat-file: could not get object info
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 05/21] rev-list tests: test for behavior with invalid object types
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (3 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                             ` (16 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for the "rev-list --disk-usage" feature
added in 16950f8384a (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) to test for what happens when it's
asked to calculate the disk usage of invalid object types.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6115-rev-list-du.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..edb2ed55846 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,15 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+test_expect_success 'setup garbage repository' '
+	git clone --bare . garbage.git &&
+	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
+	git -C garbage.git rev-list --objects --all --disk-usage &&
+
+	# Manually create a ref because "update-ref", "tag" etc. have
+	# no corresponding --literally option.
+	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
+	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
+'
+
 test_done
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 06/21] cat-file tests: add corrupt loose object test
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (4 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                             ` (15 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index dc01d7c4a9a..7f10a92f0e4 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -404,6 +404,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 07/21] cat-file tests: test for current --allow-unknown-type behavior
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (5 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                             ` (14 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7f10a92f0e4..86fd2a90ca7 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -361,6 +361,46 @@ test_expect_success 'die on broken object under -t and -s without --allow-unknow
 	test_must_be_empty out.actual
 '
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -372,6 +412,27 @@ test_expect_success "Size of broken object is correct" '
 	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
 	test_cmp expect actual
 '
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual
+'
+
 bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 08/21] cache.h: move object functions to object-store.h
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (6 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
                             ` (13 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index ba04ff8bd36..32ea1ea0474 100644
--- a/cache.h
+++ b/cache.h
@@ -1302,16 +1302,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb5..9117115a50c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 09/21] object-file.c: don't set "typep" when returning non-zero
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (7 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                             ` (12 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error, stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully, I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be OBJ_BAD.

Having read the code for all the callers of these functions, I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
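The calling convention argued for above can be sketched as follows. Everything here is hypothetical: the struct and functions only mimic the shape of "struct object_info", oid_object_info_extended() and oid_object_info(). The point is that the extended variant leaves its out-parameters untouched on error, so callers must check the return value first, while the simple wrapper folds the error into its enum return value.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of the calling convention described above. The
 * struct and function names are made up; they only mimic the shape of
 * "struct object_info" and oid_object_info_extended().
 */
enum sketch_type { SKETCH_BAD = -1, SKETCH_BLOB = 3 };

struct sketch_info {
	enum sketch_type *typep;  /* optional out-parameter */
	unsigned long *sizep;     /* optional out-parameter */
};

/* Returns 0 on success, -1 on error. On error, *typep is left untouched. */
static int sketch_info_extended(int object_exists, struct sketch_info *oi)
{
	if (!object_exists)
		return -1;
	if (oi->typep)
		*oi->typep = SKETCH_BLOB;
	if (oi->sizep)
		*oi->sizep = 6;
	return 0;
}

/* The simpler wrapper folds the error into its return value: -1 is "bad". */
static enum sketch_type sketch_info_simple(int object_exists)
{
	enum sketch_type type;
	struct sketch_info oi = { &type, NULL };

	if (sketch_info_extended(object_exists, &oi) < 0)
		return SKETCH_BAD;
	return type; /* only read after a successful call */
}
```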

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index f233b440b22..9210e2e6fe4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1480,8 +1480,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 10/21] object-file.c: make parse_loose_header_extended() public
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (8 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
                             ` (11 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper, giving the former the latter's name. The
only direct user of the wrapper outside of object-file.c itself was in
streaming.c; that caller can simply pass the required
"struct object_info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it will
leave the API in a better end state.
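The resulting calling convention, where each former user of the sizep-only wrapper builds its own object_info-style struct (as streaming.c now does), can be sketched as follows. This is a hypothetical, cut-down model: `struct toy_info`, `TOY_INFO_INIT`, `toy_parse_header()` and `toy_header_size()` are illustrative names, and the parser only understands a NUL-terminated "<type> <len>" string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down analogue of "struct object_info". */
struct toy_info {
	unsigned long *sizep; /* optional: filled in when non-NULL */
};
#define TOY_INFO_INIT { 0 }

/* Parses "<type> <len>"; returns 0 on success, -1 on a malformed header. */
static int toy_parse_header(const char *hdr, struct toy_info *oi,
			    unsigned int flags)
{
	const char *sp = strchr(hdr, ' ');
	char *end;
	unsigned long size;

	(void)flags; /* unused here, like the "flags" argument at this stage */
	assert(oi);
	if (!sp || !sp[1])
		return -1;
	size = strtoul(sp + 1, &end, 10);
	if (*end)
		return -1; /* trailing junk after the length */
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}

/* What a former caller of the "sizep"-only wrapper does instead. */
static int toy_header_size(const char *hdr, unsigned long *size)
{
	struct toy_info oi = TOY_INFO_INIT;

	oi.sizep = size;
	return toy_parse_header(hdr, &oi, 0);
}
```

The extra two lines per caller are the price of dropping the wrapper; in exchange there is a single parsing entry point whose struct can grow new optional fields without another signature change.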

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 21 ++++++++-------------
 object-store.h |  3 ++-
 streaming.c    |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/object-file.c b/object-file.c
index 9210e2e6fe4..e0ba1842272 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1340,8 +1340,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1401,14 +1402,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1463,10 +1456,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2547,6 +2540,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2561,7 +2556,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50c..d443964447c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -480,7 +480,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 11/21] object-file.c: add missing braces to loose_object_info()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (9 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
                             ` (10 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Change the formatting in loose_object_info() to conform with our usual
coding style:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency -- Documentation/CodingGuidelines

This formatting-only change makes a subsequent commit easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index e0ba1842272..646ca7f85d6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1450,17 +1450,20 @@ static int loose_object_info(struct repository *r,
 		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1469,8 +1472,9 @@ static int loose_object_info(struct repository *r,
 			git_inflate_end(&stream);
 			status = -1;
 		}
-	} else
+	} else {
 		git_inflate_end(&stream);
+	}
 
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 12/21] object-file.c: simplify unpack_loose_short_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (10 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
                             ` (9 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Most of its code duplicated what unpack_loose_header() and
unpack_loose_short_header() already did. We now have a single
unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be
simplified further. We're carrying some complexity just to be able to
emit a garbage type longer than MAX_HEADER_LEN; we could instead just
say "we found a garbage type <first 32 bytes>...". But let's leave the
current behavior in place for now.
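A minimal sketch of the "optional output buffer" shape the combined function now has, under stated assumptions: zlib is left out entirely, `src` stands in for bytes that have already been inflated, and a malloc()'d string stands in for the "struct strbuf *". `toy_unpack_header()` and its parameters are hypothetical; only the control flow (fail on an over-long header unless the caller asked for it) mirrors the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: "src"/"srclen" stand in for already-inflated
 * loose object bytes. The header is copied into the fixed-size
 * "buffer". If the header plus its NUL does not fit in "bufsiz"
 * bytes, the call fails unless the caller passed a non-NULL
 * "overflow", in which case the full header is returned there as a
 * malloc()'d string the caller must free().
 */
static int toy_unpack_header(const char *src, size_t srclen,
			     char *buffer, size_t bufsiz, char **overflow)
{
	const char *nul = memchr(src, '\0', srclen);
	size_t hdrlen;

	assert(bufsiz > 0);
	if (!nul)
		return -1; /* no terminator at all: corrupt input */
	hdrlen = (size_t)(nul - src);
	if (hdrlen < bufsiz) {
		memcpy(buffer, src, hdrlen + 1); /* common, short case */
		return 0;
	}
	if (!overflow)
		return -1; /* header longer than the fixed buffer */
	*overflow = malloc(hdrlen + 1);
	if (!*overflow)
		return -1;
	memcpy(*overflow, src, hdrlen + 1);
	return 0;
}
```

Passing NULL keeps the cheap fixed-buffer behavior for ordinary readers, while a caller that wants to report a garbage type (the --allow-unknown-type case) opts in to the slower path.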

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 60 ++++++++++++++++++--------------------------------
 object-store.h | 14 +++++++++++-
 streaming.c    |  3 ++-
 3 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/object-file.c b/object-file.c
index 646ca7f85d6..ef3a1517fed 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1210,11 +1210,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1225,44 +1226,25 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
 	if (status < Z_OK)
 		return status;
 
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
-	if (status < Z_OK)
-		return -1;
-
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1410,9 +1392,11 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
+	int hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1446,11 +1430,10 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+
+	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				      allow_unknown ? &hdrbuf : NULL);
+	if (hdr_ret < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
@@ -2555,7 +2538,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/object-store.h b/object-store.h
index d443964447c..31327a7f6c3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,9 +477,21 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz);
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 13/21] object-file.c: split up ternary in parse_loose_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (11 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
                             ` (8 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

This minor formatting change serves to make a subsequent patch easier
to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index ef3a1517fed..e51cf2ca33e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1381,7 +1381,10 @@ int parse_loose_header(const char *hdr,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	return type;
 }
 
 static int loose_object_info(struct repository *r,
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 14/21] object-file.c: stop dying in parse_loose_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (12 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
                             ` (7 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Start the libification of parse_loose_header() by making it return
error codes and data instead of invoking die() by itself. For now
we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller, but in subsequent
commits we'll also libify those.

Since the refactoring of parse_loose_header_extended() into
parse_loose_header() in an earlier commit, its interface no longer
accepts an "unsigned long *sizep". Instead it accepts a "struct
object_info *", which it populates with information about the object.

It thus makes sense to further libify the interface so that it stops
calling die() when it encounters OBJ_BAD, and instead relies on its
callers to check the populated "oi->typep".

Because of this we no longer need to pass in the "unsigned int flags"
we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can do that check in
loose_object_info() instead.

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

In another case, added in c84a1f3ed4d (sha1_file: refactor read_object,
2017-06-21) (though the behavior predated that), we checked
"status >= 0", because at that point "status" had become the return
value of parse_loose_header(), i.e. a non-negative "enum object_type"
(unless it returned -1, aka OBJ_BAD).

Now that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.
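To show the split of responsibilities, here is a hypothetical, simplified model, not Git's implementation. `toy_parse_header()` returns -1 only for a malformed "<type> <len>" header and reports an unknown type through `typep` as an OBJ_BAD-like value. `toy_check()` plays the role of loose_object_info() and applies the allow-unknown policy, returning -2 where Git would call die(). Only "blob" is recognized, and all `toy_*`/`TOY_*` names are invented.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Hypothetical stand-ins; TOY_OBJ_BAD mirrors OBJ_BAD (-1). */
enum toy_type { TOY_OBJ_BAD = -1, TOY_OBJ_BLOB = 3 };

struct toy_info {
	enum toy_type *typep;
	unsigned long *sizep;
};

/*
 * Parses "<type> <len>". Returns -1 only if the *format* is wrong.
 * An unrecognized <type> is not an error here: it is reported as
 * TOY_OBJ_BAD through oi->typep and left for the caller to judge.
 */
static int toy_parse_header(const char *hdr, const struct toy_info *oi)
{
	const char *sp = strchr(hdr, ' ');
	char *end;
	unsigned long size;

	if (!sp || sp == hdr || !sp[1])
		return -1;
	size = strtoul(sp + 1, &end, 10);
	if (*end) /* the length must be the last thing in the header */
		return -1;
	if (oi->typep)
		*oi->typep = ((size_t)(sp - hdr) == 4 && !strncmp(hdr, "blob", 4))
			     ? TOY_OBJ_BLOB : TOY_OBJ_BAD;
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}

/* The policy decision now lives in the caller, not in the parser. */
static int toy_check(const char *hdr, int allow_unknown)
{
	enum toy_type type;
	unsigned long size;
	struct toy_info oi = { &type, &size };

	assert(hdr);
	if (toy_parse_header(hdr, &oi) < 0)
		return -1; /* malformed header */
	if (type == TOY_OBJ_BAD && !allow_unknown) {
		fprintf(stderr, "invalid object type\n");
		return -2; /* where Git would die() */
	}
	return 0;
}
```

Because a well-formed header with an unknown type now returns 0, "did the header parse?" and "is the type acceptable?" become two separate questions, which is exactly the conflation the commit removes.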

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 53 ++++++++++++++++++++++++++------------------------
 object-store.h | 13 +++++++++++--
 streaming.c    |  4 +++-
 3 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/object-file.c b/object-file.c
index e51cf2ca33e..31263335af9 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1322,9 +1322,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr,
-		       struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1346,15 +1344,6 @@ int parse_loose_header(const char *hdr,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1384,7 +1373,11 @@ int parse_loose_header(const char *hdr,
 	if (*hdr)
 		return -1;
 
-	return type;
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1399,6 +1392,8 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
+	int parsed_header = 0;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1430,6 +1425,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1440,18 +1437,20 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
-
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	if (!status) {
+		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
+			/*
+			 * oi->{sizep,typep} are meaningless unless
+			 * parse_loose_header() returns >= 0.
+			 */
+			parsed_header = 1;
+		else
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
+	if (!allow_unknown && parsed_header && *oi->typep < 0)
+		die(_("invalid object type"));
 
-	if (status >= 0 && oi->contentp) {
+	if (parsed_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
 						  *oi->sizep, oid);
 		if (!*oi->contentp) {
@@ -1466,6 +1465,8 @@ static int loose_object_info(struct repository *r,
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
 	return (status < 0) ? status : 0;
 }
@@ -2531,6 +2532,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2547,12 +2549,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 31327a7f6c3..65a8e4dc6a8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -492,8 +492,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz, struct strbuf *hdrbuf);
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
+int parse_loose_header(const char *hdr, struct object_info *oi);
+
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..c3dc241d6a5 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -235,7 +236,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr),
 				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
+	    *type < 0) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 15/21] object-file.c: guard against future bugs in loose_object_info()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (13 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                             ` (6 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

An earlier version of the preceding commit had a subtle bug where our
"type_scratch" (later assigned to "oi->typep") would be uninitialized
and used in the "!allow_unknown" case, at which point it would contain
a nonsensical value if we'd failed to call parse_loose_header().

The preceding commit introduced a "parsed_header" variable to check
for this case, but I think we can do better: let's carry an
"oi_header" variable, initially set to NULL, and only set it to "oi"
once parse_loose_header() has succeeded.

This is functionally the same thing, but hopefully makes it even more
obvious in the future that we must not access "typep" and "sizep" (or
"type_name") unless parse_loose_header() succeeds, while accessing
fields set before it (such as "disk_sizep") is OK.
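The "oi_header" idiom can be sketched as follows. This is a hypothetical reduction of loose_object_info(), not the real function: `toy_parse()` only writes its out-parameters on success, the scratch variables are deliberately left uninitialized as in the patch, and every later read of `typep`/`sizep` goes through the `oi_header` alias, which is non-NULL only once parsing has succeeded.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cut-down analogue of "struct object_info". */
struct toy_info {
	int *typep;
	unsigned long *sizep;
};

/* Hypothetical parser: only fills the out-parameters on success. */
static int toy_parse(const char *hdr, struct toy_info *oi)
{
	assert(oi->typep && oi->sizep);
	if (strcmp(hdr, "blob 3"))
		return -1;
	*oi->typep = 3;
	*oi->sizep = 3;
	return 0;
}

/*
 * Returns the parsed size, or 0 if the header did not parse. The
 * "oi_header" alias stays NULL until toy_parse() succeeds, so every
 * later read of typep/sizep through it is guarded by a NULL check.
 */
static unsigned long toy_size_or_zero(const char *hdr)
{
	int type_scratch; /* deliberately uninitialized */
	unsigned long size_scratch;
	struct toy_info oi = { &type_scratch, &size_scratch };
	struct toy_info *oi_header = NULL;

	if (!toy_parse(hdr, &oi))
		oi_header = &oi;

	if (oi_header && *oi_header->typep >= 0)
		return *oi_header->sizep;
	return 0;
}
```

The alias costs nothing at runtime; its value is that a reader (or a future patch) reaching for `oi_header->typep` is forced to think about the NULL case, whereas `oi->typep` silently reads uninitialized scratch memory.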

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 31263335af9..d41f444e6cc 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1393,7 +1393,7 @@ static int loose_object_info(struct repository *r,
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
 	enum object_type type_scratch;
-	int parsed_header = 0;
+	struct object_info *oi_header = NULL;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1441,18 +1441,20 @@ static int loose_object_info(struct repository *r,
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
 			/*
 			 * oi->{sizep,typep} are meaningless unless
-			 * parse_loose_header() returns >= 0.
+			 * parse_loose_header() returns >= 0. Let's
+			 * access them as "oi_header" (just an alias
+			 * for "oi") below to make that intent clear.
 			 */
-			parsed_header = 1;
+			oi_header = oi;
 		else
 			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
-	if (!allow_unknown && parsed_header && *oi->typep < 0)
+	if (!allow_unknown && oi_header && *oi_header->typep < 0)
 		die(_("invalid object type"));
 
-	if (parsed_header && oi->contentp) {
+	if (oi_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
+						  *oi_header->sizep, oid);
 		if (!*oi->contentp) {
 			git_inflate_end(&stream);
 			status = -1;
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 16/21] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (14 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
                             ` (5 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index d41f444e6cc..956ca260518 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1229,7 +1229,7 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (15 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                             ` (4 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows,
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return -2 saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".

I tried setting up an enum just for these three return values, but I
think the result was less readable. Let's consider doing that if we
gain even more return values. For now let's do the next best thing and
enumerate our known return values, and BUG() if we encounter one we
don't know about.
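Here is a sketch of the caller-side pattern described above. Like the patch, it uses bare 0, -1 and -2 rather than an enum. report_hdr_ret() is a hypothetical name. fprintf() stands in for git's error(), and abort() stands in for BUG(). The limit of 32 is the one quoted in the updated t1006-cat-file.sh expectation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* 32 matches the limit quoted in the t1006-cat-file.sh expectation. */
#define SKETCH_MAX_HEADER_LEN 32

/*
 * Caller-side switch: 0 is success, -1 a generic unpack failure,
 * -2 "header too long", and anything else is a programming error.
 * Returns 0, or -1 after printing an error()-style message.
 */
int report_hdr_ret(int hdr_ret, const char *hex)
{
	switch (hdr_ret) {
	case 0:
		return 0;
	case -1:
		fprintf(stderr, "error: unable to unpack %s header\n", hex);
		return -1;
	case -2:
		fprintf(stderr, "error: header for %s too long, exceeds %d bytes\n",
			hex, SKETCH_MAX_HEADER_LEN);
		return -1;
	default:
		/* Stand-in for BUG(): an unknown value means caller and callee disagree. */
		fprintf(stderr, "BUG: unknown hdr_ret value %d\n", hdr_ret);
		abort();
	}
}
```

If a fourth return value were added to the callee without updating such a switch, the program would fail loudly. It would not silently treat the new value as one of the known cases.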

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c       | 16 +++++++++++++---
 object-store.h      |  6 ++++--
 t/t1006-cat-file.sh |  2 +-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 956ca260518..1866115a1c5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1243,7 +1243,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return -2;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1264,7 +1264,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return -2;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1433,9 +1433,19 @@ static int loose_object_info(struct repository *r,
 
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
-	if (hdr_ret < 0) {
+	switch (hdr_ret) {
+	case 0:
+		break;
+	case -1:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
+		break;
+	case -2:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
+	default:
+		BUG("unknown hdr_ret value %d", hdr_ret);
 	}
 	if (!status) {
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index 65a8e4dc6a8..1151ce8e820 100644
--- a/object-store.h
+++ b/object-store.h
@@ -481,13 +481,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns 0 on success. Returns negative values on error. If the
+ * header exceeds MAX_HEADER_LEN -2 will be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), -2 will still be returned from this
+ * function to indicate that the header was too long.
  */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 86fd2a90ca7..06d38e1fae6 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
 
 test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
 	cat >err.expect <<-EOF &&
-	error: unable to unpack $bogus_sha1 header
+	error: header for $bogus_sha1 too long, exceeds 32 bytes
 	fatal: git cat-file: could not get object info
 	EOF
 
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 18/21] fsck: don't hard die on invalid object types
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (16 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                             ` (3 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
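A minimal sketch of the "report and keep going" behavior. It assumes a hypothetical object_is_bad() check and a SKETCH_ERROR_OBJECT bit standing in for fsck's ERROR_OBJECT. Errors are OR-ed into a bitmask, so the final exit status can still be non-zero after every object has been visited.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag value, playing the role of fsck's ERROR_OBJECT bit. */
#define SKETCH_ERROR_OBJECT 01

/* Hypothetical per-object check: nonzero means "this object is bad". */
static int object_is_bad(int id)
{
	return id % 3 == 0;
}

/*
 * Visit every object, OR-ing failures into a bitmask instead of dying
 * on the first one. Returns the accumulated error bits; *visited
 * reports how many objects were examined.
 */
unsigned int check_all(const int *ids, size_t n, size_t *visited)
{
	unsigned int errors_found = 0;
	size_t i;

	*visited = 0;
	for (i = 0; i < n; i++) {
		(*visited)++;
		if (object_is_bad(ids[i])) {
			errors_found |= SKETCH_ERROR_OBJECT;
			fprintf(stderr, "error: object %d is corrupt\n", ids[i]);
		}
		/* no die() here: carry on and check the remaining objects */
	}
	return errors_found;
}
```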

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it. See f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for the introduction of read_loose_object().
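Here is a sketch of the gate this adds to read_loose_object(). It assumes a hypothetical flag bit, SKETCH_ALLOW_UNKNOWN_TYPE, in place of OBJECT_INFO_ALLOW_UNKNOWN_TYPE. A negative (unknown) type is treated as an error only when the caller has not opted in. Even then, the function returns -1 instead of calling die().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit value; git defines its own OBJECT_INFO_* flags. */
#define SKETCH_ALLOW_UNKNOWN_TYPE (1u << 0)

/*
 * An unknown (negative) type is only an error when the caller did not
 * pass the "allow unknown type" flag. Returns 0 to continue, -1 to bail.
 */
int check_type(int type, unsigned int oi_flags, const char *path)
{
	int allow_unknown = oi_flags & SKETCH_ALLOW_UNKNOWN_TYPE;

	if (!allow_unknown && type < 0) {
		fprintf(stderr, "error: header for %s declares an unknown type\n", path);
		return -1; /* previously: die("invalid object type") */
	}
	return 0;
}
```

Because fsck passes the flag, it gets past this check and can go on to report the bad type itself.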

Why are we complaining about a "hash mismatch" for an object of a type
we don't know about? We shouldn't. This is the bare minimal change
needed to not make fsck hard die on a repository that's been corrupted
in this manner. In subsequent commits we'll teach fsck to recognize
this particular type of corruption and emit a better error message.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 11 ++++++++---
 object-store.h  |  3 ++-
 t/t1450-fsck.sh | 14 +++++++-------
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..082dadd5629 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 1866115a1c5..8fb55fc6f58 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2536,7 +2536,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
@@ -2544,6 +2545,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	oi.typep = type;
 	oi.sizep = size;
 
@@ -2566,8 +2568,11 @@ int read_loose_object(const char *path,
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
+	if (!allow_unknown && *type < 0) {
+		error(_("header for %s declares an unknown type"), path);
+		git_inflate_end(&stream);
+		goto out;
+	}
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 1151ce8e820..94ff03072c1 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f10d6f7b7e8..d8303db9709 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,16 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	git init --bare garbage-type &&
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-	cat >err.expect <<-\EOF &&
-	fatal: invalid object type
-	EOF
-	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-	test_cmp err.expect err.actual &&
-	test_must_be_empty out.actual
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep -e "^error" -e "^fatal" err >errors &&
+	test_line_count = 2 errors &&
+	grep "error: hash mismatch for" err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
 '
 
 test_done
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 19/21] object-store.h: move read_loose_object() below 'struct object_info'
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (17 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
                             ` (2 subsequent siblings)
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it; moving it avoids the need for a forward declaration
of the struct.
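A small C sketch of why the move helps, using a hypothetical, cut-down struct object_info_sketch. The struct is defined before the prototype, so the parameter type refers to the file-scope struct. No separate `struct object_info_sketch;` forward declaration is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for git's struct object_info. */
struct object_info_sketch {
	int *typep;
	unsigned long *sizep;
};

/*
 * The prototype comes after the struct definition, so no forward
 * declaration is needed. Had the prototype come first, the struct
 * named in its parameter list would only have prototype scope, and
 * compilers warn about that.
 */
int read_object_sketch(const char *path, struct object_info_sketch *oi);

int read_object_sketch(const char *path, struct object_info_sketch *oi)
{
	if (!path)
		return -1;
	if (oi->typep)
		*oi->typep = 3;  /* pretend we parsed some valid type */
	if (oi->sizep)
		*oi->sizep = 0;  /* ...and an empty object */
	return 0;
}
```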

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index 94ff03072c1..72d668b1674 100644
--- a/object-store.h
+++ b/object-store.h
@@ -234,20 +234,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -388,6 +374,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 20/21] fsck: report invalid types recorded in objects
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (18 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall through from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.
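Here is a sketch of the idea behind passing a caller-owned struct object_info down. It assumes a simplified, hypothetical info_sketch with a fixed-size type-name buffer instead of git's strbuf. The header parser copies the type name out even when it does not recognize it. The caller can therefore report which bogus type it found, rather than a generic hash mismatch. The 1 to 4 numbering for commit/tree/blob/tag mirrors git's OBJ_COMMIT..OBJ_TAG values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified object_info: the caller points these at its own storage. */
struct info_sketch {
	int *typep;
	char *type_name;      /* caller-supplied buffer */
	size_t type_name_len; /* size of that buffer */
};

static int type_from_name(const char *name)
{
	static const char *const known[] = { "commit", "tree", "blob", "tag" };
	size_t i;

	for (i = 0; i < sizeof(known) / sizeof(*known); i++)
		if (!strcmp(name, known[i]))
			return (int)i + 1;
	return -1; /* unknown type */
}

/*
 * Parse the "<type> <size>" prefix of a loose object header. The type
 * name is copied out even when the type is unknown.
 */
int parse_header_sketch(const char *hdr, struct info_sketch *oi)
{
	const char *sp = strchr(hdr, ' ');
	size_t len;

	if (!sp)
		return -1;
	len = (size_t)(sp - hdr);
	if (len >= oi->type_name_len)
		return -1;
	memcpy(oi->type_name, hdr, len);
	oi->type_name[len] = '\0';
	*oi->typep = type_from_name(oi->type_name);
	return 0;
}
```

A caller would then check `type < 0` and use the copied name in a message like the new "object is of unknown type '%s'" error.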

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 22 ++++++++++++++++++----
 object-file.c   | 13 +++++--------
 object-store.h  |  4 ++--
 t/t1450-fsck.sh | 24 +++++++++++++++++++++---
 4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 082dadd5629..07af0434db6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 8fb55fc6f58..e550ea0c7cf 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2534,9 +2534,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2544,10 +2543,9 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
-	oi.typep = type;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2563,7 +2561,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
@@ -2585,8 +2583,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index 72d668b1674..96a5970f314 100644
--- a/object-store.h
+++ b/object-store.h
@@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index d8303db9709..da2658155c7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	git init --bare hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -869,9 +888,8 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
 	grep -e "^error" -e "^fatal" err >errors &&
-	test_line_count = 2 errors &&
-	grep "error: hash mismatch for" err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	test_line_count = 1 errors &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.32.0.636.g43e71d69cff



* [PATCH v5 21/21] fsck: report invalid object type-path combinations
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (19 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed and what path it was
found at. For example, before this patch series:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    $ git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits, we'd simply die early in those cases
until preceding commits fixed the hard die on invalid object types:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
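A sketch of the sentinel idea described above, using a hypothetical 20-byte oid_sketch in place of struct object_id. The "real" OID starts out all-zero. It is overwritten only once the object has actually been hashed. An all-zero value therefore means the failure happened before hashing. A non-zero value that differs from the expected OID indicates a hash-path mismatch.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_RAWSZ 20 /* SHA-1 length, for this simplified example */

struct oid_sketch {
	unsigned char hash[SKETCH_RAWSZ];
};

static int oid_is_null(const struct oid_sketch *oid)
{
	size_t i;

	for (i = 0; i < SKETCH_RAWSZ; i++)
		if (oid->hash[i])
			return 0;
	return 1;
}

static int oid_equal(const struct oid_sketch *a, const struct oid_sketch *b)
{
	return !memcmp(a->hash, b->hash, SKETCH_RAWSZ);
}

/* Decide which error to report for a failed read of "expected". */
const char *classify_failure(const struct oid_sketch *expected,
			     const struct oid_sketch *real)
{
	if (oid_is_null(real))
		return "object corrupt or missing"; /* never got as far as hashing */
	if (!oid_equal(real, expected))
		return "hash-path mismatch";
	return "object corrupt or missing";
}
```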

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or, worse,
misreport whatever issue we encountered. So let's use the new bug()
function to ferry a return code up to fsck_loose() in that case.
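Here is a sketch of the two pieces this relies on, under simplifying assumptions. It uses a hypothetical 32-bit oid_sketch where 0 plays the role of null_oid, with FNV-1a standing in for the real object hash. check_signature_sketch() follows the optional out-parameter pattern from the patch, where NULL means "use a local temporary". read_sketch() shows the consistency check, with abort() standing in for the BUG() call in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical simplified OID: 0 plays the role of null_oid. */
typedef uint32_t oid_sketch;

/* FNV-1a (32-bit), standing in for the real object hash. */
static oid_sketch hash_sketch(const char *buf)
{
	uint32_t h = 2166136261u;

	for (; *buf; buf++)
		h = (h ^ (unsigned char)*buf) * 16777619u;
	return h;
}

/*
 * Callers that don't care about the computed hash pass NULL for
 * real_oidp, and a local temporary is used instead.
 */
int check_signature_sketch(oid_sketch expected, const char *buf,
			   oid_sketch *real_oidp)
{
	oid_sketch tmp;
	oid_sketch *real = real_oidp ? real_oidp : &tmp;

	*real = hash_sketch(buf);
	return *real == expected ? 0 : -1;
}

/* "contents" is always non-NULL here, as in read_loose_object(). */
int read_sketch(oid_sketch expected, const char *contents, oid_sketch *real)
{
	*real = 0; /* null sentinel */
	if (check_signature_sketch(expected, contents, real) < 0) {
		/* Unreachable today; guards against future refactorings. */
		if (*real == 0) {
			fprintf(stderr, "BUG: should only get OID mismatch errors with mapped contents\n");
			abort();
		}
		return -1;
	}
	return 0;
}
```

Callers that don't need the computed OID simply pass NULL, as the index-pack and mktag call sites in this patch do.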

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 10 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3c20f164f0f..48a3b6a7f8f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 07af0434db6..158b9dac9b3 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = *null_oid();
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3fbc5d70777..bf860b6555e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1421,7 +1421,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index e550ea0c7cf..923ff759e19 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1039,9 +1039,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1049,8 +1051,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1075,9 +1077,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2534,6 +2536,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2583,9 +2586,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, null_oid()))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index 96a5970f314..9fc69016361 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -507,7 +508,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 14188453c56..5467ead3285 100644
--- a/object.c
+++ b/object.c
@@ -261,7 +261,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -272,7 +272,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec0..e6aa4442c90 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 06d38e1fae6..72386cfec0e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -490,7 +490,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index da2658155c7..7d0d57564b5 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.32.0.636.g43e71d69cff



* [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (20 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57           ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:57             ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                               ` (23 more replies)
  21 siblings, 24 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

This improves fsck error reporting; see the examples in the commit
messages of 19/22, 21/22 and 22/22. To get there I've lib-ified more
things in object-file.c and the general object APIs, i.e. we now
return error codes instead of calling die() in these cases.
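As a rough illustration of what "lib-ifying" means here (a sketch, not code from this series): a parser for a "<type> <size>" header can report a problem and return -1 instead of calling die(). The caller then decides whether to abort or, as fsck wants, to keep going. The function toy_parse_header() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical "lib-ified" parser for a "<type> <size>" header.
 * On bad input it prints an error and returns -1 rather than exiting.
 * Overflow checking of the size is omitted to keep the sketch short.
 */
static int toy_parse_header(const char *hdr, char *type, size_t typesz,
			    unsigned long *sizep)
{
	const char *sp;
	char *end;
	unsigned long size;
	size_t len;

	assert(hdr && type && typesz > 0 && sizep);

	sp = strchr(hdr, ' ');
	if (!sp || sp == hdr) {
		fprintf(stderr, "error: malformed header '%s'\n", hdr);
		return -1;
	}
	len = (size_t)(sp - hdr);
	if (len >= typesz) {
		fprintf(stderr, "error: type name too long in '%s'\n", hdr);
		return -1;
	}
	/* strtoul() alone would also accept "", leading spaces or a sign. */
	if (sp[1] < '0' || sp[1] > '9') {
		fprintf(stderr, "error: bad size in header '%s'\n", hdr);
		return -1;
	}
	size = strtoul(sp + 1, &end, 10);
	if (*end != '\0') {
		fprintf(stderr, "error: trailing garbage in header '%s'\n", hdr);
		return -1;
	}
	/* Only write the caller's out-parameters once everything parsed. */
	memcpy(type, hdr, len);
	type[len] = '\0';
	*sizep = size;
	return 0;
}
```

A caller like fsck can then print the error, mark the object as bad and move on to the next one, instead of the whole process exiting.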

This series has been in "needs review" state for a while. This re-roll
is mainly to bump it for the list's attention, but while I was at it I
addressed a point Jonathan Tan raised in a previous round: use an
enum instead of int for the unpack_loose_header() return value.
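A minimal sketch of the enum-returning style. Only the name enum unpack_loose_header_result comes from this series (see the range-diff below). The enumerators, TOY_MAX_HEADER_LEN, toy_unpack_header() and toy_describe() are assumptions for illustration, and the toy scans an already-inflated buffer rather than a zlib stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Enum name from the series; enumerator names are assumed. */
enum unpack_loose_header_result {
	ULHR_OK,
	ULHR_BAD,
	ULHR_TOO_LONG,
};

#define TOY_MAX_HEADER_LEN 32 /* hypothetical limit for the sketch */

/*
 * Look for the NUL that terminates "<type> <size>" within the first
 * TOY_MAX_HEADER_LEN bytes of an already-inflated buffer.
 */
static enum unpack_loose_header_result
toy_unpack_header(const char *buf, size_t len)
{
	size_t scan = len < TOY_MAX_HEADER_LEN ? len : TOY_MAX_HEADER_LEN;

	assert(buf != NULL);
	if (len && memchr(buf, '\0', scan))
		return ULHR_OK;
	/* No terminator in the window: tell "too long" from "bad". */
	return len >= TOY_MAX_HEADER_LEN ? ULHR_TOO_LONG : ULHR_BAD;
}

/* No "default:" label, so -Wswitch can flag unhandled enumerators. */
static const char *toy_describe(enum unpack_loose_header_result r)
{
	switch (r) {
	case ULHR_OK:
		return "ok";
	case ULHR_BAD:
		return "unable to unpack header";
	case ULHR_TOO_LONG:
		return "header too long";
	}
	return "unreachable";
}
```

One benefit of an enum over bare -1/-2 integers is that a switch without a default: case lets -Wswitch (part of -Wall in GCC and Clang) warn about any caller that forgets to handle a newly added result.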

I think the v3 of this got a detailed review, and the v3->v4 delta
wasn't that big (although one commit in particular is a bit tricky):

    https://lore.kernel.org/git/cover-00.21-00000000000-20210624T191754Z-avarab@gmail.com/

I.e. only this part is substantially different from v3:

    https://lore.kernel.org/git/patch-12.21-3f52149bfde-20210624T191755Z-avarab@gmail.com/
    https://lore.kernel.org/git/patch-13.21-ba632be1520-20210624T191755Z-avarab@gmail.com/
    https://lore.kernel.org/git/patch-14.21-ea4f446f5b1-20210624T191755Z-avarab@gmail.com/

v5 was then a trivial change that moved away from the
test_create_repo() test helper:
https://lore.kernel.org/git/cover-00.21-00000000000-20210710T133203Z-avarab@gmail.com/

Perhaps this whole thing just needs to be split up, e.g. the first 7
commits could be some "improve fsck tests" series, the middle part
some general small refactoring, and the real meaty work as of 14/22
its own third series...

But I've resisted that because, while this is rather long, I think the
first parts adding tests for missing behavior should assure reviewers
that the later changes are properly tested.

I know the end-goal here isn't all that exciting in itself, but I've
really wanted to improve various things around fsck-ing, bad object
reporting etc., and that's been stalled on this first step for a
while.

Ævar Arnfjörð Bjarmason (22):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  object-file.c: don't set "typep" when returning non-zero
  cache.h: move object functions to object-store.h
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: split up ternary in parse_loose_header()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: guard against future bugs in loose_object_info()
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  object-file.c: use "enum" return type for unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 178 +++++++++++++++++++++--------------------
 object-store.h         |  75 ++++++++++++++---
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  29 ++++---
 t/t1006-cat-file.sh    | 169 ++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 432 insertions(+), 146 deletions(-)

Range-diff against v5:
 1:  a1259cdedcb =  1:  ebe89f65354 fsck tests: refactor one test to use a sub-repo
 2:  634f991d7c6 =  2:  9072eef3be3 fsck tests: add test for fsck-ing an unknown type
 3:  ce9dcc423e9 =  3:  d442a309178 cat-file tests: test for missing object with -t and -s
 4:  50a20741e86 =  4:  0358273022f cat-file tests: test that --allow-unknown-type isn't on by default
 5:  f8d0b630d0a =  5:  82db40ebf8a rev-list tests: test for behavior with invalid object types
 6:  43335e653b8 =  6:  d1ffd21acc5 cat-file tests: add corrupt loose object test
 7:  a00dfea3fb8 =  7:  22ab12c2282 cat-file tests: test for current --allow-unknown-type behavior
 9:  e9520953956 =  8:  38e4266772d object-file.c: don't set "typep" when returning non-zero
 8:  387d7f08e61 =  9:  5b9278e7bb4 cache.h: move object functions to object-store.h
10:  a8b408eefe6 = 10:  b15ad53414b object-file.c: make parse_loose_header_extended() public
11:  31eee4da0e1 = 11:  326eb74545d object-file.c: add missing braces to loose_object_info()
12:  dae5cfabd57 = 12:  4f829e9b727 object-file.c: simplify unpack_loose_short_header()
13:  0d8385d8d12 = 13:  90489d9e6ec object-file.c: split up ternary in parse_loose_header()
14:  d1522291aee = 14:  7c9819d37c5 object-file.c: stop dying in parse_loose_header()
15:  13d4141a21b = 15:  3fb660ff944 object-file.c: guard against future bugs in loose_object_info()
16:  912c9edf362 = 16:  9e7dbfb4aa3 object-file.c: return -1, not "status" from unpack_loose_header()
17:  7e101f97646 ! 17:  f28c4f0dfb4 object-file.c: return -2 on "header too long" in unpack_loose_header()
    @@ Commit message
         MAX_HEADER_LEN limit, or other negative values for "unable to unpack
         <OID> header".
     
    -    I tried setting up an enum just for these three return values, but I
    -    think the result was less readable. Let's consider doing that if we
    -    gain even more return values. For now let's do the next best thing and
    -    enumerate our known return values, and BUG() if we encounter one we
    -    don't know about.
    -
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## object-file.c ##
 -:  ----------- > 18:  1b7173a5b5b object-file.c: use "enum" return type for unpack_loose_header()
18:  3c04065b0b0 = 19:  ad1614dbb8d fsck: don't hard die on invalid object types
19:  ad920362594 = 20:  3bf3cf2299d object-store.h: move read_loose_object() below 'struct object_info'
20:  02a148af5cf = 21:  974f650cddf fsck: report invalid types recorded in objects
21:  730e0a6f805 ! 22:  804673a17b0 fsck: report invalid object type-path combinations
    @@ object-store.h: int oid_object_info_extended(struct repository *r,
      		      void **contents,
      		      struct object_info *oi,
      		      unsigned int oi_flags);
    -@@ object-store.h: int unpack_loose_header(git_zstream *stream, unsigned char *map,
    +@@ object-store.h: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
      int parse_loose_header(const char *hdr, struct object_info *oi);
      
      int check_object_signature(struct repository *r, const struct object_id *oid,
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57             ` Ævar Arnfjörð Bjarmason
  2021-09-16 19:40               ` Taylor Blau
  2021-09-07 10:57             ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                               ` (22 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose and makes things easier to debug, as a
failing test can have its state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test; I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..7becab5ba1e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	git init --bare hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-09-07 10:57             ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57             ` Ævar Arnfjörð Bjarmason
  2021-09-16 19:51               ` Taylor Blau
  2021-09-07 10:57             ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
                               ` (21 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which will be done in subsequent
patches; for now, let's test for the current behavior.
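For context on why a "garbage" type is special: only a handful of type names correspond to real object types, so any other string stored via --literally has no valid enum value to map to. This is a sketch, assuming the numbering of git's enum object_type (OBJ_BAD = -1 through OBJ_TAG = 4). The toy_ names are hypothetical, and this is not git's own type_from_string().

```c
#include <assert.h>
#include <string.h>

/* Mirrors the first few values of git's enum object_type (assumed). */
enum toy_object_type {
	TOY_OBJ_BAD = -1,
	TOY_OBJ_NONE = 0,
	TOY_OBJ_COMMIT = 1,
	TOY_OBJ_TREE = 2,
	TOY_OBJ_BLOB = 3,
	TOY_OBJ_TAG = 4,
};

/* Index i holds the name for enum value i; index 0 (NONE) has none. */
static const char *const toy_type_names[] = {
	NULL, "commit", "tree", "blob", "tag",
};

/* Hypothetical helper: map a header's type name to the enum. */
static enum toy_object_type toy_type_from_string(const char *str, size_t len)
{
	int i;
	int n = (int)(sizeof(toy_type_names) / sizeof(*toy_type_names));

	assert(str != NULL);
	for (i = 1; i < n; i++) {
		const char *name = toy_type_names[i];

		if (strlen(name) == len && !memcmp(str, name, len))
			return (enum toy_object_type)i;
	}
	/* e.g. "garbage" written with hash-object --literally */
	return TOY_OBJ_BAD;
}
```

Code that dies as soon as it sees the error value never gets to report which object had the bad type. That is the behavior these tests pin down before later patches change it.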

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 7becab5ba1e..f10d6f7b7e8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	git init --bare garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_done
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-09-07 10:57             ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
  2021-09-07 10:57             ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57             ` Ævar Arnfjörð Bjarmason
  2021-09-16 19:57               ` Taylor Blau
  2021-09-07 10:57             ` [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
                               ` (20 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Test for what happens when the -t and -s flags are asked to operate on
a missing object. This extends tests added in 3e370f9faf0 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..3a7b138fe4e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,33 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
+missing_oid=$(test_oid deadbeef)
+test_expect_success 'error on type of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -t $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
+test_expect_success 'error on size of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -s $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
 bogus_type="bogus"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (2 preceding siblings ...)
  2021-09-07 10:57             ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
                               ` (19 subsequent siblings)
  23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for the --allow-unknown-type feature
added in 39e4ae38804 (cat-file: teach cat-file a
'--allow-unknown-type' option, 2015-05-03). We should check that
--allow-unknown-type isn't on by default.

Before this change, all the tests would succeed if --allow-unknown-type
were on by default. Let's fix that by asserting that -t and -s die on a
"garbage" type without --allow-unknown-type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 3a7b138fe4e..5e05ea0861e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -347,6 +347,20 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -363,6 +377,21 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-EOF &&
+	error: unable to unpack $bogus_sha1 header
+	fatal: git cat-file: could not get object info
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (3 preceding siblings ...)
  2021-09-07 10:57             ` [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 20:40               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 06/22] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                               ` (18 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for the "rev-list --disk-usage" feature
added in 16950f8384a (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) to test for what happens when it's
asked to calculate the disk usage of invalid object types.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6115-rev-list-du.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..edb2ed55846 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,15 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+test_expect_success 'setup garbage repository' '
+	git clone --bare . garbage.git &&
+	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
+	git -C garbage.git rev-list --objects --all --disk-usage &&
+
+	# Manually create a ref because "update-ref", "tag" etc. have
+	# no corresponding --literally option.
+	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
+	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
+'
+
 test_done
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 06/22] cat-file tests: add corrupt loose object test
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (4 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                               ` (17 subsequent siblings)
  23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib, we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5e05ea0861e..8f3516db188 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -404,6 +404,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (5 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 06/22] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
                               ` (16 subsequent siblings)
  23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1], I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8f3516db188..98729f1edfc 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -361,6 +361,46 @@ test_expect_success 'die on broken object under -t and -s without --allow-unknow
 	test_must_be_empty out.actual
 '
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -372,6 +412,27 @@ test_expect_success "Size of broken object is correct" '
 	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
 	test_cmp expect actual
 '
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual
+'
+
 bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (6 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 21:29               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                               ` (15 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error, stop faking up
"oi->typep" as OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having carefully read the code paths involved, I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
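To make the calling convention concrete, here is a minimal sketch using
hypothetical, cut-down stand-ins ("sketch_object_info",
"sketch_loose_object_info()" and "sketch_object_type()" are not git's
real types or functions). The extended lookup only writes "*typep" on
success, so callers must check the return value first, while an
oid_object_info()-style wrapper always hands back a type and maps any
failure to OBJ_BAD.

```c
/* Hypothetical stand-ins for git's enum object_type and struct object_info. */
enum sketch_object_type { SKETCH_OBJ_BAD = -1, SKETCH_OBJ_BLOB = 3 };

struct sketch_object_info {
	enum sketch_object_type *typep; /* optional out-parameter, may be NULL */
};

/*
 * Hypothetical lookup: on success, returns 0 and fills *oi->typep;
 * on failure, returns -1 and leaves *oi->typep untouched (it no
 * longer fakes it up as OBJ_BAD).
 */
static int sketch_loose_object_info(int object_is_valid,
				    struct sketch_object_info *oi)
{
	if (!object_is_valid)
		return -1;
	if (oi->typep)
		*oi->typep = SKETCH_OBJ_BLOB;
	return 0;
}

/*
 * oid_object_info()-style wrapper: always returns a type, mapping
 * any failure to OBJ_BAD, so its callers cannot be misled.
 */
static enum sketch_object_type sketch_object_type(int object_is_valid)
{
	enum sketch_object_type type = SKETCH_OBJ_BAD;
	struct sketch_object_info oi = { &type };

	if (sketch_loose_object_info(object_is_valid, &oi) < 0)
		return SKETCH_OBJ_BAD;
	return type;
}
```

A caller of the extended variant that reads "*typep" without looking at
the return value would see whatever value it had before the call, which
is exactly the hypothetical bug discussed above.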

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index a8be8994814..bda3497d5ca 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1503,8 +1503,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 09/22] cache.h: move object functions to object-store.h
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (7 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 21:33               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                               ` (14 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index d23de693680..11a04a93436 100644
--- a/cache.h
+++ b/cache.h
@@ -1313,16 +1313,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index d24915ced1b..eb4876ec983 100644
--- a/object-store.h
+++ b/object-store.h
@@ -485,4 +485,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (8 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 21:39               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
                               ` (13 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public, renaming it
to parse_loose_header() and removing the old parse_loose_header()
wrapper. The only direct user of the wrapper outside of object-file.c
itself was in streaming.c; that caller can simply pass the required
"struct object_info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it will
leave the API in a better end state.
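The caller-side migration can be sketched as follows. This assumes a
cut-down, hypothetical "struct sketch_object_info" and a toy
"sketch_parse_loose_header()" that only understands "blob <len>"
headers. Instead of passing "&size" directly, the caller points
"oi.sizep" at its own variable, as read_loose_object() and
open_istream_loose() do in this patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down object_info with only the field we need. */
struct sketch_object_info {
	unsigned long *sizep; /* optional: where to store the parsed length */
};

/* Toy parser for "blob <len>" headers: 0 on success, -1 otherwise. */
static int sketch_parse_loose_header(const char *hdr,
				     struct sketch_object_info *oi)
{
	char *end;
	unsigned long size;

	if (strncmp(hdr, "blob ", 5))
		return -1;
	size = strtoul(hdr + 5, &end, 10);
	if (end == hdr + 5 || *end) /* no digits, or trailing junk */
		return -1;
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}

/* A caller that used to pass an "unsigned long *" straight through. */
static int sketch_read_size(const char *hdr, unsigned long *size)
{
	struct sketch_object_info oi = { NULL };

	oi.sizep = size; /* mirrors "oi.sizep = size;" in read_loose_object() */
	return sketch_parse_loose_header(hdr, &oi);
}
```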

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 21 ++++++++-------------
 object-store.h |  3 ++-
 streaming.c    |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/object-file.c b/object-file.c
index bda3497d5ca..7a47af68bd8 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1363,8 +1363,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1424,14 +1425,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1486,10 +1479,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2573,6 +2566,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2587,7 +2582,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index eb4876ec983..25e641a606f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -488,7 +488,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (9 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
                               ` (12 subsequent siblings)
  23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Change the formatting in loose_object_info() to conform with our usual
coding style:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency -- Documentation/CodingGuidelines

This formatting-only change makes a subsequent commit easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 7a47af68bd8..878a4298c9b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1473,17 +1473,20 @@ static int loose_object_info(struct repository *r,
 		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1492,8 +1495,9 @@ static int loose_object_info(struct repository *r,
 			git_inflate_end(&stream);
 			status = -1;
 		}
-	} else
+	} else {
 		git_inflate_end(&stream);
+	}
 
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (10 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
                               ` (11 subsequent siblings)
  23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it and both
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be further
simplified: we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN. We could alternatively just
say "we found a garbage type <first 32 bytes>..." instead. But let's
leave the current behavior in place for now.
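The "optional strbuf" contract of the combined unpack_loose_header()
can be sketched in isolation. This is a simplified model, not the real
function: it assumes the inflated header is already available as a
NUL-terminated string (zlib is left out entirely), "bufsiz" stands in
for the MAX_HEADER_LEN-sized buffer, and the hypothetical "struct
sketch_buf" replaces git's "struct strbuf".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable buffer standing in for struct strbuf. */
struct sketch_buf {
	char *buf;
	size_t len;
};

/*
 * Returns 0 on success, -1 if the header is longer than "bufsiz" and
 * no optional "header" buffer was supplied (or if allocation fails).
 */
static int sketch_unpack_header(const char *inflated, size_t bufsiz,
				struct sketch_buf *header)
{
	size_t full = strlen(inflated) + 1; /* include the terminating NUL */

	/* The whole header fits: the caller parses the fixed buffer. */
	if (full <= bufsiz)
		return 0;

	/* Longer than the fixed buffer: only OK if the caller opted in. */
	if (!header)
		return -1;

	header->buf = malloc(full);
	if (!header->buf)
		return -1;
	memcpy(header->buf, inflated, full);
	header->len = full - 1;
	return 0;
}
```

Passing NULL gives the old unpack_loose_header() behavior, while passing
a buffer gives the old unpack_loose_header_to_strbuf() behavior used by
"cat-file --allow-unknown-type".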

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 60 ++++++++++++++++++--------------------------------
 object-store.h | 14 +++++++++++-
 streaming.c    |  3 ++-
 3 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/object-file.c b/object-file.c
index 878a4298c9b..2dd4cdd1ae0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,11 +1233,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1248,44 +1249,25 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
 	if (status < Z_OK)
 		return status;
 
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
-	if (status < Z_OK)
-		return -1;
-
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1433,9 +1415,11 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
+	int hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1469,11 +1453,10 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+
+	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				      allow_unknown ? &hdrbuf : NULL);
+	if (hdr_ret < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
@@ -2581,7 +2564,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/object-store.h b/object-store.h
index 25e641a606f..4064710ae29 100644
--- a/object-store.h
+++ b/object-store.h
@@ -485,9 +485,21 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz);
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (11 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 21:58               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
                               ` (10 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

This minor formatting change serves to make a subsequent patch easier
to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 2dd4cdd1ae0..7c6a865a6c0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1404,7 +1404,10 @@ int parse_loose_header(const char *hdr,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	return type;
 }
 
 static int loose_object_info(struct repository *r,
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 14/22] object-file.c: stop dying in parse_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (12 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  2:32               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
                               ` (9 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Start the libification of parse_loose_header() by making it return
error codes and data instead of invoking die() by itself. For now
we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller, but in subsequent
commits we'll also libify those.

Since the refactoring of parse_loose_header_extended() into
parse_loose_header() in an earlier commit, its interface no longer
accepts an "unsigned long *sizep". Rather, it accepts a "struct
object_info *", which will be populated with information about the
object.

It thus makes sense to further libify the interface so that it stops
calling die() when it encounters OBJ_BAD, and instead relies on its
callers to check the populated "oi->typep".
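A self-contained sketch of the new contract follows. It uses
hypothetical "sketch_*" names and a three-entry type table in place of
git's type_from_string_gently(). The return value only reports whether
"<type> <len>" (followed by the NUL) is well-formed; an unrecognized
<type> is reported as SKETCH_OBJ_BAD through "*typep" for the caller to
judge.

```c
#include <string.h>

enum sketch_type { SKETCH_OBJ_BAD = -1, SKETCH_OBJ_COMMIT = 1,
		   SKETCH_OBJ_TREE = 2, SKETCH_OBJ_BLOB = 3 };

struct sketch_info {
	enum sketch_type *typep; /* optional */
	unsigned long *sizep;    /* optional */
};

/* Hypothetical lookup standing in for type_from_string_gently(). */
static enum sketch_type sketch_type_from_string(const char *s, size_t len)
{
	static const char *names[] = { NULL, "commit", "tree", "blob" };

	for (int i = 1; i <= 3; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], s, len))
			return (enum sketch_type)i;
	return SKETCH_OBJ_BAD;
}

/* Returns 0 if "<type> <len>" is well-formed, -1 otherwise. */
static int sketch_parse_header(const char *hdr, struct sketch_info *oi)
{
	const char *type_buf = hdr;
	size_t type_len;
	unsigned long size = 0;

	/* <type> runs up to the first space. */
	while (*hdr && *hdr != ' ')
		hdr++;
	if (*hdr != ' ')
		return -1;
	type_len = hdr - type_buf;
	hdr++;

	/* An unknown type is not a format error: report it via typep. */
	if (oi->typep)
		*oi->typep = sketch_type_from_string(type_buf, type_len);

	/* <len> must be a non-empty run of decimal digits (no overflow check). */
	if (*hdr < '0' || *hdr > '9')
		return -1;
	while (*hdr >= '0' && *hdr <= '9')
		size = size * 10 + (unsigned long)(*hdr++ - '0');

	/* ...followed directly by the terminating NUL. */
	if (*hdr)
		return -1;
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}
```

Note that "bogus 5" parses successfully with "*typep" set to
SKETCH_OBJ_BAD, which is what lets an "--allow-unknown-type" caller
proceed while every other caller decides for itself whether to error out.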

Because of this, we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can instead do
that check in loose_object_info().

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

In another case added in c84a1f3ed4d (sha1_file: refactor read_object,
2017-06-21) (but the behavior pre-dated that) we checked whether
"status >= 0", because at that point "status" had become the return
value of parse_loose_header(), i.e. a non-negative "enum object_type"
(unless we returned -1, aka OBJ_BAD).

Now that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info"), we don't need
to conflate these two cases in its callers.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 53 ++++++++++++++++++++++++++------------------------
 object-store.h | 13 +++++++++++--
 streaming.c    |  4 +++-
 3 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/object-file.c b/object-file.c
index 7c6a865a6c0..d656960422d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1345,9 +1345,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr,
-		       struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1369,15 +1367,6 @@ int parse_loose_header(const char *hdr,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1407,7 +1396,11 @@ int parse_loose_header(const char *hdr,
 	if (*hdr)
 		return -1;
 
-	return type;
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1422,6 +1415,8 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
+	int parsed_header = 0;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1453,6 +1448,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1463,18 +1460,20 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
-
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	if (!status) {
+		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
+			/*
+			 * oi->{sizep,typep} are meaningless unless
+			 * parse_loose_header() returns >= 0.
+			 */
+			parsed_header = 1;
+		else
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
+	if (!allow_unknown && parsed_header && *oi->typep < 0)
+		die(_("invalid object type"));
 
-	if (status >= 0 && oi->contentp) {
+	if (parsed_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
 						  *oi->sizep, oid);
 		if (!*oi->contentp) {
@@ -1489,6 +1488,8 @@ static int loose_object_info(struct repository *r,
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
 	return (status < 0) ? status : 0;
 }
@@ -2557,6 +2558,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2573,12 +2575,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 4064710ae29..584bf5556af 100644
--- a/object-store.h
+++ b/object-store.h
@@ -500,8 +500,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz, struct strbuf *hdrbuf);
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
+int parse_loose_header(const char *hdr, struct object_info *oi);
+
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..c3dc241d6a5 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -235,7 +236,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr),
 				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
+	    *type < 0) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (13 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  2:35               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                               ` (8 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

An earlier version of the preceding commit had a subtle bug where our
"type_scratch" (later assigned to "oi->typep") would be uninitialized
and used in the "!allow_unknown" case, at which point it would contain
a nonsensical value if we'd failed to call parse_loose_header().

The preceding commit introduced a "parsed_header" variable to check
for this case, but I think we can do better: let's carry an
"oi_header" variable, initially set to NULL, and only set it to "oi"
once we're past parse_loose_header().

This is functionally the same thing, but hopefully makes it even more
obvious in the future that we must not access the "typep" and
"sizep" (or "type_name") unless parse_loose_header() succeeds, but
that accessing other fields set earlier (such as "disk_sizep") is
OK.
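The "oi_header" idiom can be sketched on its own. The names here are
hypothetical ("sketch_info", "sketch_object_info()"), and
"sketch_parse()" is a stub that merely simulates the outcome of
parse_loose_header(). The alias pointer stays NULL until the parse
succeeds, so every read of the header-derived fields is forced through a
NULL check first.

```c
#include <stddef.h>

/* Hypothetical, cut-down object_info: both pointers must be non-NULL here. */
struct sketch_info {
	int *typep;           /* valid only after a successful parse */
	unsigned long *sizep; /* ditto */
};

/* Hypothetical stub simulating parse_loose_header(): -1 = malformed. */
static int sketch_parse(int well_formed, int type, unsigned long size,
			struct sketch_info *oi)
{
	if (!well_formed)
		return -1;
	*oi->typep = type;
	*oi->sizep = size;
	return 0;
}

/*
 * Returns 0 on success, -1 for a malformed header, and -2 for an
 * unknown (negative) type when allow_unknown is off; the real
 * loose_object_info() calls die() in that last case instead.
 */
static int sketch_object_info(int well_formed, int type, unsigned long size,
			      int allow_unknown, struct sketch_info *oi)
{
	struct sketch_info *oi_header = NULL; /* alias of oi, set on success */
	int status = 0;

	if (!sketch_parse(well_formed, type, size, oi))
		oi_header = oi;
	else
		status = -1;

	/* Header-derived fields are only ever read through oi_header. */
	if (!allow_unknown && oi_header && *oi_header->typep < 0)
		status = -2;

	return status;
}
```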

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index d656960422d..ae6a37ab5fb 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1416,7 +1416,7 @@ static int loose_object_info(struct repository *r,
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
 	enum object_type type_scratch;
-	int parsed_header = 0;
+	struct object_info *oi_header = NULL;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1464,18 +1464,20 @@ static int loose_object_info(struct repository *r,
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
 			/*
 			 * oi->{sizep,typep} are meaningless unless
-			 * parse_loose_header() returns >= 0.
+			 * parse_loose_header() returns >= 0. Let's
+			 * access them as "oi_header" (just an alias
+			 * for "oi") below to make that intent clear.
 			 */
-			parsed_header = 1;
+			oi_header = oi;
 		else
 			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
-	if (!allow_unknown && parsed_header && *oi->typep < 0)
+	if (!allow_unknown && oi_header && *oi_header->typep < 0)
 		die(_("invalid object type"));
 
-	if (parsed_header && oi->contentp) {
+	if (oi_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
+						  *oi_header->sizep, oid);
 		if (!*oi->contentp) {
 			git_inflate_end(&stream);
 			status = -1;
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (14 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
                               ` (7 subsequent siblings)
  23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.
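
A minimal sketch of the before/after behavior, assuming local copies of
zlib's status codes (the values match zlib.h, but the MY_Z_* names and
both helper functions are hypothetical):

```c
/* Local copies of some zlib status codes, with zlib.h's values. */
enum {
	MY_Z_OK = 0,
	MY_Z_STREAM_END = 1,
	MY_Z_ERRNO = -1,
	MY_Z_STREAM_ERROR = -2,
	MY_Z_DATA_ERROR = -3,
	MY_Z_BUF_ERROR = -5,
};

/* Old behavior: pass whatever negative zlib status we got upwards. */
static int header_status_old(int inflate_status)
{
	if (inflate_status < MY_Z_OK)
		return inflate_status;
	return 0;
}

/* New behavior: callers only learn that inflating failed. */
static int header_status_new(int inflate_status)
{
	if (inflate_status < MY_Z_OK)
		return -1;
	return 0;
}
```

One side effect worth noting: zlib's Z_STREAM_ERROR is -2, so under the
old behavior a raw zlib status could be indistinguishable from any
distinct negative return code a caller might later want to introduce.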

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index ae6a37ab5fb..11df4485147 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1252,7 +1252,7 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (15 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
                               ` (6 subsequent siblings)
  23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows,
we already correctly emit zlib errors from zlib.c in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return -2 to say that we ran into the
MAX_HEADER_LEN limit, and -1 for other errors, which are reported as
"unable to unpack <OID> header".
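
The resulting calling convention can be sketched as follows, assuming a
hypothetical read_header() that mimics only the return codes of
unpack_loose_header(). HDR_MAX stands in for MAX_HEADER_LEN, and
abort() stands in for git's BUG():

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HDR_MAX 32 /* hypothetical stand-in for MAX_HEADER_LEN */

/*
 * Hypothetical header reader: 0 on success, -1 on a generic failure,
 * -2 if no NUL terminator is found within the first HDR_MAX bytes.
 */
static int read_header(const char *buf, size_t len)
{
	if (!buf || !len)
		return -1;
	if (!memchr(buf, '\0', len < HDR_MAX ? len : HDR_MAX))
		return -2;
	return 0;
}

/* Map each return code to a message, like the switch in loose_object_info(). */
static const char *describe(int ret)
{
	switch (ret) {
	case 0:
		return "ok";
	case -1:
		return "unable to unpack header";
	case -2:
		return "header too long";
	default:
		/* git calls BUG() here; abort() is the closest stdlib analogue */
		fprintf(stderr, "BUG: unknown return value %d\n", ret);
		abort();
	}
}
```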

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c       | 16 +++++++++++++---
 object-store.h      |  6 ++++--
 t/t1006-cat-file.sh |  2 +-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 11df4485147..0cb5287d3ef 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1266,7 +1266,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return -2;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1287,7 +1287,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return -2;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1456,9 +1456,19 @@ static int loose_object_info(struct repository *r,
 
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
-	if (hdr_ret < 0) {
+	switch (hdr_ret) {
+	case 0:
+		break;
+	case -1:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
+		break;
+	case -2:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
+	default:
+		BUG("unknown hdr_ret value %d", hdr_ret);
 	}
 	if (!status) {
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index 584bf5556af..e896b813f24 100644
--- a/object-store.h
+++ b/object-store.h
@@ -489,13 +489,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns 0 on success. Returns negative values on error. If the
+ * header exceeds MAX_HEADER_LEN -2 will be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), -2 will still be returned from this
+ * function to indicate that the header was too long.
  */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 98729f1edfc..43a9f4e7f0c 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
 
 test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
 	cat >err.expect <<-EOF &&
-	error: unable to unpack $bogus_sha1 header
+	error: header for $bogus_sha1 too long, exceeds 32 bytes
 	fatal: git cat-file: could not get object info
 	EOF
 
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (16 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  2:45               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                               ` (5 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

In the preceding commits we changed and documented
unpack_loose_header() from returning any negative value or zero to
returning only -2, -1 or 0. Let's instead add an "enum
unpack_loose_header_result" type and use it, and have the compiler
assert that we're exhaustively covering all return values. This gets
rid of the need for having a "default" BUG() case in
loose_object_info().

I'm on the fence about whether this is more readable or worth it, but
since it was suggested in [1], let's go for it.
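
A minimal sketch of the enum approach, with hypothetical names mirroring
the enum this patch adds. Without a "default:" label, GCC's -Wswitch
warning (enabled by -Wall) flags any enumerator that lacks a case:

```c
enum hdr_result {
	HDR_RESULT_BAD_TOO_LONG = -2,
	HDR_RESULT_BAD = -1,
	HDR_RESULT_OK, /* implicitly -1 + 1, i.e. 0 */
};

static const char *hdr_result_str(enum hdr_result r)
{
	/*
	 * No "default:" label on purpose: adding a new enumerator
	 * without a matching case here triggers -Wswitch.
	 */
	switch (r) {
	case HDR_RESULT_OK:
		return "ok";
	case HDR_RESULT_BAD:
		return "bad";
	case HDR_RESULT_BAD_TOO_LONG:
		return "too long";
	}
	return "unreachable"; /* only for out-of-range values */
}
```

Because HDR_RESULT_OK follows an enumerator explicitly set to -1, C
gives it the value 0, which keeps the "0 on success" convention intact.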

1. https://lore.kernel.org/git/20210527175433.2673306-1-jonathantanmy@google.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 20 ++++++++++----------
 object-store.h | 27 ++++++++++++++++++++-------
 streaming.c    | 27 ++++++++++++++++-----------
 3 files changed, 46 insertions(+), 28 deletions(-)

diff --git a/object-file.c b/object-file.c
index 0cb5287d3ef..9484c7ce2be 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,10 +1233,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz,
-			struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *header)
 {
 	int status;
 
@@ -1411,7 +1413,7 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
-	int hdr_ret;
+	enum unpack_loose_header_result hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
@@ -1457,18 +1459,16 @@ static int loose_object_info(struct repository *r,
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
 	switch (hdr_ret) {
-	case 0:
+	case UNPACK_LOOSE_HEADER_RESULT_OK:
 		break;
-	case -1:
+	case UNPACK_LOOSE_HEADER_RESULT_BAD:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 		break;
-	case -2:
+	case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
 		status = error(_("header for %s too long, exceeds %d bytes"),
 			       oid_to_hex(oid), MAX_HEADER_LEN);
 		break;
-	default:
-		BUG("unknown hdr_ret value %d", hdr_ret);
 	}
 	if (!status) {
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index e896b813f24..ac55b02f15a 100644
--- a/object-store.h
+++ b/object-store.h
@@ -485,23 +485,36 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+enum unpack_loose_header_result {
+	UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG = -2,
+	UNPACK_LOOSE_HEADER_RESULT_BAD = -1,
+	UNPACK_LOOSE_HEADER_RESULT_OK,
+
+};
+
 /**
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error. If the
- * header exceeds MAX_HEADER_LEN -2 will be returned.
+ * Returns UNPACK_LOOSE_HEADER_RESULT_OK on success. Returns
+ * UNPACK_LOOSE_HEADER_RESULT_BAD values on error, or if the header
+ * exceeds MAX_HEADER_LEN UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG will
+ * be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header(), -2 will still be returned from this
- * function to indicate that the header was too long.
+ * with parse_loose_header(), UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG
+ * will still be returned from this function to indicate that the
+ * header was too long.
  */
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
-			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *hdrbuf);
 
 /**
  * parse_loose_header() parses the starting "<type> <len>\0" of an
diff --git a/streaming.c b/streaming.c
index c3dc241d6a5..3e5045c004d 100644
--- a/streaming.c
+++ b/streaming.c
@@ -224,24 +224,25 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      enum object_type *type)
 {
 	struct object_info oi = OBJECT_INFO_INIT;
+	enum unpack_loose_header_result hdr_ret;
 	oi.sizep = &st->size;
 	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
-	if ((unpack_loose_header(&st->z,
-				 st->u.loose.mapped,
-				 st->u.loose.mapsize,
-				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr),
-				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
-	    *type < 0) {
-		git_inflate_end(&st->z);
-		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-		return -1;
+	hdr_ret = unpack_loose_header(&st->z, st->u.loose.mapped,
+				      st->u.loose.mapsize, st->u.loose.hdr,
+				      sizeof(st->u.loose.hdr), NULL);
+	switch (hdr_ret) {
+	case UNPACK_LOOSE_HEADER_RESULT_OK:
+		break;
+	case UNPACK_LOOSE_HEADER_RESULT_BAD:
+	case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
+		goto error;
 	}
+	if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
+		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
 	st->u.loose.hdr_avail = st->z.total_out;
@@ -250,6 +251,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->read = read_istream_loose;
 
 	return 0;
+error:
+	git_inflate_end(&st->z);
+	munmap(st->u.loose.mapped, st->u.loose.mapsize);
+	return -1;
 }
 
 
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 19/22] fsck: don't hard die on invalid object types
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (17 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  3:37               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                               ` (4 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with a non-zero status, but now we'll finish the rest
of the traversal. The updated test asserts that we'll still complain
about other fsck issues (e.g. an unrelated dangling blob).

To do this we need to pass the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag
down from fsck_loose() to read_loose_object(), which previously died
whenever parse_loose_header() reported an unknown type. Since the
read_loose_object() function is only used in builtin/fsck.c we can
simply change its signature. See f6371f92104 (sha1_file: add
read_loose_object() function, 2017-01-13) for the introduction of
read_loose_object().

Why are we complaining about a "hash mismatch" for an object of a type
we don't know about? We shouldn't. This is the bare minimum change
needed to keep fsck from hard dying on a repository that's been
corrupted in this manner. In subsequent commits we'll teach fsck to
recognize this particular type of corruption and emit a better error
message.
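
The shape of the change can be sketched like this, assuming a
hypothetical read_object() and flag bit rather than git's actual
read_loose_object() signature. The reader only rejects unknown types
itself when the caller hasn't opted in, and does so by returning an
error instead of exiting, so a caller like fsck_loose() can record the
failure and keep checking other objects:

```c
#include <stdio.h>

#define INFO_ALLOW_UNKNOWN_TYPE (1u << 0) /* hypothetical flag bit */

/* Hypothetical reader: returns -1 instead of exiting on unknown types. */
static int read_object(int parsed_type, int *type, unsigned int flags)
{
	int allow_unknown = flags & INFO_ALLOW_UNKNOWN_TYPE;

	*type = parsed_type;
	if (!allow_unknown && *type < 0) {
		fprintf(stderr, "error: header declares an unknown type\n");
		return -1; /* previously: die(), ending the whole traversal */
	}
	return 0;
}

/* A caller that keeps going after errors, like fsck_loose(). */
static int check_all(const int *types, int n, unsigned int flags)
{
	int errors = 0, type, i;

	for (i = 0; i < n; i++)
		if (read_object(types[i], &type, flags) < 0)
			errors++; /* record it and keep checking other objects */
	return errors;
}
```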

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 11 ++++++++---
 object-store.h  |  3 ++-
 t/t1450-fsck.sh | 14 +++++++-------
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..082dadd5629 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 9484c7ce2be..0e6937fad73 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2562,7 +2562,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
@@ -2570,6 +2571,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	oi.typep = type;
 	oi.sizep = size;
 
@@ -2592,8 +2594,11 @@ int read_loose_object(const char *path,
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
+	if (!allow_unknown && *type < 0) {
+		error(_("header for %s declares an unknown type"), path);
+		git_inflate_end(&stream);
+		goto out;
+	}
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index ac55b02f15a..c268662f5ba 100644
--- a/object-store.h
+++ b/object-store.h
@@ -253,7 +253,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f10d6f7b7e8..d8303db9709 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,16 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	git init --bare garbage-type &&
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-	cat >err.expect <<-\EOF &&
-	fatal: invalid object type
-	EOF
-	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-	test_cmp err.expect err.actual &&
-	test_must_be_empty out.actual
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep -e "^error" -e "^fatal" err >errors &&
+	test_line_count = 2 errors &&
+	grep "error: hash mismatch for" err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
 '
 
 test_done
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info'
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (18 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
                               ` (3 subsequent siblings)
  23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it; moving it avoids the need for a forward declaration
of the struct.
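
For background, a minimal C illustration with hypothetical names: if a
struct tag first appears inside a prototype's parameter list, it only
has prototype scope, and GCC warns that the struct is declared inside
the parameter list. Declaring the struct before the prototype, as
below, avoids both that and a separate forward declaration:

```c
/* The struct is complete before any prototype mentions it... */
struct object_info_sketch {
	unsigned long *sizep;
};

/* ...so this prototype refers to the same file-scope type. */
int read_sketch(const char *path, struct object_info_sketch *oi);

int read_sketch(const char *path, struct object_info_sketch *oi)
{
	(void)path; /* unused in this sketch */
	*oi->sizep = 42;
	return 0;
}
```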

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index c268662f5ba..dc638335e7d 100644
--- a/object-store.h
+++ b/object-store.h
@@ -242,20 +242,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -396,6 +382,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 21/22] fsck: report invalid types recorded in objects
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (19 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  3:57               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
                               ` (2 subsequent siblings)
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall through from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.
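
A minimal sketch of the idea, assuming a hypothetical
parse_header_sketch() and a fixed-size buffer standing in for the
"type_name" strbuf. The parser records the raw type name even when it
doesn't recognize it, so the caller can report "unknown type 'garbage'"
instead of falling through to a misleading hash check. The enum values
mirror git's object_type constants:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum obj_type { OBJ_BAD = -1, OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4 };

struct info_sketch {
	enum obj_type *typep;
	unsigned long *sizep;
	char type_name[32]; /* hypothetical stand-in for a strbuf */
};

static enum obj_type type_from_name(const char *name, size_t len)
{
	static const char *names[] = { NULL, "commit", "tree", "blob", "tag" };
	int i;

	for (i = 1; i <= 4; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (enum obj_type)i;
	return OBJ_BAD;
}

/* Parse "<type> <len>", recording the raw type name even when unknown. */
static int parse_header_sketch(const char *hdr, struct info_sketch *oi)
{
	const char *sp = strchr(hdr, ' ');
	char *end;
	size_t len;

	if (!sp)
		return -1;
	len = (size_t)(sp - hdr);
	if (len >= sizeof(oi->type_name))
		return -1;
	memcpy(oi->type_name, hdr, len);
	oi->type_name[len] = '\0';
	*oi->typep = type_from_name(hdr, len); /* OBJ_BAD, but not an error */

	if (!isdigit((unsigned char)sp[1]))
		return -1;
	*oi->sizep = strtoul(sp + 1, &end, 10);
	return *end == '\0' ? 0 : -1;
}
```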

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 22 ++++++++++++++++++----
 object-file.c   | 13 +++++--------
 object-store.h  |  4 ++--
 t/t1450-fsck.sh | 24 +++++++++++++++++++++---
 4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 082dadd5629..07af0434db6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 0e6937fad73..f4850ba62b4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2560,9 +2560,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2570,10 +2569,9 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
-	oi.typep = type;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2589,7 +2587,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
@@ -2611,8 +2609,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index dc638335e7d..f3045148b89 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -391,9 +392,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index d8303db9709..da2658155c7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	git init --bare hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -869,9 +888,8 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
 	grep -e "^error" -e "^fatal" err >errors &&
-	test_line_count = 2 errors &&
-	grep "error: hash mismatch for" err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	test_line_count = 1 errors &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.33.0.815.g21c7aaf6073



* [PATCH v6 22/22] fsck: report invalid object type-path combinations
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (20 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  4:06               ` Taylor Blau
  2021-09-17  4:08             ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
  23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series, e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits, we'd simply die early in those cases
until preceding commits fixed the hard die on an invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently, in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or worse
misreport whatever issue we encountered. So let's use the new bug()
function to ferry a return code up to fsck_loose() in that case.
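The sentinel-plus-out-parameter pattern described above can be made concrete with a minimal C sketch. It rests on stated assumptions: `struct toy_oid`, `toy_hash()`, `toy_check_signature()` and `toy_classify()` are hypothetical stand-ins, not git's API, and the toy hash is not SHA-1. It shows an optional out-parameter that falls back to a local temporary (the `real_oidp ? real_oidp : &tmp` idiom used in check_object_signature()), and a caller that pre-sets that out-parameter to an all-zero sentinel so it can tell "hashed, but to a different OID" apart from a failure that happened before anything was hashed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct object_id; a real one holds a full hash. */
struct toy_oid {
	uint32_t hash;
};

/* All-zero sentinel, playing the role of null_oid() in the patch. */
static const struct toy_oid toy_null_oid = { 0 };

static int toy_oideq(const struct toy_oid *a, const struct toy_oid *b)
{
	return a->hash == b->hash;
}

/* Toy hash (NOT SHA-1), forced non-zero so it never equals the sentinel. */
static void toy_hash(const char *buf, struct toy_oid *out)
{
	uint32_t h = 1;

	for (; *buf; buf++)
		h = h * 31 + (unsigned char)*buf;
	out->hash = h ? h : 1;
}

/*
 * Same shape as check_object_signature() after the patch: the computed
 * hash goes to *real_oidp if the caller asked for it, else to a local.
 * Returns 0 on a match, -1 on a mismatch or if nothing could be hashed.
 */
static int toy_check_signature(const struct toy_oid *expect, const char *buf,
			       struct toy_oid *real_oidp)
{
	struct toy_oid tmp;
	struct toy_oid *real_oid = real_oidp ? real_oidp : &tmp;

	if (!buf)
		return -1; /* failed before hashing: *real_oidp is untouched */
	toy_hash(buf, real_oid);
	return toy_oideq(expect, real_oid) ? 0 : -1;
}

/*
 * Caller side: pre-set the out-parameter to the sentinel, then use it
 * to pick which error to report.
 */
static const char *toy_classify(const struct toy_oid *expect, const char *buf)
{
	struct toy_oid real_oid = toy_null_oid;

	if (!toy_check_signature(expect, buf, &real_oid))
		return "ok";
	if (toy_oideq(&real_oid, &toy_null_oid))
		return "object corrupt or missing";
	return "hash-path mismatch";
}
```

With `expect` set to the toy hash of "blob", `toy_classify(&expect, "garbage")` yields "hash-path mismatch" and `toy_classify(&expect, NULL)` yields "object corrupt or missing". The sketch checks the sentinel explicitly before concluding there was a mismatch, so a failure that happens before any hashing is never reported with an all-zero "found" OID.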

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 10 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3c20f164f0f..48a3b6a7f8f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 07af0434db6..158b9dac9b3 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = *null_oid();
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8336466865c..9f540e0236a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1419,7 +1419,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index f4850ba62b4..07b3e4d9b4b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1062,9 +1062,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1072,8 +1074,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1098,9 +1100,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2560,6 +2562,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2609,9 +2612,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, null_oid()))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index f3045148b89..3c4ada23f5d 100644
--- a/object-store.h
+++ b/object-store.h
@@ -392,6 +392,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -528,7 +529,8 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 43a9f4e7f0c..39fe11bc92c 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -490,7 +490,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index da2658155c7..7d0d57564b5 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.33.0.815.g21c7aaf6073


* Re: [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo
  2021-09-07 10:57             ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-16 19:40               ` Taylor Blau
  2021-09-17  9:27                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 19:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Refactor one of the fsck tests to use a throwaway repository. It's a
> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
> teardown of a test so we're not leaving corrupt content for the next
> test.

OK. I seem to recall you advocating against this pattern elsewhere[1], but
this is a good example of why it can sometimes make writing tests much
easier when not having to reason about what leaks out of running a test.

[1]: https://lore.kernel.org/git/87zgsnj0q0.fsf@evledraar.gmail.com/,
although after re-reading it it looks like you were more focused on the
unnecessary "rm -fr repo" there and not the "git init +
test_when_finished rm -fr" pattern.

> -test_expect_success 'object with bad sha1' '
> -	sha=$(echo blob | git hash-object -w --stdin) &&
> -	old=$(test_oid_to_path "$sha") &&
> -	new=$(dirname $old)/$(test_oid ff_2) &&
> -	sha="$(dirname $new)$(basename $new)" &&
> -	mv .git/objects/$old .git/objects/$new &&
> -	test_when_finished "remove_object $sha" &&
> -	git update-index --add --cacheinfo 100644 $sha foo &&
> -	test_when_finished "git read-tree -u --reset HEAD" &&
> -	tree=$(git write-tree) &&
> -	test_when_finished "remove_object $tree" &&
> -	cmt=$(echo bogus | git commit-tree $tree) &&
> -	test_when_finished "remove_object $cmt" &&
> -	git update-ref refs/heads/bogus $cmt &&
> -	test_when_finished "git update-ref -d refs/heads/bogus" &&
> -
> -	test_must_fail git fsck 2>out &&
> -	test_i18ngrep "$sha.*corrupt" out
> +test_expect_success 'object with hash mismatch' '
> +	git init --bare hash-mismatch &&
> +	(
> +		cd hash-mismatch &&
> +		oid=$(echo blob | git hash-object -w --stdin) &&
> +		old=$(test_oid_to_path "$oid") &&
> +		new=$(dirname $old)/$(test_oid ff_2) &&
> +		oid="$(dirname $new)$(basename $new)" &&
> +		mv objects/$old objects/$new &&
> +		git update-index --add --cacheinfo 100644 $oid foo &&
> +		tree=$(git write-tree) &&
> +		cmt=$(echo bogus | git commit-tree $tree) &&
> +		git update-ref refs/heads/bogus $cmt &&
> +		test_must_fail git fsck 2>out &&
> +		test_i18ngrep "$oid.*corrupt" out
> +	)
>  '

This all looks fine to me. The translation is s/sha/oid and removing all
of the now-unnecessary test_when_finished calls.

But the test_i18ngrep (which isn't new) could probably also stand to get
cleaned up and converted to a normal grep.

Thanks,
Taylor

* Re: [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type
  2021-09-07 10:57             ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-16 19:51               ` Taylor Blau
  2021-09-17  9:39                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 19:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:57:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a blindspot in the fsck tests by checking what we do when we
> encounter an unknown "garbage" type produced with hash-object's
> --literally option.
>
> This behavior needs to be improved, which'll be done in subsequent
> patches, but for now let's test for the current behavior.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/t1450-fsck.sh | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 7becab5ba1e..f10d6f7b7e8 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
>  	test_i18ngrep "bad index file" errors
>  '
>
> +test_expect_success 'fsck hard errors on an invalid object type' '
> +	git init --bare garbage-type &&

I wondered whether it was really possible to not cover this, since I
figured such a test may have just been hiding elsewhere. But we really
do seem to be lacking coverage. So, adding this test is good.

> +	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
> +	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&

I'm nitpicking, but I find the -C garbage-type pattern less than ideal
for two reasons:

  - It makes every line longer (since "-C garbage-type" is wider than an
    8-wide tab, even indenting this in a subshell would take up fewer
    characters visually)

  - It pollutes the current directory with things like "err.expect" and
    "err.actual" that have nothing to do with the current directory (and
    much more to do with the garbage-type repository within it).

So I don't care, really, but it may be better to just put all of this in
a subshell.

Thanks,
Taylor

* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
  2021-09-07 10:57             ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-09-16 19:57               ` Taylor Blau
  2021-09-16 20:01                 ` Taylor Blau
  2021-09-16 22:52                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 19:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Test for what happens when the -t and -s flags are asked to operate on
> a missing object, this extends tests added in 3e370f9faf0 (t1006: add
> tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
> -s flags are the only ones that can be combined with
> --allow-unknown-type, so let's test with and without that flag.

I'm a little skeptical of having tests for all four combinations of `-t`
or `-s`, with and without `--allow-unknown-type`.

Testing both the presence and absence of `--allow-unknown-type` seems
useful to me, but I'm not sure what testing `-t` and `-s` separately
buys us.

(If you really feel the need to test both, I'd encourage looping like:

    for arg in -t -s
    do
      test_must_fail git cat-file $arg $missing_oid >out 2>err &&
      test_must_be_empty out &&
      test_cmp expect.err err &&

      test_must_fail git cat-file $arg --allow-unknown-type $missing_oid >out 2>err &&
      test_must_be_empty out &&
      test_cmp expect.err err
    done &&

but I would be equally or perhaps even happier to just have one of the
two tests).

Thanks,
Taylor

* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
  2021-09-16 19:57               ` Taylor Blau
@ 2021-09-16 20:01                 ` Taylor Blau
  2021-09-16 22:52                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 20:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Thu, Sep 16, 2021 at 03:57:30PM -0400, Taylor Blau wrote:
> On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > Test for what happens when the -t and -s flags are asked to operate on
> > a missing object, this extends tests added in 3e370f9faf0 (t1006: add
> > tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
> > -s flags are the only ones that can be combined with
> > --allow-unknown-type, so let's test with and without that flag.
>
> I'm a little skeptical of having tests for all four combinations of `-t`
> or `-s`, with and without `--allow-unknown-type`.

Ah. Reading the next patch makes me feel even more certain of this
advice. Consider squashing this and the next patch with my suggestion
to use a loop below?

Thanks,
Taylor

* Re: [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types
  2021-09-07 10:58             ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-16 20:40               ` Taylor Blau
  2021-09-17 11:59                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 20:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:00PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a blindspot in the tests for the "rev-list --disk-usage" feature
> added in 16950f8384a (rev-list: add --disk-usage option for
> calculating disk usage, 2021-02-09) to test for what happens when it's
> asked to calculate the disk usage of invalid object types.

I'm not sure that I agree this is a blindspot, or at least one worth
testing. Is the goal to add tests to every Git command that might have
to do something with a corrupt object and make sure that it is handled
correctly?

I'm not sure that doing so would be useful, or at the very least that it
would be worth the effort. That's not to say I'm not interested in
having tests fail when we don't handle corrupt objects correctly, but
more to say that I think there are so many parts of Git that might touch
a corrupt object that trying to test all of them seems like a losing
battle.

Assuming that this is a useful direction, though...

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/t6115-rev-list-du.sh | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
> index b4aef32b713..edb2ed55846 100755
> --- a/t/t6115-rev-list-du.sh
> +++ b/t/t6115-rev-list-du.sh
> @@ -48,4 +48,15 @@ check_du HEAD
>  check_du --objects HEAD
>  check_du --objects HEAD^..HEAD
>
> +test_expect_success 'setup garbage repository' '
> +	git clone --bare . garbage.git &&

Since this is cloned within the working directory, should we bother to
clean this up to avoid munging with future tests?

> +	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
> +	git -C garbage.git rev-list --objects --all --disk-usage &&
> +
> +	# Manually create a ref because "update-ref", "tag" etc. have
> +	# no corresponding --literally option.
> +	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
> +	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage

See also my earlier comment about this being much more readable in a
sub-shell.

Thanks,
Taylor

* Re: [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero
  2021-09-07 10:58             ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:29               ` Taylor Blau
  2021-09-16 21:56                 ` Jeff King
  0 siblings, 1 reply; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 21:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
> When the loose_object_info() function returns an error stop faking up
> the "oi->typep" to OBJ_BAD. Let the return value of the function
> itself suffice. This code cleanup simplifies subsequent changes.

The obvious danger (which you mention) is that somebody is relying on
what typep points to, and is reading it even if we returned non-zero
from whatever called this function.

Hopefully nobody is, but this change makes me a little uncomfortable
nonetheless, since there are so many potential callers (even though this
function has only one caller, it doesn't take long before the number of
indirect callers explodes).

So it would be nice if we could do without it, but you claim that it
simplifies changes that happen later on. So let's continue to see if we
really do need it...

Thanks,
Taylor

* Re: [PATCH v6 09/22] cache.h: move object functions to object-store.h
  2021-09-07 10:58             ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:33               ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 21:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Move the declaration of some ancient object functions added in
> e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
> 2005-06-01) from cache.h to object-store.h. This continues work
> started in cbd53a2193d (object-store: move object access functions to
> object-store.h, 2018-05-15).

This builds with DEVELOPER=1, likely as a result of all of the includes
on object-store.h added in cbd53a2193d.

> diff --git a/cache.h b/cache.h
> index d23de693680..11a04a93436 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1313,16 +1313,6 @@ char *xdg_cache_home(const char *filename);
>
>  int git_open_cloexec(const char *name, int flags);
>  #define git_open(name) git_open_cloexec(name, O_RDONLY)
> -int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
> -int parse_loose_header(const char *hdr, unsigned long *sizep);
> -
> -int check_object_signature(struct repository *r, const struct object_id *oid,
> -			   void *buf, unsigned long size, const char *type);
> -
> -int finalize_object_file(const char *tmpfile, const char *filename);
> -
> -/* Helper to check and "touch" a file */

I'm fine to drop this comment, by the way, since it does not add any
explanation to what a function called check_and_freshen_file() might do
;).

> -int check_and_freshen_file(const char *fn, int freshen);

Everything else looks fine, although it's unclear how this is related to
the rest of your series.

Thanks,
Taylor

* Re: [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public
  2021-09-07 10:58             ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:39               ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 21:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Make the parse_loose_header_extended() function public and remove the
> parse_loose_header() wrapper. The only direct user of it outside of
> object-file.c itself was in streaming.c, that caller can simply pass
> the required "struct object_info *" instead.
>
> This change is being done in preparation for teaching
> read_loose_object() to accept a flag to pass to
> parse_loose_header(). It isn't strictly necessary for that change, we
> could simply use parse_loose_header_extended() there, but will leave
> the API in a better end state.

All seems reasonable. I agree that this is not a necessary step, but at
least the clean-up is self contained and an easy enough read.

The flag that read_loose_object() is going to start passing to
parse_loose_header() is left a bit vague, but I'll continue reading to
figure out what it is.

Thanks,
Taylor

* Re: [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero
  2021-09-16 21:29               ` Taylor Blau
@ 2021-09-16 21:56                 ` Jeff King
  0 siblings, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-09-16 21:56 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Jonathan Tan, Andrei Rybak

On Thu, Sep 16, 2021 at 05:29:30PM -0400, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > When the loose_object_info() function returns an error stop faking up
> > the "oi->typep" to OBJ_BAD. Let the return value of the function
> > itself suffice. This code cleanup simplifies subsequent changes.
> 
> The obvious danger (which you mention) is that somebody is relying on
> what typep points to, and is reading it even if we returned non-zero
> from whatever called this function.
> 
> Hopefully nobody is, but this change makes me a little uncomfortable
> nonetheless, since there are so many potential callers (even though this
> function has only one caller, it doesn't take long before the number of
> indirect callers explodes).
> 
> So it would be nice if we could do without it, but you claim that it
> simplifies changes that happen later on. So let's continue to see if we
> really do need it...

I'm actually reasonably comfortable with this patch. If we return an
error from the *_object_info() functions, then I think all bets are off
on what is in the resulting object_info struct. E.g., we'd already leave
sizep uninitialized in such a case.

It feels like oi->typep may be a little bit special because we conflate
"error" and "type" in the return from oid_object_info(). But
oid_object_info_extended() does not do that, and the innards of
oid_object_info() do the right thing.

Of course we _have_ been setting typep in this way for a while, so it's
worth making sure nobody is depending on it. Notably, packed_object_info()
does not behave in this way; if it hits an error, typep may be left
unset. So any oid_object_info_extended() callers depending on this were
already potentially buggy. I'd be OK with a quick sweep of the hits of
"git grep typep" here.

I just did that, and all the sites look pretty reasonable (they call
oid_object_info_extended() and bail as soon as they see that it fails).
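The error-return contract discussed here can be illustrated with a small C sketch. The names (`toy_object_info_extended()`, `toy_object_info()`, `enum toy_type`, `struct toy_object_info`) are hypothetical and only loosely modelled on oid_object_info_extended() and oid_object_info(); the `exists` flag stands in for an object lookup. The extended-style function signals failure solely through its return value and leaves the out-parameters unspecified, while the convenience wrapper checks that status before reading `type` and folds the error into a negative return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, loosely modelled on enum object_type (OBJ_BAD < 0). */
enum toy_type { TOY_BAD = -1, TOY_BLOB = 3 };

/* Optional out-parameters, in the spirit of struct object_info. */
struct toy_object_info {
	enum toy_type *typep;
	unsigned long *sizep;
};

/*
 * "Extended" style: the return value is the only error signal. On
 * failure nothing is written, so *typep and *sizep keep whatever the
 * caller had there and must not be trusted.
 */
static int toy_object_info_extended(int exists, struct toy_object_info *oi)
{
	if (!exists)
		return -1;
	if (oi->typep)
		*oi->typep = TOY_BLOB;
	if (oi->sizep)
		*oi->sizep = 42;
	return 0;
}

/*
 * Convenience wrapper that conflates error and type in one return
 * value: it checks the status first and only then reads the out-param.
 */
static int toy_object_info(int exists)
{
	enum toy_type type;
	struct toy_object_info oi = { &type, NULL };

	if (toy_object_info_extended(exists, &oi) < 0)
		return TOY_BAD; /* "type" was never set; don't read it */
	return type;
}
```

A caller of the extended form bails on a negative return before touching `*typep` or `*sizep`, which is the pattern the call sites found by the `git grep typep` sweep follow.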

-Peff

* Re: [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header()
  2021-09-07 10:58             ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:58               ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 21:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:08PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This minor formatting change serves to make a subsequent patch easier
> to read.

Hmm. I'm not sure if I agree.

As far as I can tell from reading the subsequent patch, this is designed
to make it easier to add a comment above "return type" that pertains
just to the case when !*hdr.

I think it would have been fine to go from the ternary to this style
with the comment in a single patch. But I also think it would have been
fine to add the comment above the ternary and instead start it off by
saying "when !*hdr, ...".

Thanks,
Taylor

* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
  2021-09-16 19:57               ` Taylor Blau
  2021-09-16 20:01                 ` Taylor Blau
@ 2021-09-16 22:52                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-16 22:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak


On Thu, Sep 16 2021, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Test for what happens when the -t and -s flags are asked to operate on
>> a missing object, this extends tests added in 3e370f9faf0 (t1006: add
>> tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
>> -s flags are the only ones that can be combined with
>> --allow-unknown-type, so let's test with and without that flag.
>
> I'm a little skeptical to have tests for all four pairs of `-t` or `-s`
> and "with `--allow-unknown-type` and without `--allow-unknown-type`".
>
> Testing both the presence and absence of `--allow-unknown-type` seems
> useful to me, but I'm not sure what testing `-t` and `-s` separately
> buys us.
>
> (If you really feel the need to test both, I'd encourage looping like:

Thanks, I'll try to simplify it.

>     for arg in -t -s
>     do
>       test_must_fail git cat-file $arg $missing_oid >out 2>err &&
>       test_must_be_empty out &&
>       test_cmp expect.err err &&
>
>       test_must_fail git cat-file $arg --allow-unknown-type $missing_oid >out 2>err &&
>       test_must_be_empty out &&
>       test_cmp expect.err err
>     done &&
>
> but I would be equally or perhaps even happier to just have one of the
> two tests).

A loop like that can be further simplified as just (just inlining
arg=-s):

	test_must_fail git cat-file -s $missing_oid >out 2>err &&
	test_must_be_empty out &&
	test_cmp expect.err err &&

	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
	test_must_be_empty out &&
	test_cmp expect.err err

:)

I.e. unless you end &&-chains in loops in the test framework with an "||
return 1", you're only testing your last iteration. Aside from whatever
I'm doing here I generally prefer to either just spell it out twice (if
small enough), or:

    for arg in -t -s
    do
        test_expect_success '...' "[... use $arg ...]"
    done

Both of those nicely get around that easy-to-make mistake.

We've got some in-tree tests that are broken this way, e.g. at least the
one from 4cf67869b2a (list-objects.c: don't segfault for missing cmdline
objects, 2018-12-05). But I think I'll leave that for a #leftoverbits
submission given my outstanding patch queue... Oh, there's another one in
t1010-mktree.sh... :)

* Re: [PATCH v6 14/22] object-file.c: stop dying in parse_loose_header()
  2021-09-07 10:58             ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-09-17  2:32               ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17  2:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:09PM +0200, Ævar Arnfjörð Bjarmason wrote:
> It thus makes sense to further libify the interface so that it stops
> calling die() when it encounters OBJ_BAD, and instead rely on its
> callers to check the populated "oi->typep".

Hmm. I thought we got rid of this behavior in a previous commit? Perhaps
I'm thinking of something else, but I would certainly appreciate a
clarification :).

> @@ -1369,15 +1367,6 @@ int parse_loose_header(const char *hdr,
>  	type = type_from_string_gently(type_buf, type_len, 1);
>  	if (oi->type_name)
>  		strbuf_add(oi->type_name, type_buf, type_len);
> -	/*
> -	 * Set type to 0 if its an unknown object and
> -	 * we're obtaining the type using '--allow-unknown-type'
> -	 * option.
> -	 */
> -	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
> -		type = 0;
> -	else if (type < 0)
> -		die(_("invalid object type"));

Good, this part moved to loose_object_info() as you said it would.

> @@ -1463,18 +1460,20 @@ static int loose_object_info(struct repository *r,
>  		status = error(_("unable to unpack %s header"),
>  			       oid_to_hex(oid));
>  	}
> -
> -	if (status < 0) {
> -		/* Do nothing */
> -	} else if (hdrbuf.len) {
> -		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
> -			status = error(_("unable to parse %s header with --allow-unknown-type"),
> -				       oid_to_hex(oid));
> -	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
> -		status = error(_("unable to parse %s header"), oid_to_hex(oid));
> +	if (!status) {
> +		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
> +			/*
> +			 * oi->{sizep,typep} are meaningless unless
> +			 * parse_loose_header() returns >= 0.
> +			 */

This double negative is a little confusing. Clearer to say:
"oi->{size,type}p is meaningless if parse_loose_header() returns < 0"?

But I was also a little confused to see that the expression we are
checking here is just that parse_loose_header() returned zero. What
about other positive values?

I think we should either update the comment to say "unless it returns
zero" or the conditional expression to check for >= 0.
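A minimal C sketch of the distinction, using hypothetical names (parse_digits_sketch(), read_size_sketch()) rather than anything in object-file.c: the parser returns -1 on error and a non-negative count on success, so a caller that only accepts exactly zero would misreport valid input.

```c
#include <assert.h>

/*
 * Hypothetical parser: returns -1 if no digits are found, otherwise the
 * number of digits consumed (always > 0). *out is only written on
 * success, so callers must not read it after a negative return.
 */
static int parse_digits_sketch(const char *s, unsigned long *out)
{
	unsigned long v = 0;
	int n = 0;

	assert(s && out);
	while (s[n] >= '0' && s[n] <= '9') {
		v = v * 10 + (unsigned long)(s[n] - '0');
		n++;
	}
	if (!n)
		return -1;
	*out = v;
	return n;
}

/* Treats any non-negative return as success, per the ">= 0" contract. */
static int read_size_sketch(const char *s, unsigned long *size)
{
	if (parse_digits_sketch(s, size) >= 0)
		return 0; /* *size is meaningful only on this path */
	return -1;
}
```

Here read_size_sketch("123", &size) succeeds even though the parser returns 3; a `!parse_digits_sketch(...)` check would have rejected that valid input.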

> diff --git a/object-store.h b/object-store.h
> index 4064710ae29..584bf5556af 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -500,8 +500,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
>  int unpack_loose_header(git_zstream *stream, unsigned char *map,
>  			unsigned long mapsize, void *buffer,
>  			unsigned long bufsiz, struct strbuf *hdrbuf);
> -int parse_loose_header(const char *hdr, struct object_info *oi,
> -		       unsigned int flags);
> +
> +/**
> + * parse_loose_header() parses the starting "<type> <len>\0" of an
> + * object. If it doesn't follow that format -1 is returned. To check
> + * the validity of the <type> populate the "typep" in the "struct
> + * object_info". It will be OBJ_BAD if the object type is unknown. The
> + * parsed <len> can be retrieved via "oi->sizep", and from there
> + * passed to unpack_loose_rest().
> + */
> +int parse_loose_header(const char *hdr, struct object_info *oi);

OK, I guess this must be what I was confused about earlier (that I
thought we didn't support reading typep if returning OBJ_BAD). But it
seems odd to me that we would get rid of it elsewhere, yet continue
using this pattern here.

Or am I mistaken that the two are different?
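To make the documented contract concrete, here is a simplified C sketch (hypothetical names, no overflow or leading-zero checks, not Git's implementation) of a parser for the "<type> <len>\0" header: a malformed header returns -1, while a well-formed header naming an unknown type returns 0 and reports a BAD type through *typep, leaving the caller to decide whether that is fatal.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for Git's object types; -1 plays the OBJ_BAD role. */
enum sketch_type {
	SKETCH_BAD = -1,
	SKETCH_COMMIT = 1,
	SKETCH_TREE = 2,
	SKETCH_BLOB = 3,
	SKETCH_TAG = 4
};

static enum sketch_type sketch_type_from_name(const char *name, size_t len)
{
	static const char *names[] = { "commit", "tree", "blob", "tag" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (enum sketch_type)(i + 1);
	return SKETCH_BAD;
}

/*
 * Parse "<type> <len>" (the header's terminating NUL ends the C string).
 * Returns -1 if the header is malformed; *typep and *sizep are then
 * meaningless. A well-formed header with an unknown <type> returns 0
 * with *typep == SKETCH_BAD. No overflow or leading-zero checks.
 */
static int parse_header_sketch(const char *hdr, enum sketch_type *typep,
			       unsigned long *sizep)
{
	const char *sp, *p;
	unsigned long size = 0;

	assert(hdr && typep && sizep);
	sp = strchr(hdr, ' ');
	if (!sp || sp == hdr)
		return -1; /* no "<type> " prefix */
	*typep = sketch_type_from_name(hdr, (size_t)(sp - hdr));

	p = sp + 1;
	if (*p < '0' || *p > '9')
		return -1; /* <len> must start with a digit */
	while (*p >= '0' && *p <= '9')
		size = size * 10 + (unsigned long)(*p++ - '0');
	if (*p)
		return -1; /* trailing bytes before the NUL */
	*sizep = size;
	return 0;
}
```

With this split, "garbage 0" parses successfully with *typep set to SKETCH_BAD, whereas "blob" (no length) returns -1.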

Thanks,
Taylor

* Re: [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info()
  2021-09-07 10:58             ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-09-17  2:35               ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17  2:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> An earlier version of the preceding commit had a subtle bug where our
> "type_scratch" (later assigned to "oi->typep") would be uninitialized
> and used in the "!allow_unknown" case, at which point it would contain
> a nonsensical value if we'd failed to call parse_loose_header().
>
> The preceding commit introduced "parsed_header" variable to check for
> this case, but I think we can do better, let's carry a "oi_header"
> variable initially set to NULL, and only set it to "oi" once we're
> past parse_loose_header().

Everything in this patch seems OK to me.

For what it's worth, I think that this could likely have been folded
into the previous commit. I was just a little surprised to see
parsed_header go away right after I had spent a minute or two
thinking about what it was for.
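The "oi_header" idea can be sketched in C as follows (hypothetical names; parse_stub() stands in for parse_loose_header()): the pointer stays NULL until parsing succeeds, so no code path can read the uninitialized scratch variable after a failed parse.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stub standing in for parse_loose_header(). */
static int parse_stub(int raw, int *type)
{
	if (raw < 0)
		return -1; /* failure: *type is not written */
	*type = raw;
	return 0;
}

/*
 * "parsed_type" plays the role of "oi_header": it stays NULL until
 * parse_stub() has succeeded, so nothing below can read the
 * uninitialized "type_scratch" after a failed parse.
 */
static int read_type_sketch(int raw, int *out_type)
{
	int type_scratch;
	int *parsed_type = NULL;

	assert(out_type);
	if (!parse_stub(raw, &type_scratch))
		parsed_type = &type_scratch;

	if (!parsed_type)
		return -1;
	*out_type = *parsed_type;
	return 0;
}
```

If later code forgot the NULL check, dereferencing the pointer would fail deterministically instead of silently yielding a nonsensical type, which is the class of bug the commit message describes.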

Thanks,
Taylor

* Re: [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header()
  2021-09-07 10:58             ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-17  2:45               ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17  2:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
> In the preceding commits we changed and documented
> unpack_loose_header() from returning any negative value or zero, to only
> -2, -1 or 0. Let's instead add an "enum unpack_loose_header_result"
> type and use it, and have the compiler assert that we're exhaustively
> covering all return values. This gets rid of the need for having a
> "default" BUG() case in loose_object_info().
>
> I'm on the fence about whether this is more readable or worth it, but
> since it was suggested in [1] to do this let's go for it.

:-). The first hunk is quite a long line, but I think that only suggests
the enum has a long name. I also can't think of anything shorter, so I
think what you have is just fine.

I do think that this is an improvement in readability, and for what it's
worth I am a fan of the previous two changes as well.

As a workflow comment, I would have perhaps done these conversions a
little earlier, maybe in these steps:

  - First a patch to introduce unpack_loose_header_result with just OK
    and BAD, and then converted all callers that return negative numbers
    to return BAD (and all others to return OK).

  - Then a second patch to convert some of the BAD returns into
    BAD_TOO_LONG.

That gets things done in two patches, instead of three, at the cost of a
slightly more complicated first patch. But I think you also get some
more insight into why we're making the change in the first place instead
of having to read through a couple of commits to get there.
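A C sketch of the exhaustive-switch property the quoted commit message relies on. The enumerator names are hypothetical, modeled on the OK/BAD/BAD_TOO_LONG values discussed above, and the mapping back to 0/-1/-2 is an assumption for illustration only.

```c
#include <assert.h>

/* Hypothetical result type modeled on the discussion, not Git's header. */
enum ulhr_sketch {
	ULHR_SKETCH_OK,
	ULHR_SKETCH_BAD,
	ULHR_SKETCH_BAD_TOO_LONG
};

/*
 * Map each result to the old int convention (the -1/-2 assignment is an
 * assumption). There is no "default:" label, so GCC's -Wswitch (enabled
 * by -Wall) warns if an enumerator is added without a matching case.
 */
static int ulhr_sketch_to_int(enum ulhr_sketch r)
{
	switch (r) {
	case ULHR_SKETCH_OK:
		return 0;
	case ULHR_SKETCH_BAD:
		return -1;
	case ULHR_SKETCH_BAD_TOO_LONG:
		return -2;
	}
	/* Only reachable with an out-of-range value; stands in for BUG(). */
	assert(!"invalid enum ulhr_sketch value");
	return -1;
}
```

Under the two-step ordering suggested above, the enum would first land with only OK and BAD; adding BAD_TOO_LONG afterwards would then produce a compiler warning at every switch that still needs a new case.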

In any case, what you have is certainly fine, and I don't think that one
approach is any better or worse than the other. Just mentioning it in
case it's something you may try in the future.

This patch looks good.

Thanks,
Taylor

* Re: [PATCH v6 19/22] fsck: don't hard die on invalid object types
  2021-09-07 10:58             ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-17  3:37               ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17  3:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:14PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Change the error fsck emits on invalid object types, such as:
>
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     <OID>
>
> From the very ungraceful error of:
>
>     $ git fsck
>     fatal: invalid object type
>     $
>
> To:
>
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ the rest of the fsck output here, i.e. it didn't hard die ]

Great. I don't love the second error (since it doesn't really give the
user any new information when read after the first) but that's fsck's
fault, and not your patch's.

> To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
> flag from read_loose_object() through to parse_loose_header(). Since
> the read_loose_object() function is only used in builtin/fsck.c we can
> simply change it. See f6371f92104 (sha1_file: add read_loose_object()
> function, 2017-01-13) for the introduction of read_loose_object().
>
> Why are we complaining about a "hash mismatch" for an object of a type
> we don't know about? We shouldn't. This is the bare minimal change
> needed to not make fsck hard die on a repository that's been corrupted
> in this manner. In subsequent commits we'll teach fsck to recognize
> this particular type of corruption and emit a better error message.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/fsck.c  |  3 ++-
>  object-file.c   | 11 ++++++++---
>  object-store.h  |  3 ++-
>  t/t1450-fsck.sh | 14 +++++++-------
>  4 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index b42b6fe21f7..082dadd5629 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	void *contents;
>  	int eaten;
>
> -	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
> +	if (read_loose_object(path, oid, &type, &size, &contents,
> +			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
>  		errors_found |= ERROR_OBJECT;
>  		error(_("%s: object corrupt or missing: %s"),
>  		      oid_to_hex(oid), path);
> diff --git a/object-file.c b/object-file.c
> index 9484c7ce2be..0e6937fad73 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2562,7 +2562,8 @@ int read_loose_object(const char *path,
>  		      const struct object_id *expected_oid,
>  		      enum object_type *type,
>  		      unsigned long *size,
> -		      void **contents)
> +		      void **contents,
> +		      unsigned int oi_flags)
>  {
>  	int ret = -1;
>  	void *map = NULL;
> @@ -2570,6 +2571,7 @@ int read_loose_object(const char *path,
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
>  	struct object_info oi = OBJECT_INFO_INIT;
> +	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
>  	oi.typep = type;
>  	oi.sizep = size;
>
> @@ -2592,8 +2594,11 @@ int read_loose_object(const char *path,
>  		git_inflate_end(&stream);
>  		goto out;
>  	}
> -	if (*type < 0)
> -		die(_("invalid object type"));
> +	if (!allow_unknown && *type < 0) {
> +		error(_("header for %s declares an unknown type"), path);
> +		git_inflate_end(&stream);
> +		goto out;
> +	}

Hmm. I'm not sure that I saw a new test for this error (which may be
uninteresting, in which case it is fine to skip).
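The new error path boils down to a flag check that reports and returns instead of calling die(). A hypothetical, self-contained C sketch (the flag name and helper are made up; the message mirrors the one in the hunk):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bit standing in for OBJECT_INFO_ALLOW_UNKNOWN_TYPE. */
#define SKETCH_ALLOW_UNKNOWN_TYPE (1u << 0)

/*
 * Returns 0 if the caller may continue, or -1 after printing an error.
 * Unlike die(), returning lets a caller such as fsck record the problem
 * and move on to the next object.
 */
static int check_header_type_sketch(int type, unsigned int flags,
				    const char *path)
{
	int allow_unknown = !!(flags & SKETCH_ALLOW_UNKNOWN_TYPE);

	assert(path);
	if (!allow_unknown && type < 0) {
		fprintf(stderr, "error: header for %s declares an unknown type\n",
			path);
		return -1;
	}
	return 0;
}
```

Note that fsck_loose() passes OBJECT_INFO_ALLOW_UNKNOWN_TYPE (see the builtin/fsck.c hunk), and read_loose_object() is only used in builtin/fsck.c, so this branch is not taken via fsck.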
>
> -test_expect_success 'fsck hard errors on an invalid object type' '
> +test_expect_success 'fsck error and recovery on invalid object type' '
>  	git init --bare garbage-type &&
>  	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
>  	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
> -	cat >err.expect <<-\EOF &&
> -	fatal: invalid object type
> -	EOF
> -	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
> -	test_cmp err.expect err.actual &&
> -	test_must_be_empty out.actual
> +	test_must_fail git -C garbage-type fsck >out 2>err &&
> +	grep -e "^error" -e "^fatal" err >errors &&
> +	test_line_count = 2 errors &&
> +	grep "error: hash mismatch for" err &&
> +	grep "$garbage_blob: object corrupt or missing:" err &&
> +	grep "dangling blob $empty_blob" out
>  '

Great.

Thanks,
Taylor

* Re: [PATCH v6 21/22] fsck: report invalid types recorded in objects
  2021-09-07 10:58             ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-09-17  3:57               ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17  3:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Continue the work in the preceding commit and improve the error on:
>
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ other fsck output ]
>
> To instead emit:
>
>     $ git fsck
>     error: <OID>: object is of unknown type 'garbage': <OID_PATH>
>     [ other fsck output ]
>
> The complaint about a "hash mismatch" was simply an emergent property
> of how we'd fall through from read_loose_object() into fsck_loose()
> when we didn't get the data we expected. Now we'll correctly note that
> the object type is invalid.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/fsck.c  | 22 ++++++++++++++++++----
>  object-file.c   | 13 +++++--------
>  object-store.h  |  4 ++--
>  t/t1450-fsck.sh | 24 +++++++++++++++++++++---
>  4 files changed, 46 insertions(+), 17 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 082dadd5629..07af0434db6 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	unsigned long size;
>  	void *contents;
>  	int eaten;
> -
> -	if (read_loose_object(path, oid, &type, &size, &contents,
> -			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
> -		errors_found |= ERROR_OBJECT;
> +	struct strbuf sb = STRBUF_INIT;
> +	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> +	struct object_info oi;
> +	int found = 0;
> +	oi.type_name = &sb;
> +	oi.sizep = &size;
> +	oi.typep = &type;
> +
> +	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {

OK, now we pass a struct object_info instead of pointers to type and
size separately. Makes sense.

> +		found |= ERROR_OBJECT;

And found tracks the error we found when trying to read this loose
object, if any. Having a separate variable makes sense, since we only
want to avoid calling fsck_obj() if we found any errors for this object
while trying to call read_loose_object().

>  		error(_("%s: object corrupt or missing: %s"),
>  		      oid_to_hex(oid), path);
> +	}
> +	if (type < 0) {
> +		found |= ERROR_OBJECT;
> +		error(_("%s: object is of unknown type '%s': %s"),
> +		      oid_to_hex(oid), sb.buf, path);
> +	}
> +	if (found) {
> +		errors_found |= ERROR_OBJECT;

Perhaps errors_found |= found ?

>  		return 0; /* keep checking other objects */
>  	}
>
> diff --git a/object-file.c b/object-file.c
> index 0e6937fad73..f4850ba62b4 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2560,9 +2560,8 @@ static int check_stream_oid(git_zstream *stream,
>
>  int read_loose_object(const char *path,
>  		      const struct object_id *expected_oid,
> -		      enum object_type *type,
> -		      unsigned long *size,
>  		      void **contents,
> +		      struct object_info *oi,
>  		      unsigned int oi_flags)

All of the changes in this function make perfect sense, except...
>  {
>  	int ret = -1;
> @@ -2570,10 +2569,9 @@ int read_loose_object(const char *path,
>  	unsigned long mapsize;
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
> -	struct object_info oi = OBJECT_INFO_INIT;
>  	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> -	oi.typep = type;
> -	oi.sizep = size;
> +	enum object_type *type = oi->typep;
> +	unsigned long *size = oi->sizep;

...I see that size is used in check_object_signature(), but I don't see
any uses for type. Am I missing it?

The tests look good to me.

Thanks,
Taylor

* Re: [PATCH v6 22/22] fsck: report invalid object type-path combinations
  2021-09-07 10:58             ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-17  4:06               ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17  4:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Improve the error that's emitted in cases where we find a loose object
> we parse, but which isn't at the location we expect it to be.
>
> Before this change we'd prefix the error with a not-a-OID derived from
> the path at which the object was found, due to an emergent behavior in
> how we'd end up with an "OID" in these codepaths.
>
> Now we'll instead say what object we hashed, and what path it was
> found at. Before this patch series e.g.:
>
>     $ git hash-object --stdin -w -t blob </dev/null
>     e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
>     $ mv objects/e6/ objects/e7
>
> Would emit ("[...]" used to abbreviate the OIDs):
>
>     git fsck
>     error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
>     error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
>
> Now we'll instead emit:
>
>     error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Lovely!

> @@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	struct strbuf sb = STRBUF_INIT;
>  	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
>  	struct object_info oi;
> +	struct object_id real_oid = *null_oid();
>  	int found = 0;
>  	oi.type_name = &sb;
>  	oi.sizep = &size;
>  	oi.typep = &type;
>
> -	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
> +	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
>  		found |= ERROR_OBJECT;
> -		error(_("%s: object corrupt or missing: %s"),
> -		      oid_to_hex(oid), path);
> +		if (!oideq(&real_oid, oid))
> +			error(_("%s: hash-path mismatch, found at: %s"),
> +			      oid_to_hex(&real_oid), path);
> +		else
> +			error(_("%s: object corrupt or missing: %s"),
> +			      oid_to_hex(oid), path);

Nice; this is the important part of what this patch changes, and the
logic is very nice. It goes from "any time read_loose_object() fails,
it's an error" to "it's still an error, but we handle the case where
the real OID and the one we expected differ separately from generic
corruption".
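A simplified C sketch of the loose-object layout behind the "hash-path mismatch" message: an object whose hex OID is e69de29b... is expected under objects/e6/9de29b.... The helper names are hypothetical, and comparing path strings is only an illustration; the patch itself compares object IDs with oideq().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: the path a loose object with the given hex OID
 * is expected at. The first two hex digits name the fan-out directory.
 */
static void expected_loose_path(const char *hex, char *buf, size_t bufsz)
{
	assert(hex && strlen(hex) > 2);
	snprintf(buf, bufsz, "objects/%.2s/%s", hex, hex + 2);
}

/* Returns 1 if an object hashing to real_hex was found at another path. */
static int is_hash_path_mismatch(const char *real_hex, const char *found_path)
{
	char want[128]; /* room for a 64-hex-digit SHA-256 OID too */

	expected_loose_path(real_hex, want, sizeof(want));
	return strcmp(want, found_path) != 0;
}
```

With the example from the commit message, the real OID e69d... maps to objects/e6/..., which differs from the objects/e7/... path the object was found at, so the mismatch branch is taken.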

>  	}
>  	if (type < 0) {
>  		found |= ERROR_OBJECT;
>  		error(_("%s: object is of unknown type '%s': %s"),
> -		      oid_to_hex(oid), sb.buf, path);
> +		      oid_to_hex(&real_oid), sb.buf, path);

Could go either way on this hunk, but I think that I err slightly on
your side now that we have access to the "real_oid".

The rest of the code and test changes in this patch look good to me.

Thanks,
Taylor

* Re: [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (21 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-17  4:08             ` Taylor Blau
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
  23 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17  4:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:57:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This improves fsck error reporting, see the examples in the commit
> messages of 19/22, 21/22 and 22/22. To get there I've lib-ified more
> things in object-file.c and the general object APIs, i.e. now we'll
> return error codes instead of calling die() in these cases.
>
> This series has been in "needs review" state for a while. This re-roll
> is mainly to bump it for the list's attention, but while I was at it I
> addressed a point Jonathan Tan raised in a previous round: use an
> enum instead of int for the unpack_loose_header() return value.

I took a thorough look through this series, and left a handful of minor
comments. I didn't spot any glaring issues, and think that this series
is in pretty good shape.

I do admit there were quite a large number of patches to get to the
couple of changes at the end. I left some thoughts throughout for places
that I would have combined things / presented them in a different order
or similar.

I don't think you should spend much time changing the structure now that
it's been looked at with close eyes, but just some idle thoughts for
other large series you might send in the future.

Thanks,
Taylor

* Re: [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo
  2021-09-16 19:40               ` Taylor Blau
@ 2021-09-17  9:27                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-17  9:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak


On Thu, Sep 16 2021, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Refactor one of the fsck tests to use a throwaway repository. It's a
>> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
>> teardown of a test so we're not leaving corrupt content for the next
>> test.
>
> OK. I seem to recall you advocating against this pattern elsewhere[1], but
> this is a good example of why it can sometimes make writing tests much
> easier when not having to reason about what leaks out of running a test.
>
> [1]: https://lore.kernel.org/git/87zgsnj0q0.fsf@evledraar.gmail.com/,
> although after re-reading it it looks like you were more focused on the
> unnecessary "rm -fr repo" there and not the "git init +
> test_when_finished rm -fr" pattern.

I was referring to a different pattern there, replied in some detail at
https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/

>> -test_expect_success 'object with bad sha1' '
>> -	sha=$(echo blob | git hash-object -w --stdin) &&
>> -	old=$(test_oid_to_path "$sha") &&
>> -	new=$(dirname $old)/$(test_oid ff_2) &&
>> -	sha="$(dirname $new)$(basename $new)" &&
>> -	mv .git/objects/$old .git/objects/$new &&
>> -	test_when_finished "remove_object $sha" &&
>> -	git update-index --add --cacheinfo 100644 $sha foo &&
>> -	test_when_finished "git read-tree -u --reset HEAD" &&
>> -	tree=$(git write-tree) &&
>> -	test_when_finished "remove_object $tree" &&
>> -	cmt=$(echo bogus | git commit-tree $tree) &&
>> -	test_when_finished "remove_object $cmt" &&
>> -	git update-ref refs/heads/bogus $cmt &&
>> -	test_when_finished "git update-ref -d refs/heads/bogus" &&
>> -
>> -	test_must_fail git fsck 2>out &&
>> -	test_i18ngrep "$sha.*corrupt" out
>> +test_expect_success 'object with hash mismatch' '
>> +	git init --bare hash-mismatch &&
>> +	(
>> +		cd hash-mismatch &&
>> +		oid=$(echo blob | git hash-object -w --stdin) &&
>> +		old=$(test_oid_to_path "$oid") &&
>> +		new=$(dirname $old)/$(test_oid ff_2) &&
>> +		oid="$(dirname $new)$(basename $new)" &&
>> +		mv objects/$old objects/$new &&
>> +		git update-index --add --cacheinfo 100644 $oid foo &&
>> +		tree=$(git write-tree) &&
>> +		cmt=$(echo bogus | git commit-tree $tree) &&
>> +		git update-ref refs/heads/bogus $cmt &&
>> +		test_must_fail git fsck 2>out &&
>> +		test_i18ngrep "$oid.*corrupt" out
>> +	)
>>  '
>
> This all looks fine to me. The translation is s/sha/oid and removing all
> of the now-unnecessary test_when_finished calls.
>
> But the test_i18ngrep (which isn't new) could probably also stand to get
> cleaned up and converted to a normal grep.

Thanks, I missed that one!

* Re: [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type
  2021-09-16 19:51               ` Taylor Blau
@ 2021-09-17  9:39                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-17  9:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak


On Thu, Sep 16 2021, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:57:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Fix a blindspot in the fsck tests by checking what we do when we
>> encounter an unknown "garbage" type produced with hash-object's
>> --literally option.
>>
>> This behavior needs to be improved, which'll be done in subsequent
>> patches, but for now let's test for the current behavior.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  t/t1450-fsck.sh | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
>> index 7becab5ba1e..f10d6f7b7e8 100755
>> --- a/t/t1450-fsck.sh
>> +++ b/t/t1450-fsck.sh
>> @@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
>>  	test_i18ngrep "bad index file" errors
>>  '
>>
>> +test_expect_success 'fsck hard errors on an invalid object type' '
>> +	git init --bare garbage-type &&
>
> I wondered whether it was really possible to not cover this, since I
> figured such a test may have just been hiding elsewhere. But we really
> do seem to be lacking coverage. So, adding this test is good.
>
>> +	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
>> +	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
>
> I'm nitpicking, but I find the -C garbage-type pattern less than ideal
> for two reasons:
>
>   - It makes every line longer (since "-C garbage-type" is wider than an
>     8-wide tab, even indenting this in a subshell would take up fewer
>     characters visually)
>
>   - It pollutes the current directory with things like "err.expect" and
>     "err.actual" that have nothing to do with the current directory (and
>     much more to do with the garbage-type repository within it).
>
> So I don't care, really, but it may be better to just put all of this in
> a subshell.

Yes, it does look much nicer like that. Thanks!

Some aspects of style I use I have some informed/strong opinion about,
like the teardown/setup pattern noted in [1], but for some other stuff
like this ... I think I was just following the pattern of some recent
test I'd read or something.

Well, one advantage of using "git -C" is that if it fails you can cd to
the trash directory and run the command you saw fail as-is without
cd-ing further, and in that case the "polluting" is a feature, you can
cat the top-level expect/actual consistently.

But I think on balance having the test itself be easier to read is more
important, so I'm going with the subshell.

1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/

* Re: [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types
  2021-09-16 20:40               ` Taylor Blau
@ 2021-09-17 11:59                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-17 11:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak


On Thu, Sep 16 2021, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:58:00PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Fix a blindspot in the tests for the "rev-list --disk-usage" feature
>> added in 16950f8384a (rev-list: add --disk-usage option for
>> calculating disk usage, 2021-02-09) to test for what happens when it's
>> asked to calculate the disk usage of invalid object types.
>
> I'm not sure that I agree this is a blindspot, or at least one worth
> testing. Is the goal to add tests to every Git command that might have
> to do something with a corrupt object and make sure that it is handled
> correctly?
>
> I'm not sure that doing so would be useful, or at the very least that
> it would be worth the effort. [...] I think there are so many parts of
> Git that might touch a corrupt object that trying to test all of them
> seems like a losing battle.

I'll drop it since it doesn't have anything directly to do with this
series. This slipped in from the work I meant to follow-up after this
with.

This isn't just any random command that might come across an invalid
object though, it's specifically reporting object sizes. Once we change
that to not die we'll want to see how invalid objects are handled
by it. Will the disk size be reported as -1? 0? ~0?

> [...]
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  t/t6115-rev-list-du.sh | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
>> index b4aef32b713..edb2ed55846 100755
>> --- a/t/t6115-rev-list-du.sh
>> +++ b/t/t6115-rev-list-du.sh
>> @@ -48,4 +48,15 @@ check_du HEAD
>>  check_du --objects HEAD
>>  check_du --objects HEAD^..HEAD
>>
>> +test_expect_success 'setup garbage repository' '
>> +	git clone --bare . garbage.git &&
>
> Since this is cloned within the working directory, should we bother to
> clean this up to avoid munging with future tests?

In general (as I've noted in some other replies), I think not: if an
individual test picks a unique name for its data, it doesn't need to
bother with test_when_finished and can just leave the cleanup to the
eventual removal of the trash directory.

>> +	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
>> +	git -C garbage.git rev-list --objects --all --disk-usage &&
>> +
>> +	# Manually create a ref because "update-ref", "tag" etc. have
>> +	# no corresponding --literally option.
>> +	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
>> +	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
>
> See also my earlier comment about this being much more readable in a
> sub-shell.

*nod*


* [PATCH v7 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (22 preceding siblings ...)
  2021-09-17  4:08             ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
@ 2021-09-20 19:04             ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                                 ` (17 more replies)
  23 siblings, 18 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

This improves fsck error reporting, see the examples in the commit
messages of 16/17 and 17/17. To get there I've lib-ified more things
in object-file.c and the general object APIs, i.e. now we'll return
error codes instead of calling die() in these cases.

v6 of this got a very detailed review from Taylor Blau (thanks a
lot!), for the v6 see:
https://lore.kernel.org/git/cover-v6-00.22-00000000000-20210907T104558Z-avarab@gmail.com/

This should address all of the things brought up, and more. After
leaving this series for a while I came up with ways to simplify it
even more, so now it's 17 instead of 22 patches!

Some of the changes:

 * The move of functions from cache.h to object-store.h is gone, that
   still makes sense to do, but can be left for later.

 * A large part of the middle of the series is squashed together and
   re-arranged, e.g. I moved the migration of unpack_loose_header() to
   an enum earlier, which simplified later steps. The 15/17 rewrite
   of much of parse_loose_header() is now much simpler.

 * I attempted to address the comments about the tests with some
   for-loops and boilerplate-generated testing. Maybe some of it is a
   bit too ugly, but there's less copy/pasting now, and more cases
   (e.g. the "cat-file -p" case) are tested.

 * We now test the behavior and error reporting when we append garbage
   to a loose object.

 * Many more small changes / improvements / simplifications, see the
   range-diff below, but given its size perhaps a re-read is easier...

Ævar Arnfjörð Bjarmason (17):
  fsck tests: add test for fsck-ing an unknown type
  fsck tests: refactor one test to use a sub-repo
  fsck tests: test current hash/type mismatch behavior
  fsck tests: test for garbage appended to a loose object
  cat-file tests: move bogus_* variable declarations earlier
  cat-file tests: test for missing/bogus object with -t, -s and -p
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: make parse_loose_header_extended() public
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: use "enum" return type for unpack_loose_header()
  object-file.c: return ULHR_TOO_LONG on "header too long"
  object-file.c: stop dying in parse_loose_header()
  fsck: don't hard die on invalid object types
  fsck: report invalid object type-path combinations

 builtin/fast-export.c |   2 +-
 builtin/fsck.c        |  28 +++++-
 builtin/index-pack.c  |   2 +-
 builtin/mktag.c       |   3 +-
 cache.h               |  45 ++++++++-
 object-file.c         | 176 +++++++++++++++------------------
 object-store.h        |   7 +-
 object.c              |   4 +-
 pack-check.c          |   3 +-
 streaming.c           |  27 +++--
 t/oid-info/oid        |   2 +
 t/t1006-cat-file.sh   | 223 +++++++++++++++++++++++++++++++++++++++---
 t/t1450-fsck.sh       |  99 +++++++++++++++----
 13 files changed, 463 insertions(+), 158 deletions(-)

Range-diff against v6:
 2:  9072eef3be3 !  1:  752cef556c2 fsck tests: add test for fsck-ing an unknown type
    @@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
      
     +test_expect_success 'fsck hard errors on an invalid object type' '
     +	git init --bare garbage-type &&
    -+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
    -+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
    -+	cat >err.expect <<-\EOF &&
    -+	fatal: invalid object type
    -+	EOF
    -+	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
    -+	test_cmp err.expect err.actual &&
    -+	test_must_be_empty out.actual
    ++	(
    ++		cd garbage-type &&
    ++
    ++		empty=$(git hash-object --stdin -w -t blob </dev/null) &&
    ++		garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
    ++
    ++		cat >err.expect <<-\EOF &&
    ++		fatal: invalid object type
    ++		EOF
    ++		test_must_fail git fsck >out 2>err &&
    ++		test_cmp err.expect err &&
    ++		test_must_be_empty out
    ++	)
     +'
     +
      test_done
 1:  ebe89f65354 !  2:  612003bdd2c fsck tests: refactor one test to use a sub-repo
    @@ Commit message
         teardown of a tests so we're not leaving corrupt content for the next
         test.
     
    -    We should instead simply use something like this test_create_repo
    -    pattern. It's both less verbose, and makes things easier to debug as a
    -    failing test can have their state left behind under -d without
    -    damaging the state for other tests.
    +    We can instead use the pattern of creating a named sub-repository,
    +    then we don't have to worry about cleaning up after ourselves, nobody
    +    will care what state the broken "hash-mismatch" repository is after
    +    this test runs.
     
    -    But let's punt on that general refactoring and just change this one
    -    test, I'm going to change it further in subsequent commits.
    +    See [1] for related discussion on various "modern" test patterns that
    +    can be used to avoid verbosity and increase reliability.
    +
    +    1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ t/t1450-fsck.sh: remove_object () {
     -	test_when_finished "remove_object $cmt" &&
     -	git update-ref refs/heads/bogus $cmt &&
     -	test_when_finished "git update-ref -d refs/heads/bogus" &&
    --
    --	test_must_fail git fsck 2>out &&
    --	test_i18ngrep "$sha.*corrupt" out
     +test_expect_success 'object with hash mismatch' '
     +	git init --bare hash-mismatch &&
     +	(
     +		cd hash-mismatch &&
    + 
    +-	test_must_fail git fsck 2>out &&
    +-	test_i18ngrep "$sha.*corrupt" out
     +		oid=$(echo blob | git hash-object -w --stdin) &&
     +		old=$(test_oid_to_path "$oid") &&
     +		new=$(dirname $old)/$(test_oid ff_2) &&
     +		oid="$(dirname $new)$(basename $new)" &&
    ++
     +		mv objects/$old objects/$new &&
     +		git update-index --add --cacheinfo 100644 $oid foo &&
     +		tree=$(git write-tree) &&
     +		cmt=$(echo bogus | git commit-tree $tree) &&
     +		git update-ref refs/heads/bogus $cmt &&
    ++
     +		test_must_fail git fsck 2>out &&
    -+		test_i18ngrep "$oid.*corrupt" out
    ++		grep "$oid.*corrupt" out
     +	)
      '
      
 3:  d442a309178 !  3:  1e40a4235e9 cat-file tests: test for missing object with -t and -s
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    cat-file tests: test for missing object with -t and -s
    +    fsck tests: test current hash/type mismatch behavior
     
    -    Test for what happens when the -t and -s flags are asked to operate on
    -    a missing object, this extends tests added in 3e370f9faf0 (t1006: add
    -    tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
    -    -s flags are the only ones that can be combined with
    -    --allow-unknown-type, so let's test with and without that flag.
    +    If fsck we move an object around between .git/objects/?? directories
    +    to simulate a hash mismatch "git fsck" will currently hard die() in
    +    object-file.c. This behavior will be fixed in subsequent commits, but
    +    let's test for it as-is for now.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## t/t1006-cat-file.sh ##
    -@@ t/t1006-cat-file.sh: test_expect_success '%(deltabase) reports packed delta bases' '
    - 	}
    + ## t/t1450-fsck.sh ##
    +@@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
    + 	)
      '
      
    -+missing_oid=$(test_oid deadbeef)
    -+test_expect_success 'error on type of missing object' '
    -+	cat >expect.err <<-\EOF &&
    -+	fatal: git cat-file: could not get object info
    -+	EOF
    -+	test_must_fail git cat-file -t $missing_oid >out 2>err &&
    -+	test_must_be_empty out &&
    -+	test_cmp expect.err err &&
    ++test_expect_success 'object with hash and type mismatch' '
    ++	git init --bare hash-type-mismatch &&
    ++	(
    ++		cd hash-type-mismatch &&
     +
    -+	test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
    -+	test_must_be_empty out &&
    -+	test_cmp expect.err err
    -+'
    ++		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
    ++		old=$(test_oid_to_path "$oid") &&
    ++		new=$(dirname $old)/$(test_oid ff_2) &&
    ++		oid="$(dirname $new)$(basename $new)" &&
     +
    -+test_expect_success 'error on size of missing object' '
    -+	cat >expect.err <<-\EOF &&
    -+	fatal: git cat-file: could not get object info
    -+	EOF
    -+	test_must_fail git cat-file -s $missing_oid >out 2>err &&
    -+	test_must_be_empty out &&
    -+	test_cmp expect.err err &&
    ++		mv objects/$old objects/$new &&
    ++		git update-index --add --cacheinfo 100644 $oid foo &&
    ++		tree=$(git write-tree) &&
    ++		cmt=$(echo bogus | git commit-tree $tree) &&
    ++		git update-ref refs/heads/bogus $cmt &&
     +
    -+	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
    -+	test_must_be_empty out &&
    -+	test_cmp expect.err err
    ++		cat >expect <<-\EOF &&
    ++		fatal: invalid object type
    ++		EOF
    ++		test_must_fail git fsck 2>actual &&
    ++		test_cmp expect actual
    ++	)
     +'
     +
    - bogus_type="bogus"
    - bogus_content="bogus"
    - bogus_size=$(strlen "$bogus_content")
    + test_expect_success 'branch pointing to non-commit' '
    + 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
    + 	test_when_finished "git update-ref -d refs/heads/invalid" &&
 5:  82db40ebf8a !  4:  854991c1543 rev-list tests: test for behavior with invalid object types
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    rev-list tests: test for behavior with invalid object types
    +    fsck tests: test for garbage appended to a loose object
     
    -    Fix a blindspot in the tests for the "rev-list --disk-usage" feature
    -    added in 16950f8384a (rev-list: add --disk-usage option for
    -    calculating disk usage, 2021-02-09) to test for what happens when it's
    -    asked to calculate the disk usage of invalid object types.
    +    There wasn't any output tests for this scenario, let's ensure that we
    +    don't regress on it in the changes that come after this.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## t/t6115-rev-list-du.sh ##
    -@@ t/t6115-rev-list-du.sh: check_du HEAD
    - check_du --objects HEAD
    - check_du --objects HEAD^..HEAD
    + ## t/t1450-fsck.sh ##
    +@@ t/t1450-fsck.sh: test_expect_success 'object with hash and type mismatch' '
    + 	)
    + '
      
    -+test_expect_success 'setup garbage repository' '
    -+	git clone --bare . garbage.git &&
    -+	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
    -+	git -C garbage.git rev-list --objects --all --disk-usage &&
    ++test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
    ++	git init --bare corrupt-loose-output &&
    ++	(
    ++		cd corrupt-loose-output &&
    ++		oid=$(git hash-object -w --stdin --literally </dev/null) &&
    ++		oidf=objects/$(test_oid_to_path "$oid") &&
    ++		chmod 755 $oidf &&
    ++		echo extra garbage >>$oidf &&
     +
    -+	# Manually create a ref because "update-ref", "tag" etc. have
    -+	# no corresponding --literally option.
    -+	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
    -+	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
    ++		cat >expect.error <<-EOF &&
    ++		error: garbage at end of loose object '\''$oid'\''
    ++		error: unable to unpack contents of ./$oidf
    ++		error: $oid: object corrupt or missing: ./$oidf
    ++		EOF
    ++		test_must_fail git fsck 2>actual &&
    ++		grep ^error: actual >error &&
    ++		test_cmp expect.error error
    ++	)
     +'
     +
    - test_done
    + test_expect_success 'branch pointing to non-commit' '
    + 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
    + 	test_when_finished "git update-ref -d refs/heads/invalid" &&
19:  ad1614dbb8d !  5:  fc93c2c2530 fsck: don't hard die on invalid object types
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck: don't hard die on invalid object types
    +    cat-file tests: move bogus_* variable declarations earlier
     
    -    Change the error fsck emits on invalid object types, such as:
    -
    -        $ git hash-object --stdin -w -t garbage --literally </dev/null
    -        <OID>
    -
    -    From the very ungraceful error of:
    -
    -        $ git fsck
    -        fatal: invalid object type
    -        $
    -
    -    To:
    -
    -        $ git fsck
    -        error: hash mismatch for <OID_PATH> (expected <OID>)
    -        error: <OID>: object corrupt or missing: <OID_PATH>
    -        [ the rest of the fsck output here, i.e. it didn't hard die ]
    -
    -    We'll still exit with non-zero, but now we'll finish the rest of the
    -    traversal. The tests that's being added here asserts that we'll still
    -    complain about other fsck issues (e.g. an unrelated dangling blob).
    -
    -    To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
    -    flag from read_loose_object() through to parse_loose_header(). Since
    -    the read_loose_object() function is only used in builtin/fsck.c we can
    -    simply change it. See f6371f92104 (sha1_file: add read_loose_object()
    -    function, 2017-01-13) for the introduction of read_loose_object().
    -
    -    Why are we complaining about a "hash mismatch" for an object of a type
    -    we don't know about? We shouldn't. This is the bare minimal change
    -    needed to not make fsck hard die on a repository that's been corrupted
    -    in this manner. In subsequent commits we'll teach fsck to recognize
    -    this particular type of corruption and emit a better error message.
    +    Change the short/long bogus bogus object type variables into a form
    +    where the two sets can be used concurrently. This'll be used by
    +    subsequently added tests.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## builtin/fsck.c ##
    -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    - 	void *contents;
    - 	int eaten;
    + ## t/t1006-cat-file.sh ##
    +@@ t/t1006-cat-file.sh: test_expect_success '%(deltabase) reports packed delta bases' '
    + 	}
    + '
      
    --	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
    -+	if (read_loose_object(path, oid, &type, &size, &contents,
    -+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
    - 		errors_found |= ERROR_OBJECT;
    - 		error(_("%s: object corrupt or missing: %s"),
    - 		      oid_to_hex(oid), path);
    -
    - ## object-file.c ##
    -@@ object-file.c: int read_loose_object(const char *path,
    - 		      const struct object_id *expected_oid,
    - 		      enum object_type *type,
    - 		      unsigned long *size,
    --		      void **contents)
    -+		      void **contents,
    -+		      unsigned int oi_flags)
    - {
    - 	int ret = -1;
    - 	void *map = NULL;
    -@@ object-file.c: int read_loose_object(const char *path,
    - 	git_zstream stream;
    - 	char hdr[MAX_HEADER_LEN];
    - 	struct object_info oi = OBJECT_INFO_INIT;
    -+	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
    - 	oi.typep = type;
    - 	oi.sizep = size;
    +-bogus_type="bogus"
    +-bogus_content="bogus"
    +-bogus_size=$(strlen "$bogus_content")
    +-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
    ++test_expect_success 'setup bogus data' '
    ++	bogus_short_type="bogus" &&
    ++	bogus_short_content="bogus" &&
    ++	bogus_short_size=$(strlen "$bogus_short_content") &&
    ++	bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
    ++
    ++	bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
    ++	bogus_long_content="bogus" &&
    ++	bogus_long_size=$(strlen "$bogus_long_content") &&
    ++	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
    ++'
      
    -@@ object-file.c: int read_loose_object(const char *path,
    - 		git_inflate_end(&stream);
    - 		goto out;
    - 	}
    --	if (*type < 0)
    --		die(_("invalid object type"));
    -+	if (!allow_unknown && *type < 0) {
    -+		error(_("header for %s declares an unknown type"), path);
    -+		git_inflate_end(&stream);
    -+		goto out;
    -+	}
    + test_expect_success "Type of broken object is correct" '
    +-	echo $bogus_type >expect &&
    +-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
    ++	echo $bogus_short_type >expect &&
    ++	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
    + 	test_cmp expect actual
    + '
      
    - 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
    - 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
    -
    - ## object-store.h ##
    -@@ object-store.h: int read_loose_object(const char *path,
    - 		      const struct object_id *expected_oid,
    - 		      enum object_type *type,
    - 		      unsigned long *size,
    --		      void **contents);
    -+		      void **contents,
    -+		      unsigned int oi_flags);
    + test_expect_success "Size of broken object is correct" '
    +-	echo $bogus_size >expect &&
    +-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
    ++	echo $bogus_short_size >expect &&
    ++	git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
    + 	test_cmp expect actual
    + '
    +-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
    +-bogus_content="bogus"
    +-bogus_size=$(strlen "$bogus_content")
    +-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
      
    - /* Retry packed storage after checking packed and loose storage */
    - #define HAS_OBJECT_RECHECK_PACKED 1
    -
    - ## t/t1450-fsck.sh ##
    -@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
    - 	test_i18ngrep "bad index file" errors
    + test_expect_success "Type of broken object is correct when type is large" '
    +-	echo $bogus_type >expect &&
    +-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
    ++	echo $bogus_long_type >expect &&
    ++	git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
    + 	test_cmp expect actual
      '
      
    --test_expect_success 'fsck hard errors on an invalid object type' '
    -+test_expect_success 'fsck error and recovery on invalid object type' '
    - 	git init --bare garbage-type &&
    - 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
    - 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
    --	cat >err.expect <<-\EOF &&
    --	fatal: invalid object type
    --	EOF
    --	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
    --	test_cmp err.expect err.actual &&
    --	test_must_be_empty out.actual
    -+	test_must_fail git -C garbage-type fsck >out 2>err &&
    -+	grep -e "^error" -e "^fatal" err >errors &&
    -+	test_line_count = 2 errors &&
    -+	grep "error: hash mismatch for" err &&
    -+	grep "$garbage_blob: object corrupt or missing:" err &&
    -+	grep "dangling blob $empty_blob" out
    + test_expect_success "Size of large broken object is correct when type is large" '
    +-	echo $bogus_size >expect &&
    +-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
    ++	echo $bogus_long_size >expect &&
    ++	git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
    + 	test_cmp expect actual
      '
      
    - test_done
 4:  0358273022f !  6:  051088aa114 cat-file tests: test that --allow-unknown-type isn't on by default
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    cat-file tests: test that --allow-unknown-type isn't on by default
    +    cat-file tests: test for missing/bogus object with -t, -s and -p
     
    -    Fix a blindspot in the tests for the --allow-unknown-type feature
    -    added in 39e4ae38804 (cat-file: teach cat-file a
    -    '--allow-unknown-type' option, 2015-05-03). We should check that
    -    --allow-unknown-type isn't on by default.
    +    When we look up a missing object with cat_one_file() what error we
    +    print out currently depends on whether we'll error out early in
    +    get_oid_with_context(), or if we'll get an error later from
    +    oid_object_info_extended().
     
    -    Before this change all the tests would succeed if --allow-unknown-type
    -    was on by default, let's fix that by asserting that -t and -s die on a
    -    "garbage" type without --allow-unknown-type.
    +    The --allow-unknown-type flag then changes whether we pass the
    +    "OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
    +    not.
    +
    +    The "-p" flag is yet another special-case in printing the same output
    +    on the deadbeef OID as we'd emit on the deadbeef_short OID for the
    +    "-s" and "-t" options, it also doesn't support the
    +    "--allow-unknown-type" flag at all.
    +
    +    Let's test the combination of the two sets of [-t, -s, -p] and
    +    [--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
    +    in not supplying it), as well as a [missing,bogus] object pair.
    +
    +    This extends tests added in 3e370f9faf0 (t1006: add tests for git
    +    cat-file --allow-unknown-type, 2015-05-03).
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## t/oid-info/oid ##
    +@@ t/oid-info/oid: numeric		sha1:0123456789012345678901234567890123456789
    + numeric		sha256:0123456789012345678901234567890123456789012345678901234567890123
    + deadbeef	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
    + deadbeef	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
    ++deadbeef_short	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
    ++deadbee_short	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
    +
      ## t/t1006-cat-file.sh ##
    -@@ t/t1006-cat-file.sh: bogus_content="bogus"
    - bogus_size=$(strlen "$bogus_content")
    - bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
    +@@ t/t1006-cat-file.sh: test_expect_success 'setup bogus data' '
    + 	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
    + '
      
    -+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
    -+	cat >err.expect <<-\EOF &&
    -+	fatal: invalid object type
    -+	EOF
    ++for arg1 in '' --allow-unknown-type
    ++do
    ++	for arg2 in -s -t -p
    ++	do
    ++		if test $arg1 = "--allow-unknown-type" && test "$arg2" = "-p"
    ++		then
    ++			continue
    ++		fi
    ++
     +
    -+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
    -+	test_cmp err.expect err.actual &&
    -+	test_must_be_empty out.actual &&
    ++		test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
    ++			cat >expect <<-\EOF &&
    ++			fatal: invalid object type
    ++			EOF
     +
    -+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
    -+	test_cmp err.expect err.actual &&
    -+	test_must_be_empty out.actual
    -+'
    ++			if test "$arg1" = "--allow-unknown-type"
    ++			then
    ++				git cat-file $arg1 $arg2 $bogus_short_sha1
    ++			else
    ++				test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
    ++				test_must_be_empty out &&
    ++				test_cmp expect actual
    ++			fi
    ++		'
    ++
    ++		test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
    ++			if test "$arg2" = "-p"
    ++			then
    ++				cat >expect <<-EOF
    ++				error: unable to unpack $bogus_long_sha1 header
    ++				fatal: Not a valid object name $bogus_long_sha1
    ++				EOF
    ++			else
    ++				cat >expect <<-EOF
    ++				error: unable to unpack $bogus_long_sha1 header
    ++				fatal: git cat-file: could not get object info
    ++				EOF
    ++			fi &&
    ++
    ++			if test "$arg1" = "--allow-unknown-type"
    ++			then
    ++				git cat-file $arg1 $arg2 $bogus_short_sha1
    ++			else
    ++				test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
    ++				test_must_be_empty out &&
    ++				test_cmp expect actual
    ++			fi
    ++		'
    ++
    ++		test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
    ++			cat >expect.err <<-EOF &&
    ++			fatal: Not a valid object name $(test_oid deadbeef_short)
    ++			EOF
    ++			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
    ++			test_must_be_empty out
    ++		'
    ++
    ++		test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
    ++			if test "$arg2" = "-p"
    ++			then
    ++				cat >expect.err <<-EOF
    ++				fatal: Not a valid object name $(test_oid deadbeef)
    ++				EOF
    ++			else
    ++				cat >expect.err <<-\EOF
    ++				fatal: git cat-file: could not get object info
    ++				EOF
    ++			fi &&
    ++			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
    ++			test_must_be_empty out &&
    ++			test_cmp expect.err err.actual
    ++		'
    ++	done
    ++done
     +
      test_expect_success "Type of broken object is correct" '
    - 	echo $bogus_type >expect &&
    - 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
    -@@ t/t1006-cat-file.sh: bogus_content="bogus"
    - bogus_size=$(strlen "$bogus_content")
    - bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
    - 
    -+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
    -+	cat >err.expect <<-EOF &&
    -+	error: unable to unpack $bogus_sha1 header
    -+	fatal: git cat-file: could not get object info
    -+	EOF
    -+
    -+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
    -+	test_cmp err.expect err.actual &&
    -+	test_must_be_empty out.actual &&
    -+
    -+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
    -+	test_cmp err.expect err.actual &&
    -+	test_must_be_empty out.actual
    -+'
    -+
    - test_expect_success "Type of broken object is correct when type is large" '
    - 	echo $bogus_type >expect &&
    - 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
    + 	echo $bogus_short_type >expect &&
    + 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
 6:  d1ffd21acc5 =  7:  20bd81c1af0 cat-file tests: add corrupt loose object test
 7:  22ab12c2282 !  8:  cd1d52b8a07 cat-file tests: test for current --allow-unknown-type behavior
    @@ Commit message
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## t/t1006-cat-file.sh ##
    -@@ t/t1006-cat-file.sh: test_expect_success 'die on broken object under -t and -s without --allow-unknow
    - 	test_must_be_empty out.actual
    - '
    +@@ t/t1006-cat-file.sh: do
    + 	done
    + done
      
     +test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
    -+	git cat-file -e $bogus_sha1
    ++	git cat-file -e $bogus_short_sha1
     +'
     +
     +test_expect_success '-e can not be combined with --allow-unknown-type' '
    -+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
    ++	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
     +'
     +
     +test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
    -+	test_must_fail git cat-file -p $bogus_sha1 &&
    -+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
    ++	test_must_fail git cat-file -p $bogus_short_sha1 &&
    ++	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
     +'
     +
     +test_expect_success '<type> <hash> does not work with objects of broken types' '
     +	cat >err.expect <<-\EOF &&
     +	fatal: invalid object type "bogus"
     +	EOF
    -+	test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
    ++	test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
     +	test_cmp err.expect err.actual
     +'
     +
     +test_expect_success 'broken types combined with --batch and --batch-check' '
    -+	echo $bogus_sha1 >bogus-oid &&
    ++	echo $bogus_short_sha1 >bogus-oid &&
     +
     +	cat >err.expect <<-\EOF &&
     +	fatal: invalid object type
    @@ t/t1006-cat-file.sh: test_expect_success 'die on broken object under -t and -s w
     +	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
     +	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
     +'
    -+
    - test_expect_success "Type of broken object is correct" '
    - 	echo $bogus_type >expect &&
    - 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
    -@@ t/t1006-cat-file.sh: test_expect_success "Size of broken object is correct" '
    - 	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
    - 	test_cmp expect actual
    - '
     +
     +test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
     +	cat >expect <<-EOF &&
    -+	$bogus_type
    ++	$bogus_short_type
     +	EOF
    -+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
    ++	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
     +	test_cmp expect actual &&
     +
     +	# Create it manually, as "git replace" will die on bogus
     +	# types.
     +	head=$(git rev-parse --verify HEAD) &&
    ++	test_when_finished "rm -rf .git/refs/replace" &&
     +	mkdir -p .git/refs/replace &&
    -+	echo $head >.git/refs/replace/$bogus_sha1 &&
    ++	echo $head >.git/refs/replace/$bogus_short_sha1 &&
     +
     +	cat >expect <<-EOF &&
     +	commit
     +	EOF
    -+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
    ++	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
     +	test_cmp expect actual
     +'
     +
    - bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
    - bogus_content="bogus"
    - bogus_size=$(strlen "$bogus_content")
    + test_expect_success "Type of broken object is correct" '
    + 	echo $bogus_short_type >expect &&
    + 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
 8:  38e4266772d =  9:  d9f5adfc74b object-file.c: don't set "typep" when returning non-zero
 9:  5b9278e7bb4 <  -:  ----------- cache.h: move object functions to object-store.h
16:  9e7dbfb4aa3 ! 10:  51d14bc9274 object-file.c: return -1, not "status" from unpack_loose_header()
    @@ Commit message
     
      ## object-file.c ##
     @@ object-file.c: int unpack_loose_header(git_zstream *stream,
    - 	status = git_inflate(stream, 0);
    - 	obj_read_lock();
    + 					       buffer, bufsiz);
    + 
      	if (status < Z_OK)
     -		return status;
     +		return -1;
      
    - 	/*
    - 	 * Check if entire header is unpacked in the first iteration.
    + 	/* Make sure we have the terminating NUL */
    + 	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
10:  b15ad53414b ! 11:  f43cfd8a5ed object-file.c: make parse_loose_header_extended() public
    @@ Commit message
         could simply use parse_loose_header_extended() there, but will leave
         the API in a better end state.
     
    +    It would be a better end-state to have already moved the declaration
    +    of these functions to object-store.h to avoid the forward declaration
    +    of "struct object_info" in cache.h, but let's leave that cleanup for
    +    some other time.
    +
    +    1. https://lore.kernel.org/git/patch-v6-09.22-5b9278e7bb4-20210907T104559Z-avarab@gmail.com/
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## cache.h ##
    +@@ cache.h: char *xdg_cache_home(const char *filename);
    + int git_open_cloexec(const char *name, int flags);
    + #define git_open(name) git_open_cloexec(name, O_RDONLY)
    + int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
    +-int parse_loose_header(const char *hdr, unsigned long *sizep);
    ++struct object_info;
    ++int parse_loose_header(const char *hdr, struct object_info *oi,
    ++		       unsigned int flags);
    + 
    + int check_object_signature(struct repository *r, const struct object_id *oid,
    + 			   void *buf, unsigned long size, const char *type);
    +
      ## object-file.c ##
     @@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
       * too permissive for what we want to check. So do an anal
    @@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
       */
     -static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
     -				       unsigned int flags)
    -+int parse_loose_header(const char *hdr,
    -+		       struct object_info *oi,
    ++int parse_loose_header(const char *hdr, struct object_info *oi,
     +		       unsigned int flags)
      {
      	const char *type_buf = hdr;
    @@ object-file.c: int read_loose_object(const char *path,
      		error(_("unable to parse header of %s"), path);
      		git_inflate_end(&stream);
     
    - ## object-store.h ##
    -@@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
    - int unpack_loose_header(git_zstream *stream, unsigned char *map,
    - 			unsigned long mapsize, void *buffer,
    - 			unsigned long bufsiz);
    --int parse_loose_header(const char *hdr, unsigned long *sizep);
    -+int parse_loose_header(const char *hdr, struct object_info *oi,
    -+		       unsigned int flags);
    - int check_object_signature(struct repository *r, const struct object_id *oid,
    - 			   void *buf, unsigned long size, const char *type);
    - int finalize_object_file(const char *tmpfile, const char *filename);
    -
      ## streaming.c ##
     @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
      			      const struct object_id *oid,
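For readers following the now-public parse_loose_header(): the header it parses has the form "<type> <len>\0", as documented later in this series. The sketch below shows that format in isolation. It is a hypothetical, simplified parser (the name parse_header_sketch() and its signature are made up for illustration). It is not git's implementation: it does no overflow check on <len> and does not map <type> to an enum object_type.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical parser for a loose object header "<type> <len>\0".
 * Copies <type> into "type" and stores <len> in "*sizep".
 * Returns 0 on success, -1 if the header does not follow the format.
 * Simplification: no overflow check on <len>.
 */
static int parse_header_sketch(const char *hdr, size_t hdr_len,
			       char *type, size_t type_size,
			       unsigned long *sizep)
{
	const char *sp, *nul, *p;
	unsigned long size = 0;
	size_t type_len;

	assert(hdr && type && sizep);
	sp = memchr(hdr, ' ', hdr_len);
	nul = memchr(hdr, '\0', hdr_len);

	/* There must be a space, and it must come before the terminating NUL. */
	if (!sp || !nul || sp > nul)
		return -1;

	/* <type> is everything before the space; it must fit in "type". */
	type_len = sp - hdr;
	if (!type_len || type_len >= type_size)
		return -1;
	memcpy(type, hdr, type_len);
	type[type_len] = '\0';

	/* <len> must be one or more decimal digits, directly followed by NUL. */
	p = sp + 1;
	if (p == nul)
		return -1;
	for (; p < nul; p++) {
		if (!isdigit((unsigned char)*p))
			return -1;
		size = size * 10 + (unsigned long)(*p - '0');
	}
	*sizep = size;
	return 0;
}
```

E.g. parsing the 8 bytes of "blob 12" (including its NUL) yields type "blob" and size 12, while "blob12" is rejected because it has no space.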
11:  326eb74545d <  -:  ----------- object-file.c: add missing braces to loose_object_info()
12:  4f829e9b727 ! 12:  50d938f7f3c object-file.c: simplify unpack_loose_short_header()
    @@ Commit message
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## cache.h ##
    +@@ cache.h: char *xdg_cache_home(const char *filename);
    + 
    + int git_open_cloexec(const char *name, int flags);
    + #define git_open(name) git_open_cloexec(name, O_RDONLY)
    +-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
    ++
    ++/**
    ++ * unpack_loose_header() initializes the data stream needed to unpack
    ++ * a loose object header.
    ++ *
    ++ * Returns 0 on success. Returns negative values on error.
    ++ *
    ++ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
    ++ * "hdrbuf" argument is non-NULL. This is intended for use with
    ++ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
    ++ * reporting. The full header will be extracted to "hdrbuf" for use
    ++ * with parse_loose_header().
    ++ */
    ++int unpack_loose_header(git_zstream *stream, unsigned char *map,
    ++			unsigned long mapsize, void *buffer,
    ++			unsigned long bufsiz, struct strbuf *hdrbuf);
    + struct object_info;
    + int parse_loose_header(const char *hdr, struct object_info *oi,
    + 		       unsigned int flags);
    +
      ## object-file.c ##
     @@ object-file.c: void *map_loose_object(struct repository *r,
      	return map_loose_object_1(r, NULL, oid, size);
    @@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
     -	int status = unpack_loose_short_header(stream, map, mapsize,
     -					       buffer, bufsiz);
     -
    - 	if (status < Z_OK)
    - 		return status;
    - 
    +-	if (status < Z_OK)
    +-		return -1;
    +-
     -	/* Make sure we have the terminating NUL */
     -	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
     -		return -1;
    @@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
     -	int status;
     -
     -	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
    --	if (status < Z_OK)
    --		return -1;
    --
    - 	/*
    - 	 * Check if entire header is unpacked in the first iteration.
    - 	 */
    + 	if (status < Z_OK)
    + 		return -1;
    + 
    +@@ object-file.c: static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
      	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
      		return 0;
      
    @@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
      	 * buffer[0..bufsiz] was not large enough.  Copy the partial
      	 * result out to header, and then append the result of further
     @@ object-file.c: static int loose_object_info(struct repository *r,
    - 	unsigned long mapsize;
    - 	void *map;
    - 	git_zstream stream;
    -+	int hdr_ret;
      	char hdr[MAX_HEADER_LEN];
      	struct strbuf hdrbuf = STRBUF_INIT;
      	unsigned long size_scratch;
    @@ object-file.c: static int loose_object_info(struct repository *r,
     -		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
     -			status = error(_("unable to unpack %s header with --allow-unknown-type"),
     -				       oid_to_hex(oid));
    --	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
    +-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
     +
    -+	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
    -+				      allow_unknown ? &hdrbuf : NULL);
    -+	if (hdr_ret < 0) {
    ++	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
    ++				allow_unknown ? &hdrbuf : NULL) < 0)
      		status = error(_("unable to unpack %s header"),
      			       oid_to_hex(oid));
    - 	}
    + 	if (status < 0)
     @@ object-file.c: int read_loose_object(const char *path,
      		goto out;
      	}
    @@ object-file.c: int read_loose_object(const char *path,
      		goto out;
      	}
     
    - ## object-store.h ##
    -@@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
    - int for_each_packed_object(each_packed_object_fn, void *,
    - 			   enum for_each_object_flags flags);
    - 
    -+/**
    -+ * unpack_loose_header() initializes the data stream needed to unpack
    -+ * a loose object header.
    -+ *
    -+ * Returns 0 on success. Returns negative values on error.
    -+ *
    -+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
    -+ * "hdrbuf" argument is non-NULL. This is intended for use with
    -+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
    -+ * reporting. The full header will be extracted to "hdrbuf" for use
    -+ * with parse_loose_header().
    -+ */
    - int unpack_loose_header(git_zstream *stream, unsigned char *map,
    - 			unsigned long mapsize, void *buffer,
    --			unsigned long bufsiz);
    -+			unsigned long bufsiz, struct strbuf *hdrbuf);
    - int parse_loose_header(const char *hdr, struct object_info *oi,
    - 		       unsigned int flags);
    - int check_object_signature(struct repository *r, const struct object_id *oid,
    -
      ## streaming.c ##
     @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
      				 st->u.loose.mapped,
13:  90489d9e6ec <  -:  ----------- object-file.c: split up ternary in parse_loose_header()
18:  1b7173a5b5b ! 13:  755fde00b46 object-file.c: use "enum" return type for unpack_loose_header()
    @@ Metadata
      ## Commit message ##
         object-file.c: use "enum" return type for unpack_loose_header()
     
    -    In the preceding commits we changed and documented
    -    unpack_loose_header() from return any negative value or zero, to only
    -    -2, -1 or 0. Let's instead add an "enum unpack_loose_header_result"
    -    type and use it, and have the compiler assert that we're exhaustively
    -    covering all return values. This gets rid of the need for having a
    -    "default" BUG() case in loose_object_info().
    +    In a preceding commit we changed and documented unpack_loose_header()
    +    from its previous behavior of returning any negative value or zero, to
    +    only -1 or 0.
     
    -    I'm on the fence about whether this is more readable or worth it, but
    -    since it was suggested in [1] to do this let's go for it.
    -
    -    1. https://lore.kernel.org/git/20210527175433.2673306-1-jonathantanmy@google.com/
    +    Let's add an "enum unpack_loose_header_result" type and use it for
    +    these return values, and have the compiler assert that we're
    +    exhaustively covering all of them.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## cache.h ##
    +@@ cache.h: int git_open_cloexec(const char *name, int flags);
    +  * unpack_loose_header() initializes the data stream needed to unpack
    +  * a loose object header.
    +  *
    +- * Returns 0 on success. Returns negative values on error.
    ++ * Returns:
    ++ *
    ++ * - ULHR_OK on success
    ++ * - ULHR_BAD on error
    +  *
    +  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
    +  * "hdrbuf" argument is non-NULL. This is intended for use with
    +@@ cache.h: int git_open_cloexec(const char *name, int flags);
    +  * reporting. The full header will be extracted to "hdrbuf" for use
    +  * with parse_loose_header().
    +  */
    +-int unpack_loose_header(git_zstream *stream, unsigned char *map,
    +-			unsigned long mapsize, void *buffer,
    +-			unsigned long bufsiz, struct strbuf *hdrbuf);
    ++enum unpack_loose_header_result {
    ++	ULHR_OK,
    ++	ULHR_BAD,
    ++};
    ++enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
    ++						    unsigned char *map,
    ++						    unsigned long mapsize,
    ++						    void *buffer,
    ++						    unsigned long bufsiz,
    ++						    struct strbuf *hdrbuf);
    ++
    + struct object_info;
    + int parse_loose_header(const char *hdr, struct object_info *oi,
    + 		       unsigned int flags);
    +
      ## object-file.c ##
     @@ object-file.c: void *map_loose_object(struct repository *r,
      	return map_loose_object_1(r, NULL, oid, size);
    @@ object-file.c: void *map_loose_object(struct repository *r,
      {
      	int status;
      
    +@@ object-file.c: int unpack_loose_header(git_zstream *stream,
    + 	status = git_inflate(stream, 0);
    + 	obj_read_lock();
    + 	if (status < Z_OK)
    +-		return -1;
    ++		return ULHR_BAD;
    + 
    + 	/*
    + 	 * Check if entire header is unpacked in the first iteration.
    + 	 */
    + 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
    +-		return 0;
    ++		return ULHR_OK;
    + 
    + 	/*
    + 	 * We have a header longer than MAX_HEADER_LEN. The "header"
    +@@ object-file.c: int unpack_loose_header(git_zstream *stream,
    + 	 * --allow-unknown-type".
    + 	 */
    + 	if (!header)
    +-		return -1;
    ++		return ULHR_BAD;
    + 
    + 	/*
    + 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
    +@@ object-file.c: int unpack_loose_header(git_zstream *stream,
    + 		stream->next_out = buffer;
    + 		stream->avail_out = bufsiz;
    + 	} while (status != Z_STREAM_END);
    +-	return -1;
    ++	return ULHR_BAD;
    + }
    + 
    + static void *unpack_loose_rest(git_zstream *stream,
     @@ object-file.c: static int loose_object_info(struct repository *r,
    - 	unsigned long mapsize;
    - 	void *map;
    - 	git_zstream stream;
    --	int hdr_ret;
    -+	enum unpack_loose_header_result hdr_ret;
    - 	char hdr[MAX_HEADER_LEN];
    - 	struct strbuf hdrbuf = STRBUF_INIT;
    - 	unsigned long size_scratch;
    -@@ object-file.c: static int loose_object_info(struct repository *r,
    - 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
    - 				      allow_unknown ? &hdrbuf : NULL);
    - 	switch (hdr_ret) {
    --	case 0:
    -+	case UNPACK_LOOSE_HEADER_RESULT_OK:
    - 		break;
    --	case -1:
    -+	case UNPACK_LOOSE_HEADER_RESULT_BAD:
    + 	if (oi->disk_sizep)
    + 		*oi->disk_sizep = mapsize;
    + 
    +-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
    +-				allow_unknown ? &hdrbuf : NULL) < 0)
    ++	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
    ++				    allow_unknown ? &hdrbuf : NULL)) {
    ++	case ULHR_OK:
    ++		break;
    ++	case ULHR_BAD:
      		status = error(_("unable to unpack %s header"),
      			       oid_to_hex(oid));
    - 		break;
    --	case -2:
    -+	case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
    - 		status = error(_("header for %s too long, exceeds %d bytes"),
    - 			       oid_to_hex(oid), MAX_HEADER_LEN);
    - 		break;
    --	default:
    --		BUG("unknown hdr_ret value %d", hdr_ret);
    - 	}
    - 	if (!status) {
    - 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
    -
    - ## object-store.h ##
    -@@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
    - int for_each_packed_object(each_packed_object_fn, void *,
    - 			   enum for_each_object_flags flags);
    - 
    -+enum unpack_loose_header_result {
    -+	UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG = -2,
    -+	UNPACK_LOOSE_HEADER_RESULT_BAD = -1,
    -+	UNPACK_LOOSE_HEADER_RESULT_OK,
    -+
    -+};
    +-	if (status < 0)
    +-		; /* Do nothing */
    +-	else if (hdrbuf.len) {
    ++		break;
    ++	}
     +
    - /**
    -  * unpack_loose_header() initializes the data stream needed to unpack
    -  * a loose object header.
    -  *
    -- * Returns 0 on success. Returns negative values on error. If the
    -- * header exceeds MAX_HEADER_LEN -2 will be returned.
    -+ * Returns UNPACK_LOOSE_HEADER_RESULT_OK on success. Returns
    -+ * UNPACK_LOOSE_HEADER_RESULT_BAD values on error, or if the header
    -+ * exceeds MAX_HEADER_LEN UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG will
    -+ * be returned.
    -  *
    -  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
    -  * "hdrbuf" argument is non-NULL. This is intended for use with
    -  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
    -  * reporting. The full header will be extracted to "hdrbuf" for use
    -- * with parse_loose_header(), -2 will still be returned from this
    -- * function to indicate that the header was too long.
    -+ * with parse_loose_header(), UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG
    -+ * will still be returned from this function to indicate that the
    -+ * header was too long.
    -  */
    --int unpack_loose_header(git_zstream *stream, unsigned char *map,
    --			unsigned long mapsize, void *buffer,
    --			unsigned long bufsiz, struct strbuf *hdrbuf);
    -+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
    -+						    unsigned char *map,
    -+						    unsigned long mapsize,
    -+						    void *buffer,
    -+						    unsigned long bufsiz,
    -+						    struct strbuf *hdrbuf);
    - 
    - /**
    -  * parse_loose_header() parses the starting "<type> <len>\0" of an
    ++	if (status < 0) {
    ++		/* Do nothing */
    ++	} else if (hdrbuf.len) {
    + 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
    + 			status = error(_("unable to parse %s header with --allow-unknown-type"),
    + 				       oid_to_hex(oid));
     
      ## streaming.c ##
     @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
    - 			      enum object_type *type)
    - {
    - 	struct object_info oi = OBJECT_INFO_INIT;
    -+	enum unpack_loose_header_result hdr_ret;
    - 	oi.sizep = &st->size;
    - 	oi.typep = type;
    - 
      	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
      	if (!st->u.loose.mapped)
      		return -1;
    @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct reposi
     -				 st->u.loose.hdr,
     -				 sizeof(st->u.loose.hdr),
     -				 NULL) < 0) ||
    --	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
    --	    *type < 0) {
    +-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
     -		git_inflate_end(&st->z);
     -		munmap(st->u.loose.mapped, st->u.loose.mapsize);
     -		return -1;
    -+	hdr_ret = unpack_loose_header(&st->z, st->u.loose.mapped,
    -+				      st->u.loose.mapsize, st->u.loose.hdr,
    -+				      sizeof(st->u.loose.hdr), NULL);
    -+	switch (hdr_ret) {
    -+	case UNPACK_LOOSE_HEADER_RESULT_OK:
    ++	switch (unpack_loose_header(&st->z, st->u.loose.mapped,
    ++				    st->u.loose.mapsize, st->u.loose.hdr,
    ++				    sizeof(st->u.loose.hdr), NULL)) {
    ++	case ULHR_OK:
     +		break;
    -+	case UNPACK_LOOSE_HEADER_RESULT_BAD:
    -+	case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
    ++	case ULHR_BAD:
     +		goto error;
      	}
    -+	if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
    ++	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
     +		goto error;
      
      	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
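The point of the "enum unpack_loose_header_result" change above is that a switch with no "default:" label lets the compiler check that every enumerator is handled. A minimal sketch of that idea, assuming GCC or Clang with -Wswitch (part of GCC's -Wall): the enum mirrors the one in the patch, while status_from_ulhr() is a hypothetical caller, not code from object-file.c.

```c
#include <assert.h>
#include <stdio.h>

/* Mirrors the two-value enum introduced in the patch above. */
enum unpack_loose_header_result {
	ULHR_OK,
	ULHR_BAD,
};

/*
 * Hypothetical caller: turns the result into 0 or -1, like the
 * "status" variable in loose_object_info().
 */
static int status_from_ulhr(enum unpack_loose_header_result r)
{
	/*
	 * Deliberately no "default:" label, so that -Wswitch warns
	 * if an enumerator (e.g. a later ULHR_TOO_LONG) is unhandled.
	 */
	switch (r) {
	case ULHR_OK:
		return 0;
	case ULHR_BAD:
		fprintf(stderr, "error: unable to unpack header\n");
		return -1;
	}
	/* Only reachable if "r" holds a value outside the enum. */
	assert(!"invalid unpack_loose_header_result");
	return -1;
}
```

The trade-off is the one the old commit message hinted at: the BUG() in a "default:" case goes away, and the compile-time warning takes over its job.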
17:  f28c4f0dfb4 ! 14:  522d71eb19d object-file.c: return -2 on "header too long" in unpack_loose_header()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    object-file.c: return -2 on "header too long" in unpack_loose_header()
    +    object-file.c: return ULHR_TOO_LONG on "header too long"
     
         Split up the return code for "header too long" from the generic
         negative return value unpack_loose_header() returns, and report via
    @@ Commit message
         As a test added earlier in this series in t1006-cat-file.sh shows
         we'll correctly emit zlib errors from zlib.c already in this case, so
         we have no need to carry those return codes further down the
    -    stack. Let's instead just return -2 saying we ran into the
    +    stack. Let's instead just return ULHR_TOO_LONG saying we ran into the
         MAX_HEADER_LEN limit, or other negative values for "unable to unpack
         <OID> header".
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## cache.h ##
    +@@ cache.h: int git_open_cloexec(const char *name, int flags);
    +  *
    +  * - ULHR_OK on success
    +  * - ULHR_BAD on error
    ++ * - ULHR_TOO_LONG if the header was too long
    +  *
    +  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
    +  * "hdrbuf" argument is non-NULL. This is intended for use with
    +  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
    +  * reporting. The full header will be extracted to "hdrbuf" for use
    +- * with parse_loose_header().
    ++ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
    ++ * from this function to indicate that the header was too long.
    +  */
    + enum unpack_loose_header_result {
    + 	ULHR_OK,
    + 	ULHR_BAD,
    ++	ULHR_TOO_LONG,
    + };
    + enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
    + 						    unsigned char *map,
    +
      ## object-file.c ##
    -@@ object-file.c: int unpack_loose_header(git_zstream *stream,
    +@@ object-file.c: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
      	 * --allow-unknown-type".
      	 */
      	if (!header)
    --		return -1;
    -+		return -2;
    +-		return ULHR_BAD;
    ++		return ULHR_TOO_LONG;
      
      	/*
      	 * buffer[0..bufsiz] was not large enough.  Copy the partial
    -@@ object-file.c: int unpack_loose_header(git_zstream *stream,
    +@@ object-file.c: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
      		stream->next_out = buffer;
      		stream->avail_out = bufsiz;
      	} while (status != Z_STREAM_END);
    --	return -1;
    -+	return -2;
    +-	return ULHR_BAD;
    ++	return ULHR_TOO_LONG;
      }
      
      static void *unpack_loose_rest(git_zstream *stream,
     @@ object-file.c: static int loose_object_info(struct repository *r,
    - 
    - 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
    - 				      allow_unknown ? &hdrbuf : NULL);
    --	if (hdr_ret < 0) {
    -+	switch (hdr_ret) {
    -+	case 0:
    -+		break;
    -+	case -1:
      		status = error(_("unable to unpack %s header"),
      			       oid_to_hex(oid));
    -+		break;
    -+	case -2:
    + 		break;
    ++	case ULHR_TOO_LONG:
     +		status = error(_("header for %s too long, exceeds %d bytes"),
     +			       oid_to_hex(oid), MAX_HEADER_LEN);
     +		break;
    -+	default:
    -+		BUG("unknown hdr_ret value %d", hdr_ret);
      	}
    - 	if (!status) {
    - 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
    + 
    + 	if (status < 0) {
     
    - ## object-store.h ##
    -@@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
    -  * unpack_loose_header() initializes the data stream needed to unpack
    -  * a loose object header.
    -  *
    -- * Returns 0 on success. Returns negative values on error.
    -+ * Returns 0 on success. Returns negative values on error. If the
    -+ * header exceeds MAX_HEADER_LEN -2 will be returned.
    -  *
    -  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
    -  * "hdrbuf" argument is non-NULL. This is intended for use with
    -  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
    -  * reporting. The full header will be extracted to "hdrbuf" for use
    -- * with parse_loose_header().
    -+ * with parse_loose_header(), -2 will still be returned from this
    -+ * function to indicate that the header was too long.
    -  */
    - int unpack_loose_header(git_zstream *stream, unsigned char *map,
    - 			unsigned long mapsize, void *buffer,
    + ## streaming.c ##
    +@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
    + 	case ULHR_OK:
    + 		break;
    + 	case ULHR_BAD:
    ++	case ULHR_TOO_LONG:
    + 		goto error;
    + 	}
    + 	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
     
      ## t/t1006-cat-file.sh ##
    -@@ t/t1006-cat-file.sh: bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
    - 
    - test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
    - 	cat >err.expect <<-EOF &&
    --	error: unable to unpack $bogus_sha1 header
    -+	error: header for $bogus_sha1 too long, exceeds 32 bytes
    - 	fatal: git cat-file: could not get object info
    - 	EOF
    - 
    +@@ t/t1006-cat-file.sh: do
    + 			if test "$arg2" = "-p"
    + 			then
    + 				cat >expect <<-EOF
    +-				error: unable to unpack $bogus_long_sha1 header
    ++				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
    + 				fatal: Not a valid object name $bogus_long_sha1
    + 				EOF
    + 			else
    + 				cat >expect <<-EOF
    +-				error: unable to unpack $bogus_long_sha1 header
    ++				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
    + 				fatal: git cat-file: could not get object info
    + 				EOF
    + 			fi &&
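To make the new ULHR_TOO_LONG case concrete: when no NUL shows up in the first MAX_HEADER_LEN bytes (32, per the "exceeds 32 bytes" message in the test above) and no "hdrbuf" was passed, the header is reported as too long rather than as a generic failure. The sketch below is a hypothetical, non-incremental approximation. classify_header() and its "inflate_ok" flag (standing in for zlib returning >= Z_OK) are made up. Git's real function inflates incrementally and can copy an over-long header into a strbuf under --allow-unknown-type.

```c
#include <assert.h>
#include <string.h>

/* Assumed to match git's MAX_HEADER_LEN (see "exceeds 32 bytes" above). */
#define SKETCH_MAX_HEADER_LEN 32

enum unpack_loose_header_result {
	ULHR_OK,
	ULHR_BAD,
	ULHR_TOO_LONG,
};

/*
 * Hypothetical helper: classify the bytes produced by inflating the
 * start of a loose object. "inflate_ok" stands in for zlib having
 * returned >= Z_OK.
 */
static enum unpack_loose_header_result
classify_header(int inflate_ok, const char *out, size_t out_len)
{
	if (!inflate_ok)
		return ULHR_BAD;

	/* Only the first MAX_HEADER_LEN bytes are searched. */
	if (out_len > SKETCH_MAX_HEADER_LEN)
		out_len = SKETCH_MAX_HEADER_LEN;

	/* A complete header is terminated by a NUL within that window. */
	if (memchr(out, '\0', out_len))
		return ULHR_OK;
	return ULHR_TOO_LONG;
}
```

A caller can then switch on the three values, e.g. emitting "header for <OID> too long, exceeds 32 bytes" for ULHR_TOO_LONG as loose_object_info() does in the patch.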
14:  7c9819d37c5 ! 15:  1ca875395c1 object-file.c: stop dying in parse_loose_header()
    @@ Metadata
      ## Commit message ##
         object-file.c: stop dying in parse_loose_header()
     
    -    Start the libification of parse_loose_header() by making it return
    -    error codes and data instead of invoking die() by itself. For now
    -    we'll move the relevant die() call to loose_object_info() and
    -    read_loose_object() to keep this change smaller, but in subsequent
    -    commits we'll also libify those.
    +    Make parse_loose_header() return error codes and data instead of
    +    invoking die() by itself.
     
    -    Since the refactoring of parse_loose_header_extended() into
    -    parse_loose_header() in an earlier commit, its interface accepts a
    -    "unsigned long *sizep". Rather it accepts a "struct object_info *",
    -    that structure will be populated with information about the object.
    +    For now we'll move the relevant die() call to loose_object_info() and
    +    read_loose_object() to keep this change smaller. In a subsequent
    +    commit we'll make read_loose_object() return an error code instead of
    +    dying. We should also address the "allow_unknown" case (should be
    +    moved to builtin/cat-file.c), but for now I'll be leaving it.
     
    -    It thus makes sense to further libify the interface so that it stops
    -    calling die() when it encounters OBJ_BAD, and instead rely on its
    -    callers to check the populated "oi->typep".
    +    For making parse_loose_header() not die() change its prototype to
    +    accept a "struct object_info *" instead of the "unsigned long *sizep"
    +    it accepted before. Its callers can now check the populated
    +    "oi->typep".
     
         Because of this we don't need to pass in the "unsigned int flags"
         which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
    @@ Commit message
         variable. In some cases we set it to the return value of "error()",
         i.e. -1, and later checked if "status < 0" was true.
     
    -    In another case added in c84a1f3ed4d (sha1_file: refactor read_object,
    -    2017-06-21) (but the behavior pre-dated that) we did checks of "status
    -    >= 0", because at that point "status" had become the return value of
    -    parse_loose_header(). I.e. a non-negative "enum object_type" (unless
    -    we -1, aka. OBJ_BAD).
    +    Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
    +    objects, 2017-04-01) the return value of loose_object_info() (then
    +    named sha1_loose_object_info()) had been a "status" variable that could be
    +    any negative value, as we were expecting to return the "enum
    +    object_type".
     
    -    Now that parse_loose_header() will return 0 on success instead of the
    +    The only negative type happens to be OBJ_BAD, but the code still
    +    assumed that more might be added. This was then used later in
    +    e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
    +    that parse_loose_header() will return 0 on success instead of the
         type (which it'll stick into the "struct object_info") we don't need
         to conflate these two cases in its callers.
     
    +    Since parse_loose_header() doesn't need to return an arbitrary
    +    "status" we only need to treat its "ret < 0" specially, but can
    +    idiomatically overwrite it with our own error() return. This along
    +    with having made unpack_loose_header() return an "enum
    +    unpack_loose_header_result" in an earlier commit means that we can
    +    move the previously nested if/else cases mostly into the "ULHR_OK"
    +    branch of the "switch" statement.
    +
    +    We should be less silent if we reach that "status = -1" branch, which
    +    happens if we've got trailing garbage in loose objects, see
    +    f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
    +    for a better way to handle it. For now let's punt on it, a subsequent
    +    commit will address that edge case.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## cache.h ##
    +@@ cache.h: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
    + 						    unsigned long bufsiz,
    + 						    struct strbuf *hdrbuf);
    + 
    ++/**
    ++ * parse_loose_header() parses the starting "<type> <len>\0" of an
    ++ * object. If it doesn't follow that format -1 is returned. To check
    ++ * the validity of the <type> populate the "typep" in the "struct
    ++ * object_info". It will be OBJ_BAD if the object type is unknown. The
    ++ * parsed <len> can be retrieved via "oi->sizep", and from there
    ++ * passed to unpack_loose_rest().
    ++ */
    + struct object_info;
    +-int parse_loose_header(const char *hdr, struct object_info *oi,
    +-		       unsigned int flags);
    ++int parse_loose_header(const char *hdr, struct object_info *oi);
    + 
    + int check_object_signature(struct repository *r, const struct object_id *oid,
    + 			   void *buf, unsigned long size, const char *type);
    +
      ## object-file.c ##
     @@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
       * too permissive for what we want to check. So do an anal
       * object header parse by hand.
       */
    --int parse_loose_header(const char *hdr,
    --		       struct object_info *oi,
    +-int parse_loose_header(const char *hdr, struct object_info *oi,
     -		       unsigned int flags)
     +int parse_loose_header(const char *hdr, struct object_info *oi)
      {
      	const char *type_buf = hdr;
      	unsigned long size;
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    +@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi,
      	type = type_from_string_gently(type_buf, type_len, 1);
      	if (oi->type_name)
      		strbuf_add(oi->type_name, type_buf, type_len);
    @@ object-file.c: int parse_loose_header(const char *hdr,
      	if (oi->typep)
      		*oi->typep = type;
      
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	if (*hdr)
    - 		return -1;
    - 
    --	return type;
    +@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi,
    + 	/*
    + 	 * The length must be followed by a zero byte
    + 	 */
    +-	return *hdr ? -1 : type;
    ++	if (*hdr)
    ++		return -1;
    ++
     +	/*
     +	 * The format is valid, but the type may still be bogus. The
     +	 * Caller needs to check its oi->typep.
    @@ object-file.c: static int loose_object_info(struct repository *r,
      	struct strbuf hdrbuf = STRBUF_INIT;
      	unsigned long size_scratch;
     +	enum object_type type_scratch;
    -+	int parsed_header = 0;
      	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
      
      	if (oi->delta_base_oid)
    @@ object-file.c: static int loose_object_info(struct repository *r,
      	if (oi->disk_sizep)
      		*oi->disk_sizep = mapsize;
     @@ object-file.c: static int loose_object_info(struct repository *r,
    + 	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
    + 				    allow_unknown ? &hdrbuf : NULL)) {
    + 	case ULHR_OK:
    ++		if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
    ++			status = error(_("unable to parse %s header"), oid_to_hex(oid));
    ++		else if (!allow_unknown && *oi->typep < 0)
    ++			die(_("invalid object type"));
    ++
    ++		if (!oi->contentp)
    ++			break;
    ++		*oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
    ++		if (*oi->contentp)
    ++			goto cleanup;
    ++
    ++		status = -1;
    + 		break;
    + 	case ULHR_BAD:
      		status = error(_("unable to unpack %s header"),
    - 			       oid_to_hex(oid));
    +@@ object-file.c: static int loose_object_info(struct repository *r,
    + 		break;
      	}
    --
    + 
     -	if (status < 0) {
     -		/* Do nothing */
     -	} else if (hdrbuf.len) {
     -		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
     -			status = error(_("unable to parse %s header with --allow-unknown-type"),
     -				       oid_to_hex(oid));
    --	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
    +-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
     -		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    -+	if (!status) {
    -+		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
    -+			/*
    -+			 * oi->{sizep,typep} are meaningless unless
    -+			 * parse_loose_header() returns >= 0.
    -+			 */
    -+			parsed_header = 1;
    -+		else
    -+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
    - 	}
    -+	if (!allow_unknown && parsed_header && *oi->typep < 0)
    -+		die(_("invalid object type"));
    - 
    +-
     -	if (status >= 0 && oi->contentp) {
    -+	if (parsed_header && oi->contentp) {
    - 		*oi->contentp = unpack_loose_rest(&stream, hdr,
    - 						  *oi->sizep, oid);
    - 		if (!*oi->contentp) {
    -@@ object-file.c: static int loose_object_info(struct repository *r,
    +-		*oi->contentp = unpack_loose_rest(&stream, hdr,
    +-						  *oi->sizep, oid);
    +-		if (!*oi->contentp) {
    +-			git_inflate_end(&stream);
    +-			status = -1;
    +-		}
    +-	} else
    +-		git_inflate_end(&stream);
    +-
    ++	git_inflate_end(&stream);
    ++cleanup:
    + 	munmap(map, mapsize);
      	if (oi->sizep == &size_scratch)
      		oi->sizep = NULL;
      	strbuf_release(&hdrbuf);
     +	if (oi->typep == &type_scratch)
     +		oi->typep = NULL;
      	oi->whence = OI_LOOSE;
    - 	return (status < 0) ? status : 0;
    +-	return (status < 0) ? status : 0;
    ++	return status;
      }
    + 
    + int obj_read_use_lock = 0;
     @@ object-file.c: int read_loose_object(const char *path,
      	git_zstream stream;
      	char hdr[MAX_HEADER_LEN];
    @@ object-file.c: int read_loose_object(const char *path,
      	if (*type == OBJ_BLOB && *size > big_file_threshold) {
      		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
     
    - ## object-store.h ##
    -@@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
    - int unpack_loose_header(git_zstream *stream, unsigned char *map,
    - 			unsigned long mapsize, void *buffer,
    - 			unsigned long bufsiz, struct strbuf *hdrbuf);
    --int parse_loose_header(const char *hdr, struct object_info *oi,
    --		       unsigned int flags);
    -+
    -+/**
    -+ * parse_loose_header() parses the starting "<type> <len>\0" of an
    -+ * object. If it doesn't follow that format -1 is returned. To check
    -+ * the validity of the <type> populate the "typep" in the "struct
    -+ * object_info". It will be OBJ_BAD if the object type is unknown. The
    -+ * parsed <len> can be retrieved via "oi->sizep", and from there
    -+ * passed to unpack_loose_rest().
    -+ */
    -+int parse_loose_header(const char *hdr, struct object_info *oi);
    -+
    - int check_object_signature(struct repository *r, const struct object_id *oid,
    - 			   void *buf, unsigned long size, const char *type);
    - int finalize_object_file(const char *tmpfile, const char *filename);
    -
      ## streaming.c ##
     @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
      {
    @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct reposi
      	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
      	if (!st->u.loose.mapped)
     @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
    - 				 st->u.loose.hdr,
    - 				 sizeof(st->u.loose.hdr),
    - 				 NULL) < 0) ||
    --	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
    -+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
    -+	    *type < 0) {
    - 		git_inflate_end(&st->z);
    - 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
    - 		return -1;
    + 	case ULHR_TOO_LONG:
    + 		goto error;
    + 	}
    +-	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
    ++	if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
    + 		goto error;
    + 
    + 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
15:  3fb660ff944 <  -:  ----------- object-file.c: guard against future bugs in loose_object_info()
20:  3bf3cf2299d <  -:  ----------- object-store.h: move read_loose_object() below 'struct object_info'
21:  974f650cddf ! 16:  d38067feab3 fsck: report invalid types recorded in objects
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck: report invalid types recorded in objects
    +    fsck: don't hard die on invalid object types
     
    -    Continue the work in the preceding commit and improve the error on:
    +    Change the error fsck emits on invalid object types, such as:
     
             $ git hash-object --stdin -w -t garbage --literally </dev/null
    +        <OID>
    +
    +    From the very ungraceful error of:
    +
             $ git fsck
    -        error: hash mismatch for <OID_PATH> (expected <OID>)
    -        error: <OID>: object corrupt or missing: <OID_PATH>
    -        [ other fsck output ]
    +        fatal: invalid object type
    +        $
     
    -    To instead emit:
    +    To:
     
             $ git fsck
             error: <OID>: object is of unknown type 'garbage': <OID_PATH>
             [ other fsck output ]
     
    -    The complaint about a "hash mismatch" was simply an emergent property
    -    of how we'd fall though from read_loose_object() into fsck_loose()
    -    when we didn't get the data we expected. Now we'll correctly note that
    -    the object type is invalid.
    +    We'll still exit with non-zero, but now we'll finish the rest of the
    +    traversal. The tests that's being added here asserts that we'll still
    +    complain about other fsck issues (e.g. an unrelated dangling blob).
    +
    +    To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
    +    flag from read_loose_object() through to parse_loose_header(). Since
    +    the read_loose_object() function is only used in builtin/fsck.c we can
    +    simply change it to accept a "struct object_info" (which contains the
    +    OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
    +    f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
    +    for the introduction of read_loose_object().
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
      	unsigned long size;
      	void *contents;
      	int eaten;
    --
    --	if (read_loose_object(path, oid, &type, &size, &contents,
    --			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
    --		errors_found |= ERROR_OBJECT;
     +	struct strbuf sb = STRBUF_INIT;
    -+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
    -+	struct object_info oi;
    -+	int found = 0;
    ++	struct object_info oi = OBJECT_INFO_INIT;
    ++	int err = 0;
    + 
    +-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
     +	oi.type_name = &sb;
     +	oi.sizep = &size;
     +	oi.typep = &type;
     +
    -+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
    -+		found |= ERROR_OBJECT;
    - 		error(_("%s: object corrupt or missing: %s"),
    - 		      oid_to_hex(oid), path);
    -+	}
    -+	if (type < 0) {
    -+		found |= ERROR_OBJECT;
    -+		error(_("%s: object is of unknown type '%s': %s"),
    -+		      oid_to_hex(oid), sb.buf, path);
    -+	}
    -+	if (found) {
    -+		errors_found |= ERROR_OBJECT;
    ++	if (read_loose_object(path, oid, &contents, &oi) < 0)
    ++		err = error(_("%s: object corrupt or missing: %s"),
    ++			    oid_to_hex(oid), path);
    ++	if (type < 0)
    ++		err = error(_("%s: object is of unknown type '%s': %s"),
    ++			    oid_to_hex(oid), sb.buf, path);
    ++	if (err) {
    + 		errors_found |= ERROR_OBJECT;
    +-		error(_("%s: object corrupt or missing: %s"),
    +-		      oid_to_hex(oid), path);
      		return 0; /* keep checking other objects */
      	}
      
    @@ object-file.c: static int check_stream_oid(git_zstream *stream,
      		      const struct object_id *expected_oid,
     -		      enum object_type *type,
     -		      unsigned long *size,
    - 		      void **contents,
    -+		      struct object_info *oi,
    - 		      unsigned int oi_flags)
    +-		      void **contents)
    ++		      void **contents,
    ++		      struct object_info *oi)
      {
      	int ret = -1;
    -@@ object-file.c: int read_loose_object(const char *path,
    + 	void *map = NULL;
      	unsigned long mapsize;
      	git_zstream stream;
      	char hdr[MAX_HEADER_LEN];
     -	struct object_info oi = OBJECT_INFO_INIT;
    - 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
     -	oi.typep = type;
     -	oi.sizep = size;
    -+	enum object_type *type = oi->typep;
     +	unsigned long *size = oi->sizep;
      
      	*contents = NULL;
    @@ object-file.c: int read_loose_object(const char *path,
      		error(_("unable to parse header of %s"), path);
      		git_inflate_end(&stream);
      		goto out;
    + 	}
    +-	if (*type < 0)
    +-		die(_("invalid object type"));
    + 
    +-	if (*type == OBJ_BLOB && *size > big_file_threshold) {
    ++	if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
    + 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
    + 			goto out;
    + 	} else {
     @@ object-file.c: int read_loose_object(const char *path,
      			goto out;
      		}
    @@ object-file.c: int read_loose_object(const char *path,
      			free(*contents);
     
      ## object-store.h ##
    -@@ object-store.h: int oid_object_info_extended(struct repository *r,
    +@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
      
      /*
       * Open the loose object at path, check its hash, and return the contents,
    @@ object-store.h: int oid_object_info_extended(struct repository *r,
       * type, and size. If the object is a blob, then "contents" may return NULL,
       * to allow streaming of large blobs.
       *
    -@@ object-store.h: int oid_object_info_extended(struct repository *r,
    +@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
       */
      int read_loose_object(const char *path,
      		      const struct object_id *expected_oid,
     -		      enum object_type *type,
     -		      unsigned long *size,
    - 		      void **contents,
    -+		      struct object_info *oi,
    - 		      unsigned int oi_flags);
    +-		      void **contents);
    ++		      void **contents,
    ++		      struct object_info *oi);
      
    - /*
    + /* Retry packed storage after checking packed and loose storage */
    + #define HAS_OBJECT_RECHECK_PACKED 1
     
      ## t/t1450-fsck.sh ##
    -@@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
    - 	)
    - '
    +@@ t/t1450-fsck.sh: test_expect_success 'object with hash and type mismatch' '
    + 		cmt=$(echo bogus | git commit-tree $tree) &&
    + 		git update-ref refs/heads/bogus $cmt &&
      
    -+test_expect_success 'object with hash and type mismatch' '
    -+	git init --bare hash-type-mismatch &&
    -+	(
    -+		cd hash-type-mismatch &&
    -+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
    -+		old=$(test_oid_to_path "$oid") &&
    -+		new=$(dirname $old)/$(test_oid ff_2) &&
    -+		oid="$(dirname $new)$(basename $new)" &&
    -+		mv objects/$old objects/$new &&
    -+		git update-index --add --cacheinfo 100644 $oid foo &&
    -+		tree=$(git write-tree) &&
    -+		cmt=$(echo bogus | git commit-tree $tree) &&
    -+		git update-ref refs/heads/bogus $cmt &&
    +-		cat >expect <<-\EOF &&
    +-		fatal: invalid object type
    +-		EOF
    +-		test_must_fail git fsck 2>actual &&
    +-		test_cmp expect actual
    ++
     +		test_must_fail git fsck 2>out &&
     +		grep "^error: hash mismatch for " out &&
     +		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
    -+	)
    -+'
    -+
    - test_expect_success 'branch pointing to non-commit' '
    - 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
    - 	test_when_finished "git update-ref -d refs/heads/invalid" &&
    -@@ t/t1450-fsck.sh: test_expect_success 'fsck error and recovery on invalid object type' '
    - 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
    - 	test_must_fail git -C garbage-type fsck >out 2>err &&
    - 	grep -e "^error" -e "^fatal" err >errors &&
    --	test_line_count = 2 errors &&
    --	grep "error: hash mismatch for" err &&
    --	grep "$garbage_blob: object corrupt or missing:" err &&
    -+	test_line_count = 1 errors &&
    -+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
    - 	grep "dangling blob $empty_blob" out
    + 	)
    + '
    + 
    +@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
    + 	test_i18ngrep "bad index file" errors
    + '
    + 
    +-test_expect_success 'fsck hard errors on an invalid object type' '
    ++test_expect_success 'fsck error and recovery on invalid object type' '
    + 	git init --bare garbage-type &&
    + 	(
    + 		cd garbage-type &&
    +@@ t/t1450-fsck.sh: test_expect_success 'fsck hard errors on an invalid object type' '
    + 		fatal: invalid object type
    + 		EOF
    + 		test_must_fail git fsck >out 2>err &&
    +-		test_cmp err.expect err &&
    +-		test_must_be_empty out
    ++		grep -e "^error" -e "^fatal" err >errors &&
    ++		test_line_count = 1 errors &&
    ++		grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
    ++		grep "dangling blob $empty_blob" out
    + 	)
      '
      
22:  804673a17b0 ! 17:  b07e892fc19 fsck: report invalid object type-path combinations
    @@ Commit message
         value for checking whether we got far enough to be certain that the
         issue was indeed this OID mismatch.
     
    -    In the case of check_object_signature() I don't really trust all the
    -    moving parts there to behave consistently, in the face of future
    -    refactorings. Getting it wrong would mean that we'd potentially emit
    -    no error at all on a failing check_object_signature(), or worse
    -    misreport whatever issue we encountered. So let's use the new bug()
    -    function to ferry and return code up to fsck_loose() in that case.
    +    We need to add the "object corrupt or missing" special-case to deal
    +    with cases where read_loose_object() will return an error before
    +    completing check_object_signature(), e.g. if we have an error in
    +    unpack_loose_rest() because we find garbage after the valid gzip
    +    content:
    +
    +        $ git hash-object --stdin -w -t blob </dev/null
    +        e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    +        $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    +        $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    +        $ git fsck
    +        error: garbage at end of loose object 'e69d[...]'
    +        error: unable to unpack contents of ./objects/e6/9d[...]
    +        error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]
    +
    +    There is currently some weird messaging in the edge case when the two
    +    are combined, i.e. because we're not explicitly passing along an error
    +    state about this specific scenario from check_stream_oid() via
    +    read_loose_object() we'll end up printing the null OID if an object is
    +    of an unknown type *and* it can't be unpacked by zlib, e.g.:
    +
    +        $ git hash-object --stdin -w -t garbage --literally </dev/null
    +        8315a83d2acc4c174aed59430f9a9c4ed926440f
    +        $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    +        $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    +        $ /usr/bin/git fsck
    +        fatal: invalid object type
    +        $ ~/g/git/git fsck
    +        error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
    +        error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    +        error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    +        error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    +        [...]
    +
    +    I think it's OK to leave that for future improvements, which would
    +    involve enum-ifying more error state as we've done with "enum
    +    unpack_loose_header_result" in preceding commits. In these
    +    increasingly more obscure cases the worst that can happen is that
    +    we'll get slightly nonsensical or inapplicable error messages.
    +
    +    There's other such potential edge cases, all of which might produce
    +    some confusing messaging, but still be handled correctly as far as
    +    passing along errors goes. E.g. if check_object_signature() returns
    +    and oideq(real_oid, null_oid()) is true, which could happen if it
    +    returns -1 due to the read_istream() call having failed.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ builtin/fast-export.c: static void export_blob(const struct object_id *oid)
     
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    + 	struct object *obj;
    + 	enum object_type type;
    + 	unsigned long size;
    +-	void *contents;
    ++	unsigned char *contents = NULL;
    + 	int eaten;
      	struct strbuf sb = STRBUF_INIT;
    - 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
    - 	struct object_info oi;
    + 	struct object_info oi = OBJECT_INFO_INIT;
    +-	int err = 0;
     +	struct object_id real_oid = *null_oid();
    - 	int found = 0;
    ++	int ret;
    + 
      	oi.type_name = &sb;
      	oi.sizep = &size;
      	oi.typep = &type;
      
    --	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
    -+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
    - 		found |= ERROR_OBJECT;
    --		error(_("%s: object corrupt or missing: %s"),
    --		      oid_to_hex(oid), path);
    -+		if (!oideq(&real_oid, oid))
    +-	if (read_loose_object(path, oid, &contents, &oi) < 0)
    +-		err = error(_("%s: object corrupt or missing: %s"),
    +-			    oid_to_hex(oid), path);
    ++	ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
    ++	if (ret < 0) {
    ++		if (contents && !oideq(&real_oid, oid))
     +			error(_("%s: hash-path mismatch, found at: %s"),
     +			      oid_to_hex(&real_oid), path);
     +		else
     +			error(_("%s: object corrupt or missing: %s"),
     +			      oid_to_hex(oid), path);
    - 	}
    - 	if (type < 0) {
    - 		found |= ERROR_OBJECT;
    - 		error(_("%s: object is of unknown type '%s': %s"),
    --		      oid_to_hex(oid), sb.buf, path);
    -+		      oid_to_hex(&real_oid), sb.buf, path);
    - 	}
    - 	if (found) {
    ++	}
    + 	if (type < 0)
    +-		err = error(_("%s: object is of unknown type '%s': %s"),
    +-			    oid_to_hex(oid), sb.buf, path);
    +-	if (err) {
    ++		ret = error(_("%s: object is of unknown type '%s': %s"),
    ++			    oid_to_hex(&real_oid), sb.buf, path);
    ++	if (ret < 0) {
      		errors_found |= ERROR_OBJECT;
    + 		return 0; /* keep checking other objects */
    + 	}
     
      ## builtin/index-pack.c ##
     @@ builtin/index-pack.c: static void fix_unresolved_deltas(struct hashfile *f)
    @@ builtin/mktag.c: static int verify_object_in_tag(struct object_id *tagged_oid, i
      
      	return ret;
     
    + ## cache.h ##
    +@@ cache.h: struct object_info;
    + int parse_loose_header(const char *hdr, struct object_info *oi);
    + 
    + int check_object_signature(struct repository *r, const struct object_id *oid,
    +-			   void *buf, unsigned long size, const char *type);
    ++			   void *buf, unsigned long size, const char *type,
    ++			   struct object_id *real_oidp);
    + 
    + int finalize_object_file(const char *tmpfile, const char *filename);
    + 
    +
      ## object-file.c ##
     @@ object-file.c: void *xmmap(void *start, size_t length,
       * the streaming interface and rehash it to do the same.
    @@ object-file.c: static int check_stream_oid(git_zstream *stream,
      		      const struct object_id *expected_oid,
     +		      struct object_id *real_oid,
      		      void **contents,
    - 		      struct object_info *oi,
    - 		      unsigned int oi_flags)
    + 		      struct object_info *oi)
    + {
    +@@ object-file.c: int read_loose_object(const char *path,
    + 	char hdr[MAX_HEADER_LEN];
    + 	unsigned long *size = oi->sizep;
    + 
    +-	*contents = NULL;
    +-
    + 	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
    + 	if (!map) {
    + 		error_errno(_("unable to mmap %s"), path);
     @@ object-file.c: int read_loose_object(const char *path,
      			goto out;
      		}
    @@ object-file.c: int read_loose_object(const char *path,
     -			error(_("hash mismatch for %s (expected %s)"), path,
     -			      oid_to_hex(expected_oid));
     +					   *contents, *size, oi->type_name->buf, real_oid)) {
    -+			if (oideq(real_oid, null_oid()))
    -+				BUG("should only get OID mismatch errors with mapped contents");
      			free(*contents);
      			goto out;
      		}
     
      ## object-store.h ##
    -@@ object-store.h: int oid_object_info_extended(struct repository *r,
    +@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
       */
      int read_loose_object(const char *path,
      		      const struct object_id *expected_oid,
     +		      struct object_id *real_oid,
      		      void **contents,
    - 		      struct object_info *oi,
    - 		      unsigned int oi_flags);
    -@@ object-store.h: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
    - int parse_loose_header(const char *hdr, struct object_info *oi);
    - 
    - int check_object_signature(struct repository *r, const struct object_id *oid,
    --			   void *buf, unsigned long size, const char *type);
    -+			   void *buf, unsigned long size, const char *type,
    -+			   struct object_id *real_oidp);
    - int finalize_object_file(const char *tmpfile, const char *filename);
    - int check_and_freshen_file(const char *fn, int freshen);
    + 		      struct object_info *oi);
      
     
      ## object.c ##
    @@ t/t1006-cat-file.sh: test_expect_success 'cat-file -t and -s on corrupt loose ob
     
      ## t/t1450-fsck.sh ##
     @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
    - 	(
      		cd hash-mismatch &&
    + 
      		oid=$(echo blob | git hash-object -w --stdin) &&
     +		oldoid=$oid &&
      		old=$(test_oid_to_path "$oid") &&
      		new=$(dirname $old)/$(test_oid ff_2) &&
      		oid="$(dirname $new)$(basename $new)" &&
     @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
    - 		cmt=$(echo bogus | git commit-tree $tree) &&
      		git update-ref refs/heads/bogus $cmt &&
    + 
      		test_must_fail git fsck 2>out &&
    --		test_i18ngrep "$oid.*corrupt" out
    +-		grep "$oid.*corrupt" out
     +		grep "$oldoid: hash-path mismatch, found at: .*$new" out
      	)
      '
      
     @@ t/t1450-fsck.sh: test_expect_success 'object with hash and type mismatch' '
    - 	(
      		cd hash-type-mismatch &&
    + 
      		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
     +		oldoid=$oid &&
      		old=$(test_oid_to_path "$oid") &&
      		new=$(dirname $old)/$(test_oid ff_2) &&
      		oid="$(dirname $new)$(basename $new)" &&
     @@ t/t1450-fsck.sh: test_expect_success 'object with hash and type mismatch' '
    - 		cmt=$(echo bogus | git commit-tree $tree) &&
    - 		git update-ref refs/heads/bogus $cmt &&
    + 
    + 
      		test_must_fail git fsck 2>out &&
     -		grep "^error: hash mismatch for " out &&
     -		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 01/17] fsck tests: add test for fsck-ing an unknown type
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.
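The unknown "garbage" type only trips fsck after the header itself has parsed. Per the parse_loose_header() doc comment in the range-diff above, a header has the form "<type> <len>\0": a malformed header yields -1, while an unknown <type> is reported through the type field instead. Below is a minimal standalone sketch of that contract, not git's object-file.c code. All names (sketch_parse_loose_header(), enum sketch_object_type) are hypothetical, the input is assumed to be a NUL-terminated C string, and the leading-zero and overflow checks are assumptions of the sketch.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's object types; BAD marks an unknown name. */
enum sketch_object_type {
	SKETCH_OBJ_BAD = -1,
	SKETCH_OBJ_COMMIT = 1,
	SKETCH_OBJ_TREE = 2,
	SKETCH_OBJ_BLOB = 3,
	SKETCH_OBJ_TAG = 4,
};

struct sketch_header_info {
	enum sketch_object_type type;
	unsigned long size;
};

/* Map a type name of the given length to a type, or SKETCH_OBJ_BAD. */
static enum sketch_object_type sketch_type_from_name(const char *name, size_t len)
{
	static const struct {
		const char *name;
		enum sketch_object_type type;
	} names[] = {
		{ "commit", SKETCH_OBJ_COMMIT },
		{ "tree", SKETCH_OBJ_TREE },
		{ "blob", SKETCH_OBJ_BLOB },
		{ "tag", SKETCH_OBJ_TAG },
	};

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i].name) == len && !memcmp(names[i].name, name, len))
			return names[i].type;
	return SKETCH_OBJ_BAD;
}

/*
 * Parse "<type> <len>\0". Returns -1 if the format is wrong, otherwise 0;
 * the caller must still check info->type for SKETCH_OBJ_BAD.
 */
static int sketch_parse_loose_header(const char *hdr, struct sketch_header_info *info)
{
	const char *type_buf = hdr;
	size_t type_len;
	unsigned long size = 0;

	/* The type name runs up to the first space. */
	while (*hdr && *hdr != ' ')
		hdr++;
	if (*hdr != ' ')
		return -1;
	type_len = (size_t)(hdr - type_buf);
	hdr++;

	/* The length is decimal and must start with a digit. */
	if (*hdr < '0' || *hdr > '9')
		return -1;
	/* A leading zero is only allowed for the length "0" itself. */
	if (*hdr == '0' && hdr[1] != '\0')
		return -1;
	while (*hdr >= '0' && *hdr <= '9') {
		unsigned long c = (unsigned long)(*hdr - '0');

		if (size > (ULONG_MAX - c) / 10)
			return -1; /* would overflow */
		size = size * 10 + c;
		hdr++;
	}

	/* The length must be followed by the terminating NUL. */
	if (*hdr)
		return -1;

	/* The format is valid; an unknown type name is left for the caller. */
	info->type = sketch_type_from_name(type_buf, type_len);
	info->size = size;
	return 0;
}
```

With this sketch, "garbage 0" parses successfully with the type set to SKETCH_OBJ_BAD, while "blob 05" and "blob 5x" are rejected as malformed.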

This behavior needs to be improved, which will be done in subsequent
patches, but for now let's test for the current behavior.
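The range-diff above shows where this is heading: instead of die()-ing with "invalid object type", fsck reports the bad object, sets ERROR_OBJECT in errors_found, and returns 0 to "keep checking other objects", so the command still exits non-zero but finishes the traversal. A minimal sketch of that report-and-continue loop follows, using hypothetical names (struct sketch_object, sketch_fsck_all(), SKETCH_ERROR_OBJECT) rather than builtin/fsck.c's real types.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for fsck's ERROR_OBJECT bit. */
#define SKETCH_ERROR_OBJECT 01

/* Hypothetical object record: just enough to decide whether the type is known. */
struct sketch_object {
	const char *oid_hex;
	const char *type_name;
	int type_is_known;
};

/* Report a problem with one object and return -1, but never exit. */
static int sketch_check_object(const struct sketch_object *obj)
{
	if (!obj->type_is_known) {
		fprintf(stderr, "error: %s: object is of unknown type '%s'\n",
			obj->oid_hex, obj->type_name);
		return -1;
	}
	return 0;
}

/*
 * Check every object, accumulating failures in a bitmask instead of
 * die()-ing on the first one. *checked counts how many objects were visited.
 */
static int sketch_fsck_all(const struct sketch_object *objs, size_t nr,
			   size_t *checked)
{
	int errors_found = 0;

	*checked = 0;
	for (size_t i = 0; i < nr; i++) {
		(*checked)++;
		if (sketch_check_object(&objs[i]) < 0)
			errors_found |= SKETCH_ERROR_OBJECT; /* keep checking other objects */
	}
	return errors_found;
}
```

A caller would exit non-zero whenever the returned mask is non-zero. Deferring that decision until the loop ends is what lets later checks (such as the "dangling blob" report in the updated test) still run after an unknown-type error.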

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..969bfbbdd8f 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	git init --bare garbage-type &&
+	(
+		cd garbage-type &&
+
+		empty=$(git hash-object --stdin -w -t blob </dev/null) &&
+		garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
+
+		cat >err.expect <<-\EOF &&
+		fatal: invalid object type
+		EOF
+		test_must_fail git fsck >out 2>err &&
+		test_cmp err.expect err &&
+		test_must_be_empty out
+	)
+'
+
 test_done
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 02/17] fsck tests: refactor one test to use a sub-repo
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so that we don't leave corrupt content behind for
the next test.

We can instead create a named sub-repository. Then we don't have to
worry about cleaning up after ourselves: nobody will care what state
the broken "hash-mismatch" repository is in after this test runs.

See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.

1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 969bfbbdd8f..f8edd15abf8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,25 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
+test_expect_success 'object with hash mismatch' '
+	git init --bare hash-mismatch &&
+	(
+		cd hash-mismatch &&
 
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+
+		test_must_fail git fsck 2>out &&
+		grep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 03/17] fsck tests: test current hash/type mismatch behavior
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
                                 ` (14 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

If we move an object with a garbage type around between
.git/objects/?? directories to simulate a hash mismatch, "git fsck"
will currently hard die() in object-file.c. This behavior will be
fixed in subsequent commits, but let's test for it as-is for now.
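
For reference: a loose object lives at objects/<first two hex
digits>/<remaining hex digits> of its ID. fsck re-hashes the content
and compares the result with the ID implied by that path, which is why
moving the file fakes a mismatch. Below is a minimal C sketch of the
path mapping, assuming a full hex object ID as input.
oid_hex_to_loose_path() is a hypothetical helper, not a function in
git. The test suite does the equivalent with test_oid_to_path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a hex object ID to its loose object path
 * relative to the objects directory, e.g. "e69de29b..." becomes
 * "e6/9de29b...". Writes the result into buf and returns 0, or returns
 * -1 if the ID is too short or buf is too small.
 */
static int oid_hex_to_loose_path(const char *hex, char *buf, size_t bufsz)
{
	size_t len;

	assert(hex != NULL && buf != NULL);
	len = strlen(hex);
	/* two digits, '/', the remaining digits, then the NUL */
	if (len < 3 || bufsz < len + 2)
		return -1;
	snprintf(buf, bufsz, "%.2s/%s", hex, hex + 2);
	return 0;
}
```

With the empty blob's SHA-1 ID this yields
"e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391".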

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f8edd15abf8..175ed304637 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,6 +69,30 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	git init --bare hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+
+		cat >expect <<-\EOF &&
+		fatal: invalid object type
+		EOF
+		test_must_fail git fsck 2>actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 04/17] fsck tests: test for garbage appended to a loose object
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (2 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
                                 ` (13 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

There weren't any output tests for this scenario; let's ensure that we
don't regress on it in the changes that come after this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 175ed304637..bd696d21dba 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -93,6 +93,26 @@ test_expect_success 'object with hash and type mismatch' '
 	)
 '
 
+test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
+	git init --bare corrupt-loose-output &&
+	(
+		cd corrupt-loose-output &&
+		oid=$(git hash-object -w --stdin --literally </dev/null) &&
+		oidf=objects/$(test_oid_to_path "$oid") &&
+		chmod 755 $oidf &&
+		echo extra garbage >>$oidf &&
+
+		cat >expect.error <<-EOF &&
+		error: garbage at end of loose object '\''$oid'\''
+		error: unable to unpack contents of ./$oidf
+		error: $oid: object corrupt or missing: ./$oidf
+		EOF
+		test_must_fail git fsck 2>actual &&
+		grep ^error: actual >error &&
+		test_cmp expect.error error
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 05/17] cat-file tests: move bogus_* variable declarations earlier
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (3 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
                                 ` (12 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Change the short/long bogus object type variables into a form where
the two sets can be used concurrently. This will be used by
subsequently added tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..ea6a53d425b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,36 +315,39 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
-bogus_type="bogus"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'setup bogus data' '
+	bogus_short_type="bogus" &&
+	bogus_short_content="bogus" &&
+	bogus_short_size=$(strlen "$bogus_short_content") &&
+	bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+
+	bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
+	bogus_long_content="bogus" &&
+	bogus_long_size=$(strlen "$bogus_long_content") &&
+	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+'
 
 test_expect_success "Type of broken object is correct" '
-	echo $bogus_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_short_type >expect &&
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of broken object is correct" '
-	echo $bogus_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_short_size >expect &&
+	git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
 	test_cmp expect actual
 '
-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
 test_expect_success "Type of broken object is correct when type is large" '
-	echo $bogus_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_long_type >expect &&
+	git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of large broken object is correct when type is large" '
-	echo $bogus_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_long_size >expect &&
+	git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
 	test_cmp expect actual
 '
 
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (4 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-21  3:30                 ` Taylor Blau
  2021-09-20 19:04               ` [PATCH v7 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                                 ` (11 subsequent siblings)
  17 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

When we look up a missing object with cat_one_file(), the error we
print out currently depends on whether we error out early in
get_oid_with_context() or get an error later from
oid_object_info_extended().

The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
not.

The "-p" flag is yet another special case: for the deadbeef OID it
prints the same output as we'd emit for the deadbeef_short OID with
the "-s" and "-t" options. It also doesn't support the
"--allow-unknown-type" flag at all.

Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.

This extends tests added in 3e370f9faf0 (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/oid-info/oid      |  2 ++
 t/t1006-cat-file.sh | 75 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/t/oid-info/oid b/t/oid-info/oid
index a754970523c..ecffa9045f9 100644
--- a/t/oid-info/oid
+++ b/t/oid-info/oid
@@ -27,3 +27,5 @@ numeric		sha1:0123456789012345678901234567890123456789
 numeric		sha256:0123456789012345678901234567890123456789012345678901234567890123
 deadbeef	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
 deadbeef	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef_short	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
+deadbeef_short	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ea6a53d425b..af59613250b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
 	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
 '
 
+for arg1 in '' --allow-unknown-type
+do
+	for arg2 in -s -t -p
+	do
+		if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
+		then
+			continue
+		fi
+
+
+		test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
+			cat >expect <<-\EOF &&
+			fatal: invalid object type
+			EOF
+
+			if test "$arg1" = "--allow-unknown-type"
+			then
+				git cat-file $arg1 $arg2 $bogus_short_sha1
+			else
+				test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+				test_must_be_empty out &&
+				test_cmp expect actual
+			fi
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
+			if test "$arg2" = "-p"
+			then
+				cat >expect <<-EOF
+				error: unable to unpack $bogus_long_sha1 header
+				fatal: Not a valid object name $bogus_long_sha1
+				EOF
+			else
+				cat >expect <<-EOF
+				error: unable to unpack $bogus_long_sha1 header
+				fatal: git cat-file: could not get object info
+				EOF
+			fi &&
+
+			if test "$arg1" = "--allow-unknown-type"
+			then
+				git cat-file $arg1 $arg2 $bogus_long_sha1
+			else
+				test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+				test_must_be_empty out &&
+				test_cmp expect actual
+			fi
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
+			cat >expect.err <<-EOF &&
+			fatal: Not a valid object name $(test_oid deadbeef_short)
+			EOF
+			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
+			test_must_be_empty out
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
+			if test "$arg2" = "-p"
+			then
+				cat >expect.err <<-EOF
+				fatal: Not a valid object name $(test_oid deadbeef)
+				EOF
+			else
+				cat >expect.err <<-\EOF
+				fatal: git cat-file: could not get object info
+				EOF
+			fi &&
+			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
+			test_must_be_empty out &&
+			test_cmp expect.err err.actual
+		'
+	done
+done
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 07/17] cat-file tests: add corrupt loose object test
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (5 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                                 ` (10 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index af59613250b..8bbc34efb0c 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -426,6 +426,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.33.0.1098.g29a6526ae47


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v7 08/17] cat-file tests: test for current --allow-unknown-type behavior
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (6 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
                                 ` (9 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1], I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8bbc34efb0c..269ab7e4729 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -402,6 +402,67 @@ do
 	done
 done
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_short_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_short_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_short_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_short_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	test_when_finished "rm -rf .git/refs/replace" &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_short_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 09/17] object-file.c: don't set "typep" when returning non-zero
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (7 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                                 ` (8 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error, stop faking
up the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
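
To make the resulting contract concrete, here's a minimal sketch with
hypothetical toy types, not git's real enum object_type or struct
object_info. On failure the extended-style lookup returns -1 and
leaves *typep alone. An oid_object_info()-style wrapper then maps that
failure to OBJ_BAD itself, so its callers never see a stale type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; git's real enum uses the same values for these. */
enum toy_object_type {
	TOY_OBJ_BAD = -1,
	TOY_OBJ_NONE = 0,
	TOY_OBJ_BLOB = 3
};

struct toy_object_info {
	enum toy_object_type *typep; /* optional out-parameter */
};

/*
 * Extended-style lookup: on failure return -1 and leave *typep untouched
 * (no more faking it up to OBJ_BAD); callers must check the return value.
 */
static int toy_object_info_extended(int corrupt, struct toy_object_info *oi)
{
	assert(oi != NULL);
	if (corrupt)
		return -1;
	if (oi->typep)
		*oi->typep = TOY_OBJ_BLOB;
	return 0;
}

/* Simple-style wrapper: returns the type, or TOY_OBJ_BAD on failure. */
static enum toy_object_type toy_object_info_simple(int corrupt)
{
	enum toy_object_type type = TOY_OBJ_NONE;
	struct toy_object_info oi = { &type };

	if (toy_object_info_extended(corrupt, &oi) < 0)
		return TOY_OBJ_BAD;
	return type;
}
```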

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index a8be8994814..bda3497d5ca 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1503,8 +1503,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 10/17] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (8 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                                 ` (7 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.
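
Here's a minimal sketch of the resulting convention. It uses
hypothetical stand-ins for zlib's status codes, with the values zlib
assigns them (Z_OK is 0 and its error codes are negative). Any inflate
status below Z_OK collapses to a plain -1, as does a header that isn't
NUL-terminated.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins with zlib's values for these status codes. */
#define TOY_Z_OK          0
#define TOY_Z_STREAM_END  1
#define TOY_Z_DATA_ERROR (-3)
#define TOY_Z_BUF_ERROR  (-5)

/*
 * After the change: any inflate status below Z_OK becomes a plain -1,
 * so callers can't be tempted to interpret zlib-specific codes.
 * "out" holds the len bytes inflated so far.
 */
static int toy_unpack_header(int inflate_status, const char *out, size_t len)
{
	assert(out != NULL);
	if (inflate_status < TOY_Z_OK)
		return -1; /* was: return inflate_status; */

	/* Make sure we have the terminating NUL */
	if (!memchr(out, '\0', len))
		return -1;
	return 0;
}
```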

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index bda3497d5ca..774ec8c866f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1262,7 +1262,7 @@ int unpack_loose_header(git_zstream *stream,
 					       buffer, bufsiz);
 
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/* Make sure we have the terminating NUL */
 	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 11/17] object-file.c: make parse_loose_header_extended() public
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (9 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
                                 ` (6 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper, with the public function taking over the
parse_loose_header() name. The only direct user of it outside of
object-file.c itself was in streaming.c; that caller can simply pass
the required "struct object_info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it will
leave the API in a better end state.

It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.
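
For illustration, here's a simplified C sketch of parsing a
"<type> <size>\0" loose object header into an object_info-style struct
with an optional size out-parameter. The struct, enum and function
names are hypothetical. Unlike the real parse_loose_header(), it has no
"flags" argument and simply rejects unknown types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for git's struct object_info. */
struct hdr_object_info {
	unsigned long *sizep; /* optional: receives the object size */
};

enum { HDR_OBJ_BAD = -1, HDR_OBJ_COMMIT = 1, HDR_OBJ_TREE = 2,
       HDR_OBJ_BLOB = 3, HDR_OBJ_TAG = 4 };

/* Returns the object type, or -1 for a malformed header. */
static int toy_parse_loose_header(const char *hdr, struct hdr_object_info *oi)
{
	static const char *const names[] = { NULL, "commit", "tree", "blob", "tag" };
	const char *sp;
	unsigned long size;
	int type = HDR_OBJ_BAD;
	size_t i;

	assert(hdr != NULL && oi != NULL);
	sp = strchr(hdr, ' ');
	if (!sp)
		return -1;
	for (i = 1; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == (size_t)(sp - hdr) &&
		    !memcmp(hdr, names[i], (size_t)(sp - hdr)))
			type = (int)i;
	if (type == HDR_OBJ_BAD)
		return -1;

	/* The size follows the space in canonical decimal ("010" is invalid). */
	hdr = sp + 1;
	size = (unsigned long)(*hdr++ - '0');
	if (size > 9)
		return -1;
	if (size) {
		for (;;) {
			unsigned long c = (unsigned long)(*hdr - '0');
			if (c > 9)
				break;
			hdr++;
			size = size * 10 + c; /* no overflow check in this sketch */
		}
	}

	/* ... and must be followed directly by the NUL. */
	if (*hdr)
		return -1;
	if (oi->sizep)
		*oi->sizep = size;
	return type;
}
```

A caller that only wants the type passes a struct whose sizep is NULL,
much like the callers in this patch set only oi.sizep.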

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       |  4 +++-
 object-file.c | 20 +++++++-------------
 streaming.c   |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index d23de693680..33cacbd22ac 100644
--- a/cache.h
+++ b/cache.h
@@ -1314,7 +1314,9 @@ char *xdg_cache_home(const char *filename);
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
 int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+struct object_info;
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 774ec8c866f..33a01ac203f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1363,8 +1363,8 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1424,14 +1424,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1486,10 +1478,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2573,6 +2565,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2587,7 +2581,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.33.0.1098.g29a6526ae47



* [PATCH v7 12/17] object-file.c: simplify unpack_loose_short_header()
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (10 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
                                 ` (5 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it, unpack_loose_header() and
unpack_loose_short_header(). We now have a single unpack_loose_header()
function which accepts an optional "struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be
simplified further. We're carrying some complexity just to be able to
emit a garbage type longer than MAX_HEADER_LEN; we could instead just
say "we found a garbage type <first 32 bytes>...". But let's leave the
current behavior in place for now.
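To illustrate the "optional out-parameter" shape of the combined function, here is a minimal sketch. It is not the code in object-file.c: zlib is left out, the input is a plain in-memory byte array, and "struct growbuf" and read_header() are hypothetical stand-ins for "struct strbuf" and unpack_loose_header(). A header that fits in the fixed buffer succeeds; a longer one fails unless the caller passes a growable buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's "struct strbuf": a growable buffer. */
struct growbuf {
	char *buf;
	size_t len;
};

static void growbuf_add(struct growbuf *sb, const char *data, size_t n)
{
	char *p = realloc(sb->buf, sb->len + n + 1);

	if (!p)
		abort();
	sb->buf = p;
	memcpy(sb->buf + sb->len, data, n);
	sb->len += n;
	sb->buf[sb->len] = '\0';
}

/*
 * Copy the NUL-terminated header at the start of "src" into "buffer"
 * (at most "bufsiz" bytes). If it does not fit, fail unless the
 * optional "hdrbuf" is non-NULL, in which case the whole header is
 * collected there. Returns 0 on success, -1 on error.
 */
static int read_header(const char *src, size_t srclen,
		       char *buffer, size_t bufsiz, struct growbuf *hdrbuf)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	const char *nul;

	memcpy(buffer, src, n);
	if (memchr(buffer, '\0', n))
		return 0; /* the whole header fit in the fixed buffer */

	if (!hdrbuf)
		return -1; /* header too long, and no place to put the rest */

	nul = memchr(src, '\0', srclen);
	if (!nul)
		return -1; /* no terminating NUL at all: corrupt input */
	growbuf_add(hdrbuf, src, (size_t)(nul - src));
	return 0;
}
```

Passing NULL keeps the common path cheap, while a caller such as "cat-file --allow-unknown-type" can opt in to receiving an over-long header.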

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 17 ++++++++++++++-
 object-file.c | 58 ++++++++++++++++++---------------------------------
 streaming.c   |  3 ++-
 3 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/cache.h b/cache.h
index 33cacbd22ac..9ad81d452ad 100644
--- a/cache.h
+++ b/cache.h
@@ -1313,7 +1313,22 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
+
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 33a01ac203f..8dd35f768bb 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,11 +1233,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1248,35 +1249,8 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
-	if (status < Z_OK)
-		return -1;
-
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
 	if (status < Z_OK)
 		return -1;
 
@@ -1286,6 +1260,14 @@ static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1435,6 +1417,7 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1468,11 +1451,9 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				allow_unknown ? &hdrbuf : NULL) < 0)
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	if (status < 0)
@@ -2576,7 +2557,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.33.0.1098.g29a6526ae47


* [PATCH v7 13/17] object-file.c: use "enum" return type for unpack_loose_header()
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (11 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
                                 ` (4 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

In a preceding commit we changed unpack_loose_header() to return only
-1 or 0, instead of any negative value or zero as before, and
documented that behavior.

Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.
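The "compiler asserts exhaustiveness" idea relies on a switch over an enum with no "default:" label. GCC's -Wswitch, which -Wall enables, then warns whenever an enumerator has no matching "case". A minimal sketch follows; "enum header_result" and describe() are hypothetical names modeled on the patch, not git code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type modeled on unpack_loose_header_result. */
enum header_result {
	HR_OK,
	HR_BAD,
};

/*
 * There is deliberately no "default:" label. If a new enumerator is
 * added to "enum header_result" without a matching "case", compiling
 * with -Wall (which enables -Wswitch) warns about this switch.
 */
static const char *describe(enum header_result r)
{
	switch (r) {
	case HR_OK:
		return "ok";
	case HR_BAD:
		return "unable to unpack header";
	}
	return NULL; /* not reached for valid enum values */
}
```

Adding a default label would silence that warning, which is why both call sites in this patch list every case explicitly.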

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 19 +++++++++++++++----
 object-file.c | 34 +++++++++++++++++++++-------------
 streaming.c   | 23 +++++++++++++----------
 3 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index 9ad81d452ad..90dde86828e 100644
--- a/cache.h
+++ b/cache.h
@@ -1318,7 +1318,10 @@ int git_open_cloexec(const char *name, int flags);
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns:
+ *
+ * - ULHR_OK on success
+ * - ULHR_BAD on error
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
@@ -1326,9 +1329,17 @@ int git_open_cloexec(const char *name, int flags);
  * reporting. The full header will be extracted to "hdrbuf" for use
  * with parse_loose_header().
  */
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
-			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result {
+	ULHR_OK,
+	ULHR_BAD,
+};
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *hdrbuf);
+
 struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 8dd35f768bb..b214a152ca8 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,10 +1233,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz,
-			struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *header)
 {
 	int status;
 
@@ -1252,13 +1254,13 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return -1;
+		return ULHR_BAD;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return 0;
+		return ULHR_OK;
 
 	/*
 	 * We have a header longer than MAX_HEADER_LEN. The "header"
@@ -1266,7 +1268,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return ULHR_BAD;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1287,7 +1289,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return ULHR_BAD;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1452,13 +1454,19 @@ static int loose_object_info(struct repository *r,
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
-				allow_unknown ? &hdrbuf : NULL) < 0)
+	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				    allow_unknown ? &hdrbuf : NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+		break;
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..6df0247a4cb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -229,17 +229,16 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
-	if ((unpack_loose_header(&st->z,
-				 st->u.loose.mapped,
-				 st->u.loose.mapsize,
-				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr),
-				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
-		git_inflate_end(&st->z);
-		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-		return -1;
+	switch (unpack_loose_header(&st->z, st->u.loose.mapped,
+				    st->u.loose.mapsize, st->u.loose.hdr,
+				    sizeof(st->u.loose.hdr), NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
+		goto error;
 	}
+	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
 	st->u.loose.hdr_avail = st->z.total_out;
@@ -248,6 +247,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->read = read_istream_loose;
 
 	return 0;
+error:
+	git_inflate_end(&st->z);
+	munmap(st->u.loose.mapped, st->u.loose.mapsize);
+	return -1;
 }
 
 
-- 
2.33.0.1098.g29a6526ae47


* [PATCH v7 14/17] object-file.c: return ULHR_TOO_LONG on "header too long"
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (12 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
                                 ` (3 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows,
we'll already correctly emit zlib errors from zlib.c in this case, so
there's no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG to say we ran into the
MAX_HEADER_LEN limit, or ULHR_BAD for "unable to unpack
<OID> header".
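A minimal sketch of the three-way split and how a caller can turn each result into its own message. It assumes a 32-byte limit (the value in the t1006 expectations) and uses hypothetical names (classify(), report(), MAX_HDR). zlib is not used; "inflate_ok" stands in for the "status < Z_OK" check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HDR 32 /* assumed to mirror MAX_HEADER_LEN */

enum header_result {
	HR_OK,
	HR_BAD,
	HR_TOO_LONG,
};

/*
 * A failed inflate is HR_BAD. A buffer containing the terminating
 * NUL is HR_OK. A full buffer without one is HR_TOO_LONG.
 */
static enum header_result classify(const char *buf, size_t len,
				   int inflate_ok)
{
	if (!inflate_ok)
		return HR_BAD;
	if (memchr(buf, '\0', len))
		return HR_OK;
	return HR_TOO_LONG;
}

/* Format the caller-side message for each result; -1 means error. */
static int report(enum header_result r, const char *name,
		  char *msg, size_t msgsz)
{
	switch (r) {
	case HR_OK:
		msg[0] = '\0';
		return 0;
	case HR_BAD:
		snprintf(msg, msgsz, "unable to unpack %s header", name);
		return -1;
	case HR_TOO_LONG:
		snprintf(msg, msgsz, "header for %s too long, exceeds %d bytes",
			 name, MAX_HDR);
		return -1;
	}
	return -1;
}
```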

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h             | 5 ++++-
 object-file.c       | 8 ++++++--
 streaming.c         | 1 +
 t/t1006-cat-file.sh | 4 ++--
 4 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 90dde86828e..49b18f2755c 100644
--- a/cache.h
+++ b/cache.h
@@ -1322,16 +1322,19 @@ int git_open_cloexec(const char *name, int flags);
  *
  * - ULHR_OK on success
  * - ULHR_BAD on error
+ * - ULHR_TOO_LONG if the header was too long
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
+ * from this function to indicate that the header was too long.
  */
 enum unpack_loose_header_result {
 	ULHR_OK,
 	ULHR_BAD,
+	ULHR_TOO_LONG,
 };
 enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 						    unsigned char *map,
diff --git a/object-file.c b/object-file.c
index b214a152ca8..ca4abe172ce 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1268,7 +1268,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return ULHR_BAD;
+		return ULHR_TOO_LONG;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1289,7 +1289,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return ULHR_BAD;
+	return ULHR_TOO_LONG;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1462,6 +1462,10 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 		break;
+	case ULHR_TOO_LONG:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
 	}
 
 	if (status < 0) {
diff --git a/streaming.c b/streaming.c
index 6df0247a4cb..bd89c50e7b3 100644
--- a/streaming.c
+++ b/streaming.c
@@ -235,6 +235,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	case ULHR_OK:
 		break;
 	case ULHR_BAD:
+	case ULHR_TOO_LONG:
 		goto error;
 	}
 	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 269ab7e4729..711dcc6d795 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -356,12 +356,12 @@ do
 			if test "$arg2" = "-p"
 			then
 				cat >expect <<-EOF
-				error: unable to unpack $bogus_long_sha1 header
+				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
 				fatal: Not a valid object name $bogus_long_sha1
 				EOF
 			else
 				cat >expect <<-EOF
-				error: unable to unpack $bogus_long_sha1 header
+				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
 				fatal: git cat-file: could not get object info
 				EOF
 			fi &&
-- 
2.33.0.1098.g29a6526ae47


* [PATCH v7 15/17] object-file.c: stop dying in parse_loose_header()
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (13 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                                 ` (2 subsequent siblings)
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Make parse_loose_header() return error codes and data instead of
invoking die() by itself.

For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (it should be
moved to builtin/cat-file.c), but for now I'll leave it.

To make parse_loose_header() not die(), change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated
"oi->typep".

Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
that check in loose_object_info().

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that could
be any negative value, as we were expecting to return the "enum
object_type".

The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.
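A simplified sketch of the new contract: the parser returns -1 only for a malformed "<type> <len>\0" header and reports the type, possibly a "bad" one, through an out-pointer, leaving the caller to decide whether an unknown type is fatal. This is not git's parse_loose_header(); "enum obj_type", type_from_name() and parse_header() are hypothetical, and only the four standard type names are recognized.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical subset of object types; T_BAD marks an unknown type. */
enum obj_type {
	T_BAD = -1,
	T_COMMIT = 1,
	T_TREE = 2,
	T_BLOB = 3,
	T_TAG = 4,
};

static enum obj_type type_from_name(const char *s, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], s, len))
			return (enum obj_type)(i + 1);
	return T_BAD;
}

/*
 * Parse a NUL-terminated "<type> <len>" header. Returns -1 if the
 * format is wrong. Otherwise returns 0 and stores the type (T_BAD if
 * unknown) and the size.
 */
static int parse_header(const char *hdr, enum obj_type *typep,
			unsigned long *sizep)
{
	const char *sp = strchr(hdr, ' ');
	unsigned long size;

	if (!sp || sp == hdr)
		return -1;
	*typep = type_from_name(hdr, (size_t)(sp - hdr));

	/* The length must be canonical decimal: no leading zeros. */
	hdr = sp + 1;
	if (*hdr < '0' || *hdr > '9')
		return -1;
	size = (unsigned long)(*hdr++ - '0');
	if (size) {
		while (*hdr >= '0' && *hdr <= '9') {
			unsigned long digit = (unsigned long)(*hdr++ - '0');

			if (size > (ULONG_MAX - digit) / 10)
				return -1; /* would overflow */
			size = size * 10 + digit;
		}
	}
	if (*hdr)
		return -1; /* trailing junk after the length */
	*sizep = size;
	return 0;
}
```

A caller that does not allow unknown types would then check "*typep < 0" itself, as loose_object_info() does after this patch.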

Since parse_loose_header() doesn't need to return an arbitrary
"status" we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.

We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects; see
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it; a subsequent
commit will address that edge case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 11 +++++++--
 object-file.c | 67 +++++++++++++++++++++++++--------------------------
 streaming.c   |  3 ++-
 3 files changed, 44 insertions(+), 37 deletions(-)

diff --git a/cache.h b/cache.h
index 49b18f2755c..23f0534b70e 100644
--- a/cache.h
+++ b/cache.h
@@ -1343,9 +1343,16 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 						    unsigned long bufsiz,
 						    struct strbuf *hdrbuf);
 
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
 struct object_info;
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index ca4abe172ce..1af914c19c6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1347,8 +1347,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1370,15 +1369,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1405,7 +1395,14 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1419,6 +1416,7 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1450,6 +1448,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1457,6 +1457,18 @@ static int loose_object_info(struct repository *r,
 	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				    allow_unknown ? &hdrbuf : NULL)) {
 	case ULHR_OK:
+		if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
+		else if (!allow_unknown && *oi->typep < 0)
+			die(_("invalid object type"));
+
+		if (!oi->contentp)
+			break;
+		*oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+		if (*oi->contentp)
+			goto cleanup;
+
+		status = -1;
 		break;
 	case ULHR_BAD:
 		status = error(_("unable to unpack %s header"),
@@ -1468,31 +1480,16 @@ static int loose_object_info(struct repository *r,
 		break;
 	}
 
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
-
-	if (status >= 0 && oi->contentp) {
-		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
-		if (!*oi->contentp) {
-			git_inflate_end(&stream);
-			status = -1;
-		}
-	} else
-		git_inflate_end(&stream);
-
+	git_inflate_end(&stream);
+cleanup:
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
-	return (status < 0) ? status : 0;
+	return status;
 }
 
 int obj_read_use_lock = 0;
@@ -2559,6 +2556,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2575,12 +2573,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/streaming.c b/streaming.c
index bd89c50e7b3..fe54665d86e 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -238,7 +239,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	case ULHR_TOO_LONG:
 		goto error;
 	}
-	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+	if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
 		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
-- 
2.33.0.1098.g29a6526ae47


* [PATCH v7 16/17] fsck: don't hard die on invalid object types
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (14 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-20 19:04               ` [PATCH v7 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
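The "report and keep going" behavior can be sketched as a loop that records an error for a bad object and moves on, instead of calling die(). This is a simplified model of the fsck_loose() change, not git code: "struct fake_object", report() and check_objects() are hypothetical, and report() only imitates git's error() by printing a message and returning -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified view of a loose object for this sketch. */
struct fake_object {
	const char *oid;       /* hex object name */
	const char *type_name; /* type string from the header */
	int type_ok;           /* 0 if the header named an unknown type */
};

/* Imitates git's error(): print a message and return -1. */
static int report(const char *oid, const char *type_name)
{
	fprintf(stderr, "error: %s: object is of unknown type '%s'\n",
		oid, type_name);
	return -1;
}

/*
 * Visit every object. An unknown type is recorded instead of being
 * fatal, so later objects are still checked. Returns the number of
 * bad objects and counts visited objects in *visited.
 */
static int check_objects(const struct fake_object *objs, size_t n,
			 size_t *visited)
{
	int bad = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int err = 0;

		(*visited)++;
		if (!objs[i].type_ok)
			err = report(objs[i].oid, objs[i].type_name);
		if (err) {
			bad++;
			continue; /* keep checking other objects */
		}
		/* further per-object checks would go here */
	}
	return bad;
}
```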

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c, we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 17 ++++++++++++++---
 object-file.c   | 18 ++++++------------
 object-store.h  |  6 +++---
 t/t1450-fsck.sh | 17 +++++++++--------
 4 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..3b046820750 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,11 +600,22 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
+	struct strbuf sb = STRBUF_INIT;
+	struct object_info oi = OBJECT_INFO_INIT;
+	int err = 0;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi) < 0)
+		err = error(_("%s: object corrupt or missing: %s"),
+			    oid_to_hex(oid), path);
+	if (type < 0)
+		err = error(_("%s: object is of unknown type '%s': %s"),
+			    oid_to_hex(oid), sb.buf, path);
+	if (err) {
 		errors_found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 1af914c19c6..be568ade95b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2546,18 +2546,15 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      struct object_info *oi)
 {
 	int ret = -1;
 	void *map = NULL;
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
-	oi.typep = type;
-	oi.sizep = size;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2573,15 +2570,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
 
-	if (*type == OBJ_BLOB && *size > big_file_threshold) {
+	if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
 			goto out;
 	} else {
@@ -2592,8 +2587,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index b4dc6668aa2..e8b4d87b898 100644
--- a/object-store.h
+++ b/object-store.h
@@ -244,6 +244,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -251,9 +252,8 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      struct object_info *oi);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bd696d21dba..167c319823a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -85,11 +85,10 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 
-		cat >expect <<-\EOF &&
-		fatal: invalid object type
-		EOF
-		test_must_fail git fsck 2>actual &&
-		test_cmp expect actual
+
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
@@ -910,7 +909,7 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	git init --bare garbage-type &&
 	(
 		cd garbage-type &&
@@ -922,8 +921,10 @@ test_expect_success 'fsck hard errors on an invalid object type' '
 		fatal: invalid object type
 		EOF
 		test_must_fail git fsck >out 2>err &&
-		test_cmp err.expect err &&
-		test_must_be_empty out
+		grep -e "^error" -e "^fatal" err >errors &&
+		test_line_count = 1 errors &&
+		grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+		grep "dangling blob $empty_blob" out
 	)
 '
 
-- 
2.33.0.1098.g29a6526ae47


* [PATCH v7 17/17] fsck: report invalid object type-path combinations
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (15 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04               ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with something that isn't
really an OID, but is derived from the path at which the object was
found, due to an emergent behavior in how we'd end up with an "OID" in
these codepaths.
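
To make that "emergent behavior" concrete: a loose object is stored at objects/XX/YYYY..., where XX is the first two hex digits of its OID. A walk over the object directory reconstructs a candidate OID by joining the directory and file names, without rehashing anything. Below is a minimal standalone sketch of both directions. It assumes SHA-1 (40 hex digits), and the helper names are hypothetical, not git's API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_HEXSZ 40 /* assumption: SHA-1; SHA-256 repositories use 64 */

/* Hypothetical helper: "e69de29b..." -> "objects/e6/9de29b..." */
static void sketch_oid_to_loose_path(const char *hex, char *out, size_t outlen)
{
	assert(strlen(hex) == SKETCH_HEXSZ);
	snprintf(out, outlen, "objects/%.2s/%s", hex, hex + 2);
}

/*
 * Hypothetical helper: rebuild the name a directory walk infers from
 * "<dir>/<file>", e.g. "e7" + "9de29b..." -> "e79de29b...". Nothing is
 * rehashed here, so a moved object yields a name that does not match
 * what its contents actually hash to.
 */
static int sketch_loose_path_to_oid(const char *dir, const char *file,
				    char out[SKETCH_HEXSZ + 1])
{
	if (strlen(dir) != 2 || strlen(file) != SKETCH_HEXSZ - 2)
		return -1;
	memcpy(out, dir, 2);
	memcpy(out + 2, file, SKETCH_HEXSZ - 2);
	out[SKETCH_HEXSZ] = '\0';
	return 0;
}
```

After `mv objects/e6/ objects/e7`, the second helper yields "e79d..." for the empty blob. That is the bogus name the old error message printed.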

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when both the object type and
its location are bad, i.e. in this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits, we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object types:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]
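
These messages depend on an unknown type in the header no longer being fatal. The header is parsed, the type is reported as invalid, and the raw type name is kept for the error message (the oi.type_name strbuf in fsck_loose()). Below is a minimal standalone sketch of that idea. It is not git's parse_loose_header(). The enum values mirror git's enum object_type, and all names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins mirroring git's enum object_type values (OBJ_BAD is -1). */
enum sketch_type {
	SKETCH_BAD = -1,
	SKETCH_COMMIT = 1,
	SKETCH_TREE = 2,
	SKETCH_BLOB = 3,
	SKETCH_TAG = 4,
};

static enum sketch_type sketch_type_from_name(const char *name, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (enum sketch_type)(i + 1);
	return SKETCH_BAD;
}

/*
 * Parse a loose-object header "<type> <size>" (the bytes before its NUL).
 * An unknown type is NOT a parse error: *typep becomes SKETCH_BAD and the
 * raw name is kept in type_name so the caller can report it. Returns -1
 * only for a malformed header.
 */
static int sketch_parse_header(const char *hdr, char *type_name,
			       size_t type_name_sz, enum sketch_type *typep,
			       unsigned long *sizep)
{
	const char *sp = strchr(hdr, ' ');
	char *end;
	size_t len;

	if (!sp || sp == hdr || sp[1] < '0' || sp[1] > '9')
		return -1;
	len = (size_t)(sp - hdr);
	if (len >= type_name_sz)
		return -1;
	memcpy(type_name, hdr, len);
	type_name[len] = '\0';
	*typep = sketch_type_from_name(hdr, len);
	*sizep = strtoul(sp + 1, &end, 10);
	if (*end != '\0')
		return -1;
	return 0;
}
```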

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
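
Below is a minimal sketch of the sentinel idea, assuming SHA-1 sizes. sketch_read() is a hypothetical stand-in for read_loose_object(). It only writes the computed OID once it has "hashed" the contents, so an OID that is still null after a failure means we never got that far. The actual fsck_loose() change in this patch also checks "contents" and compares against the expected OID. This sketch illustrates only the sentinel.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_RAWSZ 20 /* assumption: SHA-1 */

struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

static int sketch_oid_is_null(const struct sketch_oid *oid)
{
	static const struct sketch_oid null_oid; /* static => all zeroes */

	return !memcmp(oid->hash, null_oid.hash, SKETCH_RAWSZ);
}

/* Hypothetical failure points of a read_loose_object()-like function. */
enum sketch_stage { STAGE_FAIL_EARLY, STAGE_FAIL_AFTER_HASH, STAGE_OK };

static int sketch_read(enum sketch_stage stage, struct sketch_oid *real_oid)
{
	if (stage == STAGE_FAIL_EARLY)
		return -1; /* e.g. mmap/zlib error: real_oid is never written */
	memset(real_oid->hash, 0xab, SKETCH_RAWSZ); /* pretend we hashed it */
	return stage == STAGE_FAIL_AFTER_HASH ? -1 : 0;
}

/* Pick the message an fsck-like caller would print. */
static const char *sketch_classify(enum sketch_stage stage)
{
	struct sketch_oid real_oid = { { 0 } }; /* sentinel: the null OID */

	if (!sketch_read(stage, &real_oid))
		return "ok";
	if (!sketch_oid_is_null(&real_oid))
		return "hash-path mismatch";
	return "object corrupt or missing";
}
```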

We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ git fsck
    error: garbage at end of loose object 'e69d[...]'
    error: unable to unpack contents of ./objects/e6/9d[...]
    error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]

There is currently some odd messaging in the edge case where the two
are combined. Because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object(), we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ /usr/bin/git fsck
    fatal: invalid object type
    $ ~/g/git/git fsck
    error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
    error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    [...]

I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.
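
Below is a sketch of what "enum-ifying more error state" might look like. The names are hypothetical and not part of this series. With such an enum, a caller could tell an unpack failure from a hash mismatch directly, and would only print the computed OID when one actually exists:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; not an enum from this patch series. */
enum sketch_read_result {
	SRR_OK = 0,
	SRR_MMAP_FAILED,
	SRR_BAD_HEADER,
	SRR_UNPACK_FAILED,
	SRR_HASH_MISMATCH,
};

/* Only a hash mismatch guarantees that the "real" OID was computed. */
static int sketch_result_has_real_oid(enum sketch_read_result r)
{
	return r == SRR_HASH_MISMATCH;
}

static const char *sketch_result_msg(enum sketch_read_result r)
{
	switch (r) {
	case SRR_OK:
		return NULL;
	case SRR_MMAP_FAILED:
		return "unable to mmap";
	case SRR_BAD_HEADER:
		return "unable to parse header";
	case SRR_UNPACK_FAILED:
		return "unable to unpack contents";
	case SRR_HASH_MISMATCH:
		return "hash-path mismatch";
	}
	return "unknown error";
}
```

In the combined "unknown type and garbage after the zlib stream" case, SRR_UNPACK_FAILED would tell the caller not to print the (null) computed OID.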

There are other such potential edge cases, all of which might produce
some confusing messaging, but would still be handled correctly as far
as passing along errors goes. E.g. check_object_signature() may return
with oideq(real_oid, null_oid()) still true, which can happen if it
returns -1 because the read_istream() call failed.
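
This patch adds an optional out-parameter to check_object_signature(): real_oidp, which falls back to a local when NULL is passed. The pattern can be sketched in isolation as below. sketch_fake_hash() is a deliberately trivial placeholder, not a real hash function, and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_oid {
	unsigned char hash[20];
};

/* Placeholder "hash" (XOR-fold): NOT cryptographic, illustration only. */
static void sketch_fake_hash(const void *buf, size_t len, struct sketch_oid *out)
{
	const unsigned char *p = buf;

	memset(out->hash, 0, sizeof(out->hash));
	for (size_t i = 0; i < len; i++)
		out->hash[i % sizeof(out->hash)] ^= p[i];
}

/*
 * Same shape as check_object_signature() after this patch: callers that
 * only need pass/fail pass NULL, and the result goes into a local instead.
 */
static int sketch_check_signature(const struct sketch_oid *expect,
				  const void *buf, size_t len,
				  struct sketch_oid *real_oidp)
{
	struct sketch_oid tmp;
	struct sketch_oid *real_oid = real_oidp ? real_oidp : &tmp;

	sketch_fake_hash(buf, len, real_oid);
	return memcmp(expect->hash, real_oid->hash, sizeof(expect->hash)) ? -1 : 0;
}
```

Existing callers (fast-export, index-pack, mktag, object.c, pack-check.c) just pass NULL. Only read_loose_object() needs the computed value.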

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 23 +++++++++++++++--------
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 cache.h               |  3 ++-
 object-file.c         | 21 ++++++++++-----------
 object-store.h        |  1 +
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 11 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 95e8e89e81f..8e2caf72819 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3b046820750..d925cdbae5c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -598,23 +598,30 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct object *obj;
 	enum object_type type;
 	unsigned long size;
-	void *contents;
+	unsigned char *contents = NULL;
 	int eaten;
 	struct strbuf sb = STRBUF_INIT;
 	struct object_info oi = OBJECT_INFO_INIT;
-	int err = 0;
+	struct object_id real_oid = *null_oid();
+	int ret;
 
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi) < 0)
-		err = error(_("%s: object corrupt or missing: %s"),
-			    oid_to_hex(oid), path);
+	ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
+	if (ret < 0) {
+		if (contents && !oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
+	}
 	if (type < 0)
-		err = error(_("%s: object is of unknown type '%s': %s"),
-			    oid_to_hex(oid), sb.buf, path);
-	if (err) {
+		ret = error(_("%s: object is of unknown type '%s': %s"),
+			    oid_to_hex(&real_oid), sb.buf, path);
+	if (ret < 0) {
 		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 6cc48902170..17c4b1d3ead 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1415,7 +1415,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/cache.h b/cache.h
index 23f0534b70e..44b11f52362 100644
--- a/cache.h
+++ b/cache.h
@@ -1355,7 +1355,8 @@ struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 
 int finalize_object_file(const char *tmpfile, const char *filename);
 
diff --git a/object-file.c b/object-file.c
index be568ade95b..ff0e465d556 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1062,9 +1062,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1072,8 +1074,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1098,9 +1100,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2546,6 +2548,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi)
 {
@@ -2556,8 +2559,6 @@ int read_loose_object(const char *path,
 	char hdr[MAX_HEADER_LEN];
 	unsigned long *size = oi->sizep;
 
-	*contents = NULL;
-
 	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
@@ -2587,9 +2588,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index e8b4d87b898..77aa3d857cc 100644
--- a/object-store.h
+++ b/object-store.h
@@ -252,6 +252,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi);
 
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 711dcc6d795..1f7cc0717b7 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -512,7 +512,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 167c319823a..eb0e772f098 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -54,6 +54,7 @@ test_expect_success 'object with hash mismatch' '
 		cd hash-mismatch &&
 
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -65,7 +66,7 @@ test_expect_success 'object with hash mismatch' '
 		git update-ref refs/heads/bogus $cmt &&
 
 		test_must_fail git fsck 2>out &&
-		grep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -75,6 +76,7 @@ test_expect_success 'object with hash and type mismatch' '
 		cd hash-type-mismatch &&
 
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -87,8 +89,8 @@ test_expect_success 'object with hash and type mismatch' '
 
 
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.33.0.1098.g29a6526ae47


* Re: [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
  2021-09-20 19:04               ` [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-09-21  3:30                 ` Taylor Blau
  0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-21  3:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Mon, Sep 20, 2021 at 09:04:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> diff --git a/t/oid-info/oid b/t/oid-info/oid
> index a754970523c..ecffa9045f9 100644
> --- a/t/oid-info/oid
> +++ b/t/oid-info/oid
> @@ -27,3 +27,5 @@ numeric		sha1:0123456789012345678901234567890123456789
>  numeric		sha256:0123456789012345678901234567890123456789012345678901234567890123
>  deadbeef	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
>  deadbeef	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> +deadbeef_short	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
> +deadbee_short	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee

This jumped out at me while I was reading it. In the second line,
s/deadbee_short/deadbeef_short/ ?

> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index ea6a53d425b..af59613250b 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
>  	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
>  '
>
> +for arg1 in '' --allow-unknown-type
> +do
> +	for arg2 in -s -t -p
> +	do

This is quite the loop! I appreciate the extra thoroughness, although it
may come at some extra cost of intertwining all of these combinations of
tests together.

But that may be warranted, since they are all related. But it's not a
full matrix of all possible combinations; e.g., "--allow-unknown-type"
does not go with "-p".

So this may be the best that we can do. It's definitely a mouthful, but
I think it's overall an easier read than what we had in the previous
version. And it's definitely more thorough, which is good. Thanks for
spending the time improving this test.

Thanks,
Taylor

* [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
                                 ` (16 preceding siblings ...)
  2021-09-20 19:04               ` [PATCH v7 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18               ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                                   ` (18 more replies)
  17 siblings, 19 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

This improves fsck error reporting, see the examples in the commit
messages of 16/17 and 17/17. To get there I've lib-ified more things
in object-file.c and the general object APIs, i.e. now we'll return
error codes instead of calling die() in these cases.
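
This lib-ification relies on a convention: git's error() prints an "error: "-prefixed message and returns -1, so code can write "return error(...)" (or "ret = error(...)", as fsck_loose() does) instead of die()-ing. Below is a minimal standalone stand-in for that convention. sketch_error() and the other names are hypothetical, not git's usage.c:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for git's error(): report to stderr, then return -1. */
static int sketch_error(const char *fmt, ...)
{
	va_list ap;

	fputs("error: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/* Library-style check: report and return instead of die()-ing. */
static int sketch_check_type(int type, const char *type_name, const char *path)
{
	if (type < 0)
		return sketch_error("object is of unknown type '%s': %s",
				    type_name, path);
	return 0;
}

/* A caller can keep going and count failures, as fsck does. */
static int sketch_count_bad(const int *types, const char *const *names,
			    size_t n)
{
	int bad = 0;

	for (size_t i = 0; i < n; i++)
		if (sketch_check_type(types[i], names[i], "(sketch)") < 0)
			bad++;
	return bad;
}
```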

v6 of this got a very detailed review from Taylor Blau (thanks a
lot!). For v6 see:
https://lore.kernel.org/git/cover-v6-00.22-00000000000-20210907T104558Z-avarab@gmail.com/

v7 had a couple of trivial shell-scripting issues: a typo'd test_oid
variable, and a warning on a "test" comparison. For v7 see
https://lore.kernel.org/git/cover-v7-00.17-00000000000-20210920T190304Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (17):
  fsck tests: add test for fsck-ing an unknown type
  fsck tests: refactor one test to use a sub-repo
  fsck tests: test current hash/type mismatch behavior
  fsck tests: test for garbage appended to a loose object
  cat-file tests: move bogus_* variable declarations earlier
  cat-file tests: test for missing/bogus object with -t, -s and -p
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: make parse_loose_header_extended() public
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: use "enum" return type for unpack_loose_header()
  object-file.c: return ULHR_TOO_LONG on "header too long"
  object-file.c: stop dying in parse_loose_header()
  fsck: don't hard die on invalid object types
  fsck: report invalid object type-path combinations

 builtin/fast-export.c |   2 +-
 builtin/fsck.c        |  28 +++++-
 builtin/index-pack.c  |   2 +-
 builtin/mktag.c       |   3 +-
 cache.h               |  45 ++++++++-
 object-file.c         | 176 +++++++++++++++------------------
 object-store.h        |   7 +-
 object.c              |   4 +-
 pack-check.c          |   3 +-
 streaming.c           |  27 +++--
 t/oid-info/oid        |   2 +
 t/t1006-cat-file.sh   | 223 +++++++++++++++++++++++++++++++++++++++---
 t/t1450-fsck.sh       |  99 +++++++++++++++----
 13 files changed, 463 insertions(+), 158 deletions(-)

Range-diff against v7:
 1:  752cef556c2 =  1:  b999ab695d9 fsck tests: add test for fsck-ing an unknown type
 2:  612003bdd2c =  2:  e01c21378a4 fsck tests: refactor one test to use a sub-repo
 3:  1e40a4235e9 =  3:  93197a7bcee fsck tests: test current hash/type mismatch behavior
 4:  854991c1543 =  4:  277188dd58d fsck tests: test for garbage appended to a loose object
 5:  fc93c2c2530 =  5:  ab2ea1beaaf cat-file tests: move bogus_* variable declarations earlier
 6:  051088aa114 !  6:  91229b94fac cat-file tests: test for missing/bogus object with -t, -s and -p
    @@ t/oid-info/oid: numeric		sha1:0123456789012345678901234567890123456789
      deadbeef	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
      deadbeef	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
     +deadbeef_short	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
    -+deadbee_short	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
    ++deadbeef_short	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
     
      ## t/t1006-cat-file.sh ##
     @@ t/t1006-cat-file.sh: test_expect_success 'setup bogus data' '
    @@ t/t1006-cat-file.sh: test_expect_success 'setup bogus data' '
     +do
     +	for arg2 in -s -t -p
     +	do
    -+		if test $arg1 = "--allow-unknown-type" && test "$arg2" = "-p"
    ++		if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
     +		then
     +			continue
     +		fi
 7:  20bd81c1af0 =  7:  9e95e134d30 cat-file tests: add corrupt loose object test
 8:  cd1d52b8a07 =  8:  215f98ad369 cat-file tests: test for current --allow-unknown-type behavior
 9:  d9f5adfc74b =  9:  3e1df3594df object-file.c: don't set "typep" when returning non-zero
10:  51d14bc9274 = 10:  b96828f3d5b object-file.c: return -1, not "status" from unpack_loose_header()
11:  f43cfd8a5ed = 11:  273acb45517 object-file.c: make parse_loose_header_extended() public
12:  50d938f7f3c = 12:  314d34357dd object-file.c: simplify unpack_loose_short_header()
13:  755fde00b46 = 13:  07481bcb55c object-file.c: use "enum" return type for unpack_loose_header()
14:  522d71eb19d = 14:  42b8d135c8c object-file.c: return ULHR_TOO_LONG on "header too long"
15:  1ca875395c1 = 15:  106b7461ce9 object-file.c: stop dying in parse_loose_header()
16:  d38067feab3 = 16:  d01223ae322 fsck: don't hard die on invalid object types
17:  b07e892fc19 = 17:  7f394a991a6 fsck: report invalid object type-path combinations
-- 
2.33.0.1327.g9926af6cb02


* [PATCH v8 01/17] fsck tests: add test for fsck-ing an unknown type
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..969bfbbdd8f 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	git init --bare garbage-type &&
+	(
+		cd garbage-type &&
+
+		empty=$(git hash-object --stdin -w -t blob </dev/null) &&
+		garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
+
+		cat >err.expect <<-\EOF &&
+		fatal: invalid object type
+		EOF
+		test_must_fail git fsck >out 2>err &&
+		test_cmp err.expect err &&
+		test_must_be_empty out
+	)
+'
+
 test_done
-- 
2.33.0.1327.g9926af6cb02


* [PATCH v8 02/17] fsck tests: refactor one test to use a sub-repo
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
                                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.

We can instead use the pattern of creating a named sub-repository.
Then we don't have to worry about cleaning up after ourselves: nobody
will care what state the broken "hash-mismatch" repository is in after
this test runs.

See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.

1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 969bfbbdd8f..f8edd15abf8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,25 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
+test_expect_success 'object with hash mismatch' '
+	git init --bare hash-mismatch &&
+	(
+		cd hash-mismatch &&
 
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+
+		test_must_fail git fsck 2>out &&
+		grep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.33.0.1327.g9926af6cb02


* [PATCH v8 03/17] fsck tests: test current hash/type mismatch behavior
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
                                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

If we move an object around between .git/objects/?? directories to
simulate a hash mismatch, "git fsck" will currently hard die() in
object-file.c. This behavior will be fixed in subsequent commits, but
let's test for it as-is for now.
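To make the path shuffling in this test concrete, here is a minimal C sketch of how a hex object ID maps to its loose object path under objects/ and back. It is illustrative only: hex_to_loose_path() and loose_path_to_hex() are hypothetical helpers, not git functions. The first mirrors what the test suite's test_oid_to_path helper prints. The second is what the test's oid="$(dirname $new)$(basename $new)" line does. After the "mv", the ID derived from the path no longer matches the hash of the file's contents, and that mismatch is what fsck trips over.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Map a hex object ID to its loose object path relative to objects/:
 * the first two hex digits name the fan-out directory, the rest name
 * the file ("de/adbeef..."). Returns 0 on success, -1 if the ID is too
 * short to split or the output buffer is too small.
 */
static int hex_to_loose_path(const char *hex, char *out, size_t outsz)
{
	int n;

	assert(hex && out);
	if (strlen(hex) < 3)
		return -1;
	n = snprintf(out, outsz, "%.2s/%s", hex, hex + 2);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	return 0;
}

/*
 * The inverse: rebuild the hex ID implied by a "xx/yyyy..." path by
 * dropping the slash. Returns 0 on success, -1 on a malformed path or
 * a too-small output buffer.
 */
static int loose_path_to_hex(const char *path, char *out, size_t outsz)
{
	const char *slash;
	int n;

	assert(path && out);
	slash = strchr(path, '/');
	if (!slash || slash - path != 2)
		return -1;
	n = snprintf(out, outsz, "%.2s%s", path, slash + 1);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	return 0;
}
```

Round-tripping an ID through both helpers gives back the original string. The corruption the test creates comes only from renaming the file, not from changing its contents.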

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f8edd15abf8..175ed304637 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,6 +69,30 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	git init --bare hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+
+		cat >expect <<-\EOF &&
+		fatal: invalid object type
+		EOF
+		test_must_fail git fsck 2>actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
-- 
2.33.0.1327.g9926af6cb02


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v8 04/17] fsck tests: test for garbage appended to a loose object
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (2 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
                                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

There weren't any output tests for this scenario. Let's ensure that we
don't regress on it in the changes that come after this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 175ed304637..bd696d21dba 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -93,6 +93,26 @@ test_expect_success 'object with hash and type mismatch' '
 	)
 '
 
+test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
+	git init --bare corrupt-loose-output &&
+	(
+		cd corrupt-loose-output &&
+		oid=$(git hash-object -w --stdin --literally </dev/null) &&
+		oidf=objects/$(test_oid_to_path "$oid") &&
+		chmod 755 $oidf &&
+		echo extra garbage >>$oidf &&
+
+		cat >expect.error <<-EOF &&
+		error: garbage at end of loose object '\''$oid'\''
+		error: unable to unpack contents of ./$oidf
+		error: $oid: object corrupt or missing: ./$oidf
+		EOF
+		test_must_fail git fsck 2>actual &&
+		grep ^error: actual >error &&
+		test_cmp expect.error error
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
-- 
2.33.0.1327.g9926af6cb02


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v8 05/17] cat-file tests: move bogus_* variable declarations earlier
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (3 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
                                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Change the short/long bogus object type variables into a form where
the two sets can be used concurrently. This'll be used by subsequently
added tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..ea6a53d425b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,36 +315,39 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
-bogus_type="bogus"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'setup bogus data' '
+	bogus_short_type="bogus" &&
+	bogus_short_content="bogus" &&
+	bogus_short_size=$(strlen "$bogus_short_content") &&
+	bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+
+	bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
+	bogus_long_content="bogus" &&
+	bogus_long_size=$(strlen "$bogus_long_content") &&
+	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+'
 
 test_expect_success "Type of broken object is correct" '
-	echo $bogus_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_short_type >expect &&
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of broken object is correct" '
-	echo $bogus_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_short_size >expect &&
+	git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
 	test_cmp expect actual
 '
-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
 test_expect_success "Type of broken object is correct when type is large" '
-	echo $bogus_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_long_type >expect &&
+	git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of large broken object is correct when type is large" '
-	echo $bogus_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_long_size >expect &&
+	git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
 	test_cmp expect actual
 '
 
-- 
2.33.0.1327.g9926af6cb02


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v8 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (4 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

When we look up a missing object with cat_one_file(), the error we
print out currently depends on whether we error out early in
get_oid_with_context(), or whether we get an error later from
oid_object_info_extended().

The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
not.

The "-p" flag is yet another special case: for the deadbeef OID it
prints the same output that we'd emit for the deadbeef_short OID with
the "-s" and "-t" options. It also doesn't support the
"--allow-unknown-type" flag at all.

Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.

This extends tests added in 3e370f9faf0 (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).
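To spell out how many tests the loop added below generates, here is a small C sketch that walks the same matrix. The arrays and count_cat_file_tests() are hypothetical and exist only for illustration. There are two flag choices and three modes, which gives six combinations. The "--allow-unknown-type -p" pairing is skipped, leaving five. Each remaining pairing gets four tests: bogus short OID, bogus full OID, missing short OID and missing full OID.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* The flag sets and object scenarios exercised by the shell loop. */
static const char *const type_args[] = { "", "--allow-unknown-type" };
static const char *const mode_args[] = { "-s", "-t", "-p" };
static const char *const scenarios[] = {
	"bogus short OID", "bogus full OID",
	"missing short OID", "missing full OID",
};

/*
 * Walk every (type_arg, mode_arg) pair except "--allow-unknown-type -p",
 * optionally printing each generated test title, and return the number
 * of tests.
 */
static size_t count_cat_file_tests(int print)
{
	size_t i, j, k, n = 0;

	for (i = 0; i < ARRAY_SIZE(type_args); i++) {
		for (j = 0; j < ARRAY_SIZE(mode_args); j++) {
			if (!strcmp(type_args[i], "--allow-unknown-type") &&
			    !strcmp(mode_args[j], "-p"))
				continue;
			for (k = 0; k < ARRAY_SIZE(scenarios); k++) {
				if (print)
					printf("cat-file %s %s error on %s\n",
					       type_args[i], mode_args[j],
					       scenarios[k]);
				n++;
			}
		}
	}
	/* One pairing is skipped, so (2 * 3 - 1) * 4 tests in total. */
	assert(n == (ARRAY_SIZE(type_args) * ARRAY_SIZE(mode_args) - 1) *
		    ARRAY_SIZE(scenarios));
	return n;
}
```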

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/oid-info/oid      |  2 ++
 t/t1006-cat-file.sh | 75 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/t/oid-info/oid b/t/oid-info/oid
index a754970523c..7547d2c7903 100644
--- a/t/oid-info/oid
+++ b/t/oid-info/oid
@@ -27,3 +27,5 @@ numeric		sha1:0123456789012345678901234567890123456789
 numeric		sha256:0123456789012345678901234567890123456789012345678901234567890123
 deadbeef	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
 deadbeef	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef_short	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
+deadbeef_short	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ea6a53d425b..abf57339a29 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
 	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
 '
 
+for arg1 in '' --allow-unknown-type
+do
+	for arg2 in -s -t -p
+	do
+		if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
+		then
+			continue
+		fi
+
+
+		test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
+			cat >expect <<-\EOF &&
+			fatal: invalid object type
+			EOF
+
+			if test "$arg1" = "--allow-unknown-type"
+			then
+				git cat-file $arg1 $arg2 $bogus_short_sha1
+			else
+				test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+				test_must_be_empty out &&
+				test_cmp expect actual
+			fi
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
+			if test "$arg2" = "-p"
+			then
+				cat >expect <<-EOF
+				error: unable to unpack $bogus_long_sha1 header
+				fatal: Not a valid object name $bogus_long_sha1
+				EOF
+			else
+				cat >expect <<-EOF
+				error: unable to unpack $bogus_long_sha1 header
+				fatal: git cat-file: could not get object info
+				EOF
+			fi &&
+
+			if test "$arg1" = "--allow-unknown-type"
+			then
+				git cat-file $arg1 $arg2 $bogus_short_sha1
+			else
+				test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+				test_must_be_empty out &&
+				test_cmp expect actual
+			fi
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
+			cat >expect.err <<-EOF &&
+			fatal: Not a valid object name $(test_oid deadbeef_short)
+			EOF
+			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
+			test_must_be_empty out
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
+			if test "$arg2" = "-p"
+			then
+				cat >expect.err <<-EOF
+				fatal: Not a valid object name $(test_oid deadbeef)
+				EOF
+			else
+				cat >expect.err <<-\EOF
+				fatal: git cat-file: could not get object info
+				EOF
+			fi &&
+			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
+			test_must_be_empty out &&
+			test_cmp expect.err err.actual
+		'
+	done
+done
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
-- 
2.33.0.1327.g9926af6cb02


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v8 07/17] cat-file tests: add corrupt loose object test
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (5 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib, we'll emit an error from zlib.c.
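For context on what zlib is expected to decode, here is a minimal sketch of the uncompressed layout of a loose object. It is a "<type> <size>" header, then a NUL byte, then the raw content, and on disk the whole buffer is deflated. The sketch is illustrative only: format_loose_header() is a hypothetical helper, and the deflate step is left out. This layout is why the test expects "cat-file -s" to report 6 for the swapped-in "other\n" blob. When the raw "other\n" bytes are moved into place without being deflated, inflate fails instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the "<type> <size>" header of a loose object, including its
 * terminating NUL, into 'hdr'. Returns the number of bytes the header
 * occupies (NUL included), or -1 if 'hdr' is too small.
 */
static int format_loose_header(char *hdr, size_t hdrsz,
			       const char *type, size_t content_len)
{
	int n;

	assert(hdr && type);
	n = snprintf(hdr, hdrsz, "%s %zu", type, content_len);
	if (n < 0 || (size_t)n + 1 > hdrsz)
		return -1;
	/* snprintf() already wrote the NUL; count it as part of the header. */
	return n + 1;
}
```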

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index abf57339a29..15774979ad3 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -426,6 +426,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.33.0.1327.g9926af6cb02


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v8 08/17] cat-file tests: test for current --allow-unknown-type behavior
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (6 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
                                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 15774979ad3..5b16c69c286 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -402,6 +402,67 @@ do
 	done
 done
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_short_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_short_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_short_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_short_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	test_when_finished "rm -rf .git/refs/replace" &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_short_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
-- 
2.33.0.1327.g9926af6cb02


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v8 09/17] object-file.c: don't set "typep" when returning non-zero
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (7 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error, stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
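Callers are expected to follow a simple contract here. On failure, the function returns -1 and callers must not read the out-parameters. The sketch below illustrates this with hypothetical, trimmed-down stand-ins, not git's real struct object_info. The enum values mirror git's enum object_type. lookup_sketch() leaves *typep untouched on error. type_or_bad() shows how a wrapper like oid_object_info() can still report OBJ_BAD, because it checks the return value first.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror git's enum object_type (trimmed for the sketch). */
enum object_type {
	OBJ_BAD = -1,
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4,
};

/* Hypothetical stand-in for two members of struct object_info. */
struct object_info_sketch {
	enum object_type *typep;
	unsigned long *sizep;
};

/*
 * Hypothetical lookup following the post-patch contract: on failure,
 * return -1 and leave the out-parameters alone; only fill them in on
 * success. 'found' simulates whether the object could be read.
 */
static int lookup_sketch(int found, struct object_info_sketch *oi)
{
	if (!found)
		return -1; /* no "*oi->typep = OBJ_BAD" anymore */
	if (oi->typep)
		*oi->typep = OBJ_BLOB;
	if (oi->sizep)
		*oi->sizep = 6;
	return 0;
}

/* An oid_object_info()-style wrapper: check the return value, not *typep. */
static enum object_type type_or_bad(int found)
{
	enum object_type type = OBJ_NONE;
	struct object_info_sketch oi = { &type, NULL };

	if (lookup_sketch(found, &oi) < 0)
		return OBJ_BAD;
	return type;
}
```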

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..766ba88b851 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1525,8 +1525,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.33.0.1327.g9926af6cb02


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v8 10/17] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (8 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.
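The sketch below shows the checks as they stand after this change. check_inflated_header() is a hypothetical function, and the SK_Z_* constants are local stand-ins whose values match zlib.h (real code would include <zlib.h> instead). Any negative zlib status collapses to -1. The inflated bytes must also contain the NUL that terminates the "<type> <size>" header.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins for zlib return codes; the values match zlib.h. */
enum {
	SK_Z_OK = 0,
	SK_Z_STREAM_END = 1,
	SK_Z_DATA_ERROR = -3,
	SK_Z_BUF_ERROR = -5,
};

/*
 * Post-inflate checks on the first 'produced' bytes in 'buffer':
 * any zlib error becomes -1 (previously the raw status was returned),
 * and the header must be NUL-terminated. Returns 0 on success.
 */
static int check_inflated_header(int status, const char *buffer,
				 size_t produced)
{
	if (status < SK_Z_OK)
		return -1;
	if (!memchr(buffer, '\0', produced))
		return -1;
	return 0;
}
```

Because every failure now looks the same to the caller, nothing downstream can start depending on which Z_* code was seen.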

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 766ba88b851..8475b128944 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1284,7 +1284,7 @@ int unpack_loose_header(git_zstream *stream,
 					       buffer, bufsiz);
 
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/* Make sure we have the terminating NUL */
 	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-- 
2.33.0.1327.g9926af6cb02


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v8 11/17] object-file.c: make parse_loose_header_extended() public
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (9 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
                                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public under the
parse_loose_header() name, removing the old parse_loose_header()
wrapper. The only direct user of it outside of object-file.c itself
was in streaming.c; that caller can simply pass the required "struct
object_info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it will
leave the API in a better end state.

It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.

1. https://lore.kernel.org/git/patch-v6-09.22-5b9278e7bb4-20210907T104559Z-avarab@gmail.com/
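Passing a struct of optional out-pointers lets one parser serve callers that want the type, the size, or both. Here is a simplified, hypothetical sketch of that shape. struct oi_sketch and parse_header_sketch() are not git's code. The flags argument, unknown-type handling and overflow checks are omitted. The sketch also writes the out-parameters only on success. The parser reads a NUL-terminated "<type> <size>" header and requires a canonical decimal size, so "06" is rejected. It returns the numeric type (commit=1, tree=2, blob=3, tag=4) or -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the struct object_info members used here. */
struct oi_sketch {
	int *typep;
	unsigned long *sizep;
};

/* Map a type name to git's numeric type, or -1 if unknown. */
static int type_from_name(const char *name, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (int)i + 1;
	return -1;
}

static int parse_header_sketch(const char *hdr, const struct oi_sketch *oi)
{
	const char *type_buf = hdr;
	unsigned long size;
	int type;

	/* The type name runs up to the first space. */
	while (*hdr && *hdr != ' ')
		hdr++;
	if (*hdr != ' ')
		return -1;
	type = type_from_name(type_buf, (size_t)(hdr - type_buf));
	if (type < 0)
		return -1;
	hdr++;

	/* The size follows immediately, in canonical decimal. */
	if (*hdr < '0' || *hdr > '9')
		return -1;
	size = (unsigned long)(*hdr++ - '0');
	if (size)
		while (*hdr >= '0' && *hdr <= '9')
			size = size * 10 + (unsigned long)(*hdr++ - '0');

	/* Nothing may follow the size. */
	if (*hdr)
		return -1;

	if (oi && oi->typep)
		*oi->typep = type;
	if (oi && oi->sizep)
		*oi->sizep = size;
	return type;
}
```

A caller that only wants the size, as the old parse_loose_header(hdr, &size) wrapper provided, fills in just that member: struct oi_sketch oi = { NULL, &size }. This mirrors the streaming.c change in this patch.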

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       |  4 +++-
 object-file.c | 20 +++++++-------------
 streaming.c   |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index f6295f3b048..35f254dae4a 100644
--- a/cache.h
+++ b/cache.h
@@ -1320,7 +1320,9 @@ char *xdg_cache_home(const char *filename);
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
 int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+struct object_info;
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 8475b128944..6b91c4edcf6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1385,8 +1385,8 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1446,14 +1446,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1508,10 +1500,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2599,6 +2591,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2613,7 +2607,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.33.0.1327.g9926af6cb02


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH v8 12/17] object-file.c: simplify unpack_loose_short_header()
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (10 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
                                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be
simplified further. We're carrying some complexity just to be able to
emit a garbage type longer than MAX_HEADER_LEN; we could instead just
say "we found a garbage type <first 32 bytes>...". But let's leave the
current behavior in place for now.
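For illustration only, a minimal standalone sketch of the resulting
single-function shape. It does no zlib inflation, and "read_header()"
and "struct growbuf" are hypothetical stand-ins (not git APIs) for
unpack_loose_header() and "struct strbuf". The optional out-parameter
is what replaces a separate *_to_strbuf() variant:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's "struct strbuf". */
struct growbuf {
	char *buf;
	size_t len;
};

/*
 * Copy the NUL-terminated header at the start of "src" into the
 * fixed-size "buffer". If it doesn't fit and "overflow" is NULL, fail;
 * otherwise copy the whole header into "overflow" (caller frees it).
 * Returns 0 on success, -1 on error.
 */
static int read_header(const char *src, size_t srclen,
		       char *buffer, size_t bufsiz,
		       struct growbuf *overflow)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	const char *nul;

	assert(bufsiz > 0);
	memcpy(buffer, src, n);
	if (memchr(buffer, '\0', n))
		return 0; /* the whole header fit in "buffer" */

	/* Longer than "bufsiz": only an opted-in caller gets the rest. */
	if (!overflow)
		return -1;

	nul = memchr(src, '\0', srclen);
	if (!nul)
		return -1; /* no terminating NUL at all */
	overflow->len = (size_t)(nul - src);
	overflow->buf = malloc(overflow->len + 1);
	if (!overflow->buf)
		return -1;
	memcpy(overflow->buf, src, overflow->len + 1);
	return 0;
}
```

Callers that don't care about overlong headers pass NULL, just as
read_loose_object() and open_istream_loose() do in this patch.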

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 17 ++++++++++++++-
 object-file.c | 58 ++++++++++++++++++---------------------------------
 streaming.c   |  3 ++-
 3 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/cache.h b/cache.h
index 35f254dae4a..d7189aed8fc 100644
--- a/cache.h
+++ b/cache.h
@@ -1319,7 +1319,22 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
+
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 6b91c4edcf6..1327872cbf4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,11 +1255,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1270,35 +1271,8 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
-	if (status < Z_OK)
-		return -1;
-
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
 	if (status < Z_OK)
 		return -1;
 
@@ -1308,6 +1282,14 @@ static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1457,6 +1439,7 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1490,11 +1473,9 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				allow_unknown ? &hdrbuf : NULL) < 0)
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	if (status < 0)
@@ -2602,7 +2583,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.33.0.1327.g9926af6cb02



* [PATCH v8 13/17] object-file.c: use "enum" return type for unpack_loose_header()
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (11 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
                                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

In a preceding commit we changed and documented unpack_loose_header()
from its previous behavior of returning any negative value or zero, to
only -1 or 0.

Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.
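For illustration only, a minimal standalone sketch of that pattern
(the UR_* names and unpack_status() are hypothetical, not git code).
The switch deliberately has no "default:" label, which is what lets
-Wswitch (enabled by -Wall in GCC) warn when an enumerator is added
without a matching case:

```c
#include <stdio.h>

/* Hypothetical mirror of "enum unpack_loose_header_result". */
enum unpack_result {
	UR_OK,
	UR_BAD,
};

/*
 * No "default:" label on purpose: if a new enumerator is added
 * without a matching "case", -Wswitch flags this switch.
 */
static int unpack_status(enum unpack_result r, const char *name)
{
	int status = 0;

	switch (r) {
	case UR_OK:
		break;
	case UR_BAD:
		fprintf(stderr, "error: unable to unpack %s header\n", name);
		status = -1;
		break;
	}
	return status;
}
```

Adding a third enumerator without a corresponding case would then be
flagged at compile time, rather than silently falling through as
success.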

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 19 +++++++++++++++----
 object-file.c | 34 +++++++++++++++++++++-------------
 streaming.c   | 23 +++++++++++++----------
 3 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index d7189aed8fc..7239e20a625 100644
--- a/cache.h
+++ b/cache.h
@@ -1324,7 +1324,10 @@ int git_open_cloexec(const char *name, int flags);
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns:
+ *
+ * - ULHR_OK on success
+ * - ULHR_BAD on error
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
@@ -1332,9 +1335,17 @@ int git_open_cloexec(const char *name, int flags);
  * reporting. The full header will be extracted to "hdrbuf" for use
  * with parse_loose_header().
  */
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
-			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result {
+	ULHR_OK,
+	ULHR_BAD,
+};
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *hdrbuf);
+
 struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 1327872cbf4..e0f508415dd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,10 +1255,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz,
-			struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *header)
 {
 	int status;
 
@@ -1274,13 +1276,13 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return -1;
+		return ULHR_BAD;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return 0;
+		return ULHR_OK;
 
 	/*
 	 * We have a header longer than MAX_HEADER_LEN. The "header"
@@ -1288,7 +1290,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return ULHR_BAD;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1309,7 +1311,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return ULHR_BAD;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1474,13 +1476,19 @@ static int loose_object_info(struct repository *r,
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
-				allow_unknown ? &hdrbuf : NULL) < 0)
+	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				    allow_unknown ? &hdrbuf : NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+		break;
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..6df0247a4cb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -229,17 +229,16 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
-	if ((unpack_loose_header(&st->z,
-				 st->u.loose.mapped,
-				 st->u.loose.mapsize,
-				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr),
-				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
-		git_inflate_end(&st->z);
-		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-		return -1;
+	switch (unpack_loose_header(&st->z, st->u.loose.mapped,
+				    st->u.loose.mapsize, st->u.loose.hdr,
+				    sizeof(st->u.loose.hdr), NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
+		goto error;
 	}
+	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
 	st->u.loose.hdr_avail = st->z.total_out;
@@ -248,6 +247,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->read = read_istream_loose;
 
 	return 0;
+error:
+	git_inflate_end(&st->z);
+	munmap(st->u.loose.mapped, st->u.loose.mapsize);
+	return -1;
 }
 
 
-- 
2.33.0.1327.g9926af6cb02



* [PATCH v8 14/17] object-file.c: return ULHR_TOO_LONG on "header too long"
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (12 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
                                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
ULHR_BAD error that unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows,
we already correctly emit zlib errors from zlib.c in this case, so we
have no need to carry those return codes further down the stack.
Let's instead just return ULHR_TOO_LONG to say we ran into the
MAX_HEADER_LEN limit, or ULHR_BAD for other "unable to unpack <OID>
header" errors.
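A rough standalone sketch of the three-way split, assuming a 32-byte
limit (that's the MAX_HEADER_LEN the updated t1006-cat-file.sh
expectations in this patch print). check_header() and header_status()
are hypothetical, and only look for the terminating NUL in an
already-inflated buffer:

```c
#include <stdio.h>
#include <string.h>

/* Assumption: mirrors MAX_HEADER_LEN (32 per the t1006 expectations). */
#define HDR_LIMIT 32

enum hdr_result {
	HDR_OK,
	HDR_BAD,
	HDR_TOO_LONG,
};

/* Classify an already-inflated buffer by where its header's NUL is. */
static enum hdr_result check_header(const char *data, size_t len)
{
	size_t n = len < HDR_LIMIT ? len : HDR_LIMIT;

	if (memchr(data, '\0', n))
		return HDR_OK;
	/* No NUL within the limit, but more data follows: too long. */
	return len > HDR_LIMIT ? HDR_TOO_LONG : HDR_BAD;
}

static int header_status(const char *name, const char *data, size_t len)
{
	switch (check_header(data, len)) {
	case HDR_OK:
		return 0;
	case HDR_BAD:
		fprintf(stderr, "error: unable to unpack %s header\n", name);
		return -1;
	case HDR_TOO_LONG:
		fprintf(stderr, "error: header for %s too long, exceeds %d bytes\n",
			name, HDR_LIMIT);
		return -1;
	}
	return -1; /* not reached for valid enum values */
}
```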

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h             | 5 ++++-
 object-file.c       | 8 ++++++--
 streaming.c         | 1 +
 t/t1006-cat-file.sh | 4 ++--
 4 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 7239e20a625..8e05392fda8 100644
--- a/cache.h
+++ b/cache.h
@@ -1328,16 +1328,19 @@ int git_open_cloexec(const char *name, int flags);
  *
  * - ULHR_OK on success
  * - ULHR_BAD on error
+ * - ULHR_TOO_LONG if the header was too long
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
+ * from this function to indicate that the header was too long.
  */
 enum unpack_loose_header_result {
 	ULHR_OK,
 	ULHR_BAD,
+	ULHR_TOO_LONG,
 };
 enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 						    unsigned char *map,
diff --git a/object-file.c b/object-file.c
index e0f508415dd..3589c5a2e33 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1290,7 +1290,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return ULHR_BAD;
+		return ULHR_TOO_LONG;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1311,7 +1311,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return ULHR_BAD;
+	return ULHR_TOO_LONG;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1484,6 +1484,10 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 		break;
+	case ULHR_TOO_LONG:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
 	}
 
 	if (status < 0) {
diff --git a/streaming.c b/streaming.c
index 6df0247a4cb..bd89c50e7b3 100644
--- a/streaming.c
+++ b/streaming.c
@@ -235,6 +235,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	case ULHR_OK:
 		break;
 	case ULHR_BAD:
+	case ULHR_TOO_LONG:
 		goto error;
 	}
 	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5b16c69c286..a5e7401af8b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -356,12 +356,12 @@ do
 			if test "$arg2" = "-p"
 			then
 				cat >expect <<-EOF
-				error: unable to unpack $bogus_long_sha1 header
+				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
 				fatal: Not a valid object name $bogus_long_sha1
 				EOF
 			else
 				cat >expect <<-EOF
-				error: unable to unpack $bogus_long_sha1 header
+				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
 				fatal: git cat-file: could not get object info
 				EOF
 			fi &&
-- 
2.33.0.1327.g9926af6cb02



* [PATCH v8 15/17] object-file.c: stop dying in parse_loose_header()
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (13 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Make parse_loose_header() return error codes and data instead of
invoking die() by itself.

For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (it should be
moved to builtin/cat-file.c), but for now I'll leave it as-is.

To make parse_loose_header() not die(), change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated
"oi->typep".

Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can instead do
that check in loose_object_info().
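A simplified standalone sketch of the new contract, with hypothetical
names: the parser only reports malformed "<type> <len>\0" input via
-1, and leaves judging the type to the caller. The enum values follow
git's "enum object_type" numbering, but unlike git's
parse_loose_header() this sketch doesn't reject leading zeros or guard
against size overflow:

```c
#include <string.h>

/* Values follow git's "enum object_type"; the names are hypothetical. */
enum obj_type {
	OT_BAD = -1,
	OT_COMMIT = 1,
	OT_TREE = 2,
	OT_BLOB = 3,
	OT_TAG = 4,
};

/* Hypothetical stand-in for the relevant "struct object_info" fields. */
struct hdr_info {
	enum obj_type *typep;
	unsigned long *sizep;
};

static enum obj_type type_from_name(const char *s, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (size_t i = 0; i < sizeof(names) / sizeof(*names); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], s, len))
			return (enum obj_type)(i + 1);
	return OT_BAD;
}

/*
 * Parse "<type> <len>\0". Returns -1 if the format is invalid, and 0
 * otherwise, even for an unknown <type>: the caller must check
 * *oi->typep for OT_BAD and decide whether that is fatal.
 */
static int parse_header(const char *hdr, struct hdr_info *oi)
{
	const char *sp = strchr(hdr, ' ');
	const char *p;
	unsigned long size = 0;

	if (!sp || sp == hdr)
		return -1;
	*oi->typep = type_from_name(hdr, (size_t)(sp - hdr));

	p = sp + 1;
	if (*p < '0' || *p > '9')
		return -1;
	while (*p >= '0' && *p <= '9')
		size = size * 10 + (unsigned long)(*p++ - '0');
	*oi->sizep = size;

	/* The length must be followed by the terminating NUL. */
	return *p ? -1 : 0;
}
```

A caller that doesn't allow unknown types then does the equivalent of
"if (!allow_unknown && type < 0) die(...)", which is the check this
patch moves into loose_object_info().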

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that
could be any negative value, as we were expecting to return the "enum
object_type".

The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.

Since parse_loose_header() doesn't need to return an arbitrary
"status" we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.

We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects. See
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it; a subsequent
commit will address that edge case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 11 +++++++--
 object-file.c | 67 +++++++++++++++++++++++++--------------------------
 streaming.c   |  3 ++-
 3 files changed, 44 insertions(+), 37 deletions(-)

diff --git a/cache.h b/cache.h
index 8e05392fda8..6c5f00c82d5 100644
--- a/cache.h
+++ b/cache.h
@@ -1349,9 +1349,16 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 						    unsigned long bufsiz,
 						    struct strbuf *hdrbuf);
 
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
 struct object_info;
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 3589c5a2e33..a70669700d0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1369,8 +1369,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1392,15 +1391,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1427,7 +1417,14 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1441,6 +1438,7 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1472,6 +1470,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1479,6 +1479,18 @@ static int loose_object_info(struct repository *r,
 	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				    allow_unknown ? &hdrbuf : NULL)) {
 	case ULHR_OK:
+		if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
+		else if (!allow_unknown && *oi->typep < 0)
+			die(_("invalid object type"));
+
+		if (!oi->contentp)
+			break;
+		*oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+		if (*oi->contentp)
+			goto cleanup;
+
+		status = -1;
 		break;
 	case ULHR_BAD:
 		status = error(_("unable to unpack %s header"),
@@ -1490,31 +1502,16 @@ static int loose_object_info(struct repository *r,
 		break;
 	}
 
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
-
-	if (status >= 0 && oi->contentp) {
-		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
-		if (!*oi->contentp) {
-			git_inflate_end(&stream);
-			status = -1;
-		}
-	} else
-		git_inflate_end(&stream);
-
+	git_inflate_end(&stream);
+cleanup:
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
-	return (status < 0) ? status : 0;
+	return status;
 }
 
 int obj_read_use_lock = 0;
@@ -2585,6 +2582,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2601,12 +2599,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/streaming.c b/streaming.c
index bd89c50e7b3..fe54665d86e 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -238,7 +239,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	case ULHR_TOO_LONG:
 		goto error;
 	}
-	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+	if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
 		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
-- 
2.33.0.1327.g9926af6cb02



* [PATCH v8 16/17] fsck: don't hard die on invalid object types
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (14 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-28  2:18                 ` [PATCH v8 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
                                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
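The "report and keep going" pattern, as a hypothetical standalone
sketch (check_all() and is_known_type() are illustrative, not
builtin/fsck.c code): each bad object sets a bit in an accumulated
error mask instead of dying, so the traversal finishes and the exit
code can still be non-zero:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the ERROR_OBJECT bit used by fsck. */
#define ERROR_OBJECT 1

static int is_known_type(const char *name)
{
	return !strcmp(name, "blob") || !strcmp(name, "tree") ||
	       !strcmp(name, "commit") || !strcmp(name, "tag");
}

/*
 * Check every object, reporting bad ones instead of die()-ing on the
 * first. Returns the accumulated error bits; "*checked" counts the
 * objects that went on to the (elided) remaining checks.
 */
static int check_all(const char *const *types, size_t n, size_t *checked)
{
	int errors_found = 0;

	*checked = 0;
	for (size_t i = 0; i < n; i++) {
		if (!is_known_type(types[i])) {
			fprintf(stderr, "error: object %zu is of unknown type '%s'\n",
				i, types[i]);
			errors_found |= ERROR_OBJECT;
			continue; /* keep checking other objects */
		}
		(*checked)++;
	}
	return errors_found;
}
```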

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 17 ++++++++++++++---
 object-file.c   | 18 ++++++------------
 object-store.h  |  6 +++---
 t/t1450-fsck.sh | 17 +++++++++--------
 4 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..3b046820750 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,11 +600,22 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
+	struct strbuf sb = STRBUF_INIT;
+	struct object_info oi = OBJECT_INFO_INIT;
+	int err = 0;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi) < 0)
+		err = error(_("%s: object corrupt or missing: %s"),
+			    oid_to_hex(oid), path);
+	if (type < 0)
+		err = error(_("%s: object is of unknown type '%s': %s"),
+			    oid_to_hex(oid), sb.buf, path);
+	if (err) {
 		errors_found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index a70669700d0..fe95285f405 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2572,18 +2572,15 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      struct object_info *oi)
 {
 	int ret = -1;
 	void *map = NULL;
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
-	oi.typep = type;
-	oi.sizep = size;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2599,15 +2596,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
 
-	if (*type == OBJ_BLOB && *size > big_file_threshold) {
+	if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
 			goto out;
 	} else {
@@ -2618,8 +2613,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index c5130d8baea..c90c41a07f7 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,6 +245,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -252,9 +253,8 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      struct object_info *oi);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bd696d21dba..167c319823a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -85,11 +85,10 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 
-		cat >expect <<-\EOF &&
-		fatal: invalid object type
-		EOF
-		test_must_fail git fsck 2>actual &&
-		test_cmp expect actual
+
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
@@ -910,7 +909,7 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	git init --bare garbage-type &&
 	(
 		cd garbage-type &&
@@ -922,8 +921,10 @@ test_expect_success 'fsck hard errors on an invalid object type' '
 		fatal: invalid object type
 		EOF
 		test_must_fail git fsck >out 2>err &&
-		test_cmp err.expect err &&
-		test_must_be_empty out
+		grep -e "^error" -e "^fatal" err >errors &&
+		test_line_count = 1 errors &&
+		grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+		grep "dangling blob $empty_blob" out
 	)
 '
 
-- 
2.33.0.1327.g9926af6cb02



* [PATCH v8 17/17] fsck: report invalid object type-path combinations
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (15 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-28  2:18                 ` Ævar Arnfjörð Bjarmason
  2021-09-29 19:50                 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28  2:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits, we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
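A minimal C sketch of that sentinel idea, assuming a simplified, hypothetical `struct oid_sketch` in place of git's `struct object_id`. The "real" OID starts out all-zero and is only overwritten once the contents have actually been hashed, so a still-null value means we failed before that point and should fall back to "object corrupt or missing". The real fsck_loose() additionally checks whether `contents` was read; that detail is left out here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's struct object_id (SHA-1 sized). */
struct oid_sketch {
	unsigned char hash[20];
};

/* All-zero value used as a "never filled in" sentinel, like null_oid(). */
static const struct oid_sketch null_oid_sketch = {{0}};

static int oid_eq(const struct oid_sketch *a, const struct oid_sketch *b)
{
	return !memcmp(a->hash, b->hash, sizeof(a->hash));
}

enum loose_error {
	LOOSE_OK,
	LOOSE_CORRUPT_OR_MISSING,
	LOOSE_HASH_PATH_MISMATCH
};

/*
 * Decide which message to emit after reading a loose object. "real" is
 * still the null sentinel unless the contents were actually hashed, so
 * only a non-null, different OID proves a hash-path mismatch.
 */
static enum loose_error classify(int read_ret,
				 const struct oid_sketch *expected,
				 const struct oid_sketch *real)
{
	if (read_ret >= 0)
		return LOOSE_OK;
	if (!oid_eq(real, &null_oid_sketch) && !oid_eq(real, expected))
		return LOOSE_HASH_PATH_MISMATCH;
	return LOOSE_CORRUPT_OR_MISSING;
}
```

The same sentinel explains the odd "0000...: object is of unknown type" output mentioned further down: if an error message prints the "real" OID before it was ever filled in, the null value leaks into the message.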

We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ git fsck
    error: garbage at end of loose object 'e69d[...]'
    error: unable to unpack contents of ./objects/e6/9d[...]
    error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]

There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ /usr/bin/git fsck
    fatal: invalid object type
    $ ~/g/git/git fsck
    error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
    error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    [...]

I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly more obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.

There are other such potential edge cases, all of which might produce
some confusing messaging, but still be handled correctly as far as
passing along errors goes. E.g. check_object_signature() can return
-1 while leaving real_oid equal to null_oid(), which happens if its
read_istream() call fails.
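The optional out-parameter that check_object_signature() gains in this patch can be sketched as follows. This is an illustration, not git's code: `struct oid_sketch`, `toy_hash()` (deliberately not SHA-1) and `check_signature_sketch()` are hypothetical. Callers that don't care about the computed OID pass NULL and the hash lands in a local temporary; an early failure returns -1 before anything is written, which is how a caller's `real_oid` can stay equal to the null OID.

```c
#include <assert.h>
#include <string.h>

struct oid_sketch {
	unsigned char hash[20];
};

/* Toy stand-in for hashing an object; NOT SHA-1, only for self-containment. */
static void toy_hash(const void *buf, size_t len, struct oid_sketch *out)
{
	const unsigned char *p = buf;
	size_t i;

	memset(out, 0, sizeof(*out));
	for (i = 0; i < len; i++)
		out->hash[i % sizeof(out->hash)] ^= p[i];
}

/*
 * Same shape as the patched check_object_signature(): "real_oidp" is
 * optional, and when it is NULL the computed hash goes to "tmp" instead.
 * Returns -1 on failure or mismatch, 0 on a match.
 */
static int check_signature_sketch(const struct oid_sketch *expected,
				  const void *buf, size_t len,
				  struct oid_sketch *real_oidp)
{
	struct oid_sketch tmp;
	struct oid_sketch *real_oid = real_oidp ? real_oidp : &tmp;

	if (!buf)
		return -1; /* early failure: *real_oidp is never written */

	toy_hash(buf, len, real_oid);
	return memcmp(expected->hash, real_oid->hash,
		      sizeof(real_oid->hash)) ? -1 : 0;
}
```

Using a pointer that defaults to a stack temporary keeps a single code path for both kinds of callers, which is why the existing call sites only needed an extra NULL argument.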

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 23 +++++++++++++++--------
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 cache.h               |  3 ++-
 object-file.c         | 21 ++++++++++-----------
 object-store.h        |  1 +
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 11 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 95e8e89e81f..8e2caf72819 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3b046820750..d925cdbae5c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -598,23 +598,30 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct object *obj;
 	enum object_type type;
 	unsigned long size;
-	void *contents;
+	unsigned char *contents = NULL;
 	int eaten;
 	struct strbuf sb = STRBUF_INIT;
 	struct object_info oi = OBJECT_INFO_INIT;
-	int err = 0;
+	struct object_id real_oid = *null_oid();
+	int ret;
 
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi) < 0)
-		err = error(_("%s: object corrupt or missing: %s"),
-			    oid_to_hex(oid), path);
+	ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
+	if (ret < 0) {
+		if (contents && !oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
+	}
 	if (type < 0)
-		err = error(_("%s: object is of unknown type '%s': %s"),
-			    oid_to_hex(oid), sb.buf, path);
-	if (err) {
+		ret = error(_("%s: object is of unknown type '%s': %s"),
+			    oid_to_hex(&real_oid), sb.buf, path);
+	if (ret < 0) {
 		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7ce69c087ec..15ae406e6b7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1415,7 +1415,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/cache.h b/cache.h
index 6c5f00c82d5..e2a203073ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1361,7 +1361,8 @@ struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 
 int finalize_object_file(const char *tmpfile, const char *filename);
 
diff --git a/object-file.c b/object-file.c
index fe95285f405..49561e31551 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1084,9 +1084,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1094,8 +1096,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1120,9 +1122,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2572,6 +2574,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi)
 {
@@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
 	char hdr[MAX_HEADER_LEN];
 	unsigned long *size = oi->sizep;
 
-	*contents = NULL;
-
 	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
@@ -2613,9 +2614,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index c90c41a07f7..17b072e5a19 100644
--- a/object-store.h
+++ b/object-store.h
@@ -253,6 +253,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi);
 
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index a5e7401af8b..0f52ca9cc82 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -512,7 +512,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 167c319823a..eb0e772f098 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -54,6 +54,7 @@ test_expect_success 'object with hash mismatch' '
 		cd hash-mismatch &&
 
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -65,7 +66,7 @@ test_expect_success 'object with hash mismatch' '
 		git update-ref refs/heads/bogus $cmt &&
 
 		test_must_fail git fsck 2>out &&
-		grep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -75,6 +76,7 @@ test_expect_success 'object with hash and type mismatch' '
 		cd hash-type-mismatch &&
 
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -87,8 +89,8 @@ test_expect_success 'object with hash and type mismatch' '
 
 
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.33.0.1327.g9926af6cb02



* Re: [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (16 preceding siblings ...)
  2021-09-28  2:18                 ` [PATCH v8 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-29 19:50                 ` Taylor Blau
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
  18 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-29 19:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 28, 2021 at 04:18:41AM +0200, Ævar Arnfjörð Bjarmason wrote:
> This improves fsck error reporting, see the examples in the commit
> messages of 16/17 and 17/17. To get there I've lib-ified more things
> in object-file.c and the general object APIs, i.e. now we'll return
> error codes instead of calling die() in these cases.
>
> v6 of this got a very detailed review from Taylor Blau (thanks a
> lot!), for the v6 see:
> https://lore.kernel.org/git/cover-v6-00.22-00000000000-20210907T104558Z-avarab@gmail.com/
>
> The v7 had a couple of trivial shellscripting issues, a typo'd
> test_oid variable, and a warning on a "test" comparison. For v7 see
> https://lore.kernel.org/git/cover-v7-00.17-00000000000-20210920T190304Z-avarab@gmail.com/

Thanks; I looked at the range-diff and it addresses both of my comments.

This series looks good to me.

Thanks,
Taylor


* [PATCH v9 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                                   ` (17 preceding siblings ...)
  2021-09-29 19:50                 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
@ 2021-09-30 13:37                 ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                                     ` (18 more replies)
  18 siblings, 19 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

This improves fsck error reporting, see the examples in the commit
messages of 16/17 and 17/17. To get there I've lib-ified more things
in object-file.c and the general object APIs, i.e. now we'll return
error codes instead of calling die() in these cases.
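As a rough sketch of what "lib-ifying" means here, assuming hypothetical names (`parse_type_sketch()`, `enum parse_type_result`, `enum toy_object_type`) that are not git's API: the function reports an unknown type through its return value instead of calling die(), so a caller such as fsck can emit a targeted error and keep checking other objects.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes; names are illustrative only. */
enum parse_type_result {
	PARSE_TYPE_OK = 0,
	PARSE_TYPE_UNKNOWN = -1
};

enum toy_object_type {
	TOY_BAD = -1,
	TOY_COMMIT = 1,
	TOY_TREE,
	TOY_BLOB,
	TOY_TAG
};

/*
 * Instead of die("invalid object type"), tell the caller what happened
 * and still fill in "*typep", so the caller decides how fatal it is.
 */
static enum parse_type_result parse_type_sketch(const char *name,
						enum toy_object_type *typep)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!strcmp(name, names[i])) {
			*typep = (enum toy_object_type)(TOY_COMMIT + i);
			return PARSE_TYPE_OK;
		}
	}
	*typep = TOY_BAD;
	return PARSE_TYPE_UNKNOWN;
}
```

A caller would then branch on the result, e.g. printing "object is of unknown type" and moving on, rather than the whole process exiting with "fatal: invalid object type".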

Status of this: Since v6 this series has been getting a thorough
review from Taylor Blau, thanks again Taylor! See [1] for the v8, [2]
for Taylor's ack on it, and [3] for my own status update on the v8 in
the last What's Cooking.

The only change since v8 is plugging a memory leak introduced in the
previous 16/17. I've been integrating my local pending patches with
some follow-up work for the in-flight ab/sanitize-leak-ci topic, which
is already proving quite useful.
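The leak fix follows the "goto cleanup" pattern described in the 16/17 range-diff. A hypothetical, self-contained sketch of that pattern (`check_one_sketch()`, `counted_free()` and the `adopted` hand-off variable are illustrative, not from git): every exit path funnels through the labels, so the buffer standing in for the type-name strbuf is always released, and `contents` is freed unless ownership was handed off, mirroring the `eaten` flag in fsck_loose().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int frees;     /* how many buffers check_one_sketch() released */
static char *adopted; /* where a handed-off ("eaten") buffer ends up */

static void counted_free(void *p)
{
	if (p)
		frees++;
	free(p);
}

static int check_one_sketch(int read_fails, int parse_fails, int hand_off)
{
	char *type_name = malloc(16); /* stands in for the strbuf */
	char *contents = NULL;
	int eaten = 0;
	int errors = 0;

	if (!type_name)
		return -1;
	strcpy(type_name, "blob");

	if (read_fails) {
		errors++;
		goto cleanup;       /* only type_name needs releasing */
	}

	contents = malloc(8);
	if (!contents || parse_fails) {
		errors++;
		goto cleanup_eaten; /* contents (if any) was not handed off */
	}

	if (hand_off) {
		adopted = contents; /* another owner frees it from now on */
		eaten = 1;
	}
	/* ... check the parsed object here ... */

cleanup_eaten:
	if (!eaten)
		counted_free(contents);
cleanup:
	counted_free(type_name);
	return errors;
}
```

Stacking the labels in reverse order of acquisition means each early exit jumps to the first label whose resources it actually holds, so no path can skip the final release.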

1. https://lore.kernel.org/git/cover-v8-00.17-00000000000-20210928T021616Z-avarab@gmail.com/
2. https://lore.kernel.org/git/YVTDgJ7wFl9DCjS+@nand.local/
3. https://lore.kernel.org/git/87czotzaru.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason (17):
  fsck tests: add test for fsck-ing an unknown type
  fsck tests: refactor one test to use a sub-repo
  fsck tests: test current hash/type mismatch behavior
  fsck tests: test for garbage appended to a loose object
  cat-file tests: move bogus_* variable declarations earlier
  cat-file tests: test for missing/bogus object with -t, -s and -p
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: make parse_loose_header_extended() public
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: use "enum" return type for unpack_loose_header()
  object-file.c: return ULHR_TOO_LONG on "header too long"
  object-file.c: stop dying in parse_loose_header()
  fsck: don't hard die on invalid object types
  fsck: report invalid object type-path combinations

 builtin/fast-export.c |   2 +-
 builtin/fsck.c        |  37 +++++--
 builtin/index-pack.c  |   2 +-
 builtin/mktag.c       |   3 +-
 cache.h               |  45 ++++++++-
 object-file.c         | 176 +++++++++++++++------------------
 object-store.h        |   7 +-
 object.c              |   4 +-
 pack-check.c          |   3 +-
 streaming.c           |  27 +++--
 t/oid-info/oid        |   2 +
 t/t1006-cat-file.sh   | 223 +++++++++++++++++++++++++++++++++++++++---
 t/t1450-fsck.sh       |  99 +++++++++++++++----
 13 files changed, 468 insertions(+), 162 deletions(-)

Range-diff against v8:
 1:  b999ab695d9 =  1:  520732612f7 fsck tests: add test for fsck-ing an unknown type
 2:  e01c21378a4 =  2:  af7086623fe fsck tests: refactor one test to use a sub-repo
 3:  93197a7bcee =  3:  102bc4f0176 fsck tests: test current hash/type mismatch behavior
 4:  277188dd58d =  4:  ff7fc09d5a1 fsck tests: test for garbage appended to a loose object
 5:  ab2ea1beaaf =  5:  278df093239 cat-file tests: move bogus_* variable declarations earlier
 6:  91229b94fac =  6:  290bf983590 cat-file tests: test for missing/bogus object with -t, -s and -p
 7:  9e95e134d30 =  7:  a41b2c571e5 cat-file tests: add corrupt loose object test
 8:  215f98ad369 =  8:  cedeb117330 cat-file tests: test for current --allow-unknown-type behavior
 9:  3e1df3594df =  9:  6f0673d38c8 object-file.c: don't set "typep" when returning non-zero
10:  b96828f3d5b = 10:  6637e8fd2ca object-file.c: return -1, not "status" from unpack_loose_header()
11:  273acb45517 = 11:  51db08ebbae object-file.c: make parse_loose_header_extended() public
12:  314d34357dd = 12:  dffe5581f6f object-file.c: simplify unpack_loose_short_header()
13:  07481bcb55c = 13:  eb7c949c8b7 object-file.c: use "enum" return type for unpack_loose_header()
14:  42b8d135c8c = 14:  f4cc7271df7 object-file.c: return ULHR_TOO_LONG on "header too long"
15:  106b7461ce9 = 15:  25d6ec668d4 object-file.c: stop dying in parse_loose_header()
16:  d01223ae322 ! 16:  6ce0414b2b7 fsck: don't hard die on invalid object types
    @@ Commit message
         f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
         for the introduction of read_loose_object().
     
    +    Since we're now passing in a "oi.type_name" we'll have to clean up the
    +    allocated "strbuf sb". That we're doing it right is asserted by
    +    e.g. the "fsck notices broken commit" test added in 03818a4a94c
    +    (split_ident: parse timestamp from end of line, 2013-10-14). To do
    +    that switch to a "goto cleanup" pattern, and while we're at it factor
    +    out the already duplicated free(content) to use that pattern.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/fsck.c ##
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
      		errors_found |= ERROR_OBJECT;
     -		error(_("%s: object corrupt or missing: %s"),
     -		      oid_to_hex(oid), path);
    - 		return 0; /* keep checking other objects */
    +-		return 0; /* keep checking other objects */
    ++		goto cleanup;
    + 	}
    + 
    + 	if (!contents && type != OBJ_BLOB)
    +@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    + 		errors_found |= ERROR_OBJECT;
    + 		error(_("%s: object could not be parsed: %s"),
    + 		      oid_to_hex(oid), path);
    +-		if (!eaten)
    +-			free(contents);
    +-		return 0; /* keep checking other objects */
    ++		goto cleanup_eaten;
      	}
      
    + 	obj->flags &= ~(REACHABLE | SEEN);
    +@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    + 	if (fsck_obj(obj, contents, size))
    + 		errors_found |= ERROR_OBJECT;
    + 
    ++cleanup_eaten:
    + 	if (!eaten)
    + 		free(contents);
    ++cleanup:
    ++	strbuf_release(&sb);
    + 	return 0; /* keep checking other objects, even if we saw an error */
    + }
    + 
     
      ## object-file.c ##
     @@ object-file.c: static int check_stream_oid(git_zstream *stream,
17:  7f394a991a6 ! 17:  8d926e41fc3 fsck: report invalid object type-path combinations
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
     +			    oid_to_hex(&real_oid), sb.buf, path);
     +	if (ret < 0) {
      		errors_found |= ERROR_OBJECT;
    - 		return 0; /* keep checking other objects */
    + 		goto cleanup;
      	}
     
      ## builtin/index-pack.c ##
-- 
2.33.0.1374.g05459a61530



* [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 19:22                     ` Andrei Rybak
  2021-09-30 13:37                   ` [PATCH v9 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                                     ` (17 subsequent siblings)
  18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..969bfbbdd8f 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	git init --bare garbage-type &&
+	(
+		cd garbage-type &&
+
+		empty=$(git hash-object --stdin -w -t blob </dev/null) &&
+		garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
+
+		cat >err.expect <<-\EOF &&
+		fatal: invalid object type
+		EOF
+		test_must_fail git fsck >out 2>err &&
+		test_cmp err.expect err &&
+		test_must_be_empty out
+	)
+'
+
 test_done
-- 
2.33.0.1374.g05459a61530



* [PATCH v9 02/17] fsck tests: refactor one test to use a sub-repo
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
                                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.

We can instead use the pattern of creating a named sub-repository.
Then we don't have to worry about cleaning up after ourselves; nobody
will care what state the broken "hash-mismatch" repository is in after
this test runs.

See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.

1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 969bfbbdd8f..f8edd15abf8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,25 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
+test_expect_success 'object with hash mismatch' '
+	git init --bare hash-mismatch &&
+	(
+		cd hash-mismatch &&
 
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+
+		test_must_fail git fsck 2>out &&
+		grep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.33.0.1374.g05459a61530



* [PATCH v9 03/17] fsck tests: test current hash/type mismatch behavior
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
                                     ` (15 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

If we move an object around between .git/objects/?? directories to
simulate a hash mismatch, "git fsck" will currently hard die() in
object-file.c. This behavior will be fixed in subsequent commits, but
let's test for it as-is for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f8edd15abf8..175ed304637 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,6 +69,30 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	git init --bare hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+
+		cat >expect <<-\EOF &&
+		fatal: invalid object type
+		EOF
+		test_must_fail git fsck 2>actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 04/17] fsck tests: test for garbage appended to a loose object
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (2 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
                                     ` (14 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

There weren't any output tests for this scenario. Let's ensure that we
don't regress on it in the changes that come after this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 175ed304637..bd696d21dba 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -93,6 +93,26 @@ test_expect_success 'object with hash and type mismatch' '
 	)
 '
 
+test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
+	git init --bare corrupt-loose-output &&
+	(
+		cd corrupt-loose-output &&
+		oid=$(git hash-object -w --stdin --literally </dev/null) &&
+		oidf=objects/$(test_oid_to_path "$oid") &&
+		chmod 755 $oidf &&
+		echo extra garbage >>$oidf &&
+
+		cat >expect.error <<-EOF &&
+		error: garbage at end of loose object '\''$oid'\''
+		error: unable to unpack contents of ./$oidf
+		error: $oid: object corrupt or missing: ./$oidf
+		EOF
+		test_must_fail git fsck 2>actual &&
+		grep ^error: actual >error &&
+		test_cmp expect.error error
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 05/17] cat-file tests: move bogus_* variable declarations earlier
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (3 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
                                     ` (13 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Change the short/long bogus object type variables into a form where
the two sets can be used concurrently. This'll be used by subsequently
added tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..ea6a53d425b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,36 +315,39 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
-bogus_type="bogus"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'setup bogus data' '
+	bogus_short_type="bogus" &&
+	bogus_short_content="bogus" &&
+	bogus_short_size=$(strlen "$bogus_short_content") &&
+	bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+
+	bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
+	bogus_long_content="bogus" &&
+	bogus_long_size=$(strlen "$bogus_long_content") &&
+	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+'
 
 test_expect_success "Type of broken object is correct" '
-	echo $bogus_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_short_type >expect &&
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of broken object is correct" '
-	echo $bogus_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_short_size >expect &&
+	git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
 	test_cmp expect actual
 '
-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
 test_expect_success "Type of broken object is correct when type is large" '
-	echo $bogus_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_long_type >expect &&
+	git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of large broken object is correct when type is large" '
-	echo $bogus_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_long_size >expect &&
+	git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
 	test_cmp expect actual
 '
 
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (4 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                                     ` (12 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

When we look up a missing object with cat_one_file() what error we
print out currently depends on whether we'll error out early in
get_oid_with_context(), or if we'll get an error later from
oid_object_info_extended().

The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
not.

The "-p" flag is yet another special case. For the deadbeef OID it
prints the same output that the "-s" and "-t" options emit for the
deadbeef_short OID. It also doesn't support the
"--allow-unknown-type" flag at all.

Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.

This extends tests added in 3e370f9faf0 (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/oid-info/oid      |  2 ++
 t/t1006-cat-file.sh | 75 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/t/oid-info/oid b/t/oid-info/oid
index a754970523c..7547d2c7903 100644
--- a/t/oid-info/oid
+++ b/t/oid-info/oid
@@ -27,3 +27,5 @@ numeric		sha1:0123456789012345678901234567890123456789
 numeric		sha256:0123456789012345678901234567890123456789012345678901234567890123
 deadbeef	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
 deadbeef	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef_short	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
+deadbeef_short	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ea6a53d425b..abf57339a29 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
 	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
 '
 
+for arg1 in '' --allow-unknown-type
+do
+	for arg2 in -s -t -p
+	do
+		if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
+		then
+			continue
+		fi
+
+
+		test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
+			cat >expect <<-\EOF &&
+			fatal: invalid object type
+			EOF
+
+			if test "$arg1" = "--allow-unknown-type"
+			then
+				git cat-file $arg1 $arg2 $bogus_short_sha1
+			else
+				test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+				test_must_be_empty out &&
+				test_cmp expect actual
+			fi
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
+			if test "$arg2" = "-p"
+			then
+				cat >expect <<-EOF
+				error: unable to unpack $bogus_long_sha1 header
+				fatal: Not a valid object name $bogus_long_sha1
+				EOF
+			else
+				cat >expect <<-EOF
+				error: unable to unpack $bogus_long_sha1 header
+				fatal: git cat-file: could not get object info
+				EOF
+			fi &&
+
+			if test "$arg1" = "--allow-unknown-type"
+			then
+				git cat-file $arg1 $arg2 $bogus_short_sha1
+			else
+				test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+				test_must_be_empty out &&
+				test_cmp expect actual
+			fi
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
+			cat >expect.err <<-EOF &&
+			fatal: Not a valid object name $(test_oid deadbeef_short)
+			EOF
+			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
+			test_must_be_empty out
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
+			if test "$arg2" = "-p"
+			then
+				cat >expect.err <<-EOF
+				fatal: Not a valid object name $(test_oid deadbeef)
+				EOF
+			else
+				cat >expect.err <<-\EOF
+				fatal: git cat-file: could not get object info
+				EOF
+			fi &&
+			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
+			test_must_be_empty out &&
+			test_cmp expect.err err.actual
+		'
+	done
+done
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 07/17] cat-file tests: add corrupt loose object test
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (5 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                                     ` (11 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib, we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index abf57339a29..15774979ad3 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -426,6 +426,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 08/17] cat-file tests: test for current --allow-unknown-type behavior
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (6 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
                                     ` (10 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 15774979ad3..5b16c69c286 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -402,6 +402,67 @@ do
 	done
 done
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_short_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_short_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_short_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_short_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	test_when_finished "rm -rf .git/refs/replace" &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_short_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 09/17] object-file.c: don't set "typep" when returning non-zero
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (7 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                                     ` (9 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error, stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having carefully read the code paths involved, I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This could introduce a subtle bug where a caller of
oid_object_info_extended() inspects its "typep" and expects a
meaningful value even though the function returned -1.

Such a problem would not occur for its simpler sister function
oid_object_info(). That one always returns an "enum object_type",
which on error is OBJ_BAD (i.e. -1).

Having read the code for all the callers of these functions, I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..766ba88b851 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1525,8 +1525,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 10/17] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (8 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                                     ` (8 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 766ba88b851..8475b128944 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1284,7 +1284,7 @@ int unpack_loose_header(git_zstream *stream,
 					       buffer, bufsiz);
 
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/* Make sure we have the terminating NUL */
 	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 11/17] object-file.c: make parse_loose_header_extended() public
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (9 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
                                     ` (7 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public under the
parse_loose_header() name, replacing the old parse_loose_header()
wrapper. The only direct user of it outside of object-file.c itself
was in streaming.c; that caller can simply pass the required
"struct object_info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it leaves
the API in a better end state.

It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.

1. https://lore.kernel.org/git/patch-v6-09.22-5b9278e7bb4-20210907T104559Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       |  4 +++-
 object-file.c | 20 +++++++-------------
 streaming.c   |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index f6295f3b048..35f254dae4a 100644
--- a/cache.h
+++ b/cache.h
@@ -1320,7 +1320,9 @@ char *xdg_cache_home(const char *filename);
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
 int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+struct object_info;
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 8475b128944..6b91c4edcf6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1385,8 +1385,8 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1446,14 +1446,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1508,10 +1500,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2599,6 +2591,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2613,7 +2607,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 12/17] object-file.c: simplify unpack_loose_short_header()
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (10 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
                                     ` (6 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.
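To illustrate the "optional output buffer" shape, here is a minimal sketch. It is not git's code. It assumes the header bytes have already been inflated into a plain array, so zlib is left out entirely. `struct growbuf`, `growbuf_add()` and `read_header_sketch()` are hypothetical stand-ins for `struct strbuf` and unpack_loose_header().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's "struct strbuf". */
struct growbuf {
	char *buf;
	size_t len;
};

/* Append "n" bytes and keep the buffer NUL-terminated. */
static int growbuf_add(struct growbuf *gb, const char *data, size_t n)
{
	char *p = realloc(gb->buf, gb->len + n + 1);

	if (!p)
		return -1;
	memcpy(p + gb->len, data, n);
	gb->len += n;
	p[gb->len] = '\0';
	gb->buf = p;
	return 0;
}

/*
 * Copy the "<type> <len>\0" header from an already-inflated "src"
 * into the fixed-size "buf". Only when the optional "full" buffer is
 * non-NULL do we look past "bufsiz" bytes to capture an over-long
 * header. Returns 0 on success, -1 on error.
 */
static int read_header_sketch(const char *src, size_t srclen,
			      char *buf, size_t bufsiz,
			      struct growbuf *full)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	const char *nul;

	memcpy(buf, src, n);
	if (memchr(buf, '\0', n))
		return 0; /* the whole header fit in "buf" */
	if (!full)
		return -1; /* too long, and the caller didn't ask for more */

	nul = memchr(src, '\0', srclen);
	if (!nul)
		return -1; /* no terminating NUL anywhere */
	if (growbuf_add(full, src, (size_t)(nul - src)) < 0)
		return -1;
	return 0;
}
```

Callers that only care about well-formed objects pass NULL and get -1 for an over-long header; only a caller in the --allow-unknown-type position passes a buffer.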

I think the remaining unpack_loose_header() function could be
simplified further: we're carrying some complexity just to be able to
emit a garbage type longer than MAX_HEADER_LEN, when we could instead
just say "we found a garbage type <first 32 bytes>...". But let's
leave the current behavior in place for now.
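The alternative message mentioned above could be produced with a "%.*s" precision, so nothing beyond the first 32 bytes ever needs to be kept. This is a hypothetical sketch: `format_garbage_type()` is made up, and MAX_HEADER_LEN is assumed to be 32, as the t1006 test output later in this series suggests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HEADER_LEN 32 /* assumed, per the t1006 test output */

/*
 * Describe a header whose type could not be parsed, showing at most
 * the first MAX_HEADER_LEN bytes and an ellipsis if it was truncated.
 * "hdr" need not be NUL-terminated; "%.*s" reads at most "shown" bytes.
 */
static int format_garbage_type(char *out, size_t outsz,
			       const char *hdr, size_t hdrlen)
{
	int shown = hdrlen > MAX_HEADER_LEN ? MAX_HEADER_LEN : (int)hdrlen;

	return snprintf(out, outsz, "we found a garbage type %.*s%s",
			shown, hdr, hdrlen > MAX_HEADER_LEN ? "..." : "");
}
```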

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 17 ++++++++++++++-
 object-file.c | 58 ++++++++++++++++++---------------------------------
 streaming.c   |  3 ++-
 3 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/cache.h b/cache.h
index 35f254dae4a..d7189aed8fc 100644
--- a/cache.h
+++ b/cache.h
@@ -1319,7 +1319,22 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
+
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 6b91c4edcf6..1327872cbf4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,11 +1255,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1270,35 +1271,8 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
-	if (status < Z_OK)
-		return -1;
-
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
 	if (status < Z_OK)
 		return -1;
 
@@ -1308,6 +1282,14 @@ static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1457,6 +1439,7 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1490,11 +1473,9 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				allow_unknown ? &hdrbuf : NULL) < 0)
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	if (status < 0)
@@ -2602,7 +2583,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.33.0.1374.g05459a61530



* [PATCH v9 13/17] object-file.c: use "enum" return type for unpack_loose_header()
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (11 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
                                     ` (5 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

In a preceding commit we changed and documented unpack_loose_header()
from its previous behavior of returning any negative value or zero, to
only -1 or 0.

Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.
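A minimal sketch of why the switch statements in this patch have no "default:" label: with -Wswitch (part of GCC's -Wall) the compiler warns about any enumerator that isn't handled, so adding a new result value flags every switch that needs updating. `result_to_status()` is hypothetical.

```c
#include <assert.h>

/* Mirrors the two-value enum added by this patch. */
enum unpack_loose_header_result {
	ULHR_OK,
	ULHR_BAD,
};

/*
 * No "default:" label on purpose: if a new enumerator is added to the
 * enum but not handled here, -Wswitch reports it.
 */
static int result_to_status(enum unpack_loose_header_result r)
{
	switch (r) {
	case ULHR_OK:
		return 0;
	case ULHR_BAD:
		return -1;
	}
	return -1; /* not reached for valid enum values */
}
```

The trailing return after the switch only keeps compilers from warning that control can reach the end of a non-void function.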

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 19 +++++++++++++++----
 object-file.c | 34 +++++++++++++++++++++-------------
 streaming.c   | 23 +++++++++++++----------
 3 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index d7189aed8fc..7239e20a625 100644
--- a/cache.h
+++ b/cache.h
@@ -1324,7 +1324,10 @@ int git_open_cloexec(const char *name, int flags);
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns:
+ *
+ * - ULHR_OK on success
+ * - ULHR_BAD on error
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
@@ -1332,9 +1335,17 @@ int git_open_cloexec(const char *name, int flags);
  * reporting. The full header will be extracted to "hdrbuf" for use
  * with parse_loose_header().
  */
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
-			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result {
+	ULHR_OK,
+	ULHR_BAD,
+};
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *hdrbuf);
+
 struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 1327872cbf4..e0f508415dd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,10 +1255,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz,
-			struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *header)
 {
 	int status;
 
@@ -1274,13 +1276,13 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return -1;
+		return ULHR_BAD;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return 0;
+		return ULHR_OK;
 
 	/*
 	 * We have a header longer than MAX_HEADER_LEN. The "header"
@@ -1288,7 +1290,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return ULHR_BAD;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1309,7 +1311,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return ULHR_BAD;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1474,13 +1476,19 @@ static int loose_object_info(struct repository *r,
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
-				allow_unknown ? &hdrbuf : NULL) < 0)
+	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				    allow_unknown ? &hdrbuf : NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+		break;
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..6df0247a4cb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -229,17 +229,16 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
-	if ((unpack_loose_header(&st->z,
-				 st->u.loose.mapped,
-				 st->u.loose.mapsize,
-				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr),
-				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
-		git_inflate_end(&st->z);
-		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-		return -1;
+	switch (unpack_loose_header(&st->z, st->u.loose.mapped,
+				    st->u.loose.mapsize, st->u.loose.hdr,
+				    sizeof(st->u.loose.hdr), NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
+		goto error;
 	}
+	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
 	st->u.loose.hdr_avail = st->z.total_out;
@@ -248,6 +247,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->read = read_istream_loose;
 
 	return 0;
+error:
+	git_inflate_end(&st->z);
+	munmap(st->u.loose.mapped, st->u.loose.mapsize);
+	return -1;
 }
 
 
-- 
2.33.0.1374.g05459a61530



* [PATCH v9 14/17] object-file.c: return ULHR_TOO_LONG on "header too long"
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (12 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
                                     ` (4 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows,
we already correctly emit zlib errors from zlib.c in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG saying we ran into the
MAX_HEADER_LEN limit, or ULHR_BAD for "unable to unpack <OID>
header".
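A rough sketch of how the three results can be told apart once the fixed-size header buffer has been filled. The `zlib_ok` and `inflated` parameters of the hypothetical `classify_header()` are simplifications of the git_inflate() status and the `stream->next_out` arithmetic, and MAX_HEADER_LEN is assumed to be 32 per the updated test.

```c
#include <assert.h>
#include <string.h>

#define MAX_HEADER_LEN 32 /* assumed, per the t1006 test output */

enum unpack_loose_header_result {
	ULHR_OK,
	ULHR_BAD,
	ULHR_TOO_LONG,
};

/*
 * "inflated" is how many bytes landed in the MAX_HEADER_LEN-sized
 * "buf"; "zlib_ok" says whether inflation itself succeeded.
 */
static enum unpack_loose_header_result
classify_header(const char *buf, size_t inflated, int zlib_ok)
{
	if (!zlib_ok)
		return ULHR_BAD; /* zlib has already reported the details */
	if (memchr(buf, '\0', inflated))
		return ULHR_OK; /* the whole header fit */
	/* Like the "!header" branch: no NUL in the buffer means too long. */
	return ULHR_TOO_LONG;
}
```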

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h             | 5 ++++-
 object-file.c       | 8 ++++++--
 streaming.c         | 1 +
 t/t1006-cat-file.sh | 4 ++--
 4 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 7239e20a625..8e05392fda8 100644
--- a/cache.h
+++ b/cache.h
@@ -1328,16 +1328,19 @@ int git_open_cloexec(const char *name, int flags);
  *
  * - ULHR_OK on success
  * - ULHR_BAD on error
+ * - ULHR_TOO_LONG if the header was too long
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
+ * from this function to indicate that the header was too long.
  */
 enum unpack_loose_header_result {
 	ULHR_OK,
 	ULHR_BAD,
+	ULHR_TOO_LONG,
 };
 enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 						    unsigned char *map,
diff --git a/object-file.c b/object-file.c
index e0f508415dd..3589c5a2e33 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1290,7 +1290,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return ULHR_BAD;
+		return ULHR_TOO_LONG;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1311,7 +1311,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return ULHR_BAD;
+	return ULHR_TOO_LONG;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1484,6 +1484,10 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 		break;
+	case ULHR_TOO_LONG:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
 	}
 
 	if (status < 0) {
diff --git a/streaming.c b/streaming.c
index 6df0247a4cb..bd89c50e7b3 100644
--- a/streaming.c
+++ b/streaming.c
@@ -235,6 +235,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	case ULHR_OK:
 		break;
 	case ULHR_BAD:
+	case ULHR_TOO_LONG:
 		goto error;
 	}
 	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5b16c69c286..a5e7401af8b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -356,12 +356,12 @@ do
 			if test "$arg2" = "-p"
 			then
 				cat >expect <<-EOF
-				error: unable to unpack $bogus_long_sha1 header
+				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
 				fatal: Not a valid object name $bogus_long_sha1
 				EOF
 			else
 				cat >expect <<-EOF
-				error: unable to unpack $bogus_long_sha1 header
+				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
 				fatal: git cat-file: could not get object info
 				EOF
 			fi &&
-- 
2.33.0.1374.g05459a61530



* [PATCH v9 15/17] object-file.c: stop dying in parse_loose_header()
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (13 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                                     ` (3 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Make parse_loose_header() return error codes and data instead of
invoking die() by itself.

For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (it should be
moved to builtin/cat-file.c), but for now I'll leave it as-is.

To make parse_loose_header() not die(), change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated
"oi->typep".

Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
that check in loose_object_info().
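A minimal sketch of the contract documented for parse_loose_header() in this patch: a format error returns -1, while an unknown type still returns 0 and is reported through the type out-pointer. `parse_header_sketch()`, `struct info_sketch` and `type_from_name()` are hypothetical; only the OBJ_* values are meant to match git's `enum object_type`.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Values mirror git's "enum object_type"; the rest is hypothetical. */
enum obj_type {
	OBJ_BAD = -1,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4,
};

/* Trimmed-down stand-in for "struct object_info". */
struct info_sketch {
	enum obj_type *typep;
	unsigned long *sizep;
};

static enum obj_type type_from_name(const char *name, size_t len)
{
	static const struct {
		const char *name;
		enum obj_type type;
	} types[] = {
		{ "commit", OBJ_COMMIT },
		{ "tree", OBJ_TREE },
		{ "blob", OBJ_BLOB },
		{ "tag", OBJ_TAG },
	};
	size_t i;

	for (i = 0; i < sizeof(types) / sizeof(types[0]); i++)
		if (strlen(types[i].name) == len &&
		    !memcmp(types[i].name, name, len))
			return types[i].type;
	return OBJ_BAD;
}

/*
 * Parse a NUL-terminated "<type> <len>" header. Returns -1 only when
 * the format is wrong; an unknown <type> still returns 0 and sets
 * *oi->typep to OBJ_BAD, leaving "is that fatal?" to the caller.
 */
static int parse_header_sketch(const char *hdr, struct info_sketch *oi)
{
	const char *sp = strchr(hdr, ' ');
	unsigned long size = 0;

	if (!sp)
		return -1;
	if (oi->typep)
		*oi->typep = type_from_name(hdr, (size_t)(sp - hdr));

	hdr = sp + 1;
	if (*hdr < '0' || *hdr > '9')
		return -1;
	if (*hdr == '0' && hdr[1] != '\0')
		return -1; /* reject leading zeros */
	while (*hdr >= '0' && *hdr <= '9') {
		unsigned long digit = (unsigned long)(*hdr++ - '0');

		if (size > (ULONG_MAX - digit) / 10)
			return -1; /* would overflow */
		size = size * 10 + digit;
	}
	if (oi->sizep)
		*oi->sizep = size;

	/* The length must be followed by the terminating NUL. */
	return *hdr ? -1 : 0;
}
```

A caller that can't handle unknown types then checks the type itself, much as loose_object_info() does with `!allow_unknown && *oi->typep < 0` in this patch.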

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that
could be any negative value, as we were expecting to return the "enum
object_type".

The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info"), we don't need
to conflate these two cases in its callers.

Since parse_loose_header() doesn't need to return an arbitrary
"status" we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.

We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects; see
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it; a subsequent
commit will address that edge case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 11 +++++++--
 object-file.c | 67 +++++++++++++++++++++++++--------------------------
 streaming.c   |  3 ++-
 3 files changed, 44 insertions(+), 37 deletions(-)

diff --git a/cache.h b/cache.h
index 8e05392fda8..6c5f00c82d5 100644
--- a/cache.h
+++ b/cache.h
@@ -1349,9 +1349,16 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 						    unsigned long bufsiz,
 						    struct strbuf *hdrbuf);
 
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
 struct object_info;
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 3589c5a2e33..a70669700d0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1369,8 +1369,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1392,15 +1391,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1427,7 +1417,14 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1441,6 +1438,7 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1472,6 +1470,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1479,6 +1479,18 @@ static int loose_object_info(struct repository *r,
 	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				    allow_unknown ? &hdrbuf : NULL)) {
 	case ULHR_OK:
+		if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
+		else if (!allow_unknown && *oi->typep < 0)
+			die(_("invalid object type"));
+
+		if (!oi->contentp)
+			break;
+		*oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+		if (*oi->contentp)
+			goto cleanup;
+
+		status = -1;
 		break;
 	case ULHR_BAD:
 		status = error(_("unable to unpack %s header"),
@@ -1490,31 +1502,16 @@ static int loose_object_info(struct repository *r,
 		break;
 	}
 
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
-
-	if (status >= 0 && oi->contentp) {
-		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
-		if (!*oi->contentp) {
-			git_inflate_end(&stream);
-			status = -1;
-		}
-	} else
-		git_inflate_end(&stream);
-
+	git_inflate_end(&stream);
+cleanup:
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
-	return (status < 0) ? status : 0;
+	return status;
 }
 
 int obj_read_use_lock = 0;
@@ -2585,6 +2582,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2601,12 +2599,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/streaming.c b/streaming.c
index bd89c50e7b3..fe54665d86e 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -238,7 +239,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	case ULHR_TOO_LONG:
 		goto error;
 	}
-	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+	if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
 		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
-- 
2.33.0.1374.g05459a61530



* [PATCH v9 16/17] fsck: don't hard die on invalid object types
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (14 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 13:37                   ` [PATCH v9 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
                                     ` (2 subsequent siblings)
  18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
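A minimal sketch of the "report and keep going" behavior, assuming a hypothetical `struct fake_object` list in place of the real object traversal; the ERROR_OBJECT bit value is an assumption, and the message format follows the example output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ERROR_OBJECT 01 /* bit value assumed for this sketch */

struct fake_object {
	const char *oid;       /* object name, shortened for the example */
	const char *type_name; /* type string from the object header */
};

static int is_known_type(const char *name)
{
	return !strcmp(name, "blob") || !strcmp(name, "tree") ||
	       !strcmp(name, "commit") || !strcmp(name, "tag");
}

/*
 * Check every object, reporting problems with error()-style messages
 * instead of dying on the first one. Returns a bitmask of the error
 * kinds seen; "*checked" counts how many objects were visited.
 */
static unsigned check_all(const struct fake_object *objs, size_t nr,
			  size_t *checked)
{
	unsigned errors_found = 0;
	size_t i;

	*checked = 0;
	for (i = 0; i < nr; i++) {
		(*checked)++;
		if (!is_known_type(objs[i].type_name)) {
			fprintf(stderr, "error: %s: object is of unknown type '%s'\n",
				objs[i].oid, objs[i].type_name);
			errors_found |= ERROR_OBJECT;
			continue; /* keep checking other objects */
		}
		/* ... other per-object checks would go here ... */
	}
	return errors_found;
}
```

The caller exits non-zero if the returned mask is non-zero, but only after every object has been visited.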

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().
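Taking a "struct object_info" also means each out-field is optional. Below is a minimal sketch of the scratch-pointer idiom loose_object_info() uses for that (see `size_scratch` and `type_scratch` in the preceding patch). `struct info_sketch` and `wants_streaming()` are hypothetical, and the threshold comparison only loosely mimics the big_file_threshold check in read_loose_object().

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_TREE 2 /* matches git's enum object_type value */
#define OBJ_BLOB 3 /* matches git's enum object_type value */

/* Trimmed-down, hypothetical analogue of "struct object_info". */
struct info_sketch {
	int *typep;           /* NULL: caller doesn't care */
	unsigned long *sizep; /* NULL: caller doesn't care */
};

/*
 * Internally we always need both values, so point NULL fields at
 * local scratch variables, then reset them to NULL before returning
 * so the caller never sees a pointer into our stack frame.
 */
static int wants_streaming(int parsed_type, unsigned long parsed_size,
			   unsigned long threshold, struct info_sketch *oi)
{
	int type_scratch;
	unsigned long size_scratch;
	int ret;

	if (!oi->typep)
		oi->typep = &type_scratch;
	if (!oi->sizep)
		oi->sizep = &size_scratch;

	/* Stand-in for parse_loose_header() filling in the struct. */
	*oi->typep = parsed_type;
	*oi->sizep = parsed_size;

	/* Both fields can now be dereferenced unconditionally. */
	ret = *oi->typep == OBJ_BLOB && *oi->sizep > threshold;

	if (oi->typep == &type_scratch)
		oi->typep = NULL;
	if (oi->sizep == &size_scratch)
		oi->sizep = NULL;
	return ret;
}
```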

Since we're now passing in an "oi.type_name" we'll have to clean up the
allocated "strbuf sb". That we're doing it right is asserted by
e.g. the "fsck notices broken commit" test added in 03818a4a94c
(split_ident: parse timestamp from end of line, 2013-10-14). To do
that, switch to a "goto cleanup" pattern, and while we're at it factor
out the already duplicated free(content) to use that pattern.
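A minimal sketch of the layered "goto cleanup" idiom: labels are ordered so that a failure after a later allocation falls through every release that applies. The allocation counters and all names are hypothetical, and `cleanup_contents` stands in for the patch's `cleanup_eaten` without modeling the "eaten" ownership hand-off.

```c
#include <assert.h>
#include <stdlib.h>

static int allocs, frees; /* hypothetical leak counters */

static void *tracked_malloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		frees++;
		free(p);
	}
}

static int check_object_sketch(int read_ok, int parse_ok)
{
	char *type_name;      /* always allocated, like "sb" */
	char *contents = NULL;
	int err = 0;

	type_name = tracked_malloc(16);
	if (!type_name)
		return -1;

	if (!read_ok) {
		err = -1;
		goto cleanup; /* "contents" was never allocated */
	}
	contents = tracked_malloc(64);
	if (!contents || !parse_ok) {
		err = -1;
		goto cleanup_contents;
	}
	/* ... further checks on "contents" would go here ... */

cleanup_contents:
	tracked_free(contents);
cleanup:
	tracked_free(type_name);
	return err;
}
```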

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 26 +++++++++++++++++++-------
 object-file.c   | 18 ++++++------------
 object-store.h  |  6 +++---
 t/t1450-fsck.sh | 17 +++++++++--------
 4 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..623f8fc3194 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,12 +600,23 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
+	struct strbuf sb = STRBUF_INIT;
+	struct object_info oi = OBJECT_INFO_INIT;
+	int err = 0;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi) < 0)
+		err = error(_("%s: object corrupt or missing: %s"),
+			    oid_to_hex(oid), path);
+	if (type < 0)
+		err = error(_("%s: object is of unknown type '%s': %s"),
+			    oid_to_hex(oid), sb.buf, path);
+	if (err) {
 		errors_found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
-		return 0; /* keep checking other objects */
+		goto cleanup;
 	}
 
 	if (!contents && type != OBJ_BLOB)
@@ -618,9 +629,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object could not be parsed: %s"),
 		      oid_to_hex(oid), path);
-		if (!eaten)
-			free(contents);
-		return 0; /* keep checking other objects */
+		goto cleanup_eaten;
 	}
 
 	obj->flags &= ~(REACHABLE | SEEN);
@@ -628,8 +637,11 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	if (fsck_obj(obj, contents, size))
 		errors_found |= ERROR_OBJECT;
 
+cleanup_eaten:
 	if (!eaten)
 		free(contents);
+cleanup:
+	strbuf_release(&sb);
 	return 0; /* keep checking other objects, even if we saw an error */
 }
 
diff --git a/object-file.c b/object-file.c
index a70669700d0..fe95285f405 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2572,18 +2572,15 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      struct object_info *oi)
 {
 	int ret = -1;
 	void *map = NULL;
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
-	oi.typep = type;
-	oi.sizep = size;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2599,15 +2596,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
 
-	if (*type == OBJ_BLOB && *size > big_file_threshold) {
+	if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
 			goto out;
 	} else {
@@ -2618,8 +2613,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index c5130d8baea..c90c41a07f7 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,6 +245,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -252,9 +253,8 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      struct object_info *oi);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bd696d21dba..167c319823a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -85,11 +85,10 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 
-		cat >expect <<-\EOF &&
-		fatal: invalid object type
-		EOF
-		test_must_fail git fsck 2>actual &&
-		test_cmp expect actual
+
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
@@ -910,7 +909,7 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	git init --bare garbage-type &&
 	(
 		cd garbage-type &&
@@ -922,8 +921,10 @@ test_expect_success 'fsck hard errors on an invalid object type' '
 		fatal: invalid object type
 		EOF
 		test_must_fail git fsck >out 2>err &&
-		test_cmp err.expect err &&
-		test_must_be_empty out
+		grep -e "^error" -e "^fatal" err >errors &&
+		test_line_count = 1 errors &&
+		grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+		grep "dangling blob $empty_blob" out
 	)
 '
 
-- 
2.33.0.1374.g05459a61530


* [PATCH v9 17/17] fsck: report invalid object type-path combinations
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (15 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37                   ` Ævar Arnfjörð Bjarmason
  2021-09-30 21:01                     ` Junio C Hamano
  2021-09-30 19:06                   ` [PATCH v9 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
  2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
  18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits, we'd simply die early in those cases
until the preceding commits fixed the hard die on an invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
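A minimal sketch of the null-OID sentinel idea, in plain C. Everything here is hypothetical: `struct demo_oid` stands in for git's `struct object_id`, and `demo_read_object()` is a toy reader (its "hash" is a single byte) that only writes its out-parameter once it has hashed the contents. The caller pre-fills the out-parameter with zeros, so a value that is still null after a failure means the reader never got as far as hashing. (The patch's fsck_loose() additionally checks whether `contents` was set.)

```c
#include <string.h>

/* Hypothetical stand-in for git's struct object_id (SHA-1 sized). */
struct demo_oid {
	unsigned char hash[20];
};

static const struct demo_oid demo_null_oid; /* zero-initialized */

static int demo_oid_is_null(const struct demo_oid *oid)
{
	return !memcmp(oid->hash, demo_null_oid.hash, sizeof(oid->hash));
}

/*
 * Hypothetical reader: *real_oid is written only after "hashing", so an
 * early failure (e.g. a corrupt header) leaves the caller's sentinel as-is.
 */
static int demo_read_object(int header_ok, unsigned char first_byte,
			    const struct demo_oid *expected,
			    struct demo_oid *real_oid)
{
	if (!header_ok)
		return -1;			/* failed before hashing */
	memset(real_oid->hash, 0, sizeof(real_oid->hash));
	real_oid->hash[0] = first_byte;	/* pretend this is the hash */
	return memcmp(real_oid->hash, expected->hash,
		      sizeof(real_oid->hash)) ? -1 : 0;
}

/* Pick a message the way the commit message describes. */
static const char *demo_classify(int ret, const struct demo_oid *real_oid)
{
	if (!ret)
		return "ok";
	if (!demo_oid_is_null(real_oid))
		return "hash-path mismatch";
	return "object corrupt or missing";
}
```

As the later edge-case discussion notes, the sentinel is a heuristic: a failure that happens after hashing starts but before the OID is written is indistinguishable from one that never got that far.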

We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ git fsck
    error: garbage at end of loose object 'e69d[...]'
    error: unable to unpack contents of ./objects/e6/9d[...]
    error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]

There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ /usr/bin/git fsck
    fatal: invalid object type
    $ ~/g/git/git fsck
    error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
    error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    [...]

I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly more obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.
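To illustrate what "enum-ifying more error state" could look like, here is a hypothetical sketch. The `demo_read_result` enum and its values are invented for illustration and are not part of git (the series' real enum is `enum unpack_loose_header_result`, with values such as `ULHR_TOO_LONG`). Giving each failure mode its own value lets the caller choose an accurate message directly, instead of inferring the cause from whether an OID is still null.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-failure result codes instead of a bare -1. */
enum demo_read_result {
	DEMO_READ_OK = 0,
	DEMO_READ_BAD_ZLIB,		/* could not inflate the contents */
	DEMO_READ_BAD_TYPE,		/* header parsed, but type unknown */
	DEMO_READ_HASH_MISMATCH		/* contents hash to a different OID */
};

/* Map each result to the message fsck would print; NULL means no error. */
static const char *demo_result_message(enum demo_read_result r)
{
	switch (r) {
	case DEMO_READ_OK:
		return NULL;
	case DEMO_READ_BAD_ZLIB:
		return "object corrupt or missing";
	case DEMO_READ_BAD_TYPE:
		return "object is of unknown type";
	case DEMO_READ_HASH_MISMATCH:
		return "hash-path mismatch";
	}
	return "unknown error";
}
```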

There are other such potential edge cases, all of which might produce
some confusing messaging, but still be handled correctly as far as
passing along errors goes. For example, check_object_signature() may
return with oideq(real_oid, null_oid()) still true, which could happen
if it returns -1 because the read_istream() call failed.
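The new `real_oidp` argument to check_object_signature() uses a common C idiom for an optional out-parameter: a caller that passes NULL gets a local temporary, so each existing call site only needs an extra NULL argument. A self-contained sketch of that idiom follows; `demo_check_signature()` and its XOR "hash" are hypothetical stand-ins, not git's hashing code.

```c
#include <string.h>

struct demo_oid {
	unsigned char hash[20];
};

/* Hypothetical toy hash: XOR-folds the buffer into 20 bytes. */
static void demo_hash(const unsigned char *buf, size_t len,
		      struct demo_oid *out)
{
	memset(out->hash, 0, sizeof(out->hash));
	for (size_t i = 0; i < len; i++)
		out->hash[i % sizeof(out->hash)] ^= buf[i];
}

/*
 * Optional out-parameter: callers that don't care about the computed
 * OID pass NULL, and we hash into a local temporary instead.
 */
static int demo_check_signature(const struct demo_oid *expected,
				const unsigned char *buf, size_t len,
				struct demo_oid *real_oidp)
{
	struct demo_oid tmp;
	struct demo_oid *real_oid = real_oidp ? real_oidp : &tmp;

	demo_hash(buf, len, real_oid);
	return memcmp(expected->hash, real_oid->hash,
		      sizeof(real_oid->hash)) ? -1 : 0;
}
```

A caller that wants to report the mismatching OID, as fsck now does, passes a pointer and reads it back after a -1 return.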

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 23 +++++++++++++++--------
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 cache.h               |  3 ++-
 object-file.c         | 21 ++++++++++-----------
 object-store.h        |  1 +
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 11 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 95e8e89e81f..8e2caf72819 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 623f8fc3194..980c26e3b25 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -598,23 +598,30 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct object *obj;
 	enum object_type type;
 	unsigned long size;
-	void *contents;
+	unsigned char *contents = NULL;
 	int eaten;
 	struct strbuf sb = STRBUF_INIT;
 	struct object_info oi = OBJECT_INFO_INIT;
-	int err = 0;
+	struct object_id real_oid = *null_oid();
+	int ret;
 
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi) < 0)
-		err = error(_("%s: object corrupt or missing: %s"),
-			    oid_to_hex(oid), path);
+	ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
+	if (ret < 0) {
+		if (contents && !oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
+	}
 	if (type < 0)
-		err = error(_("%s: object is of unknown type '%s': %s"),
-			    oid_to_hex(oid), sb.buf, path);
-	if (err) {
+		ret = error(_("%s: object is of unknown type '%s': %s"),
+			    oid_to_hex(&real_oid), sb.buf, path);
+	if (ret < 0) {
 		errors_found |= ERROR_OBJECT;
 		goto cleanup;
 	}
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7ce69c087ec..15ae406e6b7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1415,7 +1415,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/cache.h b/cache.h
index 6c5f00c82d5..e2a203073ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1361,7 +1361,8 @@ struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 
 int finalize_object_file(const char *tmpfile, const char *filename);
 
diff --git a/object-file.c b/object-file.c
index fe95285f405..49561e31551 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1084,9 +1084,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1094,8 +1096,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1120,9 +1122,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2572,6 +2574,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi)
 {
@@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
 	char hdr[MAX_HEADER_LEN];
 	unsigned long *size = oi->sizep;
 
-	*contents = NULL;
-
 	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
@@ -2613,9 +2614,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index c90c41a07f7..17b072e5a19 100644
--- a/object-store.h
+++ b/object-store.h
@@ -253,6 +253,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi);
 
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index a5e7401af8b..0f52ca9cc82 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -512,7 +512,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 167c319823a..eb0e772f098 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -54,6 +54,7 @@ test_expect_success 'object with hash mismatch' '
 		cd hash-mismatch &&
 
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -65,7 +66,7 @@ test_expect_success 'object with hash mismatch' '
 		git update-ref refs/heads/bogus $cmt &&
 
 		test_must_fail git fsck 2>out &&
-		grep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -75,6 +76,7 @@ test_expect_success 'object with hash and type mismatch' '
 		cd hash-type-mismatch &&
 
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -87,8 +89,8 @@ test_expect_success 'object with hash and type mismatch' '
 
 
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.33.0.1374.g05459a61530


* Re: [PATCH v9 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (16 preceding siblings ...)
  2021-09-30 13:37                   ` [PATCH v9 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-30 19:06                   ` Taylor Blau
  2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
  18 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-30 19:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau

On Thu, Sep 30, 2021 at 03:37:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
> The only change since v8 is the plugging of a memory leak introduced
> in the previous 16/17. I've been doing integration of my local pending
> patches using some follow-up work for the in-flight
> ab/sanitize-leak-ci topic, which is already proving quite useful.

Good catch, sorry that I missed it myself when reading the previous
version. The way you plugged the leak is sensible to me, so I'm happy
with this version, too.

Thanks,
Taylor

* Re: [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type
  2021-09-30 13:37                   ` [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-30 19:22                     ` Andrei Rybak
  2021-10-01  9:05                       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 245+ messages in thread
From: Andrei Rybak @ 2021-09-30 19:22 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Taylor Blau

On 30/09/2021 15:37, Ævar Arnfjörð Bjarmason wrote:
> Fix a blindspot in the fsck tests by checking what we do when we
> encounter an unknown "garbage" type produced with hash-object's
> --literally option.
> 
> This behavior needs to be improved, which'll be done in subsequent
> patches, but for now let's test for the current behavior.
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>   t/t1450-fsck.sh | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 5071ac63a5b..969bfbbdd8f 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
>   	test_i18ngrep "bad index file" errors
>   '
>   
> +test_expect_success 'fsck hard errors on an invalid object type' '
> +	git init --bare garbage-type &&
> +	(
> +		cd garbage-type &&
> +
> +		empty=$(git hash-object --stdin -w -t blob </dev/null) &&
> +		garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&

Patch 01/17 introduces two unused variables: "garbage" and "empty".
However, patch 16/17 introduces grep checks for "garbage_blob" and
"empty_blob". Aside from that, 't/test-lib.sh' already defines
$EMPTY_BLOB.

> +
> +		cat >err.expect <<-\EOF &&
> +		fatal: invalid object type
> +		EOF
> +		test_must_fail git fsck >out 2>err &&
> +		test_cmp err.expect err &&
> +		test_must_be_empty out
> +	)
> +'
> +
>   test_done
> 


* Re: [PATCH v9 17/17] fsck: report invalid object type-path combinations
  2021-09-30 13:37                   ` [PATCH v9 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-30 21:01                     ` Junio C Hamano
  0 siblings, 0 replies; 245+ messages in thread
From: Junio C Hamano @ 2021-09-30 21:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 623f8fc3194..980c26e3b25 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -598,23 +598,30 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	struct object *obj;
>  	enum object_type type;
>  	unsigned long size;
> -	void *contents;
> +	unsigned char *contents = NULL;
>  	int eaten;
>  	struct strbuf sb = STRBUF_INIT;
>  	struct object_info oi = OBJECT_INFO_INIT;
> -	int err = 0;
> +	struct object_id real_oid = *null_oid();
> +	int ret;
>  
>  	oi.type_name = &sb;
>  	oi.sizep = &size;
>  	oi.typep = &type;
>  
> -	if (read_loose_object(path, oid, &contents, &oi) < 0)
> -		err = error(_("%s: object corrupt or missing: %s"),
> -			    oid_to_hex(oid), path);
> +	ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
> +	if (ret < 0) {
> +		if (contents && !oideq(&real_oid, oid))
> +			error(_("%s: hash-path mismatch, found at: %s"),
> +			      oid_to_hex(&real_oid), path);
> +		else
> +			error(_("%s: object corrupt or missing: %s"),
> +			      oid_to_hex(oid), path);
> +	}
>  	if (type < 0)
> -		err = error(_("%s: object is of unknown type '%s': %s"),
> -			    oid_to_hex(oid), sb.buf, path);
> -	if (err) {
> +		ret = error(_("%s: object is of unknown type '%s': %s"),
> +			    oid_to_hex(&real_oid), sb.buf, path);
> +	if (ret < 0) {
>  		errors_found |= ERROR_OBJECT;
>  		goto cleanup;
>  	}

This is immediately touching up what 16/17 has introduced, which is
making it a bit harder to follow than necessary, so let's take the
whole postimage of 16+17.

> static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> {
> 	struct object *obj;
> 	enum object_type type;
> 	unsigned long size;
> 	unsigned char *contents = NULL;
> 	int eaten;
> 	struct strbuf sb = STRBUF_INIT;
> 	struct object_info oi = OBJECT_INFO_INIT;
> 	struct object_id real_oid = *null_oid();
> 	int ret;
> 
> 	oi.type_name = &sb;
> 	oi.sizep = &size;
> 	oi.typep = &type;
> 
> 	ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
> 	if (ret < 0) {
> 		if (contents && !oideq(&real_oid, oid))
> 			error(_("%s: hash-path mismatch, found at: %s"),
> 			      oid_to_hex(&real_oid), path);
> 		else
> 			error(_("%s: object corrupt or missing: %s"),
> 			      oid_to_hex(oid), path);

We can emit an error() message from either one of these.  contents
may or may not be NULL, ret is negative, and we continue.  Do we
know anything about the value of type at this point?  IOW, will we
get into the body of the next "if (type < 0)" statement to overwrite
ret with -1?

> 	}
> 	if (type < 0)
> 		ret = error(_("%s: object is of unknown type '%s': %s"),
> 			    oid_to_hex(&real_oid), sb.buf, path);
> 	if (ret < 0) {
> 		errors_found |= ERROR_OBJECT;
> 		goto cleanup;

In any case, we'd jump to clean-up if any of the above hold, so we'd
avoid hitting the next BUG().

> 	}
> 
> 	if (!contents && type != OBJ_BLOB)
> 		BUG("read_loose_object streamed a non-blob");
> 
> 	obj = parse_object_buffer(the_repository, oid, type, size,
> 				  contents, &eaten);
> 
> 	if (!obj) {
> 		errors_found |= ERROR_OBJECT;
> 		error(_("%s: object could not be parsed: %s"),
> 		      oid_to_hex(oid), path);
> 		goto cleanup_eaten;
> 	}
> 
> 	obj->flags &= ~(REACHABLE | SEEN);
> 	obj->flags |= HAS_OBJ;
> 	if (fsck_obj(obj, contents, size))
> 		errors_found |= ERROR_OBJECT;
> 
> cleanup_eaten:
> 	if (!eaten)
> 		free(contents);
> cleanup:
> 	strbuf_release(&sb);

In the "goto cleanup" error case above, we haven't done anything
that would have caused the object contents to be eaten, and contents
may either point at allocated memory or be NULL (in the "hash-path
mismatch" case, we may have contents allocated but nobody has freed
it yet, leaking it).

I am wondering if we initialized "eaten" to false, we can get rid of
one of the two labels we added in this series, which would fix this
leak as well, no?

> 	return 0; /* keep checking other objects, even if we saw an error */
> }
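A sketch of the single-label cleanup suggested above, under stated assumptions: `demo_parse()` is a hypothetical parser that takes ownership of the buffer on success (in the spirit of parse_object_buffer() setting `eaten`), and `demo_check()` is a simplified, hypothetical stand-in for fsck_loose(). Because `eaten` starts out false, the early-error path can jump to the same label as the normal path and still free `contents`, which closes the leak in the "hash-path mismatch" case.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: on success it takes ownership of buf (stores it
 * in *owner) and sets *eaten; on failure the caller still owns buf.
 */
static int demo_parse(char *buf, int *eaten, char **owner)
{
	if (buf[0] != 'o')
		return -1;
	*owner = buf;
	*eaten = 1;
	return 0;
}

/* Returns the number of errors seen, or -1 on allocation failure. */
static int demo_check(const char *input, int read_fails, char **owner)
{
	int eaten = 0;		/* initialized before any "goto" */
	int errors = 0;
	size_t len = strlen(input) + 1;
	char *contents = malloc(len);

	if (!contents)
		return -1;
	memcpy(contents, input, len);

	if (read_fails) {
		/* e.g. hash-path mismatch: contents allocated, never parsed */
		errors++;
		goto cleanup;
	}
	if (demo_parse(contents, &eaten, owner) < 0)
		errors++;
cleanup:
	if (!eaten)
		free(contents);
	return errors;
}
```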

* Re: [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type
  2021-09-30 19:22                     ` Andrei Rybak
@ 2021-10-01  9:05                       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:05 UTC (permalink / raw)
  To: Andrei Rybak; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Taylor Blau


On Thu, Sep 30 2021, Andrei Rybak wrote:

> On 30/09/2021 15:37, Ævar Arnfjörð Bjarmason wrote:
>> Fix a blindspot in the fsck tests by checking what we do when we
>> encounter an unknown "garbage" type produced with hash-object's
>> --literally option.
>> This behavior needs to be improved, which'll be done in subsequent
>> patches, but for now let's test for the current behavior.
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>   t/t1450-fsck.sh | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
>> index 5071ac63a5b..969bfbbdd8f 100755
>> --- a/t/t1450-fsck.sh
>> +++ b/t/t1450-fsck.sh
>> @@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
>>   	test_i18ngrep "bad index file" errors
>>   '
>>   +test_expect_success 'fsck hard errors on an invalid object type' '
>> +	git init --bare garbage-type &&
>> +	(
>> +		cd garbage-type &&
>> +
>> +		empty=$(git hash-object --stdin -w -t blob </dev/null) &&
>> +		garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
>
> Patch 01/17 introduces two unused variables: "garbage" and "empty".
> However, patch 16/17 introduces grep checks for "garbage_blob" and
> "empty_blob". Aside from that, 't/test-lib.sh' already defines
> $EMPTY_BLOB.

Will fix in the v10 re-roll.

I think this is left over from an earlier version where I used $empty.
FWIW you do need it (or to write the object) even with $EMPTY_BLOB,
since that's just the OID and doesn't give you the object. You can
write the /dev/null input and then use $EMPTY_BLOB, but I thought
using the output of hash-object was less confusing.

But in any case it isn't needed here, as you point out: we just need
to write the garbage object, and we don't need either variable. Thanks!

* [PATCH v10 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
                                     ` (17 preceding siblings ...)
  2021-09-30 19:06                   ` [PATCH v9 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
@ 2021-10-01  9:16                   ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                                       ` (16 more replies)
  18 siblings, 17 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

This improves fsck error reporting, see the examples in the commit
messages of 16/17 and 17/17. To get there I've lib-ified more things
in object-file.c and the general object APIs, i.e. now we'll return
error codes instead of calling die() in these cases.
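The lib-ification relies on git's convention that error() prints a message and returns -1, so a library function can write `return error(...)` where it previously called die(), and the caller decides whether to continue. A minimal sketch with hypothetical names (`demo_error()`, `demo_type_from_name()`); the 1..4 numbering is assumed to mirror git's OBJ_COMMIT..OBJ_TAG.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for git's error(): print, then return -1. */
static int demo_error(const char *fmt, ...)
{
	va_list ap;

	fputs("error: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/* Library-style: report and return instead of exiting the process. */
static int demo_type_from_name(const char *name)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (int i = 0; i < 4; i++)
		if (!strcmp(name, names[i]))
			return i + 1;
	return demo_error("invalid object type '%s'", name);
}
```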

This should fix the issues noted about v9[1]. I.e.:

 A. Junio's right in [2] that the "type" can't be trusted after a
    failed read_loose_object(): if we fail before we can parse it out,
    it'll be uninitialized. It's now initialized to OBJ_NONE, so we
    can trust it if it's set to anything else.

 B. I re-arranged much of 16/17 and 17/17 to make the diff (but not
    the range-diff) smaller.

 C. We now share a single "strbuf" across the whole fsck_loose() walk
    for saving away the type name, instead of allocating a new one
    each time. This is both a better memory usage pattern, and makes
    fsck_loose() itself simpler.

    It also allows for using much of the pre-image as-is, i.e. the
    whole "goto cleanup" is gone. Likewise instead of "ret" and "err"
    we just have the "err" variable now.

 D. I fixed the redundant/left-over test setup noted by Andrei
    Rybak[3].
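Point A amounts to initializing an out-parameter to a "not yet known" value, so the caller can tell a parse that never happened apart from one that produced an invalid type. A hypothetical sketch; the `demo_*` names are invented, with values assumed to mirror git's OBJ_BAD (-1), OBJ_NONE (0) and OBJ_BLOB (3).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of a few enum object_type values. */
enum demo_object_type {
	DEMO_OBJ_BAD = -1,
	DEMO_OBJ_NONE = 0,
	DEMO_OBJ_BLOB = 3
};

/* Hypothetical reader: if it fails before parsing, *typep is untouched. */
static int demo_read(const char *hdr, enum demo_object_type *typep)
{
	if (!hdr)
		return -1;			/* failed before parsing */
	if (!strncmp(hdr, "blob ", 5))
		*typep = DEMO_OBJ_BLOB;
	else
		*typep = DEMO_OBJ_BAD;		/* parsed, but unknown type */
	return *typep == DEMO_OBJ_BAD ? -1 : 0;
}

static const char *demo_describe(const char *hdr)
{
	enum demo_object_type type = DEMO_OBJ_NONE;	/* not yet known */

	if (!demo_read(hdr, &type))
		return "ok";
	if (type == DEMO_OBJ_NONE)
		return "corrupt or missing";	/* type was never parsed */
	return "unknown type";			/* type parsed as invalid */
}
```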

1. http://lore.kernel.org/git/cover-v9-00.17-00000000000-20210930T133300Z-avarab@gmail.com
2. https://lore.kernel.org/git/xmqqsfxlaicg.fsf@gitster.g/
3. https://lore.kernel.org/git/78bab348-ba3a-7a27-e32e-6b75f91178db@gmail.com/

Ævar Arnfjörð Bjarmason (17):
  fsck tests: add test for fsck-ing an unknown type
  fsck tests: refactor one test to use a sub-repo
  fsck tests: test current hash/type mismatch behavior
  fsck tests: test for garbage appended to a loose object
  cat-file tests: move bogus_* variable declarations earlier
  cat-file tests: test for missing/bogus object with -t, -s and -p
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: make parse_loose_header_extended() public
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: use "enum" return type for unpack_loose_header()
  object-file.c: return ULHR_TOO_LONG on "header too long"
  object-file.c: stop dying in parse_loose_header()
  fsck: don't hard die on invalid object types
  fsck: report invalid object type-path combinations

 builtin/fast-export.c |   2 +-
 builtin/fsck.c        |  44 +++++++--
 builtin/index-pack.c  |   2 +-
 builtin/mktag.c       |   3 +-
 cache.h               |  45 ++++++++-
 object-file.c         | 176 +++++++++++++++------------------
 object-store.h        |   7 +-
 object.c              |   4 +-
 pack-check.c          |   3 +-
 streaming.c           |  27 +++--
 t/oid-info/oid        |   2 +
 t/t1006-cat-file.sh   | 223 +++++++++++++++++++++++++++++++++++++++---
 t/t1450-fsck.sh       |  97 ++++++++++++++----
 13 files changed, 476 insertions(+), 159 deletions(-)

Range-diff against v9:
 1:  520732612f7 !  1:  00936435423 fsck tests: add test for fsck-ing an unknown type
    @@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
     +	(
     +		cd garbage-type &&
     +
    -+		empty=$(git hash-object --stdin -w -t blob </dev/null) &&
    -+		garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
    ++		git hash-object --stdin -w -t garbage --literally </dev/null &&
     +
     +		cat >err.expect <<-\EOF &&
     +		fatal: invalid object type
 2:  af7086623fe =  2:  32a2f9cc0c9 fsck tests: refactor one test to use a sub-repo
 3:  102bc4f0176 =  3:  00d661a6032 fsck tests: test current hash/type mismatch behavior
 4:  ff7fc09d5a1 =  4:  a527e3b262c fsck tests: test for garbage appended to a loose object
 5:  278df093239 =  5:  7a63d30aef3 cat-file tests: move bogus_* variable declarations earlier
 6:  290bf983590 =  6:  a563c7efe1c cat-file tests: test for missing/bogus object with -t, -s and -p
 7:  a41b2c571e5 =  7:  c5affb65b7e cat-file tests: add corrupt loose object test
 8:  cedeb117330 =  8:  76f9888a6f7 cat-file tests: test for current --allow-unknown-type behavior
 9:  6f0673d38c8 =  9:  85a91f43634 object-file.c: don't set "typep" when returning non-zero
10:  6637e8fd2ca = 10:  51eaa2e8479 object-file.c: return -1, not "status" from unpack_loose_header()
11:  51db08ebbae = 11:  5cd2ba830e9 object-file.c: make parse_loose_header_extended() public
12:  dffe5581f6f = 12:  6899c6ec17a object-file.c: simplify unpack_loose_short_header()
13:  eb7c949c8b7 = 13:  a3bdd53d296 object-file.c: use "enum" return type for unpack_loose_header()
14:  f4cc7271df7 = 14:  5a7c2855b50 object-file.c: return ULHR_TOO_LONG on "header too long"
15:  25d6ec668d4 = 15:  3ec9fee7ee9 object-file.c: stop dying in parse_loose_header()
16:  6ce0414b2b7 ! 16:  9b75ac7c8ed fsck: don't hard die on invalid object types
    @@ Commit message
         f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
         for the introduction of read_loose_object().
     
    -    Since we're now passing in a "oi.type_name" we'll have to clean up the
    -    allocated "strbuf sb". That we're doing it right is asserted by
    -    e.g. the "fsck notices broken commit" test added in 03818a4a94c
    -    (split_ident: parse timestamp from end of line, 2013-10-14). To do
    -    that switch to a "goto cleanup" pattern, and while we're at it factor
    -    out the already duplicated free(content) to use that pattern.
    +    Since we'll need a "struct strbuf" to hold the "type_name" let's pass
    +    it to the for_each_loose_file_in_objdir() callback to avoid allocating
    +    a new one for each loose object in the iteration. It also makes the
    +    memory management simpler than sticking it in fsck_loose() itself, as
    +    we'll only need to strbuf_reset() it, with no need to do a
    +    strbuf_release() before each "return".
    +
    +    Before this commit we'd never check the "type" if read_loose_object()
    +    failed, but now we do. We therefore need to initialize it to OBJ_NONE
    +    to be able to tell the difference between e.g. its
    +    unpack_loose_header() having failed, and us getting past that and into
    +    parse_loose_header().
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/fsck.c ##
    -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    +@@ builtin/fsck.c: static void get_default_heads(void)
    + 	}
    + }
    + 
    ++struct for_each_loose_cb
    ++{
    ++	struct progress *progress;
    ++	struct strbuf obj_type;
    ++};
    ++
    + static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    + {
    ++	struct for_each_loose_cb *cb_data = data;
    + 	struct object *obj;
    +-	enum object_type type;
    ++	enum object_type type = OBJ_NONE;
      	unsigned long size;
      	void *contents;
      	int eaten;
    -+	struct strbuf sb = STRBUF_INIT;
     +	struct object_info oi = OBJECT_INFO_INIT;
     +	int err = 0;
      
     -	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
    -+	oi.type_name = &sb;
    ++	strbuf_reset(&cb_data->obj_type);
    ++	oi.type_name = &cb_data->obj_type;
     +	oi.sizep = &size;
     +	oi.typep = &type;
     +
     +	if (read_loose_object(path, oid, &contents, &oi) < 0)
     +		err = error(_("%s: object corrupt or missing: %s"),
     +			    oid_to_hex(oid), path);
    -+	if (type < 0)
    ++	if (type != OBJ_NONE && type < 0)
     +		err = error(_("%s: object is of unknown type '%s': %s"),
    -+			    oid_to_hex(oid), sb.buf, path);
    -+	if (err) {
    ++			    oid_to_hex(oid), cb_data->obj_type.buf, path);
    ++	if (err < 0) {
      		errors_found |= ERROR_OBJECT;
     -		error(_("%s: object corrupt or missing: %s"),
     -		      oid_to_hex(oid), path);
    --		return 0; /* keep checking other objects */
    -+		goto cleanup;
    - 	}
    - 
    - 	if (!contents && type != OBJ_BLOB)
    -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    - 		errors_found |= ERROR_OBJECT;
    - 		error(_("%s: object could not be parsed: %s"),
    - 		      oid_to_hex(oid), path);
    --		if (!eaten)
    --			free(contents);
    --		return 0; /* keep checking other objects */
    -+		goto cleanup_eaten;
    + 		return 0; /* keep checking other objects */
      	}
      
    - 	obj->flags &= ~(REACHABLE | SEEN);
    -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    - 	if (fsck_obj(obj, contents, size))
    - 		errors_found |= ERROR_OBJECT;
    +@@ builtin/fsck.c: static int fsck_cruft(const char *basename, const char *path, void *data)
    + 	return 0;
    + }
      
    -+cleanup_eaten:
    - 	if (!eaten)
    - 		free(contents);
    -+cleanup:
    -+	strbuf_release(&sb);
    - 	return 0; /* keep checking other objects, even if we saw an error */
    +-static int fsck_subdir(unsigned int nr, const char *path, void *progress)
    ++static int fsck_subdir(unsigned int nr, const char *path, void *data)
    + {
    ++	struct for_each_loose_cb *cb_data = data;
    ++	struct progress *progress = cb_data->progress;
    + 	display_progress(progress, nr + 1);
    + 	return 0;
    + }
    +@@ builtin/fsck.c: static int fsck_subdir(unsigned int nr, const char *path, void *progress)
    + static void fsck_object_dir(const char *path)
    + {
    + 	struct progress *progress = NULL;
    ++	struct for_each_loose_cb cb_data = {
    ++		.obj_type = STRBUF_INIT,
    ++		.progress = progress,
    ++	};
    + 
    + 	if (verbose)
    + 		fprintf_ln(stderr, _("Checking object directory"));
    +@@ builtin/fsck.c: static void fsck_object_dir(const char *path)
    + 		progress = start_progress(_("Checking object directories"), 256);
    + 
    + 	for_each_loose_file_in_objdir(path, fsck_loose, fsck_cruft, fsck_subdir,
    +-				      progress);
    ++				      &cb_data);
    + 	display_progress(progress, 256);
    + 	stop_progress(&progress);
    ++	strbuf_release(&cb_data.obj_type);
      }
      
    + static int fsck_head_link(const char *head_ref_name,
     
      ## object-file.c ##
     @@ object-file.c: static int check_stream_oid(git_zstream *stream,
    @@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
      	git init --bare garbage-type &&
      	(
      		cd garbage-type &&
    -@@ t/t1450-fsck.sh: test_expect_success 'fsck hard errors on an invalid object type' '
    + 
    +-		git hash-object --stdin -w -t garbage --literally </dev/null &&
    ++		garbage_blob=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
    + 
    + 		cat >err.expect <<-\EOF &&
      		fatal: invalid object type
      		EOF
      		test_must_fail git fsck >out 2>err &&
    @@ t/t1450-fsck.sh: test_expect_success 'fsck hard errors on an invalid object type
     -		test_must_be_empty out
     +		grep -e "^error" -e "^fatal" err >errors &&
     +		test_line_count = 1 errors &&
    -+		grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
    -+		grep "dangling blob $empty_blob" out
    ++		grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err
      	)
      '
      
17:  8d926e41fc3 ! 17:  838df0a979b fsck: report invalid object type-path combinations
    @@ builtin/fast-export.c: static void export_blob(const struct object_id *oid)
     
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    - 	struct object *obj;
    - 	enum object_type type;
    - 	unsigned long size;
    --	void *contents;
    -+	unsigned char *contents = NULL;
    + 	void *contents;
      	int eaten;
    - 	struct strbuf sb = STRBUF_INIT;
      	struct object_info oi = OBJECT_INFO_INIT;
    --	int err = 0;
     +	struct object_id real_oid = *null_oid();
    -+	int ret;
    + 	int err = 0;
      
    - 	oi.type_name = &sb;
    + 	strbuf_reset(&cb_data->obj_type);
    +@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
      	oi.sizep = &size;
      	oi.typep = &type;
      
     -	if (read_loose_object(path, oid, &contents, &oi) < 0)
     -		err = error(_("%s: object corrupt or missing: %s"),
     -			    oid_to_hex(oid), path);
    -+	ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
    -+	if (ret < 0) {
    ++	if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
     +		if (contents && !oideq(&real_oid, oid))
    -+			error(_("%s: hash-path mismatch, found at: %s"),
    -+			      oid_to_hex(&real_oid), path);
    ++			err = error(_("%s: hash-path mismatch, found at: %s"),
    ++				    oid_to_hex(&real_oid), path);
     +		else
    -+			error(_("%s: object corrupt or missing: %s"),
    -+			      oid_to_hex(oid), path);
    ++			err = error(_("%s: object corrupt or missing: %s"),
    ++				    oid_to_hex(oid), path);
     +	}
    - 	if (type < 0)
    --		err = error(_("%s: object is of unknown type '%s': %s"),
    --			    oid_to_hex(oid), sb.buf, path);
    --	if (err) {
    -+		ret = error(_("%s: object is of unknown type '%s': %s"),
    -+			    oid_to_hex(&real_oid), sb.buf, path);
    -+	if (ret < 0) {
    + 	if (type != OBJ_NONE && type < 0)
    + 		err = error(_("%s: object is of unknown type '%s': %s"),
    +-			    oid_to_hex(oid), cb_data->obj_type.buf, path);
    ++			    oid_to_hex(&real_oid), cb_data->obj_type.buf,
    ++			    path);
    + 	if (err < 0) {
      		errors_found |= ERROR_OBJECT;
    - 		goto cleanup;
    - 	}
    + 		return 0; /* keep checking other objects */
     
      ## builtin/index-pack.c ##
     @@ builtin/index-pack.c: static void fix_unresolved_deltas(struct hashfile *f)
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 01/17] fsck tests: add test for fsck-ing an unknown type
  2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
@ 2021-10-01  9:16                     ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                                       ` (15 subsequent siblings)
  16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.
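For context on what "--literally" produces: a loose object is the zlib-deflated form of a "<type> <size>\0" header followed by the content, so `git hash-object -t garbage --literally </dev/null` stores an object whose header is "garbage 0". The standalone C sketch below shows where an unknown type name gets noticed. It works on already-inflated bytes, and `parse_header_sketch()` and its result codes are hypothetical, only loosely modelled on what parse_loose_header() has to do.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the values of git's enum object_type used here. */
enum object_type { OBJ_BAD = -1, OBJ_NONE = 0, OBJ_COMMIT = 1,
		   OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4 };

/* Hypothetical result codes for this sketch. */
enum header_result { HDR_OK = 0, HDR_MALFORMED = -1, HDR_UNKNOWN_TYPE = -2 };

/*
 * Parse an already-inflated "<type> <size>\0" header of at most len bytes.
 * The type name is copied into name_out (truncated to name_len - 1 bytes)
 * so that an unknown type can still be reported by name.
 * Size overflow is not checked, for brevity.
 */
static enum header_result parse_header_sketch(const char *hdr, size_t len,
					      enum object_type *type,
					      unsigned long *size,
					      char *name_out, size_t name_len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };
	const char *sp = memchr(hdr, ' ', len);
	const char *nul = memchr(hdr, '\0', len);
	const char *p;
	size_t tlen, i;

	/* Need a non-empty type, then a space, then a non-empty size. */
	if (!sp || !nul || sp > nul || sp == hdr || sp + 1 == nul)
		return HDR_MALFORMED;

	*size = 0;
	for (p = sp + 1; p < nul; p++) {
		if (*p < '0' || *p > '9')
			return HDR_MALFORMED;
		*size = *size * 10 + (unsigned long)(*p - '0');
	}

	tlen = (size_t)(sp - hdr);
	if (name_len) {
		size_t n = tlen < name_len - 1 ? tlen : name_len - 1;
		memcpy(name_out, hdr, n);
		name_out[n] = '\0';
	}

	*type = OBJ_BAD;
	for (i = 0; i < 4; i++)
		if (strlen(names[i]) == tlen && !memcmp(names[i], hdr, tlen))
			*type = (enum object_type)(OBJ_COMMIT + (int)i);
	return *type == OBJ_BAD ? HDR_UNKNOWN_TYPE : HDR_OK;
}
```

Later patches in this series aim to have fsck report such an object as an error and keep going, instead of dying with "fatal: invalid object type".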

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..beb233e91b1 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -865,4 +865,20 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	git init --bare garbage-type &&
+	(
+		cd garbage-type &&
+
+		git hash-object --stdin -w -t garbage --literally </dev/null &&
+
+		cat >err.expect <<-\EOF &&
+		fatal: invalid object type
+		EOF
+		test_must_fail git fsck >out 2>err &&
+		test_cmp err.expect err &&
+		test_must_be_empty out
+	)
+'
+
 test_done
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 02/17] fsck tests: refactor one test to use a sub-repo
  2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-10-01  9:16                     ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
                                       ` (14 subsequent siblings)
  16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.

We can instead use the pattern of creating a named sub-repository.
Then we don't have to worry about cleaning up after ourselves: nobody
will care what state the broken "hash-mismatch" repository is in after
this test runs.

See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.

1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index beb233e91b1..b73bc2a2ec3 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,25 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
+test_expect_success 'object with hash mismatch' '
+	git init --bare hash-mismatch &&
+	(
+		cd hash-mismatch &&
 
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+
+		test_must_fail git fsck 2>out &&
+		grep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 03/17] fsck tests: test current hash/type mismatch behavior
  2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-10-01  9:16                     ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
                                       ` (13 subsequent siblings)
  16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

If we move an object around between .git/objects/?? directories to
simulate a hash mismatch, "git fsck" will currently hard die() in
object-file.c. This behavior will be fixed in subsequent commits, but
let's test for it as-is for now.
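Renaming the file is enough to fake a hash mismatch because a loose object's path is derived from its object ID: a two-hex-digit fan-out directory, then the rest of the hex digits. fsck checks the ID implied by that path against the hash of the content it reads. The sketch below assumes SHA-1 (40 hex digits) and uses hypothetical helper names; `content_hex` stands in for the hash git would compute over the object's inflated bytes.

```c
#include <string.h>

#define HEXSZ 40 /* assumes SHA-1; SHA-256 object IDs have 64 hex digits */

/* "<oid>" -> "xx/<rest>", the part of the path below objects/. */
static void hex_to_loose_path(const char *hex, char *path_out)
{
	path_out[0] = hex[0];
	path_out[1] = hex[1];
	path_out[2] = '/';
	memcpy(path_out + 3, hex + 2, HEXSZ - 2);
	path_out[HEXSZ + 1] = '\0';
}

/* "xx/<rest>" -> "<oid>"; returns -1 if the path is not shaped like one. */
static int loose_path_to_hex(const char *path, char *hex_out)
{
	size_t i;

	if (strlen(path) != HEXSZ + 1 || path[2] != '/')
		return -1;
	for (i = 0; i < HEXSZ + 1; i++)
		if (i != 2 && !strchr("0123456789abcdef", path[i]))
			return -1;
	hex_out[0] = path[0];
	hex_out[1] = path[1];
	memcpy(hex_out + 2, path + 3, HEXSZ - 2);
	hex_out[HEXSZ] = '\0';
	return 0;
}

/*
 * fsck-style check: the ID implied by the file's location must equal
 * the ID of its contents (content_hex, computed elsewhere).
 */
static int path_matches_content(const char *path, const char *content_hex)
{
	char expected[HEXSZ + 1];

	if (loose_path_to_hex(path, expected) < 0)
		return 0;
	return !strcmp(expected, content_hex);
}
```

Renaming "01/2345..." to "01/ffff..." keeps the content, and therefore its hash, unchanged, but changes the ID fsck expects to find, which is exactly the mismatch the test sets up.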

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index b73bc2a2ec3..f9cabcecd14 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,6 +69,30 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	git init --bare hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+
+		cat >expect <<-\EOF &&
+		fatal: invalid object type
+		EOF
+		test_must_fail git fsck 2>actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 04/17] fsck tests: test for garbage appended to a loose object
  2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
                                       ` (2 preceding siblings ...)
  2021-10-01  9:16                     ` [PATCH v10 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
@ 2021-10-01  9:16                     ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
                                       ` (12 subsequent siblings)
  16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

There weren't any output tests for this scenario, so let's ensure that
we don't regress on it in the changes that come after this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f9cabcecd14..281ff8bdd8e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -93,6 +93,26 @@ test_expect_success 'object with hash and type mismatch' '
 	)
 '
 
+test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
+	git init --bare corrupt-loose-output &&
+	(
+		cd corrupt-loose-output &&
+		oid=$(git hash-object -w --stdin --literally </dev/null) &&
+		oidf=objects/$(test_oid_to_path "$oid") &&
+		chmod 755 $oidf &&
+		echo extra garbage >>$oidf &&
+
+		cat >expect.error <<-EOF &&
+		error: garbage at end of loose object '\''$oid'\''
+		error: unable to unpack contents of ./$oidf
+		error: $oid: object corrupt or missing: ./$oidf
+		EOF
+		test_must_fail git fsck 2>actual &&
+		grep ^error: actual >error &&
+		test_cmp expect.error error
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 05/17] cat-file tests: move bogus_* variable declarations earlier
  2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
                                       ` (3 preceding siblings ...)
  2021-10-01  9:16                     ` [PATCH v10 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
@ 2021-10-01  9:16                     ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
                                       ` (11 subsequent siblings)
  16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Change the short/long bogus object type variables into a form where
the two sets can be used concurrently. This'll be used by subsequently
added tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..ea6a53d425b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,36 +315,39 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
-bogus_type="bogus"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'setup bogus data' '
+	bogus_short_type="bogus" &&
+	bogus_short_content="bogus" &&
+	bogus_short_size=$(strlen "$bogus_short_content") &&
+	bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+
+	bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
+	bogus_long_content="bogus" &&
+	bogus_long_size=$(strlen "$bogus_long_content") &&
+	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+'
 
 test_expect_success "Type of broken object is correct" '
-	echo $bogus_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_short_type >expect &&
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of broken object is correct" '
-	echo $bogus_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_short_size >expect &&
+	git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
 	test_cmp expect actual
 '
-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
 test_expect_success "Type of broken object is correct when type is large" '
-	echo $bogus_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_long_type >expect &&
+	git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of large broken object is correct when type is large" '
-	echo $bogus_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+	echo $bogus_long_size >expect &&
+	git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
 	test_cmp expect actual
 '
 
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
  2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
                                       ` (4 preceding siblings ...)
  2021-10-01  9:16                     ` [PATCH v10 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
@ 2021-10-01  9:16                     ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                                       ` (10 subsequent siblings)
  16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

When we look up a missing object with cat_one_file(), the error we
print out currently depends on whether we error out early in
get_oid_with_context() or get an error later from
oid_object_info_extended().

The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to oid_object_info_extended() or
not.

The "-p" flag is yet another special case: on the deadbeef OID it
prints the same output that we'd emit on the deadbeef_short OID for
the "-s" and "-t" options. It also doesn't support the
"--allow-unknown-type" flag at all.

Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.

This extends tests added in 3e370f9faf0 (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).
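The test matrix described above can be summarized as data. This C sketch is illustrative only and is not part of the test suite: the function names are hypothetical, and the expected messages for the missing-OID cases are copied from the expectations in this patch's tests.

```c
#include <stddef.h>
#include <string.h>

/* The two option sets the tests cross; "" means the flag is not passed. */
static const char *const arg1s[] = { "", "--allow-unknown-type" };
static const char *const arg2s[] = { "-s", "-t", "-p" };

/* Mirrors the loop's skip: -p does not support --allow-unknown-type. */
static int combination_is_tested(const char *arg1, const char *arg2)
{
	return !(!strcmp(arg1, "--allow-unknown-type") && !strcmp(arg2, "-p"));
}

/* A missing short OID fails the same way whatever arg2 is ("%s" = OID). */
static const char *missing_short_oid_error(void)
{
	return "fatal: Not a valid object name %s";
}

/* A missing full OID: -p matches the short-OID case, -s/-t do not. */
static const char *missing_full_oid_error(const char *arg2)
{
	if (!strcmp(arg2, "-p"))
		return "fatal: Not a valid object name %s";
	return "fatal: git cat-file: could not get object info";
}

/* Four tests (bogus short/full, missing short/full) per tested pair. */
static size_t count_generated_tests(void)
{
	size_t i, j, n = 0;

	for (i = 0; i < sizeof(arg1s) / sizeof(*arg1s); i++)
		for (j = 0; j < sizeof(arg2s) / sizeof(*arg2s); j++)
			if (combination_is_tested(arg1s[i], arg2s[j]))
				n += 4;
	return n;
}
```

With two values for the first option, three for the second and one pair skipped, that is 5 combinations and 20 generated tests.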

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/oid-info/oid      |  2 ++
 t/t1006-cat-file.sh | 75 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/t/oid-info/oid b/t/oid-info/oid
index a754970523c..7547d2c7903 100644
--- a/t/oid-info/oid
+++ b/t/oid-info/oid
@@ -27,3 +27,5 @@ numeric		sha1:0123456789012345678901234567890123456789
 numeric		sha256:0123456789012345678901234567890123456789012345678901234567890123
 deadbeef	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
 deadbeef	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef_short	sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
+deadbeef_short	sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ea6a53d425b..abf57339a29 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
 	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
 '
 
+for arg1 in '' --allow-unknown-type
+do
+	for arg2 in -s -t -p
+	do
+		if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
+		then
+			continue
+		fi
+
+
+		test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
+			cat >expect <<-\EOF &&
+			fatal: invalid object type
+			EOF
+
+			if test "$arg1" = "--allow-unknown-type"
+			then
+				git cat-file $arg1 $arg2 $bogus_short_sha1
+			else
+				test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+				test_must_be_empty out &&
+				test_cmp expect actual
+			fi
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
+			if test "$arg2" = "-p"
+			then
+				cat >expect <<-EOF
+				error: unable to unpack $bogus_long_sha1 header
+				fatal: Not a valid object name $bogus_long_sha1
+				EOF
+			else
+				cat >expect <<-EOF
+				error: unable to unpack $bogus_long_sha1 header
+				fatal: git cat-file: could not get object info
+				EOF
+			fi &&
+
+			if test "$arg1" = "--allow-unknown-type"
+			then
+				git cat-file $arg1 $arg2 $bogus_short_sha1
+			else
+				test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+				test_must_be_empty out &&
+				test_cmp expect actual
+			fi
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
+			cat >expect.err <<-EOF &&
+			fatal: Not a valid object name $(test_oid deadbeef_short)
+			EOF
+			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
+			test_must_be_empty out
+		'
+
+		test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
+			if test "$arg2" = "-p"
+			then
+				cat >expect.err <<-EOF
+				fatal: Not a valid object name $(test_oid deadbeef)
+				EOF
+			else
+				cat >expect.err <<-\EOF
+				fatal: git cat-file: could not get object info
+				EOF
+			fi &&
+			test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
+			test_must_be_empty out &&
+			test_cmp expect.err err.actual
+		'
+	done
+done
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 07/17] cat-file tests: add corrupt loose object test
  2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
                                       ` (5 preceding siblings ...)
  2021-10-01  9:16                     ` [PATCH v10 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-10-01  9:16                     ` Ævar Arnfjörð Bjarmason
  2021-10-01  9:16                     ` [PATCH v10 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                                       ` (9 subsequent siblings)
  16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index abf57339a29..15774979ad3 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -426,6 +426,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 08/17] cat-file tests: test for current --allow-unknown-type behavior
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1], I don't think much of this makes sense, but let's test for it
as-is so that we notice if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 15774979ad3..5b16c69c286 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -402,6 +402,67 @@ do
 	done
 done
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_short_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_short_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_short_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_short_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	test_when_finished "rm -rf .git/refs/replace" &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_short_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 09/17] object-file.c: don't set "typep" when returning non-zero
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error, stop faking up
"oi->typep" as OBJ_BAD, and let the function's return value suffice
on its own. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then, through a combination of 46f034483eb (sha1_file: support reading
from a loose object of unknown type, 2015-05-03) and b3ea7dd32d6
(sha1_loose_object_info: handle errors from unpack_sha1_rest,
2017-10-05), our API drifted back towards conflating the two.

Having carefully read the code paths involved, I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This could introduce a subtle bug if a caller of
oid_object_info_extended() inspected its "typep" and expected a
meaningful value after the function returned -1.

Such a problem cannot occur with its simpler sister function,
oid_object_info(), which always returns the "enum object_type"; in
the error case that value is -1, i.e. OBJ_BAD.

Having read the code for all the callers of these functions, I don't
believe any such bug is being introduced here. In any case, we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be the more common case).
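Put differently, callers are expected to check the return value before
reading any out-parameter. A minimal sketch of that contract, using
hypothetical, heavily simplified stand-ins for "struct object_info" and
the OBJ_* values (not git's definitions); the "corrupt" argument only
simulates a failed read:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, loosely modelled on git's enum object_type. */
enum sketch_type { SKETCH_OBJ_BAD = -1, SKETCH_OBJ_BLOB = 3 };

struct sketch_info {
	enum sketch_type *typep; /* optional out-parameter */
	unsigned long *sizep;    /* optional out-parameter */
};

/*
 * Returns 0 and fills in the requested fields on success. On failure
 * it returns -1 and leaves *typep and *sizep untouched, instead of
 * faking up *typep = SKETCH_OBJ_BAD. "corrupt" simulates a bad object.
 */
int sketch_object_info(int corrupt, struct sketch_info *oi)
{
	assert(oi != NULL);
	if (corrupt)
		return -1;
	if (oi->typep)
		*oi->typep = SKETCH_OBJ_BLOB;
	if (oi->sizep)
		*oi->sizep = 6;
	return 0;
}
```

Under this contract, leaving *typep untouched on error is harmless: a
caller that checks for -1 first never looks at it.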

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..766ba88b851 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1525,8 +1525,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 10/17] object-file.c: return -1, not "status" from unpack_loose_header()
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell, there was never a real reason (e.g. different reporting)
for passing the "status" down as opposed to "-1".

At the time d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of changing this function to return -1
as well.
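A sketch of the resulting convention, where any zlib error collapses to
-1 and a missing header terminator is also -1. So that it builds
without zlib, the SKETCH_Z_* macros are local copies of zlib's
documented return values (Z_OK is 0, Z_STREAM_END is 1, errors are
negative). The function is hypothetical and takes the inflate status
and output buffer as plain arguments:

```c
#include <assert.h>
#include <string.h>

/* Local copies of zlib's return codes, so the sketch builds without zlib. */
#define SKETCH_Z_OK          0
#define SKETCH_Z_STREAM_END  1
#define SKETCH_Z_DATA_ERROR (-3)
#define SKETCH_Z_BUF_ERROR  (-5)

/*
 * Collapse any zlib error into -1: callers only need to know that
 * unpacking failed, not which Z_* code caused it. Also fail if the
 * "used" inflated bytes contain no NUL ending the header.
 */
int sketch_check_header(int zstatus, const char *buf, size_t used)
{
	assert(buf != NULL);
	if (zstatus < SKETCH_Z_OK)
		return -1;
	if (!memchr(buf, '\0', used))
		return -1;
	return 0;
}
```

Callers only ever compare the result against 0, so dropping the
specific Z_* code loses nothing they use.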

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 766ba88b851..8475b128944 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1284,7 +1284,7 @@ int unpack_loose_header(git_zstream *stream,
 					       buffer, bufsiz);
 
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/* Make sure we have the terminating NUL */
 	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 11/17] object-file.c: make parse_loose_header_extended() public
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public under the
parse_loose_header() name, replacing the old parse_loose_header()
wrapper. The only direct user of the wrapper outside of object-file.c
itself was in streaming.c; that caller can simply pass the required
"struct object_info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it will
leave the API in a better end state.

It would be an even better end state to also move the declaration of
these functions to object-store.h, avoiding the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.
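With the wrapper gone, a caller that only wants the size sets up an
object_info and points its sizep at its own variable, as the
read_loose_object() and streaming.c hunks do. The following is a
hypothetical, much-simplified parser (not git's, which is stricter and
also knows about "tag") that shows this calling convention:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for git's struct object_info. */
struct sketch_object_info {
	int *typep;           /* optional; 1 = commit, 2 = tree, 3 = blob */
	unsigned long *sizep; /* optional */
};

/*
 * Parse a NUL-terminated "<type> <size>" header, filling in whichever
 * out-parameters are set. Returns the type, or -1 on a parse error.
 * Size overflow is not checked in this sketch.
 */
int sketch_parse_header(const char *hdr, struct sketch_object_info *oi)
{
	static const char *const names[] = { NULL, "commit", "tree", "blob" };
	const char *sp = strchr(hdr, ' ');
	unsigned long size = 0;
	size_t typelen;
	int type = -1;

	assert(oi != NULL);
	if (!sp)
		return -1;
	typelen = (size_t)(sp - hdr);
	for (int i = 1; i < 4; i++)
		if (strlen(names[i]) == typelen && !strncmp(hdr, names[i], typelen))
			type = i;
	if (type < 0 || !sp[1])
		return -1;
	for (hdr = sp + 1; *hdr >= '0' && *hdr <= '9'; hdr++)
		size = size * 10 + (unsigned long)(*hdr - '0');
	if (*hdr)
		return -1; /* trailing garbage after the size */
	if (oi->typep)
		*oi->typep = type;
	if (oi->sizep)
		*oi->sizep = size;
	return type;
}
```

A size-only caller would write "struct sketch_object_info oi = { NULL,
&size };" and check the return value for -1. The sketch omits the
flags argument; the migrated callers in this patch pass 0 for it.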

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       |  4 +++-
 object-file.c | 20 +++++++-------------
 streaming.c   |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index f6295f3b048..35f254dae4a 100644
--- a/cache.h
+++ b/cache.h
@@ -1320,7 +1320,9 @@ char *xdg_cache_home(const char *filename);
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
 int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+struct object_info;
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 8475b128944..6b91c4edcf6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1385,8 +1385,8 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1446,14 +1446,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1508,10 +1500,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2599,6 +2591,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2613,7 +2607,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 12/17] object-file.c: simplify unpack_loose_short_header()
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was largely copy/pasted from unpack_loose_header() and
unpack_loose_short_header(). We now have a single
unpack_loose_header() function, which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be
simplified further. We're carrying some complexity just to be able to
emit a garbage type longer than MAX_HEADER_LEN; we could instead just
say "we found a garbage type <first 32 bytes>...". But let's leave
the current behavior in place for now.
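The optional-strbuf convention can be sketched without zlib by assuming
the object has already been fully inflated into memory. In this
hypothetical version a NULL "out" means "fail if the header doesn't fit
in bufsiz bytes", while a non-NULL one receives a malloc'd copy of the
whole over-long header, and capturing it counts as success. The function
name and the plain char * standing in for git's strbuf are illustrative
only:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * "data" is the already-inflated object (header, NUL, content) and
 * "len" its length. Returns 0 if the header ends within the first
 * bufsiz bytes, or if it is longer but out is non-NULL, in which case
 * *out gets a malloc'd copy of the full header (caller frees).
 * Returns -1 otherwise.
 */
int sketch_unpack_header(const char *data, size_t len, size_t bufsiz,
			 char **out)
{
	const char *nul;
	size_t hdrlen;

	assert(data != NULL);
	nul = memchr(data, '\0', len);
	if (!nul)
		return -1; /* no header terminator at all */
	hdrlen = (size_t)(nul - data);
	if (hdrlen < bufsiz)
		return 0; /* header plus its NUL fit in the fixed buffer */
	if (!out)
		return -1; /* too long, and the caller did not ask for it */
	*out = malloc(hdrlen + 1);
	if (!*out)
		return -1;
	memcpy(*out, data, hdrlen + 1); /* copies the NUL too */
	return 0;
}
```

With bufsiz set to 32, an ordinary "blob 6" header fits, while a long
bogus type name can only be recovered when the caller passes "out",
which is the --allow-unknown-type case.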

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 17 ++++++++++++++-
 object-file.c | 58 ++++++++++++++++++---------------------------------
 streaming.c   |  3 ++-
 3 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/cache.h b/cache.h
index 35f254dae4a..d7189aed8fc 100644
--- a/cache.h
+++ b/cache.h
@@ -1319,7 +1319,22 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
+
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 6b91c4edcf6..1327872cbf4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,11 +1255,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1270,35 +1271,8 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
-	if (status < Z_OK)
-		return -1;
-
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
 	if (status < Z_OK)
 		return -1;
 
@@ -1308,6 +1282,14 @@ static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1457,6 +1439,7 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1490,11 +1473,9 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				allow_unknown ? &hdrbuf : NULL) < 0)
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	if (status < 0)
@@ -2602,7 +2583,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 13/17] object-file.c: use "enum" return type for unpack_loose_header()
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

In a preceding commit we changed unpack_loose_header() from returning
any negative value or zero to returning only -1 or 0, and documented
that behavior.

Let's add an "enum unpack_loose_header_result" type for these return
values, and use "switch" statements on it so that the compiler can
check that we're exhaustively covering all of them.
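The point of the enum is that a "switch" over it without a "default:"
label lets the compiler flag unhandled values: GCC's -Wswitch warning
(enabled by -Wall; Clang has the same warning) fires when an enumerator
has no matching "case". A minimal illustration with hypothetical names:

```c
#include <assert.h>

enum sketch_ulhr {
	SKETCH_ULHR_OK,
	SKETCH_ULHR_BAD,
};

/*
 * No "default:" label on purpose: if a new enumerator is added and not
 * handled here, -Wswitch warns about this switch.
 */
int sketch_ulhr_to_status(enum sketch_ulhr r)
{
	switch (r) {
	case SKETCH_ULHR_OK:
		return 0;
	case SKETCH_ULHR_BAD:
		return -1;
	}
	assert(!"unreachable: invalid enum sketch_ulhr value");
	return -1;
}
```

The next patch in the series relies on this: adding ULHR_TOO_LONG makes
the compiler point at each such switch in object-file.c and streaming.c
that does not yet handle it.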

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 19 +++++++++++++++----
 object-file.c | 34 +++++++++++++++++++++-------------
 streaming.c   | 23 +++++++++++++----------
 3 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index d7189aed8fc..7239e20a625 100644
--- a/cache.h
+++ b/cache.h
@@ -1324,7 +1324,10 @@ int git_open_cloexec(const char *name, int flags);
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns:
+ *
+ * - ULHR_OK on success
+ * - ULHR_BAD on error
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
@@ -1332,9 +1335,17 @@ int git_open_cloexec(const char *name, int flags);
  * reporting. The full header will be extracted to "hdrbuf" for use
  * with parse_loose_header().
  */
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
-			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result {
+	ULHR_OK,
+	ULHR_BAD,
+};
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *hdrbuf);
+
 struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 1327872cbf4..e0f508415dd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,10 +1255,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz,
-			struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *header)
 {
 	int status;
 
@@ -1274,13 +1276,13 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return -1;
+		return ULHR_BAD;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return 0;
+		return ULHR_OK;
 
 	/*
 	 * We have a header longer than MAX_HEADER_LEN. The "header"
@@ -1288,7 +1290,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return ULHR_BAD;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1309,7 +1311,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return ULHR_BAD;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1474,13 +1476,19 @@ static int loose_object_info(struct repository *r,
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
-				allow_unknown ? &hdrbuf : NULL) < 0)
+	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				    allow_unknown ? &hdrbuf : NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+		break;
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..6df0247a4cb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -229,17 +229,16 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
-	if ((unpack_loose_header(&st->z,
-				 st->u.loose.mapped,
-				 st->u.loose.mapsize,
-				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr),
-				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
-		git_inflate_end(&st->z);
-		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-		return -1;
+	switch (unpack_loose_header(&st->z, st->u.loose.mapped,
+				    st->u.loose.mapsize, st->u.loose.hdr,
+				    sizeof(st->u.loose.hdr), NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
+		goto error;
 	}
+	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
 	st->u.loose.hdr_avail = st->z.total_out;
@@ -248,6 +247,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->read = read_istream_loose;
 
 	return 0;
+error:
+	git_inflate_end(&st->z);
+	munmap(st->u.loose.mapped, st->u.loose.mapsize);
+	return -1;
 }
 
 
-- 
2.33.0.1375.g5eed55aa1b5


* [PATCH v10 14/17] object-file.c: return ULHR_TOO_LONG on "header too long"
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Split the "header too long" case out of the generic error value that
unpack_loose_header() returns, and report it via error() if we exceed
MAX_HEADER_LEN.

As a test added earlier in this series to t1006-cat-file.sh shows,
we already correctly emit zlib errors from zlib.c in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG to say that we ran
into the MAX_HEADER_LEN limit, or ULHR_BAD for "unable to unpack
<OID> header".
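A sketch of how a caller can now word the two failures differently,
mirroring the error() strings in the object-file.c hunk. The enum and
function are hypothetical, snprintf() stands in for git's error(), and
SKETCH_MAX_HEADER_LEN is set to 32 to match the "<first 32 bytes>"
remark earlier in this series:

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_MAX_HEADER_LEN 32

enum sketch_ulhr {
	SKETCH_ULHR_OK,
	SKETCH_ULHR_BAD,
	SKETCH_ULHR_TOO_LONG,
};

/*
 * Format the message a caller would report for result r into msg.
 * Returns 0 for SKETCH_ULHR_OK (msg left empty), -1 otherwise.
 */
int sketch_report(enum sketch_ulhr r, const char *hex, char *msg, size_t n)
{
	assert(hex && msg && n > 0);
	msg[0] = '\0';
	switch (r) {
	case SKETCH_ULHR_OK:
		return 0;
	case SKETCH_ULHR_BAD:
		snprintf(msg, n, "unable to unpack %s header", hex);
		return -1;
	case SKETCH_ULHR_TOO_LONG:
		snprintf(msg, n, "header for %s too long, exceeds %d bytes",
			 hex, SKETCH_MAX_HEADER_LEN);
		return -1;
	}
	return -1;
}
```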

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h             | 5 ++++-
 object-file.c       | 8 ++++++--
 streaming.c         | 1 +
 t/t1006-cat-file.sh | 4 ++--
 4 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 7239e20a625..8e05392fda8 100644
--- a/cache.h
+++ b/cache.h
@@ -1328,16 +1328,19 @@ int git_open_cloexec(const char *name, int flags);
  *
  * - ULHR_OK on success
  * - ULHR_BAD on error
+ * - ULHR_TOO_LONG if the header was too long
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
+ * from this function to indicate that the header was too long.
  */
 enum unpack_loose_header_result {
 	ULHR_OK,
 	ULHR_BAD,
+	ULHR_TOO_LONG,
 };
 enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 						    unsigned char *map,
diff --git a/object-file.c b/object-file.c
index e0f508415dd..3589c5a2e33 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1290,7 +1290,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return ULHR_BAD;
+		return ULHR_TOO_LONG;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1311,7 +1311,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return ULHR_BAD;
+	return ULHR_TOO_LONG;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1484,6 +1484,10 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 		break;
+	case ULHR_TOO_LONG:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
 	}
 
 	if (status < 0) {
diff --git a/streaming.c b/streaming.c
index 6df0247a4cb..bd89c50e7b3 100644
--- a/streaming.c
+++ b/streaming.c
@@ -235,6 +235,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	case ULHR_OK:
 		break;
 	case ULHR_BAD:
+	case ULHR_TOO_LONG:
 		goto error;
 	}
 	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5b16c69c286..a5e7401af8b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -356,12 +356,12 @@ do
 			if test "$arg2" = "-p"
 			then
 				cat >expect <<-EOF
-				error: unable to unpack $bogus_long_sha1 header
+				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
 				fatal: Not a valid object name $bogus_long_sha1
 				EOF
 			else
 				cat >expect <<-EOF
-				error: unable to unpack $bogus_long_sha1 header
+				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
 				fatal: git cat-file: could not get object info
 				EOF
 			fi &&
-- 
2.33.0.1375.g5eed55aa1b5



* [PATCH v10 15/17] object-file.c: stop dying in parse_loose_header()
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Make parse_loose_header() return error codes and data instead of
invoking die() by itself.

For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (it should be
moved to builtin/cat-file.c), but for now I'm leaving it as-is.

To make parse_loose_header() not die(), change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated
"oi->typep".

Because of this we no longer need to pass in the "unsigned int flags"
we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can instead do
that check in loose_object_info().
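Here is a minimal standalone sketch of that calling convention. It assumes hypothetical names (struct info, parse_header(), type_from_name()) and only the four standard type names. The parser returns -1 only for a malformed "<type> <len>\0" header. It reports an unknown type as OBJ_BAD through the out-parameter, which leaves the "is that allowed?" decision to the caller. Overflow of <len> is not checked in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of git's object types; OBJ_BAD marks an unknown type. */
enum obj_type { OBJ_BAD = -1, OBJ_NONE = 0, OBJ_COMMIT = 1, OBJ_TREE = 2,
		OBJ_BLOB = 3, OBJ_TAG = 4 };

/* Hypothetical cut-down "struct object_info": callers opt in per field. */
struct info {
	enum obj_type *typep;
	unsigned long *sizep;
};

static enum obj_type type_from_name(const char *s, size_t len)
{
	static const char *names[] = { NULL, "commit", "tree", "blob", "tag" };

	for (int i = 1; i <= 4; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], s, len))
			return (enum obj_type)i;
	return OBJ_BAD;
}

/*
 * Parse "<type> <len>\0". Return -1 only if the *format* is wrong; an
 * unknown type is not an error here, it is stored as OBJ_BAD via
 * oi->typep for the caller to judge.
 */
static int parse_header(const char *hdr, struct info *oi)
{
	const char *type_start = hdr;
	size_t type_len;
	unsigned long size = 0;

	assert(hdr && oi);
	while (*hdr && *hdr != ' ')
		hdr++;
	if (*hdr != ' ')
		return -1;
	type_len = (size_t)(hdr - type_start);
	hdr++;

	if (*hdr < '0' || *hdr > '9')
		return -1;
	while (*hdr >= '0' && *hdr <= '9')
		size = size * 10 + (unsigned long)(*hdr++ - '0');

	/* The length must be followed by the terminating NUL. */
	if (*hdr)
		return -1;

	if (oi->typep)
		*oi->typep = type_from_name(type_start, type_len);
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}
```

A caller that does not allow unknown types checks "*oi->typep < 0" after a 0 return. This is the split this patch makes between parse_loose_header() and loose_object_info().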

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that could
be any negative value, as we were expecting to return the "enum
object_type".

The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.

Since parse_loose_header() doesn't need to return an arbitrary
"status", we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.
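parse_loose_header() now always writes through oi->typep. So loose_object_info() points a NULL oi->typep at a local "type_scratch" for the duration of the call, as it already did for oi->sizep. Below is a standalone sketch of that idiom. The names (struct info, fill_info()) are hypothetical, and the "type" and "size" arguments stand in for values parsed from a header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down "struct object_info". */
struct info {
	int *typep;
	unsigned long *sizep;
};

/*
 * Callers may leave any pointer NULL. We point it at local scratch
 * storage so the body can write through it unconditionally, and undo
 * that before returning so the caller never keeps a pointer into our
 * stack frame.
 */
static int fill_info(struct info *oi, int type, unsigned long size)
{
	int type_scratch;
	unsigned long size_scratch;
	int status = 0;

	assert(oi);
	if (!oi->typep)
		oi->typep = &type_scratch;
	if (!oi->sizep)
		oi->sizep = &size_scratch;

	*oi->typep = type;
	*oi->sizep = size;
	if (*oi->typep < 0)
		status = -1; /* e.g. an unknown type the caller did not allow */

	/* Restore the caller's NULLs. */
	if (oi->typep == &type_scratch)
		oi->typep = NULL;
	if (oi->sizep == &size_scratch)
		oi->sizep = NULL;
	return status;
}
```

The design choice is that the body needs no "if (oi->typep)" guards. The only cost is the pair of pointer comparisons before returning.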

We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects; see
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it; a subsequent
commit will address that edge case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h       | 11 +++++++--
 object-file.c | 67 +++++++++++++++++++++++++--------------------------
 streaming.c   |  3 ++-
 3 files changed, 44 insertions(+), 37 deletions(-)

diff --git a/cache.h b/cache.h
index 8e05392fda8..6c5f00c82d5 100644
--- a/cache.h
+++ b/cache.h
@@ -1349,9 +1349,16 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 						    unsigned long bufsiz,
 						    struct strbuf *hdrbuf);
 
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
 struct object_info;
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 3589c5a2e33..a70669700d0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1369,8 +1369,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1392,15 +1391,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1427,7 +1417,14 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1441,6 +1438,7 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1472,6 +1470,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1479,6 +1479,18 @@ static int loose_object_info(struct repository *r,
 	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				    allow_unknown ? &hdrbuf : NULL)) {
 	case ULHR_OK:
+		if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
+		else if (!allow_unknown && *oi->typep < 0)
+			die(_("invalid object type"));
+
+		if (!oi->contentp)
+			break;
+		*oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+		if (*oi->contentp)
+			goto cleanup;
+
+		status = -1;
 		break;
 	case ULHR_BAD:
 		status = error(_("unable to unpack %s header"),
@@ -1490,31 +1502,16 @@ static int loose_object_info(struct repository *r,
 		break;
 	}
 
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
-
-	if (status >= 0 && oi->contentp) {
-		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
-		if (!*oi->contentp) {
-			git_inflate_end(&stream);
-			status = -1;
-		}
-	} else
-		git_inflate_end(&stream);
-
+	git_inflate_end(&stream);
+cleanup:
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
-	return (status < 0) ? status : 0;
+	return status;
 }
 
 int obj_read_use_lock = 0;
@@ -2585,6 +2582,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2601,12 +2599,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/streaming.c b/streaming.c
index bd89c50e7b3..fe54665d86e 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -238,7 +239,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	case ULHR_TOO_LONG:
 		goto error;
 	}
-	if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+	if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
 		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
-- 
2.33.0.1375.g5eed55aa1b5



* [PATCH v10 16/17] fsck: don't hard die on invalid object types
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
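The "report and keep going" behavior can be sketched as below. struct fake_obj, check_all() and this ERROR_OBJECT value are hypothetical stand-ins, not fsck's code. Each bad object sets a bit in an accumulated flag and the walk continues. The non-zero exit is decided only after every object has been looked at.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bit, named after fsck's ERROR_OBJECT. */
#define ERROR_OBJECT 01

/* Hypothetical object record; a negative type means "unknown type". */
struct fake_obj {
	const char *name;
	int type;
};

/*
 * Check every object, reporting bad ones on stderr without stopping.
 * Returns the accumulated error bits; *checked counts objects visited.
 */
static int check_all(const struct fake_obj *objs, size_t nr, size_t *checked)
{
	int errors_found = 0;

	assert(checked && (objs || !nr));
	*checked = 0;
	for (size_t i = 0; i < nr; i++) {
		(*checked)++;
		if (objs[i].type < 0) {
			fprintf(stderr, "error: %s: object is of unknown type\n",
				objs[i].name);
			errors_found |= ERROR_OBJECT;
			continue; /* keep checking other objects */
		}
		/* ...further per-object checks would go here... */
	}
	return errors_found;
}
```

A caller would exit non-zero when the returned bits are non-zero, which only happens once the whole walk has finished.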

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().

Since we'll need a "struct strbuf" to hold the "type_name", let's pass
it to the for_each_loose_file_in_objdir() callback to avoid allocating
a new one for each loose object in the iteration. It also makes the
memory management simpler than sticking it in fsck_loose() itself, as
we'll only need to strbuf_reset() it, with no need to do a
strbuf_release() before each "return".
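Below is a standalone sketch of reusing one buffer across callback invocations. It assumes a hypothetical minimal "struct buf" in place of git's strbuf API, and a hypothetical on_object() callback. Each call only resets the buffer. The code that owns the callback data releases it once, after the iteration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal stand-in for git's "struct strbuf". */
struct buf {
	char *data;
	size_t len, alloc;
};
#define BUF_INIT { NULL, 0, 0 }

static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->data)
		b->data[0] = '\0';
}

static int buf_add(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		size_t want = (b->len + n + 1) * 2;
		char *p = realloc(b->data, want);

		if (!p)
			return -1;
		b->data = p;
		b->alloc = want;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
	return 0;
}

static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->alloc = 0;
}

/* Hypothetical callback payload, shaped like "struct for_each_loose_cb". */
struct cb_data {
	struct buf obj_type;
	unsigned calls;
};

/* Per-object callback: reuse the caller's buffer instead of allocating. */
static int on_object(const char *type_name, void *data)
{
	struct cb_data *cb = data;

	assert(cb && type_name);
	buf_reset(&cb->obj_type);
	if (buf_add(&cb->obj_type, type_name) < 0)
		return -1; /* no cleanup needed: the caller owns the buffer */
	cb->calls++;
	return 0;
}
```

After the iteration, the owner calls buf_release() exactly once. This mirrors the single strbuf_release() at the end of fsck_object_dir().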

Before this commit we'd never check the "type" if read_loose_object()
failed, but now we do. We therefore need to initialize it to OBJ_NONE
to be able to tell the difference between e.g. its
unpack_loose_header() having failed, and us getting past that and into
parse_loose_header().
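Below is a minimal sketch of why the OBJ_NONE initialization matters, using a hypothetical two-stage read_object() and classify(). If the reader fails before it stores a type, the caller's OBJ_NONE survives and only the "corrupt or missing" case applies. A negative type can only have come from the header parse. An uninitialized variable would make that distinction unreliable.

```c
#include <assert.h>

/* Hypothetical subset of git's enum object_type. */
enum obj_type { OBJ_BAD = -1, OBJ_NONE = 0, OBJ_BLOB = 3 };

/* Hypothetical outcome classes, matching fsck's two messages. */
enum outcome { OUT_OK, OUT_CORRUPT, OUT_UNKNOWN_TYPE };

/*
 * Two-stage reader: it can fail before it ever writes *typep (e.g. the
 * header could not be unpacked), or after storing a type that turned
 * out to be bogus.
 */
static int read_object(int fail_early, enum obj_type parsed, enum obj_type *typep)
{
	assert(typep);
	if (fail_early)
		return -1; /* *typep is left untouched */
	*typep = parsed;
	return parsed < 0 ? -1 : 0;
}

/* Mirrors the check in fsck_loose(); relies on *typep starting as OBJ_NONE. */
static enum outcome classify(int ret, enum obj_type type)
{
	if (type != OBJ_NONE && type < 0)
		return OUT_UNKNOWN_TYPE;
	if (ret < 0)
		return OUT_CORRUPT;
	return OUT_OK;
}
```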

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 37 +++++++++++++++++++++++++++++++------
 object-file.c   | 18 ++++++------------
 object-store.h  |  6 +++---
 t/t1450-fsck.sh | 18 +++++++++---------
 4 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..260210bf8a1 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -593,18 +593,36 @@ static void get_default_heads(void)
 	}
 }
 
+struct for_each_loose_cb
+{
+	struct progress *progress;
+	struct strbuf obj_type;
+};
+
 static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 {
+	struct for_each_loose_cb *cb_data = data;
 	struct object *obj;
-	enum object_type type;
+	enum object_type type = OBJ_NONE;
 	unsigned long size;
 	void *contents;
 	int eaten;
+	struct object_info oi = OBJECT_INFO_INIT;
+	int err = 0;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	strbuf_reset(&cb_data->obj_type);
+	oi.type_name = &cb_data->obj_type;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi) < 0)
+		err = error(_("%s: object corrupt or missing: %s"),
+			    oid_to_hex(oid), path);
+	if (type != OBJ_NONE && type < 0)
+		err = error(_("%s: object is of unknown type '%s': %s"),
+			    oid_to_hex(oid), cb_data->obj_type.buf, path);
+	if (err < 0) {
 		errors_found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
 		return 0; /* keep checking other objects */
 	}
 
@@ -640,8 +658,10 @@ static int fsck_cruft(const char *basename, const char *path, void *data)
 	return 0;
 }
 
-static int fsck_subdir(unsigned int nr, const char *path, void *progress)
+static int fsck_subdir(unsigned int nr, const char *path, void *data)
 {
+	struct for_each_loose_cb *cb_data = data;
+	struct progress *progress = cb_data->progress;
 	display_progress(progress, nr + 1);
 	return 0;
 }
@@ -649,6 +669,10 @@ static int fsck_subdir(unsigned int nr, const char *path, void *progress)
 static void fsck_object_dir(const char *path)
 {
 	struct progress *progress = NULL;
+	struct for_each_loose_cb cb_data = {
+		.obj_type = STRBUF_INIT,
+		.progress = progress,
+	};
 
 	if (verbose)
 		fprintf_ln(stderr, _("Checking object directory"));
@@ -657,9 +681,10 @@ static void fsck_object_dir(const char *path)
 		progress = start_progress(_("Checking object directories"), 256);
 
 	for_each_loose_file_in_objdir(path, fsck_loose, fsck_cruft, fsck_subdir,
-				      progress);
+				      &cb_data);
 	display_progress(progress, 256);
 	stop_progress(&progress);
+	strbuf_release(&cb_data.obj_type);
 }
 
 static int fsck_head_link(const char *head_ref_name,
diff --git a/object-file.c b/object-file.c
index a70669700d0..fe95285f405 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2572,18 +2572,15 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      struct object_info *oi)
 {
 	int ret = -1;
 	void *map = NULL;
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
-	oi.typep = type;
-	oi.sizep = size;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2599,15 +2596,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
 
-	if (*type == OBJ_BLOB && *size > big_file_threshold) {
+	if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
 			goto out;
 	} else {
@@ -2618,8 +2613,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index c5130d8baea..c90c41a07f7 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,6 +245,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -252,9 +253,8 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      struct object_info *oi);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 281ff8bdd8e..faf0e98847b 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -85,11 +85,10 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 
-		cat >expect <<-\EOF &&
-		fatal: invalid object type
-		EOF
-		test_must_fail git fsck 2>actual &&
-		test_cmp expect actual
+
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
@@ -910,19 +909,20 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	git init --bare garbage-type &&
 	(
 		cd garbage-type &&
 
-		git hash-object --stdin -w -t garbage --literally </dev/null &&
+		garbage_blob=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
 
 		cat >err.expect <<-\EOF &&
 		fatal: invalid object type
 		EOF
 		test_must_fail git fsck >out 2>err &&
-		test_cmp err.expect err &&
-		test_must_be_empty out
+		grep -e "^error" -e "^fatal" err >errors &&
+		test_line_count = 1 errors &&
+		grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err
 	)
 '
 
-- 
2.33.0.1375.g5eed55aa1b5



* [PATCH v10 17/17] fsck: report invalid object type-path combinations
From: Ævar Arnfjörð Bjarmason @ 2021-10-01  9:16 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    $ git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in earlier commits, we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
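Below is a standalone sketch of both halves of this. The first half is an optional "real OID" out-parameter that falls back to a local temporary when callers pass NULL, as check_object_signature() now does. The second half is a caller that pre-fills that out-parameter with the all-zero OID, so it can tell "never hashed" apart from "hashed to something else". Several pieces are assumptions for illustration only:

* the 4-byte struct oid,
* toy_hash(), which uses FNV-1a in place of SHA-1/SHA-256,
* the convention that a NULL buffer means the object could not be read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte "object id"; git's real ones are SHA-1/SHA-256. */
struct oid { unsigned char hash[4]; };

static const struct oid null_oid_value; /* all zeros, like null_oid() */

/* FNV-1a, used here only as a stand-in for the real object hash. */
static void toy_hash(const void *buf, size_t len, struct oid *out)
{
	const unsigned char *p = buf;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	for (int i = 0; i < 4; i++)
		out->hash[i] = (unsigned char)(h >> (8 * i));
}

static int oid_eq(const struct oid *a, const struct oid *b)
{
	return !memcmp(a->hash, b->hash, sizeof(a->hash));
}

/*
 * Callers that don't care pass real_oidp == NULL and we hash into a
 * local temporary; callers that do care get the computed OID back.
 */
static int check_signature(const struct oid *expect, const void *buf,
			   size_t len, struct oid *real_oidp)
{
	struct oid tmp;
	struct oid *real = real_oidp ? real_oidp : &tmp;

	assert(expect);
	if (!buf)
		return -1; /* could not read: *real_oidp keeps the caller's value */
	toy_hash(buf, len, real);
	return oid_eq(expect, real) ? 0 : -1;
}
```

The NULL fallback keeps existing callers, which pass NULL, working unchanged. A caller that starts from the null OID and still sees it after a failure knows that no hash was computed.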

We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ git fsck
    error: garbage at end of loose object 'e69d[...]'
    error: unable to unpack contents of ./objects/e6/9d[...]
    error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]

There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ /usr/bin/git fsck
    fatal: invalid object type
    $ ~/g/git/git fsck
    error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
    error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    [...]

I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly obscure cases, the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.

There are other such potential edge cases, all of which might produce
some confusing messaging but would still be handled correctly as far
as passing along errors goes. E.g. check_object_signature() might
return with oideq(real_oid, null_oid()) still true, which could happen
if it returns -1 because the read_istream() call failed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 15 +++++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 cache.h               |  3 ++-
 object-file.c         | 21 ++++++++++-----------
 object-store.h        |  1 +
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 11 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 95e8e89e81f..8e2caf72819 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 260210bf8a1..30a516da29e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -608,6 +608,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 	struct object_info oi = OBJECT_INFO_INIT;
+	struct object_id real_oid = *null_oid();
 	int err = 0;
 
 	strbuf_reset(&cb_data->obj_type);
@@ -615,12 +616,18 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi) < 0)
-		err = error(_("%s: object corrupt or missing: %s"),
-			    oid_to_hex(oid), path);
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
+		if (contents && !oideq(&real_oid, oid))
+			err = error(_("%s: hash-path mismatch, found at: %s"),
+				    oid_to_hex(&real_oid), path);
+		else
+			err = error(_("%s: object corrupt or missing: %s"),
+				    oid_to_hex(oid), path);
+	}
 	if (type != OBJ_NONE && type < 0)
 		err = error(_("%s: object is of unknown type '%s': %s"),
-			    oid_to_hex(oid), cb_data->obj_type.buf, path);
+			    oid_to_hex(&real_oid), cb_data->obj_type.buf,
+			    path);
 	if (err < 0) {
 		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7ce69c087ec..15ae406e6b7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1415,7 +1415,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/cache.h b/cache.h
index 6c5f00c82d5..e2a203073ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1361,7 +1361,8 @@ struct object_info;
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 
 int finalize_object_file(const char *tmpfile, const char *filename);
 
diff --git a/object-file.c b/object-file.c
index fe95285f405..49561e31551 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1084,9 +1084,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1094,8 +1096,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1120,9 +1122,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2572,6 +2574,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi)
 {
@@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
 	char hdr[MAX_HEADER_LEN];
 	unsigned long *size = oi->sizep;
 
-	*contents = NULL;
-
 	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
@@ -2613,9 +2614,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index c90c41a07f7..17b072e5a19 100644
--- a/object-store.h
+++ b/object-store.h
@@ -253,6 +253,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi);
 
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index a5e7401af8b..0f52ca9cc82 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -512,7 +512,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index faf0e98847b..6337236fd82 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -54,6 +54,7 @@ test_expect_success 'object with hash mismatch' '
 		cd hash-mismatch &&
 
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -65,7 +66,7 @@ test_expect_success 'object with hash mismatch' '
 		git update-ref refs/heads/bogus $cmt &&
 
 		test_must_fail git fsck 2>out &&
-		grep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -75,6 +76,7 @@ test_expect_success 'object with hash and type mismatch' '
 		cd hash-type-mismatch &&
 
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -87,8 +89,8 @@ test_expect_success 'object with hash and type mismatch' '
 
 
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.33.0.1375.g5eed55aa1b5


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* Re: [PATCH v10 17/17] fsck: report invalid object type-path combinations
  2021-10-01  9:16                     ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-10-01 22:14                       ` Junio C Hamano
  2021-10-01 22:33                         ` Ævar Arnfjörð Bjarmason
  2021-11-11  3:03                       ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
  2021-11-11  3:05                       ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Han Xin
  2 siblings, 1 reply; 245+ messages in thread
From: Junio C Hamano @ 2021-10-01 22:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 260210bf8a1..30a516da29e 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -615,12 +616,18 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	oi.sizep = &size;
>  	oi.typep = &type;
>  
> -	if (read_loose_object(path, oid, &contents, &oi) < 0)
> -		err = error(_("%s: object corrupt or missing: %s"),
> -			    oid_to_hex(oid), path);
> +	if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
> +		if (contents && !oideq(&real_oid, oid))
> +			err = error(_("%s: hash-path mismatch, found at: %s"),
> +				    oid_to_hex(&real_oid), path);
> +		else
> +			err = error(_("%s: object corrupt or missing: %s"),
> +				    oid_to_hex(oid), path);
> +	}
>  	if (type != OBJ_NONE && type < 0)
>  		err = error(_("%s: object is of unknown type '%s': %s"),
> -			    oid_to_hex(oid), cb_data->obj_type.buf, path);
> +			    oid_to_hex(&real_oid), cb_data->obj_type.buf,
> +			    path);
>  	if (err < 0) {
>  		errors_found |= ERROR_OBJECT;
>  		return 0; /* keep checking other objects */

When we say "hash-path mismatch", we would have non-null contents,
presumably obtained from read_loose_object().  err is made negative
when we give that message, and we come here to return.  Did we forget
to free "contents" in that case?


^ permalink raw reply	[flat|nested] 245+ messages in thread

* Re: [PATCH v10 17/17] fsck: report invalid object type-path combinations
  2021-10-01 22:14                       ` Junio C Hamano
@ 2021-10-01 22:33                         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 22:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau


On Fri, Oct 01 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> diff --git a/builtin/fsck.c b/builtin/fsck.c
>> index 260210bf8a1..30a516da29e 100644
>> --- a/builtin/fsck.c
>> +++ b/builtin/fsck.c
>> @@ -615,12 +616,18 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>>  	oi.sizep = &size;
>>  	oi.typep = &type;
>>  
>> -	if (read_loose_object(path, oid, &contents, &oi) < 0)
>> -		err = error(_("%s: object corrupt or missing: %s"),
>> -			    oid_to_hex(oid), path);
>> +	if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
>> +		if (contents && !oideq(&real_oid, oid))
>> +			err = error(_("%s: hash-path mismatch, found at: %s"),
>> +				    oid_to_hex(&real_oid), path);
>> +		else
>> +			err = error(_("%s: object corrupt or missing: %s"),
>> +				    oid_to_hex(oid), path);
>> +	}
>>  	if (type != OBJ_NONE && type < 0)
>>  		err = error(_("%s: object is of unknown type '%s': %s"),
>> -			    oid_to_hex(oid), cb_data->obj_type.buf, path);
>> +			    oid_to_hex(&real_oid), cb_data->obj_type.buf,
>> +			    path);
>>  	if (err < 0) {
>>  		errors_found |= ERROR_OBJECT;
>>  		return 0; /* keep checking other objects */
>
> When we say "hash-path mismatch", we would have non-null contents,
> presumably obtained from read_loose_object().  err is made negative
> when we give that message, and we come here to return.  Did we forget
> to free "contents" in that case?

No, e.g. the "cat-file -t and -s on corrupt loose object" test added in
this series doesn't error with SANITIZE=leak.

This is because as we go through read_loose_object() we'll make our way
to unpack_loose_rest(), which will return that malloc'd buffer. So we
would leak it if we returned after that.

Except that in read_loose_object() we'll go on to call
check_object_signature() right afterwards. The expected OID is whatever
we inferred from the FS path, and the OID we saw is what we get from
hashing. That call will return non-zero, and we'll free() the
contents. The buffer isn't NULL'd, but we can't use it.
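A minimal sketch of the contract described above, assuming a toy `mock_read_loose_object()` (hypothetical, not git's API) in place of read_loose_object(): the callee allocates the buffer and, on a hash mismatch, releases it itself before returning -1. As an assumed variant that differs from the code under discussion, it also resets `*contents` to NULL after freeing (the idea behind git's `FREE_AND_NULL()`), so a caller that inspects the pointer after an error never sees a dangling value.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for read_loose_object(): copies "data" into a
 * heap buffer and pretends to verify its hash ("hash_ok").
 * On failure the callee frees the buffer and clears the pointer.
 */
static int mock_read_loose_object(const char *data, int hash_ok,
				  void **contents)
{
	size_t len = strlen(data) + 1;
	char *buf;

	*contents = NULL;		/* defined value on every return path */
	buf = malloc(len);
	if (!buf)
		return -1;
	memcpy(buf, data, len);
	*contents = buf;

	if (!hash_ok) {
		/* callee owns cleanup on mismatch: free, then clear */
		free(*contents);
		*contents = NULL;
		return -1;
	}
	return 0;			/* success: the caller now owns *contents */
}
```

The clearing step matters for a caller like fsck_loose() that tests `contents` after an error: with it, that test only ever sees NULL or a live buffer.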

This is all behavior that pre-dates this series. I think it's a bit
stupid, and we should arguably do better about data recovery here, as
alluded to at the end of the commit message.

I.e. ideally we can use the information that we know we wanted OID A,
who cares if we found it at path B? It hashes to A and completes the
graph! Let's just re-write it to A. Or maybe it's not worth it. Or we'd
want to optionally log the content we *do* find on such failures,
e.g. maybe the content is partial or whatever. I had some WIP work on
top of this that did that, e.g. to recover in cases where you append
garbage data at the end of an object (in which case we *do* have the
content and can recover, we just need to stop reading at that byte once
our OID matches, and re-write it out again).

But anyway, it doesn't work that way now, and this doesn't leak memory,
or as far as I can tell do the wrong thing in these various edge cases,
because "content is bad" is always synonymous with read_loose_object()
itself calling free().

Thanks a lot for the careful checking!

^ permalink raw reply	[flat|nested] 245+ messages in thread

* Re: [PATCH v2] receive-pack: not receive pack file with large object
  2021-10-01  9:16                     ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  2021-10-01 22:14                       ` Junio C Hamano
@ 2021-11-11  3:03                       ` Han Xin
  2021-11-11 18:35                         ` Junio C Hamano
  2021-11-11  3:05                       ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Han Xin
  2 siblings, 1 reply; 245+ messages in thread
From: Han Xin @ 2021-11-11  3:03 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, jonathantanmy, me, peff, rybak.a.v

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
...
> diff --git a/object-file.c b/object-file.c
> index fe95285f405..49561e31551 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1084,9 +1084,11 @@ void *xmmap(void *start, size_t length,
> * the streaming interface and rehash it to do the same.
> */
> int check_object_signature(struct repository *r, const struct object_id *oid,
> - void *map, unsigned long size, const char *type)
> + void *map, unsigned long size, const char *type,
> + struct object_id *real_oidp)
> {
> - struct object_id real_oid;
> + struct object_id tmp;
> + struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
> enum object_type obj_type;
> struct git_istream *st;
> git_hash_ctx c;
> @@ -1094,8 +1096,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
> int hdrlen;
>
> if (map) {
> - hash_object_file(r->hash_algo, map, size, type, &real_oid);
> - return !oideq(oid, &real_oid) ? -1 : 0;
> + hash_object_file(r->hash_algo, map, size, type, real_oid);
> + return !oideq(oid, real_oid) ? -1 : 0;
> }
>
> st = open_istream(r, oid, &obj_type, &size, NULL);
> @@ -1120,9 +1122,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
> break;
> r->hash_algo->update_fn(&c, buf, readlen);
> }
> - r->hash_algo->final_oid_fn(&real_oid, &c);
> + r->hash_algo->final_oid_fn(real_oid, &c);
> close_istream(st);
> - return !oideq(oid, &real_oid) ? -1 : 0;
> + return !oideq(oid, real_oid) ? -1 : 0;
> }
>
> int git_open_cloexec(const char *name, int flags)
> @@ -2572,6 +2574,7 @@ static int check_stream_oid(git_zstream *stream,
>
> int read_loose_object(const char *path,
> const struct object_id *expected_oid,
> + struct object_id *real_oid,
> void **contents,
> struct object_info *oi)
> {
> @@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
> char hdr[MAX_HEADER_LEN];
> unsigned long *size = oi->sizep;
>
> - *contents = NULL;
> -

Deleting "*contents = NULL;" here will cause a memory free error.
When reading a large loose blob (larger than big_file_threshold), it will enter the following block and *contents will not be set:

	if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
			goto out;
	} else {
		...
	}


This test case can illustrate this problem:

test_expect_success 'fsck large loose blob' '
	blob=$(echo large | git hash-object -w --stdin) &&
	git -c core.bigfilethreshold=4 fsck
'

git(73697,0x1198f1e00) malloc: *** error for object 0x36: pointer being freed was not allocated
git(73697,0x1198f1e00) malloc: *** set a breakpoint in malloc_error_break to debug
./test-lib.sh: line 947: 73697 Abort trap: 6           git -c core.bigfilethreshold=4 fsck
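The failure mode above can be modelled in a few lines. This is a sketch under stated assumptions: `mock_read()` and `mock_fsck_caller()` are hypothetical stand-ins for read_loose_object() and fsck_loose(), and the size/threshold comparison mirrors the `*size > big_file_threshold` branch quoted above. The streaming branch never produces a buffer, so the only thing keeping the caller's unconditional `free()` safe is the `*contents = NULL;` at function entry.

```c
#include <stdlib.h>

/*
 * Hypothetical model of read_loose_object()'s two branches: objects over
 * the threshold are verified by streaming and never fill *contents;
 * smaller ones are inflated into a heap buffer.
 */
static int mock_read(unsigned long size, unsigned long threshold,
		     void **contents)
{
	*contents = NULL;	/* the line whose removal caused the crash */

	if (size > threshold)
		return 0;	/* streamed and verified: no buffer produced */

	*contents = calloc(1, size + 1);
	return *contents ? 0 : -1;
}

/* Caller in the style of fsck_loose(): frees whatever it got back. */
static int mock_fsck_caller(unsigned long size, unsigned long threshold)
{
	void *contents;		/* uninitialized, as in the caller at the time */
	int ret = mock_read(size, threshold, &contents);

	free(contents);		/* free(NULL) is a no-op, so this is safe */
	return ret;
}
```

With `echo large` (6 bytes) and `core.bigfilethreshold=4`, the model takes the streaming branch; deleting the first line of `mock_read()` would make `mock_fsck_caller(6, 4)` free an indeterminate pointer, as in the abort above.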

> map = map_loose_object_1(the_repository, path, NULL, &mapsize);
> if (!map) {
> error_errno(_("unable to mmap %s"), path);
> @@ -2613,9 +2614,7 @@ int read_loose_object(const char *path,
> goto out;
> }
> if (check_object_signature(the_repository, expected_oid,
> - *contents, *size, oi->type_name->buf)) {
> - error(_("hash mismatch for %s (expected %s)"), path,
> - oid_to_hex(expected_oid));
> + *contents, *size, oi->type_name->buf, real_oid)) {
> free(*contents);
> goto out;
> }
...

^ permalink raw reply	[flat|nested] 245+ messages in thread

* Re: [PATCH v10 17/17] fsck: report invalid object type-path combinations
  2021-10-01  9:16                     ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  2021-10-01 22:14                       ` Junio C Hamano
  2021-11-11  3:03                       ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
@ 2021-11-11  3:05                       ` Han Xin
  2021-11-11  5:18                         ` [PATCH 0/2] v2.34.0-rc2 regression: free() of uninitialized in ab/fsck-unexpected-type Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 245+ messages in thread
From: Han Xin @ 2021-11-11  3:05 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, jonathantanmy, me, peff, rybak.a.v

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
...
> diff --git a/object-file.c b/object-file.c
> index fe95285f405..49561e31551 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1084,9 +1084,11 @@ void *xmmap(void *start, size_t length,
> * the streaming interface and rehash it to do the same.
> */
> int check_object_signature(struct repository *r, const struct object_id *oid,
> - void *map, unsigned long size, const char *type)
> + void *map, unsigned long size, const char *type,
> + struct object_id *real_oidp)
> {
> - struct object_id real_oid;
> + struct object_id tmp;
> + struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
> enum object_type obj_type;
> struct git_istream *st;
> git_hash_ctx c;
> @@ -1094,8 +1096,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
> int hdrlen;
>
> if (map) {
> - hash_object_file(r->hash_algo, map, size, type, &real_oid);
> - return !oideq(oid, &real_oid) ? -1 : 0;
> + hash_object_file(r->hash_algo, map, size, type, real_oid);
> + return !oideq(oid, real_oid) ? -1 : 0;
> }
>
> st = open_istream(r, oid, &obj_type, &size, NULL);
> @@ -1120,9 +1122,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
> break;
> r->hash_algo->update_fn(&c, buf, readlen);
> }
> - r->hash_algo->final_oid_fn(&real_oid, &c);
> + r->hash_algo->final_oid_fn(real_oid, &c);
> close_istream(st);
> - return !oideq(oid, &real_oid) ? -1 : 0;
> + return !oideq(oid, real_oid) ? -1 : 0;
> }
>
> int git_open_cloexec(const char *name, int flags)
> @@ -2572,6 +2574,7 @@ static int check_stream_oid(git_zstream *stream,
>
> int read_loose_object(const char *path,
> const struct object_id *expected_oid,
> + struct object_id *real_oid,
> void **contents,
> struct object_info *oi)
> {
> @@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
> char hdr[MAX_HEADER_LEN];
> unsigned long *size = oi->sizep;
>
> - *contents = NULL;
> -

Deleting "*contents = NULL;" here will cause a memory free error.
When reading a large loose blob (larger than big_file_threshold), it will enter the following block and *contents will not be set:

	if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
			goto out;
	} else {
		...
	}


This test case can illustrate this problem:

test_expect_success 'fsck large loose blob' '
	blob=$(echo large | git hash-object -w --stdin) &&
	git -c core.bigfilethreshold=4 fsck
'

git(73697,0x1198f1e00) malloc: *** error for object 0x36: pointer being freed was not allocated
git(73697,0x1198f1e00) malloc: *** set a breakpoint in malloc_error_break to debug
./test-lib.sh: line 947: 73697 Abort trap: 6           git -c core.bigfilethreshold=4 fsck

> map = map_loose_object_1(the_repository, path, NULL, &mapsize);
> if (!map) {
> error_errno(_("unable to mmap %s"), path);
> @@ -2613,9 +2614,7 @@ int read_loose_object(const char *path,
> goto out;
> }
> if (check_object_signature(the_repository, expected_oid,
> - *contents, *size, oi->type_name->buf)) {
> - error(_("hash mismatch for %s (expected %s)"), path,
> - oid_to_hex(expected_oid));
> + *contents, *size, oi->type_name->buf, real_oid)) {
> free(*contents);
> goto out;
> }
...

^ permalink raw reply	[flat|nested] 245+ messages in thread

* [PATCH 0/2] v2.34.0-rc2 regression: free() of uninitialized in ab/fsck-unexpected-type
  2021-11-11  3:05                       ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Han Xin
@ 2021-11-11  5:18                         ` Ævar Arnfjörð Bjarmason
  2021-11-11  5:18                           ` [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 Ævar Arnfjörð Bjarmason
  2021-11-11  5:18                           ` [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-11  5:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

On Thu, Nov 11 2021, Han Xin wrote:
> [...]
> Deleting "*contents = NULL;" here will cause a memory free error.
> When reading a large loose blob (larger than big_file_threshold), it will enter the following block and *contents will not be set:
>
>       if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
>               if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
>                       goto out;
>       } else {
>               ...
>       }
>
>
> This test case can illustrate this problem:
>
> test_expect_success 'fsck large loose blob' '
>       blob=$(echo large | git hash-object -w --stdin) &&
>       git -c core.bigfilethreshold=4 fsck
> '
>
> git(73697,0x1198f1e00) malloc: *** error for object 0x36: pointer being freed was not allocated
> git(73697,0x1198f1e00) malloc: *** set a breakpoint in malloc_error_break to debug
> ./test-lib.sh: line 947: 73697 Abort trap: 6           git -c core.bigfilethreshold=4 fsck

Thanks a lot for the detailed report and test case. It looks like I've
got the dubious honor of most scary caught-by-rc bug so far.

This series:

Ævar Arnfjörð Bjarmason (2):
  object-file: fix SEGV on free() regression in v2.34.0-rc2

This is the most minimal fix for this issue. So Junio, if you'd like
to just pick this up for v2.34.0 you can peel just 1/2 off...

  object-file: free(*contents) only in read_loose_object() caller

... a fix for a related issue. In ab/fsck-unexpected-type we stopped
die()-ing in object-name.c, so per SANITIZE=leak's accounting we
introduced a memory leak with the same variable we dealt with in 1/2.

But IMO, more importantly, by changing this code so that only one
function owns the free()-ing, it's much easier to reason about this
code.

 builtin/fsck.c   | 3 ++-
 object-file.c    | 5 ++---
 t/t1050-large.sh | 8 ++++++++
 3 files changed, 12 insertions(+), 4 deletions(-)

-- 
2.34.0.rc2.795.g926201d1cc8


^ permalink raw reply	[flat|nested] 245+ messages in thread

* [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2
  2021-11-11  5:18                         ` [PATCH 0/2] v2.34.0-rc2 regression: free() of uninitialized in ab/fsck-unexpected-type Ævar Arnfjörð Bjarmason
@ 2021-11-11  5:18                           ` Ævar Arnfjörð Bjarmason
  2021-11-11 15:18                             ` Jeff King
  2021-11-11 18:41                             ` Junio C Hamano
  2021-11-11  5:18                           ` [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-11  5:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid
object type-path combinations, 2021-10-01). When fsck-ing blobs larger
than core.bigFileThreshold we'd free() a pointer to uninitialized
memory.

This issue would have been caught by SANITIZE=address, but since it
involves core.bigFileThreshold none of the existing tests in our test
suite covered it.

Running them with the "big_file_threshold" in "environment.c" changed
to say "6" would have shown this failure, but let's add a dedicated
test for this scenario based on Han Xin's report[1].

It would be a good follow-up change to add a GIT_TEST_* mode to run
all the tests with a low core.bigFileThreshold.

Currently a lot of them fail (but none due to SANITIZE=address)
because they make implicit assumptions about the current hardcoded
setting of core.bigFileThreshold.

Around half the failures are due to us assuming that files larger than
that are binary, see 6bf3b813486 (diff --stat: mark any file larger
than core.bigfilethreshold binary, 2014-08-16) and the comment added
in 12426e114b2 (diff: do not short-cut CHECK_SIZE_ONLY check in
diff_populate_filespec(), 2017-03-01). The rest seem to all be
pack/loose-related, i.e. they're assuming that something ends up as a
loose object or in a pack.
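The GIT_TEST_* follow-up suggested above could look roughly like the sketch below. Everything here is hypothetical: the variable name `GIT_TEST_BIG_FILE_THRESHOLD` and the helper `test_big_file_threshold()` are illustrative only and not part of git. The one grounded fact is that core.bigFileThreshold defaults to 512 MiB. The helper takes the environment value as an argument, instead of calling getenv() itself, so that it is easy to test.

```c
#include <errno.h>
#include <stdlib.h>

/* core.bigFileThreshold defaults to 512 MiB */
#define DEFAULT_BIG_FILE_THRESHOLD (512UL * 1024 * 1024)

/*
 * Hypothetical: let a test-only environment variable lower the threshold
 * so the whole test suite exercises the streaming ("large") code paths.
 * Malformed or empty values fall back to the default.
 */
static unsigned long test_big_file_threshold(const char *env_value)
{
	char *end;
	unsigned long v;

	if (!env_value || !*env_value)
		return DEFAULT_BIG_FILE_THRESHOLD;
	errno = 0;
	v = strtoul(env_value, &end, 10);
	if (errno || *end)
		return DEFAULT_BIG_FILE_THRESHOLD;
	return v;
}
```

A caller would use it as `test_big_file_threshold(getenv("GIT_TEST_BIG_FILE_THRESHOLD"))`. As noted above, many existing tests would then fail for reasons unrelated to memory safety.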

The bug was introduced between v9 and v10[2] of the fsck series merged
in 061a21d36d8 (Merge branch 'ab/fsck-unexpected-type', 2021-10-25).

1. https://lore.kernel.org/git/20211111030302.75694-1-hanxin.hx@alibaba-inc.com/
2. https://lore.kernel.org/git/cover-v10-00.17-00000000000-20211001T091051Z-avarab@gmail.com/

Reported-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c    | 2 ++
 t/t1050-large.sh | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/object-file.c b/object-file.c
index 02b79702748..ac476653a06 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2528,6 +2528,8 @@ int read_loose_object(const char *path,
 	char hdr[MAX_HEADER_LEN];
 	unsigned long *size = oi->sizep;
 
+	*contents = NULL;
+
 	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 4bab6a513c5..6bc1d76fb10 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -17,6 +17,14 @@ test_expect_success setup '
 	export GIT_ALLOC_LIMIT
 '
 
+test_expect_success 'enter "large" codepath, with small core.bigFileThreshold' '
+	test_when_finished "rm -rf repo" &&
+
+	git init --bare repo &&
+	echo large | git -C repo hash-object -w --stdin &&
+	git -C repo -c core.bigfilethreshold=4 fsck
+'
+
 # add a large file with different settings
 while read expect config
 do
-- 
2.34.0.rc2.795.g926201d1cc8


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller
  2021-11-11  5:18                         ` [PATCH 0/2] v2.34.0-rc2 regression: free() of uninitialized in ab/fsck-unexpected-type Ævar Arnfjörð Bjarmason
  2021-11-11  5:18                           ` [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 Ævar Arnfjörð Bjarmason
@ 2021-11-11  5:18                           ` Ævar Arnfjörð Bjarmason
  2021-11-11 18:54                             ` Junio C Hamano
  1 sibling, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-11  5:18 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak,
	Taylor Blau, Ævar Arnfjörð Bjarmason

In the preceding commit a free() of uninitialized memory regression in
96e41f58fe1 (fsck: report invalid object type-path combinations,
2021-10-01) was fixed, but we'd still have an issue with leaking
memory from fsck_loose(). Let's fix that issue too.

That issue was introduced in my 31deb28f5e0 (fsck: don't hard die on
invalid object types, 2021-10-01). It can be reproduced under
SANITIZE=leak with the test I added in 093fffdfbec (fsck tests: add
test for fsck-ing an unknown type, 2021-10-01):

    ./t1450-fsck.sh --run=84 -vixd

In some sense it's not a problem: we lost the same amount of memory in
terms of things malloc'd and not free'd. It just moved from the "still
reachable" to "definitely lost" column in valgrind(1) nomenclature[1],
since we'd have die()'d before.

But now that we don't hard die() anymore in the library let's properly
free() it. Doing so makes this code much easier to follow, since we'll
now have one function owning the freeing of the "contents" variable,
not two.
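The single-owner pattern this patch moves to can be sketched as follows. `mock_read_loose()` and `mock_fsck_loose()` are hypothetical stand-ins for read_loose_object() and fsck_loose(). The callee may allocate `*contents` but never frees it, even on a hash mismatch. The caller initializes its pointer to NULL and releases it exactly once on every exit. The sketch ignores the real caller's success path, where parse_object_buffer() may take ownership of the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical callee: may allocate *contents, never frees it. */
static int mock_read_loose(const char *data, int hash_ok, void **contents)
{
	size_t len = strlen(data) + 1;

	*contents = malloc(len);
	if (!*contents)
		return -1;
	memcpy(*contents, data, len);
	return hash_ok ? 0 : -1;	/* buffer stays with the caller either way */
}

/* Hypothetical caller: the single owner of "contents". */
static int mock_fsck_loose(const char *data, int hash_ok)
{
	void *contents = NULL;		/* initialized, so free() is always safe */
	int err = 0;

	if (mock_read_loose(data, hash_ok, &contents) < 0)
		err = -1;

	free(contents);			/* one free() covers every path */
	return err;
}
```

Because only one function frees, running it under a leak checker (for example, `-fsanitize=address` with leak detection) should report nothing on either path.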

For context on that memory management pattern the read_loose_object()
function was added in f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) and subsequently used in c68b489e564 (fsck:
parse loose object paths directly, 2017-01-13). The pattern of it
being the task of both sides to free() the memory has been there in
this form since its inception.

1. https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c | 3 ++-
 object-file.c  | 7 ++-----
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d87c28a1cc4..27b9e78094d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -605,7 +605,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct object *obj;
 	enum object_type type = OBJ_NONE;
 	unsigned long size;
-	void *contents;
+	void *contents = NULL;
 	int eaten;
 	struct object_info oi = OBJECT_INFO_INIT;
 	struct object_id real_oid = *null_oid();
@@ -630,6 +630,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 			    path);
 	if (err < 0) {
 		errors_found |= ERROR_OBJECT;
+		free(contents);
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index ac476653a06..c3d866a287e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2528,8 +2528,6 @@ int read_loose_object(const char *path,
 	char hdr[MAX_HEADER_LEN];
 	unsigned long *size = oi->sizep;
 
-	*contents = NULL;
-
 	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
@@ -2559,10 +2557,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf, real_oid)) {
-			free(*contents);
+					   *contents, *size,
+					   oi->type_name->buf, real_oid))
 			goto out;
-		}
 	}
 
 	ret = 0; /* everything checks out */
-- 
2.34.0.rc2.795.g926201d1cc8


^ permalink raw reply related	[flat|nested] 245+ messages in thread

* Re: [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2
  2021-11-11  5:18                           ` [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 Ævar Arnfjörð Bjarmason
@ 2021-11-11 15:18                             ` Jeff King
  2021-11-11 18:41                             ` Junio C Hamano
  1 sibling, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-11-11 15:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Han Xin, Jonathan Tan, Andrei Rybak, Taylor Blau

On Thu, Nov 11, 2021 at 06:18:55AM +0100, Ævar Arnfjörð Bjarmason wrote:

> diff --git a/object-file.c b/object-file.c
> index 02b79702748..ac476653a06 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2528,6 +2528,8 @@ int read_loose_object(const char *path,
>  	char hdr[MAX_HEADER_LEN];
>  	unsigned long *size = oi->sizep;
>  
> +	*contents = NULL;
> +
>  	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
>  	if (!map) {
>  		error_errno(_("unable to mmap %s"), path);

OK, I agree this fixes the segfault, and is the minimal fix.

I do find the fact that fsck_loose() looks at "contents" after
read_loose_object() returns an error to be a bit questionable. It's a
recipe for confusion about what has happened, and who is supposed to
free what.  Your v2 addresses the leak, but by just shifting more burden
to the caller. There's only one caller, so it's not too bad, but for a
public function, read_loose_object() has a lot of sharp edges.

Plus I think it fails to work as intended for streaming blobs (we do not
fill in "contents" at all in that case, so we can never say "hash-path
mismatch").

I understand you're trying to catch the case of "we actually opened the
file and computed the sha1 of its contents" from cases where we didn't
get that far. But since you initialize real_oid, it seems like it would
be better to see if anything was written to that.

I.e., something like:

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d87c28a1cc..8f156ed9cd 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -617,18 +617,20 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	oi.typep = &type;
 
 	if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
-		if (contents && !oideq(&real_oid, oid))
+		if (!is_null_oid(&real_oid) && !oideq(&real_oid, oid))
 			err = error(_("%s: hash-path mismatch, found at: %s"),
 				    oid_to_hex(&real_oid), path);
 		else
 			err = error(_("%s: object corrupt or missing: %s"),
 				    oid_to_hex(oid), path);
+		errors_found |= ERROR_OBJECT;
+		return 0; /* keep checking other objects */
 	}
-	if (type != OBJ_NONE && type < 0)
+	if (type != OBJ_NONE && type < 0) {
 		err = error(_("%s: object is of unknown type '%s': %s"),
 			    oid_to_hex(&real_oid), cb_data->obj_type.buf,
 			    path);
-	if (err < 0) {
+		free(contents);
 		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}

(the "err" variable is now superfluous, but I left it in to keep the
diff smaller). And then it would be safe to just set "contents" in
read_loose_object() when we need it:

diff --git a/object-file.c b/object-file.c
index ac476653a0..5e8ff94fd4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2528,8 +2528,6 @@ int read_loose_object(const char *path,
 	char hdr[MAX_HEADER_LEN];
 	unsigned long *size = oi->sizep;
 
-	*contents = NULL;
-
 	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
@@ -2549,6 +2547,7 @@ int read_loose_object(const char *path,
 	}
 
 	if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
+		*contents = NULL;
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
 			goto out;
 	} else {

That doesn't fix the hash-path mismatch problem for streaming, but it
sets us up to do so, if check_stream_oid() returned the real_oid it
computed.

All of this is much too large for an -rc fix, so we should take your
patch as-is. These are just thoughts I had while trying to figure out
if there were other problems caused by that same commit.

-Peff


* Re: [PATCH v2] receive-pack: not receive pack file with large object
  2021-11-11  3:03                       ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
@ 2021-11-11 18:35                         ` Junio C Hamano
  0 siblings, 0 replies; 245+ messages in thread
From: Junio C Hamano @ 2021-11-11 18:35 UTC (permalink / raw)
  To: Han Xin; +Cc: avarab, git, jonathantanmy, me, peff, rybak.a.v

Han Xin <chiyutianyi@gmail.com> writes:

>> @@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
>> char hdr[MAX_HEADER_LEN];
>> unsigned long *size = oi->sizep;
>>
>> - *contents = NULL;
>> -
>
> Deleting "*contents = NULL;" here will cause a memory free error.

Good find.  I see in the discussion downthread we have a band-aid
for this issue already ;-)

Thanks.


* Re: [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2
  2021-11-11  5:18                           ` [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 Ævar Arnfjörð Bjarmason
  2021-11-11 15:18                             ` Jeff King
@ 2021-11-11 18:41                             ` Junio C Hamano
  2021-11-13  9:00                               ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 245+ messages in thread
From: Junio C Hamano @ 2021-11-11 18:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid
> object type-path combinations, 2021-10-01). When fsck-ing blobs larger
> than core.bigFileThreshold we'd free() a pointer to uninitialized
> memory.

s/d we'd/d, we'd/; no need to resend.

> This issue would have been caught by SANITIZE=address, but since it
> involves core.bigFileThreshold none of the existing tests in our test
> suite covered it.

s/d none/d, none/; likewise.

> Running them with the "big_file_threshold" in "environment.c" changed
> to say "6" would have shown this failure, but let's add a dedicated
> test for this scenario based on Han Xin's report[1].

Yeah, it is a good and focused test.  

By the way, I do not think changing big_file_threshold _blindly_ to
smaller values, instead of in a focused test like this, is a good
idea in general.  Some tests check if a file with a normal size that
is smaller than the threshold is correctly treated as a binary file,
and lowering threshold for them without understanding what they are
meant to test would trigger a "bug" that is not a bug at all, for
example.

> It would be a good follow-up change to add a GIT_TEST_* mode to run
> all the tests with a low core.bigFileThreshold threshold.

So, no, please don't do that.

>  object-file.c    | 2 ++
>  t/t1050-large.sh | 8 ++++++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/object-file.c b/object-file.c
> index 02b79702748..ac476653a06 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2528,6 +2528,8 @@ int read_loose_object(const char *path,
>  	char hdr[MAX_HEADER_LEN];
>  	unsigned long *size = oi->sizep;
>  
> +	*contents = NULL;
> +
>  	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
>  	if (!map) {
>  		error_errno(_("unable to mmap %s"), path);
> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index 4bab6a513c5..6bc1d76fb10 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -17,6 +17,14 @@ test_expect_success setup '
>  	export GIT_ALLOC_LIMIT
>  '
>  
> +test_expect_success 'enter "large" codepath, with small core.bigFileThreshold' '
> +	test_when_finished "rm -rf repo" &&
> +
> +	git init --bare repo &&
> +	echo large | git -C repo hash-object -w --stdin &&
> +	git -C repo -c core.bigfilethreshold=4 fsck
> +'
> +
>  # add a large file with different settings
>  while read expect config
>  do


* Re: [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller
  2021-11-11  5:18                           ` [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller Ævar Arnfjörð Bjarmason
@ 2021-11-11 18:54                             ` Junio C Hamano
  0 siblings, 0 replies; 245+ messages in thread
From: Junio C Hamano @ 2021-11-11 18:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index d87c28a1cc4..27b9e78094d 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -605,7 +605,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	struct object *obj;
>  	enum object_type type = OBJ_NONE;
>  	unsigned long size;
> -	void *contents;
> +	void *contents = NULL;
>  	int eaten;
>  	struct object_info oi = OBJECT_INFO_INIT;
>  	struct object_id real_oid = *null_oid();
> @@ -630,6 +630,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  			    path);
>  	if (err < 0) {
>  		errors_found |= ERROR_OBJECT;
> +		free(contents);
>  		return 0; /* keep checking other objects */
>  	}
>  
> diff --git a/object-file.c b/object-file.c
> index ac476653a06..c3d866a287e 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2528,8 +2528,6 @@ int read_loose_object(const char *path,
>  	char hdr[MAX_HEADER_LEN];
>  	unsigned long *size = oi->sizep;
>  
> -	*contents = NULL;
> -
>  	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
>  	if (!map) {
>  		error_errno(_("unable to mmap %s"), path);
> @@ -2559,10 +2557,9 @@ int read_loose_object(const char *path,
>  			goto out;
>  		}
>  		if (check_object_signature(the_repository, expected_oid,
> -					   *contents, *size, oi->type_name->buf, real_oid)) {
> -			free(*contents);
> +					   *contents, *size,
> +					   oi->type_name->buf, real_oid))
>  			goto out;
> -		}
>  	}

Yeah, I have to say that having read_loose_object() free *contents
without resetting it to NULL, only because it wants to signal whether
the failure came from the check_object_signature() step, is quite
ugly.  Making the caller responsible for freeing (in other words,
whenever the caller's *contents is non-NULL after the function
returns, it always points to valid memory that must be freed) makes
the contract easier to explain.


* Re: [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2
  2021-11-11 18:41                             ` Junio C Hamano
@ 2021-11-13  9:00                               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-13  9:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau


On Thu, Nov 11 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid
>> object type-path combinations, 2021-10-01). When fsck-ing blobs larger
>> than core.bigFileThreshold we'd free() a pointer to uninitialized
>> memory.
>
> s/d we'd/d, we'd/; no need to resend.
>
>> This issue would have been caught by SANITIZE=address, but since it
>> involves core.bigFileThreshold none of the existing tests in our test
>> suite covered it.
>
> s/d none/d, none/; likewise.
>
>> Running them with the "big_file_threshold" in "environment.c" changed
>> to say "6" would have shown this failure, but let's add a dedicated
>> test for this scenario based on Han Xin's report[1].
>
> Yeah, it is a good and focused test.  
>
> By the way, I do not think changing big_file_threshold _blindly_ to
> smaller values, instead of in a focused test like this, is a good
> idea in general.  Some tests check if a file with a normal size that
> is smaller than the threshold correctly is treated as a binary file,
> and lowering threshold for them without understanding what they are
> meant to test would trigger a "bug" that is not a bug at all, for
> example.
>
>> It would be a good follow-up change to add a GIT_TEST_* mode to run
>> all the tests with a low core.bigFileThreshold threshold.
>
> So, no, please don't do that.

Yes, it's probably not worth it, and I've got enough dragons to slay as
it is.

I took the commentary you added in 12426e114b2 (diff: do not short-cut
CHECK_SIZE_ONLY check in diff_populate_filespec(), 2017-03-01) as a
suggestion that we might be conflating too many things in
core.bigFileThreshold, but maybe that's just projecting.

I think that setting is probably too much of a kitchen-sink grab bag of
stuff for its own good. Any such GIT_TEST_* mode would, I think, need to
introduce another setting so that it doesn't also imply "these files are binary".

Which may be a good idea in general, or it might not. I.e., are there
users who mainly don't want these files considered for packing, but do
want "git diff" to work on them?

Anyway, even if that were split up, we'd still have the remaining tests
that assume that something ends up loose or in a pack.

Fixing those is probably a good idea either way, so poking at this might
be a useful canary for someone. I haven't looked in any detail, but some
of them are probably inspecting .git/objects manually and could switch
to "git rev-parse" or similar.

The rest likely do care about whether something ends up loose or not,
and would probably benefit from testing both cases.

None of that's anything I'll pursue now, just idle thoughts from having
looked at these failures a bit, in case anyone's interested.


Thread overview: 245+ messages
2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
2021-03-28  2:26 ` [PATCH 1/4] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
2021-03-28  2:32   ` Eric Sunshine
2021-03-28  2:26 ` [PATCH 2/4] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
2021-03-29  5:37   ` Bagas Sanjaya
2021-03-28  2:26 ` [PATCH 3/4] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
2021-03-28  2:26 ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
2021-03-28  2:58     ` [PATCH 1/5] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
2021-03-28  2:58     ` [PATCH 2/5] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-03-28  2:58     ` [PATCH 3/5] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-03-28  2:58     ` [PATCH 4/5] fsck: improve the error " Ævar Arnfjörð Bjarmason
2021-03-28  8:56       ` Johannes Sixt
2021-03-28  2:58     ` [PATCH 5/5] fsck: improve error on loose object hash mismatch Ævar Arnfjörð Bjarmason
2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-04-13  9:43       ` [PATCH v2 1/6] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
2021-04-13  9:43       ` [PATCH v2 2/6] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-04-13  9:43       ` [PATCH v2 3/6] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-04-23 14:26         ` Jeff King
2021-04-13  9:43       ` [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
2021-04-23 14:27         ` Jeff King
2021-04-13  9:43       ` [PATCH v2 5/6] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
2021-04-23 14:37         ` Jeff King
2021-04-26 14:28           ` Ævar Arnfjörð Bjarmason
2021-04-26 15:45             ` Jeff King
2021-04-13  9:43       ` [PATCH v2 6/6] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-05-20 11:22       ` [PATCH v3 01/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-05-20 11:22       ` [PATCH v3 02/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-05-20 11:22       ` [PATCH v3 03/17] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
2021-05-20 11:22       ` [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
2021-05-27 21:17         ` Jonathan Nieder
2021-05-28  3:10           ` Ævar Arnfjörð Bjarmason
2021-05-20 11:22       ` [PATCH v3 05/17] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
2021-05-20 11:23       ` [PATCH v3 06/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
2021-05-20 11:23       ` [PATCH v3 07/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
2021-05-20 11:23       ` [PATCH v3 08/17] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
2021-05-20 11:23       ` [PATCH v3 09/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
2021-05-20 11:23       ` [PATCH v3 10/17] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
2021-05-20 11:23       ` [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
2021-05-27 17:50         ` Jonathan Tan
2021-05-20 11:23       ` [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-05-27 17:54         ` Jonathan Tan
2021-05-20 11:23       ` [PATCH v3 13/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-05-20 11:23       ` [PATCH v3 14/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-05-27 18:18         ` Jonathan Tan
2021-05-20 11:23       ` [PATCH v3 15/17] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
2021-05-20 11:23       ` [PATCH v3 16/17] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
2021-05-27 18:24         ` Jonathan Tan
2021-05-20 11:23       ` [PATCH v3 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-05-27 18:28         ` Jonathan Tan
2021-05-27 17:08       ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Jonathan Tan
2021-05-28  0:18         ` Junio C Hamano
2021-05-28  5:41           ` Felipe Contreras
2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-06-24 22:00           ` Andrei Rybak
2021-06-24 22:34             ` Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
2021-06-24 19:23         ` [PATCH v4 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
2021-07-10 13:37           ` [PATCH v5 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-09-07 10:57             ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-09-16 19:40               ` Taylor Blau
2021-09-17  9:27                 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:57             ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-09-16 19:51               ` Taylor Blau
2021-09-17  9:39                 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:57             ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
2021-09-16 19:57               ` Taylor Blau
2021-09-16 20:01                 ` Taylor Blau
2021-09-16 22:52                 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:57             ` [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
2021-09-07 10:58             ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
2021-09-16 20:40               ` Taylor Blau
2021-09-17 11:59                 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:58             ` [PATCH v6 06/22] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
2021-09-07 10:58             ` [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
2021-09-07 10:58             ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
2021-09-16 21:29               ` Taylor Blau
2021-09-16 21:56                 ` Jeff King
2021-09-07 10:58             ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
2021-09-16 21:33               ` Taylor Blau
2021-09-07 10:58             ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
2021-09-16 21:39               ` Taylor Blau
2021-09-07 10:58             ` [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
2021-09-07 10:58             ` [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
2021-09-07 10:58             ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
2021-09-16 21:58               ` Taylor Blau
2021-09-07 10:58             ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
2021-09-17  2:32               ` Taylor Blau
2021-09-07 10:58             ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
2021-09-17  2:35               ` Taylor Blau
2021-09-07 10:58             ` [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-09-07 10:58             ` [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-09-07 10:58             ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-09-17  2:45               ` Taylor Blau
2021-09-07 10:58             ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-09-17  3:37               ` Taylor Blau
2021-09-07 10:58             ` [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
2021-09-07 10:58             ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
2021-09-17  3:57               ` Taylor Blau
2021-09-07 10:58             ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-09-17  4:06               ` Taylor Blau
2021-09-17  4:08             ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
2021-09-20 19:04             ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
2021-09-21  3:30                 ` Taylor Blau
2021-09-20 19:04               ` [PATCH v7 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-09-20 19:04               ` [PATCH v7 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-09-28  2:18               ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-09-28  2:18                 ` [PATCH v8 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-09-29 19:50                 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
2021-09-30 13:37                 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-09-30 19:22                     ` Andrei Rybak
2021-10-01  9:05                       ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-09-30 13:37                   ` [PATCH v9 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-09-30 21:01                     ` Junio C Hamano
2021-09-30 19:06                   ` [PATCH v9 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
2021-10-01  9:16                   ` [PATCH v10 " Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-10-01  9:16                     ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-10-01 22:14                       ` Junio C Hamano
2021-10-01 22:33                         ` Ævar Arnfjörð Bjarmason
2021-11-11  3:03                       ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
2021-11-11 18:35                         ` Junio C Hamano
2021-11-11  3:05                       ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Han Xin
2021-11-11  5:18                         ` [PATCH 0/2] v2.34.0-rc2 regression: free() of uninitialized in ab/fsck-unexpected-type Ævar Arnfjörð Bjarmason
2021-11-11  5:18                           ` [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 Ævar Arnfjörð Bjarmason
2021-11-11 15:18                             ` Jeff King
2021-11-11 18:41                             ` Junio C Hamano
2021-11-13  9:00                               ` Ævar Arnfjörð Bjarmason
2021-11-11  5:18                           ` [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller Ævar Arnfjörð Bjarmason
2021-11-11 18:54                             ` Junio C Hamano
2021-03-28  6:12   ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Junio C Hamano
2021-03-28  7:17     ` Jeff King
2021-03-29 13:25       ` Ævar Arnfjörð Bjarmason
2021-03-31 11:06         ` Jeff King
2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
2021-04-13  9:08   ` [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
2021-04-15 10:09     ` Jeff King
2021-04-13  9:08   ` [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
2021-04-15 10:00     ` Jeff King
2021-04-13  9:08   ` [PATCH v2 3/3] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
2021-04-15 10:10   ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Jeff King