git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] tree-walk improvements
@ 2016-09-26 19:32 David Turner
  2016-09-26 19:32 ` [PATCH 1/2] tree-walk: be more specific about corrupt tree errors David Turner
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: David Turner @ 2016-09-26 19:32 UTC (permalink / raw)
  To: git, peff, mhagger; +Cc: David Turner

The first patch is a re-roll of Peff's patch from 2014 -- here's
the archive message:

http://git.661346.n2.nabble.com/PATCH-tree-walk-be-more-specific-about-corrupt-tree-errors-td7603558.html

Not sure why this wasn't applied then, but I thought it looked pretty
good, so I added a few tests.

Hopefully the encoding works correctly on these patches. If not, you
can fetch from
https://github.com/novalis/git/
on branch dturner/bad-trees

Email address note 1: my employer wants me to use my company address,
but not my company computer, for patches I write on work time.  This
means that I'm going to continue corresponding from
novalis@novalis.org, but will send patches with the @twosigma.com
address in the author line.

Email address note 2: I'm not subscribed to the mailing list these
days, so please CC me (at novalis@novalis.org) on replies.

David Turner (1):
  fsck: handle bad trees like other errors

Jeff King (1):
  tree-walk: be more specific about corrupt tree errors

 fsck.c                                             |  18 +++--
 t/t1007-hash-object.sh                             |  15 +++-
 t/t1007/tree-with-empty-filename                   | Bin 0 -> 28 bytes
 t/t1007/tree-with-malformed-mode                   | Bin 0 -> 39 bytes
 t/t1450-fsck.sh                                    |  17 ++++-
 .../307e300745b82417cc1a903f875c7d22e45ef907       |   4 +
 .../f506a346749bb96f52d8605ffba9fb93d46b5ffd       | Bin 0 -> 45 bytes
 tree-walk.c                                        |  83 ++++++++++++++++++---
 tree-walk.h                                        |   8 ++
 9 files changed, 125 insertions(+), 20 deletions(-)
 create mode 100644 t/t1007/tree-with-empty-filename
 create mode 100644 t/t1007/tree-with-malformed-mode
 create mode 100644 t/t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907
 create mode 100644 t/t1450/bad-objects/f506a346749bb96f52d8605ffba9fb93d46b5ffd

-- 
2.8.0.rc4.22.g8ae061a


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/2] tree-walk: be more specific about corrupt tree errors
  2016-09-26 19:32 [PATCH 0/2] tree-walk improvements David Turner
@ 2016-09-26 19:32 ` David Turner
  2016-09-27  5:14   ` Jeff King
  2016-09-26 19:32 ` [PATCH 2/2] fsck: handle bad trees like other errors David Turner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: David Turner @ 2016-09-26 19:32 UTC (permalink / raw)
  To: git, peff, mhagger; +Cc: David Turner

From: Jeff King <peff@peff.net>

When the tree-walker runs into an error, it just calls
die(), and the message is always "corrupt tree file".
However, we are actually covering several cases here; let's
give the user a hint about what happened.

Let's also avoid using the word "corrupt", which makes it
seem like the data bit-rotted on disk. Our sha1 check would
already have found that. These errors are ones of data that
is malformed in the first place.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1007-hash-object.sh           |  15 +++++++++++++--
 t/t1007/tree-with-empty-filename | Bin 0 -> 28 bytes
 t/t1007/tree-with-malformed-mode | Bin 0 -> 39 bytes
 tree-walk.c                      |  12 +++++++-----
 4 files changed, 20 insertions(+), 7 deletions(-)
 create mode 100644 t/t1007/tree-with-empty-filename
 create mode 100644 t/t1007/tree-with-malformed-mode

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index acca9ac..cd10c73 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -183,9 +183,20 @@ for args in "-w --stdin-paths" "--stdin-paths -w"; do
 	pop_repo
 done
 
-test_expect_success 'corrupt tree' '
+test_expect_success 'too-short tree' '
 	echo abc >malformed-tree &&
-	test_must_fail git hash-object -t tree malformed-tree
+	test_must_fail git hash-object -t tree malformed-tree 2>err &&
+	grep "too-short tree object" err
+'
+
+test_expect_success 'malformed mode in tree' '
+	test_must_fail git hash-object -t tree ../t1007/tree-with-malformed-mode 2>err &&
+	grep "malformed mode in tree entry for tree" err
+'
+
+test_expect_success 'empty filename in tree' '
+	test_must_fail git hash-object -t tree ../t1007/tree-with-empty-filename 2>err &&
+	grep "empty filename in tree entry for tree" err
 '
 
 test_expect_success 'corrupt commit' '
diff --git a/t/t1007/tree-with-empty-filename b/t/t1007/tree-with-empty-filename
new file mode 100644
index 0000000000000000000000000000000000000000..aeb1ceb20e485eebd0acbb81c974d1c6fedcc1fe
GIT binary patch
literal 28
kcmXpsFfcPQQDAsB_tET47q2;ccWbUIkGgT_Nl)-Z0Hx{;SO5S3

literal 0
HcmV?d00001

diff --git a/t/t1007/tree-with-malformed-mode b/t/t1007/tree-with-malformed-mode
new file mode 100644
index 0000000000000000000000000000000000000000..24aa84d60ef8e269fb0b29c67b5208639b9da3ae
GIT binary patch
literal 39
vcmYewPcJRb%}+^HNXyJg%}dNpWq3CC(d<nZuQ_{nYpyGgx^d`9Pw+$lU*Quk

literal 0
HcmV?d00001

diff --git a/tree-walk.c b/tree-walk.c
index ce27842..ba544cf 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -27,12 +27,14 @@ static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned
 	const char *path;
 	unsigned int mode, len;
 
-	if (size < 24 || buf[size - 21])
-		die("corrupt tree file");
+	if (size < 23 || buf[size - 21])
+		die("too-short tree object");
 
 	path = get_mode(buf, &mode);
-	if (!path || !*path)
-		die("corrupt tree file");
+	if (!path)
+		die("malformed mode in tree entry for tree");
+	if (!*path)
+		die("empty filename in tree entry for tree");
 	len = strlen(path) + 1;
 
 	/* Initialize the descriptor entry */
@@ -81,7 +83,7 @@ void update_tree_entry(struct tree_desc *desc)
 	unsigned long len = end - (const unsigned char *)buf;
 
 	if (size < len)
-		die("corrupt tree file");
+		die("too-short tree file");
 	buf = end;
 	size -= len;
 	desc->buffer = buf;
-- 
2.8.0.rc4.22.g8ae061a


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-26 19:32 [PATCH 0/2] tree-walk improvements David Turner
  2016-09-26 19:32 ` [PATCH 1/2] tree-walk: be more specific about corrupt tree errors David Turner
@ 2016-09-26 19:32 ` David Turner
  2016-09-26 19:51   ` Junio C Hamano
                     ` (2 more replies)
  2016-09-26 19:39 ` [PATCH 0/2] tree-walk improvements Stefan Beller
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 22+ messages in thread
From: David Turner @ 2016-09-26 19:32 UTC (permalink / raw)
  To: git, peff, mhagger; +Cc: David Turner

From: David Turner <dturner@twosigma.com>

Instead of dying when fsck hits a malformed tree object, log the error
like any other and continue.  Now fsck can tell the user which tree is
bad, too.

Signed-off-by: David Turner <dturner@twosigma.com>
---
 fsck.c                                             |  18 +++--
 t/t1450-fsck.sh                                    |  17 ++++-
 .../307e300745b82417cc1a903f875c7d22e45ef907       |   4 +
 .../f506a346749bb96f52d8605ffba9fb93d46b5ffd       | Bin 0 -> 45 bytes
 tree-walk.c                                        |  83 ++++++++++++++++++---
 tree-walk.h                                        |   8 ++
 6 files changed, 111 insertions(+), 19 deletions(-)
 create mode 100644 t/t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907
 create mode 100644 t/t1450/bad-objects/f506a346749bb96f52d8605ffba9fb93d46b5ffd

diff --git a/fsck.c b/fsck.c
index c9cf3de..4a3069e 100644
--- a/fsck.c
+++ b/fsck.c
@@ -347,8 +347,9 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 		return -1;
 
 	name = get_object_name(options, &tree->object);
-	init_tree_desc(&desc, tree->buffer, tree->size);
-	while (tree_entry(&desc, &entry)) {
+	if (init_tree_desc_gently(&desc, tree->buffer, tree->size))
+		return -1;
+	while (tree_entry_gently(&desc, &entry)) {
 		struct object *obj;
 		int result;
 
@@ -520,7 +521,7 @@ static int verify_ordered(unsigned mode1, const char *name1, unsigned mode2, con
 
 static int fsck_tree(struct tree *item, struct fsck_options *options)
 {
-	int retval;
+	int retval = 0;
 	int has_null_sha1 = 0;
 	int has_full_path = 0;
 	int has_empty_name = 0;
@@ -535,7 +536,10 @@ static int fsck_tree(struct tree *item, struct fsck_options *options)
 	unsigned o_mode;
 	const char *o_name;
 
-	init_tree_desc(&desc, item->buffer, item->size);
+	if (init_tree_desc_gently(&desc, item->buffer, item->size)) {
+		retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		return retval;
+	}
 
 	o_mode = 0;
 	o_name = NULL;
@@ -556,7 +560,10 @@ static int fsck_tree(struct tree *item, struct fsck_options *options)
 			       is_hfs_dotgit(name) ||
 			       is_ntfs_dotgit(name));
 		has_zero_pad |= *(char *)desc.buffer == '0';
-		update_tree_entry(&desc);
+		if (update_tree_entry_gently(&desc)) {
+			retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			break;
+		}
 
 		switch (mode) {
 		/*
@@ -597,7 +604,6 @@ static int fsck_tree(struct tree *item, struct fsck_options *options)
 		o_name = name;
 	}
 
-	retval = 0;
 	if (has_null_sha1)
 		retval += report(options, &item->object, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8f52da2..f456963 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -188,8 +188,7 @@ test_expect_success 'commit with NUL in header' '
 	grep "error in commit $new.*unterminated header: NUL at offset" out
 '
 
-test_expect_success 'malformatted tree object' '
-	test_when_finished "git update-ref -d refs/tags/wrong" &&
+test_expect_success 'tree object with duplicate entries' '
 	test_when_finished "remove_object \$T" &&
 	T=$(
 		GIT_INDEX_FILE=test-index &&
@@ -208,6 +207,20 @@ test_expect_success 'malformatted tree object' '
 	grep "error in tree .*contains duplicate file entries" out
 '
 
+test_expect_success 'unparseable tree object' '
+	test_when_finished "git update-ref -d refs/heads/wrong" &&
+	test_when_finished "remove_object 307e300745b82417cc1a903f875c7d22e45ef907" &&
+	test_when_finished "remove_object f506a346749bb96f52d8605ffba9fb93d46b5ffd" &&
+	mkdir -p .git/objects/30 mkdir -p .git/objects/f5 &&
+	cp ../t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907 .git/objects/30/7e300745b82417cc1a903f875c7d22e45ef907 &&
+	cp ../t1450/bad-objects/f506a346749bb96f52d8605ffba9fb93d46b5ffd .git/objects/f5/06a346749bb96f52d8605ffba9fb93d46b5ffd &&
+	git update-ref refs/heads/wrong 307e300745b82417cc1a903f875c7d22e45ef907 &&
+	test_must_fail git fsck 2>out &&
+	grep "warning: empty filename in tree entry" out &&
+	grep "f506a346749bb96f52d8605ffba9fb93d46b5ffd" out &&
+	! grep "fatal: empty filename in tree entry" out
+'
+
 test_expect_success 'tag pointing to nonexistent' '
 	cat >invalid-tag <<-\EOF &&
 	object ffffffffffffffffffffffffffffffffffffffff
diff --git a/t/t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907 b/t/t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907
new file mode 100644
index 0000000..6e23d62
--- /dev/null
+++ b/t/t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907
@@ -0,0 +1,4 @@
+x\x01ŽA\x0e \x10E]sŠ¹€f°@Ä\x18\x17\x1eÁ\v0\x05Z\x12[\x12
+õú¢é	\ýüÅ{ÿ\x0fižc\x01IòP²÷\x104\x1aÛ)Ó+b&\x13ôÙ]\fê\x10ØR`êœ2Üš\x13¶–)exØ-:xÖ¼ø\f×%mö\x15×ûž§”Ç^[HÕd\x12{-áˆ
+Q\f¿ÍÒ€\x7fè\x1d‡w,\x13p\x1aë
+ßçâ\x03&ë?Þ
\ No newline at end of file
diff --git a/t/t1450/bad-objects/f506a346749bb96f52d8605ffba9fb93d46b5ffd b/t/t1450/bad-objects/f506a346749bb96f52d8605ffba9fb93d46b5ffd
new file mode 100644
index 0000000000000000000000000000000000000000..9111a7fc3c8578906e13c930a0fbd3cae047762e
GIT binary patch
literal 45
zcmb=Jqpj)X8)~pA!NA18z}PS_p~CF@#W%j<>n*Fxv)5_&?<#!Z>Hoon;loq@NdS%f
B6F2|>

literal 0
HcmV?d00001

diff --git a/tree-walk.c b/tree-walk.c
index ba544cf..0fb830b 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -22,33 +22,60 @@ static const char *get_mode(const char *str, unsigned int *modep)
 	return str;
 }
 
-static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size)
+static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size, struct strbuf *err)
 {
 	const char *path;
 	unsigned int mode, len;
 
-	if (size < 23 || buf[size - 21])
-		die("too-short tree object");
+	if (size < 23 || buf[size - 21]) {
+		strbuf_addstr(err, "too-short tree object");
+		return -1;
+	}
 
 	path = get_mode(buf, &mode);
-	if (!path)
-		die("malformed mode in tree entry for tree");
-	if (!*path)
-		die("empty filename in tree entry for tree");
+	if (!path) {
+		strbuf_addstr(err, "malformed mode in tree entry");
+		return -1;
+	}
+	if (!*path) {
+		strbuf_addstr(err, "empty filename in tree entry");
+		return -1;
+	}
 	len = strlen(path) + 1;
 
 	/* Initialize the descriptor entry */
 	desc->entry.path = path;
 	desc->entry.mode = canon_mode(mode);
 	desc->entry.oid  = (const struct object_id *)(path + len);
+
+	return 0;
 }
 
-void init_tree_desc(struct tree_desc *desc, const void *buffer, unsigned long size)
+static int init_tree_desc_internal(struct tree_desc *desc, const void *buffer, unsigned long size, struct strbuf *err)
 {
 	desc->buffer = buffer;
 	desc->size = size;
 	if (size)
-		decode_tree_entry(desc, buffer, size);
+		return decode_tree_entry(desc, buffer, size, err);
+	return 0;
+}
+
+void init_tree_desc(struct tree_desc *desc, const void *buffer, unsigned long size)
+{
+	struct strbuf err = STRBUF_INIT;
+	if (init_tree_desc_internal(desc, buffer, size, &err))
+		die("%s", err.buf);
+	strbuf_release(&err);
+}
+
+int init_tree_desc_gently(struct tree_desc *desc, const void *buffer, unsigned long size)
+{
+	struct strbuf err = STRBUF_INIT;
+	int result = init_tree_desc_internal(desc, buffer, size, &err);
+	if (result)
+		warning("%s", err.buf);
+	strbuf_release(&err);
+	return result;
 }
 
 void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
@@ -75,7 +102,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a)
 	*a = t->entry;
 }
 
-void update_tree_entry(struct tree_desc *desc)
+static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err)
 {
 	const void *buf = desc->buffer;
 	const unsigned char *end = desc->entry.oid->hash + 20;
@@ -89,7 +116,30 @@ void update_tree_entry(struct tree_desc *desc)
 	desc->buffer = buf;
 	desc->size = size;
 	if (size)
-		decode_tree_entry(desc, buf, size);
+		return decode_tree_entry(desc, buf, size, err);
+	return 0;
+}
+
+void update_tree_entry(struct tree_desc *desc)
+{
+	struct strbuf err = STRBUF_INIT;
+	if (update_tree_entry_internal(desc, &err))
+		die("%s", err.buf);
+	strbuf_release(&err);
+}
+
+int update_tree_entry_gently(struct tree_desc *desc)
+{
+	struct strbuf err = STRBUF_INIT;
+	if (update_tree_entry_internal(desc, &err)) {
+		warning("%s", err.buf);
+		strbuf_release(&err);
+		/* Stop processing this tree after error */
+		desc->size = 0;
+		return -1;
+	}
+	strbuf_release(&err);
+	return 0;
 }
 
 int tree_entry(struct tree_desc *desc, struct name_entry *entry)
@@ -102,6 +152,17 @@ int tree_entry(struct tree_desc *desc, struct name_entry *entry)
 	return 1;
 }
 
+int tree_entry_gently(struct tree_desc *desc, struct name_entry *entry)
+{
+	if (!desc->size)
+		return 0;
+
+	*entry = desc->entry;
+	if (update_tree_entry_gently(desc))
+		return 0;
+	return 1;
+}
+
 void setup_traverse_info(struct traverse_info *info, const char *base)
 {
 	int pathlen = strlen(base);
diff --git a/tree-walk.h b/tree-walk.h
index 97a7d69..68bb78b 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -25,14 +25,22 @@ static inline int tree_entry_len(const struct name_entry *ne)
 	return (const char *)ne->oid - ne->path - 1;
 }
 
+/*
+ * The _gently versions of these functions warn and return false on a
+ * corrupt tree entry rather than dying,
+ */
+
 void update_tree_entry(struct tree_desc *);
+int update_tree_entry_gently(struct tree_desc *);
 void init_tree_desc(struct tree_desc *desc, const void *buf, unsigned long size);
+int init_tree_desc_gently(struct tree_desc *desc, const void *buf, unsigned long size);
 
 /*
  * Helper function that does both tree_entry_extract() and update_tree_entry()
  * and returns true for success
  */
 int tree_entry(struct tree_desc *, struct name_entry *);
+int tree_entry_gently(struct tree_desc *, struct name_entry *);
 
 void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
 
-- 
2.8.0.rc4.22.g8ae061a


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] tree-walk improvements
  2016-09-26 19:32 [PATCH 0/2] tree-walk improvements David Turner
  2016-09-26 19:32 ` [PATCH 1/2] tree-walk: be more specific about corrupt tree errors David Turner
  2016-09-26 19:32 ` [PATCH 2/2] fsck: handle bad trees like other errors David Turner
@ 2016-09-26 19:39 ` Stefan Beller
  2016-09-26 19:43 ` Junio C Hamano
  2016-09-26 21:04 ` Junio C Hamano
  4 siblings, 0 replies; 22+ messages in thread
From: Stefan Beller @ 2016-09-26 19:39 UTC (permalink / raw)
  To: David Turner; +Cc: git, Jeff King, Michael Haggerty

On Mon, Sep 26, 2016 at 12:32 PM, David Turner <novalis@novalis.org> wrote:
> The first patch is a re-roll of Peff's patch from 2014 -- here's
> the archive message:
>
> http://git.661346.n2.nabble.com/PATCH-tree-walk-be-more-specific-about-corrupt-tree-errors-td7603558.html
>
> Not sure why this wasn't applied then, but I thought it looked pretty
> good, so I added a few tests.
>
> Hopefully the encoding works correctly on these patches. If not, you
> can fetch from
> https://github.com/novalis/git/
> on branch dturner/bad-trees
>
> Email address note 1: my employer wants me to use my company address,
> but not my company computer, for patches I write on work time.  This
> means that I'm going to continue corresponding from
> novalis@novalis.org, but will send patches with the @twosigma.com
> address in the author line.

Mind sending a patch for .mailmap to reflect that different email
addresses are still the same person? ;)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] tree-walk improvements
  2016-09-26 19:32 [PATCH 0/2] tree-walk improvements David Turner
                   ` (2 preceding siblings ...)
  2016-09-26 19:39 ` [PATCH 0/2] tree-walk improvements Stefan Beller
@ 2016-09-26 19:43 ` Junio C Hamano
  2016-09-26 20:22   ` David Turner
  2016-09-26 21:04 ` Junio C Hamano
  4 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 19:43 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, mhagger

David Turner <novalis@novalis.org> writes:

> The first patch is a re-roll of Peff's patch from 2014 -- here's
> the archive message:
>
> http://git.661346.n2.nabble.com/PATCH-tree-walk-be-more-specific-about-corrupt-tree-errors-td7603558.html
>
> Not sure why this wasn't applied then, but I thought it looked pretty
> good, so I added a few tests.

Thanks.  Adding tests is very much appreciated.  I however wonder
why you needed to reword a perfectly readable "truncated" to
something else, though?

> Email address note 1: my employer wants me to use my company address,
> but not my company computer, for patches I write on work time.  This
> means that I'm going to continue corresponding from
> novalis@novalis.org, but will send patches with the @twosigma.com
> address in the author line.

That seems like not an uncommon practice ;-)

> Email address note 2: I'm not subscribed to the mailing list these
> days, so please CC me (at novalis@novalis.org) on replies.

It is good to tell others this, but I suspect that it is known by
those who are likely to respond to these messages that always CC'ing
to individual is the norm on this list ;-)

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-26 19:32 ` [PATCH 2/2] fsck: handle bad trees like other errors David Turner
@ 2016-09-26 19:51   ` Junio C Hamano
  2016-09-26 20:08   ` Junio C Hamano
  2016-09-27  5:27   ` Jeff King
  2 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 19:51 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, mhagger, David Turner

David Turner <novalis@novalis.org> writes:

> @@ -535,7 +536,10 @@ static int fsck_tree(struct tree *item, struct fsck_options *options)
>  	unsigned o_mode;
>  	const char *o_name;
>  
> -	init_tree_desc(&desc, item->buffer, item->size);
> +	if (init_tree_desc_gently(&desc, item->buffer, item->size)) {
> +		retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
> +		return retval;
> +	}

Good.  If BAD_TREE is being ignored, this may report a non-error,
but we won't descend into the unreadable tree so it is OK.

> @@ -556,7 +560,10 @@ static int fsck_tree(struct tree *item, struct fsck_options *options)
>  			       is_hfs_dotgit(name) ||
>  			       is_ntfs_dotgit(name));
>  		has_zero_pad |= *(char *)desc.buffer == '0';
> -		update_tree_entry(&desc);
> +		if (update_tree_entry_gently(&desc)) {
> +			retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
> +			break;
> +		}

Likewise; breaking out of the loop will stop us from reading further
into the corrupted tree data, so this is good.

> @@ -597,7 +604,6 @@ static int fsck_tree(struct tree *item, struct fsck_options *options)
>  		o_name = name;
>  	}
>  
> -	retval = 0;

Good code hygiene that you moved this to the very top where it is
defined, so anybody before this step can set it if it wants to.

Reading purely from the text of this function, it was surprising
that you can do without a gently variant of tree_entry_extract(),
but it merely reads into two variables and does not do any error
detection (which happens all in the caller), so it is not at all
surprising after all ;-)

I didn't see anything objectionable in this patch.  Thanks for
working on this.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-26 19:32 ` [PATCH 2/2] fsck: handle bad trees like other errors David Turner
  2016-09-26 19:51   ` Junio C Hamano
@ 2016-09-26 20:08   ` Junio C Hamano
  2016-09-26 20:11     ` Junio C Hamano
  2016-09-27  5:27   ` Jeff King
  2 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 20:08 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, mhagger, David Turner

David Turner <novalis@novalis.org> writes:

> From: David Turner <dturner@twosigma.com>
>
> Instead of dying when fsck hits a malformed tree object, log the error
> like any other and continue.  Now fsck can tell the user which tree is
> bad, too.
>
> Signed-off-by: David Turner <dturner@twosigma.com>
> ---
>  fsck.c                                             |  18 +++--
>  t/t1450-fsck.sh                                    |  17 ++++-
>  .../307e300745b82417cc1a903f875c7d22e45ef907       |   4 +

To prevent further headaches in this directory, can we have
.gitattributes that tells us that everything in there are binary
files?  Something like the attached.

The other object was transferred as a binary patch, but I have no
faith in what I applied from your e-mail message for this file that
went though latin-1 to utf-8 conversion X-<.

 t/t1450/bad-objects/.gitattributes | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t1450/bad-objects/.gitattributes b/t/t1450/bad-objects/.gitattributes
new file mode 100644
index 0000000..a173f27
--- /dev/null
+++ b/t/t1450/bad-objects/.gitattributes
@@ -0,0 +1 @@
+[0-9a-f]*[0-9a-f]	-diff



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-26 20:08   ` Junio C Hamano
@ 2016-09-26 20:11     ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 20:11 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, mhagger, David Turner

Junio C Hamano <gitster@pobox.com> writes:

> To prevent further headaches in this directory, can we have
> .gitattributes that tells us that everything in there are binary
> files?  Something like the attached.
>
> The other object was transferred as a binary patch, but I have no
> faith in what I applied from your e-mail message for this file that
> went though latin-1 to utf-8 conversion X-<.
>
>  t/t1450/bad-objects/.gitattributes | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/t/t1450/bad-objects/.gitattributes b/t/t1450/bad-objects/.gitattributes
> new file mode 100644
> index 0000000..a173f27
> --- /dev/null
> +++ b/t/t1450/bad-objects/.gitattributes
> @@ -0,0 +1 @@
> +[0-9a-f]*[0-9a-f]	-diff

I suspect that the t/t1007 directory in 1/2 may deserve a similar
treatment.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] tree-walk improvements
  2016-09-26 19:43 ` Junio C Hamano
@ 2016-09-26 20:22   ` David Turner
  2016-09-27  0:35     ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: David Turner @ 2016-09-26 20:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, mhagger

On Mon, 2016-09-26 at 12:43 -0700, Junio C Hamano wrote:
> David Turner <novalis@novalis.org> writes:
> 
> > The first patch is a re-roll of Peff's patch from 2014 -- here's
> > the archive message:
> >
> > http://git.661346.n2.nabble.com/PATCH-tree-walk-be-more-specific-about-corrupt-tree-errors-td7603558.html
> >
> > Not sure why this wasn't applied then, but I thought it looked pretty
> > good, so I added a few tests.
> 
> Thanks.  Adding tests is very much appreciated.  I however wonder
> why you needed to reword a perfectly readable "truncated" to
> something else, though?

Because truncated, to me, means "something that has been cut off". Here,
the recorded length is too short, so it's probably not the case that
something was cut off -- it was never right to begin with.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] tree-walk improvements
  2016-09-26 19:32 [PATCH 0/2] tree-walk improvements David Turner
                   ` (3 preceding siblings ...)
  2016-09-26 19:43 ` Junio C Hamano
@ 2016-09-26 21:04 ` Junio C Hamano
  4 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-26 21:04 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, mhagger

David Turner <novalis@novalis.org> writes:

> Hopefully the encoding works correctly on these patches. If not, you
> can fetch from
> https://github.com/novalis/git/
> on branch dturner/bad-trees

This does not test cleanly here, unfortunately.  Specifically, tests
30 and 31 t1007 do fine with 1/2 alone, but they seem to break with
2/2 applied.

I didn't dug further.  At least not yet.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] tree-walk improvements
  2016-09-26 20:22   ` David Turner
@ 2016-09-27  0:35     ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-27  0:35 UTC (permalink / raw)
  To: David Turner; +Cc: git, peff, mhagger

David Turner <novalis@novalis.org> writes:

> Because truncated, to me, means "something that has been cut off". Here,
> the recorded length is too short, so it's probably not the case that
> something was cut off -- it was never right to begin with.

That's perfectly sensible. Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] tree-walk: be more specific about corrupt tree errors
  2016-09-26 19:32 ` [PATCH 1/2] tree-walk: be more specific about corrupt tree errors David Turner
@ 2016-09-27  5:14   ` Jeff King
  2016-09-27  5:35     ` Junio C Hamano
  2016-09-27 15:21     ` David Turner
  0 siblings, 2 replies; 22+ messages in thread
From: Jeff King @ 2016-09-27  5:14 UTC (permalink / raw)
  To: David Turner; +Cc: git, mhagger, David Turner

On Mon, Sep 26, 2016 at 03:32:44PM -0400, David Turner wrote:

> From: Jeff King <peff@peff.net>
> 
> When the tree-walker runs into an error, it just calls
> die(), and the message is always "corrupt tree file".
> However, we are actually covering several cases here; let's
> give the user a hint about what happened.
> 
> Let's also avoid using the word "corrupt", which makes it
> seem like the data bit-rotted on disk. Our sha1 check would
> already have found that. These errors are ones of data that
> is malformed in the first place.
> 
> Signed-off-by: David Turner <dturner@twosigma.com>
> Signed-off-by: Jeff King <peff@peff.net>

Yay. This has been on my "to look at and repost" list for literally 2
years. Thanks for picking it up (see kids, procrastination _does_ pay
off).

>  t/t1007-hash-object.sh           |  15 +++++++++++++--
>  t/t1007/tree-with-empty-filename | Bin 0 -> 28 bytes
>  t/t1007/tree-with-malformed-mode | Bin 0 -> 39 bytes

Ooh, and tests. Exciting.

> -test_expect_success 'corrupt tree' '
> +test_expect_success 'too-short tree' '
>  	echo abc >malformed-tree &&
> -	test_must_fail git hash-object -t tree malformed-tree
> +	test_must_fail git hash-object -t tree malformed-tree 2>err &&
> +	grep "too-short tree object" err
> +'

Should this be test_i18ngrep? Even if the message is not translated now,
it seems like a good proactive measure (and probably it _should_ be
translated).

> +test_expect_success 'malformed mode in tree' '
> +	test_must_fail git hash-object -t tree ../t1007/tree-with-malformed-mode 2>err &&
> +	grep "malformed mode in tree entry for tree" err
> +'

This ".." will break when the test is run with "--root". You should use

  "$TEST_DIRECTORY"/t1007/...

instead. And ditto in the second test, of course.

> diff --git a/t/t1007/tree-with-empty-filename b/t/t1007/tree-with-empty-filename
> new file mode 100644
> index 0000000000000000000000000000000000000000..aeb1ceb20e485eebd0acbb81c974d1c6fedcc1fe
> GIT binary patch
> literal 28
> kcmXpsFfcPQQDAsB_tET47q2;ccWbUIkGgT_Nl)-Z0Hx{;SO5S3
> 
> literal 0
> HcmV?d00001

This is rather opaque, of course. :)

I wonder if it would be possible to generate the test vector with
something like:

  # any 20 bytes will do
  bin_sha1='\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0'

  printf "100644 \0$bin_sha1" >tree-with-empty-filename

I know that is longer and possibly more error-prone to run, but I think
it makes the test much easier to read and modify later.

I also wonder if $bin_sha1 should actually be more like:

  hex_sha1=$(echo foo | git hash-object --stdin -w)
  bin_sha1=$(echo $hex_sha1 | perl -ne 'printf "\\%3o", ord for /./g')

so that it's a real sha1 (or maybe it is in your original, from an
object that happens to be in the repo; it's hard to tell). I wouldn't
expect the code to actually get to the point of looking at the sha1, but
it's perhaps a more realistic test.

I also think it would be nice if hash-object had a "--binary-sha1"
option to avoid the perl grossness. :)

> diff --git a/tree-walk.c b/tree-walk.c
> index ce27842..ba544cf 100644

The code change itself looks brilliant, naturally. :)

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-26 19:32 ` [PATCH 2/2] fsck: handle bad trees like other errors David Turner
  2016-09-26 19:51   ` Junio C Hamano
  2016-09-26 20:08   ` Junio C Hamano
@ 2016-09-27  5:27   ` Jeff King
  2016-09-27 15:19     ` David Turner
  2 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2016-09-27  5:27 UTC (permalink / raw)
  To: David Turner; +Cc: git, mhagger, David Turner

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 4422 bytes --]

On Mon, Sep 26, 2016 at 03:32:45PM -0400, David Turner wrote:

> Instead of dying when fsck hits a malformed tree object, log the error
> like any other and continue.  Now fsck can tell the user which tree is
> bad, too.

Cool. I think the lack of this is what made me drag my feet on the first
patch. Thanks for finishing it off.

> diff --git a/fsck.c b/fsck.c
> index c9cf3de..4a3069e 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -347,8 +347,9 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
>  		return -1;
>  
>  	name = get_object_name(options, &tree->object);
> -	init_tree_desc(&desc, tree->buffer, tree->size);
> -	while (tree_entry(&desc, &entry)) {
> +	if (init_tree_desc_gently(&desc, tree->buffer, tree->size))
> +		return -1;
> +	while (tree_entry_gently(&desc, &entry)) {

I wondered if other callers would be happy with init_tree_desc_gently().
Grepping for init_tree_desc(), it seems like it would be a fairly
trivial conversion for most of them, because they almost invariably run
unpack_trees() right afterwards, and so have to deal with errors from
it.

So perhaps in the long run we can convert them all. But certainly that
does not need to be part of this series.

> +test_expect_success 'unparseable tree object' '
> +	test_when_finished "git update-ref -d refs/heads/wrong" &&
> +	test_when_finished "remove_object 307e300745b82417cc1a903f875c7d22e45ef907" &&
> +	test_when_finished "remove_object f506a346749bb96f52d8605ffba9fb93d46b5ffd" &&
> +	mkdir -p .git/objects/30 mkdir -p .git/objects/f5 &&
> +	cp ../t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907 .git/objects/30/7e300745b82417cc1a903f875c7d22e45ef907 &&
> +	cp ../t1450/bad-objects/f506a346749bb96f52d8605ffba9fb93d46b5ffd .git/objects/f5/06a346749bb96f52d8605ffba9fb93d46b5ffd &&

This needs the same $TEST_DIRECTORY treatment as t1007.

> +	git update-ref refs/heads/wrong 307e300745b82417cc1a903f875c7d22e45ef907 &&
> +	test_must_fail git fsck 2>out &&
> +	grep "warning: empty filename in tree entry" out &&
> +	grep "f506a346749bb96f52d8605ffba9fb93d46b5ffd" out &&
> +	! grep "fatal: empty filename in tree entry" out
> +'

I'd also expect these to be test_i18ngrep, but I see that t1450 is quite
bad about this in general. I'm OK with adding them as greps and leaving
a conversion of the whole script until later.

> diff --git a/t/t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907 b/t/t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907
> new file mode 100644
> index 0000000..6e23d62
> --- /dev/null
> +++ b/t/t1450/bad-objects/307e300745b82417cc1a903f875c7d22e45ef907
> @@ -0,0 +1,4 @@
> +x\x01ŽA\x0e \x10E]sŠ¹€f°@Ä\x18\x17\x1eÁ\v0\x05Z\x12[\x12
> +õú¢é	\ýüÅ{ÿ\x0fižc\x01IòP²÷\x104\x1aÛ)Ó+b&\x13ôÙ]\fê\x10ØR`êœ2Üš\x13¶–)exØ-:xÖ¼ø\f×%mö\x15×ûž§”Ç^[HÕd\x12{-áˆ
> +Q\f¿ÍÒ€\x7fè\x1d‡w,\x13p\x1aë
> +ßçâ\x03&ë?Þ
> \ No newline at end of file

Yikes. :)

I wonder if some printfs, similar to what I showed in the last patch,
combined with "hash-object --literally", could make these tests more
readable and avoid the binary goo.

> -static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size)
> +static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size, struct strbuf *err)
>  {

I know we used the "err" strbuf pattern in the ref code, and it makes
sense there where we have a lot of different functions with public
interfaces. But here, we literally just feed the result to die() or
warning(). I wonder if a nicer interface would be:

  typedef void (*err_fn)(const char *, ...);

  static int decode_tree_entry(struct tree_desc *desc,
                               const char *buf, unsigned long size,
			       err_fn err)
  {
         ...
         if (size < 23 || buf[size - 21]) {
	        err("too-short tree object");
		return -1;
	 }
  }

I dunno. Maybe that is overengineering. I guess we only hit the strbufs
in the error path (which used to die!), so nobody really cares that much
about the extra allocation.

> +int init_tree_desc_gently(struct tree_desc *desc, const void *buffer, unsigned long size)
> +{
> +	struct strbuf err = STRBUF_INIT;
> +	int result = init_tree_desc_internal(desc, buffer, size, &err);
> +	if (result)
> +		warning("%s", err.buf);
> +	strbuf_release(&err);
> +	return result;
>  }

I also wonder if this ought to be "error()" and not "warning()". I think
it's pretty common for fsck to spit out errors from sub-code but keep going.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] tree-walk: be more specific about corrupt tree errors
  2016-09-27  5:14   ` Jeff King
@ 2016-09-27  5:35     ` Junio C Hamano
  2016-09-27 15:21     ` David Turner
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-27  5:35 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git, mhagger, David Turner

Jeff King <peff@peff.net> writes:

>> +test_expect_success 'malformed mode in tree' '
>> +	test_must_fail git hash-object -t tree ../t1007/tree-with-malformed-mode 2>err &&
>> +	grep "malformed mode in tree entry for tree" err
>> +'
>
> This ".." will break when the test is run with "--root". You should use
>
>   "$TEST_DIRECTORY"/t1007/...
>
> instead. And ditto in the second test, of course.

Ahh, that explains the breakage I saw.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-27  5:27   ` Jeff King
@ 2016-09-27 15:19     ` David Turner
  2016-09-27 19:19       ` thoughts on error passing, was " Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: David Turner @ 2016-09-27 15:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git, mhagger, David Turner

On Tue, 2016-09-27 at 01:27 -0400, Jeff King wrote:
> > -static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size)
> > +static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size, struct strbuf *err)
> >  {
> 
> I know we used the "err" strbuf pattern in the ref code, and it makes
> sense there where we have a lot of different functions with public
> interfaces. But here, we literally just feed the result to die() or
> warning(). I wonder if a nicer interface would be:
> 
>   typedef void (*err_fn)(const char *, ...);
> 
>   static int decode_tree_entry(struct tree_desc *desc,
>                                const char *buf, unsigned long size,
> 			       err_fn err)
>   {
>          ...
>          if (size < 23 || buf[size - 21]) {
> 	        err("too-short tree object");
> 		return -1;
> 	 }
>   }
> 
> I dunno. Maybe that is overengineering. I guess we only hit the strbufs
> in the error path (which used to die!), so nobody really cares that much
> about the extra allocation.

I don't really like err_fn because:
(a) without a baton, it's somewhat less general (or less thread-safe)
than the strbuf approach and
(b) with a baton, it's two arguments instead of one.

Thanks for all of the rest of the commentary; I've incorporated it and
will re-roll shortly.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] tree-walk: be more specific about corrupt tree errors
  2016-09-27  5:14   ` Jeff King
  2016-09-27  5:35     ` Junio C Hamano
@ 2016-09-27 15:21     ` David Turner
  1 sibling, 0 replies; 22+ messages in thread
From: David Turner @ 2016-09-27 15:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git, mhagger, David Turner

On Tue, 2016-09-27 at 01:14 -0400, Jeff King wrote:
> I also wonder if $bin_sha1 should actually be more like:
> 
>   hex_sha1=$(echo foo | git hash-object --stdin -w)
>   bin_sha1=$(echo $hex_sha1 | perl -ne 'printf "\\%3o", ord for /./g')
> 
> so that it's a real sha1 (or maybe it is in your original, from an
> object that happens to be in the repo; it's hard to tell). I wouldn't
> expect the code to actually get to the point of looking at the sha1, but
> it's perhaps a more realistic test.

I'm going to do this for this patch, but for the second patch, I don't
think it matters, and it's a bit simpler to avoid it.  I guess it's
probably better to test both ways anyway.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* thoughts on error passing, was Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-27 15:19     ` David Turner
@ 2016-09-27 19:19       ` Jeff King
  2016-09-27 22:57         ` David Turner
  2016-09-28  5:01         ` Michael Haggerty
  0 siblings, 2 replies; 22+ messages in thread
From: Jeff King @ 2016-09-27 19:19 UTC (permalink / raw)
  To: David Turner; +Cc: git, mhagger, David Turner

On Tue, Sep 27, 2016 at 11:19:34AM -0400, David Turner wrote:

> >   typedef void (*err_fn)(const char *, ...);
> > 
> >   static int decode_tree_entry(struct tree_desc *desc,
> >                                const char *buf, unsigned long size,
> > 			       err_fn err)
> >   {
> >          ...
> >          if (size < 23 || buf[size - 21]) {
> > 	        err("too-short tree object");
> > 		return -1;
> > 	 }
> >   }
> > 
> > I dunno. Maybe that is overengineering. I guess we only hit the strbufs
> > in the error path (which used to die!), so nobody really cares that much
> > about the extra allocation.
> 
> I don't really like err_fn because:
> (a) without a baton, it's somewhat less general (or less thread-safe)
> than the strbuf approach and
> (b) with a baton, it's two arguments instead of one.

I'm going to ramble for a minute, and I don't think it's worth exploring
for this patch series in particular, so feel free to ignore me.

I think this error concept could be extended fairly elegantly with
something like:

  typedef void (*err_fn)(void *, const char *fmt, va_list ap)
  struct error_context {
        err_fn fn;
        void *data;
  };

  int report_error(struct error_context *err, const char *fmt, ...)
  {
        if (err->fn) {
                va_list ap;
                va_start(ap, fmt);
                err->fn(err->data, fmt, ap);
                va_end(ap);
        }
        return -1;
  }

Then low-level functions just take a context and do:

  return report_error(&err, "some error: %s", foo);

And then the callers would pick one of a few generic error contexts:

  - passing NULL silences the errors

  - a global for chaining to error, like:

       struct error_context print_errors = {
          error, /* actually a wrapper to handle va_list and NULL data */
          NULL
       };

   - a context that collects errors in a strbuf (or list, etc)

       struct strbuf err_buf = STRBUF_INIT;
       struct error_context = STRBUF_ERR_CONTEXT(&err_buf);

And that error_context can be passed around like a baton through several
functions.

I remember having a big discussion about error-passing patterns around
the ref refactoring, but I don't remember this particular thing coming
up. 

Anyway, this is all way outside the scope of what we should consider for
your current series. If we were to do something like this, it would make
sense to start using it consistently. This discussion just made me think
of it.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: thoughts on error passing, was Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-27 19:19       ` thoughts on error passing, was " Jeff King
@ 2016-09-27 22:57         ` David Turner
  2016-09-28  6:54           ` Jeff King
  2016-09-28  5:01         ` Michael Haggerty
  1 sibling, 1 reply; 22+ messages in thread
From: David Turner @ 2016-09-27 22:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, mhagger, David Turner

On Tue, 2016-09-27 at 15:19 -0400, Jeff King wrote:
> On Tue, Sep 27, 2016 at 11:19:34AM -0400, David Turner wrote:
> 
> > >   typedef void (*err_fn)(const char *, ...);
> > > 
> > >   static int decode_tree_entry(struct tree_desc *desc,
> > >                                const char *buf, unsigned long size,
> > > 			       err_fn err)
> > >   {
> > >          ...
> > >          if (size < 23 || buf[size - 21]) {
> > > 	        err("too-short tree object");
> > > 		return -1;
> > > 	 }
> > >   }
> > > 
> > > I dunno. Maybe that is overengineering. I guess we only hit the strbufs
> > > in the error path (which used to die!), so nobody really cares that much
> > > about the extra allocation.
> > 
> > I don't really like err_fn because:
> > (a) without a baton, it's somewhat less general (or less thread-safe)
> > than the strbuf approach and
> > (b) with a baton, it's two arguments instead of one.
> 
> I'm going to ramble for a minute, and I don't think it's worth exploring
> for this patch series in particular, so feel free to ignore me.
> 
> I think this error concept could be extended fairly elegantly with
> something like:
> 
>   typedef void (*err_fn)(void *, const char *fmt, va_list ap)
>   struct error_context {
>         err_fn fn;
>         void *data;
>   };
> 
>   int report_error(struct error_context *err, const char *fmt, ...)
>   {
>         if (err->fn) {
>                 va_list ap;
>                 va_start(ap, fmt);
>                 err->fn(err->data, fmt, ap);
>                 va_end(ap);
>         }
>         return -1;
>   }
> 
> Then low-level functions just take a context and do:
> 
>   return report_error(&err, "some error: %s", foo);
> 
> And then the callers would pick one of a few generic error contexts:
> 
>   - passing NULL silences the errors

Overall, +1.

I guess I would rather have a sentinel value for silencing errors,
because I'm worried that someone might read NULL as "don't handle the
errors, just die".  Of course, code review would hopefully catch this,
but even so, it would be easier to read foo(x, y, silence_errors) than
foo(x, y, null).




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: thoughts on error passing, was Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-27 19:19       ` thoughts on error passing, was " Jeff King
  2016-09-27 22:57         ` David Turner
@ 2016-09-28  5:01         ` Michael Haggerty
  2016-09-28  8:58           ` Jeff King
  1 sibling, 1 reply; 22+ messages in thread
From: Michael Haggerty @ 2016-09-28  5:01 UTC (permalink / raw)
  To: Jeff King, David Turner; +Cc: git, David Turner

On 09/27/2016 09:19 PM, Jeff King wrote:
> [...]
> I'm going to ramble for a minute, and I don't think it's worth exploring
> for this patch series in particular, so feel free to ignore me.
> 
> I think this error concept could be extended fairly elegantly with
> something like:
> 
>   typedef void (*err_fn)(void *, const char *fmt, va_list ap)
>   struct error_context {
>         err_fn fn;
>         void *data;
>   };
> 
>   int report_error(struct error_context *err, const char *fmt, ...)
>   {
>         if (err->fn) {
>                 va_list ap;
>                 va_start(ap, fmt);
>                 err->fn(err->data, fmt, ap);
>                 va_end(ap);
>         }
>         return -1;
>   }

I like this idea. It's nicely flexible (more so than the `struct strbuf
*err` that is currently used for reference transactions) without being
cumbersome.

> Then low-level functions just take a context and do:
> 
>   return report_error(&err, "some error: %s", foo);
> 
> And then the callers would pick one of a few generic error contexts:
> 
>   - passing NULL silences the errors
> 
>   - a global for chaining to error, like:
> 
>        struct error_context print_errors = {
>           error, /* actually a wrapper to handle va_list and NULL data */
>           NULL
>        };

There could also be a global for chaining to `warn()` or `die()`.

> [...]

Michael


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: thoughts on error passing, was Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-27 22:57         ` David Turner
@ 2016-09-28  6:54           ` Jeff King
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2016-09-28  6:54 UTC (permalink / raw)
  To: David Turner; +Cc: git, mhagger, David Turner

On Tue, Sep 27, 2016 at 06:57:34PM -0400, David Turner wrote:

> >   int report_error(struct error_context *err, const char *fmt, ...)
> >   {
> >         if (err->fn) {
> >                 va_list ap;
> >                 va_start(ap, fmt);
> >                 err->fn(err->data, fmt, ap);
> >                 va_end(ap);
> >         }
> >         return -1;
> >   }
> > 
> > Then low-level functions just take a context and do:
> > 
> >   return report_error(&err, "some error: %s", foo);
> > 
> > And then the callers would pick one of a few generic error contexts:
> > 
> >   - passing NULL silences the errors
> 
> Overall, +1.
> 
> I guess I would rather have a sentinel value for silencing errors,
> because I'm worried that someone might read NULL as "don't handle the
> errors, just die".  Of course, code review would hopefully catch this,
> but even so, it would be easier to read foo(x, y, silence_errors) than
> foo(x, y, null).

Yeah, I waffled on that. If you look carefully, you'll note that
the report_error() I showed above would actually require such a
"{ NULL, NULL }" global.

I don't plan to make any patches immediately for this, but I'll let it
percolate and consider whether it makes sense to try out for a future
series.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: thoughts on error passing, was Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-28  5:01         ` Michael Haggerty
@ 2016-09-28  8:58           ` Jeff King
  2016-09-28 18:02             ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2016-09-28  8:58 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: David Turner, git, David Turner

On Wed, Sep 28, 2016 at 07:01:38AM +0200, Michael Haggerty wrote:

> >   - a global for chaining to error, like:
> > 
> >        struct error_context print_errors = {
> >           error, /* actually a wrapper to handle va_list and NULL data */
> >           NULL
> >        };
> 
> There could also be a global for chaining to `warn()` or `die()`.

I played around a little with this. The latter actually makes a lot of
code cleaner, because we can rely on the functions not returning at all.
So for example, you get:

diff --git a/branch.c b/branch.c
index a5a8dcb..53404b8 100644
--- a/branch.c
+++ b/branch.c
@@ -303,17 +303,13 @@ void create_branch(const char *head,
 
 	if (!dont_change_ref) {
 		struct ref_transaction *transaction;
-		struct strbuf err = STRBUF_INIT;
-
-		transaction = ref_transaction_begin(&err);
-		if (!transaction ||
-		    ref_transaction_update(transaction, ref.buf,
-					   sha1, forcing ? NULL : null_sha1,
-					   0, msg, &err) ||
-		    ref_transaction_commit(transaction, &err))
-			die("%s", err.buf);
+
+		transaction = ref_transaction_begin(&error_die);
+		ref_transaction_update(transaction, ref.buf,
+				       sha1, forcing ? NULL : null_sha1,
+				       0, msg, &error_die);
+		ref_transaction_commit(transaction, &error_die);
 		ref_transaction_free(transaction);
-		strbuf_release(&err);
 	}
 
 	if (real_ref && track)

which is much shorter and to the point (it does rely on the called
functions always calling report_error() and never just returning NULL or
"-1", but that should be the already. If it isn't, we'd be printing
"fatal: " with no message).

Cases that call:

  error("%s", err.buf);

can drop the strbuf handling, but of course still need to retain their
conditionals. So they're better, but not as much. I did a half-hearted
conversion of some of the ref code that uses strbufs, and it seems like
it would save a few hundred lines of boilerplate.

There are some cases that are _worse_, because they want to prefix the
error. E.g., in init-db, we have:

  struct strbuf err = STRBUF_INIT;
  ...
  if (refs_init_db(&err))
	die("failed to set up refs db: %s", err.buf);

which is fairly clean. Using an error_context adds slightly to the
boilerplate:

  struct strbuf err_buf = STRBUF_INIT;
  struct error_context err = STRBUF_ERR(&err_buf);
  ...
  if (refs_init_db(&err))
	die("failed to set up refs db: %s", err_buf.buf);

Though if we wanted to get really magical, the err_buf/err pattern could
be its own single-line macro.

You could solve this more generally with something like:

  struct error_prefix_data err;

  error_prefix(&err, &error_die, "failed to set up refs db");
  refs_init_db(&err.err);

where error_prefix() basically sets us up to call back a function which
concatenates the prefix to the real error, then chains to error_die.
But to cover all cases, error_prefix() would actually have to format the
prefix string. Because some callers would be more like:

  error_prefix(&err, &error_print, "unable to frob %s", foo);
  do_frob(foo, &err);

We can't just save the va_list passed to error_prefix(), because it's
not valid after we return. So you have to format the prefix into a
buffer, even though in most cases we won't see an error at all (and
doing it completely correctly would involve using a strbuf, which means
there needs to be a cleanup step; yuck).

-Peff

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: thoughts on error passing, was Re: [PATCH 2/2] fsck: handle bad trees like other errors
  2016-09-28  8:58           ` Jeff King
@ 2016-09-28 18:02             ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-09-28 18:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, David Turner, git, David Turner

Jeff King <peff@peff.net> writes:

>  	if (!dont_change_ref) {
>  		struct ref_transaction *transaction;
> -		struct strbuf err = STRBUF_INIT;
> -
> -		transaction = ref_transaction_begin(&err);
> -		if (!transaction ||
> -		    ref_transaction_update(transaction, ref.buf,
> -					   sha1, forcing ? NULL : null_sha1,
> -					   0, msg, &err) ||
> -		    ref_transaction_commit(transaction, &err))
> -			die("%s", err.buf);
> +
> +		transaction = ref_transaction_begin(&error_die);
> +		ref_transaction_update(transaction, ref.buf,
> +				       sha1, forcing ? NULL : null_sha1,
> +				       0, msg, &error_die);
> +		ref_transaction_commit(transaction, &error_die);
>  		ref_transaction_free(transaction);
> -		strbuf_release(&err);
>  	}
>  
>  	if (real_ref && track)
>
> which is much shorter and to the point (it does rely on the called
> functions always calling report_error() and never just returning NULL or
> "-1", but that should be the already. If it isn't, we'd be printing
> "fatal: " with no message).

Yes but... grepping for die() got a lot harder, which may not be a
good thing.

I do like the flexibility such a mechanism offers, but
wrapping/hiding die in it is probably an example that the
flexibility went a bit too far.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-09-28 18:02 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-26 19:32 [PATCH 0/2] tree-walk improvements David Turner
2016-09-26 19:32 ` [PATCH 1/2] tree-walk: be more specific about corrupt tree errors David Turner
2016-09-27  5:14   ` Jeff King
2016-09-27  5:35     ` Junio C Hamano
2016-09-27 15:21     ` David Turner
2016-09-26 19:32 ` [PATCH 2/2] fsck: handle bad trees like other errors David Turner
2016-09-26 19:51   ` Junio C Hamano
2016-09-26 20:08   ` Junio C Hamano
2016-09-26 20:11     ` Junio C Hamano
2016-09-27  5:27   ` Jeff King
2016-09-27 15:19     ` David Turner
2016-09-27 19:19       ` thoughts on error passing, was " Jeff King
2016-09-27 22:57         ` David Turner
2016-09-28  6:54           ` Jeff King
2016-09-28  5:01         ` Michael Haggerty
2016-09-28  8:58           ` Jeff King
2016-09-28 18:02             ` Junio C Hamano
2016-09-26 19:39 ` [PATCH 0/2] tree-walk improvements Stefan Beller
2016-09-26 19:43 ` Junio C Hamano
2016-09-26 20:22   ` David Turner
2016-09-27  0:35     ` Junio C Hamano
2016-09-26 21:04 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).