git.vger.kernel.org archive mirror
* [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic
@ 2021-07-12 11:46 ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 01/19] cat-file: handle trivial --batch format with --batch-all-objects ZheNing Hu via GitGitGadget
                   ` (21 more replies)
  0 siblings, 22 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu

This patch series makes cat-file reuse the ref-filter logic.

Changes since the last version:

 1. Declare buf_size inside the if (atom_type == ATOM_RAW) block.
 2. Adjust the code style of the tests.
 3. Delete the "use_textconv" and "use_filter" flags. Instead, add a
    cat_file_cmdmode member to struct ref_array_item.
 4. Add the function reject_atom() to improve the readability of the code.
 5. Create p1006-cat-file.sh for performance regression testing.
 6. Use a "fast path" to output object data, which reduces the
    performance degradation of cat-file --batch, as suggested by
    Ævar Arnfjörð Bjarmason.

ZheNing Hu (19):
  cat-file: handle trivial --batch format with --batch-all-objects
  cat-file: merge two block into one
  [GSOC] ref-filter: add obj-type check in grab contents
  [GSOC] ref-filter: add %(raw) atom
  [GSOC] ref-filter: --format=%(raw) re-support --perl
  [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
  [GSOC] ref-filter: add %(rest) atom
  [GSOC] ref-filter: pass get_object() return value to their callers
  [GSOC] ref-filter: introduce free_ref_array_item_value() function
  [GSOC] ref-filter: introduce reject_atom()
  [GSOC] ref-filter: modify the error message and value in get_object
  [GSOC] cat-file: add has_object_file() check
  [GSOC] cat-file: change batch_objects parameter name
  [GSOC] cat-file: reuse ref-filter logic
  [GSOC] cat-file: reuse err buf in batch_object_write()
  [GSOC] cat-file: re-implement --textconv, --filters options
  [GSOC] ref-filter: remove grab_oid() function
  [GSOC] cat-file: create p1006-cat-file.sh
  [GSOC] cat-file: use fast path when using default_format

 Documentation/git-cat-file.txt     |   6 +
 Documentation/git-for-each-ref.txt |   9 +
 builtin/cat-file.c                 | 308 +++++++++----------------
 builtin/tag.c                      |   2 +-
 quote.c                            |  17 ++
 quote.h                            |   1 +
 ref-filter.c                       | 346 +++++++++++++++++++++--------
 ref-filter.h                       |  13 +-
 t/perf/p1006-cat-file.sh           |  28 +++
 t/t1006-cat-file.sh                | 273 +++++++++++++++++++++++
 t/t3203-branch-output.sh           |   4 +
 t/t6300-for-each-ref.sh            | 235 ++++++++++++++++++++
 t/t6301-for-each-ref-errors.sh     |   2 +-
 t/t7004-tag.sh                     |   4 +
 t/t7030-verify-tag.sh              |   4 +
 15 files changed, 955 insertions(+), 297 deletions(-)
 create mode 100755 t/perf/p1006-cat-file.sh


base-commit: d486ca60a51c9cb1fe068803c3f540724e95e83a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-993%2Fadlternative%2Fcat-file-batch-refactor-2-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-993/adlternative/cat-file-batch-refactor-2-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/993
-- 
gitgitgadget


* [PATCH 01/19] cat-file: handle trivial --batch format with --batch-all-objects
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 02/19] cat-file: merge two block into one ZheNing Hu via GitGitGadget
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

The --batch code to print an object assumes we found out the type of
the object from calling oid_object_info_extended(). This is true for
the default format, but even in a custom format, we manually modify
the object_info struct to ask for the type.

This assumption was broken by 845de33a5b (cat-file: avoid noop calls
to sha1_object_info_extended, 2016-05-18). That commit skips the call
to oid_object_info_extended() entirely when --batch-all-objects is in
use, and the custom format does not include any placeholders that
require calling it.

Likewise, when the custom format only includes placeholders such as
%(objectname) or %(rest), oid_object_info_extended() does not fetch the
type of the object.

This results in an error when we try to confirm that the type didn't
change:

$ git cat-file --batch=batman --batch-all-objects
batman
fatal: object 000023961a0c02d6e21dc51ea3484ff71abf1c74 changed type!?

and also has other subtle effects (e.g., we'd fail to stream a blob,
since we don't realize it's a blob in the first place).

We can fix this by flipping the order of the setup. The check for "do
we need to get the object info" must come _after_ we've decided
whether we need to look up the type.
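
The ordering problem can be reproduced in isolation. The sketch below is a
reduced model and not the cat-file code: `struct info_request`,
`struct batch_state`, `decide_skip_early()` and `decide_skip_late()` are
hypothetical stand-ins for `struct object_info` and the setup done in
batch_objects().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced stand-in for git's struct object_info. */
struct info_request {
	int *typep;		/* non-NULL when the caller wants the type */
	unsigned long *sizep;	/* non-NULL when the caller wants the size */
};

/* Hypothetical, reduced stand-in for the batch state in cat-file. */
struct batch_state {
	struct info_request info;
	int type;
	int skip_object_info;
};

/* Old order: the "nothing requested" check runs before the type is requested. */
void decide_skip_early(struct batch_state *st, int print_contents)
{
	struct info_request empty = { 0 };

	assert(st != NULL);
	st->skip_object_info = !memcmp(&st->info, &empty, sizeof(empty));
	if (print_contents)
		st->info.typep = &st->type;
}

/* New order: request the type first, then check whether anything is needed. */
void decide_skip_late(struct batch_state *st, int print_contents)
{
	struct info_request empty = { 0 };

	assert(st != NULL);
	if (print_contents)
		st->info.typep = &st->type;
	st->skip_object_info = !memcmp(&st->info, &empty, sizeof(empty));
}
```

Starting from a zero-initialized state with print_contents set, the early
variant marks the lookup as skippable even though the type is needed
afterwards, while the late variant keeps the lookup.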

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/cat-file.c  | 13 +++++++------
 t/t1006-cat-file.sh | 22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5ebf13359e8..02461bb5ea6 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -512,12 +512,6 @@ static int batch_objects(struct batch_options *opt)
 	if (opt->cmdmode)
 		data.split_on_whitespace = 1;
 
-	if (opt->all_objects) {
-		struct object_info empty = OBJECT_INFO_INIT;
-		if (!memcmp(&data.info, &empty, sizeof(empty)))
-			data.skip_object_info = 1;
-	}
-
 	/*
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
@@ -525,6 +519,13 @@ static int batch_objects(struct batch_options *opt)
 	if (opt->print_contents)
 		data.info.typep = &data.type;
 
+	if (opt->all_objects) {
+		struct object_info empty = OBJECT_INFO_INIT;
+
+		if (!memcmp(&data.info, &empty, sizeof(empty)))
+			data.skip_object_info = 1;
+	}
+
 	if (opt->all_objects) {
 		struct object_cb_data cb;
 
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5d2dc99b74a..18b3779ccb6 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -586,4 +586,26 @@ test_expect_success 'cat-file --unordered works' '
 	test_cmp expect actual
 '
 
+test_expect_success 'set up object list for --batch-all-objects tests' '
+	git -C all-two cat-file --batch-all-objects --batch-check="%(objectname)" >objects
+'
+
+test_expect_success 'cat-file --batch="%(objectname)" with --batch-all-objects will work' '
+	git -C all-two cat-file --batch="%(objectname)" <objects >expect &&
+	git -C all-two cat-file --batch-all-objects --batch="%(objectname)" >actual &&
+	cmp expect actual
+'
+
+test_expect_success 'cat-file --batch="%(rest)" with --batch-all-objects will work' '
+	git -C all-two cat-file --batch="%(rest)" <objects >expect &&
+	git -C all-two cat-file --batch-all-objects --batch="%(rest)" >actual &&
+	cmp expect actual
+'
+
+test_expect_success 'cat-file --batch="batman" with --batch-all-objects will work' '
+	git -C all-two cat-file --batch="batman" <objects >expect &&
+	git -C all-two cat-file --batch-all-objects --batch="batman" >actual &&
+	cmp expect actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH 02/19] cat-file: merge two block into one
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 01/19] cat-file: handle trivial --batch format with --batch-all-objects ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 03/19] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

There are two adjacent "if (opt->all_objects)" blocks.
Merge them into one to improve readability.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/cat-file.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 02461bb5ea6..243fe6844bc 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -520,14 +520,11 @@ static int batch_objects(struct batch_options *opt)
 		data.info.typep = &data.type;
 
 	if (opt->all_objects) {
+		struct object_cb_data cb;
 		struct object_info empty = OBJECT_INFO_INIT;
 
 		if (!memcmp(&data.info, &empty, sizeof(empty)))
 			data.skip_object_info = 1;
-	}
-
-	if (opt->all_objects) {
-		struct object_cb_data cb;
 
 		if (has_promisor_remote())
 			warning("This repository uses promisor remotes. Some objects may not be loaded.");
-- 
gitgitgadget



* [PATCH 03/19] [GSOC] ref-filter: add obj-type check in grab contents
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 01/19] cat-file: handle trivial --batch format with --batch-all-objects ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 02/19] cat-file: merge two block into one ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 04/19] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Only tag and commit objects use `grab_sub_body_contents()` to grab
object contents in the current codebase.  We want to teach the
function to also handle blobs and trees to get their raw data,
without parsing a blob (whose contents look like a commit or a tag)
incorrectly as a commit or a tag.

Skip the block of code that is specific to commits and tags early
when the given object is of any other type, so that handling for
other object types can be added to this function later.
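
As a rough illustration of the guard, here is a standalone sketch. It is not
the ref-filter code: `enum obj_type` only mirrors git's names,
and `has_prefix()` and `should_parse_message()` are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum; only the names mirror git's object types. */
enum obj_type { OBJ_COMMIT = 1, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

/* Hypothetical helper playing the role of git's starts_with(). */
static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/*
 * Return 1 only when the object is a commit or a tag AND the atom is one
 * of the message-related atoms. Checking the type first means a blob
 * whose bytes resemble a commit is never parsed as one.
 */
int should_parse_message(enum obj_type type, const char *name)
{
	assert(name != NULL);
	if (type != OBJ_TAG && type != OBJ_COMMIT)
		return 0;
	return !strcmp(name, "body") ||
	       has_prefix(name, "subject") ||
	       has_prefix(name, "trailers") ||
	       has_prefix(name, "contents");
}
```

With this shape, a later change can handle other atoms for trees and blobs
before the guard without touching the commit/tag parsing.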

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 4db0e40ff4c..5cee6512fba 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1356,11 +1356,12 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size
 }
 
 /* See grab_values */
-static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
+static void grab_sub_body_contents(struct atom_value *val, int deref, struct expand_data *data)
 {
 	int i;
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
 	size_t sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;
+	void *buf = data->content;
 
 	for (i = 0; i < used_atom_cnt; i++) {
 		struct used_atom *atom = &used_atom[i];
@@ -1371,10 +1372,13 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			continue;
 		if (deref)
 			name++;
-		if (strcmp(name, "body") &&
-		    !starts_with(name, "subject") &&
-		    !starts_with(name, "trailers") &&
-		    !starts_with(name, "contents"))
+
+		if ((data->type != OBJ_TAG &&
+		     data->type != OBJ_COMMIT) ||
+		    (strcmp(name, "body") &&
+		     !starts_with(name, "subject") &&
+		     !starts_with(name, "trailers") &&
+		     !starts_with(name, "contents")))
 			continue;
 		if (!subpos)
 			find_subpos(buf,
@@ -1438,17 +1442,19 @@ static void fill_missing_values(struct atom_value *val)
  * pointed at by the ref itself; otherwise it is the object the
  * ref (which is a tag) refers to.
  */
-static void grab_values(struct atom_value *val, int deref, struct object *obj, void *buf)
+static void grab_values(struct atom_value *val, int deref, struct object *obj, struct expand_data *data)
 {
+	void *buf = data->content;
+
 	switch (obj->type) {
 	case OBJ_TAG:
 		grab_tag_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, data);
 		grab_person("tagger", val, deref, buf);
 		break;
 	case OBJ_COMMIT:
 		grab_commit_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, data);
 		grab_person("author", val, deref, buf);
 		grab_person("committer", val, deref, buf);
 		break;
@@ -1678,7 +1684,7 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
 					       oid_to_hex(&oi->oid), ref->refname);
 		}
-		grab_values(ref->value, deref, *obj, oi->content);
+		grab_values(ref->value, deref, *obj, oi);
 	}
 
 	grab_common_values(ref->value, deref, oi);
-- 
gitgitgadget



* [PATCH 04/19] [GSOC] ref-filter: add %(raw) atom
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 03/19] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 05/19] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Add a new formatting option `%(raw)`, which prints the raw object
data without any changes. This is a step toward migrating all
formatting logic from cat-file to ref-filter.

The raw data of blob and tree objects may contain '\0', but most of
the logic in `ref-filter` depends on the output of the atom being
text (specifically, no embedded NULs in it).

E.g. `quote_formatting()` uses `strbuf_addstr()` or the `*_quote_buf()`
functions to add the data to the buffer. The raw data of a tree object is
`100644 one\0...`, so only `100644 one` would be added to the buffer,
which is incorrect.
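
The truncation is easy to demonstrate outside of git. The sketch below uses
two hypothetical helpers, `append_cstr()` and `append_bytes()`, which only
imitate the difference between a strlen()-based append and a length-based
one; they are not the strbuf API.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: append a C string to dst at offset `used` and
 * return the new length. strlen() stops at the first NUL, so any raw
 * bytes after it are silently dropped.
 */
size_t append_cstr(char *dst, size_t used, const char *src)
{
	size_t n;

	assert(dst != NULL && src != NULL);
	n = strlen(src);
	memcpy(dst + used, src, n);
	return used + n;
}

/*
 * Hypothetical helper: append exactly n bytes, embedded NULs included.
 * The caller must know the length up front.
 */
size_t append_bytes(char *dst, size_t used, const char *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst + used, src, n);
	return used + n;
}
```

For a 15-byte buffer such as "100644 one\0abcd", the first helper copies
only the 10 bytes before the NUL, while the second copies all 15 when given
the real length.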

Therefore, we need a way to record the length of atom_value's member
`s`. Although strbuf can already record a string and its length,
replacing the type of `s` with strbuf is not easy, because many places
in ref-filter fill `v->s` with dynamically allocated memory.
In addition, populate_value() needs to check whether `v->s == NULL`,
and strbuf cannot easily distinguish NULL from an empty string, whereas
a C-style "const char *" can. So add a new member to
`struct atom_value`: `s_size`, which records the raw object size. It
lets us add raw object data to the buffer and compare two buffers
which contain raw object data.
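
A minimal sketch of the idea, under stated assumptions: `struct value`,
`SIZE_UNSET`, `value_len()` and `value_cmp()` are hypothetical names chosen
for illustration, and the comparison only mirrors the approach of comparing
the common prefix first and breaking ties by length.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sentinel meaning "s is an ordinary NUL-terminated string". */
#define SIZE_UNSET ((size_t)-1)

/* Hypothetical, reduced stand-in for struct atom_value. */
struct value {
	const char *s;	/* NULL would mean "not populated yet" */
	size_t s_size;	/* SIZE_UNSET unless s holds raw bytes */
};

/* Length of the value: explicit for raw data, strlen() for text. */
static size_t value_len(const struct value *v)
{
	assert(v->s != NULL);
	return v->s_size == SIZE_UNSET ? strlen(v->s) : v->s_size;
}

/* Compare the common prefix bytewise; on a tie the shorter value sorts first. */
int value_cmp(const struct value *a, const struct value *b)
{
	size_t a_len = value_len(a), b_len = value_len(b);
	int cmp = memcmp(a->s, b->s, a_len < b_len ? a_len : b_len);

	if (cmp)
		return cmp;
	return (a_len > b_len) - (a_len < b_len);
}
```

Keeping `s` a plain pointer preserves the NULL check, and the separate
length lets values with embedded NULs be compared past the first NUL.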

Note that `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, or `--perl`, because the string variable types of those
languages may not support arbitrary binary data.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Bagas Sanjaya <bagasdotme@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-for-each-ref.txt |   9 ++
 ref-filter.c                       | 140 +++++++++++++++----
 t/t6300-for-each-ref.sh            | 216 +++++++++++++++++++++++++++++
 3 files changed, 338 insertions(+), 27 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 2ae2478de70..cbb6f87d13f 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -235,6 +235,15 @@ and `date` to extract the named component.  For email fields (`authoremail`,
 without angle brackets, and `:localpart` to get the part before the `@` symbol
 out of the trimmed email.
 
+The raw data in an object is `raw`.
+
+raw:size::
+	The raw data size of the object.
+
+Note that `--format=%(raw)` cannot be used with `--python`, `--shell`, `--tcl`,
+or `--perl` because such languages may not support arbitrary binary data in
+their string variable types.
+
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
 
diff --git a/ref-filter.c b/ref-filter.c
index 5cee6512fba..506fbc3d691 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -144,6 +144,7 @@ enum atom_type {
 	ATOM_BODY,
 	ATOM_TRAILERS,
 	ATOM_CONTENTS,
+	ATOM_RAW,
 	ATOM_UPSTREAM,
 	ATOM_PUSH,
 	ATOM_SYMREF,
@@ -189,6 +190,9 @@ static struct used_atom {
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
+		struct {
+			enum { RAW_BARE, RAW_LENGTH } option;
+		} raw_data;
 		struct {
 			cmp_status cmp_status;
 			const char *str;
@@ -426,6 +430,18 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
+static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+				const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		atom->u.raw_data.option = RAW_BARE;
+	else if (!strcmp(arg, "size"))
+		atom->u.raw_data.option = RAW_LENGTH;
+	else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
+	return 0;
+}
+
 static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
 			   const char *arg, struct strbuf *err)
 {
@@ -586,6 +602,7 @@ static struct {
 	[ATOM_BODY] = { "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
 	[ATOM_TRAILERS] = { "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
 	[ATOM_CONTENTS] = { "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
+	[ATOM_RAW] = { "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
 	[ATOM_UPSTREAM] = { "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_PUSH] = { "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_SYMREF] = { "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },
@@ -620,12 +637,15 @@ struct ref_formatting_state {
 
 struct atom_value {
 	const char *s;
+	size_t s_size;
 	int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
 		       struct strbuf *err);
 	uintmax_t value; /* used for sorting when not FIELD_STR */
 	struct used_atom *atom;
 };
 
+#define ATOM_VALUE_S_SIZE_INIT (-1)
+
 /*
  * Used to parse format string and sort specifiers
  */
@@ -644,13 +664,6 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 		return strbuf_addf_ret(err, -1, _("malformed field name: %.*s"),
 				       (int)(ep-atom), atom);
 
-	/* Do we have the atom already used elsewhere? */
-	for (i = 0; i < used_atom_cnt; i++) {
-		int len = strlen(used_atom[i].name);
-		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
-			return i;
-	}
-
 	/*
 	 * If the atom name has a colon, strip it and everything after
 	 * it off - it specifies the format for this entry, and
@@ -660,6 +673,13 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	arg = memchr(sp, ':', ep - sp);
 	atom_len = (arg ? arg : ep) - sp;
 
+	/* Do we have the atom already used elsewhere? */
+	for (i = 0; i < used_atom_cnt; i++) {
+		int len = strlen(used_atom[i].name);
+		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
+			return i;
+	}
+
 	/* Is the atom a valid one? */
 	for (i = 0; i < ARRAY_SIZE(valid_atom); i++) {
 		int len = strlen(valid_atom[i].name);
@@ -709,11 +729,14 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	return at;
 }
 
-static void quote_formatting(struct strbuf *s, const char *str, int quote_style)
+static void quote_formatting(struct strbuf *s, const char *str, size_t len, int quote_style)
 {
 	switch (quote_style) {
 	case QUOTE_NONE:
-		strbuf_addstr(s, str);
+		if (len != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(s, str, len);
+		else
+			strbuf_addstr(s, str);
 		break;
 	case QUOTE_SHELL:
 		sq_quote_buf(s, str);
@@ -740,9 +763,12 @@ static int append_atom(struct atom_value *v, struct ref_formatting_state *state,
 	 * encountered.
 	 */
 	if (!state->stack->prev)
-		quote_formatting(&state->stack->output, v->s, state->quote_style);
+		quote_formatting(&state->stack->output, v->s, v->s_size, state->quote_style);
 	else
-		strbuf_addstr(&state->stack->output, v->s);
+		if (v->s_size != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(&state->stack->output, v->s, v->s_size);
+		else
+			strbuf_addstr(&state->stack->output, v->s);
 	return 0;
 }
 
@@ -842,21 +868,23 @@ static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state
 	return 0;
 }
 
-static int is_empty(const char *s)
+static int is_empty(struct strbuf *buf)
 {
-	while (*s != '\0') {
-		if (!isspace(*s))
-			return 0;
-		s++;
-	}
-	return 1;
-}
+	const char *cur = buf->buf;
+	const char *end = buf->buf + buf->len;
+
+	while (cur != end && (isspace(*cur)))
+		cur++;
+
+	return cur == end;
+}
 
 static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
 			     struct strbuf *err)
 {
 	struct ref_formatting_stack *cur = state->stack;
 	struct if_then_else *if_then_else = NULL;
+	size_t str_len = 0;
 
 	if (cur->at_end == if_then_else_handler)
 		if_then_else = (struct if_then_else *)cur->at_end_data;
@@ -867,18 +895,22 @@ static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_sta
 	if (if_then_else->else_atom_seen)
 		return strbuf_addf_ret(err, -1, _("format: %%(then) atom used after %%(else)"));
 	if_then_else->then_atom_seen = 1;
+	if (if_then_else->str)
+		str_len = strlen(if_then_else->str);
 	/*
 	 * If the 'equals' or 'notequals' attribute is used then
 	 * perform the required comparison. If not, only non-empty
 	 * strings satisfy the 'if' condition.
 	 */
 	if (if_then_else->cmp_status == COMPARE_EQUAL) {
-		if (!strcmp(if_then_else->str, cur->output.buf))
+		if (str_len == cur->output.len &&
+		    !memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
 	} else if (if_then_else->cmp_status == COMPARE_UNEQUAL) {
-		if (strcmp(if_then_else->str, cur->output.buf))
+		if (str_len != cur->output.len ||
+		    memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
-	} else if (cur->output.len && !is_empty(cur->output.buf))
+	} else if (cur->output.len && !is_empty(&cur->output))
 		if_then_else->condition_satisfied = 1;
 	strbuf_reset(&cur->output);
 	return 0;
@@ -924,7 +956,7 @@ static int end_atom_handler(struct atom_value *atomv, struct ref_formatting_stat
 	 * only on the topmost supporting atom.
 	 */
 	if (!current->prev->prev) {
-		quote_formatting(&s, current->output.buf, state->quote_style);
+		quote_formatting(&s, current->output.buf, current->output.len, state->quote_style);
 		strbuf_swap(&current->output, &s);
 	}
 	strbuf_release(&s);
@@ -974,6 +1006,10 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
+		if (format->quote_style && used_atom[at].atom_type == ATOM_RAW &&
+		    used_atom[at].u.raw_data.option == RAW_BARE)
+			die(_("--format=%.*s cannot be used with "
+			      "--python, --shell, --tcl, --perl"), (int)(ep - sp - 2), sp + 2);
 		cp = ep + 1;
 
 		if (skip_prefix(used_atom[at].name, "color:", &color))
@@ -1367,12 +1403,25 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
 		struct used_atom *atom = &used_atom[i];
 		const char *name = atom->name;
 		struct atom_value *v = &val[i];
+		enum atom_type atom_type = atom->atom_type;
 
 		if (!!deref != (*name == '*'))
 			continue;
 		if (deref)
 			name++;
 
+		if (atom_type == ATOM_RAW) {
+			unsigned long buf_size = data->size;
+
+			if (atom->u.raw_data.option == RAW_BARE) {
+				v->s = xmemdupz(buf, buf_size);
+				v->s_size = buf_size;
+			} else if (atom->u.raw_data.option == RAW_LENGTH) {
+				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
+			}
+			continue;
+		}
+
 		if ((data->type != OBJ_TAG &&
 		     data->type != OBJ_COMMIT) ||
 		    (strcmp(name, "body") &&
@@ -1460,9 +1509,11 @@ static void grab_values(struct atom_value *val, int deref, struct object *obj, s
 		break;
 	case OBJ_TREE:
 		/* grab_tree_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, data);
 		break;
 	case OBJ_BLOB:
 		/* grab_blob_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, data);
 		break;
 	default:
 		die("Eh?  Object of type %d?", obj->type);
@@ -1766,6 +1817,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 		const char *refname;
 		struct branch *branch = NULL;
 
+		v->s_size = ATOM_VALUE_S_SIZE_INIT;
 		v->handler = append_atom;
 		v->atom = atom;
 
@@ -2369,6 +2421,19 @@ static int compare_detached_head(struct ref_array_item *a, struct ref_array_item
 	return 0;
 }
 
+static int memcasecmp(const void *vs1, const void *vs2, size_t n)
+{
+	const char *s1 = vs1, *s2 = vs2;
+	const char *end = s1 + n;
+
+	for (; s1 < end; s1++, s2++) {
+		int diff = tolower(*s1) - tolower(*s2);
+		if (diff)
+			return diff;
+	}
+	return 0;
+}
+
 static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, struct ref_array_item *b)
 {
 	struct atom_value *va, *vb;
@@ -2389,10 +2454,30 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
 	} else if (s->sort_flags & REF_SORTING_VERSION) {
 		cmp = versioncmp(va->s, vb->s);
 	} else if (cmp_type == FIELD_STR) {
-		int (*cmp_fn)(const char *, const char *);
-		cmp_fn = s->sort_flags & REF_SORTING_ICASE
-			? strcasecmp : strcmp;
-		cmp = cmp_fn(va->s, vb->s);
+		if (va->s_size == ATOM_VALUE_S_SIZE_INIT &&
+		    vb->s_size == ATOM_VALUE_S_SIZE_INIT) {
+			int (*cmp_fn)(const char *, const char *);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? strcasecmp : strcmp;
+			cmp = cmp_fn(va->s, vb->s);
+		} else {
+			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(va->s) : va->s_size;
+			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(vb->s) : vb->s_size;
+			int (*cmp_fn)(const void *, const void *, size_t);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? memcasecmp : memcmp;
+
+			cmp = cmp_fn(va->s, vb->s, b_size > a_size ?
+				     a_size : b_size);
+			if (!cmp) {
+				if (a_size > b_size)
+					cmp = 1;
+				else if (a_size < b_size)
+					cmp = -1;
+			}
+		}
 	} else {
 		if (va->value < vb->value)
 			cmp = -1;
@@ -2492,6 +2577,7 @@ int format_ref_array_item(struct ref_array_item *info,
 	}
 	if (format->need_color_reset_at_eol) {
 		struct atom_value resetv;
+		resetv.s_size = ATOM_VALUE_S_SIZE_INIT;
 		resetv.s = GIT_COLOR_RESET;
 		if (append_atom(&resetv, &state, error_buf)) {
 			pop_stack_element(&state.stack);
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9e0214076b4..18554f62d94 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -130,6 +130,8 @@ test_atom head parent:short=10 ''
 test_atom head numparent 0
 test_atom head object ''
 test_atom head type ''
+test_atom head raw "$(git cat-file commit refs/heads/main)
+"
 test_atom head '*objectname' ''
 test_atom head '*objecttype' ''
 test_atom head author 'A U Thor <author@example.com> 1151968724 +0200'
@@ -221,6 +223,15 @@ test_atom tag contents 'Tagging at 1151968727
 '
 test_atom tag HEAD ' '
 
+test_expect_success 'basic atom: refs/tags/testtag *raw' '
+	git cat-file commit refs/tags/testtag^{} >expected &&
+	git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	echo >>expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'Check invalid atoms names are errors' '
 	test_must_fail git for-each-ref --format="%(INVALID)" refs/heads
 '
@@ -686,6 +697,15 @@ test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
 test_atom refs/tags/signed-empty contents "$sig"
 
+test_expect_success GPG 'basic atom: refs/tags/signed-empty raw' '
+	git cat-file tag refs/tags/signed-empty >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-empty >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	echo >>expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short subject:sanitize 'subject-line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
@@ -695,6 +715,15 @@ test_atom refs/tags/signed-short contents:signature "$sig"
 test_atom refs/tags/signed-short contents "subject line
 $sig"
 
+test_expect_success GPG 'basic atom: refs/tags/signed-short raw' '
+	git cat-file tag refs/tags/signed-short >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-short >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	echo >>expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long subject:sanitize 'subject-line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
@@ -708,6 +737,15 @@ test_atom refs/tags/signed-long contents "subject line
 body contents
 $sig"
 
+test_expect_success GPG 'basic atom: refs/tags/signed-long raw' '
+	git cat-file tag refs/tags/signed-long >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-long >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	echo >>expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'set up refs pointing to tree and blob' '
 	git update-ref refs/mytrees/first refs/heads/main^{tree} &&
 	git update-ref refs/myblobs/first refs/heads/main:one
@@ -720,6 +758,16 @@ test_atom refs/mytrees/first contents:body ""
 test_atom refs/mytrees/first contents:signature ""
 test_atom refs/mytrees/first contents ""
 
+test_expect_success 'basic atom: refs/mytrees/first raw' '
+	git cat-file tree refs/mytrees/first >expected &&
+	echo >>expected &&
+	git for-each-ref --format="%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/mytrees/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_atom refs/myblobs/first subject ""
 test_atom refs/myblobs/first contents:subject ""
 test_atom refs/myblobs/first body ""
@@ -727,6 +775,174 @@ test_atom refs/myblobs/first contents:body ""
 test_atom refs/myblobs/first contents:signature ""
 test_atom refs/myblobs/first contents ""
 
+test_expect_success 'basic atom: refs/myblobs/first raw' '
+	git cat-file blob refs/myblobs/first >expected &&
+	echo >>expected &&
+	git for-each-ref --format="%(raw)" refs/myblobs/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/myblobs/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/myblobs/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'set up refs pointing to binary blob' '
+	printf "a\0b\0c" >blob1 &&
+	printf "a\0c\0b" >blob2 &&
+	printf "\0a\0b\0c" >blob3 &&
+	printf "abc" >blob4 &&
+	printf "\0 \0 \0 " >blob5 &&
+	printf "\0 \0a\0 " >blob6 &&
+	printf "  " >blob7 &&
+	>blob8 &&
+	obj=$(git hash-object -w blob1) &&
+	git update-ref refs/myblobs/blob1 "$obj" &&
+	obj=$(git hash-object -w blob2) &&
+	git update-ref refs/myblobs/blob2 "$obj" &&
+	obj=$(git hash-object -w blob3) &&
+	git update-ref refs/myblobs/blob3 "$obj" &&
+	obj=$(git hash-object -w blob4) &&
+	git update-ref refs/myblobs/blob4 "$obj" &&
+	obj=$(git hash-object -w blob5) &&
+	git update-ref refs/myblobs/blob5 "$obj" &&
+	obj=$(git hash-object -w blob6) &&
+	git update-ref refs/myblobs/blob6 "$obj" &&
+	obj=$(git hash-object -w blob7) &&
+	git update-ref refs/myblobs/blob7 "$obj" &&
+	obj=$(git hash-object -w blob8) &&
+	git update-ref refs/myblobs/blob8 "$obj"
+'
+
+test_expect_success 'Verify sorts with raw' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob3
+	refs/myblobs/blob7
+	refs/mytrees/first
+	refs/myblobs/first
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob4
+	refs/heads/main
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'Verify sorts with raw:size' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/first
+	refs/myblobs/blob7
+	refs/heads/main
+	refs/myblobs/blob4
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/mytrees/first
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw:size \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'validate raw atom with %(if:equals)' '
+	cat >expected <<-EOF &&
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	refs/myblobs/blob4
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	EOF
+	git for-each-ref --format="%(if:equals=abc)%(raw)%(then)%(refname)%(else)not equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'validate raw atom with %(if:notequals)' '
+	cat >expected <<-EOF &&
+	refs/heads/ambiguous
+	refs/heads/main
+	refs/heads/newtag
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	equals
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob7
+	refs/myblobs/blob8
+	refs/myblobs/first
+	EOF
+	git for-each-ref --format="%(if:notequals=abc)%(raw)%(then)%(refname)%(else)equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'empty raw refs with %(if)' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob1 not empty
+	refs/myblobs/blob2 not empty
+	refs/myblobs/blob3 not empty
+	refs/myblobs/blob4 not empty
+	refs/myblobs/blob5 not empty
+	refs/myblobs/blob6 not empty
+	refs/myblobs/blob7 empty
+	refs/myblobs/blob8 empty
+	refs/myblobs/first not empty
+	EOF
+	git for-each-ref --format="%(refname) %(if)%(raw)%(then)not empty%(else)empty%(end)" \
+		refs/myblobs/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success '%(raw) with --python must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --python
+'
+
+test_expect_success '%(raw) with --tcl must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --tcl
+'
+
+test_expect_success '%(raw) with --perl must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --perl
+'
+
+test_expect_success '%(raw) with --shell must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --shell
+'
+
+test_expect_success '%(raw) with --shell and --sort=raw must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
+'
+
+test_expect_success '%(raw:size) with --shell' '
+	git for-each-ref --format="%(raw:size)" | while read line
+	do
+		echo "'\''$line'\''" >>expect
+	done &&
+	git for-each-ref --format="%(raw:size)" --shell >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'for-each-ref --format compare with cat-file --batch' '
+	git rev-parse refs/mytrees/first | git cat-file --batch >expected &&
+	git for-each-ref --format="%(objectname) %(objecttype) %(objectsize)
+%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'set up multiple-sort tags' '
 	for when in 100000 200000
 	do
-- 
gitgitgadget



* [PATCH 05/19] [GSOC] ref-filter: --format=%(raw) re-support --perl
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 04/19] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 06/19] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Perl string literals can hold arbitrary binary data, so add
perl_quote_buf_with_len(), which takes an explicit length instead of
stopping at the first '\0'. Use it in quote_formatting() so that
`--format="%(raw)"` can be combined with `--perl` again.
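
A minimal sketch of the idea, not the code in this patch: the
hypothetical helper quote_perl_bytes() below writes into a
caller-supplied buffer instead of a strbuf, but it escapes the same two
characters (single quote and backslash) and walks exactly `len` bytes,
so an embedded '\0' is copied rather than treated as the end of input.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of git): quote `len` bytes of `src` as a
 * Perl single-quoted literal into `out`, which has room for `cap` bytes.
 * Returns the number of bytes written, or 0 if `out` may be too small.
 * The output is not NUL-terminated because it may itself contain NULs.
 */
size_t quote_perl_bytes(char *out, size_t cap, const char *src, size_t len)
{
	size_t n = 0;
	size_t i;

	/* Worst case: every byte needs a backslash, plus the two quotes. */
	if (cap < 2 * len + 2)
		return 0;

	out[n++] = '\'';
	for (i = 0; i < len; i++) {
		/* Only ' and \ are special inside a Perl '...' literal. */
		if (src[i] == '\'' || src[i] == '\\')
			out[n++] = '\\';
		out[n++] = src[i];
	}
	out[n++] = '\'';
	return n;
}
```

For the 5-byte input "a\0b\0c", a strlen()-based quoter would see only
one byte, while this helper emits all five bytes between the quotes.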

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-for-each-ref.txt |  4 ++--
 quote.c                            | 17 +++++++++++++++++
 quote.h                            |  1 +
 ref-filter.c                       | 15 +++++++++++----
 t/t6300-for-each-ref.sh            | 19 +++++++++++++++++--
 5 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index cbb6f87d13f..6da899c6296 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -241,8 +241,8 @@ raw:size::
 	The raw data size of the object.
 
 Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
-`--perl` because such language may not support arbitrary binary data in their
-string variable type.
+because such languages may not support arbitrary binary data in their string
+variable type.
 
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
diff --git a/quote.c b/quote.c
index 8a3a5e39eb1..26719d21d1e 100644
--- a/quote.c
+++ b/quote.c
@@ -471,6 +471,23 @@ void perl_quote_buf(struct strbuf *sb, const char *src)
 	strbuf_addch(sb, sq);
 }
 
+void perl_quote_buf_with_len(struct strbuf *sb, const char *src, size_t len)
+{
+	const char sq = '\'';
+	const char bq = '\\';
+	const char *c = src;
+	const char *end = src + len;
+
+	strbuf_addch(sb, sq);
+	while (c != end) {
+		if (*c == sq || *c == bq)
+			strbuf_addch(sb, bq);
+		strbuf_addch(sb, *c);
+		c++;
+	}
+	strbuf_addch(sb, sq);
+}
+
 void python_quote_buf(struct strbuf *sb, const char *src)
 {
 	const char sq = '\'';
diff --git a/quote.h b/quote.h
index 768cc6338e2..0fe69e264b0 100644
--- a/quote.h
+++ b/quote.h
@@ -94,6 +94,7 @@ char *quote_path(const char *in, const char *prefix, struct strbuf *out, unsigne
 
 /* quoting as a string literal for other languages */
 void perl_quote_buf(struct strbuf *sb, const char *src);
+void perl_quote_buf_with_len(struct strbuf *sb, const char *src, size_t len);
 void python_quote_buf(struct strbuf *sb, const char *src);
 void tcl_quote_buf(struct strbuf *sb, const char *src);
 void basic_regex_quote_buf(struct strbuf *sb, const char *src);
diff --git a/ref-filter.c b/ref-filter.c
index 506fbc3d691..ba9ab35d7ec 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -742,7 +742,10 @@ static void quote_formatting(struct strbuf *s, const char *str, size_t len, int
 		sq_quote_buf(s, str);
 		break;
 	case QUOTE_PERL:
-		perl_quote_buf(s, str);
+		if (len != ATOM_VALUE_S_SIZE_INIT)
+			perl_quote_buf_with_len(s, str, len);
+		else
+			perl_quote_buf(s, str);
 		break;
 	case QUOTE_PYTHON:
 		python_quote_buf(s, str);
@@ -1006,10 +1009,14 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
-		if (format->quote_style && used_atom[at].atom_type == ATOM_RAW &&
-		    used_atom[at].u.raw_data.option == RAW_BARE)
+
+		if ((format->quote_style == QUOTE_PYTHON ||
+		     format->quote_style == QUOTE_SHELL ||
+		     format->quote_style == QUOTE_TCL) &&
+		     used_atom[at].atom_type == ATOM_RAW &&
+		     used_atom[at].u.raw_data.option == RAW_BARE)
 			die(_("--format=%.*s cannot be used with"
-			      "--python, --shell, --tcl, --perl"), (int)(ep - sp - 2), sp + 2);
+			      " --python, --shell, --tcl"), (int)(ep - sp - 2), sp + 2);
 		cp = ep + 1;
 
 		if (skip_prefix(used_atom[at].name, "color:", &color))
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 18554f62d94..3d15d0a5360 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -915,8 +915,23 @@ test_expect_success '%(raw) with --tcl must fail' '
 	test_must_fail git for-each-ref --format="%(raw)" --tcl
 '
 
-test_expect_success '%(raw) with --perl must fail' '
-	test_must_fail git for-each-ref --format="%(raw)" --perl
+test_expect_success '%(raw) with --perl' '
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/myblobs/blob1 --perl | perl >actual &&
+	cmp blob1 actual &&
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/myblobs/blob3 --perl | perl >actual &&
+	cmp blob3 actual &&
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/myblobs/blob8 --perl | perl >actual &&
+	cmp blob8 actual &&
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/myblobs/first --perl | perl >actual &&
+	cmp one actual &&
+	git cat-file tree refs/mytrees/first >expected &&
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/mytrees/first --perl | perl >actual &&
+	cmp expected actual
 '
 
 test_expect_success '%(raw) with --shell must fail' '
-- 
gitgitgadget



* [PATCH 06/19] [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 05/19] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 07/19] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Drop the const qualifier from the ref_format parameter of the
*_atom_parser() functions and their callers, so that an atom parser
can modify members of ref_format.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/tag.c |  2 +-
 ref-filter.c  | 44 ++++++++++++++++++++++----------------------
 ref-filter.h  |  4 ++--
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 82fcfc09824..452558ec957 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -146,7 +146,7 @@ static int verify_tag(const char *name, const char *ref,
 		      const struct object_id *oid, void *cb_data)
 {
 	int flags;
-	const struct ref_format *format = cb_data;
+	struct ref_format *format = cb_data;
 	flags = GPG_VERIFY_VERBOSE;
 
 	if (format->format)
diff --git a/ref-filter.c b/ref-filter.c
index ba9ab35d7ec..c8e561a3687 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -226,7 +226,7 @@ static int strbuf_addf_ret(struct strbuf *sb, int ret, const char *fmt, ...)
 	return ret;
 }
 
-static int color_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int color_atom_parser(struct ref_format *format, struct used_atom *atom,
 			     const char *color_value, struct strbuf *err)
 {
 	if (!color_value)
@@ -264,7 +264,7 @@ static int refname_atom_parser_internal(struct refname_atom *atom, const char *a
 	return 0;
 }
 
-static int remote_ref_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int remote_ref_atom_parser(struct ref_format *format, struct used_atom *atom,
 				  const char *arg, struct strbuf *err)
 {
 	struct string_list params = STRING_LIST_INIT_DUP;
@@ -311,7 +311,7 @@ static int remote_ref_atom_parser(const struct ref_format *format, struct used_a
 	return 0;
 }
 
-static int objecttype_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int objecttype_atom_parser(struct ref_format *format, struct used_atom *atom,
 				  const char *arg, struct strbuf *err)
 {
 	if (arg)
@@ -323,7 +323,7 @@ static int objecttype_atom_parser(const struct ref_format *format, struct used_a
 	return 0;
 }
 
-static int objectsize_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int objectsize_atom_parser(struct ref_format *format, struct used_atom *atom,
 				  const char *arg, struct strbuf *err)
 {
 	if (!arg) {
@@ -343,7 +343,7 @@ static int objectsize_atom_parser(const struct ref_format *format, struct used_a
 	return 0;
 }
 
-static int deltabase_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int deltabase_atom_parser(struct ref_format *format, struct used_atom *atom,
 				 const char *arg, struct strbuf *err)
 {
 	if (arg)
@@ -355,7 +355,7 @@ static int deltabase_atom_parser(const struct ref_format *format, struct used_at
 	return 0;
 }
 
-static int body_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int body_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *err)
 {
 	if (arg)
@@ -364,7 +364,7 @@ static int body_atom_parser(const struct ref_format *format, struct used_atom *a
 	return 0;
 }
 
-static int subject_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int subject_atom_parser(struct ref_format *format, struct used_atom *atom,
 			       const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -376,7 +376,7 @@ static int subject_atom_parser(const struct ref_format *format, struct used_atom
 	return 0;
 }
 
-static int trailers_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int trailers_atom_parser(struct ref_format *format, struct used_atom *atom,
 				const char *arg, struct strbuf *err)
 {
 	atom->u.contents.trailer_opts.no_divider = 1;
@@ -402,7 +402,7 @@ static int trailers_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
-static int contents_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int contents_atom_parser(struct ref_format *format, struct used_atom *atom,
 				const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -430,7 +430,7 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
-static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int raw_atom_parser(struct ref_format *format, struct used_atom *atom,
 				const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -442,7 +442,7 @@ static int raw_atom_parser(const struct ref_format *format, struct used_atom *at
 	return 0;
 }
 
-static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int oid_atom_parser(struct ref_format *format, struct used_atom *atom,
 			   const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -461,7 +461,7 @@ static int oid_atom_parser(const struct ref_format *format, struct used_atom *at
 	return 0;
 }
 
-static int person_email_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int person_email_atom_parser(struct ref_format *format, struct used_atom *atom,
 				    const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -475,7 +475,7 @@ static int person_email_atom_parser(const struct ref_format *format, struct used
 	return 0;
 }
 
-static int refname_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int refname_atom_parser(struct ref_format *format, struct used_atom *atom,
 			       const char *arg, struct strbuf *err)
 {
 	return refname_atom_parser_internal(&atom->u.refname, arg, atom->name, err);
@@ -492,7 +492,7 @@ static align_type parse_align_position(const char *s)
 	return -1;
 }
 
-static int align_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int align_atom_parser(struct ref_format *format, struct used_atom *atom,
 			     const char *arg, struct strbuf *err)
 {
 	struct align *align = &atom->u.align;
@@ -544,7 +544,7 @@ static int align_atom_parser(const struct ref_format *format, struct used_atom *
 	return 0;
 }
 
-static int if_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int if_atom_parser(struct ref_format *format, struct used_atom *atom,
 			  const char *arg, struct strbuf *err)
 {
 	if (!arg) {
@@ -559,7 +559,7 @@ static int if_atom_parser(const struct ref_format *format, struct used_atom *ato
 	return 0;
 }
 
-static int head_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int head_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *unused_err)
 {
 	atom->u.head = resolve_refdup("HEAD", RESOLVE_REF_READING, NULL, NULL);
@@ -570,7 +570,7 @@ static struct {
 	const char *name;
 	info_source source;
 	cmp_type cmp_type;
-	int (*parser)(const struct ref_format *format, struct used_atom *atom,
+	int (*parser)(struct ref_format *format, struct used_atom *atom,
 		      const char *arg, struct strbuf *err);
 } valid_atom[] = {
 	[ATOM_REFNAME] = { "refname", SOURCE_NONE, FIELD_STR, refname_atom_parser },
@@ -649,7 +649,7 @@ struct atom_value {
 /*
  * Used to parse format string and sort specifiers
  */
-static int parse_ref_filter_atom(const struct ref_format *format,
+static int parse_ref_filter_atom(struct ref_format *format,
 				 const char *atom, const char *ep,
 				 struct strbuf *err)
 {
@@ -2554,9 +2554,9 @@ static void append_literal(const char *cp, const char *ep, struct ref_formatting
 }
 
 int format_ref_array_item(struct ref_array_item *info,
-			   const struct ref_format *format,
-			   struct strbuf *final_buf,
-			   struct strbuf *error_buf)
+			  struct ref_format *format,
+			  struct strbuf *final_buf,
+			  struct strbuf *error_buf)
 {
 	const char *cp, *sp, *ep;
 	struct ref_formatting_state state = REF_FORMATTING_STATE_INIT;
@@ -2601,7 +2601,7 @@ int format_ref_array_item(struct ref_array_item *info,
 }
 
 void pretty_print_ref(const char *name, const struct object_id *oid,
-		      const struct ref_format *format)
+		      struct ref_format *format)
 {
 	struct ref_array_item *ref_item;
 	struct strbuf output = STRBUF_INIT;
diff --git a/ref-filter.h b/ref-filter.h
index baf72a71896..74fb423fc89 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -116,7 +116,7 @@ void ref_array_sort(struct ref_sorting *sort, struct ref_array *array);
 void ref_sorting_set_sort_flags_all(struct ref_sorting *sorting, unsigned int mask, int on);
 /*  Based on the given format and quote_style, fill the strbuf */
 int format_ref_array_item(struct ref_array_item *info,
-			  const struct ref_format *format,
+			  struct ref_format *format,
 			  struct strbuf *final_buf,
 			  struct strbuf *error_buf);
 /*  Parse a single sort specifier and add it to the list */
@@ -137,7 +137,7 @@ void setup_ref_filter_porcelain_msg(void);
  * name must be a fully qualified refname.
  */
 void pretty_print_ref(const char *name, const struct object_id *oid,
-		      const struct ref_format *format);
+		      struct ref_format *format);
 
 /*
  * Push a single ref onto the array; this can be used to construct your own
-- 
gitgitgadget



* [PATCH 07/19] [GSOC] ref-filter: add %(rest) atom
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 06/19] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 08/19] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

To let "cat-file --batch=%(rest)" use the ref-filter interface, add
a %(rest) atom to ref-filter. "git for-each-ref", "git branch",
"git tag" and "git verify-tag" reject %(rest) by default.
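
The interplay between the parser and the verification step can be
sketched roughly as follows. This is an illustration under stated
assumptions, not the code in this patch: struct fmt_sketch stands in
for struct ref_format, and the `allow_rest` parameter is hypothetical
(the patch itself rejects %(rest) unconditionally in
verify_ref_format()).

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct ref_format; only the fields used here. */
struct fmt_sketch {
	const char *format;
	int use_rest;
};

/* Sketch of an atom parser: %(rest) takes no argument and marks the format. */
int rest_parser_sketch(struct fmt_sketch *format, const char *arg)
{
	if (arg)
		return -1;	/* "%(rest) does not take arguments" */
	format->use_rest = 1;	/* needs a non-const format pointer */
	return 0;
}

/*
 * Sketch of a verify step: walk every "%(...)" atom and refuse %(rest)
 * unless the (hypothetical) caller opted in via allow_rest.
 * Returns 0 if the format is acceptable, -1 otherwise.
 */
int verify_sketch(struct fmt_sketch *format, int allow_rest)
{
	const char *cp = format->format;
	const char *sp;

	while ((sp = strstr(cp, "%(")) != NULL) {
		const char *name = sp + 2;
		const char *ep = strchr(name, ')');

		if (!ep)
			return -1;	/* unterminated atom */
		if ((size_t)(ep - name) == 4 && !strncmp(name, "rest", 4)) {
			if (rest_parser_sketch(format, NULL) || !allow_rest)
				return -1;
		} else if (!strncmp(name, "rest:", 5)) {
			/* Any argument after the colon is an error. */
			return rest_parser_sketch(format, name + 5);
		}
		cp = ep + 1;
	}
	return 0;
}
```

With allow_rest set to 0 this mirrors what for-each-ref, branch, tag
and verify-tag are meant to do: the parser still records use_rest, but
the format as a whole is refused.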

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c             | 21 +++++++++++++++++++++
 ref-filter.h             |  5 ++++-
 t/t3203-branch-output.sh |  4 ++++
 t/t6300-for-each-ref.sh  |  4 ++++
 t/t7004-tag.sh           |  4 ++++
 t/t7030-verify-tag.sh    |  4 ++++
 6 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index c8e561a3687..ee6c906f22d 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -157,6 +157,7 @@ enum atom_type {
 	ATOM_IF,
 	ATOM_THEN,
 	ATOM_ELSE,
+	ATOM_REST,
 };
 
 /*
@@ -559,6 +560,15 @@ static int if_atom_parser(struct ref_format *format, struct used_atom *atom,
 	return 0;
 }
 
+static int rest_atom_parser(struct ref_format *format, struct used_atom *atom,
+			    const char *arg, struct strbuf *err)
+{
+	if (arg)
+		return strbuf_addf_ret(err, -1, _("%%(rest) does not take arguments"));
+	format->use_rest = 1;
+	return 0;
+}
+
 static int head_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *unused_err)
 {
@@ -615,6 +625,7 @@ static struct {
 	[ATOM_IF] = { "if", SOURCE_NONE, FIELD_STR, if_atom_parser },
 	[ATOM_THEN] = { "then", SOURCE_NONE },
 	[ATOM_ELSE] = { "else", SOURCE_NONE },
+	[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
 	/*
 	 * Please update $__git_ref_fieldlist in git-completion.bash
 	 * when you add new atoms
@@ -1010,6 +1021,9 @@ int verify_ref_format(struct ref_format *format)
 		if (at < 0)
 			die("%s", err.buf);
 
+		if (used_atom[at].atom_type == ATOM_REST)
+			die("this command rejects atom %%(%.*s)", (int)(ep - sp - 2), sp + 2);
+
 		if ((format->quote_style == QUOTE_PYTHON ||
 		     format->quote_style == QUOTE_SHELL ||
 		     format->quote_style == QUOTE_TCL) &&
@@ -1928,6 +1942,12 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 			v->handler = else_atom_handler;
 			v->s = xstrdup("");
 			continue;
+		} else if (atom_type == ATOM_REST) {
+			if (ref->rest)
+				v->s = xstrdup(ref->rest);
+			else
+				v->s = xstrdup("");
+			continue;
 		} else
 			continue;
 
@@ -2145,6 +2165,7 @@ static struct ref_array_item *new_ref_array_item(const char *refname,
 
 	FLEX_ALLOC_STR(ref, refname, refname);
 	oidcpy(&ref->objectname, oid);
+	ref->rest = NULL;
 
 	return ref;
 }
diff --git a/ref-filter.h b/ref-filter.h
index 74fb423fc89..c15dee8d6b9 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -38,6 +38,7 @@ struct ref_sorting {
 
 struct ref_array_item {
 	struct object_id objectname;
+	const char *rest;
 	int flag;
 	unsigned int kind;
 	const char *symref;
@@ -76,14 +77,16 @@ struct ref_format {
 	 * verify_ref_format() afterwards to finalize.
 	 */
 	const char *format;
+	const char *rest;
 	int quote_style;
+	int use_rest;
 	int use_color;
 
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
 };
 
-#define REF_FORMAT_INIT { NULL, 0, -1 }
+#define REF_FORMAT_INIT { .use_color = -1 }
 
 /*  Macros for checking --merged and --no-merged options */
 #define _OPT_MERGED_NO_MERGED(option, filter, h) \
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 5325b9f67a0..6e94c6db7b5 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -340,6 +340,10 @@ test_expect_success 'git branch --format option' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git branch with --format=%(rest) must fail' '
+	test_must_fail git branch --format="%(rest)" >actual
+'
+
 test_expect_success 'worktree colors correct' '
 	cat >expect <<-EOF &&
 	* <GREEN>(HEAD detached from fromtag)<RESET>
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 3d15d0a5360..0d2e062f791 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1211,6 +1211,10 @@ test_expect_success 'basic atom: head contents:trailers' '
 	test_cmp expect actual.clean
 '
 
+test_expect_success 'basic atom: rest must fail' '
+	test_must_fail git for-each-ref --format="%(rest)" refs/heads/main
+'
+
 test_expect_success 'trailer parsing not fooled by --- line' '
 	git commit --allow-empty -F - <<-\EOF &&
 	this is the subject
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 2f72c5c6883..082be85dffc 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -1998,6 +1998,10 @@ test_expect_success '--format should list tags as per format given' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git tag -l with --format="%(rest)" must fail' '
+	test_must_fail git tag -l --format="%(rest)" "v1*"
+'
+
 test_expect_success "set up color tests" '
 	echo "<RED>v1.0<RESET>" >expect.color &&
 	echo "v1.0" >expect.bare &&
diff --git a/t/t7030-verify-tag.sh b/t/t7030-verify-tag.sh
index 3cefde9602b..10faa645157 100755
--- a/t/t7030-verify-tag.sh
+++ b/t/t7030-verify-tag.sh
@@ -194,6 +194,10 @@ test_expect_success GPG 'verifying tag with --format' '
 	test_cmp expect actual
 '
 
+test_expect_success GPG 'verifying tag with --format="%(rest)" must fail' '
+	test_must_fail git verify-tag --format="%(rest)" "fourth-signed"
+'
+
 test_expect_success GPG 'verifying a forged tag with --format should fail silently' '
 	test_must_fail git verify-tag --format="tagname : %(tag)" $(cat forged1.tag) >actual-forged &&
 	test_must_be_empty actual-forged
-- 
gitgitgadget



* [PATCH 08/19] [GSOC] ref-filter: pass get_object() return value to their callers
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 07/19] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 09/19] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

The upcoming refactor of `git cat-file --batch` will rely on
oid_object_info_extended() in get_object() to look up an object's
info by its oid. When the object cannot be found in the repository,
`cat-file --batch` is expected to print "<oid> missing" and move on
to the next oid instead of making Git exit. On any other error, Git
should still exit as usual. Make this possible by passing the return
value of get_object() up to its callers.
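
A rough sketch of why the exact return value matters. Everything here
is hypothetical: the SKETCH_* codes, the integer "ids" and the tiny
lookup table only stand in for get_object() and real object ids; the
actual return values are not defined by this patch.

```c
#include <stddef.h>

/* Hypothetical status codes, chosen only for this illustration. */
#define SKETCH_OK        0
#define SKETCH_FATAL   (-1)
#define SKETCH_MISSING (-2)

static const int known_ids[] = { 1, 2, 3 };

/* Stand-in for get_object(): distinguishes "missing" from other errors. */
int get_object_sketch(int id)
{
	size_t i;

	if (id < 0)
		return SKETCH_FATAL;	/* stand-in for a hard failure */
	for (i = 0; i < sizeof(known_ids) / sizeof(known_ids[0]); i++)
		if (known_ids[i] == id)
			return SKETCH_OK;
	return SKETCH_MISSING;
}

/* Stand-in for populate_value(): forwards the code instead of a flat -1. */
int populate_sketch(int id)
{
	int ret = get_object_sketch(id);

	if (ret)
		return ret;
	return 0;
}

/*
 * Batch-style caller: counts missing ids and keeps going, but stops
 * and reports any other error to its own caller.
 */
int batch_sketch(const int *ids, size_t n, int *missing)
{
	size_t i;

	*missing = 0;
	for (i = 0; i < n; i++) {
		int ret = populate_sketch(ids[i]);

		if (ret == SKETCH_MISSING) {
			(*missing)++;	/* would print "<oid> missing" */
			continue;
		}
		if (ret)
			return ret;
	}
	return 0;
}
```

If populate_sketch() collapsed every failure to -1, batch_sketch()
could not tell a missing object from a hard error.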

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index ee6c906f22d..3189872188a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1817,6 +1817,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 {
 	struct object *obj;
 	int i;
+	int ret;
 	struct object_info empty = OBJECT_INFO_INIT;
 
 	CALLOC_ARRAY(ref->value, used_atom_cnt);
@@ -1973,8 +1974,9 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 
 
 	oi.oid = ref->objectname;
-	if (get_object(ref, 0, &obj, &oi, err))
-		return -1;
+	ret = get_object(ref, 0, &obj, &oi, err);
+	if (ret)
+		return ret;
 
 	/*
 	 * If there is no atom that wants to know about tagged
@@ -2006,8 +2008,10 @@ static int get_ref_atom_value(struct ref_array_item *ref, int atom,
 			      struct atom_value **v, struct strbuf *err)
 {
 	if (!ref->value) {
-		if (populate_value(ref, err))
-			return -1;
+		int ret = populate_value(ref, err);
+
+		if (ret)
+			return ret;
 		fill_missing_values(ref->value);
 	}
 	*v = &ref->value[atom];
@@ -2581,6 +2585,7 @@ int format_ref_array_item(struct ref_array_item *info,
 {
 	const char *cp, *sp, *ep;
 	struct ref_formatting_state state = REF_FORMATTING_STATE_INIT;
+	int ret;
 
 	state.quote_style = format->quote_style;
 	push_stack_element(&state.stack);
@@ -2593,10 +2598,10 @@ int format_ref_array_item(struct ref_array_item *info,
 		if (cp < sp)
 			append_literal(cp, sp, &state);
 		pos = parse_ref_filter_atom(format, sp + 2, ep, error_buf);
-		if (pos < 0 || get_ref_atom_value(info, pos, &atomv, error_buf) ||
+		if (pos < 0 || (ret = get_ref_atom_value(info, pos, &atomv, error_buf)) ||
 		    atomv->handler(atomv, &state, error_buf)) {
 			pop_stack_element(&state.stack);
-			return -1;
+			return ret ? ret : -1;
 		}
 	}
 	if (*cp) {
-- 
gitgitgadget



* [PATCH 09/19] [GSOC] ref-filter: introduce free_ref_array_item_value() function
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (7 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 08/19] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 10/19] [GSOC] ref-filter: introduce reject_atom() ZheNing Hu via GitGitGadget
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

free_array_item() frees the ref_array_item itself together with its
"symref" member, so it cannot be used on a ref_array_item that is not
dynamically allocated when only the memory of its "value" member
needs to be released after use.

Introduce free_ref_array_item_value(), which frees only the "value"
member. free_array_item() now calls it internally, and it will later
help `cat-file --batch` free a ref_array_item's value memory.
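
The split described above can be illustrated outside of Git. The following is
only a minimal sketch: the structures are simplified stand-ins for the
ref-filter ones, `fill_demo_values()` is a hypothetical helper that exists only
for this example, and resetting `value` to NULL is an addition of the sketch
that the patch itself does not make.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the ref-filter structures. */
struct atom_value {
	char *s;
};

struct ref_array_item {
	char *symref;
	struct atom_value *value;
};

/* Stand-in for ref-filter's file-scope used_atom_cnt. */
static int used_atom_cnt = 2;

/* Free only the "value" array; safe for items that live on the stack. */
void free_ref_array_item_value(struct ref_array_item *item)
{
	assert(item);
	if (item->value) {
		int i;
		for (i = 0; i < used_atom_cnt; i++)
			free(item->value[i].s);
		free(item->value);
		item->value = NULL; /* sketch only: makes a second call harmless */
	}
}

/* Free a heap-allocated item completely, reusing the helper above. */
void free_array_item(struct ref_array_item *item)
{
	free(item->symref);
	free_ref_array_item_value(item);
	free(item);
}

/* Hypothetical helper: give every used atom a heap-allocated string. */
void fill_demo_values(struct ref_array_item *item)
{
	int i;

	item->value = calloc(used_atom_cnt, sizeof(*item->value));
	assert(item->value);
	for (i = 0; i < used_atom_cnt; i++) {
		item->value[i].s = malloc(5);
		assert(item->value[i].s);
		memcpy(item->value[i].s, "demo", 5);
	}
}
```

A caller that keeps the item on its stack would call only
free_ref_array_item_value(), whereas ref_array_clear()-style code that owns
heap-allocated items would keep going through free_array_item().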

Helped-by: Junio C Hamano <gitster@pobox.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 11 ++++++++---
 ref-filter.h |  2 ++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 3189872188a..80b09fce1d5 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2292,16 +2292,21 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 	return 0;
 }
 
-/*  Free memory allocated for a ref_array_item */
-static void free_array_item(struct ref_array_item *item)
+void free_ref_array_item_value(struct ref_array_item *item)
 {
-	free((char *)item->symref);
 	if (item->value) {
 		int i;
 		for (i = 0; i < used_atom_cnt; i++)
 			free((char *)item->value[i].s);
 		free(item->value);
 	}
+}
+
+/*  Free memory allocated for a ref_array_item */
+static void free_array_item(struct ref_array_item *item)
+{
+	free((char *)item->symref);
+	free_ref_array_item_value(item);
 	free(item);
 }
 
diff --git a/ref-filter.h b/ref-filter.h
index c15dee8d6b9..44e6dc05ac2 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -111,6 +111,8 @@ struct ref_format {
 int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type);
 /*  Clear all memory allocated to ref_array */
 void ref_array_clear(struct ref_array *array);
+/* Free ref_array_item's value */
+void free_ref_array_item_value(struct ref_array_item *item);
 /*  Used to verify if the given format is correct and to parse out the used atoms */
 int verify_ref_format(struct ref_format *format);
 /*  Sort the given ref_array as per the ref_sorting provided */
-- 
gitgitgadget


* [PATCH 10/19] [GSOC] ref-filter: introduce reject_atom()
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (8 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 09/19] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 11/19] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Add a `cat_file_mode` member to struct `ref_format` and introduce
the function `reject_atom()`. Once `cat-file --batch` uses the
ref-filter logic, this lets verify_ref_format() reject the atoms
that cat-file cannot use, e.g. `%(refname)`, `%(push)`,
`%(upstream)`..., as well as the atom `%(rest)`, which the
for-each-ref family cannot use.
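
The rule can be tried in isolation with the sketch below. It assumes a reduced
`enum atom_type` that lists only the atoms relevant here (the real enum in
ref-filter.c has many more members); `ATOM_OBJECTNAME` stands for any atom both
commands accept, and `ATOM_TYPE_COUNT` is a sentinel added for the example.

```c
#include <assert.h>

/* Reduced subset of ref-filter's atom types, for illustration only. */
enum atom_type {
	ATOM_REFNAME,
	ATOM_OBJECTNAME,
	ATOM_UPSTREAM,
	ATOM_PUSH,
	ATOM_SYMREF,
	ATOM_FLAG,
	ATOM_HEAD,
	ATOM_WORKTREEPATH,
	ATOM_REST,
	ATOM_TYPE_COUNT /* sentinel added for the sketch, not a real atom */
};

/* Return 1 if the atom must be rejected for the calling command. */
int reject_atom(int cat_file_mode, enum atom_type atom_type)
{
	assert((int)atom_type < ATOM_TYPE_COUNT);

	/* for-each-ref family: only %(rest) is off limits */
	if (!cat_file_mode)
		return atom_type == ATOM_REST;

	/* cat-file: atoms that need a ref are off limits */
	switch (atom_type) {
	case ATOM_FLAG:
	case ATOM_HEAD:
	case ATOM_PUSH:
	case ATOM_REFNAME:
	case ATOM_SYMREF:
	case ATOM_UPSTREAM:
	case ATOM_WORKTREEPATH:
		return 1;
	default:
		return 0;
	}
}
```

With this shape, `reject_atom(0, ATOM_REST)` and `reject_atom(1, ATOM_REFNAME)`
both return 1, while `reject_atom(1, ATOM_REST)` returns 0, so one flag is
enough to select the rule set per command.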

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 25 ++++++++++++++++++++++---
 ref-filter.h |  1 +
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 80b09fce1d5..27199ba40f5 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1000,6 +1000,26 @@ static const char *find_next(const char *cp)
 	return NULL;
 }
 
+static int reject_atom(int cat_file_mode, enum atom_type atom_type)
+{
+	if (!cat_file_mode)
+		return atom_type == ATOM_REST;
+
+	/* cat_file_mode */
+	switch (atom_type) {
+	case ATOM_FLAG:
+	case ATOM_HEAD:
+	case ATOM_PUSH:
+	case ATOM_REFNAME:
+	case ATOM_SYMREF:
+	case ATOM_UPSTREAM:
+	case ATOM_WORKTREEPATH:
+		return 1;
+	default:
+		return 0;
+	}
+}
+
 /*
  * Make sure the format string is well formed, and parse out
  * the used atoms.
@@ -1020,9 +1040,8 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
-
-		if (used_atom[at].atom_type == ATOM_REST)
-			die("this command reject atom %%(%.*s)", (int)(ep - sp - 2), sp + 2);
+		if (reject_atom(format->cat_file_mode, used_atom[at].atom_type))
+			die(_("this command reject atom %%(%.*s)"), (int)(ep - sp - 2), sp + 2);
 
 		if ((format->quote_style == QUOTE_PYTHON ||
 		     format->quote_style == QUOTE_SHELL ||
diff --git a/ref-filter.h b/ref-filter.h
index 44e6dc05ac2..053980a6a42 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -78,6 +78,7 @@ struct ref_format {
 	 */
 	const char *format;
 	const char *rest;
+	int cat_file_mode;
 	int quote_style;
 	int use_rest;
 	int use_color;
-- 
gitgitgadget


* [PATCH 11/19] [GSOC] ref-filter: modify the error message and value in get_object
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (9 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 10/19] [GSOC] ref-filter: introduce reject_atom() ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 12/19] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Let get_object() return 1 and report "<oid> missing" instead
of returning -1 and reporting "missing object <oid> for <refname>"
if oid_object_info_extended() is unable to find the data corresponding
to the oid. When `cat-file --batch` uses the ref-filter logic later,
this can help `format_ref_array_item()` just report that the object
is missing without letting Git exit.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c                   | 4 ++--
 t/t6301-for-each-ref-errors.sh | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 27199ba40f5..b4f41fec871 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1762,8 +1762,8 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 	}
 	if (oid_object_info_extended(the_repository, &oi->oid, &oi->info,
 				     OBJECT_INFO_LOOKUP_REPLACE))
-		return strbuf_addf_ret(err, -1, _("missing object %s for %s"),
-				       oid_to_hex(&oi->oid), ref->refname);
+		return strbuf_addf_ret(err, 1, _("%s missing"),
+				       oid_to_hex(&oi->oid));
 	if (oi->info.disk_sizep && oi->disk_size < 0)
 		BUG("Object size is less than zero.");
 
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
index 40edf9dab53..3553f84a00c 100755
--- a/t/t6301-for-each-ref-errors.sh
+++ b/t/t6301-for-each-ref-errors.sh
@@ -41,7 +41,7 @@ test_expect_success 'Missing objects are reported correctly' '
 	r=refs/heads/missing &&
 	echo $MISSING >.git/$r &&
 	test_when_finished "rm -f .git/$r" &&
-	echo "fatal: missing object $MISSING for $r" >missing-err &&
+	echo "fatal: $MISSING missing" >missing-err &&
 	test_must_fail git for-each-ref 2>err &&
 	test_cmp missing-err err &&
 	(
-- 
gitgitgadget


* [PATCH 12/19] [GSOC] cat-file: add has_object_file() check
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (10 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 11/19] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 13/19] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Use `has_object_file()` in `batch_one_object()` to check
whether the input object exists. This can help us reject
the missing oid when we let `cat-file --batch` use ref-filter
logic later.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 243fe6844bc..59a86412fd0 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -428,6 +428,13 @@ static void batch_one_object(const char *obj_name,
 		return;
 	}
 
+	if (!has_object_file(&data->oid)) {
+		printf("%s missing\n",
+		       obj_name ? obj_name : oid_to_hex(&data->oid));
+		fflush(stdout);
+		return;
+	}
+
 	batch_object_write(obj_name, scratch, opt, data);
 }
 
-- 
gitgitgadget


* [PATCH 13/19] [GSOC] cat-file: change batch_objects parameter name
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (11 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 12/19] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

A later patch that makes cat-file reuse the ref-filter logic will add
a "const struct option *options" parameter to batch_objects(). Two
near-synonymous parameters, "opt" and "options", may confuse readers,
so rename the "struct batch_options *" parameter of batch_objects()
from "opt" to "batch".

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 59a86412fd0..41d407638d5 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -495,7 +495,7 @@ static int batch_unordered_packed(const struct object_id *oid,
 	return batch_unordered_object(oid, data);
 }
 
-static int batch_objects(struct batch_options *opt)
+static int batch_objects(struct batch_options *batch)
 {
 	struct strbuf input = STRBUF_INIT;
 	struct strbuf output = STRBUF_INIT;
@@ -503,8 +503,8 @@ static int batch_objects(struct batch_options *opt)
 	int save_warning;
 	int retval = 0;
 
-	if (!opt->format)
-		opt->format = "%(objectname) %(objecttype) %(objectsize)";
+	if (!batch->format)
+		batch->format = "%(objectname) %(objecttype) %(objectsize)";
 
 	/*
 	 * Expand once with our special mark_query flag, which will prime the
@@ -513,20 +513,20 @@ static int batch_objects(struct batch_options *opt)
 	 */
 	memset(&data, 0, sizeof(data));
 	data.mark_query = 1;
-	strbuf_expand(&output, opt->format, expand_format, &data);
+	strbuf_expand(&output, batch->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (batch->cmdmode)
 		data.split_on_whitespace = 1;
 
 	/*
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (batch->print_contents)
 		data.info.typep = &data.type;
 
-	if (opt->all_objects) {
+	if (batch->all_objects) {
 		struct object_cb_data cb;
 		struct object_info empty = OBJECT_INFO_INIT;
 
@@ -536,11 +536,11 @@ static int batch_objects(struct batch_options *opt)
 		if (has_promisor_remote())
 			warning("This repository uses promisor remotes. Some objects may not be loaded.");
 
-		cb.opt = opt;
+		cb.opt = batch;
 		cb.expand = &data;
 		cb.scratch = &output;
 
-		if (opt->unordered) {
+		if (batch->unordered) {
 			struct oidset seen = OIDSET_INIT;
 
 			cb.seen = &seen;
@@ -590,7 +590,7 @@ static int batch_objects(struct batch_options *opt)
 			data.rest = p;
 		}
 
-		batch_one_object(input.buf, &output, opt, &data);
+		batch_one_object(input.buf, &output, batch, &data);
 	}
 
 	strbuf_release(&input);
-- 
gitgitgadget


* [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (12 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 13/19] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 13:17   ` Christian Couder
  2021-07-12 11:46 ` [PATCH 15/19] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In order to let cat-file use ref-filter logic, let's do the
following:

1. Change the type of member `format` in struct `batch_options`
to `struct ref_format`; we will pass it to ref-filter later.
2. Let `batch_objects()` add atoms to the format, and use
`verify_ref_format()` to check the atoms.
3. Use `format_ref_array_item()` in `batch_object_write()` to
get the formatted data corresponding to the object. If the
return value of `format_ref_array_item()` is equal to zero,
use `batch_write()` to print the object data; if the return
value is less than zero, use `die()` to print the error message
and exit; if the return value is greater than zero, only print
the error message, but don't exit.
4. Use free_ref_array_item_value() to free ref_array_item's
value.
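
The three-way handling of the return value in step 3 can be sketched without
any Git internals. Everything below is hypothetical illustration code:
`classify_format_result()` and `handle_format_result()` are names made up for
this example, plain stdio stands in for `batch_write()`, and `exit(128)` stands
in for `die()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum write_action {
	WRITE_OUTPUT, /* ret == 0: formatted data is valid */
	REPORT_ONLY,  /* ret > 0: e.g. "<oid> missing", keep going */
	FATAL         /* ret < 0: unrecoverable error */
};

/* Map the integer return convention onto an explicit action. */
enum write_action classify_format_result(int ret)
{
	if (ret < 0)
		return FATAL;
	if (ret > 0)
		return REPORT_ONLY;
	return WRITE_OUTPUT;
}

/*
 * Act on the result of a formatting call. Returns 0 if the formatted
 * line was written and 1 if only the error message was reported; it
 * does not return on a fatal error.
 */
int handle_format_result(int ret, const char *formatted,
			 const char *err, FILE *out)
{
	assert(formatted && err && out);

	switch (classify_format_result(ret)) {
	case FATAL:
		fprintf(stderr, "fatal: %s\n", err);
		exit(128); /* stand-in for die() */
	case REPORT_ONLY:
		fprintf(out, "%s\n", err);
		fflush(out);
		return 1;
	default:
		fprintf(out, "%s\n", formatted);
		return 0;
	}
}
```

Keeping "missing object" as a positive, non-fatal value is what lets a batch
loop move on to the next input line instead of terminating the whole process.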

Most of the atoms in `for-each-ref --format` are now supported,
such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
`%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
`%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
`%(flag)`, `%(HEAD)`. These atoms only make sense for objects that
are pointed to by a ref. The for-each-ref family can naturally use
them, but not every object is pointed to by a ref, so cat-file
cannot use them.

The run time of `git cat-file --batch-all-objects --batch-check`
on the Git repository itself, measured with the benchmarking tool
`hyperfine`, changes from 669.4 ms ± 31.1 ms to 1.134 s ± 0.063 s.

The run time of `git cat-file --batch-all-objects --batch
>/dev/null` on the Git repository itself, measured with `time`,
changes from "27.37s user 0.29s system 98% cpu 28.089 total" to
"33.69s user 1.54s system 87% cpu 40.258 total".

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-cat-file.txt |   6 +
 builtin/cat-file.c             | 242 ++++++-------------------------
 t/t1006-cat-file.sh            | 251 +++++++++++++++++++++++++++++++++
 3 files changed, 304 insertions(+), 195 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 4eb0421b3fd..ef8ab952b2f 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -226,6 +226,12 @@ newline. The available atoms are:
 	after that first run of whitespace (i.e., the "rest" of the
 	line) are output in place of the `%(rest)` atom.
 
+Note that most of the atoms in `for-each-ref --format` are now supported,
+such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
+`%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
+`%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
+`%(flag)`, `%(HEAD)`. See linkgit:git-for-each-ref[1].
+
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 41d407638d5..5b163551fc6 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "ref-filter.h"
 
 struct batch_options {
 	int enabled;
@@ -25,7 +26,7 @@ struct batch_options {
 	int all_objects;
 	int unordered;
 	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
-	const char *format;
+	struct ref_format format;
 };
 
 static const char *force_path;
@@ -195,99 +196,10 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 struct expand_data {
 	struct object_id oid;
-	enum object_type type;
-	unsigned long size;
-	off_t disk_size;
 	const char *rest;
-	struct object_id delta_base_oid;
-
-	/*
-	 * If mark_query is true, we do not expand anything, but rather
-	 * just mark the object_info with items we wish to query.
-	 */
-	int mark_query;
-
-	/*
-	 * Whether to split the input on whitespace before feeding it to
-	 * get_sha1; this is decided during the mark_query phase based on
-	 * whether we have a %(rest) token in our format.
-	 */
 	int split_on_whitespace;
-
-	/*
-	 * After a mark_query run, this object_info is set up to be
-	 * passed to oid_object_info_extended. It will point to the data
-	 * elements above, so you can retrieve the response from there.
-	 */
-	struct object_info info;
-
-	/*
-	 * This flag will be true if the requested batch format and options
-	 * don't require us to call oid_object_info, which can then be
-	 * optimized out.
-	 */
-	unsigned skip_object_info : 1;
 };
 
-static int is_atom(const char *atom, const char *s, int slen)
-{
-	int alen = strlen(atom);
-	return alen == slen && !memcmp(atom, s, alen);
-}
-
-static void expand_atom(struct strbuf *sb, const char *atom, int len,
-			void *vdata)
-{
-	struct expand_data *data = vdata;
-
-	if (is_atom("objectname", atom, len)) {
-		if (!data->mark_query)
-			strbuf_addstr(sb, oid_to_hex(&data->oid));
-	} else if (is_atom("objecttype", atom, len)) {
-		if (data->mark_query)
-			data->info.typep = &data->type;
-		else
-			strbuf_addstr(sb, type_name(data->type));
-	} else if (is_atom("objectsize", atom, len)) {
-		if (data->mark_query)
-			data->info.sizep = &data->size;
-		else
-			strbuf_addf(sb, "%"PRIuMAX , (uintmax_t)data->size);
-	} else if (is_atom("objectsize:disk", atom, len)) {
-		if (data->mark_query)
-			data->info.disk_sizep = &data->disk_size;
-		else
-			strbuf_addf(sb, "%"PRIuMAX, (uintmax_t)data->disk_size);
-	} else if (is_atom("rest", atom, len)) {
-		if (data->mark_query)
-			data->split_on_whitespace = 1;
-		else if (data->rest)
-			strbuf_addstr(sb, data->rest);
-	} else if (is_atom("deltabase", atom, len)) {
-		if (data->mark_query)
-			data->info.delta_base_oid = &data->delta_base_oid;
-		else
-			strbuf_addstr(sb,
-				      oid_to_hex(&data->delta_base_oid));
-	} else
-		die("unknown format element: %.*s", len, atom);
-}
-
-static size_t expand_format(struct strbuf *sb, const char *start, void *data)
-{
-	const char *end;
-
-	if (*start != '(')
-		return 0;
-	end = strchr(start + 1, ')');
-	if (!end)
-		die("format element '%s' does not end in ')'", start);
-
-	expand_atom(sb, start + 1, end - start - 1, data);
-
-	return end - start + 1;
-}
-
 static void batch_write(struct batch_options *opt, const void *data, int len)
 {
 	if (opt->buffer_output) {
@@ -297,87 +209,34 @@ static void batch_write(struct batch_options *opt, const void *data, int len)
 		write_or_die(1, data, len);
 }
 
-static void print_object_or_die(struct batch_options *opt, struct expand_data *data)
-{
-	const struct object_id *oid = &data->oid;
-
-	assert(data->info.typep);
-
-	if (data->type == OBJ_BLOB) {
-		if (opt->buffer_output)
-			fflush(stdout);
-		if (opt->cmdmode) {
-			char *contents;
-			unsigned long size;
-
-			if (!data->rest)
-				die("missing path for '%s'", oid_to_hex(oid));
-
-			if (opt->cmdmode == 'w') {
-				if (filter_object(data->rest, 0100644, oid,
-						  &contents, &size))
-					die("could not convert '%s' %s",
-					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
-				enum object_type type;
-				if (!textconv_object(the_repository,
-						     data->rest, 0100644, oid,
-						     1, &contents, &size))
-					contents = read_object_file(oid,
-								    &type,
-								    &size);
-				if (!contents)
-					die("could not convert '%s' %s",
-					    oid_to_hex(oid), data->rest);
-			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
-			batch_write(opt, contents, size);
-			free(contents);
-		} else {
-			stream_blob(oid);
-		}
-	}
-	else {
-		enum object_type type;
-		unsigned long size;
-		void *contents;
-
-		contents = read_object_file(oid, &type, &size);
-		if (!contents)
-			die("object %s disappeared", oid_to_hex(oid));
-		if (type != data->type)
-			die("object %s changed type!?", oid_to_hex(oid));
-		if (data->info.sizep && size != data->size)
-			die("object %s changed size!?", oid_to_hex(oid));
-
-		batch_write(opt, contents, size);
-		free(contents);
-	}
-}
 
 static void batch_object_write(const char *obj_name,
 			       struct strbuf *scratch,
 			       struct batch_options *opt,
 			       struct expand_data *data)
 {
-	if (!data->skip_object_info &&
-	    oid_object_info_extended(the_repository, &data->oid, &data->info,
-				     OBJECT_INFO_LOOKUP_REPLACE) < 0) {
-		printf("%s missing\n",
-		       obj_name ? obj_name : oid_to_hex(&data->oid));
-		fflush(stdout);
-		return;
-	}
+	int ret;
+	struct strbuf err = STRBUF_INIT;
+	struct ref_array_item item = { data->oid, data->rest };
 
 	strbuf_reset(scratch);
-	strbuf_expand(scratch, opt->format, expand_format, data);
-	strbuf_addch(scratch, '\n');
-	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
-		print_object_or_die(opt, data);
-		batch_write(opt, "\n", 1);
+	ret = format_ref_array_item(&item, &opt->format, scratch, &err);
+	if (ret < 0)
+		die("%s\n", err.buf);
+	if (ret) {
+		/* ret > 0 means when the object corresponding to oid
+		 * cannot be found in format_ref_array_item(), we only print
+		 * the error message.
+		 */
+		printf("%s\n", err.buf);
+		fflush(stdout);
+	} else {
+		strbuf_addch(scratch, '\n');
+		batch_write(opt, scratch->buf, scratch->len);
 	}
+	free_ref_array_item_value(&item);
+	strbuf_release(&err);
 }
 
 static void batch_one_object(const char *obj_name,
@@ -495,43 +354,37 @@ static int batch_unordered_packed(const struct object_id *oid,
 	return batch_unordered_object(oid, data);
 }
 
-static int batch_objects(struct batch_options *batch)
+static const char * const cat_file_usage[] = {
+	N_("git cat-file (-t [--allow-unknown-type] | -s [--allow-unknown-type] | -e | -p | <type> | --textconv | --filters) [--path=<path>] <object>"),
+	N_("git cat-file (--batch[=<format>] | --batch-check[=<format>]) [--follow-symlinks] [--textconv | --filters]"),
+	NULL
+};
+
+static int batch_objects(struct batch_options *batch, const struct option *options)
 {
 	struct strbuf input = STRBUF_INIT;
 	struct strbuf output = STRBUF_INIT;
+	struct strbuf format = STRBUF_INIT;
 	struct expand_data data;
 	int save_warning;
 	int retval = 0;
 
-	if (!batch->format)
-		batch->format = "%(objectname) %(objecttype) %(objectsize)";
-
-	/*
-	 * Expand once with our special mark_query flag, which will prime the
-	 * object_info to be handed to oid_object_info_extended for each
-	 * object.
-	 */
 	memset(&data, 0, sizeof(data));
-	data.mark_query = 1;
-	strbuf_expand(&output, batch->format, expand_format, &data);
-	data.mark_query = 0;
-	strbuf_release(&output);
-	if (batch->cmdmode)
-		data.split_on_whitespace = 1;
-
-	/*
-	 * If we are printing out the object, then always fill in the type,
-	 * since we will want to decide whether or not to stream.
-	 */
+	if (batch->format.format)
+		strbuf_addstr(&format, batch->format.format);
+	else
+		strbuf_addstr(&format, "%(objectname) %(objecttype) %(objectsize)");
 	if (batch->print_contents)
-		data.info.typep = &data.type;
+		strbuf_addstr(&format, "\n%(raw)");
+	batch->format.format = format.buf;
+	if (verify_ref_format(&batch->format))
+		usage_with_options(cat_file_usage, options);
+
+	if (batch->cmdmode || batch->format.use_rest)
+		data.split_on_whitespace = 1;
 
 	if (batch->all_objects) {
 		struct object_cb_data cb;
-		struct object_info empty = OBJECT_INFO_INIT;
-
-		if (!memcmp(&data.info, &empty, sizeof(empty)))
-			data.skip_object_info = 1;
 
 		if (has_promisor_remote())
 			warning("This repository uses promisor remotes. Some objects may not be loaded.");
@@ -561,6 +414,7 @@ static int batch_objects(struct batch_options *batch)
 			oid_array_clear(&sa);
 		}
 
+		strbuf_release(&format);
 		strbuf_release(&output);
 		return 0;
 	}
@@ -593,18 +447,13 @@ static int batch_objects(struct batch_options *batch)
 		batch_one_object(input.buf, &output, batch, &data);
 	}
 
+	strbuf_release(&format);
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
 	return retval;
 }
 
-static const char * const cat_file_usage[] = {
-	N_("git cat-file (-t [--allow-unknown-type] | -s [--allow-unknown-type] | -e | -p | <type> | --textconv | --filters) [--path=<path>] <object>"),
-	N_("git cat-file (--batch[=<format>] | --batch-check[=<format>]) [--follow-symlinks] [--textconv | --filters]"),
-	NULL
-};
-
 static int git_cat_file_config(const char *var, const char *value, void *cb)
 {
 	if (userdiff_config(var, value) < 0)
@@ -627,7 +476,7 @@ static int batch_option_callback(const struct option *opt,
 
 	bo->enabled = 1;
 	bo->print_contents = !strcmp(opt->long_name, "batch");
-	bo->format = arg;
+	bo->format.format = arg;
 
 	return 0;
 }
@@ -636,7 +485,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
 	int opt = 0;
 	const char *exp_type = NULL, *obj_name = NULL;
-	struct batch_options batch = {0};
+	struct batch_options batch = {
+		.format = REF_FORMAT_INIT
+	};
 	int unknown_type = 0;
 
 	const struct option options[] = {
@@ -675,6 +526,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	git_config(git_cat_file_config, NULL);
 
 	batch.buffer_output = -1;
+	batch.format.cat_file_mode = 1;
 	argc = parse_options(argc, argv, prefix, options, cat_file_usage, 0);
 
 	if (opt) {
@@ -718,7 +570,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		batch.buffer_output = batch.all_objects;
 
 	if (batch.enabled)
-		return batch_objects(&batch);
+		return batch_objects(&batch, options);
 
 	if (unknown_type && opt != 't' && opt != 's')
 		die("git cat-file --allow-unknown-type: use with -s or -t");
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..7452404f24a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -607,5 +607,256 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
 	git -C all-two cat-file --batch-all-objects --batch="batman" >actual &&
 	cmp expect actual
 '
+. "$TEST_DIRECTORY"/lib-gpg.sh
+. "$TEST_DIRECTORY"/lib-terminal.sh
+
+test_expect_success 'cat-file --batch|--batch-check setup' '
+	echo 1>blob1 &&
+	printf "a\0b\0\c" >blob2 &&
+	git add blob1 blob2 &&
+	git commit -m "Commit Message" &&
+	git branch -M main &&
+	git tag -a -m "v0.0.0" testtag &&
+	git update-ref refs/myblobs/blob1 HEAD:blob1 &&
+	git update-ref refs/myblobs/blob2 HEAD:blob2 &&
+	git update-ref refs/mytrees/tree1 HEAD^{tree}
+'
+
+batch_test_atom() {
+	if test "$3" = "fail"
+	then
+		test_expect_${4:-success} $PREREQ "basic atom: $1 $2 must fail" "
+			test_must_fail git cat-file --batch-check='$2' >bad <<-EOF
+			$1
+			EOF
+		"
+	else
+		test_expect_${4:-success} $PREREQ "basic atom: $1 $2" "
+			git for-each-ref --format='$2' $1 >expected &&
+			git cat-file --batch-check='$2' >actual <<-EOF &&
+			$1
+			EOF
+			sanitize_pgp <actual >actual.clean &&
+			cmp expected actual.clean
+		"
+	fi
+}
+
+batch_test_atom refs/heads/main '%(refname)' fail
+batch_test_atom refs/heads/main '%(refname:)' fail
+batch_test_atom refs/heads/main '%(refname:short)' fail
+batch_test_atom refs/heads/main '%(refname:lstrip=1)' fail
+batch_test_atom refs/heads/main '%(refname:lstrip=2)' fail
+batch_test_atom refs/heads/main '%(refname:lstrip=-1)' fail
+batch_test_atom refs/heads/main '%(refname:lstrip=-2)' fail
+batch_test_atom refs/heads/main '%(refname:rstrip=1)' fail
+batch_test_atom refs/heads/main '%(refname:rstrip=2)' fail
+batch_test_atom refs/heads/main '%(refname:rstrip=-1)' fail
+batch_test_atom refs/heads/main '%(refname:rstrip=-2)' fail
+batch_test_atom refs/heads/main '%(refname:strip=1)' fail
+batch_test_atom refs/heads/main '%(refname:strip=2)' fail
+batch_test_atom refs/heads/main '%(refname:strip=-1)' fail
+batch_test_atom refs/heads/main '%(refname:strip=-2)' fail
+batch_test_atom refs/heads/main '%(upstream)' fail
+batch_test_atom refs/heads/main '%(upstream:short)' fail
+batch_test_atom refs/heads/main '%(upstream:lstrip=2)' fail
+batch_test_atom refs/heads/main '%(upstream:lstrip=-2)' fail
+batch_test_atom refs/heads/main '%(upstream:rstrip=2)' fail
+batch_test_atom refs/heads/main '%(upstream:rstrip=-2)' fail
+batch_test_atom refs/heads/main '%(upstream:strip=2)' fail
+batch_test_atom refs/heads/main '%(upstream:strip=-2)' fail
+batch_test_atom refs/heads/main '%(push)' fail
+batch_test_atom refs/heads/main '%(push:short)' fail
+batch_test_atom refs/heads/main '%(push:lstrip=1)' fail
+batch_test_atom refs/heads/main '%(push:lstrip=-1)' fail
+batch_test_atom refs/heads/main '%(push:rstrip=1)' fail
+batch_test_atom refs/heads/main '%(push:rstrip=-1)' fail
+batch_test_atom refs/heads/main '%(push:strip=1)' fail
+batch_test_atom refs/heads/main '%(push:strip=-1)' fail
+batch_test_atom refs/heads/main '%(objecttype)'
+batch_test_atom refs/heads/main '%(objectsize)'
+batch_test_atom refs/heads/main '%(objectsize:disk)'
+batch_test_atom refs/heads/main '%(deltabase)'
+batch_test_atom refs/heads/main '%(objectname)'
+batch_test_atom refs/heads/main '%(objectname:short)'
+batch_test_atom refs/heads/main '%(objectname:short=1)'
+batch_test_atom refs/heads/main '%(objectname:short=10)'
+batch_test_atom refs/heads/main '%(tree)'
+batch_test_atom refs/heads/main '%(tree:short)'
+batch_test_atom refs/heads/main '%(tree:short=1)'
+batch_test_atom refs/heads/main '%(tree:short=10)'
+batch_test_atom refs/heads/main '%(parent)'
+batch_test_atom refs/heads/main '%(parent:short)'
+batch_test_atom refs/heads/main '%(parent:short=1)'
+batch_test_atom refs/heads/main '%(parent:short=10)'
+batch_test_atom refs/heads/main '%(numparent)'
+batch_test_atom refs/heads/main '%(object)'
+batch_test_atom refs/heads/main '%(type)'
+batch_test_atom refs/heads/main '%(raw)'
+batch_test_atom refs/heads/main '%(*objectname)'
+batch_test_atom refs/heads/main '%(*objecttype)'
+batch_test_atom refs/heads/main '%(author)'
+batch_test_atom refs/heads/main '%(authorname)'
+batch_test_atom refs/heads/main '%(authoremail)'
+batch_test_atom refs/heads/main '%(authoremail:trim)'
+batch_test_atom refs/heads/main '%(authoremail:localpart)'
+batch_test_atom refs/heads/main '%(authordate)'
+batch_test_atom refs/heads/main '%(committer)'
+batch_test_atom refs/heads/main '%(committername)'
+batch_test_atom refs/heads/main '%(committeremail)'
+batch_test_atom refs/heads/main '%(committeremail:trim)'
+batch_test_atom refs/heads/main '%(committeremail:localpart)'
+batch_test_atom refs/heads/main '%(committerdate)'
+batch_test_atom refs/heads/main '%(tag)'
+batch_test_atom refs/heads/main '%(tagger)'
+batch_test_atom refs/heads/main '%(taggername)'
+batch_test_atom refs/heads/main '%(taggeremail)'
+batch_test_atom refs/heads/main '%(taggeremail:trim)'
+batch_test_atom refs/heads/main '%(taggeremail:localpart)'
+batch_test_atom refs/heads/main '%(taggerdate)'
+batch_test_atom refs/heads/main '%(creator)'
+batch_test_atom refs/heads/main '%(creatordate)'
+batch_test_atom refs/heads/main '%(subject)'
+batch_test_atom refs/heads/main '%(subject:sanitize)'
+batch_test_atom refs/heads/main '%(contents:subject)'
+batch_test_atom refs/heads/main '%(body)'
+batch_test_atom refs/heads/main '%(contents:body)'
+batch_test_atom refs/heads/main '%(contents:signature)'
+batch_test_atom refs/heads/main '%(contents)'
+batch_test_atom refs/heads/main '%(HEAD)' fail
+batch_test_atom refs/heads/main '%(upstream:track)' fail
+batch_test_atom refs/heads/main '%(upstream:trackshort)' fail
+batch_test_atom refs/heads/main '%(upstream:track,nobracket)' fail
+batch_test_atom refs/heads/main '%(upstream:nobracket,track)' fail
+batch_test_atom refs/heads/main '%(push:track)' fail
+batch_test_atom refs/heads/main '%(push:trackshort)' fail
+batch_test_atom refs/heads/main '%(worktreepath)' fail
+batch_test_atom refs/heads/main '%(symref)' fail
+batch_test_atom refs/heads/main '%(flag)' fail
+
+batch_test_atom refs/tags/testtag '%(refname)' fail
+batch_test_atom refs/tags/testtag '%(refname:short)' fail
+batch_test_atom refs/tags/testtag '%(upstream)' fail
+batch_test_atom refs/tags/testtag '%(push)' fail
+batch_test_atom refs/tags/testtag '%(objecttype)'
+batch_test_atom refs/tags/testtag '%(objectsize)'
+batch_test_atom refs/tags/testtag '%(objectsize:disk)'
+batch_test_atom refs/tags/testtag '%(*objectsize:disk)'
+batch_test_atom refs/tags/testtag '%(deltabase)'
+batch_test_atom refs/tags/testtag '%(*deltabase)'
+batch_test_atom refs/tags/testtag '%(objectname)'
+batch_test_atom refs/tags/testtag '%(objectname:short)'
+batch_test_atom refs/tags/testtag '%(tree)'
+batch_test_atom refs/tags/testtag '%(tree:short)'
+batch_test_atom refs/tags/testtag '%(tree:short=1)'
+batch_test_atom refs/tags/testtag '%(tree:short=10)'
+batch_test_atom refs/tags/testtag '%(parent)'
+batch_test_atom refs/tags/testtag '%(parent:short)'
+batch_test_atom refs/tags/testtag '%(parent:short=1)'
+batch_test_atom refs/tags/testtag '%(parent:short=10)'
+batch_test_atom refs/tags/testtag '%(numparent)'
+batch_test_atom refs/tags/testtag '%(object)'
+batch_test_atom refs/tags/testtag '%(type)'
+batch_test_atom refs/tags/testtag '%(*objectname)'
+batch_test_atom refs/tags/testtag '%(*objecttype)'
+batch_test_atom refs/tags/testtag '%(author)'
+batch_test_atom refs/tags/testtag '%(authorname)'
+batch_test_atom refs/tags/testtag '%(authoremail)'
+batch_test_atom refs/tags/testtag '%(authoremail:trim)'
+batch_test_atom refs/tags/testtag '%(authoremail:localpart)'
+batch_test_atom refs/tags/testtag '%(authordate)'
+batch_test_atom refs/tags/testtag '%(committer)'
+batch_test_atom refs/tags/testtag '%(committername)'
+batch_test_atom refs/tags/testtag '%(committeremail)'
+batch_test_atom refs/tags/testtag '%(committeremail:trim)'
+batch_test_atom refs/tags/testtag '%(committeremail:localpart)'
+batch_test_atom refs/tags/testtag '%(committerdate)'
+batch_test_atom refs/tags/testtag '%(tag)'
+batch_test_atom refs/tags/testtag '%(tagger)'
+batch_test_atom refs/tags/testtag '%(taggername)'
+batch_test_atom refs/tags/testtag '%(taggeremail)'
+batch_test_atom refs/tags/testtag '%(taggeremail:trim)'
+batch_test_atom refs/tags/testtag '%(taggeremail:localpart)'
+batch_test_atom refs/tags/testtag '%(taggerdate)'
+batch_test_atom refs/tags/testtag '%(creator)'
+batch_test_atom refs/tags/testtag '%(creatordate)'
+batch_test_atom refs/tags/testtag '%(subject)'
+batch_test_atom refs/tags/testtag '%(subject:sanitize)'
+batch_test_atom refs/tags/testtag '%(contents:subject)'
+batch_test_atom refs/tags/testtag '%(body)'
+batch_test_atom refs/tags/testtag '%(contents:body)'
+batch_test_atom refs/tags/testtag '%(contents:signature)'
+batch_test_atom refs/tags/testtag '%(contents)'
+batch_test_atom refs/tags/testtag '%(HEAD)' fail
+
+batch_test_atom refs/myblobs/blob1 '%(refname)' fail
+batch_test_atom refs/myblobs/blob1 '%(upstream)' fail
+batch_test_atom refs/myblobs/blob1 '%(push)' fail
+batch_test_atom refs/myblobs/blob1 '%(HEAD)' fail
+
+batch_test_atom refs/myblobs/blob1 '%(objectname)'
+batch_test_atom refs/myblobs/blob1 '%(objecttype)'
+batch_test_atom refs/myblobs/blob1 '%(objectsize)'
+batch_test_atom refs/myblobs/blob1 '%(objectsize:disk)'
+batch_test_atom refs/myblobs/blob1 '%(deltabase)'
+
+batch_test_atom refs/myblobs/blob1 '%(contents)'
+batch_test_atom refs/myblobs/blob2 '%(contents)'
+
+batch_test_atom refs/myblobs/blob1 '%(raw)'
+batch_test_atom refs/myblobs/blob2 '%(raw)'
+batch_test_atom refs/mytrees/tree1 '%(raw)'
+
+batch_test_atom refs/myblobs/blob1 '%(raw:size)'
+batch_test_atom refs/myblobs/blob2 '%(raw:size)'
+batch_test_atom refs/mytrees/tree1 '%(raw:size)'
+
+batch_test_atom refs/myblobs/blob1 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)'
+batch_test_atom refs/myblobs/blob2 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)'
+batch_test_atom refs/mytrees/tree1 '%(if:equals=tree)%(objecttype)%(then)tree%(else)not tree%(end)'
+
+batch_test_atom refs/heads/main '%(align:60) objectname is %(objectname)%(end)|%(objectname)'
+batch_test_atom refs/heads/main '%(align:left,60) objectname is %(objectname)%(end)|%(objectname)'
+batch_test_atom refs/heads/main '%(align:middle,60) objectname is %(objectname)%(end)|%(objectname)'
+batch_test_atom refs/heads/main '%(align:60,right) objectname is %(objectname)%(end)|%(objectname)'
+
+batch_test_atom refs/heads/main 'VALID'
+batch_test_atom refs/heads/main '%(INVALID)' fail
+batch_test_atom refs/heads/main '%(authordate:INVALID)' fail
+
+test_expect_success '%(rest) works with both a branch and a tag' '
+	cat >expected <<-EOF &&
+	123 commit 123
+	456 tag 456
+	EOF
+	git cat-file --batch-check="%(rest) %(objecttype) %(rest)" >actual <<-EOF &&
+	refs/heads/main 123
+	refs/tags/testtag 456
+	EOF
+	test_cmp expected actual
+'
+
+batch_test_atom refs/heads/main '%(objectname) %(objecttype) %(objectsize)
+%(raw)'
+batch_test_atom refs/tags/testtag '%(objectname) %(objecttype) %(objectsize)
+%(raw)'
+batch_test_atom refs/myblobs/blob1 '%(objectname) %(objecttype) %(objectsize)
+%(raw)'
+batch_test_atom refs/myblobs/blob2 '%(objectname) %(objecttype) %(objectsize)
+%(raw)'
+
+
+test_expect_success 'cat-file --batch equals to --batch-check with atoms' '
+	git cat-file --batch-check="%(objectname) %(objecttype) %(objectsize)
+%(raw)" >expected <<-EOF &&
+	refs/heads/main
+	refs/tags/testtag
+	EOF
+	git cat-file --batch >actual <<-EOF &&
+	refs/heads/main
+	refs/tags/testtag
+	EOF
+	cmp expected actual
+'
 
 test_done
-- 
gitgitgadget



* [PATCH 15/19] [GSOC] cat-file: reuse err buf in batch_object_write()
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (13 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 16/19] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Reuse the `err` buffer in batch_object_write(), as is already
done for the `scratch` buffer. This avoids initializing and
releasing the `err` buffer for every object that is written.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)
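
As a note for reviewers, the allocation pattern this patch relies on can
be illustrated outside of git. The sketch below is only an assumption-laden
model: `struct buf`, `buf_reset()`, `buf_release()`, `buf_set()`,
`write_one()` and `run_reused()` are hypothetical stand-ins, not git's
strbuf API. It shows that resetting the length keeps the memory, so a
buffer owned by the caller is allocated once however many objects are
written.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf; not the real API. */
struct buf {
	char *data;
	size_t len, alloc;
	unsigned allocations; /* how often we had to (re)allocate */
};

/* Forget the contents but keep the memory, in the spirit of strbuf_reset(). */
static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->data)
		b->data[0] = '\0';
}

/* Give the memory back, in the spirit of strbuf_release(). */
static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->alloc = 0;
}

/* Replace the contents with `s`, growing only when needed. */
static void buf_set(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (n + 1 > b->alloc) {
		char *p = realloc(b->data, n + 1);
		assert(p != NULL);
		b->data = p;
		b->alloc = n + 1;
		b->allocations++;
	}
	memcpy(b->data, s, n + 1);
	b->len = n;
}

/* Callee shaped like the patched batch_object_write(): reset, then fill. */
static void write_one(struct buf *err, const char *msg)
{
	buf_reset(err);
	buf_set(err, msg);
}

/* Caller owns the buffer; returns the allocations needed for `rounds` calls. */
unsigned run_reused(unsigned rounds)
{
	struct buf err = { NULL, 0, 0, 0 };
	unsigned i, n;

	for (i = 0; i < rounds; i++)
		write_one(&err, "object missing");
	n = err.allocations;
	buf_release(&err);
	return n;
}
```

With this model, `run_reused(1000)` performs a single allocation, whereas
a buffer declared and released inside `write_one()` would allocate on
every call that produces a message.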

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5b163551fc6..dc604a9879d 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -212,35 +212,36 @@ static void batch_write(struct batch_options *opt, const void *data, int len)
 
 static void batch_object_write(const char *obj_name,
 			       struct strbuf *scratch,
+			       struct strbuf *err,
 			       struct batch_options *opt,
 			       struct expand_data *data)
 {
 	int ret;
-	struct strbuf err = STRBUF_INIT;
 	struct ref_array_item item = { data->oid, data->rest };
 
 	strbuf_reset(scratch);
+	strbuf_reset(err);
 
-	ret = format_ref_array_item(&item, &opt->format, scratch, &err);
+	ret = format_ref_array_item(&item, &opt->format, scratch, err);
 	if (ret < 0)
-		die("%s\n", err.buf);
+		die("%s\n", err->buf);
 	if (ret) {
 		/* ret > 0 means when the object corresponding to oid
 		 * cannot be found in format_ref_array_item(), we only print
 		 * the error message.
 		 */
-		printf("%s\n", err.buf);
+		printf("%s\n", err->buf);
 		fflush(stdout);
 	} else {
 		strbuf_addch(scratch, '\n');
 		batch_write(opt, scratch->buf, scratch->len);
 	}
 	free_ref_array_item_value(&item);
-	strbuf_release(&err);
 }
 
 static void batch_one_object(const char *obj_name,
 			     struct strbuf *scratch,
+			     struct strbuf *err,
 			     struct batch_options *opt,
 			     struct expand_data *data)
 {
@@ -294,7 +295,7 @@ static void batch_one_object(const char *obj_name,
 		return;
 	}
 
-	batch_object_write(obj_name, scratch, opt, data);
+	batch_object_write(obj_name, scratch, err, opt, data);
 }
 
 struct object_cb_data {
@@ -302,13 +303,14 @@ struct object_cb_data {
 	struct expand_data *expand;
 	struct oidset *seen;
 	struct strbuf *scratch;
+	struct strbuf *err;
 };
 
 static int batch_object_cb(const struct object_id *oid, void *vdata)
 {
 	struct object_cb_data *data = vdata;
 	oidcpy(&data->expand->oid, oid);
-	batch_object_write(NULL, data->scratch, data->opt, data->expand);
+	batch_object_write(NULL, data->scratch, data->err, data->opt, data->expand);
 	return 0;
 }
 
@@ -364,6 +366,7 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 {
 	struct strbuf input = STRBUF_INIT;
 	struct strbuf output = STRBUF_INIT;
+	struct strbuf err = STRBUF_INIT;
 	struct strbuf format = STRBUF_INIT;
 	struct expand_data data;
 	int save_warning;
@@ -392,6 +395,7 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 		cb.opt = batch;
 		cb.expand = &data;
 		cb.scratch = &output;
+		cb.err = &err;
 
 		if (batch->unordered) {
 			struct oidset seen = OIDSET_INIT;
@@ -416,6 +420,7 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 
 		strbuf_release(&format);
 		strbuf_release(&output);
+		strbuf_release(&err);
 		return 0;
 	}
 
@@ -444,12 +449,13 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 			data.rest = p;
 		}
 
-		batch_one_object(input.buf, &output, batch, &data);
+		batch_one_object(input.buf, &output, &err, batch, &data);
 	}
 
 	strbuf_release(&format);
 	strbuf_release(&input);
 	strbuf_release(&output);
+	strbuf_release(&err);
 	warn_on_object_refname_ambiguity = save_warning;
 	return retval;
 }
-- 
gitgitgadget



* [PATCH 16/19] [GSOC] cat-file: re-implement --textconv, --filters options
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (14 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 15/19] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 17/19] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Now that cat-file reuses the ref-filter logic, re-implement the
--textconv and --filters options on top of it.

Add a member `cat_file_cmdmode` to struct `ref_array_item`, so that
the `cmdmode` member of struct `batch_options` is passed to
ref-filter, which then uses it to filter the content of the object
in get_object().

Use `actual_oi` to record the real expand_data: it may point to the
original `oi` or to the `act_oi` processed by `textconv_object()` or
`convert_to_working_tree()`. `grab_values()` grabs the contents of
`actual_oi`, while `grab_common_values()` grabs the contents of the
original `oi`; this ensures that `%(objectsize)` still uses the size
of the unfiltered data.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c |  2 +-
 ref-filter.c       | 35 +++++++++++++++++++++++++++++++++--
 ref-filter.h       |  1 +
 3 files changed, 35 insertions(+), 3 deletions(-)
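
To make the split between `oi` and `actual_oi` easier to review, here is
a minimal sketch of the selection logic in isolation. It is not the code
from this patch: `struct obj_data`, `struct formatted` and `format_blob()`
are hypothetical names, and the `converted` argument stands in for
whatever `textconv_object()` or `convert_to_working_tree()` would produce
(NULL models "no conversion happened").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for ref-filter's expand_data. */
struct obj_data {
	const char *content;
	size_t size;
};

/* What a format would see for one blob. */
struct formatted {
	size_t objectsize;    /* always from the original data */
	const char *contents; /* from the selected (maybe converted) data */
	size_t contents_len;
};

/*
 * cmdmode is 'c' for --textconv, 'w' for --filters, 0 otherwise.
 * `converted` may be NULL when no conversion was applied.
 */
struct formatted format_blob(const struct obj_data *oi, int cmdmode,
			     const struct obj_data *converted)
{
	const struct obj_data *actual_oi = oi;
	struct formatted out;

	assert(oi != NULL);
	if ((cmdmode == 'c' || cmdmode == 'w') && converted)
		actual_oi = converted;

	/* Size is reported for the unfiltered object ... */
	out.objectsize = oi->size;
	/* ... while the contents come from the selected data. */
	out.contents = actual_oi->content;
	out.contents_len = actual_oi->size;
	return out;
}
```

For a two-byte blob whose filtered form is three bytes long, this model
reports a size of 2 together with the three-byte contents, which is the
behavior the commit message describes for `%(objectsize)`.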

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index dc604a9879d..3a6153e778f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -217,7 +217,7 @@ static void batch_object_write(const char *obj_name,
 			       struct expand_data *data)
 {
 	int ret;
-	struct ref_array_item item = { data->oid, data->rest };
+	struct ref_array_item item = { data->oid, data->rest, opt->cmdmode };
 
 	strbuf_reset(scratch);
 	strbuf_reset(err);
diff --git a/ref-filter.c b/ref-filter.c
index b4f41fec871..91e26c9aba3 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1,3 +1,4 @@
+#define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "builtin.h"
 #include "cache.h"
 #include "parse-options.h"
@@ -1755,6 +1756,9 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 {
 	/* parse_object_buffer() will set eaten to 0 if free() will be needed */
 	int eaten = 1;
+	struct expand_data *actual_oi = oi;
+	struct expand_data act_oi = {0};
+
 	if (oi->info.contentp) {
 		/* We need to know that to use parse_object_buffer properly */
 		oi->info.sizep = &oi->size;
@@ -1768,19 +1772,45 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 		BUG("Object size is less than zero.");
 
 	if (oi->info.contentp) {
-		*obj = parse_object_buffer(the_repository, &oi->oid, oi->type, oi->size, oi->content, &eaten);
+		if ((ref->cat_file_cmdmode == 'c' || ref->cat_file_cmdmode == 'w') && !ref->rest)
+			return strbuf_addf_ret(err, -1, _("missing path for '%s'"),
+					       oid_to_hex(&oi->oid));
+		if (oi->type == OBJ_BLOB) {
+			if (ref->cat_file_cmdmode == 'c') {
+				act_oi = *oi;
+				if (textconv_object(the_repository,
+						    ref->rest, 0100644, &act_oi.oid,
+						    1, (char **)(&act_oi.content), &act_oi.size))
+					actual_oi = &act_oi;
+			} else if (ref->cat_file_cmdmode == 'w') {
+				struct strbuf strbuf = STRBUF_INIT;
+				struct checkout_metadata meta;
+				act_oi = *oi;
+
+				init_checkout_metadata(&meta, NULL, NULL, &act_oi.oid);
+				if (!convert_to_working_tree(&the_index, ref->rest, act_oi.content, act_oi.size, &strbuf, &meta))
+					die("could not convert '%s' %s",
+					    oid_to_hex(&oi->oid), ref->rest);
+				act_oi.size = strbuf.len;
+				act_oi.content = strbuf_detach(&strbuf, NULL);
+				actual_oi = &act_oi;
+			}
+		}
+		*obj = parse_object_buffer(the_repository, &actual_oi->oid, actual_oi->type, actual_oi->size, actual_oi->content, &eaten);
 		if (!*obj) {
 			if (!eaten)
 				free(oi->content);
 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
 					       oid_to_hex(&oi->oid), ref->refname);
 		}
-		grab_values(ref->value, deref, *obj, oi);
+		grab_values(ref->value, deref, *obj, actual_oi);
 	}
 
 	grab_common_values(ref->value, deref, oi);
 	if (!eaten)
 		free(oi->content);
+	if (actual_oi != oi)
+		free(actual_oi->content);
 	return 0;
 }
 
@@ -2189,6 +2219,7 @@ static struct ref_array_item *new_ref_array_item(const char *refname,
 	FLEX_ALLOC_STR(ref, refname, refname);
 	oidcpy(&ref->objectname, oid);
 	ref->rest = NULL;
+	ref->cat_file_cmdmode = 0;
 
 	return ref;
 }
diff --git a/ref-filter.h b/ref-filter.h
index 053980a6a42..a93d5e4dd61 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -39,6 +39,7 @@ struct ref_sorting {
 struct ref_array_item {
 	struct object_id objectname;
 	const char *rest;
+	int cat_file_cmdmode;
 	int flag;
 	unsigned int kind;
 	const char *symref;
-- 
gitgitgadget



* [PATCH 17/19] [GSOC] ref-filter: remove grab_oid() function
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (15 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 16/19] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 18/19] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Because "atom_type == ATOM_OBJECTNAME" implies
`starts_with(name, "objectname")`, and "atom_type == ATOM_TREE"
implies `starts_with(name, "tree")`, the check for
`starts_with(name, field)` in grab_oid() is redundant.

So remove grab_oid() from ref-filter to avoid the repeated check.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)
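
The redundancy argument can be checked with a small model. The sketch
below is an assumption, not ref-filter's parser: `parse_atom_type()`,
`matches_old()`, `matches_new()` and `check_equivalent()` are hypothetical
helpers, and only two atoms are modelled. It shows that once the type has
been derived from the name, repeating the prefix test cannot change the
outcome.

```c
#include <assert.h>
#include <string.h>

enum atom_type { ATOM_OBJECTNAME, ATOM_TREE, ATOM_OTHER };

static int starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Skip the '*' that marks a dereferenced atom such as "*objectname". */
static const char *skip_deref(const char *name)
{
	return *name == '*' ? name + 1 : name;
}

/* Hypothetical parser: match the part before ':' against known atoms. */
enum atom_type parse_atom_type(const char *name)
{
	size_t len;

	name = skip_deref(name);
	len = strcspn(name, ":");
	if (len == strlen("objectname") && !strncmp(name, "objectname", len))
		return ATOM_OBJECTNAME;
	if (len == strlen("tree") && !strncmp(name, "tree", len))
		return ATOM_TREE;
	return ATOM_OTHER;
}

/* Old shape: type check plus the prefix check that grab_oid() performed. */
int matches_old(const char *name, enum atom_type want, const char *field)
{
	return parse_atom_type(name) == want &&
	       starts_with(skip_deref(name), field);
}

/* New shape: the type check alone. */
int matches_new(const char *name, enum atom_type want)
{
	return parse_atom_type(name) == want;
}

/* Assert that both shapes agree for one atom name. */
void check_equivalent(const char *name)
{
	assert(matches_old(name, ATOM_TREE, "tree") ==
	       matches_new(name, ATOM_TREE));
	assert(matches_old(name, ATOM_OBJECTNAME, "objectname") ==
	       matches_new(name, ATOM_OBJECTNAME));
}
```

Calling `check_equivalent()` on names such as "tree:short=10",
"*objectname" or "parent" does not trip either assertion in this model.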

diff --git a/ref-filter.c b/ref-filter.c
index 91e26c9aba3..1c7287f1061 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1077,16 +1077,6 @@ static const char *do_grab_oid(const char *field, const struct object_id *oid,
 	}
 }
 
-static int grab_oid(const char *name, const char *field, const struct object_id *oid,
-		    struct atom_value *v, struct used_atom *atom)
-{
-	if (starts_with(name, field)) {
-		v->s = xstrdup(do_grab_oid(field, oid, atom));
-		return 1;
-	}
-	return 0;
-}
-
 /* See grab_values */
 static void grab_common_values(struct atom_value *val, int deref, struct expand_data *oi)
 {
@@ -1112,8 +1102,9 @@ static void grab_common_values(struct atom_value *val, int deref, struct expand_
 			}
 		} else if (atom_type == ATOM_DELTABASE)
 			v->s = xstrdup(oid_to_hex(&oi->delta_base_oid));
-		else if (atom_type == ATOM_OBJECTNAME && deref)
-			grab_oid(name, "objectname", &oi->oid, v, &used_atom[i]);
+		else if (atom_type == ATOM_OBJECTNAME && deref) {
+			v->s = xstrdup(do_grab_oid("objectname", &oi->oid, &used_atom[i]));
+		}
 	}
 }
 
@@ -1154,9 +1145,10 @@ static void grab_commit_values(struct atom_value *val, int deref, struct object
 			continue;
 		if (deref)
 			name++;
-		if (atom_type == ATOM_TREE &&
-		    grab_oid(name, "tree", get_commit_tree_oid(commit), v, &used_atom[i]))
+		if (atom_type == ATOM_TREE) {
+			v->s = xstrdup(do_grab_oid("tree", get_commit_tree_oid(commit), &used_atom[i]));
 			continue;
+		}
 		if (atom_type == ATOM_NUMPARENT) {
 			v->value = commit_list_count(commit->parents);
 			v->s = xstrfmt("%lu", (unsigned long)v->value);
@@ -1959,9 +1951,9 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 				v->s = xstrdup(buf + 1);
 			}
 			continue;
-		} else if (!deref && atom_type == ATOM_OBJECTNAME &&
-			   grab_oid(name, "objectname", &ref->objectname, v, atom)) {
-				continue;
+		} else if (!deref && atom_type == ATOM_OBJECTNAME) {
+			v->s = xstrdup(do_grab_oid("objectname", &ref->objectname, atom));
+			continue;
 		} else if (atom_type == ATOM_HEAD) {
 			if (atom->u.head && !strcmp(ref->refname, atom->u.head))
 				v->s = xstrdup("*");
-- 
gitgitgadget



* [PATCH 18/19] [GSOC] cat-file: create p1006-cat-file.sh
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (16 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 17/19] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 11:46 ` [PATCH 19/19] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Create p1006-cat-file.sh to provide performance testing for
`git cat-file --batch` and `git cat-file --batch-check`. This
will help us compare the performance changes after we let
cat-file reuse the ref-filter logic.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 t/perf/p1006-cat-file.sh | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100755 t/perf/p1006-cat-file.sh

diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh
new file mode 100755
index 00000000000..b84ac31f9cc
--- /dev/null
+++ b/t/perf/p1006-cat-file.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='Basic cat-file performance tests'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'setup' '
+	git rev-list --all >rla
+'
+
+test_perf 'cat-file --batch-check' '
+	git cat-file --batch-check <rla
+'
+
+test_perf 'cat-file --batch-check with atoms' '
+	git cat-file --batch-check="%(objectname) %(objecttype)" <rla
+'
+
+test_perf 'cat-file --batch' '
+	git cat-file --batch <rla
+'
+
+test_perf 'cat-file --batch with atoms' '
+	git cat-file --batch="%(objectname) %(objecttype)" <rla
+'
+
+test_done
-- 
gitgitgadget



* [PATCH 19/19] [GSOC] cat-file: use fast path when using default_format
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (17 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 18/19] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
@ 2021-07-12 11:46 ` ZheNing Hu via GitGitGadget
  2021-07-12 12:36 ` [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic Christian Couder
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Add the member `default_format` to struct `batch_options`. It is
set when `git cat-file --batch` or `git cat-file --batch-check` is
used with the default format. In that case, if neither `--textconv`
nor `--filters` is given, we do not call verify_ref_format(),
has_object_file() and format_ref_array_item(). Instead, we get the
object data directly through oid_object_info_extended() and then
output the data directly.

By using this fast path, we can reduce some of the extra overhead
that `cat-file --batch` incurs when using ref-filter. The running
time of `git cat-file --batch-check` will be similar to before, and
the running time of `git cat-file --batch` will be 9.1% less than
before.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c | 79 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 57 insertions(+), 22 deletions(-)
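
The condition that selects the fast path, and the header line it prints,
can be sketched on their own. This is a simplified model under stated
assumptions: `struct batch_opts`, `use_fast_path()` and
`format_default_header()` are hypothetical names, and the object lookup
itself is left out; only the `default_format && !cmdmode` test and the
"<oid> <type> <size>" layout are taken from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of struct batch_options. */
struct batch_opts {
	int default_format; /* 1 unless the user passed a format */
	int cmdmode;        /* 'w' or 'c' for --filters or --textconv */
};

/* The fast path applies only to the default format without filtering. */
int use_fast_path(const struct batch_opts *opt)
{
	assert(opt != NULL);
	return opt->default_format && !opt->cmdmode;
}

/*
 * Write the "<oid> <type> <size>\n" line that the fast path prints.
 * Returns the length snprintf() reports.
 */
int format_default_header(char *out, size_t outlen, const char *oid_hex,
			  const char *type, unsigned long size)
{
	return snprintf(out, outlen, "%s %s %" PRIuMAX "\n",
			oid_hex, type, (uintmax_t)size);
}
```

A custom format or either of the two filtering modes makes
`use_fast_path()` return 0 in this model, which corresponds to falling
back to format_ref_array_item() in the patch.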

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 3a6153e778f..8edc19f2d5a 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -26,6 +26,7 @@ struct batch_options {
 	int all_objects;
 	int unordered;
 	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int default_format;
 	struct ref_format format;
 };
 
@@ -196,6 +197,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 struct expand_data {
 	struct object_id oid;
+	struct object_info info;
 	const char *rest;
 	int split_on_whitespace;
 };
@@ -216,27 +218,58 @@ static void batch_object_write(const char *obj_name,
 			       struct batch_options *opt,
 			       struct expand_data *data)
 {
-	int ret;
-	struct ref_array_item item = { data->oid, data->rest, opt->cmdmode };
-
-	strbuf_reset(scratch);
-	strbuf_reset(err);
-
-	ret = format_ref_array_item(&item, &opt->format, scratch, err);
-	if (ret < 0)
-		die("%s\n", err->buf);
-	if (ret) {
-		/* ret > 0 means when the object corresponding to oid
-		 * cannot be found in format_ref_array_item(), we only print
-		 * the error message.
-		 */
-		printf("%s\n", err->buf);
+	if (opt->default_format && !opt->cmdmode) {
+		struct strbuf type_name = STRBUF_INIT;
+		unsigned long size;
+		void *content;
+
+		if (opt->print_contents)
+			data->info.contentp = &content;
+
+		data->info.type_name = &type_name;
+		data->info.sizep = &size;
+
+		if (oid_object_info_extended(the_repository, &data->oid, &data->info,
+					     OBJECT_INFO_LOOKUP_REPLACE) < 0) {
+			printf("%s missing\n",
+			       obj_name ? obj_name : oid_to_hex(&data->oid));
+			fflush(stdout);
+			return;
+		}
+
+		fprintf(stdout, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
+			data->info.type_name->buf,
+			(uintmax_t)*data->info.sizep);
 		fflush(stdout);
+		strbuf_release(&type_name);
+		if (opt->print_contents) {
+			batch_write(opt, content, *data->info.sizep);
+			batch_write(opt, "\n", 1);
+			free(content);
+		}
 	} else {
-		strbuf_addch(scratch, '\n');
-		batch_write(opt, scratch->buf, scratch->len);
+		int ret;
+		struct ref_array_item item = { data->oid, data->rest, opt->cmdmode };
+
+		strbuf_reset(scratch);
+		strbuf_reset(err);
+
+		ret = format_ref_array_item(&item, &opt->format, scratch, err);
+		if (ret < 0)
+			die("%s\n", err->buf);
+		if (ret) {
+			/* ret > 0 means when the object corresponding to oid
+			 * cannot be found in format_ref_array_item(), we only print
+			 * the error message.
+			 */
+			printf("%s\n", err->buf);
+			fflush(stdout);
+		} else {
+			strbuf_addch(scratch, '\n');
+			batch_write(opt, scratch->buf, scratch->len);
+		}
+		free_ref_array_item_value(&item);
 	}
-	free_ref_array_item_value(&item);
 }
 
 static void batch_one_object(const char *obj_name,
@@ -288,7 +321,7 @@ static void batch_one_object(const char *obj_name,
 		return;
 	}
 
-	if (!has_object_file(&data->oid)) {
+	if ((!opt->default_format || opt->cmdmode) && !has_object_file(&data->oid)) {
 		printf("%s missing\n",
 		       obj_name ? obj_name : oid_to_hex(&data->oid));
 		fflush(stdout);
@@ -380,7 +413,7 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 	if (batch->print_contents)
 		strbuf_addstr(&format, "\n%(raw)");
 	batch->format.format = format.buf;
-	if (verify_ref_format(&batch->format))
+	if ((!batch->default_format || batch->cmdmode) && verify_ref_format(&batch->format))
 		usage_with_options(cat_file_usage, options);
 
 	if (batch->cmdmode || batch->format.use_rest)
@@ -483,7 +516,8 @@ static int batch_option_callback(const struct option *opt,
 	bo->enabled = 1;
 	bo->print_contents = !strcmp(opt->long_name, "batch");
 	bo->format.format = arg;
-
+	if (arg)
+		bo->default_format = 0;
 	return 0;
 }
 
@@ -492,7 +526,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	int opt = 0;
 	const char *exp_type = NULL, *obj_name = NULL;
 	struct batch_options batch = {
-		.format = REF_FORMAT_INIT
+		.format = REF_FORMAT_INIT,
+		.default_format = 1
 	};
 	int unknown_type = 0;
 
-- 
gitgitgadget


* Re: [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (18 preceding siblings ...)
  2021-07-12 11:46 ` [PATCH 19/19] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget
@ 2021-07-12 12:36 ` Christian Couder
  2021-07-12 13:01   ` ZheNing Hu
  2021-07-12 13:02 ` Philip Oakley
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
  21 siblings, 1 reply; 52+ messages in thread
From: Christian Couder @ 2021-07-12 12:36 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Junio C Hamano, Hariom Verma, Bagas Sanjaya, Jeff King,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu

On Mon, Jul 12, 2021 at 1:47 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This patch series make cat-file reuse ref-filter logic.

s/make/makes/

By the way if you have already sent some of the patches in this series
(and if they haven't changed much since when you sent them), it's a
good idea to use V2 or V3, V4, etc, so we can easily refer to each of
the versions you sent. (See the `-v` option of `git format-patch`.)

> Change from last version:
>
>  1. Declare buf_size in if (atom_type == ATOM_RAW) block.
>  2. Modify the code style of the test.
>  3. Delete "use_textconv" and "use_filter" flag. Instead, add member

s/flag/flags/

>     cat_file_cmdmode to struct ref_array_item.
>  4. Add function reject_atom() to enhance the readability of the code.
>  5. Create p1006-cat-file.sh for performance regression testing.
>  6. Use a "fast path" to output object data to reduce the performance
>     degradation of cat-file --batch with the suggest of Ævar Arnfjörð
>     Bjarmason.

Maybe:

s/with the suggest of Ævar Arnfjörð Bjarmason/as suggested by Ævar
Arnfjörð Bjarmason/

or:

s/with the suggest of Ævar Arnfjörð Bjarmason/according to Ævar
Arnfjörð Bjarmason's suggestion/


> ZheNing Hu (19):
>   cat-file: handle trivial --batch format with --batch-all-objects
>   cat-file: merge two block into one

It's a bit strange that the above 2 don't have [GSOC] while the others
below have it.

>   [GSOC] ref-filter: add obj-type check in grab contents
>   [GSOC] ref-filter: add %(raw) atom
>   [GSOC] ref-filter: --format=%(raw) re-support --perl
>   [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
>   [GSOC] ref-filter: add %(rest) atom
>   [GSOC] ref-filter: pass get_object() return value to their callers
>   [GSOC] ref-filter: introduce free_ref_array_item_value() function
>   [GSOC] ref-filter: introduce reject_atom()
>   [GSOC] ref-filter: modify the error message and value in get_object
>   [GSOC] cat-file: add has_object_file() check
>   [GSOC] cat-file: change batch_objects parameter name
>   [GSOC] cat-file: reuse ref-filter logic
>   [GSOC] cat-file: reuse err buf in batch_object_write()
>   [GSOC] cat-file: re-implement --textconv, --filters options
>   [GSOC] ref-filter: remove grab_oid() function
>   [GSOC] cat-file: create p1006-cat-file.sh

Maybe you could add the new perf test earlier in the series so that we
could see how performance changes when ref-filter logic is reused in
cat-file earlier in the series.

>   [GSOC] cat-file: use fast path when using default_format

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 12:36 ` [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic Christian Couder
@ 2021-07-12 13:01   ` ZheNing Hu
  0 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu @ 2021-07-12 13:01 UTC (permalink / raw)
  To: Christian Couder
  Cc: ZheNing Hu via GitGitGadget, git, Junio C Hamano, Hariom Verma,
	Bagas Sanjaya, Jeff King, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Christian Couder <christian.couder@gmail.com> wrote on Mon, Jul 12, 2021 at 8:36 PM:
>
> On Mon, Jul 12, 2021 at 1:47 PM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > This patch series make cat-file reuse ref-filter logic.
>
> s/make/makes/
>
> By the way if you have already sent some of the patches in this series
> (and if they haven't changed much since when you sent them), it's a
> good idea to use V2 or V3, V4, etc, so we can easily refer to each of
> the versions you sent. (See the `-v` option of `git format-patch`.)
>

Yes, but my patches are generated by GitGitGadget. When I want to send
a series that is totally different from the previous one, I open a new
PR, and GGG does not know about the connection between the two PRs, so
the version number of the series is reset.

> > Change from last version:
> >
> >  1. Declare buf_size in if (atom_type == ATOM_RAW) block.
> >  2. Modify the code style of the test.
> >  3. Delete "use_textconv" and "use_filter" flag. Instead, add member
>
> s/flag/flags/
>
> >     cat_file_cmdmode to struct ref_array_item.
> >  4. Add function reject_atom() to enhance the readability of the code.
> >  5. Create p1006-cat-file.sh for performance regression testing.
> >  6. Use a "fast path" to output object data to reduce the performance
> >     degradation of cat-file --batch with the suggest of Ævar Arnfjörð
> >     Bjarmason.
>
> Maybe:
>
> s/with the suggest of Ævar Arnfjörð Bjarmason/as suggested by Ævar
> Arnfjörð Bjarmason/
>
> or:
>
> s/with the suggest of Ævar Arnfjörð Bjarmason/according to Ævar
> Arnfjörð Bjarmason's suggestion/
>
>
> > ZheNing Hu (19):
> >   cat-file: handle trivial --batch format with --batch-all-objects
> >   cat-file: merge two block into one
>
> It's a bit strange that the above 2 don't have [GSOC] while the others
> below have it.
>

That's because they belong to the branch zh/cat-file-batch-fix. I
should mention it in the cover letter.

> >   [GSOC] ref-filter: add obj-type check in grab contents
> >   [GSOC] ref-filter: add %(raw) atom
> >   [GSOC] ref-filter: --format=%(raw) re-support --perl
> >   [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
> >   [GSOC] ref-filter: add %(rest) atom
> >   [GSOC] ref-filter: pass get_object() return value to their callers
> >   [GSOC] ref-filter: introduce free_ref_array_item_value() function
> >   [GSOC] ref-filter: introduce reject_atom()
> >   [GSOC] ref-filter: modify the error message and value in get_object
> >   [GSOC] cat-file: add has_object_file() check
> >   [GSOC] cat-file: change batch_objects parameter name
> >   [GSOC] cat-file: reuse ref-filter logic
> >   [GSOC] cat-file: reuse err buf in batch_object_write()
> >   [GSOC] cat-file: re-implement --textconv, --filters options
> >   [GSOC] ref-filter: remove grab_oid() function
> >   [GSOC] cat-file: create p1006-cat-file.sh
>
> Maybe you could add the new perf test earlier in the series so that we
> could see how performance changes when ref-filter logic is reused in
> cat-file earlier in the series.

Makes sense.

>
> >   [GSOC] cat-file: use fast path when using default_format

Thanks.
--
ZheNing Hu

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (19 preceding siblings ...)
  2021-07-12 12:36 ` [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic Christian Couder
@ 2021-07-12 13:02 ` Philip Oakley
  2021-07-12 13:27   ` ZheNing Hu
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
  21 siblings, 1 reply; 52+ messages in thread
From: Philip Oakley @ 2021-07-12 13:02 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget, git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu

On 12/07/2021 12:46, ZheNing Hu via GitGitGadget wrote:
> This patch series make cat-file reuse ref-filter logic.
>
> Change from last version:
Minor nit:
Not sure if this is a gitgitgadget feature, but would it be possible
for a version indicator to be included in future versions of the patch
series, e.g. [PATCH vN 00/19] [GSOC]?
--
Philip

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 11:46 ` [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
@ 2021-07-12 13:17   ` Christian Couder
  2021-07-12 13:26     ` Christian Couder
                       ` (3 more replies)
  0 siblings, 4 replies; 52+ messages in thread
From: Christian Couder @ 2021-07-12 13:17 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Junio C Hamano, Hariom Verma, Bagas Sanjaya, Jeff King,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu

On Mon, Jul 12, 2021 at 1:47 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: ZheNing Hu <adlternative@gmail.com>
>
> In order to let cat-file use ref-filter logic, let's do the
> following:
>
> 1. Change the type of member `format` in struct `batch_options`
> to `ref_format`, we will pass it to ref-filter later.
> 2. Let `batch_objects()` add atoms to format, and use
> `verify_ref_format()` to check atoms.
> 3. Use `format_ref_array_item()` in `batch_object_write()` to
> get the formatted data corresponding to the object. If the
> return value of `format_ref_array_item()` is equals to zero,
> use `batch_write()` to print object data; else if the return
> value is less than zero, use `die()` to print the error message
> and exit; else if return value is greater than zero, only print
> the error message, but don't exit.
> 4. Use free_ref_array_item_value() to free ref_array_item's
> value.
>
> Most of the atoms in `for-each-ref --format` are now supported,
> such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> `%(flag)`, `%(HEAD)`, because these atoms are unique to those objects
> that pointed to by a ref, "for-each-ref"'s family can naturally use
> these atoms, but not all objects are pointed to be a ref, so "cat-file"
> will not be able to use them.
>
> The performance for `git cat-file --batch-all-objects
> --batch-check` on the Git repository itself with performance
> testing tool `hyperfine` changes from 669.4 ms ±  31.1 ms to
> 1.134 s ±  0.063 s.
>
> The performance for `git cat-file --batch-all-objects --batch
> >/dev/null` on the Git repository itself with performance testing
> tool `time` change from "27.37s user 0.29s system 98% cpu 28.089
> total" to "33.69s user 1.54s system 87% cpu 40.258 total".

Saying that a later patch will add a fast path which will mitigate the
performance regression introduced by this patch might help reassure
reviewers.

By the way it is not clear if adding the fast path fully mitigates
this performance regression or not. You might want to discuss that in
the cover letter, or maybe in the patch adding the fast path.
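
To put rough numbers on it: taking the figures quoted above at face
value and ignoring the reported spread, the mean for --batch-check goes
up by about 69%, and the wall-clock total for --batch by about 43%. A
throwaway sketch of that arithmetic (the helper name slowdown_percent()
is mine, it is not part of the series):

```c
#include <assert.h>

/*
 * Relative slowdown, in percent, when going from `before` to `after`.
 * Both values must be in the same unit (e.g. milliseconds or seconds).
 */
static double slowdown_percent(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Calling it with (669.4, 1134.0) and (28.089, 40.258) gives the two
percentages above. Running the same comparison after the fast path
patch would show how much of the regression remains.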

> Mentored-by: Christian Couder <christian.couder@gmail.com>
> Mentored-by: Hariom Verma <hariom18599@gmail.com>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>  Documentation/git-cat-file.txt |   6 +
>  builtin/cat-file.c             | 242 ++++++-------------------------
>  t/t1006-cat-file.sh            | 251 +++++++++++++++++++++++++++++++++
>  3 files changed, 304 insertions(+), 195 deletions(-)
>
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index 4eb0421b3fd..ef8ab952b2f 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -226,6 +226,12 @@ newline. The available atoms are:
>         after that first run of whitespace (i.e., the "rest" of the
>         line) are output in place of the `%(rest)` atom.
>
> +Note that most of the atoms in `for-each-ref --format` are now supported,
> +such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> +`%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> +`%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> +`%(flag)`, `%(HEAD)`. See linkgit:git-for-each-ref[1].
> +
>  If no format is specified, the default format is `%(objectname)
>  %(objecttype) %(objectsize)`.
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 41d407638d5..5b163551fc6 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -16,6 +16,7 @@
>  #include "packfile.h"
>  #include "object-store.h"
>  #include "promisor-remote.h"
> +#include "ref-filter.h"
>
>  struct batch_options {
>         int enabled;
> @@ -25,7 +26,7 @@ struct batch_options {
>         int all_objects;
>         int unordered;
>         int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
> -       const char *format;
> +       struct ref_format format;
>  };
>
>  static const char *force_path;
> @@ -195,99 +196,10 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>
>  struct expand_data {
>         struct object_id oid;
> -       enum object_type type;
> -       unsigned long size;
> -       off_t disk_size;
>         const char *rest;
> -       struct object_id delta_base_oid;
> -
> -       /*
> -        * If mark_query is true, we do not expand anything, but rather
> -        * just mark the object_info with items we wish to query.
> -        */
> -       int mark_query;
> -
> -       /*
> -        * Whether to split the input on whitespace before feeding it to
> -        * get_sha1; this is decided during the mark_query phase based on
> -        * whether we have a %(rest) token in our format.
> -        */
>         int split_on_whitespace;
> -
> -       /*
> -        * After a mark_query run, this object_info is set up to be
> -        * passed to oid_object_info_extended. It will point to the data
> -        * elements above, so you can retrieve the response from there.
> -        */
> -       struct object_info info;
> -
> -       /*
> -        * This flag will be true if the requested batch format and options
> -        * don't require us to call oid_object_info, which can then be
> -        * optimized out.
> -        */
> -       unsigned skip_object_info : 1;
>  };
>
> -static int is_atom(const char *atom, const char *s, int slen)
> -{
> -       int alen = strlen(atom);
> -       return alen == slen && !memcmp(atom, s, alen);
> -}
> -
> -static void expand_atom(struct strbuf *sb, const char *atom, int len,
> -                       void *vdata)
> -{
> -       struct expand_data *data = vdata;
> -
> -       if (is_atom("objectname", atom, len)) {
> -               if (!data->mark_query)
> -                       strbuf_addstr(sb, oid_to_hex(&data->oid));
> -       } else if (is_atom("objecttype", atom, len)) {
> -               if (data->mark_query)
> -                       data->info.typep = &data->type;
> -               else
> -                       strbuf_addstr(sb, type_name(data->type));
> -       } else if (is_atom("objectsize", atom, len)) {
> -               if (data->mark_query)
> -                       data->info.sizep = &data->size;
> -               else
> -                       strbuf_addf(sb, "%"PRIuMAX , (uintmax_t)data->size);
> -       } else if (is_atom("objectsize:disk", atom, len)) {
> -               if (data->mark_query)
> -                       data->info.disk_sizep = &data->disk_size;
> -               else
> -                       strbuf_addf(sb, "%"PRIuMAX, (uintmax_t)data->disk_size);
> -       } else if (is_atom("rest", atom, len)) {
> -               if (data->mark_query)
> -                       data->split_on_whitespace = 1;
> -               else if (data->rest)
> -                       strbuf_addstr(sb, data->rest);
> -       } else if (is_atom("deltabase", atom, len)) {
> -               if (data->mark_query)
> -                       data->info.delta_base_oid = &data->delta_base_oid;
> -               else
> -                       strbuf_addstr(sb,
> -                                     oid_to_hex(&data->delta_base_oid));
> -       } else
> -               die("unknown format element: %.*s", len, atom);
> -}
> -
> -static size_t expand_format(struct strbuf *sb, const char *start, void *data)
> -{
> -       const char *end;
> -
> -       if (*start != '(')
> -               return 0;
> -       end = strchr(start + 1, ')');
> -       if (!end)
> -               die("format element '%s' does not end in ')'", start);
> -
> -       expand_atom(sb, start + 1, end - start - 1, data);
> -
> -       return end - start + 1;
> -}
> -
>  static void batch_write(struct batch_options *opt, const void *data, int len)
>  {
>         if (opt->buffer_output) {
> @@ -297,87 +209,34 @@ static void batch_write(struct batch_options *opt, const void *data, int len)
>                 write_or_die(1, data, len);
>  }
>
> -static void print_object_or_die(struct batch_options *opt, struct expand_data *data)
> -{
> -       const struct object_id *oid = &data->oid;
> -
> -       assert(data->info.typep);
> -
> -       if (data->type == OBJ_BLOB) {
> -               if (opt->buffer_output)
> -                       fflush(stdout);
> -               if (opt->cmdmode) {
> -                       char *contents;
> -                       unsigned long size;
> -
> -                       if (!data->rest)
> -                               die("missing path for '%s'", oid_to_hex(oid));
> -
> -                       if (opt->cmdmode == 'w') {
> -                               if (filter_object(data->rest, 0100644, oid,
> -                                                 &contents, &size))
> -                                       die("could not convert '%s' %s",
> -                                           oid_to_hex(oid), data->rest);
> -                       } else if (opt->cmdmode == 'c') {
> -                               enum object_type type;
> -                               if (!textconv_object(the_repository,
> -                                                    data->rest, 0100644, oid,
> -                                                    1, &contents, &size))
> -                                       contents = read_object_file(oid,
> -                                                                   &type,
> -                                                                   &size);
> -                               if (!contents)
> -                                       die("could not convert '%s' %s",
> -                                           oid_to_hex(oid), data->rest);
> -                       } else
> -                               BUG("invalid cmdmode: %c", opt->cmdmode);
> -                       batch_write(opt, contents, size);
> -                       free(contents);
> -               } else {
> -                       stream_blob(oid);
> -               }
> -       }
> -       else {
> -               enum object_type type;
> -               unsigned long size;
> -               void *contents;
> -
> -               contents = read_object_file(oid, &type, &size);
> -               if (!contents)
> -                       die("object %s disappeared", oid_to_hex(oid));
> -               if (type != data->type)
> -                       die("object %s changed type!?", oid_to_hex(oid));
> -               if (data->info.sizep && size != data->size)
> -                       die("object %s changed size!?", oid_to_hex(oid));
> -
> -               batch_write(opt, contents, size);
> -               free(contents);
> -       }
> -}
>
>  static void batch_object_write(const char *obj_name,
>                                struct strbuf *scratch,
>                                struct batch_options *opt,
>                                struct expand_data *data)
>  {
> -       if (!data->skip_object_info &&
> -           oid_object_info_extended(the_repository, &data->oid, &data->info,
> -                                    OBJECT_INFO_LOOKUP_REPLACE) < 0) {
> -               printf("%s missing\n",
> -                      obj_name ? obj_name : oid_to_hex(&data->oid));
> -               fflush(stdout);
> -               return;
> -       }
> +       int ret;
> +       struct strbuf err = STRBUF_INIT;
> +       struct ref_array_item item = { data->oid, data->rest };
>
>         strbuf_reset(scratch);
> -       strbuf_expand(scratch, opt->format, expand_format, data);
> -       strbuf_addch(scratch, '\n');
> -       batch_write(opt, scratch->buf, scratch->len);
>
> -       if (opt->print_contents) {
> -               print_object_or_die(opt, data);
> -               batch_write(opt, "\n", 1);
> +       ret = format_ref_array_item(&item, &opt->format, scratch, &err);
> +       if (ret < 0)
> +               die("%s\n", err.buf);
> +       if (ret) {
> +               /* ret > 0 means when the object corresponding to oid
> +                * cannot be found in format_ref_array_item(), we only print
> +                * the error message.
> +                */
> +               printf("%s\n", err.buf);
> +               fflush(stdout);
> +       } else {
> +               strbuf_addch(scratch, '\n');
> +               batch_write(opt, scratch->buf, scratch->len);
>         }
> +       free_ref_array_item_value(&item);
> +       strbuf_release(&err);
>  }
>
>  static void batch_one_object(const char *obj_name,
> @@ -495,43 +354,37 @@ static int batch_unordered_packed(const struct object_id *oid,
>         return batch_unordered_object(oid, data);
>  }
>
> -static int batch_objects(struct batch_options *batch)
> +static const char * const cat_file_usage[] = {
> +       N_("git cat-file (-t [--allow-unknown-type] | -s [--allow-unknown-type] | -e | -p | <type> | --textconv | --filters) [--path=<path>] <object>"),
> +       N_("git cat-file (--batch[=<format>] | --batch-check[=<format>]) [--follow-symlinks] [--textconv | --filters]"),
> +       NULL
> +};
> +
> +static int batch_objects(struct batch_options *batch, const struct option *options)
>  {
>         struct strbuf input = STRBUF_INIT;
>         struct strbuf output = STRBUF_INIT;
> +       struct strbuf format = STRBUF_INIT;
>         struct expand_data data;
>         int save_warning;
>         int retval = 0;
>
> -       if (!batch->format)
> -               batch->format = "%(objectname) %(objecttype) %(objectsize)";
> -
> -       /*
> -        * Expand once with our special mark_query flag, which will prime the
> -        * object_info to be handed to oid_object_info_extended for each
> -        * object.
> -        */
>         memset(&data, 0, sizeof(data));
> -       data.mark_query = 1;
> -       strbuf_expand(&output, batch->format, expand_format, &data);
> -       data.mark_query = 0;
> -       strbuf_release(&output);
> -       if (batch->cmdmode)
> -               data.split_on_whitespace = 1;
> -
> -       /*
> -        * If we are printing out the object, then always fill in the type,
> -        * since we will want to decide whether or not to stream.
> -        */
> +       if (batch->format.format)
> +               strbuf_addstr(&format, batch->format.format);
> +       else
> +               strbuf_addstr(&format, "%(objectname) %(objecttype) %(objectsize)");
>         if (batch->print_contents)
> -               data.info.typep = &data.type;
> +               strbuf_addstr(&format, "\n%(raw)");
> +       batch->format.format = format.buf;
> +       if (verify_ref_format(&batch->format))
> +               usage_with_options(cat_file_usage, options);
> +
> +       if (batch->cmdmode || batch->format.use_rest)
> +               data.split_on_whitespace = 1;
>
>         if (batch->all_objects) {
>                 struct object_cb_data cb;
> -               struct object_info empty = OBJECT_INFO_INIT;
> -
> -               if (!memcmp(&data.info, &empty, sizeof(empty)))
> -                       data.skip_object_info = 1;
>
>                 if (has_promisor_remote())
>                         warning("This repository uses promisor remotes. Some objects may not be loaded.");
> @@ -561,6 +414,7 @@ static int batch_objects(struct batch_options *batch)
>                         oid_array_clear(&sa);
>                 }
>
> +               strbuf_release(&format);
>                 strbuf_release(&output);
>                 return 0;
>         }
> @@ -593,18 +447,13 @@ static int batch_objects(struct batch_options *batch)
>                 batch_one_object(input.buf, &output, batch, &data);
>         }
>
> +       strbuf_release(&format);
>         strbuf_release(&input);
>         strbuf_release(&output);
>         warn_on_object_refname_ambiguity = save_warning;
>         return retval;
>  }
>
> -static const char * const cat_file_usage[] = {
> -       N_("git cat-file (-t [--allow-unknown-type] | -s [--allow-unknown-type] | -e | -p | <type> | --textconv | --filters) [--path=<path>] <object>"),
> -       N_("git cat-file (--batch[=<format>] | --batch-check[=<format>]) [--follow-symlinks] [--textconv | --filters]"),
> -       NULL
> -};
> -
>  static int git_cat_file_config(const char *var, const char *value, void *cb)
>  {
>         if (userdiff_config(var, value) < 0)
> @@ -627,7 +476,7 @@ static int batch_option_callback(const struct option *opt,
>
>         bo->enabled = 1;
>         bo->print_contents = !strcmp(opt->long_name, "batch");
> -       bo->format = arg;
> +       bo->format.format = arg;
>
>         return 0;
>  }
> @@ -636,7 +485,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>  {
>         int opt = 0;
>         const char *exp_type = NULL, *obj_name = NULL;
> -       struct batch_options batch = {0};
> +       struct batch_options batch = {
> +               .format = REF_FORMAT_INIT
> +       };
>         int unknown_type = 0;
>
>         const struct option options[] = {
> @@ -675,6 +526,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>         git_config(git_cat_file_config, NULL);
>
>         batch.buffer_output = -1;
> +       batch.format.cat_file_mode = 1;
>         argc = parse_options(argc, argv, prefix, options, cat_file_usage, 0);
>
>         if (opt) {
> @@ -718,7 +570,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
>                 batch.buffer_output = batch.all_objects;
>
>         if (batch.enabled)
> -               return batch_objects(&batch);
> +               return batch_objects(&batch, options);
>
>         if (unknown_type && opt != 't' && opt != 's')
>                 die("git cat-file --allow-unknown-type: use with -s or -t");
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 18b3779ccb6..7452404f24a 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -607,5 +607,256 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
>         git -C all-two cat-file --batch-all-objects --batch="batman" >actual &&
>         cmp expect actual
>  '
> +. "$TEST_DIRECTORY"/lib-gpg.sh
> +. "$TEST_DIRECTORY"/lib-terminal.sh
> +
> +test_expect_success 'cat-file --batch|--batch-check setup' '
> +       echo 1>blob1 &&
> +       printf "a\0b\0\c" >blob2 &&
> +       git add blob1 blob2 &&
> +       git commit -m "Commit Message" &&
> +       git branch -M main &&
> +       git tag -a -m "v0.0.0" testtag &&
> +       git update-ref refs/myblobs/blob1 HEAD:blob1 &&
> +       git update-ref refs/myblobs/blob2 HEAD:blob2 &&
> +       git update-ref refs/mytrees/tree1 HEAD^{tree}
> +'
> +
> +batch_test_atom() {
> +       if test "$3" = "fail"
> +       then
> +               test_expect_${4:-success} $PREREQ "basic atom: $1 $2 must fail" "
> +                       test_must_fail git cat-file --batch-check='$2' >bad <<-EOF
> +                       $1
> +                       EOF
> +               "
> +       else
> +               test_expect_${4:-success} $PREREQ "basic atom: $1 $2" "
> +                       git for-each-ref --format='$2' $1 >expected &&
> +                       git cat-file --batch-check='$2' >actual <<-EOF &&
> +                       $1
> +                       EOF
> +                       sanitize_pgp <actual >actual.clean &&
> +                       cmp expected actual.clean
> +               "
> +       fi
> +}

I wonder if the above function and some of the tests below could be
introduced in a preparatory patch before this one. It could help check
that reusing ref-filter doesn't change the behavior with some atoms
that were previously supported or rejected. Of course if some atoms
are now failing or are now supported, then it's ok to add new tests
for these atoms in this patch.

> +batch_test_atom refs/heads/main '%(refname)' fail
> +batch_test_atom refs/heads/main '%(refname:)' fail

[...]

> +batch_test_atom refs/heads/main 'VALID'
> +batch_test_atom refs/heads/main '%(INVALID)' fail
> +batch_test_atom refs/heads/main '%(authordate:INVALID)' fail
> +
> +test_expect_success '%(rest) works with both a branch and a tag' '
> +       cat >expected <<-EOF &&
> +       123 commit 123
> +       456 tag 456
> +       EOF
> +       git cat-file --batch-check="%(rest) %(objecttype) %(rest)" >actual <<-EOF &&
> +       refs/heads/main 123
> +       refs/tags/testtag 456
> +       EOF
> +       test_cmp expected actual
> +'

It's a bit strange that this test is added in this patch while the
commit message doesn't talk about %(rest). So I wonder if this new
test could be moved to an earlier commit.

> +batch_test_atom refs/heads/main '%(objectname) %(objecttype) %(objectsize)
> +%(raw)'
> +batch_test_atom refs/tags/testtag '%(objectname) %(objecttype) %(objectsize)
> +%(raw)'
> +batch_test_atom refs/myblobs/blob1 '%(objectname) %(objecttype) %(objectsize)
> +%(raw)'
> +batch_test_atom refs/myblobs/blob2 '%(objectname) %(objecttype) %(objectsize)
> +%(raw)'
> +
> +

It looks like there are two empty lines instead of one above.

> +test_expect_success 'cat-file --batch equals to --batch-check with atoms' '
> +       git cat-file --batch-check="%(objectname) %(objecttype) %(objectsize)
> +%(raw)" >expected <<-EOF &&
> +       refs/heads/main
> +       refs/tags/testtag
> +       EOF
> +       git cat-file --batch >actual <<-EOF &&
> +       refs/heads/main
> +       refs/tags/testtag
> +       EOF
> +       cmp expected actual
> +'

I also wonder if the above new test belongs to this commit or if it
could be moved to a previous commit.
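
For what it's worth, the reason the two commands are expected to agree
can be read off the batch_objects() hunk above: without a user format
the default one is used, and --batch appends "\n%(raw)" to it. A
minimal standalone sketch of that composition, using snprintf() rather
than strbuf so it compiles outside of Git (compose_batch_format() is a
hypothetical helper, not something in the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEFAULT_BATCH_FORMAT "%(objectname) %(objecttype) %(objectsize)"

/*
 * Hypothetical helper mirroring the logic in the batch_objects() hunk:
 * fall back to the default format when the user gave none, and append
 * "\n%(raw)" when object contents are to be printed (--batch).
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
static int compose_batch_format(const char *user_format, int print_contents,
				char *out, size_t outlen)
{
	int n;

	assert(out != NULL && outlen > 0);
	n = snprintf(out, outlen, "%s%s",
		     user_format ? user_format : DEFAULT_BATCH_FORMAT,
		     print_contents ? "\n%(raw)" : "");
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With user_format == NULL and print_contents == 1 this yields exactly
the format string the test passes to --batch-check.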

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 13:17   ` Christian Couder
@ 2021-07-12 13:26     ` Christian Couder
  2021-07-12 13:51       ` ZheNing Hu
  2021-07-12 13:49     ` ZheNing Hu
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 52+ messages in thread
From: Christian Couder @ 2021-07-12 13:26 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Junio C Hamano, Hariom Verma, Bagas Sanjaya, Jeff King,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu

On Mon, Jul 12, 2021 at 3:17 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Mon, Jul 12, 2021 at 1:47 PM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

> > +batch_test_atom() {
> > +       if test "$3" = "fail"
> > +       then
> > +               test_expect_${4:-success} $PREREQ "basic atom: $1 $2 must fail" "
> > +                       test_must_fail git cat-file --batch-check='$2' >bad <<-EOF
> > +                       $1
> > +                       EOF
> > +               "
> > +       else
> > +               test_expect_${4:-success} $PREREQ "basic atom: $1 $2" "
> > +                       git for-each-ref --format='$2' $1 >expected &&
> > +                       git cat-file --batch-check='$2' >actual <<-EOF &&
> > +                       $1
> > +                       EOF
> > +                       sanitize_pgp <actual >actual.clean &&
> > +                       cmp expected actual.clean
> > +               "
> > +       fi
> > +}
>
> I wonder if the above function and some of the tests below could be
> introduced in a preparatory patch before this one. It could help check
> that reusing ref-filter doesn't change the behavior with some atoms
> that were previously supported or rejected. Of course if some atoms
> are now failing or are now supported, then it's ok to add new tests
> for these atoms in this patch.

For example maybe some of the tests could be introduced earlier when
the reject_atom() function is introduced.
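
To make concrete which cases such early tests would need to cover,
here is a small standalone sketch of the rejection rule as the commit
message describes it. The list of names is taken from the commit
message; the function is_ref_only_atom() and the handling of an
optional ":<arg>" suffix are assumptions for illustration, not the
actual reject_atom() from the series:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Atoms the commit message says cat-file rejects: they only apply to refs. */
static const char *ref_only_atoms[] = {
	"refname", "symref", "upstream", "push", "worktreepath", "flag", "HEAD",
};

/*
 * Hypothetical check: `atom` is the text between "%(" and ")".
 * Returns 1 if it names a ref-only atom, with or without ":<arg>",
 * and 0 otherwise. Matching is case-sensitive.
 */
static int is_ref_only_atom(const char *atom)
{
	size_t i, len;

	assert(atom != NULL);
	len = strcspn(atom, ":");	/* length of the name part */
	for (i = 0; i < sizeof(ref_only_atoms) / sizeof(ref_only_atoms[0]); i++) {
		if (strlen(ref_only_atoms[i]) == len &&
		    !strncmp(ref_only_atoms[i], atom, len))
			return 1;
	}
	return 0;
}
```

Under these assumptions "refname" and "refname:" are both rejected,
matching the first two batch_test_atom lines quoted earlier, while
"objectname" is accepted.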

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 13:02 ` Philip Oakley
@ 2021-07-12 13:27   ` ZheNing Hu
  0 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu @ 2021-07-12 13:27 UTC (permalink / raw)
  To: Philip Oakley
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma, Bagas Sanjaya, Jeff King,
	Ævar Arnfjörð Bjarmason, Eric Sunshine

Philip Oakley <philipoakley@iee.email> wrote on Mon, Jul 12, 2021 at 9:02 PM:
>
> On 12/07/2021 12:46, ZheNing Hu via GitGitGadget wrote:
> > This patch series make cat-file reuse ref-filter logic.
> >
> > Change from last version:
> minor nit..
> Not sure if this is a gitgitgadget feature, but would it be possible
> that a version indication be included in future versions of the patch,
> e.g. [PATCH vN 00/19] [GSOC] ?
> --

I think GGG has this [1], but it is not something the user can control.

[1] https://github.com/gitgitgadget/gitgitgadget/blob/1df3c008552abd1fb788d4988e6d3b92f5765369/lib/patch-series.ts#L654

> Philip

Thanks.
--
ZheNing Hu


* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 13:17   ` Christian Couder
  2021-07-12 13:26     ` Christian Couder
@ 2021-07-12 13:49     ` ZheNing Hu
  2021-07-12 20:38     ` Junio C Hamano
  2021-07-15 14:55     ` ZheNing Hu
  3 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu @ 2021-07-12 13:49 UTC (permalink / raw)
  To: Christian Couder
  Cc: ZheNing Hu via GitGitGadget, git, Junio C Hamano, Hariom Verma,
	Bagas Sanjaya, Jeff King, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Christian Couder <christian.couder@gmail.com> wrote on Mon, Jul 12, 2021 at 9:17 PM:
>
> On Mon, Jul 12, 2021 at 1:47 PM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > In order to let cat-file use ref-filter logic, let's do the
> > following:
> >
> > 1. Change the type of member `format` in struct `batch_options`
> > to `ref_format`, we will pass it to ref-filter later.
> > 2. Let `batch_objects()` add atoms to format, and use
> > `verify_ref_format()` to check atoms.
> > 3. Use `format_ref_array_item()` in `batch_object_write()` to
> > get the formatted data corresponding to the object. If the
> > return value of `format_ref_array_item()` is equal to zero,
> > use `batch_write()` to print the object data; if the return
> > value is less than zero, use `die()` to print the error message
> > and exit; if the return value is greater than zero, only print
> > the error message, but don't exit.
> > 4. Use `free_ref_array_item_value()` to free ref_array_item's
> > value.
> >
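The three-way handling of the return value in step 3 can be pictured with a small standalone sketch. This is not the code from the patch: the enum, both function names, and the use of plain FILE streams are assumptions made for illustration only, and the fatal case returns -1 to the caller where the patch would call `die()`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcome labels; these are not names from the patch. */
enum batch_action {
	BATCH_WRITE_OUTPUT, /* ret == 0: print the formatted object data */
	BATCH_FATAL,        /* ret <  0: report the error and stop */
	BATCH_REPORT_ONLY   /* ret >  0: report the error, keep going */
};

/* Map a formatter return value onto the three cases of step 3. */
static enum batch_action classify_format_result(int ret)
{
	if (ret == 0)
		return BATCH_WRITE_OUTPUT;
	return ret < 0 ? BATCH_FATAL : BATCH_REPORT_ONLY;
}

/*
 * Handle the result for one object. Returns 0 when the caller may go on
 * to the next object and -1 when it should stop (the point where the
 * real code would call die()).
 */
static int handle_format_result(int ret, const char *out, const char *err,
				FILE *out_fp, FILE *err_fp)
{
	assert(out && err && out_fp && err_fp);
	switch (classify_format_result(ret)) {
	case BATCH_WRITE_OUTPUT:
		fputs(out, out_fp);
		return 0;
	case BATCH_REPORT_ONLY:
		fprintf(err_fp, "%s\n", err);
		return 0;
	case BATCH_FATAL:
	default:
		fprintf(err_fp, "fatal: %s\n", err);
		return -1;
	}
}
```

The point of separating the positive and negative cases is that a batch run can report one bad input line and still go on to the next object, while a genuine internal failure stops the run.
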
> > Most of the atoms in `for-each-ref --format` are now supported,
> > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> > `%(flag)`, `%(HEAD)`, because these atoms are unique to objects
> > that are pointed to by a ref. The "for-each-ref" family can naturally
> > use these atoms, but not all objects are pointed to by a ref, so
> > "cat-file" will not be able to use them.
> >
> > The performance for `git cat-file --batch-all-objects
> > --batch-check` on the Git repository itself with performance
> > testing tool `hyperfine` changes from 669.4 ms ±  31.1 ms to
> > 1.134 s ±  0.063 s.
> >
> > The performance for `git cat-file --batch-all-objects --batch
> > >/dev/null` on the Git repository itself with performance testing
> > tool `time` changes from "27.37s user 0.29s system 98% cpu 28.089
> > total" to "33.69s user 1.54s system 87% cpu 40.258 total".
>
> Saying that a later patch will add a fast path which will mitigate the
> performance regression introduced by this patch might help reassure
> reviewers.
>

OK.

> By the way it is not clear if adding the fast path fully mitigates
> this performance regression or not. You might want to discuss that in
> the cover letter, or maybe in the patch adding the fast path.
>

I did mention it: "By using this fast path, we can reduce some of the
extra overhead when cat-file --batch uses ref-filter. The running time of
git cat-file --batch-check will be similar to before, and the
running time of git cat-file --batch will be 9.1% less than before."
These numbers come from the results of t/perf/p1006-cat-file.sh.

>
> I wonder if the above function and some of the tests below could be
> introduced in a preparatory patch before this one. It could help check
> that reusing ref-filter doesn't change the behavior with some atoms
> that were previously supported or rejected. Of course if some atoms
> are now failing or are now supported, then it's ok to add new tests
> for these atoms in this patch.
>

Yes, it might be worth splitting into two commits.

> > +batch_test_atom refs/heads/main '%(refname)' fail
> > +batch_test_atom refs/heads/main '%(refname:)' fail
>
> [...]
>
> > +batch_test_atom refs/heads/main 'VALID'
> > +batch_test_atom refs/heads/main '%(INVALID)' fail
> > +batch_test_atom refs/heads/main '%(authordate:INVALID)' fail
> > +
> > +test_expect_success '%(rest) works with both a branch and a tag' '
> > +       cat >expected <<-EOF &&
> > +       123 commit 123
> > +       456 tag 456
> > +       EOF
> > +       git cat-file --batch-check="%(rest) %(objecttype) %(rest)" >actual <<-EOF &&
> > +       refs/heads/main 123
> > +       refs/tags/testtag 456
> > +       EOF
> > +       test_cmp expected actual
> > +'
>
> It's a bit strange that this test is added in this patch while the
> commit message doesn't talk about %(rest). So I wonder if this new
> test could move to another previous commit.
>

It is only there to check the less common atom "%(rest)".
But as you said, we can move it into a separate commit.

> > +batch_test_atom refs/heads/main '%(objectname) %(objecttype) %(objectsize)
> > +%(raw)'
> > +batch_test_atom refs/tags/testtag '%(objectname) %(objecttype) %(objectsize)
> > +%(raw)'
> > +batch_test_atom refs/myblobs/blob1 '%(objectname) %(objecttype) %(objectsize)
> > +%(raw)'
> > +batch_test_atom refs/myblobs/blob2 '%(objectname) %(objecttype) %(objectsize)
> > +%(raw)'
> > +
> > +
>
> It looks like there are two empty lines instead of one above.
>
> > +test_expect_success 'cat-file --batch equals to --batch-check with atoms' '
> > +       git cat-file --batch-check="%(objectname) %(objecttype) %(objectsize)
> > +%(raw)" >expected <<-EOF &&
> > +       refs/heads/main
> > +       refs/tags/testtag
> > +       EOF
> > +       git cat-file --batch >actual <<-EOF &&
> > +       refs/heads/main
> > +       refs/tags/testtag
> > +       EOF
> > +       cmp expected actual
> > +'
>
> I also wonder if the above new test belongs to this commit or if it
> could be moved to a previous commit.

Same here.

Thanks.
--
ZheNing Hu


* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 13:26     ` Christian Couder
@ 2021-07-12 13:51       ` ZheNing Hu
  0 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu @ 2021-07-12 13:51 UTC (permalink / raw)
  To: Christian Couder
  Cc: ZheNing Hu via GitGitGadget, git, Junio C Hamano, Hariom Verma,
	Bagas Sanjaya, Jeff King, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Christian Couder <christian.couder@gmail.com> wrote on Mon, Jul 12, 2021 at 9:26 PM:
>
> >
> > I wonder if the above function and some of the tests below could be
> > introduced in a preparatory patch before this one. It could help check
> > that reusing ref-filter doesn't change the behavior with some atoms
> > that were previously supported or rejected. Of course if some atoms
> > are now failing or are now supported, then it's ok to add new tests
> > for these atoms in this patch.
>
> For example maybe some of the tests could be introduced earlier when
> the reject_atom() function is introduced.

Yeah, moving it to "[GSOC] ref-filter: add %(rest) atom" would be better.

Thanks.
--
ZheNing Hu


* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 13:17   ` Christian Couder
  2021-07-12 13:26     ` Christian Couder
  2021-07-12 13:49     ` ZheNing Hu
@ 2021-07-12 20:38     ` Junio C Hamano
  2021-07-14 16:24       ` ZheNing Hu
  2021-07-15 14:55     ` ZheNing Hu
  3 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2021-07-12 20:38 UTC (permalink / raw)
  To: Christian Couder
  Cc: ZheNing Hu via GitGitGadget, git, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	ZheNing Hu

Christian Couder <christian.couder@gmail.com> writes:

>> The performance for `git cat-file --batch-all-objects --batch
>> >/dev/null` on the Git repository itself with performance testing
>> tool `time` changes from "27.37s user 0.29s system 98% cpu 28.089
>> total" to "33.69s user 1.54s system 87% cpu 40.258 total".
>
> Saying that a later patch will add a fast path which will mitigate the
> performance regression introduced by this patch might help reassure
> reviewers.

More importantly, why is such a fast-path even needed?  Isn't it a
sign that the ref-filter implementation is eating more cycles than
it should for a given set of placeholders?  Do we know where the extra
cycles go?

I find it somewhat alarming if we are talking about "fast-path"
workaround before understanding why we are seeing slowdown in the
first place.


* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 20:38     ` Junio C Hamano
@ 2021-07-14 16:24       ` ZheNing Hu
  2021-07-15  1:53         ` ZheNing Hu
  0 siblings, 1 reply; 52+ messages in thread
From: ZheNing Hu @ 2021-07-14 16:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, ZheNing Hu via GitGitGadget, git, Hariom Verma,
	Bagas Sanjaya, Jeff King, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Junio C Hamano <gitster@pobox.com> wrote on Tue, Jul 13, 2021 at 4:38 AM:

>
> Christian Couder <christian.couder@gmail.com> writes:
>
> More importantly, why is such a fast-path even needed?  Isn't it a
> sign that the ref-filter implementation is eating more cycles than
> it should for a given set of placeholders?  Do we know where the extra
> cycles go?
>
> I find it somewhat alarming if we are talking about "fast-path"
> workaround before understanding why we are seeing slowdown in the
> first place.

I have no complete conclusion yet, but I measured these commits with
time and hyperfine (t/perf/* is not accurate enough):

-------------------------------------------------------------------------------------------------------------------------
| subject                                                      | --batch-check (using hyperfine) | --batch (using time) |
-------------------------------------------------------------------------------------------------------------------------
| [GSOC] cat-file: use fast path when using default_format     | 700ms                           | 25.450s              |
| [GSOC] cat-file: re-implement --textconv, --filters options  | 790ms                           | 29.933s              |
| [GSOC] cat-file: reuse err buf in batch_object_write()       | 770ms                           | 29.153s              |
| [GSOC] cat-file: reuse ref-filter logic                      | 780ms                           | 29.412s              |
| The third batch (upstream/master)                            | 640ms                           | 26.025s              |
-------------------------------------------------------------------------------------------------------------------------

I think the cost does indeed come from "[GSOC] cat-file: reuse ref-filter logic".
But what causes the loss of performance needs further analysis.

Thanks.
--
ZheNing Hu


* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-14 16:24       ` ZheNing Hu
@ 2021-07-15  1:53         ` ZheNing Hu
  2021-07-15  9:45           ` Christian Couder
  0 siblings, 1 reply; 52+ messages in thread
From: ZheNing Hu @ 2021-07-15  1:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, ZheNing Hu via GitGitGadget, git, Hariom Verma,
	Bagas Sanjaya, Jeff King, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

ZheNing Hu <adlternative@gmail.com> wrote on Thu, Jul 15, 2021 at 12:24 AM:
>
> Junio C Hamano <gitster@pobox.com> wrote on Tue, Jul 13, 2021 at 4:38 AM:
>
> >
> > Christian Couder <christian.couder@gmail.com> writes:
> >
> > More importantly, why is such a fast-path even needed?  Isn't it a
> > sign that the ref-filter implementation is eating more cycles than
> > it should for a given set of placeholders?  Do we know where the extra
> > cycles go?
> >
> > I find it somewhat alarming if we are talking about "fast-path"
> > workaround before understanding why we are seeing slowdown in the
> > first place.
>
> I have no complete conclusion yet, but I measured these commits with
> time and hyperfine (t/perf/* is not accurate enough):
>
> -------------------------------------------------------------------------------------------------------------------------
> | subject                                                      | --batch-check (using hyperfine) | --batch (using time) |
> -------------------------------------------------------------------------------------------------------------------------
> | [GSOC] cat-file: use fast path when using default_format     | 700ms                           | 25.450s              |
> | [GSOC] cat-file: re-implement --textconv, --filters options  | 790ms                           | 29.933s              |
> | [GSOC] cat-file: reuse err buf in batch_object_write()       | 770ms                           | 29.153s              |
> | [GSOC] cat-file: reuse ref-filter logic                      | 780ms                           | 29.412s              |
> | The third batch (upstream/master)                            | 640ms                           | 26.025s              |
> -------------------------------------------------------------------------------------------------------------------------
>
> I think the cost does indeed come from "[GSOC] cat-file: reuse ref-filter logic".
> But what causes the loss of performance needs further analysis.
>

Now I think there are three main reasons why the performance of
cat-file --batch deteriorates after the refactoring:

1. ref-filter makes too many copies, and we cannot easily avoid them
because ref-filter needs the copied data to implement the atoms %(if),
%(else), %(end)... and the --sort option. The original cat-file --batch
only needs to append the data to the final output string, so it makes
relatively few copies.

2. ref-filter uses more complex data structures and a more complex
parsing process. This is what lets it provide more, and more useful,
atoms. Therefore, I think the performance degradation here is to be
expected.

3. As Ævar Arnfjörð Bjarmason mentioned, oid_object_info_extended() used
to be called twice in get_object(). oid_object_info_extended() is on the
hot path, so we should avoid calling it more often than necessary. So in
the last version of "[GSOC] cat-file: re-implement --textconv, --filters
options", I unified the processing of --textconv and --filters to avoid
calling oid_object_info_extended() twice.

Thanks.
--
ZheNing Hu


* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-15  1:53         ` ZheNing Hu
@ 2021-07-15  9:45           ` Christian Couder
  2021-07-15 13:53             ` ZheNing Hu
  0 siblings, 1 reply; 52+ messages in thread
From: Christian Couder @ 2021-07-15  9:45 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: Junio C Hamano, ZheNing Hu via GitGitGadget, git, Hariom Verma,
	Bagas Sanjaya, Jeff King, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

On Thu, Jul 15, 2021 at 3:53 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> ZheNing Hu <adlternative@gmail.com> wrote on Thu, Jul 15, 2021 at 12:24 AM:
> >
> > Junio C Hamano <gitster@pobox.com> wrote on Tue, Jul 13, 2021 at 4:38 AM:

> > > I find it somewhat alarming if we are talking about "fast-path"
> > > workaround before understanding why we are seeing slowdown in the
> > > first place.
> >
> > I have no complete conclusion yet, but I measured these commits with
> > time and hyperfine (t/perf/* is not accurate enough):
> >
> > -------------------------------------------------------------------------------------------------------------------------
> > | subject                                                      | --batch-check (using hyperfine) | --batch (using time) |
> > -------------------------------------------------------------------------------------------------------------------------
> > | [GSOC] cat-file: use fast path when using default_format     | 700ms                           | 25.450s              |
> > | [GSOC] cat-file: re-implement --textconv, --filters options  | 790ms                           | 29.933s              |
> > | [GSOC] cat-file: reuse err buf in batch_object_write()       | 770ms                           | 29.153s              |
> > | [GSOC] cat-file: reuse ref-filter logic                      | 780ms                           | 29.412s              |
> > | The third batch (upstream/master)                            | 640ms                           | 26.025s              |
> > -------------------------------------------------------------------------------------------------------------------------
> >
> > I think the cost does indeed come from "[GSOC] cat-file: reuse ref-filter logic".
> > But what causes the loss of performance needs further analysis.
>
> Now I think there are three main reasons why the performance of
> cat-file --batch deteriorates after the refactoring:
>
> 1. ref-filter makes too many copies, and we cannot easily avoid them
> because ref-filter needs the copied data to implement the atoms %(if),
> %(else), %(end)... and the --sort option. The original cat-file --batch
> only needs to append the data to the final output string, so it makes
> relatively few copies.

Is it possible to check early whether any of the atoms that need this
copied data are specified, and if none of them is specified, to
avoid the copies?

> 2. ref-filter uses more complex data structures and a more complex
> parsing process. This is what lets it provide more, and more useful,
> atoms. Therefore, I think the performance degradation here is to be
> expected.

Is there a way the more complex parsing could be avoided if it's not
needed by the atoms that are actually used?

> 3. As Ævar Arnfjörð Bjarmason mentioned, oid_object_info_extended() used
> to be called twice in get_object(). oid_object_info_extended() is on the
> hot path, so we should avoid calling it more often than necessary. So in
> the last version of "[GSOC] cat-file: re-implement --textconv, --filters
> options", I unified the processing of --textconv and --filters to avoid
> calling oid_object_info_extended() twice.

Ok, thanks for the details and your work on this performance issue!

I wonder if your patch series could be split, so that the early parts
that add new atoms to ref-filter could be merged sooner?

Best,
Christian.


* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-15  9:45           ` Christian Couder
@ 2021-07-15 13:53             ` ZheNing Hu
  0 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu @ 2021-07-15 13:53 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, ZheNing Hu via GitGitGadget, git, Hariom Verma,
	Bagas Sanjaya, Jeff King, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Christian Couder <christian.couder@gmail.com> wrote on Thu, Jul 15, 2021 at 5:45 PM:
>
> On Thu, Jul 15, 2021 at 3:53 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > ZheNing Hu <adlternative@gmail.com> wrote on Thu, Jul 15, 2021 at 12:24 AM:
> > >
> > > Junio C Hamano <gitster@pobox.com> wrote on Tue, Jul 13, 2021 at 4:38 AM:
>
> > > > I find it somewhat alarming if we are talking about "fast-path"
> > > > workaround before understanding why we are seeing slowdown in the
> > > > first place.
> > >
> > > I have no complete conclusion yet, but I measured these commits with
> > > time and hyperfine (t/perf/* is not accurate enough):
> > >
> > > -------------------------------------------------------------------------------------------------------------------------
> > > | subject                                                      | --batch-check (using hyperfine) | --batch (using time) |
> > > -------------------------------------------------------------------------------------------------------------------------
> > > | [GSOC] cat-file: use fast path when using default_format     | 700ms                           | 25.450s              |
> > > | [GSOC] cat-file: re-implement --textconv, --filters options  | 790ms                           | 29.933s              |
> > > | [GSOC] cat-file: reuse err buf in batch_object_write()       | 770ms                           | 29.153s              |
> > > | [GSOC] cat-file: reuse ref-filter logic                      | 780ms                           | 29.412s              |
> > > | The third batch (upstream/master)                            | 640ms                           | 26.025s              |
> > > -------------------------------------------------------------------------------------------------------------------------
> > >
> > > I think the cost does indeed come from "[GSOC] cat-file: reuse ref-filter logic".
> > > But what causes the loss of performance needs further analysis.
> >
> > Now I think there are three main reasons why the performance of
> > cat-file --batch deteriorates after the refactoring:
> >
> > 1. ref-filter makes too many copies, and we cannot easily avoid them
> > because ref-filter needs the copied data to implement the atoms %(if),
> > %(else), %(end)... and the --sort option. The original cat-file --batch
> > only needs to append the data to the final output string, so it makes
> > relatively few copies.
>
> Is it possible to check early whether any of the atoms that need this
> copied data are specified, and if none of them is specified, to
> avoid the copies?
>

Well, the copies I am talking about here are things like
"v->s = xstrdup(xxx)", but v->s is needed by --sort, so they are very
difficult to remove. At the moment I think the only solution is the
fast path mentioned by Ævar Arnfjörð Bjarmason.
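
To make the fast-path idea concrete, here is a minimal standalone sketch, not the code from the series. It assumes the fast path is taken only when the format string is exactly cat-file's default `%(objectname) %(objecttype) %(objectsize)`; the `struct obj_info` type and both function names are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* cat-file's documented default format for --batch-check. */
#define DEFAULT_FORMAT "%(objectname) %(objecttype) %(objectsize)"

/* Hypothetical, simplified object description used only by this sketch. */
struct obj_info {
	const char *name;   /* hex object name */
	const char *type;   /* "commit", "tree", "blob" or "tag" */
	unsigned long size; /* object size in bytes */
};

/* Decide once, before the per-object loop, whether the fast path applies. */
static int can_use_fast_path(const char *format)
{
	assert(format);
	return !strcmp(format, DEFAULT_FORMAT);
}

/*
 * Fast path: write the three default fields straight into the output
 * buffer, without creating an intermediate copy per atom. Returns the
 * snprintf() result, i.e. the length that was (or would have been)
 * written, not counting the terminating NUL.
 */
static int format_default(const struct obj_info *oi, char *buf, size_t len)
{
	assert(oi && buf);
	return snprintf(buf, len, "%s %s %lu\n", oi->name, oi->type, oi->size);
}
```

Any other format string would still go through the general atom machinery, so the per-atom copies that --sort and %(if) rely on remain available there.
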

> > 2. ref-filter uses more complex data structures and a more complex
> > parsing process. This is what lets it provide more, and more useful,
> > atoms. Therefore, I think the performance degradation here is to be
> > expected.
>
> Is there a way the more complex parsing could be avoided if it's not
> needed by the atoms that are actually used?

No. For example, previously we could only support "objectsize", and now
we can also support "objectsize:short", so we have to pay for the extra
parsing here. (It is necessary.)
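
The extra parsing step can be pictured as splitting an atom body into a name and an optional argument. The sketch below is only an illustration under that assumption; `struct parsed_atom` and `parse_atom()` are hypothetical names and do not come from ref-filter.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parsed form of an atom body such as "objectsize:short". */
struct parsed_atom {
	const char *name; /* start of the atom name (not NUL-terminated) */
	size_t name_len;  /* length of the name part */
	const char *arg;  /* text after the first ':', or NULL if absent */
};

/* Split "name" or "name:arg" at the first ':'. */
static struct parsed_atom parse_atom(const char *body)
{
	struct parsed_atom pa;
	const char *colon;

	assert(body);
	colon = strchr(body, ':');
	pa.name = body;
	pa.name_len = colon ? (size_t)(colon - body) : strlen(body);
	pa.arg = colon ? colon + 1 : NULL;
	return pa;
}
```

A format that only ever allowed bare names could skip the search for ':' entirely, which is the kind of small per-atom cost being discussed here.
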

>
> > 3. As Ævar Arnfjörð Bjarmason mentioned, oid_object_info_extended() used
> > to be called twice in get_object(). oid_object_info_extended() is on the
> > hot path, so we should avoid calling it more often than necessary. So in
> > the last version of "[GSOC] cat-file: re-implement --textconv, --filters
> > options", I unified the processing of --textconv and --filters to avoid
> > calling oid_object_info_extended() twice.
>
> Ok, thanks for the details and your work on this performance issue!
>
> I wonder if your patch series could be split, so that the early parts
> that add new atoms to ref-filter could be merged sooner?
>

Should this part of the work be handed over to Junio?
The implementation of %(rest) and %(raw) may be worth merging;
those commits are what really belongs in "zh/ref-filter-raw-data".
The other part could be called "cat-file-reuse-ref-filter-logic".

> Best,
> Christian.

Thanks.
--
ZheNing Hu


* Re: [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 13:17   ` Christian Couder
                       ` (2 preceding siblings ...)
  2021-07-12 20:38     ` Junio C Hamano
@ 2021-07-15 14:55     ` ZheNing Hu
  3 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu @ 2021-07-15 14:55 UTC (permalink / raw)
  To: Christian Couder
  Cc: ZheNing Hu via GitGitGadget, git, Junio C Hamano, Hariom Verma,
	Bagas Sanjaya, Jeff King, Ævar Arnfjörð Bjarmason,
	Eric Sunshine

Christian Couder <christian.couder@gmail.com> wrote on Mon, Jul 12, 2021 at 9:17 PM:
>
> > +test_expect_success 'cat-file --batch equals to --batch-check with atoms' '
> > +       git cat-file --batch-check="%(objectname) %(objecttype) %(objectsize)
> > +%(raw)" >expected <<-EOF &&
> > +       refs/heads/main
> > +       refs/tags/testtag
> > +       EOF
> > +       git cat-file --batch >actual <<-EOF &&
> > +       refs/heads/main
> > +       refs/tags/testtag
> > +       EOF
> > +       cmp expected actual
> > +'
>
> I also wonder if the above new test belongs to this commit or if it
> could be moved to a previous commit.

It is related to %(raw), but cat-file did not support %(raw) before this
commit. So this test should stay where it is.

--
ZheNing Hu


* [PATCH v2 00/17] [GSOC] cat-file: reuse ref-filter logic
  2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                   ` (20 preceding siblings ...)
  2021-07-12 13:02 ` Philip Oakley
@ 2021-07-15 15:40 ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 01/17] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
                     ` (16 more replies)
  21 siblings, 17 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu

This patch series makes cat-file reuse ref-filter logic.

Changes since the last version:

 1. Move the tests for some atoms to the commit: [GSOC] ref-filter: add
    cat_file_mode to ref_format.
 2. Move the commit that adds the performance tests earlier in the series.
 3. Modify some commit messages related to cat-file performance.

By the way, "[GSOC] ref-filter: add %(rest) atom" and its previous commits
should belong to zh/ref-filter-raw-data and the rest should belong to
zh/cat-file-batch-refactor.

ZheNing Hu (17):
  [GSOC] ref-filter: add obj-type check in grab contents
  [GSOC] ref-filter: add %(raw) atom
  [GSOC] ref-filter: --format=%(raw) re-support --perl
  [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
  [GSOC] ref-filter: add %(rest) atom
  [GSOC] ref-filter: pass get_object() return value to their callers
  [GSOC] ref-filter: introduce free_ref_array_item_value() function
  [GSOC] ref-filter: add cat_file_mode to ref_format
  [GSOC] ref-filter: modify the error message and value in get_object
  [GSOC] cat-file: add has_object_file() check
  [GSOC] cat-file: change batch_objects parameter name
  [GSOC] cat-file: create p1006-cat-file.sh
  [GSOC] cat-file: reuse ref-filter logic
  [GSOC] cat-file: reuse err buf in batch_object_write()
  [GSOC] cat-file: re-implement --textconv, --filters options
  [GSOC] ref-filter: remove grab_oid() function
  [GSOC] cat-file: use fast path when using default_format

 Documentation/git-cat-file.txt     |   6 +
 Documentation/git-for-each-ref.txt |   9 +
 builtin/cat-file.c                 | 306 +++++++++----------------
 builtin/tag.c                      |   2 +-
 quote.c                            |  17 ++
 quote.h                            |   1 +
 ref-filter.c                       | 346 +++++++++++++++++++++--------
 ref-filter.h                       |  13 +-
 t/perf/p1006-cat-file.sh           |  28 +++
 t/t1006-cat-file.sh                | 239 ++++++++++++++++++++
 t/t3203-branch-output.sh           |   4 +
 t/t6300-for-each-ref.sh            | 235 ++++++++++++++++++++
 t/t6301-for-each-ref-errors.sh     |   2 +-
 t/t7004-tag.sh                     |   4 +
 t/t7030-verify-tag.sh              |   4 +
 15 files changed, 921 insertions(+), 295 deletions(-)
 create mode 100755 t/perf/p1006-cat-file.sh


base-commit: 75ae10bc75336db031ee58d13c5037b929235912
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-993%2Fadlternative%2Fcat-file-batch-refactor-2-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-993/adlternative/cat-file-batch-refactor-2-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/993

Range-diff vs v1:

  1:  9aef8882bd1 <  -:  ----------- cat-file: handle trivial --batch format with --batch-all-objects
  2:  1332006006f <  -:  ----------- cat-file: merge two block into one
  3:  0dacc60bbcc =  1:  45c0cbe44d5 [GSOC] ref-filter: add obj-type check in grab contents
  4:  5bd715ae9f0 =  2:  554d7653ee7 [GSOC] ref-filter: add %(raw) atom
  5:  df2cd9e8e61 =  3:  94addd4676a [GSOC] ref-filter: --format=%(raw) re-support --perl
  6:  2d34ad3bc77 =  4:  45984f94bf3 [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
  7:  460c807c6ab !  5:  6fb9cfdeab1 [GSOC] ref-filter: add %(rest) atom
     @@ Commit message
          [GSOC] ref-filter: add %(rest) atom
      
          In order to let "cat-file --batch=%(rest)" use the ref-filter
     -    interface, add %(rest) atom for ref-filter. "git for-each-ref",
     -    "git branch", "git tag" and "git verify-tag" will reject %(rest)
     -    by default.
     +    interface, add %(rest) atom for ref-filter. Introduce the
     +    reject_atom() to reject the atom %(rest) for "git for-each-ref",
     +    "git branch", "git tag" and "git verify-tag".
      
          Mentored-by: Christian Couder <christian.couder@gmail.com>
          Mentored-by: Hariom Verma <hariom18599@gmail.com>
     @@ ref-filter.c: static struct {
       	/*
       	 * Please update $__git_ref_fieldlist in git-completion.bash
       	 * when you add new atoms
     +@@ ref-filter.c: static const char *find_next(const char *cp)
     + 	return NULL;
     + }
     + 
     ++static int reject_atom(enum atom_type atom_type)
     ++{
     ++	return atom_type == ATOM_REST;
     ++}
     ++
     + /*
     +  * Make sure the format string is well formed, and parse out
     +  * the used atoms.
      @@ ref-filter.c: int verify_ref_format(struct ref_format *format)
     + 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
       		if (at < 0)
       			die("%s", err.buf);
     ++		if (reject_atom(used_atom[at].atom_type))
     ++			die(_("this command reject atom %%(%.*s)"), (int)(ep - sp - 2), sp + 2);
       
     -+		if (used_atom[at].atom_type == ATOM_REST)
     -+			die("this command reject atom %%(%.*s)", (int)(ep - sp - 2), sp + 2);
     -+
       		if ((format->quote_style == QUOTE_PYTHON ||
       		     format->quote_style == QUOTE_SHELL ||
     - 		     format->quote_style == QUOTE_TCL) &&
      @@ ref-filter.c: static int populate_value(struct ref_array_item *ref, struct strbuf *err)
       			v->handler = else_atom_handler;
       			v->s = xstrdup("");
  8:  e1aca51d500 =  6:  c3378dbfaed [GSOC] ref-filter: pass get_object() return value to their callers
  9:  6ad42c96405 =  7:  e5cf5541024 [GSOC] ref-filter: introduce free_ref_array_item_value() function
 10:  b61d538d53d !  8:  bf052cc5d3f [GSOC] ref-filter: introduce reject_atom()
     @@ Metadata
      Author: ZheNing Hu <adlternative@gmail.com>
      
       ## Commit message ##
     -    [GSOC] ref-filter: introduce reject_atom()
     +    [GSOC] ref-filter: add cat_file_mode to ref_format
      
     -    Add `cat_file_mode` member in struct `ref_format` and introduce
     -    the function `reject_atom()`, when `cat-file --batch` use ref-filter
     -    logic later, it can help us reject atoms in verify_ref_format()
     -    which cat-file cannot use, e.g. `%(refname)`, `%(push)`,
     -    `%(upstream)`... or the atom `%(rest)` which for-each-ref family
     -    cannot use.
     +    Add `cat_file_mode` member to struct `ref_format`. When
     +    `cat-file --batch` uses ref-filter logic later, it can help us
     +    reject atoms in verify_ref_format() which cat-file cannot use,
     +    e.g. `%(refname)`, `%(push)`, `%(upstream)`...
     +
     +    Add batch_test_atom() to t/t1006-cat-file.sh and add checks
     +    for cat-file --batch. This helps show clearly which atoms
     +    cat-file accepts and which atoms it rejects.
      
          Helped-by: Eric Sunshine <sunshine@sunshineco.com>
          Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     @@ ref-filter.c: static const char *find_next(const char *cp)
       	return NULL;
       }
       
     +-static int reject_atom(enum atom_type atom_type)
     +-{
     +-	return atom_type == ATOM_REST;
      +static int reject_atom(int cat_file_mode, enum atom_type atom_type)
      +{
      +	if (!cat_file_mode)
     @@ ref-filter.c: static const char *find_next(const char *cp)
      +	default:
      +		return 0;
      +	}
     -+}
     -+
     + }
     + 
       /*
     -  * Make sure the format string is well formed, and parse out
     -  * the used atoms.
      @@ ref-filter.c: int verify_ref_format(struct ref_format *format)
       		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
       		if (at < 0)
       			die("%s", err.buf);
     --
     --		if (used_atom[at].atom_type == ATOM_REST)
     --			die("this command reject atom %%(%.*s)", (int)(ep - sp - 2), sp + 2);
     +-		if (reject_atom(used_atom[at].atom_type))
      +		if (reject_atom(format->cat_file_mode, used_atom[at].atom_type))
     -+			die(_("this command reject atom %%(%.*s)"), (int)(ep - sp - 2), sp + 2);
     + 			die(_("this command reject atom %%(%.*s)"), (int)(ep - sp - 2), sp + 2);
       
       		if ((format->quote_style == QUOTE_PYTHON ||
     - 		     format->quote_style == QUOTE_SHELL ||
      
       ## ref-filter.h ##
      @@ ref-filter.h: struct ref_format {
     @@ ref-filter.h: struct ref_format {
       	int quote_style;
       	int use_rest;
       	int use_color;
     +
     + ## t/t1006-cat-file.sh ##
     +@@ t/t1006-cat-file.sh: test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
     + 	cmp expect actual
     + '
     + 
     ++. "$TEST_DIRECTORY"/lib-gpg.sh
     ++. "$TEST_DIRECTORY"/lib-terminal.sh
     ++
     ++test_expect_success 'cat-file --batch|--batch-check setup' '
     ++	echo 1>blob1 &&
     ++	printf "a\0b\0\c" >blob2 &&
     ++	git add blob1 blob2 &&
     ++	git commit -m "Commit Message" &&
     ++	git branch -M main &&
     ++	git tag -a -m "v0.0.0" testtag &&
     ++	git update-ref refs/myblobs/blob1 HEAD:blob1 &&
     ++	git update-ref refs/myblobs/blob2 HEAD:blob2 &&
     ++	git update-ref refs/mytrees/tree1 HEAD^{tree}
     ++'
     ++
     ++batch_test_atom() {
     ++	if test "$3" = "fail"
     ++	then
     ++		test_expect_${4:-success} $PREREQ "basic atom: $1 $2 must fail" "
     ++			test_must_fail git cat-file --batch-check='$2' >bad <<-EOF
     ++			$1
     ++			EOF
     ++		"
     ++	else
     ++		test_expect_${4:-success} $PREREQ "basic atom: $1 $2" "
     ++			git for-each-ref --format='$2' $1 >expected &&
     ++			git cat-file --batch-check='$2' >actual <<-EOF &&
     ++			$1
     ++			EOF
     ++			sanitize_pgp <actual >actual.clean &&
     ++			cmp expected actual.clean
     ++		"
     ++	fi
     ++}
     ++
     ++batch_test_atom refs/heads/main '%(refname)' fail
     ++batch_test_atom refs/heads/main '%(refname:)' fail
     ++batch_test_atom refs/heads/main '%(refname:short)' fail
     ++batch_test_atom refs/heads/main '%(refname:lstrip=1)' fail
     ++batch_test_atom refs/heads/main '%(refname:lstrip=2)' fail
     ++batch_test_atom refs/heads/main '%(refname:lstrip=-1)' fail
     ++batch_test_atom refs/heads/main '%(refname:lstrip=-2)' fail
     ++batch_test_atom refs/heads/main '%(refname:rstrip=1)' fail
     ++batch_test_atom refs/heads/main '%(refname:rstrip=2)' fail
     ++batch_test_atom refs/heads/main '%(refname:rstrip=-1)' fail
     ++batch_test_atom refs/heads/main '%(refname:rstrip=-2)' fail
     ++batch_test_atom refs/heads/main '%(refname:strip=1)' fail
     ++batch_test_atom refs/heads/main '%(refname:strip=2)' fail
     ++batch_test_atom refs/heads/main '%(refname:strip=-1)' fail
     ++batch_test_atom refs/heads/main '%(refname:strip=-2)' fail
     ++batch_test_atom refs/heads/main '%(upstream)' fail
     ++batch_test_atom refs/heads/main '%(upstream:short)' fail
     ++batch_test_atom refs/heads/main '%(upstream:lstrip=2)' fail
     ++batch_test_atom refs/heads/main '%(upstream:lstrip=-2)' fail
     ++batch_test_atom refs/heads/main '%(upstream:rstrip=2)' fail
     ++batch_test_atom refs/heads/main '%(upstream:rstrip=-2)' fail
     ++batch_test_atom refs/heads/main '%(upstream:strip=2)' fail
     ++batch_test_atom refs/heads/main '%(upstream:strip=-2)' fail
     ++batch_test_atom refs/heads/main '%(push)' fail
     ++batch_test_atom refs/heads/main '%(push:short)' fail
     ++batch_test_atom refs/heads/main '%(push:lstrip=1)' fail
     ++batch_test_atom refs/heads/main '%(push:lstrip=-1)' fail
     ++batch_test_atom refs/heads/main '%(push:rstrip=1)' fail
     ++batch_test_atom refs/heads/main '%(push:rstrip=-1)' fail
     ++batch_test_atom refs/heads/main '%(push:strip=1)' fail
     ++batch_test_atom refs/heads/main '%(push:strip=-1)' fail
     ++batch_test_atom refs/heads/main '%(objecttype)'
     ++batch_test_atom refs/heads/main '%(objectsize)'
     ++batch_test_atom refs/heads/main '%(objectsize:disk)'
     ++batch_test_atom refs/heads/main '%(deltabase)'
     ++batch_test_atom refs/heads/main '%(objectname)'
     ++batch_test_atom refs/heads/main '%(objectname:short)' fail
     ++batch_test_atom refs/heads/main '%(objectname:short=1)' fail
     ++batch_test_atom refs/heads/main '%(objectname:short=10)' fail
     ++batch_test_atom refs/heads/main '%(tree)' fail
     ++batch_test_atom refs/heads/main '%(tree:short)' fail
     ++batch_test_atom refs/heads/main '%(tree:short=1)' fail
     ++batch_test_atom refs/heads/main '%(tree:short=10)' fail
     ++batch_test_atom refs/heads/main '%(parent)' fail
     ++batch_test_atom refs/heads/main '%(parent:short)' fail
     ++batch_test_atom refs/heads/main '%(parent:short=1)' fail
     ++batch_test_atom refs/heads/main '%(parent:short=10)' fail
     ++batch_test_atom refs/heads/main '%(numparent)' fail
     ++batch_test_atom refs/heads/main '%(object)' fail
     ++batch_test_atom refs/heads/main '%(type)' fail
     ++batch_test_atom refs/heads/main '%(raw)' fail
     ++batch_test_atom refs/heads/main '%(*objectname)' fail
     ++batch_test_atom refs/heads/main '%(*objecttype)' fail
     ++batch_test_atom refs/heads/main '%(author)' fail
     ++batch_test_atom refs/heads/main '%(authorname)' fail
     ++batch_test_atom refs/heads/main '%(authoremail)' fail
     ++batch_test_atom refs/heads/main '%(authoremail:trim)' fail
     ++batch_test_atom refs/heads/main '%(authoremail:localpart)' fail
     ++batch_test_atom refs/heads/main '%(authordate)' fail
     ++batch_test_atom refs/heads/main '%(committer)' fail
     ++batch_test_atom refs/heads/main '%(committername)' fail
     ++batch_test_atom refs/heads/main '%(committeremail)' fail
     ++batch_test_atom refs/heads/main '%(committeremail:trim)' fail
     ++batch_test_atom refs/heads/main '%(committeremail:localpart)' fail
     ++batch_test_atom refs/heads/main '%(committerdate)' fail
     ++batch_test_atom refs/heads/main '%(tag)' fail
     ++batch_test_atom refs/heads/main '%(tagger)' fail
     ++batch_test_atom refs/heads/main '%(taggername)' fail
     ++batch_test_atom refs/heads/main '%(taggeremail)' fail
     ++batch_test_atom refs/heads/main '%(taggeremail:trim)' fail
     ++batch_test_atom refs/heads/main '%(taggeremail:localpart)' fail
     ++batch_test_atom refs/heads/main '%(taggerdate)' fail
     ++batch_test_atom refs/heads/main '%(creator)' fail
     ++batch_test_atom refs/heads/main '%(creatordate)' fail
     ++batch_test_atom refs/heads/main '%(subject)' fail
     ++batch_test_atom refs/heads/main '%(subject:sanitize)' fail
     ++batch_test_atom refs/heads/main '%(contents:subject)' fail
     ++batch_test_atom refs/heads/main '%(body)' fail
     ++batch_test_atom refs/heads/main '%(contents:body)' fail
     ++batch_test_atom refs/heads/main '%(contents:signature)' fail
     ++batch_test_atom refs/heads/main '%(contents)' fail
     ++batch_test_atom refs/heads/main '%(HEAD)' fail
     ++batch_test_atom refs/heads/main '%(upstream:track)' fail
     ++batch_test_atom refs/heads/main '%(upstream:trackshort)' fail
     ++batch_test_atom refs/heads/main '%(upstream:track,nobracket)' fail
     ++batch_test_atom refs/heads/main '%(upstream:nobracket,track)' fail
     ++batch_test_atom refs/heads/main '%(push:track)' fail
     ++batch_test_atom refs/heads/main '%(push:trackshort)' fail
     ++batch_test_atom refs/heads/main '%(worktreepath)' fail
     ++batch_test_atom refs/heads/main '%(symref)' fail
     ++batch_test_atom refs/heads/main '%(flag)' fail
     ++
     ++batch_test_atom refs/tags/testtag '%(refname)' fail
     ++batch_test_atom refs/tags/testtag '%(refname:short)' fail
     ++batch_test_atom refs/tags/testtag '%(upstream)' fail
     ++batch_test_atom refs/tags/testtag '%(push)' fail
     ++batch_test_atom refs/tags/testtag '%(objecttype)'
     ++batch_test_atom refs/tags/testtag '%(objectsize)'
     ++batch_test_atom refs/tags/testtag '%(objectsize:disk)'
     ++batch_test_atom refs/tags/testtag '%(*objectsize:disk)' fail
     ++batch_test_atom refs/tags/testtag '%(deltabase)'
     ++batch_test_atom refs/tags/testtag '%(*deltabase)' fail
     ++batch_test_atom refs/tags/testtag '%(objectname)'
     ++batch_test_atom refs/tags/testtag '%(objectname:short)' fail
     ++batch_test_atom refs/tags/testtag '%(tree)' fail
     ++batch_test_atom refs/tags/testtag '%(tree:short)' fail
     ++batch_test_atom refs/tags/testtag '%(tree:short=1)' fail
     ++batch_test_atom refs/tags/testtag '%(tree:short=10)' fail
     ++batch_test_atom refs/tags/testtag '%(parent)' fail
     ++batch_test_atom refs/tags/testtag '%(parent:short)' fail
     ++batch_test_atom refs/tags/testtag '%(parent:short=1)' fail
     ++batch_test_atom refs/tags/testtag '%(parent:short=10)' fail
     ++batch_test_atom refs/tags/testtag '%(numparent)' fail
     ++batch_test_atom refs/tags/testtag '%(object)' fail
     ++batch_test_atom refs/tags/testtag '%(type)' fail
     ++batch_test_atom refs/tags/testtag '%(*objectname)' fail
     ++batch_test_atom refs/tags/testtag '%(*objecttype)' fail
     ++batch_test_atom refs/tags/testtag '%(author)' fail
     ++batch_test_atom refs/tags/testtag '%(authorname)' fail
     ++batch_test_atom refs/tags/testtag '%(authoremail)' fail
     ++batch_test_atom refs/tags/testtag '%(authoremail:trim)' fail
     ++batch_test_atom refs/tags/testtag '%(authoremail:localpart)' fail
     ++batch_test_atom refs/tags/testtag '%(authordate)' fail
     ++batch_test_atom refs/tags/testtag '%(committer)' fail
     ++batch_test_atom refs/tags/testtag '%(committername)' fail
     ++batch_test_atom refs/tags/testtag '%(committeremail)' fail
     ++batch_test_atom refs/tags/testtag '%(committeremail:trim)' fail
     ++batch_test_atom refs/tags/testtag '%(committeremail:localpart)' fail
     ++batch_test_atom refs/tags/testtag '%(committerdate)' fail
     ++batch_test_atom refs/tags/testtag '%(tag)' fail
     ++batch_test_atom refs/tags/testtag '%(tagger)' fail
     ++batch_test_atom refs/tags/testtag '%(taggername)' fail
     ++batch_test_atom refs/tags/testtag '%(taggeremail)' fail
     ++batch_test_atom refs/tags/testtag '%(taggeremail:trim)' fail
     ++batch_test_atom refs/tags/testtag '%(taggeremail:localpart)' fail
     ++batch_test_atom refs/tags/testtag '%(taggerdate)' fail
     ++batch_test_atom refs/tags/testtag '%(creator)' fail
     ++batch_test_atom refs/tags/testtag '%(creatordate)' fail
     ++batch_test_atom refs/tags/testtag '%(subject)' fail
     ++batch_test_atom refs/tags/testtag '%(subject:sanitize)' fail
     ++batch_test_atom refs/tags/testtag '%(contents:subject)' fail
     ++batch_test_atom refs/tags/testtag '%(body)' fail
     ++batch_test_atom refs/tags/testtag '%(contents:body)' fail
     ++batch_test_atom refs/tags/testtag '%(contents:signature)' fail
     ++batch_test_atom refs/tags/testtag '%(contents)' fail
     ++batch_test_atom refs/tags/testtag '%(HEAD)' fail
     ++
     ++batch_test_atom refs/myblobs/blob1 '%(refname)' fail
     ++batch_test_atom refs/myblobs/blob1 '%(upstream)' fail
     ++batch_test_atom refs/myblobs/blob1 '%(push)' fail
     ++batch_test_atom refs/myblobs/blob1 '%(HEAD)' fail
     ++
     ++batch_test_atom refs/myblobs/blob1 '%(objectname)'
     ++batch_test_atom refs/myblobs/blob1 '%(objecttype)'
     ++batch_test_atom refs/myblobs/blob1 '%(objectsize)'
     ++batch_test_atom refs/myblobs/blob1 '%(objectsize:disk)'
     ++batch_test_atom refs/myblobs/blob1 '%(deltabase)'
     ++
     ++batch_test_atom refs/myblobs/blob1 '%(contents)' fail
     ++batch_test_atom refs/myblobs/blob2 '%(contents)' fail
     ++
     ++batch_test_atom refs/myblobs/blob1 '%(raw)' fail
     ++batch_test_atom refs/myblobs/blob2 '%(raw)' fail
     ++batch_test_atom refs/mytrees/tree1 '%(raw)' fail
     ++
     ++batch_test_atom refs/myblobs/blob1 '%(raw:size)' fail
     ++batch_test_atom refs/myblobs/blob2 '%(raw:size)' fail
     ++batch_test_atom refs/mytrees/tree1 '%(raw:size)' fail
     ++
     ++batch_test_atom refs/myblobs/blob1 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)' fail
     ++batch_test_atom refs/myblobs/blob2 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)' fail
     ++batch_test_atom refs/mytrees/tree1 '%(if:equals=tree)%(objecttype)%(then)tree%(else)not tree%(end)' fail
     ++
     ++batch_test_atom refs/heads/main '%(align:60) objectname is %(objectname)%(end)|%(objectname)' fail
     ++batch_test_atom refs/heads/main '%(align:left,60) objectname is %(objectname)%(end)|%(objectname)' fail
     ++batch_test_atom refs/heads/main '%(align:middle,60) objectname is %(objectname)%(end)|%(objectname)' fail
     ++batch_test_atom refs/heads/main '%(align:60,right) objectname is %(objectname)%(end)|%(objectname)' fail
     ++
     ++batch_test_atom refs/heads/main 'VALID'
     ++batch_test_atom refs/heads/main '%(INVALID)' fail
     ++batch_test_atom refs/heads/main '%(authordate:INVALID)' fail
     ++
     ++batch_test_atom refs/heads/main '%(objectname) %(objecttype) %(objectsize)
     ++%(raw)' fail
     ++batch_test_atom refs/tags/testtag '%(objectname) %(objecttype) %(objectsize)
     ++%(raw)' fail
     ++batch_test_atom refs/myblobs/blob1 '%(objectname) %(objecttype) %(objectsize)
     ++%(raw)' fail
     ++batch_test_atom refs/myblobs/blob2 '%(objectname) %(objecttype) %(objectsize)
     ++%(raw)' fail
     ++
     + test_done
 11:  790c558d7cc =  9:  7058f47d41d [GSOC] ref-filter: modify the error message and value in get_object
 12:  fca49379025 = 10:  3af0def894c [GSOC] cat-file: add has_object_file() check
 13:  43ce6bf2626 = 11:  c445fa6520b [GSOC] cat-file: change batch_objects parameter name
 18:  2edca9dc465 = 12:  ae3a7816013 [GSOC] cat-file: create p1006-cat-file.sh
 14:  70e83e4ba3c ! 13:  8b26c9e7455 [GSOC] cat-file: reuse ref-filter logic
     @@ Commit message
      
          The performance for `git cat-file --batch-all-objects
          --batch-check` on the Git repository itself with performance
     -    testing tool `hyperfine` changes from 669.4 ms ±  31.1 ms to
     -    1.134 s ±  0.063 s.
     +    testing tool `hyperfine` changes from 616.7 ms ± 8.9 ms to
     +    758.7 ms ± 16.4 ms.
      
          The performance for `git cat-file --batch-all-objects --batch
          >/dev/null` on the Git repository itself with performance testing
     -    tool `time` change from "27.37s user 0.29s system 98% cpu 28.089
     -    total" to "33.69s user 1.54s system 87% cpu 40.258 total".
     +    tool `time` changes from "25.26s user 0.30s system 98% cpu 25.840 total"
     +    to "28.79s user 0.83s system 99% cpu 29.829 total".
     +
     +    The reasons for the performance degradation are as follows:
     +    1. There are a lot of data copies in the logic of ref-filter.
     +    2. In order to support more useful formats, complex data
     +    structures and parsing processes are used in ref-filter.
     +
     +    A later patch will add a fast path which will mitigate the
     +    performance regression introduced by this patch.
      
          Mentored-by: Christian Couder <christian.couder@gmail.com>
          Mentored-by: Hariom Verma <hariom18599@gmail.com>
     @@ builtin/cat-file.c: int cmd_cat_file(int argc, const char **argv, const char *pr
       		die("git cat-file --allow-unknown-type: use with -s or -t");
      
       ## t/t1006-cat-file.sh ##
     -@@ t/t1006-cat-file.sh: test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
     - 	git -C all-two cat-file --batch-all-objects --batch="batman" >actual &&
     - 	cmp expect actual
     - '
     -+. "$TEST_DIRECTORY"/lib-gpg.sh
     -+. "$TEST_DIRECTORY"/lib-terminal.sh
     -+
     -+test_expect_success 'cat-file --batch|--batch-check setup' '
     -+	echo 1>blob1 &&
     -+	printf "a\0b\0\c" >blob2 &&
     -+	git add blob1 blob2 &&
     -+	git commit -m "Commit Message" &&
     -+	git branch -M main &&
     -+	git tag -a -m "v0.0.0" testtag &&
     -+	git update-ref refs/myblobs/blob1 HEAD:blob1 &&
     -+	git update-ref refs/myblobs/blob2 HEAD:blob2 &&
     -+	git update-ref refs/mytrees/tree1 HEAD^{tree}
     -+'
     -+
     -+batch_test_atom() {
     -+	if test "$3" = "fail"
     -+	then
     -+		test_expect_${4:-success} $PREREQ "basic atom: $1 $2 must fail" "
     -+			test_must_fail git cat-file --batch-check='$2' >bad <<-EOF
     -+			$1
     -+			EOF
     -+		"
     -+	else
     -+		test_expect_${4:-success} $PREREQ "basic atom: $1 $2" "
     -+			git for-each-ref --format='$2' $1 >expected &&
     -+			git cat-file --batch-check='$2' >actual <<-EOF &&
     -+			$1
     -+			EOF
     -+			sanitize_pgp <actual >actual.clean &&
     -+			cmp expected actual.clean
     -+		"
     -+	fi
     -+}
     -+
     -+batch_test_atom refs/heads/main '%(refname)' fail
     -+batch_test_atom refs/heads/main '%(refname:)' fail
     -+batch_test_atom refs/heads/main '%(refname:short)' fail
     -+batch_test_atom refs/heads/main '%(refname:lstrip=1)' fail
     -+batch_test_atom refs/heads/main '%(refname:lstrip=2)' fail
     -+batch_test_atom refs/heads/main '%(refname:lstrip=-1)' fail
     -+batch_test_atom refs/heads/main '%(refname:lstrip=-2)' fail
     -+batch_test_atom refs/heads/main '%(refname:rstrip=1)' fail
     -+batch_test_atom refs/heads/main '%(refname:rstrip=2)' fail
     -+batch_test_atom refs/heads/main '%(refname:rstrip=-1)' fail
     -+batch_test_atom refs/heads/main '%(refname:rstrip=-2)' fail
     -+batch_test_atom refs/heads/main '%(refname:strip=1)' fail
     -+batch_test_atom refs/heads/main '%(refname:strip=2)' fail
     -+batch_test_atom refs/heads/main '%(refname:strip=-1)' fail
     -+batch_test_atom refs/heads/main '%(refname:strip=-2)' fail
     -+batch_test_atom refs/heads/main '%(upstream)' fail
     -+batch_test_atom refs/heads/main '%(upstream:short)' fail
     -+batch_test_atom refs/heads/main '%(upstream:lstrip=2)' fail
     -+batch_test_atom refs/heads/main '%(upstream:lstrip=-2)' fail
     -+batch_test_atom refs/heads/main '%(upstream:rstrip=2)' fail
     -+batch_test_atom refs/heads/main '%(upstream:rstrip=-2)' fail
     -+batch_test_atom refs/heads/main '%(upstream:strip=2)' fail
     -+batch_test_atom refs/heads/main '%(upstream:strip=-2)' fail
     -+batch_test_atom refs/heads/main '%(push)' fail
     -+batch_test_atom refs/heads/main '%(push:short)' fail
     -+batch_test_atom refs/heads/main '%(push:lstrip=1)' fail
     -+batch_test_atom refs/heads/main '%(push:lstrip=-1)' fail
     -+batch_test_atom refs/heads/main '%(push:rstrip=1)' fail
     -+batch_test_atom refs/heads/main '%(push:rstrip=-1)' fail
     -+batch_test_atom refs/heads/main '%(push:strip=1)' fail
     -+batch_test_atom refs/heads/main '%(push:strip=-1)' fail
     -+batch_test_atom refs/heads/main '%(objecttype)'
     -+batch_test_atom refs/heads/main '%(objectsize)'
     -+batch_test_atom refs/heads/main '%(objectsize:disk)'
     -+batch_test_atom refs/heads/main '%(deltabase)'
     -+batch_test_atom refs/heads/main '%(objectname)'
     +@@ t/t1006-cat-file.sh: batch_test_atom refs/heads/main '%(objectsize)'
     + batch_test_atom refs/heads/main '%(objectsize:disk)'
     + batch_test_atom refs/heads/main '%(deltabase)'
     + batch_test_atom refs/heads/main '%(objectname)'
     +-batch_test_atom refs/heads/main '%(objectname:short)' fail
     +-batch_test_atom refs/heads/main '%(objectname:short=1)' fail
     +-batch_test_atom refs/heads/main '%(objectname:short=10)' fail
     +-batch_test_atom refs/heads/main '%(tree)' fail
     +-batch_test_atom refs/heads/main '%(tree:short)' fail
     +-batch_test_atom refs/heads/main '%(tree:short=1)' fail
     +-batch_test_atom refs/heads/main '%(tree:short=10)' fail
     +-batch_test_atom refs/heads/main '%(parent)' fail
     +-batch_test_atom refs/heads/main '%(parent:short)' fail
     +-batch_test_atom refs/heads/main '%(parent:short=1)' fail
     +-batch_test_atom refs/heads/main '%(parent:short=10)' fail
     +-batch_test_atom refs/heads/main '%(numparent)' fail
     +-batch_test_atom refs/heads/main '%(object)' fail
     +-batch_test_atom refs/heads/main '%(type)' fail
     +-batch_test_atom refs/heads/main '%(raw)' fail
     +-batch_test_atom refs/heads/main '%(*objectname)' fail
     +-batch_test_atom refs/heads/main '%(*objecttype)' fail
     +-batch_test_atom refs/heads/main '%(author)' fail
     +-batch_test_atom refs/heads/main '%(authorname)' fail
     +-batch_test_atom refs/heads/main '%(authoremail)' fail
     +-batch_test_atom refs/heads/main '%(authoremail:trim)' fail
     +-batch_test_atom refs/heads/main '%(authoremail:localpart)' fail
     +-batch_test_atom refs/heads/main '%(authordate)' fail
     +-batch_test_atom refs/heads/main '%(committer)' fail
     +-batch_test_atom refs/heads/main '%(committername)' fail
     +-batch_test_atom refs/heads/main '%(committeremail)' fail
     +-batch_test_atom refs/heads/main '%(committeremail:trim)' fail
     +-batch_test_atom refs/heads/main '%(committeremail:localpart)' fail
     +-batch_test_atom refs/heads/main '%(committerdate)' fail
     +-batch_test_atom refs/heads/main '%(tag)' fail
     +-batch_test_atom refs/heads/main '%(tagger)' fail
     +-batch_test_atom refs/heads/main '%(taggername)' fail
     +-batch_test_atom refs/heads/main '%(taggeremail)' fail
     +-batch_test_atom refs/heads/main '%(taggeremail:trim)' fail
     +-batch_test_atom refs/heads/main '%(taggeremail:localpart)' fail
     +-batch_test_atom refs/heads/main '%(taggerdate)' fail
     +-batch_test_atom refs/heads/main '%(creator)' fail
     +-batch_test_atom refs/heads/main '%(creatordate)' fail
     +-batch_test_atom refs/heads/main '%(subject)' fail
     +-batch_test_atom refs/heads/main '%(subject:sanitize)' fail
     +-batch_test_atom refs/heads/main '%(contents:subject)' fail
     +-batch_test_atom refs/heads/main '%(body)' fail
     +-batch_test_atom refs/heads/main '%(contents:body)' fail
     +-batch_test_atom refs/heads/main '%(contents:signature)' fail
     +-batch_test_atom refs/heads/main '%(contents)' fail
      +batch_test_atom refs/heads/main '%(objectname:short)'
      +batch_test_atom refs/heads/main '%(objectname:short=1)'
      +batch_test_atom refs/heads/main '%(objectname:short=10)'
     @@ t/t1006-cat-file.sh: test_expect_success 'cat-file --batch="batman" with --batch
      +batch_test_atom refs/heads/main '%(contents:body)'
      +batch_test_atom refs/heads/main '%(contents:signature)'
      +batch_test_atom refs/heads/main '%(contents)'
     -+batch_test_atom refs/heads/main '%(HEAD)' fail
     -+batch_test_atom refs/heads/main '%(upstream:track)' fail
     -+batch_test_atom refs/heads/main '%(upstream:trackshort)' fail
     -+batch_test_atom refs/heads/main '%(upstream:track,nobracket)' fail
     -+batch_test_atom refs/heads/main '%(upstream:nobracket,track)' fail
     -+batch_test_atom refs/heads/main '%(push:track)' fail
     -+batch_test_atom refs/heads/main '%(push:trackshort)' fail
     -+batch_test_atom refs/heads/main '%(worktreepath)' fail
     -+batch_test_atom refs/heads/main '%(symref)' fail
     -+batch_test_atom refs/heads/main '%(flag)' fail
     -+
     -+batch_test_atom refs/tags/testtag '%(refname)' fail
     -+batch_test_atom refs/tags/testtag '%(refname:short)' fail
     -+batch_test_atom refs/tags/testtag '%(upstream)' fail
     -+batch_test_atom refs/tags/testtag '%(push)' fail
     -+batch_test_atom refs/tags/testtag '%(objecttype)'
     -+batch_test_atom refs/tags/testtag '%(objectsize)'
     -+batch_test_atom refs/tags/testtag '%(objectsize:disk)'
     + batch_test_atom refs/heads/main '%(HEAD)' fail
     + batch_test_atom refs/heads/main '%(upstream:track)' fail
     + batch_test_atom refs/heads/main '%(upstream:trackshort)' fail
     +@@ t/t1006-cat-file.sh: batch_test_atom refs/tags/testtag '%(push)' fail
     + batch_test_atom refs/tags/testtag '%(objecttype)'
     + batch_test_atom refs/tags/testtag '%(objectsize)'
     + batch_test_atom refs/tags/testtag '%(objectsize:disk)'
     +-batch_test_atom refs/tags/testtag '%(*objectsize:disk)' fail
      +batch_test_atom refs/tags/testtag '%(*objectsize:disk)'
     -+batch_test_atom refs/tags/testtag '%(deltabase)'
     + batch_test_atom refs/tags/testtag '%(deltabase)'
     +-batch_test_atom refs/tags/testtag '%(*deltabase)' fail
      +batch_test_atom refs/tags/testtag '%(*deltabase)'
     -+batch_test_atom refs/tags/testtag '%(objectname)'
     + batch_test_atom refs/tags/testtag '%(objectname)'
     +-batch_test_atom refs/tags/testtag '%(objectname:short)' fail
     +-batch_test_atom refs/tags/testtag '%(tree)' fail
     +-batch_test_atom refs/tags/testtag '%(tree:short)' fail
     +-batch_test_atom refs/tags/testtag '%(tree:short=1)' fail
     +-batch_test_atom refs/tags/testtag '%(tree:short=10)' fail
     +-batch_test_atom refs/tags/testtag '%(parent)' fail
     +-batch_test_atom refs/tags/testtag '%(parent:short)' fail
     +-batch_test_atom refs/tags/testtag '%(parent:short=1)' fail
     +-batch_test_atom refs/tags/testtag '%(parent:short=10)' fail
     +-batch_test_atom refs/tags/testtag '%(numparent)' fail
     +-batch_test_atom refs/tags/testtag '%(object)' fail
     +-batch_test_atom refs/tags/testtag '%(type)' fail
     +-batch_test_atom refs/tags/testtag '%(*objectname)' fail
     +-batch_test_atom refs/tags/testtag '%(*objecttype)' fail
     +-batch_test_atom refs/tags/testtag '%(author)' fail
     +-batch_test_atom refs/tags/testtag '%(authorname)' fail
     +-batch_test_atom refs/tags/testtag '%(authoremail)' fail
     +-batch_test_atom refs/tags/testtag '%(authoremail:trim)' fail
     +-batch_test_atom refs/tags/testtag '%(authoremail:localpart)' fail
     +-batch_test_atom refs/tags/testtag '%(authordate)' fail
     +-batch_test_atom refs/tags/testtag '%(committer)' fail
     +-batch_test_atom refs/tags/testtag '%(committername)' fail
     +-batch_test_atom refs/tags/testtag '%(committeremail)' fail
     +-batch_test_atom refs/tags/testtag '%(committeremail:trim)' fail
     +-batch_test_atom refs/tags/testtag '%(committeremail:localpart)' fail
     +-batch_test_atom refs/tags/testtag '%(committerdate)' fail
     +-batch_test_atom refs/tags/testtag '%(tag)' fail
     +-batch_test_atom refs/tags/testtag '%(tagger)' fail
     +-batch_test_atom refs/tags/testtag '%(taggername)' fail
     +-batch_test_atom refs/tags/testtag '%(taggeremail)' fail
     +-batch_test_atom refs/tags/testtag '%(taggeremail:trim)' fail
     +-batch_test_atom refs/tags/testtag '%(taggeremail:localpart)' fail
     +-batch_test_atom refs/tags/testtag '%(taggerdate)' fail
     +-batch_test_atom refs/tags/testtag '%(creator)' fail
     +-batch_test_atom refs/tags/testtag '%(creatordate)' fail
     +-batch_test_atom refs/tags/testtag '%(subject)' fail
     +-batch_test_atom refs/tags/testtag '%(subject:sanitize)' fail
     +-batch_test_atom refs/tags/testtag '%(contents:subject)' fail
     +-batch_test_atom refs/tags/testtag '%(body)' fail
     +-batch_test_atom refs/tags/testtag '%(contents:body)' fail
     +-batch_test_atom refs/tags/testtag '%(contents:signature)' fail
     +-batch_test_atom refs/tags/testtag '%(contents)' fail
      +batch_test_atom refs/tags/testtag '%(objectname:short)'
      +batch_test_atom refs/tags/testtag '%(tree)'
      +batch_test_atom refs/tags/testtag '%(tree:short)'
     @@ t/t1006-cat-file.sh: test_expect_success 'cat-file --batch="batman" with --batch
      +batch_test_atom refs/tags/testtag '%(contents:body)'
      +batch_test_atom refs/tags/testtag '%(contents:signature)'
      +batch_test_atom refs/tags/testtag '%(contents)'
     -+batch_test_atom refs/tags/testtag '%(HEAD)' fail
     -+
     -+batch_test_atom refs/myblobs/blob1 '%(refname)' fail
     -+batch_test_atom refs/myblobs/blob1 '%(upstream)' fail
     -+batch_test_atom refs/myblobs/blob1 '%(push)' fail
     -+batch_test_atom refs/myblobs/blob1 '%(HEAD)' fail
     -+
     -+batch_test_atom refs/myblobs/blob1 '%(objectname)'
     -+batch_test_atom refs/myblobs/blob1 '%(objecttype)'
     -+batch_test_atom refs/myblobs/blob1 '%(objectsize)'
     -+batch_test_atom refs/myblobs/blob1 '%(objectsize:disk)'
     -+batch_test_atom refs/myblobs/blob1 '%(deltabase)'
     -+
     + batch_test_atom refs/tags/testtag '%(HEAD)' fail
     + 
     + batch_test_atom refs/myblobs/blob1 '%(refname)' fail
     +@@ t/t1006-cat-file.sh: batch_test_atom refs/myblobs/blob1 '%(objectsize)'
     + batch_test_atom refs/myblobs/blob1 '%(objectsize:disk)'
     + batch_test_atom refs/myblobs/blob1 '%(deltabase)'
     + 
     +-batch_test_atom refs/myblobs/blob1 '%(contents)' fail
     +-batch_test_atom refs/myblobs/blob2 '%(contents)' fail
      +batch_test_atom refs/myblobs/blob1 '%(contents)'
      +batch_test_atom refs/myblobs/blob2 '%(contents)'
     -+
     + 
     +-batch_test_atom refs/myblobs/blob1 '%(raw)' fail
     +-batch_test_atom refs/myblobs/blob2 '%(raw)' fail
     +-batch_test_atom refs/mytrees/tree1 '%(raw)' fail
      +batch_test_atom refs/myblobs/blob1 '%(raw)'
      +batch_test_atom refs/myblobs/blob2 '%(raw)'
      +batch_test_atom refs/mytrees/tree1 '%(raw)'
     -+
     + 
     +-batch_test_atom refs/myblobs/blob1 '%(raw:size)' fail
     +-batch_test_atom refs/myblobs/blob2 '%(raw:size)' fail
     +-batch_test_atom refs/mytrees/tree1 '%(raw:size)' fail
      +batch_test_atom refs/myblobs/blob1 '%(raw:size)'
      +batch_test_atom refs/myblobs/blob2 '%(raw:size)'
      +batch_test_atom refs/mytrees/tree1 '%(raw:size)'
     -+
     + 
     +-batch_test_atom refs/myblobs/blob1 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)' fail
     +-batch_test_atom refs/myblobs/blob2 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)' fail
     +-batch_test_atom refs/mytrees/tree1 '%(if:equals=tree)%(objecttype)%(then)tree%(else)not tree%(end)' fail
      +batch_test_atom refs/myblobs/blob1 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)'
      +batch_test_atom refs/myblobs/blob2 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)'
      +batch_test_atom refs/mytrees/tree1 '%(if:equals=tree)%(objecttype)%(then)tree%(else)not tree%(end)'
     -+
     + 
     +-batch_test_atom refs/heads/main '%(align:60) objectname is %(objectname)%(end)|%(objectname)' fail
     +-batch_test_atom refs/heads/main '%(align:left,60) objectname is %(objectname)%(end)|%(objectname)' fail
     +-batch_test_atom refs/heads/main '%(align:middle,60) objectname is %(objectname)%(end)|%(objectname)' fail
     +-batch_test_atom refs/heads/main '%(align:60,right) objectname is %(objectname)%(end)|%(objectname)' fail
      +batch_test_atom refs/heads/main '%(align:60) objectname is %(objectname)%(end)|%(objectname)'
      +batch_test_atom refs/heads/main '%(align:left,60) objectname is %(objectname)%(end)|%(objectname)'
      +batch_test_atom refs/heads/main '%(align:middle,60) objectname is %(objectname)%(end)|%(objectname)'
      +batch_test_atom refs/heads/main '%(align:60,right) objectname is %(objectname)%(end)|%(objectname)'
     -+
     -+batch_test_atom refs/heads/main 'VALID'
     -+batch_test_atom refs/heads/main '%(INVALID)' fail
     -+batch_test_atom refs/heads/main '%(authordate:INVALID)' fail
     -+
     -+test_expect_success '%(rest) works with both a branch and a tag' '
     -+	cat >expected <<-EOF &&
     -+	123 commit 123
     -+	456 tag 456
     -+	EOF
     -+	git cat-file --batch-check="%(rest) %(objecttype) %(rest)" >actual <<-EOF &&
     -+	refs/heads/main 123
     -+	refs/tags/testtag 456
     -+	EOF
     -+	test_cmp expected actual
     -+'
     -+
     -+batch_test_atom refs/heads/main '%(objectname) %(objecttype) %(objectsize)
     + 
     + batch_test_atom refs/heads/main 'VALID'
     + batch_test_atom refs/heads/main '%(INVALID)' fail
     + batch_test_atom refs/heads/main '%(authordate:INVALID)' fail
     + 
     + batch_test_atom refs/heads/main '%(objectname) %(objecttype) %(objectsize)
     +-%(raw)' fail
      +%(raw)'
     -+batch_test_atom refs/tags/testtag '%(objectname) %(objecttype) %(objectsize)
     + batch_test_atom refs/tags/testtag '%(objectname) %(objecttype) %(objectsize)
     +-%(raw)' fail
      +%(raw)'
     -+batch_test_atom refs/myblobs/blob1 '%(objectname) %(objecttype) %(objectsize)
     + batch_test_atom refs/myblobs/blob1 '%(objectname) %(objecttype) %(objectsize)
     +-%(raw)' fail
      +%(raw)'
     -+batch_test_atom refs/myblobs/blob2 '%(objectname) %(objecttype) %(objectsize)
     + batch_test_atom refs/myblobs/blob2 '%(objectname) %(objecttype) %(objectsize)
     +-%(raw)' fail
      +%(raw)'
      +
     -+
      +test_expect_success 'cat-file --batch equals to --batch-check with atoms' '
      +	git cat-file --batch-check="%(objectname) %(objecttype) %(objectsize)
      +%(raw)" >expected <<-EOF &&
 15:  e20780e9a6c = 14:  fd3901dfee6 [GSOC] cat-file: reuse err buf in batch_object_write()
 16:  fa74bf9451c = 15:  ed556e5f31e [GSOC] cat-file: re-implement --textconv, --filters options
 17:  ff74fa9f2f2 = 16:  96ef6431a2b [GSOC] ref-filter: remove grab_oid() function
 19:  c35e7dfe542 = 17:  5903d02324f [GSOC] cat-file: use fast path when using default_format

-- 
gitgitgadget


* [PATCH v2 01/17] [GSOC] ref-filter: add obj-type check in grab contents
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 02/17] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Only tag and commit objects use `grab_sub_body_contents()` to grab
object contents in the current codebase.  We want to teach the
function to also handle blobs and trees to get their raw data,
without parsing a blob (whose contents look like a commit or a tag)
incorrectly as a commit or a tag.

Skip the block of code that is specific to commits and tags early
when the given object is of another type, so that later patches can
add handling for other object types to this function.
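
For illustration only, the guard described above can be sketched
outside of git as follows. The enum and both function names are
hypothetical stand-ins, not identifiers from ref-filter.c; the sketch
assumes only what the diff shows, namely that the object type is
checked before the atom name is matched.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for git's object types; not the real enum. */
enum obj_kind { KIND_COMMIT, KIND_TREE, KIND_BLOB, KIND_TAG };

/* Returns 1 when `name` is one of the message-derived atoms. */
static int is_message_atom(const char *name)
{
	assert(name != NULL);
	return !strcmp(name, "body") ||
	       !strncmp(name, "subject", 7) ||
	       !strncmp(name, "trailers", 8) ||
	       !strncmp(name, "contents", 8);
}

/*
 * Returns 1 when the commit/tag-specific parsing should run.
 * Any other object type is rejected first, so a blob whose bytes
 * happen to look like a commit is never parsed as one.
 */
static int should_parse_message(enum obj_kind kind, const char *name)
{
	if (kind != KIND_TAG && kind != KIND_COMMIT)
		return 0;
	return is_message_atom(name);
}
```

With this shape, should_parse_message(KIND_BLOB, "contents") is 0
while should_parse_message(KIND_COMMIT, "contents:body") is 1.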

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 4db0e40ff4c..5cee6512fba 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1356,11 +1356,12 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size
 }
 
 /* See grab_values */
-static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
+static void grab_sub_body_contents(struct atom_value *val, int deref, struct expand_data *data)
 {
 	int i;
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
 	size_t sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;
+	void *buf = data->content;
 
 	for (i = 0; i < used_atom_cnt; i++) {
 		struct used_atom *atom = &used_atom[i];
@@ -1371,10 +1372,13 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			continue;
 		if (deref)
 			name++;
-		if (strcmp(name, "body") &&
-		    !starts_with(name, "subject") &&
-		    !starts_with(name, "trailers") &&
-		    !starts_with(name, "contents"))
+
+		if ((data->type != OBJ_TAG &&
+		     data->type != OBJ_COMMIT) ||
+		    (strcmp(name, "body") &&
+		     !starts_with(name, "subject") &&
+		     !starts_with(name, "trailers") &&
+		     !starts_with(name, "contents")))
 			continue;
 		if (!subpos)
 			find_subpos(buf,
@@ -1438,17 +1442,19 @@ static void fill_missing_values(struct atom_value *val)
  * pointed at by the ref itself; otherwise it is the object the
  * ref (which is a tag) refers to.
  */
-static void grab_values(struct atom_value *val, int deref, struct object *obj, void *buf)
+static void grab_values(struct atom_value *val, int deref, struct object *obj, struct expand_data *data)
 {
+	void *buf = data->content;
+
 	switch (obj->type) {
 	case OBJ_TAG:
 		grab_tag_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, data);
 		grab_person("tagger", val, deref, buf);
 		break;
 	case OBJ_COMMIT:
 		grab_commit_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, data);
 		grab_person("author", val, deref, buf);
 		grab_person("committer", val, deref, buf);
 		break;
@@ -1678,7 +1684,7 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
 					       oid_to_hex(&oi->oid), ref->refname);
 		}
-		grab_values(ref->value, deref, *obj, oi->content);
+		grab_values(ref->value, deref, *obj, oi);
 	}
 
 	grab_common_values(ref->value, deref, oi);
-- 
gitgitgadget



* [PATCH v2 02/17] [GSOC] ref-filter: add %(raw) atom
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 01/17] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 03/17] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Add a new formatting option, `%(raw)`, which prints the raw object
data without any changes. This is a further step toward migrating
all formatting logic from cat-file to ref-filter.

The raw data of blob and tree objects may contain '\0', but most of
the logic in `ref-filter` depends on the output of the atom being
text (specifically, no embedded NULs in it).

E.g. `quote_formatting()` uses `strbuf_addstr()` or `*_quote_buf()`
to add the data to the buffer. If the raw data of a tree object is
`100644 one\0...`, only `100644 one` will be added to the buffer,
which is incorrect.
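
A minimal standalone sketch of that truncation, assuming nothing
about git's strbuf API: append_cstr() and append_bytes() are
hypothetical helpers that mimic a strlen()-based append and a
length-based append respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies up to the first NUL, the way a C-string append would. */
static size_t append_cstr(char *dst, size_t cap, const char *src)
{
	size_t n = strlen(src);

	assert(n <= cap);
	memcpy(dst, src, n);
	return n; /* bytes copied; anything after the NUL is lost */
}

/* Copies exactly `len` bytes, embedded NULs included. */
static size_t append_bytes(char *dst, size_t cap, const char *src, size_t len)
{
	assert(len <= cap);
	memcpy(dst, src, len);
	return len;
}
```

For a 14-byte buffer that starts with `100644 one\0`, append_cstr()
copies only the first 10 bytes, while append_bytes() called with the
real length copies all 14.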

Therefore, we need a way to record the length of the atom_value's
member `s`. Although strbuf can already record a string and its
length, replacing the type of `s` with strbuf is not easy: many
places in ref-filter fill `v->s` with dynamically allocated memory.
We also need to check `v->s == NULL` in populate_value(), and strbuf
cannot easily distinguish NULL from an empty string, while a C-style
"const char *" can. So add a new member to `struct atom_value`,
`s_size`, which records the raw object size. It lets us add raw
object data to the buffer and compare two buffers that contain raw
object data.
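
The idea can be sketched in isolation as below. This is not the code
from ref-filter.c: `struct value`, SIZE_UNSET, value_len() and
value_cmp() are hypothetical names, and the sketch assumes a sentinel
size meaning "treat `s` as an ordinary C string".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel: no explicit size, `s` is NUL-terminated. */
#define SIZE_UNSET ((size_t)-1)

struct value {
	const char *s;
	size_t s_size;
};

/* Effective length: the recorded size, or strlen() for plain text. */
static size_t value_len(const struct value *v)
{
	assert(v->s != NULL);
	return v->s_size == SIZE_UNSET ? strlen(v->s) : v->s_size;
}

/* Byte-wise comparison; on a common prefix the shorter value sorts first. */
static int value_cmp(const struct value *a, const struct value *b)
{
	size_t alen = value_len(a), blen = value_len(b);
	size_t min = alen < blen ? alen : blen;
	int cmp = memcmp(a->s, b->s, min);

	if (cmp)
		return cmp;
	if (alen > blen)
		return 1;
	if (alen < blen)
		return -1;
	return 0;
}
```

Keeping the pointer and the size as separate members means a NULL
`s` can still be told apart from an empty value of size 0.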

Note that `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, or `--perl`, because the string variable types of such
languages may not support arbitrary binary data.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Bagas Sanjaya <bagasdotme@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-for-each-ref.txt |   9 ++
 ref-filter.c                       | 140 +++++++++++++++----
 t/t6300-for-each-ref.sh            | 216 +++++++++++++++++++++++++++++
 3 files changed, 338 insertions(+), 27 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 2ae2478de70..cbb6f87d13f 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -235,6 +235,15 @@ and `date` to extract the named component.  For email fields (`authoremail`,
 without angle brackets, and `:localpart` to get the part before the `@` symbol
 out of the trimmed email.
 
+The raw data in an object is `raw`.
+
+raw:size::
+	The raw data size of the object.
+
+Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
+`--perl` because such language may not support arbitrary binary data in their
+string variable type.
+
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
 
diff --git a/ref-filter.c b/ref-filter.c
index 5cee6512fba..506fbc3d691 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -144,6 +144,7 @@ enum atom_type {
 	ATOM_BODY,
 	ATOM_TRAILERS,
 	ATOM_CONTENTS,
+	ATOM_RAW,
 	ATOM_UPSTREAM,
 	ATOM_PUSH,
 	ATOM_SYMREF,
@@ -189,6 +190,9 @@ static struct used_atom {
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
+		struct {
+			enum { RAW_BARE, RAW_LENGTH } option;
+		} raw_data;
 		struct {
 			cmp_status cmp_status;
 			const char *str;
@@ -426,6 +430,18 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
+static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+				const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		atom->u.raw_data.option = RAW_BARE;
+	else if (!strcmp(arg, "size"))
+		atom->u.raw_data.option = RAW_LENGTH;
+	else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
+	return 0;
+}
+
 static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
 			   const char *arg, struct strbuf *err)
 {
@@ -586,6 +602,7 @@ static struct {
 	[ATOM_BODY] = { "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
 	[ATOM_TRAILERS] = { "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
 	[ATOM_CONTENTS] = { "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
+	[ATOM_RAW] = { "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
 	[ATOM_UPSTREAM] = { "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_PUSH] = { "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_SYMREF] = { "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },
@@ -620,12 +637,15 @@ struct ref_formatting_state {
 
 struct atom_value {
 	const char *s;
+	size_t s_size;
 	int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
 		       struct strbuf *err);
 	uintmax_t value; /* used for sorting when not FIELD_STR */
 	struct used_atom *atom;
 };
 
+#define ATOM_VALUE_S_SIZE_INIT (-1)
+
 /*
  * Used to parse format string and sort specifiers
  */
@@ -644,13 +664,6 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 		return strbuf_addf_ret(err, -1, _("malformed field name: %.*s"),
 				       (int)(ep-atom), atom);
 
-	/* Do we have the atom already used elsewhere? */
-	for (i = 0; i < used_atom_cnt; i++) {
-		int len = strlen(used_atom[i].name);
-		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
-			return i;
-	}
-
 	/*
 	 * If the atom name has a colon, strip it and everything after
 	 * it off - it specifies the format for this entry, and
@@ -660,6 +673,13 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	arg = memchr(sp, ':', ep - sp);
 	atom_len = (arg ? arg : ep) - sp;
 
+	/* Do we have the atom already used elsewhere? */
+	for (i = 0; i < used_atom_cnt; i++) {
+		int len = strlen(used_atom[i].name);
+		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
+			return i;
+	}
+
 	/* Is the atom a valid one? */
 	for (i = 0; i < ARRAY_SIZE(valid_atom); i++) {
 		int len = strlen(valid_atom[i].name);
@@ -709,11 +729,14 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	return at;
 }
 
-static void quote_formatting(struct strbuf *s, const char *str, int quote_style)
+static void quote_formatting(struct strbuf *s, const char *str, size_t len, int quote_style)
 {
 	switch (quote_style) {
 	case QUOTE_NONE:
-		strbuf_addstr(s, str);
+		if (len != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(s, str, len);
+		else
+			strbuf_addstr(s, str);
 		break;
 	case QUOTE_SHELL:
 		sq_quote_buf(s, str);
@@ -740,9 +763,12 @@ static int append_atom(struct atom_value *v, struct ref_formatting_state *state,
 	 * encountered.
 	 */
 	if (!state->stack->prev)
-		quote_formatting(&state->stack->output, v->s, state->quote_style);
+		quote_formatting(&state->stack->output, v->s, v->s_size, state->quote_style);
 	else
-		strbuf_addstr(&state->stack->output, v->s);
+		if (v->s_size != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(&state->stack->output, v->s, v->s_size);
+		else
+			strbuf_addstr(&state->stack->output, v->s);
 	return 0;
 }
 
@@ -842,21 +868,23 @@ static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state
 	return 0;
 }
 
-static int is_empty(const char *s)
+static int is_empty(struct strbuf *buf)
 {
-	while (*s != '\0') {
-		if (!isspace(*s))
-			return 0;
-		s++;
-	}
-	return 1;
-}
+	const char *cur = buf->buf;
+	const char *end = buf->buf + buf->len;
+
+	while (cur != end && (isspace(*cur)))
+		cur++;
+
+	return cur == end;
+ }
 
 static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
 			     struct strbuf *err)
 {
 	struct ref_formatting_stack *cur = state->stack;
 	struct if_then_else *if_then_else = NULL;
+	size_t str_len = 0;
 
 	if (cur->at_end == if_then_else_handler)
 		if_then_else = (struct if_then_else *)cur->at_end_data;
@@ -867,18 +895,22 @@ static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_sta
 	if (if_then_else->else_atom_seen)
 		return strbuf_addf_ret(err, -1, _("format: %%(then) atom used after %%(else)"));
 	if_then_else->then_atom_seen = 1;
+	if (if_then_else->str)
+		str_len = strlen(if_then_else->str);
 	/*
 	 * If the 'equals' or 'notequals' attribute is used then
 	 * perform the required comparison. If not, only non-empty
 	 * strings satisfy the 'if' condition.
 	 */
 	if (if_then_else->cmp_status == COMPARE_EQUAL) {
-		if (!strcmp(if_then_else->str, cur->output.buf))
+		if (str_len == cur->output.len &&
+		    !memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
 	} else if (if_then_else->cmp_status == COMPARE_UNEQUAL) {
-		if (strcmp(if_then_else->str, cur->output.buf))
+		if (str_len != cur->output.len ||
+		    memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
-	} else if (cur->output.len && !is_empty(cur->output.buf))
+	} else if (cur->output.len && !is_empty(&cur->output))
 		if_then_else->condition_satisfied = 1;
 	strbuf_reset(&cur->output);
 	return 0;
@@ -924,7 +956,7 @@ static int end_atom_handler(struct atom_value *atomv, struct ref_formatting_stat
 	 * only on the topmost supporting atom.
 	 */
 	if (!current->prev->prev) {
-		quote_formatting(&s, current->output.buf, state->quote_style);
+		quote_formatting(&s, current->output.buf, current->output.len, state->quote_style);
 		strbuf_swap(&current->output, &s);
 	}
 	strbuf_release(&s);
@@ -974,6 +1006,10 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
+		if (format->quote_style && used_atom[at].atom_type == ATOM_RAW &&
+		    used_atom[at].u.raw_data.option == RAW_BARE)
+			die(_("--format=%.*s cannot be used with "
+			      "--python, --shell, --tcl, --perl"), (int)(ep - sp - 2), sp + 2);
 		cp = ep + 1;
 
 		if (skip_prefix(used_atom[at].name, "color:", &color))
@@ -1367,12 +1403,25 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
 		struct used_atom *atom = &used_atom[i];
 		const char *name = atom->name;
 		struct atom_value *v = &val[i];
+		enum atom_type atom_type = atom->atom_type;
 
 		if (!!deref != (*name == '*'))
 			continue;
 		if (deref)
 			name++;
 
+		if (atom_type == ATOM_RAW) {
+			unsigned long buf_size = data->size;
+
+			if (atom->u.raw_data.option == RAW_BARE) {
+				v->s = xmemdupz(buf, buf_size);
+				v->s_size = buf_size;
+			} else if (atom->u.raw_data.option == RAW_LENGTH) {
+				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
+			}
+			continue;
+		}
+
 		if ((data->type != OBJ_TAG &&
 		     data->type != OBJ_COMMIT) ||
 		    (strcmp(name, "body") &&
@@ -1460,9 +1509,11 @@ static void grab_values(struct atom_value *val, int deref, struct object *obj, s
 		break;
 	case OBJ_TREE:
 		/* grab_tree_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, data);
 		break;
 	case OBJ_BLOB:
 		/* grab_blob_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, data);
 		break;
 	default:
 		die("Eh?  Object of type %d?", obj->type);
@@ -1766,6 +1817,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 		const char *refname;
 		struct branch *branch = NULL;
 
+		v->s_size = ATOM_VALUE_S_SIZE_INIT;
 		v->handler = append_atom;
 		v->atom = atom;
 
@@ -2369,6 +2421,19 @@ static int compare_detached_head(struct ref_array_item *a, struct ref_array_item
 	return 0;
 }
 
+static int memcasecmp(const void *vs1, const void *vs2, size_t n)
+{
+	const char *s1 = vs1, *s2 = vs2;
+	const char *end = s1 + n;
+
+	for (; s1 < end; s1++, s2++) {
+		int diff = tolower(*s1) - tolower(*s2);
+		if (diff)
+			return diff;
+	}
+	return 0;
+}
+
 static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, struct ref_array_item *b)
 {
 	struct atom_value *va, *vb;
@@ -2389,10 +2454,30 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
 	} else if (s->sort_flags & REF_SORTING_VERSION) {
 		cmp = versioncmp(va->s, vb->s);
 	} else if (cmp_type == FIELD_STR) {
-		int (*cmp_fn)(const char *, const char *);
-		cmp_fn = s->sort_flags & REF_SORTING_ICASE
-			? strcasecmp : strcmp;
-		cmp = cmp_fn(va->s, vb->s);
+		if (va->s_size == ATOM_VALUE_S_SIZE_INIT &&
+		    vb->s_size == ATOM_VALUE_S_SIZE_INIT) {
+			int (*cmp_fn)(const char *, const char *);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? strcasecmp : strcmp;
+			cmp = cmp_fn(va->s, vb->s);
+		} else {
+			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(va->s) : va->s_size;
+			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(vb->s) : vb->s_size;
+			int (*cmp_fn)(const void *, const void *, size_t);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? memcasecmp : memcmp;
+
+			cmp = cmp_fn(va->s, vb->s, b_size > a_size ?
+				     a_size : b_size);
+			if (!cmp) {
+				if (a_size > b_size)
+					cmp = 1;
+				else if (a_size < b_size)
+					cmp = -1;
+			}
+		}
 	} else {
 		if (va->value < vb->value)
 			cmp = -1;
@@ -2492,6 +2577,7 @@ int format_ref_array_item(struct ref_array_item *info,
 	}
 	if (format->need_color_reset_at_eol) {
 		struct atom_value resetv;
+		resetv.s_size = ATOM_VALUE_S_SIZE_INIT;
 		resetv.s = GIT_COLOR_RESET;
 		if (append_atom(&resetv, &state, error_buf)) {
 			pop_stack_element(&state.stack);
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9e0214076b4..18554f62d94 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -130,6 +130,8 @@ test_atom head parent:short=10 ''
 test_atom head numparent 0
 test_atom head object ''
 test_atom head type ''
+test_atom head raw "$(git cat-file commit refs/heads/main)
+"
 test_atom head '*objectname' ''
 test_atom head '*objecttype' ''
 test_atom head author 'A U Thor <author@example.com> 1151968724 +0200'
@@ -221,6 +223,15 @@ test_atom tag contents 'Tagging at 1151968727
 '
 test_atom tag HEAD ' '
 
+test_expect_success 'basic atom: refs/tags/testtag *raw' '
+	git cat-file commit refs/tags/testtag^{} >expected &&
+	git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	echo >>expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'Check invalid atoms names are errors' '
 	test_must_fail git for-each-ref --format="%(INVALID)" refs/heads
 '
@@ -686,6 +697,15 @@ test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
 test_atom refs/tags/signed-empty contents "$sig"
 
+test_expect_success GPG 'basic atom: refs/tags/signed-empty raw' '
+	git cat-file tag refs/tags/signed-empty >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-empty >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	echo >>expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short subject:sanitize 'subject-line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
@@ -695,6 +715,15 @@ test_atom refs/tags/signed-short contents:signature "$sig"
 test_atom refs/tags/signed-short contents "subject line
 $sig"
 
+test_expect_success GPG 'basic atom: refs/tags/signed-short raw' '
+	git cat-file tag refs/tags/signed-short >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-short >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	echo >>expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long subject:sanitize 'subject-line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
@@ -708,6 +737,15 @@ test_atom refs/tags/signed-long contents "subject line
 body contents
 $sig"
 
+test_expect_success GPG 'basic atom: refs/tags/signed-long raw' '
+	git cat-file tag refs/tags/signed-long >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-long >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	echo >>expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'set up refs pointing to tree and blob' '
 	git update-ref refs/mytrees/first refs/heads/main^{tree} &&
 	git update-ref refs/myblobs/first refs/heads/main:one
@@ -720,6 +758,16 @@ test_atom refs/mytrees/first contents:body ""
 test_atom refs/mytrees/first contents:signature ""
 test_atom refs/mytrees/first contents ""
 
+test_expect_success 'basic atom: refs/mytrees/first raw' '
+	git cat-file tree refs/mytrees/first >expected &&
+	echo >>expected &&
+	git for-each-ref --format="%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/mytrees/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_atom refs/myblobs/first subject ""
 test_atom refs/myblobs/first contents:subject ""
 test_atom refs/myblobs/first body ""
@@ -727,6 +775,174 @@ test_atom refs/myblobs/first contents:body ""
 test_atom refs/myblobs/first contents:signature ""
 test_atom refs/myblobs/first contents ""
 
+test_expect_success 'basic atom: refs/myblobs/first raw' '
+	git cat-file blob refs/myblobs/first >expected &&
+	echo >>expected &&
+	git for-each-ref --format="%(raw)" refs/myblobs/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/myblobs/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/myblobs/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'set up refs pointing to binary blob' '
+	printf "a\0b\0c" >blob1 &&
+	printf "a\0c\0b" >blob2 &&
+	printf "\0a\0b\0c" >blob3 &&
+	printf "abc" >blob4 &&
+	printf "\0 \0 \0 " >blob5 &&
+	printf "\0 \0a\0 " >blob6 &&
+	printf "  " >blob7 &&
+	>blob8 &&
+	obj=$(git hash-object -w blob1) &&
+	git update-ref refs/myblobs/blob1 "$obj" &&
+	obj=$(git hash-object -w blob2) &&
+	git update-ref refs/myblobs/blob2 "$obj" &&
+	obj=$(git hash-object -w blob3) &&
+	git update-ref refs/myblobs/blob3 "$obj" &&
+	obj=$(git hash-object -w blob4) &&
+	git update-ref refs/myblobs/blob4 "$obj" &&
+	obj=$(git hash-object -w blob5) &&
+	git update-ref refs/myblobs/blob5 "$obj" &&
+	obj=$(git hash-object -w blob6) &&
+	git update-ref refs/myblobs/blob6 "$obj" &&
+	obj=$(git hash-object -w blob7) &&
+	git update-ref refs/myblobs/blob7 "$obj" &&
+	obj=$(git hash-object -w blob8) &&
+	git update-ref refs/myblobs/blob8 "$obj"
+'
+
+test_expect_success 'Verify sorts with raw' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob3
+	refs/myblobs/blob7
+	refs/mytrees/first
+	refs/myblobs/first
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob4
+	refs/heads/main
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'Verify sorts with raw:size' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/first
+	refs/myblobs/blob7
+	refs/heads/main
+	refs/myblobs/blob4
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/mytrees/first
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw:size \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'validate raw atom with %(if:equals)' '
+	cat >expected <<-EOF &&
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	refs/myblobs/blob4
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	EOF
+	git for-each-ref --format="%(if:equals=abc)%(raw)%(then)%(refname)%(else)not equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'validate raw atom with %(if:notequals)' '
+	cat >expected <<-EOF &&
+	refs/heads/ambiguous
+	refs/heads/main
+	refs/heads/newtag
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	equals
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob7
+	refs/myblobs/blob8
+	refs/myblobs/first
+	EOF
+	git for-each-ref --format="%(if:notequals=abc)%(raw)%(then)%(refname)%(else)equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'empty raw refs with %(if)' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob1 not empty
+	refs/myblobs/blob2 not empty
+	refs/myblobs/blob3 not empty
+	refs/myblobs/blob4 not empty
+	refs/myblobs/blob5 not empty
+	refs/myblobs/blob6 not empty
+	refs/myblobs/blob7 empty
+	refs/myblobs/blob8 empty
+	refs/myblobs/first not empty
+	EOF
+	git for-each-ref --format="%(refname) %(if)%(raw)%(then)not empty%(else)empty%(end)" \
+		refs/myblobs/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success '%(raw) with --python must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --python
+'
+
+test_expect_success '%(raw) with --tcl must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --tcl
+'
+
+test_expect_success '%(raw) with --perl must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --perl
+'
+
+test_expect_success '%(raw) with --shell must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --shell
+'
+
+test_expect_success '%(raw) with --shell and --sort=raw must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
+'
+
+test_expect_success '%(raw:size) with --shell' '
+	git for-each-ref --format="%(raw:size)" | while read line
+	do
+		echo "'\''$line'\''" >>expect
+	done &&
+	git for-each-ref --format="%(raw:size)" --shell >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'for-each-ref --format compare with cat-file --batch' '
+	git rev-parse refs/mytrees/first | git cat-file --batch >expected &&
+	git for-each-ref --format="%(objectname) %(objecttype) %(objectsize)
+%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'set up multiple-sort tags' '
 	for when in 100000 200000
 	do
-- 
gitgitgadget



* [PATCH v2 03/17] [GSOC] ref-filter: --format=%(raw) re-support --perl
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 01/17] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 02/17] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 04/17] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Perl can handle binary data correctly, so add
perl_quote_buf_with_len(), which takes an explicit data length and
therefore does not truncate the data at '\0'. This lets
`--format="%(raw)"` support `--perl` again.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
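To illustrate the length-aware quoting described above, here is a minimal
standalone sketch. It is not the code in this patch: the name
perl_quote_with_len() and the malloc-based buffer are assumptions made so
that the example compiles without Git's strbuf API.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for perl_quote_buf_with_len(): quote `len` bytes
 * of `src` as a Perl single-quoted literal. Returns a malloc'd buffer and
 * stores its length in *out_len. The result may contain NUL bytes, so the
 * caller must use *out_len rather than strlen().
 */
char *perl_quote_with_len(const char *src, size_t len, size_t *out_len)
{
	/* Worst case: every byte is escaped, plus the two enclosing quotes. */
	char *out = malloc(2 * len + 2);
	size_t n = 0, i;

	if (!out)
		return NULL;
	out[n++] = '\'';
	for (i = 0; i < len; i++) {
		/* Only the quote and the backslash need escaping. */
		if (src[i] == '\'' || src[i] == '\\')
			out[n++] = '\\';
		out[n++] = src[i];
	}
	out[n++] = '\'';
	*out_len = n;
	return out;
}
```

Because the loop is bounded by `len` instead of stopping at the first NUL,
a blob such as "a\0b" is quoted in full.
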
 Documentation/git-for-each-ref.txt |  4 ++--
 quote.c                            | 17 +++++++++++++++++
 quote.h                            |  1 +
 ref-filter.c                       | 15 +++++++++++----
 t/t6300-for-each-ref.sh            | 19 +++++++++++++++++--
 5 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index cbb6f87d13f..6da899c6296 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -241,8 +241,8 @@ raw:size::
 	The raw data size of the object.
 
 Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
-`--perl` because such language may not support arbitrary binary data in their
-string variable type.
+because such languages may not support arbitrary binary data in their
+string variable types.
 
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
diff --git a/quote.c b/quote.c
index 8a3a5e39eb1..26719d21d1e 100644
--- a/quote.c
+++ b/quote.c
@@ -471,6 +471,23 @@ void perl_quote_buf(struct strbuf *sb, const char *src)
 	strbuf_addch(sb, sq);
 }
 
+void perl_quote_buf_with_len(struct strbuf *sb, const char *src, size_t len)
+{
+	const char sq = '\'';
+	const char bq = '\\';
+	const char *c = src;
+	const char *end = src + len;
+
+	strbuf_addch(sb, sq);
+	while (c != end) {
+		if (*c == sq || *c == bq)
+			strbuf_addch(sb, bq);
+		strbuf_addch(sb, *c);
+		c++;
+	}
+	strbuf_addch(sb, sq);
+}
+
 void python_quote_buf(struct strbuf *sb, const char *src)
 {
 	const char sq = '\'';
diff --git a/quote.h b/quote.h
index 768cc6338e2..0fe69e264b0 100644
--- a/quote.h
+++ b/quote.h
@@ -94,6 +94,7 @@ char *quote_path(const char *in, const char *prefix, struct strbuf *out, unsigne
 
 /* quoting as a string literal for other languages */
 void perl_quote_buf(struct strbuf *sb, const char *src);
+void perl_quote_buf_with_len(struct strbuf *sb, const char *src, size_t len);
 void python_quote_buf(struct strbuf *sb, const char *src);
 void tcl_quote_buf(struct strbuf *sb, const char *src);
 void basic_regex_quote_buf(struct strbuf *sb, const char *src);
diff --git a/ref-filter.c b/ref-filter.c
index 506fbc3d691..ba9ab35d7ec 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -742,7 +742,10 @@ static void quote_formatting(struct strbuf *s, const char *str, size_t len, int
 		sq_quote_buf(s, str);
 		break;
 	case QUOTE_PERL:
-		perl_quote_buf(s, str);
+		if (len != ATOM_VALUE_S_SIZE_INIT)
+			perl_quote_buf_with_len(s, str, len);
+		else
+			perl_quote_buf(s, str);
 		break;
 	case QUOTE_PYTHON:
 		python_quote_buf(s, str);
@@ -1006,10 +1009,14 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
-		if (format->quote_style && used_atom[at].atom_type == ATOM_RAW &&
-		    used_atom[at].u.raw_data.option == RAW_BARE)
+
+		if ((format->quote_style == QUOTE_PYTHON ||
+		     format->quote_style == QUOTE_SHELL ||
+		     format->quote_style == QUOTE_TCL) &&
+		     used_atom[at].atom_type == ATOM_RAW &&
+		     used_atom[at].u.raw_data.option == RAW_BARE)
 			die(_("--format=%.*s cannot be used with"
-			      "--python, --shell, --tcl, --perl"), (int)(ep - sp - 2), sp + 2);
+			      " --python, --shell, --tcl"), (int)(ep - sp - 2), sp + 2);
 		cp = ep + 1;
 
 		if (skip_prefix(used_atom[at].name, "color:", &color))
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 18554f62d94..3d15d0a5360 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -915,8 +915,23 @@ test_expect_success '%(raw) with --tcl must fail' '
 	test_must_fail git for-each-ref --format="%(raw)" --tcl
 '
 
-test_expect_success '%(raw) with --perl must fail' '
-	test_must_fail git for-each-ref --format="%(raw)" --perl
+test_expect_success '%(raw) with --perl' '
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/myblobs/blob1 --perl | perl >actual &&
+	cmp blob1 actual &&
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/myblobs/blob3 --perl | perl >actual &&
+	cmp blob3 actual &&
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/myblobs/blob8 --perl | perl >actual &&
+	cmp blob8 actual &&
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/myblobs/first --perl | perl >actual &&
+	cmp one actual &&
+	git cat-file tree refs/mytrees/first >expected &&
+	git for-each-ref --format="\$name= %(raw);
+print \"\$name\"" refs/mytrees/first --perl | perl >actual &&
+	cmp expected actual
 '
 
 test_expect_success '%(raw) with --shell must fail' '
-- 
gitgitgadget



* [PATCH v2 04/17] [GSOC] ref-filter: use non-const ref_format in *_atom_parser()
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 03/17] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 05/17] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Drop the const qualifier from the ref_format parameter of the
*_atom_parser() functions so that the parsers can modify members of
ref_format.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/tag.c |  2 +-
 ref-filter.c  | 44 ++++++++++++++++++++++----------------------
 ref-filter.h  |  4 ++--
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 82fcfc09824..452558ec957 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -146,7 +146,7 @@ static int verify_tag(const char *name, const char *ref,
 		      const struct object_id *oid, void *cb_data)
 {
 	int flags;
-	const struct ref_format *format = cb_data;
+	struct ref_format *format = cb_data;
 	flags = GPG_VERIFY_VERBOSE;
 
 	if (format->format)
diff --git a/ref-filter.c b/ref-filter.c
index ba9ab35d7ec..c8e561a3687 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -226,7 +226,7 @@ static int strbuf_addf_ret(struct strbuf *sb, int ret, const char *fmt, ...)
 	return ret;
 }
 
-static int color_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int color_atom_parser(struct ref_format *format, struct used_atom *atom,
 			     const char *color_value, struct strbuf *err)
 {
 	if (!color_value)
@@ -264,7 +264,7 @@ static int refname_atom_parser_internal(struct refname_atom *atom, const char *a
 	return 0;
 }
 
-static int remote_ref_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int remote_ref_atom_parser(struct ref_format *format, struct used_atom *atom,
 				  const char *arg, struct strbuf *err)
 {
 	struct string_list params = STRING_LIST_INIT_DUP;
@@ -311,7 +311,7 @@ static int remote_ref_atom_parser(const struct ref_format *format, struct used_a
 	return 0;
 }
 
-static int objecttype_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int objecttype_atom_parser(struct ref_format *format, struct used_atom *atom,
 				  const char *arg, struct strbuf *err)
 {
 	if (arg)
@@ -323,7 +323,7 @@ static int objecttype_atom_parser(const struct ref_format *format, struct used_a
 	return 0;
 }
 
-static int objectsize_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int objectsize_atom_parser(struct ref_format *format, struct used_atom *atom,
 				  const char *arg, struct strbuf *err)
 {
 	if (!arg) {
@@ -343,7 +343,7 @@ static int objectsize_atom_parser(const struct ref_format *format, struct used_a
 	return 0;
 }
 
-static int deltabase_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int deltabase_atom_parser(struct ref_format *format, struct used_atom *atom,
 				 const char *arg, struct strbuf *err)
 {
 	if (arg)
@@ -355,7 +355,7 @@ static int deltabase_atom_parser(const struct ref_format *format, struct used_at
 	return 0;
 }
 
-static int body_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int body_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *err)
 {
 	if (arg)
@@ -364,7 +364,7 @@ static int body_atom_parser(const struct ref_format *format, struct used_atom *a
 	return 0;
 }
 
-static int subject_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int subject_atom_parser(struct ref_format *format, struct used_atom *atom,
 			       const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -376,7 +376,7 @@ static int subject_atom_parser(const struct ref_format *format, struct used_atom
 	return 0;
 }
 
-static int trailers_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int trailers_atom_parser(struct ref_format *format, struct used_atom *atom,
 				const char *arg, struct strbuf *err)
 {
 	atom->u.contents.trailer_opts.no_divider = 1;
@@ -402,7 +402,7 @@ static int trailers_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
-static int contents_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int contents_atom_parser(struct ref_format *format, struct used_atom *atom,
 				const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -430,7 +430,7 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
-static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int raw_atom_parser(struct ref_format *format, struct used_atom *atom,
 				const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -442,7 +442,7 @@ static int raw_atom_parser(const struct ref_format *format, struct used_atom *at
 	return 0;
 }
 
-static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int oid_atom_parser(struct ref_format *format, struct used_atom *atom,
 			   const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -461,7 +461,7 @@ static int oid_atom_parser(const struct ref_format *format, struct used_atom *at
 	return 0;
 }
 
-static int person_email_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int person_email_atom_parser(struct ref_format *format, struct used_atom *atom,
 				    const char *arg, struct strbuf *err)
 {
 	if (!arg)
@@ -475,7 +475,7 @@ static int person_email_atom_parser(const struct ref_format *format, struct used
 	return 0;
 }
 
-static int refname_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int refname_atom_parser(struct ref_format *format, struct used_atom *atom,
 			       const char *arg, struct strbuf *err)
 {
 	return refname_atom_parser_internal(&atom->u.refname, arg, atom->name, err);
@@ -492,7 +492,7 @@ static align_type parse_align_position(const char *s)
 	return -1;
 }
 
-static int align_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int align_atom_parser(struct ref_format *format, struct used_atom *atom,
 			     const char *arg, struct strbuf *err)
 {
 	struct align *align = &atom->u.align;
@@ -544,7 +544,7 @@ static int align_atom_parser(const struct ref_format *format, struct used_atom *
 	return 0;
 }
 
-static int if_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int if_atom_parser(struct ref_format *format, struct used_atom *atom,
 			  const char *arg, struct strbuf *err)
 {
 	if (!arg) {
@@ -559,7 +559,7 @@ static int if_atom_parser(const struct ref_format *format, struct used_atom *ato
 	return 0;
 }
 
-static int head_atom_parser(const struct ref_format *format, struct used_atom *atom,
+static int head_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *unused_err)
 {
 	atom->u.head = resolve_refdup("HEAD", RESOLVE_REF_READING, NULL, NULL);
@@ -570,7 +570,7 @@ static struct {
 	const char *name;
 	info_source source;
 	cmp_type cmp_type;
-	int (*parser)(const struct ref_format *format, struct used_atom *atom,
+	int (*parser)(struct ref_format *format, struct used_atom *atom,
 		      const char *arg, struct strbuf *err);
 } valid_atom[] = {
 	[ATOM_REFNAME] = { "refname", SOURCE_NONE, FIELD_STR, refname_atom_parser },
@@ -649,7 +649,7 @@ struct atom_value {
 /*
  * Used to parse format string and sort specifiers
  */
-static int parse_ref_filter_atom(const struct ref_format *format,
+static int parse_ref_filter_atom(struct ref_format *format,
 				 const char *atom, const char *ep,
 				 struct strbuf *err)
 {
@@ -2554,9 +2554,9 @@ static void append_literal(const char *cp, const char *ep, struct ref_formatting
 }
 
 int format_ref_array_item(struct ref_array_item *info,
-			   const struct ref_format *format,
-			   struct strbuf *final_buf,
-			   struct strbuf *error_buf)
+			  struct ref_format *format,
+			  struct strbuf *final_buf,
+			  struct strbuf *error_buf)
 {
 	const char *cp, *sp, *ep;
 	struct ref_formatting_state state = REF_FORMATTING_STATE_INIT;
@@ -2601,7 +2601,7 @@ int format_ref_array_item(struct ref_array_item *info,
 }
 
 void pretty_print_ref(const char *name, const struct object_id *oid,
-		      const struct ref_format *format)
+		      struct ref_format *format)
 {
 	struct ref_array_item *ref_item;
 	struct strbuf output = STRBUF_INIT;
diff --git a/ref-filter.h b/ref-filter.h
index baf72a71896..74fb423fc89 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -116,7 +116,7 @@ void ref_array_sort(struct ref_sorting *sort, struct ref_array *array);
 void ref_sorting_set_sort_flags_all(struct ref_sorting *sorting, unsigned int mask, int on);
 /*  Based on the given format and quote_style, fill the strbuf */
 int format_ref_array_item(struct ref_array_item *info,
-			  const struct ref_format *format,
+			  struct ref_format *format,
 			  struct strbuf *final_buf,
 			  struct strbuf *error_buf);
 /*  Parse a single sort specifier and add it to the list */
@@ -137,7 +137,7 @@ void setup_ref_filter_porcelain_msg(void);
  * name must be a fully qualified refname.
  */
 void pretty_print_ref(const char *name, const struct object_id *oid,
-		      const struct ref_format *format);
+		      struct ref_format *format);
 
 /*
  * Push a single ref onto the array; this can be used to construct your own
-- 
gitgitgadget



* [PATCH v2 05/17] [GSOC] ref-filter: add %(rest) atom
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 04/17] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 06/17] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In order to let "cat-file --batch=%(rest)" use the ref-filter
interface, add the %(rest) atom to ref-filter. Introduce
reject_atom() to reject %(rest) for "git for-each-ref",
"git branch", "git tag" and "git verify-tag".

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
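For readers unfamiliar with %(rest) in cat-file: when the atom is used,
each input line is split at the first run of whitespace, the part before it
is the object name and the part after it replaces %(rest). The sketch below
shows that split under stated assumptions; split_rest() is a hypothetical
helper, not a function from this series.

```c
#include <string.h>

/*
 * Hypothetical helper: split "<object> <rest>" in place.
 * After the call, `line` holds only the object name and the returned
 * pointer is the rest of the line ("" when there is none, which matches
 * the empty-string fallback used for a NULL ref->rest).
 */
char *split_rest(char *line)
{
	char *p = strpbrk(line, " \t");

	if (!p)
		return line + strlen(line); /* points at the NUL: empty rest */
	*p++ = '\0';                        /* terminate the object name */
	p += strspn(p, " \t");              /* skip the whitespace run */
	return p;
}
```
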
 ref-filter.c             | 25 +++++++++++++++++++++++++
 ref-filter.h             |  5 ++++-
 t/t3203-branch-output.sh |  4 ++++
 t/t6300-for-each-ref.sh  |  4 ++++
 t/t7004-tag.sh           |  4 ++++
 t/t7030-verify-tag.sh    |  4 ++++
 6 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index c8e561a3687..af8216dcd5b 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -157,6 +157,7 @@ enum atom_type {
 	ATOM_IF,
 	ATOM_THEN,
 	ATOM_ELSE,
+	ATOM_REST,
 };
 
 /*
@@ -559,6 +560,15 @@ static int if_atom_parser(struct ref_format *format, struct used_atom *atom,
 	return 0;
 }
 
+static int rest_atom_parser(struct ref_format *format, struct used_atom *atom,
+			    const char *arg, struct strbuf *err)
+{
+	if (arg)
+		return strbuf_addf_ret(err, -1, _("%%(rest) does not take arguments"));
+	format->use_rest = 1;
+	return 0;
+}
+
 static int head_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *unused_err)
 {
@@ -615,6 +625,7 @@ static struct {
 	[ATOM_IF] = { "if", SOURCE_NONE, FIELD_STR, if_atom_parser },
 	[ATOM_THEN] = { "then", SOURCE_NONE },
 	[ATOM_ELSE] = { "else", SOURCE_NONE },
+	[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
 	/*
 	 * Please update $__git_ref_fieldlist in git-completion.bash
 	 * when you add new atoms
@@ -989,6 +1000,11 @@ static const char *find_next(const char *cp)
 	return NULL;
 }
 
+static int reject_atom(enum atom_type atom_type)
+{
+	return atom_type == ATOM_REST;
+}
+
 /*
  * Make sure the format string is well formed, and parse out
  * the used atoms.
@@ -1009,6 +1025,8 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
+		if (reject_atom(used_atom[at].atom_type))
+			die(_("this command reject atom %%(%.*s)"), (int)(ep - sp - 2), sp + 2);
 
 		if ((format->quote_style == QUOTE_PYTHON ||
 		     format->quote_style == QUOTE_SHELL ||
@@ -1928,6 +1946,12 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 			v->handler = else_atom_handler;
 			v->s = xstrdup("");
 			continue;
+		} else if (atom_type == ATOM_REST) {
+			if (ref->rest)
+				v->s = xstrdup(ref->rest);
+			else
+				v->s = xstrdup("");
+			continue;
 		} else
 			continue;
 
@@ -2145,6 +2169,7 @@ static struct ref_array_item *new_ref_array_item(const char *refname,
 
 	FLEX_ALLOC_STR(ref, refname, refname);
 	oidcpy(&ref->objectname, oid);
+	ref->rest = NULL;
 
 	return ref;
 }
diff --git a/ref-filter.h b/ref-filter.h
index 74fb423fc89..c15dee8d6b9 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -38,6 +38,7 @@ struct ref_sorting {
 
 struct ref_array_item {
 	struct object_id objectname;
+	const char *rest;
 	int flag;
 	unsigned int kind;
 	const char *symref;
@@ -76,14 +77,16 @@ struct ref_format {
 	 * verify_ref_format() afterwards to finalize.
 	 */
 	const char *format;
+	const char *rest;
 	int quote_style;
+	int use_rest;
 	int use_color;
 
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
 };
 
-#define REF_FORMAT_INIT { NULL, 0, -1 }
+#define REF_FORMAT_INIT { .use_color = -1 }
 
 /*  Macros for checking --merged and --no-merged options */
 #define _OPT_MERGED_NO_MERGED(option, filter, h) \
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 5325b9f67a0..6e94c6db7b5 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -340,6 +340,10 @@ test_expect_success 'git branch --format option' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git branch with --format=%(rest) must fail' '
+	test_must_fail git branch --format="%(rest)" >actual
+'
+
 test_expect_success 'worktree colors correct' '
 	cat >expect <<-EOF &&
 	* <GREEN>(HEAD detached from fromtag)<RESET>
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 3d15d0a5360..0d2e062f791 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1211,6 +1211,10 @@ test_expect_success 'basic atom: head contents:trailers' '
 	test_cmp expect actual.clean
 '
 
+test_expect_success 'basic atom: rest must fail' '
+	test_must_fail git for-each-ref --format="%(rest)" refs/heads/main
+'
+
 test_expect_success 'trailer parsing not fooled by --- line' '
 	git commit --allow-empty -F - <<-\EOF &&
 	this is the subject
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 2f72c5c6883..082be85dffc 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -1998,6 +1998,10 @@ test_expect_success '--format should list tags as per format given' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git tag -l with --format="%(rest)" must fail' '
+	test_must_fail git tag -l --format="%(rest)" "v1*"
+'
+
 test_expect_success "set up color tests" '
 	echo "<RED>v1.0<RESET>" >expect.color &&
 	echo "v1.0" >expect.bare &&
diff --git a/t/t7030-verify-tag.sh b/t/t7030-verify-tag.sh
index 3cefde9602b..10faa645157 100755
--- a/t/t7030-verify-tag.sh
+++ b/t/t7030-verify-tag.sh
@@ -194,6 +194,10 @@ test_expect_success GPG 'verifying tag with --format' '
 	test_cmp expect actual
 '
 
+test_expect_success GPG 'verifying tag with --format="%(rest)" must fail' '
+	test_must_fail git verify-tag --format="%(rest)" "fourth-signed"
+'
+
 test_expect_success GPG 'verifying a forged tag with --format should fail silently' '
 	test_must_fail git verify-tag --format="tagname : %(tag)" $(cat forged1.tag) >actual-forged &&
 	test_must_be_empty actual-forged
-- 
gitgitgadget



* [PATCH v2 06/17] [GSOC] ref-filter: pass get_object() return value to their callers
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 05/17] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 07/17] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In the later refactor of `git cat-file --batch`,
oid_object_info_extended() in get_object() will be used to obtain
information about an object from its oid. When the object cannot be
found in the repository, `cat-file --batch` is expected to output
"<oid> missing" and continue with the next oid query instead of
letting Git exit. In other error conditions, Git should exit as
usual. Passing the return value of get_object() up to its callers
lets them tell these cases apart.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
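The idea can be sketched outside of Git as follows. The return codes and
the functions lookup_object(), format_item() and process_batch() are
hypothetical; in particular, the concrete value used for "missing" is an
assumption of this sketch and is not taken from this patch.

```c
#include <string.h>

/* Hypothetical return codes: 0 is success, negative values are errors. */
enum { LOOKUP_OK = 0, LOOKUP_ERROR = -1, LOOKUP_MISSING = -2 };

/* Stand-in for get_object(): distinguishes "missing" from other errors. */
int lookup_object(const char *name)
{
	if (!strcmp(name, "missing"))
		return LOOKUP_MISSING;
	if (!strcmp(name, "corrupt"))
		return LOOKUP_ERROR;
	return LOOKUP_OK;
}

/* Stand-in for the formatting layer: pass the code through unchanged. */
int format_item(const char *name)
{
	int ret = lookup_object(name);

	if (ret)
		return ret; /* previously this would have been a flat -1 */
	return 0;
}

/* Batch driver: keep going on "missing", stop on any other error. */
int process_batch(const char **names, int nr, int *missing)
{
	int i;

	*missing = 0;
	for (i = 0; i < nr; i++) {
		int ret = format_item(names[i]);

		if (ret == LOOKUP_MISSING) {
			(*missing)++; /* would print "<oid> missing" here */
			continue;
		}
		if (ret)
			return ret;
	}
	return 0;
}
```

With a flat -1 the driver could not distinguish the two failure modes; with
the code passed through, only the caller that cares needs to inspect it.
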
 ref-filter.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index af8216dcd5b..cfcea0e507e 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1821,6 +1821,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 {
 	struct object *obj;
 	int i;
+	int ret;
 	struct object_info empty = OBJECT_INFO_INIT;
 
 	CALLOC_ARRAY(ref->value, used_atom_cnt);
@@ -1977,8 +1978,9 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 
 
 	oi.oid = ref->objectname;
-	if (get_object(ref, 0, &obj, &oi, err))
-		return -1;
+	ret = get_object(ref, 0, &obj, &oi, err);
+	if (ret)
+		return ret;
 
 	/*
 	 * If there is no atom that wants to know about tagged
@@ -2010,8 +2012,10 @@ static int get_ref_atom_value(struct ref_array_item *ref, int atom,
 			      struct atom_value **v, struct strbuf *err)
 {
 	if (!ref->value) {
-		if (populate_value(ref, err))
-			return -1;
+		int ret = populate_value(ref, err);
+
+		if (ret)
+			return ret;
 		fill_missing_values(ref->value);
 	}
 	*v = &ref->value[atom];
@@ -2585,6 +2589,7 @@ int format_ref_array_item(struct ref_array_item *info,
 {
 	const char *cp, *sp, *ep;
 	struct ref_formatting_state state = REF_FORMATTING_STATE_INIT;
+	int ret;
 
 	state.quote_style = format->quote_style;
 	push_stack_element(&state.stack);
@@ -2597,10 +2602,10 @@ int format_ref_array_item(struct ref_array_item *info,
 		if (cp < sp)
 			append_literal(cp, sp, &state);
 		pos = parse_ref_filter_atom(format, sp + 2, ep, error_buf);
-		if (pos < 0 || get_ref_atom_value(info, pos, &atomv, error_buf) ||
+		if (pos < 0 || (ret = get_ref_atom_value(info, pos, &atomv, error_buf)) ||
 		    atomv->handler(atomv, &state, error_buf)) {
 			pop_stack_element(&state.stack);
-			return -1;
+			return ret ? ret : -1;
 		}
 	}
 	if (*cp) {
-- 
gitgitgadget



* [PATCH v2 07/17] [GSOC] ref-filter: introduce free_ref_array_item_value() function
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 06/17] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 08/17] [GSOC] ref-filter: add cat_file_mode to ref_format ZheNing Hu via GitGitGadget
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

When we use a ref_array_item that is not dynamically allocated and
want to free its "value" member after use, free_array_item() does
not meet our needs, because it also tries to free the ref_array_item
itself and its "symref" member.

Introduce free_ref_array_item_value() to free only the value of a
ref_array_item. free_array_item() calls it internally, and it will
later help `cat-file --batch` free a ref_array_item's value memory.

Helped-by: Junio C Hamano <gitster@pobox.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
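A reduced sketch of the ownership split, assuming simplified stand-in
types (struct item, struct item_value) rather than the real ref-filter
structures. Unlike the patch, the sketch also resets the pointer after
freeing, purely so that the effect is observable.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for atom_value and ref_array_item. */
struct item_value {
	char *s;
};

struct item {
	char *symref;
	struct item_value *value;
	size_t value_nr;
};

/* Small helper so the sketch does not depend on the non-ISO strdup(). */
char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Free only the per-atom values; safe for an item that lives on the stack. */
void free_item_value(struct item *item)
{
	size_t i;

	if (!item->value)
		return;
	for (i = 0; i < item->value_nr; i++)
		free(item->value[i].s);
	free(item->value);
	item->value = NULL; /* sketch-only addition, see the note above */
	item->value_nr = 0;
}

/* Free a heap-allocated item completely, reusing the helper above. */
void free_item(struct item *item)
{
	free(item->symref);
	free_item_value(item);
	free(item);
}
```

A caller with a stack-allocated item calls only free_item_value(); calling
free_item() on it would pass a non-heap pointer to free().
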
 ref-filter.c | 11 ++++++++---
 ref-filter.h |  2 ++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index cfcea0e507e..d70b295672f 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2296,16 +2296,21 @@ static int ref_filter_handler(const char *refname, const struct object_id *oid,
 	return 0;
 }
 
-/*  Free memory allocated for a ref_array_item */
-static void free_array_item(struct ref_array_item *item)
+void free_ref_array_item_value(struct ref_array_item *item)
 {
-	free((char *)item->symref);
 	if (item->value) {
 		int i;
 		for (i = 0; i < used_atom_cnt; i++)
 			free((char *)item->value[i].s);
 		free(item->value);
 	}
+}
+
+/*  Free memory allocated for a ref_array_item */
+static void free_array_item(struct ref_array_item *item)
+{
+	free((char *)item->symref);
+	free_ref_array_item_value(item);
 	free(item);
 }
 
diff --git a/ref-filter.h b/ref-filter.h
index c15dee8d6b9..44e6dc05ac2 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -111,6 +111,8 @@ struct ref_format {
 int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type);
 /*  Clear all memory allocated to ref_array */
 void ref_array_clear(struct ref_array *array);
+/* Free ref_array_item's value */
+void free_ref_array_item_value(struct ref_array_item *item);
 /*  Used to verify if the given format is correct and to parse out the used atoms */
 int verify_ref_format(struct ref_format *format);
 /*  Sort the given ref_array as per the ref_sorting provided */
-- 
gitgitgadget



* [PATCH v2 08/17] [GSOC] ref-filter: add cat_file_mode to ref_format
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 07/17] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 09/17] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Add a `cat_file_mode` member to struct `ref_format`. When
`cat-file --batch` uses the ref-filter logic later, it lets
verify_ref_format() reject atoms that cat-file cannot use,
e.g. `%(refname)`, `%(push)`, `%(upstream)`.

Add batch_test_atom() to t/t1006-cat-file.sh and add checks
for cat-file --batch, to show clearly which atoms cat-file
accepts and which it rejects.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
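The mode-dependent rejection can be tried in isolation with the reduced
sketch below. The enum is a hypothetical subset of the real atom_type
list, chosen only to exercise both branches; the logic follows the
reject_atom() in this patch.

```c
/* Hypothetical subset of enum atom_type, for illustration only. */
enum atom_type {
	ATOM_REFNAME,
	ATOM_OBJECTNAME,
	ATOM_OBJECTSIZE,
	ATOM_UPSTREAM,
	ATOM_PUSH,
	ATOM_REST,
};

/* Return 1 when the atom must be rejected in the given mode. */
int reject_atom(int cat_file_mode, enum atom_type atom_type)
{
	/* Ref-oriented commands accept everything except %(rest). */
	if (!cat_file_mode)
		return atom_type == ATOM_REST;

	/* cat-file has no ref, so ref-related atoms make no sense there. */
	switch (atom_type) {
	case ATOM_REFNAME:
	case ATOM_UPSTREAM:
	case ATOM_PUSH:
		return 1;
	default:
		return 0;
	}
}
```

Note the asymmetry: %(rest) is rejected outside cat_file_mode and accepted
inside it, while the ref-related atoms behave the other way round.
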
 ref-filter.c        |  23 ++++-
 ref-filter.h        |   1 +
 t/t1006-cat-file.sh | 226 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 246 insertions(+), 4 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index d70b295672f..27199ba40f5 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1000,9 +1000,24 @@ static const char *find_next(const char *cp)
 	return NULL;
 }
 
-static int reject_atom(enum atom_type atom_type)
-{
-	return atom_type == ATOM_REST;
+static int reject_atom(int cat_file_mode, enum atom_type atom_type)
+{
+	if (!cat_file_mode)
+		return atom_type == ATOM_REST;
+
+	/* cat_file_mode */
+	switch (atom_type) {
+	case ATOM_FLAG:
+	case ATOM_HEAD:
+	case ATOM_PUSH:
+	case ATOM_REFNAME:
+	case ATOM_SYMREF:
+	case ATOM_UPSTREAM:
+	case ATOM_WORKTREEPATH:
+		return 1;
+	default:
+		return 0;
+	}
 }
 
 /*
@@ -1025,7 +1040,7 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
-		if (reject_atom(used_atom[at].atom_type))
+		if (reject_atom(format->cat_file_mode, used_atom[at].atom_type))
 			die(_("this command reject atom %%(%.*s)"), (int)(ep - sp - 2), sp + 2);
 
 		if ((format->quote_style == QUOTE_PYTHON ||
diff --git a/ref-filter.h b/ref-filter.h
index 44e6dc05ac2..053980a6a42 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -78,6 +78,7 @@ struct ref_format {
 	 */
 	const char *format;
 	const char *rest;
+	int cat_file_mode;
 	int quote_style;
 	int use_rest;
 	int use_color;
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..95d760652eb 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -608,4 +608,230 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
 	cmp expect actual
 '
 
+. "$TEST_DIRECTORY"/lib-gpg.sh
+. "$TEST_DIRECTORY"/lib-terminal.sh
+
+test_expect_success 'cat-file --batch|--batch-check setup' '
+	echo 1>blob1 &&
+	printf "a\0b\0\c" >blob2 &&
+	git add blob1 blob2 &&
+	git commit -m "Commit Message" &&
+	git branch -M main &&
+	git tag -a -m "v0.0.0" testtag &&
+	git update-ref refs/myblobs/blob1 HEAD:blob1 &&
+	git update-ref refs/myblobs/blob2 HEAD:blob2 &&
+	git update-ref refs/mytrees/tree1 HEAD^{tree}
+'
+
+batch_test_atom() {
+	if test "$3" = "fail"
+	then
+		test_expect_${4:-success} $PREREQ "basic atom: $1 $2 must fail" "
+			test_must_fail git cat-file --batch-check='$2' >bad <<-EOF
+			$1
+			EOF
+		"
+	else
+		test_expect_${4:-success} $PREREQ "basic atom: $1 $2" "
+			git for-each-ref --format='$2' $1 >expected &&
+			git cat-file --batch-check='$2' >actual <<-EOF &&
+			$1
+			EOF
+			sanitize_pgp <actual >actual.clean &&
+			cmp expected actual.clean
+		"
+	fi
+}
+
+batch_test_atom refs/heads/main '%(refname)' fail
+batch_test_atom refs/heads/main '%(refname:)' fail
+batch_test_atom refs/heads/main '%(refname:short)' fail
+batch_test_atom refs/heads/main '%(refname:lstrip=1)' fail
+batch_test_atom refs/heads/main '%(refname:lstrip=2)' fail
+batch_test_atom refs/heads/main '%(refname:lstrip=-1)' fail
+batch_test_atom refs/heads/main '%(refname:lstrip=-2)' fail
+batch_test_atom refs/heads/main '%(refname:rstrip=1)' fail
+batch_test_atom refs/heads/main '%(refname:rstrip=2)' fail
+batch_test_atom refs/heads/main '%(refname:rstrip=-1)' fail
+batch_test_atom refs/heads/main '%(refname:rstrip=-2)' fail
+batch_test_atom refs/heads/main '%(refname:strip=1)' fail
+batch_test_atom refs/heads/main '%(refname:strip=2)' fail
+batch_test_atom refs/heads/main '%(refname:strip=-1)' fail
+batch_test_atom refs/heads/main '%(refname:strip=-2)' fail
+batch_test_atom refs/heads/main '%(upstream)' fail
+batch_test_atom refs/heads/main '%(upstream:short)' fail
+batch_test_atom refs/heads/main '%(upstream:lstrip=2)' fail
+batch_test_atom refs/heads/main '%(upstream:lstrip=-2)' fail
+batch_test_atom refs/heads/main '%(upstream:rstrip=2)' fail
+batch_test_atom refs/heads/main '%(upstream:rstrip=-2)' fail
+batch_test_atom refs/heads/main '%(upstream:strip=2)' fail
+batch_test_atom refs/heads/main '%(upstream:strip=-2)' fail
+batch_test_atom refs/heads/main '%(push)' fail
+batch_test_atom refs/heads/main '%(push:short)' fail
+batch_test_atom refs/heads/main '%(push:lstrip=1)' fail
+batch_test_atom refs/heads/main '%(push:lstrip=-1)' fail
+batch_test_atom refs/heads/main '%(push:rstrip=1)' fail
+batch_test_atom refs/heads/main '%(push:rstrip=-1)' fail
+batch_test_atom refs/heads/main '%(push:strip=1)' fail
+batch_test_atom refs/heads/main '%(push:strip=-1)' fail
+batch_test_atom refs/heads/main '%(objecttype)'
+batch_test_atom refs/heads/main '%(objectsize)'
+batch_test_atom refs/heads/main '%(objectsize:disk)'
+batch_test_atom refs/heads/main '%(deltabase)'
+batch_test_atom refs/heads/main '%(objectname)'
+batch_test_atom refs/heads/main '%(objectname:short)' fail
+batch_test_atom refs/heads/main '%(objectname:short=1)' fail
+batch_test_atom refs/heads/main '%(objectname:short=10)' fail
+batch_test_atom refs/heads/main '%(tree)' fail
+batch_test_atom refs/heads/main '%(tree:short)' fail
+batch_test_atom refs/heads/main '%(tree:short=1)' fail
+batch_test_atom refs/heads/main '%(tree:short=10)' fail
+batch_test_atom refs/heads/main '%(parent)' fail
+batch_test_atom refs/heads/main '%(parent:short)' fail
+batch_test_atom refs/heads/main '%(parent:short=1)' fail
+batch_test_atom refs/heads/main '%(parent:short=10)' fail
+batch_test_atom refs/heads/main '%(numparent)' fail
+batch_test_atom refs/heads/main '%(object)' fail
+batch_test_atom refs/heads/main '%(type)' fail
+batch_test_atom refs/heads/main '%(raw)' fail
+batch_test_atom refs/heads/main '%(*objectname)' fail
+batch_test_atom refs/heads/main '%(*objecttype)' fail
+batch_test_atom refs/heads/main '%(author)' fail
+batch_test_atom refs/heads/main '%(authorname)' fail
+batch_test_atom refs/heads/main '%(authoremail)' fail
+batch_test_atom refs/heads/main '%(authoremail:trim)' fail
+batch_test_atom refs/heads/main '%(authoremail:localpart)' fail
+batch_test_atom refs/heads/main '%(authordate)' fail
+batch_test_atom refs/heads/main '%(committer)' fail
+batch_test_atom refs/heads/main '%(committername)' fail
+batch_test_atom refs/heads/main '%(committeremail)' fail
+batch_test_atom refs/heads/main '%(committeremail:trim)' fail
+batch_test_atom refs/heads/main '%(committeremail:localpart)' fail
+batch_test_atom refs/heads/main '%(committerdate)' fail
+batch_test_atom refs/heads/main '%(tag)' fail
+batch_test_atom refs/heads/main '%(tagger)' fail
+batch_test_atom refs/heads/main '%(taggername)' fail
+batch_test_atom refs/heads/main '%(taggeremail)' fail
+batch_test_atom refs/heads/main '%(taggeremail:trim)' fail
+batch_test_atom refs/heads/main '%(taggeremail:localpart)' fail
+batch_test_atom refs/heads/main '%(taggerdate)' fail
+batch_test_atom refs/heads/main '%(creator)' fail
+batch_test_atom refs/heads/main '%(creatordate)' fail
+batch_test_atom refs/heads/main '%(subject)' fail
+batch_test_atom refs/heads/main '%(subject:sanitize)' fail
+batch_test_atom refs/heads/main '%(contents:subject)' fail
+batch_test_atom refs/heads/main '%(body)' fail
+batch_test_atom refs/heads/main '%(contents:body)' fail
+batch_test_atom refs/heads/main '%(contents:signature)' fail
+batch_test_atom refs/heads/main '%(contents)' fail
+batch_test_atom refs/heads/main '%(HEAD)' fail
+batch_test_atom refs/heads/main '%(upstream:track)' fail
+batch_test_atom refs/heads/main '%(upstream:trackshort)' fail
+batch_test_atom refs/heads/main '%(upstream:track,nobracket)' fail
+batch_test_atom refs/heads/main '%(upstream:nobracket,track)' fail
+batch_test_atom refs/heads/main '%(push:track)' fail
+batch_test_atom refs/heads/main '%(push:trackshort)' fail
+batch_test_atom refs/heads/main '%(worktreepath)' fail
+batch_test_atom refs/heads/main '%(symref)' fail
+batch_test_atom refs/heads/main '%(flag)' fail
+
+batch_test_atom refs/tags/testtag '%(refname)' fail
+batch_test_atom refs/tags/testtag '%(refname:short)' fail
+batch_test_atom refs/tags/testtag '%(upstream)' fail
+batch_test_atom refs/tags/testtag '%(push)' fail
+batch_test_atom refs/tags/testtag '%(objecttype)'
+batch_test_atom refs/tags/testtag '%(objectsize)'
+batch_test_atom refs/tags/testtag '%(objectsize:disk)'
+batch_test_atom refs/tags/testtag '%(*objectsize:disk)' fail
+batch_test_atom refs/tags/testtag '%(deltabase)'
+batch_test_atom refs/tags/testtag '%(*deltabase)' fail
+batch_test_atom refs/tags/testtag '%(objectname)'
+batch_test_atom refs/tags/testtag '%(objectname:short)' fail
+batch_test_atom refs/tags/testtag '%(tree)' fail
+batch_test_atom refs/tags/testtag '%(tree:short)' fail
+batch_test_atom refs/tags/testtag '%(tree:short=1)' fail
+batch_test_atom refs/tags/testtag '%(tree:short=10)' fail
+batch_test_atom refs/tags/testtag '%(parent)' fail
+batch_test_atom refs/tags/testtag '%(parent:short)' fail
+batch_test_atom refs/tags/testtag '%(parent:short=1)' fail
+batch_test_atom refs/tags/testtag '%(parent:short=10)' fail
+batch_test_atom refs/tags/testtag '%(numparent)' fail
+batch_test_atom refs/tags/testtag '%(object)' fail
+batch_test_atom refs/tags/testtag '%(type)' fail
+batch_test_atom refs/tags/testtag '%(*objectname)' fail
+batch_test_atom refs/tags/testtag '%(*objecttype)' fail
+batch_test_atom refs/tags/testtag '%(author)' fail
+batch_test_atom refs/tags/testtag '%(authorname)' fail
+batch_test_atom refs/tags/testtag '%(authoremail)' fail
+batch_test_atom refs/tags/testtag '%(authoremail:trim)' fail
+batch_test_atom refs/tags/testtag '%(authoremail:localpart)' fail
+batch_test_atom refs/tags/testtag '%(authordate)' fail
+batch_test_atom refs/tags/testtag '%(committer)' fail
+batch_test_atom refs/tags/testtag '%(committername)' fail
+batch_test_atom refs/tags/testtag '%(committeremail)' fail
+batch_test_atom refs/tags/testtag '%(committeremail:trim)' fail
+batch_test_atom refs/tags/testtag '%(committeremail:localpart)' fail
+batch_test_atom refs/tags/testtag '%(committerdate)' fail
+batch_test_atom refs/tags/testtag '%(tag)' fail
+batch_test_atom refs/tags/testtag '%(tagger)' fail
+batch_test_atom refs/tags/testtag '%(taggername)' fail
+batch_test_atom refs/tags/testtag '%(taggeremail)' fail
+batch_test_atom refs/tags/testtag '%(taggeremail:trim)' fail
+batch_test_atom refs/tags/testtag '%(taggeremail:localpart)' fail
+batch_test_atom refs/tags/testtag '%(taggerdate)' fail
+batch_test_atom refs/tags/testtag '%(creator)' fail
+batch_test_atom refs/tags/testtag '%(creatordate)' fail
+batch_test_atom refs/tags/testtag '%(subject)' fail
+batch_test_atom refs/tags/testtag '%(subject:sanitize)' fail
+batch_test_atom refs/tags/testtag '%(contents:subject)' fail
+batch_test_atom refs/tags/testtag '%(body)' fail
+batch_test_atom refs/tags/testtag '%(contents:body)' fail
+batch_test_atom refs/tags/testtag '%(contents:signature)' fail
+batch_test_atom refs/tags/testtag '%(contents)' fail
+batch_test_atom refs/tags/testtag '%(HEAD)' fail
+
+batch_test_atom refs/myblobs/blob1 '%(refname)' fail
+batch_test_atom refs/myblobs/blob1 '%(upstream)' fail
+batch_test_atom refs/myblobs/blob1 '%(push)' fail
+batch_test_atom refs/myblobs/blob1 '%(HEAD)' fail
+
+batch_test_atom refs/myblobs/blob1 '%(objectname)'
+batch_test_atom refs/myblobs/blob1 '%(objecttype)'
+batch_test_atom refs/myblobs/blob1 '%(objectsize)'
+batch_test_atom refs/myblobs/blob1 '%(objectsize:disk)'
+batch_test_atom refs/myblobs/blob1 '%(deltabase)'
+
+batch_test_atom refs/myblobs/blob1 '%(contents)' fail
+batch_test_atom refs/myblobs/blob2 '%(contents)' fail
+
+batch_test_atom refs/myblobs/blob1 '%(raw)' fail
+batch_test_atom refs/myblobs/blob2 '%(raw)' fail
+batch_test_atom refs/mytrees/tree1 '%(raw)' fail
+
+batch_test_atom refs/myblobs/blob1 '%(raw:size)' fail
+batch_test_atom refs/myblobs/blob2 '%(raw:size)' fail
+batch_test_atom refs/mytrees/tree1 '%(raw:size)' fail
+
+batch_test_atom refs/myblobs/blob1 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)' fail
+batch_test_atom refs/myblobs/blob2 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)' fail
+batch_test_atom refs/mytrees/tree1 '%(if:equals=tree)%(objecttype)%(then)tree%(else)not tree%(end)' fail
+
+batch_test_atom refs/heads/main '%(align:60) objectname is %(objectname)%(end)|%(objectname)' fail
+batch_test_atom refs/heads/main '%(align:left,60) objectname is %(objectname)%(end)|%(objectname)' fail
+batch_test_atom refs/heads/main '%(align:middle,60) objectname is %(objectname)%(end)|%(objectname)' fail
+batch_test_atom refs/heads/main '%(align:60,right) objectname is %(objectname)%(end)|%(objectname)' fail
+
+batch_test_atom refs/heads/main 'VALID'
+batch_test_atom refs/heads/main '%(INVALID)' fail
+batch_test_atom refs/heads/main '%(authordate:INVALID)' fail
+
+batch_test_atom refs/heads/main '%(objectname) %(objecttype) %(objectsize)
+%(raw)' fail
+batch_test_atom refs/tags/testtag '%(objectname) %(objecttype) %(objectsize)
+%(raw)' fail
+batch_test_atom refs/myblobs/blob1 '%(objectname) %(objecttype) %(objectsize)
+%(raw)' fail
+batch_test_atom refs/myblobs/blob2 '%(objectname) %(objecttype) %(objectsize)
+%(raw)' fail
+
 test_done
-- 
gitgitgadget



* [PATCH v2 09/17] [GSOC] ref-filter: modify the error message and value in get_object
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 08/17] [GSOC] ref-filter: add cat_file_mode to ref_format ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 10/17] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Let get_object() return 1 and print "<oid> missing" instead
of returning -1 and printing "missing object <oid> for <refname>"
if oid_object_info_extended() is unable to find the data corresponding
to oid. When `cat-file --batch` uses ref-filter logic later, this lets
`format_ref_array_item()` just report that the object is missing
without letting Git exit.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c                   | 4 ++--
 t/t6301-for-each-ref-errors.sh | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 27199ba40f5..b4f41fec871 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1762,8 +1762,8 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 	}
 	if (oid_object_info_extended(the_repository, &oi->oid, &oi->info,
 				     OBJECT_INFO_LOOKUP_REPLACE))
-		return strbuf_addf_ret(err, -1, _("missing object %s for %s"),
-				       oid_to_hex(&oi->oid), ref->refname);
+		return strbuf_addf_ret(err, 1, _("%s missing"),
+				       oid_to_hex(&oi->oid));
 	if (oi->info.disk_sizep && oi->disk_size < 0)
 		BUG("Object size is less than zero.");
 
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
index 40edf9dab53..3553f84a00c 100755
--- a/t/t6301-for-each-ref-errors.sh
+++ b/t/t6301-for-each-ref-errors.sh
@@ -41,7 +41,7 @@ test_expect_success 'Missing objects are reported correctly' '
 	r=refs/heads/missing &&
 	echo $MISSING >.git/$r &&
 	test_when_finished "rm -f .git/$r" &&
-	echo "fatal: missing object $MISSING for $r" >missing-err &&
+	echo "fatal: $MISSING missing" >missing-err &&
 	test_must_fail git for-each-ref 2>err &&
 	test_cmp missing-err err &&
 	(
-- 
gitgitgadget



* [PATCH v2 10/17] [GSOC] cat-file: add has_object_file() check
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 09/17] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 11/17] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Use `has_object_file()` in `batch_one_object()` to check
whether the input object exists. This can help us reject
the missing oid when we let `cat-file --batch` use ref-filter
logic later.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 243fe6844bc..59a86412fd0 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -428,6 +428,13 @@ static void batch_one_object(const char *obj_name,
 		return;
 	}
 
+	if (!has_object_file(&data->oid)) {
+		printf("%s missing\n",
+		       obj_name ? obj_name : oid_to_hex(&data->oid));
+		fflush(stdout);
+		return;
+	}
+
 	batch_object_write(obj_name, scratch, opt, data);
 }
 
-- 
gitgitgadget



* [PATCH v2 11/17] [GSOC] cat-file: change batch_objects parameter name
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (9 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 10/17] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 12/17] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

A later patch that makes cat-file reuse ref-filter logic will add
a "const struct option *options" parameter to batch_objects().
Two similarly named parameters, "opt" and "options", may
confuse readers, so rename the batch_options parameter of
batch_objects() from "opt" to "batch".

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 59a86412fd0..41d407638d5 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -495,7 +495,7 @@ static int batch_unordered_packed(const struct object_id *oid,
 	return batch_unordered_object(oid, data);
 }
 
-static int batch_objects(struct batch_options *opt)
+static int batch_objects(struct batch_options *batch)
 {
 	struct strbuf input = STRBUF_INIT;
 	struct strbuf output = STRBUF_INIT;
@@ -503,8 +503,8 @@ static int batch_objects(struct batch_options *opt)
 	int save_warning;
 	int retval = 0;
 
-	if (!opt->format)
-		opt->format = "%(objectname) %(objecttype) %(objectsize)";
+	if (!batch->format)
+		batch->format = "%(objectname) %(objecttype) %(objectsize)";
 
 	/*
 	 * Expand once with our special mark_query flag, which will prime the
@@ -513,20 +513,20 @@ static int batch_objects(struct batch_options *opt)
 	 */
 	memset(&data, 0, sizeof(data));
 	data.mark_query = 1;
-	strbuf_expand(&output, opt->format, expand_format, &data);
+	strbuf_expand(&output, batch->format, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
-	if (opt->cmdmode)
+	if (batch->cmdmode)
 		data.split_on_whitespace = 1;
 
 	/*
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
 	 */
-	if (opt->print_contents)
+	if (batch->print_contents)
 		data.info.typep = &data.type;
 
-	if (opt->all_objects) {
+	if (batch->all_objects) {
 		struct object_cb_data cb;
 		struct object_info empty = OBJECT_INFO_INIT;
 
@@ -536,11 +536,11 @@ static int batch_objects(struct batch_options *opt)
 		if (has_promisor_remote())
 			warning("This repository uses promisor remotes. Some objects may not be loaded.");
 
-		cb.opt = opt;
+		cb.opt = batch;
 		cb.expand = &data;
 		cb.scratch = &output;
 
-		if (opt->unordered) {
+		if (batch->unordered) {
 			struct oidset seen = OIDSET_INIT;
 
 			cb.seen = &seen;
@@ -590,7 +590,7 @@ static int batch_objects(struct batch_options *opt)
 			data.rest = p;
 		}
 
-		batch_one_object(input.buf, &output, opt, &data);
+		batch_one_object(input.buf, &output, batch, &data);
 	}
 
 	strbuf_release(&input);
-- 
gitgitgadget



* [PATCH v2 12/17] [GSOC] cat-file: create p1006-cat-file.sh
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (10 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 11/17] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 13/17] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Create p1006-cat-file.sh to provide performance testing for
`git cat-file --batch` and `git cat-file --batch-check`. This
will help us compare the performance changes after we let
cat-file reuse the ref-filter logic.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 t/perf/p1006-cat-file.sh | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100755 t/perf/p1006-cat-file.sh

diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh
new file mode 100755
index 00000000000..b84ac31f9cc
--- /dev/null
+++ b/t/perf/p1006-cat-file.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+
+test_description='Basic cat-file performance tests'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'setup' '
+	git rev-list --all >rla
+'
+
+test_perf 'cat-file --batch-check' '
+	git cat-file --batch-check <rla
+'
+
+test_perf 'cat-file --batch-check with atoms' '
+	git cat-file --batch-check="%(objectname) %(objecttype)" <rla
+'
+
+test_perf 'cat-file --batch' '
+	git cat-file --batch <rla
+'
+
+test_perf 'cat-file --batch with atoms' '
+	git cat-file --batch="%(objectname) %(objecttype)" <rla
+'
+
+test_done
-- 
gitgitgadget



* [PATCH v2 13/17] [GSOC] cat-file: reuse ref-filter logic
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (11 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 12/17] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 14/17] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

In order to let cat-file use ref-filter logic, let's do the
following:

1. Change the type of member `format` in struct `batch_options`
to `ref_format`; we will pass it to ref-filter later.
2. Let `batch_objects()` add atoms to format, and use
`verify_ref_format()` to check atoms.
3. Use `format_ref_array_item()` in `batch_object_write()` to
get the formatted data corresponding to the object. If the
return value of `format_ref_array_item()` is equal to zero,
use `batch_write()` to print object data; else if the return
value is less than zero, use `die()` to print the error message
and exit; else if the return value is greater than zero, only print
the error message, but don't exit.
4. Use free_ref_array_item_value() to free ref_array_item's
value.
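
As a rough illustration only (this is not code from the patch), the
three-way return convention in step 3 could be sketched as below. The
names `enum batch_action`, `classify_ret()` and `text_to_emit()` are
hypothetical and exist only for this sketch; the real caller is
`batch_object_write()`, which uses `die()`, `printf()` and
`batch_write()` directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: what a caller does with the formatter's return value. */
enum batch_action {
	BATCH_PRINT_OUTPUT,   /* ret == 0: formatted data is usable */
	BATCH_REPORT_ERROR,   /* ret > 0: print the error, keep going */
	BATCH_DIE             /* ret < 0: fatal, print the error and exit */
};

/* Map a return value to one of the three cases described in step 3. */
enum batch_action classify_ret(int ret)
{
	if (ret < 0)
		return BATCH_DIE;
	if (ret > 0)
		return BATCH_REPORT_ERROR;
	return BATCH_PRINT_OUTPUT;
}

/*
 * Choose which buffer the caller would emit: the formatted output on
 * success, the error text in both error cases.
 */
const char *text_to_emit(int ret, const char *out, const char *err)
{
	assert(out != NULL && err != NULL);
	return classify_ret(ret) == BATCH_PRINT_OUTPUT ? out : err;
}
```

The point of keeping "greater than zero" distinct from "less than zero"
is that a missing object should produce one line of output while the
batch loop continues, whereas a real formatting failure should still
stop the command.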

Most of the atoms in `for-each-ref --format` are now supported,
such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
`%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
`%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
`%(flag)`, `%(HEAD)`, because they only make sense for objects
that are pointed to by a ref. The "for-each-ref" family can naturally
use these atoms, but not all objects are pointed to by a ref, so
"cat-file" will not be able to use them.

The performance for `git cat-file --batch-all-objects
--batch-check` on the Git repository itself, measured with the
performance testing tool `hyperfine`, changes from 616.7 ms ± 8.9 ms
to 758.7 ms ± 16.4 ms.

The performance for `git cat-file --batch-all-objects --batch
>/dev/null` on the Git repository itself, measured with `time`,
changes from "25.26s user 0.30s system 98% cpu 25.840 total"
to "28.79s user 0.83s system 99% cpu 29.829 total".
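
For a sense of scale, the relative slowdown can be worked out from the
figures above. The helper `slowdown_percent()` below is a hypothetical
name used only for this sketch; it takes the mean timings at face value
and ignores the reported ± spread. With these inputs it gives roughly
23% for `--batch-check` (616.7 ms to 758.7 ms) and roughly 15% for
`--batch` (25.840 s to 29.829 s total).

```c
#include <assert.h>

/*
 * Relative slowdown in percent: (after - before) / before * 100.
 * Both arguments must be in the same unit.
 */
double slowdown_percent(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```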

The reasons for the performance degradation are as follows:
1. There are a lot of data copies in the logic of ref-filter.
2. In order to support more useful formats, complex data
structures and parsing are used in ref-filter.

A later patch will add a fast path which will mitigate the
performance regression introduced by this patch.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-cat-file.txt |   6 +
 builtin/cat-file.c             | 242 +++++++--------------------------
 t/t1006-cat-file.sh            | 229 ++++++++++++++++---------------
 3 files changed, 174 insertions(+), 303 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 4eb0421b3fd..ef8ab952b2f 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -226,6 +226,12 @@ newline. The available atoms are:
 	after that first run of whitespace (i.e., the "rest" of the
 	line) are output in place of the `%(rest)` atom.
 
+Note that most of the atoms in `for-each-ref --format` are now supported,
+such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
+`%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
+`%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
+`%(flag)`, `%(HEAD)`. See linkgit:git-for-each-ref[1].
+
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 41d407638d5..5b163551fc6 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -16,6 +16,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "promisor-remote.h"
+#include "ref-filter.h"
 
 struct batch_options {
 	int enabled;
@@ -25,7 +26,7 @@ struct batch_options {
 	int all_objects;
 	int unordered;
 	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
-	const char *format;
+	struct ref_format format;
 };
 
 static const char *force_path;
@@ -195,99 +196,10 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 struct expand_data {
 	struct object_id oid;
-	enum object_type type;
-	unsigned long size;
-	off_t disk_size;
 	const char *rest;
-	struct object_id delta_base_oid;
-
-	/*
-	 * If mark_query is true, we do not expand anything, but rather
-	 * just mark the object_info with items we wish to query.
-	 */
-	int mark_query;
-
-	/*
-	 * Whether to split the input on whitespace before feeding it to
-	 * get_sha1; this is decided during the mark_query phase based on
-	 * whether we have a %(rest) token in our format.
-	 */
 	int split_on_whitespace;
-
-	/*
-	 * After a mark_query run, this object_info is set up to be
-	 * passed to oid_object_info_extended. It will point to the data
-	 * elements above, so you can retrieve the response from there.
-	 */
-	struct object_info info;
-
-	/*
-	 * This flag will be true if the requested batch format and options
-	 * don't require us to call oid_object_info, which can then be
-	 * optimized out.
-	 */
-	unsigned skip_object_info : 1;
 };
 
-static int is_atom(const char *atom, const char *s, int slen)
-{
-	int alen = strlen(atom);
-	return alen == slen && !memcmp(atom, s, alen);
-}
-
-static void expand_atom(struct strbuf *sb, const char *atom, int len,
-			void *vdata)
-{
-	struct expand_data *data = vdata;
-
-	if (is_atom("objectname", atom, len)) {
-		if (!data->mark_query)
-			strbuf_addstr(sb, oid_to_hex(&data->oid));
-	} else if (is_atom("objecttype", atom, len)) {
-		if (data->mark_query)
-			data->info.typep = &data->type;
-		else
-			strbuf_addstr(sb, type_name(data->type));
-	} else if (is_atom("objectsize", atom, len)) {
-		if (data->mark_query)
-			data->info.sizep = &data->size;
-		else
-			strbuf_addf(sb, "%"PRIuMAX , (uintmax_t)data->size);
-	} else if (is_atom("objectsize:disk", atom, len)) {
-		if (data->mark_query)
-			data->info.disk_sizep = &data->disk_size;
-		else
-			strbuf_addf(sb, "%"PRIuMAX, (uintmax_t)data->disk_size);
-	} else if (is_atom("rest", atom, len)) {
-		if (data->mark_query)
-			data->split_on_whitespace = 1;
-		else if (data->rest)
-			strbuf_addstr(sb, data->rest);
-	} else if (is_atom("deltabase", atom, len)) {
-		if (data->mark_query)
-			data->info.delta_base_oid = &data->delta_base_oid;
-		else
-			strbuf_addstr(sb,
-				      oid_to_hex(&data->delta_base_oid));
-	} else
-		die("unknown format element: %.*s", len, atom);
-}
-
-static size_t expand_format(struct strbuf *sb, const char *start, void *data)
-{
-	const char *end;
-
-	if (*start != '(')
-		return 0;
-	end = strchr(start + 1, ')');
-	if (!end)
-		die("format element '%s' does not end in ')'", start);
-
-	expand_atom(sb, start + 1, end - start - 1, data);
-
-	return end - start + 1;
-}
-
 static void batch_write(struct batch_options *opt, const void *data, int len)
 {
 	if (opt->buffer_output) {
@@ -297,87 +209,34 @@ static void batch_write(struct batch_options *opt, const void *data, int len)
 		write_or_die(1, data, len);
 }
 
-static void print_object_or_die(struct batch_options *opt, struct expand_data *data)
-{
-	const struct object_id *oid = &data->oid;
-
-	assert(data->info.typep);
-
-	if (data->type == OBJ_BLOB) {
-		if (opt->buffer_output)
-			fflush(stdout);
-		if (opt->cmdmode) {
-			char *contents;
-			unsigned long size;
-
-			if (!data->rest)
-				die("missing path for '%s'", oid_to_hex(oid));
-
-			if (opt->cmdmode == 'w') {
-				if (filter_object(data->rest, 0100644, oid,
-						  &contents, &size))
-					die("could not convert '%s' %s",
-					    oid_to_hex(oid), data->rest);
-			} else if (opt->cmdmode == 'c') {
-				enum object_type type;
-				if (!textconv_object(the_repository,
-						     data->rest, 0100644, oid,
-						     1, &contents, &size))
-					contents = read_object_file(oid,
-								    &type,
-								    &size);
-				if (!contents)
-					die("could not convert '%s' %s",
-					    oid_to_hex(oid), data->rest);
-			} else
-				BUG("invalid cmdmode: %c", opt->cmdmode);
-			batch_write(opt, contents, size);
-			free(contents);
-		} else {
-			stream_blob(oid);
-		}
-	}
-	else {
-		enum object_type type;
-		unsigned long size;
-		void *contents;
-
-		contents = read_object_file(oid, &type, &size);
-		if (!contents)
-			die("object %s disappeared", oid_to_hex(oid));
-		if (type != data->type)
-			die("object %s changed type!?", oid_to_hex(oid));
-		if (data->info.sizep && size != data->size)
-			die("object %s changed size!?", oid_to_hex(oid));
-
-		batch_write(opt, contents, size);
-		free(contents);
-	}
-}
 
 static void batch_object_write(const char *obj_name,
 			       struct strbuf *scratch,
 			       struct batch_options *opt,
 			       struct expand_data *data)
 {
-	if (!data->skip_object_info &&
-	    oid_object_info_extended(the_repository, &data->oid, &data->info,
-				     OBJECT_INFO_LOOKUP_REPLACE) < 0) {
-		printf("%s missing\n",
-		       obj_name ? obj_name : oid_to_hex(&data->oid));
-		fflush(stdout);
-		return;
-	}
+	int ret;
+	struct strbuf err = STRBUF_INIT;
+	struct ref_array_item item = { data->oid, data->rest };
 
 	strbuf_reset(scratch);
-	strbuf_expand(scratch, opt->format, expand_format, data);
-	strbuf_addch(scratch, '\n');
-	batch_write(opt, scratch->buf, scratch->len);
 
-	if (opt->print_contents) {
-		print_object_or_die(opt, data);
-		batch_write(opt, "\n", 1);
+	ret = format_ref_array_item(&item, &opt->format, scratch, &err);
+	if (ret < 0)
+		die("%s\n", err.buf);
+	if (ret) {
+		/* ret > 0 means when the object corresponding to oid
+		 * cannot be found in format_ref_array_item(), we only print
+		 * the error message.
+		 */
+		printf("%s\n", err.buf);
+		fflush(stdout);
+	} else {
+		strbuf_addch(scratch, '\n');
+		batch_write(opt, scratch->buf, scratch->len);
 	}
+	free_ref_array_item_value(&item);
+	strbuf_release(&err);
 }
 
 static void batch_one_object(const char *obj_name,
@@ -495,43 +354,37 @@ static int batch_unordered_packed(const struct object_id *oid,
 	return batch_unordered_object(oid, data);
 }
 
-static int batch_objects(struct batch_options *batch)
+static const char * const cat_file_usage[] = {
+	N_("git cat-file (-t [--allow-unknown-type] | -s [--allow-unknown-type] | -e | -p | <type> | --textconv | --filters) [--path=<path>] <object>"),
+	N_("git cat-file (--batch[=<format>] | --batch-check[=<format>]) [--follow-symlinks] [--textconv | --filters]"),
+	NULL
+};
+
+static int batch_objects(struct batch_options *batch, const struct option *options)
 {
 	struct strbuf input = STRBUF_INIT;
 	struct strbuf output = STRBUF_INIT;
+	struct strbuf format = STRBUF_INIT;
 	struct expand_data data;
 	int save_warning;
 	int retval = 0;
 
-	if (!batch->format)
-		batch->format = "%(objectname) %(objecttype) %(objectsize)";
-
-	/*
-	 * Expand once with our special mark_query flag, which will prime the
-	 * object_info to be handed to oid_object_info_extended for each
-	 * object.
-	 */
 	memset(&data, 0, sizeof(data));
-	data.mark_query = 1;
-	strbuf_expand(&output, batch->format, expand_format, &data);
-	data.mark_query = 0;
-	strbuf_release(&output);
-	if (batch->cmdmode)
-		data.split_on_whitespace = 1;
-
-	/*
-	 * If we are printing out the object, then always fill in the type,
-	 * since we will want to decide whether or not to stream.
-	 */
+	if (batch->format.format)
+		strbuf_addstr(&format, batch->format.format);
+	else
+		strbuf_addstr(&format, "%(objectname) %(objecttype) %(objectsize)");
 	if (batch->print_contents)
-		data.info.typep = &data.type;
+		strbuf_addstr(&format, "\n%(raw)");
+	batch->format.format = format.buf;
+	if (verify_ref_format(&batch->format))
+		usage_with_options(cat_file_usage, options);
+
+	if (batch->cmdmode || batch->format.use_rest)
+		data.split_on_whitespace = 1;
 
 	if (batch->all_objects) {
 		struct object_cb_data cb;
-		struct object_info empty = OBJECT_INFO_INIT;
-
-		if (!memcmp(&data.info, &empty, sizeof(empty)))
-			data.skip_object_info = 1;
 
 		if (has_promisor_remote())
 			warning("This repository uses promisor remotes. Some objects may not be loaded.");
@@ -561,6 +414,7 @@ static int batch_objects(struct batch_options *batch)
 			oid_array_clear(&sa);
 		}
 
+		strbuf_release(&format);
 		strbuf_release(&output);
 		return 0;
 	}
@@ -593,18 +447,13 @@ static int batch_objects(struct batch_options *batch)
 		batch_one_object(input.buf, &output, batch, &data);
 	}
 
+	strbuf_release(&format);
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
 	return retval;
 }
 
-static const char * const cat_file_usage[] = {
-	N_("git cat-file (-t [--allow-unknown-type] | -s [--allow-unknown-type] | -e | -p | <type> | --textconv | --filters) [--path=<path>] <object>"),
-	N_("git cat-file (--batch[=<format>] | --batch-check[=<format>]) [--follow-symlinks] [--textconv | --filters]"),
-	NULL
-};
-
 static int git_cat_file_config(const char *var, const char *value, void *cb)
 {
 	if (userdiff_config(var, value) < 0)
@@ -627,7 +476,7 @@ static int batch_option_callback(const struct option *opt,
 
 	bo->enabled = 1;
 	bo->print_contents = !strcmp(opt->long_name, "batch");
-	bo->format = arg;
+	bo->format.format = arg;
 
 	return 0;
 }
@@ -636,7 +485,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
 	int opt = 0;
 	const char *exp_type = NULL, *obj_name = NULL;
-	struct batch_options batch = {0};
+	struct batch_options batch = {
+		.format = REF_FORMAT_INIT
+	};
 	int unknown_type = 0;
 
 	const struct option options[] = {
@@ -675,6 +526,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	git_config(git_cat_file_config, NULL);
 
 	batch.buffer_output = -1;
+	batch.format.cat_file_mode = 1;
 	argc = parse_options(argc, argv, prefix, options, cat_file_usage, 0);
 
 	if (opt) {
@@ -718,7 +570,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		batch.buffer_output = batch.all_objects;
 
 	if (batch.enabled)
-		return batch_objects(&batch);
+		return batch_objects(&batch, options);
 
 	if (unknown_type && opt != 't' && opt != 's')
 		die("git cat-file --allow-unknown-type: use with -s or -t");
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 95d760652eb..8c1943011fb 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -679,51 +679,51 @@ batch_test_atom refs/heads/main '%(objectsize)'
 batch_test_atom refs/heads/main '%(objectsize:disk)'
 batch_test_atom refs/heads/main '%(deltabase)'
 batch_test_atom refs/heads/main '%(objectname)'
-batch_test_atom refs/heads/main '%(objectname:short)' fail
-batch_test_atom refs/heads/main '%(objectname:short=1)' fail
-batch_test_atom refs/heads/main '%(objectname:short=10)' fail
-batch_test_atom refs/heads/main '%(tree)' fail
-batch_test_atom refs/heads/main '%(tree:short)' fail
-batch_test_atom refs/heads/main '%(tree:short=1)' fail
-batch_test_atom refs/heads/main '%(tree:short=10)' fail
-batch_test_atom refs/heads/main '%(parent)' fail
-batch_test_atom refs/heads/main '%(parent:short)' fail
-batch_test_atom refs/heads/main '%(parent:short=1)' fail
-batch_test_atom refs/heads/main '%(parent:short=10)' fail
-batch_test_atom refs/heads/main '%(numparent)' fail
-batch_test_atom refs/heads/main '%(object)' fail
-batch_test_atom refs/heads/main '%(type)' fail
-batch_test_atom refs/heads/main '%(raw)' fail
-batch_test_atom refs/heads/main '%(*objectname)' fail
-batch_test_atom refs/heads/main '%(*objecttype)' fail
-batch_test_atom refs/heads/main '%(author)' fail
-batch_test_atom refs/heads/main '%(authorname)' fail
-batch_test_atom refs/heads/main '%(authoremail)' fail
-batch_test_atom refs/heads/main '%(authoremail:trim)' fail
-batch_test_atom refs/heads/main '%(authoremail:localpart)' fail
-batch_test_atom refs/heads/main '%(authordate)' fail
-batch_test_atom refs/heads/main '%(committer)' fail
-batch_test_atom refs/heads/main '%(committername)' fail
-batch_test_atom refs/heads/main '%(committeremail)' fail
-batch_test_atom refs/heads/main '%(committeremail:trim)' fail
-batch_test_atom refs/heads/main '%(committeremail:localpart)' fail
-batch_test_atom refs/heads/main '%(committerdate)' fail
-batch_test_atom refs/heads/main '%(tag)' fail
-batch_test_atom refs/heads/main '%(tagger)' fail
-batch_test_atom refs/heads/main '%(taggername)' fail
-batch_test_atom refs/heads/main '%(taggeremail)' fail
-batch_test_atom refs/heads/main '%(taggeremail:trim)' fail
-batch_test_atom refs/heads/main '%(taggeremail:localpart)' fail
-batch_test_atom refs/heads/main '%(taggerdate)' fail
-batch_test_atom refs/heads/main '%(creator)' fail
-batch_test_atom refs/heads/main '%(creatordate)' fail
-batch_test_atom refs/heads/main '%(subject)' fail
-batch_test_atom refs/heads/main '%(subject:sanitize)' fail
-batch_test_atom refs/heads/main '%(contents:subject)' fail
-batch_test_atom refs/heads/main '%(body)' fail
-batch_test_atom refs/heads/main '%(contents:body)' fail
-batch_test_atom refs/heads/main '%(contents:signature)' fail
-batch_test_atom refs/heads/main '%(contents)' fail
+batch_test_atom refs/heads/main '%(objectname:short)'
+batch_test_atom refs/heads/main '%(objectname:short=1)'
+batch_test_atom refs/heads/main '%(objectname:short=10)'
+batch_test_atom refs/heads/main '%(tree)'
+batch_test_atom refs/heads/main '%(tree:short)'
+batch_test_atom refs/heads/main '%(tree:short=1)'
+batch_test_atom refs/heads/main '%(tree:short=10)'
+batch_test_atom refs/heads/main '%(parent)'
+batch_test_atom refs/heads/main '%(parent:short)'
+batch_test_atom refs/heads/main '%(parent:short=1)'
+batch_test_atom refs/heads/main '%(parent:short=10)'
+batch_test_atom refs/heads/main '%(numparent)'
+batch_test_atom refs/heads/main '%(object)'
+batch_test_atom refs/heads/main '%(type)'
+batch_test_atom refs/heads/main '%(raw)'
+batch_test_atom refs/heads/main '%(*objectname)'
+batch_test_atom refs/heads/main '%(*objecttype)'
+batch_test_atom refs/heads/main '%(author)'
+batch_test_atom refs/heads/main '%(authorname)'
+batch_test_atom refs/heads/main '%(authoremail)'
+batch_test_atom refs/heads/main '%(authoremail:trim)'
+batch_test_atom refs/heads/main '%(authoremail:localpart)'
+batch_test_atom refs/heads/main '%(authordate)'
+batch_test_atom refs/heads/main '%(committer)'
+batch_test_atom refs/heads/main '%(committername)'
+batch_test_atom refs/heads/main '%(committeremail)'
+batch_test_atom refs/heads/main '%(committeremail:trim)'
+batch_test_atom refs/heads/main '%(committeremail:localpart)'
+batch_test_atom refs/heads/main '%(committerdate)'
+batch_test_atom refs/heads/main '%(tag)'
+batch_test_atom refs/heads/main '%(tagger)'
+batch_test_atom refs/heads/main '%(taggername)'
+batch_test_atom refs/heads/main '%(taggeremail)'
+batch_test_atom refs/heads/main '%(taggeremail:trim)'
+batch_test_atom refs/heads/main '%(taggeremail:localpart)'
+batch_test_atom refs/heads/main '%(taggerdate)'
+batch_test_atom refs/heads/main '%(creator)'
+batch_test_atom refs/heads/main '%(creatordate)'
+batch_test_atom refs/heads/main '%(subject)'
+batch_test_atom refs/heads/main '%(subject:sanitize)'
+batch_test_atom refs/heads/main '%(contents:subject)'
+batch_test_atom refs/heads/main '%(body)'
+batch_test_atom refs/heads/main '%(contents:body)'
+batch_test_atom refs/heads/main '%(contents:signature)'
+batch_test_atom refs/heads/main '%(contents)'
 batch_test_atom refs/heads/main '%(HEAD)' fail
 batch_test_atom refs/heads/main '%(upstream:track)' fail
 batch_test_atom refs/heads/main '%(upstream:trackshort)' fail
@@ -742,52 +742,52 @@ batch_test_atom refs/tags/testtag '%(push)' fail
 batch_test_atom refs/tags/testtag '%(objecttype)'
 batch_test_atom refs/tags/testtag '%(objectsize)'
 batch_test_atom refs/tags/testtag '%(objectsize:disk)'
-batch_test_atom refs/tags/testtag '%(*objectsize:disk)' fail
+batch_test_atom refs/tags/testtag '%(*objectsize:disk)'
 batch_test_atom refs/tags/testtag '%(deltabase)'
-batch_test_atom refs/tags/testtag '%(*deltabase)' fail
+batch_test_atom refs/tags/testtag '%(*deltabase)'
 batch_test_atom refs/tags/testtag '%(objectname)'
-batch_test_atom refs/tags/testtag '%(objectname:short)' fail
-batch_test_atom refs/tags/testtag '%(tree)' fail
-batch_test_atom refs/tags/testtag '%(tree:short)' fail
-batch_test_atom refs/tags/testtag '%(tree:short=1)' fail
-batch_test_atom refs/tags/testtag '%(tree:short=10)' fail
-batch_test_atom refs/tags/testtag '%(parent)' fail
-batch_test_atom refs/tags/testtag '%(parent:short)' fail
-batch_test_atom refs/tags/testtag '%(parent:short=1)' fail
-batch_test_atom refs/tags/testtag '%(parent:short=10)' fail
-batch_test_atom refs/tags/testtag '%(numparent)' fail
-batch_test_atom refs/tags/testtag '%(object)' fail
-batch_test_atom refs/tags/testtag '%(type)' fail
-batch_test_atom refs/tags/testtag '%(*objectname)' fail
-batch_test_atom refs/tags/testtag '%(*objecttype)' fail
-batch_test_atom refs/tags/testtag '%(author)' fail
-batch_test_atom refs/tags/testtag '%(authorname)' fail
-batch_test_atom refs/tags/testtag '%(authoremail)' fail
-batch_test_atom refs/tags/testtag '%(authoremail:trim)' fail
-batch_test_atom refs/tags/testtag '%(authoremail:localpart)' fail
-batch_test_atom refs/tags/testtag '%(authordate)' fail
-batch_test_atom refs/tags/testtag '%(committer)' fail
-batch_test_atom refs/tags/testtag '%(committername)' fail
-batch_test_atom refs/tags/testtag '%(committeremail)' fail
-batch_test_atom refs/tags/testtag '%(committeremail:trim)' fail
-batch_test_atom refs/tags/testtag '%(committeremail:localpart)' fail
-batch_test_atom refs/tags/testtag '%(committerdate)' fail
-batch_test_atom refs/tags/testtag '%(tag)' fail
-batch_test_atom refs/tags/testtag '%(tagger)' fail
-batch_test_atom refs/tags/testtag '%(taggername)' fail
-batch_test_atom refs/tags/testtag '%(taggeremail)' fail
-batch_test_atom refs/tags/testtag '%(taggeremail:trim)' fail
-batch_test_atom refs/tags/testtag '%(taggeremail:localpart)' fail
-batch_test_atom refs/tags/testtag '%(taggerdate)' fail
-batch_test_atom refs/tags/testtag '%(creator)' fail
-batch_test_atom refs/tags/testtag '%(creatordate)' fail
-batch_test_atom refs/tags/testtag '%(subject)' fail
-batch_test_atom refs/tags/testtag '%(subject:sanitize)' fail
-batch_test_atom refs/tags/testtag '%(contents:subject)' fail
-batch_test_atom refs/tags/testtag '%(body)' fail
-batch_test_atom refs/tags/testtag '%(contents:body)' fail
-batch_test_atom refs/tags/testtag '%(contents:signature)' fail
-batch_test_atom refs/tags/testtag '%(contents)' fail
+batch_test_atom refs/tags/testtag '%(objectname:short)'
+batch_test_atom refs/tags/testtag '%(tree)'
+batch_test_atom refs/tags/testtag '%(tree:short)'
+batch_test_atom refs/tags/testtag '%(tree:short=1)'
+batch_test_atom refs/tags/testtag '%(tree:short=10)'
+batch_test_atom refs/tags/testtag '%(parent)'
+batch_test_atom refs/tags/testtag '%(parent:short)'
+batch_test_atom refs/tags/testtag '%(parent:short=1)'
+batch_test_atom refs/tags/testtag '%(parent:short=10)'
+batch_test_atom refs/tags/testtag '%(numparent)'
+batch_test_atom refs/tags/testtag '%(object)'
+batch_test_atom refs/tags/testtag '%(type)'
+batch_test_atom refs/tags/testtag '%(*objectname)'
+batch_test_atom refs/tags/testtag '%(*objecttype)'
+batch_test_atom refs/tags/testtag '%(author)'
+batch_test_atom refs/tags/testtag '%(authorname)'
+batch_test_atom refs/tags/testtag '%(authoremail)'
+batch_test_atom refs/tags/testtag '%(authoremail:trim)'
+batch_test_atom refs/tags/testtag '%(authoremail:localpart)'
+batch_test_atom refs/tags/testtag '%(authordate)'
+batch_test_atom refs/tags/testtag '%(committer)'
+batch_test_atom refs/tags/testtag '%(committername)'
+batch_test_atom refs/tags/testtag '%(committeremail)'
+batch_test_atom refs/tags/testtag '%(committeremail:trim)'
+batch_test_atom refs/tags/testtag '%(committeremail:localpart)'
+batch_test_atom refs/tags/testtag '%(committerdate)'
+batch_test_atom refs/tags/testtag '%(tag)'
+batch_test_atom refs/tags/testtag '%(tagger)'
+batch_test_atom refs/tags/testtag '%(taggername)'
+batch_test_atom refs/tags/testtag '%(taggeremail)'
+batch_test_atom refs/tags/testtag '%(taggeremail:trim)'
+batch_test_atom refs/tags/testtag '%(taggeremail:localpart)'
+batch_test_atom refs/tags/testtag '%(taggerdate)'
+batch_test_atom refs/tags/testtag '%(creator)'
+batch_test_atom refs/tags/testtag '%(creatordate)'
+batch_test_atom refs/tags/testtag '%(subject)'
+batch_test_atom refs/tags/testtag '%(subject:sanitize)'
+batch_test_atom refs/tags/testtag '%(contents:subject)'
+batch_test_atom refs/tags/testtag '%(body)'
+batch_test_atom refs/tags/testtag '%(contents:body)'
+batch_test_atom refs/tags/testtag '%(contents:signature)'
+batch_test_atom refs/tags/testtag '%(contents)'
 batch_test_atom refs/tags/testtag '%(HEAD)' fail
 
 batch_test_atom refs/myblobs/blob1 '%(refname)' fail
@@ -801,37 +801,50 @@ batch_test_atom refs/myblobs/blob1 '%(objectsize)'
 batch_test_atom refs/myblobs/blob1 '%(objectsize:disk)'
 batch_test_atom refs/myblobs/blob1 '%(deltabase)'
 
-batch_test_atom refs/myblobs/blob1 '%(contents)' fail
-batch_test_atom refs/myblobs/blob2 '%(contents)' fail
+batch_test_atom refs/myblobs/blob1 '%(contents)'
+batch_test_atom refs/myblobs/blob2 '%(contents)'
 
-batch_test_atom refs/myblobs/blob1 '%(raw)' fail
-batch_test_atom refs/myblobs/blob2 '%(raw)' fail
-batch_test_atom refs/mytrees/tree1 '%(raw)' fail
+batch_test_atom refs/myblobs/blob1 '%(raw)'
+batch_test_atom refs/myblobs/blob2 '%(raw)'
+batch_test_atom refs/mytrees/tree1 '%(raw)'
 
-batch_test_atom refs/myblobs/blob1 '%(raw:size)' fail
-batch_test_atom refs/myblobs/blob2 '%(raw:size)' fail
-batch_test_atom refs/mytrees/tree1 '%(raw:size)' fail
+batch_test_atom refs/myblobs/blob1 '%(raw:size)'
+batch_test_atom refs/myblobs/blob2 '%(raw:size)'
+batch_test_atom refs/mytrees/tree1 '%(raw:size)'
 
-batch_test_atom refs/myblobs/blob1 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)' fail
-batch_test_atom refs/myblobs/blob2 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)' fail
-batch_test_atom refs/mytrees/tree1 '%(if:equals=tree)%(objecttype)%(then)tree%(else)not tree%(end)' fail
+batch_test_atom refs/myblobs/blob1 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)'
+batch_test_atom refs/myblobs/blob2 '%(if:equals=blob)%(objecttype)%(then)commit%(else)not commit%(end)'
+batch_test_atom refs/mytrees/tree1 '%(if:equals=tree)%(objecttype)%(then)tree%(else)not tree%(end)'
 
-batch_test_atom refs/heads/main '%(align:60) objectname is %(objectname)%(end)|%(objectname)' fail
-batch_test_atom refs/heads/main '%(align:left,60) objectname is %(objectname)%(end)|%(objectname)' fail
-batch_test_atom refs/heads/main '%(align:middle,60) objectname is %(objectname)%(end)|%(objectname)' fail
-batch_test_atom refs/heads/main '%(align:60,right) objectname is %(objectname)%(end)|%(objectname)' fail
+batch_test_atom refs/heads/main '%(align:60) objectname is %(objectname)%(end)|%(objectname)'
+batch_test_atom refs/heads/main '%(align:left,60) objectname is %(objectname)%(end)|%(objectname)'
+batch_test_atom refs/heads/main '%(align:middle,60) objectname is %(objectname)%(end)|%(objectname)'
+batch_test_atom refs/heads/main '%(align:60,right) objectname is %(objectname)%(end)|%(objectname)'
 
 batch_test_atom refs/heads/main 'VALID'
 batch_test_atom refs/heads/main '%(INVALID)' fail
 batch_test_atom refs/heads/main '%(authordate:INVALID)' fail
 
 batch_test_atom refs/heads/main '%(objectname) %(objecttype) %(objectsize)
-%(raw)' fail
+%(raw)'
 batch_test_atom refs/tags/testtag '%(objectname) %(objecttype) %(objectsize)
-%(raw)' fail
+%(raw)'
 batch_test_atom refs/myblobs/blob1 '%(objectname) %(objecttype) %(objectsize)
-%(raw)' fail
+%(raw)'
 batch_test_atom refs/myblobs/blob2 '%(objectname) %(objecttype) %(objectsize)
-%(raw)' fail
+%(raw)'
+
+test_expect_success 'cat-file --batch equals to --batch-check with atoms' '
+	git cat-file --batch-check="%(objectname) %(objecttype) %(objectsize)
+%(raw)" >expected <<-EOF &&
+	refs/heads/main
+	refs/tags/testtag
+	EOF
+	git cat-file --batch >actual <<-EOF &&
+	refs/heads/main
+	refs/tags/testtag
+	EOF
+	cmp expected actual
+'
 
 test_done
-- 
gitgitgadget



* [PATCH v2 14/17] [GSOC] cat-file: reuse err buf in batch_object_write()
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (12 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 13/17] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 15/17] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Reuse the `err` buffer in batch_object_write(), as is already done
for the `scratch` buffer. This avoids allocating and releasing the
err buffer once per object.
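
The effect of hoisting the buffer into the caller can be sketched in
isolation. The following is a minimal illustration, not git code:
`struct buf`, `buf_add()`, `buf_reset()`, `buf_release()` and the two
counting functions are hypothetical stand-ins for strbuf and for the
batch loop, and the `allocs` counter exists only to make the difference
visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf; this is not the real API. */
struct buf {
	char *data;
	size_t len, cap;
	unsigned allocs; /* how many times the storage had to grow */
};

static void buf_add(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->cap) {
		size_t cap = (b->len + n + 1) * 2;
		char *p = realloc(b->data, cap);

		assert(p);
		b->data = p;
		b->cap = cap;
		b->allocs++;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

/* Drop the contents but keep the storage for the next user. */
static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->data)
		b->data[0] = '\0';
}

static void buf_release(struct buf *b)
{
	free(b->data);
	memset(b, 0, sizeof(*b));
}

/* A fresh buffer per object: every message costs an allocation. */
static unsigned per_call_allocs(int objects)
{
	unsigned total = 0;

	for (int i = 0; i < objects; i++) {
		struct buf err = {0};

		buf_add(&err, "object missing");
		total += err.allocs;
		buf_release(&err);
	}
	return total;
}

/* One buffer owned by the caller and reset before each object. */
static unsigned reused_allocs(int objects)
{
	struct buf err = {0};
	unsigned total;

	for (int i = 0; i < objects; i++) {
		buf_reset(&err);
		buf_add(&err, "object missing");
	}
	total = err.allocs;
	buf_release(&err);
	return total;
}
```

In this sketch, the per-call variant allocates once for every object
that writes a message, while the reused variant allocates only when the
buffer first has to grow. A strbuf in git does not allocate until
something is written to it, so the saving presumably shows up mainly for
objects that actually produce an error message.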

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5b163551fc6..dc604a9879d 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -212,35 +212,36 @@ static void batch_write(struct batch_options *opt, const void *data, int len)
 
 static void batch_object_write(const char *obj_name,
 			       struct strbuf *scratch,
+			       struct strbuf *err,
 			       struct batch_options *opt,
 			       struct expand_data *data)
 {
 	int ret;
-	struct strbuf err = STRBUF_INIT;
 	struct ref_array_item item = { data->oid, data->rest };
 
 	strbuf_reset(scratch);
+	strbuf_reset(err);
 
-	ret = format_ref_array_item(&item, &opt->format, scratch, &err);
+	ret = format_ref_array_item(&item, &opt->format, scratch, err);
 	if (ret < 0)
-		die("%s\n", err.buf);
+		die("%s\n", err->buf);
 	if (ret) {
 		/* ret > 0 means when the object corresponding to oid
 		 * cannot be found in format_ref_array_item(), we only print
 		 * the error message.
 		 */
-		printf("%s\n", err.buf);
+		printf("%s\n", err->buf);
 		fflush(stdout);
 	} else {
 		strbuf_addch(scratch, '\n');
 		batch_write(opt, scratch->buf, scratch->len);
 	}
 	free_ref_array_item_value(&item);
-	strbuf_release(&err);
 }
 
 static void batch_one_object(const char *obj_name,
 			     struct strbuf *scratch,
+			     struct strbuf *err,
 			     struct batch_options *opt,
 			     struct expand_data *data)
 {
@@ -294,7 +295,7 @@ static void batch_one_object(const char *obj_name,
 		return;
 	}
 
-	batch_object_write(obj_name, scratch, opt, data);
+	batch_object_write(obj_name, scratch, err, opt, data);
 }
 
 struct object_cb_data {
@@ -302,13 +303,14 @@ struct object_cb_data {
 	struct expand_data *expand;
 	struct oidset *seen;
 	struct strbuf *scratch;
+	struct strbuf *err;
 };
 
 static int batch_object_cb(const struct object_id *oid, void *vdata)
 {
 	struct object_cb_data *data = vdata;
 	oidcpy(&data->expand->oid, oid);
-	batch_object_write(NULL, data->scratch, data->opt, data->expand);
+	batch_object_write(NULL, data->scratch, data->err, data->opt, data->expand);
 	return 0;
 }
 
@@ -364,6 +366,7 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 {
 	struct strbuf input = STRBUF_INIT;
 	struct strbuf output = STRBUF_INIT;
+	struct strbuf err = STRBUF_INIT;
 	struct strbuf format = STRBUF_INIT;
 	struct expand_data data;
 	int save_warning;
@@ -392,6 +395,7 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 		cb.opt = batch;
 		cb.expand = &data;
 		cb.scratch = &output;
+		cb.err = &err;
 
 		if (batch->unordered) {
 			struct oidset seen = OIDSET_INIT;
@@ -416,6 +420,7 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 
 		strbuf_release(&format);
 		strbuf_release(&output);
+		strbuf_release(&err);
 		return 0;
 	}
 
@@ -444,12 +449,13 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 			data.rest = p;
 		}
 
-		batch_one_object(input.buf, &output, batch, &data);
+		batch_one_object(input.buf, &output, &err, batch, &data);
 	}
 
 	strbuf_release(&format);
 	strbuf_release(&input);
 	strbuf_release(&output);
+	strbuf_release(&err);
 	warn_on_object_refname_ambiguity = save_warning;
 	return retval;
 }
-- 
gitgitgadget



* [PATCH v2 15/17] [GSOC] cat-file: re-implement --textconv, --filters options
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (13 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 14/17] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 16/17] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 17/17] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

After cat-file reuses the ref-filter logic, re-implement the
--textconv and --filters options on top of it.

Add the member `cat_file_cmdmode` to struct `ref_array_item`, so
that the `cmdmode` member of struct `batch_options` is passed to
ref-filter, which then uses it to filter the content of the object
in get_object().

Use `actual_oi` to point to the expand_data that is actually used: it
may point to the original `oi` or to `act_oi`, which holds the result
of `textconv_object()` or `convert_to_working_tree()`. `grab_values()`
grabs the contents of `actual_oi`, while `grab_common_values()` grabs
the contents of the original `oi`; this ensures that `%(objectsize)`
still uses the size of the unfiltered data.
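
The split between the original and the filtered data can be shown with
a reduced sketch. Everything below is hypothetical: `struct obj_data`
is a much smaller stand-in for expand_data, `toy_filter()` replaces
`textconv_object()` / `convert_to_working_tree()` with a trivial
LF-to-CRLF expansion, and `describe()` only mimics the shape of
get_object().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much smaller stand-in for ref-filter's expand_data. */
struct obj_data {
	const char *content;
	size_t size;
};

/*
 * Toy filter: expand "\n" to "\r\n". Returns 1 and fills *out with
 * newly allocated data if anything changed, 0 otherwise.
 */
static int toy_filter(const struct obj_data *in, struct obj_data *out)
{
	size_t extra = 0, i, j = 0;
	char *buf;

	for (i = 0; i < in->size; i++)
		if (in->content[i] == '\n')
			extra++;
	if (!extra)
		return 0;

	buf = malloc(in->size + extra + 1);
	assert(buf);
	for (i = 0; i < in->size; i++) {
		if (in->content[i] == '\n')
			buf[j++] = '\r';
		buf[j++] = in->content[i];
	}
	buf[j] = '\0';
	out->content = buf;
	out->size = j;
	return 1;
}

struct report {
	size_t objectsize;  /* always taken from the original data */
	size_t content_len; /* taken from whichever data was selected */
};

static struct report describe(const char *text, int use_filter)
{
	struct obj_data oi = { text, strlen(text) };
	struct obj_data act_oi = { NULL, 0 };
	struct obj_data *actual_oi = &oi;
	struct report r;

	if (use_filter && toy_filter(&oi, &act_oi))
		actual_oi = &act_oi;

	r.content_len = actual_oi->size; /* role of grab_values() */
	r.objectsize = oi.size;          /* role of grab_common_values() */

	/* Only the filtered copy is owned by this function. */
	if (actual_oi != &oi)
		free((char *)actual_oi->content);
	return r;
}
```

With the input "a\nb\n" and the filter enabled, the sketch reports a
content length of 6 but an object size of 4, which is the behaviour the
message describes for `%(objectsize)`.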

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c |  2 +-
 ref-filter.c       | 35 +++++++++++++++++++++++++++++++++--
 ref-filter.h       |  1 +
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index dc604a9879d..3a6153e778f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -217,7 +217,7 @@ static void batch_object_write(const char *obj_name,
 			       struct expand_data *data)
 {
 	int ret;
-	struct ref_array_item item = { data->oid, data->rest };
+	struct ref_array_item item = { data->oid, data->rest, opt->cmdmode };
 
 	strbuf_reset(scratch);
 	strbuf_reset(err);
diff --git a/ref-filter.c b/ref-filter.c
index b4f41fec871..91e26c9aba3 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1,3 +1,4 @@
+#define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "builtin.h"
 #include "cache.h"
 #include "parse-options.h"
@@ -1755,6 +1756,9 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 {
 	/* parse_object_buffer() will set eaten to 0 if free() will be needed */
 	int eaten = 1;
+	struct expand_data *actual_oi = oi;
+	struct expand_data act_oi = {0};
+
 	if (oi->info.contentp) {
 		/* We need to know that to use parse_object_buffer properly */
 		oi->info.sizep = &oi->size;
@@ -1768,19 +1772,45 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 		BUG("Object size is less than zero.");
 
 	if (oi->info.contentp) {
-		*obj = parse_object_buffer(the_repository, &oi->oid, oi->type, oi->size, oi->content, &eaten);
+		if ((ref->cat_file_cmdmode == 'c' || ref->cat_file_cmdmode == 'w') && !ref->rest)
+			return strbuf_addf_ret(err, -1, _("missing path for '%s'"),
+					       oid_to_hex(&oi->oid));
+		if (oi->type == OBJ_BLOB) {
+			if (ref->cat_file_cmdmode == 'c') {
+				act_oi = *oi;
+				if (textconv_object(the_repository,
+						    ref->rest, 0100644, &act_oi.oid,
+						    1, (char **)(&act_oi.content), &act_oi.size))
+					actual_oi = &act_oi;
+			} else if (ref->cat_file_cmdmode == 'w') {
+				struct strbuf strbuf = STRBUF_INIT;
+				struct checkout_metadata meta;
+				act_oi = *oi;
+
+				init_checkout_metadata(&meta, NULL, NULL, &act_oi.oid);
+				if (!convert_to_working_tree(&the_index, ref->rest, act_oi.content, act_oi.size, &strbuf, &meta))
+					die("could not convert '%s' %s",
+					    oid_to_hex(&oi->oid), ref->rest);
+				act_oi.size = strbuf.len;
+				act_oi.content = strbuf_detach(&strbuf, NULL);
+				actual_oi = &act_oi;
+			}
+		}
+		*obj = parse_object_buffer(the_repository, &actual_oi->oid, actual_oi->type, actual_oi->size, actual_oi->content, &eaten);
 		if (!*obj) {
 			if (!eaten)
 				free(oi->content);
 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
 					       oid_to_hex(&oi->oid), ref->refname);
 		}
-		grab_values(ref->value, deref, *obj, oi);
+		grab_values(ref->value, deref, *obj, actual_oi);
 	}
 
 	grab_common_values(ref->value, deref, oi);
 	if (!eaten)
 		free(oi->content);
+	if (actual_oi != oi)
+		free(actual_oi->content);
 	return 0;
 }
 
@@ -2189,6 +2219,7 @@ static struct ref_array_item *new_ref_array_item(const char *refname,
 	FLEX_ALLOC_STR(ref, refname, refname);
 	oidcpy(&ref->objectname, oid);
 	ref->rest = NULL;
+	ref->cat_file_cmdmode = 0;
 
 	return ref;
 }
diff --git a/ref-filter.h b/ref-filter.h
index 053980a6a42..a93d5e4dd61 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -39,6 +39,7 @@ struct ref_sorting {
 struct ref_array_item {
 	struct object_id objectname;
 	const char *rest;
+	int cat_file_cmdmode;
 	int flag;
 	unsigned int kind;
 	const char *symref;
-- 
gitgitgadget



* [PATCH v2 16/17] [GSOC] ref-filter: remove grab_oid() function
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (14 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 15/17] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  2021-07-15 15:40   ` [PATCH v2 17/17] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Because "atom_type == ATOM_OBJECTNAME" implies
`starts_with(name, "objectname")`, and "atom_type == ATOM_TREE"
implies `starts_with(name, "tree")`, the check for
`starts_with(name, field)` in grab_oid() is redundant.

So remove grab_oid() from ref-filter to avoid the repeated check.
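
The redundancy can be illustrated with a reduced sketch. It assumes,
as the message argues, that the atom type is derived from the atom
name when the format is parsed; `parse_atom()`, `handles_old()` and
`handles_new()` are hypothetical names, and the enum is cut down to
three values.

```c
#include <assert.h>
#include <string.h>

enum atom_type { ATOM_OBJECTNAME, ATOM_TREE, ATOM_UNKNOWN };

/* Hypothetical parser: the type is derived from the name once. */
static enum atom_type parse_atom(const char *name)
{
	assert(name);
	if (!strncmp(name, "objectname", 10))
		return ATOM_OBJECTNAME;
	if (!strncmp(name, "tree", 4))
		return ATOM_TREE;
	return ATOM_UNKNOWN;
}

/* Old shape: test the type, then re-test the prefix it came from. */
static int handles_old(const char *name)
{
	return parse_atom(name) == ATOM_TREE && !strncmp(name, "tree", 4);
}

/* New shape: the type alone already carries that information. */
static int handles_new(const char *name)
{
	return parse_atom(name) == ATOM_TREE;
}
```

Under that assumption both predicates agree for every name, so dropping
the second comparison changes no result and saves one string check per
atom.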

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 91e26c9aba3..1c7287f1061 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1077,16 +1077,6 @@ static const char *do_grab_oid(const char *field, const struct object_id *oid,
 	}
 }
 
-static int grab_oid(const char *name, const char *field, const struct object_id *oid,
-		    struct atom_value *v, struct used_atom *atom)
-{
-	if (starts_with(name, field)) {
-		v->s = xstrdup(do_grab_oid(field, oid, atom));
-		return 1;
-	}
-	return 0;
-}
-
 /* See grab_values */
 static void grab_common_values(struct atom_value *val, int deref, struct expand_data *oi)
 {
@@ -1112,8 +1102,9 @@ static void grab_common_values(struct atom_value *val, int deref, struct expand_
 			}
 		} else if (atom_type == ATOM_DELTABASE)
 			v->s = xstrdup(oid_to_hex(&oi->delta_base_oid));
-		else if (atom_type == ATOM_OBJECTNAME && deref)
-			grab_oid(name, "objectname", &oi->oid, v, &used_atom[i]);
+		else if (atom_type == ATOM_OBJECTNAME && deref) {
+			v->s = xstrdup(do_grab_oid("objectname", &oi->oid, &used_atom[i]));
+		}
 	}
 }
 
@@ -1154,9 +1145,10 @@ static void grab_commit_values(struct atom_value *val, int deref, struct object
 			continue;
 		if (deref)
 			name++;
-		if (atom_type == ATOM_TREE &&
-		    grab_oid(name, "tree", get_commit_tree_oid(commit), v, &used_atom[i]))
+		if (atom_type == ATOM_TREE) {
+			v->s = xstrdup(do_grab_oid("tree", get_commit_tree_oid(commit), &used_atom[i]));
 			continue;
+		}
 		if (atom_type == ATOM_NUMPARENT) {
 			v->value = commit_list_count(commit->parents);
 			v->s = xstrfmt("%lu", (unsigned long)v->value);
@@ -1959,9 +1951,9 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 				v->s = xstrdup(buf + 1);
 			}
 			continue;
-		} else if (!deref && atom_type == ATOM_OBJECTNAME &&
-			   grab_oid(name, "objectname", &ref->objectname, v, atom)) {
-				continue;
+		} else if (!deref && atom_type == ATOM_OBJECTNAME) {
+			   v->s = xstrdup(do_grab_oid("objectname", &ref->objectname, atom));
+			   continue;
 		} else if (atom_type == ATOM_HEAD) {
 			if (atom->u.head && !strcmp(ref->refname, atom->u.head))
 				v->s = xstrdup("*");
-- 
gitgitgadget



* [PATCH v2 17/17] [GSOC] cat-file: use fast path when using default_format
  2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
                     ` (15 preceding siblings ...)
  2021-07-15 15:40   ` [PATCH v2 16/17] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
@ 2021-07-15 15:40   ` ZheNing Hu via GitGitGadget
  16 siblings, 0 replies; 52+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-07-15 15:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Bagas Sanjaya,
	Jeff King, Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Philip Oakley, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Add the member `default_format` to struct `batch_options`. It is set
when `git cat-file --batch` or `git cat-file --batch-check` uses the
default format. In that case, if neither `--textconv` nor `--filters`
is used, we do not call verify_ref_format(), has_object_file() or
format_ref_array_item(). Instead, we get the object data directly
through oid_object_info_extended() and output it directly.

By using this fast path, we can reduce some of the extra overhead
that `cat-file --batch` incurs when using ref-filter. The running
time of `git cat-file --batch-check` will be similar to before, and
the running time of `git cat-file --batch` will be 9.1% less than
before.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 builtin/cat-file.c | 79 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 57 insertions(+), 22 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 3a6153e778f..8edc19f2d5a 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -26,6 +26,7 @@ struct batch_options {
 	int all_objects;
 	int unordered;
 	int cmdmode; /* may be 'w' or 'c' for --filters or --textconv */
+	int default_format;
 	struct ref_format format;
 };
 
@@ -196,6 +197,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 struct expand_data {
 	struct object_id oid;
+	struct object_info info;
 	const char *rest;
 	int split_on_whitespace;
 };
@@ -216,27 +218,58 @@ static void batch_object_write(const char *obj_name,
 			       struct batch_options *opt,
 			       struct expand_data *data)
 {
-	int ret;
-	struct ref_array_item item = { data->oid, data->rest, opt->cmdmode };
-
-	strbuf_reset(scratch);
-	strbuf_reset(err);
-
-	ret = format_ref_array_item(&item, &opt->format, scratch, err);
-	if (ret < 0)
-		die("%s\n", err->buf);
-	if (ret) {
-		/* ret > 0 means when the object corresponding to oid
-		 * cannot be found in format_ref_array_item(), we only print
-		 * the error message.
-		 */
-		printf("%s\n", err->buf);
+	if (opt->default_format && !opt->cmdmode) {
+		struct strbuf type_name = STRBUF_INIT;
+		unsigned long size;
+		void *content;
+
+		if (opt->print_contents)
+			data->info.contentp = &content;
+
+		data->info.type_name = &type_name;
+		data->info.sizep = &size;
+
+		if (oid_object_info_extended(the_repository, &data->oid, &data->info,
+					     OBJECT_INFO_LOOKUP_REPLACE) < 0) {
+			printf("%s missing\n",
+			       obj_name ? obj_name : oid_to_hex(&data->oid));
+			fflush(stdout);
+			return;
+		}
+
+		fprintf(stdout, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
+			data->info.type_name->buf,
+			(uintmax_t)*data->info.sizep);
 		fflush(stdout);
+		strbuf_release(&type_name);
+		if (opt->print_contents) {
+			batch_write(opt, content, *data->info.sizep);
+			batch_write(opt, "\n", 1);
+			free(content);
+		}
 	} else {
-		strbuf_addch(scratch, '\n');
-		batch_write(opt, scratch->buf, scratch->len);
+		int ret;
+		struct ref_array_item item = { data->oid, data->rest, opt->cmdmode };
+
+		strbuf_reset(scratch);
+		strbuf_reset(err);
+
+		ret = format_ref_array_item(&item, &opt->format, scratch, err);
+		if (ret < 0)
+			die("%s\n", err->buf);
+		if (ret) {
+			/* ret > 0 means when the object corresponding to oid
+			 * cannot be found in format_ref_array_item(), we only print
+			 * the error message.
+			 */
+			printf("%s\n", err->buf);
+			fflush(stdout);
+		} else {
+			strbuf_addch(scratch, '\n');
+			batch_write(opt, scratch->buf, scratch->len);
+		}
+		free_ref_array_item_value(&item);
 	}
-	free_ref_array_item_value(&item);
 }
 
 static void batch_one_object(const char *obj_name,
@@ -288,7 +321,7 @@ static void batch_one_object(const char *obj_name,
 		return;
 	}
 
-	if (!has_object_file(&data->oid)) {
+	if ((!opt->default_format || opt->cmdmode) && !has_object_file(&data->oid)) {
 		printf("%s missing\n",
 		       obj_name ? obj_name : oid_to_hex(&data->oid));
 		fflush(stdout);
@@ -380,7 +413,7 @@ static int batch_objects(struct batch_options *batch, const struct option *optio
 	if (batch->print_contents)
 		strbuf_addstr(&format, "\n%(raw)");
 	batch->format.format = format.buf;
-	if (verify_ref_format(&batch->format))
+	if ((!batch->default_format || batch->cmdmode) && verify_ref_format(&batch->format))
 		usage_with_options(cat_file_usage, options);
 
 	if (batch->cmdmode || batch->format.use_rest)
@@ -483,7 +516,8 @@ static int batch_option_callback(const struct option *opt,
 	bo->enabled = 1;
 	bo->print_contents = !strcmp(opt->long_name, "batch");
 	bo->format.format = arg;
-
+	if (arg)
+		bo->default_format = 0;
 	return 0;
 }
 
@@ -492,7 +526,8 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	int opt = 0;
 	const char *exp_type = NULL, *obj_name = NULL;
 	struct batch_options batch = {
-		.format = REF_FORMAT_INIT
+		.format = REF_FORMAT_INIT,
+		.default_format = 1
 	};
 	int unknown_type = 0;
 
-- 
gitgitgadget


Thread overview: 52+ messages
2021-07-12 11:46 [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 01/19] cat-file: handle trivial --batch format with --batch-all-objects ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 02/19] cat-file: merge two block into one ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 03/19] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 04/19] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 05/19] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 06/19] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 07/19] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 08/19] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 09/19] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 10/19] [GSOC] ref-filter: introduce reject_atom() ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 11/19] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 12/19] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 13/19] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 14/19] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-12 13:17   ` Christian Couder
2021-07-12 13:26     ` Christian Couder
2021-07-12 13:51       ` ZheNing Hu
2021-07-12 13:49     ` ZheNing Hu
2021-07-12 20:38     ` Junio C Hamano
2021-07-14 16:24       ` ZheNing Hu
2021-07-15  1:53         ` ZheNing Hu
2021-07-15  9:45           ` Christian Couder
2021-07-15 13:53             ` ZheNing Hu
2021-07-15 14:55     ` ZheNing Hu
2021-07-12 11:46 ` [PATCH 15/19] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 16/19] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 17/19] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 18/19] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
2021-07-12 11:46 ` [PATCH 19/19] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget
2021-07-12 12:36 ` [PATCH 00/19] [GSOC] cat-file: reuse ref-filter logic Christian Couder
2021-07-12 13:01   ` ZheNing Hu
2021-07-12 13:02 ` Philip Oakley
2021-07-12 13:27   ` ZheNing Hu
2021-07-15 15:40 ` [PATCH v2 00/17] " ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 01/17] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 02/17] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 03/17] [GSOC] ref-filter: --format=%(raw) re-support --perl ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 04/17] [GSOC] ref-filter: use non-const ref_format in *_atom_parser() ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 05/17] [GSOC] ref-filter: add %(rest) atom ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 06/17] [GSOC] ref-filter: pass get_object() return value to their callers ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 07/17] [GSOC] ref-filter: introduce free_ref_array_item_value() function ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 08/17] [GSOC] ref-filter: add cat_file_mode to ref_format ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 09/17] [GSOC] ref-filter: modify the error message and value in get_object ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 10/17] [GSOC] cat-file: add has_object_file() check ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 11/17] [GSOC] cat-file: change batch_objects parameter name ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 12/17] [GSOC] cat-file: create p1006-cat-file.sh ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 13/17] [GSOC] cat-file: reuse ref-filter logic ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 14/17] [GSOC] cat-file: reuse err buf in batch_object_write() ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 15/17] [GSOC] cat-file: re-implement --textconv, --filters options ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 16/17] [GSOC] ref-filter: remove grab_oid() function ZheNing Hu via GitGitGadget
2021-07-15 15:40   ` [PATCH v2 17/17] [GSOC] cat-file: use fast path when using default_format ZheNing Hu via GitGitGadget
