git.vger.kernel.org archive mirror
* [PATCH 0/2] [GSOC] ref-filter: add %(raw) atom
@ 2021-06-01 14:37 ZheNing Hu via GitGitGadget
  2021-06-01 14:37 ` [PATCH 1/2] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-06-01 14:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu

In order to make git cat-file --batch use ref-filter logic, I add the
%(raw) atom to ref-filter.

Changes since the last version:

 1. Change the is_empty() logic.
 2. Simplify memcasecmp().
 3. Rebase on zh/ref-filter-atom-type.

ZheNing Hu (2):
  [GSOC] ref-filter: add obj-type check in grab contents
  [GSOC] ref-filter: add %(raw) atom

 Documentation/git-for-each-ref.txt |   9 ++
 ref-filter.c                       | 164 +++++++++++++++++------
 t/t6300-for-each-ref.sh            | 207 +++++++++++++++++++++++++++++
 3 files changed, 343 insertions(+), 37 deletions(-)


base-commit: 1197f1a46360d3ae96bd9c15908a3a6f8e562207
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-966%2Fadlternative%2Fref-filter-raw-atom-v4-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-966/adlternative/ref-filter-raw-atom-v4-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/966
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/2] [GSOC] ref-filter: add obj-type check in grab contents
  2021-06-01 14:37 [PATCH 0/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
@ 2021-06-01 14:37 ` ZheNing Hu via GitGitGadget
  2021-06-03  2:10   ` Junio C Hamano
  2021-06-01 14:37 ` [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-06-01 14:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Only tag and commit objects use `grab_sub_body_contents()` to grab
object contents in the current codebase.  We want to teach the
function to also handle blobs and trees to get their raw data,
without parsing a blob (whose contents look like a commit or a tag)
incorrectly as a commit or a tag.

Skip the block of code that is specific to handling commits and tags
early when the given object is of another type, so that handling for
other types of objects can be added to this function later.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 4db0e40ff4c6..c0334857653a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1356,7 +1356,8 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size
 }
 
 /* See grab_values */
-static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
+static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
+				   struct object *obj)
 {
 	int i;
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
@@ -1371,10 +1372,13 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			continue;
 		if (deref)
 			name++;
-		if (strcmp(name, "body") &&
-		    !starts_with(name, "subject") &&
-		    !starts_with(name, "trailers") &&
-		    !starts_with(name, "contents"))
+
+		if ((obj->type != OBJ_TAG &&
+		     obj->type != OBJ_COMMIT) ||
+		    (strcmp(name, "body") &&
+		     !starts_with(name, "subject") &&
+		     !starts_with(name, "trailers") &&
+		     !starts_with(name, "contents")))
 			continue;
 		if (!subpos)
 			find_subpos(buf,
@@ -1443,12 +1447,12 @@ static void grab_values(struct atom_value *val, int deref, struct object *obj, v
 	switch (obj->type) {
 	case OBJ_TAG:
 		grab_tag_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, buf, obj);
 		grab_person("tagger", val, deref, buf);
 		break;
 	case OBJ_COMMIT:
 		grab_commit_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, buf, obj);
 		grab_person("author", val, deref, buf);
 		grab_person("committer", val, deref, buf);
 		break;
-- 
gitgitgadget



* [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-01 14:37 [PATCH 0/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
  2021-06-01 14:37 ` [PATCH 1/2] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
@ 2021-06-01 14:37 ` ZheNing Hu via GitGitGadget
  2021-06-03  2:38   ` Junio C Hamano
  2021-06-03  5:11 ` [PATCH 0/2] " Bagas Sanjaya
  2021-06-04 12:12 ` [PATCH v2 " ZheNing Hu via GitGitGadget
  3 siblings, 1 reply; 36+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-06-01 14:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Add a new formatting option, `%(raw)`, which prints the raw object
data without any changes. It will further help migrate all of the
formatting logic from cat-file to ref-filter.

The raw data of blob and tree objects may contain '\0', but most of
the logic in `ref-filter` depends on the output of the atom being
text (specifically, no embedded NULs in it).

E.g. `quote_formatting()` uses `strbuf_addstr()` or `*_quote_buf()`
to add the data to the buffer. If the raw data of a tree object is
`100644 one\0...`, only `100644 one` will be added to the buffer,
which is incorrect.

Therefore, add a new member to `struct atom_value`: `s_size`, which
records the raw object size. It lets us add raw object data to the
buffer and compare two buffers that contain raw object data.

In addition, `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, or `--perl`, because if binary raw data is passed to a variable
in the host language, the host language may not support arbitrary binary
data in variables of its string type.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-for-each-ref.txt |   9 ++
 ref-filter.c                       | 152 ++++++++++++++++-----
 t/t6300-for-each-ref.sh            | 207 +++++++++++++++++++++++++++++
 3 files changed, 335 insertions(+), 33 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 2ae2478de706..8f8d8cd1e04f 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -235,6 +235,15 @@ and `date` to extract the named component.  For email fields (`authoremail`,
 without angle brackets, and `:localpart` to get the part before the `@` symbol
 out of the trimmed email.
 
+The raw data in an object is `raw`.
+
+raw:size::
+	The raw data size of the object.
+
+Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
+`--perl` because the host language may not support arbitrary binary data in the
+variables of its string type.
+
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
 
diff --git a/ref-filter.c b/ref-filter.c
index c0334857653a..4d053a9e14ba 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -144,6 +144,7 @@ enum atom_type {
 	ATOM_BODY,
 	ATOM_TRAILERS,
 	ATOM_CONTENTS,
+	ATOM_RAW,
 	ATOM_UPSTREAM,
 	ATOM_PUSH,
 	ATOM_SYMREF,
@@ -189,6 +190,9 @@ static struct used_atom {
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
+		struct {
+			enum { RAW_BARE, RAW_LENGTH } option;
+		} raw_data;
 		struct {
 			cmp_status cmp_status;
 			const char *str;
@@ -426,6 +430,18 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
+static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+				const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		atom->u.raw_data.option = RAW_BARE;
+	else if (!strcmp(arg, "size"))
+		atom->u.raw_data.option = RAW_LENGTH;
+	else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
+	return 0;
+}
+
 static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
 			   const char *arg, struct strbuf *err)
 {
@@ -586,6 +602,7 @@ static struct {
 	[ATOM_BODY] = { "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
 	[ATOM_TRAILERS] = { "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
 	[ATOM_CONTENTS] = { "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
+	[ATOM_RAW] = { "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
 	[ATOM_UPSTREAM] = { "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_PUSH] = { "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_SYMREF] = { "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },
@@ -620,12 +637,15 @@ struct ref_formatting_state {
 
 struct atom_value {
 	const char *s;
+	size_t s_size;
 	int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
 		       struct strbuf *err);
 	uintmax_t value; /* used for sorting when not FIELD_STR */
 	struct used_atom *atom;
 };
 
+#define ATOM_VALUE_S_SIZE_INIT (-1)
+
 /*
  * Used to parse format string and sort specifiers
  */
@@ -644,13 +664,6 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 		return strbuf_addf_ret(err, -1, _("malformed field name: %.*s"),
 				       (int)(ep-atom), atom);
 
-	/* Do we have the atom already used elsewhere? */
-	for (i = 0; i < used_atom_cnt; i++) {
-		int len = strlen(used_atom[i].name);
-		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
-			return i;
-	}
-
 	/*
 	 * If the atom name has a colon, strip it and everything after
 	 * it off - it specifies the format for this entry, and
@@ -660,6 +673,17 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	arg = memchr(sp, ':', ep - sp);
 	atom_len = (arg ? arg : ep) - sp;
 
+	if (format->quote_style && !strncmp(sp, "raw", 3) && !arg)
+		return strbuf_addf_ret(err, -1, _("--format=%.*s cannot be used with"
+				"--python, --shell, --tcl, --perl"), (int)(ep-atom), atom);
+
+	/* Do we have the atom already used elsewhere? */
+	for (i = 0; i < used_atom_cnt; i++) {
+		int len = strlen(used_atom[i].name);
+		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
+			return i;
+	}
+
 	/* Is the atom a valid one? */
 	for (i = 0; i < ARRAY_SIZE(valid_atom); i++) {
 		int len = strlen(valid_atom[i].name);
@@ -709,11 +733,14 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	return at;
 }
 
-static void quote_formatting(struct strbuf *s, const char *str, int quote_style)
+static void quote_formatting(struct strbuf *s, const char *str, size_t len, int quote_style)
 {
 	switch (quote_style) {
 	case QUOTE_NONE:
-		strbuf_addstr(s, str);
+		if (len != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(s, str, len);
+		else
+			strbuf_addstr(s, str);
 		break;
 	case QUOTE_SHELL:
 		sq_quote_buf(s, str);
@@ -740,9 +767,12 @@ static int append_atom(struct atom_value *v, struct ref_formatting_state *state,
 	 * encountered.
 	 */
 	if (!state->stack->prev)
-		quote_formatting(&state->stack->output, v->s, state->quote_style);
+		quote_formatting(&state->stack->output, v->s, v->s_size, state->quote_style);
 	else
-		strbuf_addstr(&state->stack->output, v->s);
+		if (v->s_size != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(&state->stack->output, v->s, v->s_size);
+		else
+			strbuf_addstr(&state->stack->output, v->s);
 	return 0;
 }
 
@@ -842,21 +872,23 @@ static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state
 	return 0;
 }
 
-static int is_empty(const char *s)
+static int is_empty(struct strbuf *buf)
 {
-	while (*s != '\0') {
-		if (!isspace(*s))
-			return 0;
-		s++;
-	}
-	return 1;
-}
+	const char *cur = buf->buf;
+	const char *end = buf->buf + buf->len;
+
+	while (cur != end && (isspace(*cur)))
+		cur++;
+
+	return cur == end;
+ }
 
 static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
 			     struct strbuf *err)
 {
 	struct ref_formatting_stack *cur = state->stack;
 	struct if_then_else *if_then_else = NULL;
+	size_t str_len = 0;
 
 	if (cur->at_end == if_then_else_handler)
 		if_then_else = (struct if_then_else *)cur->at_end_data;
@@ -867,18 +899,22 @@ static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_sta
 	if (if_then_else->else_atom_seen)
 		return strbuf_addf_ret(err, -1, _("format: %%(then) atom used after %%(else)"));
 	if_then_else->then_atom_seen = 1;
+	if (if_then_else->str)
+		str_len = strlen(if_then_else->str);
 	/*
 	 * If the 'equals' or 'notequals' attribute is used then
 	 * perform the required comparison. If not, only non-empty
 	 * strings satisfy the 'if' condition.
 	 */
 	if (if_then_else->cmp_status == COMPARE_EQUAL) {
-		if (!strcmp(if_then_else->str, cur->output.buf))
+		if (str_len == cur->output.len &&
+		    !memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
 	} else if (if_then_else->cmp_status == COMPARE_UNEQUAL) {
-		if (strcmp(if_then_else->str, cur->output.buf))
+		if (str_len != cur->output.len ||
+		    memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
-	} else if (cur->output.len && !is_empty(cur->output.buf))
+	} else if (cur->output.len && !is_empty(&cur->output))
 		if_then_else->condition_satisfied = 1;
 	strbuf_reset(&cur->output);
 	return 0;
@@ -924,7 +960,7 @@ static int end_atom_handler(struct atom_value *atomv, struct ref_formatting_stat
 	 * only on the topmost supporting atom.
 	 */
 	if (!current->prev->prev) {
-		quote_formatting(&s, current->output.buf, state->quote_style);
+		quote_formatting(&s, current->output.buf, current->output.len, state->quote_style);
 		strbuf_swap(&current->output, &s);
 	}
 	strbuf_release(&s);
@@ -1357,7 +1393,7 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size
 
 /* See grab_values */
 static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
-				   struct object *obj)
+				   unsigned long buf_size, struct object *obj)
 {
 	int i;
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
@@ -1367,12 +1403,23 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
 		struct used_atom *atom = &used_atom[i];
 		const char *name = atom->name;
 		struct atom_value *v = &val[i];
+		enum atom_type atom_type = atom->atom_type;
 
 		if (!!deref != (*name == '*'))
 			continue;
 		if (deref)
 			name++;
 
+		if (atom_type == ATOM_RAW) {
+			if (atom->u.raw_data.option == RAW_BARE) {
+				v->s = xmemdupz(buf, buf_size);
+				v->s_size = buf_size;
+			} else if (atom->u.raw_data.option == RAW_LENGTH) {
+				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
+			}
+			continue;
+		}
+
 		if ((obj->type != OBJ_TAG &&
 		     obj->type != OBJ_COMMIT) ||
 		    (strcmp(name, "body") &&
@@ -1442,25 +1489,30 @@ static void fill_missing_values(struct atom_value *val)
  * pointed at by the ref itself; otherwise it is the object the
  * ref (which is a tag) refers to.
  */
-static void grab_values(struct atom_value *val, int deref, struct object *obj, void *buf)
+static void grab_values(struct atom_value *val, int deref, struct object *obj, struct expand_data *data)
 {
+	void *buf = data->content;
+	unsigned long buf_size = data->size;
+
 	switch (obj->type) {
 	case OBJ_TAG:
 		grab_tag_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf, obj);
+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
 		grab_person("tagger", val, deref, buf);
 		break;
 	case OBJ_COMMIT:
 		grab_commit_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf, obj);
+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
 		grab_person("author", val, deref, buf);
 		grab_person("committer", val, deref, buf);
 		break;
 	case OBJ_TREE:
 		/* grab_tree_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
 		break;
 	case OBJ_BLOB:
 		/* grab_blob_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
 		break;
 	default:
 		die("Eh?  Object of type %d?", obj->type);
@@ -1682,7 +1734,7 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
 					       oid_to_hex(&oi->oid), ref->refname);
 		}
-		grab_values(ref->value, deref, *obj, oi->content);
+		grab_values(ref->value, deref, *obj, oi);
 	}
 
 	grab_common_values(ref->value, deref, oi);
@@ -1763,7 +1815,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 		int deref = 0;
 		const char *refname;
 		struct branch *branch = NULL;
-
+		v->s_size = ATOM_VALUE_S_SIZE_INIT;
 		v->handler = append_atom;
 		v->atom = atom;
 
@@ -2367,6 +2419,19 @@ static int compare_detached_head(struct ref_array_item *a, struct ref_array_item
 	return 0;
 }
 
+static int memcasecmp(const void *vs1, const void *vs2, size_t n)
+{
+	const char *s1 = vs1, *s2 = vs2;
+	const char *end = s1 + n;
+
+	for (; s1 < end; s1++, s2++) {
+		int diff = tolower(*s1) - tolower(*s2);
+		if (diff)
+			return diff;
+	}
+	return 0;
+}
+
 static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, struct ref_array_item *b)
 {
 	struct atom_value *va, *vb;
@@ -2387,10 +2452,30 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
 	} else if (s->sort_flags & REF_SORTING_VERSION) {
 		cmp = versioncmp(va->s, vb->s);
 	} else if (cmp_type == FIELD_STR) {
-		int (*cmp_fn)(const char *, const char *);
-		cmp_fn = s->sort_flags & REF_SORTING_ICASE
-			? strcasecmp : strcmp;
-		cmp = cmp_fn(va->s, vb->s);
+		if (va->s_size == ATOM_VALUE_S_SIZE_INIT &&
+		    vb->s_size == ATOM_VALUE_S_SIZE_INIT) {
+			int (*cmp_fn)(const char *, const char *);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? strcasecmp : strcmp;
+			cmp = cmp_fn(va->s, vb->s);
+		} else {
+			int (*cmp_fn)(const void *, const void *, size_t);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? memcasecmp : memcmp;
+			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(va->s) : va->s_size;
+			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(vb->s) : vb->s_size;
+
+			cmp = cmp_fn(va->s, vb->s, b_size > a_size ?
+				     a_size : b_size);
+			if (!cmp) {
+				if (a_size > b_size)
+					cmp = 1;
+				else if (a_size < b_size)
+					cmp = -1;
+			}
+		}
 	} else {
 		if (va->value < vb->value)
 			cmp = -1;
@@ -2490,6 +2575,7 @@ int format_ref_array_item(struct ref_array_item *info,
 	}
 	if (format->need_color_reset_at_eol) {
 		struct atom_value resetv;
+		resetv.s_size = ATOM_VALUE_S_SIZE_INIT;
 		resetv.s = GIT_COLOR_RESET;
 		if (append_atom(&resetv, &state, error_buf)) {
 			pop_stack_element(&state.stack);
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9e0214076b4d..5f66d933ace0 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -130,6 +130,8 @@ test_atom head parent:short=10 ''
 test_atom head numparent 0
 test_atom head object ''
 test_atom head type ''
+test_atom head raw "$(git cat-file commit refs/heads/main)
+"
 test_atom head '*objectname' ''
 test_atom head '*objecttype' ''
 test_atom head author 'A U Thor <author@example.com> 1151968724 +0200'
@@ -221,6 +223,15 @@ test_atom tag contents 'Tagging at 1151968727
 '
 test_atom tag HEAD ' '
 
+test_expect_success 'basic atom: refs/tags/testtag *raw' '
+	git cat-file commit refs/tags/testtag^{} >expected &&
+	git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'Check invalid atoms names are errors' '
 	test_must_fail git for-each-ref --format="%(INVALID)" refs/heads
 '
@@ -686,6 +697,15 @@ test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
 test_atom refs/tags/signed-empty contents "$sig"
 
+test_expect_success 'basic atom: refs/tags/signed-empty raw' '
+	git cat-file tag refs/tags/signed-empty >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-empty >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short subject:sanitize 'subject-line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
@@ -695,6 +715,15 @@ test_atom refs/tags/signed-short contents:signature "$sig"
 test_atom refs/tags/signed-short contents "subject line
 $sig"
 
+test_expect_success 'basic atom: refs/tags/signed-short raw' '
+	git cat-file tag refs/tags/signed-short >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-short >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long subject:sanitize 'subject-line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
@@ -708,6 +737,15 @@ test_atom refs/tags/signed-long contents "subject line
 body contents
 $sig"
 
+test_expect_success 'basic atom: refs/tags/signed-long raw' '
+	git cat-file tag refs/tags/signed-long >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-long >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'set up refs pointing to tree and blob' '
 	git update-ref refs/mytrees/first refs/heads/main^{tree} &&
 	git update-ref refs/myblobs/first refs/heads/main:one
@@ -720,6 +758,16 @@ test_atom refs/mytrees/first contents:body ""
 test_atom refs/mytrees/first contents:signature ""
 test_atom refs/mytrees/first contents ""
 
+test_expect_success 'basic atom: refs/mytrees/first raw' '
+	git cat-file tree refs/mytrees/first >expected &&
+	echo "" >>expected &&
+	git for-each-ref --format="%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/mytrees/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_atom refs/myblobs/first subject ""
 test_atom refs/myblobs/first contents:subject ""
 test_atom refs/myblobs/first body ""
@@ -727,6 +775,165 @@ test_atom refs/myblobs/first contents:body ""
 test_atom refs/myblobs/first contents:signature ""
 test_atom refs/myblobs/first contents ""
 
+test_expect_success 'basic atom: refs/myblobs/first raw' '
+	git cat-file blob refs/myblobs/first >expected &&
+	echo "" >>expected &&
+	git for-each-ref --format="%(raw)" refs/myblobs/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/myblobs/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/myblobs/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'set up refs pointing to binary blob' '
+	printf "%b" "a\0b\0c" >blob1 &&
+	printf "%b" "a\0c\0b" >blob2 &&
+	printf "%b" "\0a\0b\0c" >blob3 &&
+	printf "%b" "abc" >blob4 &&
+	printf "%b" "\0 \0 \0 " >blob5 &&
+	printf "%b" "\0 \0a\0 " >blob6 &&
+	printf "%b" "  " >blob7 &&
+	>blob8 &&
+	git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
+	git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
+	git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
+	git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
+	git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
+	git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
+	git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
+	git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
+'
+
+test_expect_success 'Verify sorts with raw' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob3
+	refs/myblobs/blob7
+	refs/mytrees/first
+	refs/myblobs/first
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob4
+	refs/heads/main
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'Verify sorts with raw:size' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/first
+	refs/myblobs/blob7
+	refs/heads/main
+	refs/myblobs/blob4
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/mytrees/first
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw:size \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'validate raw atom with %(if:equals)' '
+	cat >expected <<-EOF &&
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	refs/myblobs/blob4
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	EOF
+	git for-each-ref --format="%(if:equals=abc)%(raw)%(then)%(refname)%(else)not equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+test_expect_success 'validate raw atom with %(if:notequals)' '
+	cat >expected <<-EOF &&
+	refs/heads/ambiguous
+	refs/heads/main
+	refs/heads/newtag
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	equals
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob7
+	refs/myblobs/blob8
+	refs/myblobs/first
+	EOF
+	git for-each-ref --format="%(if:notequals=abc)%(raw)%(then)%(refname)%(else)equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'empty raw refs with %(if)' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob1 not empty
+	refs/myblobs/blob2 not empty
+	refs/myblobs/blob3 not empty
+	refs/myblobs/blob4 not empty
+	refs/myblobs/blob5 not empty
+	refs/myblobs/blob6 not empty
+	refs/myblobs/blob7 empty
+	refs/myblobs/blob8 empty
+	refs/myblobs/first not empty
+	EOF
+	git for-each-ref --format="%(refname) %(if)%(raw)%(then)not empty%(else)empty%(end)" \
+		refs/myblobs/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success '%(raw) with --python must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --python
+'
+
+test_expect_success '%(raw) with --tcl must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --tcl
+'
+
+test_expect_success '%(raw) with --perl must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --perl
+'
+
+test_expect_success '%(raw) with --shell must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --shell
+'
+
+test_expect_success '%(raw) with --shell and --sort=raw must failed' '
+	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
+'
+
+test_expect_success '%(raw:size) with --shell' '
+	git for-each-ref --format="%(raw:size)" | while read line
+	do
+		echo "'\''$line'\''" >>expect
+	done &&
+	git for-each-ref --format="%(raw:size)" --shell >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'for-each-ref --format compare with cat-file --batch' '
+	git rev-parse refs/mytrees/first | git cat-file --batch >expected &&
+	git for-each-ref --format="%(objectname) %(objecttype) %(objectsize)
+%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'set up multiple-sort tags' '
 	for when in 100000 200000
 	do
-- 
gitgitgadget
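
The commit message above argues that NUL-terminated string helpers
silently truncate raw object data, and the cmp_ref_sorting() hunk
compares the common prefix before falling back to the lengths. A
minimal standalone sketch of both ideas follows. The names buf_cmp()
and buf_is_blank() are hypothetical and exist only for illustration;
they are not functions in ref-filter.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare two buffers that may contain embedded
 * NUL bytes.  The common prefix is compared with memcmp(); on a tie
 * the shorter buffer sorts first.  Returns -1, 0 or 1.
 */
static int buf_cmp(const void *a, size_t a_len, const void *b, size_t b_len)
{
	size_t min_len = a_len < b_len ? a_len : b_len;
	int cmp;

	assert(a && b);
	cmp = memcmp(a, b, min_len);
	if (cmp)
		return cmp < 0 ? -1 : 1;
	if (a_len != b_len)
		return a_len < b_len ? -1 : 1;
	return 0;
}

/*
 * Hypothetical helper: a buffer is "blank" when every byte up to its
 * explicit length is whitespace.  A NUL byte is not whitespace, so a
 * buffer made of NULs and spaces is not blank.
 */
static int buf_is_blank(const char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (!isspace((unsigned char)buf[i]))
			return 0;
	return 1;
}
```

With the five-byte buffers a\0b\0c and a\0c\0b from the test setup,
strcmp() stops at the first NUL and reports the two as equal, whereas
buf_cmp() orders the first before the second. Likewise, the six bytes
of blob5 (NULs and spaces) are not blank under buf_is_blank(), which
matches the "not empty" expectation in the %(if) test.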


* Re: [PATCH 1/2] [GSOC] ref-filter: add obj-type check in grab contents
  2021-06-01 14:37 ` [PATCH 1/2] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
@ 2021-06-03  2:10   ` Junio C Hamano
  2021-06-03  4:52     ` ZheNing Hu
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-06-03  2:10 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  /* See grab_values */
> -static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
> +static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
> +				   struct object *obj)

Neither this step nor the next change needs anything but the type
member of 'obj' (and 'buf' is coming from oi.content of the result of
asking about that same 'obj').

I wonder if we should do one of the following:

 (1) stop passing "void *buf" and instead "struct expand_data
     *data", and use "data->content" to access "buf", which would
     allow you to access "data->type" to perform the added check.

 (2) instead of adding "struct object *obj" to the parameters, just add
     "enum object_type type", as that is the only thing you need.

Obviously (2) has lesser impact, but if it can be done safely
without breaking the code [*], (1) would probably be a much more
preferable direction to go in the longer term.

    Side note [*].  A caller is allowed to choose to feed "buf" that
    is different from "oi.content" (perhaps buf may sometimes want
    to be a utf-8 recoded version of oi.content for certain types of
    objects) with the current system, but if we pass expand_data
    throughout the callchain, such a caller is broken, for example.



* Re: [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-01 14:37 ` [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
@ 2021-06-03  2:38   ` Junio C Hamano
  2021-06-03  5:36     ` ZheNing Hu
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-06-03  2:38 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -644,13 +664,6 @@ static int parse_ref_filter_atom(const struct ref_format *format,
>  		return strbuf_addf_ret(err, -1, _("malformed field name: %.*s"),
>  				       (int)(ep-atom), atom);
>  
> -	/* Do we have the atom already used elsewhere? */
> -	for (i = 0; i < used_atom_cnt; i++) {
> -		int len = strlen(used_atom[i].name);
> -		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
> -			return i;
> -	}
> -
>  	/*
>  	 * If the atom name has a colon, strip it and everything after
>  	 * it off - it specifies the format for this entry, and
> @@ -660,6 +673,17 @@ static int parse_ref_filter_atom(const struct ref_format *format,
>  	arg = memchr(sp, ':', ep - sp);
>  	atom_len = (arg ? arg : ep) - sp;
>  
> +	if (format->quote_style && !strncmp(sp, "raw", 3) && !arg)
> +		return strbuf_addf_ret(err, -1, _("--format=%.*s cannot be used with"
> +				"--python, --shell, --tcl, --perl"), (int)(ep-atom), atom);
> +
> +	/* Do we have the atom already used elsewhere? */
> +	for (i = 0; i < used_atom_cnt; i++) {
> +		int len = strlen(used_atom[i].name);
> +		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
> +			return i;
> +	}
> +

These two hunks

 - hoist up the code that sets 'arg' to the optional string after
   "<atom>:" and counts how long the "<atom>" is in 'atom_len'; the
   change causes the counting to be done even when the same
   placeholder is already used elsewhere (in which case we do not
   have to do such counting);

 - insert the early return to reject use of the "raw" atom when
   language-specific quoting is used.

It probably makes it easier to understand if the former is split
into a separate commit, but at the same time a series with too many
small steps is harder to manage, so let's keep them in a single
change.

But I do not think we want to add the new change at this location,
at least for two reasons:

 * The posted patch checks '!arg' to avoid rejecting "raw:size",
   which would not scale at all.  What if you wanted to later add
   "raw:upcase", which you must reject?

 * We do have enumerated constants for each atom type, but this
   early check and return does a string comparison.
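
To illustrate the second point: a bare strncmp(sp, "raw", 3) is a
prefix test, so it also accepts any longer atom name that merely
starts with "raw", whereas a lookup keyed on the parsed atom length
maps each name to exactly one enumerated constant.  What follows is
only a minimal sketch: the enum, the two-entry table, the helper names
and the atom name "rawhide" are made up for illustration and are not
taken from ref-filter.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical atom ids; the real enum in ref-filter.c is much longer. */
enum atom_type_sketch {
	ATOM_UNKNOWN_SKETCH,
	ATOM_RAW_SKETCH,
	ATOM_REFNAME_SKETCH
};

/* The style of test used in the posted hunk: a three-byte prefix match. */
static int looks_like_raw_by_prefix(const char *sp)
{
	return !strncmp(sp, "raw", 3);
}

/*
 * Exact lookup on the atom name, where atom_len is the length of the
 * text before any ':' argument.  Returns ATOM_UNKNOWN_SKETCH when the
 * name is not in the table.
 */
static enum atom_type_sketch lookup_atom_sketch(const char *sp, size_t atom_len)
{
	static const struct {
		const char *name;
		enum atom_type_sketch type;
	} table[] = {
		{ "raw", ATOM_RAW_SKETCH },
		{ "refname", ATOM_REFNAME_SKETCH },
	};
	size_t i;

	assert(sp);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strlen(table[i].name) == atom_len &&
		    !memcmp(table[i].name, sp, atom_len))
			return table[i].type;
	return ATOM_UNKNOWN_SKETCH;
}
```

With these helpers, looks_like_raw_by_prefix("rawhide") is true even
though lookup_atom_sketch("rawhide", 7) reports an unknown atom, while
both "raw" and "raw:size" (with atom_len 3) resolve to the same
constant, leaving the decision about the argument to the atom's own
parser.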

Where it belongs is either after "Is the atom a valid one?" loop
where 'atom_len' is used to locate the placeholder's atom in the
table of valid atoms [*], or inside raw_atom_parser().

    Side note: If you read the original code, you would notice that
    there already is a similar "this is a valid atom that appears in
    the valid_atom[] table, but is not allowed in this situation"
    check done with the .source != SOURCE_NONE conditional.  One
    downside is that until calling raw_atom_parser(), you do not
    know if RAW_BARE or RAW_LENGTH is requested.

If we do it inside raw_atom_parser(), it would probably look like this:
 
+static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+				const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		atom->u.raw_data.option = RAW_BARE;
+	else if (!strcmp(arg, "size"))
+		atom->u.raw_data.option = RAW_LENGTH;
+	else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
+
+	if (atom->u.raw_data.option == RAW_BARE && format->quote_style)
+		return strbuf_addf_ret(err, -1,
+				       _("--format=%%(raw) cannot be used with ...")...);
+
+	return 0;
+}


* Re: [PATCH 1/2] [GSOC] ref-filter: add obj-type check in grab contents
  2021-06-03  2:10   ` Junio C Hamano
@ 2021-06-03  4:52     ` ZheNing Hu
  0 siblings, 0 replies; 36+ messages in thread
From: ZheNing Hu @ 2021-06-03  4:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

Junio C Hamano <gitster@pobox.com> wrote on Thu, Jun 3, 2021 at 10:10 AM:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> >  /* See grab_values */
> > -static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
> > +static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
> > +                                struct object *obj)
>
> Neither this step nor the next change needs anything but the type
> member of the 'obj' (and 'buf' is coming from oi.content of the
> result of asking about that same 'obj').
>
> I wonder if we should do one of the following:
>
>  (1) stop passing "void *buf" and instead "struct expand_data
>      *data", and use "data->content" to access "buf", which would
>      allow you to access "data->type" to perform the added check.
>
>  (2) instead of adding "struct object *obj" to the parameters, just add
>      "enum object_type type", as that is the only thing you need.
>
> Obviously (2) has the lesser impact, but if it can be done safely
> without breaking the code [*], (1) would probably be a much more
> preferable direction to go in the longer term.
>

I agree with (1). In future versions of grab_sub_body_contents(), we will
use the content of "data" more frequently instead of using the
crude "obj". The type provided by "obj" can also be provided by
"data". So yes, I would be very willing to let grab_sub_body_contents()
only use "data". (delete "obj")

E.g.

static void grab_sub_body_contents(struct atom_value *val, int deref,
                                   struct expand_data *data)

Using (2), we will need more parameters to pass other object info.

>     Side note [*].  A caller is allowed to choose to feed "buf" that
>     is different from "oi.content" (perhaps buf may sometimes want
>     to be a utf-8 recoded version of oi.content for certain types of
>     objects) with the current system, but if we pass expand_data
>     throughout the callchain, such a caller is broken, for example.
>

Looking at the situation in front of us, grab_sub_body_contents()
has only one caller: grab_values(). If someone needs a function like
grab_sub_body_contents() to grab another buf, they can write a more
universal function interface:

static void grab_sub_body_contents_internal(struct atom_value *val, int deref,
                                            void *buf, unsigned long buf_size,
                                            enum object_type type)
{
...
}

static void grab_sub_body_contents(struct atom_value *val, int deref,
                                   struct expand_data *data)
{
   grab_sub_body_contents_internal(val, deref, data->content,
                                   data->size, data->type);
}

But for the time being, the above one is sufficient.

Thanks.
--
ZheNing Hu


* Re: [PATCH 0/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-01 14:37 [PATCH 0/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
  2021-06-01 14:37 ` [PATCH 1/2] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
  2021-06-01 14:37 ` [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
@ 2021-06-03  5:11 ` Bagas Sanjaya
  2021-06-03  5:37   ` ZheNing Hu
  2021-06-04 12:12 ` [PATCH v2 " ZheNing Hu via GitGitGadget
  3 siblings, 1 reply; 36+ messages in thread
From: Bagas Sanjaya @ 2021-06-03  5:11 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget, git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Jeff King, Phillip Wood, ZheNing Hu

Hi,

On 01/06/21 21.37, ZheNing Hu via GitGitGadget wrote:
> In order to make git cat-file --batch use ref-filter logic, I add %(raw)
> atom to ref-filter.
> 
> Change from last version:
> 
>   1. Change is_empty() logic.
>   2. Simplify memcasecmp().
>   3. rebase on zh/ref-filter-atom-type.
> 

I prefer no first-person pronouns (I and we) in the patch cover letter
and commit messages, so better to say:

"Add %(raw) atom to ref-filter to make git cat-file --batch use 
ref-filter logic."

-- 
An old man doll... just what I always wanted! - Clara


* Re: [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-03  2:38   ` Junio C Hamano
@ 2021-06-03  5:36     ` ZheNing Hu
  2021-06-03 14:06       ` ZheNing Hu
  2021-06-03 21:35       ` Junio C Hamano
  0 siblings, 2 replies; 36+ messages in thread
From: ZheNing Hu @ 2021-06-03  5:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

Junio C Hamano <gitster@pobox.com> wrote on Thu, Jun 3, 2021 at 10:38 AM:
>
>
> These two hunks
>
>  - hoist up the code that sets 'arg' to the optional string after
>    "<atom>:" and counts how long the "<atom>" is in 'atom_len'; the
>    change causes the counting to be done even when the same
>    placeholder is already used elsewhere (in which case we do not
>    have to do such counting);
>

I admit that I am doing repetitive work here.

>  - insert the early return to reject use of the "raw" atom when
>    language-specific quoting is used.
>
> It probably makes it easier to understand if the former is split
> into a separate commit, but at the same time a series with too many
> small steps is harder to manage, so let's keep them in a single
> change.
>
> But I do not think we want to add the new change at this location,
> at least for two reasons:
>
>  * The posted patch checks '!arg' to avoid rejecting "raw:size",
>    which would not scale at all.  What if you wanted to later add
>    "raw:upcase", which you must reject?
>

Yeah, the code here makes "raw" hard to extend, especially since we
want to add "%(raw:textconv)" and "%(raw:filter)" later.

>  * We do have enumerated constants for each atom type, but this
>    early check and return does a string comparison.
>

Note that at this point we must compare strings: parse_ref_filter_atom()
is passed the string form of the atom. Code Block A also requires
comparing strings.

-------------------
Code block A:

        for (i = 0; i < used_atom_cnt; i++) {
               int len = strlen(used_atom[i].name);
               if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
                       return i;
       }
-------------------

All of the following replies rest on one fact:
we reuse used atoms as much as possible.

Consider this situation:

$ git for-each-ref --format="%(raw)" --sort="raw" --python

Since we specified --sort="raw", `parse_sorting_atom()`
is called in parse_opt_ref_sorting(), but at that time
we haven't parsed --<lang> yet. So format->quote_style == 0,
we cannot refuse --<lang> at that point, and a "%(raw)" atom
item is added to used_atom. When `verify_ref_format()` later
calls `parse_ref_filter_atom()` for the second time, the raw
atom item is already in used_atom, so Code Block A returns
directly and we cannot refuse --<lang> after Code Block A
either. So as a last resort, we refuse --<lang> with "%(raw)"
before Code Block A.
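
The ordering problem described above can be reproduced in isolation.
The following is a minimal sketch under simplifying assumptions:
parse_atom_sketch(), struct format_sketch and the counters are
hypothetical stand-ins for parse_ref_filter_atom(), struct ref_format
and used_atom[], not the real ref-filter.c code.  It only shows that a
check placed after the "already used?" lookup is skipped on the second
parse of the same atom.

```c
#include <assert.h>
#include <string.h>

#define MAX_ATOMS_SKETCH 8

struct format_sketch {
	int quote_style;	/* non-zero once a --<lang> option is known */
};

/* Hypothetical stand-in for used_atom[]; stores borrowed name pointers. */
static const char *used_names_sketch[MAX_ATOMS_SKETCH];
static int used_cnt_sketch;
static int check_runs_sketch;	/* how often the quote check was reached */

/* Returns the index of the atom in the cache, or -1 if it is rejected. */
static int parse_atom_sketch(const struct format_sketch *format,
			     const char *name)
{
	int i;

	assert(format && name);

	/* "Code Block A": an atom that was parsed before is reused as-is. */
	for (i = 0; i < used_cnt_sketch; i++)
		if (!strcmp(used_names_sketch[i], name))
			return i;

	/* Reached only the first time a given atom is seen. */
	check_runs_sketch++;
	if (format->quote_style && !strcmp(name, "raw"))
		return -1;

	assert(used_cnt_sketch < MAX_ATOMS_SKETCH);
	used_names_sketch[used_cnt_sketch] = name;
	return used_cnt_sketch++;
}
```

Parsing "raw" first with quote_style == 0 (the --sort case) and then
again with quote_style set returns index 0 both times, and the check
runs only once, so the conflicting combination goes unnoticed.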

> Where it belongs is either after "Is the atom a valid one?" loop
> where 'atom_len' is used to locate the placeholder's atom in the
> table of valid atoms [*], or inside raw_atom_parser().
>
>     Side note: If you read the original code, you would notice that
>     there already is a similar "this is a valid atom that appears in
>     the valid_atom[] table, but is not allowed in this situation"
>     check done with the .source != SOURCE_NONE conditional.  One
>     downside is that until calling raw_atom_parser(), you do not
>     know if RAW_BARE or RAW_LENGTH is requested.
>

Yes, but note that it is below Code Block A.

> If we do it inside raw_atom_parser(), it would probably look like this:
>
> +static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
> +                               const char *arg, struct strbuf *err)
> +{
> +       if (!arg)
> +               atom->u.raw_data.option = RAW_BARE;
> +       else if (!strcmp(arg, "size"))
> +               atom->u.raw_data.option = RAW_LENGTH;
> +       else
> +               return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
> +
> +       if (atom->u.raw_data.option == RAW_BARE && format->quote_style)
> +               return strbuf_addf_ret(err, -1,
> +                                      _("--format=%%(raw) cannot be used with ...")...);
> +
> +       return 0;
> +}

It's the same: "*.parser()" is below Code Block A.

After all, the reason why this must be done here is that the
original ref-filter logic never considered rejecting a
combination of a format atom and an option.

Thanks.
--
ZheNing Hu


* Re: [PATCH 0/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-03  5:11 ` [PATCH 0/2] " Bagas Sanjaya
@ 2021-06-03  5:37   ` ZheNing Hu
  0 siblings, 0 replies; 36+ messages in thread
From: ZheNing Hu @ 2021-06-03  5:37 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma, Karthik Nayak, Felipe Contreras,
	Jeff King, Phillip Wood

Bagas Sanjaya <bagasdotme@gmail.com> wrote on Thu, Jun 3, 2021 at 1:11 PM:
>
> Hi,
>
> On 01/06/21 21.37, ZheNing Hu via GitGitGadget wrote:
> > In order to make git cat-file --batch use ref-filter logic, I add %(raw)
> > atom to ref-filter.
> >
> > Change from last version:
> >
> >   1. Change is_empty() logic.
> >   2. Simplify memcasecmp().
> >   3. rebase on zh/ref-filter-atom-type.
> >
>
> I prefer no first-person pronouns (I and we) in the patch cover letter
> and commit messages, so better to say:
>
> "Add %(raw) atom to ref-filter to make git cat-file --batch use
> ref-filter logic."
>

Ok. I will change my way of narrating.

> --
> An old man doll... just what I always wanted! - Clara


* Re: [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-03  5:36     ` ZheNing Hu
@ 2021-06-03 14:06       ` ZheNing Hu
  2021-06-03 21:36         ` Junio C Hamano
  2021-06-03 21:35       ` Junio C Hamano
  1 sibling, 1 reply; 36+ messages in thread
From: ZheNing Hu @ 2021-06-03 14:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

ZheNing Hu <adlternative@gmail.com> wrote on Thu, Jun 3, 2021 at 1:36 PM:
>
> Junio C Hamano <gitster@pobox.com> wrote on Thu, Jun 3, 2021 at 10:38 AM:
> > But I do not think we want to add the new change at this location,
> > at least for two reasons:
> >
> >  * The posted patch checks '!arg' to avoid rejecting "raw:size",
> >    which would not scale at all.  What if you wanted to later add
> >    "raw:upcase", which you must reject?
> >
>
> Yeah, the code here makes "raw" hard to extend, especially since we
> want to add "%(raw:textconv)" and "%(raw:filter)" later.
>

Now that I am building %(raw:textconv) and %(raw:filter), the code
becomes very difficult to write:

        if (format->quote_style && !strncmp(sp, "raw", 3) &&
            (!arg || !strncmp(arg, ":textconv", 9) ||
             !strncmp(arg, ":filter", 7)))
                return strbuf_addf_ret(err, -1,
                                       _("--format=%.*s cannot be used with"
                                         "--python, --shell, --tcl, --perl"),
                                       (int)(ep-atom), atom);

Is there a better way?

Thanks.
--
ZheNing Hu


* Re: [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-03  5:36     ` ZheNing Hu
  2021-06-03 14:06       ` ZheNing Hu
@ 2021-06-03 21:35       ` Junio C Hamano
  2021-06-04 10:59         ` ZheNing Hu
  1 sibling, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-06-03 21:35 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

ZheNing Hu <adlternative@gmail.com> writes:

>>  * We do have enumerated constants for each atom type, but this
>>    early check and return does a string comparison.
>>
>
> Note that at this point we must compare strings: parse_ref_filter_atom()
> is passed the string form of the atom. Code Block A also requires
> comparing strings.
>
> -------------------
> Code block A:
>
>         for (i = 0; i < used_atom_cnt; i++) {
>                int len = strlen(used_atom[i].name);
>                if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
>                        return i;
>        }
> -------------------

The point is that you can piggyback existing string comparison
(which is called "parsing") and use the parsed result (i.e. if you
can compare with ATOM_RAW instead of adding another strcmp(), that
can be a better solution).

> All of the following replies rest on one fact:
> we reuse used atoms as much as possible.
>
> Consider this situation:
>
> $ git for-each-ref --format="%(raw)" --sort="raw" --python
>
> Since we specified --sort="raw", `parse_sorting_atom()`
> is called in parse_opt_ref_sorting(), but at that time
> we haven't parsed --<lang> yet.

That only says using parse_sorting_atom() and relying on the check
in the function is still too early, and does not necessarily support
the posted patch that redundantly runs strcmp().

After parsing all the command line options, we have used_atom[]
fully populated and we know what host language we are quoting the
result for---and that makes a good place to check for conflicting
requests.
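
A check of that kind, run once every option has been parsed, could
look roughly like the following.  This is only a sketch under stated
assumptions: the enums, struct used_atom_sketch and
find_quote_conflict_sketch() are simplified, hypothetical stand-ins
for the real used_atom[] entries, and the rule encoded here (a bare
%(raw) conflicts with any language-specific quoting, %(raw:size) does
not) is the one discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down versions of the parsed atom information. */
enum checked_atom_type { CHECKED_ATOM_OTHER, CHECKED_ATOM_RAW };
enum raw_option_sketch { RAW_BARE_SKETCH, RAW_LENGTH_SKETCH };

struct used_atom_sketch {
	enum checked_atom_type type;
	enum raw_option_sketch raw_option;	/* meaningful for raw atoms only */
};

/*
 * Scan the fully populated atom list after option parsing.  Returns
 * the index of the first atom that conflicts with the requested
 * quoting, or -1 when there is no conflict.
 */
static int find_quote_conflict_sketch(const struct used_atom_sketch *atoms,
				      size_t nr, int quote_style)
{
	size_t i;

	assert(atoms || !nr);

	if (!quote_style)
		return -1;	/* no --<lang> given, nothing to reject */

	for (i = 0; i < nr; i++)
		if (atoms[i].type == CHECKED_ATOM_RAW &&
		    atoms[i].raw_option == RAW_BARE_SKETCH)
			return (int)i;
	return -1;
}
```

Because the comparison uses the already parsed type and option rather
than the atom's spelling, adding further raw variants would only mean
extending the condition, and it no longer matters which caller parsed
the atom first.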

>> +static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
>> +                               const char *arg, struct strbuf *err)
>> +{
>> +       if (!arg)
>> +               atom->u.raw_data.option = RAW_BARE;
>> +       else if (!strcmp(arg, "size"))
>> +               atom->u.raw_data.option = RAW_LENGTH;
>> +       else
>> +               return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
>> +
>> +       if (atom->u.raw_data.option == RAW_BARE && format->quote_style)
>> +               return strbuf_addf_ret(err, -1,
>> +                                      _("--format=%%(raw) cannot be used with ...")...);
>> +
>> +       return 0;
>> +}
>
> It's the same: "*.parser()" is below Code Block A.
>
> After all, the reason why this must be done here is that the
> original ref-filter logic never considered rejecting a
> combination of a format atom and an option.

That is something you can fix to make the code easier to follow, no?


* Re: [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-03 14:06       ` ZheNing Hu
@ 2021-06-03 21:36         ` Junio C Hamano
  0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2021-06-03 21:36 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

ZheNing Hu <adlternative@gmail.com> writes:

> Now that I am building %(raw:textconv) and %(raw:filter), the code
> becomes very difficult to write:
>
>         if (format->quote_style && !strncmp(sp, "raw", 3) &&
>             (!arg || !strncmp(arg, ":textconv", 9) ||
>              !strncmp(arg, ":filter", 7)))
>                 return strbuf_addf_ret(err, -1,
>                                        _("--format=%.*s cannot be used with"
>                                          "--python, --shell, --tcl, --perl"),
>                                        (int)(ep-atom), atom);
>
> Is there a better way?

The problem you are having sounds like a natural consequence of
doing the check at the wrong place in the code, at least to me.


* Re: [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-03 21:35       ` Junio C Hamano
@ 2021-06-04 10:59         ` ZheNing Hu
  0 siblings, 0 replies; 36+ messages in thread
From: ZheNing Hu @ 2021-06-04 10:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

Junio C Hamano <gitster@pobox.com> wrote on Fri, Jun 4, 2021 at 5:35 AM:
>
> The point is that you can piggyback existing string comparison
> (which is called "parsing") and use the parsed result (i.e. if you
> can compare with ATOM_RAW instead of adding another strcmp(), that
> can be a better solution).
>
> That only says using parse_sorting_atom() and relying on the check
> in the function is still too early, and does not necessarily support
> the posted patch that redundantly runs strcmp().
>
> After parsing all the command line options, we have used_atom[]
> fully populated and we know what host language we are quoting the
> result for---and that makes a good place to check for conflicting
> requests.
>

Alright, I got it: we can perform the related check in verify_ref_format(),
after parse_ref_filter_atom(). It is a good checkpoint.

> > After all, the reason why this must be done here is that the
> > original ref-filter logic never considered rejecting a
> > combination of a format atom and an option.
>
> That is something you can fix to make the code easier to follow, no?

You are right. ;-)

Thanks.
--
ZheNing Hu


* [PATCH v2 0/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-01 14:37 [PATCH 0/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-06-03  5:11 ` [PATCH 0/2] " Bagas Sanjaya
@ 2021-06-04 12:12 ` ZheNing Hu via GitGitGadget
  2021-06-04 12:12   ` [PATCH v2 1/2] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
                     ` (2 more replies)
  3 siblings, 3 replies; 36+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-06-04 12:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu

In order to make git cat-file --batch use ref-filter logic, the %(raw)
atom is added to ref-filter.

Changes since the last version:

 1. Move the --<lang> and --format=%(raw) check to verify_ref_format(),
    which makes it more scalable.
 2. Make grab_sub_body_contents() take struct expand_data *data instead
    of obj, buf, buf_size to pass object info, which reduces the number
    of function parameters passed around.

ZheNing Hu (2):
  [GSOC] ref-filter: add obj-type check in grab contents
  [GSOC] ref-filter: add %(raw) atom

 Documentation/git-for-each-ref.txt |   9 ++
 ref-filter.c                       | 164 +++++++++++++++++------
 t/t6300-for-each-ref.sh            | 207 +++++++++++++++++++++++++++++
 3 files changed, 343 insertions(+), 37 deletions(-)


base-commit: 1197f1a46360d3ae96bd9c15908a3a6f8e562207
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-966%2Fadlternative%2Fref-filter-raw-atom-v4-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-966/adlternative/ref-filter-raw-atom-v4-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/966

Range-diff vs v1:

 1:  97955705c22e ! 1:  48d256db5c34 [GSOC] ref-filter: add obj-type check in grab contents
     @@ ref-filter.c: static void append_lines(struct strbuf *out, const char *buf, unsi
       
       /* See grab_values */
      -static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
     -+static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
     -+				   struct object *obj)
     ++static void grab_sub_body_contents(struct atom_value *val, int deref, struct expand_data *data)
       {
       	int i;
       	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
     + 	size_t sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;
     ++	void *buf = data->content;
     + 
     + 	for (i = 0; i < used_atom_cnt; i++) {
     + 		struct used_atom *atom = &used_atom[i];
      @@ ref-filter.c: static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
       			continue;
       		if (deref)
     @@ ref-filter.c: static void grab_sub_body_contents(struct atom_value *val, int der
      -		    !starts_with(name, "trailers") &&
      -		    !starts_with(name, "contents"))
      +
     -+		if ((obj->type != OBJ_TAG &&
     -+		     obj->type != OBJ_COMMIT) ||
     ++		if ((data->type != OBJ_TAG &&
     ++		     data->type != OBJ_COMMIT) ||
      +		    (strcmp(name, "body") &&
      +		     !starts_with(name, "subject") &&
      +		     !starts_with(name, "trailers") &&
     @@ ref-filter.c: static void grab_sub_body_contents(struct atom_value *val, int der
       			continue;
       		if (!subpos)
       			find_subpos(buf,
     -@@ ref-filter.c: static void grab_values(struct atom_value *val, int deref, struct object *obj, v
     +@@ ref-filter.c: static void fill_missing_values(struct atom_value *val)
     +  * pointed at by the ref itself; otherwise it is the object the
     +  * ref (which is a tag) refers to.
     +  */
     +-static void grab_values(struct atom_value *val, int deref, struct object *obj, void *buf)
     ++static void grab_values(struct atom_value *val, int deref, struct object *obj, struct expand_data *data)
     + {
     ++	void *buf = data->content;
     ++
       	switch (obj->type) {
       	case OBJ_TAG:
       		grab_tag_values(val, deref, obj);
      -		grab_sub_body_contents(val, deref, buf);
     -+		grab_sub_body_contents(val, deref, buf, obj);
     ++		grab_sub_body_contents(val, deref, data);
       		grab_person("tagger", val, deref, buf);
       		break;
       	case OBJ_COMMIT:
       		grab_commit_values(val, deref, obj);
      -		grab_sub_body_contents(val, deref, buf);
     -+		grab_sub_body_contents(val, deref, buf, obj);
     ++		grab_sub_body_contents(val, deref, data);
       		grab_person("author", val, deref, buf);
       		grab_person("committer", val, deref, buf);
       		break;
     +@@ ref-filter.c: static int get_object(struct ref_array_item *ref, int deref, struct object **obj
     + 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
     + 					       oid_to_hex(&oi->oid), ref->refname);
     + 		}
     +-		grab_values(ref->value, deref, *obj, oi->content);
     ++		grab_values(ref->value, deref, *obj, oi);
     + 	}
     + 
     + 	grab_common_values(ref->value, deref, oi);
 2:  5a94705cdbc1 ! 2:  0efed9435b59 [GSOC] ref-filter: add %(raw) atom
     @@ ref-filter.c: static int parse_ref_filter_atom(const struct ref_format *format,
       	arg = memchr(sp, ':', ep - sp);
       	atom_len = (arg ? arg : ep) - sp;
       
     -+	if (format->quote_style && !strncmp(sp, "raw", 3) && !arg)
     -+		return strbuf_addf_ret(err, -1, _("--format=%.*s cannot be used with"
     -+				"--python, --shell, --tcl, --perl"), (int)(ep-atom), atom);
     -+
      +	/* Do we have the atom already used elsewhere? */
      +	for (i = 0; i < used_atom_cnt; i++) {
      +		int len = strlen(used_atom[i].name);
     @@ ref-filter.c: static int end_atom_handler(struct atom_value *atomv, struct ref_f
       		strbuf_swap(&current->output, &s);
       	}
       	strbuf_release(&s);
     -@@ ref-filter.c: static void append_lines(struct strbuf *out, const char *buf, unsigned long size
     +@@ ref-filter.c: int verify_ref_format(struct ref_format *format)
     + 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
     + 		if (at < 0)
     + 			die("%s", err.buf);
     ++		if (format->quote_style && used_atom[at].atom_type == ATOM_RAW &&
     ++		    used_atom[at].u.raw_data.option == RAW_BARE)
     ++			die(_("--format=%.*s cannot be used with"
     ++			      "--python, --shell, --tcl, --perl"), (int)(ep - sp - 2), sp + 2);
     + 		cp = ep + 1;
       
     - /* See grab_values */
     - static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
     --				   struct object *obj)
     -+				   unsigned long buf_size, struct object *obj)
     - {
     - 	int i;
     + 		if (skip_prefix(used_atom[at].name, "color:", &color))
     +@@ ref-filter.c: static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
       	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
     -@@ ref-filter.c: static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
     + 	size_t sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;
     + 	void *buf = data->content;
     ++	unsigned long buf_size = data->size;
     + 
     + 	for (i = 0; i < used_atom_cnt; i++) {
       		struct used_atom *atom = &used_atom[i];
       		const char *name = atom->name;
       		struct atom_value *v = &val[i];
     @@ ref-filter.c: static void grab_sub_body_contents(struct atom_value *val, int der
      +			continue;
      +		}
      +
     - 		if ((obj->type != OBJ_TAG &&
     - 		     obj->type != OBJ_COMMIT) ||
     + 		if ((data->type != OBJ_TAG &&
     + 		     data->type != OBJ_COMMIT) ||
       		    (strcmp(name, "body") &&
     -@@ ref-filter.c: static void fill_missing_values(struct atom_value *val)
     -  * pointed at by the ref itself; otherwise it is the object the
     -  * ref (which is a tag) refers to.
     -  */
     --static void grab_values(struct atom_value *val, int deref, struct object *obj, void *buf)
     -+static void grab_values(struct atom_value *val, int deref, struct object *obj, struct expand_data *data)
     - {
     -+	void *buf = data->content;
     -+	unsigned long buf_size = data->size;
     -+
     - 	switch (obj->type) {
     - 	case OBJ_TAG:
     - 		grab_tag_values(val, deref, obj);
     --		grab_sub_body_contents(val, deref, buf, obj);
     -+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
     - 		grab_person("tagger", val, deref, buf);
     - 		break;
     - 	case OBJ_COMMIT:
     - 		grab_commit_values(val, deref, obj);
     --		grab_sub_body_contents(val, deref, buf, obj);
     -+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
     - 		grab_person("author", val, deref, buf);
     - 		grab_person("committer", val, deref, buf);
     +@@ ref-filter.c: static void grab_values(struct atom_value *val, int deref, struct object *obj, s
       		break;
       	case OBJ_TREE:
       		/* grab_tree_values(val, deref, obj, buf, sz); */
     -+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
     ++		grab_sub_body_contents(val, deref, data);
       		break;
       	case OBJ_BLOB:
       		/* grab_blob_values(val, deref, obj, buf, sz); */
     -+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
     ++		grab_sub_body_contents(val, deref, data);
       		break;
       	default:
       		die("Eh?  Object of type %d?", obj->type);
     -@@ ref-filter.c: static int get_object(struct ref_array_item *ref, int deref, struct object **obj
     - 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
     - 					       oid_to_hex(&oi->oid), ref->refname);
     - 		}
     --		grab_values(ref->value, deref, *obj, oi->content);
     -+		grab_values(ref->value, deref, *obj, oi);
     - 	}
     - 
     - 	grab_common_values(ref->value, deref, oi);
      @@ ref-filter.c: static int populate_value(struct ref_array_item *ref, struct strbuf *err)
       		int deref = 0;
       		const char *refname;

-- 
gitgitgadget


* [PATCH v2 1/2] [GSOC] ref-filter: add obj-type check in grab contents
  2021-06-04 12:12 ` [PATCH v2 " ZheNing Hu via GitGitGadget
@ 2021-06-04 12:12   ` ZheNing Hu via GitGitGadget
  2021-06-04 12:12   ` [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
  2021-06-04 12:53   ` [PATCH v2 0/2] " Christian Couder
  2 siblings, 0 replies; 36+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-06-04 12:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Only tag and commit objects use `grab_sub_body_contents()` to grab
object contents in the current codebase.  We want to teach the
function to also handle blobs and trees to get their raw data,
without parsing a blob (whose contents look like a commit or a tag)
incorrectly as a commit or a tag.

Skip the block of code that is specific to handling commits and tags
early when the given object is of a wrong type to help later
addition to handle other types of objects in this function.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 ref-filter.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 4db0e40ff4c6..5cee6512fbaf 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1356,11 +1356,12 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size
 }
 
 /* See grab_values */
-static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
+static void grab_sub_body_contents(struct atom_value *val, int deref, struct expand_data *data)
 {
 	int i;
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
 	size_t sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;
+	void *buf = data->content;
 
 	for (i = 0; i < used_atom_cnt; i++) {
 		struct used_atom *atom = &used_atom[i];
@@ -1371,10 +1372,13 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			continue;
 		if (deref)
 			name++;
-		if (strcmp(name, "body") &&
-		    !starts_with(name, "subject") &&
-		    !starts_with(name, "trailers") &&
-		    !starts_with(name, "contents"))
+
+		if ((data->type != OBJ_TAG &&
+		     data->type != OBJ_COMMIT) ||
+		    (strcmp(name, "body") &&
+		     !starts_with(name, "subject") &&
+		     !starts_with(name, "trailers") &&
+		     !starts_with(name, "contents")))
 			continue;
 		if (!subpos)
 			find_subpos(buf,
@@ -1438,17 +1442,19 @@ static void fill_missing_values(struct atom_value *val)
  * pointed at by the ref itself; otherwise it is the object the
  * ref (which is a tag) refers to.
  */
-static void grab_values(struct atom_value *val, int deref, struct object *obj, void *buf)
+static void grab_values(struct atom_value *val, int deref, struct object *obj, struct expand_data *data)
 {
+	void *buf = data->content;
+
 	switch (obj->type) {
 	case OBJ_TAG:
 		grab_tag_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, data);
 		grab_person("tagger", val, deref, buf);
 		break;
 	case OBJ_COMMIT:
 		grab_commit_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf);
+		grab_sub_body_contents(val, deref, data);
 		grab_person("author", val, deref, buf);
 		grab_person("committer", val, deref, buf);
 		break;
@@ -1678,7 +1684,7 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
 					       oid_to_hex(&oi->oid), ref->refname);
 		}
-		grab_values(ref->value, deref, *obj, oi->content);
+		grab_values(ref->value, deref, *obj, oi);
 	}
 
 	grab_common_values(ref->value, deref, oi);
-- 
gitgitgadget



* [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-04 12:12 ` [PATCH v2 " ZheNing Hu via GitGitGadget
  2021-06-04 12:12   ` [PATCH v2 1/2] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
@ 2021-06-04 12:12   ` ZheNing Hu via GitGitGadget
  2021-06-04 13:23     ` Christian Couder
  2021-06-04 12:53   ` [PATCH v2 0/2] " Christian Couder
  2 siblings, 1 reply; 36+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-06-04 12:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Add new formatting option `%(raw)`, which will print the raw
object data without any changes. It will help further to migrate
all cat-file formatting logic from cat-file to ref-filter.

The raw data of blob, tree objects may contain '\0', but most of
the logic in `ref-filter` depands on the output of the atom being
text (specifically, no embedded NULs in it).

E.g. `quote_formatting()` uses `strbuf_addstr()` or `*_quote_buf()` to
add the data to the buffer. The raw data of a tree object looks like
`100644 one\0...`, so only `100644 one` would be added to the buffer,
which is incorrect.
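
To illustrate the truncation, here is a minimal, standalone C sketch.
It does not use git's strbuf API; `append_as_string()` and
`append_with_length()` are hypothetical helpers that only mimic the
difference between a strlen()-based append and a length-aware one.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append src the way NUL-terminated string code
 * would. The length comes from strlen(), so copying stops at the first
 * NUL byte. The caller must make sure dst is large enough.
 */
static size_t append_as_string(char *dst, const char *src)
{
	size_t len = strlen(src);

	memcpy(dst, src, len);
	return len;
}

/*
 * Hypothetical helper: append exactly len bytes, embedded NUL bytes
 * included. The caller must make sure dst is large enough.
 */
static size_t append_with_length(char *dst, const char *src, size_t len)
{
	memcpy(dst, src, len);
	return len;
}
```

With the 15-byte input `100644 one\0tail`, the first helper copies 10
bytes and the second copies all 15.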

Therefore, add a new member to `struct atom_value`: `s_size`, which
records the raw object size. It lets us add raw object data to the
buffer and compare two buffers that contain raw object data.
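
A length-aware comparison along these lines could look like the
following sketch. `compare_raw()` is a hypothetical name, not a function
from this patch; it assumes the caller knows the byte length of both
buffers.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare two buffers that may contain NUL bytes.
 * The common prefix is compared with memcmp(); if it is identical, the
 * shorter buffer sorts first.
 */
static int compare_raw(const void *a, size_t a_len,
		       const void *b, size_t b_len)
{
	size_t min_len = a_len < b_len ? a_len : b_len;
	int cmp = memcmp(a, b, min_len);

	if (cmp)
		return cmp;
	if (a_len > b_len)
		return 1;
	if (a_len < b_len)
		return -1;
	return 0;
}
```

For example, the 3-byte buffers `a\0b` and `a\0c` look identical to
strcmp(), which stops at the first NUL, but this helper tells them apart.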

In addition, `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, or `--perl`: if binary raw data were passed to a variable in the
host language, the host language might not support arbitrary binary data
in variables of its string type.
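
The restriction can be sketched as a small standalone check. The enums
and `check_raw_with_quoting()` below are hypothetical stand-ins, not the
definitions used in ref-filter.c; the sketch only assumes that a
size-only variant of the atom stays usable with quoting, since its
output is a plain decimal number.

```c
/* Hypothetical stand-ins for the quoting modes and the %(raw) variants. */
enum quote_style { STYLE_NONE = 0, STYLE_SHELL, STYLE_PERL, STYLE_PYTHON, STYLE_TCL };
enum raw_option { OPT_RAW_BARE, OPT_RAW_SIZE };

/*
 * Hypothetical check: return 0 if the combination is allowed and -1 if
 * bare raw data would have to be quoted for a host language.
 */
static int check_raw_with_quoting(enum quote_style style, enum raw_option option)
{
	if (style != STYLE_NONE && option == OPT_RAW_BARE)
		return -1;
	return 0;
}
```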

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-for-each-ref.txt |   9 ++
 ref-filter.c                       | 140 +++++++++++++++----
 t/t6300-for-each-ref.sh            | 207 +++++++++++++++++++++++++++++
 3 files changed, 328 insertions(+), 28 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 2ae2478de706..8f8d8cd1e04f 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -235,6 +235,15 @@ and `date` to extract the named component.  For email fields (`authoremail`,
 without angle brackets, and `:localpart` to get the part before the `@` symbol
 out of the trimmed email.
 
+The raw data in a object is `raw`.
+
+raw:size::
+	The raw data size of the object.
+
+Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
+`--perl` because the host language may not support arbitrary binary data in the
+variables of its string type.
+
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
 
diff --git a/ref-filter.c b/ref-filter.c
index 5cee6512fbaf..46aec291de62 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -144,6 +144,7 @@ enum atom_type {
 	ATOM_BODY,
 	ATOM_TRAILERS,
 	ATOM_CONTENTS,
+	ATOM_RAW,
 	ATOM_UPSTREAM,
 	ATOM_PUSH,
 	ATOM_SYMREF,
@@ -189,6 +190,9 @@ static struct used_atom {
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
+		struct {
+			enum { RAW_BARE, RAW_LENGTH } option;
+		} raw_data;
 		struct {
 			cmp_status cmp_status;
 			const char *str;
@@ -426,6 +430,18 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
+static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+				const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		atom->u.raw_data.option = RAW_BARE;
+	else if (!strcmp(arg, "size"))
+		atom->u.raw_data.option = RAW_LENGTH;
+	else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
+	return 0;
+}
+
 static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
 			   const char *arg, struct strbuf *err)
 {
@@ -586,6 +602,7 @@ static struct {
 	[ATOM_BODY] = { "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
 	[ATOM_TRAILERS] = { "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
 	[ATOM_CONTENTS] = { "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
+	[ATOM_RAW] = { "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
 	[ATOM_UPSTREAM] = { "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_PUSH] = { "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	[ATOM_SYMREF] = { "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },
@@ -620,12 +637,15 @@ struct ref_formatting_state {
 
 struct atom_value {
 	const char *s;
+	size_t s_size;
 	int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
 		       struct strbuf *err);
 	uintmax_t value; /* used for sorting when not FIELD_STR */
 	struct used_atom *atom;
 };
 
+#define ATOM_VALUE_S_SIZE_INIT (-1)
+
 /*
  * Used to parse format string and sort specifiers
  */
@@ -644,13 +664,6 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 		return strbuf_addf_ret(err, -1, _("malformed field name: %.*s"),
 				       (int)(ep-atom), atom);
 
-	/* Do we have the atom already used elsewhere? */
-	for (i = 0; i < used_atom_cnt; i++) {
-		int len = strlen(used_atom[i].name);
-		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
-			return i;
-	}
-
 	/*
 	 * If the atom name has a colon, strip it and everything after
 	 * it off - it specifies the format for this entry, and
@@ -660,6 +673,13 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	arg = memchr(sp, ':', ep - sp);
 	atom_len = (arg ? arg : ep) - sp;
 
+	/* Do we have the atom already used elsewhere? */
+	for (i = 0; i < used_atom_cnt; i++) {
+		int len = strlen(used_atom[i].name);
+		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
+			return i;
+	}
+
 	/* Is the atom a valid one? */
 	for (i = 0; i < ARRAY_SIZE(valid_atom); i++) {
 		int len = strlen(valid_atom[i].name);
@@ -709,11 +729,14 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	return at;
 }
 
-static void quote_formatting(struct strbuf *s, const char *str, int quote_style)
+static void quote_formatting(struct strbuf *s, const char *str, size_t len, int quote_style)
 {
 	switch (quote_style) {
 	case QUOTE_NONE:
-		strbuf_addstr(s, str);
+		if (len != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(s, str, len);
+		else
+			strbuf_addstr(s, str);
 		break;
 	case QUOTE_SHELL:
 		sq_quote_buf(s, str);
@@ -740,9 +763,12 @@ static int append_atom(struct atom_value *v, struct ref_formatting_state *state,
 	 * encountered.
 	 */
 	if (!state->stack->prev)
-		quote_formatting(&state->stack->output, v->s, state->quote_style);
+		quote_formatting(&state->stack->output, v->s, v->s_size, state->quote_style);
 	else
-		strbuf_addstr(&state->stack->output, v->s);
+		if (v->s_size != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(&state->stack->output, v->s, v->s_size);
+		else
+			strbuf_addstr(&state->stack->output, v->s);
 	return 0;
 }
 
@@ -842,21 +868,23 @@ static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state
 	return 0;
 }
 
-static int is_empty(const char *s)
+static int is_empty(struct strbuf *buf)
 {
-	while (*s != '\0') {
-		if (!isspace(*s))
-			return 0;
-		s++;
-	}
-	return 1;
-}
+	const char *cur = buf->buf;
+	const char *end = buf->buf + buf->len;
+
+	while (cur != end && (isspace(*cur)))
+		cur++;
+
+	return cur == end;
+}
 
 static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
 			     struct strbuf *err)
 {
 	struct ref_formatting_stack *cur = state->stack;
 	struct if_then_else *if_then_else = NULL;
+	size_t str_len = 0;
 
 	if (cur->at_end == if_then_else_handler)
 		if_then_else = (struct if_then_else *)cur->at_end_data;
@@ -867,18 +895,22 @@ static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_sta
 	if (if_then_else->else_atom_seen)
 		return strbuf_addf_ret(err, -1, _("format: %%(then) atom used after %%(else)"));
 	if_then_else->then_atom_seen = 1;
+	if (if_then_else->str)
+		str_len = strlen(if_then_else->str);
 	/*
 	 * If the 'equals' or 'notequals' attribute is used then
 	 * perform the required comparison. If not, only non-empty
 	 * strings satisfy the 'if' condition.
 	 */
 	if (if_then_else->cmp_status == COMPARE_EQUAL) {
-		if (!strcmp(if_then_else->str, cur->output.buf))
+		if (str_len == cur->output.len &&
+		    !memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
 	} else if (if_then_else->cmp_status == COMPARE_UNEQUAL) {
-		if (strcmp(if_then_else->str, cur->output.buf))
+		if (str_len != cur->output.len ||
+		    memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
-	} else if (cur->output.len && !is_empty(cur->output.buf))
+	} else if (cur->output.len && !is_empty(&cur->output))
 		if_then_else->condition_satisfied = 1;
 	strbuf_reset(&cur->output);
 	return 0;
@@ -924,7 +956,7 @@ static int end_atom_handler(struct atom_value *atomv, struct ref_formatting_stat
 	 * only on the topmost supporting atom.
 	 */
 	if (!current->prev->prev) {
-		quote_formatting(&s, current->output.buf, state->quote_style);
+		quote_formatting(&s, current->output.buf, current->output.len, state->quote_style);
 		strbuf_swap(&current->output, &s);
 	}
 	strbuf_release(&s);
@@ -974,6 +1006,10 @@ int verify_ref_format(struct ref_format *format)
 		at = parse_ref_filter_atom(format, sp + 2, ep, &err);
 		if (at < 0)
 			die("%s", err.buf);
+		if (format->quote_style && used_atom[at].atom_type == ATOM_RAW &&
+		    used_atom[at].u.raw_data.option == RAW_BARE)
+			die(_("--format=%.*s cannot be used with "
+			      "--python, --shell, --tcl, --perl"), (int)(ep - sp - 2), sp + 2);
 		cp = ep + 1;
 
 		if (skip_prefix(used_atom[at].name, "color:", &color))
@@ -1362,17 +1398,29 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
 	size_t sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;
 	void *buf = data->content;
+	unsigned long buf_size = data->size;
 
 	for (i = 0; i < used_atom_cnt; i++) {
 		struct used_atom *atom = &used_atom[i];
 		const char *name = atom->name;
 		struct atom_value *v = &val[i];
+		enum atom_type atom_type = atom->atom_type;
 
 		if (!!deref != (*name == '*'))
 			continue;
 		if (deref)
 			name++;
 
+		if (atom_type == ATOM_RAW) {
+			if (atom->u.raw_data.option == RAW_BARE) {
+				v->s = xmemdupz(buf, buf_size);
+				v->s_size = buf_size;
+			} else if (atom->u.raw_data.option == RAW_LENGTH) {
+				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
+			}
+			continue;
+		}
+
 		if ((data->type != OBJ_TAG &&
 		     data->type != OBJ_COMMIT) ||
 		    (strcmp(name, "body") &&
@@ -1460,9 +1508,11 @@ static void grab_values(struct atom_value *val, int deref, struct object *obj, s
 		break;
 	case OBJ_TREE:
 		/* grab_tree_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, data);
 		break;
 	case OBJ_BLOB:
 		/* grab_blob_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, data);
 		break;
 	default:
 		die("Eh?  Object of type %d?", obj->type);
@@ -1765,7 +1815,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 		int deref = 0;
 		const char *refname;
 		struct branch *branch = NULL;
-
+		v->s_size = ATOM_VALUE_S_SIZE_INIT;
 		v->handler = append_atom;
 		v->atom = atom;
 
@@ -2369,6 +2419,19 @@ static int compare_detached_head(struct ref_array_item *a, struct ref_array_item
 	return 0;
 }
 
+static int memcasecmp(const void *vs1, const void *vs2, size_t n)
+{
+	const char *s1 = vs1, *s2 = vs2;
+	const char *end = s1 + n;
+
+	for (; s1 < end; s1++, s2++) {
+		int diff = tolower(*s1) - tolower(*s2);
+		if (diff)
+			return diff;
+	}
+	return 0;
+}
+
 static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, struct ref_array_item *b)
 {
 	struct atom_value *va, *vb;
@@ -2389,10 +2452,30 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
 	} else if (s->sort_flags & REF_SORTING_VERSION) {
 		cmp = versioncmp(va->s, vb->s);
 	} else if (cmp_type == FIELD_STR) {
-		int (*cmp_fn)(const char *, const char *);
-		cmp_fn = s->sort_flags & REF_SORTING_ICASE
-			? strcasecmp : strcmp;
-		cmp = cmp_fn(va->s, vb->s);
+		if (va->s_size == ATOM_VALUE_S_SIZE_INIT &&
+		    vb->s_size == ATOM_VALUE_S_SIZE_INIT) {
+			int (*cmp_fn)(const char *, const char *);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? strcasecmp : strcmp;
+			cmp = cmp_fn(va->s, vb->s);
+		} else {
+			int (*cmp_fn)(const void *, const void *, size_t);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? memcasecmp : memcmp;
+			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(va->s) : va->s_size;
+			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(vb->s) : vb->s_size;
+
+			cmp = cmp_fn(va->s, vb->s, b_size > a_size ?
+				     a_size : b_size);
+			if (!cmp) {
+				if (a_size > b_size)
+					cmp = 1;
+				else if (a_size < b_size)
+					cmp = -1;
+			}
+		}
 	} else {
 		if (va->value < vb->value)
 			cmp = -1;
@@ -2492,6 +2575,7 @@ int format_ref_array_item(struct ref_array_item *info,
 	}
 	if (format->need_color_reset_at_eol) {
 		struct atom_value resetv;
+		resetv.s_size = ATOM_VALUE_S_SIZE_INIT;
 		resetv.s = GIT_COLOR_RESET;
 		if (append_atom(&resetv, &state, error_buf)) {
 			pop_stack_element(&state.stack);
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9e0214076b4d..5f66d933ace0 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -130,6 +130,8 @@ test_atom head parent:short=10 ''
 test_atom head numparent 0
 test_atom head object ''
 test_atom head type ''
+test_atom head raw "$(git cat-file commit refs/heads/main)
+"
 test_atom head '*objectname' ''
 test_atom head '*objecttype' ''
 test_atom head author 'A U Thor <author@example.com> 1151968724 +0200'
@@ -221,6 +223,15 @@ test_atom tag contents 'Tagging at 1151968727
 '
 test_atom tag HEAD ' '
 
+test_expect_success 'basic atom: refs/tags/testtag *raw' '
+	git cat-file commit refs/tags/testtag^{} >expected &&
+	git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'Check invalid atoms names are errors' '
 	test_must_fail git for-each-ref --format="%(INVALID)" refs/heads
 '
@@ -686,6 +697,15 @@ test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
 test_atom refs/tags/signed-empty contents "$sig"
 
+test_expect_success 'basic atom: refs/tags/signed-empty raw' '
+	git cat-file tag refs/tags/signed-empty >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-empty >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short subject:sanitize 'subject-line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
@@ -695,6 +715,15 @@ test_atom refs/tags/signed-short contents:signature "$sig"
 test_atom refs/tags/signed-short contents "subject line
 $sig"
 
+test_expect_success 'basic atom: refs/tags/signed-short raw' '
+	git cat-file tag refs/tags/signed-short >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-short >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long subject:sanitize 'subject-line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
@@ -708,6 +737,15 @@ test_atom refs/tags/signed-long contents "subject line
 body contents
 $sig"
 
+test_expect_success 'basic atom: refs/tags/signed-long raw' '
+	git cat-file tag refs/tags/signed-long >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-long >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'set up refs pointing to tree and blob' '
 	git update-ref refs/mytrees/first refs/heads/main^{tree} &&
 	git update-ref refs/myblobs/first refs/heads/main:one
@@ -720,6 +758,16 @@ test_atom refs/mytrees/first contents:body ""
 test_atom refs/mytrees/first contents:signature ""
 test_atom refs/mytrees/first contents ""
 
+test_expect_success 'basic atom: refs/mytrees/first raw' '
+	git cat-file tree refs/mytrees/first >expected &&
+	echo "" >>expected &&
+	git for-each-ref --format="%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/mytrees/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_atom refs/myblobs/first subject ""
 test_atom refs/myblobs/first contents:subject ""
 test_atom refs/myblobs/first body ""
@@ -727,6 +775,165 @@ test_atom refs/myblobs/first contents:body ""
 test_atom refs/myblobs/first contents:signature ""
 test_atom refs/myblobs/first contents ""
 
+test_expect_success 'basic atom: refs/myblobs/first raw' '
+	git cat-file blob refs/myblobs/first >expected &&
+	echo "" >>expected &&
+	git for-each-ref --format="%(raw)" refs/myblobs/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/myblobs/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/myblobs/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'set up refs pointing to binary blob' '
+	printf "%b" "a\0b\0c" >blob1 &&
+	printf "%b" "a\0c\0b" >blob2 &&
+	printf "%b" "\0a\0b\0c" >blob3 &&
+	printf "%b" "abc" >blob4 &&
+	printf "%b" "\0 \0 \0 " >blob5 &&
+	printf "%b" "\0 \0a\0 " >blob6 &&
+	printf "%b" "  " >blob7 &&
+	>blob8 &&
+	git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
+	git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
+	git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
+	git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
+	git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
+	git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
+	git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7 &&
+	git hash-object blob8 -w | xargs git update-ref refs/myblobs/blob8
+'
+
+test_expect_success 'Verify sorts with raw' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob3
+	refs/myblobs/blob7
+	refs/mytrees/first
+	refs/myblobs/first
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob4
+	refs/heads/main
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'Verify sorts with raw:size' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob8
+	refs/myblobs/first
+	refs/myblobs/blob7
+	refs/heads/main
+	refs/myblobs/blob4
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/mytrees/first
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw:size \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'validate raw atom with %(if:equals)' '
+	cat >expected <<-EOF &&
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	refs/myblobs/blob4
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	EOF
+	git for-each-ref --format="%(if:equals=abc)%(raw)%(then)%(refname)%(else)not equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+test_expect_success 'validate raw atom with %(if:notequals)' '
+	cat >expected <<-EOF &&
+	refs/heads/ambiguous
+	refs/heads/main
+	refs/heads/newtag
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	equals
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob7
+	refs/myblobs/blob8
+	refs/myblobs/first
+	EOF
+	git for-each-ref --format="%(if:notequals=abc)%(raw)%(then)%(refname)%(else)equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'empty raw refs with %(if)' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob1 not empty
+	refs/myblobs/blob2 not empty
+	refs/myblobs/blob3 not empty
+	refs/myblobs/blob4 not empty
+	refs/myblobs/blob5 not empty
+	refs/myblobs/blob6 not empty
+	refs/myblobs/blob7 empty
+	refs/myblobs/blob8 empty
+	refs/myblobs/first not empty
+	EOF
+	git for-each-ref --format="%(refname) %(if)%(raw)%(then)not empty%(else)empty%(end)" \
+		refs/myblobs/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success '%(raw) with --python must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --python
+'
+
+test_expect_success '%(raw) with --tcl must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --tcl
+'
+
+test_expect_success '%(raw) with --perl must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --perl
+'
+
+test_expect_success '%(raw) with --shell must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --shell
+'
+
+test_expect_success '%(raw) with --shell and --sort=raw must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
+'
+
+test_expect_success '%(raw:size) with --shell' '
+	git for-each-ref --format="%(raw:size)" | while read line
+	do
+		echo "'\''$line'\''" >>expect
+	done &&
+	git for-each-ref --format="%(raw:size)" --shell >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'for-each-ref --format compare with cat-file --batch' '
+	git rev-parse refs/mytrees/first | git cat-file --batch >expected &&
+	git for-each-ref --format="%(objectname) %(objecttype) %(objectsize)
+%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'set up multiple-sort tags' '
 	for when in 100000 200000
 	do
-- 
gitgitgadget


* Re: [PATCH v2 0/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-04 12:12 ` [PATCH v2 " ZheNing Hu via GitGitGadget
  2021-06-04 12:12   ` [PATCH v2 1/2] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
  2021-06-04 12:12   ` [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
@ 2021-06-04 12:53   ` Christian Couder
  2021-06-05  4:34     ` ZheNing Hu
  2 siblings, 1 reply; 36+ messages in thread
From: Christian Couder @ 2021-06-04 12:53 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Junio C Hamano, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu

No need to resend as it's a cover letter, but just in case there is
another round and you copy things from this cover letter:

On Fri, Jun 4, 2021 at 2:12 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> In order to make git cat-file --batch use ref-filter logic, %(raw) atom is
> adding to ref-filter.

s/adding/added/

> Change from last version:
>
>  1. Change --<lang> and --format=%(raw) checkpoint to verify_ref_format(),
>     which make it more scalable.

s/make/makes/

>  2. Change grab_sub_body_contents() use struct expand_data *data instread of

s/use/to use/
s/instread/instead/

>     using obj,buf,buf_size to pass object info which can reduce the delivery
>     of function parameters.


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-04 12:12   ` [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
@ 2021-06-04 13:23     ` Christian Couder
  0 siblings, 0 replies; 36+ messages in thread
From: Christian Couder @ 2021-06-04 13:23 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Junio C Hamano, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu

On Fri, Jun 4, 2021 at 2:12 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: ZheNing Hu <adlternative@gmail.com>
>
> Add new formatting option `%(raw)`, which will print the raw
> object data without any changes. It will help further to migrate
> all cat-file formatting logic from cat-file to ref-filter.
>
> The raw data of blob, tree objects may contain '\0', but most of
> the logic in `ref-filter` depands on the output of the atom being

s/depands/depends/

> text (specifically, no embedded NULs in it).

> diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
> index 2ae2478de706..8f8d8cd1e04f 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -235,6 +235,15 @@ and `date` to extract the named component.  For email fields (`authoremail`,
>  without angle brackets, and `:localpart` to get the part before the `@` symbol
>  out of the trimmed email.
>
> +The raw data in a object is `raw`.

s/a object/an object/

> +
> +raw:size::
> +       The raw data size of the object.
> +
> +Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
> +`--perl` because the host language may not support arbitrary binary data in the
> +variables of its string type.





> @@ -1765,7 +1815,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
>                 int deref = 0;
>                 const char *refname;
>                 struct branch *branch = NULL;
> -
> +               v->s_size = ATOM_VALUE_S_SIZE_INIT;

It looks like a blank line was removed as you added the above new line.

>                 v->handler = append_atom;
>                 v->atom = atom;
>


* Re: [PATCH v2 0/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-04 12:53   ` [PATCH v2 0/2] " Christian Couder
@ 2021-06-05  4:34     ` ZheNing Hu
  2021-06-05  4:49       ` Christian Couder
  0 siblings, 1 reply; 36+ messages in thread
From: ZheNing Hu @ 2021-06-05  4:34 UTC (permalink / raw)
  To: Christian Couder
  Cc: ZheNing Hu via GitGitGadget, git, Junio C Hamano, Hariom Verma,
	Karthik Nayak, Felipe Contreras, Bagas Sanjaya, Jeff King,
	Phillip Wood

Hi, Christian,

Christian Couder <christian.couder@gmail.com> wrote on Fri, Jun 4, 2021 at 8:53 PM:
>
> No need to resend as it's a cover letter, but just in case there is
> another round and you copy things from this cover letter:
>

Sorry, what is the bad place in this cover letter I write? This
cover letter is also different from the last time ...

> On Fri, Jun 4, 2021 at 2:12 PM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > In order to make git cat-file --batch use ref-filter logic, %(raw) atom is
> > adding to ref-filter.
>
> s/adding/added/
>
> > Change from last version:
> >
> >  1. Change --<lang> and --format=%(raw) checkpoint to verify_ref_format(),
> >     which make it more scalable.
>
> s/make/makes/
>
> >  2. Change grab_sub_body_contents() use struct expand_data *data instread of
>
> s/use/to use/
> s/instread/instead/
>
> >     using obj,buf,buf_size to pass object info which can reduce the delivery
> >     of function parameters.

Thanks for these grammatical corrections.
--
ZheNing Hu


* Re: [PATCH v2 0/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-05  4:34     ` ZheNing Hu
@ 2021-06-05  4:49       ` Christian Couder
  2021-06-05  5:42         ` ZheNing Hu
  0 siblings, 1 reply; 36+ messages in thread
From: Christian Couder @ 2021-06-05  4:49 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, git, Junio C Hamano, Hariom Verma,
	Karthik Nayak, Felipe Contreras, Bagas Sanjaya, Jeff King,
	Phillip Wood

On Sat, Jun 5, 2021 at 6:34 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> Hi, Christian,
>
> Christian Couder <christian.couder@gmail.com> wrote on Fri, Jun 4, 2021 at 8:53 PM:
> >
> > No need to resend as it's a cover letter, but just in case there is
> > another round and you copy things from this cover letter:
> >
>
> Sorry, what is the bad place in this cover letter I write? This
> cover letter is also different from the last time ...

I was talking about the grammatical issues below in the cover letter.
Sometimes people copy things, for example a text explaining what the
patch series is about, from the cover letter of version N to the cover
letter of version N + 1, so I thought that telling you about
grammatical issues in this cover letter could perhaps help you if you
have to write another cover letter for another version of this patch
series.


* Re: [PATCH v2 0/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-05  4:49       ` Christian Couder
@ 2021-06-05  5:42         ` ZheNing Hu
  2021-06-05  6:45           ` Christian Couder
  0 siblings, 1 reply; 36+ messages in thread
From: ZheNing Hu @ 2021-06-05  5:42 UTC (permalink / raw)
  To: Christian Couder
  Cc: ZheNing Hu via GitGitGadget, git, Junio C Hamano, Hariom Verma,
	Karthik Nayak, Felipe Contreras, Bagas Sanjaya, Jeff King,
	Phillip Wood

Christian Couder <christian.couder@gmail.com> wrote on Sat, Jun 5, 2021 at 12:49 PM:
>
> On Sat, Jun 5, 2021 at 6:34 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > Hi, Christian,
> >
> > Christian Couder <christian.couder@gmail.com> wrote on Fri, Jun 4, 2021 at 8:53 PM:
> > >
> > > No need to resend as it's a cover letter, but just in case there is
> > > another round and you copy things from this cover letter:
> > >
> >
> > Sorry, what is the bad place in this cover letter I write? This
> > cover letter is also different from the last time ...
>
> I was talking about the grammatical issues below in the cover letter.
> Sometimes people copy things, for example a text explaining what the
> patch series is about, from the cover letter of version N to the cover
> letter of version N + 1, so I thought that telling you about
> grammatical issues in this cover letter could perhaps help you if you
> have to write another cover letter for another version of this patch
> series.

Ok, I get it.

I have another question:
If I have a new patch series about %(rest) that is based on the current
%(raw) patch series, should I submit it immediately?

Thanks.
--
ZheNing Hu


* Re: [PATCH v2 0/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-05  5:42         ` ZheNing Hu
@ 2021-06-05  6:45           ` Christian Couder
  2021-06-05  8:05             ` ZheNing Hu
  0 siblings, 1 reply; 36+ messages in thread
From: Christian Couder @ 2021-06-05  6:45 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, git, Junio C Hamano, Hariom Verma,
	Karthik Nayak, Felipe Contreras, Bagas Sanjaya, Jeff King,
	Phillip Wood

On Sat, Jun 5, 2021 at 7:42 AM ZheNing Hu <adlternative@gmail.com> wrote:

> I want to mention another question:
> If I have a new patch series about %(rest) is based on the current %(raw)
> patch series, should I submit it immediately?

Yeah, I think it's ok to send it as long as you explicitly specify
(using a link for example) the patch series on the mailing list it
depends on.


* Re: [PATCH v2 0/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-05  6:45           ` Christian Couder
@ 2021-06-05  8:05             ` ZheNing Hu
  0 siblings, 0 replies; 36+ messages in thread
From: ZheNing Hu @ 2021-06-05  8:05 UTC (permalink / raw)
  To: Christian Couder
  Cc: ZheNing Hu via GitGitGadget, git, Junio C Hamano, Hariom Verma,
	Karthik Nayak, Felipe Contreras, Bagas Sanjaya, Jeff King,
	Phillip Wood

Christian Couder <christian.couder@gmail.com> wrote on Sat, Jun 5, 2021 at 2:45 PM:
>
> On Sat, Jun 5, 2021 at 7:42 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> > I want to mention another question:
> > If I have a new patch series about %(rest) is based on the current %(raw)
> > patch series, should I submit it immediately?
>
> Yeah, I think it's ok to send it as long as you explicitly specify
> (using a link for example) the patch series on the mailing list it
> depends on.

OK. Because %(raw:textconv) and %(raw:filter) depend on %(raw) and %(rest),
and %(raw) seems to be going through a long review cycle, the code for these
new atoms has been sitting in my local git repo for a long time. It may be a
good thing to send it earlier. :)

Thanks.


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-01 11:00           ` ZheNing Hu
@ 2021-06-01 13:48             ` Johannes Schindelin
  0 siblings, 0 replies; 36+ messages in thread
From: Johannes Schindelin @ 2021-06-01 13:48 UTC (permalink / raw)
  To: ZheNing Hu; +Cc: Junio C Hamano, ZheNing Hu via GitGitGadget, Git List


Hi,

On Tue, 1 Jun 2021, ZheNing Hu wrote:

> Junio C Hamano <gitster@pobox.com> wrote on Tue, Jun 1, 2021 at 4:54 PM:
>
> > ZheNing Hu <adlternative@gmail.com> writes:
> >
> > > [...] But if I want to base the current topic on
> > > zh/ref-filter-atom-type, GGG will send past patches
> > > (zh/ref-filter-atom-type) repeatedly.
> >
> > I thought GGG lets you say "this is based on that other branch, not on
> > the 'master' branch" to solve that exact issue?
>
> I'm not sure...I will try it after I rebasing this topic to
> zh/ref-filter-atom-type.

Yes, it should be possible to rebase your patch on top of one of the
[a-z][a-z]/* patches Junio publishes at https://github.com/gitster/git
(and which get mirrored automatically to
https://github.com/gitgitgadget/git via a scheduled Azure Pipeline), and
then to change the PR base (simply click the `Edit` button next to the PR
title, as if you wanted to edit the title, and you can also change the
base branch).

> I just remember that it looked like something went wrong with my base on
> your patch last time.[1]
>
> [1] https://lore.kernel.org/git/pull.870.v6.git.1613739235241.gitgitgadget@gmail.com/

https://github.com/gitgitgadget/git/pull/870 seems to be based on
jc/diffcore-rotate all right.

If you are talking about including Junio's patch in v5
(https://lore.kernel.org/git/fb4bfd0f8b162e51e71711fe5503ca684f980d58.1613480198.git.gitgitgadget@gmail.com/#r),
I _think_ that there might have been the simple problem of
jc/diffcore-rotate having been force-pushed just before you sent v5, and
therefore you had a stale branch.

To prevent things like that, it is a good idea to set the upstream of your
local branch accordingly (in this instance, `git branch
--set-upstream-to=gitgitgadget/jc/diffcore-rotate`) and to make sure to run
`git pull --rebase` before force-pushing and submitting.

Ciao,
Dscho


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-01  9:54         ` Junio C Hamano
@ 2021-06-01 11:05           ` ZheNing Hu
  0 siblings, 0 replies; 36+ messages in thread
From: ZheNing Hu @ 2021-06-01 11:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

Junio C Hamano <gitster@pobox.com> wrote on Tue, Jun 1, 2021 at 5:54 PM:
>
> > Well, there seems to be no correction here. But is it true that memory
> > like "\0abc" is considered empty?
>
> That sample has 'a' or 'b' or 'c' that are clearly not part of an
> "empty" string and irrelevant.  After all, a string " abc" is not
> treated as empty in the original implementation, either.
>

In other words, we still need to look at each character of strbuf,
instead of stopping at NUL.

> You are treating a block of memory with e.g. " \000 " (SP NUL SP) as
> an "empty line" just like you do for "   " (SP SP SP), but I think we
> should treat it more like " \001 " or " \007 ", i.e. not an empty
> string at all.

OK, I understand it now: " \001 " looks like a block of spaces, but it is
not truly "empty", and "SP NUL SP" is the same. So the complete definition
of "empty" here should be: all characters are SP, with no NUL or other
characters among them.
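
As a rough illustration of that definition, here is a minimal sketch in
standard C. It is not the code from the patch: the helper name
`buf_is_blank` and its pointer-plus-length signature are assumptions made
for this example, and it uses the C library's isspace() rather than git's
own ctype helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): returns 1 when every one of
 * the len bytes is whitespace and 0 otherwise. NUL is not whitespace, so
 * " \0 " is reported as non-empty, just like " \001 ".
 */
static int buf_is_blank(const char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++) {
		/* Cast first: passing a negative char to isspace() is undefined. */
		if (!isspace((unsigned char)buf[i]))
			return 0;
	}
	return 1;
}
```

The loop is bounded by the length and never stops early at a NUL, so a
buffer such as "SP NUL SP" is examined byte by byte and rejected.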

Thanks.
--
ZheNing Hu


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-06-01  8:54         ` Junio C Hamano
@ 2021-06-01 11:00           ` ZheNing Hu
  2021-06-01 13:48             ` Johannes Schindelin
  0 siblings, 1 reply; 36+ messages in thread
From: ZheNing Hu @ 2021-06-01 11:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> wrote on Tue, Jun 1, 2021 at 4:54 PM:

>
> ZheNing Hu <adlternative@gmail.com> writes:
>
> >> Doesn't this conflict with your own zh/ref-filter-atom-type topic?
> >> Shouldn't one build on top of the other?
> >>
> >> Or did we find something fundamentally broken about the other topic
> >> to make us retract it that I do not remember?
> >>
> >> Thanks.
> >
> > I am waiting for zh/ref-filter-atom-type to be merged into master. But it
>
> As you sent this that conflicts with it, clearly you are doing
> something else that conflicts with it _without waiting_ ;-).
>

OK.

> > hasn't happened yet. But if I want to base the current topic on
> > zh/ref-filter-atom-type, GGG will send past patches (zh/ref-filter-atom-type)
> > repeatedly.
>
> I thought GGG lets you say "this is based on that other branch, not
> on the 'master' branch" to solve that exact issue?

I'm not sure... I will try it after rebasing this topic onto
zh/ref-filter-atom-type.
I just remember that something seemed to go wrong when I based my work
on your patch last time.[1]

[1] https://lore.kernel.org/git/pull.870.v6.git.1613739235241.gitgitgadget@gmail.com/


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-31 15:41     ` Felipe Contreras
@ 2021-06-01 10:37       ` ZheNing Hu
  0 siblings, 0 replies; 36+ messages in thread
From: ZheNing Hu @ 2021-06-01 10:37 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: ZheNing Hu via GitGitGadget, Git List, Junio C Hamano,
	Christian Couder, Hariom Verma, Karthik Nayak, Bagas Sanjaya,
	Jeff King, Phillip Wood

Felipe Contreras <felipe.contreras@gmail.com> wrote on Mon, May 31, 2021 at 11:41 PM:
>
> ZheNing Hu via GitGitGadget wrote:
> > From: ZheNing Hu <adlternative@gmail.com>
>
> > +static int memcasecmp(const void *vs1, const void *vs2, size_t n)
> > +{
> > +     const char *s1 = (const void *)vs1;
> > +     const char *s2 = (const void *)vs2;
>
> As I explained in another mail, I think this is better:
>
>         const char *s1 = vs1, *s2 = vs2;
>

OK, I understand it now.

> > +     const char *end = s1 + n;
> > +
> > +     for (; s1 < end; s1++, s2++) {
> > +             int diff = tolower(*s1) - tolower(*s2);
> > +             if (diff)
> > +                     return diff;
> > +     }
> > +     return 0;
> > +}
>
> --
> Felipe Contreras

Thanks.
--
ZheNing Hu


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-31 14:35       ` ZheNing Hu
@ 2021-06-01  9:54         ` Junio C Hamano
  2021-06-01 11:05           ` ZheNing Hu
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-06-01  9:54 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

ZheNing Hu <adlternative@gmail.com> writes:

>>     ... because the host language may not support a NUL in the variables
>>     of its string type.
>>
>
> I agree. But host language not only support NUL but also some Non-ASCII
> character and Non-UTF-8 code:

Yup, replace "a NUL" with "arbitrary binary data" and you got
what I meant.  Thanks.

>> > +     const char *s = buf->buf;
>> > +     size_t cur_len = 0;
>> > +
>> > +     while ((cur_len != buf->len) && (isspace(*s) || *s == '\0')) {
>> >               s++;
>> > +             cur_len++;
>>
>> Is NUL treated the same as a whitespace letter for the purpose of
>> determining if a line is empty?  WHY?
>
> Well, there seems to be no correction here. But is it true that memory
> like "\0abc" is considered empty?

That sample has 'a' or 'b' or 'c' that are clearly not part of an
"empty" string and irrelevant.  After all, a string " abc" is not
treated as empty in the original implementation, either.

You are treating a block of memory with e.g. " \000 " (SP NUL SP) as
an "empty line" just like you do for "   " (SP SP SP), but I think we
should treat it more like " \001 " or " \007 ", i.e. not an empty
string at all.


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-31 14:40       ` ZheNing Hu
@ 2021-06-01  8:54         ` Junio C Hamano
  2021-06-01 11:00           ` ZheNing Hu
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-06-01  8:54 UTC (permalink / raw)
  To: ZheNing Hu; +Cc: ZheNing Hu via GitGitGadget, Git List, Johannes Schindelin

ZheNing Hu <adlternative@gmail.com> writes:

>> Doesn't this conflict with your own zh/ref-filter-atom-type topic?
>> Shouldn't one build on top of the other?
>>
>> Or did we find something fundamentally broken about the other topic
>> to make us retract it that I do not remember?
>>
>> Thanks.
>
> I am waiting for zh/ref-filter-atom-type to be merged into master. But it

As you sent this that conflicts with it, clearly you are doing
something else that conflicts with it _without waiting_ ;-).

> hasn't happened yet. But if I want to base the current topic on
> zh/ref-filter-atom-type, GGG will send past patches (zh/ref-filter-atom-type)
> repeatedly.

I thought GGG lets you say "this is based on that other branch, not
on the 'master' branch" to solve that exact issue?


* RE: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-30 13:01   ` [PATCH v2 2/2] " ZheNing Hu via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-05-31  4:10     ` Junio C Hamano
@ 2021-05-31 15:41     ` Felipe Contreras
  2021-06-01 10:37       ` ZheNing Hu
  3 siblings, 1 reply; 36+ messages in thread
From: Felipe Contreras @ 2021-05-31 15:41 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget, git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu, ZheNing Hu

ZheNing Hu via GitGitGadget wrote:
> From: ZheNing Hu <adlternative@gmail.com>

> +static int memcasecmp(const void *vs1, const void *vs2, size_t n)
> +{
> +	const char *s1 = (const void *)vs1;
> +	const char *s2 = (const void *)vs2;

As I explained in another mail, I think this is better:

	const char *s1 = vs1, *s2 = vs2;

> +	const char *end = s1 + n;
> +
> +	for (; s1 < end; s1++, s2++) {
> +		int diff = tolower(*s1) - tolower(*s2);
> +		if (diff)
> +			return diff;
> +	}
> +	return 0;
> +}
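
For reference, a self-contained sketch of the function with that
initialization style applied. The name `memcasecmp_sketch` is made up for
this example, and the switch to `unsigned char` is an extra assumption for
builds that use the C library's tolower(), which is undefined for negative
`char` values; it is not something the patch itself does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Case-insensitive comparison of two n-byte memory blocks. Unlike
 * strcasecmp(), it does not stop at a NUL byte.
 */
static int memcasecmp_sketch(const void *vs1, const void *vs2, size_t n)
{
	/* void * converts implicitly in C, so no casts are needed here. */
	const unsigned char *s1 = vs1, *s2 = vs2;
	const unsigned char *end;

	assert(vs1 != NULL && vs2 != NULL);
	end = s1 + n;
	for (; s1 < end; s1++, s2++) {
		int diff = tolower(*s1) - tolower(*s2);
		if (diff)
			return diff;
	}
	return 0;
}
```

Called as `memcasecmp_sketch("a\0B", "A\0b", 3)`, the loop walks all three
bytes, including the embedded NUL, before reporting the blocks as equal.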

-- 
Felipe Contreras


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-31  4:04     ` Junio C Hamano
@ 2021-05-31 14:40       ` ZheNing Hu
  2021-06-01  8:54         ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: ZheNing Hu @ 2021-05-31 14:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

Junio C Hamano <gitster@pobox.com> wrote on Mon, May 31, 2021 at 12:04 PM:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > @@ -530,6 +545,7 @@ static struct {
> >       { "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
> >       { "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
> >       { "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
> > +     { "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
> >       { "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
> >       { "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
> >       { "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },
>
> Doesn't this conflict with your own zh/ref-filter-atom-type topic?
> Shouldn't one build on top of the other?
>
> Or did we find something fundamentally broken about the other topic
> to make us retract it that I do not remember?
>
> Thanks.

I am waiting for zh/ref-filter-atom-type to be merged into master. But it
hasn't happened yet. But if I want to base the current topic on
zh/ref-filter-atom-type, GGG will send past patches (zh/ref-filter-atom-type)
repeatedly. If necessary, I will submit the current branch based on
zh/ref-filter-atom-type.

Thanks.
--
ZheNing Hu


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-31  0:44     ` Junio C Hamano
@ 2021-05-31 14:35       ` ZheNing Hu
  2021-06-01  9:54         ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: ZheNing Hu @ 2021-05-31 14:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Christian Couder,
	Hariom Verma, Karthik Nayak, Felipe Contreras, Bagas Sanjaya,
	Jeff King, Phillip Wood

Junio C Hamano <gitster@pobox.com> wrote on Mon, May 31, 2021 at 8:44 AM:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
> > `--tcl`, `--perl` because if our binary raw data is passed to a variable
> > in the host language, the host languages may cause escape errors.
>
> "may cause escape errors" just says you are not escaping correctly
> in your code (implying that this patch is not good enough and with
> more effort we should be able to fix it to allow binaries), but the
> problem is the host languages may not support binaries
> (specifically, anything with a NUL in it) at all, which is
> fundamentally unfixable, in which case, rejecting is the only
> sensible choice.
>
>     ... because the host language may not support a NUL in the variables
>     of its string type.
>

I agree. But the host language may fail to support not only NUL but also
some non-ASCII characters and non-UTF-8 byte sequences:

$ git hash-object a.out -w | xargs git update-ref refs/myblobs/aoutblob
$ git for-each-ref --format="name=%(raw)" refs/myblobs/aoutblob --python | python2
  File "<stdin>", line 1
SyntaxError: Non-ASCII character '\x8b' in file <stdin> on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

$ git for-each-ref --format="name=%(raw)" refs/myblobs/aoutblob --python | python3
SyntaxError: Non-UTF-8 code starting with '\x8b' in file <stdin> on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

> > +The raw data in a object is `raw`, For commit and tag objects, `raw` contain
>
> s/contain/contains/, but more importantly, as we are not introducing
> %(header), I do not see why we want to talk about its details.  For
> commits and tags, just like for trees and blobs, 'raw' is the raw
> data in the object, so beyond "The raw data of a object is %(raw)",
> I do not think there is anything to talk about.
>

Ok, I will delete this part.

> > +     const char *s = buf->buf;
> > +     size_t cur_len = 0;
> > +
> > +     while ((cur_len != buf->len) && (isspace(*s) || *s == '\0')) {
> >               s++;
> > +             cur_len++;
>
> Is NUL treated the same as a whitespace letter for the purpose of
> determining if a line is empty?  WHY?
>

Well, there seems to be no correction here. But is it true that memory
like "\0abc" is considered empty?

Thanks.
--
ZheNing Hu


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-30 13:01   ` [PATCH v2 2/2] " ZheNing Hu via GitGitGadget
  2021-05-31  0:44     ` Junio C Hamano
  2021-05-31  4:04     ` Junio C Hamano
@ 2021-05-31  4:10     ` Junio C Hamano
  2021-05-31 15:41     ` Felipe Contreras
  3 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2021-05-31  4:10 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -530,6 +545,7 @@ static struct {
>  	{ "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
>  	{ "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
>  	{ "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
> +	{ "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
>  	{ "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
>  	{ "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
>  	{ "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },

Doesn't this conflict with your own zh/ref-filter-atom-type topic?
Shouldn't one build on top of the other?

Or did we find something fundamentally broken about the other topic
to make us retract it that I do not remember?

Thanks.


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-30 13:01   ` [PATCH v2 2/2] " ZheNing Hu via GitGitGadget
  2021-05-31  0:44     ` Junio C Hamano
@ 2021-05-31  4:04     ` Junio C Hamano
  2021-05-31 14:40       ` ZheNing Hu
  2021-05-31  4:10     ` Junio C Hamano
  2021-05-31 15:41     ` Felipe Contreras
  3 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-05-31  4:04 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -530,6 +545,7 @@ static struct {
>  	{ "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
>  	{ "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
>  	{ "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
> +	{ "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
>  	{ "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
>  	{ "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
>  	{ "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },

Doesn't this conflict with your own zh/ref-filter-atom-type topic?
Shouldn't one build on top of the other?

Or did we find something fundamentally broken about the other topic
to make us retract it that I do not remember?

Thanks.


* Re: [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-30 13:01   ` [PATCH v2 2/2] " ZheNing Hu via GitGitGadget
@ 2021-05-31  0:44     ` Junio C Hamano
  2021-05-31 14:35       ` ZheNing Hu
  2021-05-31  4:04     ` Junio C Hamano
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2021-05-31  0:44 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Beyond, `--format=%(raw)` cannot be used with `--python`, `--shell`,
> `--tcl`, `--perl` because if our binary raw data is passed to a variable
> in the host language, the host languages may cause escape errors.

"may cause escape errors" just says you are not escaping correctly
in your code (implying that this patch is not good enough and with
more effort we should be able to fix it to allow binaries), but the
problem is the host languages may not support binaries
(specifically, anything with a NUL in it) at all, which is
fundamentally unfixable, in which case, rejecting is the only
sensible choice.

    ... because the host language may not support a NUL in the variables
    of its string type.

> +The raw data in a object is `raw`, For commit and tag objects, `raw` contain

s/contain/contains/, but more importantly, as we are not introducing
%(header), I do not see why we want to talk about its details.  For
commits and tags, just like for trees and blobs, 'raw' is the raw
data in the object, so beyond "The raw data of a object is %(raw)",
I do not think there is anything to talk about.

> +`header` and `contents` two parts, `header` is structured part of raw data, it
> +composed of "tree XXX", "parent YYY", etc lines in commits , or composed of
> +"object OOO", "type TTT", etc lines in tags; `contents` is unstructured "free
> +text" part of raw object data. For blob and tree objects, their raw data don't
> +have `header` and `contents` parts.


> +	const char *s = buf->buf;
> +	size_t cur_len = 0;
> +
> +	while ((cur_len != buf->len) && (isspace(*s) || *s == '\0')) {
>  		s++;
> +		cur_len++;

Is NUL treated the same as a whitespace letter for the purpose of
determining if a line is empty?  WHY?



* [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom
  2021-05-30 13:01 ` [PATCH v2 " ZheNing Hu via GitGitGadget
@ 2021-05-30 13:01   ` ZheNing Hu via GitGitGadget
  2021-05-31  0:44     ` Junio C Hamano
                       ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2021-05-30 13:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Christian Couder, Hariom Verma, Karthik Nayak,
	Felipe Contreras, Bagas Sanjaya, Jeff King, Phillip Wood,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Add a new formatting option, `%(raw)`, which prints the raw
object data without any changes. It will further help migrate
all cat-file formatting logic from cat-file to ref-filter.

The raw data of blob and tree objects may contain '\0', but most of
the logic in `ref-filter` depends on the output of the atom being
text (specifically, no embedded NULs in it).

E.g. `quote_formatting()` uses `strbuf_addstr()` or `*_quote_buf()`
to add the data to the buffer. The raw data of a tree object is
`100644 one\0...`, so only the `100644 one` part will be added to the
buffer, which is incorrect.

Therefore, add a new member to `struct atom_value`: `s_size`, which
records the raw object size. It helps us add raw object data to
the buffer or compare two buffers which contain raw object data.
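
To illustrate the difference between the two kinds of append, here is a
small standalone sketch. `struct tiny_buf`, `tiny_add()` and
`tiny_addstr()` are hypothetical stand-ins invented for this example; the
patch itself uses git's `strbuf_add()` and `strbuf_addstr()`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size buffer that tracks its own length. */
struct tiny_buf {
	char data[64];
	size_t len;
};

/* Length-aware append: copies exactly len bytes, embedded NULs included. */
static void tiny_add(struct tiny_buf *b, const void *src, size_t len)
{
	assert(len <= sizeof(b->data) - b->len);
	memcpy(b->data + b->len, src, len);
	b->len += len;
}

/* String append: stops at the first NUL, as any strlen()-based copy does. */
static void tiny_addstr(struct tiny_buf *b, const char *src)
{
	tiny_add(b, src, strlen(src));
}
```

Appending a 13-byte block that starts with `100644 one\0` through
`tiny_addstr()` stores only the 10 bytes before the NUL, while
`tiny_add()` with an explicit length of 13 keeps the whole block.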

In addition, `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, `--perl` because if our binary raw data is passed to a variable
in the host language, the host languages may cause escape errors.

Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-for-each-ref.txt |  14 ++
 ref-filter.c                       | 146 ++++++++++++++++-----
 t/t6300-for-each-ref.sh            | 200 +++++++++++++++++++++++++++++
 3 files changed, 330 insertions(+), 30 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 2ae2478de706..42cef4c9617d 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -235,6 +235,20 @@ and `date` to extract the named component.  For email fields (`authoremail`,
 without angle brackets, and `:localpart` to get the part before the `@` symbol
 out of the trimmed email.
 
+The raw data in a object is `raw`, For commit and tag objects, `raw` contain
+`header` and `contents` two parts, `header` is structured part of raw data, it
+composed of "tree XXX", "parent YYY", etc lines in commits , or composed of
+"object OOO", "type TTT", etc lines in tags; `contents` is unstructured "free
+text" part of raw object data. For blob and tree objects, their raw data don't
+have `header` and `contents` parts.
+
+raw:size::
+	The raw data size of the object.
+
+Note that `--format=%(raw)` can not be used with `--python`, `--shell`, `--tcl`,
+`--perl` because if our binary raw data is passed to a variable in the host language,
+the host languages may cause escape errors.
+
 The message in a commit or a tag object is `contents`, from which
 `contents:<part>` can be used to extract various parts out of:
 
diff --git a/ref-filter.c b/ref-filter.c
index f6a7b5290d54..5211fe946b70 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -138,6 +138,9 @@ static struct used_atom {
 			struct process_trailer_options trailer_opts;
 			unsigned int nlines;
 		} contents;
+		struct {
+			enum { RAW_BARE, RAW_LENGTH } option;
+		} raw_data;
 		struct {
 			cmp_status cmp_status;
 			const char *str;
@@ -370,6 +373,18 @@ static int contents_atom_parser(const struct ref_format *format, struct used_ato
 	return 0;
 }
 
+static int raw_atom_parser(const struct ref_format *format, struct used_atom *atom,
+				const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		atom->u.raw_data.option = RAW_BARE;
+	else if (!strcmp(arg, "size"))
+		atom->u.raw_data.option = RAW_LENGTH;
+	else
+		return strbuf_addf_ret(err, -1, _("unrecognized %%(raw) argument: %s"), arg);
+	return 0;
+}
+
 static int oid_atom_parser(const struct ref_format *format, struct used_atom *atom,
 			   const char *arg, struct strbuf *err)
 {
@@ -530,6 +545,7 @@ static struct {
 	{ "body", SOURCE_OBJ, FIELD_STR, body_atom_parser },
 	{ "trailers", SOURCE_OBJ, FIELD_STR, trailers_atom_parser },
 	{ "contents", SOURCE_OBJ, FIELD_STR, contents_atom_parser },
+	{ "raw", SOURCE_OBJ, FIELD_STR, raw_atom_parser },
 	{ "upstream", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	{ "push", SOURCE_NONE, FIELD_STR, remote_ref_atom_parser },
 	{ "symref", SOURCE_NONE, FIELD_STR, refname_atom_parser },
@@ -564,12 +580,15 @@ struct ref_formatting_state {
 
 struct atom_value {
 	const char *s;
+	size_t s_size;
 	int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
 		       struct strbuf *err);
 	uintmax_t value; /* used for sorting when not FIELD_STR */
 	struct used_atom *atom;
 };
 
+#define ATOM_VALUE_S_SIZE_INIT (-1)
+
 /*
  * Used to parse format string and sort specifiers
  */
@@ -588,13 +607,6 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 		return strbuf_addf_ret(err, -1, _("malformed field name: %.*s"),
 				       (int)(ep-atom), atom);
 
-	/* Do we have the atom already used elsewhere? */
-	for (i = 0; i < used_atom_cnt; i++) {
-		int len = strlen(used_atom[i].name);
-		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
-			return i;
-	}
-
 	/*
 	 * If the atom name has a colon, strip it and everything after
 	 * it off - it specifies the format for this entry, and
@@ -604,6 +616,17 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	arg = memchr(sp, ':', ep - sp);
 	atom_len = (arg ? arg : ep) - sp;
 
+	if (format->quote_style && !strncmp(sp, "raw", 3) && !arg)
+		return strbuf_addf_ret(err, -1, _("--format=%.*s cannot be used with "
+				"--python, --shell, --tcl, --perl"), (int)(ep-atom), atom);
+
+	/* Do we have the atom already used elsewhere? */
+	for (i = 0; i < used_atom_cnt; i++) {
+		int len = strlen(used_atom[i].name);
+		if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
+			return i;
+	}
+
 	/* Is the atom a valid one? */
 	for (i = 0; i < ARRAY_SIZE(valid_atom); i++) {
 		int len = strlen(valid_atom[i].name);
@@ -652,11 +675,14 @@ static int parse_ref_filter_atom(const struct ref_format *format,
 	return at;
 }
 
-static void quote_formatting(struct strbuf *s, const char *str, int quote_style)
+static void quote_formatting(struct strbuf *s, const char *str, size_t len, int quote_style)
 {
 	switch (quote_style) {
 	case QUOTE_NONE:
-		strbuf_addstr(s, str);
+		if (len != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(s, str, len);
+		else
+			strbuf_addstr(s, str);
 		break;
 	case QUOTE_SHELL:
 		sq_quote_buf(s, str);
@@ -683,9 +709,12 @@ static int append_atom(struct atom_value *v, struct ref_formatting_state *state,
 	 * encountered.
 	 */
 	if (!state->stack->prev)
-		quote_formatting(&state->stack->output, v->s, state->quote_style);
+		quote_formatting(&state->stack->output, v->s, v->s_size, state->quote_style);
 	else
-		strbuf_addstr(&state->stack->output, v->s);
+		if (v->s_size != ATOM_VALUE_S_SIZE_INIT)
+			strbuf_add(&state->stack->output, v->s, v->s_size);
+		else
+			strbuf_addstr(&state->stack->output, v->s);
 	return 0;
 }
 
@@ -785,14 +814,16 @@ static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state
 	return 0;
 }
 
-static int is_empty(const char *s)
+static int is_empty(struct strbuf *buf)
 {
-	while (*s != '\0') {
-		if (!isspace(*s))
-			return 0;
+	const char *s = buf->buf;
+	size_t cur_len = 0;
+
+	while ((cur_len != buf->len) && (isspace(*s) || *s == '\0')) {
 		s++;
+		cur_len++;
 	}
-	return 1;
+	return cur_len == buf->len;
 }
 
 static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
@@ -800,6 +831,7 @@ static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_sta
 {
 	struct ref_formatting_stack *cur = state->stack;
 	struct if_then_else *if_then_else = NULL;
+	size_t str_len = 0;
 
 	if (cur->at_end == if_then_else_handler)
 		if_then_else = (struct if_then_else *)cur->at_end_data;
@@ -810,18 +842,22 @@ static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_sta
 	if (if_then_else->else_atom_seen)
 		return strbuf_addf_ret(err, -1, _("format: %%(then) atom used after %%(else)"));
 	if_then_else->then_atom_seen = 1;
+	if (if_then_else->str)
+		str_len = strlen(if_then_else->str);
 	/*
 	 * If the 'equals' or 'notequals' attribute is used then
 	 * perform the required comparison. If not, only non-empty
 	 * strings satisfy the 'if' condition.
 	 */
 	if (if_then_else->cmp_status == COMPARE_EQUAL) {
-		if (!strcmp(if_then_else->str, cur->output.buf))
+		if (str_len == cur->output.len &&
+		    !memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
 	} else if (if_then_else->cmp_status == COMPARE_UNEQUAL) {
-		if (strcmp(if_then_else->str, cur->output.buf))
+		if (str_len != cur->output.len ||
+		    memcmp(if_then_else->str, cur->output.buf, cur->output.len))
 			if_then_else->condition_satisfied = 1;
-	} else if (cur->output.len && !is_empty(cur->output.buf))
+	} else if (cur->output.len && !is_empty(&cur->output))
 		if_then_else->condition_satisfied = 1;
 	strbuf_reset(&cur->output);
 	return 0;
@@ -867,7 +903,7 @@ static int end_atom_handler(struct atom_value *atomv, struct ref_formatting_stat
 	 * only on the topmost supporting atom.
 	 */
 	if (!current->prev->prev) {
-		quote_formatting(&s, current->output.buf, state->quote_style);
+		quote_formatting(&s, current->output.buf, current->output.len, state->quote_style);
 		strbuf_swap(&current->output, &s);
 	}
 	strbuf_release(&s);
@@ -1293,7 +1329,7 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size
 
 /* See grab_values */
 static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
-				   struct object *obj)
+				   unsigned long buf_size, struct object *obj)
 {
 	int i;
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
@@ -1309,6 +1345,16 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf,
 		if (deref)
 			name++;
 
+		if (starts_with(name, "raw")) {
+			if (atom->u.raw_data.option == RAW_BARE) {
+				v->s = xmemdupz(buf, buf_size);
+				v->s_size = buf_size;
+			} else if (atom->u.raw_data.option == RAW_LENGTH) {
+				v->s = xstrfmt("%"PRIuMAX, (uintmax_t)buf_size);
+			}
+			continue;
+		}
+
 		if ((obj->type != OBJ_TAG &&
 		     obj->type != OBJ_COMMIT) ||
 		    (strcmp(name, "body") &&
@@ -1378,25 +1424,30 @@ static void fill_missing_values(struct atom_value *val)
  * pointed at by the ref itself; otherwise it is the object the
  * ref (which is a tag) refers to.
  */
-static void grab_values(struct atom_value *val, int deref, struct object *obj, void *buf)
+static void grab_values(struct atom_value *val, int deref, struct object *obj, struct expand_data *data)
 {
+	void *buf = data->content;
+	unsigned long buf_size = data->size;
+
 	switch (obj->type) {
 	case OBJ_TAG:
 		grab_tag_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf, obj);
+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
 		grab_person("tagger", val, deref, buf);
 		break;
 	case OBJ_COMMIT:
 		grab_commit_values(val, deref, obj);
-		grab_sub_body_contents(val, deref, buf, obj);
+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
 		grab_person("author", val, deref, buf);
 		grab_person("committer", val, deref, buf);
 		break;
 	case OBJ_TREE:
 		/* grab_tree_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
 		break;
 	case OBJ_BLOB:
 		/* grab_blob_values(val, deref, obj, buf, sz); */
+		grab_sub_body_contents(val, deref, buf, buf_size, obj);
 		break;
 	default:
 		die("Eh?  Object of type %d?", obj->type);
@@ -1618,7 +1669,7 @@ static int get_object(struct ref_array_item *ref, int deref, struct object **obj
 			return strbuf_addf_ret(err, -1, _("parse_object_buffer failed on %s for %s"),
 					       oid_to_hex(&oi->oid), ref->refname);
 		}
-		grab_values(ref->value, deref, *obj, oi->content);
+		grab_values(ref->value, deref, *obj, oi);
 	}
 
 	grab_common_values(ref->value, deref, oi);
@@ -1698,7 +1749,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 		int deref = 0;
 		const char *refname;
 		struct branch *branch = NULL;
-
+		v->s_size = ATOM_VALUE_S_SIZE_INIT;
 		v->handler = append_atom;
 		v->atom = atom;
 
@@ -2301,6 +2352,20 @@ static int compare_detached_head(struct ref_array_item *a, struct ref_array_item
 	return 0;
 }
 
+static int memcasecmp(const void *vs1, const void *vs2, size_t n)
+{
+	const char *s1 = (const void *)vs1;
+	const char *s2 = (const void *)vs2;
+	const char *end = s1 + n;
+
+	for (; s1 < end; s1++, s2++) {
+		int diff = tolower(*s1) - tolower(*s2);
+		if (diff)
+			return diff;
+	}
+	return 0;
+}
+
 static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, struct ref_array_item *b)
 {
 	struct atom_value *va, *vb;
@@ -2321,10 +2386,30 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
 	} else if (s->sort_flags & REF_SORTING_VERSION) {
 		cmp = versioncmp(va->s, vb->s);
 	} else if (cmp_type == FIELD_STR) {
-		int (*cmp_fn)(const char *, const char *);
-		cmp_fn = s->sort_flags & REF_SORTING_ICASE
-			? strcasecmp : strcmp;
-		cmp = cmp_fn(va->s, vb->s);
+		if (va->s_size == ATOM_VALUE_S_SIZE_INIT &&
+		    vb->s_size == ATOM_VALUE_S_SIZE_INIT) {
+			int (*cmp_fn)(const char *, const char *);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? strcasecmp : strcmp;
+			cmp = cmp_fn(va->s, vb->s);
+		} else {
+			size_t a_size = va->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(va->s) : va->s_size;
+			size_t b_size = vb->s_size == ATOM_VALUE_S_SIZE_INIT ?
+					strlen(vb->s) : vb->s_size;
+			int (*cmp_fn)(const void *, const void *, size_t);
+			cmp_fn = s->sort_flags & REF_SORTING_ICASE
+				? memcasecmp : memcmp;
+
+			cmp = cmp_fn(va->s, vb->s, b_size > a_size ?
+				     a_size : b_size);
+			if (!cmp) {
+				if (a_size > b_size)
+					cmp = 1;
+				else if (a_size < b_size)
+					cmp = -1;
+			}
+		}
 	} else {
 		if (va->value < vb->value)
 			cmp = -1;
@@ -2424,6 +2509,7 @@ int format_ref_array_item(struct ref_array_item *info,
 	}
 	if (format->need_color_reset_at_eol) {
 		struct atom_value resetv;
+		resetv.s_size = ATOM_VALUE_S_SIZE_INIT;
 		resetv.s = GIT_COLOR_RESET;
 		if (append_atom(&resetv, &state, error_buf)) {
 			pop_stack_element(&state.stack);
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9e0214076b4d..46703d4e512f 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -130,6 +130,8 @@ test_atom head parent:short=10 ''
 test_atom head numparent 0
 test_atom head object ''
 test_atom head type ''
+test_atom head raw "$(git cat-file commit refs/heads/main)
+"
 test_atom head '*objectname' ''
 test_atom head '*objecttype' ''
 test_atom head author 'A U Thor <author@example.com> 1151968724 +0200'
@@ -221,6 +223,15 @@ test_atom tag contents 'Tagging at 1151968727
 '
 test_atom tag HEAD ' '
 
+test_expect_success 'basic atom: refs/tags/testtag *raw' '
+	git cat-file commit refs/tags/testtag^{} >expected &&
+	git for-each-ref --format="%(*raw)" refs/tags/testtag >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'Check invalid atoms names are errors' '
 	test_must_fail git for-each-ref --format="%(INVALID)" refs/heads
 '
@@ -686,6 +697,15 @@ test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
 test_atom refs/tags/signed-empty contents "$sig"
 
+test_expect_success 'basic atom: refs/tags/signed-empty raw' '
+	git cat-file tag refs/tags/signed-empty >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-empty >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short subject:sanitize 'subject-line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
@@ -695,6 +715,15 @@ test_atom refs/tags/signed-short contents:signature "$sig"
 test_atom refs/tags/signed-short contents "subject line
 $sig"
 
+test_expect_success 'basic atom: refs/tags/signed-short raw' '
+	git cat-file tag refs/tags/signed-short >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-short >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long subject:sanitize 'subject-line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
@@ -708,6 +737,15 @@ test_atom refs/tags/signed-long contents "subject line
 body contents
 $sig"
 
+test_expect_success 'basic atom: refs/tags/signed-long raw' '
+	git cat-file tag refs/tags/signed-long >expected &&
+	git for-each-ref --format="%(raw)" refs/tags/signed-long >actual &&
+	sanitize_pgp <expected >expected.clean &&
+	sanitize_pgp <actual >actual.clean &&
+	echo "" >>expected.clean &&
+	test_cmp expected.clean actual.clean
+'
+
 test_expect_success 'set up refs pointing to tree and blob' '
 	git update-ref refs/mytrees/first refs/heads/main^{tree} &&
 	git update-ref refs/myblobs/first refs/heads/main:one
@@ -720,6 +758,16 @@ test_atom refs/mytrees/first contents:body ""
 test_atom refs/mytrees/first contents:signature ""
 test_atom refs/mytrees/first contents ""
 
+test_expect_success 'basic atom: refs/mytrees/first raw' '
+	git cat-file tree refs/mytrees/first >expected &&
+	echo "" >>expected &&
+	git for-each-ref --format="%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/mytrees/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_atom refs/myblobs/first subject ""
 test_atom refs/myblobs/first contents:subject ""
 test_atom refs/myblobs/first body ""
@@ -727,6 +775,158 @@ test_atom refs/myblobs/first contents:body ""
 test_atom refs/myblobs/first contents:signature ""
 test_atom refs/myblobs/first contents ""
 
+test_expect_success 'basic atom: refs/myblobs/first raw' '
+	git cat-file blob refs/myblobs/first >expected &&
+	echo "" >>expected &&
+	git for-each-ref --format="%(raw)" refs/myblobs/first >actual &&
+	test_cmp expected actual &&
+	git cat-file -s refs/myblobs/first >expected &&
+	git for-each-ref --format="%(raw:size)" refs/myblobs/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'set up refs pointing to binary blob' '
+	printf "%b" "a\0b\0c" >blob1 &&
+	printf "%b" "a\0c\0b" >blob2 &&
+	printf "%b" "\0a\0b\0c" >blob3 &&
+	printf "%b" "abc" >blob4 &&
+	printf "%b" "\0 \0 \0 " >blob5 &&
+	printf "%b" "\0 \0a\0 " >blob6 &&
+	>blob7 &&
+	git hash-object blob1 -w | xargs git update-ref refs/myblobs/blob1 &&
+	git hash-object blob2 -w | xargs git update-ref refs/myblobs/blob2 &&
+	git hash-object blob3 -w | xargs git update-ref refs/myblobs/blob3 &&
+	git hash-object blob4 -w | xargs git update-ref refs/myblobs/blob4 &&
+	git hash-object blob5 -w | xargs git update-ref refs/myblobs/blob5 &&
+	git hash-object blob6 -w | xargs git update-ref refs/myblobs/blob6 &&
+	git hash-object blob7 -w | xargs git update-ref refs/myblobs/blob7
+'
+
+test_expect_success 'Verify sorts with raw' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob7
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob3
+	refs/mytrees/first
+	refs/myblobs/first
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob4
+	refs/heads/main
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'Verify sorts with raw:size' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob7
+	refs/myblobs/first
+	refs/heads/main
+	refs/myblobs/blob4
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/mytrees/first
+	EOF
+	git for-each-ref --format="%(refname)" --sort=raw:size \
+		refs/heads/main refs/myblobs/ refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'validate raw atom with %(if:equals)' '
+	cat >expected <<-EOF &&
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	not equals
+	refs/myblobs/blob4
+	not equals
+	not equals
+	not equals
+	not equals
+	EOF
+	git for-each-ref --format="%(if:equals=abc)%(raw)%(then)%(refname)%(else)not equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+test_expect_success 'validate raw atom with %(if:notequals)' '
+	cat >expected <<-EOF &&
+	refs/heads/ambiguous
+	refs/heads/main
+	refs/heads/newtag
+	refs/myblobs/blob1
+	refs/myblobs/blob2
+	refs/myblobs/blob3
+	equals
+	refs/myblobs/blob5
+	refs/myblobs/blob6
+	refs/myblobs/blob7
+	refs/myblobs/first
+	EOF
+	git for-each-ref --format="%(if:notequals=abc)%(raw)%(then)%(refname)%(else)equals%(end)" \
+		refs/myblobs/ refs/heads/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'empty raw refs with %(if)' '
+	cat >expected <<-EOF &&
+	refs/myblobs/blob1 not empty
+	refs/myblobs/blob2 not empty
+	refs/myblobs/blob3 not empty
+	refs/myblobs/blob4 not empty
+	refs/myblobs/blob5 empty
+	refs/myblobs/blob6 not empty
+	refs/myblobs/blob7 empty
+	refs/myblobs/first not empty
+	EOF
+	git for-each-ref --format="%(refname) %(if)%(raw)%(then)not empty%(else)empty%(end)" \
+		refs/myblobs/ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success '%(raw) with --python must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --python
+'
+
+test_expect_success '%(raw) with --tcl must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --tcl
+'
+
+test_expect_success '%(raw) with --perl must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --perl
+'
+
+test_expect_success '%(raw) with --shell must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --shell
+'
+
+test_expect_success '%(raw) with --shell and --sort=raw must fail' '
+	test_must_fail git for-each-ref --format="%(raw)" --sort=raw --shell
+'
+
+test_expect_success '%(raw:size) with --shell' '
+	git for-each-ref --format="%(raw:size)" | while read line
+	do
+		echo "'\''$line'\''" >>expect
+	done &&
+	git for-each-ref --format="%(raw:size)" --shell >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'for-each-ref --format compare with cat-file --batch' '
+	git rev-parse refs/mytrees/first | git cat-file --batch >expected &&
+	git for-each-ref --format="%(objectname) %(objecttype) %(objectsize)
+%(raw)" refs/mytrees/first >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'set up multiple-sort tags' '
 	for when in 100000 200000
 	do
-- 
gitgitgadget
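
For readers following the FIELD_STR branch of cmp_ref_sorting() in the
patch above: because %(raw) values may contain NUL bytes, the comparison
has to be driven by explicit lengths rather than by strcmp(). The sketch
below is a standalone illustration of that idea, not code from the patch.
The names sketch_memcasecmp() and sketch_sized_cmp() are hypothetical, and
the sketch uses the libc tolower() with unsigned char input instead of
git's own ctype macros.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive variant of memcmp(). */
static int sketch_memcasecmp(const void *vs1, const void *vs2, size_t n)
{
	const unsigned char *s1 = vs1;
	const unsigned char *s2 = vs2;
	size_t i;

	for (i = 0; i < n; i++) {
		int diff = tolower(s1[i]) - tolower(s2[i]);
		if (diff)
			return diff;
	}
	return 0;
}

/*
 * Hypothetical helper: compare two buffers that may contain NUL bytes.
 * The common prefix is compared first; if it is identical, the shorter
 * buffer sorts before the longer one.
 */
static int sketch_sized_cmp(const void *a, size_t a_size,
			    const void *b, size_t b_size, int icase)
{
	size_t min = a_size < b_size ? a_size : b_size;
	int cmp = icase ? sketch_memcasecmp(a, b, min) : memcmp(a, b, min);

	if (cmp)
		return cmp;
	if (a_size > b_size)
		return 1;
	if (a_size < b_size)
		return -1;
	return 0;
}
```

With the blob contents from the 'set up refs pointing to binary blob' test,
this ordering puts the empty blob first and "a\0b\0c" before "a\0c\0b",
which matches the relative order of blob7, blob1 and blob2 expected by
'Verify sorts with raw'.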


end of thread, other threads:[~2021-06-05  8:06 UTC | newest]

Thread overview: 36+ messages
2021-06-01 14:37 [PATCH 0/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-06-01 14:37 ` [PATCH 1/2] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-06-03  2:10   ` Junio C Hamano
2021-06-03  4:52     ` ZheNing Hu
2021-06-01 14:37 ` [PATCH 2/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-06-03  2:38   ` Junio C Hamano
2021-06-03  5:36     ` ZheNing Hu
2021-06-03 14:06       ` ZheNing Hu
2021-06-03 21:36         ` Junio C Hamano
2021-06-03 21:35       ` Junio C Hamano
2021-06-04 10:59         ` ZheNing Hu
2021-06-03  5:11 ` [PATCH 0/2] " Bagas Sanjaya
2021-06-03  5:37   ` ZheNing Hu
2021-06-04 12:12 ` [PATCH v2 " ZheNing Hu via GitGitGadget
2021-06-04 12:12   ` [PATCH v2 1/2] [GSOC] ref-filter: add obj-type check in grab contents ZheNing Hu via GitGitGadget
2021-06-04 12:12   ` [PATCH v2 2/2] [GSOC] ref-filter: add %(raw) atom ZheNing Hu via GitGitGadget
2021-06-04 13:23     ` Christian Couder
2021-06-04 12:53   ` [PATCH v2 0/2] " Christian Couder
2021-06-05  4:34     ` ZheNing Hu
2021-06-05  4:49       ` Christian Couder
2021-06-05  5:42         ` ZheNing Hu
2021-06-05  6:45           ` Christian Couder
2021-06-05  8:05             ` ZheNing Hu
  -- strict thread matches above, loose matches on Subject: below --
2021-05-27 14:43 [PATCH " ZheNing Hu via GitGitGadget
2021-05-30 13:01 ` [PATCH v2 " ZheNing Hu via GitGitGadget
2021-05-30 13:01   ` [PATCH v2 2/2] " ZheNing Hu via GitGitGadget
2021-05-31  0:44     ` Junio C Hamano
2021-05-31 14:35       ` ZheNing Hu
2021-06-01  9:54         ` Junio C Hamano
2021-06-01 11:05           ` ZheNing Hu
2021-05-31  4:04     ` Junio C Hamano
2021-05-31 14:40       ` ZheNing Hu
2021-06-01  8:54         ` Junio C Hamano
2021-06-01 11:00           ` ZheNing Hu
2021-06-01 13:48             ` Johannes Schindelin
2021-05-31  4:10     ` Junio C Hamano
2021-05-31 15:41     ` Felipe Contreras
2021-06-01 10:37       ` ZheNing Hu
