All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC/PATCH v2 0/4] A new library for plumbing output
@ 2010-04-11 23:21 Julian Phillips
  2010-04-11 23:21 ` [RFC/PATCH v2 1/4] output: Add a " Julian Phillips
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Julian Phillips @ 2010-04-11 23:21 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebsk, Junio C Hamano, Eric Raymond, Sverre Rabbelier

Ok, round 2 of an attempt at making a format agnostic structured output library.
The idea being that the command doing the output doesn't have to care what the
actual output format is, it uses one set of output functions and the user gets
to choose their preferred output style.

The current backend/frontend interface probably needs expanding so that a less
noddy XML output can be used, but it's a start.

Rather than building an in-memory structure and then writing it out I've gone
for the approach of writing out immediately.  The thought behind this was that I
didn't really want to force commands like log to have to wait 'til the end to
start outputting information (though I still haven't got around to working on
converting log).

Probably the biggest change from v1 is an expanded aim.  Now the output library
is aimed at controlling _all_ plubming output.  This series includes a patch for
ls-tree that has all it's output going through the library, and a patch for
status that has all the --porcelain output going through the library.

The XML patch still needs a lot of work, as I've been busy playing with the
library API and the NORMAL/ZERO backends ...

Julian Phillips (4):
  output: Add a new library for plumbing output
  ls-tree: complete conversion to using output library
  status: use output library for porcelain output
  output: WIP: Add XML backend

 Documentation/technical/api-output.txt |  116 ++++++++++++++
 Makefile                               |    5 +
 builtin/commit.c                       |   21 +++-
 builtin/ls-tree.c                      |   51 ++++--
 output-json.c                          |  127 +++++++++++++++
 output-normal.c                        |   95 +++++++++++
 output-xml.c                           |   68 ++++++++
 output-zero.c                          |   74 +++++++++
 output.c                               |  270 ++++++++++++++++++++++++++++++++
 output.h                               |   93 +++++++++++
 wt-status.c                            |   88 ++++++++++-
 wt-status.h                            |    3 +-
 12 files changed, 985 insertions(+), 26 deletions(-)
 create mode 100644 Documentation/technical/api-output.txt
 create mode 100644 output-json.c
 create mode 100644 output-normal.c
 create mode 100644 output-xml.c
 create mode 100644 output-zero.c
 create mode 100644 output.c
 create mode 100644 output.h

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC/PATCH v2 1/4] output: Add a new library for plumbing output
  2010-04-11 23:21 [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
@ 2010-04-11 23:21 ` Julian Phillips
  2010-04-13  9:43   ` Ilari Liusvaara
  2010-04-11 23:21 ` [RFC/PATCH v2 2/4] ls-tree: complete conversion to using output library Julian Phillips
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Julian Phillips @ 2010-04-11 23:21 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebsk, Junio C Hamano, Eric Raymond, Sverre Rabbelier

Add a library that allows commands to produce output in any of a range of
formats using a single API.  The idea being that by running all plumbing
command output through this library the output can easily be switched to an
alternative output style (e.g. JSON), while still supporting the
current output formats.

The API includes an OPT_OUTPUT and handle_output_arg so that the
option handling for different commands will be as similar as possible.

Documentation for the API is included in
Documentation/technical/api-output.txt.

At the moment the only new output format is JSON.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---
 Documentation/technical/api-output.txt |  116 ++++++++++++++
 Makefile                               |    4 +
 output-json.c                          |  127 +++++++++++++++
 output-normal.c                        |   95 +++++++++++
 output-zero.c                          |   74 +++++++++
 output.c                               |  266 ++++++++++++++++++++++++++++++++
 output.h                               |   92 +++++++++++
 7 files changed, 774 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/api-output.txt
 create mode 100644 output-json.c
 create mode 100644 output-normal.c
 create mode 100644 output-zero.c
 create mode 100644 output.c
 create mode 100644 output.h

diff --git a/Documentation/technical/api-output.txt b/Documentation/technical/api-output.txt
new file mode 100644
index 0000000..a811cbe
--- /dev/null
+++ b/Documentation/technical/api-output.txt
@@ -0,0 +1,116 @@
+structured output API
+=====================
+
+The structured output API is provided by output.h and consists of a set of
+functions for outputting data in a structured manner in one of a number of
+formats (referred to as output styles).
+
+The output consists of objects, arrays and the actual values, the term item is
+used where any of these may be used, and container when either an object or
+array may be used.  Objects are unordered collections of named items, and arrays
+are ordered collections of unnamed items.  For simplicity a name is always
+supplied when creating an item - though it may not always be used (e.g. if you
+are adding the item to a list).
+
+To start using structured output `output_start` must be called, as this
+allocates a context to track what is happening and records which output style
+has been selected.  Once all the data has been output, then `output_end` must be
+called.
+
+Values are output by calling the `output_<type>` functions, and grouped by
+starting a container with either `output_start_object` or `output_start_array`.
+A container can be closed by calling `output_end_current` after which items will
+be added at the same level as the container just closed.  It is not necessary to
+close all containers before calling `output_end`, as any open containers will be
+automatically closed.
+
+There are also some functions in the API which exist to support unstructured
+output formats - primarily the existing regular and -z output formats.
+
+Functions
+---------
+
+* Option Parsing
+
+`OPT_OUTPUT`::
+
+	This is a convienience macro that allows easy integration of structured
+	output into a command using the option parser.  The caller only has to
+	supply the short and long option names, and the variable to store the
+	option string in.
+
+`handle_output_arg`::
+
+	Convert a string into a `enum output_style` value.  This allows the
+	calling command to pass the job of parsing the command line option value
+	to the output code.
+
+* Top Level
+
+`ouput_start`::
+
+	This function starts the structured output.  It takes an `enum
+	output_style` argument the defines which backend will be used to do the
+	output, and returns a context pointer that must be passed to all other
+	output functions.
+
+`output_end`::
+
+	This function finishes structured output, including closing any open
+	containers and freeing the context structure allocated by output_start.
+
+* Container Management
+
+`output_start_object`::
+
+	This starts a new object as a member of the curent container, all items
+	will then be added to this object until `object_end_current` or
+	`output_end` is called.
+
+`output_start_array`::
+
+	This starts a new array as a member of the current container, all items
+	will then be added to this array until `object_end_curent` or
+	`output_end` is called.
+
+`output_end_current`::
+
+	This closes the current container.  Any items added after this function
+	is called will be added to the parent of the container just closed.
+
+* Value Output Functions
+
+`output_<type>`::
+
+	Output a value of the given type into the current with the given name
+	(which will be ignored if the continer is an array).  Strings are
+	assumed to be UTF-8, so strings known to be in other encodings will need
+	to be converted.  Quoting of spaces etc is done by the backend code, as
+	the quoting rules are goverened by which style is chosen, so strings
+	should be unqouted.
+
+* Unstructured Output Functions
+
+`output_token`::
+
+	Output the given token.  This token is not subject to any quoting rules
+	but is just displayed as given.
+
+`output_nul`::
+
+	This is basically a dedicated version of `output_token` for outputing a
+	NUL character (which can't be passed through `output_token` since it
+	uses NUL terminatation).
+
+`output_newline`::
+
+	Output an approprite marker for the end of a line.  In the normal output
+	this is a newline character - for the -z output it will be a NUL
+	character.
+
+`output_next_directive`::
+
+	This function allows the modification of printf format directives if
+	supported by the backend (only NORMAL and ZERO currently).  The given
+	string is inserted between the "%" and the exisitng directive
+	characters.
diff --git a/Makefile b/Makefile
index 910f471..dc38730 100644
--- a/Makefile
+++ b/Makefile
@@ -576,6 +576,10 @@ LIB_OBJS += merge-recursive.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += notes.o
 LIB_OBJS += object.o
+LIB_OBJS += output.o
+LIB_OBJS += output-json.o
+LIB_OBJS += output-normal.o
+LIB_OBJS += output-zero.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-refs.o
 LIB_OBJS += pack-revindex.o
diff --git a/output-json.c b/output-json.c
new file mode 100644
index 0000000..386f010
--- /dev/null
+++ b/output-json.c
@@ -0,0 +1,127 @@
+#include "git-compat-util.h"
+#include "output.h"
+
+static void json_print_quoted(FILE *file, const char *unquoted)
+{
+	unsigned char *s = (unsigned char *)unquoted;
+
+	while (*s) {
+		switch (*s) {
+		case '"':
+			fprintf(file, "\\\"");
+			break;
+		case '\\':
+			fprintf(file, "\\\\");
+			break;
+		case '\b':
+			fprintf(file, "\\b");
+			break;
+		case '\f':
+			fprintf(file, "\\f");
+			break;
+		case '\n':
+			fprintf(file, "\\n");
+			break;
+		case '\r':
+			fprintf(file, "\\r");
+			break;
+		case '\t':
+			fprintf(file, "\\t");
+			break;
+		default:
+			/*
+			 * All control characters must be encoded, even if they
+			 * don't have a specific escape character of their own
+			 */
+			if (*s < 0x20)
+				fprintf(file, "\\u%04x", *s);
+			else
+				fprintf(file, "%c", *s);
+			break;
+		}
+		s++;
+	}
+}
+
+static void json_obj_start(FILE *file, const char *name)
+{
+	fprintf(file, "{\n");
+}
+
+static void json_obj_end(FILE *file, const char *name)
+{
+	fprintf(file, "\n}");
+}
+
+static void json_obj_item_start(FILE *file, const char *name, int first)
+{
+	if (!first)
+		fprintf(file, ",\n");
+	fprintf(file, "\"");
+	json_print_quoted(file, name);
+	fprintf(file, "\" : ");
+}
+
+static void json_array_start(FILE *file, const char *name)
+{
+	fprintf(file, "[\n");
+}
+
+static void json_array_end(FILE *file, const char *name)
+{
+	fprintf(file, "\n]");
+}
+
+static void json_array_item_start(FILE *file, const char *name, int first)
+{
+	if (!first)
+		fprintf(file, ",\n");
+}
+
+static void json_bool(FILE *file, int value)
+{
+	if (value)
+		fprintf(file, "true");
+	else
+		fprintf(file, "false");
+}
+
+static void json_str(FILE *file, const char *value)
+{
+	fprintf(file, "\"");
+	json_print_quoted(file, value);
+	fprintf(file, "\"");
+}
+
+static void json_int(FILE *file, int64_t value)
+{
+	fprintf(file, "%lld", value);
+}
+
+static void json_uint(FILE *file, uint64_t value)
+{
+	fprintf(file, "%llu", value);
+}
+
+static void json_double(FILE *file, double value, int precision)
+{
+	fprintf(file, "%.*f", precision, value);
+}
+
+struct output_ops output_json_ops = {
+	json_obj_start,
+	json_obj_end,
+	json_obj_item_start,
+	NULL,
+
+	json_array_start,
+	json_array_end,
+	json_array_item_start,
+	NULL,
+
+	json_bool,
+	json_str,
+	json_int,
+	json_uint,
+	json_double,
+};
diff --git a/output-normal.c b/output-normal.c
new file mode 100644
index 0000000..d4c570a
--- /dev/null
+++ b/output-normal.c
@@ -0,0 +1,95 @@
+#include "git-compat-util.h"
+#include "output.h"
+#include "strbuf.h"
+#include "quote.h"
+
+static const char *next_directive = "";
+static char format_string[100];
+
+static char *normal_quote(const char *s)
+{
+	struct strbuf buf = STRBUF_INIT;
+	size_t len = quote_c_style(s, &buf, NULL, 0);
+
+	if (len == 0) {
+		strbuf_release(&buf);
+		return xstrdup(s);
+	} else
+		return strbuf_detach(&buf, NULL);
+}
+
+static void normal_bool(FILE *file, int value)
+{
+	if (value)
+		fprintf(file, "true");
+	else
+		fprintf(file, "false");
+}
+
+static void normal_str(FILE *file, const char *value)
+{
+	char *quoted = normal_quote(value);
+	snprintf(format_string, sizeof(format_string), "%%%ss", next_directive);
+	fprintf(file, format_string, quoted);
+	next_directive = "";
+	free(quoted);
+}
+
+static void normal_int(FILE *file, int64_t value)
+{
+	snprintf(format_string, sizeof(format_string), "%%%slld", next_directive);
+	fprintf(file, format_string, value);
+	next_directive = "";
+}
+
+static void normal_uint(FILE *file, uint64_t value)
+{
+	snprintf(format_string, sizeof(format_string), "%%%sllu", next_directive);
+	fprintf(file, format_string, value);
+	next_directive = "";
+}
+
+static void normal_double(FILE *file, double value, int precision)
+{
+	snprintf(format_string, sizeof(format_string), "%%%s.*f", next_directive);
+	fprintf(file, format_string, value);
+	next_directive = "";
+}
+
+static void normal_token(FILE *file, const char *token)
+{
+	fprintf(file, "%s", token);
+}
+
+static void normal_newline(FILE *file)
+{
+	fprintf(file, "\n");
+}
+
+static void normal_next_directive(const char *directive)
+{
+	next_directive = directive;
+}
+
+struct output_ops output_normal_ops = {
+	NULL,
+	NULL,
+	NULL,
+	NULL,
+
+	NULL,
+	NULL,
+	NULL,
+	NULL,
+
+	normal_bool,
+	normal_str,
+	normal_int,
+	normal_uint,
+	normal_double,
+
+	normal_token,
+	NULL,
+	normal_newline,
+	normal_next_directive,
+};
diff --git a/output-zero.c b/output-zero.c
new file mode 100644
index 0000000..a00a3cb
--- /dev/null
+++ b/output-zero.c
@@ -0,0 +1,74 @@
+#include "git-compat-util.h"
+#include "output.h"
+
+static const char *next_directive = "";
+static char format_string[100];
+
+static void zero_bool(FILE *file, int value)
+{
+	if (value)
+		fprintf(file, "true");
+	else
+		fprintf(file, "false");
+}
+
+static void zero_str(FILE *file, const char *value)
+{
+	snprintf(format_string, sizeof(format_string), "%%%ss", next_directive);
+	fprintf(file, format_string, value);
+	next_directive = "";
+}
+
+static void zero_int(FILE *file, int64_t value)
+{
+	snprintf(format_string, sizeof(format_string), "%%%slld", next_directive);
+	fprintf(file, format_string, value);
+	next_directive = "";
+}
+
+static void zero_uint(FILE *file, uint64_t value)
+{
+	snprintf(format_string, sizeof(format_string), "%%%sllu", next_directive);
+	fprintf(file, format_string, value);
+	next_directive = "";
+}
+
+static void zero_double(FILE *file, double value, int precision)
+{
+	snprintf(format_string, sizeof(format_string), "%%%s.*f", next_directive);
+	fprintf(file, format_string, value);
+	next_directive = "";
+}
+
+static void zero_nul(FILE *file)
+{
+	fprintf(file, "%c", 0);
+}
+
+static void zero_next_directive(const char *directive)
+{
+	next_directive = directive;
+}
+
+struct output_ops output_zero_ops = {
+	NULL,
+	NULL,
+	NULL,
+	NULL,
+
+	NULL,
+	NULL,
+	NULL,
+	NULL,
+
+	zero_bool,
+	zero_str,
+	zero_int,
+	zero_uint,
+	zero_double,
+
+	zero_str,
+	zero_nul,
+	zero_nul,
+        zero_next_directive,
+};
diff --git a/output.c b/output.c
new file mode 100644
index 0000000..ac8feb1
--- /dev/null
+++ b/output.c
@@ -0,0 +1,266 @@
+#include "git-compat-util.h"
+#include "output.h"
+#include "strbuf.h"
+
+#define DEFAULT_OUTPUT_FORMAT OUTPUT_JSON
+
+extern struct output_ops output_normal_ops;
+extern struct output_ops output_zero_ops;
+extern struct output_ops output_json_ops;
+
+struct output_ops *output_ops[] = {
+	&output_normal_ops,
+	&output_zero_ops,
+	&output_json_ops,
+};
+
+enum output_style handle_output_arg(char *s)
+{
+	if (!s)
+		return OUTPUT_NORMAL;
+	else if (!strcmp(s, "no"))
+		return OUTPUT_NORMAL;
+	else if (!strcmp(s, "zero"))
+		return OUTPUT_ZERO;
+	else if (!strcmp(s, "json"))
+		return OUTPUT_JSON;
+	else
+		die("Invalid output style '%s'", s);
+}
+
+struct output_context *output_start(enum output_style style)
+{
+	struct output_context *context = xcalloc(1, sizeof(*context));
+
+	context->style = style;
+	context->file = stdout;
+	context->ops = output_ops[style];
+
+	output_start_object(context, "git");
+
+	return context;
+}
+
+void output_end(struct output_context *context)
+{
+	while(context->current)
+		output_end_current(context);
+
+	/*
+	 * OUTPUT_NORMAL and OUTPUT_ZERO are special cases - the output format
+	 * is _already_ defined so we have to stick to the rules, we can't add
+	 * _anything_
+	 */
+	if (context->style > OUTPUT_ZERO)
+		fprintf(context->file, "\n");
+
+	free(context);
+}
+
+
+static void item_start(struct output_context *context, const char *name)
+{
+	if (!context->current)
+		return;
+
+	switch (context->current->type) {
+	case OUTPUT_ITEM_OBJECT:
+		if (context->ops->obj_item_start)
+			context->ops->obj_item_start(context->file, name,
+						     context->current->first);
+		break;
+	case OUTPUT_ITEM_ARRAY:
+		if (context->ops->array_item_start)
+			context->ops->array_item_start(context->file, name,
+						       context->current->first);
+		break;
+	}
+}
+
+static void item_end(struct output_context *context, const char *name)
+{
+	if (!context->current)
+		return;
+
+	switch (context->current->type) {
+	case OUTPUT_ITEM_OBJECT:
+		if (context->ops->obj_item_end)
+			context->ops->obj_item_end(context->file, name,
+						   context->current->first);
+		break;
+	case OUTPUT_ITEM_ARRAY:
+		if (context->ops->array_item_end)
+			context->ops->array_item_end(context->file, name,
+						     context->current->first);
+		break;
+	}
+
+	context->current->first = 0;
+}
+
+
+void output_new_current(struct output_context *context, const char *name,
+			enum output_item_type type)
+{
+	struct output_item *item = xmallocz(sizeof(*item));
+
+	item->name = xstrdup(name);
+	item->type = type;
+	item->first = 1;
+
+	item->prev = context->current;
+	context->current = item;
+}
+
+void output_start_object(struct output_context *context, const char *name)
+{
+	item_start(context, name);
+
+	output_new_current(context, name, OUTPUT_ITEM_OBJECT);
+
+	if (context->ops->obj_start)
+		context->ops->obj_start(context->file, name);
+}
+
+void output_start_array(struct output_context *context, const char *name)
+{
+	item_start(context, name);
+
+	output_new_current(context, name, OUTPUT_ITEM_ARRAY);
+
+	if (context->ops->array_start)
+		context->ops->array_start(context->file, name);
+}
+
+void output_end_current(struct output_context *context)
+{
+	struct output_item *item = context->current;
+
+	switch (item->type) {
+	case OUTPUT_ITEM_OBJECT:
+		if (context->ops->obj_end)
+			context->ops->obj_end(context->file, item->name);
+		break;
+	case OUTPUT_ITEM_ARRAY:
+		if (context->ops->array_end)
+			context->ops->array_end(context->file, item->name);
+		break;
+	}
+
+	item = context->current;
+	context->current = context->current->prev;
+
+	item_end(context, item->name);
+
+	free(item->name);
+	free(item);
+}
+
+
+void output_bool(struct output_context *context, const char *name, int value)
+{
+	item_start(context, name);
+	if (context->ops->bool)
+		context->ops->bool(context->file, value);
+	item_end(context, name);
+}
+
+void output_str(struct output_context *context, const char *name, const char *value)
+{
+	item_start(context, name);
+	if (context->ops->str)
+		context->ops->str(context->file, value);
+	item_end(context, name);
+}
+
+void output_strf(struct output_context *context, const char *name, const char *fmt, ...)
+{
+	static char buffer[100];
+	int len;
+	char *value = buffer;
+	va_list ap;
+
+	va_start(ap, fmt);
+	len = vsnprintf(buffer, sizeof(buffer), fmt, ap);
+	va_end(ap);
+
+	if (len >= sizeof(buffer)) {
+		value = xmalloc(len + 1);
+		va_start(ap, fmt);
+		vsnprintf(value, len + 1, fmt, ap);
+		va_end(ap);
+	}
+
+	output_str(context, name, value);
+
+	if (value != buffer)
+		free(value);
+}
+
+void output_int(struct output_context *context, const char *name, int64_t value)
+{
+	item_start(context, name);
+	if (context->ops->int_)
+		context->ops->int_(context->file, value);
+	item_end(context, name);
+}
+
+void output_uint(struct output_context *context, const char *name, uint64_t value)
+{
+	item_start(context, name);
+	if (context->ops->uint)
+		context->ops->uint(context->file, value);
+	item_end(context, name);
+}
+
+void output_double(struct output_context *context, const char *name, double value,
+		   int precision)
+{
+	item_start(context, name);
+	if (context->ops->double_)
+		context->ops->double_(context->file, value, precision);
+	item_end(context, name);
+}
+
+void output_token(struct output_context *context, const char *fmt, ...)
+{
+	static char buffer[100];
+	int len;
+	char *token = buffer;
+	va_list ap;
+
+	va_start(ap, fmt);
+	len = vsnprintf(buffer, sizeof(buffer), fmt, ap);
+	va_end(ap);
+
+	if (len >= sizeof(buffer)) {
+		token = xmalloc(len + 1);
+		va_start(ap, fmt);
+		vsnprintf(token, len + 1, fmt, ap);
+		va_end(ap);
+	}
+
+	if (context->ops->token)
+		context->ops->token(context->file, token);
+
+	if (token != buffer)
+		free(token);
+}
+
+void output_nul(struct output_context *context)
+{
+	if (context->ops->nul)
+		context->ops->nul(context->file);
+}
+
+void output_newline(struct output_context *context)
+{
+	if (context->ops->newline)
+		context->ops->newline(context->file);
+}
+
+void output_next_directive(struct output_context *context, const char *directive)
+{
+	if (context->ops->directive)
+		context->ops->directive(directive);
+}
diff --git a/output.h b/output.h
new file mode 100644
index 0000000..c1a09d0
--- /dev/null
+++ b/output.h
@@ -0,0 +1,92 @@
+#ifndef OUTPUT_H
+#define OUTPUT_H
+
+enum output_style {
+	OUTPUT_NORMAL,
+	OUTPUT_ZERO,
+	OUTPUT_JSON,
+};
+
+struct output_ops {
+	void (*obj_start)(FILE *file, const char *name);
+	void (*obj_end)(FILE *file, const char *name);
+	void (*obj_item_start)(FILE *file, const char *name, int first);
+	void (*obj_item_end)(FILE *file, const char *name, int first);
+
+	void (*array_start)(FILE *file, const char *name);
+	void (*array_end)(FILE *file, const char *name);
+	void (*array_item_start)(FILE *file, const char *name, int first);
+	void (*array_item_end)(FILE *file, const char *name, int first);
+
+	void (*bool)(FILE *file, int value);
+	void (*str)(FILE *file, const char *value);
+	void (*int_)(FILE *file, int64_t value);
+	void (*uint)(FILE *file, uint64_t value);
+	void (*double_)(FILE *file, double value, int precision);
+
+	/*
+	 * ops defined after this comment are for backwards compatability with
+	 * existing output formats, and shouldn't be provided by new output
+	 * styles
+	 */
+	void (*token)(FILE *file, const char *token);
+	void (*nul)(FILE *file);
+	void (*newline)(FILE *file);
+	void (*directive)(const char *directive);
+};
+
+enum output_item_type {
+	OUTPUT_ITEM_OBJECT,
+	OUTPUT_ITEM_ARRAY,
+};
+
+struct output_item {
+	struct output_item *prev;
+	char *name;
+	enum output_item_type type;
+	int first;
+};
+
+struct output_context {
+	enum output_style style;
+	FILE *file;
+	struct output_ops *ops;
+	struct output_item *current;
+};
+
+extern struct option OUTPUT_OPTION;
+
+#define OPT_OUTPUT(s, l, v) { OPTION_STRING, (s), (l), (v), "style",     \
+			      "Use a structured output style, options: " \
+			      "no, zero, json (Default: zero)",          \
+			      PARSE_OPT_OPTARG, NULL, (intptr_t)"zero" }
+
+enum output_style handle_output_arg(char *s);
+
+struct output_context *output_start(enum output_style style);
+void output_end(struct output_context *context);
+
+void output_start_object(struct output_context *context, const char *name);
+void output_start_array(struct output_context *context, const char *name);
+void output_end_current(struct output_context *context);
+
+void output_bool(struct output_context *context, const char *name, int value);
+void output_str(struct output_context *context, const char *name, const char *value);
+void output_strf(struct output_context *context, const char *name, const char *fmt, ...);
+void output_int(struct output_context *context, const char *name, int64_t value);
+void output_uint(struct output_context *context, const char *name, uint64_t value);
+void output_double(struct output_context *context, const char *name, double value,
+		   int precision);
+
+/*
+ * These functions are used to output the fixed formatting tokens needed to
+ * output exisitng -z formats.  It should only be needed when reproducing
+ * existing output.
+ */
+
+void output_token(struct output_context *context, const char *fmt, ...);
+void output_nul(struct output_context *context);
+void output_newline(struct output_context *context);
+void output_next_directive(struct output_context *context, const char *directive);
+
+#endif /* OUTPUT_H */
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC/PATCH v2 2/4] ls-tree: complete conversion to using output library
  2010-04-11 23:21 [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
  2010-04-11 23:21 ` [RFC/PATCH v2 1/4] output: Add a " Julian Phillips
@ 2010-04-11 23:21 ` Julian Phillips
  2010-04-11 23:21 ` [RFC/PATCH v2 3/4] status: use output library for porcelain output Julian Phillips
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Julian Phillips @ 2010-04-11 23:21 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebsk, Junio C Hamano, Eric Raymond, Sverre Rabbelier

All output from ls-tree now goes through the output library - even the
regular output.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---
 builtin/ls-tree.c |   51 +++++++++++++++++++++++++++++++++------------------
 1 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index dc86b0d..7e19d19 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -10,8 +10,8 @@
 #include "quote.h"
 #include "builtin.h"
 #include "parse-options.h"
+#include "output.h"
 
-static int line_termination = '\n';
 #define LS_RECURSIVE 1
 #define LS_TREE_ONLY 2
 #define LS_SHOW_TREES 4
@@ -22,6 +22,7 @@ static int ls_options;
 static const char **pathspec;
 static int chomp_prefix;
 static const char *ls_tree_prefix;
+static struct output_context *oc;
 
 static const  char * const ls_tree_usage[] = {
 	"git ls-tree [<options>] <tree-ish> [path...]",
@@ -90,27 +91,32 @@ static int show_tree(const unsigned char *sha1, const char *base, int baselen,
 	    (baselen < chomp_prefix || memcmp(ls_tree_prefix, base, chomp_prefix)))
 		return 0;
 
+	output_start_object(oc, "entry");
 	if (!(ls_options & LS_NAME_ONLY)) {
+		output_strf(oc, "mode", "%06o", mode);
+		output_token(oc, " ");
+		output_str(oc, "type", type);
+		output_token(oc, " ");
+		output_str(oc, "hash", find_unique_abbrev(sha1, abbrev));
 		if (ls_options & LS_SHOW_SIZE) {
-			char size_text[24];
-			if (!strcmp(type, blob_type)) {
+			if (strcmp(type, blob_type)) {
+				output_token(oc, "       -");
+			} else {
 				unsigned long size;
+				output_token(oc, " ");
+				output_next_directive(oc, "7");
 				if (sha1_object_info(sha1, &size) == OBJ_BAD)
-					strcpy(size_text, "BAD");
+					output_str(oc, "size", "BAD");
 				else
-					snprintf(size_text, sizeof(size_text),
-						 "%lu", size);
-			} else
-				strcpy(size_text, "-");
-			printf("%06o %s %s %7s\t", mode, type,
-			       find_unique_abbrev(sha1, abbrev),
-			       size_text);
-		} else
-			printf("%06o %s %s\t", mode, type,
-			       find_unique_abbrev(sha1, abbrev));
+					output_uint(oc, "size", size);
+			}
+		}
+		output_token(oc, "\t");
 	}
-	write_name_quotedpfx(base + chomp_prefix, baselen - chomp_prefix,
-			  pathname, stdout, line_termination);
+	output_strf(oc, "path", "%.*s%s", baselen - chomp_prefix,
+		    base + chomp_prefix, pathname);
+	output_newline(oc);
+	output_end_current(oc);
 	return retval;
 }
 
@@ -119,6 +125,8 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	unsigned char sha1[20];
 	struct tree *tree;
 	int full_tree = 0;
+	char *structured_output_arg = NULL;
+	enum output_style output_style;
 	const struct option ls_tree_options[] = {
 		OPT_BIT('d', NULL, &ls_options, "only show trees",
 			LS_TREE_ONLY),
@@ -126,8 +134,6 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 			LS_RECURSIVE),
 		OPT_BIT('t', NULL, &ls_options, "show trees when recursing",
 			LS_SHOW_TREES),
-		OPT_SET_INT('z', NULL, &line_termination,
-			    "terminate entries with NUL byte", 0),
 		OPT_BIT('l', "long", &ls_options, "include object size",
 			LS_SHOW_SIZE),
 		OPT_BIT(0, "name-only", &ls_options, "list only filenames",
@@ -140,6 +146,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 			    "list entire tree; not just current directory "
 			    "(implies --full-name)"),
 		OPT__ABBREV(&abbrev),
+		OPT_OUTPUT('z', "output", &structured_output_arg),
 		OPT_END()
 	};
 
@@ -159,6 +166,8 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	    ((LS_TREE_ONLY|LS_RECURSIVE) & ls_options))
 		ls_options |= LS_SHOW_TREES;
 
+	output_style = handle_output_arg(structured_output_arg);
+
 	if (argc < 1)
 		usage_with_options(ls_tree_usage, ls_tree_options);
 	if (get_sha1(argv[0], sha1))
@@ -168,7 +177,13 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 	tree = parse_tree_indirect(sha1);
 	if (!tree)
 		die("not a tree object");
+
+	oc = output_start(output_style);
+	output_start_array(oc, "entries");
+
 	read_tree_recursive(tree, "", 0, 0, pathspec, show_tree, NULL);
 
+	output_end(oc);
+
 	return 0;
 }
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC/PATCH v2 3/4] status: use output library for porcelain output
  2010-04-11 23:21 [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
  2010-04-11 23:21 ` [RFC/PATCH v2 1/4] output: Add a " Julian Phillips
  2010-04-11 23:21 ` [RFC/PATCH v2 2/4] ls-tree: complete conversion to using output library Julian Phillips
@ 2010-04-11 23:21 ` Julian Phillips
  2010-04-11 23:21 ` [RFC/PATCH v2 4/4] output: WIP: Add XML backend Julian Phillips
  2010-04-11 23:35 ` [RFC/PATCH v2 0/4] A new library for plumbing output Sverre Rabbelier
  4 siblings, 0 replies; 28+ messages in thread
From: Julian Phillips @ 2010-04-11 23:21 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebsk, Junio C Hamano, Eric Raymond, Sverre Rabbelier

Use the new output library for output when in porcelain mode.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---
 builtin/commit.c |   21 +++++++++++-
 wt-status.c      |   88 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 wt-status.h      |    3 +-
 3 files changed, 104 insertions(+), 8 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index c5ab683..4e506b1 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -25,6 +25,7 @@
 #include "rerere.h"
 #include "unpack-trees.h"
 #include "quote.h"
+#include "output.h"
 
 static const char * const builtin_commit_usage[] = {
 	"git commit [options] [--] <filepattern>...",
@@ -68,6 +69,8 @@ static int all, edit_flag, also, interactive, only, amend, signoff;
 static int quiet, verbose, no_verify, allow_empty, dry_run, renew_authorship;
 static int no_post_rewrite;
 static char *untracked_files_arg, *force_date;
+static char *structured_output_arg;
+static enum output_style output_style;
 /*
  * The default commit message cleanup mode will remove the lines
  * beginning with # (shell comments) and leading and trailing
@@ -137,6 +140,7 @@ static struct option builtin_commit_options[] = {
 		    "show porcelain output format", STATUS_FORMAT_PORCELAIN),
 	OPT_BOOLEAN('z', "null", &null_termination,
 		    "terminate entries with NUL"),
+	OPT_OUTPUT(0, "output", &structured_output_arg),
 	OPT_BOOLEAN(0, "amend", &amend, "amend previous commit"),
 	OPT_BOOLEAN(0, "no-post-rewrite", &no_post_rewrite, "bypass post-rewrite hook"),
 	{ OPTION_STRING, 'u', "untracked-files", &untracked_files_arg, "mode", "show untracked files, optional modes: all, normal, no. (Default: all)", PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
@@ -420,7 +424,7 @@ static int run_status(FILE *fp, const char *index_file, const char *prefix, int
 		wt_shortstatus_print(s, null_termination);
 		break;
 	case STATUS_FORMAT_PORCELAIN:
-		wt_porcelain_print(s, null_termination);
+		wt_porcelain_print(s, output_style);
 		break;
 	case STATUS_FORMAT_LONG:
 		wt_status_print(s);
@@ -933,6 +937,11 @@ static int parse_and_validate_options(int argc, const char *argv[],
 
 	if (null_termination && status_format == STATUS_FORMAT_LONG)
 		status_format = STATUS_FORMAT_PORCELAIN;
+	output_style = handle_output_arg(structured_output_arg);
+	if (output_style == OUTPUT_NORMAL && null_termination)
+		output_style = OUTPUT_ZERO;
+	if (output_style != OUTPUT_NORMAL && status_format == STATUS_FORMAT_LONG)
+		status_format = STATUS_FORMAT_PORCELAIN;
 	if (status_format != STATUS_FORMAT_LONG)
 		dry_run = 1;
 
@@ -1031,6 +1040,7 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		  "mode",
 		  "show untracked files, optional modes: all, normal, no. (Default: all)",
 		  PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
+		OPT_OUTPUT(0, "output", &structured_output_arg),
 		OPT_END(),
 	};
 
@@ -1045,6 +1055,13 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 			     builtin_status_usage, 0);
 	handle_untracked_files_arg(&s);
 
+	output_style = handle_output_arg(structured_output_arg);
+
+	if (output_style == OUTPUT_NORMAL && null_termination)
+		output_style = OUTPUT_ZERO;
+	if (output_style != OUTPUT_NORMAL && status_format == STATUS_FORMAT_LONG)
+		status_format = STATUS_FORMAT_PORCELAIN;
+
 	if (*argv)
 		s.pathspec = get_pathspec(prefix, argv);
 
@@ -1066,7 +1083,7 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 		wt_shortstatus_print(&s, null_termination);
 		break;
 	case STATUS_FORMAT_PORCELAIN:
-		wt_porcelain_print(&s, null_termination);
+		wt_porcelain_print(&s, output_style);
 		break;
 	case STATUS_FORMAT_LONG:
 		s.verbose = verbose;
diff --git a/wt-status.c b/wt-status.c
index 8ca59a2..2a1e0fe 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -743,10 +743,88 @@ void wt_shortstatus_print(struct wt_status *s, int null_termination)
 	}
 }
 
-void wt_porcelain_print(struct wt_status *s, int null_termination)
+static void wt_porcelain_unmerged(struct string_list_item *it,
+				  struct output_context *oc)
 {
-	s->use_color = 0;
-	s->relative_paths = 0;
-	s->prefix = NULL;
-	wt_shortstatus_print(s, null_termination);
+	struct wt_status_change_data *d = it->util;
+	char *ours = "?", *theirs = "?";
+
+	switch (d->stagemask) {
+	case 1: ours = "D"; theirs = "D"; break; /* both deleted */
+	case 2: ours = "A"; theirs = "U"; break; /* added by us */
+	case 3: ours = "U"; theirs = "D"; break; /* deleted by them */
+	case 4: ours = "U"; theirs = "A"; break; /* added by them */
+	case 5: ours = "D"; theirs = "U"; break; /* deleted by us */
+	case 6: ours = "A"; theirs = "A"; break; /* both added */
+	case 7: ours = "U"; theirs = "U"; break; /* both modified */
+	}
+
+	output_start_object(oc, "entry");
+	output_str(oc, "ours", ours);
+	output_str(oc, "theirs", theirs);
+	output_token(oc, " ");
+	output_str(oc, "name", it->string);
+	output_newline(oc);
+	output_end_current(oc);
+}
+
+
+static void wt_porcelain_status(struct string_list_item *it,
+				struct output_context *oc)
+{
+	struct wt_status_change_data *d = it->util;
+	char index = ' ', worktree = ' ';
+
+	if (d->index_status)
+		index = d->index_status;
+	if (d->worktree_status)
+		worktree = d->worktree_status;
+
+	output_start_object(oc, "entry");
+	output_strf(oc, "index", "%c", index);
+	output_strf(oc, "worktree", "%c", worktree);
+	output_token(oc, " ");
+	if (d->head_path && oc->style == OUTPUT_NORMAL) {
+		output_str(oc, "orig_name", d->head_path);
+		output_token(oc, " -> ");
+	}
+	output_str(oc, "name", it->string);
+	if (d->head_path && oc->style != OUTPUT_NORMAL) {
+		output_nul(oc);
+		output_str(oc, "orig_name", d->head_path);
+	}
+	output_newline(oc);
+	output_end_current(oc);
+}
+
+void wt_porcelain_print(struct wt_status *s, enum output_style style)
+{
+	int i;
+	struct output_context *oc = output_start(style);
+
+	output_start_array(oc, "entries");
+
+	for (i = 0; i < s->change.nr; i++) {
+		struct wt_status_change_data *d;
+		struct string_list_item *it;
+
+		it = &(s->change.items[i]);
+		d = it->util;
+		if (d->stagemask)
+			wt_porcelain_unmerged(it, oc);
+		else
+			wt_porcelain_status(it, oc);
+	}
+
+	for (i = 0; i < s->untracked.nr; i++) {
+		output_start_object(oc, "entry");
+		output_str(oc, "index", "?");
+		output_str(oc, "worktree", "?");
+		output_token(oc, " ");
+		output_str(oc, "name", s->untracked.items[i].string);
+		output_newline(oc);
+		output_end_current(oc);
+	}
+
+	output_end(oc);
 }
diff --git a/wt-status.h b/wt-status.h
index 9120673..4461c64 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -4,6 +4,7 @@
 #include <stdio.h>
 #include "string-list.h"
 #include "color.h"
+#include "output.h"
 
 enum color_wt_status {
 	WT_STATUS_HEADER = 0,
@@ -60,6 +61,6 @@ void wt_status_print(struct wt_status *s);
 void wt_status_collect(struct wt_status *s);
 
 void wt_shortstatus_print(struct wt_status *s, int null_termination);
-void wt_porcelain_print(struct wt_status *s, int null_termination);
+void wt_porcelain_print(struct wt_status *s, enum output_style style);
 
 #endif /* STATUS_H */
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC/PATCH v2 4/4] output: WIP: Add XML backend
  2010-04-11 23:21 [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
                   ` (2 preceding siblings ...)
  2010-04-11 23:21 ` [RFC/PATCH v2 3/4] status: use output library for porcelain output Julian Phillips
@ 2010-04-11 23:21 ` Julian Phillips
  2010-04-11 23:35 ` [RFC/PATCH v2 0/4] A new library for plumbing output Sverre Rabbelier
  4 siblings, 0 replies; 28+ messages in thread
From: Julian Phillips @ 2010-04-11 23:21 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebsk, Junio C Hamano, Eric Raymond, Sverre Rabbelier

This is the very beginnings of an XML style for the output library.
It still needs a lot of work.  There is no quoting, and the layout
still needs designing.  At this point the main purpose of this code is
to exercise the frontend/backend API in a different way from JSON.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---
 Makefile     |    1 +
 output-xml.c |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 output.c     |    4 +++
 output.h     |    3 +-
 4 files changed, 75 insertions(+), 1 deletions(-)
 create mode 100644 output-xml.c

diff --git a/Makefile b/Makefile
index dc38730..c84d7de 100644
--- a/Makefile
+++ b/Makefile
@@ -579,6 +579,7 @@ LIB_OBJS += object.o
 LIB_OBJS += output.o
 LIB_OBJS += output-json.o
 LIB_OBJS += output-normal.o
+LIB_OBJS += output-xml.o
 LIB_OBJS += output-zero.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-refs.o
diff --git a/output-xml.c b/output-xml.c
new file mode 100644
index 0000000..9c98ac9
--- /dev/null
+++ b/output-xml.c
@@ -0,0 +1,68 @@
+#include "git-compat-util.h"
+#include "output.h"
+
+static void xml_obj_start(FILE *file, const char *name)
+{
+	fprintf(file, "<object name=\"%s\">\n", name);
+}
+
+static void xml_obj_end(FILE *file, const char *name)
+{
+	fprintf(file, "</object>\n");
+}
+
+static void xml_obj_item_start(FILE *file, const char *name, int first)
+{
+	fprintf(file, "<%s>", name);
+}
+
+static void xml_obj_item_end(FILE *file, const char *name, int first)
+{
+	fprintf(file, "</%s>\n", name);
+}
+
+static void xml_bool(FILE *file, int value)
+{
+	if (value)
+		fprintf(file, "true");
+	else
+		fprintf(file, "false");
+}
+
+static void xml_str(FILE *file, const char *value)
+{
+	fprintf(file, "\"%s\"", value);
+}
+
+static void xml_int(FILE *file, int64_t value)
+{
+	fprintf(file, "%lld", value);
+}
+
+static void xml_uint(FILE *file, uint64_t value)
+{
+	fprintf(file, "%llu", value);
+}
+
+static void xml_double(FILE *file, double value, int precision)
+{
+	fprintf(file, "%.*f", precision, value);
+}
+
+struct output_ops output_xml_ops = {
+	xml_obj_start,
+	xml_obj_end,
+	xml_obj_item_start,
+	xml_obj_item_end,
+
+	NULL,
+	NULL,
+	NULL,
+	NULL,
+
+	xml_bool,
+	xml_str,
+	xml_int,
+	xml_uint,
+	xml_double,
+};
diff --git a/output.c b/output.c
index ac8feb1..3be1560 100644
--- a/output.c
+++ b/output.c
@@ -7,11 +7,13 @@
 extern struct output_ops output_normal_ops;
 extern struct output_ops output_zero_ops;
 extern struct output_ops output_json_ops;
+extern struct output_ops output_xml_ops;
 
 struct output_ops *output_ops[] = {
 	&output_normal_ops,
 	&output_zero_ops,
 	&output_json_ops,
+	&output_xml_ops,
 };
 
 enum output_style handle_output_arg(char *s)
@@ -24,6 +26,8 @@ enum output_style handle_output_arg(char *s)
 		return OUTPUT_ZERO;
 	else if (!strcmp(s, "json"))
 		return OUTPUT_JSON;
+	else if (!strcmp(s, "xml"))
+		return OUTPUT_XML;
 	else
 		die("Invalid output style '%s'", s);
 }
diff --git a/output.h b/output.h
index c1a09d0..cc0b921 100644
--- a/output.h
+++ b/output.h
@@ -5,6 +5,7 @@ enum output_style {
 	OUTPUT_NORMAL,
 	OUTPUT_ZERO,
 	OUTPUT_JSON,
+	OUTPUT_XML,
 };
 
 struct output_ops {
@@ -58,7 +59,7 @@ extern struct option OUTPUT_OPTION;
 
 #define OPT_OUTPUT(s, l, v) { OPTION_STRING, (s), (l), (v), "style",     \
 			      "Use a structured output style, options: " \
-			      "no, zero, json (Default: zero)",          \
+			      "no, zero, json, xml (Default: zero)",     \
 			      PARSE_OPT_OPTARG, NULL, (intptr_t)"zero" }
 
 enum output_style handle_output_arg(char *s);
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-11 23:21 [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
                   ` (3 preceding siblings ...)
  2010-04-11 23:21 ` [RFC/PATCH v2 4/4] output: WIP: Add XML backend Julian Phillips
@ 2010-04-11 23:35 ` Sverre Rabbelier
  2010-04-12  0:46   ` Eric Raymond
  2010-04-14 19:10   ` Jakub Narebski
  4 siblings, 2 replies; 28+ messages in thread
From: Sverre Rabbelier @ 2010-04-11 23:35 UTC (permalink / raw)
  To: Julian Phillips; +Cc: git, Jakub Narebsk, Junio C Hamano, Eric Raymond

Heya,

On Mon, Apr 12, 2010 at 01:21, Julian Phillips <julian@quantumfyre.co.uk> wrote:
> Probably the biggest change from v1 is an expanded aim.  Now the output library
> is aimed at controlling _all_ plubming output.  This series includes a patch for
> ls-tree that has all it's output going through the library, and a patch for
> status that has all the --porcelain output going through the library.

I like where this is going, a lot, especially since we don't have to
convert everything in one go, but we can do it as desired, similar to
optparsification. I still think more commands than just these two
should be converted to validate the design though, perhaps something
like 'git blame', or 'git for-each-ref'?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-11 23:35 ` [RFC/PATCH v2 0/4] A new library for plumbing output Sverre Rabbelier
@ 2010-04-12  0:46   ` Eric Raymond
  2010-04-14 19:10   ` Jakub Narebski
  1 sibling, 0 replies; 28+ messages in thread
From: Eric Raymond @ 2010-04-12  0:46 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: Julian Phillips, git, Jakub Narebsk, Junio C Hamano

Sverre Rabbelier <srabbelier@gmail.com>:
> On Mon, Apr 12, 2010 at 01:21, Julian Phillips <julian@quantumfyre.co.uk> wrote:
> > Probably the biggest change from v1 is an expanded aim.  Now the
> > output library is aimed at controlling _all_ plubming
> > output.  This series includes a patch for ls-tree that has all
> > it's output going through the library, and a patch for status that
> > has all the --porcelain output going through the library.
> 
> I like where this is going, a lot, especially since we don't have to
> convert everything in one go, but we can do it as desired, similar to
> optparsification.

Speaking as a major customer for the new capabilities it will enable,
I strongly concur.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 1/4] output: Add a new library for plumbing output
  2010-04-11 23:21 ` [RFC/PATCH v2 1/4] output: Add a " Julian Phillips
@ 2010-04-13  9:43   ` Ilari Liusvaara
  2010-04-13 11:46     ` Julian Phillips
  0 siblings, 1 reply; 28+ messages in thread
From: Ilari Liusvaara @ 2010-04-13  9:43 UTC (permalink / raw)
  To: Julian Phillips
  Cc: git, Jakub Narebsk, Junio C Hamano, Eric Raymond, Sverre Rabbelier

On Mon, Apr 12, 2010 at 12:21:14AM +0100, Julian Phillips wrote:

I'm writing S-Expression output backend as experiment (not yet even sendable
as WIP) and hit an issue in general framework...

Also, some comments on documentation...

> +The output consists of objects, arrays and the actual values, the term item is
> +used where any of these may be used, and container when either an object or
> +array may be used.  Objects are unordered collections of named items, and arrays
> +are ordered collections of unnamed items.  For simplicity a name is always
> +supplied when creating an item - though it may not always be used (e.g. if you
> +are adding the item to a list).

List? Above says types are 'object', 'array' and 'value'. Then it defines
terms 'item' and 'container'. But what is 'list'?

> +* Unstructured Output Functions

Maybe add extra note about these. When one sees output_token used in code
outputting stuff, one can get puzzled until one realizes that token output
is ignored for non normal/zero outputs.

> diff --git a/output.c b/output.c
> new file mode 100644
> index 0000000..ac8feb1
> --- /dev/null
> +++ b/output.c

> +void output_end(struct output_context *context)
> +{
> +	while(context->current)
> +		output_end_current(context);
> +
> +	/*
> +	 * OUTPUT_NORMAL and OUTPUT_ZERO are special cases - the output format
> +	 * is _already_ defined so we have to stick to the rules, we can't add
> +	 * _anything_
> +	 */
> +	if (context->style > OUTPUT_ZERO)
> +		fprintf(context->file, "\n");

This is AFAIK really inapporiate for canonical S-Expression output. Point of
canonical S-Expressions is to have only one way to serialize given tree (bit
for bit identicality) and linefeeds are not allowed except as serialization
of linefeed in string.

Perhaps one could add method/flag to output backend to tell wheither to
print trailing linefeed?

> +
> +	free(context);
> +}

-Ilari

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 1/4] output: Add a new library for plumbing output
  2010-04-13  9:43   ` Ilari Liusvaara
@ 2010-04-13 11:46     ` Julian Phillips
  0 siblings, 0 replies; 28+ messages in thread
From: Julian Phillips @ 2010-04-13 11:46 UTC (permalink / raw)
  To: Ilari Liusvaara
  Cc: git, Jakub Narebsk, Junio C Hamano, Eric Raymond, Sverre Rabbelier

On Tue, 13 Apr 2010 12:43:51 +0300, Ilari Liusvaara
<ilari.liusvaara@elisanet.fi> wrote:
> On Mon, Apr 12, 2010 at 12:21:14AM +0100, Julian Phillips wrote:
> 
> I'm writing S-Expression output backend as experiment (not yet even
> sendable
> as WIP) and hit an issue in general framework...
> 
> Also, some comments on documentation...
> 
>> +The output consists of objects, arrays and the actual values, the term
>> item is
>> +used where any of these may be used, and container when either an
>> object or
>> +array may be used.  Objects are unordered collections of named items,
>> and arrays
>> +are ordered collections of unnamed items.  For simplicity a name is
>> always
>> +supplied when creating an item - though it may not always be used
(e.g.
>> if you
>> +are adding the item to a list).
> 
> List? Above says types are 'object', 'array' and 'value'. Then it
defines
> terms 'item' and 'container'. But what is 'list'?

typo - should be array.

>> +* Unstructured Output Functions
> 
> Maybe add extra note about these. When one sees output_token used in
code
> outputting stuff, one can get puzzled until one realizes that token
output
> is ignored for non normal/zero outputs.

Yes.  I intend to revist the header and documentation, as they were mostly
done before the normal output was added.

>> diff --git a/output.c b/output.c
>> new file mode 100644
>> index 0000000..ac8feb1
>> --- /dev/null
>> +++ b/output.c
> 
>> +void output_end(struct output_context *context)
>> +{
>> +	while(context->current)
>> +		output_end_current(context);
>> +
>> +	/*
>> +	 * OUTPUT_NORMAL and OUTPUT_ZERO are special cases - the output
format
>> +	 * is _already_ defined so we have to stick to the rules, we can't
add
>> +	 * _anything_
>> +	 */
>> +	if (context->style > OUTPUT_ZERO)
>> +		fprintf(context->file, "\n");
> 
> This is AFAIK really inapporiate for canonical S-Expression output.
Point
> of
> canonical S-Expressions is to have only one way to serialize given tree
> (bit
> for bit identicality) and linefeeds are not allowed except as
serialization
> of linefeed in string.
> 
> Perhaps one could add method/flag to output backend to tell wheither to
> print trailing linefeed?

I think that it probably makes sense to have explicit calls into the
backend for start and end rather than assuming that wrapping everything in
an object is appropriate, an XML backend for example could then use those
callbacks to do XML headers and the outmost element tags etc.

-- 
Julian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-11 23:35 ` [RFC/PATCH v2 0/4] A new library for plumbing output Sverre Rabbelier
  2010-04-12  0:46   ` Eric Raymond
@ 2010-04-14 19:10   ` Jakub Narebski
  2010-04-14 19:13     ` Sverre Rabbelier
                       ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Jakub Narebski @ 2010-04-14 19:10 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: Julian Phillips, git, Junio C Hamano, Eric Raymond

On Mon, 12 April 2010, Sverre Rabbelier wrote:
> 
> On Mon, Apr 12, 2010 at 01:21, Julian Phillips <julian@quantumfyre.co.uk> wrote:
> > Probably the biggest change from v1 is an expanded aim.  Now the output library
> > is aimed at controlling _all_ plubming output.  This series includes a patch for
> > ls-tree that has all it's output going through the library, and a patch for
> > status that has all the --porcelain output going through the library.
> 
> I like where this is going, a lot, especially since we don't have to
> convert everything in one go, but we can do it as desired, similar to
> optparsification. I still think more commands than just these two
> should be converted to validate the design though, perhaps something
> like 'git blame', or 'git for-each-ref'?

I don't think it is needed for either command.

'git blame' has --porcelain and --incremental output, which is line-based
and pretty much self-describing (with "header-name value" syntax for most
of it), and well documented.  JSON output would only add unnecessary
chatter and different quoting rules.

'git for-each-ref' has both --format=<format> to allow to get data what
one needs, and in the format one wants (with e.g. %00 to reresent NUL),
and [--shell|--perl|--python|--tcl] for placeholders in <format> to be
quoted as string literals suitable for specified host language.  Although
I am not sure if this option, meant to produce scriptlets, is used that
much/ note that there is not support for --json quoting, nor --xml 
escaping.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 19:10   ` Jakub Narebski
@ 2010-04-14 19:13     ` Sverre Rabbelier
  2010-04-14 21:42       ` Jakub Narebski
  2010-04-14 19:32     ` Junio C Hamano
  2010-04-14 20:57     ` [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
  2 siblings, 1 reply; 28+ messages in thread
From: Sverre Rabbelier @ 2010-04-14 19:13 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Julian Phillips, git, Junio C Hamano, Eric Raymond

Heya,

On Wed, Apr 14, 2010 at 21:10, Jakub Narebski <jnareb@gmail.com> wrote:
> I don't think it is needed for either command.

Those were just the first two plumbing commands that came to mind that
I use myself, do you have any suggestions for others that would be
more appropriate?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 19:10   ` Jakub Narebski
  2010-04-14 19:13     ` Sverre Rabbelier
@ 2010-04-14 19:32     ` Junio C Hamano
  2010-04-14 20:12       ` Jakub Narebski
  2010-04-14 20:57     ` [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
  2 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2010-04-14 19:32 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Sverre Rabbelier, Julian Phillips, git, Eric Raymond

Jakub Narebski <jnareb@gmail.com> writes:

> I don't think it is needed for either command.
>
> 'git blame' has --porcelain and --incremental output, which is line-based
> and pretty much self-describing (with "header-name value" syntax for most
> of it), and well documented.  JSON output would only add unnecessary
> chatter and different quoting rules.

Wouldn't the exact same argument apply equally well to the output format
of "status --porcelain", by the way?  It is line-based and pretty much
self-describing (once you know the mnemonic but you can make an educated
guess from previous SCM experience).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 19:32     ` Junio C Hamano
@ 2010-04-14 20:12       ` Jakub Narebski
  2010-04-14 20:38         ` Junio C Hamano
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Narebski @ 2010-04-14 20:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sverre Rabbelier, Julian Phillips, git, Eric Raymond

Dnia środa 14. kwietnia 2010 21:32, Junio C Hamano napisał:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > I don't think it is needed for either command.
> >
> > 'git blame' has --porcelain and --incremental output, which is line-based
> > and pretty much self-describing (with "header-name value" syntax for most
> > of it), and well documented.  JSON output would only add unnecessary
> > chatter and different quoting rules.
> 
> Wouldn't the exact same argument apply equally well to the output format
> of "status --porcelain", by the way?  It is line-based and pretty much
> self-describing (once you know the mnemonic but you can make an educated
> guess from previous SCM experience).

No, current "git status --porcelain" output is record-based (tabular);
the meaning is not described by header but depends on field in record,
i.e. position in line.

Self describing output of "git status --porcelain" would be

  filename <maybe-quoted filename>
  renamed-from <maybe-quoted filename>
  similarity 95%
  worktree ...
  index ...

or something like that...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 20:12       ` Jakub Narebski
@ 2010-04-14 20:38         ` Junio C Hamano
  2010-04-14 21:29           ` Jakub Narebski
  0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2010-04-14 20:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Sverre Rabbelier, Julian Phillips, git, Eric Raymond

Jakub Narebski <jnareb@gmail.com> writes:

>> Wouldn't the exact same argument apply equally well to the output format
>> of "status --porcelain", by the way?  It is line-based and pretty much
>> self-describing (once you know the mnemonic but you can make an educated
>> guess from previous SCM experience).
>
> No, current "git status --porcelain" output is record-based (tabular);
> the meaning is not described by header but depends on field in record,
> i.e. position in line.

Now, what's wrong about that?  For that matter, would you say "diff --raw"
output should be JSON/XMLified because it is columnar?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 19:10   ` Jakub Narebski
  2010-04-14 19:13     ` Sverre Rabbelier
  2010-04-14 19:32     ` Junio C Hamano
@ 2010-04-14 20:57     ` Julian Phillips
  2010-04-14 21:16       ` Jakub Narebski
  2010-04-15  7:15       ` Jeff King
  2 siblings, 2 replies; 28+ messages in thread
From: Julian Phillips @ 2010-04-14 20:57 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Sverre Rabbelier, git, Junio C Hamano, Eric Raymond

On Wed, 14 Apr 2010 21:10:35 +0200, Jakub Narebski <jnareb@gmail.com>
wrote:
> On Mon, 12 April 2010, Sverre Rabbelier wrote:
>> 
>> On Mon, Apr 12, 2010 at 01:21, Julian Phillips
>> <julian@quantumfyre.co.uk> wrote:
>> > Probably the biggest change from v1 is an expanded aim.  Now the
>> > output library
>> > is aimed at controlling _all_ plubming output.  This series includes
a
>> > patch for
>> > ls-tree that has all it's output going through the library, and a
>> > patch for
>> > status that has all the --porcelain output going through the library.
>> 
>> I like where this is going, a lot, especially since we don't have to
>> convert everything in one go, but we can do it as desired, similar to
>> optparsification. I still think more commands than just these two
>> should be converted to validate the design though, perhaps something
>> like 'git blame', or 'git for-each-ref'?
> 
> I don't think it is needed for either command.

I think that the ability to say that all plumbing output is available in a
variety of standard outputs is potentially useful.  In particular the
ability to be able to parse the output of all plumbing commands directly
into whatever native language the high-level tool is in using an already
existing standard parser makes life easier for those writing the tool.

> 'git blame' has --porcelain and --incremental output, which is
line-based
> and pretty much self-describing (with "header-name value" syntax for
most
> of it), and well documented.  JSON output would only add unnecessary
> chatter and different quoting rules.

That depends really.  If you are writing something to parse the output,
and you already have a JSON parser available then it's the current output
that has different quoting rules. ;)

Anyway, I have already converted blame to use the library for both
--porcelain and --incremental output, so it'll be in the next version of
the patch series.  So you can try before you buy ...

> 'git for-each-ref' has both --format=<format> to allow to get data what
> one needs, and in the format one wants (with e.g. %00 to reresent NUL),
> and [--shell|--perl|--python|--tcl] for placeholders in <format> to be
> quoted as string literals suitable for specified host language. 
Although
> I am not sure if this option, meant to produce scriptlets, is used that
> much/ note that there is not support for --json quoting, nor --xml 
> escaping.

-- 
Julian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 20:57     ` [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
@ 2010-04-14 21:16       ` Jakub Narebski
  2010-04-14 21:28         ` Julian Phillips
  2010-04-15  7:15       ` Jeff King
  1 sibling, 1 reply; 28+ messages in thread
From: Jakub Narebski @ 2010-04-14 21:16 UTC (permalink / raw)
  To: Julian Phillips; +Cc: Sverre Rabbelier, git, Junio C Hamano, Eric Raymond

On Wed,  14 Apr 2010, Julian Phillips wrote:
> On Wed, 14 Apr 2010 21:10:35 +0200, Jakub Narebski <jnareb@gmail.com>
> wrote:

> > 'git blame' has --porcelain and --incremental output, which is line-based
> > and pretty much self-describing (with "header-name value" syntax for most
> > of it), and well documented.  JSON output would only add unnecessary
> > chatter and different quoting rules.
> 
> That depends really.  If you are writing something to parse the output,
> and you already have a JSON parser available then it's the current output
> that has different quoting rules. ;)

True.

> 
> Anyway, I have already converted blame to use the library for both
> --porcelain and --incremental output, so it'll be in the next version of
> the patch series.  So you can try before you buy ...

Nice.

How did you managed to work with a bit non-standard rules of --porcelain
format, namely maybe-quoting of filenames, and that not all lines conform
to "<header> SP <value> LF" syntax: group definition begins with SHA-1,
and contents is indented with TAB?

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 21:16       ` Jakub Narebski
@ 2010-04-14 21:28         ` Julian Phillips
  0 siblings, 0 replies; 28+ messages in thread
From: Julian Phillips @ 2010-04-14 21:28 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Sverre Rabbelier, git, Junio C Hamano, Eric Raymond

On Wed, 14 Apr 2010 23:16:06 +0200, Jakub Narebski <jnareb@gmail.com>
wrote:
> On Wed,  14 Apr 2010, Julian Phillips wrote:
>> On Wed, 14 Apr 2010 21:10:35 +0200, Jakub Narebski <jnareb@gmail.com>
>> wrote:
>> Anyway, I have already converted blame to use the library for both
>> --porcelain and --incremental output, so it'll be in the next version
of
>> the patch series.  So you can try before you buy ...
> 
> Nice.
> 
> How did you managed to work with a bit non-standard rules of --porcelain
> format, namely maybe-quoting of filenames, and that not all lines
conform
> to "<header> SP <value> LF" syntax: group definition begins with SHA-1,
> and contents is indented with TAB?

I had to extend the output API for quoting rules.  There are now
qstr/qstrf output functions that call different backend functions.  The
normal output then applies git quoting rules to the q versions only,
whereas the JSON output treats them both the same.

For the rest of it, only the actual data is going through the output_str
etc functions the rest is using output_token that is ignored by the JSON
backend.  I actually started converting blame twice - the first time I
added an API function that allowed telling the normal output how to format
values, but I decided that I wanted to apply grouping to the structured
output, so that e.g. the author and committer information were two objects
with name, mail, date and tz members so I never finished the first attempt.

-- 
Julian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 20:38         ` Junio C Hamano
@ 2010-04-14 21:29           ` Jakub Narebski
  2010-04-14 21:34             ` Junio C Hamano
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Narebski @ 2010-04-14 21:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sverre Rabbelier, Julian Phillips, git, Eric Raymond

On Wed, 14 April 2010, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
>>> Wouldn't the exact same argument apply equally well to the output format
>>> of "status --porcelain", by the way?  It is line-based and pretty much
>>> self-describing (once you know the mnemonic but you can make an educated
>>> guess from previous SCM experience).
>>
>> No, current "git status --porcelain" output is record-based (tabular);
>> the meaning is not described by header but depends on field in record,
>> i.e. position in line.
> 
> Now, what's wrong about that?

Well, this whole idea started with the fact, that "git status --short"
was hard (or impossible) to parse unambigously by scripts[1], and even
"git status --porcelain -z"[2] is not that easy to parse[3].

With JSON output format one can use existing JSON parsers, which are
available in any language.

[1] And it was woefully underdocumented
[2] I wonder why git-config and git-grep have '--null' as long version
    of '-z' option... and only those.
[3] I rather liked the idea of -Z output format, the form that uses
    NUL as field separator for each field (and not only filenames),
    and NUL NUL as record terminator; it makes parsing much easier
    because you don't need to take a look at other field to know
    where the record ends.

> For that matter, would you say "diff --raw" output should be
> JSON/XMLified because it is columnar? 

It would be nice to have raw diff format JSONified, or have --porcelain
(like "git blame --porcelain" output format) version of it.  Especially
for "diff -c --raw" i.e. raw output format for merges, which lacks some
information, namely filename and similarity score for n-th pre-image,
if rename or copy was detected.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 21:29           ` Jakub Narebski
@ 2010-04-14 21:34             ` Junio C Hamano
  2010-04-15  6:57               ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2010-04-14 21:34 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Junio C Hamano, Sverre Rabbelier, Julian Phillips, git, Eric Raymond

Jakub Narebski <jnareb@gmail.com> writes:

> Well, this whole idea started with the fact, that "git status --short"
> was hard (or impossible) to parse unambigously by scripts[1], and even
> "git status --porcelain -z"[2] is not that easy to parse[3].

And you apparently seem to agree with that claim, but I don't.  I think
Jeff (who did the --porcelain stuff; by the way, why did we lose him from
Cc list?) has already said that he is open to an update.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 19:13     ` Sverre Rabbelier
@ 2010-04-14 21:42       ` Jakub Narebski
  0 siblings, 0 replies; 28+ messages in thread
From: Jakub Narebski @ 2010-04-14 21:42 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: Julian Phillips, git, Junio C Hamano, Eric Raymond

Dnia środa 14. kwietnia 2010 21:13, Sverre Rabbelier napisał:
> Heya,
> 
> On Wed, Apr 14, 2010 at 21:10, Jakub Narebski <jnareb@gmail.com> wrote:
> > I don't think it is needed for either command.
> 
> Those were just the first two plumbing commands that came to mind that
> I use myself, do you have any suggestions for others that would be
> more appropriate?

"git config [<type>] --get <name>" and friends... especially that even
in the case of '{ name: "<name>"; value: <value> }' (and without e.g.
"type: <type>" field) we can have <value> of correct JSON type.

"git ls-files" (especially when mixing information about files in
working area and those in index), "git diff-tree" (especially for merges,
as ordinary columnar output do not include all possible information,
like pre-image filename in case of renames), perhaps "git branch" so
people stop trying to parse it in scripts, perhaps "git describe"
(you need "git describe --long" to unambiguously parse its output),
perhaps "git remote show" (I am not sure about this case), "git show-ref"
if "git for-each-ref" didn't exists...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 21:34             ` Junio C Hamano
@ 2010-04-15  6:57               ` Jeff King
  2010-04-15  9:07                 ` Jakub Narebski
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2010-04-15  6:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jakub Narebski, Sverre Rabbelier, Julian Phillips, git, Eric Raymond

On Wed, Apr 14, 2010 at 02:34:01PM -0700, Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > Well, this whole idea started with the fact, that "git status --short"
> > was hard (or impossible) to parse unambigously by scripts[1], and even
> > "git status --porcelain -z"[2] is not that easy to parse[3].
> 
> And you apparently seem to agree with that claim, but I don't.  I think
> Jeff (who did the --porcelain stuff; by the way, why did we lose him from
> Cc list?) has already said that he is open to an update.

I haven't seen any evidence that status --porcelain (or its -z form) is
impossible to parse unambiguously. I don't even think it's that hard,
but it certainly could be easier. But more importantly, from looking at
the output it's not necessarily _obvious_ how to parse it correctly
(e.g., whitespace as value and as field separator, syntax of "-z"
depends on semantics of field contents).

The approach I proposed was to leave it be and document it a bit better.
Adding some format that is close but subtly different is just going to
lead to more confusion.

But since Julian was willing to do the JSON work, I think that is a much
nicer approach. It's not subtly different; it's very different and way
easier to read and parse. And I'm really happy with the way he has
structured the code to handle multiple output formats. It keeps the code
much cleaner, and it should silence any "but YAML is better than JSON is
better than XML" debates.

Even with Julian's patches, we should still better document the regular
and "-z" forms. Eric promised to send some patches this week; I'm hoping
he is still interested in doing so after seeing a better solution arise.
:)

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-14 20:57     ` [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
  2010-04-14 21:16       ` Jakub Narebski
@ 2010-04-15  7:15       ` Jeff King
  1 sibling, 0 replies; 28+ messages in thread
From: Jeff King @ 2010-04-15  7:15 UTC (permalink / raw)
  To: Julian Phillips
  Cc: Jakub Narebski, Sverre Rabbelier, git, Junio C Hamano, Eric Raymond

On Wed, Apr 14, 2010 at 09:57:27PM +0100, Julian Phillips wrote:

> > 'git blame' has --porcelain and --incremental output, which is
> [...]
> > JSON output would only add unnecessary chatter and different quoting
> > rules.
> 
> That depends really.  If you are writing something to parse the output,
> and you already have a JSON parser available then it's the current output
> that has different quoting rules. ;)

Every once in a while, I have some crazy idea for a short script that is
built around blame output (e.g., counting contributors by line count).
Something that I might do in a little one-off perl script. And my
experience has been that 90% of the script ends up parsing and managing
commit blocks, and not the computation of interest.

Not that it's a lot of lines, mind you, but having to write 20 lines of
parser to do a perl one-liner on the result is annoying. I would be very
happy to have some 1 or 2 line solution where one of the lines is "use
JSON;".

> Anyway, I have already converted blame to use the library for both
> --porcelain and --incremental output, so it'll be in the next version of
> the patch series.  So you can try before you buy ...

I'll be curious to see it. I hope you will (at least optionally) wrap
the _whole_ output and not just the commit blocks. It would be nice to
just suck it in all at once and walk the data structure. But it may be
tricky because the output suppresses the commit description for commits
that have already been output. You would probably want a list of lines
and a map of commits, like:

  {
    "lines": [
      { sha1 and line info }
      { sha1 and line info }
      ...
    ],
    "commits": {
      "$sha1": { commit info },
      ...
  }

which is close to what I would parse to in a script, except I would
actually drop the "commits" map and point directly to the commit info
from each line.

Is there a way in JSON to refer to the contents of a previous item
without just outputting the same data again? I assume not, and even if
there is, other output formats like XML wouldn't handle it.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-15  6:57               ` Jeff King
@ 2010-04-15  9:07                 ` Jakub Narebski
  2010-04-17  9:53                   ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Narebski @ 2010-04-15  9:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Sverre Rabbelier, Julian Phillips, git, Eric Raymond

On Thu, 15 April 2010, Jeff King wrote:
> On Wed, Apr 14, 2010 at 02:34:01PM -0700, Junio C Hamano wrote:
> > Jakub Narebski <jnareb@gmail.com> writes:
> > 
> > > Well, this whole idea started with the fact, that "git status --short"
> > > was hard (or impossible) to parse unambigously by scripts[1], and even
> > > "git status --porcelain -z"[2] is not that easy to parse[3].
> > 
> > And you apparently seem to agree with that claim, but I don't.  I think
> > Jeff (who did the --porcelain stuff; by the way, why did we lose him from
> > Cc list?) has already said that he is open to an update.
> 
> I haven't seen any evidence that status --porcelain (or its -z form) is
> impossible to parse unambiguously. I don't even think it's that hard,
> but it certainly could be easier. But more importantly, from looking at
> the output it's not necessarily _obvious_ how to parse it correctly
> (e.g., whitespace as value and as field separator, syntax of "-z"
> depends on semantics of field contents).

Well, IMVHO output of "git status --short" / "git status --porcelain"
(without '-z') is very hard to parse.  Even assuming that in the case
of ambiguity filenames are quoted (which also means that in the case of
ambiguity whether they are quoted they must be quoted), the fact that
separator between source and destination filename in the case of rename
detection is " -> " (if I understand it correctly), and neither of ' '
(SPC), '-' nor '>' is replaced by escape sequence means that one needs
to detect where quoted filename begins and where ends.  This means
either parsing character by character, taking into account quoting and
escaping (e.g. '\\', '\"' etc.), or using 'balanced quote' regexp like
the one from Text::Balanced, e.g.:  (?:\"(?:[^\\\"]*(?:\\.[^\\\"]*)*)\")

What was the reason behind choosing " -> " as separator between pair[1]
of filenames in rename, instead of using default "git diff --stat" format
i.e. 'arch/{i386 => x86}/Makefile' for "git status --short" which is
meant for end user, and for "git status --porcelain" the same format 
that raw diff format, i.e. with TAB as separator between filenames,
and filename quited if it contains TAB (then TAB is relaced by '\t',
and does not appear in filename, therefore you can split on TAB)?

IMVHO "git status --porcelain -z" format is not easy to parse either.
(The same can be said for "git diff --raw -z" output format.)  You
can't just split on record separator; you have to take into account
status to check if there are two filenames or one.

[1] A question: we have working area version, index version, and HEAD
    version of file.  Isn't it possible for *each* of them to have 
    different filename?  What about the case of rename/rename merge
    conflict?
> 
> The approach I proposed was to leave it be and document it a bit better.
> Adding some format that is close but subtly different is just going to
> lead to more confusion.

Well, the proposed '-Z' output format, in the OFS="\0", ORS="\0\0"
variant, would be very easy to parse.  If I understand it correctly
it is also one of available format in outputification^W in this series.

> 
> But since Julian was willing to do the JSON work, I think that is a much
> nicer approach. It's not subtly different; it's very different and way
> easier to read and parse. And I'm really happy with the way he has
> structured the code to handle multiple output formats. It keeps the code
> much cleaner, and it should silence any "but YAML is better than JSON is
> better than XML" debates.

I really like this outputification ;-) too.

Although if possible I'd like to have it wrapped in utility macros,
like parseopt, so one does not need to write output_str / output_int
etc.... but currently it is very, very vague sketch of an idea, rather
than realized concept.

> 
> Even with Julian's patches, we should still better document the regular
> and "-z" forms. Eric promised to send some patches this week; I'm hoping
> he is still interested in doing so after seeing a better solution arise.
> :)

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-15  9:07                 ` Jakub Narebski
@ 2010-04-17  9:53                   ` Jeff King
  2010-04-17 13:02                     ` Jakub Narebski
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2010-04-17  9:53 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Junio C Hamano, Sverre Rabbelier, Julian Phillips, git, Eric Raymond

On Thu, Apr 15, 2010 at 11:07:32AM +0200, Jakub Narebski wrote:

> Well, IMVHO output of "git status --short" / "git status --porcelain"
> (without '-z') is very hard to parse.  Even assuming that in the case
> of ambiguity filenames are quoted (which also means that in the case of
> ambiguity whether they are quoted they must be quoted), the fact that

For the record, they are properly quoted in the non-z form.

> [some reasons it's hard to parse]

Yeah, I don't disagree with your reasons (which are largely the same as
Eric's). I just don't think it's "oh no this is useless and we have to
start again" hard.

> What was the reason behind choosing " -> " as separator between pair[1]
> of filenames in rename, instead of using default "git diff --stat" format
> i.e. 'arch/{i386 => x86}/Makefile' for "git status --short" which is
> meant for end user, and for "git status --porcelain" the same format 
> that raw diff format, i.e. with TAB as separator between filenames,
> and filename quited if it contains TAB (then TAB is relaced by '\t',
> and does not appear in filename, therefore you can split on TAB)?

I don't know Junio's reason for using " -> " in --short; probably
because it was the format used in non-short status. For --porcelain, it
was simply because I used exactly --short. I assumed that --short was
suitable for parsing (which it _is_, it just has some rough edges), and
wanted to provide an option right away that would keep the output
stable, so we didn't run into the usual problem of people wanting to
enhance the human-readable interface, but being blocked by script
compatibility.

> IMVHO "git status --porcelain -z" format is not easy to parse either.
> (The same can be said for "git diff --raw -z" output format.)  You
> can't just split on record separator; you have to take into account
> status to check if there are two filenames or one.

Yep, I agree. I think the JSON approach is the best solution, as it is
separating syntax from semantics.

> [1] A question: we have working area version, index version, and HEAD
>     version of file.  Isn't it possible for *each* of them to have 
>     different filename?  What about the case of rename/rename merge
>     conflict?

Good question. The answer is no, the three different versions can't have
three filenames on the same line, because we don't do rename detection
between the working tree and the index. Which makes sense. Consider
something like this:

  mkdir repo && cd repo && git init
  echo content >one
  git add one && git commit -m one
  mv one two && git add -A
  mv two three
  git status

We will see the movement of "one -> two" between the index and HEAD. In
theory we could see the movement of "three -> two" between the index and
working tree. But "three" isn't tracked, so instead we see "two" deleted
and "three" untracked. We can mark "three" with intent-to-add to note
that we are interested in it, but then it is not a new file any more
(since it has an index entry), and is therefore not eligible for rename
detection.

As for a rename/rename conflict, it gets represented in the index as
both deleting the source and then each side adding its new version with
a conflict. So:

  mkdir repo && cd repo && git init
  echo content >one
  git add one && git commit -m base
  git mv one two && git commit -m two
  git checkout -b other HEAD^
  git mv one three && git commit -m three
  git merge master
  git status

generates:

  # On branch other
  # Unmerged paths:
  #       both deleted:       one
  #       added by us:        three
  #       added by them:      two

and an equivalent short-status form.

> Although if possible I'd like to have it wrapped in utility macros,
> like parseopt, so one does not need to write output_str / output_int
> etc.... but currently it is very, very vague sketch of an idea, rather
> than realized concept.

I'm not sure I understand what utility macros you would want.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-17  9:53                   ` Jeff King
@ 2010-04-17 13:02                     ` Jakub Narebski
  2010-04-17 14:00                       ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Narebski @ 2010-04-17 13:02 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Sverre Rabbelier, Julian Phillips, git, Eric Raymond

On Sat, 17 Apr 2010, Jeff King wrote:
> On Thu, Apr 15, 2010 at 11:07:32AM +0200, Jakub Narebski wrote:

> > [1] A question: we have working area version, index version, and HEAD
> >     version of file.  Isn't it possible for *each* of them to have 
> >     different filename?  What about the case of rename/rename merge
> >     conflict?

[cut]

Thanks for detailed explanation.

> > Although if possible I'd like to have it wrapped in utility macros,
> > like parseopt, so one does not need to write output_str / output_int
> > etc.... but currently it is very, very vague sketch of an idea, rather
> > than realized concept.
> 
> I'm not sure I understand what utility macros you would want.

Something like that (please remember that it is still in vague beginnings
of an idea stage:

  OUT_OBJECT(
     OUT_FIELD("mode",   OUT_MODE, tree.mode), SP,
     OUT_FIELD("type",   "%s", tree.object.type), SP,
     OUT_FIELD("object", OUT_SHA1, tree.object.sha1), TAB,
     OUT_FIELD("file", OUT_FILE(sep), tree.filename), 
     sep
  );

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output
  2010-04-17 13:02                     ` Jakub Narebski
@ 2010-04-17 14:00                       ` Jeff King
  2010-04-18 21:46                         ` [RFC/PATCH v2 0/4] A new library for plumbing output (inc. current status) Julian Phillips
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2010-04-17 14:00 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Junio C Hamano, Sverre Rabbelier, Julian Phillips, git, Eric Raymond

On Sat, Apr 17, 2010 at 03:02:39PM +0200, Jakub Narebski wrote:

> Something like that (please remember that it is still in vague beginnings
> of an idea stage:
> 
>   OUT_OBJECT(
>      OUT_FIELD("mode",   OUT_MODE, tree.mode), SP,
>      OUT_FIELD("type",   "%s", tree.object.type), SP,
>      OUT_FIELD("object", OUT_SHA1, tree.object.sha1), TAB,
>      OUT_FIELD("file", OUT_FILE(sep), tree.filename), 
>      sep
>   );

Doing that would require variadic macros, which are a C99-ism. So you
would have to do:

  OUT_OBJECT_START();
    OUT_FIELD("mode", OUT_MODE, tree.mode); OUT_SP;
    ...
  OUT_OBJECT_END();

which is not all that different from what Julian has now. I do think
some type-specific conversions might be handy. They don't even need to
be macros. E.g.,:

  void output_mode(struct output_context *oc, int mode)
  {
    output_strf(oc, "mode", "%06o", mode);
  }

OTOH, looking over Julian's last patch series, there really aren't that
many that would be generally applicable, and as you can see they only
save a few characters, not even a line. A few bigger objects could be
factored out, but he has already done that (e.g., see
wt_porcelain_unmerged in his v2 3/4).

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output (inc. current status)
  2010-04-17 14:00                       ` Jeff King
@ 2010-04-18 21:46                         ` Julian Phillips
  2010-04-19 19:40                           ` Jeff King
  0 siblings, 1 reply; 28+ messages in thread
From: Julian Phillips @ 2010-04-18 21:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Jakub Narebski, Junio C Hamano, Sverre Rabbelier, git, Eric Raymond

On Sat, 17 Apr 2010 10:00:53 -0400, Jeff King <peff@peff.net> wrote:
> On Sat, Apr 17, 2010 at 03:02:39PM +0200, Jakub Narebski wrote:
> 
>> Something like that (please remember that it is still in vague
beginnings
>> of an idea stage:
>> 
>>   OUT_OBJECT(
>>      OUT_FIELD("mode",   OUT_MODE, tree.mode), SP,
>>      OUT_FIELD("type",   "%s", tree.object.type), SP,
>>      OUT_FIELD("object", OUT_SHA1, tree.object.sha1), TAB,
>>      OUT_FIELD("file", OUT_FILE(sep), tree.filename), 
>>      sep
>>   );
> 
> Doing that would require variadic macros, which are a C99-ism. So you
> would have to do:
> 
>   OUT_OBJECT_START();
>     OUT_FIELD("mode", OUT_MODE, tree.mode); OUT_SP;
>     ...
>   OUT_OBJECT_END();

Also, backends such as JSON want to know which things are strings, and
which are numbers - as they print differently.  An XML backend may want to
distinguish even more (though I guess that depends on the design).

> which is not all that different from what Julian has now. I do think
> some type-specific conversions might be handy. They don't even need to
> be macros. E.g.,:
> 
>   void output_mode(struct output_context *oc, int mode)
>   {
>     output_strf(oc, "mode", "%06o", mode);
>   }
> 
> OTOH, looking over Julian's last patch series, there really aren't that
> many that would be generally applicable, and as you can see they only
> save a few characters, not even a line. A few bigger objects could be
> factored out, but he has already done that (e.g., see
> wt_porcelain_unmerged in his v2 3/4).

It might help standardise the output between commands if there were helper
functions for some of the larger structures - e.g. commits.  Though I don't
think that those functions would be able to do legacy output, due to the
current lack of cross-command output compatibility.  I'm starting to see
this with blame and diff-tree (and family), where they both want to output
information about commits.

I think that maybe I need to design and document the output structure for
common concepts - so that it would be possible to pass the output from any
command to a common parser, with matching utility functions in the code. 
Though, I'm not sure if there actually are any common concepts that need
outputting apart from commits.

Current Status
--------------

I had been planning to post an updated series this weekend, but I'm too
tired to attempt tidying things up for posting at the moment ... If you
want to see the current state then my current mess is available at
http://git.q42.co.uk/w/output.git.

A quick summary of main changes since v2:
  - backends are now in a subdirectory
  - blame, diff-tree, have --ouptut=... support for plumbing output
  - log has some support for --ouput=...
  - output library has extended API, including quoted strings and
is_structured_output function
  - backend API includes explicit functions for top-level items

-- 
Julian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC/PATCH v2 0/4] A new library for plumbing output (inc. current status)
  2010-04-18 21:46                         ` [RFC/PATCH v2 0/4] A new library for plumbing output (inc. current status) Julian Phillips
@ 2010-04-19 19:40                           ` Jeff King
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2010-04-19 19:40 UTC (permalink / raw)
  To: Julian Phillips
  Cc: Jakub Narebski, Junio C Hamano, Sverre Rabbelier, git, Eric Raymond

On Sun, Apr 18, 2010 at 10:46:18PM +0100, Julian Phillips wrote:

> It might help standardise the output between commands if there were helper
> functions for some of the larger structures - e.g. commits.  Though I don't
> think that those functions would be able to do legacy output, due to the
> current lack of cross-command output compatibility.  I'm starting to see
> this with blame and diff-tree (and family), where they both want to output
> information about commits.

Yeah, that was what I saw on looking at the code. And we have to support
those old formats, obviously. For the most part, I found the level of
verbosity in the patches you posted (and I just peeked at your repo) to
be fine. Sure, it's more lines, but they're IMHO very easy to read.

If we have to tradeoff between either duplicating output entirely (for
both the output form and traditional form) or having a more flexible but
slightly more verbose output library, I think I would rather go with the
latter. It will be more maintainable in the long run.

-Peff

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2010-04-19 19:40 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-11 23:21 [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
2010-04-11 23:21 ` [RFC/PATCH v2 1/4] output: Add a " Julian Phillips
2010-04-13  9:43   ` Ilari Liusvaara
2010-04-13 11:46     ` Julian Phillips
2010-04-11 23:21 ` [RFC/PATCH v2 2/4] ls-tree: complete conversion to using output library Julian Phillips
2010-04-11 23:21 ` [RFC/PATCH v2 3/4] status: use output library for porcelain output Julian Phillips
2010-04-11 23:21 ` [RFC/PATCH v2 4/4] output: WIP: Add XML backend Julian Phillips
2010-04-11 23:35 ` [RFC/PATCH v2 0/4] A new library for plumbing output Sverre Rabbelier
2010-04-12  0:46   ` Eric Raymond
2010-04-14 19:10   ` Jakub Narebski
2010-04-14 19:13     ` Sverre Rabbelier
2010-04-14 21:42       ` Jakub Narebski
2010-04-14 19:32     ` Junio C Hamano
2010-04-14 20:12       ` Jakub Narebski
2010-04-14 20:38         ` Junio C Hamano
2010-04-14 21:29           ` Jakub Narebski
2010-04-14 21:34             ` Junio C Hamano
2010-04-15  6:57               ` Jeff King
2010-04-15  9:07                 ` Jakub Narebski
2010-04-17  9:53                   ` Jeff King
2010-04-17 13:02                     ` Jakub Narebski
2010-04-17 14:00                       ` Jeff King
2010-04-18 21:46                         ` [RFC/PATCH v2 0/4] A new library for plumbing output (inc. current status) Julian Phillips
2010-04-19 19:40                           ` Jeff King
2010-04-14 20:57     ` [RFC/PATCH v2 0/4] A new library for plumbing output Julian Phillips
2010-04-14 21:16       ` Jakub Narebski
2010-04-14 21:28         ` Julian Phillips
2010-04-15  7:15       ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.