All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/15] store length of commit->buffer
@ 2014-06-09 18:02 Jeff King
  2014-06-09 18:09 ` [PATCH 01/15] alloc: include any-object allocations in alloc_report Jeff King
                   ` (15 more replies)
  0 siblings, 16 replies; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Here's my series to drop "buffer" from "struct commit" in favor of a
slab, and then add in a length field. It's a lot of commits, but I tried
to break it down into readable chunks.

  [01/15]: alloc: include any-object allocations in alloc_report
  [02/15]: commit: push commit_index update into alloc_commit_node
  [03/15]: do not create "struct commit" with xcalloc
  [04/15]: logmsg_reencode: return const buffer
  [05/15]: sequencer: use logmsg_reencode in get_message
  [06/15]: provide a helper to free commit buffer
  [07/15]: provide a helper to set the commit buffer
  [08/15]: provide helpers to access the commit buffer
  [09/15]: use get_cached_commit_buffer where appropriate
  [10/15]: use get_commit_buffer to avoid duplicate code
  [11/15]: convert logmsg_reencode to get_commit_buffer
  [12/15]: use get_commit_buffer everywhere
  [13/15]: commit-slab: provide a static initializer
  [14/15]: commit: convert commit->buffer to a slab
  [15/15]: commit: record buffer length in cache

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 01/15] alloc: include any-object allocations in alloc_report
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
@ 2014-06-09 18:09 ` Jeff King
  2014-06-09 18:09 ` [PATCH 02/15] commit: push commit_index update into alloc_commit_node Jeff King
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:09 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

When 2c1cbec (Use proper object allocators for unknown
object nodes too, 2007-04-16), added a special "any_object"
allocator, it never taught alloc_report to report on it. To
do so we need to add an extra type argument to the REPORT
macro, as that commit did for DEFINE_ALLOCATOR.

Signed-off-by: Jeff King <peff@peff.net>
---
My ulterior motive is that I'm going to add more changes that would
disrupt this.  I'd also be happy to just drop alloc_report, as nobody
calls it (it was purely for debugging, and I suspect hasn't been used
since the initial work in 2007).

 alloc.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/alloc.c b/alloc.c
index f3ee745..38ff7e7 100644
--- a/alloc.c
+++ b/alloc.c
@@ -57,13 +57,14 @@ static void report(const char *name, unsigned int count, size_t size)
 			name, count, (uintmax_t) size);
 }
 
-#define REPORT(name)	\
-    report(#name, name##_allocs, name##_allocs * sizeof(struct name) >> 10)
+#define REPORT(name, type)	\
+    report(#name, name##_allocs, name##_allocs * sizeof(type) >> 10)
 
 void alloc_report(void)
 {
-	REPORT(blob);
-	REPORT(tree);
-	REPORT(commit);
-	REPORT(tag);
+	REPORT(blob, struct blob);
+	REPORT(tree, struct tree);
+	REPORT(commit, struct commit);
+	REPORT(tag, struct tag);
+	REPORT(object, union any_object);
 }
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 02/15] commit: push commit_index update into alloc_commit_node
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
  2014-06-09 18:09 ` [PATCH 01/15] alloc: include any-object allocations in alloc_report Jeff King
@ 2014-06-09 18:09 ` Jeff King
  2014-06-10 21:31   ` Junio C Hamano
  2014-06-09 18:10 ` [PATCH 03/15] do not create "struct commit" with xcalloc Jeff King
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:09 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Whenever we create a commit object via lookup_commit, we
give it a unique index to be used with the commit-slab API.
The theory is that any "struct commit" we create would
follow this code path, so any such struct would get an
index. However, callers could use alloc_commit_node()
directly (and get multiple commits with index 0).

Let's push the indexing into alloc_commit_node so that it's
hard for callers to get it wrong.

Signed-off-by: Jeff King <peff@peff.net>
---
This will make the alloc_report output a little uglier (it will say
raw_commit), but I don't think anyone cares. And I wanted to make sure
there wasn't an easy way to accidentally call the wrong alloc function
that doesn't handle the index.

 alloc.c  | 12 ++++++++++--
 commit.c |  2 --
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/alloc.c b/alloc.c
index 38ff7e7..eb22a45 100644
--- a/alloc.c
+++ b/alloc.c
@@ -47,10 +47,18 @@ union any_object {
 
 DEFINE_ALLOCATOR(blob, struct blob)
 DEFINE_ALLOCATOR(tree, struct tree)
-DEFINE_ALLOCATOR(commit, struct commit)
+DEFINE_ALLOCATOR(raw_commit, struct commit)
 DEFINE_ALLOCATOR(tag, struct tag)
 DEFINE_ALLOCATOR(object, union any_object)
 
+void *alloc_commit_node(void)
+{
+	static int commit_count;
+	struct commit *c = alloc_raw_commit_node();
+	c->index = commit_count++;
+	return c;
+}
+
 static void report(const char *name, unsigned int count, size_t size)
 {
 	fprintf(stderr, "%10s: %8u (%"PRIuMAX" kB)\n",
@@ -64,7 +72,7 @@ void alloc_report(void)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
-	REPORT(commit, struct commit);
+	REPORT(raw_commit, struct commit);
 	REPORT(tag, struct tag);
 	REPORT(object, union any_object);
 }
diff --git a/commit.c b/commit.c
index f479331..21957ee 100644
--- a/commit.c
+++ b/commit.c
@@ -17,7 +17,6 @@ static struct commit_extra_header *read_commit_extra_header_lines(const char *bu
 int save_commit_buffer = 1;
 
 const char *commit_type = "commit";
-static int commit_count;
 
 static struct commit *check_commit(struct object *obj,
 				   const unsigned char *sha1,
@@ -64,7 +63,6 @@ struct commit *lookup_commit(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj) {
 		struct commit *c = alloc_commit_node();
-		c->index = commit_count++;
 		return create_object(sha1, OBJ_COMMIT, c);
 	}
 	if (!obj->type)
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 03/15] do not create "struct commit" with xcalloc
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
  2014-06-09 18:09 ` [PATCH 01/15] alloc: include any-object allocations in alloc_report Jeff King
  2014-06-09 18:09 ` [PATCH 02/15] commit: push commit_index update into alloc_commit_node Jeff King
@ 2014-06-09 18:10 ` Jeff King
  2014-06-10 21:35   ` Junio C Hamano
  2014-06-09 18:10 ` [PATCH 04/15] logmsg_reencode: return const buffer Jeff King
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

In both blame and merge-recursive, we sometimes create a
"fake" commit struct for convenience (e.g., to represent the
HEAD state as if we would commit it). By allocating
ourselves rather than using alloc_commit_node, we do not
properly set the "index" field of the commit. This can
produce subtle bugs if we then use commit-slab on the
resulting commit, as we will share the "0" index with
another commit.

We can fix this by using alloc_commit_node() to allocate.
Note that we cannot free the result, as it is part of our
commit allocator. However, both cases were already leaking
the allocated commit anyway, so there's nothing to fix up.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/blame.c   | 2 +-
 merge-recursive.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index a52a279..d6056a5 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2286,7 +2286,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 	struct strbuf msg = STRBUF_INIT;
 
 	time(&now);
-	commit = xcalloc(1, sizeof(*commit));
+	commit = alloc_commit_node();
 	commit->object.parsed = 1;
 	commit->date = now;
 	commit->object.type = OBJ_COMMIT;
diff --git a/merge-recursive.c b/merge-recursive.c
index cab16fa..2b37d42 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -40,7 +40,7 @@ static struct tree *shift_tree_object(struct tree *one, struct tree *two,
 
 static struct commit *make_virtual_commit(struct tree *tree, const char *comment)
 {
-	struct commit *commit = xcalloc(1, sizeof(struct commit));
+	struct commit *commit = alloc_commit_node();
 	struct merge_remote_desc *desc = xmalloc(sizeof(*desc));
 
 	desc->name = comment;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 04/15] logmsg_reencode: return const buffer
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (2 preceding siblings ...)
  2014-06-09 18:10 ` [PATCH 03/15] do not create "struct commit" with xcalloc Jeff King
@ 2014-06-09 18:10 ` Jeff King
  2014-06-10  1:36   ` Eric Sunshine
  2014-06-09 18:10 ` [PATCH 05/15] sequencer: use logmsg_reencode in get_message Jeff King
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

The return value from logmsg_reencode may be either a newly
allocated buffer or a pointer to the existing commit->buffer.
We would not want the caller to accidentally free() or
modify the latter, so let's mark it as const.  We can cast
away the constness in logmsg_free, but only once we have
determined that it is a free-able buffer.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/blame.c |  2 +-
 builtin/reset.c |  2 +-
 commit.h        |  8 ++++----
 pretty.c        | 14 +++++++-------
 revision.c      | 12 +++++++++---
 5 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index d6056a5..6ce3c6d 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1655,7 +1655,7 @@ static void get_commit_info(struct commit *commit,
 {
 	int len;
 	const char *subject, *encoding;
-	char *message;
+	const char *message;
 
 	commit_info_init(ret);
 
diff --git a/builtin/reset.c b/builtin/reset.c
index f368266..7ebee07 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -93,7 +93,7 @@ static int reset_index(const unsigned char *sha1, int reset_type, int quiet)
 static void print_new_head_line(struct commit *commit)
 {
 	const char *hex, *body;
-	char *msg;
+	const char *msg;
 
 	hex = find_unique_abbrev(commit->object.sha1, DEFAULT_ABBREV);
 	printf(_("HEAD is now at %s"), hex);
diff --git a/commit.h b/commit.h
index a9f177b..de57df9 100644
--- a/commit.h
+++ b/commit.h
@@ -115,10 +115,10 @@ struct userformat_want {
 
 extern int has_non_ascii(const char *text);
 struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */
-extern char *logmsg_reencode(const struct commit *commit,
-			     char **commit_encoding,
-			     const char *output_encoding);
-extern void logmsg_free(char *msg, const struct commit *commit);
+extern const char *logmsg_reencode(const struct commit *commit,
+				   char **commit_encoding,
+				   const char *output_encoding);
+extern void logmsg_free(const char *msg, const struct commit *commit);
 extern void get_commit_format(const char *arg, struct rev_info *);
 extern const char *format_subject(struct strbuf *sb, const char *msg,
 				  const char *line_separator);
diff --git a/pretty.c b/pretty.c
index e1e2cad..d152de2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -606,9 +606,9 @@ static char *replace_encoding_header(char *buf, const char *encoding)
 	return strbuf_detach(&tmp, NULL);
 }
 
-char *logmsg_reencode(const struct commit *commit,
-		      char **commit_encoding,
-		      const char *output_encoding)
+const char *logmsg_reencode(const struct commit *commit,
+			    char **commit_encoding,
+			    const char *output_encoding)
 {
 	static const char *utf8 = "UTF-8";
 	const char *use_encoding;
@@ -687,10 +687,10 @@ char *logmsg_reencode(const struct commit *commit,
 	return out ? out : msg;
 }
 
-void logmsg_free(char *msg, const struct commit *commit)
+void logmsg_free(const char *msg, const struct commit *commit)
 {
 	if (msg != commit->buffer)
-		free(msg);
+		free((void *)msg);
 }
 
 static int mailmap_name(const char **email, size_t *email_len,
@@ -796,7 +796,7 @@ struct format_commit_context {
 	struct signature_check signature_check;
 	enum flush_type flush_type;
 	enum trunc_type truncate;
-	char *message;
+	const char *message;
 	char *commit_encoding;
 	size_t width, indent1, indent2;
 	int auto_color;
@@ -1700,7 +1700,7 @@ void pretty_print_commit(struct pretty_print_context *pp,
 	unsigned long beginning_of_body;
 	int indent = 4;
 	const char *msg;
-	char *reencoded;
+	const char *reencoded;
 	const char *encoding;
 	int need_8bit_cte = pp->need_8bit_cte;
 
diff --git a/revision.c b/revision.c
index 71e2337..47e5b7a 100644
--- a/revision.c
+++ b/revision.c
@@ -2788,7 +2788,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
 	const char *encoding;
-	char *message;
+	const char *message;
 	struct strbuf buf = STRBUF_INIT;
 
 	if (!opt->grep_filter.pattern_list && !opt->grep_filter.header_list)
@@ -2830,12 +2830,18 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		format_display_notes(commit->object.sha1, &buf, encoding, 1);
 	}
 
-	/* Find either in the original commit message, or in the temporary */
+	/* Find either in the original commit message, or in the temporary.
+	 * Note that we cast away the constness of "message" here. It is
+	 * const because it may come from the cached commit buffer. That's OK,
+	 * because we know that it is modifiable heap memory, and that while
+	 * grep_buffer may modify it for speed, it will restore any
+	 * changes before returning.
+	 */
 	if (buf.len)
 		retval = grep_buffer(&opt->grep_filter, buf.buf, buf.len);
 	else
 		retval = grep_buffer(&opt->grep_filter,
-				     message, strlen(message));
+				     (char *)message, strlen(message));
 	strbuf_release(&buf);
 	logmsg_free(message, commit);
 	return retval;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 05/15] sequencer: use logmsg_reencode in get_message
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (3 preceding siblings ...)
  2014-06-09 18:10 ` [PATCH 04/15] logmsg_reencode: return const buffer Jeff King
@ 2014-06-09 18:10 ` Jeff King
  2014-06-10 21:40   ` Junio C Hamano
  2014-06-09 18:11 ` [PATCH 06/15] provide a helper to free commit buffer Jeff King
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

This simplifies the code, as logmsg_reencode handles the
reencoding for us in a single call. It also means we learn
logmsg_reencode's trick of pulling the buffer from disk when
commit->buffer is NULL (we currently just silently return!).
It is doubtful this matters in practice, though, as
sequencer operations would not generally turn off
save_commit_buffer.

Note that we may be fixing a bug here. The existing code
does:

  if (same_encoding(to, from))
	  reencode_string(buf, to, from);

That probably should have been "!same_encoding".

Signed-off-by: Jeff King <peff@peff.net>
---
I didn't actually test for the bug, so it's possible that I'm missing
something clever...

 sequencer.c | 45 +++++----------------------------------------
 1 file changed, 5 insertions(+), 40 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 0a80c58..3fcab4d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -116,39 +116,23 @@ static const char *action_name(const struct replay_opts *opts)
 	return opts->action == REPLAY_REVERT ? "revert" : "cherry-pick";
 }
 
-static char *get_encoding(const char *message);
-
 struct commit_message {
 	char *parent_label;
 	const char *label;
 	const char *subject;
-	char *reencoded_message;
 	const char *message;
 };
 
 static int get_message(struct commit *commit, struct commit_message *out)
 {
-	const char *encoding;
 	const char *abbrev, *subject;
 	int abbrev_len, subject_len;
 	char *q;
 
-	if (!commit->buffer)
-		return -1;
-	encoding = get_encoding(commit->buffer);
-	if (!encoding)
-		encoding = "UTF-8";
 	if (!git_commit_encoding)
 		git_commit_encoding = "UTF-8";
 
-	out->reencoded_message = NULL;
-	out->message = commit->buffer;
-	if (same_encoding(encoding, git_commit_encoding))
-		out->reencoded_message = reencode_string(commit->buffer,
-					git_commit_encoding, encoding);
-	if (out->reencoded_message)
-		out->message = out->reencoded_message;
-
+	out->message = logmsg_reencode(commit, NULL, git_commit_encoding);
 	abbrev = find_unique_abbrev(commit->object.sha1, DEFAULT_ABBREV);
 	abbrev_len = strlen(abbrev);
 
@@ -167,29 +151,10 @@ static int get_message(struct commit *commit, struct commit_message *out)
 	return 0;
 }
 
-static void free_message(struct commit_message *msg)
+static void free_message(struct commit *commit, struct commit_message *msg)
 {
 	free(msg->parent_label);
-	free(msg->reencoded_message);
-}
-
-static char *get_encoding(const char *message)
-{
-	const char *p = message, *eol;
-
-	while (*p && *p != '\n') {
-		for (eol = p + 1; *eol && *eol != '\n'; eol++)
-			; /* do nothing */
-		if (starts_with(p, "encoding ")) {
-			char *result = xmalloc(eol - 8 - p);
-			strlcpy(result, p + 9, eol - 8 - p);
-			return result;
-		}
-		p = eol;
-		if (*p == '\n')
-			p++;
-	}
-	return NULL;
+	logmsg_free(msg->message, commit);
 }
 
 static void write_cherry_pick_head(struct commit *commit, const char *pseudoref)
@@ -489,7 +454,7 @@ static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
 	unsigned char head[20];
 	struct commit *base, *next, *parent;
 	const char *base_label, *next_label;
-	struct commit_message msg = { NULL, NULL, NULL, NULL, NULL };
+	struct commit_message msg = { NULL, NULL, NULL, NULL };
 	char *defmsg = NULL;
 	struct strbuf msgbuf = STRBUF_INIT;
 	int res, unborn = 0, allow;
@@ -654,7 +619,7 @@ static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
 		res = run_git_commit(defmsg, opts, allow);
 
 leave:
-	free_message(&msg);
+	free_message(commit, &msg);
 	free(defmsg);
 
 	return res;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 06/15] provide a helper to free commit buffer
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (4 preceding siblings ...)
  2014-06-09 18:10 ` [PATCH 05/15] sequencer: use logmsg_reencode in get_message Jeff King
@ 2014-06-09 18:11 ` Jeff King
  2014-06-09 18:11 ` [PATCH 07/15] provide a helper to set the " Jeff King
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.

Note that we also need to provide a "detach" mechanism for
the weird case in fsck which passes the buffer back to be
freed.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c       |  3 +--
 builtin/index-pack.c |  2 +-
 builtin/log.c        |  6 ++----
 builtin/rev-list.c   |  3 +--
 commit.c             | 13 +++++++++++++
 commit.h             | 11 +++++++++++
 6 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index fc150c8..8aadca1 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -310,8 +310,7 @@ static int fsck_obj(struct object *obj)
 	if (obj->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *) obj;
 
-		free(commit->buffer);
-		commit->buffer = NULL;
+		free_commit_buffer(commit);
 
 		if (!commit->parents && show_root)
 			printf("root %s\n", sha1_to_hex(commit->object.sha1));
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 18f57de..995df39 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -786,7 +786,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			}
 			if (obj->type == OBJ_COMMIT) {
 				struct commit *commit = (struct commit *) obj;
-				commit->buffer = NULL;
+				detach_commit_buffer(commit);
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
diff --git a/builtin/log.c b/builtin/log.c
index 39e8836..226f8f2 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -349,8 +349,7 @@ static int cmd_log_walk(struct rev_info *rev)
 			rev->max_count++;
 		if (!rev->reflog_info) {
 			/* we allow cycles in reflog ancestry */
-			free(commit->buffer);
-			commit->buffer = NULL;
+			free_commit_buffer(commit);
 		}
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
@@ -1508,8 +1507,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		    reopen_stdout(rev.numbered_files ? NULL : commit, NULL, &rev, quiet))
 			die(_("Failed to create output files"));
 		shown = log_tree_commit(&rev, commit);
-		free(commit->buffer);
-		commit->buffer = NULL;
+		free_commit_buffer(commit);
 
 		/* We put one extra blank line between formatted
 		 * patches and this flag is used by log-tree code
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9f92905..e012ebe 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -173,8 +173,7 @@ static void finish_commit(struct commit *commit, void *data)
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
 	}
-	free(commit->buffer);
-	commit->buffer = NULL;
+	free_commit_buffer(commit);
 }
 
 static void finish_object(struct object *obj,
diff --git a/commit.c b/commit.c
index 21957ee..d7b6836 100644
--- a/commit.c
+++ b/commit.c
@@ -245,6 +245,19 @@ int unregister_shallow(const unsigned char *sha1)
 	return 0;
 }
 
+void free_commit_buffer(struct commit *commit)
+{
+	free(commit->buffer);
+	commit->buffer = NULL;
+}
+
+const void *detach_commit_buffer(struct commit *commit)
+{
+	void *ret = commit->buffer;
+	commit->buffer = NULL;
+	return ret;
+}
+
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size)
 {
 	const char *tail = buffer;
diff --git a/commit.h b/commit.h
index de57df9..daccf46 100644
--- a/commit.h
+++ b/commit.h
@@ -51,6 +51,17 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 int parse_commit(struct commit *item);
 void parse_commit_or_die(struct commit *item);
 
+/*
+ * Free any cached object buffer associated with the commit.
+ */
+void free_commit_buffer(struct commit *);
+
+/*
+ * Disassociate any cached object buffer from the commit, but do not free it.
+ * The buffer (or NULL, if none) is returned.
+ */
+const void *detach_commit_buffer(struct commit *);
+
 /* Find beginning and length of commit subject. */
 int find_commit_subject(const char *commit_buffer, const char **subject);
 
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 07/15] provide a helper to set the commit buffer
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (5 preceding siblings ...)
  2014-06-09 18:11 ` [PATCH 06/15] provide a helper to free commit buffer Jeff King
@ 2014-06-09 18:11 ` Jeff King
  2014-06-09 18:11 ` [PATCH 08/15] provide helpers to access " Jeff King
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Right now this is just a one-liner, but abstracting it will
make it easier to change later.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/blame.c | 2 +-
 commit.c        | 7 ++++++-
 commit.h        | 6 ++++++
 object.c        | 2 +-
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 6ce3c6d..0af3a18 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2313,7 +2313,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 		    ident, ident, path,
 		    (!contents_from ? path :
 		     (!strcmp(contents_from, "-") ? "standard input" : contents_from)));
-	commit->buffer = strbuf_detach(&msg, NULL);
+	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
 
 	if (!contents_from || strcmp("-", contents_from)) {
 		struct stat st;
diff --git a/commit.c b/commit.c
index d7b6836..5cc52e0 100644
--- a/commit.c
+++ b/commit.c
@@ -245,6 +245,11 @@ int unregister_shallow(const unsigned char *sha1)
 	return 0;
 }
 
+void set_commit_buffer(struct commit *commit, void *buffer)
+{
+	commit->buffer = buffer;
+}
+
 void free_commit_buffer(struct commit *commit)
 {
 	free(commit->buffer);
@@ -335,7 +340,7 @@ int parse_commit(struct commit *item)
 	}
 	ret = parse_commit_buffer(item, buffer, size);
 	if (save_commit_buffer && !ret) {
-		item->buffer = buffer;
+		set_commit_buffer(item, buffer);
 		return 0;
 	}
 	free(buffer);
diff --git a/commit.h b/commit.h
index daccf46..5cc0bf3 100644
--- a/commit.h
+++ b/commit.h
@@ -52,6 +52,12 @@ int parse_commit(struct commit *item);
 void parse_commit_or_die(struct commit *item);
 
 /*
+ * Associate an object buffer with the commit. The ownership of the
+ * memory is handed over to the commit, and must be free()-able.
+ */
+void set_commit_buffer(struct commit *, void *buffer);
+
+/*
  * Free any cached object buffer associated with the commit.
  */
 void free_commit_buffer(struct commit *);
diff --git a/object.c b/object.c
index 57a0890..44ca657 100644
--- a/object.c
+++ b/object.c
@@ -198,7 +198,7 @@ struct object *parse_object_buffer(const unsigned char *sha1, enum object_type t
 			if (parse_commit_buffer(commit, buffer, size))
 				return NULL;
 			if (!commit->buffer) {
-				commit->buffer = buffer;
+				set_commit_buffer(commit, buffer);
 				*eaten_p = 1;
 			}
 			obj = &commit->object;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 08/15] provide helpers to access the commit buffer
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (6 preceding siblings ...)
  2014-06-09 18:11 ` [PATCH 07/15] provide a helper to set the " Jeff King
@ 2014-06-09 18:11 ` Jeff King
  2014-06-10  8:00   ` Eric Sunshine
  2014-06-09 18:12 ` [PATCH 09/15] use get_cached_commit_buffer where appropriate Jeff King
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Many sites look at commit->buffer to get more detailed
information than what is in the parsed commit struct.
However, we sometimes drop commit->buffer to save memory,
in which case the caller would need to read the object
afresh. Some callers do this (leading to duplicated code),
and others do not (which opens the possibility of a segfault
if somebody else frees the buffer).

Let's provide a pair of helpers, "get" and "unuse", that let
callers easily get the buffer. They will use the cached
buffer when possible, and otherwise load from disk using
read_sha1_file.

Note that we also need to add a "get_cached" variant which
returns NULL when we do not have a cached buffer. At first
glance this seems to defeat the purpose of "get", which is
to always provide a return value. However, some log code
paths actually use the NULL-ness of commit->buffer as a
boolean flag to decide whether to try to printing the
commit. At least for now, we want to continue supporting
that use.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 28 ++++++++++++++++++++++++++++
 commit.h | 21 +++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/commit.c b/commit.c
index 5cc52e0..b9ecae0 100644
--- a/commit.c
+++ b/commit.c
@@ -250,6 +250,34 @@ void set_commit_buffer(struct commit *commit, void *buffer)
 	commit->buffer = buffer;
 }
 
+const void *get_cached_commit_buffer(const struct commit *commit)
+{
+	return commit->buffer;
+}
+
+const void *get_commit_buffer(const struct commit *commit)
+{
+	const void *ret = get_cached_commit_buffer(commit);
+	if (!ret) {
+		enum object_type type;
+		unsigned long size;
+		ret = read_sha1_file(commit->object.sha1, &type, &size);
+		if (!ret)
+			die("cannot read commit object %s",
+			    sha1_to_hex(commit->object.sha1));
+		if (type != OBJ_COMMIT)
+			die("expected commit for %s, got %s",
+			    sha1_to_hex(commit->object.sha1), typename(type));
+	}
+	return ret;
+}
+
+void unuse_commit_buffer(const struct commit *commit, const void *buffer)
+{
+	if (commit->buffer != buffer)
+		free((void *)buffer);
+}
+
 void free_commit_buffer(struct commit *commit)
 {
 	free(commit->buffer);
diff --git a/commit.h b/commit.h
index 5cc0bf3..67caf92 100644
--- a/commit.h
+++ b/commit.h
@@ -58,6 +58,27 @@ void parse_commit_or_die(struct commit *item);
 void set_commit_buffer(struct commit *, void *buffer);
 
 /*
+ * Get any cached object buffer associated with the commit. Returns NULL
+ * if none. The resulting memory should not be freed.
+ */
+const void *get_cached_commit_buffer(const struct commit *);
+
+/*
+ * Get the commit's object contents, either from cache or by reading the object
+ * from disk. The resulting memory should not be modified, and must be given
+ * to unuse_commit_buffer when the caller is done.
+ */
+const void *get_commit_buffer(const struct commit *);
+
+/*
+ * Tell the commit subsytem that we are done with a particular commit buffer.
+ * The commit and buffer should be the input and return value, respectively,
+ * from an earlier call to get_commit_buffer.  The buffer may or may not be
+ * freed by this call; callers should not access the memory afterwards.
+ */
+void unuse_commit_buffer(const struct commit *, const void *buffer);
+
+/*
  * Free any cached object buffer associated with the commit.
  */
 void free_commit_buffer(struct commit *);
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 09/15] use get_cached_commit_buffer where appropriate
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (7 preceding siblings ...)
  2014-06-09 18:11 ` [PATCH 08/15] provide helpers to access " Jeff King
@ 2014-06-09 18:12 ` Jeff King
  2014-06-09 18:12 ` [PATCH 10/15] use get_commit_buffer to avoid duplicate code Jeff King
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Some call sites check commit->buffer to see whether we have
a cached buffer, and if so, do some work with it. In the
long run we may want to switch these code paths to make
their decision on a different boolean flag (because checking
the cache may get a little more expensive in the future).
But for now, we can easily support them by converting the
calls to use get_cached_commit_buffer.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c | 2 +-
 log-tree.c         | 2 +-
 object.c           | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index e012ebe..3fcbf21 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -106,7 +106,7 @@ static void show_commit(struct commit *commit, void *data)
 	else
 		putchar('\n');
 
-	if (revs->verbose_header && commit->buffer) {
+	if (revs->verbose_header && get_cached_commit_buffer(commit)) {
 		struct strbuf buf = STRBUF_INIT;
 		struct pretty_print_context ctx = {0};
 		ctx.abbrev = revs->abbrev;
diff --git a/log-tree.c b/log-tree.c
index cf2f86c..e9ef8ab 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -588,7 +588,7 @@ void show_log(struct rev_info *opt)
 		show_mergetag(opt, commit);
 	}
 
-	if (!commit->buffer)
+	if (!get_cached_commit_buffer(commit))
 		return;
 
 	if (opt->show_notes) {
diff --git a/object.c b/object.c
index 44ca657..67b6e35 100644
--- a/object.c
+++ b/object.c
@@ -197,7 +197,7 @@ struct object *parse_object_buffer(const unsigned char *sha1, enum object_type t
 		if (commit) {
 			if (parse_commit_buffer(commit, buffer, size))
 				return NULL;
-			if (!commit->buffer) {
+			if (!get_cached_commit_buffer(commit)) {
 				set_commit_buffer(commit, buffer);
 				*eaten_p = 1;
 			}
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 10/15] use get_commit_buffer to avoid duplicate code
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (8 preceding siblings ...)
  2014-06-09 18:12 ` [PATCH 09/15] use get_cached_commit_buffer where appropriate Jeff King
@ 2014-06-09 18:12 ` Jeff King
  2014-06-10  8:01   ` Eric Sunshine
  2014-06-09 18:13 ` [PATCH 11/15] convert logmsg_reencode to get_commit_buffer Jeff King
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

For both of these sites, we already do the "fallback to
read_sha1_file" trick. But we can shorten the code by just
using get_commit_buffer.

Note that the error cases are slightly different when
read_sha1_file fails. get_commit_buffer will die() if the
object cannot be loaded, or is a non-commit.

For get_sha1_oneline, this will almost certainly never
happen, as we will have just called parse_object (and if it
does, it's probably worth complaining about).

For record_author_date, the new behavior is probably better;
we notify the user of the error instead of silently ignoring
it. And because it's used only for sorting by author-date,
somebody examining a corrupt can fallback to the regular
traversal order.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c    | 16 +++-------------
 sha1_name.c | 18 ++++--------------
 2 files changed, 7 insertions(+), 27 deletions(-)

diff --git a/commit.c b/commit.c
index b9ecae0..d00b818 100644
--- a/commit.c
+++ b/commit.c
@@ -583,22 +583,12 @@ static void record_author_date(struct author_date_slab *author_date,
 			       struct commit *commit)
 {
 	const char *buf, *line_end, *ident_line;
-	char *buffer = NULL;
+	const char *buffer = get_commit_buffer(commit);
 	struct ident_split ident;
 	char *date_end;
 	unsigned long date;
 
-	if (!commit->buffer) {
-		unsigned long size;
-		enum object_type type;
-		buffer = read_sha1_file(commit->object.sha1, &type, &size);
-		if (!buffer)
-			return;
-	}
-
-	for (buf = commit->buffer ? commit->buffer : buffer;
-	     buf;
-	     buf = line_end + 1) {
+	for (buf = buffer; buf; buf = line_end + 1) {
 		line_end = strchrnul(buf, '\n');
 		ident_line = skip_prefix(buf, "author ");
 		if (!ident_line) {
@@ -619,7 +609,7 @@ static void record_author_date(struct author_date_slab *author_date,
 	*(author_date_slab_at(author_date, commit)) = date;
 
 fail_exit:
-	free(buffer);
+	unuse_commit_buffer(commit, buffer);
 }
 
 static int compare_commits_by_author_date(const void *a_, const void *b_,
diff --git a/sha1_name.c b/sha1_name.c
index 2b6322f..0a65d23 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -862,27 +862,17 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1,
 		commit_list_insert(l->item, &backup);
 	}
 	while (list) {
-		char *p, *to_free = NULL;
+		const char *p, *buf;
 		struct commit *commit;
-		enum object_type type;
-		unsigned long size;
 		int matches;
 
 		commit = pop_most_recent_commit(&list, ONELINE_SEEN);
 		if (!parse_object(commit->object.sha1))
 			continue;
-		if (commit->buffer)
-			p = commit->buffer;
-		else {
-			p = read_sha1_file(commit->object.sha1, &type, &size);
-			if (!p)
-				continue;
-			to_free = p;
-		}
-
-		p = strstr(p, "\n\n");
+		buf = get_commit_buffer(commit);
+		p = strstr(buf, "\n\n");
 		matches = p && !regexec(&regex, p + 2, 0, NULL, 0);
-		free(to_free);
+		unuse_commit_buffer(commit, buf);
 
 		if (matches) {
 			hashcpy(sha1, commit->object.sha1);
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 11/15] convert logmsg_reencode to get_commit_buffer
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (9 preceding siblings ...)
  2014-06-09 18:12 ` [PATCH 10/15] use get_commit_buffer to avoid duplicate code Jeff King
@ 2014-06-09 18:13 ` Jeff King
  2014-06-09 18:13 ` [PATCH 12/15] use get_commit_buffer everywhere Jeff King
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:13 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Like the callsites in the previous commit, logmsg_reencode
already falls back to read_sha1_file when necessary.
However, I split its conversion out into its own commit
because it's a bit more complex.

We return either:

  1. The original commit->buffer

  2. A newly allocated buffer from read_sha1_file

  3. A reencoded buffer (based on either 1 or 2 above).

while trying to do as few extra reads/allocations as
possible. Callers currently free the result with
logmsg_free, but we can simplify this by pointing them
straight to unuse_commit_buffer. This is a slight layering
violation, in that we may be passing a buffer from (3).
However, since the end result is to free() anything except
(1), a behavior which is unlikely to change, and because
this makes the interface much simpler, it's a reasonable
bending of the rules.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/blame.c |  4 ++--
 builtin/reset.c |  2 +-
 commit.h        |  1 -
 pretty.c        | 40 +++++++++++-----------------------------
 revision.c      |  2 +-
 sequencer.c     |  2 +-
 6 files changed, 16 insertions(+), 35 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 0af3a18..cde19eb 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1666,7 +1666,7 @@ static void get_commit_info(struct commit *commit,
 		    &ret->author_time, &ret->author_tz);
 
 	if (!detailed) {
-		logmsg_free(message, commit);
+		unuse_commit_buffer(commit, message);
 		return;
 	}
 
@@ -1680,7 +1680,7 @@ static void get_commit_info(struct commit *commit,
 	else
 		strbuf_addf(&ret->summary, "(%s)", sha1_to_hex(commit->object.sha1));
 
-	logmsg_free(message, commit);
+	unuse_commit_buffer(commit, message);
 }
 
 /*
diff --git a/builtin/reset.c b/builtin/reset.c
index 7ebee07..850d532 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -109,7 +109,7 @@ static void print_new_head_line(struct commit *commit)
 	}
 	else
 		printf("\n");
-	logmsg_free(msg, commit);
+	unuse_commit_buffer(commit, msg);
 }
 
 static void update_index_from_diff(struct diff_queue_struct *q,
diff --git a/commit.h b/commit.h
index 67caf92..27e4fd7 100644
--- a/commit.h
+++ b/commit.h
@@ -156,7 +156,6 @@ struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */
 extern const char *logmsg_reencode(const struct commit *commit,
 				   char **commit_encoding,
 				   const char *output_encoding);
-extern void logmsg_free(const char *msg, const struct commit *commit);
 extern void get_commit_format(const char *arg, struct rev_info *);
 extern const char *format_subject(struct strbuf *sb, const char *msg,
 				  const char *line_separator);
diff --git a/pretty.c b/pretty.c
index d152de2..8fd60cd 100644
--- a/pretty.c
+++ b/pretty.c
@@ -613,22 +613,9 @@ const char *logmsg_reencode(const struct commit *commit,
 	static const char *utf8 = "UTF-8";
 	const char *use_encoding;
 	char *encoding;
-	char *msg = commit->buffer;
+	const char *msg = get_commit_buffer(commit);
 	char *out;
 
-	if (!msg) {
-		enum object_type type;
-		unsigned long size;
-
-		msg = read_sha1_file(commit->object.sha1, &type, &size);
-		if (!msg)
-			die("Cannot read commit object %s",
-			    sha1_to_hex(commit->object.sha1));
-		if (type != OBJ_COMMIT)
-			die("Expected commit for '%s', got %s",
-			    sha1_to_hex(commit->object.sha1), typename(type));
-	}
-
 	if (!output_encoding || !*output_encoding) {
 		if (commit_encoding)
 			*commit_encoding =
@@ -652,12 +639,13 @@ const char *logmsg_reencode(const struct commit *commit,
 		 * Otherwise, we still want to munge the encoding header in the
 		 * result, which will be done by modifying the buffer. If we
 		 * are using a fresh copy, we can reuse it. But if we are using
-		 * the cached copy from commit->buffer, we need to duplicate it
-		 * to avoid munging commit->buffer.
+		 * the cached copy from get_commit_buffer, we need to duplicate it
+		 * to avoid munging the cached copy.
 		 */
-		out = msg;
-		if (out == commit->buffer)
-			out = xstrdup(out);
+		if (msg == get_cached_commit_buffer(commit))
+			out = xstrdup(msg);
+		else
+			out = (char *)msg;
 	}
 	else {
 		/*
@@ -667,8 +655,8 @@ const char *logmsg_reencode(const struct commit *commit,
 		 * copy, we can free it.
 		 */
 		out = reencode_string(msg, output_encoding, use_encoding);
-		if (out && msg != commit->buffer)
-			free(msg);
+		if (out)
+			unuse_commit_buffer(commit, msg);
 	}
 
 	/*
@@ -687,12 +675,6 @@ const char *logmsg_reencode(const struct commit *commit,
 	return out ? out : msg;
 }
 
-void logmsg_free(const char *msg, const struct commit *commit)
-{
-	if (msg != commit->buffer)
-		free((void *)msg);
-}
-
 static int mailmap_name(const char **email, size_t *email_len,
 			const char **name, size_t *name_len)
 {
@@ -1531,7 +1513,7 @@ void format_commit_message(const struct commit *commit,
 	}
 
 	free(context.commit_encoding);
-	logmsg_free(context.message, commit);
+	unuse_commit_buffer(commit, context.message);
 	free(context.signature_check.gpg_output);
 	free(context.signature_check.signer);
 }
@@ -1767,7 +1749,7 @@ void pretty_print_commit(struct pretty_print_context *pp,
 	if (pp->fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
 		strbuf_addch(sb, '\n');
 
-	logmsg_free(reencoded, commit);
+	unuse_commit_buffer(commit, reencoded);
 }
 
 void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
diff --git a/revision.c b/revision.c
index 47e5b7a..ad5fdf4 100644
--- a/revision.c
+++ b/revision.c
@@ -2843,7 +2843,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		retval = grep_buffer(&opt->grep_filter,
 				     (char *)message, strlen(message));
 	strbuf_release(&buf);
-	logmsg_free(message, commit);
+	unuse_commit_buffer(commit, message);
 	return retval;
 }
 
diff --git a/sequencer.c b/sequencer.c
index 3fcab4d..4632f7d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -154,7 +154,7 @@ static int get_message(struct commit *commit, struct commit_message *out)
 static void free_message(struct commit *commit, struct commit_message *msg)
 {
 	free(msg->parent_label);
-	logmsg_free(msg->message, commit);
+	unuse_commit_buffer(commit, msg->message);
 }
 
 static void write_cherry_pick_head(struct commit *commit, const char *pseudoref)
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 12/15] use get_commit_buffer everywhere
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (10 preceding siblings ...)
  2014-06-09 18:13 ` [PATCH 11/15] convert logmsg_reencode to get_commit_buffer Jeff King
@ 2014-06-09 18:13 ` Jeff King
  2014-06-09 22:40   ` Junio C Hamano
  2014-06-09 18:15 ` [PATCH 13/15] commit-slab: provide a static initializer Jeff King
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:13 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Each of these sites assumes that commit->buffer is valid.
Since they would segfault if this was not the case, they are
likely to be correct in practice. However, we can
future-proof them by using get_commit_buffer.

And as a side effect, we abstract away the final bare uses
of commit->buffer.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c   |  5 ++++-
 builtin/fmt-merge-msg.c |  5 ++++-
 builtin/log.c           |  7 +++++--
 fsck.c                  | 13 +++++++++++--
 merge-recursive.c       |  4 +++-
 notes-merge.c           |  4 +++-
 sequencer.c             |  4 +++-
 7 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index b8d8a3a..7ee5e08 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -279,6 +279,7 @@ static const char *find_encoding(const char *begin, const char *end)
 static void handle_commit(struct commit *commit, struct rev_info *rev)
 {
 	int saved_output_format = rev->diffopt.output_format;
+	const char *commit_buffer;
 	const char *author, *author_end, *committer, *committer_end;
 	const char *encoding, *message;
 	char *reencoded = NULL;
@@ -288,7 +289,8 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 	rev->diffopt.output_format = DIFF_FORMAT_CALLBACK;
 
 	parse_commit_or_die(commit);
-	author = strstr(commit->buffer, "\nauthor ");
+	commit_buffer = get_commit_buffer(commit);
+	author = strstr(commit_buffer, "\nauthor ");
 	if (!author)
 		die ("Could not find author in commit %s",
 		     sha1_to_hex(commit->object.sha1));
@@ -335,6 +337,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 			  ? strlen(message) : 0),
 	       reencoded ? reencoded : message ? message : "");
 	free(reencoded);
+	unuse_commit_buffer(commit, commit_buffer);
 
 	for (i = 0, p = commit->parents; p; p = p->next) {
 		int mark = get_object_mark(&p->item->object);
diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 3906eda..01f6d59 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -230,12 +230,14 @@ static void add_branch_desc(struct strbuf *out, const char *name)
 static void record_person(int which, struct string_list *people,
 			  struct commit *commit)
 {
+	const char *buffer;
 	char *name_buf, *name, *name_end;
 	struct string_list_item *elem;
 	const char *field;
 
 	field = (which == 'a') ? "\nauthor " : "\ncommitter ";
-	name = strstr(commit->buffer, field);
+	buffer = get_commit_buffer(commit);
+	name = strstr(buffer, field);
 	if (!name)
 		return;
 	name += strlen(field);
@@ -247,6 +249,7 @@ static void record_person(int which, struct string_list *people,
 	if (name_end < name)
 		return;
 	name_buf = xmemdupz(name, name_end - name + 1);
+	unuse_commit_buffer(commit, buffer);
 
 	elem = string_list_lookup(people, name_buf);
 	if (!elem) {
diff --git a/builtin/log.c b/builtin/log.c
index 226f8f2..2c74260 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -918,9 +918,12 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 	log_write_email_headers(rev, head, &pp.subject, &pp.after_subject,
 				&need_8bit_cte);
 
-	for (i = 0; !need_8bit_cte && i < nr; i++)
-		if (has_non_ascii(list[i]->buffer))
+	for (i = 0; !need_8bit_cte && i < nr; i++) {
+		const char *buf = get_commit_buffer(list[i]);
+		if (has_non_ascii(buf))
 			need_8bit_cte = 1;
+		unuse_commit_buffer(list[i], buf);
+	}
 
 	if (!branch_name)
 		branch_name = find_branch_name(rev);
diff --git a/fsck.c b/fsck.c
index abed62b..8223780 100644
--- a/fsck.c
+++ b/fsck.c
@@ -276,9 +276,10 @@ static int fsck_ident(const char **ident, struct object *obj, fsck_error error_f
 	return 0;
 }
 
-static int fsck_commit(struct commit *commit, fsck_error error_func)
+static int fsck_commit_buffer(struct commit *commit, const char *buffer,
+			      fsck_error error_func)
 {
-	const char *buffer = commit->buffer, *tmp;
+	const char *tmp;
 	unsigned char tree_sha1[20], sha1[20];
 	struct commit_graft *graft;
 	int parents = 0;
@@ -336,6 +337,14 @@ static int fsck_commit(struct commit *commit, fsck_error error_func)
 	return 0;
 }
 
+static int fsck_commit(struct commit *commit, fsck_error error_func)
+{
+	const char *buffer = get_commit_buffer(commit);
+	int ret = fsck_commit_buffer(commit, buffer, error_func);
+	unuse_commit_buffer(commit, buffer);
+	return ret;
+}
+
 static int fsck_tag(struct tag *tag, fsck_error error_func)
 {
 	struct object *tagged = tag->tagged;
diff --git a/merge-recursive.c b/merge-recursive.c
index 2b37d42..0dd6039 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -190,9 +190,11 @@ static void output_commit_title(struct merge_options *o, struct commit *commit)
 			printf(_("(bad commit)\n"));
 		else {
 			const char *title;
-			int len = find_commit_subject(commit->buffer, &title);
+			const char *msg = get_commit_buffer(commit);
+			int len = find_commit_subject(msg, &title);
 			if (len)
 				printf("%.*s\n", len, title);
+			unuse_commit_buffer(commit, msg);
 		}
 	}
 }
diff --git a/notes-merge.c b/notes-merge.c
index 94a1a8a..7885ab2 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -671,7 +671,8 @@ int notes_merge_commit(struct notes_merge_options *o,
 	DIR *dir;
 	struct dirent *e;
 	struct strbuf path = STRBUF_INIT;
-	char *msg = strstr(partial_commit->buffer, "\n\n");
+	const char *buffer = get_commit_buffer(partial_commit);
+	const char *msg = strstr(buffer, "\n\n");
 	struct strbuf sb_msg = STRBUF_INIT;
 	int baselen;
 
@@ -720,6 +721,7 @@ int notes_merge_commit(struct notes_merge_options *o,
 	}
 
 	strbuf_attach(&sb_msg, msg, strlen(msg), strlen(msg) + 1);
+	unuse_commit_buffer(partial_commit, buffer);
 	create_notes_commit(partial_tree, partial_commit->parents, &sb_msg,
 			    result_sha1);
 	if (o->verbosity >= 4)
diff --git a/sequencer.c b/sequencer.c
index 4632f7d..5609ab3 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -666,10 +666,12 @@ static int format_todo(struct strbuf *buf, struct commit_list *todo_list,
 	int subject_len;
 
 	for (cur = todo_list; cur; cur = cur->next) {
+		const char *commit_buffer = get_commit_buffer(cur->item);
 		sha1_abbrev = find_unique_abbrev(cur->item->object.sha1, DEFAULT_ABBREV);
-		subject_len = find_commit_subject(cur->item->buffer, &subject);
+		subject_len = find_commit_subject(commit_buffer, &subject);
 		strbuf_addf(buf, "%s %s %.*s\n", action_str, sha1_abbrev,
 			subject_len, subject);
+		unuse_commit_buffer(cur->item, commit_buffer);
 	}
 	return 0;
 }
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 13/15] commit-slab: provide a static initializer
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (11 preceding siblings ...)
  2014-06-09 18:13 ` [PATCH 12/15] use get_commit_buffer everywhere Jeff King
@ 2014-06-09 18:15 ` Jeff King
  2014-06-09 18:15 ` [PATCH 14/15] commit: convert commit->buffer to a slab Jeff King
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:15 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Callers currently must use init_foo_slab() at runtime before
accessing a slab. For global slabs, it's much nicer if we
can initialize them in BSS, so that each user does not have
to add code to check-and-initialize.

Signed-off-by: Jeff King <peff@peff.net>
---
The calling convention is kind of weird. It goes:

  struct foo_slab foo = COMMIT_SLAB_INIT(1, foo);

We need to know the size of the slab's element-type in the initializer
(and we grab it from sizeof(**foo.slab).  Another option would be:

  struct foo_slab foo = COMMIT_SLAB_INIT(1, void *);

which is simpler, but requires the user to repeat the type of the slab
(and if they get it wrong, bad things happen).

Yet another option would be to simply let a zero-initialized slab be
acceptable, and have slab_at check whether "stride" is initialized (and
if not, init to 1).

 commit-slab.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/commit-slab.h b/commit-slab.h
index cc114b5..375c9c7 100644
--- a/commit-slab.h
+++ b/commit-slab.h
@@ -117,4 +117,16 @@ static int stat_ ##slabname## realloc
  * catch because GCC silently parses it by default.
  */
 
+/*
+ * Statically initialize a commit slab named "var". Note that this
+ * evaluates "stride" multiple times! Example:
+ *
+ *   struct indegree indegrees = COMMIT_SLAB_INIT(1, indegrees);
+ *
+ */
+#define COMMIT_SLAB_INIT(stride, var) { \
+	COMMIT_SLAB_SIZE / sizeof(**((var).slab)) / (stride), \
+	(stride), 0, NULL \
+}
+
 #endif /* COMMIT_SLAB_H */
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 14/15] commit: convert commit->buffer to a slab
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (12 preceding siblings ...)
  2014-06-09 18:15 ` [PATCH 13/15] commit-slab: provide a static initializer Jeff King
@ 2014-06-09 18:15 ` Jeff King
  2014-06-09 18:15 ` [PATCH 15/15] commit: record buffer length in cache Jeff King
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
  15 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:15 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

This will make it easier to manage the buffer cache
independently of the "struct commit" objects. It also
shrinks "struct commit" by one pointer, which may be
helpful.

Unfortunately it does not reduce the max memory size of
something like "rev-list", because rev-list uses
get_cached_commit_buffer() to decide not to show each
commit's output (and due to the design of slab_at, accessing
the slab requires us to extend it, allocating exactly the
same number of buffer pointers we dropped from the commit
structs).

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 20 +++++++++++++-------
 commit.h |  1 -
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/commit.c b/commit.c
index d00b818..1b344bd 100644
--- a/commit.c
+++ b/commit.c
@@ -245,14 +245,17 @@ int unregister_shallow(const unsigned char *sha1)
 	return 0;
 }
 
+define_commit_slab(buffer_slab, void *);
+static struct buffer_slab buffer_slab = COMMIT_SLAB_INIT(1, buffer_slab);
+
 void set_commit_buffer(struct commit *commit, void *buffer)
 {
-	commit->buffer = buffer;
+	*buffer_slab_at(&buffer_slab, commit) = buffer;
 }
 
 const void *get_cached_commit_buffer(const struct commit *commit)
 {
-	return commit->buffer;
+	return *buffer_slab_at(&buffer_slab, commit);
 }
 
 const void *get_commit_buffer(const struct commit *commit)
@@ -274,20 +277,23 @@ const void *get_commit_buffer(const struct commit *commit)
 
 void unuse_commit_buffer(const struct commit *commit, const void *buffer)
 {
-	if (commit->buffer != buffer)
+	void *cached = *buffer_slab_at(&buffer_slab, commit);
+	if (cached != buffer)
 		free((void *)buffer);
 }
 
 void free_commit_buffer(struct commit *commit)
 {
-	free(commit->buffer);
-	commit->buffer = NULL;
+	void **b = buffer_slab_at(&buffer_slab, commit);
+	free(*b);
+	*b = NULL;
 }
 
 const void *detach_commit_buffer(struct commit *commit)
 {
-	void *ret = commit->buffer;
-	commit->buffer = NULL;
+	void **b = buffer_slab_at(&buffer_slab, commit);
+	void *ret = *b;
+	*b = NULL;
 	return ret;
 }
 
diff --git a/commit.h b/commit.h
index 27e4fd7..02f894b 100644
--- a/commit.h
+++ b/commit.h
@@ -20,7 +20,6 @@ struct commit {
 	unsigned long date;
 	struct commit_list *parents;
 	struct tree *tree;
-	char *buffer;
 };
 
 extern int save_commit_buffer;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 15/15] commit: record buffer length in cache
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (13 preceding siblings ...)
  2014-06-09 18:15 ` [PATCH 14/15] commit: convert commit->buffer to a slab Jeff King
@ 2014-06-09 18:15 ` Jeff King
  2014-06-10  5:12   ` Christian Couder
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
  15 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-09 18:15 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/blame.c         |  2 +-
 builtin/fast-export.c   |  2 +-
 builtin/fmt-merge-msg.c |  2 +-
 builtin/index-pack.c    |  2 +-
 builtin/log.c           |  2 +-
 builtin/rev-list.c      |  2 +-
 commit.c                | 54 ++++++++++++++++++++++++++++++++-----------------
 commit.h                |  8 ++++----
 fsck.c                  |  2 +-
 log-tree.c              |  2 +-
 merge-recursive.c       |  2 +-
 notes-merge.c           |  2 +-
 object.c                |  4 ++--
 pretty.c                |  4 ++--
 sequencer.c             |  2 +-
 sha1_name.c             |  2 +-
 16 files changed, 56 insertions(+), 38 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index cde19eb..58a2c54 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2313,7 +2313,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 		    ident, ident, path,
 		    (!contents_from ? path :
 		     (!strcmp(contents_from, "-") ? "standard input" : contents_from)));
-	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
+	set_commit_buffer(commit, msg.buf, msg.len);
 
 	if (!contents_from || strcmp("-", contents_from)) {
 		struct stat st;
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 7ee5e08..05d161f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -289,7 +289,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 	rev->diffopt.output_format = DIFF_FORMAT_CALLBACK;
 
 	parse_commit_or_die(commit);
-	commit_buffer = get_commit_buffer(commit);
+	commit_buffer = get_commit_buffer(commit, NULL);
 	author = strstr(commit_buffer, "\nauthor ");
 	if (!author)
 		die ("Could not find author in commit %s",
diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 01f6d59..ef8b254 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -236,7 +236,7 @@ static void record_person(int which, struct string_list *people,
 	const char *field;
 
 	field = (which == 'a') ? "\nauthor " : "\ncommitter ";
-	buffer = get_commit_buffer(commit);
+	buffer = get_commit_buffer(commit, NULL);
 	name = strstr(buffer, field);
 	if (!name)
 		return;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 995df39..25e3e9b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -786,7 +786,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			}
 			if (obj->type == OBJ_COMMIT) {
 				struct commit *commit = (struct commit *) obj;
-				detach_commit_buffer(commit);
+				detach_commit_buffer(commit, NULL);
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
diff --git a/builtin/log.c b/builtin/log.c
index 2c74260..c599eac 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -919,7 +919,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 				&need_8bit_cte);
 
 	for (i = 0; !need_8bit_cte && i < nr; i++) {
-		const char *buf = get_commit_buffer(list[i]);
+		const char *buf = get_commit_buffer(list[i], NULL);
 		if (has_non_ascii(buf))
 			need_8bit_cte = 1;
 		unuse_commit_buffer(list[i], buf);
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 3fcbf21..ff84a82 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -106,7 +106,7 @@ static void show_commit(struct commit *commit, void *data)
 	else
 		putchar('\n');
 
-	if (revs->verbose_header && get_cached_commit_buffer(commit)) {
+	if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) {
 		struct strbuf buf = STRBUF_INIT;
 		struct pretty_print_context ctx = {0};
 		ctx.abbrev = revs->abbrev;
diff --git a/commit.c b/commit.c
index 1b344bd..b833b9f 100644
--- a/commit.c
+++ b/commit.c
@@ -245,22 +245,31 @@ int unregister_shallow(const unsigned char *sha1)
 	return 0;
 }
 
-define_commit_slab(buffer_slab, void *);
+struct commit_buffer {
+	void *buffer;
+	unsigned long size;
+};
+define_commit_slab(buffer_slab, struct commit_buffer);
 static struct buffer_slab buffer_slab = COMMIT_SLAB_INIT(1, buffer_slab);
 
-void set_commit_buffer(struct commit *commit, void *buffer)
+void set_commit_buffer(struct commit *commit, void *buffer, unsigned long size)
 {
-	*buffer_slab_at(&buffer_slab, commit) = buffer;
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	v->buffer = buffer;
+	v->size = size;
 }
 
-const void *get_cached_commit_buffer(const struct commit *commit)
+const void *get_cached_commit_buffer(const struct commit *commit, unsigned long *sizep)
 {
-	return *buffer_slab_at(&buffer_slab, commit);
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	if (sizep)
+		*sizep = v->size;
+	return v->buffer;
 }
 
-const void *get_commit_buffer(const struct commit *commit)
+const void *get_commit_buffer(const struct commit *commit, unsigned long *sizep)
 {
-	const void *ret = get_cached_commit_buffer(commit);
+	const void *ret = get_cached_commit_buffer(commit, sizep);
 	if (!ret) {
 		enum object_type type;
 		unsigned long size;
@@ -271,29 +280,38 @@ const void *get_commit_buffer(const struct commit *commit)
 		if (type != OBJ_COMMIT)
 			die("expected commit for %s, got %s",
 			    sha1_to_hex(commit->object.sha1), typename(type));
+		if (sizep)
+			*sizep = size;
 	}
 	return ret;
 }
 
 void unuse_commit_buffer(const struct commit *commit, const void *buffer)
 {
-	void *cached = *buffer_slab_at(&buffer_slab, commit);
-	if (cached != buffer)
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	if (v->buffer != buffer)
 		free((void *)buffer);
 }
 
 void free_commit_buffer(struct commit *commit)
 {
-	void **b = buffer_slab_at(&buffer_slab, commit);
-	free(*b);
-	*b = NULL;
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	free(v->buffer);
+	v->buffer = NULL;
+	v->size = 0;
 }
 
-const void *detach_commit_buffer(struct commit *commit)
+const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
 {
-	void **b = buffer_slab_at(&buffer_slab, commit);
-	void *ret = *b;
-	*b = NULL;
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	void *ret;
+
+	ret = v->buffer;
+	if (sizep)
+		*sizep = v->size;
+
+	v->buffer = NULL;
+	v->size = 0;
 	return ret;
 }
 
@@ -374,7 +392,7 @@ int parse_commit(struct commit *item)
 	}
 	ret = parse_commit_buffer(item, buffer, size);
 	if (save_commit_buffer && !ret) {
-		set_commit_buffer(item, buffer);
+		set_commit_buffer(item, buffer, size);
 		return 0;
 	}
 	free(buffer);
@@ -589,7 +607,7 @@ static void record_author_date(struct author_date_slab *author_date,
 			       struct commit *commit)
 {
 	const char *buf, *line_end, *ident_line;
-	const char *buffer = get_commit_buffer(commit);
+	const char *buffer = get_commit_buffer(commit, NULL);
 	struct ident_split ident;
 	char *date_end;
 	unsigned long date;
diff --git a/commit.h b/commit.h
index 02f894b..9ac7954 100644
--- a/commit.h
+++ b/commit.h
@@ -54,20 +54,20 @@ void parse_commit_or_die(struct commit *item);
  * Associate an object buffer with the commit. The ownership of the
  * memory is handed over to the commit, and must be free()-able.
  */
-void set_commit_buffer(struct commit *, void *buffer);
+void set_commit_buffer(struct commit *, void *buffer, unsigned long size);
 
 /*
  * Get any cached object buffer associated with the commit. Returns NULL
  * if none. The resulting memory should not be freed.
  */
-const void *get_cached_commit_buffer(const struct commit *);
+const void *get_cached_commit_buffer(const struct commit *, unsigned long *size);
 
 /*
  * Get the commit's object contents, either from cache or by reading the object
  * from disk. The resulting memory should not be modified, and must be given
  * to unuse_commit_buffer when the caller is done.
  */
-const void *get_commit_buffer(const struct commit *);
+const void *get_commit_buffer(const struct commit *, unsigned long *size);
 
 /*
  * Tell the commit subsytem that we are done with a particular commit buffer.
@@ -86,7 +86,7 @@ void free_commit_buffer(struct commit *);
  * Disassociate any cached object buffer from the commit, but do not free it.
  * The buffer (or NULL, if none) is returned.
  */
-const void *detach_commit_buffer(struct commit *);
+const void *detach_commit_buffer(struct commit *, unsigned long *sizep);
 
 /* Find beginning and length of commit subject. */
 int find_commit_subject(const char *commit_buffer, const char **subject);
diff --git a/fsck.c b/fsck.c
index 8223780..a7233c8 100644
--- a/fsck.c
+++ b/fsck.c
@@ -339,7 +339,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 
 static int fsck_commit(struct commit *commit, fsck_error error_func)
 {
-	const char *buffer = get_commit_buffer(commit);
+	const char *buffer = get_commit_buffer(commit, NULL);
 	int ret = fsck_commit_buffer(commit, buffer, error_func);
 	unuse_commit_buffer(commit, buffer);
 	return ret;
diff --git a/log-tree.c b/log-tree.c
index e9ef8ab..4447021 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -588,7 +588,7 @@ void show_log(struct rev_info *opt)
 		show_mergetag(opt, commit);
 	}
 
-	if (!get_cached_commit_buffer(commit))
+	if (!get_cached_commit_buffer(commit, NULL))
 		return;
 
 	if (opt->show_notes) {
diff --git a/merge-recursive.c b/merge-recursive.c
index 0dd6039..d38a3b2 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -190,7 +190,7 @@ static void output_commit_title(struct merge_options *o, struct commit *commit)
 			printf(_("(bad commit)\n"));
 		else {
 			const char *title;
-			const char *msg = get_commit_buffer(commit);
+			const char *msg = get_commit_buffer(commit, NULL);
 			int len = find_commit_subject(msg, &title);
 			if (len)
 				printf("%.*s\n", len, title);
diff --git a/notes-merge.c b/notes-merge.c
index 7885ab2..26360e2 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -671,7 +671,7 @@ int notes_merge_commit(struct notes_merge_options *o,
 	DIR *dir;
 	struct dirent *e;
 	struct strbuf path = STRBUF_INIT;
-	const char *buffer = get_commit_buffer(partial_commit);
+	const char *buffer = get_commit_buffer(partial_commit, NULL);
 	const char *msg = strstr(buffer, "\n\n");
 	struct strbuf sb_msg = STRBUF_INIT;
 	int baselen;
diff --git a/object.c b/object.c
index 67b6e35..9c31e9a 100644
--- a/object.c
+++ b/object.c
@@ -197,8 +197,8 @@ struct object *parse_object_buffer(const unsigned char *sha1, enum object_type t
 		if (commit) {
 			if (parse_commit_buffer(commit, buffer, size))
 				return NULL;
-			if (!get_cached_commit_buffer(commit)) {
-				set_commit_buffer(commit, buffer);
+			if (!get_cached_commit_buffer(commit, NULL)) {
+				set_commit_buffer(commit, buffer, size);
 				*eaten_p = 1;
 			}
 			obj = &commit->object;
diff --git a/pretty.c b/pretty.c
index 8fd60cd..280d7d0 100644
--- a/pretty.c
+++ b/pretty.c
@@ -613,7 +613,7 @@ const char *logmsg_reencode(const struct commit *commit,
 	static const char *utf8 = "UTF-8";
 	const char *use_encoding;
 	char *encoding;
-	const char *msg = get_commit_buffer(commit);
+	const char *msg = get_commit_buffer(commit, NULL);
 	char *out;
 
 	if (!output_encoding || !*output_encoding) {
@@ -642,7 +642,7 @@ const char *logmsg_reencode(const struct commit *commit,
 		 * the cached copy from get_commit_buffer, we need to duplicate it
 		 * to avoid munging the cached copy.
 		 */
-		if (msg == get_cached_commit_buffer(commit))
+		if (msg == get_cached_commit_buffer(commit, NULL))
 			out = xstrdup(msg);
 		else
 			out = (char *)msg;
diff --git a/sequencer.c b/sequencer.c
index 5609ab3..d83f810 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -666,7 +666,7 @@ static int format_todo(struct strbuf *buf, struct commit_list *todo_list,
 	int subject_len;
 
 	for (cur = todo_list; cur; cur = cur->next) {
-		const char *commit_buffer = get_commit_buffer(cur->item);
+		const char *commit_buffer = get_commit_buffer(cur->item, NULL);
 		sha1_abbrev = find_unique_abbrev(cur->item->object.sha1, DEFAULT_ABBREV);
 		subject_len = find_commit_subject(commit_buffer, &subject);
 		strbuf_addf(buf, "%s %s %.*s\n", action_str, sha1_abbrev,
diff --git a/sha1_name.c b/sha1_name.c
index 0a65d23..c2c938c 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -869,7 +869,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1,
 		commit = pop_most_recent_commit(&list, ONELINE_SEEN);
 		if (!parse_object(commit->object.sha1))
 			continue;
-		buf = get_commit_buffer(commit);
+		buf = get_commit_buffer(commit, NULL);
 		p = strstr(buf, "\n\n");
 		matches = p && !regexec(&regex, p + 2, 0, NULL, 0);
 		unuse_commit_buffer(commit, buf);
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/15] use get_commit_buffer everywhere
  2014-06-09 18:13 ` [PATCH 12/15] use get_commit_buffer everywhere Jeff King
@ 2014-06-09 22:40   ` Junio C Hamano
  2014-06-10  0:02     ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2014-06-09 22:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

Jeff King <peff@peff.net> writes:

> diff --git a/notes-merge.c b/notes-merge.c
> index 94a1a8a..7885ab2 100644
> --- a/notes-merge.c
> +++ b/notes-merge.c
> @@ -671,7 +671,8 @@ int notes_merge_commit(struct notes_merge_options *o,
>  	DIR *dir;
>  	struct dirent *e;
>  	struct strbuf path = STRBUF_INIT;
> -	char *msg = strstr(partial_commit->buffer, "\n\n");
> +	const char *buffer = get_commit_buffer(partial_commit);
> +	const char *msg = strstr(buffer, "\n\n");

This tightening causes...

>  	struct strbuf sb_msg = STRBUF_INIT;
>  	int baselen;
>  
> @@ -720,6 +721,7 @@ int notes_merge_commit(struct notes_merge_options *o,
>  	}
>  
>  	strbuf_attach(&sb_msg, msg, strlen(msg), strlen(msg) + 1);

...a new error here:

notes-merge.c:723:2: error: passing argument 2 of 'strbuf_attach'
discards 'const' qualifier from pointer target type [-Werror]
strbuf.h:19:13: note: expected 'void *' but argument is of type
'const char *'

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/15] use get_commit_buffer everywhere
  2014-06-09 22:40   ` Junio C Hamano
@ 2014-06-10  0:02     ` Jeff King
  2014-06-10  0:22       ` Jeff King
  2014-06-10 16:06       ` Junio C Hamano
  0 siblings, 2 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10  0:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

On Mon, Jun 09, 2014 at 03:40:57PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > diff --git a/notes-merge.c b/notes-merge.c
> > index 94a1a8a..7885ab2 100644
> > --- a/notes-merge.c
> > +++ b/notes-merge.c
> > @@ -671,7 +671,8 @@ int notes_merge_commit(struct notes_merge_options *o,
> >  	DIR *dir;
> >  	struct dirent *e;
> >  	struct strbuf path = STRBUF_INIT;
> > -	char *msg = strstr(partial_commit->buffer, "\n\n");
> > +	const char *buffer = get_commit_buffer(partial_commit);
> > +	const char *msg = strstr(buffer, "\n\n");
> 
> This tightening causes...
> 
> >  	struct strbuf sb_msg = STRBUF_INIT;
> >  	int baselen;
> >  
> > @@ -720,6 +721,7 @@ int notes_merge_commit(struct notes_merge_options *o,
> >  	}
> >  
> >  	strbuf_attach(&sb_msg, msg, strlen(msg), strlen(msg) + 1);
> 
> ...a new error here:
> 
> notes-merge.c:723:2: error: passing argument 2 of 'strbuf_attach'
> discards 'const' qualifier from pointer target type [-Werror]
> strbuf.h:19:13: note: expected 'void *' but argument is of type
> 'const char *'

That's weird. I compile with -Wall -Werror, and my gcc doesn't complain.
Hmph.

I agree it's not right, though. I think the original is questionable,
too. It takes a pointer into the middle of partial_commit->buffer and
attaches it to a strbuf. That's wrong because:

  1. It's pointing into the middle of an allocated buffer, not the
     beginning.

  2. We do not own partial_commit->buffer in the first place.

So any call to strbuf_detach on the result would be disastrous. The
compiler doesn't notice because of the const leak in strstr, and it
doesn't cause a bug in practice because the only use of the strbuf is to
pass it as a const to create_notes_commit.

I feel like the most elegant solution is for create_notes_commit to take
a buf/len pair rather than a strbuf, but it unfortunately is just
feeding that to commit_tree. Adjusting that code path would affect quite
a few other spots.

The other obvious option is actually populating the strbuf, but it feels
ugly to have to make a copy just to satisfy the function interface.

Maybe a cast and a warning comment are the least evil thing, as below? I
dunno, it feels pretty wrong.

diff --git a/notes-merge.c b/notes-merge.c
index 94a1a8a..1f3b309 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -671,7 +671,7 @@ int notes_merge_commit(struct notes_merge_options *o,
 	DIR *dir;
 	struct dirent *e;
 	struct strbuf path = STRBUF_INIT;
-	char *msg = strstr(partial_commit->buffer, "\n\n");
+	const char *msg = strstr(partial_commit->buffer, "\n\n");
 	struct strbuf sb_msg = STRBUF_INIT;
 	int baselen;
 
@@ -719,7 +719,15 @@ int notes_merge_commit(struct notes_merge_options *o,
 		strbuf_setlen(&path, baselen);
 	}
 
-	strbuf_attach(&sb_msg, msg, strlen(msg), strlen(msg) + 1);
+	/*
+	 * This is a bit tricky. We should not be attaching msg, which
+	 * is not owned by us and is not even the start of a heap buffer, to a
+	 * strbuf. But the create_notes_commit interface really wants
+	 * a strbuf, even though it will only ever use it as a buf/len pair and
+	 * never modify it. So this is tentatively safe as long as nobody ever
+	 * modifies, detaches, or releases the strbuf.
+	 */
+	strbuf_attach(&sb_msg, (char *)msg, strlen(msg), strlen(msg) + 1);
 	create_notes_commit(partial_tree, partial_commit->parents, &sb_msg,
 			    result_sha1);
 	if (o->verbosity >= 4)

I'm still confused and disturbed that my gcc is not noticing this
obvious const violation. Hmm, shutting off ccache seems to make it work.
Doubly disturbing.

-Peff

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/15] use get_commit_buffer everywhere
  2014-06-10  0:02     ` Jeff King
@ 2014-06-10  0:22       ` Jeff King
  2014-06-10 16:06       ` Junio C Hamano
  1 sibling, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10  0:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

On Mon, Jun 09, 2014 at 08:02:24PM -0400, Jeff King wrote:

> I'm still confused and disturbed that my gcc is not noticing this
> obvious const violation. Hmm, shutting off ccache seems to make it work.
> Doubly disturbing.

Ah, mystery solved. It's a gcc bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60014

I get:

  $ gcc -c -Wall -Werror -DSHA1_HEADER='"block-sha1/sha1.h"' notes-merge.c
  notes-merge.c: In function ‘notes_merge_commit’:
  notes-merge.c:723:2: error: passing argument 2 of ‘strbuf_attach’
    discards ‘const’ qualifier from pointer target type [-Werror]
  ...etc...

  $ gcc -E -Wall -Werror -DSHA1_HEADER='"block-sha1/sha1.h"' notes-merge.c >foo.c
  $ gcc -c -Wall -Werror -DSHA1_HEADER='"block-sha1/sha1.h"' foo.c
  [no warnings from either]

ccache uses the latter technique.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 04/15] logmsg_reencode: return const buffer
  2014-06-09 18:10 ` [PATCH 04/15] logmsg_reencode: return const buffer Jeff King
@ 2014-06-10  1:36   ` Eric Sunshine
  0 siblings, 0 replies; 60+ messages in thread
From: Eric Sunshine @ 2014-06-10  1:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano, Christian Couder, Jakub Narebski

On Mon, Jun 9, 2014 at 2:10 PM, Jeff King <peff@peff.net> wrote:
> The return value from logmsg_reencode may be either a newly
> allocated buffer or a pointer to the existing commit->buffer.
> We would not want the caller to accidentally free() or
> modify the latter, so let's mark it as const.  We can cast
> away the constness in logmsg_free, but only once we have
> determined that it is a free-able buffer.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> index 71e2337..47e5b7a 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -2788,7 +2788,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
>  {
>         int retval;
>         const char *encoding;
> -       char *message;
> +       const char *message;
>         struct strbuf buf = STRBUF_INIT;
>
>         if (!opt->grep_filter.pattern_list && !opt->grep_filter.header_list)
> @@ -2830,12 +2830,18 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
>                 format_display_notes(commit->object.sha1, &buf, encoding, 1);
>         }
>
> -       /* Find either in the original commit message, or in the temporary */
> +       /* Find either in the original commit message, or in the temporary.

Style:

    /*
     * Find either...
     */

> +        * Note that we cast away the constness of "message" here. It is
> +        * const because it may come from the cached commit buffer. That's OK,
> +        * because we know that it is modifiable heap memory, and that while
> +        * grep_buffer may modify it for speed, it will restore any
> +        * changes before returning.
> +        */
>         if (buf.len)
>                 retval = grep_buffer(&opt->grep_filter, buf.buf, buf.len);
>         else
>                 retval = grep_buffer(&opt->grep_filter,
> -                                    message, strlen(message));
> +                                    (char *)message, strlen(message));
>         strbuf_release(&buf);
>         logmsg_free(message, commit);
>         return retval;
> --
> 2.0.0.729.g520999f
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/15] commit: record buffer length in cache
  2014-06-09 18:15 ` [PATCH 15/15] commit: record buffer length in cache Jeff King
@ 2014-06-10  5:12   ` Christian Couder
  2014-06-10  5:27     ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Christian Couder @ 2014-06-10  5:12 UTC (permalink / raw)
  To: peff; +Cc: git, gitster, jnareb, sunshine

From: Jeff King <peff@peff.net>
>
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -2313,7 +2313,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
>  		    ident, ident, path,
>  		    (!contents_from ? path :
>  		     (!strcmp(contents_from, "-") ? "standard input" : contents_from)));
> -	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
> +	set_commit_buffer(commit, msg.buf, msg.len);

I find the above strange. I would have done something like:

-	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
+	size_t size;
+	char *buf = strbuf_detach(&msg, &size);
+	set_commit_buffer(commit, buf, size);

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/15] commit: record buffer length in cache
  2014-06-10  5:12   ` Christian Couder
@ 2014-06-10  5:27     ` Jeff King
  2014-06-10 20:33       ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-10  5:27 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, gitster, jnareb, sunshine

On Tue, Jun 10, 2014 at 07:12:37AM +0200, Christian Couder wrote:

> From: Jeff King <peff@peff.net>
> >
> > --- a/builtin/blame.c
> > +++ b/builtin/blame.c
> > @@ -2313,7 +2313,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
> >  		    ident, ident, path,
> >  		    (!contents_from ? path :
> >  		     (!strcmp(contents_from, "-") ? "standard input" : contents_from)));
> > -	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
> > +	set_commit_buffer(commit, msg.buf, msg.len);
> 
> I find the above strange. I would have done something like:
> 
> -	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
> +	size_t size;
> +	char *buf = strbuf_detach(&msg, &size);
> +	set_commit_buffer(commit, buf, size);

It is a little strange. You can't do:

  set_commit_buffer(commit, strbuf_detach(&msg, NULL), msg.len);

because the second argument resets msg.len as a side effect. Doing it
your way is longer, but perhaps is a bit more canonical. In general, one
should call strbuf_detach to ensure that the buffer is allocated (and
does not point to slopbuf). That's guaranteed here, because we just put
contents into the buffer, but it's probably more hygienic to use the
more verbose form.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/15] provide helpers to access the commit buffer
  2014-06-09 18:11 ` [PATCH 08/15] provide helpers to access " Jeff King
@ 2014-06-10  8:00   ` Eric Sunshine
  0 siblings, 0 replies; 60+ messages in thread
From: Eric Sunshine @ 2014-06-10  8:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Christian Couder, Jakub Narebski

On Monday, June 9, 2014, Jeff King <peff@peff.net> wrote:
> Many sites look at commit->buffer to get more detailed
> information than what is in the parsed commit struct.
> However, we sometimes drop commit->buffer to save memory,
> in which case the caller would need to read the object
> afresh. Some callers do this (leading to duplicated code),
> and others do not (which opens the possibility of a segfault
> if somebody else frees the buffer).
>
> Let's provide a pair of helpers, "get" and "unuse", that let
> callers easily get the buffer. They will use the cached
> buffer when possible, and otherwise load from disk using
> read_sha1_file.
>
> Note that we also need to add a "get_cached" variant which
> returns NULL when we do not have a cached buffer. At first
> glance this seems to defeat the purpose of "get", which is
> to always provide a return value. However, some log code
> paths actually use the NULL-ness of commit->buffer as a
> boolean flag to decide whether to try to printing the

s/printing/print/ ... or some other variation of the phrase.

> commit. At least for now, we want to continue supporting
> that use.
>
> Signed-off-by: Jeff King <peff@peff.net>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/15] use get_commit_buffer to avoid duplicate code
  2014-06-09 18:12 ` [PATCH 10/15] use get_commit_buffer to avoid duplicate code Jeff King
@ 2014-06-10  8:01   ` Eric Sunshine
  2014-06-10 20:34     ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Eric Sunshine @ 2014-06-10  8:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Christian Couder, Jakub Narebski

On Monday, June 9, 2014, Jeff King <peff@peff.net> wrote:
> For both of these sites, we already do the "fallback to
> read_sha1_file" trick. But we can shorten the code by just
> using get_commit_buffer.
>
> Note that the error cases are slightly different when
> read_sha1_file fails. get_commit_buffer will die() if the
> object cannot be loaded, or is a non-commit.
>
> For get_sha1_oneline, this will almost certainly never
> happen, as we will have just called parse_object (and if it
> does, it's probably worth complaining about).
>
> For record_author_date, the new behavior is probably better;
> we notify the user of the error instead of silently ignoring
> it. And because it's used only for sorting by author-date,
> somebody examining a corrupt can fallback to the regular

Missing word here? "examining a corrupt [...] can"

> traversal order.
>
> Signed-off-by: Jeff King <peff@peff.net>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/15] use get_commit_buffer everywhere
  2014-06-10  0:02     ` Jeff King
  2014-06-10  0:22       ` Jeff King
@ 2014-06-10 16:06       ` Junio C Hamano
  2014-06-10 21:27         ` Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2014-06-10 16:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

Jeff King <peff@peff.net> writes:

> I agree it's not right, though. I think the original is questionable,
> too. It takes a pointer into the middle of partial_commit->buffer and
> attaches it to a strbuf. That's wrong because:
>
>   1. It's pointing into the middle of an allocated buffer, not the
>      beginning.
>
>   2. We do not own partial_commit->buffer in the first place.
>
> So any call to strbuf_detach on the result would be disastrous.

You are right.  Where did this original crap come from X-<...

> ... and it
> doesn't cause a bug in practice because the only use of the strbuf is to
> pass it as a const to create_notes_commit.
>
> I feel like the most elegant solution is for create_notes_commit to take
> a buf/len pair rather than a strbuf, but it unfortunately is just
> feeding that to commit_tree. Adjusting that code path would affect quite
> a few other spots.
>
> The other obvious option is actually populating the strbuf, but it feels
> ugly to have to make a copy just to satisfy the function interface.
>
> Maybe a cast and a warning comment are the least evil thing, as below? I
> dunno, it feels pretty wrong.

Yeah, it does feel wrong wrong wrong.  Perhaps this big comment
would serve as a marker for a low-hanging fruit for somebody else to
fix it, e.g. by using strbuf-add to make a copy, which would be the
easiest and safest workaround?

> diff --git a/notes-merge.c b/notes-merge.c
> index 94a1a8a..1f3b309 100644
> --- a/notes-merge.c
> +++ b/notes-merge.c
> @@ -671,7 +671,7 @@ int notes_merge_commit(struct notes_merge_options *o,
>  	DIR *dir;
>  	struct dirent *e;
>  	struct strbuf path = STRBUF_INIT;
> -	char *msg = strstr(partial_commit->buffer, "\n\n");
> +	const char *msg = strstr(partial_commit->buffer, "\n\n");
>  	struct strbuf sb_msg = STRBUF_INIT;
>  	int baselen;
>  
> @@ -719,7 +719,15 @@ int notes_merge_commit(struct notes_merge_options *o,
>  		strbuf_setlen(&path, baselen);
>  	}
>  
> -	strbuf_attach(&sb_msg, msg, strlen(msg), strlen(msg) + 1);
> +	/*
> +	 * This is a bit tricky. We should not be attaching msg, which
> +	 * is not owned by us and is not even the start of a heap buffer, to a
> +	 * strbuf. But the create_notes_commit interface really wants
> +	 * a strbuf, even though it will only ever use it as a buf/len pair and
> +	 * never modify it. So this is tentatively safe as long as nobody ever
> +	 * modifies, detaches, or releases the strbuf.
> +	 */
> +	strbuf_attach(&sb_msg, (char *)msg, strlen(msg), strlen(msg) + 1);
>  	create_notes_commit(partial_tree, partial_commit->parents, &sb_msg,
>  			    result_sha1);
>  	if (o->verbosity >= 4)
>
> I'm still confused and disturbed that my gcc is not noticing this
> obvious const violation. Hmm, shutting off ccache seems to make it work.
> Doubly disturbing.
>
> -Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/15] commit: record buffer length in cache
  2014-06-10  5:27     ` Jeff King
@ 2014-06-10 20:33       ` Jeff King
  2014-06-10 21:30         ` Christian Couder
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-10 20:33 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, gitster, jnareb, sunshine

On Tue, Jun 10, 2014 at 01:27:13AM -0400, Jeff King wrote:

> > I find the above strange. I would have done something like:
> > 
> > -	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
> > +	size_t size;
> > +	char *buf = strbuf_detach(&msg, &size);
> > +	set_commit_buffer(commit, buf, size);
> 
> It is a little strange. You can't do:
> 
>   set_commit_buffer(commit, strbuf_detach(&msg, NULL), msg.len);
> 
> because the second argument resets msg.len as a side effect. Doing it
> your way is longer, but perhaps is a bit more canonical. In general, one
> should call strbuf_detach to ensure that the buffer is allocated (and
> does not point to slopbuf). That's guaranteed here, because we just put
> contents into the buffer, but it's probably more hygienic to use the
> more verbose form.

I was trying to avoid cluttering up the function with the extra variable
definitions. I ended up with the change below, which I think is clear,
correct, and pushes the complexity outside of the function.

diff --git a/builtin/blame.c b/builtin/blame.c
index cde19eb..53f43ab 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2266,6 +2266,18 @@ static void append_merge_parents(struct commit_list **tail)
 }
 
 /*
+ * This isn't as simple as passing sb->buf and sb->len, because we
+ * want to transfer ownership of the buffer to the commit (so we
+ * must use detach).
+ */
+static void set_commit_buffer_from_strbuf(struct commit *c, struct strbuf *sb)
+{
+	size_t len;
+	void *buf = strbuf_detach(sb, &len);
+	set_commit_buffer(c, buf, len);
+}
+
+/*
  * Prepare a dummy commit that represents the work tree (or staged) item.
  * Note that annotating work tree item never works in the reverse.
  */
@@ -2313,7 +2325,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 		    ident, ident, path,
 		    (!contents_from ? path :
 		     (!strcmp(contents_from, "-") ? "standard input" : contents_from)));
-	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
+	set_commit_buffer_from_strbuf(commit, &msg);
 
 	if (!contents_from || strcmp("-", contents_from)) {
 		struct stat st;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/15] use get_commit_buffer to avoid duplicate code
  2014-06-10  8:01   ` Eric Sunshine
@ 2014-06-10 20:34     ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 20:34 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: git, Junio C Hamano, Christian Couder, Jakub Narebski

On Tue, Jun 10, 2014 at 04:01:29AM -0400, Eric Sunshine wrote:

> > For record_author_date, the new behavior is probably better;
> > we notify the user of the error instead of silently ignoring
> > it. And because it's used only for sorting by author-date,
> > somebody examining a corrupt can fallback to the regular
> 
> Missing word here? "examining a corrupt [...] can"

It was "repo". Thanks, fixed this and your other comments for my
re-roll (I'm still looking into the best solution for the strbuf_attach
issue Junio mentioned).

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 12/15] use get_commit_buffer everywhere
  2014-06-10 16:06       ` Junio C Hamano
@ 2014-06-10 21:27         ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

On Tue, Jun 10, 2014 at 09:06:35AM -0700, Junio C Hamano wrote:

> > So any call to strbuf_detach on the result would be disastrous.
> 
> You are right.  Where did this original crap come from X-<...

I do not know if that face means you actually looked at the history, but
in case you did not...

It was added by Duy's 13f8b72 (Convert commit_tree() to take strbuf as
message, 2011-12-15). However that was v2 of his patch. If you read the
original thread, you can see that v1 passed a separate pointer/length
pair, and was only changed after a reviewer-who-shall-not-be-named asked
him to change it. ;)

Of course there were many people participating in the review, and none
of us noticed it. I think it is simply a subtle bug.

> > I feel like the most elegant solution is for create_notes_commit to take
> > a buf/len pair rather than a strbuf, but it unfortunately is just
> > feeding that to commit_tree. Adjusting that code path would affect quite
> > a few other spots.
> >
> > The other obvious option is actually populating the strbuf, but it feels
> > ugly to have to make a copy just to satisfy the function interface.
> >
> > Maybe a cast and a warning comment are the least evil thing, as below? I
> > dunno, it feels pretty wrong.
> 
> Yeah, it does feel wrong wrong wrong.  Perhaps this big comment
> would serve as a marker for a low-hanging fruit for somebody else to
> fix it, e.g. by using strbuf-add to make a copy, which would be the
> easiest and safest workaround?

I really think commit_tree is the culprit here. It doesn't actually want
a strbuf at all, but takes one to make passing the pointer/len pair
simpler. Fixing it turned out to be not _too_ disruptive, and it showed
that there is another dubious use of strbuf_attach from a different
caller.

I'll post my re-rolled series with those fixes in a moment.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/15] commit: record buffer length in cache
  2014-06-10 20:33       ` Jeff King
@ 2014-06-10 21:30         ` Christian Couder
  0 siblings, 0 replies; 60+ messages in thread
From: Christian Couder @ 2014-06-10 21:30 UTC (permalink / raw)
  To: peff; +Cc: git, gitster, jnareb, sunshine

From: Jeff King <peff@peff.net>
Subject: Re: [PATCH 15/15] commit: record buffer length in cache
Date: Tue, 10 Jun 2014 16:33:49 -0400

> On Tue, Jun 10, 2014 at 01:27:13AM -0400, Jeff King wrote:
> 
>> > I find the above strange. I would have done something like:
>> > 
>> > -	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
>> > +	size_t size;
>> > +	char *buf = strbuf_detach(&msg, &size);
>> > +	set_commit_buffer(commit, buf, size);
>> 
>> It is a little strange. You can't do:
>> 
>>   set_commit_buffer(commit, strbuf_detach(&msg, NULL), msg.len);
>> 
>> because the second argument resets msg.len as a side effect. Doing it
>> your way is longer, but perhaps is a bit more canonical. In general, one
>> should call strbuf_detach to ensure that the buffer is allocated (and
>> does not point to slopbuf). That's guaranteed here, because we just put
>> contents into the buffer, but it's probably more hygienic to use the
>> more verbose form.
> 
> I was trying to avoid cluttering up the function with the extra variable
> definitions. I ended up with the change below, which I think is clear,
> correct, and pushes the complexity outside of the function.

Yeah, great!
 
> diff --git a/builtin/blame.c b/builtin/blame.c
> index cde19eb..53f43ab 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -2266,6 +2266,18 @@ static void append_merge_parents(struct commit_list **tail)
>  }
>  
>  /*
> + * This isn't as simple as passing sb->buf and sb->len, because we
> + * want to transfer ownership of the buffer to the commit (so we
> + * must use detach).
> + */
> +static void set_commit_buffer_from_strbuf(struct commit *c, struct strbuf *sb)
> +{
> +	size_t len;
> +	void *buf = strbuf_detach(sb, &len);
> +	set_commit_buffer(c, buf, len);
> +}
> +
> +/*
>   * Prepare a dummy commit that represents the work tree (or staged) item.
>   * Note that annotating work tree item never works in the reverse.
>   */
> @@ -2313,7 +2325,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
>  		    ident, ident, path,
>  		    (!contents_from ? path :
>  		     (!strcmp(contents_from, "-") ? "standard input" : contents_from)));
> -	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
> +	set_commit_buffer_from_strbuf(commit, &msg);
>  
>  	if (!contents_from || strcmp("-", contents_from)) {
>  		struct stat st;

Thanks,
Christian. 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/15] commit: push commit_index update into alloc_commit_node
  2014-06-09 18:09 ` [PATCH 02/15] commit: push commit_index update into alloc_commit_node Jeff King
@ 2014-06-10 21:31   ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2014-06-10 21:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

Jeff King <peff@peff.net> writes:

> This will make the alloc_report output a little uglier (it will say
> raw_commit), but I don't think anyone cares. And I wanted to make sure
> there wasn't an easy way to accidentally call the wrong alloc function
> that doesn't handle the index.

Thanks; I like this change.

>
>  alloc.c  | 12 ++++++++++--
>  commit.c |  2 --
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/alloc.c b/alloc.c
> index 38ff7e7..eb22a45 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -47,10 +47,18 @@ union any_object {
>  
>  DEFINE_ALLOCATOR(blob, struct blob)
>  DEFINE_ALLOCATOR(tree, struct tree)
> -DEFINE_ALLOCATOR(commit, struct commit)
> +DEFINE_ALLOCATOR(raw_commit, struct commit)
>  DEFINE_ALLOCATOR(tag, struct tag)
>  DEFINE_ALLOCATOR(object, union any_object)
>  
> +void *alloc_commit_node(void)
> +{
> +	static int commit_count;
> +	struct commit *c = alloc_raw_commit_node();
> +	c->index = commit_count++;
> +	return c;
> +}
> +
>  static void report(const char *name, unsigned int count, size_t size)
>  {
>  	fprintf(stderr, "%10s: %8u (%"PRIuMAX" kB)\n",
> @@ -64,7 +72,7 @@ void alloc_report(void)
>  {
>  	REPORT(blob, struct blob);
>  	REPORT(tree, struct tree);
> -	REPORT(commit, struct commit);
> +	REPORT(raw_commit, struct commit);
>  	REPORT(tag, struct tag);
>  	REPORT(object, union any_object);
>  }
> diff --git a/commit.c b/commit.c
> index f479331..21957ee 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -17,7 +17,6 @@ static struct commit_extra_header *read_commit_extra_header_lines(const char *bu
>  int save_commit_buffer = 1;
>  
>  const char *commit_type = "commit";
> -static int commit_count;
>  
>  static struct commit *check_commit(struct object *obj,
>  				   const unsigned char *sha1,
> @@ -64,7 +63,6 @@ struct commit *lookup_commit(const unsigned char *sha1)
>  	struct object *obj = lookup_object(sha1);
>  	if (!obj) {
>  		struct commit *c = alloc_commit_node();
> -		c->index = commit_count++;
>  		return create_object(sha1, OBJ_COMMIT, c);
>  	}
>  	if (!obj->type)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v2 0/17] store length of commit->buffer
  2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
                   ` (14 preceding siblings ...)
  2014-06-09 18:15 ` [PATCH 15/15] commit: record buffer length in cache Jeff King
@ 2014-06-10 21:35 ` Jeff King
  2014-06-10 21:36   ` [PATCH 01/17] commit_tree: take a pointer/len pair rather than a const strbuf Jeff King
                     ` (18 more replies)
  15 siblings, 19 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Here's a re-roll of the commit-slab series. It fixes the issues pointed
out by Eric and Christian (thanks, both).

It adds two new patches at the beginning to clean up the dangerous
strbufs that we discussed elsewhere. And as a result, silences the
warning from the old 12/15. I even compiled with a working copy of gcc
to confirm. :)

  [01/17]: commit_tree: take a pointer/len pair rather than a const strbuf
  [02/17]: replace dangerous uses of strbuf_attach
  [03/17]: alloc: include any-object allocations in alloc_report
  [04/17]: commit: push commit_index update into alloc_commit_node
  [05/17]: do not create "struct commit" with xcalloc
  [06/17]: logmsg_reencode: return const buffer
  [07/17]: sequencer: use logmsg_reencode in get_message
  [08/17]: provide a helper to free commit buffer
  [09/17]: provide a helper to set the commit buffer
  [10/17]: provide helpers to access the commit buffer
  [11/17]: use get_cached_commit_buffer where appropriate
  [12/17]: use get_commit_buffer to avoid duplicate code
  [13/17]: convert logmsg_reencode to get_commit_buffer
  [14/17]: use get_commit_buffer everywhere
  [15/17]: commit-slab: provide a static initializer
  [16/17]: commit: convert commit->buffer to a slab
  [17/17]: commit: record buffer length in cache

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/15] do not create "struct commit" with xcalloc
  2014-06-09 18:10 ` [PATCH 03/15] do not create "struct commit" with xcalloc Jeff King
@ 2014-06-10 21:35   ` Junio C Hamano
  2014-06-10 21:49     ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2014-06-10 21:35 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

Jeff King <peff@peff.net> writes:

> In both blame and merge-recursive, we sometimes create a
> "fake" commit struct for convenience (e.g., to represent the
> HEAD state as if we would commit it). By allocating
> ourselves rather than using alloc_commit_node, we do not
> properly set the "index" field of the commit. This can
> produce subtle bugs if we then use commit-slab on the
> resulting commit, as we will share the "0" index with
> another commit.

The change I see here is all good, but I wonder if it is too late to
start the real index from 1 and catch anything that has 0 index with
a BUG("do not hand-allocate a commit").

> We can fix this by using alloc_commit_node() to allocate.
> Note that we cannot free the result, as it is part of our
> commit allocator. However, both cases were already leaking
> the allocated commit anyway, so there's nothing to fix up.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/blame.c   | 2 +-
>  merge-recursive.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index a52a279..d6056a5 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -2286,7 +2286,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
>  	struct strbuf msg = STRBUF_INIT;
>  
>  	time(&now);
> -	commit = xcalloc(1, sizeof(*commit));
> +	commit = alloc_commit_node();
>  	commit->object.parsed = 1;
>  	commit->date = now;
>  	commit->object.type = OBJ_COMMIT;
> diff --git a/merge-recursive.c b/merge-recursive.c
> index cab16fa..2b37d42 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -40,7 +40,7 @@ static struct tree *shift_tree_object(struct tree *one, struct tree *two,
>  
>  static struct commit *make_virtual_commit(struct tree *tree, const char *comment)
>  {
> -	struct commit *commit = xcalloc(1, sizeof(struct commit));
> +	struct commit *commit = alloc_commit_node();
>  	struct merge_remote_desc *desc = xmalloc(sizeof(*desc));
>  
>  	desc->name = comment;

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 01/17] commit_tree: take a pointer/len pair rather than a const strbuf
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
@ 2014-06-10 21:36   ` Jeff King
  2014-06-10 21:38   ` [PATCH 02/17] replace dangerous uses of strbuf_attach Jeff King
                     ` (17 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:36 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

While strbufs are pretty common throughout our code, it is
more flexible for functions to take a pointer/len pair than
a strbuf. It's easy to turn a strbuf into such a pair (by
dereferencing its members), but less easy to go the other
way (you can strbuf_attach, but that has implications about
memory ownership).

This patch teaches commit_tree (and its associated callers
and sub-functions) to take such a pair for the commit
message rather than a strbuf.  This makes passing the buffer
around slightly more verbose, but means we can get rid of
some dangerous strbuf_attach calls in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
---
Sadly, the diffstat is not +x/-x. Rewrapping the longer lines
gave us a few extra lines.

 builtin/commit-tree.c |  4 ++--
 builtin/commit.c      |  4 ++--
 builtin/merge.c       |  8 ++++----
 commit.c              | 12 +++++++-----
 commit.h              |  6 ++++--
 notes-cache.c         |  2 +-
 notes-merge.c         |  6 ++++--
 notes-utils.c         |  7 ++++---
 notes-utils.h         |  2 +-
 9 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
index 987a4c3..8a66c74 100644
--- a/builtin/commit-tree.c
+++ b/builtin/commit-tree.c
@@ -123,8 +123,8 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix)
 			die_errno("git commit-tree: failed to read");
 	}
 
-	if (commit_tree(&buffer, tree_sha1, parents, commit_sha1,
-			NULL, sign_commit)) {
+	if (commit_tree(buffer.buf, buffer.len, tree_sha1, parents,
+			commit_sha1, NULL, sign_commit)) {
 		strbuf_release(&buffer);
 		return 1;
 	}
diff --git a/builtin/commit.c b/builtin/commit.c
index 99c2044..ef300f1 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1731,8 +1731,8 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		append_merge_tag_headers(parents, &tail);
 	}
 
-	if (commit_tree_extended(&sb, active_cache_tree->sha1, parents, sha1,
-				 author_ident.buf, sign_commit, extra)) {
+	if (commit_tree_extended(sb.buf, sb.len, active_cache_tree->sha1,
+			 parents, sha1, author_ident.buf, sign_commit, extra)) {
 		rollback_index_files();
 		die(_("failed to write commit object"));
 	}
diff --git a/builtin/merge.c b/builtin/merge.c
index 428ca24..b49c310 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -852,8 +852,8 @@ static int merge_trivial(struct commit *head, struct commit_list *remoteheads)
 	parent->next->item = remoteheads->item;
 	parent->next->next = NULL;
 	prepare_to_commit(remoteheads);
-	if (commit_tree(&merge_msg, result_tree, parent, result_commit, NULL,
-			sign_commit))
+	if (commit_tree(merge_msg.buf, merge_msg.len, result_tree, parent,
+			result_commit, NULL, sign_commit))
 		die(_("failed to write commit object"));
 	finish(head, remoteheads, result_commit, "In-index merge");
 	drop_save();
@@ -877,8 +877,8 @@ static int finish_automerge(struct commit *head,
 		commit_list_insert(head, &parents);
 	strbuf_addch(&merge_msg, '\n');
 	prepare_to_commit(remoteheads);
-	if (commit_tree(&merge_msg, result_tree, parents, result_commit,
-			NULL, sign_commit))
+	if (commit_tree(merge_msg.buf, merge_msg.len, result_tree, parents,
+			result_commit, NULL, sign_commit))
 		die(_("failed to write commit object"));
 	strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy);
 	finish(head, remoteheads, result_commit, buf.buf);
diff --git a/commit.c b/commit.c
index f479331..bd3d5af 100644
--- a/commit.c
+++ b/commit.c
@@ -1344,7 +1344,8 @@ void free_commit_extra_headers(struct commit_extra_header *extra)
 	}
 }
 
-int commit_tree(const struct strbuf *msg, const unsigned char *tree,
+int commit_tree(const char *msg, size_t msg_len,
+		const unsigned char *tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author, const char *sign_commit)
 {
@@ -1352,7 +1353,7 @@ int commit_tree(const struct strbuf *msg, const unsigned char *tree,
 	int result;
 
 	append_merge_tag_headers(parents, &tail);
-	result = commit_tree_extended(msg, tree, parents, ret,
+	result = commit_tree_extended(msg, msg_len, tree, parents, ret,
 				      author, sign_commit, extra);
 	free_commit_extra_headers(extra);
 	return result;
@@ -1473,7 +1474,8 @@ static const char commit_utf8_warn[] =
 "You may want to amend it after fixing the message, or set the config\n"
 "variable i18n.commitencoding to the encoding your project uses.\n";
 
-int commit_tree_extended(const struct strbuf *msg, const unsigned char *tree,
+int commit_tree_extended(const char *msg, size_t msg_len,
+			 const unsigned char *tree,
 			 struct commit_list *parents, unsigned char *ret,
 			 const char *author, const char *sign_commit,
 			 struct commit_extra_header *extra)
@@ -1484,7 +1486,7 @@ int commit_tree_extended(const struct strbuf *msg, const unsigned char *tree,
 
 	assert_sha1_type(tree, OBJ_TREE);
 
-	if (memchr(msg->buf, '\0', msg->len))
+	if (memchr(msg, '\0', msg_len))
 		return error("a NUL byte in commit log message not allowed.");
 
 	/* Not having i18n.commitencoding is the same as having utf-8 */
@@ -1523,7 +1525,7 @@ int commit_tree_extended(const struct strbuf *msg, const unsigned char *tree,
 	strbuf_addch(&buffer, '\n');
 
 	/* And add the comment */
-	strbuf_addbuf(&buffer, msg);
+	strbuf_add(&buffer, msg, msg_len);
 
 	/* And check the encoding */
 	if (encoding_is_utf8 && !verify_utf8(&buffer))
diff --git a/commit.h b/commit.h
index a9f177b..1cb55ae 100644
--- a/commit.h
+++ b/commit.h
@@ -261,11 +261,13 @@ struct commit_extra_header {
 extern void append_merge_tag_headers(struct commit_list *parents,
 				     struct commit_extra_header ***tail);
 
-extern int commit_tree(const struct strbuf *msg, const unsigned char *tree,
+extern int commit_tree(const char *msg, size_t msg_len,
+		       const unsigned char *tree,
 		       struct commit_list *parents, unsigned char *ret,
 		       const char *author, const char *sign_commit);
 
-extern int commit_tree_extended(const struct strbuf *msg, const unsigned char *tree,
+extern int commit_tree_extended(const char *msg, size_t msg_len,
+				const unsigned char *tree,
 				struct commit_list *parents, unsigned char *ret,
 				const char *author, const char *sign_commit,
 				struct commit_extra_header *);
diff --git a/notes-cache.c b/notes-cache.c
index 97dfd63..4ad0799 100644
--- a/notes-cache.c
+++ b/notes-cache.c
@@ -59,7 +59,7 @@ int notes_cache_write(struct notes_cache *c)
 		return -1;
 	strbuf_attach(&msg, c->validity,
 		      strlen(c->validity), strlen(c->validity) + 1);
-	if (commit_tree(&msg, tree_sha1, NULL, commit_sha1, NULL, NULL) < 0)
+	if (commit_tree(msg.buf, msg.len, tree_sha1, NULL, commit_sha1, NULL, NULL) < 0)
 		return -1;
 	if (update_ref("update notes cache", c->tree.ref, commit_sha1, NULL,
 		       0, UPDATE_REFS_QUIET_ON_ERR) < 0)
diff --git a/notes-merge.c b/notes-merge.c
index 94a1a8a..9d94210 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -644,7 +644,8 @@ int notes_merge(struct notes_merge_options *o,
 		struct commit_list *parents = NULL;
 		commit_list_insert(remote, &parents); /* LIFO order */
 		commit_list_insert(local, &parents);
-		create_notes_commit(local_tree, parents, &o->commit_msg,
+		create_notes_commit(local_tree, parents,
+				    o->commit_msg.buf, o->commit_msg.len,
 				    result_sha1);
 	}
 
@@ -720,7 +721,8 @@ int notes_merge_commit(struct notes_merge_options *o,
 	}
 
 	strbuf_attach(&sb_msg, msg, strlen(msg), strlen(msg) + 1);
-	create_notes_commit(partial_tree, partial_commit->parents, &sb_msg,
+	create_notes_commit(partial_tree, partial_commit->parents,
+			    sb_msg.buf, sb_msg.len,
 			    result_sha1);
 	if (o->verbosity >= 4)
 		printf("Finalized notes merge commit: %s\n",
diff --git a/notes-utils.c b/notes-utils.c
index a0b1d7b..b64dc1b 100644
--- a/notes-utils.c
+++ b/notes-utils.c
@@ -4,7 +4,8 @@
 #include "notes-utils.h"
 
 void create_notes_commit(struct notes_tree *t, struct commit_list *parents,
-			 const struct strbuf *msg, unsigned char *result_sha1)
+			 const char *msg, size_t msg_len,
+			 unsigned char *result_sha1)
 {
 	unsigned char tree_sha1[20];
 
@@ -25,7 +26,7 @@ void create_notes_commit(struct notes_tree *t, struct commit_list *parents,
 		/* else: t->ref points to nothing, assume root/orphan commit */
 	}
 
-	if (commit_tree(msg, tree_sha1, parents, result_sha1, NULL, NULL))
+	if (commit_tree(msg, msg_len, tree_sha1, parents, result_sha1, NULL, NULL))
 		die("Failed to commit notes tree to database");
 }
 
@@ -46,7 +47,7 @@ void commit_notes(struct notes_tree *t, const char *msg)
 	if (buf.buf[buf.len - 1] != '\n')
 		strbuf_addch(&buf, '\n'); /* Make sure msg ends with newline */
 
-	create_notes_commit(t, NULL, &buf, commit_sha1);
+	create_notes_commit(t, NULL, buf.buf, buf.len, commit_sha1);
 	strbuf_insert(&buf, 0, "notes: ", 7); /* commit message starts at index 7 */
 	update_ref(buf.buf, t->ref, commit_sha1, NULL, 0,
 		   UPDATE_REFS_DIE_ON_ERR);
diff --git a/notes-utils.h b/notes-utils.h
index 564e30c..890ddb3 100644
--- a/notes-utils.h
+++ b/notes-utils.h
@@ -15,7 +15,7 @@
  * The resulting commit SHA1 is stored in result_sha1.
  */
 void create_notes_commit(struct notes_tree *t, struct commit_list *parents,
-			 const struct strbuf *msg, unsigned char *result_sha1);
+			 const char *msg, size_t msg_len, unsigned char *result_sha1);
 
 void commit_notes(struct notes_tree *t, const char *msg);
 
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 02/17] replace dangerous uses of strbuf_attach
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
  2014-06-10 21:36   ` [PATCH 01/17] commit_tree: take a pointer/len pair rather than a const strbuf Jeff King
@ 2014-06-10 21:38   ` Jeff King
  2014-06-10 21:38   ` [PATCH 03/17] alloc: include any-object allocations in alloc_report Jeff King
                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

It is not a good idea to strbuf_attach an arbitrary pointer
just because a function you are calling wants a strbuf.
Attaching implies a transfer of memory ownership; if anyone
were to modify or release the resulting strbuf, we would
free() the pointer, leading to possible problems:

  1. Other users of the original pointer might access freed
     memory.

  2. The pointer might not be the start of a malloc'd
     area, so calling free() on it in the first place would
     be wrong.

In the two cases modified here, we are fortunate that nobody
touches the strbuf once it is attached, but it is an
accident waiting to happen.  Since the previous commit,
commit_tree and friends take a pointer/buf pair, so we can
just do away with the strbufs entirely.

Signed-off-by: Jeff King <peff@peff.net>
---
I also manually checked the other dozen or so calls to strbuf_attach,
and they all look good.

 notes-cache.c | 6 ++----
 notes-merge.c | 5 +----
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/notes-cache.c b/notes-cache.c
index 4ad0799..c4e9bb7 100644
--- a/notes-cache.c
+++ b/notes-cache.c
@@ -48,7 +48,6 @@ int notes_cache_write(struct notes_cache *c)
 {
 	unsigned char tree_sha1[20];
 	unsigned char commit_sha1[20];
-	struct strbuf msg = STRBUF_INIT;
 
 	if (!c || !c->tree.initialized || !c->tree.ref || !*c->tree.ref)
 		return -1;
@@ -57,9 +56,8 @@ int notes_cache_write(struct notes_cache *c)
 
 	if (write_notes_tree(&c->tree, tree_sha1))
 		return -1;
-	strbuf_attach(&msg, c->validity,
-		      strlen(c->validity), strlen(c->validity) + 1);
-	if (commit_tree(msg.buf, msg.len, tree_sha1, NULL, commit_sha1, NULL, NULL) < 0)
+	if (commit_tree(c->validity, strlen(c->validity), tree_sha1, NULL,
+			commit_sha1, NULL, NULL) < 0)
 		return -1;
 	if (update_ref("update notes cache", c->tree.ref, commit_sha1, NULL,
 		       0, UPDATE_REFS_QUIET_ON_ERR) < 0)
diff --git a/notes-merge.c b/notes-merge.c
index 9d94210..697cec3 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -673,7 +673,6 @@ int notes_merge_commit(struct notes_merge_options *o,
 	struct dirent *e;
 	struct strbuf path = STRBUF_INIT;
 	char *msg = strstr(partial_commit->buffer, "\n\n");
-	struct strbuf sb_msg = STRBUF_INIT;
 	int baselen;
 
 	strbuf_addstr(&path, git_path(NOTES_MERGE_WORKTREE));
@@ -720,10 +719,8 @@ int notes_merge_commit(struct notes_merge_options *o,
 		strbuf_setlen(&path, baselen);
 	}
 
-	strbuf_attach(&sb_msg, msg, strlen(msg), strlen(msg) + 1);
 	create_notes_commit(partial_tree, partial_commit->parents,
-			    sb_msg.buf, sb_msg.len,
-			    result_sha1);
+			    msg, strlen(msg), result_sha1);
 	if (o->verbosity >= 4)
 		printf("Finalized notes merge commit: %s\n",
 			sha1_to_hex(result_sha1));
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 03/17] alloc: include any-object allocations in alloc_report
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
  2014-06-10 21:36   ` [PATCH 01/17] commit_tree: take a pointer/len pair rather than a const strbuf Jeff King
  2014-06-10 21:38   ` [PATCH 02/17] replace dangerous uses of strbuf_attach Jeff King
@ 2014-06-10 21:38   ` Jeff King
  2014-06-10 21:39   ` [PATCH 04/17] commit: push commit_index update into alloc_commit_node Jeff King
                     ` (15 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

When 2c1cbec (Use proper object allocators for unknown
object nodes too, 2007-04-16), added a special "any_object"
allocator, it never taught alloc_report to report on it. To
do so we need to add an extra type argument to the REPORT
macro, as that commit did for DEFINE_ALLOCATOR.

Signed-off-by: Jeff King <peff@peff.net>
---
 alloc.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/alloc.c b/alloc.c
index f3ee745..38ff7e7 100644
--- a/alloc.c
+++ b/alloc.c
@@ -57,13 +57,14 @@ static void report(const char *name, unsigned int count, size_t size)
 			name, count, (uintmax_t) size);
 }
 
-#define REPORT(name)	\
-    report(#name, name##_allocs, name##_allocs * sizeof(struct name) >> 10)
+#define REPORT(name, type)	\
+    report(#name, name##_allocs, name##_allocs * sizeof(type) >> 10)
 
 void alloc_report(void)
 {
-	REPORT(blob);
-	REPORT(tree);
-	REPORT(commit);
-	REPORT(tag);
+	REPORT(blob, struct blob);
+	REPORT(tree, struct tree);
+	REPORT(commit, struct commit);
+	REPORT(tag, struct tag);
+	REPORT(object, union any_object);
 }
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 04/17] commit: push commit_index update into alloc_commit_node
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (2 preceding siblings ...)
  2014-06-10 21:38   ` [PATCH 03/17] alloc: include any-object allocations in alloc_report Jeff King
@ 2014-06-10 21:39   ` Jeff King
  2014-06-10 21:39   ` [PATCH 05/17] do not create "struct commit" with xcalloc Jeff King
                     ` (14 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Whenever we create a commit object via lookup_commit, we
give it a unique index to be used with the commit-slab API.
The theory is that any "struct commit" we create would
follow this code path, so any such struct would get an
index. However, callers could use alloc_commit_node()
directly (and get multiple commits with index 0).

Let's push the indexing into alloc_commit_node so that it's
hard for callers to get it wrong.

Signed-off-by: Jeff King <peff@peff.net>
---
 alloc.c  | 12 ++++++++++--
 commit.c |  2 --
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/alloc.c b/alloc.c
index 38ff7e7..eb22a45 100644
--- a/alloc.c
+++ b/alloc.c
@@ -47,10 +47,18 @@ union any_object {
 
 DEFINE_ALLOCATOR(blob, struct blob)
 DEFINE_ALLOCATOR(tree, struct tree)
-DEFINE_ALLOCATOR(commit, struct commit)
+DEFINE_ALLOCATOR(raw_commit, struct commit)
 DEFINE_ALLOCATOR(tag, struct tag)
 DEFINE_ALLOCATOR(object, union any_object)
 
+void *alloc_commit_node(void)
+{
+	static int commit_count;
+	struct commit *c = alloc_raw_commit_node();
+	c->index = commit_count++;
+	return c;
+}
+
 static void report(const char *name, unsigned int count, size_t size)
 {
 	fprintf(stderr, "%10s: %8u (%"PRIuMAX" kB)\n",
@@ -64,7 +72,7 @@ void alloc_report(void)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
-	REPORT(commit, struct commit);
+	REPORT(raw_commit, struct commit);
 	REPORT(tag, struct tag);
 	REPORT(object, union any_object);
 }
diff --git a/commit.c b/commit.c
index bd3d5af..fbdc480 100644
--- a/commit.c
+++ b/commit.c
@@ -17,7 +17,6 @@ static struct commit_extra_header *read_commit_extra_header_lines(const char *bu
 int save_commit_buffer = 1;
 
 const char *commit_type = "commit";
-static int commit_count;
 
 static struct commit *check_commit(struct object *obj,
 				   const unsigned char *sha1,
@@ -64,7 +63,6 @@ struct commit *lookup_commit(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj) {
 		struct commit *c = alloc_commit_node();
-		c->index = commit_count++;
 		return create_object(sha1, OBJ_COMMIT, c);
 	}
 	if (!obj->type)
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 05/17] do not create "struct commit" with xcalloc
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (3 preceding siblings ...)
  2014-06-10 21:39   ` [PATCH 04/17] commit: push commit_index update into alloc_commit_node Jeff King
@ 2014-06-10 21:39   ` Jeff King
  2014-06-10 21:39   ` [PATCH 06/17] logmsg_reencode: return const buffer Jeff King
                     ` (13 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

In both blame and merge-recursive, we sometimes create a
"fake" commit struct for convenience (e.g., to represent the
HEAD state as if we would commit it). By allocating
ourselves rather than using alloc_commit_node, we do not
properly set the "index" field of the commit. This can
produce subtle bugs if we then use commit-slab on the
resulting commit, as we will share the "0" index with
another commit.

We can fix this by using alloc_commit_node() to allocate.
Note that we cannot free the result, as it is part of our
commit allocator. However, both cases were already leaking
the allocated commit anyway, so there's nothing to fix up.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/blame.c   | 2 +-
 merge-recursive.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index a52a279..d6056a5 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2286,7 +2286,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 	struct strbuf msg = STRBUF_INIT;
 
 	time(&now);
-	commit = xcalloc(1, sizeof(*commit));
+	commit = alloc_commit_node();
 	commit->object.parsed = 1;
 	commit->date = now;
 	commit->object.type = OBJ_COMMIT;
diff --git a/merge-recursive.c b/merge-recursive.c
index cab16fa..2b37d42 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -40,7 +40,7 @@ static struct tree *shift_tree_object(struct tree *one, struct tree *two,
 
 static struct commit *make_virtual_commit(struct tree *tree, const char *comment)
 {
-	struct commit *commit = xcalloc(1, sizeof(struct commit));
+	struct commit *commit = alloc_commit_node();
 	struct merge_remote_desc *desc = xmalloc(sizeof(*desc));
 
 	desc->name = comment;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 06/17] logmsg_reencode: return const buffer
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (4 preceding siblings ...)
  2014-06-10 21:39   ` [PATCH 05/17] do not create "struct commit" with xcalloc Jeff King
@ 2014-06-10 21:39   ` Jeff King
  2014-06-10 21:39   ` [PATCH 07/17] sequencer: use logmsg_reencode in get_message Jeff King
                     ` (12 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

The return value from logmsg_reencode may be either a newly
allocated buffer or a pointer to the existing commit->buffer.
We would not want the caller to accidentally free() or
modify the latter, so let's mark it as const.  We can cast
away the constness in logmsg_free, but only once we have
determined that it is a free-able buffer.

Signed-off-by: Jeff King <peff@peff.net>
---
Fixed comment style issue from v1.

 builtin/blame.c |  2 +-
 builtin/reset.c |  2 +-
 commit.h        |  8 ++++----
 pretty.c        | 14 +++++++-------
 revision.c      | 13 ++++++++++---
 5 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index d6056a5..6ce3c6d 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1655,7 +1655,7 @@ static void get_commit_info(struct commit *commit,
 {
 	int len;
 	const char *subject, *encoding;
-	char *message;
+	const char *message;
 
 	commit_info_init(ret);
 
diff --git a/builtin/reset.c b/builtin/reset.c
index f368266..7ebee07 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -93,7 +93,7 @@ static int reset_index(const unsigned char *sha1, int reset_type, int quiet)
 static void print_new_head_line(struct commit *commit)
 {
 	const char *hex, *body;
-	char *msg;
+	const char *msg;
 
 	hex = find_unique_abbrev(commit->object.sha1, DEFAULT_ABBREV);
 	printf(_("HEAD is now at %s"), hex);
diff --git a/commit.h b/commit.h
index 1cb55ae..4df48cb 100644
--- a/commit.h
+++ b/commit.h
@@ -115,10 +115,10 @@ struct userformat_want {
 
 extern int has_non_ascii(const char *text);
 struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */
-extern char *logmsg_reencode(const struct commit *commit,
-			     char **commit_encoding,
-			     const char *output_encoding);
-extern void logmsg_free(char *msg, const struct commit *commit);
+extern const char *logmsg_reencode(const struct commit *commit,
+				   char **commit_encoding,
+				   const char *output_encoding);
+extern void logmsg_free(const char *msg, const struct commit *commit);
 extern void get_commit_format(const char *arg, struct rev_info *);
 extern const char *format_subject(struct strbuf *sb, const char *msg,
 				  const char *line_separator);
diff --git a/pretty.c b/pretty.c
index e1e2cad..d152de2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -606,9 +606,9 @@ static char *replace_encoding_header(char *buf, const char *encoding)
 	return strbuf_detach(&tmp, NULL);
 }
 
-char *logmsg_reencode(const struct commit *commit,
-		      char **commit_encoding,
-		      const char *output_encoding)
+const char *logmsg_reencode(const struct commit *commit,
+			    char **commit_encoding,
+			    const char *output_encoding)
 {
 	static const char *utf8 = "UTF-8";
 	const char *use_encoding;
@@ -687,10 +687,10 @@ char *logmsg_reencode(const struct commit *commit,
 	return out ? out : msg;
 }
 
-void logmsg_free(char *msg, const struct commit *commit)
+void logmsg_free(const char *msg, const struct commit *commit)
 {
 	if (msg != commit->buffer)
-		free(msg);
+		free((void *)msg);
 }
 
 static int mailmap_name(const char **email, size_t *email_len,
@@ -796,7 +796,7 @@ struct format_commit_context {
 	struct signature_check signature_check;
 	enum flush_type flush_type;
 	enum trunc_type truncate;
-	char *message;
+	const char *message;
 	char *commit_encoding;
 	size_t width, indent1, indent2;
 	int auto_color;
@@ -1700,7 +1700,7 @@ void pretty_print_commit(struct pretty_print_context *pp,
 	unsigned long beginning_of_body;
 	int indent = 4;
 	const char *msg;
-	char *reencoded;
+	const char *reencoded;
 	const char *encoding;
 	int need_8bit_cte = pp->need_8bit_cte;
 
diff --git a/revision.c b/revision.c
index 71e2337..be151ef 100644
--- a/revision.c
+++ b/revision.c
@@ -2788,7 +2788,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
 	const char *encoding;
-	char *message;
+	const char *message;
 	struct strbuf buf = STRBUF_INIT;
 
 	if (!opt->grep_filter.pattern_list && !opt->grep_filter.header_list)
@@ -2830,12 +2830,19 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		format_display_notes(commit->object.sha1, &buf, encoding, 1);
 	}
 
-	/* Find either in the original commit message, or in the temporary */
+	/*
+	 * Find either in the original commit message, or in the temporary.
+	 * Note that we cast away the constness of "message" here. It is
+	 * const because it may come from the cached commit buffer. That's OK,
+	 * because we know that it is modifiable heap memory, and that while
+	 * grep_buffer may modify it for speed, it will restore any
+	 * changes before returning.
+	 */
 	if (buf.len)
 		retval = grep_buffer(&opt->grep_filter, buf.buf, buf.len);
 	else
 		retval = grep_buffer(&opt->grep_filter,
-				     message, strlen(message));
+				     (char *)message, strlen(message));
 	strbuf_release(&buf);
 	logmsg_free(message, commit);
 	return retval;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 07/17] sequencer: use logmsg_reencode in get_message
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (5 preceding siblings ...)
  2014-06-10 21:39   ` [PATCH 06/17] logmsg_reencode: return const buffer Jeff King
@ 2014-06-10 21:39   ` Jeff King
  2014-06-10 21:40   ` [PATCH 08/17] provide a helper to free commit buffer Jeff King
                     ` (11 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

This simplifies the code, as logmsg_reencode handles the
reencoding for us in a single call. It also means we learn
logmsg_reencode's trick of pulling the buffer from disk when
commit->buffer is NULL (we currently just silently return!).
It is doubtful this matters in practice, though, as
sequencer operations would not generally turn off
save_commit_buffer.

Note that we may be fixing a bug here. The existing code
does:

  if (same_encoding(to, from))
	  reencode_string(buf, to, from);

That probably should have been "!same_encoding".

Signed-off-by: Jeff King <peff@peff.net>
---
 sequencer.c | 45 +++++----------------------------------------
 1 file changed, 5 insertions(+), 40 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 0a80c58..3fcab4d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -116,39 +116,23 @@ static const char *action_name(const struct replay_opts *opts)
 	return opts->action == REPLAY_REVERT ? "revert" : "cherry-pick";
 }
 
-static char *get_encoding(const char *message);
-
 struct commit_message {
 	char *parent_label;
 	const char *label;
 	const char *subject;
-	char *reencoded_message;
 	const char *message;
 };
 
 static int get_message(struct commit *commit, struct commit_message *out)
 {
-	const char *encoding;
 	const char *abbrev, *subject;
 	int abbrev_len, subject_len;
 	char *q;
 
-	if (!commit->buffer)
-		return -1;
-	encoding = get_encoding(commit->buffer);
-	if (!encoding)
-		encoding = "UTF-8";
 	if (!git_commit_encoding)
 		git_commit_encoding = "UTF-8";
 
-	out->reencoded_message = NULL;
-	out->message = commit->buffer;
-	if (same_encoding(encoding, git_commit_encoding))
-		out->reencoded_message = reencode_string(commit->buffer,
-					git_commit_encoding, encoding);
-	if (out->reencoded_message)
-		out->message = out->reencoded_message;
-
+	out->message = logmsg_reencode(commit, NULL, git_commit_encoding);
 	abbrev = find_unique_abbrev(commit->object.sha1, DEFAULT_ABBREV);
 	abbrev_len = strlen(abbrev);
 
@@ -167,29 +151,10 @@ static int get_message(struct commit *commit, struct commit_message *out)
 	return 0;
 }
 
-static void free_message(struct commit_message *msg)
+static void free_message(struct commit *commit, struct commit_message *msg)
 {
 	free(msg->parent_label);
-	free(msg->reencoded_message);
-}
-
-static char *get_encoding(const char *message)
-{
-	const char *p = message, *eol;
-
-	while (*p && *p != '\n') {
-		for (eol = p + 1; *eol && *eol != '\n'; eol++)
-			; /* do nothing */
-		if (starts_with(p, "encoding ")) {
-			char *result = xmalloc(eol - 8 - p);
-			strlcpy(result, p + 9, eol - 8 - p);
-			return result;
-		}
-		p = eol;
-		if (*p == '\n')
-			p++;
-	}
-	return NULL;
+	logmsg_free(msg->message, commit);
 }
 
 static void write_cherry_pick_head(struct commit *commit, const char *pseudoref)
@@ -489,7 +454,7 @@ static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
 	unsigned char head[20];
 	struct commit *base, *next, *parent;
 	const char *base_label, *next_label;
-	struct commit_message msg = { NULL, NULL, NULL, NULL, NULL };
+	struct commit_message msg = { NULL, NULL, NULL, NULL };
 	char *defmsg = NULL;
 	struct strbuf msgbuf = STRBUF_INIT;
 	int res, unborn = 0, allow;
@@ -654,7 +619,7 @@ static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
 		res = run_git_commit(defmsg, opts, allow);
 
 leave:
-	free_message(&msg);
+	free_message(commit, &msg);
 	free(defmsg);
 
 	return res;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 08/17] provide a helper to free commit buffer
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (6 preceding siblings ...)
  2014-06-10 21:39   ` [PATCH 07/17] sequencer: use logmsg_reencode in get_message Jeff King
@ 2014-06-10 21:40   ` Jeff King
  2014-06-12 18:22     ` Junio C Hamano
  2014-06-10 21:40   ` [PATCH 09/17] provide a helper to set the " Jeff King
                     ` (10 subsequent siblings)
  18 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:40 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.

Note that we also need to provide a "detach" mechanism for
the weird case in fsck which passes the buffer back to be
freed.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c       |  3 +--
 builtin/index-pack.c |  2 +-
 builtin/log.c        |  6 ++----
 builtin/rev-list.c   |  3 +--
 commit.c             | 13 +++++++++++++
 commit.h             | 11 +++++++++++
 6 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index fc150c8..8aadca1 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -310,8 +310,7 @@ static int fsck_obj(struct object *obj)
 	if (obj->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *) obj;
 
-		free(commit->buffer);
-		commit->buffer = NULL;
+		free_commit_buffer(commit);
 
 		if (!commit->parents && show_root)
 			printf("root %s\n", sha1_to_hex(commit->object.sha1));
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 18f57de..995df39 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -786,7 +786,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			}
 			if (obj->type == OBJ_COMMIT) {
 				struct commit *commit = (struct commit *) obj;
-				commit->buffer = NULL;
+				detach_commit_buffer(commit);
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
diff --git a/builtin/log.c b/builtin/log.c
index 39e8836..226f8f2 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -349,8 +349,7 @@ static int cmd_log_walk(struct rev_info *rev)
 			rev->max_count++;
 		if (!rev->reflog_info) {
 			/* we allow cycles in reflog ancestry */
-			free(commit->buffer);
-			commit->buffer = NULL;
+			free_commit_buffer(commit);
 		}
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
@@ -1508,8 +1507,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		    reopen_stdout(rev.numbered_files ? NULL : commit, NULL, &rev, quiet))
 			die(_("Failed to create output files"));
 		shown = log_tree_commit(&rev, commit);
-		free(commit->buffer);
-		commit->buffer = NULL;
+		free_commit_buffer(commit);
 
 		/* We put one extra blank line between formatted
 		 * patches and this flag is used by log-tree code
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9f92905..e012ebe 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -173,8 +173,7 @@ static void finish_commit(struct commit *commit, void *data)
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
 	}
-	free(commit->buffer);
-	commit->buffer = NULL;
+	free_commit_buffer(commit);
 }
 
 static void finish_object(struct object *obj,
diff --git a/commit.c b/commit.c
index fbdc480..11a05c1 100644
--- a/commit.c
+++ b/commit.c
@@ -245,6 +245,19 @@ int unregister_shallow(const unsigned char *sha1)
 	return 0;
 }
 
+void free_commit_buffer(struct commit *commit)
+{
+	free(commit->buffer);
+	commit->buffer = NULL;
+}
+
+const void *detach_commit_buffer(struct commit *commit)
+{
+	void *ret = commit->buffer;
+	commit->buffer = NULL;
+	return ret;
+}
+
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size)
 {
 	const char *tail = buffer;
diff --git a/commit.h b/commit.h
index 4df48cb..d72ed43 100644
--- a/commit.h
+++ b/commit.h
@@ -51,6 +51,17 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 int parse_commit(struct commit *item);
 void parse_commit_or_die(struct commit *item);
 
+/*
+ * Free any cached object buffer associated with the commit.
+ */
+void free_commit_buffer(struct commit *);
+
+/*
+ * Disassociate any cached object buffer from the commit, but do not free it.
+ * The buffer (or NULL, if none) is returned.
+ */
+const void *detach_commit_buffer(struct commit *);
+
 /* Find beginning and length of commit subject. */
 int find_commit_subject(const char *commit_buffer, const char **subject);
 
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 09/17] provide a helper to set the commit buffer
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (7 preceding siblings ...)
  2014-06-10 21:40   ` [PATCH 08/17] provide a helper to free commit buffer Jeff King
@ 2014-06-10 21:40   ` Jeff King
  2014-06-10 21:40   ` [PATCH 10/17] provide helpers to access " Jeff King
                     ` (9 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:40 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Right now this is just a one-liner, but abstracting it will
make it easier to change later.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/blame.c | 2 +-
 commit.c        | 7 ++++++-
 commit.h        | 6 ++++++
 object.c        | 2 +-
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 6ce3c6d..0af3a18 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2313,7 +2313,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 		    ident, ident, path,
 		    (!contents_from ? path :
 		     (!strcmp(contents_from, "-") ? "standard input" : contents_from)));
-	commit->buffer = strbuf_detach(&msg, NULL);
+	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
 
 	if (!contents_from || strcmp("-", contents_from)) {
 		struct stat st;
diff --git a/commit.c b/commit.c
index 11a05c1..fc8b4e2 100644
--- a/commit.c
+++ b/commit.c
@@ -245,6 +245,11 @@ int unregister_shallow(const unsigned char *sha1)
 	return 0;
 }
 
+void set_commit_buffer(struct commit *commit, void *buffer)
+{
+	commit->buffer = buffer;
+}
+
 void free_commit_buffer(struct commit *commit)
 {
 	free(commit->buffer);
@@ -335,7 +340,7 @@ int parse_commit(struct commit *item)
 	}
 	ret = parse_commit_buffer(item, buffer, size);
 	if (save_commit_buffer && !ret) {
-		item->buffer = buffer;
+		set_commit_buffer(item, buffer);
 		return 0;
 	}
 	free(buffer);
diff --git a/commit.h b/commit.h
index d72ed43..cc89128 100644
--- a/commit.h
+++ b/commit.h
@@ -52,6 +52,12 @@ int parse_commit(struct commit *item);
 void parse_commit_or_die(struct commit *item);
 
 /*
+ * Associate an object buffer with the commit. The ownership of the
+ * memory is handed over to the commit, and must be free()-able.
+ */
+void set_commit_buffer(struct commit *, void *buffer);
+
+/*
  * Free any cached object buffer associated with the commit.
  */
 void free_commit_buffer(struct commit *);
diff --git a/object.c b/object.c
index 57a0890..44ca657 100644
--- a/object.c
+++ b/object.c
@@ -198,7 +198,7 @@ struct object *parse_object_buffer(const unsigned char *sha1, enum object_type t
 			if (parse_commit_buffer(commit, buffer, size))
 				return NULL;
 			if (!commit->buffer) {
-				commit->buffer = buffer;
+				set_commit_buffer(commit, buffer);
 				*eaten_p = 1;
 			}
 			obj = &commit->object;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 05/15] sequencer: use logmsg_reencode in get_message
  2014-06-09 18:10 ` [PATCH 05/15] sequencer: use logmsg_reencode in get_message Jeff King
@ 2014-06-10 21:40   ` Junio C Hamano
  2014-06-10 21:50     ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2014-06-10 21:40 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

Jeff King <peff@peff.net> writes:

> Note that we may be fixing a bug here. The existing code
> does:
>
>   if (same_encoding(to, from))
> 	  reencode_string(buf, to, from);
>
> That probably should have been "!same_encoding".
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I didn't actually test for the bug, so it's possible that I'm missing
> something clever...

Thanks for spotting.  There is nothing clever going on.

0e18bcd5 (reencode_string(): introduce and use same_encoding(),
2012-10-18) has this misconversion.

diff --git a/sequencer.c b/sequencer.c
index bd62680..f2f5b13 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -58,7 +58,7 @@ static int get_message(struct commit *commit, struct commit_message *out)
 
 	out->reencoded_message = NULL;
 	out->message = commit->buffer;
-	if (strcmp(encoding, git_commit_encoding))
+	if (same_encoding(encoding, git_commit_encoding))
 		out->reencoded_message = reencode_string(commit->buffer,
 					git_commit_encoding, encoding);
 	if (out->reencoded_message)

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 10/17] provide helpers to access the commit buffer
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (8 preceding siblings ...)
  2014-06-10 21:40   ` [PATCH 09/17] provide a helper to set the " Jeff King
@ 2014-06-10 21:40   ` Jeff King
  2014-06-10 21:40   ` [PATCH 11/17] use get_cached_commit_buffer where appropriate Jeff King
                     ` (8 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:40 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Many sites look at commit->buffer to get more detailed
information than what is in the parsed commit struct.
However, we sometimes drop commit->buffer to save memory,
in which case the caller would need to read the object
afresh. Some callers do this (leading to duplicated code),
and others do not (which opens the possibility of a segfault
if somebody else frees the buffer).

Let's provide a pair of helpers, "get" and "unuse", that let
callers easily get the buffer. They will use the cached
buffer when possible, and otherwise load from disk using
read_sha1_file.

Note that we also need to add a "get_cached" variant which
returns NULL when we do not have a cached buffer. At first
glance this seems to defeat the purpose of "get", which is
to always provide a return value. However, some log code
paths actually use the NULL-ness of commit->buffer as a
boolean flag to decide whether to try printing the
commit. At least for now, we want to continue supporting
that use.

Signed-off-by: Jeff King <peff@peff.net>
---
Typofix in commit message since v1.

 commit.c | 28 ++++++++++++++++++++++++++++
 commit.h | 21 +++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/commit.c b/commit.c
index fc8b4e2..b6b0e0d 100644
--- a/commit.c
+++ b/commit.c
@@ -250,6 +250,34 @@ void set_commit_buffer(struct commit *commit, void *buffer)
 	commit->buffer = buffer;
 }
 
+const void *get_cached_commit_buffer(const struct commit *commit)
+{
+	return commit->buffer;
+}
+
+const void *get_commit_buffer(const struct commit *commit)
+{
+	const void *ret = get_cached_commit_buffer(commit);
+	if (!ret) {
+		enum object_type type;
+		unsigned long size;
+		ret = read_sha1_file(commit->object.sha1, &type, &size);
+		if (!ret)
+			die("cannot read commit object %s",
+			    sha1_to_hex(commit->object.sha1));
+		if (type != OBJ_COMMIT)
+			die("expected commit for %s, got %s",
+			    sha1_to_hex(commit->object.sha1), typename(type));
+	}
+	return ret;
+}
+
+void unuse_commit_buffer(const struct commit *commit, const void *buffer)
+{
+	if (commit->buffer != buffer)
+		free((void *)buffer);
+}
+
 void free_commit_buffer(struct commit *commit)
 {
 	free(commit->buffer);
diff --git a/commit.h b/commit.h
index cc89128..259c0ae 100644
--- a/commit.h
+++ b/commit.h
@@ -58,6 +58,27 @@ void parse_commit_or_die(struct commit *item);
 void set_commit_buffer(struct commit *, void *buffer);
 
 /*
+ * Get any cached object buffer associated with the commit. Returns NULL
+ * if none. The resulting memory should not be freed.
+ */
+const void *get_cached_commit_buffer(const struct commit *);
+
+/*
+ * Get the commit's object contents, either from cache or by reading the object
+ * from disk. The resulting memory should not be modified, and must be given
+ * to unuse_commit_buffer when the caller is done.
+ */
+const void *get_commit_buffer(const struct commit *);
+
+/*
+ * Tell the commit subsytem that we are done with a particular commit buffer.
+ * The commit and buffer should be the input and return value, respectively,
+ * from an earlier call to get_commit_buffer.  The buffer may or may not be
+ * freed by this call; callers should not access the memory afterwards.
+ */
+void unuse_commit_buffer(const struct commit *, const void *buffer);
+
+/*
  * Free any cached object buffer associated with the commit.
  */
 void free_commit_buffer(struct commit *);
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 11/17] use get_cached_commit_buffer where appropriate
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (9 preceding siblings ...)
  2014-06-10 21:40   ` [PATCH 10/17] provide helpers to access " Jeff King
@ 2014-06-10 21:40   ` Jeff King
  2014-06-10 21:41   ` [PATCH 12/17] use get_commit_buffer to avoid duplicate code Jeff King
                     ` (7 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:40 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Some call sites check commit->buffer to see whether we have
a cached buffer, and if so, do some work with it. In the
long run we may want to switch these code paths to make
their decision on a different boolean flag (because checking
the cache may get a little more expensive in the future).
But for now, we can easily support them by converting the
calls to use get_cached_commit_buffer.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c | 2 +-
 log-tree.c         | 2 +-
 object.c           | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index e012ebe..3fcbf21 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -106,7 +106,7 @@ static void show_commit(struct commit *commit, void *data)
 	else
 		putchar('\n');
 
-	if (revs->verbose_header && commit->buffer) {
+	if (revs->verbose_header && get_cached_commit_buffer(commit)) {
 		struct strbuf buf = STRBUF_INIT;
 		struct pretty_print_context ctx = {0};
 		ctx.abbrev = revs->abbrev;
diff --git a/log-tree.c b/log-tree.c
index cf2f86c..e9ef8ab 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -588,7 +588,7 @@ void show_log(struct rev_info *opt)
 		show_mergetag(opt, commit);
 	}
 
-	if (!commit->buffer)
+	if (!get_cached_commit_buffer(commit))
 		return;
 
 	if (opt->show_notes) {
diff --git a/object.c b/object.c
index 44ca657..67b6e35 100644
--- a/object.c
+++ b/object.c
@@ -197,7 +197,7 @@ struct object *parse_object_buffer(const unsigned char *sha1, enum object_type t
 		if (commit) {
 			if (parse_commit_buffer(commit, buffer, size))
 				return NULL;
-			if (!commit->buffer) {
+			if (!get_cached_commit_buffer(commit)) {
 				set_commit_buffer(commit, buffer);
 				*eaten_p = 1;
 			}
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 12/17] use get_commit_buffer to avoid duplicate code
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (10 preceding siblings ...)
  2014-06-10 21:40   ` [PATCH 11/17] use get_cached_commit_buffer where appropriate Jeff King
@ 2014-06-10 21:41   ` Jeff King
  2014-06-10 21:41   ` [PATCH 13/17] convert logmsg_reencode to get_commit_buffer Jeff King
                     ` (6 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:41 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

For both of these sites, we already do the "fallback to
read_sha1_file" trick. But we can shorten the code by just
using get_commit_buffer.

Note that the error cases are slightly different when
read_sha1_file fails. get_commit_buffer will die() if the
object cannot be loaded, or is a non-commit.

For get_sha1_oneline, this will almost certainly never
happen, as we will have just called parse_object (and if it
does, it's probably worth complaining about).

For record_author_date, the new behavior is probably better;
we notify the user of the error instead of silently ignoring
it. And because it's used only for sorting by author-date,
somebody examining a corrupt repo can fallback to the
regular traversal order.

Signed-off-by: Jeff King <peff@peff.net>
---
Typofix in commit message since v1.

 commit.c    | 16 +++-------------
 sha1_name.c | 18 ++++--------------
 2 files changed, 7 insertions(+), 27 deletions(-)

diff --git a/commit.c b/commit.c
index b6b0e0d..1903dde 100644
--- a/commit.c
+++ b/commit.c
@@ -583,22 +583,12 @@ static void record_author_date(struct author_date_slab *author_date,
 			       struct commit *commit)
 {
 	const char *buf, *line_end, *ident_line;
-	char *buffer = NULL;
+	const char *buffer = get_commit_buffer(commit);
 	struct ident_split ident;
 	char *date_end;
 	unsigned long date;
 
-	if (!commit->buffer) {
-		unsigned long size;
-		enum object_type type;
-		buffer = read_sha1_file(commit->object.sha1, &type, &size);
-		if (!buffer)
-			return;
-	}
-
-	for (buf = commit->buffer ? commit->buffer : buffer;
-	     buf;
-	     buf = line_end + 1) {
+	for (buf = buffer; buf; buf = line_end + 1) {
 		line_end = strchrnul(buf, '\n');
 		ident_line = skip_prefix(buf, "author ");
 		if (!ident_line) {
@@ -619,7 +609,7 @@ static void record_author_date(struct author_date_slab *author_date,
 	*(author_date_slab_at(author_date, commit)) = date;
 
 fail_exit:
-	free(buffer);
+	unuse_commit_buffer(commit, buffer);
 }
 
 static int compare_commits_by_author_date(const void *a_, const void *b_,
diff --git a/sha1_name.c b/sha1_name.c
index 2b6322f..0a65d23 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -862,27 +862,17 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1,
 		commit_list_insert(l->item, &backup);
 	}
 	while (list) {
-		char *p, *to_free = NULL;
+		const char *p, *buf;
 		struct commit *commit;
-		enum object_type type;
-		unsigned long size;
 		int matches;
 
 		commit = pop_most_recent_commit(&list, ONELINE_SEEN);
 		if (!parse_object(commit->object.sha1))
 			continue;
-		if (commit->buffer)
-			p = commit->buffer;
-		else {
-			p = read_sha1_file(commit->object.sha1, &type, &size);
-			if (!p)
-				continue;
-			to_free = p;
-		}
-
-		p = strstr(p, "\n\n");
+		buf = get_commit_buffer(commit);
+		p = strstr(buf, "\n\n");
 		matches = p && !regexec(&regex, p + 2, 0, NULL, 0);
-		free(to_free);
+		unuse_commit_buffer(commit, buf);
 
 		if (matches) {
 			hashcpy(sha1, commit->object.sha1);
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 13/17] convert logmsg_reencode to get_commit_buffer
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (11 preceding siblings ...)
  2014-06-10 21:41   ` [PATCH 12/17] use get_commit_buffer to avoid duplicate code Jeff King
@ 2014-06-10 21:41   ` Jeff King
  2014-06-10 21:41   ` [PATCH 14/17] use get_commit_buffer everywhere Jeff King
                     ` (5 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:41 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Like the callsites in the previous commit, logmsg_reencode
already falls back to read_sha1_file when necessary.
However, I split its conversion out into its own commit
because it's a bit more complex.

We return either:

  1. The original commit->buffer

  2. A newly allocated buffer from read_sha1_file

  3. A reencoded buffer (based on either 1 or 2 above).

while trying to do as few extra reads/allocations as
possible. Callers currently free the result with
logmsg_free, but we can simplify this by pointing them
straight to unuse_commit_buffer. This is a slight layering
violation, in that we may be passing a buffer from (3).
However, since the end result is to free() anything except
(1), which is unlikely to change, and because this makes the
interface much simpler, it's a reasonable bending of the
rules.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/blame.c |  4 ++--
 builtin/reset.c |  2 +-
 commit.h        |  1 -
 pretty.c        | 40 +++++++++++-----------------------------
 revision.c      |  2 +-
 sequencer.c     |  2 +-
 6 files changed, 16 insertions(+), 35 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 0af3a18..cde19eb 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1666,7 +1666,7 @@ static void get_commit_info(struct commit *commit,
 		    &ret->author_time, &ret->author_tz);
 
 	if (!detailed) {
-		logmsg_free(message, commit);
+		unuse_commit_buffer(commit, message);
 		return;
 	}
 
@@ -1680,7 +1680,7 @@ static void get_commit_info(struct commit *commit,
 	else
 		strbuf_addf(&ret->summary, "(%s)", sha1_to_hex(commit->object.sha1));
 
-	logmsg_free(message, commit);
+	unuse_commit_buffer(commit, message);
 }
 
 /*
diff --git a/builtin/reset.c b/builtin/reset.c
index 7ebee07..850d532 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -109,7 +109,7 @@ static void print_new_head_line(struct commit *commit)
 	}
 	else
 		printf("\n");
-	logmsg_free(msg, commit);
+	unuse_commit_buffer(commit, msg);
 }
 
 static void update_index_from_diff(struct diff_queue_struct *q,
diff --git a/commit.h b/commit.h
index 259c0ae..5ce5ce7 100644
--- a/commit.h
+++ b/commit.h
@@ -156,7 +156,6 @@ struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */
 extern const char *logmsg_reencode(const struct commit *commit,
 				   char **commit_encoding,
 				   const char *output_encoding);
-extern void logmsg_free(const char *msg, const struct commit *commit);
 extern void get_commit_format(const char *arg, struct rev_info *);
 extern const char *format_subject(struct strbuf *sb, const char *msg,
 				  const char *line_separator);
diff --git a/pretty.c b/pretty.c
index d152de2..8fd60cd 100644
--- a/pretty.c
+++ b/pretty.c
@@ -613,22 +613,9 @@ const char *logmsg_reencode(const struct commit *commit,
 	static const char *utf8 = "UTF-8";
 	const char *use_encoding;
 	char *encoding;
-	char *msg = commit->buffer;
+	const char *msg = get_commit_buffer(commit);
 	char *out;
 
-	if (!msg) {
-		enum object_type type;
-		unsigned long size;
-
-		msg = read_sha1_file(commit->object.sha1, &type, &size);
-		if (!msg)
-			die("Cannot read commit object %s",
-			    sha1_to_hex(commit->object.sha1));
-		if (type != OBJ_COMMIT)
-			die("Expected commit for '%s', got %s",
-			    sha1_to_hex(commit->object.sha1), typename(type));
-	}
-
 	if (!output_encoding || !*output_encoding) {
 		if (commit_encoding)
 			*commit_encoding =
@@ -652,12 +639,13 @@ const char *logmsg_reencode(const struct commit *commit,
 		 * Otherwise, we still want to munge the encoding header in the
 		 * result, which will be done by modifying the buffer. If we
 		 * are using a fresh copy, we can reuse it. But if we are using
-		 * the cached copy from commit->buffer, we need to duplicate it
-		 * to avoid munging commit->buffer.
+		 * the cached copy from get_commit_buffer, we need to duplicate it
+		 * to avoid munging the cached copy.
 		 */
-		out = msg;
-		if (out == commit->buffer)
-			out = xstrdup(out);
+		if (msg == get_cached_commit_buffer(commit))
+			out = xstrdup(msg);
+		else
+			out = (char *)msg;
 	}
 	else {
 		/*
@@ -667,8 +655,8 @@ const char *logmsg_reencode(const struct commit *commit,
 		 * copy, we can free it.
 		 */
 		out = reencode_string(msg, output_encoding, use_encoding);
-		if (out && msg != commit->buffer)
-			free(msg);
+		if (out)
+			unuse_commit_buffer(commit, msg);
 	}
 
 	/*
@@ -687,12 +675,6 @@ const char *logmsg_reencode(const struct commit *commit,
 	return out ? out : msg;
 }
 
-void logmsg_free(const char *msg, const struct commit *commit)
-{
-	if (msg != commit->buffer)
-		free((void *)msg);
-}
-
 static int mailmap_name(const char **email, size_t *email_len,
 			const char **name, size_t *name_len)
 {
@@ -1531,7 +1513,7 @@ void format_commit_message(const struct commit *commit,
 	}
 
 	free(context.commit_encoding);
-	logmsg_free(context.message, commit);
+	unuse_commit_buffer(commit, context.message);
 	free(context.signature_check.gpg_output);
 	free(context.signature_check.signer);
 }
@@ -1767,7 +1749,7 @@ void pretty_print_commit(struct pretty_print_context *pp,
 	if (pp->fmt == CMIT_FMT_EMAIL && sb->len <= beginning_of_body)
 		strbuf_addch(sb, '\n');
 
-	logmsg_free(reencoded, commit);
+	unuse_commit_buffer(commit, reencoded);
 }
 
 void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
diff --git a/revision.c b/revision.c
index be151ef..1cc91e5 100644
--- a/revision.c
+++ b/revision.c
@@ -2844,7 +2844,7 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		retval = grep_buffer(&opt->grep_filter,
 				     (char *)message, strlen(message));
 	strbuf_release(&buf);
-	logmsg_free(message, commit);
+	unuse_commit_buffer(commit, message);
 	return retval;
 }
 
diff --git a/sequencer.c b/sequencer.c
index 3fcab4d..4632f7d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -154,7 +154,7 @@ static int get_message(struct commit *commit, struct commit_message *out)
 static void free_message(struct commit *commit, struct commit_message *msg)
 {
 	free(msg->parent_label);
-	logmsg_free(msg->message, commit);
+	unuse_commit_buffer(commit, msg->message);
 }
 
 static void write_cherry_pick_head(struct commit *commit, const char *pseudoref)
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 14/17] use get_commit_buffer everywhere
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (12 preceding siblings ...)
  2014-06-10 21:41   ` [PATCH 13/17] convert logmsg_reencode to get_commit_buffer Jeff King
@ 2014-06-10 21:41   ` Jeff King
  2014-06-10 21:42   ` [PATCH 15/17] commit-slab: provide a static initializer Jeff King
                     ` (4 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:41 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Each of these sites assumes that commit->buffer is valid.
Since they would segfault if this was not the case, they are
likely to be correct in practice. However, we can
future-proof them by using get_commit_buffer.

And as a side effect, we abstract away the final bare uses
of commit->buffer.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c   |  5 ++++-
 builtin/fmt-merge-msg.c |  5 ++++-
 builtin/log.c           |  7 +++++--
 fsck.c                  | 13 +++++++++++--
 merge-recursive.c       |  4 +++-
 notes-merge.c           |  4 +++-
 sequencer.c             |  4 +++-
 7 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index b8d8a3a..7ee5e08 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -279,6 +279,7 @@ static const char *find_encoding(const char *begin, const char *end)
 static void handle_commit(struct commit *commit, struct rev_info *rev)
 {
 	int saved_output_format = rev->diffopt.output_format;
+	const char *commit_buffer;
 	const char *author, *author_end, *committer, *committer_end;
 	const char *encoding, *message;
 	char *reencoded = NULL;
@@ -288,7 +289,8 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 	rev->diffopt.output_format = DIFF_FORMAT_CALLBACK;
 
 	parse_commit_or_die(commit);
-	author = strstr(commit->buffer, "\nauthor ");
+	commit_buffer = get_commit_buffer(commit);
+	author = strstr(commit_buffer, "\nauthor ");
 	if (!author)
 		die ("Could not find author in commit %s",
 		     sha1_to_hex(commit->object.sha1));
@@ -335,6 +337,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 			  ? strlen(message) : 0),
 	       reencoded ? reencoded : message ? message : "");
 	free(reencoded);
+	unuse_commit_buffer(commit, commit_buffer);
 
 	for (i = 0, p = commit->parents; p; p = p->next) {
 		int mark = get_object_mark(&p->item->object);
diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 3906eda..01f6d59 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -230,12 +230,14 @@ static void add_branch_desc(struct strbuf *out, const char *name)
 static void record_person(int which, struct string_list *people,
 			  struct commit *commit)
 {
+	const char *buffer;
 	char *name_buf, *name, *name_end;
 	struct string_list_item *elem;
 	const char *field;
 
 	field = (which == 'a') ? "\nauthor " : "\ncommitter ";
-	name = strstr(commit->buffer, field);
+	buffer = get_commit_buffer(commit);
+	name = strstr(buffer, field);
 	if (!name)
 		return;
 	name += strlen(field);
@@ -247,6 +249,7 @@ static void record_person(int which, struct string_list *people,
 	if (name_end < name)
 		return;
 	name_buf = xmemdupz(name, name_end - name + 1);
+	unuse_commit_buffer(commit, buffer);
 
 	elem = string_list_lookup(people, name_buf);
 	if (!elem) {
diff --git a/builtin/log.c b/builtin/log.c
index 226f8f2..2c74260 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -918,9 +918,12 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 	log_write_email_headers(rev, head, &pp.subject, &pp.after_subject,
 				&need_8bit_cte);
 
-	for (i = 0; !need_8bit_cte && i < nr; i++)
-		if (has_non_ascii(list[i]->buffer))
+	for (i = 0; !need_8bit_cte && i < nr; i++) {
+		const char *buf = get_commit_buffer(list[i]);
+		if (has_non_ascii(buf))
 			need_8bit_cte = 1;
+		unuse_commit_buffer(list[i], buf);
+	}
 
 	if (!branch_name)
 		branch_name = find_branch_name(rev);
diff --git a/fsck.c b/fsck.c
index abed62b..8223780 100644
--- a/fsck.c
+++ b/fsck.c
@@ -276,9 +276,10 @@ static int fsck_ident(const char **ident, struct object *obj, fsck_error error_f
 	return 0;
 }
 
-static int fsck_commit(struct commit *commit, fsck_error error_func)
+static int fsck_commit_buffer(struct commit *commit, const char *buffer,
+			      fsck_error error_func)
 {
-	const char *buffer = commit->buffer, *tmp;
+	const char *tmp;
 	unsigned char tree_sha1[20], sha1[20];
 	struct commit_graft *graft;
 	int parents = 0;
@@ -336,6 +337,14 @@ static int fsck_commit(struct commit *commit, fsck_error error_func)
 	return 0;
 }
 
+static int fsck_commit(struct commit *commit, fsck_error error_func)
+{
+	const char *buffer = get_commit_buffer(commit);
+	int ret = fsck_commit_buffer(commit, buffer, error_func);
+	unuse_commit_buffer(commit, buffer);
+	return ret;
+}
+
 static int fsck_tag(struct tag *tag, fsck_error error_func)
 {
 	struct object *tagged = tag->tagged;
diff --git a/merge-recursive.c b/merge-recursive.c
index 2b37d42..0dd6039 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -190,9 +190,11 @@ static void output_commit_title(struct merge_options *o, struct commit *commit)
 			printf(_("(bad commit)\n"));
 		else {
 			const char *title;
-			int len = find_commit_subject(commit->buffer, &title);
+			const char *msg = get_commit_buffer(commit);
+			int len = find_commit_subject(msg, &title);
 			if (len)
 				printf("%.*s\n", len, title);
+			unuse_commit_buffer(commit, msg);
 		}
 	}
 }
diff --git a/notes-merge.c b/notes-merge.c
index 697cec3..e804db2 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -672,7 +672,8 @@ int notes_merge_commit(struct notes_merge_options *o,
 	DIR *dir;
 	struct dirent *e;
 	struct strbuf path = STRBUF_INIT;
-	char *msg = strstr(partial_commit->buffer, "\n\n");
+	const char *buffer = get_commit_buffer(partial_commit);
+	const char *msg = strstr(buffer, "\n\n");
 	int baselen;
 
 	strbuf_addstr(&path, git_path(NOTES_MERGE_WORKTREE));
@@ -721,6 +722,7 @@ int notes_merge_commit(struct notes_merge_options *o,
 
 	create_notes_commit(partial_tree, partial_commit->parents,
 			    msg, strlen(msg), result_sha1);
+	unuse_commit_buffer(partial_commit, buffer);
 	if (o->verbosity >= 4)
 		printf("Finalized notes merge commit: %s\n",
 			sha1_to_hex(result_sha1));
diff --git a/sequencer.c b/sequencer.c
index 4632f7d..5609ab3 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -666,10 +666,12 @@ static int format_todo(struct strbuf *buf, struct commit_list *todo_list,
 	int subject_len;
 
 	for (cur = todo_list; cur; cur = cur->next) {
+		const char *commit_buffer = get_commit_buffer(cur->item);
 		sha1_abbrev = find_unique_abbrev(cur->item->object.sha1, DEFAULT_ABBREV);
-		subject_len = find_commit_subject(cur->item->buffer, &subject);
+		subject_len = find_commit_subject(commit_buffer, &subject);
 		strbuf_addf(buf, "%s %s %.*s\n", action_str, sha1_abbrev,
 			subject_len, subject);
+		unuse_commit_buffer(cur->item, commit_buffer);
 	}
 	return 0;
 }
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 15/17] commit-slab: provide a static initializer
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (13 preceding siblings ...)
  2014-06-10 21:41   ` [PATCH 14/17] use get_commit_buffer everywhere Jeff King
@ 2014-06-10 21:42   ` Jeff King
  2014-06-12 18:15     ` Junio C Hamano
  2014-06-10 21:43   ` [PATCH 16/17] commit: convert commit->buffer to a slab Jeff King
                     ` (3 subsequent siblings)
  18 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:42 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Callers currently must use init_foo_slab() at runtime before
accessing a slab. For global slabs, it's much nicer if we
can initialize them in BSS, so that each user does not have
to add code to check-and-initialize.

Signed-off-by: Jeff King <peff@peff.net>
---
There was no comment on this one in v1. I'd be curious if anyone has
comments on what I wrote in:

  http://article.gmane.org/gmane.comp.version-control.git/251099

 commit-slab.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/commit-slab.h b/commit-slab.h
index cc114b5..375c9c7 100644
--- a/commit-slab.h
+++ b/commit-slab.h
@@ -117,4 +117,16 @@ static int stat_ ##slabname## realloc
  * catch because GCC silently parses it by default.
  */
 
+/*
+ * Statically initialize a commit slab named "var". Note that this
+ * evaluates "stride" multiple times! Example:
+ *
+ *   struct indegree indegrees = COMMIT_SLAB_INIT(1, indegrees);
+ *
+ */
+#define COMMIT_SLAB_INIT(stride, var) { \
+	COMMIT_SLAB_SIZE / sizeof(**((var).slab)) / (stride), \
+	(stride), 0, NULL \
+}
+
 #endif /* COMMIT_SLAB_H */
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 16/17] commit: convert commit->buffer to a slab
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (14 preceding siblings ...)
  2014-06-10 21:42   ` [PATCH 15/17] commit-slab: provide a static initializer Jeff King
@ 2014-06-10 21:43   ` Jeff King
  2014-06-10 21:44   ` [PATCH 17/17] commit: record buffer length in cache Jeff King
                     ` (2 subsequent siblings)
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:43 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

This will make it easier to manage the buffer cache
independently of the "struct commit" objects. It also
shrinks "struct commit" by one pointer, which may be
helpful.

Unfortunately it does not reduce the max memory size of
something like "rev-list", because rev-list uses
get_cached_commit_buffer() to decide not to show each
commit's output (and due to the design of slab_at, accessing
the slab requires us to extend it, allocating exactly the
same number of buffer pointers we dropped from the commit
structs).

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 20 +++++++++++++-------
 commit.h |  1 -
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/commit.c b/commit.c
index 1903dde..e289c78 100644
--- a/commit.c
+++ b/commit.c
@@ -245,14 +245,17 @@ int unregister_shallow(const unsigned char *sha1)
 	return 0;
 }
 
+define_commit_slab(buffer_slab, void *);
+static struct buffer_slab buffer_slab = COMMIT_SLAB_INIT(1, buffer_slab);
+
 void set_commit_buffer(struct commit *commit, void *buffer)
 {
-	commit->buffer = buffer;
+	*buffer_slab_at(&buffer_slab, commit) = buffer;
 }
 
 const void *get_cached_commit_buffer(const struct commit *commit)
 {
-	return commit->buffer;
+	return *buffer_slab_at(&buffer_slab, commit);
 }
 
 const void *get_commit_buffer(const struct commit *commit)
@@ -274,20 +277,23 @@ const void *get_commit_buffer(const struct commit *commit)
 
 void unuse_commit_buffer(const struct commit *commit, const void *buffer)
 {
-	if (commit->buffer != buffer)
+	void *cached = *buffer_slab_at(&buffer_slab, commit);
+	if (cached != buffer)
 		free((void *)buffer);
 }
 
 void free_commit_buffer(struct commit *commit)
 {
-	free(commit->buffer);
-	commit->buffer = NULL;
+	void **b = buffer_slab_at(&buffer_slab, commit);
+	free(*b);
+	*b = NULL;
 }
 
 const void *detach_commit_buffer(struct commit *commit)
 {
-	void *ret = commit->buffer;
-	commit->buffer = NULL;
+	void **b = buffer_slab_at(&buffer_slab, commit);
+	void *ret = *b;
+	*b = NULL;
 	return ret;
 }
 
diff --git a/commit.h b/commit.h
index 5ce5ce7..e1c2569 100644
--- a/commit.h
+++ b/commit.h
@@ -20,7 +20,6 @@ struct commit {
 	unsigned long date;
 	struct commit_list *parents;
 	struct tree *tree;
-	char *buffer;
 };
 
 extern int save_commit_buffer;
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 17/17] commit: record buffer length in cache
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (15 preceding siblings ...)
  2014-06-10 21:43   ` [PATCH 16/17] commit: convert commit->buffer to a slab Jeff King
@ 2014-06-10 21:44   ` Jeff King
  2014-06-10 21:46   ` [PATCH v2 0/17] store length of commit->buffer Jeff King
  2014-06-13  6:32   ` [PATCH v2] reuse cached commit buffer when parsing signatures Jeff King
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King <peff@peff.net>
---
Since v1, fixes dubious use of strbuf as pointed out by Christian.

 builtin/blame.c         | 14 ++++++++++++-
 builtin/fast-export.c   |  2 +-
 builtin/fmt-merge-msg.c |  2 +-
 builtin/index-pack.c    |  2 +-
 builtin/log.c           |  2 +-
 builtin/rev-list.c      |  2 +-
 commit.c                | 54 ++++++++++++++++++++++++++++++++-----------------
 commit.h                |  8 ++++----
 fsck.c                  |  2 +-
 log-tree.c              |  2 +-
 merge-recursive.c       |  2 +-
 notes-merge.c           |  2 +-
 object.c                |  4 ++--
 pretty.c                |  4 ++--
 sequencer.c             |  2 +-
 sha1_name.c             |  2 +-
 16 files changed, 68 insertions(+), 38 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index cde19eb..53f43ab 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2266,6 +2266,18 @@ static void append_merge_parents(struct commit_list **tail)
 }
 
 /*
+ * This isn't as simple as passing sb->buf and sb->len, because we
+ * want to transfer ownership of the buffer to the commit (so we
+ * must use detach).
+ */
+static void set_commit_buffer_from_strbuf(struct commit *c, struct strbuf *sb)
+{
+	size_t len;
+	void *buf = strbuf_detach(sb, &len);
+	set_commit_buffer(c, buf, len);
+}
+
+/*
  * Prepare a dummy commit that represents the work tree (or staged) item.
  * Note that annotating work tree item never works in the reverse.
  */
@@ -2313,7 +2325,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 		    ident, ident, path,
 		    (!contents_from ? path :
 		     (!strcmp(contents_from, "-") ? "standard input" : contents_from)));
-	set_commit_buffer(commit, strbuf_detach(&msg, NULL));
+	set_commit_buffer_from_strbuf(commit, &msg);
 
 	if (!contents_from || strcmp("-", contents_from)) {
 		struct stat st;
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 7ee5e08..05d161f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -289,7 +289,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 	rev->diffopt.output_format = DIFF_FORMAT_CALLBACK;
 
 	parse_commit_or_die(commit);
-	commit_buffer = get_commit_buffer(commit);
+	commit_buffer = get_commit_buffer(commit, NULL);
 	author = strstr(commit_buffer, "\nauthor ");
 	if (!author)
 		die ("Could not find author in commit %s",
diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 01f6d59..ef8b254 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -236,7 +236,7 @@ static void record_person(int which, struct string_list *people,
 	const char *field;
 
 	field = (which == 'a') ? "\nauthor " : "\ncommitter ";
-	buffer = get_commit_buffer(commit);
+	buffer = get_commit_buffer(commit, NULL);
 	name = strstr(buffer, field);
 	if (!name)
 		return;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 995df39..25e3e9b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -786,7 +786,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			}
 			if (obj->type == OBJ_COMMIT) {
 				struct commit *commit = (struct commit *) obj;
-				detach_commit_buffer(commit);
+				detach_commit_buffer(commit, NULL);
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
diff --git a/builtin/log.c b/builtin/log.c
index 2c74260..c599eac 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -919,7 +919,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 				&need_8bit_cte);
 
 	for (i = 0; !need_8bit_cte && i < nr; i++) {
-		const char *buf = get_commit_buffer(list[i]);
+		const char *buf = get_commit_buffer(list[i], NULL);
 		if (has_non_ascii(buf))
 			need_8bit_cte = 1;
 		unuse_commit_buffer(list[i], buf);
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 3fcbf21..ff84a82 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -106,7 +106,7 @@ static void show_commit(struct commit *commit, void *data)
 	else
 		putchar('\n');
 
-	if (revs->verbose_header && get_cached_commit_buffer(commit)) {
+	if (revs->verbose_header && get_cached_commit_buffer(commit, NULL)) {
 		struct strbuf buf = STRBUF_INIT;
 		struct pretty_print_context ctx = {0};
 		ctx.abbrev = revs->abbrev;
diff --git a/commit.c b/commit.c
index e289c78..a036e18 100644
--- a/commit.c
+++ b/commit.c
@@ -245,22 +245,31 @@ int unregister_shallow(const unsigned char *sha1)
 	return 0;
 }
 
-define_commit_slab(buffer_slab, void *);
+struct commit_buffer {
+	void *buffer;
+	unsigned long size;
+};
+define_commit_slab(buffer_slab, struct commit_buffer);
 static struct buffer_slab buffer_slab = COMMIT_SLAB_INIT(1, buffer_slab);
 
-void set_commit_buffer(struct commit *commit, void *buffer)
+void set_commit_buffer(struct commit *commit, void *buffer, unsigned long size)
 {
-	*buffer_slab_at(&buffer_slab, commit) = buffer;
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	v->buffer = buffer;
+	v->size = size;
 }
 
-const void *get_cached_commit_buffer(const struct commit *commit)
+const void *get_cached_commit_buffer(const struct commit *commit, unsigned long *sizep)
 {
-	return *buffer_slab_at(&buffer_slab, commit);
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	if (sizep)
+		*sizep = v->size;
+	return v->buffer;
 }
 
-const void *get_commit_buffer(const struct commit *commit)
+const void *get_commit_buffer(const struct commit *commit, unsigned long *sizep)
 {
-	const void *ret = get_cached_commit_buffer(commit);
+	const void *ret = get_cached_commit_buffer(commit, sizep);
 	if (!ret) {
 		enum object_type type;
 		unsigned long size;
@@ -271,29 +280,38 @@ const void *get_commit_buffer(const struct commit *commit)
 		if (type != OBJ_COMMIT)
 			die("expected commit for %s, got %s",
 			    sha1_to_hex(commit->object.sha1), typename(type));
+		if (sizep)
+			*sizep = size;
 	}
 	return ret;
 }
 
 void unuse_commit_buffer(const struct commit *commit, const void *buffer)
 {
-	void *cached = *buffer_slab_at(&buffer_slab, commit);
-	if (cached != buffer)
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	if (v->buffer != buffer)
 		free((void *)buffer);
 }
 
 void free_commit_buffer(struct commit *commit)
 {
-	void **b = buffer_slab_at(&buffer_slab, commit);
-	free(*b);
-	*b = NULL;
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	free(v->buffer);
+	v->buffer = NULL;
+	v->size = 0;
 }
 
-const void *detach_commit_buffer(struct commit *commit)
+const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
 {
-	void **b = buffer_slab_at(&buffer_slab, commit);
-	void *ret = *b;
-	*b = NULL;
+	struct commit_buffer *v = buffer_slab_at(&buffer_slab, commit);
+	void *ret;
+
+	ret = v->buffer;
+	if (sizep)
+		*sizep = v->size;
+
+	v->buffer = NULL;
+	v->size = 0;
 	return ret;
 }
 
@@ -374,7 +392,7 @@ int parse_commit(struct commit *item)
 	}
 	ret = parse_commit_buffer(item, buffer, size);
 	if (save_commit_buffer && !ret) {
-		set_commit_buffer(item, buffer);
+		set_commit_buffer(item, buffer, size);
 		return 0;
 	}
 	free(buffer);
@@ -589,7 +607,7 @@ static void record_author_date(struct author_date_slab *author_date,
 			       struct commit *commit)
 {
 	const char *buf, *line_end, *ident_line;
-	const char *buffer = get_commit_buffer(commit);
+	const char *buffer = get_commit_buffer(commit, NULL);
 	struct ident_split ident;
 	char *date_end;
 	unsigned long date;
diff --git a/commit.h b/commit.h
index e1c2569..61559a9 100644
--- a/commit.h
+++ b/commit.h
@@ -54,20 +54,20 @@ void parse_commit_or_die(struct commit *item);
  * Associate an object buffer with the commit. The ownership of the
  * memory is handed over to the commit, and must be free()-able.
  */
-void set_commit_buffer(struct commit *, void *buffer);
+void set_commit_buffer(struct commit *, void *buffer, unsigned long size);
 
 /*
  * Get any cached object buffer associated with the commit. Returns NULL
  * if none. The resulting memory should not be freed.
  */
-const void *get_cached_commit_buffer(const struct commit *);
+const void *get_cached_commit_buffer(const struct commit *, unsigned long *size);
 
 /*
  * Get the commit's object contents, either from cache or by reading the object
  * from disk. The resulting memory should not be modified, and must be given
  * to unuse_commit_buffer when the caller is done.
  */
-const void *get_commit_buffer(const struct commit *);
+const void *get_commit_buffer(const struct commit *, unsigned long *size);
 
 /*
  * Tell the commit subsytem that we are done with a particular commit buffer.
@@ -86,7 +86,7 @@ void free_commit_buffer(struct commit *);
  * Disassociate any cached object buffer from the commit, but do not free it.
  * The buffer (or NULL, if none) is returned.
  */
-const void *detach_commit_buffer(struct commit *);
+const void *detach_commit_buffer(struct commit *, unsigned long *sizep);
 
 /* Find beginning and length of commit subject. */
 int find_commit_subject(const char *commit_buffer, const char **subject);
diff --git a/fsck.c b/fsck.c
index 8223780..a7233c8 100644
--- a/fsck.c
+++ b/fsck.c
@@ -339,7 +339,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 
 static int fsck_commit(struct commit *commit, fsck_error error_func)
 {
-	const char *buffer = get_commit_buffer(commit);
+	const char *buffer = get_commit_buffer(commit, NULL);
 	int ret = fsck_commit_buffer(commit, buffer, error_func);
 	unuse_commit_buffer(commit, buffer);
 	return ret;
diff --git a/log-tree.c b/log-tree.c
index e9ef8ab..4447021 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -588,7 +588,7 @@ void show_log(struct rev_info *opt)
 		show_mergetag(opt, commit);
 	}
 
-	if (!get_cached_commit_buffer(commit))
+	if (!get_cached_commit_buffer(commit, NULL))
 		return;
 
 	if (opt->show_notes) {
diff --git a/merge-recursive.c b/merge-recursive.c
index 0dd6039..d38a3b2 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -190,7 +190,7 @@ static void output_commit_title(struct merge_options *o, struct commit *commit)
 			printf(_("(bad commit)\n"));
 		else {
 			const char *title;
-			const char *msg = get_commit_buffer(commit);
+			const char *msg = get_commit_buffer(commit, NULL);
 			int len = find_commit_subject(msg, &title);
 			if (len)
 				printf("%.*s\n", len, title);
diff --git a/notes-merge.c b/notes-merge.c
index e804db2..fd5fae2 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -672,7 +672,7 @@ int notes_merge_commit(struct notes_merge_options *o,
 	DIR *dir;
 	struct dirent *e;
 	struct strbuf path = STRBUF_INIT;
-	const char *buffer = get_commit_buffer(partial_commit);
+	const char *buffer = get_commit_buffer(partial_commit, NULL);
 	const char *msg = strstr(buffer, "\n\n");
 	int baselen;
 
diff --git a/object.c b/object.c
index 67b6e35..9c31e9a 100644
--- a/object.c
+++ b/object.c
@@ -197,8 +197,8 @@ struct object *parse_object_buffer(const unsigned char *sha1, enum object_type t
 		if (commit) {
 			if (parse_commit_buffer(commit, buffer, size))
 				return NULL;
-			if (!get_cached_commit_buffer(commit)) {
-				set_commit_buffer(commit, buffer);
+			if (!get_cached_commit_buffer(commit, NULL)) {
+				set_commit_buffer(commit, buffer, size);
 				*eaten_p = 1;
 			}
 			obj = &commit->object;
diff --git a/pretty.c b/pretty.c
index 8fd60cd..280d7d0 100644
--- a/pretty.c
+++ b/pretty.c
@@ -613,7 +613,7 @@ const char *logmsg_reencode(const struct commit *commit,
 	static const char *utf8 = "UTF-8";
 	const char *use_encoding;
 	char *encoding;
-	const char *msg = get_commit_buffer(commit);
+	const char *msg = get_commit_buffer(commit, NULL);
 	char *out;
 
 	if (!output_encoding || !*output_encoding) {
@@ -642,7 +642,7 @@ const char *logmsg_reencode(const struct commit *commit,
 		 * the cached copy from get_commit_buffer, we need to duplicate it
 		 * to avoid munging the cached copy.
 		 */
-		if (msg == get_cached_commit_buffer(commit))
+		if (msg == get_cached_commit_buffer(commit, NULL))
 			out = xstrdup(msg);
 		else
 			out = (char *)msg;
diff --git a/sequencer.c b/sequencer.c
index 5609ab3..d83f810 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -666,7 +666,7 @@ static int format_todo(struct strbuf *buf, struct commit_list *todo_list,
 	int subject_len;
 
 	for (cur = todo_list; cur; cur = cur->next) {
-		const char *commit_buffer = get_commit_buffer(cur->item);
+		const char *commit_buffer = get_commit_buffer(cur->item, NULL);
 		sha1_abbrev = find_unique_abbrev(cur->item->object.sha1, DEFAULT_ABBREV);
 		subject_len = find_commit_subject(commit_buffer, &subject);
 		strbuf_addf(buf, "%s %s %.*s\n", action_str, sha1_abbrev,
diff --git a/sha1_name.c b/sha1_name.c
index 0a65d23..c2c938c 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -869,7 +869,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1,
 		commit = pop_most_recent_commit(&list, ONELINE_SEEN);
 		if (!parse_object(commit->object.sha1))
 			continue;
-		buf = get_commit_buffer(commit);
+		buf = get_commit_buffer(commit, NULL);
 		p = strstr(buf, "\n\n");
 		matches = p && !regexec(&regex, p + 2, 0, NULL, 0);
 		unuse_commit_buffer(commit, buf);
-- 
2.0.0.729.g520999f

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 0/17] store length of commit->buffer
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (16 preceding siblings ...)
  2014-06-10 21:44   ` [PATCH 17/17] commit: record buffer length in cache Jeff King
@ 2014-06-10 21:46   ` Jeff King
  2014-06-12 17:22     ` Junio C Hamano
  2014-06-13  6:32   ` [PATCH v2] reuse cached commit buffer when parsing signatures Jeff King
  18 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Christian Couder, Jakub Narebski, Eric Sunshine

On Tue, Jun 10, 2014 at 05:35:09PM -0400, Jeff King wrote:

> Here's a re-roll of the commit-slab series. It fixes the issues pointed
> out by Eric and Christian (thanks, both).

Side note: I marked this as v2, but forgot to do so in each individual
patch (I write my cover letters first, and then issue format-patch as a
separate step, and I sometimes forget -v2 there). How big an
inconvenience is this?

When I was acting maintainer, it didn't bother me, as I picked the
patches up via threading. But if it is annoying, I can try to teach my
scripts to do it automatically, so I stop forgetting.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/15] do not create "struct commit" with xcalloc
  2014-06-10 21:35   ` Junio C Hamano
@ 2014-06-10 21:49     ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

On Tue, Jun 10, 2014 at 02:35:48PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > In both blame and merge-recursive, we sometimes create a
> > "fake" commit struct for convenience (e.g., to represent the
> > HEAD state as if we would commit it). By allocating
> > ourselves rather than using alloc_commit_node, we do not
> > properly set the "index" field of the commit. This can
> > produce subtle bugs if we then use commit-slab on the
> > resulting commit, as we will share the "0" index with
> > another commit.
> 
> The change I see here is all good, but I wonder if it is too late to
> start the real index from 1 and catch anything that has 0 index with
> a BUG("do not hand-allocate a commit").

Yeah, I had a similar thought while writing the series. I do not think
it is too late at all. It would just waste one commit's worth of memory
in the slab, but that is not a big deal. Other than slab_at, no callers
should care.

It would cost an extra conditional in each call to slab_at. That
probably doesn't matter, though the original goal was to make slab_at as
fast as an indirect pointer dereference (it's not quite these days,
though, as we have to do a little division to find the right section of
the slab).

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 05/15] sequencer: use logmsg_reencode in get_message
  2014-06-10 21:40   ` Junio C Hamano
@ 2014-06-10 21:50     ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-10 21:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

On Tue, Jun 10, 2014 at 02:40:30PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Note that we may be fixing a bug here. The existing code
> > does:
> >
> >   if (same_encoding(to, from))
> > 	  reencode_string(buf, to, from);
> >
> > That probably should have been "!same_encoding".
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > I didn't actually test for the bug, so it's possible that I'm missing
> > something clever...
> 
> Thanks for spotting.  There is nothing clever going on.
> 
> 0e18bcd5 (reencode_string(): introduce and use same_encoding(),
> 2012-10-18) has this misconversion.

Ah, thanks, I tried digging but didn't see anything obvious (and somehow
my eyes glossed over the s/same_encoding/strcmp/ while reading the
history). I feel much better about the change now.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 0/17] store length of commit->buffer
  2014-06-10 21:46   ` [PATCH v2 0/17] store length of commit->buffer Jeff King
@ 2014-06-12 17:22     ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2014-06-12 17:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

Jeff King <peff@peff.net> writes:

> On Tue, Jun 10, 2014 at 05:35:09PM -0400, Jeff King wrote:
>
>> Here's a re-roll of the commit-slab series. It fixes the issues pointed
>> out by Eric and Christian (thanks, both).
>
> Side note: I marked this as v2, but forgot to do so in each individual
> patch (I write my cover letters first, and then issue format-patch as a
> separate step, and I sometimes forget -v2 there). How big an
> inconvenience is this?

Not an inconvenience to me (I do not speak for others, obviously),
but they look a bit confusing to me in the thread list in my MUA
(but not too annoying).

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/17] commit-slab: provide a static initializer
  2014-06-10 21:42   ` [PATCH 15/17] commit-slab: provide a static initializer Jeff King
@ 2014-06-12 18:15     ` Junio C Hamano
  2014-06-12 19:51       ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2014-06-12 18:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

Jeff King <peff@peff.net> writes:

> Callers currently must use init_foo_slab() at runtime before
> accessing a slab. For global slabs, it's much nicer if we
> can initialize them in BSS, so that each user does not have
> to add code to check-and-initialize.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> There was no comment on this one in v1. I'd be curious if anyone has
> comments on what I wrote in:
>
>   http://article.gmane.org/gmane.comp.version-control.git/251099

Why do we need an initialiser at this point (in other words, how
have other existing slab users coped without having one)?

I think they call init_*_slab() when the slab is needed/used the
first time (e.g. it is not even worth initialising indegree slab
unless we are sorting in topo order).

Unlike the author-date and indegree slabs, there are too many entry
points that may want access to the buffer slab (save_commit_buffer's
default being on does not help us either), so it would be much less
error prone to always initialise a static slab like this patch does,
I guess.


>  commit-slab.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/commit-slab.h b/commit-slab.h
> index cc114b5..375c9c7 100644
> --- a/commit-slab.h
> +++ b/commit-slab.h
> @@ -117,4 +117,16 @@ static int stat_ ##slabname## realloc
>   * catch because GCC silently parses it by default.
>   */
>  
> +/*
> + * Statically initialize a commit slab named "var". Note that this
> + * evaluates "stride" multiple times! Example:
> + *
> + *   struct indegree indegrees = COMMIT_SLAB_INIT(1, indegrees);
> + *
> + */
> +#define COMMIT_SLAB_INIT(stride, var) { \
> +	COMMIT_SLAB_SIZE / sizeof(**((var).slab)) / (stride), \
> +	(stride), 0, NULL \
> +}
> +
>  #endif /* COMMIT_SLAB_H */

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/17] provide a helper to free commit buffer
  2014-06-10 21:40   ` [PATCH 08/17] provide a helper to free commit buffer Jeff King
@ 2014-06-12 18:22     ` Junio C Hamano
  2014-06-12 20:08       ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2014-06-12 18:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

Jeff King <peff@peff.net> writes:

> This converts two lines into one at each caller. But more
> importantly, it abstracts the concept of freeing the buffer,
> which will make it easier to change later.
>
> Note that we also need to provide a "detach" mechanism for
> the weird case in fsck which passes the buffer back to be
> freed.

I find that last sentence a bit of white lie ;-).

The sole caller of "detach" is in index-pack, and discards the
return value, which is not wrong per-se because it still has the
pointer to the piece of memory it fed to parse_object_buffer(),
knows and/or assumes that the return value must be the same as the
one it already has, and it handles the freeing of that memory
without relying on the object layer at all.

But that is an even more weird special case than the white-lie
version.  As an API, "detach" that returns the buffer to be freed
looks much less weird than what is really going on in the current
caller.

I however have to wonder if

	if (detach_commit_buffer(commit) != data)
        	die("BUG");

might want to be there.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/fsck.c       |  3 +--
>  builtin/index-pack.c |  2 +-
>  builtin/log.c        |  6 ++----
>  builtin/rev-list.c   |  3 +--
>  commit.c             | 13 +++++++++++++
>  commit.h             | 11 +++++++++++
>  6 files changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index fc150c8..8aadca1 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -310,8 +310,7 @@ static int fsck_obj(struct object *obj)
>  	if (obj->type == OBJ_COMMIT) {
>  		struct commit *commit = (struct commit *) obj;
>  
> -		free(commit->buffer);
> -		commit->buffer = NULL;
> +		free_commit_buffer(commit);
>  
>  		if (!commit->parents && show_root)
>  			printf("root %s\n", sha1_to_hex(commit->object.sha1));
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 18f57de..995df39 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -786,7 +786,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
>  			}
>  			if (obj->type == OBJ_COMMIT) {
>  				struct commit *commit = (struct commit *) obj;
> -				commit->buffer = NULL;
> +				detach_commit_buffer(commit);
>  			}
>  			obj->flags |= FLAG_CHECKED;
>  		}
> diff --git a/builtin/log.c b/builtin/log.c
> index 39e8836..226f8f2 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -349,8 +349,7 @@ static int cmd_log_walk(struct rev_info *rev)
>  			rev->max_count++;
>  		if (!rev->reflog_info) {
>  			/* we allow cycles in reflog ancestry */
> -			free(commit->buffer);
> -			commit->buffer = NULL;
> +			free_commit_buffer(commit);
>  		}
>  		free_commit_list(commit->parents);
>  		commit->parents = NULL;
> @@ -1508,8 +1507,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
>  		    reopen_stdout(rev.numbered_files ? NULL : commit, NULL, &rev, quiet))
>  			die(_("Failed to create output files"));
>  		shown = log_tree_commit(&rev, commit);
> -		free(commit->buffer);
> -		commit->buffer = NULL;
> +		free_commit_buffer(commit);
>  
>  		/* We put one extra blank line between formatted
>  		 * patches and this flag is used by log-tree code
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 9f92905..e012ebe 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -173,8 +173,7 @@ static void finish_commit(struct commit *commit, void *data)
>  		free_commit_list(commit->parents);
>  		commit->parents = NULL;
>  	}
> -	free(commit->buffer);
> -	commit->buffer = NULL;
> +	free_commit_buffer(commit);
>  }
>  
>  static void finish_object(struct object *obj,
> diff --git a/commit.c b/commit.c
> index fbdc480..11a05c1 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -245,6 +245,19 @@ int unregister_shallow(const unsigned char *sha1)
>  	return 0;
>  }
>  
> +void free_commit_buffer(struct commit *commit)
> +{
> +	free(commit->buffer);
> +	commit->buffer = NULL;
> +}
> +
> +const void *detach_commit_buffer(struct commit *commit)
> +{
> +	void *ret = commit->buffer;
> +	commit->buffer = NULL;
> +	return ret;
> +}
> +
>  int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size)
>  {
>  	const char *tail = buffer;
> diff --git a/commit.h b/commit.h
> index 4df48cb..d72ed43 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -51,6 +51,17 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
>  int parse_commit(struct commit *item);
>  void parse_commit_or_die(struct commit *item);
>  
> +/*
> + * Free any cached object buffer associated with the commit.
> + */
> +void free_commit_buffer(struct commit *);
> +
> +/*
> + * Disassociate any cached object buffer from the commit, but do not free it.
> + * The buffer (or NULL, if none) is returned.
> + */
> +const void *detach_commit_buffer(struct commit *);
> +
>  /* Find beginning and length of commit subject. */
>  int find_commit_subject(const char *commit_buffer, const char **subject);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/17] commit-slab: provide a static initializer
  2014-06-12 18:15     ` Junio C Hamano
@ 2014-06-12 19:51       ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-12 19:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

On Thu, Jun 12, 2014 at 11:15:49AM -0700, Junio C Hamano wrote:

> Why do we need an initialiser at this point (in other words, how
> have other existing slab users coped without having one)?
> 
> I think they call init_*_slab() when the slab is needed/used the
> first time (e.g. it is not even worth initialising indegree slab
> unless we are sorting in topo order).
> 
> Unlike the author-date and indegree slabs, there are too many entry
> points that may want access to the buffer slab (save_commit_buffer's
> default being on does not help us either), so it would be much less
> error prone to always initialise a static slab like this patch does,
> I guess.

Yes, and also because the lifetime of this slab is different. For the
indegree and author-date slabs, the slab lives for one operation.  So we
init the slab at the beginning of the operation, do the sort, then clear
it at the end.

This slab does not exist for one operation. It is a global cache that
can be used by any code, just like the "struct commit" cache itself.. It
should always be initialized from the start, and continue until the
program exits.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/17] provide a helper to free commit buffer
  2014-06-12 18:22     ` Junio C Hamano
@ 2014-06-12 20:08       ` Jeff King
  2014-06-12 22:05         ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2014-06-12 20:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

On Thu, Jun 12, 2014 at 11:22:31AM -0700, Junio C Hamano wrote:

> > Note that we also need to provide a "detach" mechanism for
> > the weird case in fsck which passes the buffer back to be
> > freed.
> 
> I find that last sentence a bit of white lie ;-).
> 
> The sole caller of "detach" is in index-pack, and discards the
> return value, which is not wrong per-se because it still has the
> pointer to the piece of memory it fed to parse_object_buffer(),
> knows and/or assumes that the return value must be the same as the
> one it already has, and it handles the freeing of that memory
> without relying on the object layer at all.
> 
> But that is an even more weird special case than the white-lie
> version.  As an API, "detach" that returns the buffer to be freed
> looks much less weird than what is really going on in the current
> caller.
> 
> I however have to wonder if
> 
> 	if (detach_commit_buffer(commit) != data)
>         	die("BUG");
> 
> might want to be there.

Yeah, it is a tricky site. It knows that parse_object_buffer may attach
the buffer we hand it to "commit->buffer", even though we would prefer
to keep the buffer ourselves. So the existing code really just wants to
say "erase that attachment". And of course I started with:

  void detach_commit_buffer(struct commit *commit)
  {
	commit->buffer = NULL;
  }

Both before and after it's a bit of a layering violation; we know how
the internals of buffer attaching work, and that we can detach to get
our original.

I then expanded it to the strbuf-inspired detach you see, even though
there are no callers who actually care about the return value. That
makes more sense to me as a general API. I don't think it actually makes
the layering violation worse (we are still making the exact same
assumption), but I think it _seems_ worse, because the API now seems
more fully formed. And note that we make the exact same assumptions
abotu "struct tree" a few lines above.

The safety check you mention above makes sense to me. There are two
things I'd _rather_ do, but they end up more complicated:

  1. It would be nice to ask parse_object_buffer not to attach the
     buffer in the first place; then we could get rid of the detach
     function entirely. But that attachment is necessary for all of the
     fsck sub-functions we call. Factoring those to take a separate
     buffer would be rather disruptive.

  2. Instead of confirming that detach returns the same buffer, just
     assume our buffer was eaten (as promised by set_commit_buffer),
     and continue on with whatever detach_commit_buffer returns.
     But it is not _our_ buffer in the first place. It is passed in by
     the caller, so this replacement would have to bubble back up to the
     original allocator.

So just putting in the safety check is probably the least-disruptive
thing. It doesn't automatically adapt to a change in the underlying
commit_buffer code, but it would at least notice it.

-Peff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/17] provide a helper to free commit buffer
  2014-06-12 20:08       ` Jeff King
@ 2014-06-12 22:05         ` Jeff King
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-12 22:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder, Jakub Narebski, Eric Sunshine

On Thu, Jun 12, 2014 at 04:08:55PM -0400, Jeff King wrote:

> So just putting in the safety check is probably the least-disruptive
> thing. It doesn't automatically adapt to a change in the underlying
> commit_buffer code, but it would at least notice it.

Here is that commit with the check you suggested, and a better
explanation in the commit message of what is going on.

I don't know if it's easier for you to just replace patch 8 with this
(there will be a minor textual conflict in the final commit then), or if
you want me to re-post the whole 17-patch series.

Arguably this could be broken into two commits (one to add the simple
"free" and one for this more complicated "detach" case), but that makes
it little more complicated for you to apply. We need some way of
emailing patches coupled with rebase instructions. :)

-- >8 --
Subject: provide a helper to free commit buffer

This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.

Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack.. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it.  But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual).  In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.

Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c       |  3 +--
 builtin/index-pack.c |  3 ++-
 builtin/log.c        |  6 ++----
 builtin/rev-list.c   |  3 +--
 commit.c             | 13 +++++++++++++
 commit.h             | 11 +++++++++++
 6 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index fc150c8..8aadca1 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -310,8 +310,7 @@ static int fsck_obj(struct object *obj)
 	if (obj->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *) obj;
 
-		free(commit->buffer);
-		commit->buffer = NULL;
+		free_commit_buffer(commit);
 
 		if (!commit->parents && show_root)
 			printf("root %s\n", sha1_to_hex(commit->object.sha1));
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 18f57de..3f88b12 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -786,7 +786,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			}
 			if (obj->type == OBJ_COMMIT) {
 				struct commit *commit = (struct commit *) obj;
-				commit->buffer = NULL;
+				if (detach_commit_buffer(commit) != data)
+					die("BUG: parse_object_buffer transmogrified our buffer");
 			}
 			obj->flags |= FLAG_CHECKED;
 		}
diff --git a/builtin/log.c b/builtin/log.c
index 39e8836..226f8f2 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -349,8 +349,7 @@ static int cmd_log_walk(struct rev_info *rev)
 			rev->max_count++;
 		if (!rev->reflog_info) {
 			/* we allow cycles in reflog ancestry */
-			free(commit->buffer);
-			commit->buffer = NULL;
+			free_commit_buffer(commit);
 		}
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
@@ -1508,8 +1507,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		    reopen_stdout(rev.numbered_files ? NULL : commit, NULL, &rev, quiet))
 			die(_("Failed to create output files"));
 		shown = log_tree_commit(&rev, commit);
-		free(commit->buffer);
-		commit->buffer = NULL;
+		free_commit_buffer(commit);
 
 		/* We put one extra blank line between formatted
 		 * patches and this flag is used by log-tree code
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9f92905..e012ebe 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -173,8 +173,7 @@ static void finish_commit(struct commit *commit, void *data)
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
 	}
-	free(commit->buffer);
-	commit->buffer = NULL;
+	free_commit_buffer(commit);
 }
 
 static void finish_object(struct object *obj,
diff --git a/commit.c b/commit.c
index fbdc480..11a05c1 100644
--- a/commit.c
+++ b/commit.c
@@ -245,6 +245,19 @@ int unregister_shallow(const unsigned char *sha1)
 	return 0;
 }
 
+void free_commit_buffer(struct commit *commit)
+{
+	free(commit->buffer);
+	commit->buffer = NULL;
+}
+
+const void *detach_commit_buffer(struct commit *commit)
+{
+	void *ret = commit->buffer;
+	commit->buffer = NULL;
+	return ret;
+}
+
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size)
 {
 	const char *tail = buffer;
diff --git a/commit.h b/commit.h
index 4df48cb..d72ed43 100644
--- a/commit.h
+++ b/commit.h
@@ -51,6 +51,17 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 int parse_commit(struct commit *item);
 void parse_commit_or_die(struct commit *item);
 
+/*
+ * Free any cached object buffer associated with the commit.
+ */
+void free_commit_buffer(struct commit *);
+
+/*
+ * Disassociate any cached object buffer from the commit, but do not free it.
+ * The buffer (or NULL, if none) is returned.
+ */
+const void *detach_commit_buffer(struct commit *);
+
 /* Find beginning and length of commit subject. */
 int find_commit_subject(const char *commit_buffer, const char **subject);
 
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2] reuse cached commit buffer when parsing signatures
  2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
                     ` (17 preceding siblings ...)
  2014-06-10 21:46   ` [PATCH v2 0/17] store length of commit->buffer Jeff King
@ 2014-06-13  6:32   ` Jeff King
  18 siblings, 0 replies; 60+ messages in thread
From: Jeff King @ 2014-06-13  6:32 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

On Tue, Jun 10, 2014 at 05:35:09PM -0400, Jeff King wrote:

> Here's a re-roll of the commit-slab series.

And here is a re-roll of the --show-signature speedup on top (i.e., the
point of the series in the first place), adjusted to use
get_commit_buffer.

-- >8 --
Subject: reuse cached commit buffer when parsing signatures

When we call show_signature or show_mergetag, we read the
commit object fresh via read_sha1_file and reparse its
headers. However, in most cases we already have the object
data available, attached to the "struct commit". This is
partially laziness in dealing with the memory allocation
issues, but partially defensive programming, in that we
would always want to verify a clean version of the buffer
(not one that might have been munged by other users of the
commit).

However, we do not currently ever munge the commit buffer,
and not using the already-available buffer carries a fairly
big performance penalty when we are looking at a large
number of commits. Here are timings on linux.git:

  [baseline, no signatures]
  $ time git log >/dev/null
  real    0m4.902s
  user    0m4.784s
  sys     0m0.120s

  [before]
  $ time git log --show-signature >/dev/null
  real    0m14.735s
  user    0m9.964s
  sys     0m0.944s

  [after]
  $ time git log --show-signature >/dev/null
  real    0m9.981s
  user    0m5.260s
  sys     0m0.936s

Note that our user CPU time drops almost in half, close to
the non-signature case, but we do still spend more
wall-clock and system time, presumably from dealing with
gpg.

An alternative to this is to note that most commits do not
have signatures (less than 1% in this repo), yet we pay the
re-parsing cost for every commit just to find out if it has
a mergetag or signature. If we checked that when parsing the
commit initially, we could avoid re-examining most commits
later on. Even if we did pursue that direction, however,
this would still speed up the cases where we _do_ have
signatures. So it's probably worth doing either way.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c   | 27 ++++++++++-----------------
 commit.h   |  2 +-
 log-tree.c |  2 +-
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/commit.c b/commit.c
index a036e18..ebd7ad8 100644
--- a/commit.c
+++ b/commit.c
@@ -1138,17 +1138,14 @@ static int do_sign_commit(struct strbuf *buf, const char *keyid)
 	return 0;
 }
 
-int parse_signed_commit(const unsigned char *sha1,
+int parse_signed_commit(const struct commit *commit,
 			struct strbuf *payload, struct strbuf *signature)
 {
+
 	unsigned long size;
-	enum object_type type;
-	char *buffer = read_sha1_file(sha1, &type, &size);
+	const char *buffer = get_commit_buffer(commit, &size);
 	int in_signature, saw_signature = -1;
-	char *line, *tail;
-
-	if (!buffer || type != OBJ_COMMIT)
-		goto cleanup;
+	const char *line, *tail;
 
 	line = buffer;
 	tail = buffer + size;
@@ -1156,7 +1153,7 @@ int parse_signed_commit(const unsigned char *sha1,
 	saw_signature = 0;
 	while (line < tail) {
 		const char *sig = NULL;
-		char *next = memchr(line, '\n', tail - line);
+		const char *next = memchr(line, '\n', tail - line);
 
 		next = next ? next + 1 : tail;
 		if (in_signature && line[0] == ' ')
@@ -1177,8 +1174,7 @@ int parse_signed_commit(const unsigned char *sha1,
 		}
 		line = next;
 	}
- cleanup:
-	free(buffer);
+	unuse_commit_buffer(commit, buffer);
 	return saw_signature;
 }
 
@@ -1269,8 +1265,7 @@ void check_commit_signature(const struct commit* commit, struct signature_check
 
 	sigc->result = 'N';
 
-	if (parse_signed_commit(commit->object.sha1,
-				&payload, &signature) <= 0)
+	if (parse_signed_commit(commit, &payload, &signature) <= 0)
 		goto out;
 	status = verify_signed_buffer(payload.buf, payload.len,
 				      signature.buf, signature.len,
@@ -1315,11 +1310,9 @@ struct commit_extra_header *read_commit_extra_headers(struct commit *commit,
 {
 	struct commit_extra_header *extra = NULL;
 	unsigned long size;
-	enum object_type type;
-	char *buffer = read_sha1_file(commit->object.sha1, &type, &size);
-	if (buffer && type == OBJ_COMMIT)
-		extra = read_commit_extra_header_lines(buffer, size, exclude);
-	free(buffer);
+	const char *buffer = get_commit_buffer(commit, &size);
+	extra = read_commit_extra_header_lines(buffer, size, exclude);
+	unuse_commit_buffer(commit, buffer);
 	return extra;
 }
 
diff --git a/commit.h b/commit.h
index 61559a9..2e1492a 100644
--- a/commit.h
+++ b/commit.h
@@ -325,7 +325,7 @@ struct merge_remote_desc {
  */
 struct commit *get_merge_parent(const char *name);
 
-extern int parse_signed_commit(const unsigned char *sha1,
+extern int parse_signed_commit(const struct commit *commit,
 			       struct strbuf *message, struct strbuf *signature);
 extern void print_commit_list(struct commit_list *list,
 			      const char *format_cur,
diff --git a/log-tree.c b/log-tree.c
index 4447021..10e6844 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -376,7 +376,7 @@ static void show_signature(struct rev_info *opt, struct commit *commit)
 	struct strbuf gpg_output = STRBUF_INIT;
 	int status;
 
-	if (parse_signed_commit(commit->object.sha1, &payload, &signature) <= 0)
+	if (parse_signed_commit(commit, &payload, &signature) <= 0)
 		goto out;
 
 	status = verify_signed_buffer(payload.buf, payload.len,
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2014-06-13  6:32 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
2014-06-09 18:09 ` [PATCH 01/15] alloc: include any-object allocations in alloc_report Jeff King
2014-06-09 18:09 ` [PATCH 02/15] commit: push commit_index update into alloc_commit_node Jeff King
2014-06-10 21:31   ` Junio C Hamano
2014-06-09 18:10 ` [PATCH 03/15] do not create "struct commit" with xcalloc Jeff King
2014-06-10 21:35   ` Junio C Hamano
2014-06-10 21:49     ` Jeff King
2014-06-09 18:10 ` [PATCH 04/15] logmsg_reencode: return const buffer Jeff King
2014-06-10  1:36   ` Eric Sunshine
2014-06-09 18:10 ` [PATCH 05/15] sequencer: use logmsg_reencode in get_message Jeff King
2014-06-10 21:40   ` Junio C Hamano
2014-06-10 21:50     ` Jeff King
2014-06-09 18:11 ` [PATCH 06/15] provide a helper to free commit buffer Jeff King
2014-06-09 18:11 ` [PATCH 07/15] provide a helper to set the " Jeff King
2014-06-09 18:11 ` [PATCH 08/15] provide helpers to access " Jeff King
2014-06-10  8:00   ` Eric Sunshine
2014-06-09 18:12 ` [PATCH 09/15] use get_cached_commit_buffer where appropriate Jeff King
2014-06-09 18:12 ` [PATCH 10/15] use get_commit_buffer to avoid duplicate code Jeff King
2014-06-10  8:01   ` Eric Sunshine
2014-06-10 20:34     ` Jeff King
2014-06-09 18:13 ` [PATCH 11/15] convert logmsg_reencode to get_commit_buffer Jeff King
2014-06-09 18:13 ` [PATCH 12/15] use get_commit_buffer everywhere Jeff King
2014-06-09 22:40   ` Junio C Hamano
2014-06-10  0:02     ` Jeff King
2014-06-10  0:22       ` Jeff King
2014-06-10 16:06       ` Junio C Hamano
2014-06-10 21:27         ` Jeff King
2014-06-09 18:15 ` [PATCH 13/15] commit-slab: provide a static initializer Jeff King
2014-06-09 18:15 ` [PATCH 14/15] commit: convert commit->buffer to a slab Jeff King
2014-06-09 18:15 ` [PATCH 15/15] commit: record buffer length in cache Jeff King
2014-06-10  5:12   ` Christian Couder
2014-06-10  5:27     ` Jeff King
2014-06-10 20:33       ` Jeff King
2014-06-10 21:30         ` Christian Couder
2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
2014-06-10 21:36   ` [PATCH 01/17] commit_tree: take a pointer/len pair rather than a const strbuf Jeff King
2014-06-10 21:38   ` [PATCH 02/17] replace dangerous uses of strbuf_attach Jeff King
2014-06-10 21:38   ` [PATCH 03/17] alloc: include any-object allocations in alloc_report Jeff King
2014-06-10 21:39   ` [PATCH 04/17] commit: push commit_index update into alloc_commit_node Jeff King
2014-06-10 21:39   ` [PATCH 05/17] do not create "struct commit" with xcalloc Jeff King
2014-06-10 21:39   ` [PATCH 06/17] logmsg_reencode: return const buffer Jeff King
2014-06-10 21:39   ` [PATCH 07/17] sequencer: use logmsg_reencode in get_message Jeff King
2014-06-10 21:40   ` [PATCH 08/17] provide a helper to free commit buffer Jeff King
2014-06-12 18:22     ` Junio C Hamano
2014-06-12 20:08       ` Jeff King
2014-06-12 22:05         ` Jeff King
2014-06-10 21:40   ` [PATCH 09/17] provide a helper to set the " Jeff King
2014-06-10 21:40   ` [PATCH 10/17] provide helpers to access " Jeff King
2014-06-10 21:40   ` [PATCH 11/17] use get_cached_commit_buffer where appropriate Jeff King
2014-06-10 21:41   ` [PATCH 12/17] use get_commit_buffer to avoid duplicate code Jeff King
2014-06-10 21:41   ` [PATCH 13/17] convert logmsg_reencode to get_commit_buffer Jeff King
2014-06-10 21:41   ` [PATCH 14/17] use get_commit_buffer everywhere Jeff King
2014-06-10 21:42   ` [PATCH 15/17] commit-slab: provide a static initializer Jeff King
2014-06-12 18:15     ` Junio C Hamano
2014-06-12 19:51       ` Jeff King
2014-06-10 21:43   ` [PATCH 16/17] commit: convert commit->buffer to a slab Jeff King
2014-06-10 21:44   ` [PATCH 17/17] commit: record buffer length in cache Jeff King
2014-06-10 21:46   ` [PATCH v2 0/17] store length of commit->buffer Jeff King
2014-06-12 17:22     ` Junio C Hamano
2014-06-13  6:32   ` [PATCH v2] reuse cached commit buffer when parsing signatures Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.