All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] cleaning up determine_author_info
@ 2014-06-18 20:19 Jeff King
  2014-06-18 20:27 ` [PATCH 1/7] commit: provide a function to find a header in a buffer Jeff King
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Jeff King @ 2014-06-18 20:19 UTC (permalink / raw)
  To: git

This is another function I ran across in today's cleanups. The memory
leak in it has bugged me for a while (even though it's really not a big
deal in practice). So this is mostly minor cleanups, but I did find a
bug in the commit parser.

  [1/7]: commit: provide a function to find a header in a buffer
  [2/7]: record_author_info: fix memory leak on malformed commit
  [3/7]: record_author_info: use find_commit_header
  [4/7]: ident_split: store begin/end pairs on their own struct
  [5/7]: use strbufs in date functions
  [6/7]: determine_author_info: reuse parsing functions
  [7/7]: determine_author_info: stop leaking name/email

I built it on top of my commit-slab topic, as otherwise you get some
textual conflicts in determine_author_info. But I notice that Junio's
jk/commit-buffer-length is based on an older master; applying there
produces some other minor textual conflicts. I can build it on whatever
is convenient and handle the conflicts myself. But if
jk/commit-buffer-length is set to graduate soon (as it is marked in WC),
I can just hold onto this until then.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/7] commit: provide a function to find a header in a buffer
  2014-06-18 20:19 [PATCH 0/7] cleaning up determine_author_info Jeff King
@ 2014-06-18 20:27 ` Jeff King
  2014-06-23  1:26   ` Eric Sunshine
  2014-06-18 20:28 ` [PATCH 2/7] record_author_info: fix memory leak on malformed commit Jeff King
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2014-06-18 20:27 UTC (permalink / raw)
  To: git

Usually when we parse a commit, we read it line by line and
handle each header in a single pass (e.g., in parse_commit
and parse_commit_header).  Sometimes, however, we only care
about extracting a single header. Code in this situation is
stuck doing an ad-hoc parse of the commit buffer.

Let's provide a reusable function to locate a header within
the commit.  The code is modeled after pretty.c's
get_header, which is used to extract the encoding.

Since some callers may not have the "struct commit" to go
along with the buffer, we drop that parameter.  The only
thing lost is a warning for truncated commits, but that's
OK.  This shouldn't happen in practice, and even if it does,
there's no particular reason that this function needs to
complain about it. It either finds the header it was asked
for, or it doesn't (and in the latter case, the caller can
complain).

Signed-off-by: Jeff King <peff@peff.net>
---
As noted in the comments below, I punted on extracting
multi-line headers like mergetag, because this function only
returns a pointer. It might make sense to wrap it with a
function to pull out a copy of the header, with continuation
lines connected to each other (that's almost what the static
get_header does, but I didn't make it public exactly because
it doesn't handle continuation lines).

That might be a more natural interface than
read_commit_extra_header_lines, which pulls out all headers
except ones that are specifically excluded. I haven't looked
closely enough. But in either case, that could easily come
on top of this.

 commit.c | 23 +++++++++++++++++++++++
 commit.h | 11 +++++++++++
 pretty.c | 33 ++++++---------------------------
 3 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/commit.c b/commit.c
index 11106fb..d04b525 100644
--- a/commit.c
+++ b/commit.c
@@ -1652,3 +1652,26 @@ void print_commit_list(struct commit_list *list,
 		printf(format, sha1_to_hex(list->item->object.sha1));
 	}
 }
+
+const char *find_commit_header(const char *msg, const char *key, size_t *out_len)
+{
+	int key_len = strlen(key);
+	const char *line = msg;
+
+	while (line) {
+		const char *eol = strchrnul(line, '\n'), *next;
+
+		if (line == eol)
+			return NULL;
+		next = *eol ? eol + 1 : NULL;
+
+		if (eol - line > key_len &&
+		    !strncmp(line, key, key_len) &&
+		    line[key_len] == ' ') {
+			*out_len = eol - line - key_len - 1;
+			return line + key_len + 1;
+		}
+		line = next;
+	}
+	return NULL;
+}
diff --git a/commit.h b/commit.h
index 61559a9..7c766e9 100644
--- a/commit.h
+++ b/commit.h
@@ -312,6 +312,17 @@ extern struct commit_extra_header *read_commit_extra_headers(struct commit *, co
 
 extern void free_commit_extra_headers(struct commit_extra_header *extra);
 
+/*
+ * Search the commit object contents given by "msg" for the header "key".
+ * Returns a pointer to the start of the header contents, or NULL. The length
+ * of the header, up to the first newline, is returned via out_len.
+ *
+ * Note that some headers (like mergetag) may be multi-line. It is the caller's
+ * responsibility to parse further in this case!
+ */
+extern const char *find_commit_header(const char *msg, const char *key,
+				      size_t *out_len);
+
 struct merge_remote_desc {
 	struct object *obj; /* the named object, could be a tag */
 	const char *name;
diff --git a/pretty.c b/pretty.c
index cc5b45d..6081750 100644
--- a/pretty.c
+++ b/pretty.c
@@ -548,31 +548,11 @@ static void add_merge_info(const struct pretty_print_context *pp,
 	strbuf_addch(sb, '\n');
 }
 
-static char *get_header(const struct commit *commit, const char *msg,
-			const char *key)
+static char *get_header(const char *msg, const char *key)
 {
-	int key_len = strlen(key);
-	const char *line = msg;
-
-	while (line) {
-		const char *eol = strchrnul(line, '\n'), *next;
-
-		if (line == eol)
-			return NULL;
-		if (!*eol) {
-			warning("malformed commit (header is missing newline): %s",
-				sha1_to_hex(commit->object.sha1));
-			next = NULL;
-		} else
-			next = eol + 1;
-		if (eol - line > key_len &&
-		    !strncmp(line, key, key_len) &&
-		    line[key_len] == ' ') {
-			return xmemdupz(line + key_len + 1, eol - line - key_len - 1);
-		}
-		line = next;
-	}
-	return NULL;
+	size_t len;
+	const char *v = find_commit_header(msg, key, &len);
+	return v ? xmemdupz(v, len) : NULL;
 }
 
 static char *replace_encoding_header(char *buf, const char *encoding)
@@ -618,11 +598,10 @@ const char *logmsg_reencode(const struct commit *commit,
 
 	if (!output_encoding || !*output_encoding) {
 		if (commit_encoding)
-			*commit_encoding =
-				get_header(commit, msg, "encoding");
+			*commit_encoding = get_header(msg, "encoding");
 		return msg;
 	}
-	encoding = get_header(commit, msg, "encoding");
+	encoding = get_header(msg, "encoding");
 	if (commit_encoding)
 		*commit_encoding = encoding;
 	use_encoding = encoding ? encoding : utf8;
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/7] record_author_info: fix memory leak on malformed commit
  2014-06-18 20:19 [PATCH 0/7] cleaning up determine_author_info Jeff King
  2014-06-18 20:27 ` [PATCH 1/7] commit: provide a function to find a header in a buffer Jeff King
@ 2014-06-18 20:28 ` Jeff King
  2014-06-18 20:29 ` [PATCH 3/7] record_author_info: use find_commit_header Jeff King
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2014-06-18 20:28 UTC (permalink / raw)
  To: git

If we hit the end-of-header without finding an "author"
line, we just return from the function. We should jump to
the fail_exit path to clean up the buffer that we may have
allocated.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index d04b525..0c40cfa 100644
--- a/commit.c
+++ b/commit.c
@@ -617,7 +617,7 @@ static void record_author_date(struct author_date_slab *author_date,
 		ident_line = skip_prefix(buf, "author ");
 		if (!ident_line) {
 			if (!line_end[0] || line_end[1] == '\n')
-				return; /* end of header */
+				goto fail_exit; /* end of header */
 			continue;
 		}
 		if (split_ident_line(&ident,
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/7] record_author_info: use find_commit_header
  2014-06-18 20:19 [PATCH 0/7] cleaning up determine_author_info Jeff King
  2014-06-18 20:27 ` [PATCH 1/7] commit: provide a function to find a header in a buffer Jeff King
  2014-06-18 20:28 ` [PATCH 2/7] record_author_info: fix memory leak on malformed commit Jeff King
@ 2014-06-18 20:29 ` Jeff King
  2014-06-18 20:31 ` [PATCH 4/7] ident_split: store begin/end pairs on their own struct Jeff King
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2014-06-18 20:29 UTC (permalink / raw)
  To: git

This saves us some manual parsing and makes the code more
readable.

Signed-off-by: Jeff King <peff@peff.net>
---
I suspect there are other sites which could use this helper, too; I
didn't do an exhaustive search.

 commit.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/commit.c b/commit.c
index 0c40cfa..c33431c 100644
--- a/commit.c
+++ b/commit.c
@@ -606,26 +606,19 @@ define_commit_slab(author_date_slab, unsigned long);
 static void record_author_date(struct author_date_slab *author_date,
 			       struct commit *commit)
 {
-	const char *buf, *line_end, *ident_line;
 	const char *buffer = get_commit_buffer(commit, NULL);
 	struct ident_split ident;
+	const char *ident_line;
+	size_t ident_len;
 	char *date_end;
 	unsigned long date;
 
-	for (buf = buffer; buf; buf = line_end + 1) {
-		line_end = strchrnul(buf, '\n');
-		ident_line = skip_prefix(buf, "author ");
-		if (!ident_line) {
-			if (!line_end[0] || line_end[1] == '\n')
-				goto fail_exit; /* end of header */
-			continue;
-		}
-		if (split_ident_line(&ident,
-				     ident_line, line_end - ident_line) ||
-		    !ident.date_begin || !ident.date_end)
-			goto fail_exit; /* malformed "author" line */
-		break;
-	}
+	ident_line = find_commit_header(buffer, "author", &ident_len);
+	if (!ident_line)
+		goto fail_exit; /* no author line */
+	if (split_ident_line(&ident, ident_line, ident_len) ||
+	    !ident.date_begin || !ident.date_end)
+		goto fail_exit; /* malformed "author" line */
 
 	date = strtoul(ident.date_begin, &date_end, 10);
 	if (date_end != ident.date_end)
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/7] ident_split: store begin/end pairs on their own struct
  2014-06-18 20:19 [PATCH 0/7] cleaning up determine_author_info Jeff King
                   ` (2 preceding siblings ...)
  2014-06-18 20:29 ` [PATCH 3/7] record_author_info: use find_commit_header Jeff King
@ 2014-06-18 20:31 ` Jeff King
  2014-06-23  1:28   ` Eric Sunshine
  2014-06-18 20:32 ` [PATCH 5/7] use strbufs in date functions Jeff King
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2014-06-18 20:31 UTC (permalink / raw)
  To: git

When we parse an ident line, we end up with several fields,
each with a begin/end pointer into the buffer, like:

  const char *name_begin;
  const char *name_end;

There is nothing except the field names to indicate that
they are paired. This makes it annoying to write helper
functions for dealing with the sub-fields, as you have to
pass both sides. Instead, let's move them into a single
struct "name", with fields "begin" and "end". This will be
stored identically, but can be passed as a unit.

We have to do a mechanical update of "s/_/./" at each point
of use, but other than that, the fields should behave
identically.

Signed-off-by: Jeff King <peff@peff.net>
---
Suggestions welcome on the name "pointer_pair".

While writing this series, I also noticed that it would be more
convenient to have a pointer/len combination rather than two pointers.
You can convert between them, of course, but I found I was always
converting the other way.

I left it this way because it makes the mass-update mechanical (and
because now that I can pass the pair as a unit, I don't have to write
the same "ident->name_begin, ident->name_end - ident->name_begin" pair
over and over).

 builtin/blame.c         | 16 +++++++-------
 builtin/check-mailmap.c |  8 +++----
 builtin/commit.c        | 26 +++++++++++-----------
 builtin/shortlog.c      |  8 +++----
 cache.h                 | 17 ++++++++-------
 commit.c                |  6 +++---
 ident.c                 | 57 +++++++++++++++++++++++--------------------------
 log-tree.c              |  2 +-
 pretty.c                | 36 +++++++++++++++----------------
 revision.c              | 12 +++++------
 10 files changed, 93 insertions(+), 95 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 53f43ab..6e6dddb 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1599,19 +1599,19 @@ static void get_ac_line(const char *inbuf, const char *what,
 		return;
 	}
 
-	namelen = ident.name_end - ident.name_begin;
-	namebuf = ident.name_begin;
+	namelen = ident.name.end - ident.name.begin;
+	namebuf = ident.name.begin;
 
-	maillen = ident.mail_end - ident.mail_begin;
-	mailbuf = ident.mail_begin;
+	maillen = ident.mail.end - ident.mail.begin;
+	mailbuf = ident.mail.begin;
 
-	if (ident.date_begin && ident.date_end)
-		*time = strtoul(ident.date_begin, NULL, 10);
+	if (ident.date.begin && ident.date.end)
+		*time = strtoul(ident.date.begin, NULL, 10);
 	else
 		*time = 0;
 
-	if (ident.tz_begin && ident.tz_end)
-		strbuf_add(tz, ident.tz_begin, ident.tz_end - ident.tz_begin);
+	if (ident.tz.begin && ident.tz.end)
+		strbuf_add(tz, ident.tz.begin, ident.tz.end - ident.tz.begin);
 	else
 		strbuf_addstr(tz, "(unknown)");
 
diff --git a/builtin/check-mailmap.c b/builtin/check-mailmap.c
index 8f4d809..65dcbc6 100644
--- a/builtin/check-mailmap.c
+++ b/builtin/check-mailmap.c
@@ -23,10 +23,10 @@ static void check_mailmap(struct string_list *mailmap, const char *contact)
 	if (split_ident_line(&ident, contact, strlen(contact)))
 		die(_("unable to parse contact: %s"), contact);
 
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
+	name = ident.name.begin;
+	namelen = ident.name.end - ident.name.begin;
+	mail = ident.mail.begin;
+	maillen = ident.mail.end - ident.mail.begin;
 
 	map_user(mailmap, &mail, &maillen, &name, &namelen);
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 84cec9a..047cc76 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -514,14 +514,14 @@ static void export_one(const char *var, const char *s, const char *e, int hack)
 
 static int sane_ident_split(struct ident_split *person)
 {
-	if (!person->name_begin || !person->name_end ||
-	    person->name_begin == person->name_end)
+	if (!person->name.begin || !person->name.end ||
+	    person->name.begin == person->name.end)
 		return 0; /* no human readable name */
-	if (!person->mail_begin || !person->mail_end ||
-	    person->mail_begin == person->mail_end)
+	if (!person->mail.begin || !person->mail.end ||
+	    person->mail.begin == person->mail.end)
 		return 0; /* no usable mail */
-	if (!person->date_begin || !person->date_end ||
-	    !person->tz_begin || !person->tz_end)
+	if (!person->date.begin || !person->date.end ||
+	    !person->tz.begin || !person->tz.end)
 		return 0;
 	return 1;
 }
@@ -602,9 +602,9 @@ static void determine_author_info(struct strbuf *author_ident)
 	strbuf_addstr(author_ident, fmt_ident(name, email, date, IDENT_STRICT));
 	if (!split_ident_line(&author, author_ident->buf, author_ident->len) &&
 	    sane_ident_split(&author)) {
-		export_one("GIT_AUTHOR_NAME", author.name_begin, author.name_end, 0);
-		export_one("GIT_AUTHOR_EMAIL", author.mail_begin, author.mail_end, 0);
-		export_one("GIT_AUTHOR_DATE", author.date_begin, author.tz_end, '@');
+		export_one("GIT_AUTHOR_NAME", author.name.begin, author.name.end, 0);
+		export_one("GIT_AUTHOR_EMAIL", author.mail.begin, author.mail.end, 0);
+		export_one("GIT_AUTHOR_DATE", author.date.begin, author.tz.end, '@');
 	}
 }
 
@@ -861,8 +861,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 				_("%s"
 				"Author:    %.*s <%.*s>"),
 				ident_shown++ ? "" : "\n",
-				(int)(ai.name_end - ai.name_begin), ai.name_begin,
-				(int)(ai.mail_end - ai.mail_begin), ai.mail_begin);
+				(int)(ai.name.end - ai.name.begin), ai.name.begin,
+				(int)(ai.mail.end - ai.mail.begin), ai.mail.begin);
 
 		if (author_date_is_interesting())
 			status_printf_ln(s, GIT_COLOR_NORMAL,
@@ -876,8 +876,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 				_("%s"
 				"Committer: %.*s <%.*s>"),
 				ident_shown++ ? "" : "\n",
-				(int)(ci.name_end - ci.name_begin), ci.name_begin,
-				(int)(ci.mail_end - ci.mail_begin), ci.mail_begin);
+				(int)(ci.name.end - ci.name.begin), ci.name.begin,
+				(int)(ci.mail.end - ci.mail.begin), ci.mail.begin);
 
 		if (ident_shown)
 			status_printf_ln(s, GIT_COLOR_NORMAL, "%s", "");
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 4b7e536..e4ddfea 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -44,10 +44,10 @@ static void insert_one_record(struct shortlog *log,
 	if (split_ident_line(&ident, author, strlen(author)))
 		return;
 
-	namebuf = ident.name_begin;
-	mailbuf = ident.mail_begin;
-	namelen = ident.name_end - ident.name_begin;
-	maillen = ident.mail_end - ident.mail_begin;
+	namebuf = ident.name.begin;
+	mailbuf = ident.mail.begin;
+	namelen = ident.name.end - ident.name.begin;
+	maillen = ident.mail.end - ident.mail.begin;
 
 	map_user(&log->mailmap, &mailbuf, &maillen, &namebuf, &namelen);
 	strbuf_add(&namemailbuf, namebuf, namelen);
diff --git a/cache.h b/cache.h
index cbe1935..5255661 100644
--- a/cache.h
+++ b/cache.h
@@ -1045,15 +1045,16 @@ extern const char *git_editor(void);
 extern const char *git_pager(int stdout_is_tty);
 extern int git_ident_config(const char *, const char *, void *);
 
+struct pointer_pair {
+	const char *begin;
+	const char *end;
+};
+
 struct ident_split {
-	const char *name_begin;
-	const char *name_end;
-	const char *mail_begin;
-	const char *mail_end;
-	const char *date_begin;
-	const char *date_end;
-	const char *tz_begin;
-	const char *tz_end;
+	struct pointer_pair name;
+	struct pointer_pair mail;
+	struct pointer_pair date;
+	struct pointer_pair tz;
 };
 /*
  * Signals an success with 0, but time part of the result may be NULL
diff --git a/commit.c b/commit.c
index c33431c..dcf8dce 100644
--- a/commit.c
+++ b/commit.c
@@ -617,11 +617,11 @@ static void record_author_date(struct author_date_slab *author_date,
 	if (!ident_line)
 		goto fail_exit; /* no author line */
 	if (split_ident_line(&ident, ident_line, ident_len) ||
-	    !ident.date_begin || !ident.date_end)
+	    !ident.date.begin || !ident.date.end)
 		goto fail_exit; /* malformed "author" line */
 
-	date = strtoul(ident.date_begin, &date_end, 10);
-	if (date_end != ident.date_end)
+	date = strtoul(ident.date.begin, &date_end, 10);
+	if (date_end != ident.date.end)
 		goto fail_exit; /* malformed date */
 	*(author_date_slab_at(author_date, commit)) = date;
 
diff --git a/ident.c b/ident.c
index 1d9b6e7..570fda7 100644
--- a/ident.c
+++ b/ident.c
@@ -206,42 +206,42 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 
 	memset(split, 0, sizeof(*split));
 
-	split->name_begin = line;
+	split->name.begin = line;
 	for (cp = line; *cp && cp < line + len; cp++)
 		if (*cp == '<') {
-			split->mail_begin = cp + 1;
+			split->mail.begin = cp + 1;
 			break;
 		}
-	if (!split->mail_begin)
+	if (!split->mail.begin)
 		return status;
 
-	for (cp = split->mail_begin - 2; line <= cp; cp--)
+	for (cp = split->mail.begin - 2; line <= cp; cp--)
 		if (!isspace(*cp)) {
-			split->name_end = cp + 1;
+			split->name.end = cp + 1;
 			break;
 		}
-	if (!split->name_end) {
+	if (!split->name.end) {
 		/* no human readable name */
-		split->name_end = split->name_begin;
+		split->name.end = split->name.begin;
 	}
 
-	for (cp = split->mail_begin; cp < line + len; cp++)
+	for (cp = split->mail.begin; cp < line + len; cp++)
 		if (*cp == '>') {
-			split->mail_end = cp;
+			split->mail.end = cp;
 			break;
 		}
-	if (!split->mail_end)
+	if (!split->mail.end)
 		return status;
 
 	/*
 	 * Look from the end-of-line to find the trailing ">" of the mail
-	 * address, even though we should already know it as split->mail_end.
+	 * address, even though we should already know it as split->mail.end.
 	 * This can help in cases of broken idents with an extra ">" somewhere
 	 * in the email address.  Note that we are assuming the timestamp will
 	 * never have a ">" in it.
 	 *
 	 * Note that we will always find some ">" before going off the front of
-	 * the string, because will always hit the split->mail_end closing
+	 * the string, because will always hit the split->mail.end closing
 	 * bracket.
 	 */
 	for (cp = line + len - 1; *cp != '>'; cp--)
@@ -251,27 +251,27 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 		;
 	if (line + len <= cp)
 		goto person_only;
-	split->date_begin = cp;
+	split->date.begin = cp;
 	span = strspn(cp, "0123456789");
 	if (!span)
 		goto person_only;
-	split->date_end = split->date_begin + span;
-	for (cp = split->date_end; cp < line + len && isspace(*cp); cp++)
+	split->date.end = split->date.begin + span;
+	for (cp = split->date.end; cp < line + len && isspace(*cp); cp++)
 		;
 	if (line + len <= cp || (*cp != '+' && *cp != '-'))
 		goto person_only;
-	split->tz_begin = cp;
+	split->tz.begin = cp;
 	span = strspn(cp + 1, "0123456789");
 	if (!span)
 		goto person_only;
-	split->tz_end = split->tz_begin + 1 + span;
+	split->tz.end = split->tz.begin + 1 + span;
 	return 0;
 
 person_only:
-	split->date_begin = NULL;
-	split->date_end = NULL;
-	split->tz_begin = NULL;
-	split->tz_end = NULL;
+	split->date.begin = NULL;
+	split->date.end = NULL;
+	split->tz.begin = NULL;
+	split->tz.end = NULL;
 	return 0;
 }
 
@@ -417,15 +417,14 @@ int git_ident_config(const char *var, const char *value, void *data)
 	return 0;
 }
 
-static int buf_cmp(const char *a_begin, const char *a_end,
-		   const char *b_begin, const char *b_end)
+static int buf_cmp(const struct pointer_pair *a, const struct pointer_pair *b)
 {
-	int a_len = a_end - a_begin;
-	int b_len = b_end - b_begin;
+	int a_len = a->end - a->begin;
+	int b_len = b->end - b->begin;
 	int min = a_len < b_len ? a_len : b_len;
 	int cmp;
 
-	cmp = memcmp(a_begin, b_begin, min);
+	cmp = memcmp(a->begin, b->begin, min);
 	if (cmp)
 		return cmp;
 
@@ -437,11 +436,9 @@ int ident_cmp(const struct ident_split *a,
 {
 	int cmp;
 
-	cmp = buf_cmp(a->mail_begin, a->mail_end,
-		      b->mail_begin, b->mail_end);
+	cmp = buf_cmp(&a->mail, &b->mail);
 	if (cmp)
 		return cmp;
 
-	return buf_cmp(a->name_begin, a->name_end,
-		       b->name_begin, b->name_end);
+	return buf_cmp(&a->name, &b->name);
 }
diff --git a/log-tree.c b/log-tree.c
index 4447021..385f07f 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -620,7 +620,7 @@ void show_log(struct rev_info *opt)
 	ctx.mailmap = opt->mailmap;
 	ctx.color = opt->diffopt.use_color;
 	ctx.output_encoding = get_log_output_encoding();
-	if (opt->from_ident.mail_begin && opt->from_ident.name_begin)
+	if (opt->from_ident.mail.begin && opt->from_ident.name.begin)
 		ctx.from_ident = &opt->from_ident;
 	pretty_print_commit(&ctx, commit, &msgbuf);
 
diff --git a/pretty.c b/pretty.c
index 6081750..83ef5b7 100644
--- a/pretty.c
+++ b/pretty.c
@@ -399,13 +399,13 @@ const char *show_ident_date(const struct ident_split *ident,
 	unsigned long date = 0;
 	long tz = 0;
 
-	if (ident->date_begin && ident->date_end)
-		date = strtoul(ident->date_begin, NULL, 10);
+	if (ident->date.begin && ident->date.end)
+		date = strtoul(ident->date.begin, NULL, 10);
 	if (date_overflows(date))
 		date = 0;
 	else {
-		if (ident->tz_begin && ident->tz_end)
-			tz = strtol(ident->tz_begin, NULL, 10);
+		if (ident->tz.begin && ident->tz.end)
+			tz = strtol(ident->tz.begin, NULL, 10);
 		if (tz >= INT_MAX || tz <= INT_MIN)
 			tz = 0;
 	}
@@ -429,10 +429,10 @@ void pp_user_info(struct pretty_print_context *pp,
 	if (split_ident_line(&ident, line, line_end - line))
 		return;
 
-	mailbuf = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
-	namebuf = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
+	mailbuf = ident.mail.begin;
+	maillen = ident.mail.end - ident.mail.begin;
+	namebuf = ident.name.begin;
+	namelen = ident.name.end - ident.name.begin;
 
 	if (pp->mailmap)
 		map_user(pp->mailmap, &mailbuf, &maillen, &namebuf, &namelen);
@@ -449,10 +449,10 @@ void pp_user_info(struct pretty_print_context *pp,
 			string_list_append(&pp->in_body_headers,
 					   strbuf_detach(&buf, NULL));
 
-			mailbuf = pp->from_ident->mail_begin;
-			maillen = pp->from_ident->mail_end - mailbuf;
-			namebuf = pp->from_ident->name_begin;
-			namelen = pp->from_ident->name_end - namebuf;
+			mailbuf = pp->from_ident->mail.begin;
+			maillen = pp->from_ident->mail.end - mailbuf;
+			namebuf = pp->from_ident->name.begin;
+			namelen = pp->from_ident->name.end - namebuf;
 		}
 
 		strbuf_addstr(sb, "From: ");
@@ -677,10 +677,10 @@ static size_t format_person_part(struct strbuf *sb, char part,
 	if (split_ident_line(&s, msg, len) < 0)
 		goto skip;
 
-	name = s.name_begin;
-	namelen = s.name_end - s.name_begin;
-	mail = s.mail_begin;
-	maillen = s.mail_end - s.mail_begin;
+	name = s.name.begin;
+	namelen = s.name.end - s.name.begin;
+	mail = s.mail.begin;
+	maillen = s.mail.end - s.mail.begin;
 
 	if (part == 'N' || part == 'E') /* mailmap lookup */
 		mailmap_name(&mail, &maillen, &name, &namelen);
@@ -693,11 +693,11 @@ static size_t format_person_part(struct strbuf *sb, char part,
 		return placeholder_len;
 	}
 
-	if (!s.date_begin)
+	if (!s.date.begin)
 		goto skip;
 
 	if (part == 't') {	/* date, UNIX timestamp */
-		strbuf_add(sb, s.date_begin, s.date_end - s.date_begin);
+		strbuf_add(sb, s.date.begin, s.date.end - s.date.begin);
 		return placeholder_len;
 	}
 
diff --git a/revision.c b/revision.c
index 9d628c6..c5fbaf8 100644
--- a/revision.c
+++ b/revision.c
@@ -2762,10 +2762,10 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 	if (split_ident_line(&ident, person, len))
 		return 0;
 
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
+	mail = ident.mail.begin;
+	maillen = ident.mail.end - ident.mail.begin;
+	name = ident.name.begin;
+	namelen = ident.name.end - ident.name.begin;
 
 	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
 		struct strbuf namemail = STRBUF_INIT;
@@ -2773,8 +2773,8 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 		strbuf_addf(&namemail, "%.*s <%.*s>",
 			    (int)namelen, name, (int)maillen, mail);
 
-		strbuf_splice(buf, ident.name_begin - buf->buf,
-			      ident.mail_end - ident.name_begin + 1,
+		strbuf_splice(buf, ident.name.begin - buf->buf,
+			      ident.mail.end - ident.name.begin + 1,
 			      namemail.buf, namemail.len);
 
 		strbuf_release(&namemail);
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/7] use strbufs in date functions
  2014-06-18 20:19 [PATCH 0/7] cleaning up determine_author_info Jeff King
                   ` (3 preceding siblings ...)
  2014-06-18 20:31 ` [PATCH 4/7] ident_split: store begin/end pairs on their own struct Jeff King
@ 2014-06-18 20:32 ` Jeff King
  2014-06-18 20:35 ` [PATCH 6/7] determine_author_info: reuse parsing functions Jeff King
  2014-06-18 20:36 ` [PATCH 7/7] determine_author_info: stop leaking name/email Jeff King
  6 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2014-06-18 20:32 UTC (permalink / raw)
  To: git

Many of the date functions write into fixed-size buffers.
This is a minor pain, as we have to take special
precautions, and frequently end up copying the result into a
strbuf or heap-allocated buffer anyway (for which we
sometimes use strcpy!).

Let's instead teach parse_date, datestamp, etc to write to a
strbuf. The obvious downside is that we might need to
perform a heap allocation where we otherwise would not need
to. However, it turns out that the only two new allocations
required are:

  1. In test-date.c, where we don't care about efficiency.

  2. In determine_author_info, which is not performance
     critical (and where the use of a strbuf will help later
     refactoring).

Signed-off-by: Jeff King <peff@peff.net>
---
I don't think the strcpys are a problem in practice, because we're
typically writing fixed-size output generated by parse_date. But I
didn't analyze it too deeply, so you might be able to cause shenanigans
if you can impact GIT_AUTHOR_DATE or something.

 builtin/commit.c | 20 ++++++++++----------
 cache.h          |  4 ++--
 date.c           | 13 +++++++------
 fast-import.c    | 20 +++++++++-----------
 ident.c          | 26 +++++++++++---------------
 test-date.c      | 10 ++++++----
 6 files changed, 45 insertions(+), 48 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 047cc76..bf770cf 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -526,19 +526,16 @@ static int sane_ident_split(struct ident_split *person)
 	return 1;
 }
 
-static int parse_force_date(const char *in, char *out, int len)
+static int parse_force_date(const char *in, struct strbuf *out)
 {
-	if (len < 1)
-		return -1;
-	*out++ = '@';
-	len--;
+	strbuf_addch(out, '@');
 
-	if (parse_date(in, out, len) < 0) {
+	if (parse_date(in, out) < 0) {
 		int errors = 0;
 		unsigned long t = approxidate_careful(in, &errors);
 		if (errors)
 			return -1;
-		snprintf(out, len, "%lu", t);
+		strbuf_addf(out, "%lu", t);
 	}
 
 	return 0;
@@ -548,7 +545,7 @@ static void determine_author_info(struct strbuf *author_ident)
 {
 	char *name, *email, *date;
 	struct ident_split author;
-	char date_buf[64];
+	struct strbuf date_buf = STRBUF_INIT;
 
 	name = getenv("GIT_AUTHOR_NAME");
 	email = getenv("GIT_AUTHOR_EMAIL");
@@ -594,9 +591,10 @@ static void determine_author_info(struct strbuf *author_ident)
 	}
 
 	if (force_date) {
-		if (parse_force_date(force_date, date_buf, sizeof(date_buf)))
+		strbuf_reset(&date_buf);
+		if (parse_force_date(force_date, &date_buf))
 			die(_("invalid date format: %s"), force_date);
-		date = date_buf;
+		date = date_buf.buf;
 	}
 
 	strbuf_addstr(author_ident, fmt_ident(name, email, date, IDENT_STRICT));
@@ -606,6 +604,8 @@ static void determine_author_info(struct strbuf *author_ident)
 		export_one("GIT_AUTHOR_EMAIL", author.mail.begin, author.mail.end, 0);
 		export_one("GIT_AUTHOR_DATE", author.date.begin, author.tz.end, '@');
 	}
+
+	strbuf_release(&date_buf);
 }
 
 static void split_ident_or_die(struct ident_split *id, const struct strbuf *buf)
diff --git a/cache.h b/cache.h
index 5255661..c7740a8 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,10 +1023,10 @@ enum date_mode {
 const char *show_date(unsigned long time, int timezone, enum date_mode mode);
 void show_date_relative(unsigned long time, int tz, const struct timeval *now,
 			struct strbuf *timebuf);
-int parse_date(const char *date, char *buf, int bufsize);
+int parse_date(const char *date, struct strbuf *out);
 int parse_date_basic(const char *date, unsigned long *timestamp, int *offset);
 int parse_expiry_date(const char *date, unsigned long *timestamp);
-void datestamp(char *buf, int bufsize);
+void datestamp(struct strbuf *out);
 #define approxidate(s) approxidate_careful((s), NULL)
 unsigned long approxidate_careful(const char *, int *);
 unsigned long approxidate_relative(const char *date, const struct timeval *now);
diff --git a/date.c b/date.c
index 782de95..2c33468 100644
--- a/date.c
+++ b/date.c
@@ -605,7 +605,7 @@ static int match_tz(const char *date, int *offp)
 	return end - date;
 }
 
-static int date_string(unsigned long date, int offset, char *buf, int len)
+static void date_string(unsigned long date, int offset, struct strbuf *buf)
 {
 	int sign = '+';
 
@@ -613,7 +613,7 @@ static int date_string(unsigned long date, int offset, char *buf, int len)
 		offset = -offset;
 		sign = '-';
 	}
-	return snprintf(buf, len, "%lu %c%02d%02d", date, sign, offset / 60, offset % 60);
+	strbuf_addf(buf, "%lu %c%02d%02d", date, sign, offset / 60, offset % 60);
 }
 
 /*
@@ -735,13 +735,14 @@ int parse_expiry_date(const char *date, unsigned long *timestamp)
 	return errors;
 }
 
-int parse_date(const char *date, char *result, int maxlen)
+int parse_date(const char *date, struct strbuf *result)
 {
 	unsigned long timestamp;
 	int offset;
 	if (parse_date_basic(date, &timestamp, &offset))
 		return -1;
-	return date_string(timestamp, offset, result, maxlen);
+	date_string(timestamp, offset, result);
+	return 0;
 }
 
 enum date_mode parse_date_format(const char *format)
@@ -766,7 +767,7 @@ enum date_mode parse_date_format(const char *format)
 		die("unknown date format %s", format);
 }
 
-void datestamp(char *buf, int bufsize)
+void datestamp(struct strbuf *out)
 {
 	time_t now;
 	int offset;
@@ -776,7 +777,7 @@ void datestamp(char *buf, int bufsize)
 	offset = tm_to_time_t(localtime(&now)) - now;
 	offset /= 60;
 
-	date_string(now, offset, buf, bufsize);
+	date_string(now, offset, out);
 }
 
 /*
diff --git a/fast-import.c b/fast-import.c
index 6707a66..122b916 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1967,7 +1967,7 @@ static int parse_data(struct strbuf *sb, uintmax_t limit, uintmax_t *len_res)
 	return 1;
 }
 
-static int validate_raw_date(const char *src, char *result, int maxlen)
+static int validate_raw_date(const char *src, struct strbuf *result)
 {
 	const char *orig_src = src;
 	char *endp;
@@ -1985,11 +1985,10 @@ static int validate_raw_date(const char *src, char *result, int maxlen)
 		return -1;
 
 	num = strtoul(src + 1, &endp, 10);
-	if (errno || endp == src + 1 || *endp || (endp - orig_src) >= maxlen ||
-	    1400 < num)
+	if (errno || endp == src + 1 || *endp || 1400 < num)
 		return -1;
 
-	strcpy(result, orig_src);
+	strbuf_addstr(result, orig_src);
 	return 0;
 }
 
@@ -1997,7 +1996,7 @@ static char *parse_ident(const char *buf)
 {
 	const char *ltgt;
 	size_t name_len;
-	char *ident;
+	struct strbuf ident = STRBUF_INIT;
 
 	/* ensure there is a space delimiter even if there is no name */
 	if (*buf == '<')
@@ -2016,26 +2015,25 @@ static char *parse_ident(const char *buf)
 		die("Missing space after > in ident string: %s", buf);
 	ltgt++;
 	name_len = ltgt - buf;
-	ident = xmalloc(name_len + 24);
-	strncpy(ident, buf, name_len);
+	strbuf_add(&ident, buf, name_len);
 
 	switch (whenspec) {
 	case WHENSPEC_RAW:
-		if (validate_raw_date(ltgt, ident + name_len, 24) < 0)
+		if (validate_raw_date(ltgt, &ident) < 0)
 			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
 		break;
 	case WHENSPEC_RFC2822:
-		if (parse_date(ltgt, ident + name_len, 24) < 0)
+		if (parse_date(ltgt, &ident) < 0)
 			die("Invalid rfc2822 date \"%s\" in ident: %s", ltgt, buf);
 		break;
 	case WHENSPEC_NOW:
 		if (strcmp("now", ltgt))
 			die("Date in ident must be 'now': %s", buf);
-		datestamp(ident + name_len, 24);
+		datestamp(&ident);
 		break;
 	}
 
-	return ident;
+	return strbuf_detach(&ident, NULL);
 }
 
 static void parse_and_store_blob(
diff --git a/ident.c b/ident.c
index 570fda7..f4fac4a 100644
--- a/ident.c
+++ b/ident.c
@@ -9,7 +9,7 @@
 
 static struct strbuf git_default_name = STRBUF_INIT;
 static struct strbuf git_default_email = STRBUF_INIT;
-static char git_default_date[50];
+static struct strbuf git_default_date = STRBUF_INIT;
 
 #define IDENT_NAME_GIVEN 01
 #define IDENT_MAIL_GIVEN 02
@@ -129,9 +129,9 @@ const char *ident_default_email(void)
 
 static const char *ident_default_date(void)
 {
-	if (!git_default_date[0])
-		datestamp(git_default_date, sizeof(git_default_date));
-	return git_default_date;
+	if (!git_default_date.len)
+		datestamp(&git_default_date);
+	return git_default_date.buf;
 }
 
 static int crud(unsigned char c)
@@ -292,7 +292,6 @@ const char *fmt_ident(const char *name, const char *email,
 		      const char *date_str, int flag)
 {
 	static struct strbuf ident = STRBUF_INIT;
-	char date[50];
 	int strict = (flag & IDENT_STRICT);
 	int want_date = !(flag & IDENT_NO_DATE);
 	int want_name = !(flag & IDENT_NO_NAME);
@@ -320,15 +319,6 @@ const char *fmt_ident(const char *name, const char *email,
 		die("unable to auto-detect email address (got '%s')", email);
 	}
 
-	if (want_date) {
-		if (date_str && date_str[0]) {
-			if (parse_date(date_str, date, sizeof(date)) < 0)
-				die("invalid date format: %s", date_str);
-		}
-		else
-			strcpy(date, ident_default_date());
-	}
-
 	strbuf_reset(&ident);
 	if (want_name) {
 		strbuf_addstr_without_crud(&ident, name);
@@ -339,8 +329,14 @@ const char *fmt_ident(const char *name, const char *email,
 			strbuf_addch(&ident, '>');
 	if (want_date) {
 		strbuf_addch(&ident, ' ');
-		strbuf_addstr_without_crud(&ident, date);
+		if (date_str && date_str[0]) {
+			if (parse_date(date_str, &ident) < 0)
+				die("invalid date format: %s", date_str);
+		}
+		else
+			strbuf_addstr(&ident, ident_default_date());
 	}
+
 	return ident.buf;
 }
 
diff --git a/test-date.c b/test-date.c
index 10afaab..94a6997 100644
--- a/test-date.c
+++ b/test-date.c
@@ -19,19 +19,21 @@ static void show_dates(char **argv, struct timeval *now)
 
 static void parse_dates(char **argv, struct timeval *now)
 {
+	struct strbuf result = STRBUF_INIT;
+
 	for (; *argv; argv++) {
-		char result[100];
 		unsigned long t;
 		int tz;
 
-		result[0] = 0;
-		parse_date(*argv, result, sizeof(result));
-		if (sscanf(result, "%lu %d", &t, &tz) == 2)
+		strbuf_reset(&result);
+		parse_date(*argv, &result);
+		if (sscanf(result.buf, "%lu %d", &t, &tz) == 2)
 			printf("%s -> %s\n",
 			       *argv, show_date(t, tz, DATE_ISO8601));
 		else
 			printf("%s -> bad\n", *argv);
 	}
+	strbuf_release(&result);
 }
 
 static void parse_approxidate(char **argv, struct timeval *now)
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/7] determine_author_info: reuse parsing functions
  2014-06-18 20:19 [PATCH 0/7] cleaning up determine_author_info Jeff King
                   ` (4 preceding siblings ...)
  2014-06-18 20:32 ` [PATCH 5/7] use strbufs in date functions Jeff King
@ 2014-06-18 20:35 ` Jeff King
  2014-06-18 20:36 ` [PATCH 7/7] determine_author_info: stop leaking name/email Jeff King
  6 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2014-06-18 20:35 UTC (permalink / raw)
  To: git

Rather than parsing the header manually to find the "author"
field, and then parsing its sub-parts, let's use
find_commit_header and split_ident_line. This is shorter and
easier to read, and should do a more careful parsing job.

For example, the current parser could find the end-of-email
right-bracket across a newline (for a malformed commit), and
calculate a bogus gigantic length for the date (by using
"eol - rb").

As a bonus, this also plugs a memory leak when we pull the
date field from an existing commit (we still leak the name
and email buffers, which will be fixed in a later commit).

Signed-off-by: Jeff King <peff@peff.net>
---
The large buffer comes from wrapping around the negative side of the
size_t space.  In theory you could wrap far enough to get a buffer that
we can actually allocate (probably only on a 32-bit system), and then
we followup by copying "len" random bytes into it. I doubt an attacker
could get that data out of the program, though, as we then run it
through fmt_ident, which should complain if it's full of garbage.

 builtin/commit.c | 61 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 32 insertions(+), 29 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index bf770cf..62abee0 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -541,6 +541,16 @@ static int parse_force_date(const char *in, struct strbuf *out)
 	return 0;
 }
 
+static void strbuf_add_pair(struct strbuf *buf, const struct pointer_pair *p)
+{
+	strbuf_add(buf, p->begin, p->end - p->begin);
+}
+
+static char *xmemdupz_pair(const struct pointer_pair *p)
+{
+	return xmemdupz(p->begin, p->end - p->begin);
+}
+
 static void determine_author_info(struct strbuf *author_ident)
 {
 	char *name, *email, *date;
@@ -552,42 +562,35 @@ static void determine_author_info(struct strbuf *author_ident)
 	date = getenv("GIT_AUTHOR_DATE");
 
 	if (author_message) {
-		const char *a, *lb, *rb, *eol;
-		size_t len;
+		struct ident_split ident;
+		unsigned long len;
+		const char *a;
 
-		a = strstr(author_message_buffer, "\nauthor ");
+		a = find_commit_header(author_message_buffer, "author", &len);
 		if (!a)
-			die(_("invalid commit: %s"), author_message);
-
-		lb = strchrnul(a + strlen("\nauthor "), '<');
-		rb = strchrnul(lb, '>');
-		eol = strchrnul(rb, '\n');
-		if (!*lb || !*rb || !*eol)
-			die(_("invalid commit: %s"), author_message);
-
-		if (lb == a + strlen("\nauthor "))
-			/* \nauthor <foo@example.com> */
-			name = xcalloc(1, 1);
-		else
-			name = xmemdupz(a + strlen("\nauthor "),
-					(lb - strlen(" ") -
-					 (a + strlen("\nauthor "))));
-		email = xmemdupz(lb + strlen("<"), rb - (lb + strlen("<")));
-		len = eol - (rb + strlen("> "));
-		date = xmalloc(len + 2);
-		*date = '@';
-		memcpy(date + 1, rb + strlen("> "), len);
-		date[len + 1] = '\0';
+			die(_("commit '%s' lacks author header"), author_message);
+		if (split_ident_line(&ident, a, len) < 0)
+			die(_("commit '%s' has malformed author line"), author_message);
+
+		name = xmemdupz_pair(&ident.name);
+		email = xmemdupz_pair(&ident.mail);
+		if (ident.date.begin) {
+			strbuf_reset(&date_buf);
+			strbuf_addch(&date_buf, '@');
+			strbuf_add_pair(&date_buf, &ident.date);
+			strbuf_addch(&date_buf, ' ');
+			strbuf_add_pair(&date_buf, &ident.tz);
+			date = date_buf.buf;
+		}
 	}
 
 	if (force_author) {
-		const char *lb = strstr(force_author, " <");
-		const char *rb = strchr(force_author, '>');
+		struct ident_split ident;
 
-		if (!lb || !rb)
+		if (split_ident_line(&ident, force_author, strlen(force_author)) < 0)
 			die(_("malformed --author parameter"));
-		name = xstrndup(force_author, lb - force_author);
-		email = xstrndup(lb + 2, rb - (lb + 2));
+		name = xmemdupz_pair(&ident.name);
+		email = xmemdupz_pair(&ident.mail);
 	}
 
 	if (force_date) {
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/7] determine_author_info: stop leaking name/email
  2014-06-18 20:19 [PATCH 0/7] cleaning up determine_author_info Jeff King
                   ` (5 preceding siblings ...)
  2014-06-18 20:35 ` [PATCH 6/7] determine_author_info: reuse parsing functions Jeff King
@ 2014-06-18 20:36 ` Jeff King
  2014-06-23  9:28   ` Eric Sunshine
  6 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2014-06-18 20:36 UTC (permalink / raw)
  To: git

When we get the author name and email either from an
existing commit or from the "--author" option, we create a
copy of the strings. We cannot just free() these copies,
since the same pointers may also be pointing to getenv()
storage which we do not own.

Instead, let's treat these the same way as we do the date
buffer: keep a strbuf to be released, and point the bare
pointers at the strbuf.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 62abee0..72beb7f 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -546,16 +546,20 @@ static void strbuf_add_pair(struct strbuf *buf, const struct pointer_pair *p)
 	strbuf_add(buf, p->begin, p->end - p->begin);
 }
 
-static char *xmemdupz_pair(const struct pointer_pair *p)
+static char *set_pair(struct strbuf *buf, struct pointer_pair *p)
 {
-	return xmemdupz(p->begin, p->end - p->begin);
+	strbuf_reset(buf);
+	strbuf_add_pair(buf, p);
+	return buf->buf;
 }
 
 static void determine_author_info(struct strbuf *author_ident)
 {
 	char *name, *email, *date;
 	struct ident_split author;
-	struct strbuf date_buf = STRBUF_INIT;
+	struct strbuf name_buf = STRBUF_INIT,
+		      mail_buf = STRBUF_INIT,
+		      date_buf = STRBUF_INIT;
 
 	name = getenv("GIT_AUTHOR_NAME");
 	email = getenv("GIT_AUTHOR_EMAIL");
@@ -572,8 +576,8 @@ static void determine_author_info(struct strbuf *author_ident)
 		if (split_ident_line(&ident, a, len) < 0)
 			die(_("commit '%s' has malformed author line"), author_message);
 
-		name = xmemdupz_pair(&ident.name);
-		email = xmemdupz_pair(&ident.mail);
+		name = set_pair(&name_buf, &ident.name);
+		email = set_pair(&mail_buf, &ident.mail);
 		if (ident.date.begin) {
 			strbuf_reset(&date_buf);
 			strbuf_addch(&date_buf, '@');
@@ -589,8 +593,8 @@ static void determine_author_info(struct strbuf *author_ident)
 
 		if (split_ident_line(&ident, force_author, strlen(force_author)) < 0)
 			die(_("malformed --author parameter"));
-		name = xmemdupz_pair(&ident.name);
-		email = xmemdupz_pair(&ident.mail);
+		name = set_pair(&name_buf, &ident.name);
+		email = set_pair(&mail_buf, &ident.mail);
 	}
 
 	if (force_date) {
@@ -608,6 +612,8 @@ static void determine_author_info(struct strbuf *author_ident)
 		export_one("GIT_AUTHOR_DATE", author.date.begin, author.tz.end, '@');
 	}
 
+	strbuf_release(&name_buf);
+	strbuf_release(&mail_buf);
 	strbuf_release(&date_buf);
 }
 
-- 
2.0.0.566.gfe3e6b2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] commit: provide a function to find a header in a buffer
  2014-06-18 20:27 ` [PATCH 1/7] commit: provide a function to find a header in a buffer Jeff King
@ 2014-06-23  1:26   ` Eric Sunshine
  2014-06-23 16:47     ` Jeff King
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Sunshine @ 2014-06-23  1:26 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Wednesday, June 18, 2014, Jeff King <peff@peff.net> wrote:
> Usually when we parse a commit, we read it line by line and
> handle each header in a single pass (e.g., in parse_commit
> and parse_commit_header).  Sometimes, however, we only care
> about extracting a single header. Code in this situation is
> stuck doing an ad-hoc parse of the commit buffer.
>
> Let's provide a reusable function to locate a header within
> the commit.  The code is modeled after pretty.c's
> get_header, which is used to extract the encoding.
>
> Since some callers may not have the "struct commit" to go
> along with the buffer, we drop that parameter.  The only
> thing lost is a warning for truncated commits, but that's
> OK.  This shouldn't happen in practice, and even if it does,
> there's no particular reason that this function needs to
> complain about it. It either finds the header it was asked
> for, or it doesn't (and in the latter case, the caller can
> complain).
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/commit.c b/commit.c
> index 11106fb..d04b525 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1652,3 +1652,26 @@ void print_commit_list(struct commit_list *list,
>                 printf(format, sha1_to_hex(list->item->object.sha1));
>         }
>  }
> +
> +const char *find_commit_header(const char *msg, const char *key, size_t *out_len)
> +{
> +       int key_len = strlen(key);
> +       const char *line = msg;
> +
> +       while (line) {
> +               const char *eol = strchrnul(line, '\n'), *next;
> +
> +               if (line == eol)
> +                       return NULL;
> +               next = *eol ? eol + 1 : NULL;
> +
> +               if (eol - line > key_len &&
> +                   !strncmp(line, key, key_len) &&
> +                   line[key_len] == ' ') {
> +                       *out_len = eol - line - key_len - 1;
> +                       return line + key_len + 1;
> +               }
> +               line = next;

This is already simplified from the original implementation in
get_header(), but it can be simplified further by dropping 'next',
which is not otherwise used, and assigning 'line' directly:

    line = *eol ? eol + 1 : NULL;

> +       }
> +       return NULL;
> +}
> diff --git a/commit.h b/commit.h
> index 61559a9..7c766e9 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -312,6 +312,17 @@ extern struct commit_extra_header *read_commit_extra_headers(struct commit *, co
>
>  extern void free_commit_extra_headers(struct commit_extra_header *extra);
>
> +/*
> + * Search the commit object contents given by "msg" for the header "key".
> + * Returns a pointer to the start of the header contents, or NULL. The length
> + * of the header, up to the first newline, is returned via out_len.
> + *
> + * Note that some headers (like mergetag) may be multi-line. It is the caller's
> + * responsibility to parse further in this case!
> + */
> +extern const char *find_commit_header(const char *msg, const char *key,
> +                                     size_t *out_len);
> +
>  struct merge_remote_desc {
>         struct object *obj; /* the named object, could be a tag */
>         const char *name;
> diff --git a/pretty.c b/pretty.c
> index cc5b45d..6081750 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -548,31 +548,11 @@ static void add_merge_info(const struct pretty_print_context *pp,
>         strbuf_addch(sb, '\n');
>  }
>
> -static char *get_header(const struct commit *commit, const char *msg,
> -                       const char *key)
> +static char *get_header(const char *msg, const char *key)
>  {
> -       int key_len = strlen(key);
> -       const char *line = msg;
> -
> -       while (line) {
> -               const char *eol = strchrnul(line, '\n'), *next;
> -
> -               if (line == eol)
> -                       return NULL;
> -               if (!*eol) {
> -                       warning("malformed commit (header is missing newline): %s",
> -                               sha1_to_hex(commit->object.sha1));
> -                       next = NULL;
> -               } else
> -                       next = eol + 1;
> -               if (eol - line > key_len &&
> -                   !strncmp(line, key, key_len) &&
> -                   line[key_len] == ' ') {
> -                       return xmemdupz(line + key_len + 1, eol - line - key_len - 1);
> -               }
> -               line = next;
> -       }
> -       return NULL;
> +       size_t len;
> +       const char *v = find_commit_header(msg, key, &len);
> +       return v ? xmemdupz(v, len) : NULL;
>  }
>
>  static char *replace_encoding_header(char *buf, const char *encoding)
> @@ -618,11 +598,10 @@ const char *logmsg_reencode(const struct commit *commit,
>
>         if (!output_encoding || !*output_encoding) {
>                 if (commit_encoding)
> -                       *commit_encoding =
> -                               get_header(commit, msg, "encoding");
> +                       *commit_encoding = get_header(msg, "encoding");
>                 return msg;
>         }
> -       encoding = get_header(commit, msg, "encoding");
> +       encoding = get_header(msg, "encoding");
>         if (commit_encoding)
>                 *commit_encoding = encoding;
>         use_encoding = encoding ? encoding : utf8;
> --
> 2.0.0.566.gfe3e6b2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/7] ident_split: store begin/end pairs on their own struct
  2014-06-18 20:31 ` [PATCH 4/7] ident_split: store begin/end pairs on their own struct Jeff King
@ 2014-06-23  1:28   ` Eric Sunshine
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Sunshine @ 2014-06-23  1:28 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Wednesday, June 18, 2014, Jeff King <peff@peff.net> wrote:
> Subject: ident_split: store begin/end pairs on their own struct

s/on/in/

> When we parse an ident line, we end up with several fields,
> each with a begin/end pointer into the buffer, like:
>
>   const char *name_begin;
>   const char *name_end;
>
> There is nothing except the field names to indicate that
> they are paired. This makes it annoying to write helper
> functions for dealing with the sub-fields, as you have to
> pass both sides. Instead, let's move them into a single
> struct "name", with fields "begin" and "end". This will be
> stored identically, but can be passed as a unit.
>
> We have to do a mechanical update of "s/_/./" at each point
> of use, but other than that, the fields should behave
> identically.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Suggestions welcome on the name "pointer_pair".

str_segment  ;-)

> While writing this series, I also noticed that it would be more
> convenient to have a pointer/len combination rather than two pointers.
> You can convert between them, of course, but I found I was always
> converting the other way.
>
> I left it this way because it makes the mass-update mechanical (and
> because now that I can pass the pair as a unit, I don't have to write
> the same "ident->name_begin, ident->name_end - ident->name_begin" pair
> over and over).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 7/7] determine_author_info: stop leaking name/email
  2014-06-18 20:36 ` [PATCH 7/7] determine_author_info: stop leaking name/email Jeff King
@ 2014-06-23  9:28   ` Eric Sunshine
  2014-06-23  9:33     ` Erik Faye-Lund
  2014-06-23 17:20     ` Jeff King
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Sunshine @ 2014-06-23  9:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Wed, Jun 18, 2014 at 4:36 PM, Jeff King <peff@peff.net> wrote:
> When we get the author name and email either from an
> existing commit or from the "--author" option, we create a
> copy of the strings. We cannot just free() these copies,
> since the same pointers may also be pointing to getenv()
> storage which we do not own.
>
> Instead, let's treat these the same way as we do the date
> buffer: keep a strbuf to be released, and point the bare
> pointers at the strbuf.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/commit.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 62abee0..72beb7f 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -546,16 +546,20 @@ static void strbuf_add_pair(struct strbuf *buf, const struct pointer_pair *p)
>         strbuf_add(buf, p->begin, p->end - p->begin);
>  }
>
> -static char *xmemdupz_pair(const struct pointer_pair *p)
> +static char *set_pair(struct strbuf *buf, struct pointer_pair *p)
>  {
> -       return xmemdupz(p->begin, p->end - p->begin);
> +       strbuf_reset(buf);
> +       strbuf_add_pair(buf, p);
> +       return buf->buf;
>  }
>
>  static void determine_author_info(struct strbuf *author_ident)
>  {
>         char *name, *email, *date;
>         struct ident_split author;
> -       struct strbuf date_buf = STRBUF_INIT;
> +       struct strbuf name_buf = STRBUF_INIT,
> +                     mail_buf = STRBUF_INIT,

nit: The associated 'char *' variable is named "email", so perhaps
s/mail_buf/email_buf/g

> +                     date_buf = STRBUF_INIT;
>
>         name = getenv("GIT_AUTHOR_NAME");
>         email = getenv("GIT_AUTHOR_EMAIL");
> @@ -572,8 +576,8 @@ static void determine_author_info(struct strbuf *author_ident)
>                 if (split_ident_line(&ident, a, len) < 0)
>                         die(_("commit '%s' has malformed author line"), author_message);
>
> -               name = xmemdupz_pair(&ident.name);
> -               email = xmemdupz_pair(&ident.mail);
> +               name = set_pair(&name_buf, &ident.name);
> +               email = set_pair(&mail_buf, &ident.mail);
>                 if (ident.date.begin) {
>                         strbuf_reset(&date_buf);
>                         strbuf_addch(&date_buf, '@');
> @@ -589,8 +593,8 @@ static void determine_author_info(struct strbuf *author_ident)
>
>                 if (split_ident_line(&ident, force_author, strlen(force_author)) < 0)
>                         die(_("malformed --author parameter"));
> -               name = xmemdupz_pair(&ident.name);
> -               email = xmemdupz_pair(&ident.mail);
> +               name = set_pair(&name_buf, &ident.name);
> +               email = set_pair(&mail_buf, &ident.mail);

Does the code become too convoluted with these changes? You're now
maintaining three 'char *' variables in parallel with three strbuf
variables. Is it possible to drop the 'char *' variables and just pass
the .buf member of the strbufs to fmt_ident()?

Alternately, you also could solve the leaks by having an envdup() helper:

    static char *envdup(const char *s)
    {
        const char *v = getenv(s);
        return v ? xstrdup(v) : NULL;
    }

    ...
    name = envdup("GIT_AUTHOR_NAME");
    email = envdup("GIT_AUTHOR_EMAIL");
    ...

And then just free() 'name' and 'email' normally.

>         }
>
>         if (force_date) {
> @@ -608,6 +612,8 @@ static void determine_author_info(struct strbuf *author_ident)
>                 export_one("GIT_AUTHOR_DATE", author.date.begin, author.tz.end, '@');
>         }
>
> +       strbuf_release(&name_buf);
> +       strbuf_release(&mail_buf);
>         strbuf_release(&date_buf);
>  }
>
> --
> 2.0.0.566.gfe3e6b2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 7/7] determine_author_info: stop leaking name/email
  2014-06-23  9:28   ` Eric Sunshine
@ 2014-06-23  9:33     ` Erik Faye-Lund
  2014-06-23  9:48       ` Eric Sunshine
  2014-06-23 17:21       ` Jeff King
  2014-06-23 17:20     ` Jeff King
  1 sibling, 2 replies; 16+ messages in thread
From: Erik Faye-Lund @ 2014-06-23  9:33 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Jeff King, Git List

On Mon, Jun 23, 2014 at 11:28 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Wed, Jun 18, 2014 at 4:36 PM, Jeff King <peff@peff.net> wrote:
>> When we get the author name and email either from an
>> existing commit or from the "--author" option, we create a
>> copy of the strings. We cannot just free() these copies,
>> since the same pointers may also be pointing to getenv()
>> storage which we do not own.
>>
>> Instead, let's treat these the same way as we do the date
>> buffer: keep a strbuf to be released, and point the bare
>> pointers at the strbuf.
>>
>> Signed-off-by: Jeff King <peff@peff.net>
>> ---
>>  builtin/commit.c | 20 +++++++++++++-------
>>  1 file changed, 13 insertions(+), 7 deletions(-)
>>
>> diff --git a/builtin/commit.c b/builtin/commit.c
>> index 62abee0..72beb7f 100644
>> --- a/builtin/commit.c
>> +++ b/builtin/commit.c
>> @@ -546,16 +546,20 @@ static void strbuf_add_pair(struct strbuf *buf, const struct pointer_pair *p)
>>         strbuf_add(buf, p->begin, p->end - p->begin);
>>  }
>>
>> -static char *xmemdupz_pair(const struct pointer_pair *p)
>> +static char *set_pair(struct strbuf *buf, struct pointer_pair *p)
>>  {
>> -       return xmemdupz(p->begin, p->end - p->begin);
>> +       strbuf_reset(buf);
>> +       strbuf_add_pair(buf, p);
>> +       return buf->buf;
>>  }
>>
>>  static void determine_author_info(struct strbuf *author_ident)
>>  {
>>         char *name, *email, *date;
>>         struct ident_split author;
>> -       struct strbuf date_buf = STRBUF_INIT;
>> +       struct strbuf name_buf = STRBUF_INIT,
>> +                     mail_buf = STRBUF_INIT,
>
> nit: The associated 'char *' variable is named "email", so perhaps
> s/mail_buf/email_buf/g
>
>> +                     date_buf = STRBUF_INIT;
>>
>>         name = getenv("GIT_AUTHOR_NAME");
>>         email = getenv("GIT_AUTHOR_EMAIL");
>> @@ -572,8 +576,8 @@ static void determine_author_info(struct strbuf *author_ident)
>>                 if (split_ident_line(&ident, a, len) < 0)
>>                         die(_("commit '%s' has malformed author line"), author_message);
>>
>> -               name = xmemdupz_pair(&ident.name);
>> -               email = xmemdupz_pair(&ident.mail);
>> +               name = set_pair(&name_buf, &ident.name);
>> +               email = set_pair(&mail_buf, &ident.mail);
>>                 if (ident.date.begin) {
>>                         strbuf_reset(&date_buf);
>>                         strbuf_addch(&date_buf, '@');
>> @@ -589,8 +593,8 @@ static void determine_author_info(struct strbuf *author_ident)
>>
>>                 if (split_ident_line(&ident, force_author, strlen(force_author)) < 0)
>>                         die(_("malformed --author parameter"));
>> -               name = xmemdupz_pair(&ident.name);
>> -               email = xmemdupz_pair(&ident.mail);
>> +               name = set_pair(&name_buf, &ident.name);
>> +               email = set_pair(&mail_buf, &ident.mail);
>
> Does the code become too convoluted with these changes? You're now
> maintaining three 'char *' variables in parallel with three strbuf
> variables. Is it possible to drop the 'char *' variables and just pass
> the .buf member of the strbufs to fmt_ident()?
>
> Alternately, you also could solve the leaks by having an envdup() helper:
>
>     static char *envdup(const char *s)
>     {
>         const char *v = getenv(s);
>         return v ? xstrdup(v) : NULL;
>     }
>
>     ...
>     name = envdup("GIT_AUTHOR_NAME");
>     email = envdup("GIT_AUTHOR_EMAIL");
>     ...
>
> And then just free() 'name' and 'email' normally.

This approach has the added benefit of fixing the case where getenv
uses a static buffer, like POSIX allows.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 7/7] determine_author_info: stop leaking name/email
  2014-06-23  9:33     ` Erik Faye-Lund
@ 2014-06-23  9:48       ` Eric Sunshine
  2014-06-23 17:21       ` Jeff King
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Sunshine @ 2014-06-23  9:48 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Jeff King, Git List

On Mon, Jun 23, 2014 at 5:33 AM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Mon, Jun 23, 2014 at 11:28 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
>> On Wed, Jun 18, 2014 at 4:36 PM, Jeff King <peff@peff.net> wrote:
>>> When we get the author name and email either from an
>>> existing commit or from the "--author" option, we create a
>>> copy of the strings. We cannot just free() these copies,
>>> since the same pointers may also be pointing to getenv()
>>> storage which we do not own.
>>>
>>> Instead, let's treat these the same way as we do the date
>>> buffer: keep a strbuf to be released, and point the bare
>>> pointers at the strbuf.
>>>
>>> Signed-off-by: Jeff King <peff@peff.net>
>>> ---
>>>  builtin/commit.c | 20 +++++++++++++-------
>>>  1 file changed, 13 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/builtin/commit.c b/builtin/commit.c
>>> index 62abee0..72beb7f 100644
>>> --- a/builtin/commit.c
>>> +++ b/builtin/commit.c
>>> @@ -546,16 +546,20 @@ static void strbuf_add_pair(struct strbuf *buf, const struct pointer_pair *p)
>>>         strbuf_add(buf, p->begin, p->end - p->begin);
>>>  }
>>>
>>> -static char *xmemdupz_pair(const struct pointer_pair *p)
>>> +static char *set_pair(struct strbuf *buf, struct pointer_pair *p)
>>>  {
>>> -       return xmemdupz(p->begin, p->end - p->begin);
>>> +       strbuf_reset(buf);
>>> +       strbuf_add_pair(buf, p);
>>> +       return buf->buf;
>>>  }
>>>
>>>  static void determine_author_info(struct strbuf *author_ident)
>>>  {
>>>         char *name, *email, *date;
>>>         struct ident_split author;
>>> -       struct strbuf date_buf = STRBUF_INIT;
>>> +       struct strbuf name_buf = STRBUF_INIT,
>>> +                     mail_buf = STRBUF_INIT,
>>
>> nit: The associated 'char *' variable is named "email", so perhaps
>> s/mail_buf/email_buf/g
>>
>>> +                     date_buf = STRBUF_INIT;
>>>
>>>         name = getenv("GIT_AUTHOR_NAME");
>>>         email = getenv("GIT_AUTHOR_EMAIL");
>>> @@ -572,8 +576,8 @@ static void determine_author_info(struct strbuf *author_ident)
>>>                 if (split_ident_line(&ident, a, len) < 0)
>>>                         die(_("commit '%s' has malformed author line"), author_message);
>>>
>>> -               name = xmemdupz_pair(&ident.name);
>>> -               email = xmemdupz_pair(&ident.mail);
>>> +               name = set_pair(&name_buf, &ident.name);
>>> +               email = set_pair(&mail_buf, &ident.mail);
>>>                 if (ident.date.begin) {
>>>                         strbuf_reset(&date_buf);
>>>                         strbuf_addch(&date_buf, '@');
>>> @@ -589,8 +593,8 @@ static void determine_author_info(struct strbuf *author_ident)
>>>
>>>                 if (split_ident_line(&ident, force_author, strlen(force_author)) < 0)
>>>                         die(_("malformed --author parameter"));
>>> -               name = xmemdupz_pair(&ident.name);
>>> -               email = xmemdupz_pair(&ident.mail);
>>> +               name = set_pair(&name_buf, &ident.name);
>>> +               email = set_pair(&mail_buf, &ident.mail);
>>
>> Does the code become too convoluted with these changes? You're now
>> maintaining three 'char *' variables in parallel with three strbuf
>> variables. Is it possible to drop the 'char *' variables and just pass
>> the .buf member of the strbufs to fmt_ident()?
>>
>> Alternately, you also could solve the leaks by having an envdup() helper:
>>
>>     static char *envdup(const char *s)
>>     {
>>         const char *v = getenv(s);
>>         return v ? xstrdup(v) : NULL;
>>     }
>>
>>     ...
>>     name = envdup("GIT_AUTHOR_NAME");
>>     email = envdup("GIT_AUTHOR_EMAIL");
>>     ...
>>
>> And then just free() 'name' and 'email' normally.
>
> This approach has the added benefit of fixing the case where getenv
> uses a static buffer, like POSIX allows.

That reminds me that I was going to suggest a suitable variation of
envdup if Peff wanted to keep the strbufs:

    static void strbuf_env(struct strbuf *s, const char *e)
    {
        const char *v = getenv(e);
        if (v)
            strbuf_add(s, v);
    }

(Or add a generalized strbuf_add_gently() to strbuf.{h,c}, if it seems
likely to come up more often, which would allow
strbuf_add_gently(&name_buf, getenv("GIT_AUTHOR_NAME")) -- with a
better name than "gently", of course.)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/7] commit: provide a function to find a header in a buffer
  2014-06-23  1:26   ` Eric Sunshine
@ 2014-06-23 16:47     ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2014-06-23 16:47 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: git

On Sun, Jun 22, 2014 at 09:26:44PM -0400, Eric Sunshine wrote:

> > +               if (line == eol)
> > +                       return NULL;
> > +               next = *eol ? eol + 1 : NULL;
> > +
> > +               if (eol - line > key_len &&
> > +                   !strncmp(line, key, key_len) &&
> > +                   line[key_len] == ' ') {
> > +                       *out_len = eol - line - key_len - 1;
> > +                       return line + key_len + 1;
> > +               }
> > +               line = next;
> 
> This is already simplified from the original implementation in
> get_header(), but it can be simplified further by dropping 'next',
> which is not otherwise used, and assigning 'line' directly:
> 
>     line = *eol ? eol + 1 : NULL;

Yeah, you're right. Squashed, thanks.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 7/7] determine_author_info: stop leaking name/email
  2014-06-23  9:28   ` Eric Sunshine
  2014-06-23  9:33     ` Erik Faye-Lund
@ 2014-06-23 17:20     ` Jeff King
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff King @ 2014-06-23 17:20 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Mon, Jun 23, 2014 at 05:28:14AM -0400, Eric Sunshine wrote:

> >  static void determine_author_info(struct strbuf *author_ident)
> >  {
> >         char *name, *email, *date;
> >         struct ident_split author;
> > -       struct strbuf date_buf = STRBUF_INIT;
> > +       struct strbuf name_buf = STRBUF_INIT,
> > +                     mail_buf = STRBUF_INIT,
> 
> nit: The associated 'char *' variable is named "email", so perhaps
> s/mail_buf/email_buf/g

Yeah, you wouldn't believe the number of times I switched back and forth
while writing the patch. The variable is called "mail" in "struct
ident_split", so you have a mismatch at some point unless we rename the
local pointer.

I avoided doing that to keep the diff smaller, but perhaps it's worth
doing.

> >                 if (split_ident_line(&ident, force_author, strlen(force_author)) < 0)
> >                         die(_("malformed --author parameter"));
> > -               name = xmemdupz_pair(&ident.name);
> > -               email = xmemdupz_pair(&ident.mail);
> > +               name = set_pair(&name_buf, &ident.name);
> > +               email = set_pair(&mail_buf, &ident.mail);
> 
> Does the code become too convoluted with these changes? You're now
> maintaining three 'char *' variables in parallel with three strbuf
> variables. Is it possible to drop the 'char *' variables and just pass
> the .buf member of the strbufs to fmt_ident()?

Yeah, I agree it is getting a bit complicated (I tried to introduce
helpers like set_pair to at least keep the per-variable work down to a
single line in each instance).

It unfortunately doesn't work to just pass the ".buf" of each strbuf. We
care about the distinction between the empty string and NULL here (the
latter will cause fmt_ident to use the default ident).

> Alternately, you also could solve the leaks by having an envdup() helper:
> 
>     static char *envdup(const char *s)
>     {
>         const char *v = getenv(s);
>         return v ? xstrdup(v) : NULL;
>     }
> 
>     ...
>     name = envdup("GIT_AUTHOR_NAME");
>     email = envdup("GIT_AUTHOR_EMAIL");
>     ...
> 
> And then just free() 'name' and 'email' normally.

Yeah, I also considered that. You end up having to free() before
re-assigning throughout the function, though that is not much worse than
having to strbuf_reset() before re-adding (the reset is hidden in
set_pair, but we could similarly abstract it).

Here's what that looks like (this converts date_buf away, too, to avoid
relying on getenv()'s static return value):

diff --git a/builtin/commit.c b/builtin/commit.c
index 62abee0..35045ca 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -551,15 +551,26 @@ static char *xmemdupz_pair(const struct pointer_pair *p)
 	return xmemdupz(p->begin, p->end - p->begin);
 }
 
+static void set_ident_var(char **buf, char *val)
+{
+	free(*buf);
+	*buf = val;
+}
+
+static char *envdup(const char *var)
+{
+	const char *val = getenv(var);
+	return val ? xstrdup(val) : NULL;
+}
+
 static void determine_author_info(struct strbuf *author_ident)
 {
 	char *name, *email, *date;
 	struct ident_split author;
-	struct strbuf date_buf = STRBUF_INIT;
 
-	name = getenv("GIT_AUTHOR_NAME");
-	email = getenv("GIT_AUTHOR_EMAIL");
-	date = getenv("GIT_AUTHOR_DATE");
+	name = envdup("GIT_AUTHOR_NAME");
+	email = envdup("GIT_AUTHOR_EMAIL");
+	date = envdup("GIT_AUTHOR_DATE");
 
 	if (author_message) {
 		struct ident_split ident;
@@ -572,15 +583,15 @@ static void determine_author_info(struct strbuf *author_ident)
 		if (split_ident_line(&ident, a, len) < 0)
 			die(_("commit '%s' has malformed author line"), author_message);
 
-		name = xmemdupz_pair(&ident.name);
-		email = xmemdupz_pair(&ident.mail);
+		set_ident_var(&name, xmemdupz_pair(&ident.name));
+		set_ident_var(&email, xmemdupz_pair(&ident.mail));
 		if (ident.date.begin) {
-			strbuf_reset(&date_buf);
+			struct strbuf date_buf = STRBUF_INIT;
 			strbuf_addch(&date_buf, '@');
 			strbuf_add_pair(&date_buf, &ident.date);
 			strbuf_addch(&date_buf, ' ');
 			strbuf_add_pair(&date_buf, &ident.tz);
-			date = date_buf.buf;
+			set_ident_var(&date, strbuf_detach(&date_buf, NULL));
 		}
 	}
 
@@ -589,15 +600,15 @@ static void determine_author_info(struct strbuf *author_ident)
 
 		if (split_ident_line(&ident, force_author, strlen(force_author)) < 0)
 			die(_("malformed --author parameter"));
-		name = xmemdupz_pair(&ident.name);
-		email = xmemdupz_pair(&ident.mail);
+		set_ident_var(&name, xmemdupz_pair(&ident.name));
+		set_ident_var(&email, xmemdupz_pair(&ident.mail));
 	}
 
 	if (force_date) {
-		strbuf_reset(&date_buf);
+		struct strbuf date_buf = STRBUF_INIT;
 		if (parse_force_date(force_date, &date_buf))
 			die(_("invalid date format: %s"), force_date);
-		date = date_buf.buf;
+		set_ident_var(&date, strbuf_detach(&date_buf, NULL));
 	}
 
 	strbuf_addstr(author_ident, fmt_ident(name, email, date, IDENT_STRICT));
@@ -608,7 +619,9 @@ static void determine_author_info(struct strbuf *author_ident)
 		export_one("GIT_AUTHOR_DATE", author.date.begin, author.tz.end, '@');
 	}
 
-	strbuf_release(&date_buf);
+	free(name);
+	free(email);
+	free(date);
 }
 
 static void split_ident_or_die(struct ident_split *id, const struct strbuf *buf)


I dunno. Maybe the set_ident_var helper is a little too cutesy and
obfuscating.

-Peff

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 7/7] determine_author_info: stop leaking name/email
  2014-06-23  9:33     ` Erik Faye-Lund
  2014-06-23  9:48       ` Eric Sunshine
@ 2014-06-23 17:21       ` Jeff King
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff King @ 2014-06-23 17:21 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Eric Sunshine, Git List

On Mon, Jun 23, 2014 at 11:33:56AM +0200, Erik Faye-Lund wrote:

> This approach has the added benefit of fixing the case where getenv
> uses a static buffer, like POSIX allows.

Good point. I knew we could invalidate the pointer if setenv() was called, but
I didn't know that another getenv() could (I don't know of any such
implementation offhand, but it is not too hard to handle it).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-06-23 17:21 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-18 20:19 [PATCH 0/7] cleaning up determine_author_info Jeff King
2014-06-18 20:27 ` [PATCH 1/7] commit: provide a function to find a header in a buffer Jeff King
2014-06-23  1:26   ` Eric Sunshine
2014-06-23 16:47     ` Jeff King
2014-06-18 20:28 ` [PATCH 2/7] record_author_info: fix memory leak on malformed commit Jeff King
2014-06-18 20:29 ` [PATCH 3/7] record_author_info: use find_commit_header Jeff King
2014-06-18 20:31 ` [PATCH 4/7] ident_split: store begin/end pairs on their own struct Jeff King
2014-06-23  1:28   ` Eric Sunshine
2014-06-18 20:32 ` [PATCH 5/7] use strbufs in date functions Jeff King
2014-06-18 20:35 ` [PATCH 6/7] determine_author_info: reuse parsing functions Jeff King
2014-06-18 20:36 ` [PATCH 7/7] determine_author_info: stop leaking name/email Jeff King
2014-06-23  9:28   ` Eric Sunshine
2014-06-23  9:33     ` Erik Faye-Lund
2014-06-23  9:48       ` Eric Sunshine
2014-06-23 17:21       ` Jeff King
2014-06-23 17:20     ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.