All of lore.kernel.org
 help / color / mirror / Atom feed
* Problems with unrecognized headers in git bundles
@ 2012-02-22 16:05 Jannis Pohlmann
  2012-02-22 19:34 ` [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments Thomas Rast
  2012-02-22 20:25 ` Problems with unrecognized headers in git bundles Øyvind A. Holm
  0 siblings, 2 replies; 23+ messages in thread
From: Jannis Pohlmann @ 2012-02-22 16:05 UTC (permalink / raw)
  To: git

Hi,

creating bundles from some repositories seems to lead to bundles with 
incorrectly formatted headers, at least with git >= 1.7.2. When cloning 
from such bundles, git prints the following error/warning:

   $ git clone perl-clone.bundle perl-clone
   Cloning into 'perl-clone'...
   warning: unrecognized header: --work around mangled archname on...

This can be reproduced easily with git from any version >= 1.7.2 or from 
master, using the following steps:

   git clone git://perl5.git.perl.org/perl.git perl
   GIT_DIR=perl/.git git bundle create perl-clone.bundle --all
   git clone perl-clone.bundle perl-clone

The content of the bundle is:

   # v2 git bundle
   -- work around mangled archname on win32 while finding...
   39ec54a59ce332fc44e553f4e5eeceef88e8369e refs/heads/blead
   39ec54a59ce332fc44e553f4e5eeceef88e8369e refs/remotes/origin/HEAD
   ...

The "--work around mangled archname..." line is rather long, so I've 
omitted most of it. What it contains is a series of commit messages all 
combined into a single line. It appears that this line is the problem 
because it's neither a comment (like '#v2 git bundle') nor a SHA1 
followed by a ref name.

Note that this problem does not occur with all repositories. A bundle 
created from a test repository with a single text file and just one 
commit does not have this problem.

Note also that "git clone <bundle> <dest>" does not fail with corrupted 
bundles if the git version is something like 1.7.2 or 1.7.7.6. Those 
version print the above warning but still succeed in cloning. "git 
clone" from master however treats this as an error and fails.

Without any knowledge of how git works internally, I would assume that 
this is either a bug in how bundles are created, or a hint at a slightly 
broken perl repository (other's have the same thing though, perhaps it 
is because of a conversion from another SCM software to git in the 
past). Does this sound reasonable?

Is there a way to work around this or fix it properly? I'm not sure what 
lead to the decision to no longer ignore unrecognized headers in git 
master, would it be sensible to revert this change if nothing can be 
done to solve this during bundle creation?

   - Jannis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments
  2012-02-22 16:05 Problems with unrecognized headers in git bundles Jannis Pohlmann
@ 2012-02-22 19:34 ` Thomas Rast
  2012-02-22 19:34   ` [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
  2012-02-22 20:51   ` [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments Jeff King
  2012-02-22 20:25 ` Problems with unrecognized headers in git bundles Øyvind A. Holm
  1 sibling, 2 replies; 23+ messages in thread
From: Thomas Rast @ 2012-02-22 19:34 UTC (permalink / raw)
  To: git; +Cc: Jannis Pohlmann

The comment even said that it should eventually go there.  While at
it, match the calling convention and name of the function to the
strbuf_get*line family.  So it now is strbuf_getwholeline_fd.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

I was left looking for strbuf_readline_fd() for a while, then tried to
look for the readline() from the libc that it would presumably match.
Hence this cleanup.

 bundle.c |   21 ++-------------------
 strbuf.c |   16 ++++++++++++++++
 strbuf.h |    1 +
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/bundle.c b/bundle.c
index b8acf3c..313de42 100644
--- a/bundle.c
+++ b/bundle.c
@@ -23,23 +23,6 @@ static void add_to_ref_list(const unsigned char *sha1, const char *name,
 	list->nr++;
 }
 
-/* Eventually this should go to strbuf.[ch] */
-static int strbuf_readline_fd(struct strbuf *sb, int fd)
-{
-	strbuf_reset(sb);
-
-	while (1) {
-		char ch;
-		ssize_t len = xread(fd, &ch, 1);
-		if (len <= 0)
-			return len;
-		strbuf_addch(sb, ch);
-		if (ch == '\n')
-			break;
-	}
-	return 0;
-}
-
 static int parse_bundle_header(int fd, struct bundle_header *header,
 			       const char *report_path)
 {
@@ -47,7 +30,7 @@ static int parse_bundle_header(int fd, struct bundle_header *header,
 	int status = 0;
 
 	/* The bundle header begins with the signature */
-	if (strbuf_readline_fd(&buf, fd) ||
+	if (strbuf_getwholeline_fd(&buf, fd, '\n') ||
 	    strcmp(buf.buf, bundle_signature)) {
 		if (report_path)
 			error("'%s' does not look like a v2 bundle file",
@@ -57,7 +40,7 @@ static int parse_bundle_header(int fd, struct bundle_header *header,
 	}
 
 	/* The bundle header ends with an empty line */
-	while (!strbuf_readline_fd(&buf, fd) &&
+	while (!strbuf_getwholeline_fd(&buf, fd, '\n') &&
 	       buf.len && buf.buf[0] != '\n') {
 		unsigned char sha1[20];
 		int is_prereq = 0;
diff --git a/strbuf.c b/strbuf.c
index ff0b96b..5135d59 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -383,6 +383,22 @@ int strbuf_getline(struct strbuf *sb, FILE *fp, int term)
 	return 0;
 }
 
+int strbuf_getwholeline_fd(struct strbuf *sb, int fd, int term)
+{
+	strbuf_reset(sb);
+
+	while (1) {
+		char ch;
+		ssize_t len = xread(fd, &ch, 1);
+		if (len <= 0)
+			return EOF;
+		strbuf_addch(sb, ch);
+		if (ch == term)
+			break;
+	}
+	return 0;
+}
+
 int strbuf_read_file(struct strbuf *sb, const char *path, size_t hint)
 {
 	int fd, len;
diff --git a/strbuf.h b/strbuf.h
index fbf059f..3effaa8 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -116,6 +116,7 @@ static inline void strbuf_complete_line(struct strbuf *sb)
 
 extern int strbuf_getwholeline(struct strbuf *, FILE *, int);
 extern int strbuf_getline(struct strbuf *, FILE *, int);
+extern int strbuf_getwholeline_fd(struct strbuf *, int, int);
 
 extern void stripspace(struct strbuf *buf, int skip_comments);
 extern int launch_editor(const char *path, struct strbuf *buffer, const char *const *env);
-- 
1.7.9.1.430.g4998543

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits
  2012-02-22 19:34 ` [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments Thomas Rast
@ 2012-02-22 19:34   ` Thomas Rast
  2012-02-22 20:22     ` Junio C Hamano
                       ` (3 more replies)
  2012-02-22 20:51   ` [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments Jeff King
  1 sibling, 4 replies; 23+ messages in thread
From: Thomas Rast @ 2012-02-22 19:34 UTC (permalink / raw)
  To: git; +Cc: Jannis Pohlmann

The first part of the bundle header contains the boundary commits, and
could be approximated by

  # v2 git bundle
  $(git rev-list --pretty=oneline --boundary <ARGS> | grep ^-)

git-bundle actually spawns exactly this rev-list invocation, and does
the grepping internally.

There was a subtle bug in the latter step: it used fgets() with a
1024-byte buffer.  If the user has sufficiently long subjects (e.g.,
by not adhering to the git oneline-subject convention in the first
place), the 'oneline' format can easily overflow the buffer.  fgets()
then returns the rest of the line in the next call(s).  If one of
these remaining parts started with '-', git-bundle would mistakenly
insert it into the bundle thinking it was a boundary commit.

Fix it by using strbuf_getwholeline() instead, which handles arbitrary
line lengths correctly.

Note that on the receiving side in parse_bundle_header() we were
already using strbuf_getwholeline_fd(), so that part is safe.

Reported-by: Jannis Pohlmann <jannis.pohlmann@codethink.co.uk>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 bundle.c          |   14 +++++++-------
 t/t5704-bundle.sh |   17 +++++++++++++++++
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/bundle.c b/bundle.c
index 313de42..0dbd174 100644
--- a/bundle.c
+++ b/bundle.c
@@ -234,7 +234,7 @@ int create_bundle(struct bundle_header *header, const char *path,
 	const char **argv_boundary = xmalloc((argc + 4) * sizeof(const char *));
 	const char **argv_pack = xmalloc(6 * sizeof(const char *));
 	int i, ref_count = 0;
-	char buffer[1024];
+	struct strbuf buf = STRBUF_INIT;
 	struct rev_info revs;
 	struct child_process rls;
 	FILE *rls_fout;
@@ -266,16 +266,16 @@ int create_bundle(struct bundle_header *header, const char *path,
 	if (start_command(&rls))
 		return -1;
 	rls_fout = xfdopen(rls.out, "r");
-	while (fgets(buffer, sizeof(buffer), rls_fout)) {
+	while (strbuf_getwholeline(&buf, rls_fout, '\n') != EOF) {
 		unsigned char sha1[20];
-		if (buffer[0] == '-') {
-			write_or_die(bundle_fd, buffer, strlen(buffer));
-			if (!get_sha1_hex(buffer + 1, sha1)) {
+		if (buf.len > 0 && buf.buf[0] == '-') {
+			write_or_die(bundle_fd, buf.buf, buf.len);
+			if (!get_sha1_hex(buf.buf + 1, sha1)) {
 				struct object *object = parse_object(sha1);
 				object->flags |= UNINTERESTING;
-				add_pending_object(&revs, object, buffer);
+				add_pending_object(&revs, object, buf.buf);
 			}
-		} else if (!get_sha1_hex(buffer, sha1)) {
+		} else if (!get_sha1_hex(buf.buf, sha1)) {
 			struct object *object = parse_object(sha1);
 			object->flags |= SHOWN;
 		}
diff --git a/t/t5704-bundle.sh b/t/t5704-bundle.sh
index 4ae127d..7c2f307 100755
--- a/t/t5704-bundle.sh
+++ b/t/t5704-bundle.sh
@@ -59,4 +59,21 @@ test_expect_success 'empty bundle file is rejected' '
 
 '
 
+# If "ridiculous" is at least 1004 chars, this traps a bug in old
+# versions where the resulting 1025-char line (with --pretty=oneline)
+# was longer than a 1024-char buffer
+test_expect_success 'ridiculously long subject in boundary' '
+
+	: > file4 &&
+	test_tick &&
+	git add file4 &&
+	printf "abcdefghijkl %s\n" $(seq 1 100) | git commit -F - &&
+	test_commit fifth &&
+	git bundle create long-subject-bundle.bdl HEAD^..HEAD &&
+	git fetch long-subject-bundle.bdl &&
+	sed -n "/^-/{p;q}" long-subject-bundle.bdl > boundary &&
+	grep "^-$_x40 " boundary
+
+'
+
 test_done
-- 
1.7.9.1.430.g4998543

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits
  2012-02-22 19:34   ` [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
@ 2012-02-22 20:22     ` Junio C Hamano
  2012-02-22 20:25       ` Thomas Rast
  2012-02-22 20:55     ` Jeff King
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2012-02-22 20:22 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Jannis Pohlmann

Thomas Rast <trast@student.ethz.ch> writes:

> The first part of the bundle header contains the boundary commits, and
> could be approximated by
>
>   # v2 git bundle
>   $(git rev-list --pretty=oneline --boundary <ARGS> | grep ^-)
>
> git-bundle actually spawns exactly this rev-list invocation, and does
> the grepping internally.
>
> There was a subtle bug in the latter step: it used fgets() with a
> 1024-byte buffer.  If the user has sufficiently long subjects (e.g.,
> by not adhering to the git oneline-subject convention in the first
> place), the 'oneline' format can easily overflow the buffer.  fgets()
> then returns the rest of the line in the next call(s).  If one of
> these remaining parts started with '-', git-bundle would mistakenly
> insert it into the bundle thinking it was a boundary commit.
>
> Fix it by using strbuf_getwholeline() instead, which handles arbitrary
> line lengths correctly.
>
> Note that on the receiving side in parse_bundle_header() we were
> already using strbuf_getwholeline_fd(), so that part is safe.

Thanks for diagnosing this, but I wonder if it even needs --pretty=oneline
to begin with, except for debugging purposes.

Do we ever use the subject string read from the rev-list output in any
way?

In other words, I am wondering if the right patch to minimally fix the
issue starting from older releases is something along this line instead:

diff --git a/bundle.c b/bundle.c
index b8acf3c..339dbb0 100644
--- a/bundle.c
+++ b/bundle.c
@@ -248,7 +248,7 @@ int create_bundle(struct bundle_header *header, const char *path,
 	static struct lock_file lock;
 	int bundle_fd = -1;
 	int bundle_to_stdout;
-	const char **argv_boundary = xmalloc((argc + 4) * sizeof(const char *));
+	const char **argv_boundary = xmalloc((argc + 3) * sizeof(const char *));
 	const char **argv_pack = xmalloc(6 * sizeof(const char *));
 	int i, ref_count = 0;
 	char buffer[1024];
@@ -271,11 +271,10 @@ int create_bundle(struct bundle_header *header, const char *path,
 	init_revisions(&revs, NULL);
 
 	/* write prerequisites */
-	memcpy(argv_boundary + 3, argv + 1, argc * sizeof(const char *));
+	memcpy(argv_boundary + 2, argv + 1, argc * sizeof(const char *));
 	argv_boundary[0] = "rev-list";
 	argv_boundary[1] = "--boundary";
-	argv_boundary[2] = "--pretty=oneline";
-	argv_boundary[argc + 2] = NULL;
+	argv_boundary[argc + 1] = NULL;
 	memset(&rls, 0, sizeof(rls));
 	rls.argv = argv_boundary;
 	rls.out = -1;

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Problems with unrecognized headers in git bundles
  2012-02-22 16:05 Problems with unrecognized headers in git bundles Jannis Pohlmann
  2012-02-22 19:34 ` [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments Thomas Rast
@ 2012-02-22 20:25 ` Øyvind A. Holm
  2012-02-22 20:40   ` Øyvind A. Holm
  2012-02-23 13:27   ` Erik Faye-Lund
  1 sibling, 2 replies; 23+ messages in thread
From: Øyvind A. Holm @ 2012-02-22 20:25 UTC (permalink / raw)
  To: Jannis Pohlmann; +Cc: git

On 22 February 2012 17:05, Jannis Pohlmann wrote:
> Hi,
>
> creating bundles from some repositories seems to lead to bundles with
> incorrectly formatted headers, at least with git >= 1.7.2. When
> cloning from such bundles, git prints the following error/warning:
>
>  $ git clone perl-clone.bundle perl-clone
>  Cloning into 'perl-clone'...
>  warning: unrecognized header: --work around mangled archname on...
>
> This can be reproduced easily with git from any version >= 1.7.2 or
> from master, using the following steps:
>
>  git clone git://perl5.git.perl.org/perl.git perl
>  GIT_DIR=perl/.git git bundle create perl-clone.bundle --all
>  git clone perl-clone.bundle perl-clone
>
> The content of the bundle is:
>
>  # v2 git bundle
>  -- work around mangled archname on win32 while finding...
>  39ec54a59ce332fc44e553f4e5eeceef88e8369e refs/heads/blead
>  39ec54a59ce332fc44e553f4e5eeceef88e8369e refs/remotes/origin/HEAD

Have researched this a bit, and I've found that all git versions back to
when git-bundle was introduced (around v1.5.4) produces the same invalid
line. The culprit is commit 3e8148feadabd0d0b1869fcc4d218a6475a5b0bc in
perl.git, branch 'maint-5.005'. The log message of that commit contains
email headers, maybe that's the reason git bundle gets confused?

        Øyvind

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits
  2012-02-22 20:22     ` Junio C Hamano
@ 2012-02-22 20:25       ` Thomas Rast
  2012-02-22 20:50         ` Junio C Hamano
  2012-02-22 21:00         ` Jeff King
  0 siblings, 2 replies; 23+ messages in thread
From: Thomas Rast @ 2012-02-22 20:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Rast, git, Jannis Pohlmann

Junio C Hamano <gitster@pobox.com> writes:

> Thomas Rast <trast@student.ethz.ch> writes:
>
>>   # v2 git bundle
>>   $(git rev-list --pretty=oneline --boundary <ARGS> | grep ^-)
>>
>> git-bundle actually spawns exactly this rev-list invocation, and does
>> the grepping internally.
>>
>> There was a subtle bug in the latter step: it used fgets() with a
>> 1024-byte buffer.  If the user has sufficiently long subjects (e.g.,
>> by not adhering to the git oneline-subject convention in the first
>> place), the 'oneline' format can easily overflow the buffer.
>
> Thanks for diagnosing this, but I wonder if it even needs --pretty=oneline
> to begin with, except for debugging purposes.
>
> Do we ever use the subject string read from the rev-list output in any
> way?
>
> In other words, I am wondering if the right patch to minimally fix the
> issue starting from older releases is something along this line instead:

Not sure.  The only use I could think of would be to google for the
subjects, in the hope of finding some repository that has the commit you
are looking for.  Other than that...

In any case the --pretty=oneline is very deliberate, as we can see from
the commit below.  It just doesn't give a reason :-)

commit 239296770dae75e21c179733785731ec6ffae1f5
Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Date:   Fri Feb 23 03:17:51 2007 +0100

    git-bundle: record commit summary in the prerequisite data

diff --git a/builtin-bundle.c b/builtin-bundle.c
index 191ec55..d74afaa 100644
--- a/builtin-bundle.c
+++ b/builtin-bundle.c
@@ -267,7 +267,7 @@ static int create_bundle(struct bundle_header *header, const char *path,
 		int argc, const char **argv)
 {
 	int bundle_fd = -1;
-	const char **argv_boundary = xmalloc((argc + 3) * sizeof(const char *));
+	const char **argv_boundary = xmalloc((argc + 4) * sizeof(const char *));
 	const char **argv_pack = xmalloc(4 * sizeof(const char *));
 	int pid, in, out, i, status;
 	char buffer[1024];
@@ -282,10 +282,11 @@ static int create_bundle(struct bundle_header *header, const char *path,
 	write_or_die(bundle_fd, bundle_signature, strlen(bundle_signature));
 
 	/* write prerequisites */
-	memcpy(argv_boundary + 2, argv + 1, argc * sizeof(const char *));
+	memcpy(argv_boundary + 3, argv + 1, argc * sizeof(const char *));
 	argv_boundary[0] = "rev-list";
 	argv_boundary[1] = "--boundary";
-	argv_boundary[argc + 1] = NULL;
+	argv_boundary[2] = "--pretty=oneline";
+	argv_boundary[argc + 2] = NULL;
 	out = -1;
 	pid = fork_with_pipe(argv_boundary, NULL, &out);
 	if (pid < 0)


-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Problems with unrecognized headers in git bundles
  2012-02-22 20:25 ` Problems with unrecognized headers in git bundles Øyvind A. Holm
@ 2012-02-22 20:40   ` Øyvind A. Holm
  2012-02-23 13:27   ` Erik Faye-Lund
  1 sibling, 0 replies; 23+ messages in thread
From: Øyvind A. Holm @ 2012-02-22 20:40 UTC (permalink / raw)
  To: Jannis Pohlmann; +Cc: git

On 22 February 2012 21:25, Øyvind A. Holm <sunny@sunbase.org> wrote:
> On 22 February 2012 17:05, Jannis Pohlmann wrote:
> > creating bundles from some repositories seems to lead to bundles
> > with incorrectly formatted headers, at least with git >= 1.7.2. When
> > cloning from such bundles, git prints the following error/warning:
> >
> >  $ git clone perl-clone.bundle perl-clone
> >  Cloning into 'perl-clone'...
> >  warning: unrecognized header: --work around mangled archname on...
>
> Have researched this a bit, and I've found that all git versions back
> to when git-bundle was introduced (around v1.5.4) produces the same
> invalid line. The culprit is commit
> 3e8148feadabd0d0b1869fcc4d218a6475a5b0bc in perl.git, branch
> 'maint-5.005'. The log message of that commit contains email headers,
> maybe that's the reason git bundle gets confused?

...or maybe because the log message doesn't contain any empty lines, so
they're joined together into an insanely long line. I've seen this
behaviour before, in git-am or git-apply, I think. Anyway, when the
bundle doesn't contain this commit, the line is not present.

  Øyvind

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits
  2012-02-22 20:25       ` Thomas Rast
@ 2012-02-22 20:50         ` Junio C Hamano
  2012-02-22 21:00         ` Jeff King
  1 sibling, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2012-02-22 20:50 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Thomas Rast, git, Jannis Pohlmann

Thomas Rast <trast@inf.ethz.ch> writes:

> In any case the --pretty=oneline is very deliberate, as we can see from
> the commit below.  It just doesn't give a reason :-)

Thanks for digging and I hope everybody on this list who would ever send a
patch learns the importance of describing _why_ in the log from this
lesson.

I'll amend your 2/2 with a note that keeping the subject material is not
justified and we may want to remove (or at least reduce) it in the future.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments
  2012-02-22 19:34 ` [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments Thomas Rast
  2012-02-22 19:34   ` [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
@ 2012-02-22 20:51   ` Jeff King
  1 sibling, 0 replies; 23+ messages in thread
From: Jeff King @ 2012-02-22 20:51 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Jannis Pohlmann

On Wed, Feb 22, 2012 at 08:34:22PM +0100, Thomas Rast wrote:

> The comment even said that it should eventually go there.  While at
> it, match the calling convention and name of the function to the
> strbuf_get*line family.  So it now is strbuf_getwholeline_fd.
> [...]
>  bundle.c |   21 ++-------------------
>  strbuf.c |   16 ++++++++++++++++
>  strbuf.h |    1 +

Nit: no update to Documentation/technical/api-strbuf.txt.

You might want to also mention that this will read() one byte at a time,
and is therefore only a good idea if you really care about the position
of the fd. Otherwise, fdopen() + strbuf_getwholeline() is much more
efficient.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits
  2012-02-22 19:34   ` [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
  2012-02-22 20:22     ` Junio C Hamano
@ 2012-02-22 20:55     ` Jeff King
  2012-02-22 22:10       ` Johannes Sixt
  2012-02-23  3:20     ` Junio C Hamano
  2012-02-23  3:38     ` Junio C Hamano
  3 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2012-02-22 20:55 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Jannis Pohlmann

On Wed, Feb 22, 2012 at 08:34:23PM +0100, Thomas Rast wrote:

> +# If "ridiculous" is at least 1004 chars, this traps a bug in old
> +# versions where the resulting 1025-char line (with --pretty=oneline)
> +# was longer than a 1024-char buffer
> +test_expect_success 'ridiculously long subject in boundary' '
> +
> +	: > file4 &&
> +	test_tick &&
> +	git add file4 &&
> +	printf "abcdefghijkl %s\n" $(seq 1 100) | git commit -F - &&

Seq is not portable. I usually use either

  perl -le "print for (1..100)"

or just do:

  z16=zzzzzzzzzzzzzzzz
  z256=$z16$z16$z16$z16$z16$z16$z16$z16
  z1024=$z256$z256$z256$z256$z256$z256$z256$z256

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits
  2012-02-22 20:25       ` Thomas Rast
  2012-02-22 20:50         ` Junio C Hamano
@ 2012-02-22 21:00         ` Jeff King
  1 sibling, 0 replies; 23+ messages in thread
From: Jeff King @ 2012-02-22 21:00 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Junio C Hamano, Thomas Rast, git, Jannis Pohlmann

On Wed, Feb 22, 2012 at 09:25:53PM +0100, Thomas Rast wrote:

> > Thanks for diagnosing this, but I wonder if it even needs --pretty=oneline
> > to begin with, except for debugging purposes.
> >
> > Do we ever use the subject string read from the rev-list output in any
> > way?
> >
> > In other words, I am wondering if the right patch to minimally fix the
> > issue starting from older releases is something along this line instead:
> 
> Not sure.  The only use I could think of would be to google for the
> subjects, in the hope of finding some repository that has the commit you
> are looking for.  Other than that...

Or because the bundle creator (which may even be you on a different day)
did not correctly guess at the negative refs when specifying a cutoff.
Assuming you no longer have access to the original repo (not
unreasonable, since you are using a bundle), the messages could help you
understand where the error occurred.

Of course, once you figure out "aha! I expected the bundle to include
foo, but it is actually a cutoff point. I should have said XYZ^ as the
cutoff", I'm not sure what you do then. It's not like there is an easy
way to salvage the data that is in the bundle.

So I think it is a debugging aid, but one that ultimately doesn't help
you that much.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits
  2012-02-22 20:55     ` Jeff King
@ 2012-02-22 22:10       ` Johannes Sixt
  0 siblings, 0 replies; 23+ messages in thread
From: Johannes Sixt @ 2012-02-22 22:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Rast, git, Jannis Pohlmann

Am 22.02.2012 21:55, schrieb Jeff King:
> On Wed, Feb 22, 2012 at 08:34:23PM +0100, Thomas Rast wrote:
>> +	printf "abcdefghijkl %s\n" $(seq 1 100) | git commit -F - &&
> 
> Seq is not portable.

Thanks for pointing this out.

> I usually use either
> 
>   perl -le "print for (1..100)"
> 
> or just do:
> 
>   z16=zzzzzzzzzzzzzzzz
>   z256=$z16$z16$z16$z16$z16$z16$z16$z16
>   z1024=$z256$z256$z256$z256$z256$z256$z256$z256

If a sequence of ASCII zeros is good enough, you can also do:

  printf %02134d 0

-- Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits
  2012-02-22 19:34   ` [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
  2012-02-22 20:22     ` Junio C Hamano
  2012-02-22 20:55     ` Jeff King
@ 2012-02-23  3:20     ` Junio C Hamano
  2012-02-23  3:38     ` Junio C Hamano
  3 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2012-02-23  3:20 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Jannis Pohlmann

Thomas Rast <trast@student.ethz.ch> writes:

> +# If "ridiculous" is at least 1004 chars, this traps a bug in old
> +# versions where the resulting 1025-char line (with --pretty=oneline)
> +# was longer than a 1024-char buffer
> +test_expect_success 'ridiculously long subject in boundary' '
> +
> +	: > file4 &&
> +	test_tick &&
> +	git add file4 &&
> +	printf "abcdefghijkl %s\n" $(seq 1 100) | git commit -F - &&
> +	test_commit fifth &&
> +	git bundle create long-subject-bundle.bdl HEAD^..HEAD &&
> +	git fetch long-subject-bundle.bdl &&
> +	sed -n "/^-/{p;q}" long-subject-bundle.bdl > boundary &&
> +	grep "^-$_x40 " boundary
> +
> +'

Hmm, aside from the use of seq J6t mentioned, I am not sure what this is
testing.

I thought the point is to make "--oneline" output produce a long line
whose 1024th character is '-' and make the fgets() based parser to mistake
it as the beginning of a line that records a boundary commit, but you do
not seem to have any '-' there.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits
  2012-02-22 19:34   ` [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
                       ` (2 preceding siblings ...)
  2012-02-23  3:20     ` Junio C Hamano
@ 2012-02-23  3:38     ` Junio C Hamano
  2012-02-23  9:42       ` [PATCH v2 0/4] Making an elephant out of a getline() bug Thomas Rast
  3 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2012-02-23  3:38 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git, Jannis Pohlmann

Thomas Rast <trast@student.ethz.ch> writes:

> diff --git a/bundle.c b/bundle.c
> index 313de42..0dbd174 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -234,7 +234,7 @@ int create_bundle(struct bundle_header *header, const char *path,
>  	const char **argv_boundary = xmalloc((argc + 4) * sizeof(const char *));
>  	const char **argv_pack = xmalloc(6 * sizeof(const char *));
>  	int i, ref_count = 0;
> -	char buffer[1024];
> +	struct strbuf buf = STRBUF_INIT;
>  	struct rev_info revs;
>  	struct child_process rls;
>  	FILE *rls_fout;
> @@ -266,16 +266,16 @@ int create_bundle(struct bundle_header *header, const char *path,
>  	if (start_command(&rls))
>  		return -1;
>  	rls_fout = xfdopen(rls.out, "r");
> -	while (fgets(buffer, sizeof(buffer), rls_fout)) {
> +	while (strbuf_getwholeline(&buf, rls_fout, '\n') != EOF) {

I'd add strbuf_release(&buf) after this loop.

Perhaps we would want to squash something like this to the test to avoid
"seq", using J6t's idea.  The issue is that we do not write the end of
line for the long boundary (because it is hidden from us), keep reading
and rejecting bogus letters, and then the tip is written without
terminating the boundary line.  So checking boundary line is not a very
useful test, but "fetch" and "list-heads" are.

 t/t5704-bundle.sh |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/t/t5704-bundle.sh b/t/t5704-bundle.sh
index 7c2f307..5319b84 100755
--- a/t/t5704-bundle.sh
+++ b/t/t5704-bundle.sh
@@ -59,19 +59,20 @@ test_expect_success 'empty bundle file is rejected' '
 
 '
 
-# If "ridiculous" is at least 1004 chars, this traps a bug in old
-# versions where the resulting 1025-char line (with --pretty=oneline)
-# was longer than a 1024-char buffer
+# This triggers a bug in older versions where the resulting line (with
+# --pretty=oneline) was longer than a 1024-char buffer.
 test_expect_success 'ridiculously long subject in boundary' '
 
-	: > file4 &&
+	: >file4 &&
 	test_tick &&
 	git add file4 &&
-	printf "abcdefghijkl %s\n" $(seq 1 100) | git commit -F - &&
+	printf "%0982d\n" 0 | git commit -F - &&
 	test_commit fifth &&
 	git bundle create long-subject-bundle.bdl HEAD^..HEAD &&
+	git bundle list-heads long-subject-bundle.bdl >heads &&
+	test -s heads &&
 	git fetch long-subject-bundle.bdl &&
-	sed -n "/^-/{p;q}" long-subject-bundle.bdl > boundary &&
+	sed -n "/^-/{p;q}" long-subject-bundle.bdl >boundary &&
 	grep "^-$_x40 " boundary
 
 '
-- 
1.7.9.2.258.g4407a

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 0/4] Making an elephant out of a getline() bug
  2012-02-23  3:38     ` Junio C Hamano
@ 2012-02-23  9:42       ` Thomas Rast
  2012-02-23  9:42         ` [PATCH v2 1/4] strbuf: improve strbuf_get*line documentation Thomas Rast
                           ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Thomas Rast @ 2012-02-23  9:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Jeff King, Jannis Pohlmann, git

So you all made very good points, and I don't want to repeat them.
Junio's analysis

> Perhaps we would want to squash something like this to the test to avoid
> "seq", using J6t's idea.  The issue is that we do not write the end of
> line for the long boundary (because it is hidden from us), keep reading
> and rejecting bogus letters, and then the tip is written without
> terminating the boundary line.  So checking boundary line is not a very
> useful test, but "fetch" and "list-heads" are.

is of course also correct for the case where the overlong line is in a
boundary commit.  (The analysis about the accidentally-leading '-' is
also correct and can create a boundary line where there was none, like
in the perl case.)

The patches in this round are:

[1] new; following up on Peff's complaint that I should document
    strbuf_getwholeline_fd, I found that the strbuf_get*line
    documentation was also lacking.

[2] previously (1/2), now with documentation added

[3] new; Junio corrected me on the ': > file' spelling, which I added
    because the start of the test used it too.  So let's just fix it
    outright to match the Git style all over.

[4] previously (2/2), now incorporating a strbuf_release and the
    suggestions for the tests as sent by Junio.  I upped the %0982 a
    bit as it felt *too* tailored for the old bug, and this way
    everyone can see immediately that it will exceed the 1024 char
    limit (without thinking about the length of a sha and accounting
    for \0 and \n).

Thomas Rast (4):
  strbuf: improve strbuf_get*line documentation
  bundle: put strbuf_readline_fd in strbuf.c with adjustments
  t5704: match tests to modern style
  bundle: use a strbuf to scan the log for boundary commits

 Documentation/technical/api-strbuf.txt |   19 +++++++++++--
 bundle.c                               |   36 +++++++-----------------
 strbuf.c                               |   16 +++++++++++
 strbuf.h                               |    1 +
 t/t5704-bundle.sh                      |   47 ++++++++++++++++----------------
 5 files changed, 67 insertions(+), 52 deletions(-)

-- 
1.7.9.1.430.g4998543

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/4] strbuf: improve strbuf_get*line documentation
  2012-02-23  9:42       ` [PATCH v2 0/4] Making an elephant out of a getline() bug Thomas Rast
@ 2012-02-23  9:42         ` Thomas Rast
  2012-02-23 10:08           ` Jeff King
  2012-02-23  9:42         ` [PATCH v2 2/4] bundle: put strbuf_readline_fd in strbuf.c with adjustments Thomas Rast
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Thomas Rast @ 2012-02-23  9:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Jeff King, Jannis Pohlmann, git

strbuf_getline() was not documented very clearly, though a reader
familiar with getline() would not have had any questions about it.
strbuf_getwholeline() was not documented at all.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 Documentation/technical/api-strbuf.txt |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index afe2759..cc6db5e 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -255,8 +255,16 @@ same behaviour as well.
 
 `strbuf_getline`::
 
-	Read a line from a FILE* pointer. The second argument specifies the line
-	terminator character, typically `'\n'`.
+	Read a line from a FILE*. The second argument specifies the
+	line terminator character, typically `'\n'`.  Reading stops
+	after the terminator or at EOF.  The terminator is removed
+	from the buffer before returning.  Returns 0 unless there was
+	nothing left before EOF, in which case it returns `EOF`.
+
+`strbuf_getwholeline`::
+	
+	Like `strbuf_getline`, but keeps the trailing terminator (if
+	any) in the buffer.
 
 `stripspace`::
 
-- 
1.7.9.1.430.g4998543

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/4] bundle: put strbuf_readline_fd in strbuf.c with adjustments
  2012-02-23  9:42       ` [PATCH v2 0/4] Making an elephant out of a getline() bug Thomas Rast
  2012-02-23  9:42         ` [PATCH v2 1/4] strbuf: improve strbuf_get*line documentation Thomas Rast
@ 2012-02-23  9:42         ` Thomas Rast
  2012-02-23  9:42         ` [PATCH v2 3/4] t5704: match tests to modern style Thomas Rast
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Thomas Rast @ 2012-02-23  9:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Jeff King, Jannis Pohlmann, git

The comment even said that it should eventually go there.  While at
it, match the calling convention and name of the function to the
strbuf_get*line family.  So it now is strbuf_getwholeline_fd.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 Documentation/technical/api-strbuf.txt |    7 +++++++
 bundle.c                               |   21 ++-------------------
 strbuf.c                               |   16 ++++++++++++++++
 strbuf.h                               |    1 +
 4 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt
index cc6db5e..827c4bd 100644
--- a/Documentation/technical/api-strbuf.txt
+++ b/Documentation/technical/api-strbuf.txt
@@ -266,6 +266,13 @@ same behaviour as well.
 	Like `strbuf_getline`, but keeps the trailing terminator (if
 	any) in the buffer.
 
+`strbuf_getwholeline_fd`::
+
+	Like `strbuf_getwholeline`, but operates on a file descriptor.
+	It reads one character at a time, so it is very slow.  Do not
+	use it unless you need the correct position in the file
+	descriptor!
+
 `stripspace`::
 
 	Strip whitespace from a buffer. The second parameter controls if
diff --git a/bundle.c b/bundle.c
index b8acf3c..313de42 100644
--- a/bundle.c
+++ b/bundle.c
@@ -23,23 +23,6 @@ static void add_to_ref_list(const unsigned char *sha1, const char *name,
 	list->nr++;
 }
 
-/* Eventually this should go to strbuf.[ch] */
-static int strbuf_readline_fd(struct strbuf *sb, int fd)
-{
-	strbuf_reset(sb);
-
-	while (1) {
-		char ch;
-		ssize_t len = xread(fd, &ch, 1);
-		if (len <= 0)
-			return len;
-		strbuf_addch(sb, ch);
-		if (ch == '\n')
-			break;
-	}
-	return 0;
-}
-
 static int parse_bundle_header(int fd, struct bundle_header *header,
 			       const char *report_path)
 {
@@ -47,7 +30,7 @@ static int parse_bundle_header(int fd, struct bundle_header *header,
 	int status = 0;
 
 	/* The bundle header begins with the signature */
-	if (strbuf_readline_fd(&buf, fd) ||
+	if (strbuf_getwholeline_fd(&buf, fd, '\n') ||
 	    strcmp(buf.buf, bundle_signature)) {
 		if (report_path)
 			error("'%s' does not look like a v2 bundle file",
@@ -57,7 +40,7 @@ static int parse_bundle_header(int fd, struct bundle_header *header,
 	}
 
 	/* The bundle header ends with an empty line */
-	while (!strbuf_readline_fd(&buf, fd) &&
+	while (!strbuf_getwholeline_fd(&buf, fd, '\n') &&
 	       buf.len && buf.buf[0] != '\n') {
 		unsigned char sha1[20];
 		int is_prereq = 0;
diff --git a/strbuf.c b/strbuf.c
index ff0b96b..5135d59 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -383,6 +383,22 @@ int strbuf_getline(struct strbuf *sb, FILE *fp, int term)
 	return 0;
 }
 
+int strbuf_getwholeline_fd(struct strbuf *sb, int fd, int term)
+{
+	strbuf_reset(sb);
+
+	while (1) {
+		char ch;
+		ssize_t len = xread(fd, &ch, 1);
+		if (len <= 0)
+			return EOF;
+		strbuf_addch(sb, ch);
+		if (ch == term)
+			break;
+	}
+	return 0;
+}
+
 int strbuf_read_file(struct strbuf *sb, const char *path, size_t hint)
 {
 	int fd, len;
diff --git a/strbuf.h b/strbuf.h
index fbf059f..3effaa8 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -116,6 +116,7 @@ static inline void strbuf_complete_line(struct strbuf *sb)
 
 extern int strbuf_getwholeline(struct strbuf *, FILE *, int);
 extern int strbuf_getline(struct strbuf *, FILE *, int);
+extern int strbuf_getwholeline_fd(struct strbuf *, int, int);
 
 extern void stripspace(struct strbuf *buf, int skip_comments);
 extern int launch_editor(const char *path, struct strbuf *buffer, const char *const *env);
-- 
1.7.9.1.430.g4998543

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/4] t5704: match tests to modern style
  2012-02-23  9:42       ` [PATCH v2 0/4] Making an elephant out of a getline() bug Thomas Rast
  2012-02-23  9:42         ` [PATCH v2 1/4] strbuf: improve strbuf_get*line documentation Thomas Rast
  2012-02-23  9:42         ` [PATCH v2 2/4] bundle: put strbuf_readline_fd in strbuf.c with adjustments Thomas Rast
@ 2012-02-23  9:42         ` Thomas Rast
  2012-02-23 23:36           ` Junio C Hamano
  2012-02-23  9:42         ` [PATCH v2 4/4] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
  2012-02-23 19:04         ` [PATCH v2 0/4] Making an elephant out of a getline() bug Junio C Hamano
  4 siblings, 1 reply; 23+ messages in thread
From: Thomas Rast @ 2012-02-23  9:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Jeff King, Jannis Pohlmann, git

The test did not adhere to the current style on several counts:

- empty lines around the test blocks, but within the test string

- ': > file' or even just '> file' with an extra space

- inconsistent indentation

- hand-rolled commits instead of using test_commit

Fix all of them.  There's a catch to the last point: test_commit
creates a tag.  We still change it to test_commit, and explicitly
delete the tags, so as to highlight that the test relies on not having
them.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 t/t5704-bundle.sh |   33 ++++++++-------------------------
 1 file changed, 8 insertions(+), 25 deletions(-)

diff --git a/t/t5704-bundle.sh b/t/t5704-bundle.sh
index 4ae127d..f35f559 100755
--- a/t/t5704-bundle.sh
+++ b/t/t5704-bundle.sh
@@ -4,59 +4,42 @@ test_description='some bundle related tests'
 . ./test-lib.sh
 
 test_expect_success 'setup' '
-
-	: > file &&
-	git add file &&
-	test_tick &&
-	git commit -m initial &&
+	test_commit initial &&
 	test_tick &&
 	git tag -m tag tag &&
-	: > file2 &&
-	git add file2 &&
-	: > file3 &&
-	test_tick &&
-	git commit -m second &&
-	git add file3 &&
-	test_tick &&
-	git commit -m third
-
+	test_commit second &&
+	test_commit third &&
+	git tag -d initial &&
+	git tag -d second &&
+	git tag -d third
 '
 
 test_expect_success 'tags can be excluded by rev-list options' '
-
 	git bundle create bundle --all --since=7.Apr.2005.15:16:00.-0700 &&
 	git ls-remote bundle > output &&
 	! grep tag output
-
 '
 
 test_expect_success 'die if bundle file cannot be created' '
-
 	mkdir adir &&
 	test_must_fail git bundle create adir --all
-
 '
 
 test_expect_failure 'bundle --stdin' '
-
 	echo master | git bundle create stdin-bundle.bdl --stdin &&
 	git ls-remote stdin-bundle.bdl >output &&
 	grep master output
-
 '
 
 test_expect_failure 'bundle --stdin <rev-list options>' '
-
 	echo master | git bundle create hybrid-bundle.bdl --stdin tag &&
 	git ls-remote hybrid-bundle.bdl >output &&
 	grep master output
-
 '
 
 test_expect_success 'empty bundle file is rejected' '
-
-    >empty-bundle && test_must_fail git fetch empty-bundle
-
+	: >empty-bundle &&
+	test_must_fail git fetch empty-bundle
 '
 
 test_done
-- 
1.7.9.1.430.g4998543

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/4] bundle: use a strbuf to scan the log for boundary commits
  2012-02-23  9:42       ` [PATCH v2 0/4] Making an elephant out of a getline() bug Thomas Rast
                           ` (2 preceding siblings ...)
  2012-02-23  9:42         ` [PATCH v2 3/4] t5704: match tests to modern style Thomas Rast
@ 2012-02-23  9:42         ` Thomas Rast
  2012-02-23 19:04         ` [PATCH v2 0/4] Making an elephant out of a getline() bug Junio C Hamano
  4 siblings, 0 replies; 23+ messages in thread
From: Thomas Rast @ 2012-02-23  9:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Jeff King, Jannis Pohlmann, git

The first part of the bundle header contains the boundary commits, and
could be approximated by

  # v2 git bundle
  $(git rev-list --pretty=oneline --boundary <ARGS> | grep ^-)

git-bundle actually spawns exactly this rev-list invocation, and does
the grepping internally.

There was a subtle bug in the latter step: it used fgets() with a
1024-byte buffer.  If the user has sufficiently long subjects (e.g.,
by not adhering to the git oneline-subject convention in the first
place), the 'oneline' format can easily overflow the buffer.  fgets()
then returns only the first part of the line _without a \n_, and it
would be printed as such, turning the next line into trailing garbage.
On top of that, fgets() returns the rest of the line in the next
call(s).  If one of these remaining parts starts with '-', git-bundle
would mistakenly insert it into the boundary list.

Fix it by using strbuf_getwholeline() instead, which handles arbitrary
line lengths correctly.

Note that on the receiving side in parse_bundle_header() we were
already using strbuf_getwholeline_fd(), so that part is safe.

Thanks to Peff, Johannes Sixt and Junio for suggestions.

Reported-by: Jannis Pohlmann <jannis.pohlmann@codethink.co.uk>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
 bundle.c          |   15 ++++++++-------
 t/t5704-bundle.sh |   16 ++++++++++++++++
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/bundle.c b/bundle.c
index 313de42..7a760db 100644
--- a/bundle.c
+++ b/bundle.c
@@ -234,7 +234,7 @@ int create_bundle(struct bundle_header *header, const char *path,
 	const char **argv_boundary = xmalloc((argc + 4) * sizeof(const char *));
 	const char **argv_pack = xmalloc(6 * sizeof(const char *));
 	int i, ref_count = 0;
-	char buffer[1024];
+	struct strbuf buf = STRBUF_INIT;
 	struct rev_info revs;
 	struct child_process rls;
 	FILE *rls_fout;
@@ -266,20 +266,21 @@ int create_bundle(struct bundle_header *header, const char *path,
 	if (start_command(&rls))
 		return -1;
 	rls_fout = xfdopen(rls.out, "r");
-	while (fgets(buffer, sizeof(buffer), rls_fout)) {
+	while (strbuf_getwholeline(&buf, rls_fout, '\n') != EOF) {
 		unsigned char sha1[20];
-		if (buffer[0] == '-') {
-			write_or_die(bundle_fd, buffer, strlen(buffer));
-			if (!get_sha1_hex(buffer + 1, sha1)) {
+		if (buf.len > 0 && buf.buf[0] == '-') {
+			write_or_die(bundle_fd, buf.buf, buf.len);
+			if (!get_sha1_hex(buf.buf + 1, sha1)) {
 				struct object *object = parse_object(sha1);
 				object->flags |= UNINTERESTING;
-				add_pending_object(&revs, object, buffer);
+				add_pending_object(&revs, object, buf.buf);
 			}
-		} else if (!get_sha1_hex(buffer, sha1)) {
+		} else if (!get_sha1_hex(buf.buf, sha1)) {
 			struct object *object = parse_object(sha1);
 			object->flags |= SHOWN;
 		}
 	}
+	strbuf_release(&buf);
 	fclose(rls_fout);
 	if (finish_command(&rls))
 		return error("rev-list died");
diff --git a/t/t5704-bundle.sh b/t/t5704-bundle.sh
index f35f559..a51c8b0 100755
--- a/t/t5704-bundle.sh
+++ b/t/t5704-bundle.sh
@@ -42,4 +42,20 @@ test_expect_success 'empty bundle file is rejected' '
 	test_must_fail git fetch empty-bundle
 '
 
+# This triggers a bug in older versions where the resulting line (with
+# --pretty=oneline) was longer than a 1024-char buffer.
+test_expect_success 'ridiculously long subject in boundary' '
+	: >file4 &&
+	test_tick &&
+	git add file4 &&
+	printf "%01200d\n" 0 | git commit -F - &&
+	test_commit fifth &&
+	git bundle create long-subject-bundle.bdl HEAD^..HEAD &&
+	git bundle list-heads long-subject-bundle.bdl >heads &&
+	test -s heads &&
+	git fetch long-subject-bundle.bdl &&
+	sed -n "/^-/{p;q}" long-subject-bundle.bdl >boundary &&
+	grep "^-$_x40 " boundary
+'
+
 test_done
-- 
1.7.9.1.430.g4998543

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/4] strbuf: improve strbuf_get*line documentation
  2012-02-23  9:42         ` [PATCH v2 1/4] strbuf: improve strbuf_get*line documentation Thomas Rast
@ 2012-02-23 10:08           ` Jeff King
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2012-02-23 10:08 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Junio C Hamano, Johannes Sixt, Jannis Pohlmann, git

On Thu, Feb 23, 2012 at 10:42:21AM +0100, Thomas Rast wrote:

> strbuf_getline() was not documented very clearly, though a reader
> familiar with getline() would not have had any questions about it.
> strbuf_getwholeline() was not documented at all.

Thanks for improving the existing docs. One suggestion:

>  `strbuf_getline`::
>  
> -	Read a line from a FILE* pointer. The second argument specifies the line
> -	terminator character, typically `'\n'`.
> +	Read a line from a FILE*. The second argument specifies the
> +	line terminator character, typically `'\n'`.  Reading stops
> +	after the terminator or at EOF.  The terminator is removed
> +	from the buffer before returning.  Returns 0 unless there was
> +	nothing left before EOF, in which case it returns `EOF`.

The get*line functions are unlike the rest of the strbuf API in that
they overwrite, rather than append to, the strbuf argument. Maybe:

  s/from a FILE\*/&, overwriting the existing contents of the strbuf/

?

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Problems with unrecognized headers in git bundles
  2012-02-22 20:25 ` Problems with unrecognized headers in git bundles Øyvind A. Holm
  2012-02-22 20:40   ` Øyvind A. Holm
@ 2012-02-23 13:27   ` Erik Faye-Lund
  1 sibling, 0 replies; 23+ messages in thread
From: Erik Faye-Lund @ 2012-02-23 13:27 UTC (permalink / raw)
  To: Øyvind A. Holm; +Cc: Jannis Pohlmann, git

On Wed, Feb 22, 2012 at 9:25 PM, Øyvind A. Holm <sunny@sunbase.org> wrote:
> On 22 February 2012 17:05, Jannis Pohlmann wrote:
>> Hi,
>>
>> creating bundles from some repositories seems to lead to bundles with
>> incorrectly formatted headers, at least with git >= 1.7.2. When
>> cloning from such bundles, git prints the following error/warning:
>>
>>  $ git clone perl-clone.bundle perl-clone
>>  Cloning into 'perl-clone'...
>>  warning: unrecognized header: --work around mangled archname on...
>>
>> This can be reproduced easily with git from any version >= 1.7.2 or
>> from master, using the following steps:
>>
>>  git clone git://perl5.git.perl.org/perl.git perl
>>  GIT_DIR=perl/.git git bundle create perl-clone.bundle --all
>>  git clone perl-clone.bundle perl-clone
>>
>> The content of the bundle is:
>>
>>  # v2 git bundle
>>  -- work around mangled archname on win32 while finding...
>>  39ec54a59ce332fc44e553f4e5eeceef88e8369e refs/heads/blead
>>  39ec54a59ce332fc44e553f4e5eeceef88e8369e refs/remotes/origin/HEAD
>
> Have researched this a bit, and I've found that all git versions back to
> when git-bundle was introduced (around v1.5.4) produces the same invalid
> line. The culprit is commit 3e8148feadabd0d0b1869fcc4d218a6475a5b0bc in
> perl.git, branch 'maint-5.005'. The log message of that commit contains
> email headers, maybe that's the reason git bundle gets confused?

For the lazy, the commit can be found here:

http://perl5.git.perl.org/perl.git/commit/3e8148feadabd0d0b1869fcc4d218a6475a5b0bc

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/4] Making an elephant out of a getline() bug
  2012-02-23  9:42       ` [PATCH v2 0/4] Making an elephant out of a getline() bug Thomas Rast
                           ` (3 preceding siblings ...)
  2012-02-23  9:42         ` [PATCH v2 4/4] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
@ 2012-02-23 19:04         ` Junio C Hamano
  4 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2012-02-23 19:04 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Johannes Sixt, Jeff King, Jannis Pohlmann, git

Thomas Rast <trast@student.ethz.ch> writes:

> So you all made very good points, and I don't want to repeat them.

I really wish you did not make an elephant out of this, or if you must,
kept the minimum bugfix part that must apply to older codebase and making
an elephant part clearly separable.

Anyway, I've reordered so that the bugfix part would apply to an older
maintenance release, and then the remainder applied on top would merge
cleanly to the bleeding edge, so there is no need to resend.

Thanks for working on this, everybody.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/4] t5704: match tests to modern style
  2012-02-23  9:42         ` [PATCH v2 3/4] t5704: match tests to modern style Thomas Rast
@ 2012-02-23 23:36           ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2012-02-23 23:36 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Johannes Sixt, Jeff King, Jannis Pohlmann, git

Thomas Rast <trast@student.ethz.ch> writes:

> Fix all of them.  There's a catch to the last point: test_commit
> creates a tag.  We still change it to test_commit, and explicitly
> delete the tags, so as to highlight that the test relies on not having
> them.

I do not have an objection to the use of these three test_commits, and I
do not have an objection to delete the extra three tags they create so
that the set of refs in the repository matches what the later test
expects, either.

But I found the explanation a bit iffy.

>  test_expect_success 'setup' '
> +	test_commit initial &&
>  	test_tick &&
>  	git tag -m tag tag &&
> +	test_commit second &&
> +	test_commit third &&
> +	git tag -d initial &&
> +	git tag -d second &&
> +	git tag -d third
>  '
>  
>  test_expect_success 'tags can be excluded by rev-list options' '
>  	git bundle create bundle --all --since=7.Apr.2005.15:16:00.-0700 &&
>  	git ls-remote bundle > output &&
>  	! grep tag output
>  '

If you do not delete 'third' tag, which matches the tip of the current
branch, the resulting bundle created with "create --all" would end up
containing that tag, and you will see it in the ls-remote output.

But the funny thing is that you can leave initial and second in the
repository and the resulting bundle still passes the test.  'initial',
'second' and 'tag' are excluded.  Exclusion of 'tag' tag (sheesh, it makes
this conversation more confusing than necessary) is what this test checks,
and the date range given to the rev-list ensures that initial and second
are not included.

But the tip of the current branch and the lightweight 'third' tag both
point at the same commit object, and that, together with the fact that we
ask for '--all', is the reason why it is included in the result, making
the test fail.

So perhaps a better fix may be to do something like the attached on top,
and rewrite everything after "Fix all of them".

	... Fix all of them.

        Also rename the manually created tag 'tag' that points at a commit
        that is older than the --since threshold a later test uses, to a
        more descriptive 'ancienttag', and update the check that reads
        from the resulting bundle with ls-remote to look for 'ancienttag'.
        The purpose of this test is to make sure that the tag that
        predates the date range is not in the resulting bundle, but
        because these test_commit also will create tags, one of which
        (namely, 'third') points at a commit that is newer than that
        threshold, ls-remote will list it in its output. Looking for 'tag'
        will match refs/tags/third, making the test incorrectly fail.

diff --git a/t/t5704-bundle.sh b/t/t5704-bundle.sh
index a51c8b0..3c436e7 100755
--- a/t/t5704-bundle.sh
+++ b/t/t5704-bundle.sh
@@ -6,18 +6,15 @@ test_description='some bundle related tests'
 test_expect_success 'setup' '
 	test_commit initial &&
 	test_tick &&
-	git tag -m tag tag &&
+	git tag -m tag antienttag &&
 	test_commit second &&
-	test_commit third &&
-	git tag -d initial &&
-	git tag -d second &&
-	git tag -d third
+	test_commit third
 '
 
 test_expect_success 'tags can be excluded by rev-list options' '
 	git bundle create bundle --all --since=7.Apr.2005.15:16:00.-0700 &&
 	git ls-remote bundle > output &&
-	! grep tag output
+	! grep antienttag output
 '
 
 test_expect_success 'die if bundle file cannot be created' '

^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-02-23 23:36 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-22 16:05 Problems with unrecognized headers in git bundles Jannis Pohlmann
2012-02-22 19:34 ` [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments Thomas Rast
2012-02-22 19:34   ` [PATCH 2/2] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
2012-02-22 20:22     ` Junio C Hamano
2012-02-22 20:25       ` Thomas Rast
2012-02-22 20:50         ` Junio C Hamano
2012-02-22 21:00         ` Jeff King
2012-02-22 20:55     ` Jeff King
2012-02-22 22:10       ` Johannes Sixt
2012-02-23  3:20     ` Junio C Hamano
2012-02-23  3:38     ` Junio C Hamano
2012-02-23  9:42       ` [PATCH v2 0/4] Making an elephant out of a getline() bug Thomas Rast
2012-02-23  9:42         ` [PATCH v2 1/4] strbuf: improve strbuf_get*line documentation Thomas Rast
2012-02-23 10:08           ` Jeff King
2012-02-23  9:42         ` [PATCH v2 2/4] bundle: put strbuf_readline_fd in strbuf.c with adjustments Thomas Rast
2012-02-23  9:42         ` [PATCH v2 3/4] t5704: match tests to modern style Thomas Rast
2012-02-23 23:36           ` Junio C Hamano
2012-02-23  9:42         ` [PATCH v2 4/4] bundle: use a strbuf to scan the log for boundary commits Thomas Rast
2012-02-23 19:04         ` [PATCH v2 0/4] Making an elephant out of a getline() bug Junio C Hamano
2012-02-22 20:51   ` [PATCH 1/2] bundle: put strbuf_readline_fd in strbuf.c with adjustments Jeff King
2012-02-22 20:25 ` Problems with unrecognized headers in git bundles Øyvind A. Holm
2012-02-22 20:40   ` Øyvind A. Holm
2012-02-23 13:27   ` Erik Faye-Lund

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.