git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Weird growth in packfile during initial push
@ 2009-04-15 18:27 Robin H. Johnson
  2009-04-15 19:51 ` Nicolas Pitre
  0 siblings, 1 reply; 16+ messages in thread
From: Robin H. Johnson @ 2009-04-15 18:27 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2285 bytes --]

I was doing a more recent conversion of the Gentoo repo, and ran into
some odd behavior in the packfile size.

For anybody else following the repo, you can now get it on the new hardware at:
http://git-exp.overlays.gentoo.org/gitweb/?p=exp/gentoo-x86.git;a=summary

I did the conversion with cvs2svn, packed, added the remote and pushed, only to
find that the pack on the remote side suddenly seemed to be ~60MiB larger.

$ time git repack -adf --window=250 --depth=250
real    19m59.339s
user    96m48.011s
sys     0m36.914s

$ ls -la /tmp/convert/gentoo-x86-cvs2git/.git/objects/pack
total 903804
drwxr-xr-x 2 robbat2 users       119 Apr 14 08:05 .
drwxr-xr-x 4 robbat2 users        28 Apr 14 08:05 ..
-r--r--r-- 1 robbat2 users 139155472 Apr 14 08:05 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
-r--r--r-- 1 robbat2 users 786336481 Apr 14 08:05 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack

$ git remote add origin git+ssh://git@git-exp.overlays.gentoo.org/exp/gentoo-x86.git
$ git push origin master:master
Initialized empty Git repository in /var/gitroot/exp/gentoo-x86.git/
Counting objects: 4969800, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (1217809/1217809), done.
Writing objects: 100% (4969800/4969800), 810.56 MiB | 21608 KiB/s, done.
Total 4969800 (delta 3735812), reused 4969800 (delta 3735812)
To git+ssh://git@git-exp.overlays.gentoo.org/exp/gentoo-x86.git
 * [new branch]      master -> master

$ ls -la /var/gitroot/exp/gentoo-x86.git/objects/pack
total 966876
drwxr-xr-x 2 git git      4096 Apr 14 08:43 .
drwxr-xr-x 4 git git      4096 Apr 14 08:35 ..
-r--r--r-- 1 git git 139155472 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
-r--r--r-- 1 git git 849936308 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack

On the client side after the initial clone, it DOES match (in size) what was
cloned.

(If you're looking for the 849MB one right now, I'll have to get it back for
you, I wanted to save that extra space so just did an rsync of the other pack
over the too-large one for now).

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

[-- Attachment #2: Type: application/pgp-signature, Size: 330 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Weird growth in packfile during initial push
  2009-04-15 18:27 Weird growth in packfile during initial push Robin H. Johnson
@ 2009-04-15 19:51 ` Nicolas Pitre
  2009-04-29 23:57   ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2009-04-15 19:51 UTC (permalink / raw)
  To: Robin H. Johnson; +Cc: git

On Wed, 15 Apr 2009, Robin H. Johnson wrote:

> I was doing a more recent conversion of the Gentoo repo, and ran into
> some odd behavior in the packfile size.
> 
> For anybody else following the repo, you can now get it on the new hardware at:
> http://git-exp.overlays.gentoo.org/gitweb/?p=exp/gentoo-x86.git;a=summary
> 
> I did the conversion with cvs2svn, packed, added the remote and pushed, only to
> find that the pack on the remote side suddenly seemed to be ~60MiB larger.

Hmmm.

> $ ls -la /tmp/convert/gentoo-x86-cvs2git/.git/objects/pack
> total 903804
> drwxr-xr-x 2 robbat2 users       119 Apr 14 08:05 .
> drwxr-xr-x 4 robbat2 users        28 Apr 14 08:05 ..
> -r--r--r-- 1 robbat2 users 139155472 Apr 14 08:05 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
> -r--r--r-- 1 robbat2 users 786336481 Apr 14 08:05 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack
> 
> $ git remote add origin git+ssh://git@git-exp.overlays.gentoo.org/exp/gentoo-x86.git
> $ git push origin master:master
> Initialized empty Git repository in /var/gitroot/exp/gentoo-x86.git/
> Counting objects: 4969800, done.
> Delta compression using up to 8 threads.
> Compressing objects: 100% (1217809/1217809), done.
> Writing objects: 100% (4969800/4969800), 810.56 MiB | 21608 KiB/s, done.
> Total 4969800 (delta 3735812), reused 4969800 (delta 3735812)

Here we know for sure that all objects were directly reused, so no 
attempt at recompressing them was done.  The only thing that 
pack-objects might do in this case in addition to directly streaming the 
existing pack is to convert delta object headers from OFS_DELTA to 
REF_DELTA.

> $ ls -la /var/gitroot/exp/gentoo-x86.git/objects/pack
> total 966876
> drwxr-xr-x 2 git git      4096 Apr 14 08:43 .
> drwxr-xr-x 4 git git      4096 Apr 14 08:35 ..
> -r--r--r-- 1 git git 139155472 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
> -r--r--r-- 1 git git 849936308 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack

Let's see if my theory stands:

	849936308 - 786336481 = 63599827
	63599827 / 3735812 = 17.02

Hence an average difference of 17 bytes per delta.  Given that REF_DELTA 
objects have a 20-byte SHA1 base reference which is replaced with a 
variable length encoding of a pack offset in the OFS_DELTA case, we're 
talking about 2.98 bytes for that offset encoding which feels about 
right.

[...]

And the code matches this theory as well.  Can you try this patch if you 
have a chance?

diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index 91c3651..e41adbf 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -44,12 +44,16 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		"--stdout",
 		NULL,
 		NULL,
+		NULL,
 	};
 	struct child_process po;
 	int i;
 
+	i = 4;
 	if (args->use_thin_pack)
-		argv[4] = "--thin";
+		argv[i++] = "--thin";
+	if (args->use_ofs_delta)
+		argv[i++] = "--delta-base-offset";
 	memset(&po, 0, sizeof(po));
 	po.argv = argv;
 	po.in = -1;
@@ -316,6 +320,8 @@ int send_pack(struct send_pack_args *args,
 		ask_for_status_report = 1;
 	if (server_supports("delete-refs"))
 		allow_deleting_refs = 1;
+	if (server_supports("ofs-delta"))
+		args->use_ofs_delta = 1;
 
 	if (!remote_refs) {
 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
diff --git a/send-pack.h b/send-pack.h
index 83d76c7..1d7b1b3 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -6,6 +6,7 @@ struct send_pack_args {
 		send_mirror:1,
 		force_update:1,
 		use_thin_pack:1,
+		use_ofs_delta:1,
 		dry_run:1;
 };
 


Nicolas

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Weird growth in packfile during initial push
  2009-04-15 19:51 ` Nicolas Pitre
@ 2009-04-29 23:57   ` Junio C Hamano
  2009-04-30  2:52     ` Nicolas Pitre
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-04-29 23:57 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Robin H. Johnson, git

Nicolas Pitre <nico@cam.org> writes:

>> $ git push origin master:master
>> Initialized empty Git repository in /var/gitroot/exp/gentoo-x86.git/
>> Counting objects: 4969800, done.
>> Delta compression using up to 8 threads.
>> Compressing objects: 100% (1217809/1217809), done.
>> Writing objects: 100% (4969800/4969800), 810.56 MiB | 21608 KiB/s, done.
>> Total 4969800 (delta 3735812), reused 4969800 (delta 3735812)
>
> Here we know for sure that all objects were directly reused, so no 
> attempt at recompressing them was done.  The only thing that 
> pack-objects might do in this case in addition to directly streaming the 
> existing pack is to convert delta object headers from OFS_DELTA to 
> REF_DELTA.
>
>> $ ls -la /var/gitroot/exp/gentoo-x86.git/objects/pack
>> total 966876
>> drwxr-xr-x 2 git git      4096 Apr 14 08:43 .
>> drwxr-xr-x 4 git git      4096 Apr 14 08:35 ..
>> -r--r--r-- 1 git git 139155472 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.idx
>> -r--r--r-- 1 git git 849936308 Apr 14 08:43 pack-f805bb448f864becfeac9c7f8a8ac2ef90c26787.pack
>
> Let's see if my theory stands:
>
> 	849936308 - 786336481 = 63599827
> 	63599827 / 3735812 = 17.02
>
> Hence an average difference of 17 bytes per delta.  Given that REF_DELTA 
> objects have a 20-byte SHA1 base reference which is replaced with a 
> variable length encoding of a pack offset in the OFS_DELTA case, we're 
> talking about 2.98 bytes for that offset encoding which feels about 
> right.
>
> [...]
>
> And the code matches this theory as well.  Can you try this patch if you 
> have a chance?

Is there any progress on this?

I think you did a veryclear analysis.  8% size reduction is not only
unignorable but use of delta offset should also help runtime efficiency,
right?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Weird growth in packfile during initial push
  2009-04-29 23:57   ` Junio C Hamano
@ 2009-04-30  2:52     ` Nicolas Pitre
  2009-05-01  6:17     ` Robin H. Johnson
  2009-05-01 20:56     ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
  2 siblings, 0 replies; 16+ messages in thread
From: Nicolas Pitre @ 2009-04-30  2:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin H. Johnson, git

On Wed, 29 Apr 2009, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > And the code matches this theory as well.  Can you try this patch if you 
> > have a chance?
> 
> Is there any progress on this?

I'll try to find 5 min tomorrow to test the patch.


Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Weird growth in packfile during initial push
  2009-04-29 23:57   ` Junio C Hamano
  2009-04-30  2:52     ` Nicolas Pitre
@ 2009-05-01  6:17     ` Robin H. Johnson
  2009-05-01 20:56     ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
  2 siblings, 0 replies; 16+ messages in thread
From: Robin H. Johnson @ 2009-05-01  6:17 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List; +Cc: Nicolas Pitre, Robin H. Johnson

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

On Wed, Apr 29, 2009 at 04:57:37PM -0700, Junio C Hamano wrote:
> > And the code matches this theory as well.  Can you try this patch if you 
> > have a chance?
> Is there any progress on this?
Sorry, I was just away for 2 weeks, only got back late yesterday. I'll
try to get to it in the next few days unless Nicolas beats me to it.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

[-- Attachment #2: Type: application/pgp-signature, Size: 330 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] allow OFS_DELTA objects during a push
  2009-04-29 23:57   ` Junio C Hamano
  2009-04-30  2:52     ` Nicolas Pitre
  2009-05-01  6:17     ` Robin H. Johnson
@ 2009-05-01 20:56     ` Nicolas Pitre
  2009-05-01 23:49       ` Junio C Hamano
  2009-05-04 22:11       ` Shawn O. Pearce
  2 siblings, 2 replies; 16+ messages in thread
From: Nicolas Pitre @ 2009-05-01 20:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin H. Johnson, git

The fetching of OFS_DELTA objects has been negotiated between both peers 
since git version 1.4.4.  However, this was missing from the push side 
where every OFS_DELTA objects were always converted to REF_DELTA objects 
causing an increase in transferred data.

To fix this, both the client and the server processes have to be 
modified: the former to invoke pack-objects with --delta-base-offset 
when the server provides the ofs-delta capability, and the later to send 
that capability when OFS_DELTA objects are allowed as already indicated 
by the repack.usedeltabaseoffset config variable which is TRUE by 
default since git v1.6.0.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---

On Wed, 29 Apr 2009, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > Hence an average difference of 17 bytes per delta.  Given that REF_DELTA 
> > objects have a 20-byte SHA1 base reference which is replaced with a 
> > variable length encoding of a pack offset in the OFS_DELTA case, we're 
> > talking about 2.98 bytes for that offset encoding which feels about 
> > right.
> >
> > [...]
> >
> > And the code matches this theory as well.  Can you try this patch if you 
> > have a chance?
> 
> Is there any progress on this?
> 
> I think you did a veryclear analysis.  8% size reduction is not only
> unignorable but use of delta offset should also help runtime efficiency,
> right?

Indeed.

Here's the final patch.  My initial one didn't work because the server 
side didn't advertise the needed capability.  So both sides will have to 
be updated for pushes with OFS_DELTA to kick in.

 builtin-receive-pack.c |   22 +++++++++++++++-------
 builtin-send-pack.c    |    8 +++++++-
 send-pack.h            |    1 +
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/builtin-receive-pack.c b/builtin-receive-pack.c
index a970b39..4b9d921 100644
--- a/builtin-receive-pack.c
+++ b/builtin-receive-pack.c
@@ -27,10 +27,9 @@ static int receive_unpack_limit = -1;
 static int transfer_unpack_limit = -1;
 static int unpack_limit = 100;
 static int report_status;
+static int prefer_ofs_delta = 1;
 static const char *head_name;
-
-static char capabilities[] = " report-status delete-refs ";
-static int capabilities_sent;
+static char *capabilities_to_send;
 
 static enum deny_action parse_deny_action(const char *var, const char *value)
 {
@@ -84,24 +83,29 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "repack.usedeltabaseoffset") == 0) {
+		prefer_ofs_delta = git_config_bool(var, value);
+		return 0;
+	}
+
 	return git_default_config(var, value, cb);
 }
 
 static int show_ref(const char *path, const unsigned char *sha1, int flag, void *cb_data)
 {
-	if (capabilities_sent)
+	if (!capabilities_to_send)
 		packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
 	else
 		packet_write(1, "%s %s%c%s\n",
-			     sha1_to_hex(sha1), path, 0, capabilities);
-	capabilities_sent = 1;
+			     sha1_to_hex(sha1), path, 0, capabilities_to_send);
+	capabilities_to_send = NULL;
 	return 0;
 }
 
 static void write_head_info(void)
 {
 	for_each_ref(show_ref, NULL);
-	if (!capabilities_sent)
+	if (capabilities_to_send)
 		show_ref("capabilities^{}", null_sha1, 0, NULL);
 
 }
@@ -687,6 +691,10 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 	else if (0 <= receive_unpack_limit)
 		unpack_limit = receive_unpack_limit;
 
+	capabilities_to_send = (prefer_ofs_delta) ?
+		" report-status delete-refs ofs-delta " :
+		" report-status delete-refs ";
+
 	add_alternate_refs();
 	write_head_info();
 	clear_extra_refs();
diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index d5a1c48..473a3de 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -43,12 +43,16 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		"--stdout",
 		NULL,
 		NULL,
+		NULL,
 	};
 	struct child_process po;
 	int i;
 
+	i = 4;
 	if (args->use_thin_pack)
-		argv[4] = "--thin";
+		argv[i++] = "--thin";
+	if (args->use_ofs_delta)
+		argv[i++] = "--delta-base-offset";
 	memset(&po, 0, sizeof(po));
 	po.argv = argv;
 	po.in = -1;
@@ -315,6 +319,8 @@ int send_pack(struct send_pack_args *args,
 		ask_for_status_report = 1;
 	if (server_supports("delete-refs"))
 		allow_deleting_refs = 1;
+	if (server_supports("ofs-delta"))
+		args->use_ofs_delta = 1;
 
 	if (!remote_refs) {
 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
diff --git a/send-pack.h b/send-pack.h
index 83d76c7..1d7b1b3 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -6,6 +6,7 @@ struct send_pack_args {
 		send_mirror:1,
 		force_update:1,
 		use_thin_pack:1,
+		use_ofs_delta:1,
 		dry_run:1;
 };
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] allow OFS_DELTA objects during a push
  2009-05-01 20:56     ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
@ 2009-05-01 23:49       ` Junio C Hamano
  2009-05-02  0:01         ` Compatibility between git.git and jgit Shawn O. Pearce
  2009-05-02  0:24         ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
  2009-05-04 22:11       ` Shawn O. Pearce
  1 sibling, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-05-01 23:49 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Robin H. Johnson, git

Thanks.

The code looks correct, I am reasonably sure updated server-client
combination would work fine, and use of the capability mechanism means
other combinations like old pusher and new receiver, and/or new pusher and
old receiver, should be also Ok.

I see Shawn did the same to jgit.

But I'd like to queue this in 'next', and make it official after 1.6.3
happens.

I just do not want to repeat silly mistakes, this close to the final,
similar to the "github needs to get stuck forever at 1.6.1" we made with
40c155f (push: prepare sender to receive extended ref information from the
receiver, 2008-09-09); it was done as a good change after a discussion
among Shawn, Daniel and I.  We managed to botch it and had to later fix
with 02322e1 (send-pack: do not send unknown object name from ".have" to
pack-objects, 2009-01-27).

(references)

  http://thread.gmane.org/gmane.comp.version-control.git/95351
  http://thread.gmane.org/gmane.comp.version-control.git/95072
  http://thread.gmane.org/gmane.comp.version-control.git/107417/focus=107500

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Compatibility between git.git and jgit
  2009-05-01 23:49       ` Junio C Hamano
@ 2009-05-02  0:01         ` Shawn O. Pearce
  2009-05-02  1:14           ` A Large Angry SCM
                             ` (2 more replies)
  2009-05-02  0:24         ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
  1 sibling, 3 replies; 16+ messages in thread
From: Shawn O. Pearce @ 2009-05-02  0:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git

Junio C Hamano <gitster@pobox.com> wrote:
> The code looks correct, I am reasonably sure updated server-client
> combination would work fine, and use of the capability mechanism means
> other combinations like old pusher and new receiver, and/or new pusher and
> old receiver, should be also Ok.
> 
> I see Shawn did the same to jgit.

On an unrelated note, someone asked me recently, how do we ensure
compatibility in implementations between git.git and jgit?

There isn't exactly a great notion of "a Git implementation can do
X, Y, Z, and never does Q".  So its not like we have a compability
test suite to run between the two systems.

JGit is really starting to gain some traction in the open source
world.

A lot of folks at Eclipse are really excited about being able to
ship a BSD licensed VCS.  Some of the Maven folks are really excited
about being able to link JGit up to Apache MINA SSHD and have a 100%
pure Java server solution for Git, that doesn't require native OS
authentication systems.  Gerrit Code Review relies entirely on it,
and some folks within Google are now using Gerrit Code Review and
its embedded MINA SSHD/JGit server as their only Git daemon.

Thus far, our compatibility story with git.git has been, "it should
work, uh, we think, because Shawn understands git reasonably well,
and wrote some of JGit, so uh, yea....".  :-)

But I think in another 12 months we'll be seeing people running
only JGit in many contexts, making compatibility between the two
implementations somewhat more important than it has been in the past.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] allow OFS_DELTA objects during a push
  2009-05-01 23:49       ` Junio C Hamano
  2009-05-02  0:01         ` Compatibility between git.git and jgit Shawn O. Pearce
@ 2009-05-02  0:24         ` Nicolas Pitre
  1 sibling, 0 replies; 16+ messages in thread
From: Nicolas Pitre @ 2009-05-02  0:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin H. Johnson, git

On Fri, 1 May 2009, Junio C Hamano wrote:

> Thanks.
> 
> The code looks correct, I am reasonably sure updated server-client
> combination would work fine, and use of the capability mechanism means
> other combinations like old pusher and new receiver, and/or new pusher and
> old receiver, should be also Ok.
> 
> I see Shawn did the same to jgit.
> 
> But I'd like to queue this in 'next', and make it official after 1.6.3
> happens.
> 
> I just do not want to repeat silly mistakes, this close to the final,
> similar to the "github needs to get stuck forever at 1.6.1" we made with
> 40c155f (push: prepare sender to receive extended ref information from the
> receiver, 2008-09-09); it was done as a good change after a discussion
> among Shawn, Daniel and I.

No problem.  Since this is fixing something that actually never worked 
before, it is therefore not what I would call a critical fix.


Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Compatibility between git.git and jgit
  2009-05-02  0:01         ` Compatibility between git.git and jgit Shawn O. Pearce
@ 2009-05-02  1:14           ` A Large Angry SCM
  2009-05-02  1:39           ` Nicolas Pitre
  2009-05-02  1:40           ` Michael Witten
  2 siblings, 0 replies; 16+ messages in thread
From: A Large Angry SCM @ 2009-05-02  1:14 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, Nicolas Pitre, git

Shawn O. Pearce wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
>> The code looks correct, I am reasonably sure updated server-client
>> combination would work fine, and use of the capability mechanism means
>> other combinations like old pusher and new receiver, and/or new pusher and
>> old receiver, should be also Ok.
>>
>> I see Shawn did the same to jgit.
> 
> On an unrelated note, someone asked me recently, how do we ensure
> compatibility in implementations between git.git and jgit?
> 
> There isn't exactly a great notion of "a Git implementation can do
> X, Y, Z, and never does Q".  So its not like we have a compability
> test suite to run between the two systems.
> 
> JGit is really starting to gain some traction in the open source
> world.
> 
> A lot of folks at Eclipse are really excited about being able to
> ship a BSD licensed VCS.  Some of the Maven folks are really excited
> about being able to link JGit up to Apache MINA SSHD and have a 100%
> pure Java server solution for Git, that doesn't require native OS
> authentication systems.  Gerrit Code Review relies entirely on it,
> and some folks within Google are now using Gerrit Code Review and
> its embedded MINA SSHD/JGit server as their only Git daemon.
> 
> Thus far, our compatibility story with git.git has been, "it should
> work, uh, we think, because Shawn understands git reasonably well,
> and wrote some of JGit, so uh, yea....".  :-)
> 
> But I think in another 12 months we'll be seeing people running
> only JGit in many contexts, making compatibility between the two
> implementations somewhat more important than it has been in the past.
> 

[A non-answer to this implied question]

Usually one of the implementations is declared "the reference 
implementation".

[And another question]

Hasn't this issue come up before in the mailing list?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Compatibility between git.git and jgit
  2009-05-02  0:01         ` Compatibility between git.git and jgit Shawn O. Pearce
  2009-05-02  1:14           ` A Large Angry SCM
@ 2009-05-02  1:39           ` Nicolas Pitre
  2009-05-02  1:59             ` Shawn O. Pearce
  2009-05-02 16:56             ` Ealdwulf Wuffinga
  2009-05-02  1:40           ` Michael Witten
  2 siblings, 2 replies; 16+ messages in thread
From: Nicolas Pitre @ 2009-05-02  1:39 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git

On Fri, 1 May 2009, Shawn O. Pearce wrote:

> On an unrelated note, someone asked me recently, how do we ensure
> compatibility in implementations between git.git and jgit?
> 
> There isn't exactly a great notion of "a Git implementation can do
> X, Y, Z, and never does Q".  So its not like we have a compability
> test suite to run between the two systems.
> 
> JGit is really starting to gain some traction in the open source
> world.

Well... this is not exactly easy.  As I said in the past 
(http://marc.info/?l=git&m=121035043412788&w=2), I think that the C 
version must remain the reference with regards to protocols and on-disk 
data structures.  If people go wild with JGit and start making changes 
to data structures then it simply won't be Git compatible anymore and 
the user base will get fragmented.  A good way to prevent this is for 
people interested in making git compatible tools to monitor and interact 
on the git mailing list, however if we look at the results from some 
past GSOC projects we must conclude that not everyone is giving enough 
consideration to that.

A formal compatibility test suite would imply that every Git 
reimplementation should be compatible with the reference C version.  
You could add some tests in your test suite which are performed in 
parallel using JGit and the C git, and make sure that the produced 
results are identical, etc.

But to which extent should the C version remain backward compatible with 
other implementations?  Let's suppose a future protocol extension is 
made and old unsuspecting C clients work just fine but some other 
implementation crashes with it?  The more you have reimplementations of 
Git, the greater is the possibility for one of them to be flawed and 
buggier in one way or another but happened to just work with older C git 
versions.  And the reference implementation cannot be held back because 
of bugs in all alternative implementations.

> A lot of folks at Eclipse are really excited about being able to
> ship a BSD licensed VCS.  Some of the Maven folks are really excited
> about being able to link JGit up to Apache MINA SSHD and have a 100%
> pure Java server solution for Git, that doesn't require native OS
> authentication systems.  Gerrit Code Review relies entirely on it,
> and some folks within Google are now using Gerrit Code Review and
> its embedded MINA SSHD/JGit server as their only Git daemon.

As long as they're futzing^Wdeveloping on top of Jgit then 
interoperability shouldn't be at risk.  If people would start adding new 
object types and pack formats and the like without obtaining a consensus 
with people around the C version then I might get extremely worried (and 
pissed) though.

> Thus far, our compatibility story with git.git has been, "it should
> work, uh, we think, because Shawn understands git reasonably well,
> and wrote some of JGit, so uh, yea....".  :-)

If that works, and as I know you I'm sure that works great, then maybe 
this should just continue that way for as long as it is workable.

> But I think in another 12 months we'll be seeing people running
> only JGit in many contexts, making compatibility between the two
> implementations somewhat more important than it has been in the past.

One defensive approach we could adopt is to use a capability slot to 
identify the software version of each peer involved in the network 
communication.  The advantage would be for a later Git version to avoid 
doing some things that are known to break with client X or Y.  Of course 
even such a scheme can be abused and misused, like on some web sites if 
you don't have the "right" browser, leading some of them to allow faking 
the User-Agent string, etc.  But maybe the upsides are more important 
than the downsides.  This doesn't help with on-disk interoperability, 
but this is probably less important than communication interoperability.


Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Compatibility between git.git and jgit
  2009-05-02  0:01         ` Compatibility between git.git and jgit Shawn O. Pearce
  2009-05-02  1:14           ` A Large Angry SCM
  2009-05-02  1:39           ` Nicolas Pitre
@ 2009-05-02  1:40           ` Michael Witten
  2 siblings, 0 replies; 16+ messages in thread
From: Michael Witten @ 2009-05-02  1:40 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, Nicolas Pitre, git

On Fri, May 1, 2009 at 19:01, Shawn O. Pearce <spearce@spearce.org> wrote:
>
> But I think in another 12 months we'll be seeing people running
> only JGit in many contexts, making compatibility between the two
> implementations somewhat more important than it has been in the past.

:-D

Getting a little cocky there, eh Shawn?

Interestingly, you don't say "making compatibility **with git.git**
more important"...

Watch your back Junio!

:-D

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Compatibility between git.git and jgit
  2009-05-02  1:39           ` Nicolas Pitre
@ 2009-05-02  1:59             ` Shawn O. Pearce
  2009-05-02 16:56             ` Ealdwulf Wuffinga
  1 sibling, 0 replies; 16+ messages in thread
From: Shawn O. Pearce @ 2009-05-02  1:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git

Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 1 May 2009, Shawn O. Pearce wrote:
> 
> > On an unrelated note, someone asked me recently, how do we ensure
> > compatibility in implementations between git.git and jgit?
> 
> Well... this is not exactly easy.  As I said in the past 
> (http://marc.info/?l=git&m=121035043412788&w=2), I think that the C 
> version must remain the reference with regards to protocols and on-disk 
> data structures.

I agree fully.

> If people go wild with JGit and start making changes 
> to data structures then it simply won't be Git compatible anymore and 
> the user base will get fragmented.

Agree.  We may see some prototyping happen in JGit first on some
topics, and JGit may even support something earlier than git.git,
e.g JGit has an amazon-s3:// transport that git.git doesn't have.
But it also isn't widely used.

> A formal compatibility test suite would imply that every Git 
> reimplementation should be compatible with the reference C version.  
> You could add some tests in your test suite which are performed in 
> parallel using JGit and the C git, and make sure that the produced 
> results are identical, etc.

Yea, and to some extent we try to do that already in JGit, but our
tests aren't complete enough in that area.
 
> But to which extent should the C version remain backward compatible with 
> other implementations?  Let's suppose a future protocol extension is 
> made and old unsuspecting C clients work just fine but some other 
> implementation crashes with it?

This is what I think scares both myself and the folks that have
recently asked me about compatibility.

If JGit gets a broader user base, and suddenly it stops working
against a newer C git-daemon because of a protocol change, those
users are going to be pissed.  Its no worse than the "github can't
ever upgrade past 1.6.1" issue we had not too long ago.

I think we're doing better these days about embedding file format
version numbers into files (e.g. pack idx v2) to help alert older
clients that the format is different.  But we also have a something
of a history of looking for "holes" in older C git parsers in
order to wedge in new features where we didn't plan for them in
the first place.  E.g. the protocol capability slots we have now.

I think that as reimplementations become more popular, we need to
rely less on extending things by exploiting parser quirks in older
C git.git code, and rely more on at least explicit version markers
that everyone can work with.

> And the reference implementation cannot be held back because 
> of bugs in all alternative implementations.

I agree.  A bug is a bug.  But I'd really like to get away from the
trend where we exploit bugs in older C git.git implementations to
add new functionality, because maybe JGit doesn't have that same
bug and will fall flat on its face with that exploit.

> As long as they're futzing^Wdeveloping on top of Jgit then 
> interoperability shouldn't be at risk.  If people would start adding new 
> object types and pack formats and the like without obtaining a consensus 
> with people around the C version then I might get extremely worried (and 
> pissed) though.

That's why JGit is BSD, so everyone can use the one f'king library
and not risk fragmenting the Java market further.

But yea, I'd be really pissed too if someone hacked up JGit and made
it incompatible with anything else.  Its a risk that the liberal
BSD license permits.

I'm really sort of hoping that the development momentum around
git.git and JGit trying to keep up will keep them coming back
to the canonical JGit for updates, forcing them to give back any
hacks^Wimprovements they have made.  If the improvements really are
worthwhile, they can be easily ported over to C before they become
widely used in JGit.
 
> One defensive approach we could adopt is to use a capability slot to 
> identify the software version of each peer involved in the network 
> communication.  The advantage would be for a later Git version to avoid 
> doing some things that are known to break with client X or Y.  Of course 
> even such a scheme can be abused and misused, like on some web sites if 
> you don't have the "right" browser, leading some of them to allow faking 
> the User-Agent string, etc.  But maybe the upsides are more important 
> than the downsides.  This doesn't help with on-disk interoperability, 
> but this is probably less important than communication interoperability.

Blargh.  I'm with you about the whole User-Agent mess.

Asking clients and servers to identify with implementation and
version markers might be useful for analysis of who-is-using-what,
but I don't think its a good way to negotiate between the peers of
what functionality to enable or disable, or what bug workarounds
to use.  Reminds me of the Apache hack during output to work around
an HTTP header parsing bug in Netscape 2 when the "\r\n" pair was
exactly at byte 256 in the stream.  *shudder*


FWIW, an EGit user recently complained that some random Git hosting
site they were using couldn't work with EGit, but EGit worked fine
with other sites, e.g. GitHub.  Apparently this site's SSH forced command
filter script didn't like EGit asking for "git upload-pack 'path.git'".

Its not strictly a Git protocol issue, how the client launches
the remote process over SSH, but this random hosting site was
apparently relying on C git's current calling convention of
"git-upload-pack 'path.git'".

Long story short, I claimed it was the hosting site's bug.  :-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Compatibility between git.git and jgit
  2009-05-02  1:39           ` Nicolas Pitre
  2009-05-02  1:59             ` Shawn O. Pearce
@ 2009-05-02 16:56             ` Ealdwulf Wuffinga
  1 sibling, 0 replies; 16+ messages in thread
From: Ealdwulf Wuffinga @ 2009-05-02 16:56 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Shawn O. Pearce, Junio C Hamano, git

On Sat, May 2, 2009 at 2:39 AM, Nicolas Pitre <nico@cam.org> wrote:
>
> A formal compatibility test suite would imply that every Git
> reimplementation should be compatible with the reference C version.
> You could add some tests in your test suite which are performed in
> parallel using JGit and the C git, and make sure that the produced
> results are identical, etc.

 If at all possible, it would be a good idea to make it trivial for
new tests in the usual
git testsuite to be compatability tests (in a special mode, since it
would probably slow them down drastically). Ie, we have special
separate copy of all the git.git executables, which
underneath run two different versions of git and check that they did
the same thing.
Or alternatively wait for the librarification of git.git to complete,
and do it at the level of that API.

This may be hideously slow unless you have some kind of snapshotting filesystem
underneath. (Ironically the one that springs to mind is built into
vesta, another scm: http://www.vestasys.org. Vesta's
filesystem-manipulation language would be ideal
for this. Maybe you could copy it;  it's LGPL).

It's less obvious how networking related tests could be automatically
made into compatability
tests.

Doing this would be a lot trickier than writing some new conformance
tests, but once it was working, it would be a lot easier to keep track
of new features. It may be too tricky
to get to work, but it strikes me as worth thinking about.

Ealdwulf

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] allow OFS_DELTA objects during a push
  2009-05-01 20:56     ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
  2009-05-01 23:49       ` Junio C Hamano
@ 2009-05-04 22:11       ` Shawn O. Pearce
  2009-05-04 22:30         ` Shawn O. Pearce
  1 sibling, 1 reply; 16+ messages in thread
From: Shawn O. Pearce @ 2009-05-04 22:11 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, Robin H. Johnson, git

Nicolas Pitre <nico@cam.org> wrote:
> The fetching of OFS_DELTA objects has been negotiated between both peers 
> since git version 1.4.4.  However, this was missing from the push side 
> where every OFS_DELTA objects were always converted to REF_DELTA objects 
> causing an increase in transferred data.

Folks, this may have broken git push for me.

I'm trying to debug it right now, but something in next between
46488d2 and 03e1664 has caused "git push" to not create a pack
file, sending the remote peer 0 objects, when really we should have
transmitted objects, e.g. in the case I just looked at, we should
have sent 11.

FWIW, I'm currently blaming this change as its the only thing to
touch builtin-send-pack.c in that commit range.  :-)

/me goes off to debug this further...
 
>  builtin-receive-pack.c |   22 +++++++++++++++-------
>  builtin-send-pack.c    |    8 +++++++-
>  send-pack.h            |    1 +
>  3 files changed, 23 insertions(+), 8 deletions(-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] allow OFS_DELTA objects during a push
  2009-05-04 22:11       ` Shawn O. Pearce
@ 2009-05-04 22:30         ` Shawn O. Pearce
  0 siblings, 0 replies; 16+ messages in thread
From: Shawn O. Pearce @ 2009-05-04 22:30 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, Robin H. Johnson, git

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > The fetching of OFS_DELTA objects has been negotiated between both peers 
> > since git version 1.4.4.  However, this was missing from the push side 
> > where every OFS_DELTA objects were always converted to REF_DELTA objects 
> > causing an increase in transferred data.
> 
> Folks, this may have broken git push for me.
> 
> I'm trying to debug it right now, but something in next between
> 46488d2 and 03e1664 has caused "git push" to not create a pack
> file, sending the remote peer 0 objects, when really we should have
> transmitted objects, e.g. in the case I just looked at, we should
> have sent 11.

Uhm, never mind.

Somehow my incremental build failed horribly; it compiled and created
a git-push which worked "some of the time".  Building again produced
a working git-push.

Scary stuff.  Now I have to worry about the toolchain on this system.
But I don't think there is a problem in git.git.
 
-- 
Shawn.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2009-05-04 22:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-15 18:27 Weird growth in packfile during initial push Robin H. Johnson
2009-04-15 19:51 ` Nicolas Pitre
2009-04-29 23:57   ` Junio C Hamano
2009-04-30  2:52     ` Nicolas Pitre
2009-05-01  6:17     ` Robin H. Johnson
2009-05-01 20:56     ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
2009-05-01 23:49       ` Junio C Hamano
2009-05-02  0:01         ` Compatibility between git.git and jgit Shawn O. Pearce
2009-05-02  1:14           ` A Large Angry SCM
2009-05-02  1:39           ` Nicolas Pitre
2009-05-02  1:59             ` Shawn O. Pearce
2009-05-02 16:56             ` Ealdwulf Wuffinga
2009-05-02  1:40           ` Michael Witten
2009-05-02  0:24         ` [PATCH] allow OFS_DELTA objects during a push Nicolas Pitre
2009-05-04 22:11       ` Shawn O. Pearce
2009-05-04 22:30         ` Shawn O. Pearce

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).