* [PATCH 2/2] receive-pack: Add receive.denyObjectLimit to refuse push with too many objects
@ 2011-05-13 16:54 Johan Herland
  2011-05-13 17:09 ` Junio C Hamano
  2011-05-13 18:20 ` [PATCH 2/2] receive-pack: Add receive.denyObjectLimit " Johannes Sixt
  0 siblings, 2 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-13 16:54 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The new receive.denyObjectLimit config variable defines an upper limit on the
number of objects to accept in a single push. If the number of objects in a
push exceeds this limit, the entire push is immediately aborted without
storing the pushed objects on the server at all.

This is meant to prevent an unintended large push (typically a result of the
user not being aware of exactly what is being pushed, e.g. pushing a large
rewritten history) from entering the repo.

Usually, this kind of limit could be imposed by a pre-receive or update hook,
but both of those run _after_ the pack has been received and stored by
receive-pack, so they cannot prevent the pack from being transferred in the
first place.

Documentation and tests are included.

Signed-off-by: Johan Herland <johan@herland.net>
---

There's something weird going on here: The included test case works as
intended, but when I try to test this in "real-world" conditions,
I don't get the expected error message in the output:

  # From inside my git.git repo:
  $ git init --bare foo.git
  $ (cd foo.git && git config receive.denyObjectLimit 100000)
  $ git push foo.git/ master
  Pushing to foo.git/
  Counting objects: 112383, done.
  Delta compression using up to 8 threads.
  Compressing objects: 100% (27658/27658), done.
  error: pack-objects died of signal 13
  error: failed to push some refs to 'foo.git/'
  $

So the pack-objects on the local side dies of a broken pipe (as
expected), but the error message from the remote side:

  error: unpack failed: received pack exceeds configured receive.denyObjectLimit

is never printed, so the user gets no clue as to why the push failed.
This should be fixed before this patch can be considered for inclusion.


...Johan


 Documentation/config.txt |    9 +++++++++
 builtin/receive-pack.c   |   10 +++++++++-
 t/t5400-send-pack.sh     |   44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 285c7f7..7c60146 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1591,6 +1591,15 @@ receive.unpackLimit::
 	especially on slow filesystems.  If not set, the value of
 	`transfer.unpackLimit` is used instead.
 
+receive.denyObjectLimit::
+	If the number of objects received in a push exceeds this limit,
+	then the entire push will be refused. This is meant to prevent
+	an unintended large push (typically a result of the user not
+	being aware of exactly what is being pushed, e.g. pushing a
+	large rewritten history) from entering the repo. If not set,
+	there is no upper limit on the number of objects transferred
+	in a single push.
+
 receive.denyDeletes::
 	If set to true, git-receive-pack will deny a ref update that deletes
 	the ref. Use this to prevent such a ref deletion via a push.
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index e1ba4dc..856650f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -25,6 +25,7 @@ static int deny_non_fast_forwards;
 static enum deny_action deny_current_branch = DENY_UNCONFIGURED;
 static enum deny_action deny_delete_current = DENY_UNCONFIGURED;
 static int receive_fsck_objects;
+static int receive_deny_limit;
 static int receive_unpack_limit = -1;
 static int transfer_unpack_limit = -1;
 static int unpack_limit = 100;
@@ -63,6 +64,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "receive.denyobjectlimit") == 0) {
+		receive_deny_limit = git_config_int(var, value);
+		return 0;
+	}
+
 	if (strcmp(var, "receive.unpacklimit") == 0) {
 		receive_unpack_limit = git_config_int(var, value);
 		return 0;
@@ -648,7 +654,9 @@ static const char *unpack(void)
 			"--pack_header=%"PRIu32",%"PRIu32,
 			ntohl(hdr.hdr_version), ntohl(hdr.hdr_entries));
 
-	if (ntohl(hdr.hdr_entries) < unpack_limit) {
+	if (receive_deny_limit > 0 && ntohl(hdr.hdr_entries) > receive_deny_limit)
+		return "received pack exceeds configured receive.denyObjectLimit";
+	else if (ntohl(hdr.hdr_entries) < unpack_limit) {
 		int code, i = 0;
 		const char *unpacker[4];
 		unpacker[i++] = "unpack-objects";
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index 0eace37..2fbe34b 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -222,4 +222,48 @@ test_expect_success 'deny pushing to delete current branch' '
 	)
 '
 
+test_expect_success 'deny pushing when receive.denyObjectLimit is exceeded' '
+	rewound_push_setup &&
+	(
+	    cd parent &&
+	    git config receive.denyObjectLimit 1
+	) &&
+	(
+	    cd child &&
+	    git reset --hard origin/master &&
+	    echo three > file && git commit -a -m three &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "receive\\.denyObjectLimit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'repeated push failure proves that objects were not stored remotely' '
+	(
+	    cd child &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "receive\\.denyObjectLimit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'increasing receive.denyObjectLimit allows the push' '
+	(
+	    cd parent &&
+	    git config receive.denyObjectLimit 10
+	) &&
+	(
+	    cd child &&
+	    git send-pack ../parent master 2>errs &&
+	    test_must_fail grep -q "receive\\.denyObjectLimit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" = "$child_head"
+'
+
 test_done
-- 
1.7.5.1.354.g761178


-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/2] receive-pack: Add receive.denyObjectLimit to refuse push with too many objects
  2011-05-13 16:54 [PATCH 2/2] receive-pack: Add receive.denyObjectLimit to refuse push with too many objects Johan Herland
@ 2011-05-13 17:09 ` Junio C Hamano
  2011-05-14  1:43   ` Johan Herland
  2011-05-13 18:20 ` [PATCH 2/2] receive-pack: Add receive.denyObjectLimit " Johannes Sixt
  1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-13 17:09 UTC (permalink / raw)
  To: Johan Herland; +Cc: git

Johan Herland <johan@herland.net> writes:

> The new receive.denyObjectLimit config variable defines an upper limit on the
> number of objects to accept in a single push. If the number of objects in a
> push exceeds this limit, the entire push is immediately aborted without
> storing the pushed objects on the server at all.

Where does the error message go?  Can clients pushing over various
transports receive the reason without your server consuming the data from
them?  Don't you want to "receive-in-core-and-discard" instead?

For the purpose of "preventing an accidental push", I suspect that people
would expect you to limit either by number of commits (i.e. depth of
history) or by the total size of the data being transferred.

The name "objectlimit" sounds as if you are doing the latter and we can
use "200MB" there, but you are only limiting by count, so it is somewhat
misleading.  We would want to see "count" or "number" somewhere in its
name.


* Re: [PATCH 2/2] receive-pack: Add receive.denyObjectLimit to refuse push with too many objects
  2011-05-13 16:54 [PATCH 2/2] receive-pack: Add receive.denyObjectLimit to refuse push with too many objects Johan Herland
  2011-05-13 17:09 ` Junio C Hamano
@ 2011-05-13 18:20 ` Johannes Sixt
  2011-05-14  1:49   ` Johan Herland
  1 sibling, 1 reply; 53+ messages in thread
From: Johannes Sixt @ 2011-05-13 18:20 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano

Am 13.05.2011 18:54, schrieb Johan Herland:
> There's something weird going on here: The included test case works as
> intended, but when I try to test this in "real-world" conditions,
> I don't get the expected error message in the output:
> 
>   # From inside my git.git repo:
>   $ git init --bare foo.git
>   $ (cd foo.git && git config receive.denyObjectLimit 100000)
>   $ git push foo.git/ master
>   Pushing to foo.git/
>   Counting objects: 112383, done.
>   Delta compression using up to 8 threads.
>   Compressing objects: 100% (27658/27658), done.
>   error: pack-objects died of signal 13
>   error: failed to push some refs to 'foo.git/'
>   $
> 
> So the pack-objects on the local side dies of a broken pipe (as
> expected), but the error message from the remote side:
> 
>   error: unpack failed: received pack exceeds configured receive.denyObjectLimit
> 
> is never printed, so the user gets no clue as to why the push failed.

The error message is printed by receive_status(), called around line 350
in builtin/send-pack.c. But when pack-objects fails, the
pack_objects() call around line 340 signals an error, an early-exit
branch is taken, and receive_status() is never called.

In the test case, pack-objects produces only a small amount of data,
so it can exit successfully and quickly enough because the data fits
into the pipe buffer. If the pack-objects process were scheduled
differently, there would be a chance of it dying from SIGPIPE as
well. So you are just being lucky that the test case succeeds.

-- Hannes


* Re: [PATCH 2/2] receive-pack: Add receive.denyObjectLimit to refuse push with too many objects
  2011-05-13 17:09 ` Junio C Hamano
@ 2011-05-14  1:43   ` Johan Herland
  2011-05-14  2:03     ` [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit " Johan Herland
  0 siblings, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-14  1:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Friday 13 May 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > The new receive.denyObjectLimit config variable defines an upper limit
> > on the number of objects to accept in a single push. If the number of
> > objects in a push exceeds this limit, the entire push is immediately
> > aborted without storing the pushed objects on the server at all.
> 
> Where does the error message go?  Can clients pushing over various
> transports receive the reason without your server consuming the data from
> them?  Don't you want to "receive-in-core-and-discard" instead?

Yes. Will be fixed in the re-roll.

> For the purpose of "preventing an accidental push", I suspect that people
> would expect you to limit either by number of commits (i.e. depth of
> history) or by the total size of the data being transferred.

Yes, I agree that limiting by #commits or by pack size would be more 
intuitive. However, neither of those values is available to me at the point 
where I have to decide what to do with the pack data (only the pack header 
is available, and that contains only the object count).

> The name "objectlimit" sounds as if you are doing the latter and we can
> use "200MB" there, but you are only limiting by count, so it is somewhat
> misleading.  We would want to see "count" or "number" somewhere in its
> name.

Agreed. Will be renamed to receive.objectCountLimit in the re-roll.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCH 2/2] receive-pack: Add receive.denyObjectLimit to refuse push with too many objects
  2011-05-13 18:20 ` [PATCH 2/2] receive-pack: Add receive.denyObjectLimit " Johannes Sixt
@ 2011-05-14  1:49   ` Johan Herland
  0 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-14  1:49 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano

On Friday 13 May 2011, Johannes Sixt wrote:
> Am 13.05.2011 18:54, schrieb Johan Herland:
> > So the pack-objects on the local side dies of a broken pipe (as
> > expected), but the error message from the remote side:
> > 
> >   error: unpack failed: received pack exceeds configured
> >   receive.denyObjectLimit
> > 
> > is never printed, so the user gets no clue as to why the push failed.
> 
> The error message is printed by receive_status(), called around line 350
> in builtin/send-pack.c. But when pack-object fails, then the
> pack_objects() call around line 340 signals an error and an early-exit
> branch is taken, and receive_status() is never called.
> 
> In the test case, only a small amount of data is produced by
> pack-objects, so that it can exit successfully and quickly enough
> because the data fits into the pipe buffer. If the pack-objects process
> were scheduled differently, there is a chance that it dies from SIGPIPE
> as well. So, you are just being lucky that the test case succeeds.

Thanks for the helpful explanation!

Indeed (as Junio already suggested) it seems I must swallow and discard all 
the data sent by the client; there's no easy way to abort the transfer 
_and_ get the error message printed on the client side. Still, I can at 
least prevent the pack from being stored server-side.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects
  2011-05-14  1:43   ` Johan Herland
@ 2011-05-14  2:03     ` Johan Herland
  2011-05-14  2:30       ` Shawn Pearce
  0 siblings, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-14  2:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

The new receive.objectCountLimit config variable defines an upper limit
on the number of objects to accept in a single push. If the number of
objects in a push exceeds this limit, the entire push is discarded
without storing the pushed objects on the server at all.

This is meant to prevent an unintended large push (typically a result
of the user not being aware of exactly what is being pushed,
e.g. pushing a large rewritten history) from entering the repo.

Usually, this kind of limit could be imposed by a pre-receive or update
hook, but both of those run _after_ the pack has been received and
stored by receive-pack, so they cannot prevent the pack from being
stored on the server.

Documentation and tests are included.

Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---

Here is the re-roll with "receive-in-core-and-discard", plus the config
variable rename.


Have fun! :)

...Johan


 Documentation/config.txt |    9 +++++++++
 builtin/receive-pack.c   |   14 +++++++++++++-
 t/t5400-send-pack.sh     |   44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 285c7f7..b356956 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1591,6 +1591,15 @@ receive.unpackLimit::
 	especially on slow filesystems.  If not set, the value of
 	`transfer.unpackLimit` is used instead.
 
+receive.objectCountLimit::
+	If the number of objects received in a push exceeds this limit,
+	then the entire push will be refused. This is meant to prevent
+	an unintended large push (typically a result of the user not
+	being aware of exactly what is being pushed, e.g. pushing a
+	large rewritten history) from entering the repo. If not set,
+	there is no upper limit on the number of objects transferred
+	in a single push.
+
 receive.denyDeletes::
 	If set to true, git-receive-pack will deny a ref update that deletes
 	the ref. Use this to prevent such a ref deletion via a push.
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index e1ba4dc..f521c83 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -25,6 +25,7 @@ static int deny_non_fast_forwards;
 static enum deny_action deny_current_branch = DENY_UNCONFIGURED;
 static enum deny_action deny_delete_current = DENY_UNCONFIGURED;
 static int receive_fsck_objects;
+static int receive_object_count_limit;
 static int receive_unpack_limit = -1;
 static int transfer_unpack_limit = -1;
 static int unpack_limit = 100;
@@ -63,6 +64,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "receive.objectcountlimit") == 0) {
+		receive_object_count_limit = git_config_int(var, value);
+		return 0;
+	}
+
 	if (strcmp(var, "receive.unpacklimit") == 0) {
 		receive_unpack_limit = git_config_int(var, value);
 		return 0;
@@ -648,7 +654,13 @@ static const char *unpack(void)
 			"--pack_header=%"PRIu32",%"PRIu32,
 			ntohl(hdr.hdr_version), ntohl(hdr.hdr_entries));
 
-	if (ntohl(hdr.hdr_entries) < unpack_limit) {
+	if (receive_object_count_limit > 0 &&
+	    ntohl(hdr.hdr_entries) > receive_object_count_limit) {
+		char buf[1024];
+		while (xread(0, buf, 1024) > 0)
+			; /* nothing */
+		return "received pack exceeds configured receive.objectCountLimit";
+	} else if (ntohl(hdr.hdr_entries) < unpack_limit) {
 		int code, i = 0;
 		const char *unpacker[4];
 		unpacker[i++] = "unpack-objects";
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index 0eace37..9198019 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -222,4 +222,48 @@ test_expect_success 'deny pushing to delete current branch' '
 	)
 '
 
+test_expect_success 'deny pushing when receive.objectCountLimit is exceeded' '
+	rewound_push_setup &&
+	(
+	    cd parent &&
+	    git config receive.objectCountLimit 1
+	) &&
+	(
+	    cd child &&
+	    git reset --hard origin/master &&
+	    echo three > file && git commit -a -m three &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "receive\\.objectCountLimit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'repeated push failure proves that objects were not stored remotely' '
+	(
+	    cd child &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "receive\\.objectCountLimit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'increasing receive.objectCountLimit allows the push' '
+	(
+	    cd parent &&
+	    git config receive.objectCountLimit 10
+	) &&
+	(
+	    cd child &&
+	    git send-pack ../parent master 2>errs &&
+	    test_must_fail grep -q "receive\\.objectCountLimit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" = "$child_head"
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* Re: [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects
  2011-05-14  2:03     ` [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit " Johan Herland
@ 2011-05-14  2:30       ` Shawn Pearce
  2011-05-14 13:17         ` Johan Herland
  2011-05-14 17:50         ` [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects Junio C Hamano
  0 siblings, 2 replies; 53+ messages in thread
From: Shawn Pearce @ 2011-05-14  2:30 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git

On Fri, May 13, 2011 at 19:03, Johan Herland <johan@herland.net> wrote:
> The new receive.objectCountLimit config variable defines an upper limit
> on the number of objects to accept in a single push. If the number of
> objects in a push exceeds this limit, the entire push is discarded
> without storing the pushed objects on the server at all.
...
> +           ntohl(hdr.hdr_entries) > receive_object_count_limit) {
> +               char buf[1024];
> +               while (xread(0, buf, 1024) > 0)
> +                       ; /* nothing */
> +               return "received pack exceeds configured receive.objectCountLimit";

Discarding in core is painful. We are still consuming bandwidth to
toss away the data.

I wonder... should we instead export the objectCountLimit as part of
the advertisement to the client, and teach send-pack to look at this
and pass it down to pack-objects? If pack-objects winds up with more
than this limit, it aborts and the client doesn't even transmit data.
Newer clients would abort cleanly with a nice error.

I don't see it as a problem to advertise to a client "This server will
only accept X objects from you, sending X + 1 is an error and will be
rejected." If we are worried about an evil client using this
advertisement to try and DoS a server... he can easily do that with a
single giant blob. Or a very long delta chain of a single blob and
many, many tiny deltas applied onto it. Knowing what the remote's
objectCountLimit is doesn't increase the risk of a DoS attack.

For older clients that don't know this new advertised capability, they
should fail hard and not transfer all of this data. In my experience
when a user gets these strange errors from his Git client, he contacts
his server administrator with the screen output. At which point the
administrator can see the Counting objects line, check the repository
configuration, and tell the user what the problem is... and encourage
them to upgrade their client to a newer version.

If we are going to put limits in, does it make sense to try and push
these limits back to pack-objects in a more detailed way? You talked
about depth of history, or size of pack. pack-objects could
approximate both. If its depth of history, it might even be able to
cut off before it enumerates too many commits. :-)

The remote side obviously cannot abort early with number of commits or
pack size limits, but if those were soft limits suggested to a client,
while the object count was a hard limit, you might get a better
approximation for what you want. A server administrator might
configure a soft limit of 10 commits, but a hard limit of 5,000
objects. For most users, a bad push would abort very early on the soft
limit of 10 commits if they did an incorrect rebase. Meanwhile a user
who made 1 commit but changed every GPL header in the linux-2.6
repository (26,000+ files) would also be stopped for exceeding the
(hard) 5000 object limit.

-- 
Shawn.


* Re: [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects
  2011-05-14  2:30       ` Shawn Pearce
@ 2011-05-14 13:17         ` Johan Herland
  2011-05-14 22:17           ` Shawn Pearce
  2011-05-14 17:50         ` [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects Junio C Hamano
  1 sibling, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-14 13:17 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git, Junio C Hamano

On Saturday 14 May 2011, Shawn Pearce wrote:
> I wonder... should we instead export the objectCountLimit as part of
> the advertisement to the client, and teach send-pack to look at this
> and pass it down to pack-objects? If pack-objects winds up with more
> than this limit, it aborts and the client doesn't even transmit data.
> Newer clients would abort cleanly with a nice error.

Good idea (although it grows the scope beyond the quick fix I initially 
intended it to be...)

I'm planning to add a new capability collection/namespace, called "limit-*", 
where the server can communicate capabilities to the client, like so:

  limit-object-count_100000
  limit-commit-count_1000
  limit-pack-size_500000000

(I'd prefer to s/_/=/ or s/_/:/, but according to pack-protocol.txt, a 
capability may not contain "=" or ":")

However, you say:

> For older clients that don't know this new advertised capability, they
> should fail hard and not transfer all of this data.

AFAICS this is not the case. If a client does not understand a capability, 
it simply ignores it, and carries on doing its usual thing.

IINM there are only two ways to prevent an older client from transferring 
all the data:

1. Change the pack protocol in an incompatible way, which causes older 
clients to abort with a pack format error prior to transmitting the pack.

2. (as in initial patch) Abort receive-pack when the server detects a limit 
violation, leaving the client with a broken pipe. I haven't read the pack 
protocol closely, but I wouldn't be surprised if this behavior is strictly 
in violation of the protocol.

> In my experience
> when a user gets these strange errors from his Git client, he contacts
> his server administrator with the screen output. At which point the
> administrator can see the Counting objects line, check the repository
> configuration, and tell the user what the problem is... and encourage
> them to upgrade their client to a newer version.

Hmm... Not ideal, but I guess we can live with that. At least we should warn 
the server administrator of this in the documentation of the config 
variable(s).


Otherwise, I agree with everything you wrote.


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects
  2011-05-14  2:30       ` Shawn Pearce
  2011-05-14 13:17         ` Johan Herland
@ 2011-05-14 17:50         ` Junio C Hamano
  2011-05-14 22:27           ` Shawn Pearce
  1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-14 17:50 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Johan Herland, git

Shawn Pearce <spearce@spearce.org> writes:

> If we are going to put limits in, does it make sense to try and push
> these limits back to pack-objects in a more detailed way? You talked
> about depth of history, or size of pack. pack-objects could
> approximate both.

I think the receiving end switches between index-pack and unpack-objects
based on the object count, but if we can add a protocol extension to carry
these estimates from the sender, we would be much better off. A large push
with a small number of objects would want to keep everything in a pack,
for example, but currently there is no way to do so.


* Re: [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects
  2011-05-14 13:17         ` Johan Herland
@ 2011-05-14 22:17           ` Shawn Pearce
  2011-05-15 17:42             ` Johan Herland
  0 siblings, 1 reply; 53+ messages in thread
From: Shawn Pearce @ 2011-05-14 22:17 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano

On Sat, May 14, 2011 at 06:17, Johan Herland <johan@herland.net> wrote:
> I'm planning to add a new capability collection/namespace, called "limit-*",
> where the server can communicate capabilities to the client, like so:
>
>  limit-object-count_100000
>  limit-commit-count_1000
>  limit-pack-size_500000000
>
> (I'd prefer to s/_/=/ or s/_/:/, but according to pack-protocol.txt, a
> capability may not contain "=" or ":")

I forget why = and : are forbidden here. I think it's just because we
wanted the options to be "simple". I agree, I would prefer = here too,
and probably would have written the patch that way myself. There
shouldn't be a technical reason why = isn't allowed here. It's just
documented as a bad idea because at one time someone wrote that down.

> However, you say:
>
>> For older clients that don't know this new advertised capability, they
>> should fail hard and not transfer all of this data.
>
> AFAICS this is not the case. If a client does not understand a capability,
> it simply ignores it, and carries on doing its usual thing.

By this I meant #2 below (the initial patch).

> IINM there are only two ways to prevent an older client from transferring
> all the data:
>
> 1. Change the pack protocol in an incompatible way, that causes older client
> to abort with a pack format error prior to transmitting the pack.

This is not a good idea. We still want the client to be able to talk
to the server if it would be within the limits.

> 2. (as in initial patch) Abort receive-pack when the server detects a limit
> violation, leaving the client with a broken pipe. I haven't read the pack
> protocol closely, but I wouldn't be surprised if this behavior is strictly
> in violation of the protocol.

It is a violation of the protocol... sort of. It's allowed for the
server to up and die in the middle of a push. What happens if the
remote system loses power due to a grid failure while you are writing
to it? The remote system can't tell you "I'm going away now" first. It
just freezes and stops ACK'ing the TCP packets. Or if the remote
system gets overloaded and the Linux OOM killer kicks in... the remote
might get one of the processes elected for killing, and your TCP
connection breaks.

I don't think it's as bad as it sounds. It's not a great user
experience, sure. And we should maybe also look at changing the
send-pack code to check the pipe for received data from the remote
peer if pack-objects dies (today it doesn't)... just in case the
reason pack-objects died is that an error message was written and
then the stream was closed.

-- 
Shawn.


* Re: [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects
  2011-05-14 17:50         ` [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects Junio C Hamano
@ 2011-05-14 22:27           ` Shawn Pearce
  0 siblings, 0 replies; 53+ messages in thread
From: Shawn Pearce @ 2011-05-14 22:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, git

On Sat, May 14, 2011 at 10:50, Junio C Hamano <gitster@pobox.com> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
>
>> If we are going to put limits in, does it make sense to try and push
>> these limits back to pack-objects in a more detailed way? You talked
>> about depth of history, or size of pack. pack-objects could
>> approximate both.
>
> I think the receiving end switches between index-pack and unpack-objects
> based on the object count,

It does...

> but if we can add a protocol extension to carry
> these estimates from the sender, we would be much better off. A large push
> with a small number of objects would want to keep everything in a pack,
> for example, but currently there is no way to do so.

Yes, my thoughts exactly. A 600 MiB push of 3 objects (1 huge blob,
top level tree, commit) shouldn't be unpacked. Serving that huge blob
back out to clients that want to fetch it is more efficient when it's
coming from a pack than from a loose object, due to the object reuse
that is available from a pack. Unfortunately we don't have the data
necessary to make that decision.

Probably the cleanest way to do this estimate transmission is to
modify the pack stream format. Introduce a new capability for
pack-stats. If advertised from receive-pack, send-pack can pass a
--pack-stats flag to pack-objects. That process then writes out a PACK
header with version code 4 (rather than 2), and includes additional
data beyond the object count in an extended PACK header. receive-pack
can then read the PACK header and handle version 2 as it does today,
and version 4 by also checking the additional stat fields and dumping
what is left over into either unpack-objects or index-pack like it
does now. We wouldn't even need to change index-pack or
unpack-objects, receive-pack could translate the PACK version 4 header
into a version 2 command line flag when it invokes the sub process to
handle the stream.

Hmm, or maybe this should be pack v5, just so nobody confuses it with
the mythical pack v4 we haven't implemented.

If we do add this stats header to the PACK stream, we need to be a bit
smarter about it. I would lay out the header with an additional 4 byte
field telling us the size of the stats block, and then use repeating
key-value pairs within the stats block to transmit the data. This way
clients can add additional stats to the stream and receive-pack can
ignore them if it doesn't understand them. We can start out with
number of commits and estimated size of the packed data (based on
summing up the reused object sizes) but could add more later like date
of oldest commit if we later decide it would help the remote handle
the pack stream, or reject it early.
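A tiny sketch of what such a length-prefixed key-value stats block
could look like (the version number 5, the field layout, and the key
names below are all made up for illustration; nothing here is an
actual git on-disk or on-wire format):

```python
import struct

def build_stats_header(stats):
    """Build a hypothetical extended PACK header: the usual 'PACK'
    magic plus a bumped version number, followed by a 4-byte
    length-prefixed block of key/value pairs (big-endian, like the
    real pack header fields)."""
    body = b"".join(
        struct.pack(">B", len(k)) + k.encode() + struct.pack(">I", v)
        for k, v in stats.items()
    )
    return struct.pack(">4sI", b"PACK", 5) + struct.pack(">I", len(body)) + body

def parse_stats_header(data):
    """Parse the header back; unknown keys are simply collected, which
    is what would let receive-pack ignore stats it doesn't understand."""
    magic, version = struct.unpack_from(">4sI", data, 0)
    if magic != b"PACK":
        raise ValueError("not a pack stream")
    (blen,) = struct.unpack_from(">I", data, 8)
    stats, off, end = {}, 12, 12 + blen
    while off < end:
        (klen,) = struct.unpack_from(">B", data, off)
        key = data[off + 1:off + 1 + klen].decode()
        (val,) = struct.unpack_from(">I", data, off + 1 + klen)
        stats[key] = val
        off += 1 + klen + 4
    return version, stats
```

The 4-byte length field up front is what lets an old reader skip the
whole stats block without understanding any individual key.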


As far as receive-pack rejecting the stream after seeing the header, I
would like to avoid an extra round-trip between the two peers. So we
probably need to modify send-pack like I suggest above to check the
stream for an error message after pack-objects terminates, even if
pack-objects terminates with a non-zero exit status. Even over a POSIX
pipe we should have room for 512 bytes of error data to be crammed
into the stream from receive-pack before we deadlock the two peers. If
we wanted to be really paranoid, we would run pack-objects in the
background and have send-pack constantly monitor the incoming half of
the pipe for a status result. Since pack-objects is already another
process this shouldn't be that difficult. (But it is harder for me in
JGit where it's not a separate thread.)
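The "check the pipe for an error message after the writer dies"
pattern described above could look something like this (a generic
sketch, not git's actual send-pack code; the 512-byte figure is the
POSIX pipe headroom mentioned above):

```python
import os
import select

def drain_error(fd, timeout=0.1, limit=512):
    """After the writer has died, try to salvage any error text the
    remote side crammed into the pipe before closing it. Reads until
    EOF, the byte limit, or a timeout with no data ready."""
    chunks = []
    total = 0
    while total < limit:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break                     # nothing buffered; give up
        data = os.read(fd, limit - total)
        if not data:
            break                     # EOF: writer closed the pipe
        chunks.append(data)
        total += len(data)
    return b"".join(chunks)
```

The same loop also works as the "constantly monitor the incoming half
of the pipe" variant if run in the background while the pack is being
streamed.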

-- 
Shawn.


* Re: [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects
  2011-05-14 22:17           ` Shawn Pearce
@ 2011-05-15 17:42             ` Johan Herland
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
  0 siblings, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-15 17:42 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git, Junio C Hamano

On Sunday 15 May 2011, Shawn Pearce wrote:
> On Sat, May 14, 2011 at 06:17, Johan Herland <johan@herland.net> wrote:
> > I'm planning to add a new capability collection/namespace, called
> > "limit-*", where the server can communicate capabilities to the
> > client, like so:
> > 
> >  limit-object-count_100000
> >  limit-commit-count_1000
> >  limit-pack-size_500000000
> > 
> > (I'd prefer to s/_/=/ or s/_/:/, but according to pack-protocol.txt, a
> > capability may not contain "=" or ":")
> 
> I forget why = and : are forbidden here. I think it's just because we
> wanted the options to be "simple". I agree, I would prefer = here too,
> and probably would have written the patch that way myself. There
> shouldn't be a technical reason why = isn't allowed here. It's just
> documented as being not a good idea because at one time someone wrote
> that down.

Ok. I'll use '='.
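For illustration, picking the 'limit-*' entries out of a
space-separated capability list with '=' as the separator could be as
simple as this (a hypothetical helper, not code from git itself):

```python
def parse_limit_caps(cap_line):
    """Collect 'limit-<name>=<num>' capabilities from an advertised
    capability list; anything else (report-status, ofs-delta, ...) is
    left for the regular capability handling."""
    limits = {}
    for cap in cap_line.split():
        if cap.startswith("limit-") and "=" in cap:
            name, _, value = cap.partition("=")
            limits[name] = int(value)
    return limits
```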

> > However, you say:
> >> For older clients that don't know this new advertised capability, they
> >> should fail hard and not transfer all of this data.
> > 
> > AFAICS this is not the case. If a client does not understand a
> > capability, it simply ignores it, and carries on doing its usual
> > thing.
> 
> By this I meant #2 below (the initial patch).

Ah, I see. Will use that in the re-roll.

> I don't think it's as bad as it sounds. It's not a great user
> experience, sure. And we maybe should also look at changing the
> send-pack code to check the pipe for received data from the remote
> peer if pack-objects dies (today it doesn't)... just in case the
> reason pack-objects died is because an error message was written and
> then the stream was closed.

This will be in the re-roll.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* [PATCHv3 0/9] Push limits
  2011-05-15 17:42             ` Johan Herland
@ 2011-05-15 21:37               ` Johan Herland
  2011-05-15 21:37                 ` [PATCHv3 1/9] Update technical docs to reflect side-band-64k capability in receive-pack Johan Herland
                                   ` (9 more replies)
  0 siblings, 10 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git, Jeff King, Johannes Sixt

Here's the next iteration of what started out as a simple patch to let
the server refuse pushes that exceed a configurable limit on the
#objects in a pack.

The current patch series allows limiting pushes by
 - size of pack
 - #objects in pack
 - #commits in pack

The limits are controlled by corresponding (new) config variables:
 - receive.packSizeLimit
 - receive.objectCountLimit
 - receive.commitCountLimit

Setting one or more of these config variables causes receive-pack to
advertise the corresponding (new) capabilities:
 - limit-pack-size=<num>
 - limit-object-count=<num>
 - limit-commit-count=<num>

These capabilities are parsed by the send-pack client, which passes
them on to pack-objects, using the corresponding (mostly new)
pack-objects options:
 --max-pack-size=<num>      (extended to be usable in this context)
 --max-object-count=<num>   (new)
 --max-commit-count=<num>   (new)

When one or more of those options are given together with --stdout to
pack-objects, pack-objects will check the generated pack against those
limits, and abort the pack generation if any limit is exceeded.

In addition, the server will also verify the object count limit,
if enabled, and abort the push if the pack exceeds the limit.
Currently, the server cannot easily check the other two limits without
changing the pack format (left as an exercise to the reader... ;-]),
so exceeding the pack size limit or the commit count limit will not be
caught by the server.
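The object count limit is the one the server can verify cheaply
because the v2 pack header already carries the object count ('PACK'
magic, 4-byte version, 4-byte object count, all big-endian), while
pack size and commit count only become known as the stream is
consumed. A sketch of that server-side check:

```python
import struct

def object_count_exceeds(pack_header, limit):
    """Read the object count out of the first 12 bytes of a pack
    stream and compare it against the configured limit."""
    magic, version, count = struct.unpack(">4sII", pack_header[:12])
    if magic != b"PACK":
        raise ValueError("not a pack stream")
    return count > limit
```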


Finally, a quick run-through of the patches:

 - #1 is a very minor fix to the pack protocol docs.

 - #2 attempts to retrieve and display the remote status, even when
   pack-objects fails. This patch touches the same code as the recent
   send-pack deadlock fixes, so I'd like Peff or JSixt to review it.

 - #3 - #5 teach pack-objects some new options to impose the above
   limits on the generated pack.

 - #6 contains some general preparation for the "limit-*" capabilities.

 - #7 - #9 add the new limits to receive-pack, send-pack, and the
   corresponding protocol capabilities.


Have fun! :)

...Johan


Johan Herland (9):
  Update technical docs to reflect side-band-64k capability in receive-pack
  send-pack: Attempt to retrieve remote status even if pack-objects fails
  pack-objects: Allow --max-pack-size to be used together with --stdout
  pack-objects: Teach new option --max-object-count, similar to --max-pack-size
  pack-objects: Teach new option --max-commit-count, limiting #commits in pack
  receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
  send-pack/receive-pack: Allow server to refuse pushes with too many objects
  send-pack/receive-pack: Allow server to refuse pushing too large packs
  send-pack/receive-pack: Allow server to refuse pushes with too many commits

 Documentation/config.txt                          |   35 ++++
 Documentation/git-pack-objects.txt                |   20 +++
 Documentation/git-repack.txt                      |    6 +
 Documentation/technical/pack-protocol.txt         |    5 +-
 Documentation/technical/protocol-capabilities.txt |   30 +++-
 builtin/pack-objects.c                            |   53 +++++-
 builtin/receive-pack.c                            |   45 +++++-
 builtin/send-pack.c                               |   44 ++++--
 cache.h                                           |    2 +-
 connect.c                                         |    7 +-
 git-repack.sh                                     |   27 ++--
 send-pack.h                                       |    3 +
 t/t5300-pack-object.sh                            |  121 +++++++++++++
 t/t5400-send-pack.sh                              |  191 +++++++++++++++++++++
 14 files changed, 543 insertions(+), 46 deletions(-)

-- 
1.7.5.rc1.3.g4d7b


* [PATCHv3 1/9] Update technical docs to reflect side-band-64k capability in receive-pack
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
@ 2011-05-15 21:37                 ` Johan Herland
  2011-05-15 21:37                 ` [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails Johan Herland
                                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/technical/pack-protocol.txt         |    3 ++-
 Documentation/technical/protocol-capabilities.txt |    4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 369f91d..4a68f0f 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -391,7 +391,8 @@ The reference discovery phase is done nearly the same way as it is in the
 fetching protocol. Each reference obj-id and name on the server is sent
 in packet-line format to the client, followed by a flush-pkt.  The only
 real difference is that the capability listing is different - the only
-possible values are 'report-status', 'delete-refs' and 'ofs-delta'.
+possible values are 'report-status', 'delete-refs', 'side-band-64k' and
+'ofs-delta'.
 
 Reference Update Request and Packfile Transfer
 ----------------------------------------------
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index b15517f..b732e80 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -21,8 +21,8 @@ NOT advertise capabilities it does not understand.
 The 'report-status' and 'delete-refs' capabilities are sent and
 recognized by the receive-pack (push to server) process.
 
-The 'ofs-delta' capability is sent and recognized by both upload-pack
-and receive-pack protocols.
+The 'side-band-64k' and 'ofs-delta' capabilities are sent and
+recognized by both upload-pack and receive-pack protocols.
 
 All other capabilities are only recognized by the upload-pack (fetch
 from server) process.
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
  2011-05-15 21:37                 ` [PATCHv3 1/9] Update technical docs to reflect side-band-64k capability in receive-pack Johan Herland
@ 2011-05-15 21:37                 ` Johan Herland
  2011-05-16  4:07                   ` Jeff King
  2011-05-15 21:37                 ` [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
                                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git, Jeff King, Johannes Sixt

When pushing, send-pack uses pack-objects to write the pack data to the
receive-pack process running on the remote end. The scenarios where
pack-objects dies unexpectedly can be roughly divided based on whether
the reason for the failure is _local_ (i.e. something in pack-objects
caused it to fail of its own accord), or _remote_ (i.e. something in
the remote receive-pack process caused it to fail, leaving the local
pack-objects process with a broken pipe).

If the reason for the failure is local, we expect pack-objects to report
an appropriate error message to the user.

However, if the reason for the failure is remote, pack-objects will merely
abort because of the broken pipe, and the user is left with no clue as to
the reason why the remote receive-pack process died.

In certain cases, though, the receive-pack process on the other end may have
produced an error message immediately before exiting. This error message may
be currently waiting to be read by the local send-pack process.

Therefore, we should try to read from the remote end, even when pack-objects
dies unexpectedly. We accomplish this by _always_ calling receive_status()
after pack_objects(). If the remote end managed to produce a well-formed
status report before exiting, then receive_status() simply presents that to
the user. Even if the data from the remote end cannot be understood by
receive_status(), it will print that data as part of its error message. In
any case, we give the user as much information about the failure as possible.

Signed-off-by: Johan Herland <johan@herland.net>
---

I first wrote this patch on a base where e07fd15 (Peff's "send-pack:
unbreak push over stateless rpc") was not present, and then resolved
a conflict when rebasing this patch onto current master. I hope Peff
or Johannes (Sixt) can verify that my patch does not reintroduce the
deadlock they fixed.


...Johan

 builtin/send-pack.c |   18 +++++-------------
 1 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 4ac2ca9..f571917 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -251,7 +251,7 @@ int send_pack(struct send_pack_args *args,
 	int status_report = 0;
 	int use_sideband = 0;
 	unsigned cmds_sent = 0;
-	int ret;
+	int ret = 0;
 	struct async demux;
 
 	/* Does the other end support the reporting? */
@@ -339,23 +339,15 @@ int send_pack(struct send_pack_args *args,
 	}
 
 	if (new_refs && cmds_sent) {
-		if (pack_objects(out, remote_refs, extra_have, args) < 0) {
-			for (ref = remote_refs; ref; ref = ref->next)
-				ref->status = REF_STATUS_NONE;
-			if (args->stateless_rpc)
-				close(out);
-			if (use_sideband)
-				finish_async(&demux);
-			return -1;
-		}
+		ret = pack_objects(out, remote_refs, extra_have, args);
+		if (ret && args->stateless_rpc)
+			close(out);
 	}
 	if (args->stateless_rpc && cmds_sent)
 		packet_flush(out);
 
 	if (status_report && cmds_sent)
-		ret = receive_status(in, remote_refs);
-	else
-		ret = 0;
+		ret |= receive_status(in, remote_refs);
 	if (args->stateless_rpc)
 		packet_flush(out);
 
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
  2011-05-15 21:37                 ` [PATCHv3 1/9] Update technical docs to reflect side-band-64k capability in receive-pack Johan Herland
  2011-05-15 21:37                 ` [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails Johan Herland
@ 2011-05-15 21:37                 ` Johan Herland
  2011-05-15 22:06                   ` Shawn Pearce
  2011-05-16  6:12                   ` Junio C Hamano
  2011-05-15 21:37                 ` [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size Johan Herland
                                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git

Currently we refuse to combine --max-pack-size with --stdout, since
there's no way to make multiple packs when the pack is written to stdout.
we want to be able to limit the maximum size of the pack created by
--stdout (and abort pack-objects if we are unable to meet that limit).

Therefore, when used together with --stdout, we reinterpret --max-pack-size
to indicate the maximum pack size which - if exceeded - will cause
pack-objects to abort with an error message.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-pack-objects.txt |    3 ++
 builtin/pack-objects.c             |    9 ++++---
 t/t5300-pack-object.sh             |   43 ++++++++++++++++++++++++++++++++++++
 3 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 20c8551..ca97463 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -112,6 +112,9 @@ base-name::
 	If specified,  multiple packfiles may be created.
 	The default is unlimited, unless the config variable
 	`pack.packSizeLimit` is set.
++
+When used together with --stdout, the command will fail with an error
+message if the pack output exceeds the given limit.
 
 --honor-pack-keep::
 	This flag causes an object already in a local pack that
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f402a84..69f1c51 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -229,7 +229,7 @@ static unsigned long write_object(struct sha1file *f,
 
 	if (!entry->delta)
 		usable_delta = 0;	/* no delta */
-	else if (!pack_size_limit)
+	else if (!pack_size_limit || pack_to_stdout)
 	       usable_delta = 1;	/* unlimited packfile */
 	else if (entry->delta->idx.offset == (off_t)-1)
 		usable_delta = 0;	/* base was written to another pack */
@@ -478,6 +478,9 @@ static void write_pack_file(void)
 		 * If so, rewrite it like in fast-import
 		 */
 		if (pack_to_stdout) {
+			if (nr_written != nr_remaining)
+				die("unable to make pack within the pack size"
+				    " limit (%lu bytes)", pack_size_limit);
 			sha1close(f, sha1, CSUM_CLOSE);
 		} else if (nr_written == nr_remaining) {
 			sha1close(f, sha1, CSUM_FSYNC);
@@ -2315,9 +2318,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	if (!pack_to_stdout && !pack_size_limit)
 		pack_size_limit = pack_size_limit_cfg;
-	if (pack_to_stdout && pack_size_limit)
-		die("--max-pack-size cannot be used to build a pack for transfer.");
-	if (pack_size_limit && pack_size_limit < 1024*1024) {
+	if (!pack_to_stdout && pack_size_limit && pack_size_limit < 1024*1024) {
 		warning("minimum pack size limit is 1 MiB");
 		pack_size_limit = 1024*1024;
 	}
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 602806d..00f1bd8 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -396,6 +396,49 @@ test_expect_success 'verify resulting packs' '
 	git verify-pack test-11-*.pack
 '
 
+test_expect_success '--stdout ignores pack.packSizeLimit' '
+	git pack-objects --stdout <obj-list >test-12.pack &&
+	git index-pack --strict test-12.pack
+'
+
+test_expect_success 'verify resulting pack' '
+	git verify-pack test-12.pack
+'
+
+test_expect_success 'honor --max-pack-size' '
+	git config --unset pack.packSizeLimit &&
+	packname_13=$(git pack-objects --max-pack-size=3m test-13 <obj-list) &&
+	test 2 = $(ls test-13-*.pack | wc -l)
+'
+
+test_expect_success 'verify resulting packs' '
+	git verify-pack test-13-*.pack
+'
+
+test_expect_success 'tolerate --max-pack-size smaller than biggest object' '
+	packname_14=$(git pack-objects --max-pack-size=1 test-14 <obj-list) &&
+	test 5 = $(ls test-14-*.pack | wc -l)
+'
+
+test_expect_success 'verify resulting packs' '
+	git verify-pack test-14-*.pack
+'
+
+test_expect_success '--stdout works with large enough --max-pack-size' '
+	git pack-objects --stdout --max-pack-size=10m <obj-list >test-15.pack &&
+	git index-pack --strict test-15.pack
+'
+
+test_expect_success 'verify resulting pack' '
+	git verify-pack test-15.pack
+'
+
+test_expect_success '--stdout fails when pack exceeds --max-pack-size' '
+	test_must_fail git pack-objects --stdout --max-pack-size=1 <obj-list >test-16.pack 2>errs &&
+	test_must_fail git index-pack --strict test-16.pack &&
+	grep -q "pack size limit" errs
+'
+
 #
 # WARNING!
 #
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
                                   ` (2 preceding siblings ...)
  2011-05-15 21:37                 ` [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
@ 2011-05-15 21:37                 ` Johan Herland
  2011-05-15 22:07                   ` Shawn Pearce
  2011-05-15 21:37                 ` [PATCHv3 5/9] pack-objects: Teach new option --max-commit-count, limiting #commits in pack Johan Herland
                                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git

The new --max-object-count option behaves similarly to --max-pack-size,
except that the decision to split packs is determined by the number of
objects in the pack, and not by the size of the pack.

The new option also has a corresponding configuration variable, named
pack.objectCountLimit, which works similarly to pack.packSizeLimit,
subject to the difference mentioned above.

As with --max-pack-size, you can use --max-object-count together with
--stdout to put a limit on the number of objects in the pack written to
stdout. If the pack would exceed this limit, pack-objects will abort with
an error message.

Finally, for completeness, the new option is also added to git-repack,
which simply forwards it to pack-objects.

Documentation and tests are included.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt           |    8 ++++++
 Documentation/git-pack-objects.txt |    9 +++++++
 Documentation/git-repack.txt       |    6 +++++
 builtin/pack-objects.c             |   20 ++++++++++++++++
 git-repack.sh                      |   27 +++++++++++----------
 t/t5300-pack-object.sh             |   44 ++++++++++++++++++++++++++++++++++++
 6 files changed, 101 insertions(+), 13 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 285c7f7..056e01f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1523,6 +1523,14 @@ pack.packSizeLimit::
 	Common unit suffixes of 'k', 'm', or 'g' are
 	supported.
 
+pack.objectCountLimit::
+	The maximum number of objects in a pack.  This setting only
+	affects packing to a file when repacking, i.e. the git://
+	protocol is unaffected.  It can be overridden by the
+	`\--max-object-count` option of linkgit:git-repack[1].
+	The default is unlimited. Common unit suffixes of 'k', 'm',
+	or 'g' are supported.
+
 pager.<cmd>::
 	If the value is boolean, turns on or off pagination of the
 	output of a particular git subcommand when writing to a tty.
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index ca97463..5ab1fe9 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -116,6 +116,15 @@ base-name::
 When used together with --stdout, the command will fail with an error
 message if the pack output exceeds the given limit.
 
+--max-object-count=<n>::
+	Maximum number of objects in each output pack file. The number
+	can be suffixed with "k", "m", or "g". If specified, multiple
+	packfiles may be created. The default is unlimited, unless
+	the config variable `pack.objectCountLimit` is set.
++
+When used together with --stdout, the command will fail with an error
+message if the pack output exceeds the given limit.
+
 --honor-pack-keep::
 	This flag causes an object already in a local pack that
 	has a .keep file to be ignored, even if it would have
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 0decee2..f9a44b8 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -109,6 +109,12 @@ other objects in that pack they already have locally.
 	The default is unlimited, unless the config variable
 	`pack.packSizeLimit` is set.
 
+--max-object-count=<n>::
+	Maximum number of objects in each output pack file. The number
+	can be suffixed with "k", "m", or "g". If specified, multiple
+	packfiles may be created. The default is unlimited, unless
+	the config variable `pack.objectCountLimit` is set.
+
 
 Configuration
 -------------
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 69f1c51..c4fbc54 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -74,6 +74,7 @@ static const char *base_name;
 static int progress = 1;
 static int window = 10;
 static unsigned long pack_size_limit, pack_size_limit_cfg;
+static unsigned long object_count_limit, object_count_limit_cfg;
 static int depth = 50;
 static int delta_search_threads;
 static int pack_to_stdout;
@@ -227,6 +228,10 @@ static unsigned long write_object(struct sha1file *f,
 	else
 		limit = pack_size_limit - write_offset;
 
+	/* Trigger new pack when we reach object count limit */
+	if (object_count_limit && nr_written >= object_count_limit)
+		return 0;
+
 	if (!entry->delta)
 		usable_delta = 0;	/* no delta */
 	else if (!pack_size_limit || pack_to_stdout)
@@ -1897,6 +1902,10 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		pack_size_limit_cfg = git_config_ulong(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "pack.objectcountlimit")) {
+		object_count_limit_cfg = git_config_ulong(k, v);
+		return 0;
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -2182,6 +2191,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 				usage(pack_usage);
 			continue;
 		}
+		if (!prefixcmp(arg, "--max-object-count=")) {
+			object_count_limit_cfg = 0;
+			if (!git_parse_ulong(arg+19, &object_count_limit))
+				usage(pack_usage);
+			continue;
+		}
 		if (!prefixcmp(arg, "--window=")) {
 			char *end;
 			window = strtoul(arg+9, &end, 0);
@@ -2322,6 +2337,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		warning("minimum pack size limit is 1 MiB");
 		pack_size_limit = 1024*1024;
 	}
+	if (!pack_to_stdout && !object_count_limit)
+		object_count_limit = object_count_limit_cfg;
 
 	if (!pack_to_stdout && thin)
 		die("--thin cannot be used to build an indexable pack.");
@@ -2349,6 +2366,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	if (non_empty && !nr_result)
 		return 0;
+	if (pack_to_stdout && object_count_limit && object_count_limit < nr_result)
+		die("unable to make pack within the object count limit"
+			" (%lu objects)", object_count_limit);
 	if (nr_result)
 		prepare_pack(window, depth);
 	write_pack_file();
diff --git a/git-repack.sh b/git-repack.sh
index 624feec..e8693a6 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -7,19 +7,20 @@ OPTIONS_KEEPDASHDASH=
 OPTIONS_SPEC="\
 git repack [options]
 --
-a               pack everything in a single pack
-A               same as -a, and turn unreachable objects loose
-d               remove redundant packs, and run git-prune-packed
-f               pass --no-reuse-delta to git-pack-objects
-F               pass --no-reuse-object to git-pack-objects
-n               do not run git-update-server-info
-q,quiet         be quiet
-l               pass --local to git-pack-objects
+a                  pack everything in a single pack
+A                  same as -a, and turn unreachable objects loose
+d                  remove redundant packs, and run git-prune-packed
+f                  pass --no-reuse-delta to git-pack-objects
+F                  pass --no-reuse-object to git-pack-objects
+n                  do not run git-update-server-info
+q,quiet            be quiet
+l                  pass --local to git-pack-objects
  Packing constraints
-window=         size of the window used for delta compression
-window-memory=  same as the above, but limit memory size instead of entries count
-depth=          limits the maximum delta depth
-max-pack-size=  maximum size of each packfile
+window=            size of the window used for delta compression
+window-memory=     same as the above, but limit memory size instead of entries count
+depth=             limits the maximum delta depth
+max-pack-size=     maximum size of each packfile
+max-object-count=  maximum number of objects in each packfile
 "
 SUBDIRECTORY_OK='Yes'
 . git-sh-setup
@@ -38,7 +39,7 @@ do
 	-f)	no_reuse=--no-reuse-delta ;;
 	-F)	no_reuse=--no-reuse-object ;;
 	-l)	local=--local ;;
-	--max-pack-size|--window|--window-memory|--depth)
+	--max-pack-size|--max-object-count|--window|--window-memory|--depth)
 		extra="$extra $1=$2"; shift ;;
 	--) shift; break;;
 	*)	usage ;;
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 00f1bd8..d133c28 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -439,6 +439,50 @@ test_expect_success '--stdout fails when pack exceeds --max-pack-size' '
 	grep -q "pack size limit" errs
 '
 
+test_expect_success 'honor pack.objectCountLimit' '
+	git config pack.objectCountLimit 5 &&
+	packname_17=$(git pack-objects test-17 <obj-list) &&
+	test 2 = $(ls test-17-*.pack | wc -l)
+'
+
+test_expect_success 'verify resulting packs' '
+	git verify-pack test-17-*.pack
+'
+
+test_expect_success '--stdout ignores pack.objectCountLimit' '
+	git pack-objects --stdout <obj-list >test-18.pack &&
+	git index-pack --strict test-18.pack
+'
+
+test_expect_success 'verify resulting pack' '
+	git verify-pack test-18.pack
+'
+
+test_expect_success 'honor --max-object-count' '
+	git config --unset pack.objectCountLimit &&
+	packname_19=$(git pack-objects --max-object-count=5 test-19 <obj-list) &&
+	test 2 = $(ls test-19-*.pack | wc -l)
+'
+
+test_expect_success 'verify resulting packs' '
+	git verify-pack test-19-*.pack
+'
+
+test_expect_success '--stdout works with large enough --max-object-count' '
+	git pack-objects --stdout --max-object-count=8 <obj-list >test-20.pack &&
+	git index-pack --strict test-20.pack
+'
+
+test_expect_success 'verify resulting pack' '
+	git verify-pack test-20.pack
+'
+
+test_expect_success '--stdout fails when pack exceeds --max-object-count' '
+	test_must_fail git pack-objects --stdout --max-object-count=1 <obj-list >test-21.pack 2>errs &&
+	test_must_fail git index-pack --strict test-21.pack &&
+	grep -q "object count limit" errs
+'
+
 #
 # WARNING!
 #
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv3 5/9] pack-objects: Teach new option --max-commit-count, limiting #commits in pack
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
                                   ` (3 preceding siblings ...)
  2011-05-15 21:37                 ` [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size Johan Herland
@ 2011-05-15 21:37                 ` Johan Herland
  2011-05-15 21:37                 ` [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities Johan Herland
                                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git

The new --max-commit-count option behaves similarly to --max-object-count
when used together with --stdout: it limits the number of commits in the
pack written to stdout. If the pack would exceed this limit, pack-objects
will abort with an error message.

Unlike --max-pack-size and --max-object-count, --max-commit-count must
always be used together with --stdout, because the commit count is a poor
heuristic for splitting packs: Git does not necessarily distribute commit
objects uniformly across packs.

Documentation and tests are included.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-pack-objects.txt |    8 ++++++++
 builtin/pack-objects.c             |   24 +++++++++++++++++++++---
 t/t5300-pack-object.sh             |   34 ++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 5ab1fe9..21c30c0 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -125,6 +125,14 @@ message if the pack output exceeds the given limit.
 When used together with --stdout, the command will fail with an error
 message if the pack output exceeds the given limit.
 
+--max-commit-count=<n>::
+	This option is only useful together with --stdout.
+	Specifies the maximum number of commits allowed in the created
+	pack. If the number of commits would exceed the given limit,
+	pack-objects will fail with an error message.
+	The number can be suffixed with "k", "m", or "g".
+	The default is unlimited.
+
 --honor-pack-keep::
 	This flag causes an object already in a local pack that
 	has a .keep file to be ignored, even if it would have
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c4fbc54..6c6945e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -61,7 +61,7 @@ struct object_entry {
  */
 static struct object_entry *objects;
 static struct pack_idx_entry **written_list;
-static uint32_t nr_objects, nr_alloc, nr_result, nr_written;
+static uint32_t nr_objects, nr_alloc, nr_result, nr_commits, nr_written;
 
 static int non_empty;
 static int reuse_delta = 1, reuse_object = 1;
@@ -661,8 +661,11 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 	if (ix >= 0) {
 		if (exclude) {
 			entry = objects + object_ix[ix] - 1;
-			if (!entry->preferred_base)
+			if (!entry->preferred_base) {
 				nr_result--;
+				if (entry->type == OBJ_COMMIT)
+					nr_commits--;
+			}
 			entry->preferred_base = 1;
 		}
 		return 0;
@@ -702,8 +705,11 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
 		entry->type = type;
 	if (exclude)
 		entry->preferred_base = 1;
-	else
+	else {
 		nr_result++;
+		if (type == OBJ_COMMIT)
+			nr_commits++;
+	}
 	if (found_pack) {
 		entry->in_pack = found_pack;
 		entry->in_pack_offset = found_offset;
@@ -2137,6 +2143,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	const char **rp_av;
 	int rp_ac_alloc = 64;
 	int rp_ac;
+	unsigned long commit_count_limit = 0;
 
 	read_replace_refs = 0;
 
@@ -2197,6 +2204,11 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 				usage(pack_usage);
 			continue;
 		}
+		if (!prefixcmp(arg, "--max-commit-count=")) {
+			if (!git_parse_ulong(arg+19, &commit_count_limit))
+				usage(pack_usage);
+			continue;
+		}
 		if (!prefixcmp(arg, "--window=")) {
 			char *end;
 			window = strtoul(arg+9, &end, 0);
@@ -2340,6 +2352,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!pack_to_stdout && !object_count_limit)
 		object_count_limit = object_count_limit_cfg;
 
+	if (!pack_to_stdout && commit_count_limit)
+		die("--max-commit-count is only useful together with --stdout.");
+
 	if (!pack_to_stdout && thin)
 		die("--thin cannot be used to build an indexable pack.");
 
@@ -2369,6 +2384,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (pack_to_stdout && object_count_limit && object_count_limit < nr_result)
 		die("unable to make pack within the object count limit"
 			" (%lu objects)", object_count_limit);
+	if (pack_to_stdout && commit_count_limit && commit_count_limit < nr_commits)
+		die("unable to make pack within the commit count limit"
+			" (%lu commits)", commit_count_limit);
 	if (nr_result)
 		prepare_pack(window, depth);
 	write_pack_file();
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index d133c28..919e2da 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -483,6 +483,40 @@ test_expect_success '--stdout fails when pack exceeds --max-object-count' '
 	grep -q "object count limit" errs
 '
 
+test_expect_success 'make a few more commits' '
+	git reset --hard $commit &&
+	echo "change" > file &&
+	git add file &&
+	git commit -m second &&
+	commit2=$(git rev-parse --verify HEAD) &&
+	echo "more change" >> file &&
+	git commit -a -m third &&
+	commit3=$(git rev-parse --verify HEAD) &&
+	echo "even more change" >> file &&
+	git commit -a -m fourth &&
+	commit4=$(git rev-parse --verify HEAD) && {
+		echo $commit &&
+		echo $commit2 &&
+		echo $commit3 &&
+		echo $commit4
+	} >>commit-list
+'
+
+test_expect_success '--stdout works with large enough --max-commit-count' '
+	git pack-objects --revs --stdout --max-commit-count=4 <commit-list >test-22.pack &&
+	git index-pack --strict test-22.pack
+'
+
+test_expect_success 'verify resulting pack' '
+	git verify-pack test-22.pack
+'
+
+test_expect_success '--stdout fails when pack exceeds --max-commit-count' '
+	test_must_fail git pack-objects --revs --stdout --max-commit-count=3 <commit-list >test-23.pack 2>errs &&
+	test_must_fail git index-pack --strict test-23.pack &&
+	grep -q "commit count limit" errs
+'
+
 #
 # WARNING!
 #
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
                                   ` (4 preceding siblings ...)
  2011-05-15 21:37                 ` [PATCHv3 5/9] pack-objects: Teach new option --max-commit-count, limiting #commits in pack Johan Herland
@ 2011-05-15 21:37                 ` Johan Herland
  2011-05-16  6:50                   ` Junio C Hamano
  2011-05-15 21:37                 ` [PATCHv3 7/9] send-pack/receive-pack: Allow server to refuse pushes with too many objects Johan Herland
                                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git

This adds some technical documentation on the 'limit-*' family of
capabilities that will be added in the following commits.

Also refactor the generation of the capability advertisement in
receive-pack; it will be further expanded in the following commits.

Finally, change the return type of server_supports() so that the caller
can examine the found capability more closely, e.g. by calling
server_supports("limit-foo=") and then parsing the value following the
'=' from the returned pointer.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/technical/pack-protocol.txt         |    6 ++--
 Documentation/technical/protocol-capabilities.txt |   22 +++++++++++++++++++++
 builtin/receive-pack.c                            |   16 +++++++++++---
 cache.h                                           |    2 +-
 connect.c                                         |    7 +++--
 5 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 4a68f0f..ddc0d0e 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -167,7 +167,7 @@ MUST peel the ref if it's an annotated tag.
   other-peeled     =  obj-id SP refname "^{}" LF
 
   capability-list  =  capability *(SP capability)
-  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
+  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_" / "=")
   LC_ALPHA         =  %x61-7A
 ----
 
@@ -391,8 +391,8 @@ The reference discovery phase is done nearly the same way as it is in the
 fetching protocol. Each reference obj-id and name on the server is sent
 in packet-line format to the client, followed by a flush-pkt.  The only
 real difference is that the capability listing is different - the only
-possible values are 'report-status', 'delete-refs', 'side-band-64k' and
-'ofs-delta'.
+possible values are 'report-status', 'delete-refs', 'side-band-64k',
+'ofs-delta' and 'limit-*'.
 
 Reference Update Request and Packfile Transfer
 ----------------------------------------------
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index b732e80..11849a3 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -21,6 +21,9 @@ NOT advertise capabilities it does not understand.
 The 'report-status' and 'delete-refs' capabilities are sent and
 recognized by the receive-pack (push to server) process.
 
+Capabilities of the 'limit-*' form may only be sent by the receive-pack
+process; they are never requested by the client.
+
 The 'side-band-64k' and 'ofs-delta' capabilities are sent and
 recognized by both upload-pack and receive-pack protocols.
 
@@ -185,3 +188,22 @@ it is capable of accepting a zero-id value as the target
 value of a reference update.  It is not sent back by the client, it
 simply informs the client that it can be sent zero-id values
 to delete references.
+
+limit-*
+-------
+
+If the server sends one or more capabilities that start with "limit-",
+it means that there are certain limits to what kind of pack the server
+will receive. More specifically, these capabilities must be of the form
+"limit-<what>=<num>" where "<what>" (a sequence of lower-case letters,
+digits and "-") describes which property of the pack is limited, and
+"<num>" (a sequence of decimal digits) specifies the limit value.
+Capabilities of this type are not sent back by the client; instead the
+client must verify that the created packfile does not exceed the given
+limits. This check should happen prior to transferring the packfile to
+the server. If the check fails, the client must abort the upload, and
+report the reason for the aborted push back to the user.
+The following "limit-*" capabilities are recognized:
+
+More "limit-*" capabilities may be added in the future. The client
+is free to ignore any "limit-*" capabilities it does not understand.
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index e1ba4dc..c55989d 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -106,15 +106,23 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+static const char *capabilities(void)
+{
+	static char buf[1024];
+	int ret = snprintf(buf, sizeof(buf),
+			   " report-status delete-refs side-band-64k%s",
+			   prefer_ofs_delta ? " ofs-delta" : "");
+	assert(ret < sizeof(buf));
+	return buf;
+}
+
 static int show_ref(const char *path, const unsigned char *sha1, int flag, void *cb_data)
 {
 	if (sent_capabilities)
 		packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
 	else
-		packet_write(1, "%s %s%c%s%s\n",
-			     sha1_to_hex(sha1), path, 0,
-			     " report-status delete-refs side-band-64k",
-			     prefer_ofs_delta ? " ofs-delta" : "");
+		packet_write(1, "%s %s%c%s\n",
+			     sha1_to_hex(sha1), path, 0, capabilities());
 	sent_capabilities = 1;
 	return 0;
 }
diff --git a/cache.h b/cache.h
index 2b34116..db97097 100644
--- a/cache.h
+++ b/cache.h
@@ -981,7 +981,7 @@ struct extra_have_objects {
 	unsigned char (*array)[20];
 };
 extern struct ref **get_remote_heads(int in, struct ref **list, int nr_match, char **match, unsigned int flags, struct extra_have_objects *);
-extern int server_supports(const char *feature);
+extern const char *server_supports(const char *feature);
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path);
 
diff --git a/connect.c b/connect.c
index 57dc20c..015c570 100644
--- a/connect.c
+++ b/connect.c
@@ -102,10 +102,11 @@ struct ref **get_remote_heads(int in, struct ref **list,
 	return list;
 }
 
-int server_supports(const char *feature)
+const char *server_supports(const char *feature)
 {
-	return server_capabilities &&
-		strstr(server_capabilities, feature) != NULL;
+	if (server_capabilities)
+		return strstr(server_capabilities, feature);
+	return NULL;
 }
 
 int path_match(const char *path, int nr, char **match)
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv3 7/9] send-pack/receive-pack: Allow server to refuse pushes with too many objects
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
                                   ` (5 preceding siblings ...)
  2011-05-15 21:37                 ` [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities Johan Herland
@ 2011-05-15 21:37                 ` Johan Herland
  2011-05-15 21:37                 ` [PATCHv3 8/9] send-pack/receive-pack: Allow server to refuse pushing too large packs Johan Herland
                                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git

Add a new receive.objectCountLimit config variable which defines an upper
limit on the number of objects to accept in a single push.

This limit is advertised to clients using the new "limit-object-count=<num>"
capability. The client side (send-pack) parses this capability and
forwards it to pack-objects, using the recently added --max-object-count
option. pack-objects then checks the generated pack against the limit and
aborts the pack generation if the pack would contain too many objects.

Additionally - for older clients that do not understand the capability - the
server aborts the transfer if the number of objects in the transferred pack
exceeds the limit. This is a suboptimal fallback solution, as it leaves the
client with a broken pipe, and likely a confused user.

Server administrators might want to use this config variable to prevent
unintended large pushes from entering the repo (typically a result of the
user not being aware of exactly what is being pushed, e.g. pushing a large
rewritten history). Note that this config variable is not intended to protect
against DoS attacks, since there are countless other ways to attempt to DoS a
server without violating this limit.

Traditionally, this kind of limit would be imposed by a pre-receive or update
hook, but both of those run _after_ the pack has been received and stored by
receive-pack, so they cannot prevent the pack from being stored on the server.

Documentation and tests are included.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt                          |    9 +++
 Documentation/technical/protocol-capabilities.txt |    2 +
 builtin/receive-pack.c                            |   13 ++++-
 builtin/send-pack.c                               |   10 +++
 send-pack.h                                       |    1 +
 t/t5400-send-pack.sh                              |   66 +++++++++++++++++++++
 6 files changed, 100 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 056e01f..fd6c130 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1599,6 +1599,15 @@ receive.unpackLimit::
 	especially on slow filesystems.  If not set, the value of
 	`transfer.unpackLimit` is used instead.
 
+receive.objectCountLimit::
+	If the number of objects received in a push exceeds this limit,
+	then the entire push will be refused. This is meant to prevent
+	an unintended large push (typically a result of the user not
+	being aware of exactly what is being pushed, e.g. pushing a
+	large rewritten history) from entering the repo. If not set,
+	there is no upper limit on the number of objects transferred
+	in a single push.
+
 receive.denyDeletes::
 	If set to true, git-receive-pack will deny a ref update that deletes
 	the ref. Use this to prevent such a ref deletion via a push.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 11849a3..aac66af 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -205,5 +205,7 @@ the server. If the check fails, the client must abort the upload, and
 report the reason for the aborted push back to the user.
 The following "limit-*" capabilities are recognized:
 
+ - limit-object-count=<num> (Maximum number of objects in a pack)
+
 More "limit-*" capabilities may be added in the future. The client
 is free to ignore any "limit-*" capabilities it does not understand.
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index c55989d..d919e17 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -28,6 +28,7 @@ static int receive_fsck_objects;
 static int receive_unpack_limit = -1;
 static int transfer_unpack_limit = -1;
 static int unpack_limit = 100;
+static unsigned long limit_object_count;
 static int report_status;
 static int use_sideband;
 static int prefer_ofs_delta = 1;
@@ -73,6 +74,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "receive.objectcountlimit") == 0) {
+		limit_object_count = git_config_ulong(var, value);
+		return 0;
+	}
+
 	if (strcmp(var, "receive.fsckobjects") == 0) {
 		receive_fsck_objects = git_config_bool(var, value);
 		return 0;
@@ -112,6 +118,9 @@ static const char *capabilities()
 	int ret = snprintf(buf, sizeof(buf),
 			   " report-status delete-refs side-band-64k%s",
 			   prefer_ofs_delta ? " ofs-delta" : "");
+	if (limit_object_count > 0)
+		ret += snprintf(buf + ret, sizeof(buf) - ret,
+				" limit-object-count=%lu", limit_object_count);
 	assert(ret < sizeof(buf));
 	return buf;
 }
@@ -656,7 +665,9 @@ static const char *unpack(void)
 			"--pack_header=%"PRIu32",%"PRIu32,
 			ntohl(hdr.hdr_version), ntohl(hdr.hdr_entries));
 
-	if (ntohl(hdr.hdr_entries) < unpack_limit) {
+	if (limit_object_count > 0 && ntohl(hdr.hdr_entries) > limit_object_count)
+		return "received pack exceeds configured receive.objectCountLimit";
+	else if (ntohl(hdr.hdr_entries) < unpack_limit) {
 		int code, i = 0;
 		const char *unpacker[4];
 		unpacker[i++] = "unpack-objects";
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index f571917..4103116 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -49,9 +49,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		NULL,
 		NULL,
 		NULL,
+		NULL,
 	};
 	struct child_process po;
 	int i;
+	char buf[40];
 
 	i = 4;
 	if (args->use_thin_pack)
@@ -62,6 +64,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		argv[i++] = "-q";
 	if (args->progress)
 		argv[i++] = "--progress";
+	if (args->max_object_count > 0) {
+		snprintf(buf, sizeof(buf), "--max-object-count=%lu",
+			 args->max_object_count);
+		argv[i++] = buf;
+	}
 	memset(&po, 0, sizeof(po));
 	po.argv = argv;
 	po.in = -1;
@@ -253,6 +260,7 @@ int send_pack(struct send_pack_args *args,
 	unsigned cmds_sent = 0;
 	int ret = 0;
 	struct async demux;
+	const char *p;
 
 	/* Does the other end support the reporting? */
 	if (server_supports("report-status"))
@@ -263,6 +271,8 @@ int send_pack(struct send_pack_args *args,
 		args->use_ofs_delta = 1;
 	if (server_supports("side-band-64k"))
 		use_sideband = 1;
+	if ((p = server_supports("limit-object-count=")))
+		args->max_object_count = strtoul(p + 19, NULL, 10);
 
 	if (!remote_refs) {
 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
diff --git a/send-pack.h b/send-pack.h
index 05d7ab1..eaacb20 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -12,6 +12,7 @@ struct send_pack_args {
 		use_ofs_delta:1,
 		dry_run:1,
 		stateless_rpc:1;
+	unsigned long max_object_count;
 };
 
 int send_pack(struct send_pack_args *args,
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index 0eace37..f1f8f80 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -222,4 +222,70 @@ test_expect_success 'deny pushing to delete current branch' '
 	)
 '
 
+echo "0000" > pkt-flush
+
+test_expect_success 'verify that limit-object-count capability is not advertised by default' '
+	rewound_push_setup &&
+	(
+	    cd parent &&
+	    test_might_fail git receive-pack . <../pkt-flush >output &&
+	    test_must_fail grep -q "limit-object-count" output
+	)
+'
+
+test_expect_success 'verify that receive.objectCountLimit triggers limit-object-count capability' '
+	(
+	    cd parent &&
+	    git config receive.objectCountLimit 1 &&
+	    test_might_fail git receive-pack . <../pkt-flush >output &&
+	    grep -q "limit-object-count=1" output
+	)
+'
+
+test_expect_success 'deny pushing when receive.objectCountLimit is exceeded' '
+	(
+	    cd child &&
+	    git reset --hard origin/master &&
+	    echo three > file && git commit -a -m three &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "object count limit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'repeated push failure proves that objects were not stored remotely' '
+	(
+	    cd child &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "object count limit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'increase receive.objectCountLimit' '
+	(
+	    cd parent &&
+	    git config receive.objectCountLimit 10 &&
+	    test_might_fail git receive-pack . <../pkt-flush >output &&
+	    grep -q "limit-object-count=10" output
+	)
+'
+
+test_expect_success 'push is allowed when object limit is not exceeded' '
+	(
+	    cd child &&
+	    git send-pack ../parent master 2>errs &&
+	    test_must_fail grep -q "object count limit" errs &&
+	    # Also no error message from remote receive-pack
+	    test_must_fail grep -q "receive\\.objectCountLimit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" = "$child_head"
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv3 8/9] send-pack/receive-pack: Allow server to refuse pushing too large packs
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
                                   ` (6 preceding siblings ...)
  2011-05-15 21:37                 ` [PATCHv3 7/9] send-pack/receive-pack: Allow server to refuse pushes with too many objects Johan Herland
@ 2011-05-15 21:37                 ` Johan Herland
  2011-05-15 21:37                 ` [PATCHv3 9/9] send-pack/receive-pack: Allow server to refuse pushes with too many commits Johan Herland
  2011-05-15 21:52                 ` [PATCHv3 0/9] Push limits Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git

Add a new receive.packSizeLimit config variable which defines an upper
limit on the pack size to accept in a single push.

This limit is advertised to clients using the new "limit-pack-size=<num>"
capability. The client side (send-pack) parses this capability and
forwards it to pack-objects, using the --max-pack-size option.
pack-objects then checks the generated pack against the limit and aborts
the pack transmission if the pack becomes too large.

However, older clients that do not understand the capability will not
check their pack against the limit, and will end up pushing the pack to
the server. Currently there is no extra check on the server to detect a
push that exceeds receive.packSizeLimit; if needed, such a check could be
implemented in a pre-receive or update hook.

Documentation and tests are included.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt                          |    9 +++
 Documentation/technical/protocol-capabilities.txt |    1 +
 builtin/receive-pack.c                            |   10 +++-
 builtin/send-pack.c                               |   14 ++++-
 send-pack.h                                       |    1 +
 t/t5400-send-pack.sh                              |   62 +++++++++++++++++++++
 6 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fd6c130..acc1ef2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1599,6 +1599,15 @@ receive.unpackLimit::
 	especially on slow filesystems.  If not set, the value of
 	`transfer.unpackLimit` is used instead.
 
+receive.packSizeLimit::
+	If the pack file transferred in a push exceeds this limit,
+	then the entire push will be refused. This is meant to prevent
+	an unintended large push (typically a result of the user not
+	being aware of exactly what is being pushed, e.g. pushing a
+	large rewritten history) from entering the repo. If not set,
+	there is no upper limit on the size of the pack transferred
+	in a single push.
+
 receive.objectCountLimit::
 	If the number of objects received in a push exceeds this limit,
 	then the entire push will be refused. This is meant to prevent
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index aac66af..d3921a9 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -205,6 +205,7 @@ the server. If the check fails, the client must abort the upload, and
 report the reason for the aborted push back to the user.
 The following "limit-*" capabilities are recognized:
 
+ - limit-pack-size=<num>    (Maximum size (in bytes) of uploaded pack)
  - limit-object-count=<num> (Maximum number of objects in a pack)
 
 More "limit-*" capabilities may be added in the future. The client
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d919e17..97c154b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -28,7 +28,7 @@ static int receive_fsck_objects;
 static int receive_unpack_limit = -1;
 static int transfer_unpack_limit = -1;
 static int unpack_limit = 100;
-static unsigned long limit_object_count;
+static unsigned long limit_pack_size, limit_object_count;
 static int report_status;
 static int use_sideband;
 static int prefer_ofs_delta = 1;
@@ -74,6 +74,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "receive.packsizelimit") == 0) {
+		limit_pack_size = git_config_ulong(var, value);
+		return 0;
+	}
+
 	if (strcmp(var, "receive.objectcountlimit") == 0) {
 		limit_object_count = git_config_ulong(var, value);
 		return 0;
@@ -118,6 +123,9 @@ static const char *capabilities()
 	int ret = snprintf(buf, sizeof(buf),
 			   " report-status delete-refs side-band-64k%s",
 			   prefer_ofs_delta ? " ofs-delta" : "");
+	if (limit_pack_size > 0)
+		ret += snprintf(buf + ret, sizeof(buf) - ret,
+				" limit-pack-size=%lu", limit_pack_size);
 	if (limit_object_count > 0)
 		ret += snprintf(buf + ret, sizeof(buf) - ret,
 				" limit-object-count=%lu", limit_object_count);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 4103116..e7c82ce 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -50,10 +50,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		NULL,
 		NULL,
 		NULL,
+		NULL,
 	};
 	struct child_process po;
 	int i;
-	char buf[40];
+	char buf1[40], buf2[40];
 
 	i = 4;
 	if (args->use_thin_pack)
@@ -64,10 +65,15 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		argv[i++] = "-q";
 	if (args->progress)
 		argv[i++] = "--progress";
+	if (args->max_pack_size > 0) {
+		snprintf(buf1, sizeof(buf1), "--max-pack-size=%lu",
+			 args->max_pack_size);
+		argv[i++] = buf1;
+	}
 	if (args->max_object_count > 0) {
-		snprintf(buf, sizeof(buf), "--max-object-count=%lu",
+		snprintf(buf2, sizeof(buf2), "--max-object-count=%lu",
 			 args->max_object_count);
-		argv[i++] = buf;
+		argv[i++] = buf2;
 	}
 	memset(&po, 0, sizeof(po));
 	po.argv = argv;
@@ -271,6 +277,8 @@ int send_pack(struct send_pack_args *args,
 		args->use_ofs_delta = 1;
 	if (server_supports("side-band-64k"))
 		use_sideband = 1;
+	if ((p = server_supports("limit-pack-size=")))
+		args->max_pack_size = strtoul(p + 16, NULL, 10);
 	if ((p = server_supports("limit-object-count=")))
 		args->max_object_count = strtoul(p + 19, NULL, 10);
 
diff --git a/send-pack.h b/send-pack.h
index eaacb20..1c6c202 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -12,6 +12,7 @@ struct send_pack_args {
 		use_ofs_delta:1,
 		dry_run:1,
 		stateless_rpc:1;
+	unsigned long max_pack_size;
 	unsigned long max_object_count;
 };
 
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index f1f8f80..26fd1b4 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -288,4 +288,66 @@ test_expect_success 'push is allowed when object limit is not exceeded' '
 	test "$parent_head" = "$child_head"
 '
 
+test_expect_success 'verify that limit-pack-size capability is not advertised by default' '
+	rewound_push_setup &&
+	(
+	    cd parent &&
+	    test_might_fail git receive-pack . <../pkt-flush >output &&
+	    test_must_fail grep -q "limit-pack-size" output
+	)
+'
+
+test_expect_success 'verify that receive.packSizeLimit triggers limit-pack-size capability' '
+	(
+	    cd parent &&
+	    git config receive.packSizeLimit 10 &&
+	    test_might_fail git receive-pack . <../pkt-flush >output &&
+	    grep -q "limit-pack-size=10" output
+	)
+'
+
+test_expect_success 'deny pushing when receive.packSizeLimit is exceeded' '
+	(
+	    cd child &&
+	    git reset --hard origin/master &&
+	    echo three > file && git commit -a -m three &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "pack size limit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'repeated push failure proves that objects were not stored remotely' '
+	(
+	    cd child &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "pack size limit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'increase receive.packSizeLimit' '
+	(
+	    cd parent &&
+	    git config receive.packSizeLimit 1000000 &&
+	    test_might_fail git receive-pack . <../pkt-flush >output &&
+	    grep -q "limit-pack-size=1000000" output
+	)
+'
+
+test_expect_success 'push is allowed when pack size is not exceeded' '
+	(
+	    cd child &&
+	    git send-pack ../parent master 2>errs &&
+	    test_must_fail grep -q "pack size limit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" = "$child_head"
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* [PATCHv3 9/9] send-pack/receive-pack: Allow server to refuse pushes with too many commits
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
                                   ` (7 preceding siblings ...)
  2011-05-15 21:37                 ` [PATCHv3 8/9] send-pack/receive-pack: Allow server to refuse pushing too large packs Johan Herland
@ 2011-05-15 21:37                 ` Johan Herland
  2011-05-15 21:52                 ` [PATCHv3 0/9] Push limits Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Johan Herland, git

Add a new receive.commitCountLimit config variable which defines an upper
limit on the number of commits to accept in a single push.

This limit is advertised to clients, using the new "limit-commit-count=<num>"
capability. The client side (i.e. send-pack) parses this capability and
forwards it to pack-objects, using the recently added --max-commit-count
option. pack-objects then checks the generated pack against the limit and
aborts pack generation if the pack would contain too many commits.

However, older clients that do not understand the capability will not check
their pack against the limit, and will end up pushing the pack to the server.
Currently there is no extra check on the server to detect a push that exceeds
receive.commitCountLimit. However, such a check could be done in a pre-receive
or update hook.
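A server-side check along those lines could live in an update hook; the limit and helper names below are illustrative, not part of git:

```shell
#!/bin/sh
# Illustrative update-hook logic: refuse a ref update that would
# introduce more than MAX_COMMITS new commits. (By the time an update
# hook runs, the pack has already been received and stored.)
MAX_COMMITS=100

# Count the commits a ref update introduces.
count_new_commits() {
	oldrev=$1 newrev=$2
	if expr "$oldrev" : '0*$' >/dev/null
	then
		# ref creation: count commits not reachable from any ref
		git rev-list --count "$newrev" --not --all
	else
		git rev-list --count "$oldrev..$newrev"
	fi
}

# Succeeds when the count is within the limit.
within_limit() {
	[ "$1" -le "$MAX_COMMITS" ]
}

# A real hooks/update script would end with something like:
#   n=$(count_new_commits "$2" "$3") || exit 1
#   within_limit "$n" || {
#       echo >&2 "refusing $1: $n commits exceeds limit of $MAX_COMMITS"
#       exit 1
#   }
```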

Documentation and tests are included.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/config.txt                          |    9 +++
 Documentation/technical/protocol-capabilities.txt |    1 +
 builtin/receive-pack.c                            |   10 +++-
 builtin/send-pack.c                               |   10 +++-
 send-pack.h                                       |    1 +
 t/t5400-send-pack.sh                              |   63 +++++++++++++++++++++
 6 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index acc1ef2..c63a224 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1617,6 +1617,15 @@ receive.objectCountLimit::
 	there is no upper limit on the number of objects transferred
 	in a single push.
 
+receive.commitCountLimit::
+	If the number of commits received in a push exceeds this limit,
+	then the entire push will be refused. This is meant to prevent
+	an unintended large push (typically a result of the user not
+	being aware of exactly what is being pushed, e.g. pushing a
+	large rewritten history) from entering the repo. If not set,
+	there is no upper limit on the number of commits transferred
+	in a single push.
+
 receive.denyDeletes::
 	If set to true, git-receive-pack will deny a ref update that deletes
 	the ref. Use this to prevent such a ref deletion via a push.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index d3921a9..c16240b 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -207,6 +207,7 @@ The following "limit-*" capabilites are recognized:
 
  - limit-pack-size=<num>    (Maximum size (in bytes) of uploaded pack)
  - limit-object-count=<num> (Maximum number of objects in a pack)
+ - limit-commit-count=<num> (Maximum number of commits in a pack)
 
 More "limit-*" capabilities may be added in the future. The client
 is free to ignore any "limit-*" capabilities it does not understand.
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 97c154b..1d1170b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -28,7 +28,7 @@ static int receive_fsck_objects;
 static int receive_unpack_limit = -1;
 static int transfer_unpack_limit = -1;
 static int unpack_limit = 100;
-static unsigned long limit_pack_size, limit_object_count;
+static unsigned long limit_pack_size, limit_object_count, limit_commit_count;
 static int report_status;
 static int use_sideband;
 static int prefer_ofs_delta = 1;
@@ -84,6 +84,11 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (strcmp(var, "receive.commitcountlimit") == 0) {
+		limit_commit_count = git_config_ulong(var, value);
+		return 0;
+	}
+
 	if (strcmp(var, "receive.fsckobjects") == 0) {
 		receive_fsck_objects = git_config_bool(var, value);
 		return 0;
@@ -129,6 +134,9 @@ static const char *capabilities()
 	if (limit_object_count > 0)
 		ret += snprintf(buf + ret, sizeof(buf) - ret,
 				" limit-object-count=%lu", limit_object_count);
+	if (limit_commit_count > 0)
+		ret += snprintf(buf + ret, sizeof(buf) - ret,
+				" limit-commit-count=%lu", limit_commit_count);
 	assert(ret < sizeof(buf));
 	return buf;
 }
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index e7c82ce..0ef68ea 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -51,10 +51,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		NULL,
 		NULL,
 		NULL,
+		NULL,
 	};
 	struct child_process po;
 	int i;
-	char buf1[40], buf2[40];
+	char buf1[40], buf2[40], buf3[40];
 
 	i = 4;
 	if (args->use_thin_pack)
@@ -75,6 +76,11 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 			 args->max_object_count);
 		argv[i++] = buf2;
 	}
+	if (args->max_commit_count > 0) {
+		snprintf(buf3, sizeof(buf3), "--max-commit-count=%lu",
+			 args->max_commit_count);
+		argv[i++] = buf3;
+	}
 	memset(&po, 0, sizeof(po));
 	po.argv = argv;
 	po.in = -1;
@@ -281,6 +287,8 @@ int send_pack(struct send_pack_args *args,
 		args->max_pack_size = strtoul(p + 16, NULL, 10);
 	if ((p = server_supports("limit-object-count=")))
 		args->max_object_count = strtoul(p + 19, NULL, 10);
+	if ((p = server_supports("limit-commit-count=")))
+		args->max_commit_count = strtoul(p + 19, NULL, 10);
 
 	if (!remote_refs) {
 		fprintf(stderr, "No refs in common and none specified; doing nothing.\n"
diff --git a/send-pack.h b/send-pack.h
index 1c6c202..9d28cc5 100644
--- a/send-pack.h
+++ b/send-pack.h
@@ -14,6 +14,7 @@ struct send_pack_args {
 		stateless_rpc:1;
 	unsigned long max_pack_size;
 	unsigned long max_object_count;
+	unsigned long max_commit_count;
 };
 
 int send_pack(struct send_pack_args *args,
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index 26fd1b4..a71bc47 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -350,4 +350,67 @@ test_expect_success 'push is allowed when pack size is not exceeded' '
 	test "$parent_head" = "$child_head"
 '
 
+test_expect_success 'verify that limit-commit-count capability is not advertised by default' '
+	rewound_push_setup &&
+	(
+	    cd parent &&
+	    test_might_fail git receive-pack . <../pkt-flush >output &&
+	    test_must_fail grep -q "limit-commit-count" output
+	)
+'
+
+test_expect_success 'verify that receive.commitCountLimit triggers limit-commit-count capability' '
+	(
+	    cd parent &&
+	    git config receive.commitCountLimit 1 &&
+	    test_might_fail git receive-pack . <../pkt-flush >output &&
+	    grep -q "limit-commit-count=1" output
+	)
+'
+
+test_expect_success 'deny pushing when receive.commitCountLimit is exceeded' '
+	(
+	    cd child &&
+	    git reset --hard origin/master &&
+	    echo three > file && git commit -a -m three &&
+	    echo four > file && git commit -a -m four &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "commit count limit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'repeated push failure proves that objects were not stored remotely' '
+	(
+	    cd child &&
+	    test_must_fail git send-pack ../parent master 2>errs &&
+	    grep -q "commit count limit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" != "$child_head"
+'
+
+test_expect_success 'increase receive.commitCountLimit' '
+	(
+	    cd parent &&
+	    git config receive.commitCountLimit 2 &&
+	    test_might_fail git receive-pack . <../pkt-flush >output &&
+	    grep -q "limit-commit-count=2" output
+	)
+'
+
+test_expect_success 'push is allowed when commit limit is not exceeded' '
+	(
+	    cd child &&
+	    git send-pack ../parent master 2>errs &&
+	    test_must_fail grep -q "commit count limit" errs
+	) &&
+	parent_head=$(cd parent && git rev-parse --verify master) &&
+	child_head=$(cd child && git rev-parse --verify master) &&
+	test "$parent_head" = "$child_head"
+'
+
 test_done
-- 
1.7.5.rc1.3.g4d7b


* Re: [PATCHv3 0/9] Push limits
  2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
                                   ` (8 preceding siblings ...)
  2011-05-15 21:37                 ` [PATCHv3 9/9] send-pack/receive-pack: Allow server to refuse pushes with too many commits Johan Herland
@ 2011-05-15 21:52                 ` Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 53+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-05-15 21:52 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Jeff King, Johannes Sixt

On Sun, May 15, 2011 at 23:37, Johan Herland <johan@herland.net> wrote:
> Here's the next iteration of what started out as a simple patch to let
> the server refuse pushes that exceeded a configurable limit on the
> #objects in a pack.
>
> The current patch series allows limiting pushes by
>  - size of pack
>  - #objects in pack
>  - #commits in pack

FWIW I'd find this very useful. I recently spent a fair amount of time
cleaning up the mess created by a user pushing all the tags from
repository A to repository B (don't ask), the options you've
implemented here would have stopped that.

And as you point out, even if you refuse this sort of thing with a
hook (which I later implemented) that doesn't stop the server from
accepting the objects and keeping them around.


* Re: [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
  2011-05-15 21:37                 ` [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
@ 2011-05-15 22:06                   ` Shawn Pearce
  2011-05-16  1:39                     ` Johan Herland
  2011-05-16  6:12                   ` Junio C Hamano
  1 sibling, 1 reply; 53+ messages in thread
From: Shawn Pearce @ 2011-05-15 22:06 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git

On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> Currently we refuse combining --max-pack-size with --stdout since there's
> no way to make multiple packs when the pack is written to stdout. However,
> we want to be able to limit the maximum size of the pack created by
> --stdout (and abort pack-objects if we are unable to meet that limit).
>
> Therefore, when used together with --stdout, we reinterpret --max-pack-size
> to indicate the maximum pack size which - if exceeded - will cause
> pack-objects to abort with an error message.
...
>                if (pack_to_stdout) {
> +                       if (nr_written != nr_remaining)
> +                               die("unable to make pack within the pack size"
> +                                   " limit (%lu bytes)", pack_size_limit);

I think this is too late. We have already output a bunch of data, up
to the size limit at this point. If the size limit is non-trivial
(e.g. 5 MB) we have already sent most of that to the remote side, and
it's already written some of that out to disk.

I'd like this to be a soft limit derived from the reused object sizes.
When planning the pack by looking at where we will reuse an object
from, sum those sizes. If the sum of these sizes would break this
limit, then we abort before even writing the pack header out.

-- 
Shawn.


* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
  2011-05-15 21:37                 ` [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size Johan Herland
@ 2011-05-15 22:07                   ` Shawn Pearce
  2011-05-15 22:31                     ` Johan Herland
  2011-05-16  6:25                     ` Junio C Hamano
  0 siblings, 2 replies; 53+ messages in thread
From: Shawn Pearce @ 2011-05-15 22:07 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, git

On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> The new --max-object-count option behaves similarly to --max-pack-size,
> except that the decision to split packs is determined by the number of
> objects in the pack, and not by the size of the pack.

Like my note about pack size for this case... I think doing this
during writing is too late. We should be aborting the counting phase
if the output pack is to stdout and we are going to exceed this limit.

-- 
Shawn.


* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
  2011-05-15 22:07                   ` Shawn Pearce
@ 2011-05-15 22:31                     ` Johan Herland
  2011-05-15 23:48                       ` Shawn Pearce
  2011-05-16  6:25                     ` Junio C Hamano
  1 sibling, 1 reply; 53+ messages in thread
From: Johan Herland @ 2011-05-15 22:31 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git, Junio C Hamano

On Monday 16 May 2011, Shawn Pearce wrote:
> On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> > The new --max-object-count option behaves similarly to --max-pack-size,
> > except that the decision to split packs is determined by the number of
> > objects in the pack, and not by the size of the pack.
> 
> Like my note about pack size for this case... I think doing this
> during writing is too late. We should be aborting the counting phase
> if the output pack is to stdout and we are going to exceed this limit.

The patch actually does this in the --stdout case. Look at the last
hunk in builtin/pack-objects.c:

@@ -2349,6 +2366,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
        if (non_empty && !nr_result)
                return 0;
+       if (pack_to_stdout && object_count_limit && object_count_limit < nr_result)
+               die("unable to make pack within the object count limit"
+                       " (%lu objects)", object_count_limit);
        if (nr_result)
                prepare_pack(window, depth);
        write_pack_file();

So in the --stdout case, we have already aborted before we start
writing the pack (i.e. after the counting phase).

The commit message you quote above is for the case where someone uses
--max-object-count _without_ --stdout, in which case we compare
nr_written to object_count_limit to determine when to split the pack.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
  2011-05-15 22:31                     ` Johan Herland
@ 2011-05-15 23:48                       ` Shawn Pearce
  0 siblings, 0 replies; 53+ messages in thread
From: Shawn Pearce @ 2011-05-15 23:48 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Junio C Hamano

On Sun, May 15, 2011 at 15:31, Johan Herland <johan@herland.net> wrote:
> On Monday 16 May 2011, Shawn Pearce wrote:
>> On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
>> > The new --max-object-count option behaves similarly to --max-pack-size,
>> > except that the decision to split packs is determined by the number of
>> > objects in the pack, and not by the size of the pack.
>>
>> Like my note about pack size for this case... I think doing this
>> during writing is too late. We should be aborting the counting phase
>> if the output pack is to stdout and we are going to exceed this limit.
>
> The patch actually does this in the --stdout case. Look at the last
> hunk in builtin/pack-objects.c:
>
> @@ -2349,6 +2366,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>
>        if (non_empty && !nr_result)
>                return 0;
> +       if (pack_to_stdout && object_count_limit && object_count_limit < nr_result)
> +               die("unable to make pack within the object count limit"
> +                       " (%lu objects)", object_count_limit);
>        if (nr_result)
>                prepare_pack(window, depth);
>        write_pack_file();
>
> So in the --stdout case, we have already aborted before we start
> writing the pack (i.e. after the counting phase).
>
> The commit message you quote above, are for the case where someone uses
> --max-object-count _without_ --stdout, in which case we compare
> nr_written to object_count_limit to determine when to split the pack.

Thanks for the clarification. It's Sunday, I am clearly not scanning
patches with the level of detail I should be.  :-)

Given that this block is in here, most of the series looks pretty good
to me. Thanks for following up with this round; I know it's a lot more
than you originally wanted to do for this "simple" limit, but I think
it's a worthwhile improvement.

-- 
Shawn.


* Re: [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
  2011-05-15 22:06                   ` Shawn Pearce
@ 2011-05-16  1:39                     ` Johan Herland
  0 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-16  1:39 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git, Junio C Hamano

On Monday 16 May 2011, Shawn Pearce wrote:
> On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> > Currently we refuse combining --max-pack-size with --stdout since
> > there's no way to make multiple packs when the pack is written to
> > stdout. However, we want to be able to limit the maximum size of the
> > pack created by --stdout (and abort pack-objects if we are unable to
> > meet that limit).
> > 
> > Therefore, when used together with --stdout, we reinterpret
> > --max-pack-size to indicate the maximum pack size which - if exceeded
> > - will cause pack-objects to abort with an error message.
> 
> ...
> 
> >                if (pack_to_stdout) {
> > +                       if (nr_written != nr_remaining)
> > +                               die("unable to make pack within the pack size"
> > +                                   " limit (%lu bytes)", pack_size_limit);
> 
> I think this is too late. We have already output a bunch of data, up
> to the size limit at this point. If the size limit is non-trivial
> (e.g. 5 MB) we have already sent most of that to the remote side, and
> its already written some of that out to disk.
> 
> I'd like this to be a soft limit derived from the reused object sizes.
> When planning the pack by looking at where we will reuse an object
> from, sum those sizes. If the sum of these sizes would break this
> limit, then we abort before even writing the pack header out.

I agree, but it's currently late Sunday (early Monday), and after
looking at this for a while, I'm no longer thinking straight.
If someone who groks the pack-objects internals could help out, I'd be
really grateful. AFAICS, we need to drill into prepare_pack() to find
the details needed to estimate the total pack size, but I don't know
exactly which data structure(s) holds the data needed. We probably need
to accumulate a pack size estimate in find_deltas(), and then sum those
across the threads, before we finally compare the total estimate to
pack_size_limit prior to calling write_pack_file().


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails
  2011-05-15 21:37                 ` [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails Johan Herland
@ 2011-05-16  4:07                   ` Jeff King
  2011-05-16  6:13                     ` Jeff King
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-16  4:07 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt

On Sun, May 15, 2011 at 11:37:13PM +0200, Johan Herland wrote:

> I first wrote this patch on a base where e07fd15 (Peff's "send-pack:
> unbreak push over stateless rpc") was not present, and then resolved
> a conflict when rebasing this patch onto current master. I hope Peff
> or Johannes (Sixt) can verify that my patch does not reintroduce the
> deadlock they fixed.

I don't think it reintroduces the deadlock we fixed, but I am worried
that it produces a new, similar one. That is, imagine pack-objects fails
horribly, maybe or maybe not producing any output. We close its outgoing
pipe at the end of run_command, and per Johannes' 09c9957, we
are sure that the sideband demuxer does not hold a pipe end open,
either.

So the remote side sees us close our end of the pipe, knows there is no
more pack data, and then closes their end. So our receive_status should
either get some error message, or EOF, either of which is fine. So no
deadlock there. Essentially, we have done a half-duplex shutdown of the
connection to the remote, and that is enough for everybody to keep
going.

But what if we are not using pipes, but have an actual TCP socket? In
that case, I'm not sure what happens. We don't seem to do a half-duplex
shutdown() anywhere. So I'm concerned that we are still open for sending
from the remote's perspective, and we may deadlock.

However, that would not necessarily be something introduced by your
patch; you would deadlock in receive_status, but prior to that it would
deadlock in the sideband demuxer.

AFAICT, the only way to have an actual TCP connection instead of pipes
is for the push to go over git://, which is enabled almost nowhere. But
we should perhaps check for deadlock on failed pack-objects in that
case, both with and without your patch.

-Peff


* Re: [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
  2011-05-15 21:37                 ` [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
  2011-05-15 22:06                   ` Shawn Pearce
@ 2011-05-16  6:12                   ` Junio C Hamano
  2011-05-16  9:27                     ` Johan Herland
  1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16  6:12 UTC (permalink / raw)
  To: Johan Herland; +Cc: Shawn Pearce, git

Johan Herland <johan@herland.net> writes:

> Currently we refuse combining --max-pack-size with --stdout since there's
> no way to make multiple packs when the pack is written to stdout. However,
> we want to be able to limit the maximum size of the pack created by
> --stdout (and abort pack-objects if we are unable to meet that limit).
>
> Therefore, when used together with --stdout, we reinterpret --max-pack-size
> to indicate the maximum pack size which - if exceeded - will cause
> pack-objects to abort with an error message.

I only gave the code a cursory look, but I think your patch does more than
the above paragraphs say. I am not sure those extra changes are justified.

For example,

> @@ -229,7 +229,7 @@ static unsigned long write_object(struct sha1file *f,
>  
>  	if (!entry->delta)
>  		usable_delta = 0;	/* no delta */
> -	else if (!pack_size_limit)
> +	else if (!pack_size_limit || pack_to_stdout)
>  	       usable_delta = 1;	/* unlimited packfile */

Why does this conditional have to change its behaviour when writing to the
standard output?  I thought that the only thing you are doing "earlier we
didn't allow setting size limit when writing to standard output, now we
do", and I do not see the linkage between that objective and this change.

> @@ -2315,9 +2318,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  
>  	if (!pack_to_stdout && !pack_size_limit)
>  		pack_size_limit = pack_size_limit_cfg;
> -	if (pack_to_stdout && pack_size_limit)
> -		die("--max-pack-size cannot be used to build a pack for transfer.");
> -	if (pack_size_limit && pack_size_limit < 1024*1024) {
> +	if (!pack_to_stdout && pack_size_limit && pack_size_limit < 1024*1024) {
>  		warning("minimum pack size limit is 1 MiB");
>  		pack_size_limit = 1024*1024;
>  	}

Why does the new combination "writing to the standard output, but the
maximum size is limited" not have the same lower bound on the pack size
limit that on-disk packs do?

If you have a reason to believe 1 MiB is too large for a pack size limit,
shouldn't that logic apply equally to the on-disk case?  What does this
change have to do with the interaction with the --stdout option?


* Re: [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails
  2011-05-16  4:07                   ` Jeff King
@ 2011-05-16  6:13                     ` Jeff King
  2011-05-16  6:39                       ` Jeff King
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-16  6:13 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt

On Mon, May 16, 2011 at 12:07:45AM -0400, Jeff King wrote:

> But what if we are not using pipes, but have an actual TCP socket? In
> that case, I'm not sure what happens. We don't seem to do a half-duplex
> shutdown() anywhere. So I'm concerned that we are still open for sending
> from the remote's perspective, and we may deadlock.
> 
> However, that would not necessarily be something introduced by your
> patch; you would deadlock in receive_status, but prior to that it would
> deadlock in the sideband demuxer.
> 
> AFAICT, the only way to have an actual TCP connection instead of pipes
> is for the push to go over git://, which is enabled almost nowhere. But
> we should perhaps check for deadlock on failed pack-objects in that
> case, both with and without your patch.

Ugh, yeah, yet another deadlock. I can reproduce reliably with this:

  [in one terminal]
  mkdir daemon &&
  git init --bare daemon/repo.git &&
  git --git-dir=daemon/repo.git config daemon.receivepack true &&
  git daemon --base-path=$PWD/daemon --export-all --verbose

  [in another]
  git init repo &&
  cd repo &&
  git remote add origin git://localhost/repo.git &&
  echo content >file && git add file && git commit -a -m one &&
  git push -f origin HEAD &&
  echo content >>file && git commit -a -m two &&
  sha1=`git rev-parse HEAD:file` &&
  file=`echo $sha1 | sed 's,..,&/,'` &&
  rm -fv .git/objects/$file &&
  git push

and this patch fixes it:

diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index e2f4e21..b9da044 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -345,6 +345,13 @@ int send_pack(struct send_pack_args *args,
 				ref->status = REF_STATUS_NONE;
 			if (args->stateless_rpc)
 				close(out);
+			/* in case we actually have a full-duplex socket
+			 * and not two pipes; we can't use "out" because
+			 * it has been closed already, but in the full-duplex
+			 * case, "in" and "out" are merely dups of each other.
+			 * We can't directly use "in" because it may be
+			 * pointing to the sideband demuxer now */
+			shutdown(fd[0], SHUT_WR);
 			if (use_sideband)
 				finish_async(&demux);
 			return -1;

It does call shutdown() on a non-socket in the pipe case. That should be
a harmless noop, AFAIK.

-Peff


* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
  2011-05-15 22:07                   ` Shawn Pearce
  2011-05-15 22:31                     ` Johan Herland
@ 2011-05-16  6:25                     ` Junio C Hamano
  2011-05-16  9:49                       ` Johan Herland
  1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16  6:25 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, Shawn Pearce

Shawn Pearce <spearce@spearce.org> writes:

> On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
>> The new --max-object-count option behaves similarly to --max-pack-size,
>> except that the decision to split packs is determined by the number of
>> objects in the pack, and not by the size of the pack.
>
> Like my note about pack size for this case... I think doing this
> during writing is too late. We should be aborting the counting phase
> if the output pack is to stdout and we are going to exceed this limit.

Well, even more important is if this is even useful. What is the user
trying to prevent from happening, and is it a useful thing?

I am not interested in a literal answer "The user is trying to prevent a
push that pushes too many objects in a single push into a repository". I
am questioning why anybody even cares about the object count per se.

I think "do not hog too much disk" (i.e. size) is an understandable wish,
and max-pack-size imposed on --stdout would be a good approximation for
that.

I would understand "this project has only these files, and pushing a tree
that has 100x leaves than that may be a mistake" (i.e. recursive sum of
number of entries of an individual tree). I would also sort-of understand
"do not push too deep a history at once" (i.e. we do not welcome pushing a
wildly diverged fork that has been allowed to grow for too long).

But I do not think max-object-count is a good approximation for either
to be useful.

Without a good answer to the above question, this looks like a "because we
could", not "because it is useful", feature.


* Re: [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails
  2011-05-16  6:13                     ` Jeff King
@ 2011-05-16  6:39                       ` Jeff King
  2011-05-16  6:46                         ` [PATCH 1/3] connect: treat generic proxy processes like ssh processes Jeff King
                                           ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Jeff King @ 2011-05-16  6:39 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt

On Mon, May 16, 2011 at 02:13:54AM -0400, Jeff King wrote:

> and this patch fixes it:
> 
> diff --git a/builtin/send-pack.c b/builtin/send-pack.c
> index e2f4e21..b9da044 100644
> --- a/builtin/send-pack.c
> +++ b/builtin/send-pack.c
> @@ -345,6 +345,13 @@ int send_pack(struct send_pack_args *args,
>  				ref->status = REF_STATUS_NONE;
>  			if (args->stateless_rpc)
>  				close(out);
> +			/* in case we actually have a full-duplex socket
> +			 * and not two pipes; we can't use "out" because
> +			 * it has been closed already, but in the full-duplex
> +			 * case, "in" and "out" are merely dups of each other.
> +			 * We can't directly use "in" because it may be
> +			 * pointing to the sideband demuxer now */
> +			shutdown(fd[0], SHUT_WR);
>  			if (use_sideband)
>  				finish_async(&demux);
>  			return -1;
> 
> It does call shutdown() on a non-socket in the pipe case. That should be
> a harmless noop, AFAIK.

If we do care (or if we just want to be cleaner), this patch series also
works (and goes on top of the same deadlock topic, i.e., e07fd15):

  [1/3]: connect: treat generic proxy processes like ssh processes
  [2/3]: connect: let callers know if connection is a socket
  [3/3]: send-pack: avoid deadlock on git:// push with failed pack-objects

Another approach would be to actually spawn a pipe-based helper for tcp
connections (sort of a "git netcat"). That would mean all git-protocol
connections would get the same descriptor semantics, in case any other
bugs are lurking. I'm not sure if the ugliness (extra process to manage)
and decreased efficiency (pointlessly proxying data through an extra set
of pipes) are worth it. The only thing which makes me not reject it out
of hand is that it is already how git-over-ssh works (and not unlike
git-over-http), so the extra process and inefficiency are probably not
_that_ big a deal. It just feels ugly. I wish there were a portable way
to split a full-duplex socket into two half-duplex halves, but AFAIK,
that is not possible.

-Peff

* [PATCH 1/3] connect: treat generic proxy processes like ssh processes
  2011-05-16  6:39                       ` Jeff King
@ 2011-05-16  6:46                         ` Jeff King
  2011-05-16 19:57                           ` Johannes Sixt
  2011-05-16  6:52                         ` [PATCH 2/3] connect: let callers know if connection is a socket Jeff King
  2011-05-16  6:52                         ` [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects Jeff King
  2 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-16  6:46 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt

The git_connect function returns two ends of a pipe for
talking with a remote, plus a struct child_process
representing the other end of the pipe. If we have a direct
socket connection, then this points to a special "no_fork"
child process.

The code path for doing git-over-pipes or git-over-ssh sets
up this child process to point to the child git command or
the ssh process. When we call finish_connect eventually, we
check wait() on the command and report its return value.

The code path for git://, on the other hand, always sets it
to no_fork. In the case of a direct TCP connection, this
makes sense; we have no child process. But in the case of a
proxy command (configured by core.gitproxy), we do have a
child process, but we throw away its pid, and therefore
ignore its return code.

Instead, let's keep that information in the proxy case, and
respect its return code, which can help catch some errors
(though depending on your proxy command, it will be errors
reported by the proxy command itself, and not propagated
from git commands. Still, it is probably better to propagate
such errors than to ignore them).

It also means that the child_process field can reliably be
used to determine whether the returned descriptors are
actually a full-duplex socket, which means we should be
using shutdown() instead of a simple close.

Signed-off-by: Jeff King <peff@peff.net>
---
Obviously I am interested mainly in the last bit for this series. But I
consider the rest of it a minor bugfix on its own.

 connect.c |   27 +++++++++++++++------------
 1 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/connect.c b/connect.c
index db965c9..86ad150 100644
--- a/connect.c
+++ b/connect.c
@@ -403,12 +403,12 @@ static int git_use_proxy(const char *host)
 	return (git_proxy_command && *git_proxy_command);
 }
 
-static void git_proxy_connect(int fd[2], char *host)
+static struct child_process *git_proxy_connect(int fd[2], char *host)
 {
 	const char *port = STR(DEFAULT_GIT_PORT);
 	char *colon, *end;
 	const char *argv[4];
-	struct child_process proxy;
+	struct child_process *proxy;
 
 	if (host[0] == '[') {
 		end = strchr(host + 1, ']');
@@ -431,14 +431,15 @@ static void git_proxy_connect(int fd[2], char *host)
 	argv[1] = host;
 	argv[2] = port;
 	argv[3] = NULL;
-	memset(&proxy, 0, sizeof(proxy));
-	proxy.argv = argv;
-	proxy.in = -1;
-	proxy.out = -1;
-	if (start_command(&proxy))
+	proxy = xcalloc(1, sizeof(*proxy));
+	proxy->argv = argv;
+	proxy->in = -1;
+	proxy->out = -1;
+	if (start_command(proxy))
 		die("cannot start proxy %s", argv[0]);
-	fd[0] = proxy.out; /* read from proxy stdout */
-	fd[1] = proxy.in;  /* write to proxy stdin */
+	fd[0] = proxy->out; /* read from proxy stdout */
+	fd[1] = proxy->in;  /* write to proxy stdin */
+	return proxy;
 }
 
 #define MAX_CMD_LEN 1024
@@ -553,9 +554,11 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 		 */
 		char *target_host = xstrdup(host);
 		if (git_use_proxy(host))
-			git_proxy_connect(fd, host);
-		else
+			conn = git_proxy_connect(fd, host);
+		else {
 			git_tcp_connect(fd, host, flags);
+			conn = &no_fork;
+		}
 		/*
 		 * Separate original protocol components prog and path
 		 * from extended host header with a NUL byte.
@@ -571,7 +574,7 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 		free(url);
 		if (free_path)
 			free(path);
-		return &no_fork;
+		return conn;
 	}
 
 	conn = xcalloc(1, sizeof(*conn));
-- 
1.7.5.1.13.g0ca09.dirty

* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
  2011-05-15 21:37                 ` [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities Johan Herland
@ 2011-05-16  6:50                   ` Junio C Hamano
  2011-05-16  9:53                     ` Johan Herland
  2011-05-16 22:02                     ` Sverre Rabbelier
  0 siblings, 2 replies; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16  6:50 UTC (permalink / raw)
  To: Johan Herland; +Cc: Shawn Pearce, git

Johan Herland <johan@herland.net> writes:

> +const char *server_supports(const char *feature)
>  {
> -	return server_capabilities &&
> -		strstr(server_capabilities, feature) != NULL;
> +	if (server_capabilities)
> +		return strstr(server_capabilities, feature);
> +	return NULL;
>  }

I've been meaning to fix this part, but currently the feature set is given
as a space-separated list " featurea featureb featurec" and we check with a
token without any space around, e.g. "if (server_supports("no-done"))",
which is quite broken.

We should tighten this strstr() to make sure we are not matching in the
middle of a string, and the need to do so is even greater now that you are
going to introduce "foo=<value>" and the value could even be strings in
the future.

How about implementing rules like these:

 - feature must appear at the beginning of server_capabilities, or the
   byte immediately before the matched location in server_capabilities
   must be a SP; and

 - if "feature" does not end with an equal sign, it does not expect a
   value. The byte after the matched location in server_capabilities must
   be either the end of string or a SP. A feature that expects a value is
   checked with 'server_supports("feature=")' and the matched location in
   server_capabilities can be followed by anything (i.e. if at the end of
   string or a SP, it gets an empty string as the value, and otherwise it
   will get the stretch of bytes after the '=' up to the next SP).

Given the server_capabilities string "foo=ab bar=froboz boz=nitfol",
I would like to see these happen:

  server_supports("foo=") matches "foo=ab";

  server_supports("ab") does not match anything;

  server_supports("bar") does not match anything;

  server_supports("boz") matches boz=nitfol, without failing at
                         the end of bar=froboz that comes earlier.

* [PATCH 2/3] connect: let callers know if connection is a socket
  2011-05-16  6:39                       ` Jeff King
  2011-05-16  6:46                         ` [PATCH 1/3] connect: treat generic proxy processes like ssh processes Jeff King
@ 2011-05-16  6:52                         ` Jeff King
  2011-05-16  6:52                         ` [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects Jeff King
  2 siblings, 0 replies; 53+ messages in thread
From: Jeff King @ 2011-05-16  6:52 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt

They might care because they want to do a half-duplex close.
With pipes, that means simply closing the output descriptor;
with a socket, you must actually call shutdown.

Instead of exposing the magic no_fork child_process struct,
let's encapsulate the test in a function.

Signed-off-by: Jeff King <peff@peff.net>
---
A more object-oriented refactoring would be something like:

  struct git_connection {
    struct child_process child;
    int in;
    int out;
  };

  int git_connection_read_fd(struct git_connection *c)
  {
          return c->in;
  }

  int git_connection_write_fd(struct git_connection *c)
  {
          return c->child.pid ? c->out : c->in;
  }

  void git_connection_half_duplex_close(struct git_connection *c)
  {
          if (!c->child.pid)
                  shutdown(c->in, SHUT_WR);
          else
                  close(c->out);
  }

but the idea that a git connection is defined by two file descriptors
runs throughout the code (in fact, we don't even explicitly do the
half-duplex close in the pipe case; we hand the descriptor off to the
pack-objects run-command, which takes ownership). So trying to be fancy
and abstracted is not worth it in this case.

 cache.h   |    1 +
 connect.c |    7 ++++++-
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/cache.h b/cache.h
index bf468e5..724aad4 100644
--- a/cache.h
+++ b/cache.h
@@ -865,6 +865,7 @@ extern struct ref *find_ref_by_name(const struct ref *list, const char *name);
 #define CONNECT_VERBOSE       (1u << 0)
 extern struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
 extern int finish_connect(struct child_process *conn);
+extern int git_connection_is_socket(struct child_process *conn);
 extern int path_match(const char *path, int nr, char **match);
 struct extra_have_objects {
 	int nr, alloc;
diff --git a/connect.c b/connect.c
index 86ad150..20a008d 100644
--- a/connect.c
+++ b/connect.c
@@ -634,10 +634,15 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	return conn;
 }
 
+int git_connection_is_socket(struct child_process *conn)
+{
+	return conn == &no_fork;
+}
+
 int finish_connect(struct child_process *conn)
 {
 	int code;
-	if (!conn || conn == &no_fork)
+	if (!conn || git_connection_is_socket(conn))
 		return 0;
 
 	code = finish_command(conn);
-- 
1.7.5.1.13.g0ca09.dirty

* [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects
  2011-05-16  6:39                       ` Jeff King
  2011-05-16  6:46                         ` [PATCH 1/3] connect: treat generic proxy processes like ssh processes Jeff King
  2011-05-16  6:52                         ` [PATCH 2/3] connect: let callers know if connection is a socket Jeff King
@ 2011-05-16  6:52                         ` Jeff King
  2011-05-16 20:02                           ` Johannes Sixt
  2 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-16  6:52 UTC (permalink / raw)
  To: Johan Herland; +Cc: Junio C Hamano, Shawn Pearce, git, Johannes Sixt

Commit 09c9957c fixes a deadlock in which pack-objects
fails, the remote end is still waiting for pack data, and we
are still waiting for the remote end to say something (see
that commit for a much more in-depth explanation).

We solved the problem there by making sure the output pipe
is closed on error; thus the remote sees EOF, and proceeds
to complain and close its end of the connection.

However, in the special case of push over git://, we don't
have a pipe, but rather a full-duplex socket, with another
dup()-ed descriptor in place of the second half of the pipe.
In this case, closing the second descriptor signals nothing
to the remote end, and we still deadlock.

This patch calls shutdown() explicitly to signal EOF to the
other side.

Signed-off-by: Jeff King <peff@peff.net>
---
And if you wanted to drop the first two patches, this probably works OK
without the conditional, as the shutdown is just a no-op on a pipe
descriptor then.

 builtin-send-pack.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index 3e70795..682a3f9 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -520,6 +520,8 @@ int send_pack(struct send_pack_args *args,
 				ref->status = REF_STATUS_NONE;
 			if (args->stateless_rpc)
 				close(out);
+			if (git_connection_is_socket(conn))
+				shutdown(fd[0], SHUT_WR);
 			if (use_sideband)
 				finish_async(&demux);
 			return -1;
-- 
1.7.5.1.13.g0ca09.dirty

* Re: [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout
  2011-05-16  6:12                   ` Junio C Hamano
@ 2011-05-16  9:27                     ` Johan Herland
  0 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-16  9:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn Pearce

On Monday 16 May 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > Currently we refuse combining --max-pack-size with --stdout since
> > there's no way to make multiple packs when the pack is written to
> > stdout. However, we want to be able to limit the maximum size of the
> > pack created by --stdout (and abort pack-objects if we are unable to
> > meet that limit).
> > 
> > Therefore, when used together with --stdout, we reinterpret
> > --max-pack-size to indicate the maximum pack size which - if exceeded
> > - will cause pack-objects to abort with an error message.
> 
> I only gave the code a cursory look, but I think your patch does more
> than the above paragraphs say. I am not sure those extra change are
> justified.
> 
> For example,
> 
> > @@ -229,7 +229,7 @@ static unsigned long write_object(struct sha1file *f,
> > 
> >  	if (!entry->delta)
> >  		usable_delta = 0;	/* no delta */
> > -	else if (!pack_size_limit)
> > +	else if (!pack_size_limit || pack_to_stdout)
> >  	       usable_delta = 1;	/* unlimited packfile */
> 
> Why does this conditional have to change its behaviour when writing to
> the standard output?  I thought that the only thing you are doing
> "earlier we didn't allow setting size limit when writing to standard
> output, now we do", and I do not see the linkage between that objective
> and this change.

AFAICS, the intention of the above "else if" block is to enable
usable_delta when there is no chance of a pack split happening.
To establish that a pack split cannot happen, the code checks that
pack_size_limit is disabled. Previously, pack_size_limit and
pack_to_stdout were mutually exclusive (look at the last hunk in
pack-objects.c). However, with this patch, it is possible to have
both pack_size_limit and pack_to_stdout enabled.

Crucially, though, when both are enabled, a pack split is _still_
impossible, since a pack split while pack_to_stdout is enabled forces
pack-objects to abort altogether.

In other words, when pack_to_stdout is enabled, pack_size_limit is no
longer a good indicator of whether a pack split might happen. Instead,
pack_to_stdout being enabled is a good enough indicator in itself to
guarantee that no pack split can happen (and hence that usable_delta
should be enabled).

> > @@ -2315,9 +2318,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
> > 
> >  	if (!pack_to_stdout && !pack_size_limit)
> >  		pack_size_limit = pack_size_limit_cfg;
> > -	if (pack_to_stdout && pack_size_limit)
> > -		die("--max-pack-size cannot be used to build a pack for transfer.");
> > -	if (pack_size_limit && pack_size_limit < 1024*1024) {
> > +	if (!pack_to_stdout && pack_size_limit && pack_size_limit < 1024*1024) {
> >  		warning("minimum pack size limit is 1 MiB");
> >  		pack_size_limit = 1024*1024;
> >  	}
> 
> Why does the new combination "writing to the standard output, but the
> maximum size is limited" not have the same lower bound on the pack size
> limit that on-disk packs do?

The reason for this change is purely to leave the server (receive-pack)
in control of the pack size limit. If the server for any reason does
specify a pack size limit less than 1 MiB, we do not want the client
blindly ignoring that limit, and then submitting a pack that the server
will reject.

If we _do_ want a hard lower bound on the pack size limit, we should
apply that server-side (by setting 1 MiB as the minimum allowed value
for receive.packSizeLimit). However, we will now have a problem if we
in the future decide to change this lower bound for any reason
whatsoever: We will need to change it in both receive-pack and
pack-objects, and for as long as there are new clients talking to old
servers (or vice versa if we decrease the lower bound), there will be
clients submitting packs that are then rejected by the server.

I'd rather put the server in total control of the pack size limit.

> If you have a reason to believe 1 MiB is too large for a pack size limit,
> shouldn't that logic apply equally to the on-disk case?  What does this
> change have to do with the interaction with --stdout option?

I have no opinion on the lower bound of the pack size limit in the
on-disk case.

In the above discussion, the usage of --stdout is synonymous with the
send-pack scenario (pack-objects communicating with receive-pack).
Although not true in general, it is true for this patch, since up until
now pack-objects would abort if both --stdout and --max-pack-size were
in use. Therefore, for the two changes discussed above:

 - else if (!pack_size_limit)
 + else if (!pack_size_limit || pack_to_stdout)

and

 - if (pack_size_limit && pack_size_limit < 1024*1024) {
 + if (!pack_to_stdout && pack_size_limit && pack_size_limit < 1024*1024) {

I can deduce that they only affect the current use case (the send-pack
scenario), since these changes make no (functional) difference in any
other use case (where --stdout and --max-pack-size are mutually
exclusive).


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size
  2011-05-16  6:25                     ` Junio C Hamano
@ 2011-05-16  9:49                       ` Johan Herland
  0 siblings, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-16  9:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn Pearce

On Monday 16 May 2011, Junio C Hamano wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> > On Sun, May 15, 2011 at 14:37, Johan Herland <johan@herland.net> wrote:
> >> The new --max-object-count option behaves similarly to
> >> --max-pack-size, except that the decision to split packs is
> >> determined by the number of objects in the pack, and not by the size
> >> of the pack.
> > 
> > Like my note about pack size for this case... I think doing this
> > during writing is too late. We should be aborting the counting phase
> > if the output pack is to stdout and we are going to exceed this limit.
> 
> Well, even more important is if this is even useful. What is the user
> trying to prevent from happening, and is it a useful thing?

I initially met this problem at $dayjob, where a user would push a large 
pack from an unrelated repo, and the push would be refused by our update 
hook, but not without storing the pack on the server first. Obviously, the 
main objective of the patch is to prevent the pack from being stored on the 
server in the first place. [1]

Based on that, I do not really care exactly what kind of limit I have to set 
in order to prevent the push. I can easily enough find usable thresholds in 
any of #objects, #commits or pack size.

> I think "do not hog too much disk" (i.e. size) is an understandable wish,
> and max-pack-size imposed on --stdout would be a good approximation for
> that.

Agreed.

> I would understand "this project has only these files, and pushing a tree
> that has 100x leaves than that may be a mistake" (i.e. recursive sum of
> number of entries of an individual tree). I would also sort-of understand
> "do not push too deep a history at once" (i.e. we do not welcome pushing
> a wildly diverged fork that has been allowed to grow for too long).
> 
> But I do not think max-object-count is a good approximation for either
> to be useful.

I agree in principle, but currently, it's the only limit that we can enforce 
on the server side.

> Without a good answer to the above question, this looks like a "because
> we could", not "because it is useful", feature.

Ok, I can drop the client-side part of this patch (including the
limit-object-count capability), but I would like to retain the server-side
part (abort if the pack header reveals too many objects), since otherwise I
have to wait for all users to upgrade to a "limit-*"-aware client before I
can consider this problem truly solved.


...Johan


[1]: There is a followup to this problem, where a subsequent 'git gc' will 
find all the objects in the pushed (but rejected by update-hook) pack to be 
unreferenced, and then explode them into loose objects. In one case, a user 
pushed a 300 MB pack to the server that was then exploded into 5 GB of loose 
objects by a subsequent 'git gc'. Needless to say, we now run
'git gc --prune=now' as a workaround until this can be fixed.

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
  2011-05-16  6:50                   ` Junio C Hamano
@ 2011-05-16  9:53                     ` Johan Herland
  2011-05-16 22:02                     ` Sverre Rabbelier
  1 sibling, 0 replies; 53+ messages in thread
From: Johan Herland @ 2011-05-16  9:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Shawn Pearce

On Monday 16 May 2011, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > +const char *server_supports(const char *feature)
> > 
> >  {
> > 
> > -	return server_capabilities &&
> > -		strstr(server_capabilities, feature) != NULL;
> > +	if (server_capabilities)
> > +		return strstr(server_capabilities, feature);
> > +	return NULL;
> > 
> >  }
> 
> I've been meaning to fix this part, but currently the feature set is
> given as space separated list " featurea featureb featurec" and we check
> with a token without any space around, e.g. "if
> (server_supports("no-done"))", which is quite broken.
> 
> We should tighten this strstr() to make sure we are not matching in the
> middle of a string, and the need to do so is even greater now that you
> are going to introduce "foo=<value>" and the value could even be strings
> in the future.
> 
> How about implementing rules like these:
> 
> [...]

Agreed. I'll take a stab at this in the re-roll.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
  2011-05-16  6:46                         ` [PATCH 1/3] connect: treat generic proxy processes like ssh processes Jeff King
@ 2011-05-16 19:57                           ` Johannes Sixt
  2011-05-16 23:12                             ` Junio C Hamano
  2011-05-17  5:54                             ` Jeff King
  0 siblings, 2 replies; 53+ messages in thread
From: Johannes Sixt @ 2011-05-16 19:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git

Am 16.05.2011 08:46, schrieb Jeff King:
> The git_connect function returns two ends of a pipe for
> talking with a remote, plus a struct child_process
> representing the other end of the pipe. If we have a direct
> socket connection, then this points to a special "no_fork"
> child process.
> 
> The code path for doing git-over-pipes or git-over-ssh sets
> up this child process to point to the child git command or
> the ssh process. When we call finish_connect eventually, we
> check wait() on the command and report its return value.
> 
> The code path for git://, on the other hand, always sets it
> to no_fork. In the case of a direct TCP connection, this
> makes sense; we have no child process. But in the case of a
> proxy command (configured by core.gitproxy), we do have a
> child process, but we throw away its pid, and therefore
> ignore its return code.
> 
> Instead, let's keep that information in the proxy case, and
> respect its return code, which can help catch some errors

This patch looks strikingly familiar. I had written an almost identical
change more than 3 years ago and forgot about it, though the
justification I noted in the commit was more to properly shut down the
proxy process rather than to abandon it and let it be collected by
init(8). Your justification is much better.

There's one problem with your implementation, though:

> -static void git_proxy_connect(int fd[2], char *host)
> +static struct child_process *git_proxy_connect(int fd[2], char *host)
>  {
>  	const char *port = STR(DEFAULT_GIT_PORT);
>  	char *colon, *end;
>  	const char *argv[4];
> -	struct child_process proxy;
> +	struct child_process *proxy;
>  
>  	if (host[0] == '[') {
>  		end = strchr(host + 1, ']');
> @@ -431,14 +431,15 @@ static void git_proxy_connect(int fd[2], char *host)
>  	argv[1] = host;
>  	argv[2] = port;
>  	argv[3] = NULL;
> -	memset(&proxy, 0, sizeof(proxy));
> -	proxy.argv = argv;
> -	proxy.in = -1;
> -	proxy.out = -1;
> -	if (start_command(&proxy))
> +	proxy = xcalloc(1, sizeof(*proxy));
> +	proxy->argv = argv;
> +	proxy->in = -1;
> +	proxy->out = -1;
> +	if (start_command(proxy))

At this point, proxy->argv would point to automatic storage; but we
need argv[0] in finish_command() for error reporting. In my
implementation, I xmalloced the pointer array and leaked it. (And
that's probably the reason that I never submitted the patch.) I
wouldn't dare to make argv just static, because that would limit us to
just one open connection at a time established via git_proxy_connect().
Dunno...

Below is the interdiff that turns your patch into mine, mostly for
exposition: The two hunks in git_connect() are just cosmetic
differences. But the first hunk should be squashed into your patch to
fix a potential crash in an error situation (e.g., when the proxy
dies from a signal) at the cost of a small memory leak.

diff --git a/connect.c b/connect.c
index a28e084..8668913 100644
--- a/connect.c
+++ b/connect.c
@@ -399,9 +399,10 @@ static struct child_process *git_proxy_connect(int fd[2], char *host)
 {
 	const char *port = STR(DEFAULT_GIT_PORT);
-	const char *argv[4];
+	const char **argv;
 	struct child_process *proxy;
 
 	get_host_and_port(&host, &port);
 
+	argv = xmalloc(4*sizeof(*argv));
 	argv[0] = git_proxy_command;
 	argv[1] = host;
@@ -457,5 +458,5 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 	char *end;
 	int c;
-	struct child_process *conn;
+	struct child_process *conn = &no_fork;
 	enum protocol protocol = PROTO_LOCAL;
 	int free_path = 0;
@@ -543,8 +544,6 @@ struct child_process *git_connect(int fd[2], const char *url_orig,
 		if (git_use_proxy(host))
 			conn = git_proxy_connect(fd, host);
-		else {
+		else
 			git_tcp_connect(fd, host, flags);
-			conn = &no_fork;
-		}
 		/*
 		 * Separate original protocol components prog and path

* Re: [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects
  2011-05-16  6:52                         ` [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects Jeff King
@ 2011-05-16 20:02                           ` Johannes Sixt
  2011-05-17  5:56                             ` Jeff King
  0 siblings, 1 reply; 53+ messages in thread
From: Johannes Sixt @ 2011-05-16 20:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git

Am 16.05.2011 08:52, schrieb Jeff King:
> And if you wanted to drop the first two patches, this probably works OK
> without the conditional, as the shutdown is just a no-op on a pipe
> descriptor then.
> 
>  builtin-send-pack.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/builtin-send-pack.c b/builtin-send-pack.c
> index 3e70795..682a3f9 100644
> --- a/builtin-send-pack.c
> +++ b/builtin-send-pack.c
> @@ -520,6 +520,8 @@ int send_pack(struct send_pack_args *args,
>  				ref->status = REF_STATUS_NONE;
>  			if (args->stateless_rpc)
>  				close(out);
> +			if (git_connection_is_socket(conn))
> +				shutdown(fd[0], SHUT_WR);
>  			if (use_sideband)
>  				finish_async(&demux);
>  			return -1;

We probably need a wrapper for shutdown() on Windows. I'll look into
this tomorrow.

-- Hannes

* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
  2011-05-16  6:50                   ` Junio C Hamano
  2011-05-16  9:53                     ` Johan Herland
@ 2011-05-16 22:02                     ` Sverre Rabbelier
  2011-05-16 22:07                       ` Junio C Hamano
  1 sibling, 1 reply; 53+ messages in thread
From: Sverre Rabbelier @ 2011-05-16 22:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Shawn Pearce, git

Heya,

On Sun, May 15, 2011 at 23:50, Junio C Hamano <gitster@pobox.com> wrote:
> We should tighten this strstr() to make sure we are not matching in the
> middle of a string, and the need to do so is even greater now that you are
> going to introduce "foo=<value>" and the value could even be strings in
> the future.

If we are writing this down somewhere, should we also dictate how
spaces should be escaped to be "forward-compatible"?

-- 
Cheers,

Sverre Rabbelier

* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
  2011-05-16 22:02                     ` Sverre Rabbelier
@ 2011-05-16 22:07                       ` Junio C Hamano
  2011-05-16 22:09                         ` Sverre Rabbelier
  0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16 22:07 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: Junio C Hamano, Johan Herland, Shawn Pearce, git

Sverre Rabbelier <srabbelier@gmail.com> writes:

> On Sun, May 15, 2011 at 23:50, Junio C Hamano <gitster@pobox.com> wrote:
>> We should tighten this strstr() to make sure we are not matching in the
>> middle of a string, and the need to do so is even greater now that you are
>> going to introduce "foo=<value>" and the value could even be strings in
>> the future.
>
> If we are writing this down somewhere, should we also dictate how
> spaces should be escaped to be "forward-compatible"?

The old clients have treated it as an SP-separated list, e.g. "featurea featureb featurec",
from the beginning of how the capabilities[] code was written, so I do not see
a point. I would expect arbitrary string values to be encased in
something like base64, base85 or hex.

* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
  2011-05-16 22:07                       ` Junio C Hamano
@ 2011-05-16 22:09                         ` Sverre Rabbelier
  2011-05-16 22:12                           ` Junio C Hamano
  0 siblings, 1 reply; 53+ messages in thread
From: Sverre Rabbelier @ 2011-05-16 22:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Shawn Pearce, git

Heya,

On Mon, May 16, 2011 at 15:07, Junio C Hamano <gitster@pobox.com> wrote:
> The old clients have treated it as an SP-separated list, e.g. "featurea featureb featurec",
> from the beginning of how the capabilities[] code was written, so I do not see
> a point. I would expect arbitrary string values to be encased in
> something like base64, base85 or hex.

Right, that's my point, do we want to leave it up to each individual
option to decide what encoding to use?

-- 
Cheers,

Sverre Rabbelier


* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
  2011-05-16 22:09                         ` Sverre Rabbelier
@ 2011-05-16 22:12                           ` Junio C Hamano
  2011-05-16 22:16                             ` Sverre Rabbelier
  0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16 22:12 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: Junio C Hamano, Johan Herland, Shawn Pearce, git

Sverre Rabbelier <srabbelier@gmail.com> writes:

> On Mon, May 16, 2011 at 15:07, Junio C Hamano <gitster@pobox.com> wrote:
>> The old clients treat it as a SP-separated list, e.g. "featurea featureb featureb",
>> and have done so ever since the capabilities[] code was first written, so I do
>> not see a point. I would expect arbitrary string values to be encased in
>> something like base64, base85 or hex.
>
> Right, that's my point, do we want to leave it up to each individual
> option to decide what encoding to use?

I do not see any point in deciding that right now without even knowing
what kind of strings (short? mostly ASCII? etc.) we would typically want
as payload, tying our hands with a poor decision we would end up making
out of thin air.

That is what I meant by "I do not see a point".


* Re: [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities
  2011-05-16 22:12                           ` Junio C Hamano
@ 2011-05-16 22:16                             ` Sverre Rabbelier
  0 siblings, 0 replies; 53+ messages in thread
From: Sverre Rabbelier @ 2011-05-16 22:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, Shawn Pearce, git

Heya,

On Mon, May 16, 2011 at 15:12, Junio C Hamano <gitster@pobox.com> wrote:
> I do not see any point in deciding that right now without even knowing
> what kind of strings (short? mostly ascii? etc.) we would typically want
> as payload and tie our hands with a poor decision we would end up making
> out of thin air.

Ok, fair enough. :)

-- 
Cheers,

Sverre Rabbelier


* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
  2011-05-16 19:57                           ` Johannes Sixt
@ 2011-05-16 23:12                             ` Junio C Hamano
  2011-05-17  5:54                             ` Jeff King
  1 sibling, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2011-05-16 23:12 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, Johan Herland, Shawn Pearce, git

Johannes Sixt <j6t@kdbg.org> writes:

> At this point, proxy->argv would point to automatic storage; but we
> need argv[0] in finish_command() for error reporting. In my
> implementation, I xmalloced the pointer array and leaked it. (And
> that's probably the reason that I never submitted the patch.) I
> wouldn't dare to make argv just static because this limits us to have
> just one open connection at a time established via git_proxy_connect().

Good point. If we really care, I would imagine that struct child_process
can gain a boolean flag to tell finish_command() that the argv[] needs to
be freed, so I do not think leakage here is such a big deal.

Will squash all three hunks in.  Assuming no_fork first and updating conn
as we discover what kind of URL we have feels natural.

Thanks, both.


* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
  2011-05-16 19:57                           ` Johannes Sixt
  2011-05-16 23:12                             ` Junio C Hamano
@ 2011-05-17  5:54                             ` Jeff King
  2011-05-17 20:14                               ` Johannes Sixt
  1 sibling, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-17  5:54 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git

On Mon, May 16, 2011 at 09:57:58PM +0200, Johannes Sixt wrote:

> > The code path for git://, on the other hand, always sets it
> > to no_fork. In the case of a direct TCP connection, this
> > makes sense; we have no child process. But in the case of a
> > proxy command (configured by core.gitproxy), we do have a
> > child process, but we throw away its pid, and therefore
> > ignore its return code.
> > 
> > Instead, let's keep that information in the proxy case, and
> > respect its return code, which can help catch some errors
> 
> This patch looks strikingly familiar. I had written an almost identical
> change more than 3 years ago and forgot about it, though the
> justification I noted in the commit was more to properly shutdown the
> proxy process rather than to abandon it and let it be collected by
> init(8). Your justification is much better.

Thanks, I had no idea your patch existed. I hate to duplicate work, but
at least it's a sanity check that it's not a totally stupid idea. ;)

> >  	const char *argv[4];
> > -	struct child_process proxy;
> > +	struct child_process *proxy;
> [...]
> At this point, proxy->argv would point to automatic storage; but we
> need argv[0] in finish_command() for error reporting.

Ick. Good catch.

> In my implementation, I xmalloced the pointer array and leaked it.
> (And that's probably the reason that I never submitted the patch.) I
> wouldn't dare to make argv just static because this limits us to have
> just one open connection at a time established via
> git_proxy_connect().  Dunno...

We also need to worry about the contents of each argv[] element, no? So
we should be xstrdup()ing the host and port, which point into some
string which gets passed to us. I didn't trace its provenance but I
think it is better to be defensive.

The leak is probably OK in a practical sense (you generally make no more
than one such connection per command), but it does seem ugly. I would
not be surprised if many other run-command invocations leak similarly.

The interdiff with the strdups is:

diff --git a/connect.c b/connect.c
index c678ceb..c24866e 100644
--- a/connect.c
+++ b/connect.c
@@ -398,14 +398,15 @@ static int git_use_proxy(const char *host)
 static struct child_process *git_proxy_connect(int fd[2], char *host)
 {
 	const char *port = STR(DEFAULT_GIT_PORT);
-	const char *argv[4];
+	const char **argv;
 	struct child_process *proxy;
 
 	get_host_and_port(&host, &port);
 
+	argv = xmalloc(4 * sizeof(*argv));
 	argv[0] = git_proxy_command;
-	argv[1] = host;
-	argv[2] = port;
+	argv[1] = xstrdup(host);
+	argv[2] = xstrdup(port);
 	argv[3] = NULL;
 	proxy = xcalloc(1, sizeof(*proxy));
 	proxy->argv = argv;


* Re: [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects
  2011-05-16 20:02                           ` Johannes Sixt
@ 2011-05-17  5:56                             ` Jeff King
  2011-05-18 20:24                               ` [PATCH] Windows: add a wrapper for the shutdown() system call Johannes Sixt
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2011-05-17  5:56 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git

On Mon, May 16, 2011 at 10:02:16PM +0200, Johannes Sixt wrote:

> > +			if (git_connection_is_socket(conn))
> > +				shutdown(fd[0], SHUT_WR);
> 
> We probably need a wrapper for shutdown() on Windows. I'll look into
> this tomorrow.

FWIW, we already make an identical call in transport-helper.c (I was
tempted even to use "1" instead of SHUT_WR for portability, but it seems
nobody has complained so far about the use in transport-helper).
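[Editor's note: a self-contained sketch of the half-close pattern this thread relies on, using a socketpair so it runs without a network. The function name is illustrative. After shutdown(fd, SHUT_WR) the peer sees EOF on its read side, yet we can still read the peer's reply on the same descriptor; this is what lets send-pack stop writing pack data but still collect the remote status.]

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Demonstrate a TCP-style half-close: write, shut down our write
 * side, then read the peer's answer on the same fd.  Returns 0 on
 * success and copies the peer's reply into 'reply'.
 */
int half_close_demo(char *reply, size_t len)
{
	int sv[2];
	char buf[16];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* our side: send the "request", then signal EOF to the peer */
	if (write(sv[0], "push", 4) != 4)
		return -1;
	shutdown(sv[0], SHUT_WR);

	/* "peer": drain our data until EOF, then send a reply */
	while ((n = read(sv[1], buf, sizeof(buf))) > 0)
		; /* discard */
	if (write(sv[1], "ok", 2) != 2)
		return -1;
	close(sv[1]);

	/* despite our half-close, reading still works */
	n = read(sv[0], reply, len - 1);
	reply[n < 0 ? 0 : n] = '\0';
	close(sv[0]);
	return 0;
}
```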

-Peff


* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
  2011-05-17  5:54                             ` Jeff King
@ 2011-05-17 20:14                               ` Johannes Sixt
  2011-05-18  8:57                                 ` Jeff King
  0 siblings, 1 reply; 53+ messages in thread
From: Johannes Sixt @ 2011-05-17 20:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git

Am 17.05.2011 07:54, schrieb Jeff King:
> On Mon, May 16, 2011 at 09:57:58PM +0200, Johannes Sixt wrote:
>> In my implementation, I xmalloced the pointer array and leaked it.

I noticed that it actually isn't leaked because finish_connect() frees
it. For this reason, I actually have to wonder why your version that
stored a pointer to automatic storage in ->argv worked.

> We also need to worry about the contents of each argv[] element, no? So
> we should be xstrdup()ing the host and port, which point into some
> string which gets passed to us. I didn't trace its provenance but I
> think it is better to be defensive.

I would not worry too much today. Of course, functions other than
start_command() might begin to access ->argv[i] with i > 0 later, but
then we have to audit all users of struct child_process anyway.
Currently, only start_command() uses these values, which is always
called at a time when they are still valid.

-- Hannes


* Re: [PATCH 1/3] connect: treat generic proxy processes like ssh processes
  2011-05-17 20:14                               ` Johannes Sixt
@ 2011-05-18  8:57                                 ` Jeff King
  0 siblings, 0 replies; 53+ messages in thread
From: Jeff King @ 2011-05-18  8:57 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git

On Tue, May 17, 2011 at 10:14:43PM +0200, Johannes Sixt wrote:

> Am 17.05.2011 07:54, schrieb Jeff King:
> > On Mon, May 16, 2011 at 09:57:58PM +0200, Johannes Sixt wrote:
> >> In my implementation, I xmalloced the pointer array and leaked it.
> 
> I noticed that it actually isn't leaked because finish_connect() frees
> it. For this reason, I actually have to wonder why your version that
> stored a pointer to automatic storage in ->argv worked.

It probably didn't. As a simple refactoring, I didn't test the proxy
codepath, and apparently nothing in our test suite does, either. :(

We can put the patch below on top (it fails with my original series, but
passes with the bugfix you noticed).

> I would not worry too much today. Of course, functions other than
> start_command() might begin to access ->argv[i] with i > 0 later, but
> then we have to audit all users of struct child_process anyway.
> Currently, only start_command() uses these values, which is always
> called at a time when they are still valid.

That makes sense.

-- >8 --
Subject: [PATCH] test core.gitproxy configuration

This is just a basic sanity test to see whether
core.gitproxy works at all. Until now, we were not testing
it anywhere.

Signed-off-by: Jeff King <peff@peff.net>
---
This is really basic. Apparently you can do horrible things like

  git config core.gitproxy "./proxy for kernel.org"
  git config core.gitproxy "./other-proxy for example.com"

I can make it more elaborate if we really want to care.

 t/t5532-fetch-proxy.sh |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 42 insertions(+), 0 deletions(-)
 create mode 100755 t/t5532-fetch-proxy.sh

diff --git a/t/t5532-fetch-proxy.sh b/t/t5532-fetch-proxy.sh
new file mode 100755
index 0000000..07119d9
--- /dev/null
+++ b/t/t5532-fetch-proxy.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description='fetching via git:// using core.gitproxy'
+. ./test-lib.sh
+
+test_expect_success 'setup remote repo' '
+	git init remote &&
+	(cd remote &&
+	 echo content >file &&
+	 git add file &&
+	 git commit -m one
+	)
+'
+
+cat >proxy <<'EOF'
+#!/bin/sh
+echo >&2 "proxying for $*"
+cmd=`perl -e '
+	read(STDIN, $buf, 4);
+	my $n = hex($buf) - 4;
+	read(STDIN, $buf, $n);
+	my ($cmd, $other) = split /\0/, $buf;
+	# drop absolute-path on repo name
+	$cmd =~ s{ /}{ };
+	print $cmd;
+'`
+exec $cmd
+EOF
+chmod +x proxy
+test_expect_success 'setup local repo' '
+	git remote add fake git://example.com/remote &&
+	git config core.gitproxy ./proxy
+'
+
+test_expect_success 'fetch through proxy works' '
+	git fetch fake &&
+	echo one >expect &&
+	git log -1 --format=%s FETCH_HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
1.7.5.8.ga95ab
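[Editor's note: the perl snippet in the proxy above decodes git's pkt-line framing: a 4-character hex length prefix that counts itself, followed by len-4 bytes of payload. A minimal C sketch of that decoding, with an illustrative function name; git's real reader also handles the special "0000" flush packet, which this sketch simply rejects.]

```c
#include <stdio.h>
#include <string.h>

/*
 * Decode one pkt-line from buf into out.  The first four bytes are a
 * hex length that includes those four bytes, so the payload is len-4
 * bytes.  Returns the payload length, or -1 on a malformed packet.
 */
int pkt_line_decode(const char *buf, char *out, size_t outlen)
{
	unsigned int len;

	if (sscanf(buf, "%4x", &len) != 1 || len < 4 || len - 4 >= outlen)
		return -1;
	memcpy(out, buf + 4, len - 4);
	out[len - 4] = '\0';
	return (int)(len - 4);
}
```

For example, "0009hello" carries a 5-byte payload, since the length 0x0009 counts the 4-byte prefix as well.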


* [PATCH] Windows: add a wrapper for the shutdown() system call
  2011-05-17  5:56                             ` Jeff King
@ 2011-05-18 20:24                               ` Johannes Sixt
  0 siblings, 0 replies; 53+ messages in thread
From: Johannes Sixt @ 2011-05-18 20:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Johan Herland, Junio C Hamano, Shawn Pearce, git

Even though Windows's socket functions look like their POSIX counterparts,
they do not operate on file descriptors, but on "socket objects". To bring
the functions in line with POSIX, we have proxy functions that wrap and
unwrap the socket objects in file descriptors using open_osfhandle and
get_osfhandle. But shutdown() was not yet proxied. Fix this.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
Am 17.05.2011 07:56, schrieb Jeff King:
> On Mon, May 16, 2011 at 10:02:16PM +0200, Johannes Sixt wrote:
> 
>>> +			if (git_connection_is_socket(conn))
>>> +				shutdown(fd[0], SHUT_WR);
>>
>> We probably need a wrapper for shutdown() on Windows. I'll look into
>> this tomorrow.

We have a shutdown() on Windows, but it does not work out of the box.
Making it work is not a big deal, but it would be an academic exercise if
it were only for this topic: on Windows, send-pack hangs for unknown
reasons when network connections are involved, as long as side-band-64k
is enabled. If it is not enabled, the deadlock that you fixed does not
happen in the first place and it is irrelevant whether shutdown works.

> FWIW, we already make an identical call in transport-helper.c (I was
> tempted even to use "1" instead of SHUT_WR for portability, but it seems
> nobody has complained so far about the use in transport-helper).

Yes, and for the reasons mentioned above, this patch is intended for
the 1.7.4 series, where transport-helper.c went public. (But it should
apply cleanly to current master as well.) We do not have tests that
exercise the code in transport-helper.c, but I did test that this
shutdown implementation does something reasonable when it is merged
with your send-pack deadlock topic branch.

 compat/mingw.c |    7 +++++++
 compat/mingw.h |    3 +++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index bee6054..1cbc9e8 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1219,6 +1219,13 @@ int mingw_setsockopt(int sockfd, int lvl, int optname, void *optval, int optlen)
 	return setsockopt(s, lvl, optname, (const char*)optval, optlen);
 }
 
+#undef shutdown
+int mingw_shutdown(int sockfd, int how)
+{
+	SOCKET s = (SOCKET)_get_osfhandle(sockfd);
+	return shutdown(s, how);
+}
+
 #undef listen
 int mingw_listen(int sockfd, int backlog)
 {
diff --git a/compat/mingw.h b/compat/mingw.h
index 14211c6..3b20fa9 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -219,6 +219,9 @@ int mingw_bind(int sockfd, struct sockaddr *sa, size_t sz);
 int mingw_setsockopt(int sockfd, int lvl, int optname, void *optval, int optlen);
 #define setsockopt mingw_setsockopt
 
+int mingw_shutdown(int sockfd, int how);
+#define shutdown mingw_shutdown
+
 int mingw_listen(int sockfd, int backlog);
 #define listen mingw_listen
 
-- 
1.7.5.rc1.97.ge0653


end of thread, other threads:[~2011-05-18 20:24 UTC | newest]

Thread overview: 53+ messages
2011-05-13 16:54 [PATCH 2/2] receive-pack: Add receive.denyObjectLimit to refuse push with too many objects Johan Herland
2011-05-13 17:09 ` Junio C Hamano
2011-05-14  1:43   ` Johan Herland
2011-05-14  2:03     ` [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit " Johan Herland
2011-05-14  2:30       ` Shawn Pearce
2011-05-14 13:17         ` Johan Herland
2011-05-14 22:17           ` Shawn Pearce
2011-05-15 17:42             ` Johan Herland
2011-05-15 21:37               ` [PATCHv3 0/9] Push limits Johan Herland
2011-05-15 21:37                 ` [PATCHv3 1/9] Update technical docs to reflect side-band-64k capability in receive-pack Johan Herland
2011-05-15 21:37                 ` [PATCHv3 2/9] send-pack: Attempt to retrieve remote status even if pack-objects fails Johan Herland
2011-05-16  4:07                   ` Jeff King
2011-05-16  6:13                     ` Jeff King
2011-05-16  6:39                       ` Jeff King
2011-05-16  6:46                         ` [PATCH 1/3] connect: treat generic proxy processes like ssh processes Jeff King
2011-05-16 19:57                           ` Johannes Sixt
2011-05-16 23:12                             ` Junio C Hamano
2011-05-17  5:54                             ` Jeff King
2011-05-17 20:14                               ` Johannes Sixt
2011-05-18  8:57                                 ` Jeff King
2011-05-16  6:52                         ` [PATCH 2/3] connect: let callers know if connection is a socket Jeff King
2011-05-16  6:52                         ` [PATCH 3/3] send-pack: avoid deadlock on git:// push with failed pack-objects Jeff King
2011-05-16 20:02                           ` Johannes Sixt
2011-05-17  5:56                             ` Jeff King
2011-05-18 20:24                               ` [PATCH] Windows: add a wrapper for the shutdown() system call Johannes Sixt
2011-05-15 21:37                 ` [PATCHv3 3/9] pack-objects: Allow --max-pack-size to be used together with --stdout Johan Herland
2011-05-15 22:06                   ` Shawn Pearce
2011-05-16  1:39                     ` Johan Herland
2011-05-16  6:12                   ` Junio C Hamano
2011-05-16  9:27                     ` Johan Herland
2011-05-15 21:37                 ` [PATCHv3 4/9] pack-objects: Teach new option --max-object-count, similar to --max-pack-size Johan Herland
2011-05-15 22:07                   ` Shawn Pearce
2011-05-15 22:31                     ` Johan Herland
2011-05-15 23:48                       ` Shawn Pearce
2011-05-16  6:25                     ` Junio C Hamano
2011-05-16  9:49                       ` Johan Herland
2011-05-15 21:37                 ` [PATCHv3 5/9] pack-objects: Teach new option --max-commit-count, limiting #commits in pack Johan Herland
2011-05-15 21:37                 ` [PATCHv3 6/9] receive-pack: Prepare for addition of the new 'limit-*' family of capabilities Johan Herland
2011-05-16  6:50                   ` Junio C Hamano
2011-05-16  9:53                     ` Johan Herland
2011-05-16 22:02                     ` Sverre Rabbelier
2011-05-16 22:07                       ` Junio C Hamano
2011-05-16 22:09                         ` Sverre Rabbelier
2011-05-16 22:12                           ` Junio C Hamano
2011-05-16 22:16                             ` Sverre Rabbelier
2011-05-15 21:37                 ` [PATCHv3 7/9] send-pack/receive-pack: Allow server to refuse pushes with too many objects Johan Herland
2011-05-15 21:37                 ` [PATCHv3 8/9] send-pack/receive-pack: Allow server to refuse pushing too large packs Johan Herland
2011-05-15 21:37                 ` [PATCHv3 9/9] send-pack/receive-pack: Allow server to refuse pushes with too many commits Johan Herland
2011-05-15 21:52                 ` [PATCHv3 0/9] Push limits Ævar Arnfjörð Bjarmason
2011-05-14 17:50         ` [PATCHv2 2/2] receive-pack: Add receive.objectCountLimit to refuse push with too many objects Junio C Hamano
2011-05-14 22:27           ` Shawn Pearce
2011-05-13 18:20 ` [PATCH 2/2] receive-pack: Add receive.denyObjectLimit " Johannes Sixt
2011-05-14  1:49   ` Johan Herland
