* [PATCH] repack: call prune_packed_objects() and update_server_info() directly
@ 2014-09-13  7:28 René Scharfe
  2014-09-13  8:59 ` Stefan Beller
  2014-09-13 20:15 ` Jeff King
  0 siblings, 2 replies; 9+ messages in thread
From: René Scharfe @ 2014-09-13  7:28 UTC
  To: Git Mailing List; +Cc: Stefan Beller, Junio C Hamano

Call the functions behind git prune-packed and git update-server-info
directly instead of using run_command().  This is shorter, easier and
quicker.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 builtin/repack.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index fc088db..2aae05d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -377,6 +377,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	/* End of pack replacement. */
 
 	if (delete_redundant) {
+		int opts = 0;
 		sort_string_list(&names);
 		for_each_string_list_item(item, &existing_packs) {
 			char *sha1;
@@ -387,25 +388,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			if (!string_list_has_string(&names, sha1))
 				remove_redundant_pack(packdir, item->string);
 		}
-		argv_array_push(&cmd_args, "prune-packed");
-		if (quiet)
-			argv_array_push(&cmd_args, "--quiet");
-
-		memset(&cmd, 0, sizeof(cmd));
-		cmd.argv = cmd_args.argv;
-		cmd.git_cmd = 1;
-		run_command(&cmd);
-		argv_array_clear(&cmd_args);
+		if (!quiet && isatty(2))
+			opts |= PRUNE_PACKED_VERBOSE;
+		prune_packed_objects(opts);
 	}
 
-	if (!no_update_server_info) {
-		argv_array_push(&cmd_args, "update-server-info");
-		memset(&cmd, 0, sizeof(cmd));
-		cmd.argv = cmd_args.argv;
-		cmd.git_cmd = 1;
-		run_command(&cmd);
-		argv_array_clear(&cmd_args);
-	}
+	if (!no_update_server_info)
+		update_server_info(0);
 	remove_temporary_files();
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
-- 
2.1.0


* Re: [PATCH] repack: call prune_packed_objects() and update_server_info() directly
  2014-09-13  7:28 [PATCH] repack: call prune_packed_objects() and update_server_info() directly René Scharfe
@ 2014-09-13  8:59 ` Stefan Beller
  2014-09-13 20:15 ` Jeff King
  1 sibling, 0 replies; 9+ messages in thread
From: Stefan Beller @ 2014-09-13  8:59 UTC
  To: René Scharfe, Git Mailing List; +Cc: Junio C Hamano

On 13.09.2014 09:28, René Scharfe wrote:
> Call the functions behind git prune-packed and git update-server-info
> directly instead of using run_command().  This is shorter, easier and
> quicker.
> 
> Signed-off-by: Rene Scharfe <l.s.r@web.de>

Thanks for cleaning up the literal rewrite of the shell script
and making it look more like a C program.

Reviewed-by: Stefan Beller <stefanbeller@gmail.com>

> ---
>  builtin/repack.c | 23 ++++++-----------------
>  1 file changed, 6 insertions(+), 17 deletions(-)
> 
> diff --git a/builtin/repack.c b/builtin/repack.c
> index fc088db..2aae05d 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -377,6 +377,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	/* End of pack replacement. */
>  
>  	if (delete_redundant) {
> +		int opts = 0;
>  		sort_string_list(&names);
>  		for_each_string_list_item(item, &existing_packs) {
>  			char *sha1;
> @@ -387,25 +388,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  			if (!string_list_has_string(&names, sha1))
>  				remove_redundant_pack(packdir, item->string);
>  		}
> -		argv_array_push(&cmd_args, "prune-packed");
> -		if (quiet)
> -			argv_array_push(&cmd_args, "--quiet");
> -
> -		memset(&cmd, 0, sizeof(cmd));
> -		cmd.argv = cmd_args.argv;
> -		cmd.git_cmd = 1;
> -		run_command(&cmd);
> -		argv_array_clear(&cmd_args);
> +		if (!quiet && isatty(2))
> +			opts |= PRUNE_PACKED_VERBOSE;
> +		prune_packed_objects(opts);
>  	}
>  
> -	if (!no_update_server_info) {
> -		argv_array_push(&cmd_args, "update-server-info");
> -		memset(&cmd, 0, sizeof(cmd));
> -		cmd.argv = cmd_args.argv;
> -		cmd.git_cmd = 1;
> -		run_command(&cmd);
> -		argv_array_clear(&cmd_args);
> -	}
> +	if (!no_update_server_info)
> +		update_server_info(0);
>  	remove_temporary_files();
>  	string_list_clear(&names, 0);
>  	string_list_clear(&rollback, 0);
> 


* Re: [PATCH] repack: call prune_packed_objects() and update_server_info() directly
  2014-09-13  7:28 [PATCH] repack: call prune_packed_objects() and update_server_info() directly René Scharfe
  2014-09-13  8:59 ` Stefan Beller
@ 2014-09-13 20:15 ` Jeff King
  2014-09-13 20:16   ` [PATCH 1/3] prune-packed: fix minor memory leak Jeff King
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Jeff King @ 2014-09-13 20:15 UTC
  To: René Scharfe; +Cc: Git Mailing List, Stefan Beller, Junio C Hamano

On Sat, Sep 13, 2014 at 09:28:01AM +0200, René Scharfe wrote:

> Call the functions behind git prune-packed and git update-server-info
> directly instead of using run_command().  This is shorter, easier and
> quicker.

It can also introduce bugs, since a lot of git code assumes it is
running in a single process and can die() or mark up global variables at
will. :)

I gave a quick read-through of the code and I think these calls are OK.
The two things I noticed were:

  1. We might die on a malloc failure that would otherwise go unnoticed
     in a sub-process. That's probably OK.

  2. The info/packs file is generated from our internal packed_git list.
     This list can get crufty if you have a long-running process that
     accesses objects and other processes are repacking. I think that's
     OK here; the parent repack process is not very long-lived.

I did, however, notice that the code we are calling has some problems of
its own. :) Here are some fixes:

  [1/3]: prune-packed: fix minor memory leak
  [2/3]: make update-server-info more robust
  [3/3]: server-info: clean up after writing info/packs

-Peff


* [PATCH 1/3] prune-packed: fix minor memory leak
  2014-09-13 20:15 ` Jeff King
@ 2014-09-13 20:16   ` Jeff King
  2014-09-13 20:19   ` [PATCH 2/3] make update-server-info more robust Jeff King
  2014-09-13 20:19   ` [PATCH 3/3] server-info: clean up after writing info/packs Jeff King
  2 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2014-09-13 20:16 UTC
  To: René Scharfe; +Cc: Git Mailing List, Stefan Beller, Junio C Hamano

We form all of our directories in a strbuf, but never release it.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/prune-packed.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index 6879468..d430731 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -68,6 +68,7 @@ void prune_packed_objects(int opts)
 		rmdir(pathname.buf);
 	}
 	stop_progress(&progress);
+	strbuf_release(&pathname);
 }
 
 int cmd_prune_packed(int argc, const char **argv, const char *prefix)
-- 
2.1.0.373.g91ca799
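
The leak fixed above follows a common pattern: one growable buffer is reused for every path formed in a loop and must be released once after the loop. The following is a minimal sketch of that pattern, not git's code. `struct buf`, `buf_addstr()`, `buf_setlen()`, `buf_release()` and `visit_fanout_dirs()` are hypothetical stand-ins for the strbuf API, and the 256 two-hex-digit directory names are an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf; not the real API. */
struct buf {
	char *data;
	size_t len, alloc;
};

/* Append a NUL-terminated string, growing the allocation as needed. */
static void buf_addstr(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		size_t want = (b->len + n + 1) * 2;
		char *p = realloc(b->data, want);
		assert(p != NULL); /* a real implementation would report the error */
		b->data = p;
		b->alloc = want;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

/* Truncate without returning memory, so the next iteration can reuse it. */
static void buf_setlen(struct buf *b, size_t len)
{
	assert(len <= b->len);
	b->len = len;
	if (b->data)
		b->data[len] = '\0';
}

/* Give the memory back; this is the step that was missing. */
static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->alloc = 0;
}

/* Form "<dir>/00" .. "<dir>/ff" in one reused buffer; returns the count. */
static int visit_fanout_dirs(const char *dir)
{
	static const char hex[] = "0123456789abcdef";
	struct buf path = { NULL, 0, 0 };
	size_t base;
	int i, visited = 0;

	buf_addstr(&path, dir);
	buf_addstr(&path, "/");
	base = path.len;
	for (i = 0; i < 256; i++) {
		char name[3];

		name[0] = hex[i >> 4];
		name[1] = hex[i & 15];
		name[2] = '\0';
		buf_addstr(&path, name);
		visited++; /* a real caller would open path.data here */
		buf_setlen(&path, base);
	}
	buf_release(&path); /* without this, the last allocation leaks */
	return visited;
}
```

Truncating instead of freeing inside the loop keeps the allocation count low; the price is that the caller has to remember the single release at the end.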


* [PATCH 2/3] make update-server-info more robust
  2014-09-13 20:15 ` Jeff King
  2014-09-13 20:16   ` [PATCH 1/3] prune-packed: fix minor memory leak Jeff King
@ 2014-09-13 20:19   ` Jeff King
  2014-09-14 17:38     ` René Scharfe
  2014-09-15 18:39     ` Junio C Hamano
  2014-09-13 20:19   ` [PATCH 3/3] server-info: clean up after writing info/packs Jeff King
  2 siblings, 2 replies; 9+ messages in thread
From: Jeff King @ 2014-09-13 20:19 UTC
  To: René Scharfe; +Cc: Git Mailing List, Stefan Beller, Junio C Hamano

Since "git update-server-info" may be called automatically
as part of a push or a "gc --auto", we should be robust
against two processes trying to update it simultaneously.
However, we currently use a fixed tempfile, which means that
two simultaneous writers may step on each other's toes and
end up renaming junk into place.

Let's instead switch to using a unique tempfile via mkstemp.
We do not want to use a lockfile here, because it's OK for
two writers to simultaneously update (one will "win" the
rename race, but that's OK; they should be writing the same
information).

While we're there, let's clean up a few other things:

  1. Detect write errors. Report them and abort the update
     if any are found.

  2. Free path memory rather than leaking it (and clean up
     the tempfile when necessary).

  3. Use the pathdup functions consistently rather than
     static buffers or manually calculated lengths.

This last one fixes a potential overflow of "infofile" in
update_info_packs (e.g., by putting large junk into
$GIT_OBJECT_DIRECTORY). However, this overflow was probably
not an interesting attack vector for two reasons:

  a. The attacker would need to control the environment to
     do this, in which case it was already game-over.

  b. During its setup phase, git checks that the directory
     actually exists, which means it is probably shorter
     than PATH_MAX anyway.

Because both update_info_refs and update_info_packs share
these same failings (and largely duplicate each other), this
patch factors out the improved error-checking version into a
helper function.

Signed-off-by: Jeff King <peff@peff.net>
---
I guess point (b) may not apply on systems that have a really small
PATH_MAX that does not reflect what you can actually create in the
filesystem (Windows?). But I think point (a) still applies, so this is
not really a big deal security-wise (though it is certainly a bugfix for
such systems).

 server-info.c | 116 +++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 71 insertions(+), 45 deletions(-)

diff --git a/server-info.c b/server-info.c
index 9ec744e..d54a3d6 100644
--- a/server-info.c
+++ b/server-info.c
@@ -4,45 +4,80 @@
 #include "commit.h"
 #include "tag.h"
 
-/* refs */
-static FILE *info_ref_fp;
+/*
+ * Create the file "path" by writing to a temporary file and renaming
+ * it into place. The contents of the file come from "generate", which
+ * should return non-zero if it encounters an error.
+ */
+static int update_info_file(char *path, int (*generate)(FILE *))
+{
+	char *tmp = mkpathdup("%s_XXXXXX", path);
+	int ret = -1;
+	int fd = -1;
+	FILE *fp = NULL;
+
+	safe_create_leading_directories(path);
+	fd = mkstemp(tmp);
+	if (fd < 0)
+		goto out;
+	fp = fdopen(fd, "w");
+	if (!fp)
+		goto out;
+	ret = generate(fp);
+	if (ret)
+		goto out;
+	if (fclose(fp))
+		goto out;
+	if (adjust_shared_perm(tmp) < 0)
+		goto out;
+	if (rename(tmp, path) < 0)
+		goto out;
+	ret = 0;
+
+out:
+	if (ret) {
+		error("unable to update %s: %s", path, strerror(errno));
+		if (fp)
+			fclose(fp);
+		else if (fd >= 0)
+			close(fd);
+		unlink(tmp);
+	}
+	free(tmp);
+	return ret;
+}
 
 static int add_info_ref(const char *path, const unsigned char *sha1, int flag, void *cb_data)
 {
+	FILE *fp = cb_data;
 	struct object *o = parse_object(sha1);
 	if (!o)
 		return -1;
 
-	fprintf(info_ref_fp, "%s	%s\n", sha1_to_hex(sha1), path);
+	if (fprintf(fp, "%s	%s\n", sha1_to_hex(sha1), path) < 0)
+		return -1;
+
 	if (o->type == OBJ_TAG) {
 		o = deref_tag(o, path, 0);
 		if (o)
-			fprintf(info_ref_fp, "%s	%s^{}\n",
-				sha1_to_hex(o->sha1), path);
+			if (fprintf(fp, "%s	%s^{}\n",
+				sha1_to_hex(o->sha1), path) < 0)
+				return -1;
 	}
 	return 0;
 }
 
+static int generate_info_refs(FILE *fp)
+{
+	return for_each_ref(add_info_ref, fp);
+}
+
 static int update_info_refs(int force)
 {
-	char *path0 = git_pathdup("info/refs");
-	int len = strlen(path0);
-	char *path1 = xmalloc(len + 2);
-
-	strcpy(path1, path0);
-	strcpy(path1 + len, "+");
-
-	safe_create_leading_directories(path0);
-	info_ref_fp = fopen(path1, "w");
-	if (!info_ref_fp)
-		return error("unable to update %s", path1);
-	for_each_ref(add_info_ref, NULL);
-	fclose(info_ref_fp);
-	adjust_shared_perm(path1);
-	rename(path1, path0);
-	free(path0);
-	free(path1);
-	return 0;
+	char *path = git_pathdup("info/refs");
+	int ret = update_info_file(path, generate_info_refs);
+	free(path);
+	return ret;
 }
 
 /* packs */
@@ -198,36 +233,27 @@ static void init_pack_info(const char *infofile, int force)
 		info[i]->new_num = i;
 }
 
-static void write_pack_info_file(FILE *fp)
+static int write_pack_info_file(FILE *fp)
 {
 	int i;
-	for (i = 0; i < num_pack; i++)
-		fprintf(fp, "P %s\n", info[i]->p->pack_name + objdirlen + 6);
-	fputc('\n', fp);
+	for (i = 0; i < num_pack; i++) {
+		if (fprintf(fp, "P %s\n", info[i]->p->pack_name + objdirlen + 6) < 0)
+			return -1;
+	}
+	if (fputc('\n', fp) == EOF)
+		return -1;
+	return 0;
 }
 
 static int update_info_packs(int force)
 {
-	char infofile[PATH_MAX];
-	char name[PATH_MAX];
-	int namelen;
-	FILE *fp;
-
-	namelen = sprintf(infofile, "%s/info/packs", get_object_directory());
-	strcpy(name, infofile);
-	strcpy(name + namelen, "+");
+	char *infofile = mkpathdup("%s/info/packs", get_object_directory());
+	int ret;
 
 	init_pack_info(infofile, force);
-
-	safe_create_leading_directories(name);
-	fp = fopen(name, "w");
-	if (!fp)
-		return error("cannot open %s", name);
-	write_pack_info_file(fp);
-	fclose(fp);
-	adjust_shared_perm(name);
-	rename(name, infofile);
-	return 0;
+	ret = update_info_file(infofile, write_pack_info_file);
+	free(infofile);
+	return ret;
 }
 
 /* public */
-- 
2.1.0.373.g91ca799
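
For readers who want to try the write-to-unique-tempfile-then-rename idea outside of git, here is a standalone sketch using only POSIX calls. It is an illustration under assumptions, not the patch itself: `write_atomically()`, `generate_demo()` and `generate_fail()` are hypothetical names, and the sketch leaves out `safe_create_leading_directories()` and `adjust_shared_perm()`, which exist only inside git.

```c
#define _POSIX_C_SOURCE 200809L /* for mkstemp() and fdopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical generator: one pack line plus the trailing blank line. */
static int generate_demo(FILE *fp)
{
	if (fprintf(fp, "P %s\n", "pack-demo.pack") < 0)
		return -1;
	if (fputc('\n', fp) == EOF)
		return -1;
	return 0;
}

/* Hypothetical generator that always reports failure. */
static int generate_fail(FILE *fp)
{
	(void)fp;
	return -1;
}

/*
 * Write "path" via a uniquely named temporary file in the same directory,
 * then rename it into place. Returns 0 on success, -1 on any error.
 */
static int write_atomically(const char *path, int (*generate)(FILE *))
{
	size_t len;
	char *tmp;
	FILE *fp = NULL;
	int fd = -1, created = 0, err, ret = -1;

	assert(path != NULL && generate != NULL);
	len = strlen(path) + sizeof("_XXXXXX"); /* sizeof counts the NUL */
	tmp = malloc(len);
	if (!tmp)
		return -1;
	snprintf(tmp, len, "%s_XXXXXX", path);

	fd = mkstemp(tmp); /* unique name, so concurrent writers never share a file */
	if (fd < 0)
		goto out;
	created = 1;
	fp = fdopen(fd, "w");
	if (!fp)
		goto out;
	if (generate(fp))
		goto out;
	err = fclose(fp); /* flush errors surface here */
	fp = NULL;        /* the stream is gone even if fclose() failed */
	fd = -1;
	if (err)
		goto out;
	if (rename(tmp, path) < 0)
		goto out;
	created = 0; /* the temporary name no longer exists */
	ret = 0;

out:
	if (fp)
		fclose(fp);
	else if (fd >= 0)
		close(fd);
	if (created)
		unlink(tmp);
	free(tmp);
	return ret;
}
```

Calling `write_atomically("info_packs_demo", generate_demo)` leaves either the old file or a complete new one at the destination, never a partial write. The sketch clears `fp` right after `fclose()` so that the cleanup path cannot close the same stream a second time. Note that `mkstemp()` creates the file with mode 0600, so a shared repository would still need a permission fix-up before the rename.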


* [PATCH 3/3] server-info: clean up after writing info/packs
  2014-09-13 20:15 ` Jeff King
  2014-09-13 20:16   ` [PATCH 1/3] prune-packed: fix minor memory leak Jeff King
  2014-09-13 20:19   ` [PATCH 2/3] make update-server-info more robust Jeff King
@ 2014-09-13 20:19   ` Jeff King
  2 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2014-09-13 20:19 UTC
  To: René Scharfe; +Cc: Git Mailing List, Stefan Beller, Junio C Hamano

We allocate pack information in a static global list but
never clean it up. This leaks memory, and means that calling
update_server_info twice will generate a buggy file (it will
have duplicate entries).

Signed-off-by: Jeff King <peff@peff.net>
---
 server-info.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/server-info.c b/server-info.c
index d54a3d6..31f4a74 100644
--- a/server-info.c
+++ b/server-info.c
@@ -233,6 +233,14 @@ static void init_pack_info(const char *infofile, int force)
 		info[i]->new_num = i;
 }
 
+static void free_pack_info(void)
+{
+	int i;
+	for (i = 0; i < num_pack; i++)
+		free(info[i]);
+	free(info);
+}
+
 static int write_pack_info_file(FILE *fp)
 {
 	int i;
@@ -252,6 +260,7 @@ static int update_info_packs(int force)
 
 	init_pack_info(infofile, force);
 	ret = update_info_file(infofile, write_pack_info_file);
+	free_pack_info();
 	free(infofile);
 	return ret;
 }
-- 
2.1.0.373.g91ca799
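
The general hazard described above, file-scope state that survives between calls, can be reproduced in a few lines. The sketch below is not taken from server-info.c and does not claim to show the exact mechanism there; `entries`, `num_entries`, `add_entry()`, `free_entries()` and `update_once()` are hypothetical, and the two pack names are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical file-scope list, standing in for a static global table. */
static char **entries;
static int num_entries;

static void add_entry(const char *name)
{
	char **grown = realloc(entries, (num_entries + 1) * sizeof(*entries));
	char *copy = malloc(strlen(name) + 1);

	assert(grown != NULL && copy != NULL); /* sketch: no graceful recovery */
	strcpy(copy, name);
	entries = grown;
	entries[num_entries++] = copy;
}

/* Free every element, then the array, and reset the bookkeeping. */
static void free_entries(void)
{
	int i;

	for (i = 0; i < num_entries; i++)
		free(entries[i]);
	free(entries);
	entries = NULL;
	num_entries = 0;
}

/*
 * One "update": collect two entries and report how many lines would be
 * written. With cleanup == 0 the list is left behind for the next call.
 */
static int update_once(int cleanup)
{
	int written;

	add_entry("pack-a.pack");
	add_entry("pack-b.pack");
	written = num_entries;
	if (cleanup)
		free_entries();
	return written;
}
```

Without cleanup, a second `update_once(0)` sees the entries of the first call as well and would write each name twice. Resetting the pointer and the counter in the free function is a choice made for this sketch, so that a later call starts from a known-empty state.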


* Re: [PATCH 2/3] make update-server-info more robust
  2014-09-13 20:19   ` [PATCH 2/3] make update-server-info more robust Jeff King
@ 2014-09-14 17:38     ` René Scharfe
  2014-09-15 18:39     ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: René Scharfe @ 2014-09-14 17:38 UTC
  To: Jeff King; +Cc: Git Mailing List, Stefan Beller, Junio C Hamano

Am 13.09.2014 um 22:19 schrieb Jeff King:
> Since "git update-server-info" may be called automatically
> as part of a push or a "gc --auto", we should be robust
> against two processes trying to update it simultaneously.
> However, we currently use a fixed tempfile, which means that
> two simultaneous writers may step on each other's toes and
> end up renaming junk into place.
> 
> Let's instead switch to using a unique tempfile via mkstemp.
> We do not want to use a lockfile here, because it's OK for
> two writers to simultaneously update (one will "win" the
> rename race, but that's OK; they should be writing the same
> information).
> 
> While we're there, let's clean up a few other things:
> 
>    1. Detect write errors. Report them and abort the update
>       if any are found.
> 
>    2. Free path memory rather than leaking it (and clean up
>       the tempfile when necessary).
> 
>    3. Use the pathdup functions consistently rather than
>       static buffers or manually calculated lengths.
> 
> This last one fixes a potential overflow of "infofile" in
> update_info_packs (e.g., by putting large junk into
> $GIT_OBJECT_DIRECTORY). However, this overflow was probably
> not an interesting attack vector for two reasons:
> 
>    a. The attacker would need to control the environment to
>       do this, in which case it was already game-over.
> 
>    b. During its setup phase, git checks that the directory
>       actually exists, which means it is probably shorter
>       than PATH_MAX anyway.
> 
> Because both update_info_refs and update_info_packs share
> these same failings (and largely duplicate each other), this
> patch factors out the improved error-checking version into a
> helper function.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I guess point (b) may not apply on systems that have a really small
> PATH_MAX that does not reflect what you can actually create in the
> filesystem (Windows?).

It's the other way around: PATH_MAX is an actual limit basically only
on Windows [1] unless you avoid using the Windows API [2].

Regardless of the security implications, getting rid of more PATH_MAX
buffers is a good move.

And I looked only briefly at your patch, but I like the three bullet
points above. :)

René


[1] http://insanecoding.blogspot.de/2007/11/pathmax-simply-isnt.html
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
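
As an illustration of what replacing a PATH_MAX buffer can look like in plain C, the sketch below sizes the allocation from the formatted length instead of a compile-time constant. `format_path()` is a hypothetical helper written for this example; it is not git's `mkpathdup()`, and the repository path in the usage line is made up.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format a path into freshly allocated memory of exactly the right size.
 * Returns NULL on formatting or allocation failure; the caller frees.
 */
static char *format_path(const char *fmt, ...)
{
	va_list ap, ap2;
	int need;
	char *out;

	assert(fmt != NULL);
	va_start(ap, fmt);
	va_copy(ap2, ap); /* the arguments are consumed twice */
	need = vsnprintf(NULL, 0, fmt, ap); /* first pass: measure only */
	va_end(ap);
	if (need < 0) {
		va_end(ap2);
		return NULL;
	}
	out = malloc((size_t)need + 1);
	if (out)
		vsnprintf(out, (size_t)need + 1, fmt, ap2); /* second pass: write */
	va_end(ap2);
	return out;
}
```

A call such as `format_path("%s/info/packs", "/srv/repo.git/objects")` cannot overflow regardless of how long the directory name is, at the cost of one allocation and a `free()` in the caller.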


* Re: [PATCH 2/3] make update-server-info more robust
  2014-09-13 20:19   ` [PATCH 2/3] make update-server-info more robust Jeff King
  2014-09-14 17:38     ` René Scharfe
@ 2014-09-15 18:39     ` Junio C Hamano
  2014-09-15 23:56       ` Jeff King
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2014-09-15 18:39 UTC
  To: Jeff King; +Cc: René Scharfe, Git Mailing List, Stefan Beller

Jeff King <peff@peff.net> writes:

> Since "git update-server-info" may be called automatically
> as part of a push or a "gc --auto", we should be robust
> against two processes trying to update it simultaneously.
> However, we currently use a fixed tempfile, which means that
> two simultaneous writers may step on each other's toes and
> end up renaming junk into place.

Thanks.  I'll queue these clean-ups but we may want to start
thinking about deprecating and removing the dumb http support.


* Re: [PATCH 2/3] make update-server-info more robust
  2014-09-15 18:39     ` Junio C Hamano
@ 2014-09-15 23:56       ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2014-09-15 23:56 UTC
  To: Junio C Hamano; +Cc: René Scharfe, Git Mailing List, Stefan Beller

On Mon, Sep 15, 2014 at 11:39:12AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Since "git update-server-info" may be called automatically
> > as part of a push or a "gc --auto", we should be robust
> > against two processes trying to update it simultaneously.
> > However, we currently use a fixed tempfile, which means that
> > two simultaneous writers may step on each other's toes and
> > end up renaming junk into place.
> 
> Thanks.  I'll queue these clean-ups but we may want to start
> thinking about deprecating and removing the dumb http support.

Yeah, I have often thought about that (especially the push support,
which has always been flaky and underused). However, some possible
schemes for resumable clone could be easily implemented by shunting the
cloner to a dumb-http conversation. So it may be worth keeping at least
the fetch side around for the time being. Food for thought.

-Peff
