* [PATCH v5 0/4] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_parse_ulong()
@ 2014-08-24 16:07 Steffen Prohaska
  2014-08-24 16:07 ` [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path' Steffen Prohaska
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Steffen Prohaska @ 2014-08-24 16:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, pclouds, john, schacon, Steffen Prohaska

Changes since v4: use git_parse_ulong() to parse env vars.

Steffen Prohaska (4):
  convert: Refactor would_convert_to_git() to single arg 'path'
  Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
  Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  convert: Stream from fd to required clean filter instead of mmap

 convert.c             | 60 +++++++++++++++++++++++++++++++++++++++++++++------
 convert.h             | 10 ++++++---
 sha1_file.c           | 50 +++++++++++++++++++++++++++++++++++++++---
 t/t0021-conversion.sh | 24 ++++++++++++++++-----
 t/t1050-large.sh      |  2 +-
 wrapper.c             | 16 ++++++++------
 6 files changed, 138 insertions(+), 24 deletions(-)

-- 
2.1.0.8.gd3b6067

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path'
  2014-08-24 16:07 [PATCH v5 0/4] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_parse_ulong() Steffen Prohaska
@ 2014-08-24 16:07 ` Steffen Prohaska
  2014-08-25 22:55   ` Junio C Hamano
  2014-08-24 16:07 ` [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong() Steffen Prohaska
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Steffen Prohaska @ 2014-08-24 16:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, pclouds, john, schacon, Steffen Prohaska

Only the path matters when deciding whether content would be converted;
the buffer contents are irrelevant.  Make this explicit by reducing
would_convert_to_git() to a single argument, 'path'.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
 			      struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-				       size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-	return convert_to_git(path, src, len, NULL, checksafe);
+	return convert_to_git(path, NULL, 0, NULL, 0);
 }
 
 /*****************************************************************
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path, NULL, 0, 0)))
+		 (path && would_convert_to_git(path)))
 		ret = index_core(sha1, fd, size, type, path, flags);
 	else
 		ret = index_stream(sha1, fd, size, type, path, flags);
-- 
2.1.0.8.gd3b6067


* [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
  2014-08-24 16:07 [PATCH v5 0/4] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_parse_ulong() Steffen Prohaska
  2014-08-24 16:07 ` [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path' Steffen Prohaska
@ 2014-08-24 16:07 ` Steffen Prohaska
  2014-08-25 11:38   ` Jeff King
  2014-08-24 16:07 ` [PATCH v5 3/4] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size Steffen Prohaska
  2014-08-24 16:07 ` [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap Steffen Prohaska
  3 siblings, 1 reply; 15+ messages in thread
From: Steffen Prohaska @ 2014-08-24 16:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, pclouds, john, schacon, Steffen Prohaska

GIT_ALLOC_LIMIT limits the size passed to xmalloc(), which is of type
size_t.  Use git_parse_ulong() to parse the environment variable, so
that the suffixes 'k', 'm', and 'g' can be used; and store the limit in
a size_t for consistency.  The change to size_t has no direct practical
impact, because GIT_ALLOC_LIMIT is only used to test small sizes.

The cast of size in the call to die() is changed to uintmax_t to match
the format string PRIuMAX.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 t/t1050-large.sh |  2 +-
 wrapper.c        | 16 ++++++++++------
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index aea4936..e7657ab 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -13,7 +13,7 @@ test_expect_success setup '
 	echo X | dd of=large2 bs=1k seek=2000 &&
 	echo X | dd of=large3 bs=1k seek=2000 &&
 	echo Y | dd of=huge bs=1k seek=2500 &&
-	GIT_ALLOC_LIMIT=1500 &&
+	GIT_ALLOC_LIMIT=1500k &&
 	export GIT_ALLOC_LIMIT
 '
 
diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..69d1c9b 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = do_nothing;
 
 static void memory_limit_check(size_t size)
 {
-	static int limit = -1;
-	if (limit == -1) {
-		const char *env = getenv("GIT_ALLOC_LIMIT");
-		limit = env ? atoi(env) * 1024 : 0;
+	static size_t limit = SIZE_MAX;
+	if (limit == SIZE_MAX) {
+		const char *var = "GIT_ALLOC_LIMIT";
+		unsigned long val = 0;
+		const char *env = getenv(var);
+		if (env && !git_parse_ulong(env, &val))
+			die("Failed to parse %s", var);
+		limit = val;
 	}
 	if (limit && size > limit)
-		die("attempting to allocate %"PRIuMAX" over limit %d",
-		    (intmax_t)size, limit);
+		die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
+		    (uintmax_t)size, (uintmax_t)limit);
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
-- 
2.1.0.8.gd3b6067


* [PATCH v5 3/4] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  2014-08-24 16:07 [PATCH v5 0/4] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_parse_ulong() Steffen Prohaska
  2014-08-24 16:07 ` [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path' Steffen Prohaska
  2014-08-24 16:07 ` [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong() Steffen Prohaska
@ 2014-08-24 16:07 ` Steffen Prohaska
  2014-08-24 16:07 ` [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap Steffen Prohaska
  3 siblings, 0 replies; 15+ messages in thread
From: Steffen Prohaska @ 2014-08-24 16:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, pclouds, john, schacon, Steffen Prohaska

Similar to testing expectations about malloc with GIT_ALLOC_LIMIT (see
commit d41489 and the previous commit), it can be useful to test
expectations about mmap.

This introduces a new environment variable GIT_MMAP_LIMIT to limit the
largest allowed mmap length.  xmmap() is modified to check the limit.
Together with GIT_ALLOC_LIMIT, tests can now easily confirm expectations
about memory consumption.

GIT_MMAP_LIMIT will be used in the next commit to test that data is
streamed to an external filter without mmapping the entire file.

[commit d41489]: d41489a6424308dc9a0409bc2f6845aa08bd4f7d ("Add more
    large blob test cases")

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 sha1_file.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 00c07f2..3204f66 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -663,10 +663,29 @@ void release_pack_memory(size_t need)
 		; /* nothing */
 }
 
+static void mmap_limit_check(size_t length)
+{
+	static size_t limit = SIZE_MAX;
+	if (limit == SIZE_MAX) {
+		const char *var = "GIT_MMAP_LIMIT";
+		unsigned long val = 0;
+		const char *env = getenv(var);
+		if (env && !git_parse_ulong(env, &val))
+			die("Failed to parse %s", var);
+		limit = val;
+	}
+	if (limit && length > limit)
+		die("attempting to mmap %"PRIuMAX" over limit %"PRIuMAX,
+		    (uintmax_t)length, (uintmax_t)limit);
+}
+
 void *xmmap(void *start, size_t length,
 	int prot, int flags, int fd, off_t offset)
 {
-	void *ret = mmap(start, length, prot, flags, fd, offset);
+	void *ret;
+
+	mmap_limit_check(length);
+	ret = mmap(start, length, prot, flags, fd, offset);
 	if (ret == MAP_FAILED) {
 		if (!length)
 			return NULL;
-- 
2.1.0.8.gd3b6067


* [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
  2014-08-24 16:07 [PATCH v5 0/4] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_parse_ulong() Steffen Prohaska
                   ` (2 preceding siblings ...)
  2014-08-24 16:07 ` [PATCH v5 3/4] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size Steffen Prohaska
@ 2014-08-24 16:07 ` Steffen Prohaska
  2014-08-25 12:43   ` Jeff King
  3 siblings, 1 reply; 15+ messages in thread
From: Steffen Prohaska @ 2014-08-24 16:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, pclouds, john, schacon, Steffen Prohaska

The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  The original data is not available
to git, so it must fail if the filter fails.

The environment variable GIT_MMAP_LIMIT, which was introduced in the
previous commit, is used to test that the expected code path is taken.
A related test that exercises required filters is modified to verify
that the data has actually been modified on its way from the file system
to the object store.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 convert.c             | 60 +++++++++++++++++++++++++++++++++++++++++++++------
 convert.h             |  5 +++++
 sha1_file.c           | 27 ++++++++++++++++++++++-
 t/t0021-conversion.sh | 24 ++++++++++++++++-----
 4 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/convert.c b/convert.c
index cb5fbb4..463f6de 100644
--- a/convert.c
+++ b/convert.c
@@ -312,11 +312,12 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 struct filter_params {
 	const char *src;
 	unsigned long size;
+	int fd;
 	const char *cmd;
 	const char *path;
 };
 
-static int filter_buffer(int in, int out, void *data)
+static int filter_buffer_or_fd(int in, int out, void *data)
 {
 	/*
 	 * Spawn cmd and feed the buffer contents through its stdin.
@@ -325,6 +326,7 @@ static int filter_buffer(int in, int out, void *data)
 	struct filter_params *params = (struct filter_params *)data;
 	int write_err, status;
 	const char *argv[] = { NULL, NULL };
+	int fd;
 
 	/* apply % substitution to cmd */
 	struct strbuf cmd = STRBUF_INIT;
@@ -355,7 +357,17 @@ static int filter_buffer(int in, int out, void *data)
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
-	write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+	if (params->src) {
+	    write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+	} else {
+	    /* dup(), because copy_fd() closes the input fd. */
+	    fd = dup(params->fd);
+	    if (fd < 0)
+		write_err = error("failed to dup file descriptor.");
+	    else
+		write_err = copy_fd(fd, child_process.in);
+	}
+
 	if (close(child_process.in))
 		write_err = 1;
 	if (write_err)
@@ -371,7 +383,7 @@ static int filter_buffer(int in, int out, void *data)
 	return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len,
+static int apply_filter(const char *path, const char *src, size_t len, int fd,
                         struct strbuf *dst, const char *cmd)
 {
 	/*
@@ -392,11 +404,12 @@ static int apply_filter(const char *path, const char *src, size_t len,
 		return 1;
 
 	memset(&async, 0, sizeof(async));
-	async.proc = filter_buffer;
+	async.proc = filter_buffer_or_fd;
 	async.data = &params;
 	async.out = -1;
 	params.src = src;
 	params.size = len;
+	params.fd = fd;
 	params.cmd = cmd;
 	params.path = path;
 
@@ -747,6 +760,24 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
 	}
 }
 
+int would_convert_to_git_filter_fd(const char *path)
+{
+	struct conv_attrs ca;
+
+	convert_attrs(&ca, path);
+	if (!ca.drv)
+	    return 0;
+
+	/* Apply a filter to an fd only if the filter is required to succeed.
+	 * We must die if the filter fails, because the original data before
+	 * filtering is not available.
+	 */
+	if (!ca.drv->required)
+	    return 0;
+
+	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
                    struct strbuf *dst, enum safe_crlf checksafe)
 {
@@ -761,7 +792,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		required = ca.drv->required;
 	}
 
-	ret |= apply_filter(path, src, len, dst, filter);
+	ret |= apply_filter(path, src, len, -1, dst, filter);
 	if (!ret && required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -778,6 +809,23 @@ int convert_to_git(const char *path, const char *src, size_t len,
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
+void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+			      enum safe_crlf checksafe)
+{
+	struct conv_attrs ca;
+	convert_attrs(&ca, path);
+
+	assert(ca.drv);
+	assert(ca.drv->clean);
+
+	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
+		die("%s: clean filter '%s' failed", path, ca.drv->name);
+
+	ca.crlf_action = input_crlf_action(ca.crlf_action, ca.eol_attr);
+	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
+}
+
 static int convert_to_working_tree_internal(const char *path, const char *src,
 					    size_t len, struct strbuf *dst,
 					    int normalizing)
@@ -811,7 +859,7 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, dst, filter);
+	ret_filter = apply_filter(path, src, len, -1, dst, filter);
 	if (!ret_filter && required)
 		die("%s: smudge filter %s failed", path, ca.drv->name);
 
diff --git a/convert.h b/convert.h
index c638b33..d9d853c 100644
--- a/convert.h
+++ b/convert.h
@@ -44,6 +44,11 @@ static inline int would_convert_to_git(const char *path)
 {
 	return convert_to_git(path, NULL, 0, NULL, 0);
 }
+/* Precondition: would_convert_to_git_filter_fd(path) == true */
+extern void convert_to_git_filter_fd(const char *path, int fd,
+				     struct strbuf *dst,
+				     enum safe_crlf checksafe);
+extern int would_convert_to_git_filter_fd(const char *path);
 
 /*****************************************************************
  *
diff --git a/sha1_file.c b/sha1_file.c
index 3204f66..386652f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3093,6 +3093,29 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	return ret;
 }
 
+static int index_stream_convert_blob(unsigned char *sha1, int fd,
+				     const char *path, unsigned flags)
+{
+	int ret;
+	const int write_object = flags & HASH_WRITE_OBJECT;
+	struct strbuf sbuf = STRBUF_INIT;
+
+	assert(path);
+	assert(would_convert_to_git_filter_fd(path));
+
+	convert_to_git_filter_fd(path, fd, &sbuf,
+				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
+
+	if (write_object)
+		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+				      sha1);
+	else
+		ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+				     sha1);
+	strbuf_release(&sbuf);
+	return ret;
+}
+
 static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
 		      const char *path, unsigned flags)
 {
@@ -3160,7 +3183,9 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	int ret;
 	size_t size = xsize_t(st->st_size);
 
-	if (!S_ISREG(st->st_mode))
+	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(path))
+		ret = index_stream_convert_blob(sha1, fd, path, flags);
+	else if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (size <= big_file_threshold || type != OBJ_BLOB ||
 		 (path && would_convert_to_git(path)))
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index f890c54..82ebbff 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -153,17 +153,23 @@ test_expect_success 'filter shell-escaped filenames' '
 	:
 '
 
-test_expect_success 'required filter success' '
-	git config filter.required.smudge cat &&
-	git config filter.required.clean cat &&
+test_expect_success 'required filter should filter data' '
+	git config filter.required.smudge ./rot13.sh &&
+	git config filter.required.clean ./rot13.sh &&
 	git config filter.required.required true &&
 
 	echo "*.r filter=required" >.gitattributes &&
 
-	echo test >test.r &&
+	cat test.o >test.r &&
 	git add test.r &&
+
 	rm -f test.r &&
-	git checkout -- test.r
+	git checkout -- test.r &&
+	cmp test.o test.r &&
+
+	./rot13.sh <test.o >expected &&
+	git cat-file blob :test.r >actual &&
+	cmp expected actual
 '
 
 test_expect_success 'required filter smudge failure' '
@@ -189,6 +195,14 @@ test_expect_success 'required filter clean failure' '
 	echo test >test.fc &&
 	test_must_fail git add test.fc
 '
+test_expect_success \
+'filtering large input to small output should use little memory' '
+	git config filter.devnull.clean "cat >/dev/null" &&
+	git config filter.devnull.required true &&
+	for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
+	echo "30MB filter=devnull" >.gitattributes &&
+	GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
+'
 
 test_expect_success EXPENSIVE 'filter large file' '
 	git config filter.largefile.smudge cat &&
-- 
2.1.0.8.gd3b6067


* Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
  2014-08-24 16:07 ` [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong() Steffen Prohaska
@ 2014-08-25 11:38   ` Jeff King
  2014-08-25 15:06     ` Steffen Prohaska
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2014-08-25 11:38 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Junio C Hamano, git, pclouds, john, schacon

On Sun, Aug 24, 2014 at 06:07:44PM +0200, Steffen Prohaska wrote:

> diff --git a/wrapper.c b/wrapper.c
> index bc1bfb8..69d1c9b 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = do_nothing;
>  
>  static void memory_limit_check(size_t size)
>  {
> -	static int limit = -1;
> -	if (limit == -1) {
> -		const char *env = getenv("GIT_ALLOC_LIMIT");
> -		limit = env ? atoi(env) * 1024 : 0;
> +	static size_t limit = SIZE_MAX;
> +	if (limit == SIZE_MAX) {

You use SIZE_MAX as the sentinel for "not set", and 0 as the sentinel
for "no limit". That seems kind of backwards.

I guess you are inheriting this from the existing code, which lets
GIT_ALLOC_LIMIT=0 mean "no limit". I'm not sure if we want to keep that
or not (it would be backwards incompatible to change it, but we are
already breaking compatibility here by assuming bytes rather than
kilobytes; I think that's OK because this is not a documented feature,
or one intended to be used externally).

> +		const char *var = "GIT_ALLOC_LIMIT";
> +		unsigned long val = 0;
> +		const char *env = getenv(var);
> +		if (env && !git_parse_ulong(env, &val))
> +			die("Failed to parse %s", var);
> +		limit = val;
>  	}

This and the next patch both look OK to me, but I notice this part is
largely duplicated between the two. We already have git_env_bool to do a
similar thing for boolean environment variables. Should we do something
similar like:

diff --git a/config.c b/config.c
index 058505c..11919eb 100644
--- a/config.c
+++ b/config.c
@@ -1122,6 +1122,14 @@ int git_env_bool(const char *k, int def)
 	return v ? git_config_bool(k, v) : def;
 }
 
+unsigned long git_env_ulong(const char *k, unsigned long val)
+{
+	const char *v = getenv(k);
+	if (v && !git_parse_ulong(k, &val))
+		die("failed to parse %s", k);
+	return val;
+}
+
 int git_config_system(void)
 {
 	return !git_env_bool("GIT_CONFIG_NOSYSTEM", 0);

It's not a lot of code, but I think the callers end up being much easier
to read:

  if (limit == SIZE_MAX)
	limit = git_env_ulong("GIT_ALLOC_LIMIT", 0);

>  	if (limit && size > limit)
> -		die("attempting to allocate %"PRIuMAX" over limit %d",
> -		    (intmax_t)size, limit);
> +		die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
> +		    (uintmax_t)size, (uintmax_t)limit);

This part is duplicated, too, though I do not know if the infrastructure
to avoid that is worth the trouble. Unless you wanted to do a whole:

  check_limit(&limit, "GIT_ALLOC_LIMIT", size);

or something, but I am also not convinced that is not just obfuscating
things.

-Peff


* Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
  2014-08-24 16:07 ` [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap Steffen Prohaska
@ 2014-08-25 12:43   ` Jeff King
  2014-08-25 16:55     ` Steffen Prohaska
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2014-08-25 12:43 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Junio C Hamano, git, pclouds, john, schacon

On Sun, Aug 24, 2014 at 06:07:46PM +0200, Steffen Prohaska wrote:

> The data is streamed to the filter process anyway.  Better avoid mapping
> the file if possible.  This is especially useful if a clean filter
> reduces the size, for example if it computes a sha1 for binary data,
> like git media.  The file size that the previous implementation could
> handle was limited by the available address space; large files for
> example could not be handled with (32-bit) msysgit.  The new
> implementation can filter files of any size as long as the filter output
> is small enough.
> 
> The new code path is only taken if the filter is required.  The filter
> consumes data directly from the fd.  The original data is not available
> to git, so it must fail if the filter fails.

Can you clarify this second paragraph a bit more? If I understand
correctly, we handle a non-required filter failing by just reading the
data again (which we can do because we either read it into memory
ourselves, or mmap it). With the streaming approach, we will read the
whole file through our stream; if that fails we would then want to read
the stream from the start.

Couldn't we do that with an lseek (or even an mmap with offset 0)? That
obviously would not work for non-file inputs, but I think we address
that already in index_fd: we push non-seekable things off to index_pipe,
where we spool them to memory.

So it seems like the ideal strategy would be:

  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.

  2. If it's not seekable and the filter is required, try streaming. We
     die anyway if we fail.

  3. If it's not seekable and the filter is not required, decide based
     on file size:

       a. If it's small, spool to memory and proceed as we do now.

       b. If it's big, spool to a seekable tempfile.

Your patch implements part 2. But I would think part 1 is the most common
case. And while part 3b seems unpleasant, it is better than the current
code (with or without your patch), which will do 3a on a large file.

Hmm. Though I guess in (3) we do not have the size up front, so it's
complicated (we could spool N bytes to memory, then start dumping to a
file after that). I do not think we necessarily need to implement that
part, though. It seems like (1) is the thing I would expect to hit the
most (i.e., people do not always mark their filters as "required").

> -	write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
> +	if (params->src) {
> +	    write_err = (write_in_full(child_process.in, params->src, params->size) < 0);

Style: 4-space indentation (rather than a tab). There's more of it in
this function (and in would_convert...) that I didn't mark.

> +	} else {
> +	    /* dup(), because copy_fd() closes the input fd. */
> +	    fd = dup(params->fd);

Not a problem you are introducing, but this seem kind of like a
misfeature in copy_fd. Is it worth fixing? The function only has two
existing callers.

> +	/* Apply a filter to an fd only if the filter is required to succeed.
> +	 * We must die if the filter fails, because the original data before
> +	 * filtering is not available.
> +	 */

Style nit:

  /*
   * We have a blank line at the top of our
   * multi-line comments.
   */

-Peff


* Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
  2014-08-25 11:38   ` Jeff King
@ 2014-08-25 15:06     ` Steffen Prohaska
  2014-08-25 15:12       ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Steffen Prohaska @ 2014-08-25 15:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Mailing List, pclouds, john, schacon


On Aug 25, 2014, at 1:38 PM, Jeff King <peff@peff.net> wrote:

> On Sun, Aug 24, 2014 at 06:07:44PM +0200, Steffen Prohaska wrote:
> 
>> diff --git a/wrapper.c b/wrapper.c
>> index bc1bfb8..69d1c9b 100644
>> --- a/wrapper.c
>> +++ b/wrapper.c
>> @@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = do_nothing;
>> 
>> static void memory_limit_check(size_t size)
>> {
>> -	static int limit = -1;
>> -	if (limit == -1) {
>> -		const char *env = getenv("GIT_ALLOC_LIMIT");
>> -		limit = env ? atoi(env) * 1024 : 0;
>> +	static size_t limit = SIZE_MAX;
>> +	if (limit == SIZE_MAX) {
> 
> You use SIZE_MAX as the sentinel for "not set", and 0 as the sentinel
> for "no limit". That seems kind of backwards.
> 
> I guess you are inheriting this from the existing code, which lets
> GIT_ALLOC_LIMIT=0 mean "no limit". I'm not sure if we want to keep that
> or not (it would be backwards incompatible to change it, but we are
> already breaking compatibility here by assuming bytes rather than
> kilobytes; I think that's OK because this is not a documented feature,
> or one intended to be used externally).

I think it's reasonable that GIT_ALLOC_LIMIT=0 means "no limit", so that
the limit can easily be disabled temporarily.

But I could change the sentinel and handle 0 like:

	if (git_parse_ulong(env, &val)) {
		if (!val)
			val = SIZE_MAX;
	}

Maybe we should do this.



>> +		const char *var = "GIT_ALLOC_LIMIT";
>> +		unsigned long val = 0;
>> +		const char *env = getenv(var);
>> +		if (env && !git_parse_ulong(env, &val))
>> +			die("Failed to parse %s", var);
>> +		limit = val;
>> 	}
> 
> This and the next patch both look OK to me, but I notice this part is
> largely duplicated between the two. We already have git_env_bool to do a
> similar thing for boolean environment variables. Should we do something
> similar like:
> 
> diff --git a/config.c b/config.c
> index 058505c..11919eb 100644
> --- a/config.c
> +++ b/config.c
> @@ -1122,6 +1122,14 @@ int git_env_bool(const char *k, int def)
> 	return v ? git_config_bool(k, v) : def;
> }
> 
> +unsigned long git_env_ulong(const char *k, unsigned long val)
> +{
> +	const char *v = getenv(k);
> +	if (v && !git_parse_ulong(v, &val))
> +		die("failed to parse %s", k);
> +	return val;
> +}
> +
> int git_config_system(void)
> {
> 	return !git_env_bool("GIT_CONFIG_NOSYSTEM", 0);
> 
> It's not a lot of code, but I think the callers end up being much easier
> to read:
> 
>  if (limit == SIZE_MAX)
> 	limit = git_env_ulong("GIT_ALLOC_LIMIT", 0);

I think you're right.  I'll change it.


	Steffen


* Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
  2014-08-25 15:06     ` Steffen Prohaska
@ 2014-08-25 15:12       ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2014-08-25 15:12 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Junio C Hamano, Git Mailing List, pclouds, john, schacon

On Mon, Aug 25, 2014 at 05:06:22PM +0200, Steffen Prohaska wrote:

> I think it's reasonable that GIT_ALLOC_LIMIT=0 means "no limit", so that
> the limit can easily be disabled temporarily.

IMHO, GIT_ALLOC_LIMIT= (i.e., the empty string) would be a good way to
say that (and I guess that even works currently, due to the way atoi
works, but I suspect git_parse_ulong might complain). It is probably not
worth worrying about too much. This is not even a user-facing interface,
and the test scripts just set it to 0.

So I'd be OK going that direction, or just leaving it as-is.

-Peff


* Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
  2014-08-25 12:43   ` Jeff King
@ 2014-08-25 16:55     ` Steffen Prohaska
  2014-08-25 18:35       ` Junio C Hamano
  2014-08-26 17:54       ` Jeff King
  0 siblings, 2 replies; 15+ messages in thread
From: Steffen Prohaska @ 2014-08-25 16:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Mailing List, pclouds, john, schacon


On Aug 25, 2014, at 2:43 PM, Jeff King <peff@peff.net> wrote:

> On Sun, Aug 24, 2014 at 06:07:46PM +0200, Steffen Prohaska wrote:
> 
>> The data is streamed to the filter process anyway.  Better avoid mapping
>> the file if possible.  This is especially useful if a clean filter
>> reduces the size, for example if it computes a sha1 for binary data,
>> like git media.  The file size that the previous implementation could
>> handle was limited by the available address space; large files for
>> example could not be handled with (32-bit) msysgit.  The new
>> implementation can filter files of any size as long as the filter output
>> is small enough.
>> 
>> The new code path is only taken if the filter is required.  The filter
>> consumes data directly from the fd.  The original data is not available
>> to git, so it must fail if the filter fails.
> 
> Can you clarify this second paragraph a bit more? If I understand
> correctly, we handle a non-required filter failing by just reading the
> data again (which we can do because we either read it into memory
> ourselves, or mmap it).

We don't read the data again.  convert_to_git() assumes that it is already
in memory and simply keeps the original buffer if the filter fails.


> With the streaming approach, we will read the
> whole file through our stream; if that fails we would then want to read
> the stream from the start.
> 
> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
> obviously would not work for non-file inputs, but I think we address
> that already in index_fd: we push non-seekable things off to index_pipe,
> where we spool them to memory.

It could be handled that way, but we would be back to the original problem
that 32-bit git fails for large files.  The convert code path currently
assumes that all data is available in a single buffer at some point to apply
crlf and ident filters.

If the initial filter, which is assumed to reduce the file size, fails, we
could seek to 0 and read the entire file.  But git would then fail for large
files with out-of-memory.  We would not gain anything for the use case that
I describe in the commit message's first paragraph.

To implement something like the ideal strategy below, the entire convert 
machinery for crlf and ident would have to be converted to a streaming
approach.  Another option would be to detect that only the clean filter
would be applied and not crlf and ident.  Maybe we could get away with
something simpler then.

But I think that if the clean filter's purpose is to reduce file size, it
does not make sense to try to handle the case of a failing filter with a 
fallback plan.  The filter should simply be marked "required", because
any sane operation requires it.
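
As a concrete illustration (the "media" driver name and its commands below are hypothetical), such a size-reducing clean filter would be declared like this, with "required" making git fail instead of silently falling back to the unfiltered content:

```
# .gitattributes
*.bin filter=media

# .git/config
[filter "media"]
	clean = media-clean %f
	smudge = media-smudge %f
	required = true
```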


> So it seems like the ideal strategy would be:
> 
>  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
> 
>  2. If it's not seekable and the filter is required, try streaming. We
>     die anyway if we fail.
> 
>  3. If it's not seekable and the filter is not required, decide based
>     on file size:
> 
>       a. If it's small, spool to memory and proceed as we do now.
> 
>       b. If it's big, spool to a seekable tempfile.
> 
> Your patch implements part 2. But I would think part 1 is the most common
> case. And while part 3b seems unpleasant, it is better than the current
> code (with or without your patch), which will do 3a on a large file.
> 
> Hmm. Though I guess in (3) we do not have the size up front, so it's
> complicated (we could spool N bytes to memory, then start dumping to a
> file after that). I do not think we necessarily need to implement that
> part, though. It seems like (1) is the thing I would expect to hit the
> most (i.e., people do not always mark their filters as "required").
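
For reference, the decision tree sketched above could be written roughly like this (an illustrative sketch with made-up names, not git's actual code):

```c
#include <sys/types.h>

/* Hypothetical dispatch for the "ideal strategy" quoted above. */
enum conv_plan { STREAM, STREAM_OR_DIE, SPOOL_MEMORY, SPOOL_TEMPFILE };

enum conv_plan choose_plan(int seekable, int required,
			   off_t size, off_t threshold)
{
	if (seekable)
		return STREAM;		/* 1: stream; rewind via lseek/mmap on failure */
	if (required)
		return STREAM_OR_DIE;	/* 2: we die anyway if the filter fails */
	if (size <= threshold)
		return SPOOL_MEMORY;	/* 3a: small input, keep a copy in core */
	return SPOOL_TEMPFILE;		/* 3b: large input, spool to a seekable file */
}
```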

Well, I think they have to mark it if the filter's purpose is to reduce size.

I'll add a bit of the discussion to the commit message.  I'm not convinced
that we should do more at this point.


>> +	} else {
>> +	    /* dup(), because copy_fd() closes the input fd. */
>> +	    fd = dup(params->fd);
> 
> Not a problem you are introducing, but this seem kind of like a
> misfeature in copy_fd. Is it worth fixing? The function only has two
> existing callers.

I found it confusing.  I think it's worth fixing.
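
To make the dup() workaround concrete, here is a self-contained sketch. copy_fd_closes_input() is a simplified stand-in for git's copy_fd(), not the actual implementation; it only mirrors the behavior under discussion, namely that the input fd is closed by the callee:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Simplified stand-in for git's copy_fd(): copies ifd to ofd and
 * closes ifd when done. */
static int copy_fd_closes_input(int ifd, int ofd)
{
	char buf[4096];
	ssize_t n;
	while ((n = read(ifd, buf, sizeof(buf))) > 0) {
		if (write(ofd, buf, n) != n) {
			close(ifd);
			return -1;
		}
	}
	close(ifd);	/* the surprising part: the input fd is consumed */
	return n < 0 ? -1 : 0;
}

/* Demonstrate the workaround: dup() the descriptor first, and the
 * duplicate remains usable after copy_fd_closes_input() eats the
 * original.  Returns 0 on success. */
int demo_dup_workaround(void)
{
	char tmpl[] = "/tmp/copyfdXXXXXX";
	int src = mkstemp(tmpl);
	if (src < 0)
		return -1;
	if (write(src, "hello", 5) != 5 || lseek(src, 0, SEEK_SET) != 0)
		return -1;

	int keep = dup(src);	/* keep a handle before src is closed */
	int sink = open("/dev/null", O_WRONLY);
	if (keep < 0 || sink < 0 || copy_fd_closes_input(src, sink) != 0)
		return -1;

	char out[6] = { 0 };	/* the dup'ed fd still reads the file */
	if (lseek(keep, 0, SEEK_SET) != 0 || read(keep, out, 5) != 5)
		return -1;
	close(keep);
	close(sink);
	unlink(tmpl);
	return strcmp(out, "hello");
}
```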

	Steffen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
  2014-08-25 16:55     ` Steffen Prohaska
@ 2014-08-25 18:35       ` Junio C Hamano
  2014-08-26 18:00         ` Jeff King
  2014-08-26 17:54       ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-08-25 18:35 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Jeff King, Git Mailing List, pclouds, john, schacon

Steffen Prohaska <prohaska@zib.de> writes:

>> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
>> obviously would not work for non-file inputs, but I think we address
>> that already in index_fd: we push non-seekable things off to index_pipe,
>> where we spool them to memory.
>
> It could be handled that way, but we would be back to the original problem
> that 32-bit git fails for large files.

Correct, and you are making an incremental improvement so that such
a large blob can be handled _when_ the filters can successfully
munge it back and forth.  If we fail due to out of memory when the
filters cannot, that would be the same as without your improvement,
so you are still making progress.

> To implement something like the ideal strategy below, the entire convert 
> machinery for crlf and ident would have to be converted to a streaming
> approach.

Yes, that has always been the longer term vision since the day the
streaming infrastructure was introduced.

>> So it seems like the ideal strategy would be:
>> 
>>  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
>> 
>>  2. If it's not seekable and the filter is required, try streaming. We
>>     die anyway if we fail.

Puzzled...  Is it assumed that any content for which the filters tell
us to use the contents from the db as-is (by exiting with non-zero
status) will always be too large to fit in-core?  For small contents,
isn't this "ideal" strategy a regression?

>>  3. If it's not seekable and the filter is not required, decide based
>>     on file size:
>> 
>>       a. If it's small, spool to memory and proceed as we do now.
>> 
>>       b. If it's big, spool to a seekable tempfile.


* Re: [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path'
  2014-08-24 16:07 ` [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path' Steffen Prohaska
@ 2014-08-25 22:55   ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2014-08-25 22:55 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: git, peff, pclouds, john, schacon

Steffen Prohaska <prohaska@zib.de> writes:

> It is only the path that matters in the decision whether to filter or
> not.  Clarify this by making path the single argument of
> would_convert_to_git().
>
> Signed-off-by: Steffen Prohaska <prohaska@zib.de>
> ---

I've retitled this as:

    convert: drop arguments other than 'path' from would_convert_to_git()

to match the output from "git shortlog --since=3.months --no-merges"
by using lowercase 'd' after the "convert: " area name, and also
more importantly avoid calling "refactor" which this change is not.

Thanks.

>  convert.h   | 5 ++---
>  sha1_file.c | 2 +-
>  2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/convert.h b/convert.h
> index 0c2143c..c638b33 100644
> --- a/convert.h
> +++ b/convert.h
> @@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const char *src,
>  				   size_t len, struct strbuf *dst);
>  extern int renormalize_buffer(const char *path, const char *src, size_t len,
>  			      struct strbuf *dst);
> -static inline int would_convert_to_git(const char *path, const char *src,
> -				       size_t len, enum safe_crlf checksafe)
> +static inline int would_convert_to_git(const char *path)
>  {
> -	return convert_to_git(path, src, len, NULL, checksafe);
> +	return convert_to_git(path, NULL, 0, NULL, 0);
>  }
>  
>  /*****************************************************************
> diff --git a/sha1_file.c b/sha1_file.c
> index 3f70b1d..00c07f2 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
>  	if (!S_ISREG(st->st_mode))
>  		ret = index_pipe(sha1, fd, type, path, flags);
>  	else if (size <= big_file_threshold || type != OBJ_BLOB ||
> -		 (path && would_convert_to_git(path, NULL, 0, 0)))
> +		 (path && would_convert_to_git(path)))
>  		ret = index_core(sha1, fd, size, type, path, flags);
>  	else
>  		ret = index_stream(sha1, fd, size, type, path, flags);


* Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
  2014-08-25 16:55     ` Steffen Prohaska
  2014-08-25 18:35       ` Junio C Hamano
@ 2014-08-26 17:54       ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2014-08-26 17:54 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Junio C Hamano, Git Mailing List, pclouds, john, schacon

On Mon, Aug 25, 2014 at 06:55:51PM +0200, Steffen Prohaska wrote:

> It could be handled that way, but we would be back to the original problem
> that 32-bit git fails for large files.  The convert code path currently
> assumes that all data is available in a single buffer at some point to apply
> crlf and ident filters.
> 
> If the initial filter, which is assumed to reduce the file size, fails, we
> could seek to 0 and read the entire file.  But git would then fail for large
> files with out-of-memory.  We would not gain anything for the use case that
> I describe in the commit message's first paragraph.

Ah. So the real problem is that we cannot handle _other_ conversions for
large files, and we must try to intercept the data before it gets to
them. So this is really just helping "reduction" filters. Even if our
streaming filter succeeds, it does not help the situation if it did not
reduce the large file to a smaller one.

It would be nice in the long run to let the other filters stream, too,
but that is not a problem we need to solve immediately. Your patch is a
strict improvement.

Thanks for the explanation; your approach makes a lot more sense to me
now.

-Peff


* Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
  2014-08-25 18:35       ` Junio C Hamano
@ 2014-08-26 18:00         ` Jeff King
  2014-08-26 19:32           ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2014-08-26 18:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Steffen Prohaska, Git Mailing List, pclouds, john, schacon

On Mon, Aug 25, 2014 at 11:35:45AM -0700, Junio C Hamano wrote:

> Steffen Prohaska <prohaska@zib.de> writes:
> 
> >> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
> >> obviously would not work for non-file inputs, but I think we address
> >> that already in index_fd: we push non-seekable things off to index_pipe,
> >> where we spool them to memory.
> >
> > It could be handled that way, but we would be back to the original problem
> > that 32-bit git fails for large files.
> 
> Correct, and you are making an incremental improvement so that such
> a large blob can be handled _when_ the filters can successfully
> munge it back and forth.  If we fail due to out of memory when the
> filters cannot, that would be the same as without your improvement,
> so you are still making progress.

I do not think my proposal makes anything worse than Steffen's patch.
_If_ you have a non-required filter, and _if_ we can run it, then we
stream the filter and hopefully end up with a small enough result to fit
into memory. If we cannot run the filter, we are screwed anyway (we
follow the regular code path and dump the whole thing into memory; i.e.,
the same as without this patch series).

I think the main argument against going further is just that it is not
worth the complexity. Tell people doing reduction filters they need to
use "required", and that accomplishes the same thing.

> >> So it seems like the ideal strategy would be:
> >> 
> >>  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
> >> 
> >>  2. If it's not seekable and the filter is required, try streaming. We
> >>     die anyway if we fail.
> 
> Puzzled...  Is it assumed that any content for which the filters tell
> us to use the contents from the db as-is (by exiting with non-zero
> status) will always be too large to fit in-core?  For small contents,
> isn't this "ideal" strategy a regression?

I am not sure what you mean by regression here. We will try to stream
more often, but I do not see that as a bad thing.

-Peff


* Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
  2014-08-26 18:00         ` Jeff King
@ 2014-08-26 19:32           ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2014-08-26 19:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Steffen Prohaska, Git Mailing List, pclouds, john, schacon

Jeff King <peff@peff.net> writes:

> On Mon, Aug 25, 2014 at 11:35:45AM -0700, Junio C Hamano wrote:
>
>> Steffen Prohaska <prohaska@zib.de> writes:
>> 
>> >> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
>> >> obviously would not work for non-file inputs, but I think we address
>> >> that already in index_fd: we push non-seekable things off to index_pipe,
>> >> where we spool them to memory.
>> >
>> > It could be handled that way, but we would be back to the original problem
>> > that 32-bit git fails for large files.
>> 
>> Correct, and you are making an incremental improvement so that such
>> a large blob can be handled _when_ the filters can successfully
>> munge it back and forth.  If we fail due to out of memory when the
>> filters cannot, that would be the same as without your improvement,
>> so you are still making progress.
>
> I do not think my proposal makes anything worse than Steffen's patch.

I think we are saying the same thing, but perhaps I didn't phrase it
well.

> I think the main argument against going further is just that it is not
> worth the complexity. Tell people doing reduction filters they need to
> use "required", and that accomplishes the same thing.
>
>> >> So it seems like the ideal strategy would be:
>> >> 
>> >>  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
>> >> 
>> >>  2. If it's not seekable and the filter is required, try streaming. We
>> >>     die anyway if we fail.
>> 
>> Puzzled...  Is it assumed that any content for which the filters tell
>> us to use the contents from the db as-is (by exiting with non-zero
>> status) will always be too large to fit in-core?  For small contents,
>> isn't this "ideal" strategy a regression?
>
> I am not sure what you mean by regression here. We will try to stream
> more often, but I do not see that as a bad thing.

I thought the proposed flow I was commenting on was

    - try streaming and die if the filter fails

For an optional filter working on contents that would fit in core,
we currently do

    - slurp in memory, filter it, use the original if the filter fails

If we switched to 2., then... ahh, ok, I misread the "is required" part.
The "regression" does not apply to that case at all.


end of thread, other threads:[~2014-08-26 19:32 UTC | newest]

Thread overview: 15+ messages
2014-08-24 16:07 [PATCH v5 0/4] Stream fd to clean filter; GIT_MMAP_LIMIT, GIT_ALLOC_LIMIT with git_parse_ulong() Steffen Prohaska
2014-08-24 16:07 ` [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path' Steffen Prohaska
2014-08-25 22:55   ` Junio C Hamano
2014-08-24 16:07 ` [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong() Steffen Prohaska
2014-08-25 11:38   ` Jeff King
2014-08-25 15:06     ` Steffen Prohaska
2014-08-25 15:12       ` Jeff King
2014-08-24 16:07 ` [PATCH v5 3/4] Introduce GIT_MMAP_LIMIT to allow testing expected mmap size Steffen Prohaska
2014-08-24 16:07 ` [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap Steffen Prohaska
2014-08-25 12:43   ` Jeff King
2014-08-25 16:55     ` Steffen Prohaska
2014-08-25 18:35       ` Junio C Hamano
2014-08-26 18:00         ` Jeff King
2014-08-26 19:32           ` Junio C Hamano
2014-08-26 17:54       ` Jeff King
