All of lore.kernel.org
 help / color / mirror / Atom feed
* Git chokes on large file
@ 2014-05-27 16:47 Dale R. Worley
  2014-05-28 13:32 ` Duy Nguyen
  2014-05-28 15:05 ` Git chokes on large file Thomas Braun
  0 siblings, 2 replies; 49+ messages in thread
From: Dale R. Worley @ 2014-05-27 16:47 UTC (permalink / raw)
  To: git

I've discovered a problem using Git.  It's not clear to me what the
"correct" behavior should be, but it seems to me that Git is failing
in an undesirable way.

The problem arises when trying to handle a very large file.  For
example:

    $ git --version
    git version 1.8.3.1
    $ mkdir $$
    $ cd $$
    $ git init
    Initialized empty Git repository in /common/not-replicated/worley/temp/5627/.git/
    $ truncate --size=20G big_file
    $ ls -l
    total 0
    -rw-rw-r--. 1 worley worley 21474836480 May 27 11:59 big_file
    $ time git add big_file

    real	4m48.752s
    user	4m31.295s
    sys	0m16.747s
    $

At this point, either 'git fsck' or 'git commit' fails:

    $ git fsck --full --strict
    notice: HEAD points to an unborn branch (master)
    Checking object directories: 100% (256/256), done.
    fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

    $ git commit -m Test.
    [master (root-commit) 3df3655] Test.
    fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

The central problem is that one can accidentally add a file that
leaves the repository in a "broken" state, where various normal
commands simply don't work.  The most worrying aspect is that "git
fsck" fails -- of all the commands, the one that verifies the validity
of the repository (and diagnoses errors) should be the most robust!

Even doing a 'git reset' does not put the repository in a state where
'git fsck' will complete:

    $ git reset
    $ git fsck --full --strict
    notice: HEAD points to an unborn branch (master)
    Checking object directories: 100% (256/256), done.
    fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

Dale

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-27 16:47 Git chokes on large file Dale R. Worley
@ 2014-05-28 13:32 ` Duy Nguyen
  2014-05-28 17:10   ` Junio C Hamano
                     ` (2 more replies)
  2014-05-28 15:05 ` Git chokes on large file Thomas Braun
  1 sibling, 3 replies; 49+ messages in thread
From: Duy Nguyen @ 2014-05-28 13:32 UTC (permalink / raw)
  To: Dale R. Worley; +Cc: Git Mailing List

On Tue, May 27, 2014 at 11:47 PM, Dale R. Worley <worley@alum.mit.edu> wrote:
> I've discovered a problem using Git.  It's not clear to me what the
> "correct" behavior should be, but it seems to me that Git is failing
> in an undesirable way.
>
> The problem arises when trying to handle a very large file.  For
> example:
>
>     $ git --version
>     git version 1.8.3.1
>     $ mkdir $$
>     $ cd $$
>     $ git init
>     Initialized empty Git repository in /common/not-replicated/worley/temp/5627/.git/
>     $ truncate --size=20G big_file
>     $ ls -l
>     total 0
>     -rw-rw-r--. 1 worley worley 21474836480 May 27 11:59 big_file
>     $ time git add big_file
>
>     real        4m48.752s
>     user        4m31.295s
>     sys 0m16.747s
>     $
>
> At this point, either 'git fsck' or 'git commit' fails:
>
>     $ git fsck --full --strict
>     notice: HEAD points to an unborn branch (master)
>     Checking object directories: 100% (256/256), done.
>     fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

Back trace for this one

#3  0x000000000055cf39 in xmalloc (size=21474836481) at wrapper.c:49
#4  0x000000000055cffd in xmallocz (size=21474836480) at wrapper.c:73
#5  0x0000000000537858 in unpack_compressed_entry (p=0x858ac0,
w_curs=0x7fffffffc0f8, curpos=18, size=21474836480) at
sha1_file.c:1924
#6  0x0000000000538364 in unpack_entry (p=0x858ac0, obj_offset=12,
final_type=0x7fffffffc1e4, final_size=0x7fffffffc1d8) at
sha1_file.c:2206
#7  0x00000000004fb0a2 in verify_packfile (p=0x858ac0,
w_curs=0x7fffffffc320, fn=0x43f5f2 <fsck_obj_buffer>,
progress=0x858a90, base_count=0) at pack-check.c:119
#8  0x00000000004fb3f4 in verify_pack (p=0x858ac0, fn=0x43f5f2
<fsck_obj_buffer>, progress=0x858a90, base_count=0) at
pack-check.c:177
#9  0x00000000004401d7 in cmd_fsck (argc=0, argv=0x7fffffffd650,
prefix=0x0) at builtin/fsck.c:677

Not easy to fix. I started working on converting fsck to use
index-pack code for pack verification. index-pack supports large files
well, so in the end it might fix this (as well as speeding up fsck).
But that work has stalled for a long time.

>
>     $ git commit -m Test.
>     [master (root-commit) 3df3655] Test.
>     fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

And back trace

#11 0x00000000004b9da0 in read_sha1_file (sha1=0x8558a0
"\256/s\324\370\304\344\212\304I\v\342\334MS\002\352\214\061\222",
type=0x7fffffffc6c4, size=0x8558d0) at cache.h:820
#12 0x00000000004c1b98 in diff_populate_filespec (s=0x8558a0,
size_only=0) at diff.c:2749
#13 0x00000000004c0110 in diff_filespec_is_binary (one=0x8558a0) at diff.c:2188
#14 0x00000000004c0f0b in builtin_diffstat (name_a=0x858530
"big_file", name_b=0x0, one=0x8584e0, two=0x8558a0,
diffstat=0x7fffffffc8a0, o=0x7fffffffce88, p=0x855910) at diff.c:2435
#15 0x00000000004c2fd4 in run_diffstat (p=0x855910, o=0x7fffffffce88,
diffstat=0x7fffffffc8a0) at diff.c:3168
#16 0x00000000004c603a in diff_flush_stat (p=0x855910,
o=0x7fffffffce88, diffstat=0x7fffffffc8a0) at diff.c:4081
#17 0x00000000004c70e4 in diff_flush (options=0x7fffffffce88) at diff.c:4520
#18 0x00000000004e5d59 in log_tree_diff_flush (opt=0x7fffffffcaf0) at
log-tree.c:715
#19 0x00000000004e5e5a in log_tree_diff (opt=0x7fffffffcaf0,
commit=0x8585b0, log=0x7fffffffc9a0) at log-tree.c:747
#20 0x00000000004e60b1 in log_tree_commit (opt=0x7fffffffcaf0,
commit=0x8585b0) at log-tree.c:810
#21 0x000000000042c45c in print_summary (prefix=0x0,
sha1=0x7fffffffd300 ".&Gȑ\360\243\202\351&!\035\312q\374\345\314LL)",
initial_commit=1) at builtin/commit.c:1426
#22 0x000000000042d213 in cmd_commit (argc=0, argv=0x7fffffffd650,
prefix=0x0) at builtin/commit.c:1750

If we could have an option in read_sha1_file to read max to <n> bytes
(enough for binary detection purpose), it would fix this. Another
option is declare all files larger than core.bigfilethreshold binary.
Easier in both senses of implementation cost and looseness.

> Even doing a 'git reset' does not put the repository in a state where
> 'git fsck' will complete:
>
>     $ git reset
>     $ git fsck --full --strict
>     notice: HEAD points to an unborn branch (master)
>     Checking object directories: 100% (256/256), done.
>     fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

I don't know how many commands are hit by this. If you have time and
gdb, please put a break point in die_builtin() function and send
backtraces for those that fail. You could speed up the process by
creating a smaller file and set the environment variable
GIT_ALLOC_LIMIT (in kilobytes) to a number lower than that size. If
git attempts to allocate a block larger than that limit it'll die.
-- 
Duy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-27 16:47 Git chokes on large file Dale R. Worley
  2014-05-28 13:32 ` Duy Nguyen
@ 2014-05-28 15:05 ` Thomas Braun
  1 sibling, 0 replies; 49+ messages in thread
From: Thomas Braun @ 2014-05-28 15:05 UTC (permalink / raw)
  To: Dale R. Worley, git

Am 27.05.2014 18:47, schrieb Dale R. Worley:
> Even doing a 'git reset' does not put the repository in a state where
> 'git fsck' will complete:

You have to remove the offending commit also from the reflog.

The following snipped creates an offending commit, big_file is 2GB which
is too large for git on windows, and later removes it completely so that
git fsck passes again.

-----------------------------------
git init
echo 1 > some_file
git add some_file
git commit -m "add some_file"
git add big_file
git commit -m "add big_file" # reports malloc error without the -q flag
git log --stat # malloc error
git reset HEAD^1
git fsck --full --strict --verbose # fails
git reflog expire --expire=now --all # remove all reflog entries
git gc --prune=now
git fsck --full --strict --verbose # passes
-----------------------------------

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-28 13:32 ` Duy Nguyen
@ 2014-05-28 17:10   ` Junio C Hamano
  2014-05-28 18:18     ` Dale R. Worley
  2014-05-28 18:15   ` Dale R. Worley
  2014-05-29 12:57   ` [PATCH 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2014-05-28 17:10 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Dale R. Worley, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

>>     $ git fsck --full --strict
>>     notice: HEAD points to an unborn branch (master)
>>     Checking object directories: 100% (256/256), done.
>>     fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)
>
> Back trace for this one
> ...
> Not easy to fix. I started working on converting fsck to use
> index-pack code for pack verification. index-pack supports large files
> well, so in the end it might fix this (as well as speeding up fsck).
> But that work has stalled for a long time.

You need to have enough memory (virtual is fine if you have enough
time) to do fsck.  Some part of index-pack could be refactored into
a common helper function that could be called from fsck, but I think
it would be a lot of work.

>>     $ git commit -m Test.
>>     [master (root-commit) 3df3655] Test.
>>     fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

I suspect that this one is only because you are letting the status
which involves diff to kick in (and for that you would need to have
enough memory).  As you suggested, it might be a good idea to take
bigfilethreshold account when deciding if we would want to run diff.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-28 13:32 ` Duy Nguyen
  2014-05-28 17:10   ` Junio C Hamano
@ 2014-05-28 18:15   ` Dale R. Worley
  2014-05-28 18:23     ` David Lang
  2014-05-29 12:57   ` [PATCH 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 49+ messages in thread
From: Dale R. Worley @ 2014-05-28 18:15 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git

> From: Duy Nguyen <pclouds@gmail.com>

> I don't know how many commands are hit by this. If you have time and
> gdb, please put a break point in die_builtin() function and send
> backtraces for those that fail. You could speed up the process by
> creating a smaller file and set the environment variable
> GIT_ALLOC_LIMIT (in kilobytes) to a number lower than that size. If
> git attempts to allocate a block larger than that limit it'll die.

I don't use Git enough to exercise it well.  And there are dozens of
commands with hundreds of options.

As someone else has noted, if I run 'git commit -q --no-status', it
doesn't crash.

It seems that much of Git was coded under the assumption that any file
could always be held entirely in RAM.  Who made that mistake?  Are
people so out of touch with reality?

Dale

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-28 17:10   ` Junio C Hamano
@ 2014-05-28 18:18     ` Dale R. Worley
  0 siblings, 0 replies; 49+ messages in thread
From: Dale R. Worley @ 2014-05-28 18:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: pclouds, git

> From: Junio C Hamano <gitster@pobox.com>

> You need to have enough memory (virtual is fine if you have enough
> time) to do fsck.  Some part of index-pack could be refactored into
> a common helper function that could be called from fsck, but I think
> it would be a lot of work.

How much memory is "enough"?  And how is it that fsck is coded so that
the available RAM is a limit on which repositories can be checked?
Did someone forget that RAM may be considerably smaller than the
amount of data a program may have to deal with?

Dale

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-28 18:15   ` Dale R. Worley
@ 2014-05-28 18:23     ` David Lang
  2014-05-28 18:47       ` Dale R. Worley
  2014-05-28 18:54       ` Junio C Hamano
  0 siblings, 2 replies; 49+ messages in thread
From: David Lang @ 2014-05-28 18:23 UTC (permalink / raw)
  To: Dale R. Worley; +Cc: Duy Nguyen, git

On Wed, 28 May 2014, Dale R. Worley wrote:

>> From: Duy Nguyen <pclouds@gmail.com>
>
>> I don't know how many commands are hit by this. If you have time and
>> gdb, please put a break point in die_builtin() function and send
>> backtraces for those that fail. You could speed up the process by
>> creating a smaller file and set the environment variable
>> GIT_ALLOC_LIMIT (in kilobytes) to a number lower than that size. If
>> git attempts to allocate a block larger than that limit it'll die.
>
> I don't use Git enough to exercise it well.  And there are dozens of
> commands with hundreds of options.
>
> As someone else has noted, if I run 'git commit -q --no-status', it
> doesn't crash.
>
> It seems that much of Git was coded under the assumption that any file
> could always be held entirely in RAM.  Who made that mistake?  Are
> people so out of touch with reality?

Git was designed to track source code, there are warts that show up in the 
implementation when you use individual files >4GB

such files tend to also not diff well. git-annex and other offshoots hae methods 
boled on that handle such large files better than core git does.

David Lang

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-28 18:23     ` David Lang
@ 2014-05-28 18:47       ` Dale R. Worley
  2014-05-28 19:05         ` David Lang
  2014-05-28 18:54       ` Junio C Hamano
  1 sibling, 1 reply; 49+ messages in thread
From: Dale R. Worley @ 2014-05-28 18:47 UTC (permalink / raw)
  To: David Lang; +Cc: pclouds, git

> From: David Lang <david@lang.hm>

> Git was designed to track source code, there are warts that show up
> in the implementation when you use individual files >4GB

I'd expect that if you want to deal with files over 100k, you should
assume that it doesn't all fit in memory.

Dale

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-28 18:23     ` David Lang
  2014-05-28 18:47       ` Dale R. Worley
@ 2014-05-28 18:54       ` Junio C Hamano
  2014-05-28 19:09         ` David Lang
  1 sibling, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2014-05-28 18:54 UTC (permalink / raw)
  To: David Lang; +Cc: Dale R. Worley, Duy Nguyen, git

David Lang <david@lang.hm> writes:

> On Wed, 28 May 2014, Dale R. Worley wrote:
>
>> It seems that much of Git was coded under the assumption that any file
>> could always be held entirely in RAM.  Who made that mistake?  Are
>> people so out of touch with reality?
>
> Git was designed to track source code, there are warts that show up in
> the implementation when you use individual files >4GB
>
> such files tend to also not diff well. git-annex and other offshoots
> hae methods boled on that handle such large files better than core git
> does.

Very well explained, but perhaps you went a bit too far, I am
afraid.

The fact that our primary focus being the source code does not at
all mean that we are not interested to enhance the system to also
cater to those who want to put materials that are traditionally
considered non-source to it, now that we have become fairly good at
doing the source code.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-28 18:47       ` Dale R. Worley
@ 2014-05-28 19:05         ` David Lang
  2014-05-29 19:12           ` Dale R. Worley
  0 siblings, 1 reply; 49+ messages in thread
From: David Lang @ 2014-05-28 19:05 UTC (permalink / raw)
  To: Dale R. Worley; +Cc: pclouds, git

On Wed, 28 May 2014, Dale R. Worley wrote:

>> From: David Lang <david@lang.hm>
>
>> Git was designed to track source code, there are warts that show up
>> in the implementation when you use individual files >4GB
>
> I'd expect that if you want to deal with files over 100k, you should
> assume that it doesn't all fit in memory.

well, as others noted, the problem is actually caused by doing the diffs, and 
that is something that is a very common thing to do with source code.

And I would assume that files of several MB would be able to fit in memory 
(again, this was assumed to be for development, and compilers take a lot of ram 
to run, so having enough ram to hold any individual source file while the 
compiler is _not_ using ram doesn't seem likely to be a problem)

David Lang

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-28 18:54       ` Junio C Hamano
@ 2014-05-28 19:09         ` David Lang
  0 siblings, 0 replies; 49+ messages in thread
From: David Lang @ 2014-05-28 19:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Dale R. Worley, Duy Nguyen, git

On Wed, 28 May 2014, Junio C Hamano wrote:

> David Lang <david@lang.hm> writes:
>
>> On Wed, 28 May 2014, Dale R. Worley wrote:
>>
>>> It seems that much of Git was coded under the assumption that any file
>>> could always be held entirely in RAM.  Who made that mistake?  Are
>>> people so out of touch with reality?
>>
>> Git was designed to track source code, there are warts that show up in
>> the implementation when you use individual files >4GB
>>
>> such files tend to also not diff well. git-annex and other offshoots
>> hae methods boled on that handle such large files better than core git
>> does.
>
> Very well explained, but perhaps you went a bit too far, I am
> afraid.
>
> The fact that our primary focus being the source code does not at
> all mean that we are not interested to enhance the system to also
> cater to those who want to put materials that are traditionally
> considered non-source to it, now that we have become fairly good at
> doing the source code.

Correct, I didn't mean to imply that git is only for source files, just noting 
it's origional purpose.

now that there are multiple different add-ons for git to handle large files in 
different ways, I'm watching to see what can get folded back into the core.

David Lang

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 1/4] wrapper.c: introduce gentle xmallocz that does not die()
  2014-05-28 13:32 ` Duy Nguyen
  2014-05-28 17:10   ` Junio C Hamano
  2014-05-28 18:15   ` Dale R. Worley
@ 2014-05-29 12:57   ` Nguyễn Thái Ngọc Duy
  2014-05-29 12:57     ` [PATCH 2/4] fsck: do not die when not enough memory to examine a pack entry Nguyễn Thái Ngọc Duy
                       ` (3 more replies)
  2 siblings, 4 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-29 12:57 UTC (permalink / raw)
  To: git; +Cc: worley, Junio C Hamano, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 git-compat-util.h |  1 +
 wrapper.c         | 68 ++++++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index f6d3a46..f23e4e4 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -524,6 +524,7 @@ extern try_to_free_t set_try_to_free_routine(try_to_free_t);
 extern char *xstrdup(const char *str);
 extern void *xmalloc(size_t size);
 extern void *xmallocz(size_t size);
+extern void *xmallocz_gentle(size_t size);
 extern void *xmemdupz(const void *data, size_t len);
 extern char *xstrndup(const char *str, size_t len);
 extern void *xrealloc(void *ptr, size_t size);
diff --git a/wrapper.c b/wrapper.c
index 0cc5636..7ab9a98 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -9,16 +9,23 @@ static void do_nothing(size_t size)
 
 static void (*try_to_free_routine)(size_t size) = do_nothing;
 
-static void memory_limit_check(size_t size)
+static int memory_limit_check(size_t size, int gentle)
 {
 	static int limit = -1;
 	if (limit == -1) {
 		const char *env = getenv("GIT_ALLOC_LIMIT");
 		limit = env ? atoi(env) * 1024 : 0;
 	}
-	if (limit && size > limit)
-		die("attempting to allocate %"PRIuMAX" over limit %d",
-		    (intmax_t)size, limit);
+	if (limit && size > limit) {
+		if (gentle) {
+			error("attempting to allocate %"PRIuMAX" over limit %d",
+			      (intmax_t)size, limit);
+			return -1;
+		} else
+			die("attempting to allocate %"PRIuMAX" over limit %d",
+			    (intmax_t)size, limit);
+	}
+	return 0;
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
@@ -42,11 +49,12 @@ char *xstrdup(const char *str)
 	return ret;
 }
 
-void *xmalloc(size_t size)
+static void *do_xmalloc(size_t size, int gentle)
 {
 	void *ret;
 
-	memory_limit_check(size);
+	if (memory_limit_check(size, gentle))
+		return NULL;
 	ret = malloc(size);
 	if (!ret && !size)
 		ret = malloc(1);
@@ -55,9 +63,16 @@ void *xmalloc(size_t size)
 		ret = malloc(size);
 		if (!ret && !size)
 			ret = malloc(1);
-		if (!ret)
-			die("Out of memory, malloc failed (tried to allocate %lu bytes)",
-			    (unsigned long)size);
+		if (!ret) {
+			if (!gentle)
+				die("Out of memory, malloc failed (tried to allocate %lu bytes)",
+				    (unsigned long)size);
+			else {
+				error("Out of memory, malloc failed (tried to allocate %lu bytes)",
+				      (unsigned long)size);
+				return NULL;
+			}
+		}
 	}
 #ifdef XMALLOC_POISON
 	memset(ret, 0xA5, size);
@@ -65,16 +80,37 @@ void *xmalloc(size_t size)
 	return ret;
 }
 
-void *xmallocz(size_t size)
+void *xmalloc(size_t size)
+{
+	return do_xmalloc(size, 0);
+}
+
+static void *do_xmallocz(size_t size, int gentle)
 {
 	void *ret;
-	if (unsigned_add_overflows(size, 1))
-		die("Data too large to fit into virtual memory space.");
-	ret = xmalloc(size + 1);
-	((char*)ret)[size] = 0;
+	if (unsigned_add_overflows(size, 1)) {
+		if (gentle) {
+			error("Data too large to fit into virtual memory space.");
+			return NULL;
+		} else
+			die("Data too large to fit into virtual memory space.");
+	}
+	ret = do_xmalloc(size + 1, gentle);
+	if (ret)
+		((char*)ret)[size] = 0;
 	return ret;
 }
 
+void *xmallocz(size_t size)
+{
+	return do_xmallocz(size, 0);
+}
+
+void *xmallocz_gentle(size_t size)
+{
+	return do_xmallocz(size, 1);
+}
+
 /*
  * xmemdupz() allocates (len + 1) bytes of memory, duplicates "len" bytes of
  * "data" to the allocated memory, zero terminates the allocated memory,
@@ -96,7 +132,7 @@ void *xrealloc(void *ptr, size_t size)
 {
 	void *ret;
 
-	memory_limit_check(size);
+	memory_limit_check(size, 0);
 	ret = realloc(ptr, size);
 	if (!ret && !size)
 		ret = realloc(ptr, 1);
@@ -115,7 +151,7 @@ void *xcalloc(size_t nmemb, size_t size)
 {
 	void *ret;
 
-	memory_limit_check(size * nmemb);
+	memory_limit_check(size * nmemb, 0);
 	ret = calloc(nmemb, size);
 	if (!ret && (!nmemb || !size))
 		ret = calloc(1, 1);
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 2/4] fsck: do not die when not enough memory to examine a pack entry
  2014-05-29 12:57   ` [PATCH 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
@ 2014-05-29 12:57     ` Nguyễn Thái Ngọc Duy
  2014-05-29 12:57     ` [PATCH 3/4] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-29 12:57 UTC (permalink / raw)
  To: git; +Cc: worley, Junio C Hamano, Nguyễn Thái Ngọc Duy

fsck is a tool that error() is more preferred than die(), but many
functions embed die() inside beyond fsck's control.
unpack_compressed_entry()'s using xmallocz is such a function,
triggered from verify_packfile() -> unpack_entry(). Make it use
xmallocz_gentle() instead.

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 sha1_file.c      | 4 +++-
 t/t1050-large.sh | 5 +++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 3e9f55f..8ad906a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1921,7 +1921,9 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	git_zstream stream;
 	unsigned char *buffer, *in;
 
-	buffer = xmallocz(size);
+	buffer = xmallocz_gentle(size);
+	if (!buffer)
+		return NULL;
 	memset(&stream, 0, sizeof(stream));
 	stream.next_out = buffer;
 	stream.avail_out = size + 1;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index fd10528..333909b 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -163,4 +163,9 @@ test_expect_success 'zip achiving, deflate' '
 	git archive --format=zip HEAD >/dev/null
 '
 
+test_expect_success 'fsck' '
+	test_must_fail git fsck 2>err &&
+	grep "attempting to allocate .* over limit" err
+'
+
 test_done
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 3/4] diff.c: allow to pass more flags to diff_populate_filespec
  2014-05-29 12:57   ` [PATCH 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
  2014-05-29 12:57     ` [PATCH 2/4] fsck: do not die when not enough memory to examine a pack entry Nguyễn Thái Ngọc Duy
@ 2014-05-29 12:57     ` Nguyễn Thái Ngọc Duy
  2014-05-29 12:57     ` [PATCH 4/4] diff: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
  2014-06-24 11:45     ` [PATCH v2 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
  3 siblings, 0 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-29 12:57 UTC (permalink / raw)
  To: git; +Cc: worley, Junio C Hamano, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff.c            | 13 +++++++------
 diffcore-rename.c |  6 ++++--
 diffcore.h        |  3 ++-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/diff.c b/diff.c
index f72769a..54281cb 100644
--- a/diff.c
+++ b/diff.c
@@ -373,7 +373,7 @@ static unsigned long diff_filespec_size(struct diff_filespec *one)
 {
 	if (!DIFF_FILE_VALID(one))
 		return 0;
-	diff_populate_filespec(one, 1);
+	diff_populate_filespec(one, DIFF_POPULATE_SIZE_ONLY);
 	return one->size;
 }
 
@@ -1907,11 +1907,11 @@ static void show_dirstat(struct diff_options *options)
 			diff_free_filespec_data(p->one);
 			diff_free_filespec_data(p->two);
 		} else if (DIFF_FILE_VALID(p->one)) {
-			diff_populate_filespec(p->one, 1);
+			diff_populate_filespec(p->one, DIFF_POPULATE_SIZE_ONLY);
 			copied = added = 0;
 			diff_free_filespec_data(p->one);
 		} else if (DIFF_FILE_VALID(p->two)) {
-			diff_populate_filespec(p->two, 1);
+			diff_populate_filespec(p->two, DIFF_POPULATE_SIZE_ONLY);
 			copied = 0;
 			added = p->two->size;
 			diff_free_filespec_data(p->two);
@@ -2664,8 +2664,9 @@ static int diff_populate_gitlink(struct diff_filespec *s, int size_only)
  * grab the data for the blob (or file) for our own in-core comparison.
  * diff_filespec has data and size fields for this purpose.
  */
-int diff_populate_filespec(struct diff_filespec *s, int size_only)
+int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 {
+	int size_only = flags & DIFF_POPULATE_SIZE_ONLY;
 	int err = 0;
 	/*
 	 * demote FAIL to WARN to allow inspecting the situation
@@ -4695,8 +4696,8 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	    !DIFF_FILE_VALID(p->two) ||
 	    (p->one->sha1_valid && p->two->sha1_valid) ||
 	    (p->one->mode != p->two->mode) ||
-	    diff_populate_filespec(p->one, 1) ||
-	    diff_populate_filespec(p->two, 1) ||
+	    diff_populate_filespec(p->one, DIFF_POPULATE_SIZE_ONLY) ||
+	    diff_populate_filespec(p->two, DIFF_POPULATE_SIZE_ONLY) ||
 	    (p->one->size != p->two->size) ||
 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
 		p->skip_stat_unmatch_result = 1;
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 749a35d..8437917 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -147,9 +147,11 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * is a possible size - we really should have a flag to
 	 * say whether the size is valid or not!)
 	 */
-	if (!src->cnt_data && diff_populate_filespec(src, 1))
+	if (!src->cnt_data &&
+	    diff_populate_filespec(src, DIFF_POPULATE_SIZE_ONLY))
 		return 0;
-	if (!dst->cnt_data && diff_populate_filespec(dst, 1))
+	if (!dst->cnt_data &&
+	    diff_populate_filespec(dst, DIFF_POPULATE_SIZE_ONLY))
 		return 0;
 
 	max_size = ((src->size > dst->size) ? src->size : dst->size);
diff --git a/diffcore.h b/diffcore.h
index c876dac..a186d7c 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -55,7 +55,8 @@ extern void free_filespec(struct diff_filespec *);
 extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 			  int, unsigned short);
 
-extern int diff_populate_filespec(struct diff_filespec *, int);
+#define DIFF_POPULATE_SIZE_ONLY 1
+extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
 extern int diff_filespec_is_binary(struct diff_filespec *);
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 4/4] diff: mark any file larger than core.bigfilethreshold binary
  2014-05-29 12:57   ` [PATCH 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
  2014-05-29 12:57     ` [PATCH 2/4] fsck: do not die when not enough memory to examine a pack entry Nguyễn Thái Ngọc Duy
  2014-05-29 12:57     ` [PATCH 3/4] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
@ 2014-05-29 12:57     ` Nguyễn Thái Ngọc Duy
  2014-06-19 12:27       ` Thomas Braun
  2014-06-24 11:45     ` [PATCH v2 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
  3 siblings, 1 reply; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-05-29 12:57 UTC (permalink / raw)
  To: git; +Cc: worley, Junio C Hamano, Nguyễn Thái Ngọc Duy

Too large files may lead to failure to allocate memory. If it happens
here, it could impact quite a few commands that involve
diff. Moreover, too large files are inefficient to compare anyway (and
most likely non-text), so mark them binary and skip looking at their
content.

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff.c           | 26 ++++++++++++++++++--------
 diffcore.h       |  1 +
 t/t1050-large.sh |  4 ++++
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 54281cb..0a2f865 100644
--- a/diff.c
+++ b/diff.c
@@ -2185,8 +2185,8 @@ int diff_filespec_is_binary(struct diff_filespec *one)
 			one->is_binary = one->driver->binary;
 		else {
 			if (!one->data && DIFF_FILE_VALID(one))
-				diff_populate_filespec(one, 0);
-			if (one->data)
+				diff_populate_filespec(one, DIFF_POPULATE_IS_BINARY);
+			if (one->is_binary == -1 && one->data)
 				one->is_binary = buffer_is_binary(one->data,
 						one->size);
 			if (one->is_binary == -1)
@@ -2721,6 +2721,11 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 		}
 		if (size_only)
 			return 0;
+		if ((flags & DIFF_POPULATE_IS_BINARY) &&
+		    s->size > big_file_threshold && s->is_binary == -1) {
+			s->is_binary = 1;
+			return 0;
+		}
 		fd = open(s->path, O_RDONLY);
 		if (fd < 0)
 			goto err_empty;
@@ -2742,16 +2747,21 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 	}
 	else {
 		enum object_type type;
-		if (size_only) {
+		if (size_only || (flags & DIFF_POPULATE_IS_BINARY)) {
 			type = sha1_object_info(s->sha1, &s->size);
 			if (type < 0)
 				die("unable to read %s", sha1_to_hex(s->sha1));
-		} else {
-			s->data = read_sha1_file(s->sha1, &type, &s->size);
-			if (!s->data)
-				die("unable to read %s", sha1_to_hex(s->sha1));
-			s->should_free = 1;
+			if (size_only)
+				return 0;
+			if (s->size > big_file_threshold && s->is_binary == -1) {
+				s->is_binary = 1;
+				return 0;
+			}
 		}
+		s->data = read_sha1_file(s->sha1, &type, &s->size);
+		if (!s->data)
+			die("unable to read %s", sha1_to_hex(s->sha1));
+		s->should_free = 1;
 	}
 	return 0;
 }
diff --git a/diffcore.h b/diffcore.h
index a186d7c..e7760d9 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -56,6 +56,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 			  int, unsigned short);
 
 #define DIFF_POPULATE_SIZE_ONLY 1
+#define DIFF_POPULATE_IS_BINARY 2
 extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 333909b..4d922e2 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -112,6 +112,10 @@ test_expect_success 'diff --raw' '
 	git diff --raw HEAD^
 '
 
+test_expect_success 'diff --stat' '
+	git diff --stat HEAD^ HEAD
+'
+
 test_expect_success 'hash-object' '
 	git hash-object large1
 '
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: Git chokes on large file
  2014-05-28 19:05         ` David Lang
@ 2014-05-29 19:12           ` Dale R. Worley
  0 siblings, 0 replies; 49+ messages in thread
From: Dale R. Worley @ 2014-05-29 19:12 UTC (permalink / raw)
  To: git

> From: David Lang <david@lang.hm>

> well, as others noted, the problem is actually caused by doing the diffs, and 
> that is something that is a very common thing to do with source code.

To some degree, my attitude comes from When I Was A Boy, when you got
16k for both your bytecode and your data, so you never kept more than
one line of a file in memory unless you had to.

Regardless of that, the problem with "git fsck" is not due to doing
diffs, and "git commit" by default does diffs even if you don't ask
for them, so the observed problems cannot be subsumed under the label
"you asked for a diff of a file that can't be diffed".

> And I would assume that files of several MB would be able to fit in
> memory (again, this was assumed to be for development, and compilers
> take a lot of ram to run, so having enough ram to hold any
> individual source file while the compiler is _not_ using ram doesn't
> seem likely to be a problem)

At least the first versions of GCC only kept one function (and the
global symbol table) in memory at once, so you could compile a source
file that was larger than the available memory.

In any case, if Git is no longer limited to handling source files, it
needs to be updated so it can handle large files.

Dale

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 4/4] diff: mark any file larger than core.bigfilethreshold binary
  2014-05-29 12:57     ` [PATCH 4/4] diff: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
@ 2014-06-19 12:27       ` Thomas Braun
  2014-06-23 12:18         ` Duy Nguyen
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Braun @ 2014-06-19 12:27 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, worley, Junio C Hamano

Am Donnerstag, den 29.05.2014, 19:57 +0700 schrieb Nguyễn Thái Ngọc Duy:

Hi,

sorry for chiming in so late.

I've just played around with patch 3 and 4 of that series.
And I like it very much as I work often with large files so any further 
enhancement in that area is really nice.

(see comments below)

> Too large files may lead to failure to allocate memory. If it happens
> here, it could impact quite a few commands that involve
> diff. Moreover, too large files are inefficient to compare anyway (and
> most likely non-text), so mark them binary and skip looking at their
> content.
> 
> Noticed-by: Dale R. Worley <worley@alum.mit.edu>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  diff.c           | 26 ++++++++++++++++++--------
>  diffcore.h       |  1 +
>  t/t1050-large.sh |  4 ++++
>  3 files changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/diff.c b/diff.c
> index 54281cb..0a2f865 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2185,8 +2185,8 @@ int diff_filespec_is_binary(struct diff_filespec *one)
>  			one->is_binary = one->driver->binary;
>  		else {
>  			if (!one->data && DIFF_FILE_VALID(one))
> -				diff_populate_filespec(one, 0);
> -			if (one->data)
> +				diff_populate_filespec(one, DIFF_POPULATE_IS_BINARY);
> +			if (one->is_binary == -1 && one->data)
>  				one->is_binary = buffer_is_binary(one->data,
>  						one->size);
>  			if (one->is_binary == -1)
> @@ -2721,6 +2721,11 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
>  		}
>  		if (size_only)
>  			return 0;
> +		if ((flags & DIFF_POPULATE_IS_BINARY) &&
> +		    s->size > big_file_threshold && s->is_binary == -1) {
> +			s->is_binary = 1;
> +			return 0;
> +		}

Why do you check for s->is_binary == -1 here? I think it does not matter
what s_is_binary says here.

>  		fd = open(s->path, O_RDONLY);
>  		if (fd < 0)
>  			goto err_empty;
> @@ -2742,16 +2747,21 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
>  	}
>  	else {
>  		enum object_type type;
> -		if (size_only) {
> +		if (size_only || (flags & DIFF_POPULATE_IS_BINARY)) {
>  			type = sha1_object_info(s->sha1, &s->size);
>  			if (type < 0)
>  				die("unable to read %s", sha1_to_hex(s->sha1));
> -		} else {
> -			s->data = read_sha1_file(s->sha1, &type, &s->size);
> -			if (!s->data)
> -				die("unable to read %s", sha1_to_hex(s->sha1));
> -			s->should_free = 1;
> +			if (size_only)
> +				return 0;
> +			if (s->size > big_file_threshold && s->is_binary == -1) {
same as above.
> +				s->is_binary = 1;
> +				return 0;
> +			}
>  		}
> +		s->data = read_sha1_file(s->sha1, &type, &s->size);
> +		if (!s->data)
> +			die("unable to read %s", sha1_to_hex(s->sha1));
> +		s->should_free = 1;
>  	}
>  	return 0;
>  }
> diff --git a/diffcore.h b/diffcore.h
> index a186d7c..e7760d9 100644
> --- a/diffcore.h
> +++ b/diffcore.h
> @@ -56,6 +56,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *,
>  			  int, unsigned short);
>  
>  #define DIFF_POPULATE_SIZE_ONLY 1
> +#define DIFF_POPULATE_IS_BINARY 2
>  extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
>  extern void diff_free_filespec_data(struct diff_filespec *);
>  extern void diff_free_filespec_blob(struct diff_filespec *);
> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index 333909b..4d922e2 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -112,6 +112,10 @@ test_expect_success 'diff --raw' '
>  	git diff --raw HEAD^
>  '
>  
> +test_expect_success 'diff --stat' '
> +	git diff --stat HEAD^ HEAD
> +'
> +
>  test_expect_success 'hash-object' '
>  	git hash-object large1
>  '

I would also add a note to the documentation e. g:

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9f467d3..7a2f27d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -499,7 +499,8 @@ core.bigFileThreshold::
        Files larger than this size are stored deflated, without
        attempting delta compression.  Storing large files without
        delta compression avoids excessive memory usage, at the
-       slight expense of increased disk usage.
+       slight expense of increased disk usage.  Additionally files
+       larger than this size are allways treated as binary.
 +
 Default is 512 MiB on all platforms.  This should be reasonable
 for most projects as source code and other text files can still

Thomas

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 4/4] diff: mark any file larger than core.bigfilethreshold binary
  2014-06-19 12:27       ` Thomas Braun
@ 2014-06-23 12:18         ` Duy Nguyen
  2014-06-23 19:21           ` Thomas Braun
  0 siblings, 1 reply; 49+ messages in thread
From: Duy Nguyen @ 2014-06-23 12:18 UTC (permalink / raw)
  To: Thomas Braun; +Cc: Git Mailing List, Dale R. Worley, Junio C Hamano

On Thu, Jun 19, 2014 at 7:27 PM, Thomas Braun
<thomas.braun@virtuell-zuhause.de> wrote:
>> @@ -2721,6 +2721,11 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
>>               }
>>               if (size_only)
>>                       return 0;
>> +             if ((flags & DIFF_POPULATE_IS_BINARY) &&
>> +                 s->size > big_file_threshold && s->is_binary == -1) {
>> +                     s->is_binary = 1;
>> +                     return 0;
>> +             }
>
> Why do you check for s->is_binary == -1 here? I think it does not matter
> what s_is_binary says here.

If some .gitattributes to mark one file not-binary, we should respect
that, I think. Same for below too.

> I would also add a note to the documentation e. g:
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 9f467d3..7a2f27d 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -499,7 +499,8 @@ core.bigFileThreshold::
>         Files larger than this size are stored deflated, without
>         attempting delta compression.  Storing large files without
>         delta compression avoids excessive memory usage, at the
> -       slight expense of increased disk usage.
> +       slight expense of increased disk usage.  Additionally files
> +       larger than this size are allways treated as binary.
>  +
>  Default is 512 MiB on all platforms.  This should be reasonable
>  for most projects as source code and other text files can still

Thanks. Will do. Sorry a little busy these days and could not reply earlier.
-- 
Duy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 4/4] diff: mark any file larger than core.bigfilethreshold binary
  2014-06-23 12:18         ` Duy Nguyen
@ 2014-06-23 19:21           ` Thomas Braun
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Braun @ 2014-06-23 19:21 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Dale R. Worley, Junio C Hamano

Am 23.06.2014 14:18, schrieb Duy Nguyen:
> On Thu, Jun 19, 2014 at 7:27 PM, Thomas Braun
> <thomas.braun@virtuell-zuhause.de> wrote:
>>> @@ -2721,6 +2721,11 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
>>>               }
>>>               if (size_only)
>>>                       return 0;
>>> +             if ((flags & DIFF_POPULATE_IS_BINARY) &&
>>> +                 s->size > big_file_threshold && s->is_binary == -1) {
>>> +                     s->is_binary = 1;
>>> +                     return 0;
>>> +             }
>>
>> Why do you check for s->is_binary == -1 here? I think it does not matter
>> what s_is_binary says here.
> 
> If some .gitattributes to mark one file not-binary, we should respect
> that, I think. Same for below too.

Reading diffcore.h I thought is_binary being -1 means it is not yet
decided if it is binary or text.
Respecting .gitattributes is obviously a good thing :)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v2 1/4] wrapper.c: introduce gentle xmallocz that does not die()
  2014-05-29 12:57   ` [PATCH 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
                       ` (2 preceding siblings ...)
  2014-05-29 12:57     ` [PATCH 4/4] diff: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
@ 2014-06-24 11:45     ` Nguyễn Thái Ngọc Duy
  2014-06-24 11:45       ` [PATCH v2 2/4] fsck: do not die when not enough memory to examine a pack entry Nguyễn Thái Ngọc Duy
                         ` (3 more replies)
  3 siblings, 4 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-06-24 11:45 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 git-compat-util.h |  1 +
 wrapper.c         | 68 ++++++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index b6f03b3..7eee0f2 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -540,6 +540,7 @@ extern try_to_free_t set_try_to_free_routine(try_to_free_t);
 extern char *xstrdup(const char *str);
 extern void *xmalloc(size_t size);
 extern void *xmallocz(size_t size);
+extern void *xmallocz_gentle(size_t size);
 extern void *xmemdupz(const void *data, size_t len);
 extern char *xstrndup(const char *str, size_t len);
 extern void *xrealloc(void *ptr, size_t size);
diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..06601c4 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -9,16 +9,23 @@ static void do_nothing(size_t size)
 
 static void (*try_to_free_routine)(size_t size) = do_nothing;
 
-static void memory_limit_check(size_t size)
+static int memory_limit_check(size_t size, int gentle)
 {
 	static int limit = -1;
 	if (limit == -1) {
 		const char *env = getenv("GIT_ALLOC_LIMIT");
 		limit = env ? atoi(env) * 1024 : 0;
 	}
-	if (limit && size > limit)
-		die("attempting to allocate %"PRIuMAX" over limit %d",
-		    (intmax_t)size, limit);
+	if (limit && size > limit) {
+		if (gentle) {
+			error("attempting to allocate %"PRIuMAX" over limit %d",
+			      (intmax_t)size, limit);
+			return -1;
+		} else
+			die("attempting to allocate %"PRIuMAX" over limit %d",
+			    (intmax_t)size, limit);
+	}
+	return 0;
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
@@ -42,11 +49,12 @@ char *xstrdup(const char *str)
 	return ret;
 }
 
-void *xmalloc(size_t size)
+static void *do_xmalloc(size_t size, int gentle)
 {
 	void *ret;
 
-	memory_limit_check(size);
+	if (memory_limit_check(size, gentle))
+		return NULL;
 	ret = malloc(size);
 	if (!ret && !size)
 		ret = malloc(1);
@@ -55,9 +63,16 @@ void *xmalloc(size_t size)
 		ret = malloc(size);
 		if (!ret && !size)
 			ret = malloc(1);
-		if (!ret)
-			die("Out of memory, malloc failed (tried to allocate %lu bytes)",
-			    (unsigned long)size);
+		if (!ret) {
+			if (!gentle)
+				die("Out of memory, malloc failed (tried to allocate %lu bytes)",
+				    (unsigned long)size);
+			else {
+				error("Out of memory, malloc failed (tried to allocate %lu bytes)",
+				      (unsigned long)size);
+				return NULL;
+			}
+		}
 	}
 #ifdef XMALLOC_POISON
 	memset(ret, 0xA5, size);
@@ -65,16 +80,37 @@ void *xmalloc(size_t size)
 	return ret;
 }
 
-void *xmallocz(size_t size)
+void *xmalloc(size_t size)
+{
+	return do_xmalloc(size, 0);
+}
+
+static void *do_xmallocz(size_t size, int gentle)
 {
 	void *ret;
-	if (unsigned_add_overflows(size, 1))
-		die("Data too large to fit into virtual memory space.");
-	ret = xmalloc(size + 1);
-	((char*)ret)[size] = 0;
+	if (unsigned_add_overflows(size, 1)) {
+		if (gentle) {
+			error("Data too large to fit into virtual memory space.");
+			return NULL;
+		} else
+			die("Data too large to fit into virtual memory space.");
+	}
+	ret = do_xmalloc(size + 1, gentle);
+	if (ret)
+		((char*)ret)[size] = 0;
 	return ret;
 }
 
+void *xmallocz(size_t size)
+{
+	return do_xmallocz(size, 0);
+}
+
+void *xmallocz_gentle(size_t size)
+{
+	return do_xmallocz(size, 1);
+}
+
 /*
  * xmemdupz() allocates (len + 1) bytes of memory, duplicates "len" bytes of
  * "data" to the allocated memory, zero terminates the allocated memory,
@@ -96,7 +132,7 @@ void *xrealloc(void *ptr, size_t size)
 {
 	void *ret;
 
-	memory_limit_check(size);
+	memory_limit_check(size, 0);
 	ret = realloc(ptr, size);
 	if (!ret && !size)
 		ret = realloc(ptr, 1);
@@ -115,7 +151,7 @@ void *xcalloc(size_t nmemb, size_t size)
 {
 	void *ret;
 
-	memory_limit_check(size * nmemb);
+	memory_limit_check(size * nmemb, 0);
 	ret = calloc(nmemb, size);
 	if (!ret && (!nmemb || !size))
 		ret = calloc(1, 1);
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 2/4] fsck: do not die when not enough memory to examine a pack entry
  2014-06-24 11:45     ` [PATCH v2 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
@ 2014-06-24 11:45       ` Nguyễn Thái Ngọc Duy
  2014-06-26 18:09         ` Junio C Hamano
  2014-06-24 11:45       ` [PATCH v2 3/4] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-06-24 11:45 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

fsck is a tool that error() is more preferred than die(), but many
functions embed die() inside beyond fsck's control.
unpack_compressed_entry()'s using xmallocz is such a function,
triggered from verify_packfile() -> unpack_entry(). Make it use
xmallocz_gentle() instead.

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 sha1_file.c      | 4 +++-
 t/t1050-large.sh | 6 ++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 34d527f..eb69c78 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1925,7 +1925,9 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	git_zstream stream;
 	unsigned char *buffer, *in;
 
-	buffer = xmallocz(size);
+	buffer = xmallocz_gentle(size);
+	if (!buffer)
+		return NULL;
 	memset(&stream, 0, sizeof(stream));
 	stream.next_out = buffer;
 	stream.avail_out = size + 1;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index aea4936..5642f84 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -163,4 +163,10 @@ test_expect_success 'zip achiving, deflate' '
 	git archive --format=zip HEAD >/dev/null
 '
 
+test_expect_success 'fsck' '
+	test_must_fail git fsck 2>err &&
+	n=$(grep "error: attempting to allocate .* over limit" err | wc -l) &&
+	test "$n" -gt 1
+'
+
 test_done
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 3/4] diff.c: allow to pass more flags to diff_populate_filespec
  2014-06-24 11:45     ` [PATCH v2 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
  2014-06-24 11:45       ` [PATCH v2 2/4] fsck: do not die when not enough memory to examine a pack entry Nguyễn Thái Ngọc Duy
@ 2014-06-24 11:45       ` Nguyễn Thái Ngọc Duy
  2014-06-24 11:45       ` [PATCH v2 4/4] diff: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
  2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
  3 siblings, 0 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-06-24 11:45 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff.c            | 13 +++++++------
 diffcore-rename.c |  6 ++++--
 diffcore.h        |  3 ++-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/diff.c b/diff.c
index bba9a55..a489540 100644
--- a/diff.c
+++ b/diff.c
@@ -373,7 +373,7 @@ static unsigned long diff_filespec_size(struct diff_filespec *one)
 {
 	if (!DIFF_FILE_VALID(one))
 		return 0;
-	diff_populate_filespec(one, 1);
+	diff_populate_filespec(one, DIFF_POPULATE_SIZE_ONLY);
 	return one->size;
 }
 
@@ -1907,11 +1907,11 @@ static void show_dirstat(struct diff_options *options)
 			diff_free_filespec_data(p->one);
 			diff_free_filespec_data(p->two);
 		} else if (DIFF_FILE_VALID(p->one)) {
-			diff_populate_filespec(p->one, 1);
+			diff_populate_filespec(p->one, DIFF_POPULATE_SIZE_ONLY);
 			copied = added = 0;
 			diff_free_filespec_data(p->one);
 		} else if (DIFF_FILE_VALID(p->two)) {
-			diff_populate_filespec(p->two, 1);
+			diff_populate_filespec(p->two, DIFF_POPULATE_SIZE_ONLY);
 			copied = 0;
 			added = p->two->size;
 			diff_free_filespec_data(p->two);
@@ -2664,8 +2664,9 @@ static int diff_populate_gitlink(struct diff_filespec *s, int size_only)
  * grab the data for the blob (or file) for our own in-core comparison.
  * diff_filespec has data and size fields for this purpose.
  */
-int diff_populate_filespec(struct diff_filespec *s, int size_only)
+int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 {
+	int size_only = flags & DIFF_POPULATE_SIZE_ONLY;
 	int err = 0;
 	/*
 	 * demote FAIL to WARN to allow inspecting the situation
@@ -4693,8 +4694,8 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	    !DIFF_FILE_VALID(p->two) ||
 	    (p->one->sha1_valid && p->two->sha1_valid) ||
 	    (p->one->mode != p->two->mode) ||
-	    diff_populate_filespec(p->one, 1) ||
-	    diff_populate_filespec(p->two, 1) ||
+	    diff_populate_filespec(p->one, DIFF_POPULATE_SIZE_ONLY) ||
+	    diff_populate_filespec(p->two, DIFF_POPULATE_SIZE_ONLY) ||
 	    (p->one->size != p->two->size) ||
 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
 		p->skip_stat_unmatch_result = 1;
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 749a35d..8437917 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -147,9 +147,11 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * is a possible size - we really should have a flag to
 	 * say whether the size is valid or not!)
 	 */
-	if (!src->cnt_data && diff_populate_filespec(src, 1))
+	if (!src->cnt_data &&
+	    diff_populate_filespec(src, DIFF_POPULATE_SIZE_ONLY))
 		return 0;
-	if (!dst->cnt_data && diff_populate_filespec(dst, 1))
+	if (!dst->cnt_data &&
+	    diff_populate_filespec(dst, DIFF_POPULATE_SIZE_ONLY))
 		return 0;
 
 	max_size = ((src->size > dst->size) ? src->size : dst->size);
diff --git a/diffcore.h b/diffcore.h
index c876dac..a186d7c 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -55,7 +55,8 @@ extern void free_filespec(struct diff_filespec *);
 extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 			  int, unsigned short);
 
-extern int diff_populate_filespec(struct diff_filespec *, int);
+#define DIFF_POPULATE_SIZE_ONLY 1
+extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
 extern int diff_filespec_is_binary(struct diff_filespec *);
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 4/4] diff: mark any file larger than core.bigfilethreshold binary
  2014-06-24 11:45     ` [PATCH v2 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
  2014-06-24 11:45       ` [PATCH v2 2/4] fsck: do not die when not enough memory to examine a pack entry Nguyễn Thái Ngọc Duy
  2014-06-24 11:45       ` [PATCH v2 3/4] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
@ 2014-06-24 11:45       ` Nguyễn Thái Ngọc Duy
  2014-06-26 17:55         ` Junio C Hamano
  2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
  3 siblings, 1 reply; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-06-24 11:45 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Too large files may lead to failure to allocate memory. If it happens
here, it could impact quite a few commands that involve
diff. Moreover, too large files are inefficient to compare anyway (and
most likely non-text), so mark them binary and skip looking at their
content.

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt        |  3 ++-
 Documentation/gitattributes.txt |  4 ++--
 diff.c                          | 26 ++++++++++++++++++--------
 diffcore.h                      |  1 +
 t/t1050-large.sh                |  4 ++++
 5 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9f467d3..a865850 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -499,7 +499,8 @@ core.bigFileThreshold::
 	Files larger than this size are stored deflated, without
 	attempting delta compression.  Storing large files without
 	delta compression avoids excessive memory usage, at the
-	slight expense of increased disk usage.
+	slight expense of increased disk usage. Additionally files
+	larger than this size are allways treated as binary.
 +
 Default is 512 MiB on all platforms.  This should be reasonable
 for most projects as source code and other text files can still
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 643c1ba..9b45bda 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -440,8 +440,8 @@ Unspecified::
 
 	A path to which the `diff` attribute is unspecified
 	first gets its contents inspected, and if it looks like
-	text, it is treated as text.  Otherwise it would
-	generate `Binary files differ`.
+	text and is smaller than core.bigFileThreshold, it is treated
+	as text. Otherwise it would generate `Binary files differ`.
 
 String::
 
diff --git a/diff.c b/diff.c
index a489540..7a977aa 100644
--- a/diff.c
+++ b/diff.c
@@ -2185,8 +2185,8 @@ int diff_filespec_is_binary(struct diff_filespec *one)
 			one->is_binary = one->driver->binary;
 		else {
 			if (!one->data && DIFF_FILE_VALID(one))
-				diff_populate_filespec(one, 0);
-			if (one->data)
+				diff_populate_filespec(one, DIFF_POPULATE_IS_BINARY);
+			if (one->is_binary == -1 && one->data)
 				one->is_binary = buffer_is_binary(one->data,
 						one->size);
 			if (one->is_binary == -1)
@@ -2721,6 +2721,11 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 		}
 		if (size_only)
 			return 0;
+		if ((flags & DIFF_POPULATE_IS_BINARY) &&
+		    s->size > big_file_threshold && s->is_binary == -1) {
+			s->is_binary = 1;
+			return 0;
+		}
 		fd = open(s->path, O_RDONLY);
 		if (fd < 0)
 			goto err_empty;
@@ -2742,16 +2747,21 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 	}
 	else {
 		enum object_type type;
-		if (size_only) {
+		if (size_only || (flags & DIFF_POPULATE_IS_BINARY)) {
 			type = sha1_object_info(s->sha1, &s->size);
 			if (type < 0)
 				die("unable to read %s", sha1_to_hex(s->sha1));
-		} else {
-			s->data = read_sha1_file(s->sha1, &type, &s->size);
-			if (!s->data)
-				die("unable to read %s", sha1_to_hex(s->sha1));
-			s->should_free = 1;
+			if (size_only)
+				return 0;
+			if (s->size > big_file_threshold && s->is_binary == -1) {
+				s->is_binary = 1;
+				return 0;
+			}
 		}
+		s->data = read_sha1_file(s->sha1, &type, &s->size);
+		if (!s->data)
+			die("unable to read %s", sha1_to_hex(s->sha1));
+		s->should_free = 1;
 	}
 	return 0;
 }
diff --git a/diffcore.h b/diffcore.h
index a186d7c..e7760d9 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -56,6 +56,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 			  int, unsigned short);
 
 #define DIFF_POPULATE_SIZE_ONLY 1
+#define DIFF_POPULATE_IS_BINARY 2
 extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 5642f84..00d2f33 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -112,6 +112,10 @@ test_expect_success 'diff --raw' '
 	git diff --raw HEAD^
 '
 
+test_expect_success 'diff --stat' '
+	git diff --stat HEAD^ HEAD
+'
+
 test_expect_success 'hash-object' '
 	git hash-object large1
 '
-- 
1.9.1.346.ga2b5940

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 4/4] diff: mark any file larger than core.bigfilethreshold binary
  2014-06-24 11:45       ` [PATCH v2 4/4] diff: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
@ 2014-06-26 17:55         ` Junio C Hamano
  2014-06-27 18:56           ` Thomas Braun
  0 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2014-06-26 17:55 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, worley

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Too large files may lead to failure to allocate memory. If it happens
> here, it could impact quite a few commands that involve
> diff. Moreover, too large files are inefficient to compare anyway (and
> most likely non-text), so mark them binary and skip looking at their
> content.
> ...
> diff --git a/diff.c b/diff.c
> index a489540..7a977aa 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2185,8 +2185,8 @@ int diff_filespec_is_binary(struct diff_filespec *one)
>  			one->is_binary = one->driver->binary;
>  		else {
>  			if (!one->data && DIFF_FILE_VALID(one))
> -				diff_populate_filespec(one, 0);
> -			if (one->data)
> +				diff_populate_filespec(one, DIFF_POPULATE_IS_BINARY);
> +			if (one->is_binary == -1 && one->data)
>  				one->is_binary = buffer_is_binary(one->data,
>  						one->size);
>  			if (one->is_binary == -1)

The name is misleading and forced me to read it twice before I
realized that this is "populating the is-binary bit".  It might make
it a bit better if you renamed it to DIFF_POPULATE_IS_BINARY_BIT or
perhaps DIFF_POPULATE_CHECK_BINARY or something.  For consistency,
the other bit may want to be also renamed from SIZE_ONLY to either

 (1) CHECK_SIZE_ONLY

 (2) One bit for CHECK_SIZE, another for NO_CONTENTS, and optionally
     make SIZE_ONLY the union of two

I do not have strong preference either way; the latter may be more
logical in that "not loading contents" and "check size" are sort of
orthogonal in that you can later choose to check, without loading
contents, only the binary-ness without checking size, but no calles
that passes a non-zero flag to the populate-filespec function will
want to slurp in the contents in practice, so in that sense we could
declare that the NO_CONENTS bit is implied.

But more importantly, would this patch actually help?  For one
thing, this wouldn't (and shouldn't) help if the user wants --binary
diff out of us anyway, I suspect, but I wonder what the following
codepath in the builtin_diff() function would do:

		...
	} else if (!DIFF_OPT_TST(o, TEXT) &&
	    ( (!textconv_one && diff_filespec_is_binary(one)) ||
	      (!textconv_two && diff_filespec_is_binary(two)) )) {
		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
			die("unable to read files to diff");
		/* Quite common confusing case */
		if (mf1.size == mf2.size &&
		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
			if (must_show_header)
				fprintf(o->file, "%s", header.buf);
			goto free_ab_and_return;
		}
		fprintf(o->file, "%s", header.buf);
		strbuf_reset(&header);
		if (DIFF_OPT_TST(o, BINARY))
			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
		else
			fprintf(o->file, "%sBinary files %s and %s differ\n",
				line_prefix, lbl[0], lbl[1]);
		o->found_changes = 1;
	} else {
		...

If we weren't told with --text/-a to force textual output, and
at least one of the sides is marked as binary (and this patch marks
a large blob as binary unless attributes says otherwise), we still
call fill_mmfile() on them to slurp the contents to be compared, no?

And before you get to the DIFF_OPT_TST(o, BINARY), you memcmp(3) to
check if the sides are identical, so...

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/4] fsck: do not die when not enough memory to examine a pack entry
  2014-06-24 11:45       ` [PATCH v2 2/4] fsck: do not die when not enough memory to examine a pack entry Nguyễn Thái Ngọc Duy
@ 2014-06-26 18:09         ` Junio C Hamano
  2014-06-29  0:40           ` Duy Nguyen
  0 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2014-06-26 18:09 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, worley

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> fsck is a tool that error() is more preferred than die(), but many

"more preferred" without justifying why it is "more preferred" is
not quite a justification, is it?  Also, an object failing to load
in-core is not a missing object, so if your aim is to let "fsck"
diagnose a too-large-to-load object as missing and let it continue,
I do not know if it is "more preferred" in the first place.  Adding
a "too large--cannot check" bin of objects may be needed for it to
be useful.  Also, we might need to give at the end "oh by the way,
because we couldn't read some objects to even determine its type,
the unreachable report from this fsck run is totally useless."

The log message tries to justify that this may be a good thing for
"fsck", but the patch actually tries to change the behaviour of all
code paths that try to load an object in-core without considering
the ramifications of such a change.  I _think_ all callers should be
prepared to receive NULL when we encounter a corrupt object (and
otherwise we should fix them), but it is unclear how much audit of
the callers (if any) was done to prepare this change.

Not very excited X-<.

> functions embed die() inside beyond fsck's control.
> unpack_compressed_entry()'s using xmallocz is such a function,
> triggered from verify_packfile() -> unpack_entry(). Make it use
> xmallocz_gentle() instead.
>
> Noticed-by: Dale R. Worley <worley@alum.mit.edu>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  sha1_file.c      | 4 +++-
>  t/t1050-large.sh | 6 ++++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 34d527f..eb69c78 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1925,7 +1925,9 @@ static void *unpack_compressed_entry(struct packed_git *p,
>  	git_zstream stream;
>  	unsigned char *buffer, *in;
>  
> -	buffer = xmallocz(size);
> +	buffer = xmallocz_gentle(size);
> +	if (!buffer)
> +		return NULL;
>  	memset(&stream, 0, sizeof(stream));
>  	stream.next_out = buffer;
>  	stream.avail_out = size + 1;
> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index aea4936..5642f84 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -163,4 +163,10 @@ test_expect_success 'zip achiving, deflate' '
>  	git archive --format=zip HEAD >/dev/null
>  '
>  
> +test_expect_success 'fsck' '
> +	test_must_fail git fsck 2>err &&
> +	n=$(grep "error: attempting to allocate .* over limit" err | wc -l) &&
> +	test "$n" -gt 1
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 4/4] diff: mark any file larger than core.bigfilethreshold binary
  2014-06-26 17:55         ` Junio C Hamano
@ 2014-06-27 18:56           ` Thomas Braun
  2014-06-29  1:11             ` Duy Nguyen
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Braun @ 2014-06-27 18:56 UTC (permalink / raw)
  To: Junio C Hamano, Nguyễn Thái Ngọc Duy; +Cc: git, worley

Am 26.06.2014 19:55, schrieb Junio C Hamano:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
> 
>> Too large files may lead to failure to allocate memory. If it happens
>> here, it could impact quite a few commands that involve
>> diff. Moreover, too large files are inefficient to compare anyway (and
>> most likely non-text), so mark them binary and skip looking at their
>> content.
>> ...
>> diff --git a/diff.c b/diff.c
>> index a489540..7a977aa 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -2185,8 +2185,8 @@ int diff_filespec_is_binary(struct diff_filespec *one)
>>  			one->is_binary = one->driver->binary;
>>  		else {
>>  			if (!one->data && DIFF_FILE_VALID(one))
>> -				diff_populate_filespec(one, 0);
>> -			if (one->data)
>> +				diff_populate_filespec(one, DIFF_POPULATE_IS_BINARY);
>> +			if (one->is_binary == -1 && one->data)
>>  				one->is_binary = buffer_is_binary(one->data,
>>  						one->size);
>>  			if (one->is_binary == -1)
> 
> The name is misleading and forced me to read it twice before I
> realized that this is "populating the is-binary bit".  It might make
> it a bit better if you renamed it to DIFF_POPULATE_IS_BINARY_BIT or
> perhaps DIFF_POPULATE_CHECK_BINARY or something.  For consistency,
> the other bit may want to be also renamed from SIZE_ONLY to either
> 
>  (1) CHECK_SIZE_ONLY
> 
>  (2) One bit for CHECK_SIZE, another for NO_CONTENTS, and optionally
>      make SIZE_ONLY the union of two
> 
> I do not have strong preference either way; the latter may be more
> logical in that "not loading contents" and "check size" are sort of
> orthogonal in that you can later choose to check, without loading
> contents, only the binary-ness without checking size, but no calles
> that passes a non-zero flag to the populate-filespec function will
> want to slurp in the contents in practice, so in that sense we could
> declare that the NO_CONENTS bit is implied.
> 
> But more importantly, would this patch actually help?  For one
> thing, this wouldn't (and shouldn't) help if the user wants --binary
> diff out of us anyway, I suspect, but I wonder what the following
> codepath in the builtin_diff() function would do:
> 
> 		...
> 	} else if (!DIFF_OPT_TST(o, TEXT) &&
> 	    ( (!textconv_one && diff_filespec_is_binary(one)) ||
> 	      (!textconv_two && diff_filespec_is_binary(two)) )) {
> 		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
> 			die("unable to read files to diff");
> 		/* Quite common confusing case */
> 		if (mf1.size == mf2.size &&
> 		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
> 			if (must_show_header)
> 				fprintf(o->file, "%s", header.buf);
> 			goto free_ab_and_return;
> 		}
> 		fprintf(o->file, "%s", header.buf);
> 		strbuf_reset(&header);
> 		if (DIFF_OPT_TST(o, BINARY))
> 			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
> 		else
> 			fprintf(o->file, "%sBinary files %s and %s differ\n",
> 				line_prefix, lbl[0], lbl[1]);
> 		o->found_changes = 1;
> 	} else {
> 		...
> 
> If we weren't told with --text/-a to force textual output, and
> at least one of the sides is marked as binary (and this patch marks
> a large blob as binary unless attributes says otherwise), we still
> call fill_mmfile() on them to slurp the contents to be compared, no?
> 
> And before you get to the DIFF_OPT_TST(o, BINARY), you memcmp(3) to
> check if the sides are identical, so...

Good point. So how about an additional change roughly sketched as

@@ -2223,6 +2223,14 @@ struct userdiff_driver *get_textconv(struct
diff_filespec *one)
 	return userdiff_get_textconv(one->driver);
 }

+/* read the files in small chunks into memory and compare them */
+static int filecmp_chunked(struct diff_filespec *one,
+	struct diff_filespec *two)
+{
+	// TODO add implementation
+	return 0;
+}
+
 static void builtin_diff(const char *name_a,
 			 const char *name_b,
 			 struct diff_filespec *one,
@@ -2325,19 +2333,26 @@ static void builtin_diff(const char *name_a,
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
 	    ( (!textconv_one && diff_filespec_is_binary(one)) ||
 	      (!textconv_two && diff_filespec_is_binary(two)) )) {
-		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2,two)< 0)
-			die("unable to read files to diff");
+
+		unsigned long size1 = diff_filespec_size(one);
+		unsigned long size2 = diff_filespec_size(two);
+
+		if (size1 == 0 || size2 == 0)
+			die("unable to retrieve file sizes for diff");
 		/* Quite common confusing case */
-		if (mf1.size == mf2.size &&
-		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
+		if (size1 == size2 && !filecmp_chunked(one,two)) {
 			if (must_show_header)
 				fprintf(o->file, "%s", header.buf);
 			goto free_ab_and_return;
 		}
 		fprintf(o->file, "%s", header.buf);
 		strbuf_reset(&header);
-		if (DIFF_OPT_TST(o, BINARY))
+		if (DIFF_OPT_TST(o, BINARY)) {
+			if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
+				die("unable to read files to diff");
 			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
+		}
 		else
 			fprintf(o->file, "%sBinary files %s and %s differ\n",
 				line_prefix, lbl[0], lbl[1]);

Then the default diff case, no BINARY flag, would not read both files into memory.
filecmp_chunked will be slower than file_mmfile and memcmp, but its whole purpose is to read and compare the files in chunks.
The chunk size can be something like 64MiB.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/4] fsck: do not die when not enough memory to examine a pack entry
  2014-06-26 18:09         ` Junio C Hamano
@ 2014-06-29  0:40           ` Duy Nguyen
  0 siblings, 0 replies; 49+ messages in thread
From: Duy Nguyen @ 2014-06-29  0:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Dale R. Worley

On Fri, Jun 27, 2014 at 1:09 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> fsck is a tool that error() is more preferred than die(), but many
>
> "more preferred" without justifying why it is "more preferred" is
> not quite a justification, is it?  Also, an object failing to load
> in-core is not a missing object, so if your aim is to let "fsck"
> diagnose a too-large-to-load object as missing and let it continue,
> I do not know if it is "more preferred" in the first place.  Adding
> a "too large--cannot check" bin of objects may be needed for it to
> be useful.  Also, we might need to give at the end "oh by the way,
> because we couldn't read some objects to even determine its type,
> the unreachable report from this fsck run is totally useless."

Fair enough. I think avoiding dying in xmalloc() in this code path is
still a good thing. At least "failed to read object %s" is more
informative than simply "Out of memory". The error cascading effect in
fsck is something I think we already have. I'll try to rephrase the
commit message. But if you think this is not a good direction,
dropping it is not so bad.

I'm going to look at xmalloc() in unpack-objects. That's where we
really should not abort because of memory shortage as the user may try
to get as many objects as possible out of the pack.

> The log message tries to justify that this may be a good thing for
> "fsck", but the patch actually tries to change the behaviour of all
> code paths that try to load an object in-core without considering
> the ramifications of such a change.  I _think_ all callers should be
> prepared to receive NULL when we encounter a corrupt object (and
> otherwise we should fix them), but it is unclear how much audit of
> the callers (if any) was done to prepare this change.
-- 
Duy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 4/4] diff: mark any file larger than core.bigfilethreshold binary
  2014-06-27 18:56           ` Thomas Braun
@ 2014-06-29  1:11             ` Duy Nguyen
  0 siblings, 0 replies; 49+ messages in thread
From: Duy Nguyen @ 2014-06-29  1:11 UTC (permalink / raw)
  To: Thomas Braun; +Cc: Junio C Hamano, Git Mailing List, Dale R. Worley

On Sat, Jun 28, 2014 at 1:56 AM, Thomas Braun
<thomas.braun@virtuell-zuhause.de> wrote:
>> The name is misleading and forced me to read it twice before I
>> realized that this is "populating the is-binary bit".  It might make
>> it a bit better if you renamed it to DIFF_POPULATE_IS_BINARY_BIT or
>> perhaps DIFF_POPULATE_CHECK_BINARY or something.  For consistency,
>> the other bit may want to be also renamed from SIZE_ONLY to either
>>
>>  (1) CHECK_SIZE_ONLY
>>
>>  (2) One bit for CHECK_SIZE, another for NO_CONTENTS, and optionally
>>      make SIZE_ONLY the union of two
>>
>> I do not have strong preference either way; the latter may be more
>> logical in that "not loading contents" and "check size" are sort of
>> orthogonal in that you can later choose to check, without loading
>> contents, only the binary-ness without checking size, but no calles
>> that passes a non-zero flag to the populate-filespec function will
>> want to slurp in the contents in practice, so in that sense we could
>> declare that the NO_CONENTS bit is implied.

Will do (and probably go with (1) as I still prefer zero as "good defaults")

>> But more importantly, would this patch actually help?

Well yes as demonstrated by the new test ;-) Unfortunately the scope
of help is limited to --stat.. I should have done more thorough
testing.

>> For one
>> thing, this wouldn't (and shouldn't) help if the user wants --binary
>> diff out of us anyway, I suspect, but I wonder what the following
>> codepath in the builtin_diff() function would do:
>>
>>               ...
>>       } else if (!DIFF_OPT_TST(o, TEXT) &&
>>           ( (!textconv_one && diff_filespec_is_binary(one)) ||
>>             (!textconv_two && diff_filespec_is_binary(two)) )) {
>>               if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
>>                       die("unable to read files to diff");
>>               /* Quite common confusing case */
>>               if (mf1.size == mf2.size &&
>>                   !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
>>                       if (must_show_header)
>>                               fprintf(o->file, "%s", header.buf);
>>                       goto free_ab_and_return;
>>               }
>>               fprintf(o->file, "%s", header.buf);
>>               strbuf_reset(&header);
>>               if (DIFF_OPT_TST(o, BINARY))
>>                       emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
>>               else
>>                       fprintf(o->file, "%sBinary files %s and %s differ\n",
>>                               line_prefix, lbl[0], lbl[1]);
>>               o->found_changes = 1;
>>       } else {
>>               ...
>>
>> If we weren't told with --text/-a to force textual output, and
>> at least one of the sides is marked as binary (and this patch marks
>> a large blob as binary unless attributes says otherwise), we still
>> call fill_mmfile() on them to slurp the contents to be compared, no?
>>
>> And before you get to the DIFF_OPT_TST(o, BINARY), you memcmp(3) to
>> check if the sides are identical, so...
>
> Good point. So how about an additional change roughly sketched as
>
> @@ -2223,6 +2223,14 @@ struct userdiff_driver *get_textconv(struct
> diff_filespec *one)
>         return userdiff_get_textconv(one->driver);
>  }
>
> +/* read the files in small chunks into memory and compare them */
> +static int filecmp_chunked(struct diff_filespec *one,
> +       struct diff_filespec *two)
> +{
> +       // TODO add implementation
> +       return 0;
> +}
> +


We have object streaming interface to do similar like this. In fact
index-pack already does large file memcmp() for hash collision test. I
think I can move some code around and support large file in this code
path..
-- 
Duy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v3 0/6] Large file improvements
  2014-06-24 11:45     ` [PATCH v2 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
                         ` (2 preceding siblings ...)
  2014-06-24 11:45       ` [PATCH v2 4/4] diff: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
@ 2014-08-13 10:57       ` Nguyễn Thái Ngọc Duy
  2014-08-13 10:57         ` [PATCH v3 1/6] wrapper.c: introduce gentle xmalloc(z) that does not die() Nguyễn Thái Ngọc Duy
                           ` (6 more replies)
  3 siblings, 7 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-13 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Since v2:

 - reword the fsck patch to center around unpack_compressed_entry instead
 - make sure unpack-objects survive out of memory error because of large files
 - rename diff_filespec_population flags
 - make "git diff <tree> <tree>" work on large files

Nguyễn Thái Ngọc Duy (6):
  wrapper.c: introduce gentle xmalloc(z) that does not die()
  sha1_file.c: do not die failing to malloc in unpack_compressed_entry
  unpack-objects: continue when fail to malloc due to large objects
  diff.c: allow to pass more flags to diff_populate_filespec
  diff --stat: mark any file larger than core.bigfilethreshold binary
  diff: shortcut for diff'ing two binary SHA-1 objects

 Documentation/config.txt        |  3 +-
 Documentation/gitattributes.txt |  4 +--
 builtin/unpack-objects.c        | 42 +++++++++++++++++++++++-
 diff.c                          | 52 +++++++++++++++++++++--------
 diffcore-rename.c               |  6 ++--
 diffcore.h                      |  4 ++-
 git-compat-util.h               |  2 ++
 sha1_file.c                     |  4 ++-
 t/t1050-large.sh                | 25 ++++++++++++++
 wrapper.c                       | 73 ++++++++++++++++++++++++++++++++---------
 10 files changed, 177 insertions(+), 38 deletions(-)

-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v3 1/6] wrapper.c: introduce gentle xmalloc(z) that does not die()
  2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
@ 2014-08-13 10:57         ` Nguyễn Thái Ngọc Duy
  2014-08-14 16:38           ` Junio C Hamano
  2014-08-13 10:57         ` [PATCH v3 2/6] sha1_file.c: do not die failing to malloc in unpack_compressed_entry Nguyễn Thái Ngọc Duy
                           ` (5 subsequent siblings)
  6 siblings, 1 reply; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-13 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 git-compat-util.h |  2 ++
 wrapper.c         | 73 +++++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index f587749..0e541e7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -592,7 +592,9 @@ extern try_to_free_t set_try_to_free_routine(try_to_free_t);
 #endif
 extern char *xstrdup(const char *str);
 extern void *xmalloc(size_t size);
+extern void *xmalloc_gentle(size_t size);
 extern void *xmallocz(size_t size);
+extern void *xmallocz_gentle(size_t size);
 extern void *xmemdupz(const void *data, size_t len);
 extern char *xstrndup(const char *str, size_t len);
 extern void *xrealloc(void *ptr, size_t size);
diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..ad0992a 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -9,16 +9,23 @@ static void do_nothing(size_t size)
 
 static void (*try_to_free_routine)(size_t size) = do_nothing;
 
-static void memory_limit_check(size_t size)
+static int memory_limit_check(size_t size, int gentle)
 {
 	static int limit = -1;
 	if (limit == -1) {
 		const char *env = getenv("GIT_ALLOC_LIMIT");
 		limit = env ? atoi(env) * 1024 : 0;
 	}
-	if (limit && size > limit)
-		die("attempting to allocate %"PRIuMAX" over limit %d",
-		    (intmax_t)size, limit);
+	if (limit && size > limit) {
+		if (gentle) {
+			error("attempting to allocate %"PRIuMAX" over limit %d",
+			      (intmax_t)size, limit);
+			return -1;
+		} else
+			die("attempting to allocate %"PRIuMAX" over limit %d",
+			    (intmax_t)size, limit);
+	}
+	return 0;
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
@@ -42,11 +49,12 @@ char *xstrdup(const char *str)
 	return ret;
 }
 
-void *xmalloc(size_t size)
+static void *do_xmalloc(size_t size, int gentle)
 {
 	void *ret;
 
-	memory_limit_check(size);
+	if (memory_limit_check(size, gentle))
+		return NULL;
 	ret = malloc(size);
 	if (!ret && !size)
 		ret = malloc(1);
@@ -55,9 +63,16 @@ void *xmalloc(size_t size)
 		ret = malloc(size);
 		if (!ret && !size)
 			ret = malloc(1);
-		if (!ret)
-			die("Out of memory, malloc failed (tried to allocate %lu bytes)",
-			    (unsigned long)size);
+		if (!ret) {
+			if (!gentle)
+				die("Out of memory, malloc failed (tried to allocate %lu bytes)",
+				    (unsigned long)size);
+			else {
+				error("Out of memory, malloc failed (tried to allocate %lu bytes)",
+				      (unsigned long)size);
+				return NULL;
+			}
+		}
 	}
 #ifdef XMALLOC_POISON
 	memset(ret, 0xA5, size);
@@ -65,16 +80,42 @@ void *xmalloc(size_t size)
 	return ret;
 }
 
-void *xmallocz(size_t size)
+void *xmalloc(size_t size)
+{
+	return do_xmalloc(size, 0);
+}
+
+void *xmalloc_gentle(size_t size)
+{
+	return do_xmalloc(size, 1);
+}
+
+static void *do_xmallocz(size_t size, int gentle)
 {
 	void *ret;
-	if (unsigned_add_overflows(size, 1))
-		die("Data too large to fit into virtual memory space.");
-	ret = xmalloc(size + 1);
-	((char*)ret)[size] = 0;
+	if (unsigned_add_overflows(size, 1)) {
+		if (gentle) {
+			error("Data too large to fit into virtual memory space.");
+			return NULL;
+		} else
+			die("Data too large to fit into virtual memory space.");
+	}
+	ret = do_xmalloc(size + 1, gentle);
+	if (ret)
+		((char*)ret)[size] = 0;
 	return ret;
 }
 
+void *xmallocz(size_t size)
+{
+	return do_xmallocz(size, 0);
+}
+
+void *xmallocz_gentle(size_t size)
+{
+	return do_xmallocz(size, 1);
+}
+
 /*
  * xmemdupz() allocates (len + 1) bytes of memory, duplicates "len" bytes of
  * "data" to the allocated memory, zero terminates the allocated memory,
@@ -96,7 +137,7 @@ void *xrealloc(void *ptr, size_t size)
 {
 	void *ret;
 
-	memory_limit_check(size);
+	memory_limit_check(size, 0);
 	ret = realloc(ptr, size);
 	if (!ret && !size)
 		ret = realloc(ptr, 1);
@@ -115,7 +156,7 @@ void *xcalloc(size_t nmemb, size_t size)
 {
 	void *ret;
 
-	memory_limit_check(size * nmemb);
+	memory_limit_check(size * nmemb, 0);
 	ret = calloc(nmemb, size);
 	if (!ret && (!nmemb || !size))
 		ret = calloc(1, 1);
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 2/6] sha1_file.c: do not die failing to malloc in unpack_compressed_entry
  2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
  2014-08-13 10:57         ` [PATCH v3 1/6] wrapper.c: introduce gentle xmalloc(z) that does not die() Nguyễn Thái Ngọc Duy
@ 2014-08-13 10:57         ` Nguyễn Thái Ngọc Duy
  2014-08-13 21:13           ` Junio C Hamano
  2014-08-13 10:57         ` [PATCH v3 3/6] unpack-objects: continue when fail to malloc due to large objects Nguyễn Thái Ngọc Duy
                           ` (4 subsequent siblings)
  6 siblings, 1 reply; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-13 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Fewer die() gives better control to the caller, provided that the
caller _can_ handle it. And in unpack_compressed_entry() case, it can,
because unpack_compressed_entry() already returns NULL if it fails to
inflate data.

A side effect from this is fsck continues to run when very large blobs
are present (and do not fit in memory).

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 sha1_file.c      | 4 +++-
 t/t1050-large.sh | 6 ++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..330862b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1923,7 +1923,9 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	git_zstream stream;
 	unsigned char *buffer, *in;
 
-	buffer = xmallocz(size);
+	buffer = xmallocz_gentle(size);
+	if (!buffer)
+		return NULL;
 	memset(&stream, 0, sizeof(stream));
 	stream.next_out = buffer;
 	stream.avail_out = size + 1;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index aea4936..5642f84 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -163,4 +163,10 @@ test_expect_success 'zip achiving, deflate' '
 	git archive --format=zip HEAD >/dev/null
 '
 
+test_expect_success 'fsck' '
+	test_must_fail git fsck 2>err &&
+	n=$(grep "error: attempting to allocate .* over limit" err | wc -l) &&
+	test "$n" -gt 1
+'
+
 test_done
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 3/6] unpack-objects: continue when fail to malloc due to large objects
  2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
  2014-08-13 10:57         ` [PATCH v3 1/6] wrapper.c: introduce gentle xmalloc(z) that does not die() Nguyễn Thái Ngọc Duy
  2014-08-13 10:57         ` [PATCH v3 2/6] sha1_file.c: do not die failing to malloc in unpack_compressed_entry Nguyễn Thái Ngọc Duy
@ 2014-08-13 10:57         ` Nguyễn Thái Ngọc Duy
  2014-08-14 16:58           ` Junio C Hamano
  2014-08-13 10:57         ` [PATCH v3 4/6] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
                           ` (3 subsequent siblings)
  6 siblings, 1 reply; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-13 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

As a recovery tool, unpack-objects should go on unpacking as many
objects as it can.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/unpack-objects.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 t/t1050-large.sh         |  7 +++++++
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 99cde45..8b5c67e 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -88,10 +88,50 @@ static void use(int bytes)
 	consumed_bytes += bytes;
 }
 
+static void inflate_and_throw_away(unsigned long size)
+{
+	git_zstream stream;
+	char buf[8192];
+
+	memset(&stream, 0, sizeof(stream));
+	stream.next_out = (unsigned char *)buf;
+	stream.avail_out = sizeof(buf);
+	stream.next_in = fill(1);
+	stream.avail_in = len;
+	git_inflate_init(&stream);
+
+	for (;;) {
+		int ret = git_inflate(&stream, 0);
+		use(len - stream.avail_in);
+		if (stream.total_out == size && ret == Z_STREAM_END)
+			break;
+		if (ret != Z_OK) {
+			error("inflate returned %d", ret);
+			if (!recover)
+				exit(1);
+			has_errors = 1;
+			break;
+		}
+		stream.next_out = (unsigned char *)buf;
+		stream.avail_out = sizeof(buf);
+		stream.next_in = fill(1);
+		stream.avail_in = len;
+	}
+	git_inflate_end(&stream);
+}
+
 static void *get_data(unsigned long size)
 {
 	git_zstream stream;
-	void *buf = xmalloc(size);
+	void *buf = xmalloc_gentle(size);
+
+	if (!buf) {
+		if (!recover)
+			exit(1);
+		has_errors = 1;
+		inflate_and_throw_away(size);
+		return NULL;
+	}
 
 	memset(&stream, 0, sizeof(stream));
 
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 5642f84..eec2cca 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -169,4 +169,11 @@ test_expect_success 'fsck' '
 	test "$n" -gt 1
 '
 
+test_expect_success 'unpack-objects' '
+	P=`ls .git/objects/pack/*.pack` &&
+	git unpack-objects -n -r <$P 2>err
+	test $? = 1 &&
+	grep "error: attempting to allocate .* over limit" err
+'
+
 test_done
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 4/6] diff.c: allow to pass more flags to diff_populate_filespec
  2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
                           ` (2 preceding siblings ...)
  2014-08-13 10:57         ` [PATCH v3 3/6] unpack-objects: continue when fail to malloc due to large objects Nguyễn Thái Ngọc Duy
@ 2014-08-13 10:57         ` Nguyễn Thái Ngọc Duy
  2014-08-13 10:57         ` [PATCH v3 5/6] diff --stat: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-13 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff.c            | 13 +++++++------
 diffcore-rename.c |  6 ++++--
 diffcore.h        |  3 ++-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/diff.c b/diff.c
index 867f034..f4b7421 100644
--- a/diff.c
+++ b/diff.c
@@ -376,7 +376,7 @@ static unsigned long diff_filespec_size(struct diff_filespec *one)
 {
 	if (!DIFF_FILE_VALID(one))
 		return 0;
-	diff_populate_filespec(one, 1);
+	diff_populate_filespec(one, CHECK_SIZE_ONLY);
 	return one->size;
 }
 
@@ -1910,11 +1910,11 @@ static void show_dirstat(struct diff_options *options)
 			diff_free_filespec_data(p->one);
 			diff_free_filespec_data(p->two);
 		} else if (DIFF_FILE_VALID(p->one)) {
-			diff_populate_filespec(p->one, 1);
+			diff_populate_filespec(p->one, CHECK_SIZE_ONLY);
 			copied = added = 0;
 			diff_free_filespec_data(p->one);
 		} else if (DIFF_FILE_VALID(p->two)) {
-			diff_populate_filespec(p->two, 1);
+			diff_populate_filespec(p->two, CHECK_SIZE_ONLY);
 			copied = 0;
 			added = p->two->size;
 			diff_free_filespec_data(p->two);
@@ -2668,8 +2668,9 @@ static int diff_populate_gitlink(struct diff_filespec *s, int size_only)
  * grab the data for the blob (or file) for our own in-core comparison.
  * diff_filespec has data and size fields for this purpose.
  */
-int diff_populate_filespec(struct diff_filespec *s, int size_only)
+int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 {
+	int size_only = flags & CHECK_SIZE_ONLY;
 	int err = 0;
 	/*
 	 * demote FAIL to WARN to allow inspecting the situation
@@ -4688,8 +4689,8 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	    !DIFF_FILE_VALID(p->two) ||
 	    (p->one->sha1_valid && p->two->sha1_valid) ||
 	    (p->one->mode != p->two->mode) ||
-	    diff_populate_filespec(p->one, 1) ||
-	    diff_populate_filespec(p->two, 1) ||
+	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
+	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
 	    (p->one->size != p->two->size) ||
 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
 		p->skip_stat_unmatch_result = 1;
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 2e44a37..4e132f1 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -147,9 +147,11 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * is a possible size - we really should have a flag to
 	 * say whether the size is valid or not!)
 	 */
-	if (!src->cnt_data && diff_populate_filespec(src, 1))
+	if (!src->cnt_data &&
+	    diff_populate_filespec(src, CHECK_SIZE_ONLY))
 		return 0;
-	if (!dst->cnt_data && diff_populate_filespec(dst, 1))
+	if (!dst->cnt_data &&
+	    diff_populate_filespec(dst, CHECK_SIZE_ONLY))
 		return 0;
 
 	max_size = ((src->size > dst->size) ? src->size : dst->size);
diff --git a/diffcore.h b/diffcore.h
index c876dac..c80df18 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -55,7 +55,8 @@ extern void free_filespec(struct diff_filespec *);
 extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 			  int, unsigned short);
 
-extern int diff_populate_filespec(struct diff_filespec *, int);
+#define CHECK_SIZE_ONLY 1
+extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
 extern int diff_filespec_is_binary(struct diff_filespec *);
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 5/6] diff --stat: mark any file larger than core.bigfilethreshold binary
  2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
                           ` (3 preceding siblings ...)
  2014-08-13 10:57         ` [PATCH v3 4/6] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
@ 2014-08-13 10:57         ` Nguyễn Thái Ngọc Duy
  2014-08-13 19:32           ` Eric Sunshine
  2014-08-13 10:57         ` [PATCH v3 6/6] diff: shortcut for diff'ing two binary SHA-1 objects Nguyễn Thái Ngọc Duy
  2014-08-16  3:08         ` [PATCH v4 0/5] Large file improvements Nguyễn Thái Ngọc Duy
  6 siblings, 1 reply; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-13 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Too large files may lead to failure to allocate memory. If it happens
here, it could impact quite a few commands that involve
diff. Moreover, too large files are inefficient to compare anyway (and
most likely non-text), so mark them binary and skip looking at their
content.

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt        |  3 ++-
 Documentation/gitattributes.txt |  4 ++--
 diff.c                          | 26 ++++++++++++++++++--------
 diffcore.h                      |  1 +
 t/t1050-large.sh                |  4 ++++
 5 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c55c22a..53df40e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -499,7 +499,8 @@ core.bigFileThreshold::
 	Files larger than this size are stored deflated, without
 	attempting delta compression.  Storing large files without
 	delta compression avoids excessive memory usage, at the
-	slight expense of increased disk usage.
+	slight expense of increased disk usage. Additionally files
+	larger than this size are allways treated as binary.
 +
 Default is 512 MiB on all platforms.  This should be reasonable
 for most projects as source code and other text files can still
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 643c1ba..9b45bda 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -440,8 +440,8 @@ Unspecified::
 
 	A path to which the `diff` attribute is unspecified
 	first gets its contents inspected, and if it looks like
-	text, it is treated as text.  Otherwise it would
-	generate `Binary files differ`.
+	text and is smaller than core.bigFileThreshold, it is treated
+	as text. Otherwise it would generate `Binary files differ`.
 
 String::
 
diff --git a/diff.c b/diff.c
index f4b7421..d381a6f 100644
--- a/diff.c
+++ b/diff.c
@@ -2188,8 +2188,8 @@ int diff_filespec_is_binary(struct diff_filespec *one)
 			one->is_binary = one->driver->binary;
 		else {
 			if (!one->data && DIFF_FILE_VALID(one))
-				diff_populate_filespec(one, 0);
-			if (one->data)
+				diff_populate_filespec(one, CHECK_BINARY);
+			if (one->is_binary == -1 && one->data)
 				one->is_binary = buffer_is_binary(one->data,
 						one->size);
 			if (one->is_binary == -1)
@@ -2725,6 +2725,11 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 		}
 		if (size_only)
 			return 0;
+		if ((flags & CHECK_BINARY) &&
+		    s->size > big_file_threshold && s->is_binary == -1) {
+			s->is_binary = 1;
+			return 0;
+		}
 		fd = open(s->path, O_RDONLY);
 		if (fd < 0)
 			goto err_empty;
@@ -2746,16 +2751,21 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 	}
 	else {
 		enum object_type type;
-		if (size_only) {
+		if (size_only || (flags & CHECK_BINARY)) {
 			type = sha1_object_info(s->sha1, &s->size);
 			if (type < 0)
 				die("unable to read %s", sha1_to_hex(s->sha1));
-		} else {
-			s->data = read_sha1_file(s->sha1, &type, &s->size);
-			if (!s->data)
-				die("unable to read %s", sha1_to_hex(s->sha1));
-			s->should_free = 1;
+			if (size_only)
+				return 0;
+			if (s->size > big_file_threshold && s->is_binary == -1) {
+				s->is_binary = 1;
+				return 0;
+			}
 		}
+		s->data = read_sha1_file(s->sha1, &type, &s->size);
+		if (!s->data)
+			die("unable to read %s", sha1_to_hex(s->sha1));
+		s->should_free = 1;
 	}
 	return 0;
 }
diff --git a/diffcore.h b/diffcore.h
index c80df18..33ea2de 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -56,6 +56,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 			  int, unsigned short);
 
 #define CHECK_SIZE_ONLY 1
+#define CHECK_BINARY    2
 extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index eec2cca..711f22c 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -112,6 +112,10 @@ test_expect_success 'diff --raw' '
 	git diff --raw HEAD^
 '
 
+test_expect_success 'diff --stat' '
+	git diff --stat HEAD^ HEAD
+'
+
 test_expect_success 'hash-object' '
 	git hash-object large1
 '
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 6/6] diff: shortcut for diff'ing two binary SHA-1 objects
  2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
                           ` (4 preceding siblings ...)
  2014-08-13 10:57         ` [PATCH v3 5/6] diff --stat: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
@ 2014-08-13 10:57         ` Nguyễn Thái Ngọc Duy
  2014-08-14 17:00           ` Junio C Hamano
  2014-08-14 17:17           ` Junio C Hamano
  2014-08-16  3:08         ` [PATCH v4 0/5] Large file improvements Nguyễn Thái Ngọc Duy
  6 siblings, 2 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-13 10:57 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

If we are given two SHA-1 and asked to determine if they are different
(but not _what_ differences), we know right away by comparing SHA-1.

A side effect of this patch is, because large files are marked binary,
diff-tree will not need to unpack them. 'diff-index --cached' will not
either. But 'diff-files' still does.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff.c           | 13 +++++++++++++
 t/t1050-large.sh |  8 ++++++++
 2 files changed, 21 insertions(+)

diff --git a/diff.c b/diff.c
index d381a6f..b85bcfb 100644
--- a/diff.c
+++ b/diff.c
@@ -2324,6 +2324,19 @@ static void builtin_diff(const char *name_a,
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
 	    ( (!textconv_one && diff_filespec_is_binary(one)) ||
 	      (!textconv_two && diff_filespec_is_binary(two)) )) {
+		if (!one->data && !two->data &&
+		    S_ISREG(one->mode) && S_ISREG(two->mode) &&
+		    !DIFF_OPT_TST(o, BINARY)) {
+			if (!hashcmp(one->sha1, two->sha1)) {
+				if (must_show_header)
+					fprintf(o->file, "%s", header.buf);
+				goto free_ab_and_return;
+			}
+			fprintf(o->file, "%s", header.buf);
+			fprintf(o->file, "%sBinary files %s and %s differ\n",
+				line_prefix, lbl[0], lbl[1]);
+			goto free_ab_and_return;
+		}
 		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
 			die("unable to read files to diff");
 		/* Quite common confusing case */
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 711f22c..b294963 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -116,6 +116,14 @@ test_expect_success 'diff --stat' '
 	git diff --stat HEAD^ HEAD
 '
 
+test_expect_success 'diff' '
+	git diff HEAD^ HEAD
+'
+
+test_expect_success 'diff --cached' '
+	git diff --cached HEAD^
+'
+
 test_expect_success 'hash-object' '
 	git hash-object large1
 '
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/6] diff --stat: mark any file larger than core.bigfilethreshold binary
  2014-08-13 10:57         ` [PATCH v3 5/6] diff --stat: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
@ 2014-08-13 19:32           ` Eric Sunshine
  0 siblings, 0 replies; 49+ messages in thread
From: Eric Sunshine @ 2014-08-13 19:32 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Git List, Junio C Hamano, Dale R. Worley

On Wed, Aug 13, 2014 at 6:57 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> Too large files may lead to failure to allocate memory. If it happens
> here, it could impact quite a few commands that involve
> diff. Moreover, too large files are inefficient to compare anyway (and
> most likely non-text), so mark them binary and skip looking at their
> content.
>
> Noticed-by: Dale R. Worley <worley@alum.mit.edu>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/config.txt        |  3 ++-
>  Documentation/gitattributes.txt |  4 ++--
>  diff.c                          | 26 ++++++++++++++++++--------
>  diffcore.h                      |  1 +
>  t/t1050-large.sh                |  4 ++++
>  5 files changed, 27 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index c55c22a..53df40e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -499,7 +499,8 @@ core.bigFileThreshold::
>         Files larger than this size are stored deflated, without
>         attempting delta compression.  Storing large files without
>         delta compression avoids excessive memory usage, at the
> -       slight expense of increased disk usage.
> +       slight expense of increased disk usage. Additionally files
> +       larger than this size are allways treated as binary.

s/allways/always/

>  Default is 512 MiB on all platforms.  This should be reasonable
>  for most projects as source code and other text files can still

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 2/6] sha1_file.c: do not die failing to malloc in unpack_compressed_entry
  2014-08-13 10:57         ` [PATCH v3 2/6] sha1_file.c: do not die failing to malloc in unpack_compressed_entry Nguyễn Thái Ngọc Duy
@ 2014-08-13 21:13           ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2014-08-13 21:13 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Git Mailing List, Dale R. Worley

Looks very sensible. Thanks.

On Wed, Aug 13, 2014 at 3:57 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> Fewer die() gives better control to the caller, provided that the
> caller _can_ handle it. And in unpack_compressed_entry() case, it can,
> because unpack_compressed_entry() already returns NULL if it fails to
> inflate data.
>
> A side effect from this is fsck continues to run when very large blobs
> are present (and do not fit in memory).
>
> Noticed-by: Dale R. Worley <worley@alum.mit.edu>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  sha1_file.c      | 4 +++-
>  t/t1050-large.sh | 6 ++++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 3f70b1d..330862b 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1923,7 +1923,9 @@ static void *unpack_compressed_entry(struct packed_git *p,
>         git_zstream stream;
>         unsigned char *buffer, *in;
>
> -       buffer = xmallocz(size);
> +       buffer = xmallocz_gentle(size);
> +       if (!buffer)
> +               return NULL;
>         memset(&stream, 0, sizeof(stream));
>         stream.next_out = buffer;
>         stream.avail_out = size + 1;
> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index aea4936..5642f84 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -163,4 +163,10 @@ test_expect_success 'zip achiving, deflate' '
>         git archive --format=zip HEAD >/dev/null
>  '
>
> +test_expect_success 'fsck' '
> +       test_must_fail git fsck 2>err &&
> +       n=$(grep "error: attempting to allocate .* over limit" err | wc -l) &&
> +       test "$n" -gt 1
> +'
> +
>  test_done
> --
> 2.1.0.rc0.78.gc0d8480
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/6] wrapper.c: introduce gentle xmalloc(z) that does not die()
  2014-08-13 10:57         ` [PATCH v3 1/6] wrapper.c: introduce gentle xmalloc(z) that does not die() Nguyễn Thái Ngọc Duy
@ 2014-08-14 16:38           ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2014-08-14 16:38 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, worley

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---

I think the basic idea is sound.

"git grep -e _gentle -e _gently -e _gentler" hints me that the new
functions are somewhat misnamed, though.

>  git-compat-util.h |  2 ++
>  wrapper.c         | 73 +++++++++++++++++++++++++++++++++++++++++++------------
>  2 files changed, 59 insertions(+), 16 deletions(-)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index f587749..0e541e7 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -592,7 +592,9 @@ extern try_to_free_t set_try_to_free_routine(try_to_free_t);
>  #endif
>  extern char *xstrdup(const char *str);
>  extern void *xmalloc(size_t size);
> +extern void *xmalloc_gentle(size_t size);
>  extern void *xmallocz(size_t size);
> +extern void *xmallocz_gentle(size_t size);
>  extern void *xmemdupz(const void *data, size_t len);
>  extern char *xstrndup(const char *str, size_t len);
>  extern void *xrealloc(void *ptr, size_t size);
> diff --git a/wrapper.c b/wrapper.c
> index bc1bfb8..ad0992a 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -9,16 +9,23 @@ static void do_nothing(size_t size)
>  
>  static void (*try_to_free_routine)(size_t size) = do_nothing;
>  
> -static void memory_limit_check(size_t size)
> +static int memory_limit_check(size_t size, int gentle)
>  {
>  	static int limit = -1;
>  	if (limit == -1) {
>  		const char *env = getenv("GIT_ALLOC_LIMIT");
>  		limit = env ? atoi(env) * 1024 : 0;
>  	}
> -	if (limit && size > limit)
> -		die("attempting to allocate %"PRIuMAX" over limit %d",
> -		    (intmax_t)size, limit);
> +	if (limit && size > limit) {
> +		if (gentle) {
> +			error("attempting to allocate %"PRIuMAX" over limit %d",
> +			      (intmax_t)size, limit);
> +			return -1;
> +		} else
> +			die("attempting to allocate %"PRIuMAX" over limit %d",
> +			    (intmax_t)size, limit);
> +	}
> +	return 0;
>  }
>  
>  try_to_free_t set_try_to_free_routine(try_to_free_t routine)
> @@ -42,11 +49,12 @@ char *xstrdup(const char *str)
>  	return ret;
>  }
>  
> -void *xmalloc(size_t size)
> +static void *do_xmalloc(size_t size, int gentle)
>  {
>  	void *ret;
>  
> -	memory_limit_check(size);
> +	if (memory_limit_check(size, gentle))
> +		return NULL;
>  	ret = malloc(size);
>  	if (!ret && !size)
>  		ret = malloc(1);
> @@ -55,9 +63,16 @@ void *xmalloc(size_t size)
>  		ret = malloc(size);
>  		if (!ret && !size)
>  			ret = malloc(1);
> -		if (!ret)
> -			die("Out of memory, malloc failed (tried to allocate %lu bytes)",
> -			    (unsigned long)size);
> +		if (!ret) {
> +			if (!gentle)
> +				die("Out of memory, malloc failed (tried to allocate %lu bytes)",
> +				    (unsigned long)size);
> +			else {
> +				error("Out of memory, malloc failed (tried to allocate %lu bytes)",
> +				      (unsigned long)size);
> +				return NULL;
> +			}
> +		}
>  	}
>  #ifdef XMALLOC_POISON
>  	memset(ret, 0xA5, size);
> @@ -65,16 +80,42 @@ void *xmalloc(size_t size)
>  	return ret;
>  }
>  
> -void *xmallocz(size_t size)
> +void *xmalloc(size_t size)
> +{
> +	return do_xmalloc(size, 0);
> +}
> +
> +void *xmalloc_gentle(size_t size)
> +{
> +	return do_xmalloc(size, 1);
> +}
> +
> +static void *do_xmallocz(size_t size, int gentle)
>  {
>  	void *ret;
> -	if (unsigned_add_overflows(size, 1))
> -		die("Data too large to fit into virtual memory space.");
> -	ret = xmalloc(size + 1);
> -	((char*)ret)[size] = 0;
> +	if (unsigned_add_overflows(size, 1)) {
> +		if (gentle) {
> +			error("Data too large to fit into virtual memory space.");
> +			return NULL;
> +		} else
> +			die("Data too large to fit into virtual memory space.");
> +	}
> +	ret = do_xmalloc(size + 1, gentle);
> +	if (ret)
> +		((char*)ret)[size] = 0;
>  	return ret;
>  }
>  
> +void *xmallocz(size_t size)
> +{
> +	return do_xmallocz(size, 0);
> +}
> +
> +void *xmallocz_gentle(size_t size)
> +{
> +	return do_xmallocz(size, 1);
> +}
> +
>  /*
>   * xmemdupz() allocates (len + 1) bytes of memory, duplicates "len" bytes of
>   * "data" to the allocated memory, zero terminates the allocated memory,
> @@ -96,7 +137,7 @@ void *xrealloc(void *ptr, size_t size)
>  {
>  	void *ret;
>  
> -	memory_limit_check(size);
> +	memory_limit_check(size, 0);
>  	ret = realloc(ptr, size);
>  	if (!ret && !size)
>  		ret = realloc(ptr, 1);
> @@ -115,7 +156,7 @@ void *xcalloc(size_t nmemb, size_t size)
>  {
>  	void *ret;
>  
> -	memory_limit_check(size * nmemb);
> +	memory_limit_check(size * nmemb, 0);
>  	ret = calloc(nmemb, size);
>  	if (!ret && (!nmemb || !size))
>  		ret = calloc(1, 1);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/6] unpack-objects: continue when fail to malloc due to large objects
  2014-08-13 10:57         ` [PATCH v3 3/6] unpack-objects: continue when fail to malloc due to large objects Nguyễn Thái Ngọc Duy
@ 2014-08-14 16:58           ` Junio C Hamano
  2014-08-15  5:24             ` Duy Nguyen
  0 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2014-08-14 16:58 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, worley

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> As a recovery tool, unpack-objects should go on unpacking as many
> objects as it can.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin/unpack-objects.c | 42 +++++++++++++++++++++++++++++++++++++++++-
>  t/t1050-large.sh         |  7 +++++++
>  2 files changed, 48 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index 99cde45..8b5c67e 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -88,10 +88,50 @@ static void use(int bytes)
>  	consumed_bytes += bytes;
>  }
>  
> +static void inflate_and_throw_away(unsigned long size)
> +{
> +	git_zstream stream;
> +	char buf[8192];
> +
> +	memset(&stream, 0, sizeof(stream));
> +	stream.next_out = (unsigned char *)buf;
> +	stream.avail_out = sizeof(buf);
> +	stream.next_in = fill(1);
> +	stream.avail_in = len;
> +	git_inflate_init(&stream);
> +
> +	for (;;) {
> +		int ret = git_inflate(&stream, 0);
> +		use(len - stream.avail_in);
> +		if (stream.total_out == size && ret == Z_STREAM_END)
> +			break;
> +		if (ret != Z_OK) {
> +			error("inflate returned %d", ret);
> +			if (!recover)
> +				exit(1);
> +			has_errors = 1;
> +			break;
> +		}
> +		stream.next_out = (unsigned char *)buf;
> +		stream.avail_out = sizeof(buf);
> +		stream.next_in = fill(1);
> +		stream.avail_in = len;
> +	}
> +	git_inflate_end(&stream);
> +}

This looks wrong in a few ways.

You already know that we saw an error when you get to this function,
whether you will see an out-of-sync stream in this loop or not, so
there is no reason to copy the assignment to has_errors from the
other function.  You also know 'recover' is true---otherwise you
wouldn't be here.

But more importantly, the basic structure of this loop is the same
as the loop we already have in the only caller of this new function,
not just the regular "zlib produced this much that is not yet the
expected size, go on reading more" and "we are at the end of the
stream with Z_STREAM_END, and we are done", but even to "the stream
is corrupt, we need to exit the loop", they are identical.  Is a
copy-and-paste like this the best we can do to add this "skip to the
end of the current stream"?  We would really want to keep the number
of copies of this loop down; we saw a same bug introduced on the
termination condition multiple times to different copies X-<.

>  static void *get_data(unsigned long size)
>  {
>  	git_zstream stream;
> -	void *buf = xmalloc(size);
> +	void *buf = xmalloc_gentle(size);
> +
> +	if (!buf) {
> +		if (!recover)
> +			exit(1);
> +		has_errors = 1;
> +		inflate_and_throw_away(size);
> +		return NULL;
> +	}
>  
>  	memset(&stream, 0, sizeof(stream));
>  
> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index 5642f84..eec2cca 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -169,4 +169,11 @@ test_expect_success 'fsck' '
>  	test "$n" -gt 1
>  '
>  
> +test_expect_success 'unpack-objects' '
> +	P=`ls .git/objects/pack/*.pack` &&
> +	git unpack-objects -n -r <$P 2>err
> +	test $? = 1 &&
> +	grep "error: attempting to allocate .* over limit" err
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 6/6] diff: shortcut for diff'ing two binary SHA-1 objects
  2014-08-13 10:57         ` [PATCH v3 6/6] diff: shortcut for diff'ing two binary SHA-1 objects Nguyễn Thái Ngọc Duy
@ 2014-08-14 17:00           ` Junio C Hamano
  2014-08-15 12:11             ` Duy Nguyen
  2014-08-14 17:17           ` Junio C Hamano
  1 sibling, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2014-08-14 17:00 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, worley

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index 711f22c..b294963 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -116,6 +116,14 @@ test_expect_success 'diff --stat' '
>  	git diff --stat HEAD^ HEAD
>  '
>  
> +test_expect_success 'diff' '
> +	git diff HEAD^ HEAD
> +'
> +
> +test_expect_success 'diff --cached' '
> +	git diff --cached HEAD^
> +'

What are these checking?  No check for their outcome?

>  test_expect_success 'hash-object' '
>  	git hash-object large1
>  '

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 6/6] diff: shortcut for diff'ing two binary SHA-1 objects
  2014-08-13 10:57         ` [PATCH v3 6/6] diff: shortcut for diff'ing two binary SHA-1 objects Nguyễn Thái Ngọc Duy
  2014-08-14 17:00           ` Junio C Hamano
@ 2014-08-14 17:17           ` Junio C Hamano
  1 sibling, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2014-08-14 17:17 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, worley

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> If we are given two SHA-1 and asked to determine if they are different
> (but not _what_ differences), we know right away by comparing SHA-1.
>
> A side effect of this patch is, because large files are marked binary,
> diff-tree will not need to unpack them. 'diff-index --cached' will not
> either. But 'diff-files' still does.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  diff.c           | 13 +++++++++++++
>  t/t1050-large.sh |  8 ++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/diff.c b/diff.c
> index d381a6f..b85bcfb 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2324,6 +2324,19 @@ static void builtin_diff(const char *name_a,
>  	} else if (!DIFF_OPT_TST(o, TEXT) &&
>  	    ( (!textconv_one && diff_filespec_is_binary(one)) ||
>  	      (!textconv_two && diff_filespec_is_binary(two)) )) {
> +		if (!one->data && !two->data &&
> +		    S_ISREG(one->mode) && S_ISREG(two->mode) &&
> +		    !DIFF_OPT_TST(o, BINARY)) {
> +			if (!hashcmp(one->sha1, two->sha1)) {
> +				if (must_show_header)
> +					fprintf(o->file, "%s", header.buf);
> +				goto free_ab_and_return;
> +			}
> +			fprintf(o->file, "%s", header.buf);
> +			fprintf(o->file, "%sBinary files %s and %s differ\n",
> +				line_prefix, lbl[0], lbl[1]);
> +			goto free_ab_and_return;
> +		}

A tangent.

I think one and two can point at the same object only when this
filepair is involved in rename/copy.  In other words, one and two
with the same <mode,sha1,name> would not be given to this code.  And
must-show-header would be set to true long before we get here in
fill-metainfo in such a case.

I think this new code and the original below which you copied this
one from can probably be simplified.  It already felt wrong to see
two copies of "fprintf(o->file "%s", header.buf)" and now we have
four of them.  Because this is a copy-and-paste of the identical
logic from below, I do not want you to attempt fixing this tangent
in this patch, though.

Thanks.

>  		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
>  			die("unable to read files to diff");
>  		/* Quite common confusing case */
> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index 711f22c..b294963 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -116,6 +116,14 @@ test_expect_success 'diff --stat' '
>  	git diff --stat HEAD^ HEAD
>  '
>  
> +test_expect_success 'diff' '
> +	git diff HEAD^ HEAD
> +'
> +
> +test_expect_success 'diff --cached' '
> +	git diff --cached HEAD^
> +'
> +
>  test_expect_success 'hash-object' '
>  	git hash-object large1
>  '

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/6] unpack-objects: continue when fail to malloc due to large objects
  2014-08-14 16:58           ` Junio C Hamano
@ 2014-08-15  5:24             ` Duy Nguyen
  0 siblings, 0 replies; 49+ messages in thread
From: Duy Nguyen @ 2014-08-15  5:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Dale R. Worley

On Thu, Aug 14, 2014 at 11:58 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> +static void inflate_and_throw_away(unsigned long size)
>> +{
>
> But more importantly, the basic structure of this loop is the same
> as the loop we already have in the only caller of this new function,
> not just the regular "zlib produced this much that is not yet the
> expected size, go on reading more" and "we are at the end of the
> stream with Z_STREAM_END, and we are done", but even to "the stream
> is corrupt, we need to exit the loop", they are identical.  Is a
> copy-and-paste like this the best we can do to add this "skip to the
> end of the current stream"?  We would really want to keep the number
> of copies of this loop down; we saw a same bug introduced on the
> termination condition multiple times to different copies X-<.

I know. I'm the author of one of those bugs. I considered updating the
inflate loop in get_data() to support this throw-away mode but was
afraid I may make the same mistake like in index-pack again. I'll
probably drop this patch and think if I can unify inflate loop (for
other places too, not just unpack-objects).
-- 
Duy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 6/6] diff: shortcut for diff'ing two binary SHA-1 objects
  2014-08-14 17:00           ` Junio C Hamano
@ 2014-08-15 12:11             ` Duy Nguyen
  0 siblings, 0 replies; 49+ messages in thread
From: Duy Nguyen @ 2014-08-15 12:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Dale R. Worley

On Fri, Aug 15, 2014 at 12:00 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
>> index 711f22c..b294963 100755
>> --- a/t/t1050-large.sh
>> +++ b/t/t1050-large.sh
>> @@ -116,6 +116,14 @@ test_expect_success 'diff --stat' '
>>       git diff --stat HEAD^ HEAD
>>  '
>>
>> +test_expect_success 'diff' '
>> +     git diff HEAD^ HEAD
>> +'
>> +
>> +test_expect_success 'diff --cached' '
>> +     git diff --cached HEAD^
>> +'
>
> What are these checking?  No check for their outcome?

The first test in this file set $GIT_ALLOC_LIMIT and these commands
would not succeed without the patch. But yes I can check the "binary
files differ" too.
-- 
Duy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v4 0/5] Large file improvements
  2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
                           ` (5 preceding siblings ...)
  2014-08-13 10:57         ` [PATCH v3 6/6] diff: shortcut for diff'ing two binary SHA-1 objects Nguyễn Thái Ngọc Duy
@ 2014-08-16  3:08         ` Nguyễn Thái Ngọc Duy
  2014-08-16  3:08           ` [PATCH v4 1/5] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
                             ` (4 more replies)
  6 siblings, 5 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-16  3:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Since v3:

 - rename xmallocz_gentle to xmallocz_gently
 - drop the unpack-objects patch
 - fix allways type

Nguyễn Thái Ngọc Duy (5):
  wrapper.c: introduce gentle xmallocz that does not die()
  sha1_file.c: do not die failing to malloc in unpack_compressed_entry
  diff.c: allow to pass more flags to diff_populate_filespec
  diff --stat: mark any file larger than core.bigfilethreshold binary
  diff: shortcut for diff'ing two binary SHA-1 objects

 Documentation/config.txt        |  3 +-
 Documentation/gitattributes.txt |  4 +--
 diff.c                          | 52 ++++++++++++++++++++++---------
 diffcore-rename.c               |  6 ++--
 diffcore.h                      |  4 ++-
 git-compat-util.h               |  1 +
 sha1_file.c                     |  4 ++-
 t/t1050-large.sh                | 20 ++++++++++++
 wrapper.c                       | 68 +++++++++++++++++++++++++++++++----------
 9 files changed, 125 insertions(+), 37 deletions(-)

-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v4 1/5] wrapper.c: introduce gentle xmallocz that does not die()
  2014-08-16  3:08         ` [PATCH v4 0/5] Large file improvements Nguyễn Thái Ngọc Duy
@ 2014-08-16  3:08           ` Nguyễn Thái Ngọc Duy
  2014-08-16  3:08           ` [PATCH v4 2/5] sha1_file.c: do not die failing to malloc in unpack_compressed_entry Nguyễn Thái Ngọc Duy
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-16  3:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 git-compat-util.h |  1 +
 wrapper.c         | 68 ++++++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index f587749..8785fd3 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -593,6 +593,7 @@ extern try_to_free_t set_try_to_free_routine(try_to_free_t);
 extern char *xstrdup(const char *str);
 extern void *xmalloc(size_t size);
 extern void *xmallocz(size_t size);
+extern void *xmallocz_gently(size_t size);
 extern void *xmemdupz(const void *data, size_t len);
 extern char *xstrndup(const char *str, size_t len);
 extern void *xrealloc(void *ptr, size_t size);
diff --git a/wrapper.c b/wrapper.c
index bc1bfb8..dc9c8f4 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -9,16 +9,23 @@ static void do_nothing(size_t size)
 
 static void (*try_to_free_routine)(size_t size) = do_nothing;
 
-static void memory_limit_check(size_t size)
+static int memory_limit_check(size_t size, int gentle)
 {
 	static int limit = -1;
 	if (limit == -1) {
 		const char *env = getenv("GIT_ALLOC_LIMIT");
 		limit = env ? atoi(env) * 1024 : 0;
 	}
-	if (limit && size > limit)
-		die("attempting to allocate %"PRIuMAX" over limit %d",
-		    (intmax_t)size, limit);
+	if (limit && size > limit) {
+		if (gentle) {
+			error("attempting to allocate %"PRIuMAX" over limit %d",
+			      (intmax_t)size, limit);
+			return -1;
+		} else
+			die("attempting to allocate %"PRIuMAX" over limit %d",
+			    (intmax_t)size, limit);
+	}
+	return 0;
 }
 
 try_to_free_t set_try_to_free_routine(try_to_free_t routine)
@@ -42,11 +49,12 @@ char *xstrdup(const char *str)
 	return ret;
 }
 
-void *xmalloc(size_t size)
+static void *do_xmalloc(size_t size, int gentle)
 {
 	void *ret;
 
-	memory_limit_check(size);
+	if (memory_limit_check(size, gentle))
+		return NULL;
 	ret = malloc(size);
 	if (!ret && !size)
 		ret = malloc(1);
@@ -55,9 +63,16 @@ void *xmalloc(size_t size)
 		ret = malloc(size);
 		if (!ret && !size)
 			ret = malloc(1);
-		if (!ret)
-			die("Out of memory, malloc failed (tried to allocate %lu bytes)",
-			    (unsigned long)size);
+		if (!ret) {
+			if (!gentle)
+				die("Out of memory, malloc failed (tried to allocate %lu bytes)",
+				    (unsigned long)size);
+			else {
+				error("Out of memory, malloc failed (tried to allocate %lu bytes)",
+				      (unsigned long)size);
+				return NULL;
+			}
+		}
 	}
 #ifdef XMALLOC_POISON
 	memset(ret, 0xA5, size);
@@ -65,16 +80,37 @@ void *xmalloc(size_t size)
 	return ret;
 }
 
-void *xmallocz(size_t size)
+void *xmalloc(size_t size)
+{
+	return do_xmalloc(size, 0);
+}
+
+static void *do_xmallocz(size_t size, int gentle)
 {
 	void *ret;
-	if (unsigned_add_overflows(size, 1))
-		die("Data too large to fit into virtual memory space.");
-	ret = xmalloc(size + 1);
-	((char*)ret)[size] = 0;
+	if (unsigned_add_overflows(size, 1)) {
+		if (gentle) {
+			error("Data too large to fit into virtual memory space.");
+			return NULL;
+		} else
+			die("Data too large to fit into virtual memory space.");
+	}
+	ret = do_xmalloc(size + 1, gentle);
+	if (ret)
+		((char*)ret)[size] = 0;
 	return ret;
 }
 
+void *xmallocz(size_t size)
+{
+	return do_xmallocz(size, 0);
+}
+
+void *xmallocz_gently(size_t size)
+{
+	return do_xmallocz(size, 1);
+}
+
 /*
  * xmemdupz() allocates (len + 1) bytes of memory, duplicates "len" bytes of
  * "data" to the allocated memory, zero terminates the allocated memory,
@@ -96,7 +132,7 @@ void *xrealloc(void *ptr, size_t size)
 {
 	void *ret;
 
-	memory_limit_check(size);
+	memory_limit_check(size, 0);
 	ret = realloc(ptr, size);
 	if (!ret && !size)
 		ret = realloc(ptr, 1);
@@ -115,7 +151,7 @@ void *xcalloc(size_t nmemb, size_t size)
 {
 	void *ret;
 
-	memory_limit_check(size * nmemb);
+	memory_limit_check(size * nmemb, 0);
 	ret = calloc(nmemb, size);
 	if (!ret && (!nmemb || !size))
 		ret = calloc(1, 1);
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 2/5] sha1_file.c: do not die failing to malloc in unpack_compressed_entry
  2014-08-16  3:08         ` [PATCH v4 0/5] Large file improvements Nguyễn Thái Ngọc Duy
  2014-08-16  3:08           ` [PATCH v4 1/5] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
@ 2014-08-16  3:08           ` Nguyễn Thái Ngọc Duy
  2014-08-16  3:08           ` [PATCH v4 3/5] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-16  3:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Fewer die() gives better control to the caller, provided that the
caller _can_ handle it. And in unpack_compressed_entry() case, it can,
because unpack_compressed_entry() already returns NULL if it fails to
inflate data.

A side effect from this is fsck continues to run when very large blobs
are present (and do not fit in memory).

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 sha1_file.c      | 4 +++-
 t/t1050-large.sh | 6 ++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..8db73f0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1923,7 +1923,9 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	git_zstream stream;
 	unsigned char *buffer, *in;
 
-	buffer = xmallocz(size);
+	buffer = xmallocz_gently(size);
+	if (!buffer)
+		return NULL;
 	memset(&stream, 0, sizeof(stream));
 	stream.next_out = buffer;
 	stream.avail_out = size + 1;
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index aea4936..5642f84 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -163,4 +163,10 @@ test_expect_success 'zip achiving, deflate' '
 	git archive --format=zip HEAD >/dev/null
 '
 
+test_expect_success 'fsck' '
+	test_must_fail git fsck 2>err &&
+	n=$(grep "error: attempting to allocate .* over limit" err | wc -l) &&
+	test "$n" -gt 1
+'
+
 test_done
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 3/5] diff.c: allow to pass more flags to diff_populate_filespec
  2014-08-16  3:08         ` [PATCH v4 0/5] Large file improvements Nguyễn Thái Ngọc Duy
  2014-08-16  3:08           ` [PATCH v4 1/5] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
  2014-08-16  3:08           ` [PATCH v4 2/5] sha1_file.c: do not die failing to malloc in unpack_compressed_entry Nguyễn Thái Ngọc Duy
@ 2014-08-16  3:08           ` Nguyễn Thái Ngọc Duy
  2014-08-16  3:08           ` [PATCH v4 4/5] diff --stat: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
  2014-08-16  3:08           ` [PATCH v4 5/5] diff: shortcut for diff'ing two binary SHA-1 objects Nguyễn Thái Ngọc Duy
  4 siblings, 0 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-16  3:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c            | 13 +++++++------
 diffcore-rename.c |  6 ++++--
 diffcore.h        |  3 ++-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/diff.c b/diff.c
index 867f034..f4b7421 100644
--- a/diff.c
+++ b/diff.c
@@ -376,7 +376,7 @@ static unsigned long diff_filespec_size(struct diff_filespec *one)
 {
 	if (!DIFF_FILE_VALID(one))
 		return 0;
-	diff_populate_filespec(one, 1);
+	diff_populate_filespec(one, CHECK_SIZE_ONLY);
 	return one->size;
 }
 
@@ -1910,11 +1910,11 @@ static void show_dirstat(struct diff_options *options)
 			diff_free_filespec_data(p->one);
 			diff_free_filespec_data(p->two);
 		} else if (DIFF_FILE_VALID(p->one)) {
-			diff_populate_filespec(p->one, 1);
+			diff_populate_filespec(p->one, CHECK_SIZE_ONLY);
 			copied = added = 0;
 			diff_free_filespec_data(p->one);
 		} else if (DIFF_FILE_VALID(p->two)) {
-			diff_populate_filespec(p->two, 1);
+			diff_populate_filespec(p->two, CHECK_SIZE_ONLY);
 			copied = 0;
 			added = p->two->size;
 			diff_free_filespec_data(p->two);
@@ -2668,8 +2668,9 @@ static int diff_populate_gitlink(struct diff_filespec *s, int size_only)
  * grab the data for the blob (or file) for our own in-core comparison.
  * diff_filespec has data and size fields for this purpose.
  */
-int diff_populate_filespec(struct diff_filespec *s, int size_only)
+int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 {
+	int size_only = flags & CHECK_SIZE_ONLY;
 	int err = 0;
 	/*
 	 * demote FAIL to WARN to allow inspecting the situation
@@ -4688,8 +4689,8 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p)
 	    !DIFF_FILE_VALID(p->two) ||
 	    (p->one->sha1_valid && p->two->sha1_valid) ||
 	    (p->one->mode != p->two->mode) ||
-	    diff_populate_filespec(p->one, 1) ||
-	    diff_populate_filespec(p->two, 1) ||
+	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
+	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
 	    (p->one->size != p->two->size) ||
 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
 		p->skip_stat_unmatch_result = 1;
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 2e44a37..4e132f1 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -147,9 +147,11 @@ static int estimate_similarity(struct diff_filespec *src,
 	 * is a possible size - we really should have a flag to
 	 * say whether the size is valid or not!)
 	 */
-	if (!src->cnt_data && diff_populate_filespec(src, 1))
+	if (!src->cnt_data &&
+	    diff_populate_filespec(src, CHECK_SIZE_ONLY))
 		return 0;
-	if (!dst->cnt_data && diff_populate_filespec(dst, 1))
+	if (!dst->cnt_data &&
+	    diff_populate_filespec(dst, CHECK_SIZE_ONLY))
 		return 0;
 
 	max_size = ((src->size > dst->size) ? src->size : dst->size);
diff --git a/diffcore.h b/diffcore.h
index c876dac..c80df18 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -55,7 +55,8 @@ extern void free_filespec(struct diff_filespec *);
 extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 			  int, unsigned short);
 
-extern int diff_populate_filespec(struct diff_filespec *, int);
+#define CHECK_SIZE_ONLY 1
+extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
 extern int diff_filespec_is_binary(struct diff_filespec *);
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 4/5] diff --stat: mark any file larger than core.bigfilethreshold binary
  2014-08-16  3:08         ` [PATCH v4 0/5] Large file improvements Nguyễn Thái Ngọc Duy
                             ` (2 preceding siblings ...)
  2014-08-16  3:08           ` [PATCH v4 3/5] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
@ 2014-08-16  3:08           ` Nguyễn Thái Ngọc Duy
  2014-08-16  3:08           ` [PATCH v4 5/5] diff: shortcut for diff'ing two binary SHA-1 objects Nguyễn Thái Ngọc Duy
  4 siblings, 0 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-16  3:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

Too large files may lead to failure to allocate memory. If it happens
here, it could impact quite a few commands that involve
diff. Moreover, too large files are inefficient to compare anyway (and
most likely non-text), so mark them binary and skip looking at their
content.

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/config.txt        |  3 ++-
 Documentation/gitattributes.txt |  4 ++--
 diff.c                          | 26 ++++++++++++++++++--------
 diffcore.h                      |  1 +
 t/t1050-large.sh                |  4 ++++
 5 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c55c22a..3b5b24a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -499,7 +499,8 @@ core.bigFileThreshold::
 	Files larger than this size are stored deflated, without
 	attempting delta compression.  Storing large files without
 	delta compression avoids excessive memory usage, at the
-	slight expense of increased disk usage.
+	slight expense of increased disk usage. Additionally files
+	larger than this size are always treated as binary.
 +
 Default is 512 MiB on all platforms.  This should be reasonable
 for most projects as source code and other text files can still
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 643c1ba..9b45bda 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -440,8 +440,8 @@ Unspecified::
 
 	A path to which the `diff` attribute is unspecified
 	first gets its contents inspected, and if it looks like
-	text, it is treated as text.  Otherwise it would
-	generate `Binary files differ`.
+	text and is smaller than core.bigFileThreshold, it is treated
+	as text. Otherwise it would generate `Binary files differ`.
 
 String::
 
diff --git a/diff.c b/diff.c
index f4b7421..d381a6f 100644
--- a/diff.c
+++ b/diff.c
@@ -2188,8 +2188,8 @@ int diff_filespec_is_binary(struct diff_filespec *one)
 			one->is_binary = one->driver->binary;
 		else {
 			if (!one->data && DIFF_FILE_VALID(one))
-				diff_populate_filespec(one, 0);
-			if (one->data)
+				diff_populate_filespec(one, CHECK_BINARY);
+			if (one->is_binary == -1 && one->data)
 				one->is_binary = buffer_is_binary(one->data,
 						one->size);
 			if (one->is_binary == -1)
@@ -2725,6 +2725,11 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 		}
 		if (size_only)
 			return 0;
+		if ((flags & CHECK_BINARY) &&
+		    s->size > big_file_threshold && s->is_binary == -1) {
+			s->is_binary = 1;
+			return 0;
+		}
 		fd = open(s->path, O_RDONLY);
 		if (fd < 0)
 			goto err_empty;
@@ -2746,16 +2751,21 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 	}
 	else {
 		enum object_type type;
-		if (size_only) {
+		if (size_only || (flags & CHECK_BINARY)) {
 			type = sha1_object_info(s->sha1, &s->size);
 			if (type < 0)
 				die("unable to read %s", sha1_to_hex(s->sha1));
-		} else {
-			s->data = read_sha1_file(s->sha1, &type, &s->size);
-			if (!s->data)
-				die("unable to read %s", sha1_to_hex(s->sha1));
-			s->should_free = 1;
+			if (size_only)
+				return 0;
+			if (s->size > big_file_threshold && s->is_binary == -1) {
+				s->is_binary = 1;
+				return 0;
+			}
 		}
+		s->data = read_sha1_file(s->sha1, &type, &s->size);
+		if (!s->data)
+			die("unable to read %s", sha1_to_hex(s->sha1));
+		s->should_free = 1;
 	}
 	return 0;
 }
diff --git a/diffcore.h b/diffcore.h
index c80df18..33ea2de 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -56,6 +56,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 			  int, unsigned short);
 
 #define CHECK_SIZE_ONLY 1
+#define CHECK_BINARY    2
 extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 5642f84..00d2f33 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -112,6 +112,10 @@ test_expect_success 'diff --raw' '
 	git diff --raw HEAD^
 '
 
+test_expect_success 'diff --stat' '
+	git diff --stat HEAD^ HEAD
+'
+
 test_expect_success 'hash-object' '
 	git hash-object large1
 '
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 5/5] diff: shortcut for diff'ing two binary SHA-1 objects
  2014-08-16  3:08         ` [PATCH v4 0/5] Large file improvements Nguyễn Thái Ngọc Duy
                             ` (3 preceding siblings ...)
  2014-08-16  3:08           ` [PATCH v4 4/5] diff --stat: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
@ 2014-08-16  3:08           ` Nguyễn Thái Ngọc Duy
  4 siblings, 0 replies; 49+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-08-16  3:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, worley, Nguyễn Thái Ngọc Duy

If we are given two SHA-1 and asked to determine if they are different
(but not _what_ differences), we know right away by comparing SHA-1.

A side effect of this patch is, because large files are marked binary,
diff-tree will not need to unpack them. 'diff-index --cached' will not
either. But 'diff-files' still does.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff.c           | 13 +++++++++++++
 t/t1050-large.sh | 10 ++++++++++
 2 files changed, 23 insertions(+)

diff --git a/diff.c b/diff.c
index d381a6f..b85bcfb 100644
--- a/diff.c
+++ b/diff.c
@@ -2324,6 +2324,19 @@ static void builtin_diff(const char *name_a,
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
 	    ( (!textconv_one && diff_filespec_is_binary(one)) ||
 	      (!textconv_two && diff_filespec_is_binary(two)) )) {
+		if (!one->data && !two->data &&
+		    S_ISREG(one->mode) && S_ISREG(two->mode) &&
+		    !DIFF_OPT_TST(o, BINARY)) {
+			if (!hashcmp(one->sha1, two->sha1)) {
+				if (must_show_header)
+					fprintf(o->file, "%s", header.buf);
+				goto free_ab_and_return;
+			}
+			fprintf(o->file, "%s", header.buf);
+			fprintf(o->file, "%sBinary files %s and %s differ\n",
+				line_prefix, lbl[0], lbl[1]);
+			goto free_ab_and_return;
+		}
 		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
 			die("unable to read files to diff");
 		/* Quite common confusing case */
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 00d2f33..05a1e1d 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -116,6 +116,16 @@ test_expect_success 'diff --stat' '
 	git diff --stat HEAD^ HEAD
 '
 
+test_expect_success 'diff' '
+	git diff HEAD^ HEAD >actual &&
+	grep "Binary files.*differ" actual
+'
+
+test_expect_success 'diff --cached' '
+	git diff --cached HEAD^ >actual &&
+	grep "Binary files.*differ" actual
+'
+
 test_expect_success 'hash-object' '
 	git hash-object large1
 '
-- 
2.1.0.rc0.78.gc0d8480

^ permalink raw reply related	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2014-08-16  3:08 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-27 16:47 Git chokes on large file Dale R. Worley
2014-05-28 13:32 ` Duy Nguyen
2014-05-28 17:10   ` Junio C Hamano
2014-05-28 18:18     ` Dale R. Worley
2014-05-28 18:15   ` Dale R. Worley
2014-05-28 18:23     ` David Lang
2014-05-28 18:47       ` Dale R. Worley
2014-05-28 19:05         ` David Lang
2014-05-29 19:12           ` Dale R. Worley
2014-05-28 18:54       ` Junio C Hamano
2014-05-28 19:09         ` David Lang
2014-05-29 12:57   ` [PATCH 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
2014-05-29 12:57     ` [PATCH 2/4] fsck: do not die when not enough memory to examine a pack entry Nguyễn Thái Ngọc Duy
2014-05-29 12:57     ` [PATCH 3/4] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
2014-05-29 12:57     ` [PATCH 4/4] diff: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
2014-06-19 12:27       ` Thomas Braun
2014-06-23 12:18         ` Duy Nguyen
2014-06-23 19:21           ` Thomas Braun
2014-06-24 11:45     ` [PATCH v2 1/4] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
2014-06-24 11:45       ` [PATCH v2 2/4] fsck: do not die when not enough memory to examine a pack entry Nguyễn Thái Ngọc Duy
2014-06-26 18:09         ` Junio C Hamano
2014-06-29  0:40           ` Duy Nguyen
2014-06-24 11:45       ` [PATCH v2 3/4] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
2014-06-24 11:45       ` [PATCH v2 4/4] diff: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
2014-06-26 17:55         ` Junio C Hamano
2014-06-27 18:56           ` Thomas Braun
2014-06-29  1:11             ` Duy Nguyen
2014-08-13 10:57       ` [PATCH v3 0/6] Large file improvements Nguyễn Thái Ngọc Duy
2014-08-13 10:57         ` [PATCH v3 1/6] wrapper.c: introduce gentle xmalloc(z) that does not die() Nguyễn Thái Ngọc Duy
2014-08-14 16:38           ` Junio C Hamano
2014-08-13 10:57         ` [PATCH v3 2/6] sha1_file.c: do not die failing to malloc in unpack_compressed_entry Nguyễn Thái Ngọc Duy
2014-08-13 21:13           ` Junio C Hamano
2014-08-13 10:57         ` [PATCH v3 3/6] unpack-objects: continue when fail to malloc due to large objects Nguyễn Thái Ngọc Duy
2014-08-14 16:58           ` Junio C Hamano
2014-08-15  5:24             ` Duy Nguyen
2014-08-13 10:57         ` [PATCH v3 4/6] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
2014-08-13 10:57         ` [PATCH v3 5/6] diff --stat: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
2014-08-13 19:32           ` Eric Sunshine
2014-08-13 10:57         ` [PATCH v3 6/6] diff: shortcut for diff'ing two binary SHA-1 objects Nguyễn Thái Ngọc Duy
2014-08-14 17:00           ` Junio C Hamano
2014-08-15 12:11             ` Duy Nguyen
2014-08-14 17:17           ` Junio C Hamano
2014-08-16  3:08         ` [PATCH v4 0/5] Large file improvements Nguyễn Thái Ngọc Duy
2014-08-16  3:08           ` [PATCH v4 1/5] wrapper.c: introduce gentle xmallocz that does not die() Nguyễn Thái Ngọc Duy
2014-08-16  3:08           ` [PATCH v4 2/5] sha1_file.c: do not die failing to malloc in unpack_compressed_entry Nguyễn Thái Ngọc Duy
2014-08-16  3:08           ` [PATCH v4 3/5] diff.c: allow to pass more flags to diff_populate_filespec Nguyễn Thái Ngọc Duy
2014-08-16  3:08           ` [PATCH v4 4/5] diff --stat: mark any file larger than core.bigfilethreshold binary Nguyễn Thái Ngọc Duy
2014-08-16  3:08           ` [PATCH v4 5/5] diff: shortcut for diff'ing two binary SHA-1 objects Nguyễn Thái Ngọc Duy
2014-05-28 15:05 ` Git chokes on large file Thomas Braun

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.