* [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
@ 2010-02-21 4:27 Nicolas Pitre
2010-02-21 19:45 ` Junio C Hamano
0 siblings, 1 reply; 13+ messages in thread
From: Nicolas Pitre @ 2010-02-21 4:27 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
There is no real advantage to mallocing the whole output buffer and
deflating the data in a single pass when writing loose objects. Doing
so is only about 1% faster while using more memory, especially with large
files where the memory usage is far greater. It is better to deflate and
write the data out in small chunks, reusing the same memory, instead.
For example, using 'git add' on a few large files averaging 40 MB ...
Before:
21.45user 1.10system 0:22.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+142640minor)pagefaults 0swaps
After:
21.50user 1.25system 0:22.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+104408minor)pagefaults 0swaps
While the runtime stayed relatively the same, the number of minor page
faults went down significantly.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
---
I think this is worth doing independently of the paranoid mode being
discussed.
diff --git a/sha1_file.c b/sha1_file.c
index 657825e..9196b57 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2281,8 +2281,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
void *buf, unsigned long len, time_t mtime)
{
int fd, ret;
- size_t size;
- unsigned char *compressed;
+ unsigned char compressed[4096];
z_stream stream;
char *filename;
static char tmpfile[PATH_MAX];
@@ -2301,12 +2300,8 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
/* Set it up */
memset(&stream, 0, sizeof(stream));
deflateInit(&stream, zlib_compression_level);
- size = 8 + deflateBound(&stream, len+hdrlen);
- compressed = xmalloc(size);
-
- /* Compress it */
stream.next_out = compressed;
- stream.avail_out = size;
+ stream.avail_out = sizeof(compressed);
/* First header.. */
stream.next_in = (unsigned char *)hdr;
@@ -2317,20 +2312,21 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
/* Then the data itself.. */
stream.next_in = buf;
stream.avail_in = len;
- ret = deflate(&stream, Z_FINISH);
+ do {
+ ret = deflate(&stream, Z_FINISH);
+ if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
+ die("unable to write sha1 file");
+ stream.next_out = compressed;
+ stream.avail_out = sizeof(compressed);
+ } while (ret == Z_OK);
+
if (ret != Z_STREAM_END)
die("unable to deflate new object %s (%d)", sha1_to_hex(sha1), ret);
-
ret = deflateEnd(&stream);
if (ret != Z_OK)
die("deflateEnd on object %s failed (%d)", sha1_to_hex(sha1), ret);
- size = stream.total_out;
-
- if (write_buffer(fd, compressed, size) < 0)
- die("unable to write sha1 file");
close_sha1_file(fd);
- free(compressed);
if (mtime) {
struct utimbuf utb;
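The chunked loop the patch above introduces can be sketched in Python, whose stdlib zlib module mirrors the C zlib API the patch uses. This is an illustrative sketch, not git code; the function name, the `write` callback, and the 4096-byte buffer size are taken from the patch for flavor only.

```python
import zlib

def deflate_chunked(data, write, bufsize=4096):
    # Sketch of the patch's approach: deflate the input in fixed-size
    # pieces and write each compressed piece out as it is produced,
    # instead of allocating deflateBound() bytes for the whole
    # compressed result up front.
    c = zlib.compressobj()
    for i in range(0, len(data), bufsize):
        write(c.compress(data[i:i + bufsize]))
    write(c.flush())  # corresponds to the Z_FINISH loop in the C code
```

Peak memory stays bounded by the chunk size (plus zlib's internal state) regardless of input size, which is exactly the property the commit message's pagefault numbers reflect.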
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-21 4:27 [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects Nicolas Pitre
@ 2010-02-21 19:45 ` Junio C Hamano
2010-02-21 21:26 ` Nicolas Pitre
0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2010-02-21 19:45 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
Nicolas Pitre <nico@fluxnic.net> writes:
> I think this is worth doing independently of the paranoid mode being
> discussed.
While I agree it might be worth doing, I can see that you really hate
"paranoia". Now that your loop lets deflate() decide how much it happens
to consume in a given round, it is much trickier to plug the paranoia in
without majorly rewriting the loop this patch introduces.
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-21 19:45 ` Junio C Hamano
@ 2010-02-21 21:26 ` Nicolas Pitre
2010-02-21 22:22 ` Junio C Hamano
2010-02-21 22:30 ` Junio C Hamano
0 siblings, 2 replies; 13+ messages in thread
From: Nicolas Pitre @ 2010-02-21 21:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, 21 Feb 2010, Junio C Hamano wrote:
> Nicolas Pitre <nico@fluxnic.net> writes:
>
> > I think this is worth doing independently of the paranoid mode being
> > discussed.
>
> While I agree it might be worth doing, I can see that you really hate
> "paranoia". Now that your loop lets deflate() decide how much it happens
> to consume in a given round, it is much trickier to plug the paranoia in
> without majorly rewriting the loop this patch introduces.
I disagree.
Here's my take on the paranoia issue. Now the question is whether or
not this should really be optional. I would think no.
FWIW, we already have that double SHA1 protection when dealing with pack
files with fixup_pack_header_footer() (see commit abeb40e5aa).
---------- >8
From: Nicolas Pitre <nico@fluxnic.net>
Date: Sun, 21 Feb 2010 15:48:06 -0500
Subject: [PATCH] sha1_file: be paranoid when creating loose objects
We don't want the data being deflated and stored into loose objects
to be different from what we expect. While the deflated data is
protected by a CRC which is good enough for safe data retrieval
operations, we still want to be doubly sure that the source data used
at object creation time is still what we expected once that data has
been deflated and its CRC32 computed.
The most plausible data corruption may occur if the source file is
modified while Git is deflating and writing it out in a loose object.
Or Git itself could have a bug causing memory corruption. Or even bad
RAM could cause trouble. So it is best to make sure everything is
coherent and checksum protected from beginning to end.
To do so we compute the SHA1 of the data being deflated _after_ the
deflate operation has consumed that data, and make sure it matches
with the expected SHA1. This way we can rely on the CRC32 checked by
the inflate operation to provide a good indication that the data is still
coherent with its SHA1 hash.
There is some overhead of course. Using 'git add' on a set of large files:
Before:
real 0m25.210s
user 0m23.783s
sys 0m1.408s
After:
real 0m26.537s
user 0m25.175s
sys 0m1.358s
The overhead is around 5% for full data coherency guarantee.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
diff --git a/sha1_file.c b/sha1_file.c
index 9196b57..c0214d7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2283,6 +2283,8 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
int fd, ret;
unsigned char compressed[4096];
z_stream stream;
+ git_SHA_CTX c;
+ unsigned char parano_sha1[20];
char *filename;
static char tmpfile[PATH_MAX];
@@ -2302,18 +2304,22 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
deflateInit(&stream, zlib_compression_level);
stream.next_out = compressed;
stream.avail_out = sizeof(compressed);
+ git_SHA1_Init(&c);
/* First header.. */
stream.next_in = (unsigned char *)hdr;
stream.avail_in = hdrlen;
while (deflate(&stream, 0) == Z_OK)
/* nothing */;
+ git_SHA1_Update(&c, hdr, hdrlen);
/* Then the data itself.. */
stream.next_in = buf;
stream.avail_in = len;
do {
+ unsigned char *in0 = stream.next_in;
ret = deflate(&stream, Z_FINISH);
+ git_SHA1_Update(&c, in0, stream.next_in - in0);
if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
die("unable to write sha1 file");
stream.next_out = compressed;
@@ -2325,6 +2331,9 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
ret = deflateEnd(&stream);
if (ret != Z_OK)
die("deflateEnd on object %s failed (%d)", sha1_to_hex(sha1), ret);
+ git_SHA1_Final(parano_sha1, &c);
+ if (hashcmp(sha1, parano_sha1) != 0)
+ die("confused by unstable object source data for %s", sha1_to_hex(sha1));
close_sha1_file(fd);
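The interleaved hashing in the patch above can be sketched in Python (whose zlib and hashlib modules mirror the C APIs involved). Names and the `write` callback are illustrative; the real C code tracks consumed input via stream.next_in bookkeeping rather than explicit chunking.

```python
import hashlib
import zlib

def deflate_with_paranoia(hdr, data, expected_hex, write, bufsize=4096):
    # Sketch of the paranoia check: each input chunk is hashed only
    # AFTER deflate has consumed it, and the final digest must match
    # the object name that was computed earlier from the same buffer.
    c = zlib.compressobj()
    h = hashlib.sha1()
    for buf in (hdr, data):
        for i in range(0, len(buf), bufsize):
            chunk = buf[i:i + bufsize]
            write(c.compress(chunk))  # deflate consumes the chunk first...
            h.update(chunk)           # ...then the same bytes are hashed
    write(c.flush())
    if h.hexdigest() != expected_hex:
        raise ValueError("confused by unstable object source data")
```

If the source buffer changed between the naming hash and this loop, the two digests disagree and the write aborts, which is the whole point of the patch.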
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-21 21:26 ` Nicolas Pitre
@ 2010-02-21 22:22 ` Junio C Hamano
2010-02-21 22:30 ` Junio C Hamano
1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2010-02-21 22:22 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
Nicolas Pitre <nico@fluxnic.net> writes:
> I disagree.
>
> Here's my take on the paranoia issue.
Ahh, yes, of course.
You are always a better programmer than I am and I keep getting reminded.
Thanks, and I agree it is a sane thing to do this unconditionally.
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-21 21:26 ` Nicolas Pitre
2010-02-21 22:22 ` Junio C Hamano
@ 2010-02-21 22:30 ` Junio C Hamano
2010-02-22 1:35 ` Nicolas Pitre
1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2010-02-21 22:30 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
Nicolas Pitre <nico@fluxnic.net> writes:
> /* Then the data itself.. */
> stream.next_in = buf;
> stream.avail_in = len;
> do {
> + unsigned char *in0 = stream.next_in;
> ret = deflate(&stream, Z_FINISH);
> + git_SHA1_Update(&c, in0, stream.next_in - in0);
Actually, I have to take my earlier comment back. This is not "paranoia".
I do not see anything that protects the memory area between in0 and
stream.next_in from getting modified while deflate() nor SHA1_Update() run
from the outside. Unless you copy the data away to somewhere stable at
the beginning of each iteration of this loop and run deflate() and
SHA1_Update(), you cannot have "paranoia".
My comment about "trickier" is about determining the size of that buffer
used as "somewhere stable".
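The "somewhere stable" alternative Junio describes can be sketched as follows (illustrative Python, not a proposed implementation): copy each input chunk into a private buffer first, so deflate and SHA-1 are guaranteed to see identical bytes even if an external writer modifies the source concurrently. The cost is an extra copy per chunk.

```python
import hashlib
import zlib

def deflate_via_stable_copy(data, write, bufsize=4096):
    # Copy each chunk to a private buffer before feeding it to BOTH
    # deflate and SHA-1, so a concurrent writer cannot desynchronize
    # what was compressed from what was hashed.
    c = zlib.compressobj()
    h = hashlib.sha1()
    for i in range(0, len(data), bufsize):
        stable = bytes(data[i:i + bufsize])  # private copy of the chunk
        write(c.compress(stable))
        h.update(stable)
    write(c.flush())
    return h.hexdigest()
```

This closes even the change-and-change-back window within one iteration, at the price of the memcpy per chunk that the rest of the thread argues is not worth paying.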
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-21 22:30 ` Junio C Hamano
@ 2010-02-22 1:35 ` Nicolas Pitre
2010-02-22 5:30 ` Junio C Hamano
0 siblings, 1 reply; 13+ messages in thread
From: Nicolas Pitre @ 2010-02-22 1:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, 21 Feb 2010, Junio C Hamano wrote:
> Nicolas Pitre <nico@fluxnic.net> writes:
>
> > /* Then the data itself.. */
> > stream.next_in = buf;
> > stream.avail_in = len;
> > do {
> > + unsigned char *in0 = stream.next_in;
> > ret = deflate(&stream, Z_FINISH);
> > + git_SHA1_Update(&c, in0, stream.next_in - in0);
>
> Actually, I have to take my earlier comment back. This is not "paranoia".
>
> I do not see anything that protects the memory area between in0 and
> stream.next_in from getting modified while deflate() nor SHA1_Update() run
> from the outside.
So what?
> Unless you copy the data away to somewhere stable at
> the beginning of each iteration of this loop and run deflate() and
> SHA1_Update(), you cannot have "paranoia".
No.
The whole point is to detect data incoherencies.
So current sequence of events is as follows:
T0 write_sha1_file_prepare() is called
T1 start initial SHA1 computation on data buffer
T2 in the middle of initial SHA1 computation
T3 end of initial SHA1 computation -> object name is determined
T4 write_loose_object() is called
... enter the write loop
T5+n deflate() called on buffer n
T6+n git_SHA1_Update() called on the same buffer n
T7+n deflated data written out
...
Tend abort if result of T6+n doesn't match object name from T3
So... what can happen:
1) Data is externally modified before T5+n: deflated data and its CRC32
will be coherent with the SHA1 computed in T6+n, but incoherent with
the SHA1 used for the object name. Wrong data is written to the
object even if it will inflate OK. We really want to prevent that
from happening. The test in Tend will fail.
2) Data is externally modified between T5+n and T6+n: the deflated data
and CRC32 will be coherent with the object name but incoherent with
the parano_sha1. Although the written data will be OK, this is way too
close to being wrong, and the test in Tend will fail. If there is
more than one round into the loop and the external modifications are
large enough then this becomes the same as case 1 above.
3) Data is externally modified in T2: again the test in Tend will fail.
So in all possible cases I can think of, the write will abort. No copy
buffer needed, no filesystem mtime required, etc. If the whole data is
not stable between T1 and Tend then the object is not added to the
repository. Of course it is possible for the data to be modified at the
beginning of the file while the loop in T[5-7] is already past that point.
But still, there is no data inconsistency at that point.
> My comment about "trickier" is about determining the size of that buffer
> used as "somewhere stable".
We don't care about such buffer.
Nicolas
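Case 1 above can be checked concretely. The sketch below is illustrative single-threaded Python: the "external modification" is simulated by mutating the buffer between the T1-T3 naming hash and the T5/T6 deflate-and-hash step.

```python
import hashlib
import zlib

data = bytearray(b"stable content " * 64)
object_name = hashlib.sha1(bytes(data)).hexdigest()  # T1-T3

data[0:8] = b"TAMPERED"  # external modification before T5+n

c = zlib.compressobj()
h = hashlib.sha1()
deflated = c.compress(bytes(data))  # T5+n: deflate consumes the buffer
h.update(bytes(data))               # T6+n: hashes what deflate consumed
deflated += c.flush()

# Tend: the paranoia hash disagrees with the object name, so the
# write would abort before the wrong bytes become a loose object.
assert h.hexdigest() != object_name
```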
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-22 1:35 ` Nicolas Pitre
@ 2010-02-22 5:30 ` Junio C Hamano
2010-02-22 5:50 ` Nicolas Pitre
0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2010-02-22 5:30 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git
Nicolas Pitre <nico@fluxnic.net> writes:
> The whole point is to detect data incoherencies.
Yes. We want to make sure that the SHA-1 we compute is over what we fed
deflate().
> So current sequence of events is as follows:
>
> T0 write_sha1_file_prepare() is called
> T1 start initial SHA1 computation on data buffer
> T2 in the middle of initial SHA1 computation
> T3 end of initial SHA1 computation -> object name is determined
> T4 write_loose_object() is called
> ... enter the write loop
> T5+n deflate() called on buffer n
> T6+n git_SHA1_Update() called on the same buffer n
> T7+n deflated data written out
> ...
> Tend abort if result of T6+n doesn't match object name from T3
>
> So... what can happen:
>
> 1) Data is externally modified before T5+n: deflated data and its CRC32
> will be coherent with the SHA1 computed in T6+n, but incoherent with
> the SHA1 used for the object name. Wrong data is written to the
> object even if it will inflate OK. We really want to prevent that
> from happening. The test in Tend will fail.
>
> 2) Data is externally modified between T5+n and T6+n: the deflated data
> and CRC32 will be coherent with the object name but incoherent with
> the parano_sha1. Although the written data will be OK, this is way too
> close to being wrong, and the test in Tend will fail. If there is
> more than one round into the loop and the external modifications are
> large enough then this becomes the same as case 1 above.
>
> 3) Data is externally modified in T2: again the test in Tend will fail.
>
> So in all possible cases I can think of, the write will abort.
There is one pathological case.
Immediately before T5+n (or between T5+n and T6+n), the external process
changes the data deflate() is working on, but before T6+n, the external
process changes the data back. The two SHA-1s computed may match, but
neither is a hash over what deflate() actually consumed; you won't be able
to abort.
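The change-and-change-back race can be simulated deterministically (illustrative Python; the "race" is driven by hand rather than by a concurrent writer): deflate consumes tampered bytes, the buffer is restored before the paranoia hash runs, and both SHA-1s agree even though the stored object is wrong.

```python
import hashlib
import zlib

data = b"stable content " * 64
object_name = hashlib.sha1(data).hexdigest()  # T1-T3

tampered = b"TAMPERED" + data[8:]             # flipped just before T5+n
deflated = zlib.compress(tampered)            # deflate sees the bad bytes
# ...and flipped back before T6+n, so the paranoia hash sees good bytes:
parano = hashlib.sha1(data).hexdigest()

assert parano == object_name                  # the check cannot abort...
assert zlib.decompress(deflated) != data      # ...yet the object is wrong
```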
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-22 5:30 ` Junio C Hamano
@ 2010-02-22 5:50 ` Nicolas Pitre
2010-02-22 6:17 ` Junio C Hamano
2010-02-22 6:27 ` Dmitry Potapov
0 siblings, 2 replies; 13+ messages in thread
From: Nicolas Pitre @ 2010-02-22 5:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, 21 Feb 2010, Junio C Hamano wrote:
> Nicolas Pitre <nico@fluxnic.net> writes:
>
> > The whole point is to detect data incoherencies.
>
> Yes. We want to make sure that the SHA-1 we compute is over what we fed
> deflate().
>
> > So current sequence of events is as follows:
> >
> > T0 write_sha1_file_prepare() is called
> > T1 start initial SHA1 computation on data buffer
> > T2 in the middle of initial SHA1 computation
> > T3 end of initial SHA1 computation -> object name is determined
> > T4 write_loose_object() is called
> > ... enter the write loop
> > T5+n deflate() called on buffer n
> > T6+n git_SHA1_Update() called on the same buffer n
> > T7+n deflated data written out
> > ...
> > Tend abort if result of T6+n doesn't match object name from T3
> >
> > So... what can happen:
> >
> > 1) Data is externally modified before T5+n: deflated data and its CRC32
> > will be coherent with the SHA1 computed in T6+n, but incoherent with
> > the SHA1 used for the object name. Wrong data is written to the
> > object even if it will inflate OK. We really want to prevent that
> > from happening. The test in Tend will fail.
> >
> > 2) Data is externally modified between T5+n and T6+n: the deflated data
> > and CRC32 will be coherent with the object name but incoherent with
> > the parano_sha1. Although the written data will be OK, this is way too
> > close to being wrong, and the test in Tend will fail. If there is
> > more than one round into the loop and the external modifications are
> > large enough then this becomes the same as case 1 above.
> >
> > 3) Data is externally modified in T2: again the test in Tend will fail.
> >
> > So in all possible cases I can think of, the write will abort.
>
> There is one pathological case.
>
> Immediately before T5+n (or between T5+n and T6+n), the external process
> changes the data deflate() is working on, but before T6+n, the external
> process changes the data back. The two SHA-1s computed may match, but
> neither is a hash over what deflate() actually consumed; you won't be
> able to abort.
And what real life case would trigger this? Given the size of the
window for this to happen, what are your chances?
Of course the odds for me to be struck by lightning also exist. And if
I work really really hard at it then I might be able to trigger that
pathological case above even before the next thunderstorm. But in
practice I'm hardly concerned by either of those possibilities.
Nicolas
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-22 5:50 ` Nicolas Pitre
@ 2010-02-22 6:17 ` Junio C Hamano
2010-02-22 6:31 ` Junio C Hamano
2010-02-22 6:27 ` Dmitry Potapov
1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2010-02-22 6:17 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
Nicolas Pitre <nico@fluxnic.net> writes:
> And what real life case would trigger this? Given the size of the
> window for this to happen, what are your chances?
> Of course the odds for me to be struck by lightning also exist. And if
> I work really really hard at it then I might be able to trigger that
> pathological case above even before the next thunderstorm. But in
> practice I'm hardly concerned by either of those possibilities.
The chance of any of this triggering in real life is, for me, zero, as I
won't be mistreating git as a continuous & asynchronous back-up tool.
But then that would make the whole discussion moot. There are people who
file "bug reports" with an artificial reproduction recipe built around a
loop that runs dd continuously overwriting a file while "git add" is asked
to add it.
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-22 5:50 ` Nicolas Pitre
2010-02-22 6:17 ` Junio C Hamano
@ 2010-02-22 6:27 ` Dmitry Potapov
1 sibling, 0 replies; 13+ messages in thread
From: Dmitry Potapov @ 2010-02-22 6:27 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git
On Mon, Feb 22, 2010 at 12:50:18AM -0500, Nicolas Pitre wrote:
>
> And what real life case would trigger this? Given the size of the
> window for this to happen, what are your chances?
If some process changes just one byte (or one word) back and forth
then the probability of this is 25%. Whether such a process can
exist, I don't know... I could not imagine that anyone would want
to change the file when we add it to the repository. In any case,
it is wrong to call it a _paranoid_ mode if there is even a small
(but practically feasible) chance of this happening.
Dmitry
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-22 6:17 ` Junio C Hamano
@ 2010-02-22 6:31 ` Junio C Hamano
2010-02-22 17:36 ` Nicolas Pitre
0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2010-02-22 6:31 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
Junio C Hamano <gitster@pobox.com> writes:
> Nicolas Pitre <nico@fluxnic.net> writes:
>
>> And what real life case would trigger this? Given the size of the
>> window for this to happen, what are your chances?
>
>> Of course the odds for me to be struck by lightning also exist. And if
>> I work really really hard at it then I might be able to trigger that
>> pathological case above even before the next thunderstorm. But in
>> practice I'm hardly concerned by either of those possibilities.
>
> The chance of any of this triggering in real life is, for me, zero, as I
> won't be mistreating git as a continuous & asynchronous back-up tool.
>
> But then that would make the whole discussion moot. There are people who
> file "bug reports" with an artificial reproduction recipe built around a
> loop that runs dd continuously overwriting a file while "git add" is asked
> to add it.
Having said all that, I like your approach better. It is not worth paying
the price of unnecessary memcpy(3) that would _only_ help catching the
insanely artificial test case, but your patch strikes a good balance of
small overhead to catch the easier-to-trigger (either by stupidity, malice
or mistake) cases.
So I am tempted to discard the "paranoia" patch, and replace with your two
patches, with the following caveats in the log message.
--- /var/tmp/2 2010-02-21 22:23:30.000000000 -0800
+++ /var/tmp/1 2010-02-21 22:23:22.000000000 -0800
@@ -21,7 +21,9 @@
deflate operation has consumed that data, and make sure it matches
with the expected SHA1. This way we can rely on the CRC32 checked by
the inflate operation to provide a good indication that the data is still
- coherent with its SHA1 hash.
+ coherent with its SHA1 hash. One pathological case we ignore is when
+ the data is modified before (or during) deflate call, but changed back
+ before it is hashed.
There is some overhead of course. Using 'git add' on a set of large files:
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-22 6:31 ` Junio C Hamano
@ 2010-02-22 17:36 ` Nicolas Pitre
2010-02-22 19:55 ` Junio C Hamano
0 siblings, 1 reply; 13+ messages in thread
From: Nicolas Pitre @ 2010-02-22 17:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, 21 Feb 2010, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Nicolas Pitre <nico@fluxnic.net> writes:
> >
> >> And what real life case would trigger this? Given the size of the
> >> window for this to happen, what are your chances?
> >
> >> Of course the odds for me to be struck by lightning also exist. And if
> >> I work really really hard at it then I might be able to trigger that
> >> pathological case above even before the next thunderstorm. But in
> >> practice I'm hardly concerned by either of those possibilities.
> >
> > The chance of any of this triggering in real life is, for me, zero, as
> > I won't be mistreating git as a continuous & asynchronous back-up tool.
> >
> > But then that would make the whole discussion moot. There are people who
> > file "bug reports" with an artificial reproduction recipe built around a
> > loop that runs dd continuously overwriting a file while "git add" is asked
> > to add it.
>
> Having said all that, I like your approach better. It is not worth paying
> the price of unnecessary memcpy(3) that would _only_ help catching the
> insanely artificial test case, but your patch strikes a good balance of
> small overhead to catch the easier-to-trigger (either by stupidity, malice
> or mistake) cases.
I think it also catches the bad RAM case which is probably more common
too.
> So I am tempted to discard the "paranoia" patch, and replace with your two
> patches, with the following caveats in the log message.
>
> --- /var/tmp/2 2010-02-21 22:23:30.000000000 -0800
> +++ /var/tmp/1 2010-02-21 22:23:22.000000000 -0800
> @@ -21,7 +21,9 @@
> deflate operation has consumed that data, and make sure it matches
> with the expected SHA1. This way we can rely on the CRC32 checked by
> the inflate operation to provide a good indication that the data is still
> - coherent with its SHA1 hash.
> + coherent with its SHA1 hash. One pathological case we ignore is when
> + the data is modified before (or during) deflate call, but changed back
> + before it is hashed.
ACK.
Nicolas
* Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects
2010-02-22 17:36 ` Nicolas Pitre
@ 2010-02-22 19:55 ` Junio C Hamano
0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2010-02-22 19:55 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
Nicolas Pitre <nico@fluxnic.net> writes:
>> Having said all that, I like your approach better. It is not worth paying
>> the price of unnecessary memcpy(3) that would _only_ help catching the
>> insanely artificial test case, but your patch strikes a good balance of
>> small overhead to catch the easier-to-trigger (either by stupidity, malice
>> or mistake) cases.
>
> I think it also catches the bad RAM case which is probably more common
> too.
That is true; broken RAM that returns unstable values will yield
different data between the time the first hash runs and the time the
deflate loop runs, and that will trigger the safety.