All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sha1_file.c: zlib can only process 4GB at a time
@ 2011-06-08 19:03 Junio C Hamano
  2011-06-08 19:16 ` Andreas Schwab
  2011-06-09 20:11 ` [PATCH] unpack-objects: " Junio C Hamano
  0 siblings, 2 replies; 3+ messages in thread
From: Junio C Hamano @ 2011-06-08 19:03 UTC (permalink / raw)
  To: git

The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * There are a lot more places than just this call site, but I wanted to
   send this out to get a quick sanity check by other people if the
   approach of fixing these issues is sane.

   Of course, we should be using streaming for more codepath, but that is
   totally a separate issue. We would want our code to work correctly when
   streaming is not used and your machine is beefy enough with zetabytes
   of ram, and that is the topic of this patch.

 cache.h     |   11 +++++++++++
 sha1_file.c |   11 +++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index e11cf6a..b917ae5 100644
--- a/cache.h
+++ b/cache.h
@@ -24,6 +24,17 @@ void git_inflate_init(z_streamp strm);
 void git_inflate_end(z_streamp strm);
 int git_inflate(z_streamp strm, int flush);
 
+/*
+ * avail_in and avail_out are counted in uInt, which typically limits
+ * the size of the buffer we can use to 4GB when interacting with zlib
+ * in a single call to inflate/deflate.
+ */
+#define ZLIB_BUF_MAX ((1UL << ((sizeof(uInt) * 8))) - 1)
+static inline uInt zlib_buf_cap(unsigned long len)
+{
+	return (ZLIB_BUF_MAX < len ? ZLIB_BUF_MAX : len);
+}
+
 #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
 #define DTYPE(de)	((de)->d_type)
 #else
diff --git a/sha1_file.c b/sha1_file.c
index 064a330..51236ab 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2429,6 +2429,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	unsigned char parano_sha1[20];
 	char *filename;
 	static char tmpfile[PATH_MAX];
+	unsigned long bytes_to_deflate;
 
 	filename = sha1_file_name(sha1);
 	fd = create_tmpfile(tmpfile, sizeof(tmpfile), filename);
@@ -2454,14 +2455,20 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	git_SHA1_Update(&c, hdr, hdrlen);
 
 	/* Then the data itself.. */
+	bytes_to_deflate = len;
 	stream.next_in = (void *)buf;
-	stream.avail_in = len;
+	stream.avail_in = zlib_buf_cap(bytes_to_deflate);
 	do {
 		unsigned char *in0 = stream.next_in;
+		size_t consumed;
+
 		ret = deflate(&stream, Z_FINISH);
-		git_SHA1_Update(&c, in0, stream.next_in - in0);
+		consumed = stream.next_in - in0;
+		git_SHA1_Update(&c, in0, consumed);
 		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
 			die("unable to write sha1 file");
+		bytes_to_deflate -= consumed;
+		stream.avail_in = zlib_buf_cap(bytes_to_deflate);
 		stream.next_out = compressed;
 		stream.avail_out = sizeof(compressed);
 	} while (ret == Z_OK);

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] sha1_file.c: zlib can only process 4GB at a time
  2011-06-08 19:03 [PATCH] sha1_file.c: zlib can only process 4GB at a time Junio C Hamano
@ 2011-06-08 19:16 ` Andreas Schwab
  2011-06-09 20:11 ` [PATCH] unpack-objects: " Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Andreas Schwab @ 2011-06-08 19:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> +#define ZLIB_BUF_MAX ((1UL << ((sizeof(uInt) * 8))) - 1)

This is undefined if unsigned long has 32 bits.

#define ZLIB_BUF_MAX ((uInt)-1)

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH] unpack-objects: zlib can only process 4GB at a time
  2011-06-08 19:03 [PATCH] sha1_file.c: zlib can only process 4GB at a time Junio C Hamano
  2011-06-08 19:16 ` Andreas Schwab
@ 2011-06-09 20:11 ` Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2011-06-09 20:11 UTC (permalink / raw)
  To: git

This codepath tried to tell zlib that we have allocated a large enough
output buffer and expected a single call to inflate() to succeed, but
zlib cannot expand into a buffer that is larger than 4GB at a time.

Ideally we should be transferring a better cue between send-pack and
receive-pack (or fetch-pack and upload-pack) to tell that we would be
better off not to explode into individual loose objects, but that is not
an excuse to leave our use of zlib broken and fail to unpack large objects
in a pack even on a beefy machine with enough memory.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/unpack-objects.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index f63973c..b208d6e 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -92,18 +92,22 @@ static void *get_data(unsigned long size)
 {
 	z_stream stream;
 	void *buf = xmalloc(size);
+	unsigned long bytes_to_inflate;
 
 	memset(&stream, 0, sizeof(stream));
 
 	stream.next_out = buf;
-	stream.avail_out = size;
+	bytes_to_inflate = size;
+	stream.avail_out = zlib_buf_cap(bytes_to_inflate);
 	stream.next_in = fill(1);
 	stream.avail_in = len;
 	git_inflate_init(&stream);
 
 	for (;;) {
+		unsigned char *out0 = stream.next_out;
 		int ret = git_inflate(&stream, 0);
 		use(len - stream.avail_in);
+		bytes_to_inflate -= (stream.next_out - out0);
 		if (stream.total_out == size && ret == Z_STREAM_END)
 			break;
 		if (ret != Z_OK) {
@@ -117,6 +121,7 @@ static void *get_data(unsigned long size)
 		}
 		stream.next_in = fill(1);
 		stream.avail_in = len;
+		stream.avail_out = zlib_buf_cap(bytes_to_inflate);
 	}
 	git_inflate_end(&stream);
 	return buf;
-- 
1.7.6.rc1.118.ge175b4a

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-06-09 20:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-08 19:03 [PATCH] sha1_file.c: zlib can only process 4GB at a time Junio C Hamano
2011-06-08 19:16 ` Andreas Schwab
2011-06-09 20:11 ` [PATCH] unpack-objects: " Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.