From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 1/7] oid_array: use size_t for count and allocation
Date: Mon, 30 Mar 2020 10:03:09 -0400 [thread overview]
Message-ID: <20200330140309.GA2456038@coredump.intra.peff.net> (raw)
In-Reply-To: <20200330140247.GA476088@coredump.intra.peff.net>
The oid_array object uses an "int" to store the number of items and the
allocated size. It's rather unlikely for somebody to have more than 2^31
objects in a repository (the sha1's alone would be 40GB!), but if they
do, we'd overflow our alloc variable.
You can reproduce this case with something like:
git init repo
cd repo
# make a pack with 2^24 objects
perl -e '
my $nr = 2**24;
for (my $i = 0; $i < $nr; $i++) {
print "blob\n";
print "data 4\n";
print pack("N", $i);
}
' | git fast-import
# now make 256 copies of it; most of these objects will be duplicates,
# but oid_array doesn't de-dup until all values are read and it can
# sort the result.
cd .git/objects/pack/
pack=$(echo *.pack)
idx=$(echo *.idx)
for i in $(seq 0 255); do
# no need to waste disk space
ln "$pack" "pack-extra-$i.pack"
ln "$idx" "pack-extra-$i.idx"
done
# and now force an oid_array to store all of it
git cat-file --batch-all-objects --batch-check
which results in:
fatal: size_t overflow: 32 * 18446744071562067968
So the good news is that st_mult() sees the problem (the large number is
because our int wraps negative, and then that gets cast to a size_t),
doing the job it was meant to: bailing in crazy situations rather than
causing an undersized buffer.
But we should avoid hitting this case at all, and instead limit
ourselves based on what malloc() is willing to give us. We can easily do
that by switching to size_t.
The cat-file process above made it to ~120GB virtual set size before the
integer overflow (our internal hash storage is 32-bytes now in
preparation for sha256, so we'd expect ~128GB total needed, plus
potentially more to copy from one realloc'd block to another)). After
this patch (and about 130GB of RAM+swap), it does eventually read in the
whole set. No test for obvious reasons.
Signed-off-by: Jeff King <peff@peff.net>
---
sha1-array.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sha1-array.h b/sha1-array.h
index dc1bca9c9a..c5e4b9324f 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -49,8 +49,8 @@
*/
struct oid_array {
struct object_id *oid;
- int nr;
- int alloc;
+ size_t nr;
+ size_t alloc;
int sorted;
};
--
2.26.0.597.g7e08ed78ff
next prev parent reply other threads:[~2020-03-30 14:03 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-30 14:02 [PATCH 0/7] oid_array cleanups Jeff King
2020-03-30 14:03 ` Jeff King [this message]
2020-03-30 14:09 ` [PATCH 1/7] oid_array: use size_t for count and allocation Jeff King
2020-04-15 0:27 ` Taylor Blau
2020-03-30 14:03 ` [PATCH 2/7] oid_array: use size_t for iteration Jeff King
2020-03-30 14:03 ` [PATCH 3/7] oid_array: rename source file from sha1-array Jeff King
2020-04-15 0:34 ` Taylor Blau
2020-03-30 14:04 ` [PATCH 4/7] test-tool: rename sha1-array to oid-array Jeff King
2020-03-30 14:04 ` [PATCH 5/7] bisect: stop referring to sha1_array Jeff King
2020-03-30 14:04 ` [PATCH 6/7] ref-filter: stop referring to "sha1 array" Jeff King
2020-03-30 14:04 ` [PATCH 7/7] oidset: stop referring to sha1-array Jeff King
2020-04-15 0:35 ` [PATCH 0/7] oid_array cleanups Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200330140309.GA2456038@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).