git.vger.kernel.org archive mirror
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Han Xin" <chiyutianyi@gmail.com>,
	"Jiang Xin" <worldhello.net@gmail.com>,
	"René Scharfe" <l.s.r@web.de>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Philip Oakley" <philipoakley@iee.email>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v11 0/8] unpack-objects: support streaming blobs to disk
Date: Sat, 19 Mar 2022 01:23:17 +0100	[thread overview]
Message-ID: <cover-v11-0.8-00000000000-20220319T001411Z-avarab@gmail.com> (raw)
In-Reply-To: <cover-v10-0.6-00000000000-20220204T135538Z-avarab@gmail.com>

This series by Han Xin was waiting on some in-flight patches that
landed in 430883a70c7 (Merge branch 'ab/object-file-api-updates',
2022-03-16).

This series teaches "git unpack-objects" to stream objects larger than
core.bigFileThreshold to disk. As 8/8 shows, streaming e.g. a 100MB
blob now uses ~5MB of memory instead of ~105MB. This streaming method
is slower if you have enough memory to handle the blobs in-core, but
if you don't it allows you to unpack such objects at all, where you
might otherwise OOM.
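To illustrate the approach (this is a rough Python sketch of the idea,
not Git's actual C implementation, and it skips the fsync() and
collision-handling details): the object header and each data chunk are
fed incrementally to the hash and the deflater, so memory use is
bounded by the chunk size rather than the blob size:

```python
import hashlib
import os
import tempfile
import zlib


def stream_loose_blob(chunks, size, objdir):
    """Write a loose blob from an iterator of chunks using O(chunk) memory.

    Sketch of what stream_loose_object() does: we know the total size up
    front, so the "blob <size>\\0" header can be written first, and each
    chunk is hashed and deflated as it arrives; the whole blob is never
    held in memory at once.
    """
    header = b"blob %d\x00" % size
    sha = hashlib.sha1(header)          # SHA-1 as in current Git repos
    z = zlib.compressobj()
    fd, tmp = tempfile.mkstemp(dir=objdir, prefix="tmp_obj_")
    with os.fdopen(fd, "wb") as out:
        out.write(z.compress(header))
        for chunk in chunks:
            sha.update(chunk)
            out.write(z.compress(chunk))
        out.write(z.flush())
    oid = sha.hexdigest()
    # Loose objects live at <objdir>/<2 hex chars>/<38 hex chars>; the
    # real code also fsync()s before moving the file into place.
    dirname = os.path.join(objdir, oid[:2])
    os.makedirs(dirname, exist_ok=True)
    path = os.path.join(dirname, oid[2:])
    os.rename(tmp, path)
    return oid, path


# Demo: stream a 100KB blob in 8KB chunks.
data = b"a" * 100000
objdir = tempfile.mkdtemp()
oid, path = stream_loose_blob(
    (data[i:i + 8192] for i in range(0, len(data), 8192)), len(data), objdir)
```

Since the hash is sha1("blob <size>\0" + content), the resulting oid is
the same one `git hash-object` would compute for a SHA-1 repository.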

Changes since v10:

 * Renamed the new test file, its number conflicted with a
   since-landed commit-graph test.

 * Some minor code changes to make the diff against the pre-image
   smaller (e.g. the top of the range-diff below).

 * The whole "find dest.git" check for loose objects is now either a
   test for "do we have any objects at all?" (--dry-run mode), or
   uses the simpler "test_stdout_line_count" helper.

 * We also test that when we use "unpack-objects" to stream directly
   to a pack, the result is byte-for-byte identical to the source
   pack.

 * A new 4/8 that I added allows for more code sharing in
   object-file.c; our two end-state functions now share more logic.

 * Minor typo/grammar/comment etc. fixes throughout.

 * Updated 8/8 with benchmarks; somewhere along the line we lost the
   command used to run the benchmark mentioned in the commit message.

1. https://lore.kernel.org/git/cover-v10-0.6-00000000000-20220204T135538Z-avarab@gmail.com/

Han Xin (4):
  unpack-objects: low memory footprint for get_data() in dry_run mode
  object-file.c: refactor write_loose_object() to several steps
  object-file.c: add "stream_loose_object()" to handle large object
  unpack-objects: use stream_loose_object() to unpack large objects

Ævar Arnfjörð Bjarmason (4):
  object-file.c: do fsync() and close() before post-write die()
  object-file.c: factor out deflate part of write_loose_object()
  core doc: modernize core.bigFileThreshold documentation
  unpack-objects: refactor away unpack_non_delta_entry()
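
The dry-run change in 1/8 ("low memory footprint for get_data()") can
be sketched outside of C as well. Hypothetically, in Python terms:
instead of allocating a buffer of the full object size, we inflate into
a buffer capped at 8192 bytes and discard the output, since dry-run
only needs to validate and consume the stream:

```python
import zlib


def dry_run_inflate(compressed, size, cap=8192):
    """Inflate and discard, with an output buffer of at most `cap` bytes.

    Mirrors the get_data() dry-run change: xmallocz(size) becomes
    xmallocz(min(size, 8192)), and we loop until the stream ends, so a
    100MB object costs only 8KB of output buffer during validation.
    """
    bufsize = min(size, cap)
    z = zlib.decompressobj()
    data = compressed
    total = 0
    while not z.eof:
        out = z.decompress(data, bufsize)  # at most bufsize bytes per call
        total += len(out)
        data = z.unconsumed_tail           # input zlib could not yet consume
        if not out and not data:
            break                          # truncated input; don't spin
    return total


# A 50KB object is validated with only an 8KB output buffer.
n = dry_run_inflate(zlib.compress(b"y" * 50000), 50000)
```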

 Documentation/config/core.txt   |  33 +++--
 builtin/unpack-objects.c        | 109 +++++++++++---
 object-file.c                   | 250 +++++++++++++++++++++++++++-----
 object-store.h                  |   8 +
 t/t5351-unpack-large-objects.sh |  61 ++++++++
 5 files changed, 397 insertions(+), 64 deletions(-)
 create mode 100755 t/t5351-unpack-large-objects.sh

Range-diff against v10:
1:  e46eb75b98f ! 1:  2103d5bfd96 unpack-objects: low memory footprint for get_data() in dry_run mode
    @@ builtin/unpack-objects.c: static void use(int bytes)
      {
      	git_zstream stream;
     -	void *buf = xmallocz(size);
    -+	unsigned long bufsize;
    -+	void *buf;
    ++	unsigned long bufsize = dry_run && size > 8192 ? 8192 : size;
    ++	void *buf = xmallocz(bufsize);
      
      	memset(&stream, 0, sizeof(stream));
    -+	if (dry_run && size > 8192)
    -+		bufsize = 8192;
    -+	else
    -+		bufsize = size;
    -+	buf = xmallocz(bufsize);
      
      	stream.next_out = buf;
     -	stream.avail_out = size;
    @@ builtin/unpack-objects.c: static void unpack_delta_entry(enum object_type type,
      		hi = nr;
      		while (lo < hi) {
     
    - ## t/t5328-unpack-large-objects.sh (new) ##
    + ## t/t5351-unpack-large-objects.sh (new) ##
     @@
     +#!/bin/sh
     +#
    @@ t/t5328-unpack-large-objects.sh (new)
     +	git init --bare dest.git
     +}
     +
    -+test_no_loose () {
    -+	test $(find dest.git/objects/?? -type f | wc -l) = 0
    -+}
    -+
     +test_expect_success "create large objects (1.5 MB) and PACK" '
     +	test-tool genrandom foo 1500000 >big-blob &&
     +	test_commit --append foo big-blob &&
     +	test-tool genrandom bar 1500000 >big-blob &&
     +	test_commit --append bar big-blob &&
    -+	PACK=$(echo HEAD | git pack-objects --revs test)
    ++	PACK=$(echo HEAD | git pack-objects --revs pack)
     +'
     +
     +test_expect_success 'set memory limitation to 1MB' '
    @@ t/t5328-unpack-large-objects.sh (new)
     +
     +test_expect_success 'unpack-objects failed under memory limitation' '
     +	prepare_dest &&
    -+	test_must_fail git -C dest.git unpack-objects <test-$PACK.pack 2>err &&
    ++	test_must_fail git -C dest.git unpack-objects <pack-$PACK.pack 2>err &&
     +	grep "fatal: attempting to allocate" err
     +'
     +
     +test_expect_success 'unpack-objects works with memory limitation in dry-run mode' '
     +	prepare_dest &&
    -+	git -C dest.git unpack-objects -n <test-$PACK.pack &&
    -+	test_no_loose &&
    ++	git -C dest.git unpack-objects -n <pack-$PACK.pack &&
    ++	test_stdout_line_count = 0 find dest.git/objects -type f &&
     +	test_dir_is_empty dest.git/objects/pack
     +'
     +
2:  48bf9090058 = 2:  6acd8759772 object-file.c: do fsync() and close() before post-write die()
3:  0e33d2a6e35 = 3:  f7b02c307fc object-file.c: refactor write_loose_object() to several steps
-:  ----------- > 4:  20d97cc2605 object-file.c: factor out deflate part of write_loose_object()
4:  9644df5c744 ! 5:  db40f4160c4 object-file.c: add "stream_loose_object()" to handle large object
    @@ Commit message
         Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
         Helped-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
         Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## object-file.c ##
     @@ object-file.c: static int freshen_packed_object(const struct object_id *oid)
    @@ object-file.c: static int freshen_packed_object(const struct object_id *oid)
     +	strbuf_addf(&filename, "%s/", get_object_directory());
     +	hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
     +
    -+	/* Common steps for write_loose_object and stream_loose_object to
    -+	 * start writing loose oject:
    ++	/*
    ++	 * Common steps for write_loose_object and stream_loose_object to
    ++	 * start writing loose objects:
     +	 *
     +	 *  - Create tmpfile for the loose object.
     +	 *  - Setup zlib stream for compression.
    @@ object-file.c: static int freshen_packed_object(const struct object_id *oid)
     +	/* Then the data itself.. */
     +	do {
     +		unsigned char *in0 = stream.next_in;
    ++
     +		if (!stream.avail_in && !in_stream->is_finished) {
     +			const void *in = in_stream->read(in_stream, &stream.avail_in);
     +			stream.next_in = (void *)in;
     +			in0 = (unsigned char *)in;
     +			/* All data has been read. */
     +			if (in_stream->is_finished)
    -+				flush = Z_FINISH;
    ++				flush = 1;
     +		}
    -+		ret = git_deflate(&stream, flush);
    -+		the_hash_algo->update_fn(&c, in0, stream.next_in - in0);
    -+		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
    -+			die(_("unable to write loose object file"));
    -+		stream.next_out = compressed;
    -+		stream.avail_out = sizeof(compressed);
    ++		ret = write_loose_object_common(&c, &stream, flush, in0, fd,
    ++						compressed, sizeof(compressed));
     +		/*
     +		 * Unlike write_loose_object(), we do not have the entire
     +		 * buffer. If we get Z_BUF_ERROR due to too few input bytes,
5:  4550f3a2745 = 6:  d8ae2eadb98 core doc: modernize core.bigFileThreshold documentation
-:  ----------- > 7:  2b403e7cd9c unpack-objects: refactor away unpack_non_delta_entry()
6:  6a70e49a346 ! 8:  5eded902496 unpack-objects: use stream_loose_object() to unpack large objects
    @@ Commit message
         malloc() the size of the blob before unpacking it, which could cause
         OOM with very large blobs.
     
    -    We could use this new interface to unpack all blobs, but doing so
    -    would result in a performance penalty of around 10%, as the below
    -    "hyperfine" benchmark will show. We therefore limit this to files
    -    larger than "core.bigFileThreshold":
    -
    -        $ hyperfine \
    -          --setup \
    -          'if ! test -d scalar.git; then git clone --bare
    -           https://github.com/microsoft/scalar.git;
    -           cp scalar.git/objects/pack/*.pack small.pack; fi' \
    -          --prepare 'rm -rf dest.git && git init --bare dest.git' \
    -          ...
    -
    -        Summary
    -          './git -C dest.git -c core.bigFileThreshold=512m
    -          unpack-objects <small.pack' in 'origin/master'
    -            1.01 ± 0.04 times faster than './git -C dest.git
    -                    -c core.bigFileThreshold=512m unpack-objects
    -                    <small.pack' in 'HEAD~1'
    -            1.01 ± 0.04 times faster than './git -C dest.git
    -                    -c core.bigFileThreshold=512m unpack-objects
    -                    <small.pack' in 'HEAD~0'
    -            1.03 ± 0.10 times faster than './git -C dest.git
    -                    -c core.bigFileThreshold=16k unpack-objects
    -                    <small.pack' in 'origin/master'
    -            1.02 ± 0.07 times faster than './git -C dest.git
    -                    -c core.bigFileThreshold=16k unpack-objects
    -                    <small.pack' in 'HEAD~0'
    -            1.10 ± 0.04 times faster than './git -C dest.git
    -                    -c core.bigFileThreshold=16k unpack-objects
    -                    <small.pack' in 'HEAD~1'
    +    We could use the new streaming interface to unpack all blobs, but
    +    doing so would be much slower, as demonstrated e.g. with this
    +    benchmark using git-hyperfine[0]:
    +
    +            rm -rf /tmp/scalar.git &&
    +            git clone --bare https://github.com/Microsoft/scalar.git /tmp/scalar.git &&
    +            mv /tmp/scalar.git/objects/pack/*.pack /tmp/scalar.git/my.pack &&
    +            git hyperfine \
    +                    -r 2 --warmup 1 \
    +                    -L rev origin/master,HEAD -L v "10,512,1k,1m" \
    +                    -s 'make' \
    +                    -p 'git init --bare dest.git' \
    +                    -c 'rm -rf dest.git' \
    +                    './git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/scalar.git/my.pack'
    +
     +    With this change we perform worse in terms of speed at lower
     +    core.bigFileThreshold settings, but get lower memory use in
     +    return:
    +
    +            Summary
    +              './git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master' ran
    +                1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
    +                1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
    +                1.01 ± 0.02 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
    +                1.02 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
    +                1.09 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
    +                1.10 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
    +                1.11 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
    +
     +    A better benchmark to demonstrate the benefits of this change is one
     +    that creates an artificial repo with a 1, 25, 50, 75 and 100MB blob:
    +
    +            rm -rf /tmp/repo &&
    +            git init /tmp/repo &&
    +            (
    +                    cd /tmp/repo &&
    +                    for i in 1 25 50 75 100
    +                    do
    +                            dd if=/dev/urandom of=blob.$i count=$(($i*1024)) bs=1024
    +                    done &&
    +                    git add blob.* &&
    +                    git commit -mblobs &&
    +                    git gc &&
    +                    PACK=$(echo .git/objects/pack/pack-*.pack) &&
    +                    cp "$PACK" my.pack
    +            ) &&
    +            git hyperfine \
    +                    --show-output \
    +                    -L rev origin/master,HEAD -L v "512,50m,100m" \
    +                    -s 'make' \
    +                    -p 'git init --bare dest.git' \
    +                    -c 'rm -rf dest.git' \
    +                    '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum'
    +
    +    Using this test we'll always use >100MB of memory on
    +    origin/master (around ~105MB), but max out at e.g. ~55MB if we set
    +    core.bigFileThreshold=50m.
    +
    +    The relevant "Maximum resident set size" lines were manually added
    +    below the relevant benchmark:
    +
    +      '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master' ran
    +            Maximum resident set size (kbytes): 107080
    +        1.02 ± 0.78 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
    +            Maximum resident set size (kbytes): 106968
    +        1.09 ± 0.79 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
    +            Maximum resident set size (kbytes): 107032
    +        1.42 ± 1.07 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
    +            Maximum resident set size (kbytes): 107072
    +        1.83 ± 1.02 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
    +            Maximum resident set size (kbytes): 55704
    +        2.16 ± 1.19 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
    +            Maximum resident set size (kbytes): 4564
    +
    +    This shows that if you have enough memory this new streaming method is
    +    slower the lower you set the streaming threshold, but the benefit is
    +    more bounded memory use.
     
         An earlier version of this patch introduced a new
         "core.bigFileStreamingThreshold" instead of re-using the existing
    @@ Commit message
         split up "core.bigFileThreshold" in the future if there's a need for
         that.
     
    +    0. https://github.com/avar/git-hyperfine/
         1. https://lore.kernel.org/git/20211210103435.83656-1-chiyutianyi@gmail.com/
         2. https://lore.kernel.org/git/20220120112114.47618-5-chiyutianyi@gmail.com/
     
    @@ Commit message
         Helped-by: Derrick Stolee <stolee@gmail.com>
         Helped-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
         Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Documentation/config/core.txt ##
     @@ Documentation/config/core.txt: usage, at the slight expense of increased disk usage.
    @@ builtin/unpack-objects.c: static void added_object(unsigned nr, enum object_type
     +	return data->buf;
     +}
     +
    -+static void write_stream_blob(unsigned nr, size_t size)
    ++static void stream_blob(unsigned long size, unsigned nr)
     +{
     +	git_zstream zstream = { 0 };
     +	struct input_zstream_data data = { 0 };
    @@ builtin/unpack-objects.c: static void added_object(unsigned nr, enum object_type
     +		.read = feed_input_zstream,
     +		.data = &data,
     +	};
    ++	struct obj_info *info = &obj_list[nr];
     +
     +	data.zstream = &zstream;
     +	git_inflate_init(&zstream);
     +
    -+	if (stream_loose_object(&in_stream, size, &obj_list[nr].oid))
    ++	if (stream_loose_object(&in_stream, size, &info->oid))
     +		die(_("failed to write object in stream"));
     +
     +	if (data.status != Z_STREAM_END)
    @@ builtin/unpack-objects.c: static void added_object(unsigned nr, enum object_type
     +	git_inflate_end(&zstream);
     +
     +	if (strict) {
    -+		struct blob *blob =
    -+			lookup_blob(the_repository, &obj_list[nr].oid);
    -+		if (blob)
    -+			blob->object.flags |= FLAG_WRITTEN;
    -+		else
    ++		struct blob *blob = lookup_blob(the_repository, &info->oid);
    ++
    ++		if (!blob)
     +			die(_("invalid blob object from stream"));
    ++		blob->object.flags |= FLAG_WRITTEN;
     +	}
    -+	obj_list[nr].obj = NULL;
    ++	info->obj = NULL;
     +}
     +
    - static void unpack_non_delta_entry(enum object_type type, unsigned long size,
    - 				   unsigned nr)
    + static int resolve_against_held(unsigned nr, const struct object_id *base,
    + 				void *delta_data, unsigned long delta_size)
      {
    --	void *buf = get_data(size);
    -+	void *buf;
    -+
    -+	/* Write large blob in stream without allocating full buffer. */
    -+	if (!dry_run && type == OBJ_BLOB && size > big_file_threshold) {
    -+		write_stream_blob(nr, size);
    -+		return;
    -+	}
    +@@ builtin/unpack-objects.c: static void unpack_one(unsigned nr)
      
    -+	buf = get_data(size);
    - 	if (buf)
    - 		write_object(nr, type, buf, size);
    - }
    + 	switch (type) {
    + 	case OBJ_BLOB:
    ++		if (!dry_run && size > big_file_threshold) {
    ++			stream_blob(size, nr);
    ++			return;
    ++		}
    ++		/* fallthrough */
    + 	case OBJ_COMMIT:
    + 	case OBJ_TREE:
    + 	case OBJ_TAG:
     
    - ## t/t5328-unpack-large-objects.sh ##
    -@@ t/t5328-unpack-large-objects.sh: test_description='git unpack-objects with large objects'
    + ## t/t5351-unpack-large-objects.sh ##
    +@@ t/t5351-unpack-large-objects.sh: test_description='git unpack-objects with large objects'
      
      prepare_dest () {
      	test_when_finished "rm -rf dest.git" &&
     -	git init --bare dest.git
     +	git init --bare dest.git &&
    -+	if test -n "$1"
    -+	then
    -+		git -C dest.git config core.bigFileThreshold $1
    -+	fi
    ++	git -C dest.git config core.bigFileThreshold "$1"
      }
      
    - test_no_loose () {
    -@@ t/t5328-unpack-large-objects.sh: test_expect_success 'set memory limitation to 1MB' '
    + test_expect_success "create large objects (1.5 MB) and PACK" '
    +@@ t/t5351-unpack-large-objects.sh: test_expect_success 'set memory limitation to 1MB' '
      '
      
      test_expect_success 'unpack-objects failed under memory limitation' '
     -	prepare_dest &&
     +	prepare_dest 2m &&
    - 	test_must_fail git -C dest.git unpack-objects <test-$PACK.pack 2>err &&
    + 	test_must_fail git -C dest.git unpack-objects <pack-$PACK.pack 2>err &&
      	grep "fatal: attempting to allocate" err
      '
      
      test_expect_success 'unpack-objects works with memory limitation in dry-run mode' '
     -	prepare_dest &&
     +	prepare_dest 2m &&
    - 	git -C dest.git unpack-objects -n <test-$PACK.pack &&
    - 	test_no_loose &&
    + 	git -C dest.git unpack-objects -n <pack-$PACK.pack &&
    + 	test_stdout_line_count = 0 find dest.git/objects -type f &&
      	test_dir_is_empty dest.git/objects/pack
      '
      
     +test_expect_success 'unpack big object in stream' '
     +	prepare_dest 1m &&
    -+	git -C dest.git unpack-objects <test-$PACK.pack &&
    ++	git -C dest.git unpack-objects <pack-$PACK.pack &&
     +	test_dir_is_empty dest.git/objects/pack
     +'
     +
     +test_expect_success 'do not unpack existing large objects' '
     +	prepare_dest 1m &&
    -+	git -C dest.git index-pack --stdin <test-$PACK.pack &&
    -+	git -C dest.git unpack-objects <test-$PACK.pack &&
    -+	test_no_loose
    ++	git -C dest.git index-pack --stdin <pack-$PACK.pack &&
    ++	git -C dest.git unpack-objects <pack-$PACK.pack &&
    ++
    ++	# The destination came up with the exact same pack...
    ++	DEST_PACK=$(echo dest.git/objects/pack/pack-*.pack) &&
    ++	test_cmp pack-$PACK.pack $DEST_PACK &&
    ++
    ++	# ...and wrote no loose objects
    ++	test_stdout_line_count = 0 find dest.git/objects -type f ! -name "pack-*"
     +'
     +
      test_done
-- 
2.35.1.1438.g8874c8eeb35



Thread overview: 211+ messages
2021-10-09  8:20 [PATCH] unpack-objects: unpack large object in stream Han Xin
2021-10-19  7:37 ` Han Xin
2021-10-20 14:42 ` Philip Oakley
2021-10-21  3:42   ` Han Xin
2021-10-21 22:47     ` Philip Oakley
2021-11-03  1:48 ` Han Xin
2021-11-03 10:07   ` Philip Oakley
2021-11-12  9:40 ` [PATCH v2 1/6] object-file: refactor write_loose_object() to support inputstream Han Xin
2021-11-18  4:59   ` Jiang Xin
2021-11-18  6:45     ` Junio C Hamano
2021-11-12  9:40 ` [PATCH v2 2/6] object-file.c: add dry_run mode for write_loose_object() Han Xin
2021-11-18  5:42   ` Jiang Xin
2021-11-12  9:40 ` [PATCH v2 3/6] object-file.c: handle nil oid in write_loose_object() Han Xin
2021-11-18  5:49   ` Jiang Xin
2021-11-12  9:40 ` [PATCH v2 4/6] object-file.c: read input stream repeatedly " Han Xin
2021-11-18  5:56   ` Jiang Xin
2021-11-12  9:40 ` [PATCH v2 5/6] object-store.h: add write_loose_object() Han Xin
2021-11-12  9:40 ` [PATCH v2 6/6] unpack-objects: unpack large object in stream Han Xin
2021-11-18  7:14   ` Jiang Xin
2021-11-22  3:32 ` [PATCH v3 0/5] unpack large objects " Han Xin
2021-11-29  7:01   ` Han Xin
2021-11-29 19:12     ` Jeff King
2021-11-30  2:57       ` Han Xin
2021-12-03  9:35   ` [PATCH v4 " Han Xin
2021-12-07 16:18     ` Derrick Stolee
2021-12-10 10:34     ` [PATCH v5 0/6] unpack large blobs " Han Xin
2021-12-17 11:26       ` Han Xin
2021-12-21 11:51         ` [PATCH v7 0/5] " Han Xin
2021-12-21 11:51         ` [PATCH v7 1/5] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-12-21 14:09           ` Ævar Arnfjörð Bjarmason
2021-12-21 14:43             ` René Scharfe
2021-12-21 15:04               ` Ævar Arnfjörð Bjarmason
2021-12-22 11:15               ` Jiang Xin
2021-12-22 11:29             ` Jiang Xin
2021-12-31  3:06           ` Jiang Xin
2021-12-21 11:51         ` [PATCH v7 2/5] object-file API: add a format_object_header() function Han Xin
2021-12-21 14:30           ` René Scharfe
2022-02-01 14:28             ` C99 %z (was: [PATCH v7 2/5] object-file API: add a format_object_header() function) Ævar Arnfjörð Bjarmason
2021-12-31  3:12           ` [PATCH v7 2/5] object-file API: add a format_object_header() function Jiang Xin
2021-12-21 11:51         ` [PATCH v7 3/5] object-file.c: refactor write_loose_object() to reuse in stream version Han Xin
2021-12-21 14:16           ` Ævar Arnfjörð Bjarmason
2021-12-22 12:02             ` Jiang Xin
2021-12-21 11:52         ` [PATCH v7 4/5] object-file.c: add "write_stream_object_file()" to support read in stream Han Xin
2021-12-21 14:20           ` Ævar Arnfjörð Bjarmason
2021-12-21 15:05             ` Ævar Arnfjörð Bjarmason
2021-12-21 11:52         ` [PATCH v7 5/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-21 15:06           ` Ævar Arnfjörð Bjarmason
2021-12-31  3:19           ` Jiang Xin
2022-01-08  8:54         ` [PATCH v8 0/6] unpack large blobs in stream Han Xin
2022-01-20 11:21           ` [PATCH v9 0/5] " Han Xin
2022-02-01 21:24             ` Ævar Arnfjörð Bjarmason
2022-02-02  8:32               ` Han Xin
2022-02-02 10:59                 ` Ævar Arnfjörð Bjarmason
2022-02-04 14:07             ` [PATCH v10 0/6] unpack-objects: support streaming large objects to disk Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 1/6] unpack-objects: low memory footprint for get_data() in dry_run mode Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 2/6] object-file.c: do fsync() and close() before post-write die() Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 3/6] object-file.c: refactor write_loose_object() to several steps Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 4/6] object-file.c: add "stream_loose_object()" to handle large object Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 5/6] core doc: modernize core.bigFileThreshold documentation Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 6/6] unpack-objects: use stream_loose_object() to unpack large objects Ævar Arnfjörð Bjarmason
2022-03-19  0:23               ` Ævar Arnfjörð Bjarmason [this message]
2022-03-19  0:23                 ` [PATCH v11 1/8] unpack-objects: low memory footprint for get_data() in dry_run mode Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 2/8] object-file.c: do fsync() and close() before post-write die() Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 3/8] object-file.c: refactor write_loose_object() to several steps Ævar Arnfjörð Bjarmason
2022-03-19 10:11                   ` René Scharfe
2022-03-19  0:23                 ` [PATCH v11 4/8] object-file.c: factor out deflate part of write_loose_object() Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 5/8] object-file.c: add "stream_loose_object()" to handle large object Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 6/8] core doc: modernize core.bigFileThreshold documentation Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 7/8] unpack-objects: refactor away unpack_non_delta_entry() Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 8/8] unpack-objects: use stream_loose_object() to unpack large objects Ævar Arnfjörð Bjarmason
2022-03-29 13:56                 ` [PATCH v12 0/8] unpack-objects: support streaming blobs to disk Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 1/8] unpack-objects: low memory footprint for get_data() in dry_run mode Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 2/8] object-file.c: do fsync() and close() before post-write die() Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 3/8] object-file.c: refactor write_loose_object() to several steps Ævar Arnfjörð Bjarmason
2022-03-30  7:13                     ` Han Xin
2022-03-30 17:34                       ` Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 4/8] object-file.c: factor out deflate part of write_loose_object() Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 5/8] object-file.c: add "stream_loose_object()" to handle large object Ævar Arnfjörð Bjarmason
2022-03-31 19:54                     ` Neeraj Singh
2022-03-29 13:56                   ` [PATCH v12 6/8] core doc: modernize core.bigFileThreshold documentation Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 7/8] unpack-objects: refactor away unpack_non_delta_entry() Ævar Arnfjörð Bjarmason
2022-03-30 19:40                     ` René Scharfe
2022-03-31 12:42                       ` Ævar Arnfjörð Bjarmason
2022-03-31 16:38                         ` René Scharfe
2022-03-29 13:56                   ` [PATCH v12 8/8] unpack-objects: use stream_loose_object() to unpack large objects Ævar Arnfjörð Bjarmason
2022-06-04 10:10                   ` [PATCH v13 0/7] unpack-objects: support streaming blobs to disk Ævar Arnfjörð Bjarmason
2022-06-04 10:10                     ` [PATCH v13 1/7] unpack-objects: low memory footprint for get_data() in dry_run mode Ævar Arnfjörð Bjarmason
2022-06-06 18:35                       ` Junio C Hamano
2022-06-09  4:10                         ` Han Xin
2022-06-09 18:27                           ` Junio C Hamano
2022-06-10  1:50                             ` Han Xin
2022-06-10  2:05                               ` Ævar Arnfjörð Bjarmason
2022-06-10 12:04                                 ` Han Xin
2022-06-04 10:10                     ` [PATCH v13 2/7] object-file.c: do fsync() and close() before post-write die() Ævar Arnfjörð Bjarmason
2022-06-06 18:45                       ` Junio C Hamano
2022-06-04 10:10                     ` [PATCH v13 3/7] object-file.c: refactor write_loose_object() to several steps Ævar Arnfjörð Bjarmason
2022-06-04 10:10                     ` [PATCH v13 4/7] object-file.c: factor out deflate part of write_loose_object() Ævar Arnfjörð Bjarmason
2022-06-04 10:10                     ` [PATCH v13 5/7] object-file.c: add "stream_loose_object()" to handle large object Ævar Arnfjörð Bjarmason
2022-06-06 19:44                       ` Junio C Hamano
2022-06-06 20:02                         ` Junio C Hamano
2022-06-09  6:04                           ` Han Xin
2022-06-09  6:14                         ` Han Xin
2022-06-07 19:53                       ` Neeraj Singh
2022-06-08 15:34                         ` Junio C Hamano
2022-06-09  3:05                         ` [RFC PATCH] object-file.c: batched disk flushes for stream_loose_object() Han Xin
2022-06-09  7:35                           ` Neeraj Singh
2022-06-09  9:30                           ` Johannes Schindelin
2022-06-10 12:55                             ` Han Xin
2022-06-04 10:10                     ` [PATCH v13 6/7] core doc: modernize core.bigFileThreshold documentation Ævar Arnfjörð Bjarmason
2022-06-06 19:50                       ` Junio C Hamano
2022-06-04 10:10                     ` [PATCH v13 7/7] unpack-objects: use stream_loose_object() to unpack large objects Ævar Arnfjörð Bjarmason
2022-06-10 14:46                     ` [PATCH v14 0/7] unpack-objects: support streaming blobs to disk Han Xin
2022-06-10 14:46                       ` [PATCH v14 1/7] unpack-objects: low memory footprint for get_data() in dry_run mode Han Xin
2022-06-10 14:46                       ` [PATCH v14 2/7] object-file.c: do fsync() and close() before post-write die() Han Xin
2022-06-10 21:10                         ` René Scharfe
2022-06-10 21:33                           ` Junio C Hamano
2022-06-11  1:50                             ` Han Xin
2022-06-10 14:46                       ` [PATCH v14 3/7] object-file.c: refactor write_loose_object() to several steps Han Xin
2022-06-10 14:46                       ` [PATCH v14 4/7] object-file.c: factor out deflate part of write_loose_object() Han Xin
2022-06-10 14:46                       ` [PATCH v14 5/7] object-file.c: add "stream_loose_object()" to handle large object Han Xin
2022-06-10 14:46                       ` [PATCH v14 6/7] core doc: modernize core.bigFileThreshold documentation Han Xin
2022-06-10 21:01                         ` Junio C Hamano
2022-06-10 14:46                       ` [PATCH v14 7/7] unpack-objects: use stream_loose_object() to unpack large objects Han Xin
2022-06-11  2:44                       ` [PATCH v15 0/6] unpack-objects: support streaming blobs to disk Han Xin
2022-06-11  2:44                         ` [PATCH v15 1/6] unpack-objects: low memory footprint for get_data() in dry_run mode Han Xin
2022-06-11  2:44                         ` [PATCH v15 2/6] object-file.c: refactor write_loose_object() to several steps Han Xin
2022-06-11  2:44                         ` [PATCH v15 3/6] object-file.c: factor out deflate part of write_loose_object() Han Xin
2022-06-11  2:44                         ` [PATCH v15 4/6] object-file.c: add "stream_loose_object()" to handle large object Han Xin
2022-06-11  2:44                         ` [PATCH v15 5/6] core doc: modernize core.bigFileThreshold documentation Han Xin
2022-06-11  2:44                         ` [PATCH v15 6/6] unpack-objects: use stream_loose_object() to unpack large objects Han Xin
2022-07-01  2:01                           ` Junio C Hamano
2022-05-20  3:05                 ` [PATCH 0/1] unpack-objects: low memory footprint for get_data() in dry_run mode Han Xin
2022-05-20  3:05                   ` [PATCH 1/1] " Han Xin
2022-01-20 11:21           ` [PATCH v9 1/5] " Han Xin
2022-01-20 11:21           ` [PATCH v9 2/5] object-file.c: refactor write_loose_object() to several steps Han Xin
2022-01-20 11:21           ` [PATCH v9 3/5] object-file.c: add "stream_loose_object()" to handle large object Han Xin
2022-01-20 11:21           ` [PATCH v9 4/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2022-01-20 11:21           ` [PATCH v9 5/5] object-file API: add a format_object_header() function Han Xin
2022-01-08  8:54         ` [PATCH v8 1/6] unpack-objects: low memory footprint for get_data() in dry_run mode Han Xin
2022-01-08 12:28           ` René Scharfe
2022-01-11 10:41             ` Han Xin
2022-01-08  8:54         ` [PATCH v8 2/6] object-file.c: refactor write_loose_object() to several steps Han Xin
2022-01-08 12:28           ` René Scharfe
2022-01-11 10:33             ` Han Xin
2022-01-08  8:54         ` [PATCH v8 3/6] object-file.c: remove the slash for directory_size() Han Xin
2022-01-08 17:24           ` René Scharfe
2022-01-11 10:14             ` Han Xin
2022-01-08  8:54         ` [PATCH v8 4/6] object-file.c: add "stream_loose_object()" to handle large object Han Xin
2022-01-08  8:54         ` [PATCH v8 5/6] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2022-01-08  8:54         ` [PATCH v8 6/6] object-file API: add a format_object_header() function Han Xin
2021-12-17 11:26       ` [PATCH v6 1/6] object-file.c: release strbuf in write_loose_object() Han Xin
2021-12-17 19:28         ` René Scharfe
2021-12-18  0:09           ` Junio C Hamano
2021-12-17 11:26       ` [PATCH v6 2/6] object-file.c: refactor object header generation into a function Han Xin
2021-12-20 12:10         ` [RFC PATCH] object-file API: add a format_loose_header() function Ævar Arnfjörð Bjarmason
2021-12-20 12:48           ` Philip Oakley
2021-12-20 22:25           ` Junio C Hamano
2021-12-21  1:42             ` Ævar Arnfjörð Bjarmason
2021-12-21  2:11               ` Junio C Hamano
2021-12-21  2:27                 ` Ævar Arnfjörð Bjarmason
2021-12-21 11:43           ` Han Xin
2021-12-17 11:26       ` [PATCH v6 3/6] object-file.c: refactor write_loose_object() to reuse in stream version Han Xin
2021-12-17 11:26       ` [PATCH v6 4/6] object-file.c: make "write_object_file_flags()" to support read in stream Han Xin
2021-12-17 22:52         ` René Scharfe
2021-12-17 11:26       ` [PATCH v6 5/6] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-12-17 21:22         ` René Scharfe
2021-12-17 11:26       ` [PATCH v6 6/6] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-10 10:34     ` [PATCH v5 1/6] object-file: refactor write_loose_object() to support read from stream Han Xin
2021-12-10 10:34     ` [PATCH v5 2/6] object-file.c: handle undetermined oid in write_loose_object() Han Xin
2021-12-13  7:32       ` Ævar Arnfjörð Bjarmason
2021-12-10 10:34     ` [PATCH v5 3/6] object-file.c: read stream in a loop " Han Xin
2021-12-10 10:34     ` [PATCH v5 4/6] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-12-10 10:34     ` [PATCH v5 5/6] object-file.c: make "write_object_file_flags()" to support "HASH_STREAM" Han Xin
2021-12-10 10:34     ` [PATCH v5 6/6] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-13  8:05       ` Ævar Arnfjörð Bjarmason
2021-12-03  9:35   ` [PATCH v4 1/5] object-file: refactor write_loose_object() to read buffer from stream Han Xin
2021-12-03 13:28     ` Ævar Arnfjörð Bjarmason
2021-12-06  2:07       ` Han Xin
2021-12-03  9:35   ` [PATCH v4 2/5] object-file.c: handle undetermined oid in write_loose_object() Han Xin
2021-12-03 13:21     ` Ævar Arnfjörð Bjarmason
2021-12-06  2:51       ` Han Xin
2021-12-03 13:41     ` Ævar Arnfjörð Bjarmason
2021-12-06  3:12       ` Han Xin
2021-12-03  9:35   ` [PATCH v4 3/5] object-file.c: read stream in a loop " Han Xin
2021-12-03  9:35   ` [PATCH v4 4/5] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-12-03 13:59     ` Ævar Arnfjörð Bjarmason
2021-12-06  3:20       ` Han Xin
2021-12-03  9:35   ` [PATCH v4 5/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-03 13:07     ` Ævar Arnfjörð Bjarmason
2021-12-07  6:42       ` Han Xin
2021-12-03 13:54     ` Ævar Arnfjörð Bjarmason
2021-12-07  6:17       ` Han Xin
2021-12-03 14:05     ` Ævar Arnfjörð Bjarmason
2021-12-07  6:48       ` Han Xin
2021-11-22  3:32 ` [PATCH v3 1/5] object-file: refactor write_loose_object() to read buffer from stream Han Xin
2021-11-23 23:24   ` Junio C Hamano
2021-11-24  9:00     ` Han Xin
2021-11-22  3:32 ` [PATCH v3 2/5] object-file.c: handle undetermined oid in write_loose_object() Han Xin
2021-11-29 15:10   ` Derrick Stolee
2021-11-29 20:44     ` Junio C Hamano
2021-11-29 22:18       ` Derrick Stolee
2021-11-30  3:23         ` Han Xin
2021-11-22  3:32 ` [PATCH v3 3/5] object-file.c: read stream in a loop " Han Xin
2021-11-22  3:32 ` [PATCH v3 4/5] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-11-22  3:32 ` [PATCH v3 5/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-11-29 17:37   ` Derrick Stolee
2021-11-30 13:49     ` Han Xin
2021-11-30 18:38       ` Derrick Stolee
2021-12-01 20:37         ` "git hyperfine" (was: [PATCH v3 5/5] unpack-objects[...]) Ævar Arnfjörð Bjarmason
2021-12-02  7:33         ` [PATCH v3 5/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-02 13:53           ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover-v11-0.8-00000000000-20220319T001411Z-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=chiyutianyi@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=philipoakley@iee.email \
    --cc=stolee@gmail.com \
    --cc=worldhello.net@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).