* [PATCH 00/11] writing out a huge blob to working tree
@ 2011-05-16  0:30 Junio C Hamano
From: Junio C Hamano @ 2011-05-16  0:30 UTC (permalink / raw)
  To: git

Traditionally, git always read the full contents of an object into memory
before performing various operations on it, e.g. comparing for diff,
writing it to the working tree, etc.  A huge blob that you cannot fit
in memory was very cumbersome to handle.

Recently "diff" learned to avoid reading the contents only to say "Binary
files differ" when these large blobs are marked as binary. Also there is a
topic cooking to teach "git add" to stream a large file directly to a
packfile without keeping the whole thing in core.

The "checkout" codepath is to learn the trick next, and this is the series
to attempt to do so.  These would apply cleanly on top of three other
topics still in 'next' or 'pu', namely:

 - jc/convert that cleans up the conversion;
 - jc/replacing that cleans up the object replacement;
 - jc/bigfile that teaches "git add" to handle large files.

Patches 1 and 5 are trivial clean-ups and refactoring. These could be
separated out of the series and applied much earlier, but nothing other
than this series directly benefits from these changes, so they are here in
the series.

Patches 2, 3, and 4 enhance the sha1_file layer.

Patch 6 introduces a new API that takes an object name and gives back a
"handle" (think: FILE *) from which you can read the contents of the
object. The implementation at this step is deliberately kept simple: it
just calls read_sha1_file() to read everything into memory.
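
To make the "handle" idea concrete, here is a minimal sketch of such an
interface with only an in-core backend. It is not the code from the patch:
the names obj_stream, open_incore_stream() and drain_stream() are
hypothetical, and the object lookup is replaced by passing the contents
in directly so that the example stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stream handle: callers only use read() and close(). */
struct obj_stream {
	/* Copies up to len bytes into buf; returns 0 at end of data. */
	size_t (*read)(struct obj_stream *st, char *buf, size_t len);
	void (*close)(struct obj_stream *st);
	/* State used by the in-core backend only. */
	char *data;
	size_t size;
	size_t pos;
};

static size_t incore_read(struct obj_stream *st, char *buf, size_t len)
{
	size_t left = st->size - st->pos;

	if (len > left)
		len = left;
	memcpy(buf, st->data + st->pos, len);
	st->pos += len;
	return len;
}

static void incore_close(struct obj_stream *st)
{
	free(st->data);
	free(st);
}

/*
 * Assumption: a real backend would look the object up by name and read
 * it whole; this stand-in simply copies the caller's buffer.
 */
struct obj_stream *open_incore_stream(const char *contents, size_t size)
{
	struct obj_stream *st = malloc(sizeof(*st));

	if (!st)
		return NULL;
	st->data = malloc(size ? size : 1);
	if (!st->data) {
		free(st);
		return NULL;
	}
	memcpy(st->data, contents, size);
	st->size = size;
	st->pos = 0;
	st->read = incore_read;
	st->close = incore_close;
	return st;
}

/* Drain a stream in small chunks, the way a write-out caller might. */
size_t drain_stream(struct obj_stream *st, char *out, size_t out_size)
{
	char chunk[4];
	size_t total = 0, n;

	while ((n = st->read(st, chunk, sizeof(chunk))) > 0) {
		assert(total + n <= out_size);
		memcpy(out + total, chunk, n);
		total += n;
	}
	return total;
}
```

Because the caller goes through the function pointers only, a backend that
reads incrementally could later be swapped in without touching the caller.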

Patch 7 then uses the new API in the "git checkout" codepath, namely, in
the entry.c::write_entry() function.  At this point, any blob that does not
require smudge filters (including crlf conversion) passes through this
new codepath and uses the 'incore' case of the streaming API, which means
that (1) the "hold everything in memory and process" limitation is not
lifted yet, and that (2) any breakage detected here would mean that either
the simple 'incore' implementation of the streaming API is broken (not
likely), or its caller streaming_write_entry() is broken (more likely).

Patch 8 teaches the new write-out codepath to detect and make holes in the
resulting file. This is primarily meant to help testing: when you add a
large test file that weighs 1GB with "git add" (see how it is done in the
test t/t1050-large.sh on the jc/bigfile topic) and check it out, you do not
want to end up with a 1GB file fully populated with real blocks in your
working tree.
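
One common way to make holes, sketched below under assumptions, is to skip
over all-zero chunks with lseek() instead of writing them, and to write a
single NUL at the very end if the data finishes with a run of zeros so that
the file gets its full length. The names write_sparse() and
sparse_roundtrip_ok() and the 4096-byte chunk size are illustrative, not
taken from the patch, and whether real holes result depends on the
filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096	/* illustrative chunk size, an assumption */

static int all_zero(const char *p, size_t n)
{
	while (n--)
		if (*p++)
			return 0;
	return 1;
}

/* Write n bytes, retrying on short writes (EINTR not handled here). */
static int write_all(int fd, const char *p, size_t n)
{
	while (n) {
		ssize_t w = write(fd, p, n);
		if (w < 0)
			return -1;
		p += w;
		n -= (size_t)w;
	}
	return 0;
}

/* Write data to a seekable fd, seeking over all-zero chunks. */
int write_sparse(int fd, const char *data, size_t len)
{
	off_t pending = 0;	/* zero bytes skipped but not yet seeked over */
	size_t off = 0;

	while (off < len) {
		size_t n = len - off < CHUNK ? len - off : CHUNK;

		if (all_zero(data + off, n)) {
			pending += (off_t)n;
		} else {
			if (pending && lseek(fd, pending, SEEK_CUR) == (off_t)-1)
				return -1;
			pending = 0;
			if (write_all(fd, data + off, n) < 0)
				return -1;
		}
		off += n;
	}
	/* A trailing hole needs one real byte to extend the file. */
	if (pending) {
		if (lseek(fd, pending - 1, SEEK_CUR) == (off_t)-1)
			return -1;
		if (write_all(fd, "", 1) < 0)
			return -1;
	}
	return 0;
}

/* Usage check: one chunk of data followed by two chunks of zeros. */
int sparse_roundtrip_ok(void)
{
	static char in[3 * CHUNK], out[3 * CHUNK];
	char path[] = "/tmp/sparse-XXXXXX";
	int fd = mkstemp(path);
	int ok;

	if (fd < 0)
		return 0;
	unlink(path);
	memset(in, 'x', CHUNK);	/* the rest of in[] stays zero */
	ok = write_sparse(fd, in, sizeof(in)) == 0 &&
	     lseek(fd, 0, SEEK_SET) == 0 &&
	     read(fd, out, sizeof(out)) == (ssize_t)sizeof(out) &&
	     memcmp(in, out, sizeof(in)) == 0;
	close(fd);
	return ok;
}
```

Reading the file back returns zeros for the skipped ranges, so the contents
are identical to a fully written file even when fewer blocks are allocated.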

Patch 9 teaches the streaming API how to read a non-delta object directly
from a packfile, without holding the entire result in memory. This is
the representation the jc/bigfile topic creates for a huge file, and the
primary interest of this topic.

Patches 10 and 11 teach the streaming API how to read a loose object
without holding the entire result in memory. This is not strictly
necessary for the purpose of handling the output from jc/bigfile, but not
having to hold everything in core may by itself be a plus.

Interested parties may want to measure the performance impact of the last
three patches. The series deliberately ignores core.bigFileThreshold and
lets small and large blobs alike go through the streaming_write_entry()
codepath, but it _might_ turn out that we would want to use the new code
only for large-ish blobs.
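
If measurements did favor that alternative, the dispatch could be as small
as the following sketch. This is not what the series does (it streams
everything); pick_write_path() and the enum are hypothetical names, and the
threshold is a plain parameter rather than a value read from the
configuration.

```c
#include <assert.h>

enum write_path { WRITE_INCORE, WRITE_STREAMING };

/*
 * Hypothetical helper: stream only blobs at or above the threshold,
 * and keep the read-everything-into-memory path for smaller ones.
 */
enum write_path pick_write_path(unsigned long blob_size,
				unsigned long threshold)
{
	assert(threshold > 0);
	return blob_size >= threshold ? WRITE_STREAMING : WRITE_INCORE;
}
```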


Junio C Hamano (11):
  packed_object_info_detail(): do not return a string
  sha1_object_info_extended(): expose a bit more info
  sha1_object_info_extended(): hint about objects in delta-base cache
  unpack_object_header(): make it public
  write_entry(): separate two helper functions out
  streaming: a new API to read from the object store
  streaming_write_entry(): use streaming API in write_entry()
  streaming_write_entry(): support files with holes
  streaming: read non-delta incrementally from a pack
  sha1_file.c: expose helpers to read loose objects
  streaming: read loose objects incrementally

 Makefile              |    2 +
 builtin/verify-pack.c |    4 +-
 cache.h               |   36 +++++-
 convert.c             |   23 +++
 entry.c               |  111 ++++++++++++---
 sha1_file.c           |   71 ++++++++--
 streaming.c           |  376 +++++++++++++++++++++++++++++++++++++++++++++++++
 streaming.h           |   12 ++
 8 files changed, 600 insertions(+), 35 deletions(-)
 create mode 100644 streaming.c
 create mode 100644 streaming.h

-- 
1.7.5.1.365.g32b65

Thread overview: 49+ messages
2011-05-16  0:30 [PATCH 00/11] writing out a huge blob to working tree Junio C Hamano
2011-05-16  0:30 ` [PATCH 01/11] packed_object_info_detail(): do not return a string Junio C Hamano
2011-05-17  0:45   ` Thiago Farina
2011-05-17  2:36     ` Junio C Hamano
2011-05-16  0:30 ` [PATCH 02/11] sha1_object_info_extended(): expose a bit more info Junio C Hamano
2011-05-16  0:30 ` [PATCH 03/11] sha1_object_info_extended(): hint about objects in delta-base cache Junio C Hamano
2011-05-16  0:40   ` Shawn Pearce
2011-05-16  0:30 ` [PATCH 04/11] unpack_object_header(): make it public Junio C Hamano
2011-05-16  0:30 ` [PATCH 05/11] write_entry(): separate two helper functions out Junio C Hamano
2011-05-16  0:30 ` [PATCH 06/11] streaming: a new API to read from the object store Junio C Hamano
2011-05-18  8:09   ` Jeff King
2011-05-19  1:52     ` Junio C Hamano
2011-05-16  0:30 ` [PATCH 07/11] streaming_write_entry(): use streaming API in write_entry() Junio C Hamano
2011-05-16  0:30 ` [PATCH 08/11] streaming_write_entry(): support files with holes Junio C Hamano
2011-05-16 10:53   ` Nguyen Thai Ngoc Duy
2011-05-16 14:39     ` Junio C Hamano
2011-05-17  1:18       ` Nguyen Thai Ngoc Duy
2011-05-17  5:23         ` Junio C Hamano
2011-05-16 13:03   ` Thiago Farina
2011-05-16  0:30 ` [PATCH 09/11] streaming: read non-delta incrementally from a pack Junio C Hamano
2011-05-16  0:58   ` Shawn Pearce
2011-05-16  5:00     ` Junio C Hamano
2011-05-16  0:30 ` [PATCH 10/11] sha1_file.c: expose helpers to read loose objects Junio C Hamano
2011-05-16  0:30 ` [PATCH 11/11] streaming: read loose objects incrementally Junio C Hamano
2011-05-16  0:47 ` [PATCH 00/11] writing out a huge blob to working tree Shawn Pearce
2011-05-18  6:41 ` Jeff King
2011-05-18  7:08   ` Jeff King
2011-05-18  7:50     ` Jeff King
2011-05-18 15:12       ` Junio C Hamano
2011-05-18  8:17 ` Jeff King
2011-05-19 21:33 ` [PATCH v2 " Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 01/11] packed_object_info_detail(): do not return a string Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 02/11] sha1_object_info_extended(): expose a bit more info Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 03/11] sha1_object_info_extended(): hint about objects in delta-base cache Junio C Hamano
2011-05-20 23:05     ` René Scharfe
2011-05-21  1:49       ` Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 04/11] unpack_object_header(): make it public Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 05/11] write_entry(): separate two helper functions out Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 06/11] streaming: a new API to read from the object store Junio C Hamano
2011-05-20 23:05     ` René Scharfe
2011-05-21  1:49       ` Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 07/11] streaming_write_entry(): use streaming API in write_entry() Junio C Hamano
2011-05-20 22:52     ` Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 08/11] streaming_write_entry(): support files with holes Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 09/11] streaming: read non-delta incrementally from a pack Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 10/11] sha1_file.c: expose helpers to read loose objects Junio C Hamano
2011-05-19 21:33   ` [PATCH v2 11/11] streaming: read loose objects incrementally Junio C Hamano
2011-05-19 21:44   ` [Not A PATCH v2 02/11] interdiff Junio C Hamano
2011-05-19 22:21   ` [PATCH v2 00/11] writing out a huge blob to working tree Jeff King