Understanding version 4 packs

* Understanding version 4 packs
@ 2007-03-24 20:23 Peter Eriksen
  2007-03-24 23:24 ` Nicolas Pitre
  2007-03-25  8:46 ` Shawn O. Pearce
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Eriksen @ 2007-03-24 20:23 UTC
  To: git

Hello Shawn (and Nicolas and other interested parties),

I have been reading the commits in the
git://repo.or.cz/git/fastimport.git/ repository (git makes it quite easy
to see what differs from mainline using "git log master..pack4"), and I
think I have understood some of the details.

The easiest part to understand was the file name table, which is placed
at the beginning of the pack (right after the header) using this format:

+------------+-------------------------------+
| NR_ENTRIES |  Compressed file name table   |
+------------+-------------------------------+
   4 bytes
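A minimal C sketch of splitting this header off, assuming NR_ENTRIES is stored in network (big-endian) byte order like the other pack header fields. The struct and function names are hypothetical. Inflating the compressed table (presumably with zlib, as for other pack data) is left out; the compressed stream's own end marker would tell where it stops.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the file name table header described above. */
struct name_table_header {
	uint32_t nr_entries;           /* NR_ENTRIES */
	const unsigned char *payload;  /* start of the compressed table */
	size_t payload_len;            /* bytes left in the buffer */
};

/* Read a 32-bit value, assumed to be big-endian. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Split the region that starts right after the pack header.
 * Returns 0 on success, -1 if the buffer cannot hold NR_ENTRIES. */
static int parse_name_table_header(const unsigned char *buf, size_t len,
				   struct name_table_header *out)
{
	if (len < 4)
		return -1;
	out->nr_entries = get_be32(buf);
	out->payload = buf + 4;
	out->payload_len = len - 4;
	return 0;
}
```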

The uncompressed file name table contains NR_ENTRIES entries,
and looks like this:

+------+--------------+------+------------------------+----
| MODE |  Full path 1 | MODE |   Full path 2          | ...
+------+--------------+------+------------------------+----
 2 bytes   n1 bytes    2 bytes     n2 bytes     
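A sketch of walking the inflated table in C. The diagram does not say how a path's length is known, so this assumes each path is NUL-terminated and that MODE is big-endian; both are guesses, and the names are hypothetical. Note that the usual tree modes (for example 0100644, 0100755, 040000) all fit in 2 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct name_entry {
	uint16_t mode;     /* e.g. 0100644 == 0x81a4 */
	const char *path;  /* points into the table buffer */
};

/* Parse nr entries from an inflated name table.
 * Assumed layout per entry: 2-byte big-endian mode, NUL-terminated path.
 * Returns the number of entries parsed, or -1 if the buffer ends
 * in the middle of an entry. */
static int parse_name_table(const unsigned char *buf, size_t len,
			    struct name_entry *out, int nr)
{
	size_t pos = 0;
	int i;

	for (i = 0; i < nr; i++) {
		const unsigned char *nul;

		if (len - pos < 2)
			return -1;
		out[i].mode = (uint16_t)((buf[pos] << 8) | buf[pos + 1]);
		pos += 2;
		nul = memchr(buf + pos, '\0', len - pos);
		if (!nul)
			return -1;
		out[i].path = (const char *)(buf + pos);
		pos = (size_t)(nul - buf) + 1;  /* skip past the NUL */
	}
	return i;
}
```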

The table is sorted by path, then by mode, both to allow binary search
and so that pointers into the table can be compared directly instead of
comparing the corresponding paths and modes.
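The binary lookup this ordering allows can be sketched in C as below, assuming paths are compared bytewise with `strcmp` and ties are broken by the numeric mode. The entry struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct name_entry {
	uint16_t mode;
	const char *path;
};

/* Assumed table order: path first (bytewise), then mode. */
static int name_entry_cmp(const char *path, uint16_t mode,
			  const struct name_entry *e)
{
	int c = strcmp(path, e->path);

	if (c)
		return c;
	return (mode > e->mode) - (mode < e->mode);
}

/* Binary search over a table sorted by name_entry_cmp.
 * Returns the index of (path, mode), or -1 if it is absent. */
static int find_name(const struct name_entry *tbl, int nr,
		     const char *path, uint16_t mode)
{
	int lo = 0, hi = nr;  /* search window is [lo, hi) */

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int c = name_entry_cmp(path, mode, &tbl[mid]);

		if (!c)
			return mid;
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

Because each (path, mode) pair appears exactly once in the sorted table, its index (or byte offset) both identifies it and preserves its order, so two references can be compared as plain integers.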

There is a new tree type called OBJ_DICT_TREE, which looks something
like the following:

+-----------------+------------------------------------------------+----
|  Table offset   |  SHA-1 of the blob corresponding to the path.  | ...
+-----------------+------------------------------------------------+----
      6 bytes                     20 bytes
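Since every entry in this layout has the same size (6 + 20 = 26 bytes), entry i can be read directly at byte 26 * i. A C sketch, assuming the 6-byte table offset is big-endian; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DICT_TREE_ENTRY_SIZE 26  /* 6-byte table offset + 20-byte SHA-1 */

struct dict_tree_entry {
	uint64_t name_offset;    /* offset into the file name table */
	unsigned char sha1[20];  /* object the entry points at */
};

/* Decode entry number idx of an OBJ_DICT_TREE body laid out as above.
 * Returns 0, or -1 if the entry would run past the end of the buffer. */
static int get_dict_tree_entry(const unsigned char *buf, size_t len,
			       size_t idx, struct dict_tree_entry *out)
{
	const unsigned char *p;
	uint64_t off = 0;
	int i;

	if (idx >= len / DICT_TREE_ENTRY_SIZE)
		return -1;
	p = buf + idx * DICT_TREE_ENTRY_SIZE;
	for (i = 0; i < 6; i++)  /* assemble the 48-bit big-endian offset */
		off = (off << 8) | p[i];
	out->name_offset = off;
	memcpy(out->sha1, p + 6, 20);
	return 0;
}
```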

These new tree objects remain uncompressed in the pack file, but are
sorted with, and deltaed against, other tree objects. All normal tree
objects are converted to OBJ_DICT_TREE when packing, and are converted
back on the fly for callers that need an ordinary OBJ_TREE.
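The per-entry step of that conversion back can be sketched in C: after the mode and name are recovered from the file name table, each entry is written in the canonical tree format, "<octal mode> <name>", a NUL, then the 20 raw SHA-1 bytes (git writes modes without a leading zero, e.g. "100644" and "40000"). The function name is hypothetical, and this assumes the string stored in the table is the name that belongs in the tree entry.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write one canonical tree entry to out.
 * Returns the number of bytes written, or -1 if out is too small. */
static int write_tree_entry(unsigned char *out, size_t cap, uint16_t mode,
			    const char *name, const unsigned char sha1[20])
{
	char head[8];  /* up to 6 octal digits, a space, and the NUL */
	int hlen = snprintf(head, sizeof head, "%o ", (unsigned)mode);
	size_t nlen = strlen(name);
	size_t total = (size_t)hlen + nlen + 1 + 20;

	if (total > cap)
		return -1;
	memcpy(out, head, (size_t)hlen);
	memcpy(out + hlen, name, nlen + 1);  /* copies the NUL too */
	memcpy(out + hlen + nlen + 1, sha1, 20);
	return (int)total;
}
```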

The index (.idx) files are extended with a 4-byte pointer holding the
offset of the file name table in the pack file, for easy lookup.

There is something similar for commits: a table of common strings in
commit objects (e.g. author and timezone), and a new object type
OBJ_DICT_COMMIT, but I have not quite understood that yet.

Is there anything I have gotten wrong in my understanding?

Regards,

Peter
