* [RFC/PATCH] define the way new representation types are encoded in the pack
@ 2011-10-28  6:04 Junio C Hamano
  2011-10-28  6:12 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Junio C Hamano @ 2011-10-28  6:04 UTC (permalink / raw)
  To: git; +Cc: Nicolas Pitre, Shawn O. Pearce, Jeff King

In addition to four basic types (commit, tree, blob and tag), the pack
stream can encode a few other "representation" types, such as REF_DELTA
and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
we do not have much room to add new representation types in place, but we
do have one value reserved for future expansion.

This patch is about defining how that reserved value is used.

The first byte in the pack stream data consists of the following for the
current representation types:

 - Bits 0-3 hold the low 4 bits of "some" size (not necessarily the
   size of the representation);

 - Bits 4-6 hold object types 0-7, but we have not used type 5 so
   far and have reserved it for future expansion (we could also use type 0
   recorded in the pack stream for future expansion, just as I
   convert 5 into the real "extended" representation type in this patch);

 - Bit 7 signals whether the second byte needs to be read for sizes
   that do not fit in 4 bits.
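For illustration only, a minimal C sketch of how a header for one of the existing representation types could be written under the layout just described, in the spirit of the encoder in pack-objects. The function name `encode_basic_header` is hypothetical, and using `uint64_t` for the size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: write the in-pack header for a basic representation type
 * (1-4, 6, 7). Returns the number of bytes written to hdr, at most
 * 10 for a 64-bit size.
 */
static size_t encode_basic_header(unsigned char *hdr, unsigned type,
				  uint64_t size)
{
	size_t n = 1;
	/* bits 4-6: type; bits 0-3: low 4 bits of the size */
	unsigned char c = (unsigned char)(((type & 7) << 4) | (size & 15));

	size >>= 4;
	while (size) {
		/* bit 7 set: another size byte follows */
		*hdr++ = c | 0x80;
		/* later bytes carry 7 bits each, least significant first */
		c = (unsigned char)(size & 0x7f);
		size >>= 7;
		n++;
	}
	*hdr = c;
	return n;
}
```

For example, a 100-byte blob (type 3) encodes as 0xb4 0x06: the low nibble 4 sits in the first byte next to the type, and the remaining 100 >> 4 = 6 spills into the second byte.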

When bits 4-6 encode type 5, the first byte is used this way:

 - Bits 0-3 denote the real "extended" representation type. Because types
   0-7 can already be encoded without using the extended format, we can
   offset the type by 8 (i.e. if bits 0-3 say 3, it means representation
   type 11 = 3 + 8);

 - Bits 4-6 hold the value "5";

 - Bit 7 signals whether the _third_ byte needs to be read for larger
   sizes that cannot be represented in 8 bits.

As it is unlikely for us to pack things that do not need to record any
size, the second byte is always used in full to encode the low 8 bits of
the size.
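A minimal C sketch of the extended layout proposed above, with an encoder and a decoder that round-trip. It assumes extended types 8..23 (4 bits plus the offset of 8) and checks the buffer length before every read, including the second byte. `encode_ext_header`, `decode_ext_header` and `EXT_MARKER` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_MARKER 5 /* value of bits 4-6; OBJ_EXT in the patch */

/*
 * Encode: byte 1 = [more:1][5:3][type-8:4], byte 2 = low 8 bits of the
 * size, then 7-bit groups (MSB = "another byte follows") from byte 3.
 * Returns bytes written (at most 10), or 0 if type is not 8..23.
 */
static size_t encode_ext_header(unsigned char *hdr, unsigned type,
				uint64_t size)
{
	size_t n = 2;

	if (type < 8 || type > 23)
		return 0;
	hdr[1] = (unsigned char)(size & 0xff);
	size >>= 8;
	/* bit 7 of the FIRST byte says whether a third byte follows */
	hdr[0] = (unsigned char)((EXT_MARKER << 4) | (type - 8) |
				 (size ? 0x80 : 0));
	while (size) {
		unsigned char c = (unsigned char)(size & 0x7f);

		size >>= 7;
		hdr[n++] = (unsigned char)(c | (size ? 0x80 : 0));
	}
	return n;
}

/* Decode, mirroring the patch. Returns bytes used, or 0 on error. */
static size_t decode_ext_header(const unsigned char *buf, size_t len,
				unsigned *type, uint64_t *size)
{
	size_t used = 0;
	unsigned shift = 8;
	unsigned char c;

	if (len < 2 || ((buf[0] >> 4) & 7) != EXT_MARKER)
		return 0;
	c = buf[used++];
	*type = (c & 15) + 8;
	*size = buf[used++];
	while (c & 0x80) {
		if (used >= len || shift >= 64)
			return 0;
		c = buf[used++];
		*size += (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}
	return used;
}
```

With this layout, extended type 11 with size 1000 encodes as 0xd3 0xe8 0x03: bit 7 of the first byte announces the third byte, while the second byte carries all 8 low bits of the size.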

I haven't started using type=8 and upwards for anything yet, but because
we have only one "future expansion" value left, I want us to be extremely
careful to avoid painting ourselves into a corner that we cannot get out
of, so I am sending this out early for a preliminary review.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache.h     |    3 ++-
 sha1_file.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 2e6ad36..b02139b 100644
--- a/cache.h
+++ b/cache.h
@@ -380,9 +380,10 @@ enum object_type {
 	OBJ_TREE = 2,
 	OBJ_BLOB = 3,
 	OBJ_TAG = 4,
-	/* 5 for future expansion */
+	OBJ_EXT = 5, /* 5 for future expansion */
 	OBJ_OFS_DELTA = 6,
 	OBJ_REF_DELTA = 7,
+	OBJ_CAT_TREE = 8,
 	OBJ_ANY,
 	OBJ_MAX
 };
diff --git a/sha1_file.c b/sha1_file.c
index 27f3b9b..4dcd023 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1254,16 +1254,43 @@ static int experimental_loose_object(unsigned char *map)
 }
 
 unsigned long unpack_object_header_buffer(const unsigned char *buf,
-		unsigned long len, enum object_type *type, unsigned long *sizep)
+	unsigned long len, enum object_type *typep, unsigned long *sizep)
 {
 	unsigned shift;
 	unsigned long size, c;
 	unsigned long used = 0;
+	enum object_type type;
 
+	/*
+	 * MSB of the first byte is used to tell if the second byte
+	 * needs to be read for the size, so type field is only 3-bit
+	 * wide.
+	 */
 	c = buf[used++];
-	*type = (c >> 4) & 7;
-	size = c & 15;
-	shift = 4;
+	type = (c >> 4) & 7;
+
+	if (type != OBJ_EXT) {
+		/*
+		 * For basic types of object representations, the low
+		 * 4-bit of the first byte is used for the lowermost
+		 * 4-bit of the size. The MSB of the first byte tells
+		 * if the second byte needs to be read for size.
+		 */
+		size = c & 15;
+		shift = 4;
+	} else {
+		/*
+		 * For extended types, the low 4-bit of the first byte
+		 * is used for the representation type (offset by 8),
+		 * and the size begins at the second byte. The MSB of
+		 * the first byte is still used to indicate the next
+		 * byte (i.e. the third byte) needs to be read for the
+		 * size.
+		 */
+		type = (c & 15) + 8;
+		size = buf[used++];
+		shift = 8;
+	}
 	while (c & 0x80) {
 		if (len <= used || bitsizeof(long) <= shift) {
 			error("bad object header");
@@ -1274,6 +1301,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		shift += 7;
 	}
 	*sizep = size;
+	*typep = type;
 	return used;
 }
 

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-28  6:04 [RFC/PATCH] define the way new representation types are encoded in the pack Junio C Hamano
@ 2011-10-28  6:12 ` Junio C Hamano
  2011-10-30  5:40   ` Nguyen Thai Ngoc Duy
  2011-10-28 13:41 ` Jakub Narebski
  2011-10-28 15:44 ` Shawn Pearce
  2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2011-10-28  6:12 UTC (permalink / raw)
  To: git; +Cc: Nicolas Pitre, Shawn O. Pearce, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> I haven't started using type=8 and upwards for anything yet, but because
> we have only one "future expansion" value left, I want us to be extremely
> careful in order to avoid painting us into a corner that we cannot get out
> of, so I am sending this out early for a preliminary review.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  cache.h     |    3 ++-
>  sha1_file.c |   36 ++++++++++++++++++++++++++++++++----
>  2 files changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 2e6ad36..b02139b 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -380,9 +380,10 @@ enum object_type {
>  	OBJ_TREE = 2,
>  	OBJ_BLOB = 3,
>  	OBJ_TAG = 4,
> -	/* 5 for future expansion */
> +	OBJ_EXT = 5, /* 5 for future expansion */
>  	OBJ_OFS_DELTA = 6,
>  	OBJ_REF_DELTA = 7,
> +	OBJ_CAT_TREE = 8,
>  	OBJ_ANY,
>  	OBJ_MAX
>  };

As people may be able to guess from the name, CAT_TREE is envisioned to
encode large data (primarily of type "blob") by recording the object
name of a tree object and probably the total length, and would represent
the concatenation of all blobs contained in the tree object when the tree
is traversed in some fixed order (e.g. Avery's "bup split"). I am guessing
that the payload for CAT_TREE representation type will be:

 - 20-byte object name for the top-level tree object;

 - type of the basic object (commit, tree, blob, or tag) it represents,
   even though it is unlikely that we would want to record such a large
   commit or tag that needs CAT_TREE representation;

 - the total length of the basic object it represents. Even though it is
   redundant (you could traverse and sum the sizes of blobs contained in
   the tree object), it would help sha1_object_info() and friends. This
   will be the "some size" I mentioned in the previous message for this
   representation type.
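Assuming the three fields guessed above, the redundancy of the total length can be illustrated with a small check: the recorded size must equal the sum of the blob sizes met during the fixed-order traversal. The struct and function are hypothetical, and the traversal is stood in for by an array of part sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the CAT_TREE payload guessed above. */
struct cat_tree_payload {
	unsigned char tree_name[20]; /* object name of the top-level tree */
	int basic_type;              /* commit, tree, blob or tag (1-4) */
	uint64_t total_size;         /* redundant, but cheap for object info */
};

/*
 * Returns 1 if the recorded total equals the sum of the part sizes
 * (standing in for the blobs found by the fixed-order traversal).
 */
static int cat_tree_size_ok(const struct cat_tree_payload *p,
			    const uint64_t *part_sizes, size_t nr_parts)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < nr_parts; i++) {
		if (part_sizes[i] > UINT64_MAX - sum)
			return 0; /* the sum would overflow: cannot match */
		sum += part_sizes[i];
	}
	return sum == p->total_size;
}
```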

We would probably add loose object representation for CAT_TREE, which may
look like:

    "cattree" <size of this cat-tree in decimal> NUL
    <basic object type> <size of the basic object> NUL
    <object name of the top-level tree>
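A sketch of writing the proposed loose "cattree" body (before zlib compression). It assumes single spaces as separators, as in ordinary "<type> <size>" headers, and that the decimal size on the first line counts everything after its NUL; both points, and the name `format_loose_cattree`, are assumptions rather than a settled format.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Layout written (hypothetical):
 *   "cattree <payload len>" NUL "<basic type> <basic size>" NUL <20-byte name>
 * Returns the total number of bytes written, or 0 if out is too small.
 */
static size_t format_loose_cattree(char *out, size_t outsz,
				   const char *basic_type, uint64_t basic_size,
				   const unsigned char tree_name[20])
{
	char inner[64];
	int inner_len, hdr_len;
	size_t payload_len;

	/* payload: "<basic type> <basic size>" NUL, then the tree name */
	inner_len = snprintf(inner, sizeof(inner), "%s %llu",
			     basic_type, (unsigned long long)basic_size);
	if (inner_len < 0 || (size_t)inner_len >= sizeof(inner))
		return 0;
	payload_len = (size_t)inner_len + 1 + 20;

	/* header: "cattree <payload len>" NUL, like "blob <size>" NUL */
	hdr_len = snprintf(out, outsz, "cattree %zu", payload_len);
	if (hdr_len < 0 || (size_t)hdr_len + 1 + payload_len > outsz)
		return 0;
	/* inner_len + 1 also copies the NUL that ends the second line */
	memcpy(out + hdr_len + 1, inner, (size_t)inner_len + 1);
	memcpy(out + hdr_len + 1 + inner_len + 1, tree_name, 20);
	return (size_t)hdr_len + 1 + payload_len;
}
```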

and would need to teach unpack_sha1_file() about it. One caveat is that
we would want to keep the "contents name the object" invariant, so even if
a large blob is expressed as a CAT_TREE, its object name must still be
what we would get by hashing '"blob" <size> NUL <payload>'.
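To illustrate the "contents name the object" invariant, here is a sketch that feeds the canonical header and then every part, in their fixed order, to a hash update callback. `hash_update_fn` is a hypothetical stand-in for a real SHA-1 update function, and `sink_update` merely records the bytes so the stream can be compared with that of a flat blob.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Canonical "<type> <size>" NUL header; length incl. NUL, or 0. */
static size_t canonical_object_header(char *hdr, size_t hdrsz,
				      const char *type, uintmax_t size)
{
	int len = snprintf(hdr, hdrsz, "%s %" PRIuMAX, type, size);

	if (len < 0 || (size_t)len >= hdrsz)
		return 0;
	return (size_t)len + 1; /* the NUL is part of the hashed bytes */
}

/* Hypothetical stand-in for a SHA-1 update function. */
typedef void (*hash_update_fn)(void *ctx, const void *data, size_t len);

/*
 * Feed header + parts to the hash. The byte stream equals the one for
 * the flat blob, so the name does not depend on the representation.
 */
static int feed_split_object(hash_update_fn update, void *ctx,
			     const char *type, const void *const *parts,
			     const size_t *part_lens, size_t nr_parts)
{
	char hdr[64];
	uintmax_t total = 0;
	size_t i, hdrlen;

	for (i = 0; i < nr_parts; i++)
		total += part_lens[i];
	hdrlen = canonical_object_header(hdr, sizeof(hdr), type, total);
	if (!hdrlen)
		return -1;
	update(ctx, hdr, hdrlen);
	for (i = 0; i < nr_parts; i++)
		update(ctx, parts[i], part_lens[i]);
	return 0;
}

/* Demo sink that records the bytes instead of hashing them. */
struct byte_sink {
	unsigned char buf[256];
	size_t len;
};

static void sink_update(void *ctx, const void *data, size_t len)
{
	struct byte_sink *s = ctx;

	if (len > sizeof(s->buf) - s->len)
		len = sizeof(s->buf) - s->len; /* demo only: truncate */
	memcpy(s->buf + s->len, data, len);
	s->len += len;
}
```

Splitting "hello world" into "hello " and "world" still produces exactly the bytes of '"blob" 11 NUL hello world', so the object name is unchanged.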

A loose object file in "cattree" representation will not hash to the value
a naïve implementation would expect, and fsck_sha1() needs to be aware of
it. I haven't thought things through in this area.

Further work would involve (by no means exhaustive, of course):

 - Teach fsck and the connectivity tools that objects reachable from any
   object (even a blob) represented as a CAT_TREE are needed by, and
   reachable from, that object;

 - Teach pack-objects that anything that is represented as a CAT_TREE does
   not need to be deltified (the objects used as its representation would
   go through the usual deltification rules);

 - Teach unpack-objects to expand CAT_TREE representation into a "cattree"
   loose object;

 - Perhaps teach the attributes mechanism to lie to anybody who asks that
   any object in CAT_TREE representation is a binary file to trigger the
   "we do not unnecessarily look at binary" logic in "git diff" machinery.

 - Teach fast-import to write out CAT_TREE representation. This is
   probably the quickest and least-impact way to exploit existing support
   for large files in index_fd().

 - Update grep.c::grep_buffer() to take not <buf,size> but git_istream,
   and rewrite builtin/grep.c::grep_sha1() to use streaming interface.
 

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-28  6:04 [RFC/PATCH] define the way new representation types are encoded in the pack Junio C Hamano
  2011-10-28  6:12 ` Junio C Hamano
@ 2011-10-28 13:41 ` Jakub Narebski
  2011-10-28 15:24   ` Junio C Hamano
  2011-10-28 15:44 ` Shawn Pearce
  2 siblings, 1 reply; 12+ messages in thread
From: Jakub Narebski @ 2011-10-28 13:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nicolas Pitre, Shawn O. Pearce, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> When bit 4-6 encodes type 5, the first byte is used this way:
> 
>  - Bit 0-3 denotes the real "extended" representation type. Because types
>    0-7 can already be encoded without using the extended format, we can
>    offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
>    type 11 = 3 + 8);

Why not use the third byte for that instead?

-- 
Jakub Narębski

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-28 13:41 ` Jakub Narebski
@ 2011-10-28 15:24   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2011-10-28 15:24 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Nicolas Pitre, Shawn O. Pearce, Jeff King

Jakub Narebski <jnareb@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> When bit 4-6 encodes type 5, the first byte is used this way:
>> 
>>  - Bit 0-3 denotes the real "extended" representation type. Because types
>>    0-7 can already be encoded without using the extended format, we can
>>    offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
>>    type 11 = 3 + 8);
>
> Why not use third byte for that instead?

Is it a good enough reason that there is no upside for doing so?

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-28  6:04 [RFC/PATCH] define the way new representation types are encoded in the pack Junio C Hamano
  2011-10-28  6:12 ` Junio C Hamano
  2011-10-28 13:41 ` Jakub Narebski
@ 2011-10-28 15:44 ` Shawn Pearce
  2011-10-28 22:48   ` Nicolas Pitre
  2 siblings, 1 reply; 12+ messages in thread
From: Shawn Pearce @ 2011-10-28 15:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nicolas Pitre, Jeff King

On Thu, Oct 27, 2011 at 23:04, Junio C Hamano <gitster@pobox.com> wrote:
> In addition to four basic types (commit, tree, blob and tag), the pack
> stream can encode a few other "representation" types, such as REF_DELTA
> and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
> we do not have much room to add new representation types in place, but we
> do have one value reserved for future expansion.

We have 2 values reserved, 0 and 5.

> When bit 4-6 encodes type 5, the first byte is used this way:
>
>  - Bit 0-3 denotes the real "extended" representation type. Because types
>   0-7 can already be encoded without using the extended format, we can
>   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
>   type 11 = 3 + 8);
>
>  - Bit 4-6 has the value "5";
>
>  - Bit 7 is used to signal if the _third_ byte needs to be read for larger
>   size that cannot be represented with 8-bit.

This is very complicated. We don't need more complex logic in the pack
encoding. We _especially_ do not need yet another variant of how to
store a variable length integer in the pack file. I'm sorry, but we
already have two different variants and this just adds a third. It is
beyond crazy.

Last time (this is now years ago but whatever) Nico and I discussed
adding a new type to packs, it was for the alternate tree encoding in
"pack v4". Trees happen so often that type code 5 is a good value to
use for these. Later you talked about using the extended type to store
a cattree blob thing, which would not appear nearly as often as a
normal directory listing type tree that was encoded using the pack v4
style encoding... I think saving type 5 for a small frequently
occurring type is a good thing.

> As it is unlikely for us to pack things that do not need to record any
> size, the second byte is always used in full to encode the low 8-bit of
> the size.
>
> I haven't started using type=8 and upwards for anything yet, but because
> we have only one "future expansion" value left, I want us to be extremely
> careful in order to avoid painting us into a corner that we cannot get out
> of, so I am sending this out early for a preliminary review.

I would have said something more like:

When bit 4-6 encodes "0", then:

- Bit 0-3 and bit 7 are used normally to encode a variable length
"size" integer. These may be 0 indicating no size information.

- 2nd-nth bytes store remaining "size" information, if bit 7 was set.

- The immediate next byte encodes the extended type. This type is
stored using the OFS_DELTA offset varint encoding, and thus may be
larger than 256 if we ever need it to be.
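A sketch of a decoder for this alternative, assuming "the OFS_DELTA offset varint encoding" means the one git uses for delta base offsets (7 bits per byte, most significant group first, with 1 added at each continuation). `decode_type0_header` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bits 4-6 == 0; size encoded exactly as for the basic types; then
 * the extended type in OFS_DELTA style. Returns bytes used or 0.
 */
static size_t decode_type0_header(const unsigned char *buf, size_t len,
				  uint64_t *type, uint64_t *size)
{
	size_t used = 0;
	unsigned shift = 4;
	unsigned char c;
	uint64_t t;

	if (len < 1 || ((buf[0] >> 4) & 7) != 0)
		return 0;
	c = buf[used++];
	*size = c & 15;
	while (c & 0x80) {		/* the usual size varint */
		if (used >= len || shift > 57)	/* refuse to drop bits */
			return 0;
		c = buf[used++];
		*size += (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}

	if (used >= len)
		return 0;
	c = buf[used++];		/* extended type, OFS_DELTA style */
	t = c & 0x7f;
	while (c & 0x80) {
		if (used >= len || t > (UINT64_MAX >> 7) - 1)
			return 0;
		c = buf[used++];
		t = ((t + 1) << 7) | (c & 0x7f);
	}
	*type = t;
	return used;
}
```

For instance, extended type 300 would be stored as 0x81 0x2c: ((1 + 1) << 7) | 44 = 300.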

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-28 15:44 ` Shawn Pearce
@ 2011-10-28 22:48   ` Nicolas Pitre
  2011-10-28 23:07     ` Shawn Pearce
  0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2011-10-28 22:48 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git, Jeff King

On Fri, 28 Oct 2011, Shawn Pearce wrote:

> On Thu, Oct 27, 2011 at 23:04, Junio C Hamano <gitster@pobox.com> wrote:
> > In addition to four basic types (commit, tree, blob and tag), the pack
> > stream can encode a few other "representation" types, such as REF_DELTA
> > and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
> > we do not have much room to add new representation types in place, but we
> > do have one value reserved for future expansion.
> 
> We have 2 values reserved, 0 and 5.
> 
> > When bit 4-6 encodes type 5, the first byte is used this way:
> >
> >  - Bit 0-3 denotes the real "extended" representation type. Because types
> >   0-7 can already be encoded without using the extended format, we can
> >   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
> >   type 11 = 3 + 8);
> >
> >  - Bit 4-6 has the value "5";
> >
> >  - Bit 7 is used to signal if the _third_ byte needs to be read for larger
> >   size that cannot be represented with 8-bit.
> 
> This is very complicated. We don't need more complex logic in the pack
> encoding. We _especially_ do not need yet another variant of how to
> store a variable length integer in the pack file. I'm sorry, but we
> already have two different variants and this just adds a third. It is
> beyond crazy.
> 
> Last time (this is now years ago but whatever) Nico and I discussed
> adding a new type to packs it was for the alternate tree encoding in
> "pack v4". Trees happen so often that type code 5 is a good value to
> use for these. Later you talked about using the extended type to store
> a cattree blob thing, which would not appear nearly as often as a
> normal directory listing type tree that was encoded using the pack v4
> style encoding... I think saving type 5 for a small frequently
> occurring type is a good thing.
> 
> > As it is unlikely for us to pack things that do not need to record any
> > size, the second byte is always used in full to encode the low 8-bit of
> > the size.
> >
> > I haven't started using type=8 and upwards for anything yet, but because
> > we have only one "future expansion" value left, I want us to be extremely
> > careful in order to avoid painting us into a corner that we cannot get out
> > of, so I am sending this out early for a preliminary review.
> 
> I would have said something more like:
> 
> When bit 4-6 encodes "0", then:
> 
> - Bit 0-3 and bit 7 are used normally to encode a variable length
> "size" integer. These may be 0 indicating no size information.
> 
> - 2nd-nth bytes store remaining "size" information, if bit 7 was set.
> 
> - The immediate next byte encodes the extended type. This type is
> stored using the OFS_DELTA offset varint encoding, and thus may be
> larger than 256 if we ever need it to be.

I'd say it is just a byte.  No encoding needed.  Let's not be silly 
about it.  If we really have more than 255 object types one day (and I 
really hope this will never happen) then the value 0 in that byte could 
indicate yet another extended object type encoding.  But I truly hope 
we'll have pack v9 or v10 by then and that we'll have obsoleted the 
current 3-bit encoding completely at that point anyway.
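Read literally, this suggestion needs no decoding at all. A sketch, assuming the type byte sits at a known position `pos` after the size bytes and that 0 is only recognized (not yet interpreted) as the escape value; `read_plain_ext_type` is a hypothetical name.

```c
#include <stddef.h>

/*
 * Returns 1 and sets *type for values 1..255, 0 for the escape value
 * (some future encoding, undefined for now), -1 if the byte is missing.
 */
static int read_plain_ext_type(const unsigned char *buf, size_t len,
			       size_t pos, unsigned *type)
{
	if (pos >= len)
		return -1;
	if (buf[pos] == 0)
		return 0;
	*type = buf[pos];
	return 1;
}
```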

For the record, I spent around 20 hours working on pack v4 while in the 
Caribbean for a week last winter as I said I would.  Maybe I'll repeat 
the operation this year.


Nicolas

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-28 22:48   ` Nicolas Pitre
@ 2011-10-28 23:07     ` Shawn Pearce
  2011-10-28 23:30       ` Nicolas Pitre
  0 siblings, 1 reply; 12+ messages in thread
From: Shawn Pearce @ 2011-10-28 23:07 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Jeff King

On Fri, Oct 28, 2011 at 15:48, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Fri, 28 Oct 2011, Shawn Pearce wrote:
>> - The immediate next byte encodes the extended type. This type is
>> stored using the OFS_DELTA offset varint encoding, and thus may be
>> larger than 256 if we ever need it to be.
>
> I'd say it is just a byte.  No encoding needed.  Let's not be silly
> about it.  If we really have more than 255 object types one day (and I
> really hope this will never happen) then the value 0 in that byte could
> indicate yet another extended object type encoding.  But I truly hope
> we'll have pack v9 or v10 by then and that we'll have obsoleted the
> current 3-bit encoding completely at that point anyway.

Yes. I probably wouldn't code the parser to use a varint here. I would
say the extended types stored in this byte must be >= 8, and must be
<= 127. Any values out of this range are unsupported and should be
rejected. We can later reserve the right to set the high bit and
switch to the OFS_DELTA varint encoding if we need that many more
types, and we explicitly define codes 0-7 as illegal if detected here
in the extended byte field.
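The range rule above fits in a few lines. A sketch with hypothetical names, classifying the single extended-type byte exactly as stated here: 8..127 valid, high bit set reserved for a future switch to the varint, 0..7 illegal.

```c
enum ext_type_check { EXT_OK, EXT_RESERVED_VARINT, EXT_ILLEGAL };

static enum ext_type_check check_ext_type_byte(unsigned char b)
{
	if (b & 0x80)
		return EXT_RESERVED_VARINT; /* unsupported today: reject */
	if (b < 8)
		return EXT_ILLEGAL;  /* these fit in bits 4-6 directly */
	return EXT_OK;
}
```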

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-28 23:07     ` Shawn Pearce
@ 2011-10-28 23:30       ` Nicolas Pitre
  0 siblings, 0 replies; 12+ messages in thread
From: Nicolas Pitre @ 2011-10-28 23:30 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git, Jeff King

On Fri, 28 Oct 2011, Shawn Pearce wrote:

> On Fri, Oct 28, 2011 at 15:48, Nicolas Pitre <nico@fluxnic.net> wrote:
> > On Fri, 28 Oct 2011, Shawn Pearce wrote:
> >> - The immediate next byte encodes the extended type. This type is
> >> stored using the OFS_DELTA offset varint encoding, and thus may be
> >> larger than 256 if we ever need it to be.
> >
> > I'd say it is just a byte.  No encoding needed.  Let's not be silly
> > about it.  If we really have more than 255 object types one day (and I
> > really hope this will never happen) then the value 0 in that byte could
> > indicate yet another extended object type encoding.  But I truly hope
> > we'll have pack v9 or v10 by then and that we'll have obsoleted the
> > current 3-bit encoding completely at that point anyway.
> 
> Yes. I probably wouldn't code the parser to use a varint here. I would
> say the extended types stored in this byte must be >= 8, and must be
> <= 127. Any values out of this range are unsupported and should be
> rejected. We can later reserve the right to set the high bit and
> switch to the OFS_DELTA varint encoding if we need that many more
> types, and we explicitly define codes 0-7 as illegal if detected here
> in the extended byte field.

I wouldn't go as far as rejecting codes 1-7 as illegal though, but I 
otherwise agree with what you say.


Nicolas

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-28  6:12 ` Junio C Hamano
@ 2011-10-30  5:40   ` Nguyen Thai Ngoc Duy
  2011-10-30  7:11     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-10-30  5:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nicolas Pitre, Shawn O. Pearce, Jeff King

On Fri, Oct 28, 2011 at 1:12 PM, Junio C Hamano <gitster@pobox.com> wrote:
> As people may be able to guess from the name, CAT_TREE is envisioned to
> encode a large data (primarily of type "blob") by recording the object
> name of a tree object and probably the total length, and would represent
> the concatenation of all blobs contained in the tree object when the tree
> is traversed in some fixed order (e.g. Avery's "bup split"). I am guessing
> that the payload for CAT_TREE representation type will be:
>
>  - 20-byte object name for the top-level tree object;

Because all blobs in this tree object must be in a fixed order, and
they won't likely have meaningful names or permissions, should the
CAT_TREE payload be a sequence of the SHA-1s of all blobs (or cat-trees
if we want nested trees) instead? IOW the tree is integrated into the
cat-tree object, not kept as a separate tree object.
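If the payload were just the part names back to back, locating part i would be simple arithmetic. A sketch assuming raw 20-byte SHA-1 names with no count or header; `split_nr_parts`, `split_part_name` and `RAW_SHA1_LEN` are hypothetical.

```c
#include <stddef.h>

#define RAW_SHA1_LEN 20

/* Number of part names in the payload, or 0 if it is malformed. */
static size_t split_nr_parts(size_t payload_len)
{
	return payload_len % RAW_SHA1_LEN ? 0 : payload_len / RAW_SHA1_LEN;
}

/* Name of part i (in concatenation order), or NULL if out of range. */
static const unsigned char *split_part_name(const unsigned char *payload,
					    size_t payload_len, size_t i)
{
	if (i >= split_nr_parts(payload_len))
		return NULL;
	return payload + i * RAW_SHA1_LEN;
}
```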

>  - type of the basic object (commit, tree, blob, or tag) it represents,
>   even though it is unlikely that we would want to record such a large
>   commit or tag that needs CAT_TREE representation;
>
>  - the total length of the basic object it represents, even though it is
>   redundant (you could traverse and sum the sizes of blobs contained in
>   the tree object), it would help sha1_object_info() and friends. This
>   will be the "some size" I mentioned in the previous message for this
>   representation type.

Not sure if it's related to representation types, but is there any way
(perhaps a FLAT_BLOB type?) we can mark an object as uncompressed, so we
can mmap() and access it directly?
-- 
Duy

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-30  5:40   ` Nguyen Thai Ngoc Duy
@ 2011-10-30  7:11     ` Junio C Hamano
  2011-10-30  7:13       ` Junio C Hamano
  2011-10-30  9:31       ` Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2011-10-30  7:11 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Nicolas Pitre, Shawn O. Pearce, Jeff King

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> Because all blobs in this tree object must be in a fixed order, and
> they won't likely have meaningful names nor permission, should
> CAT_TREE payload is a SHA-1 sequence of all blobs (or cat-trees if we
> want nested trees) instead? IOW the tree is integrated into cat-tree
> object, not as a separate tree object.

I have no problem with that (I am not worried about minor details of the
actual implementation of cat-tree yet).

> Not sure if it's related to representation types, but is there any way
> (perhaps FLAT_BLOB type?) we can mark an object uncompressed, so we
> can mmap() and access it directly?

In pack? Loose? Both?

What kind of payload and use case do you have in mind?

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-30  7:11     ` Junio C Hamano
@ 2011-10-30  7:13       ` Junio C Hamano
  2011-10-30  9:31       ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2011-10-30  7:13 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Nicolas Pitre, Shawn O. Pearce, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>> Because all blobs in this tree object must be in a fixed order, and
>> they won't likely have meaningful names nor permission, should
>> CAT_TREE payload is a SHA-1 sequence of all blobs (or cat-trees if we
>> want nested trees) instead? IOW the tree is integrated into cat-tree
>> object, not as a separate tree object.
>
> I have no problem with that (I am not worried about minor details of the
> actual implementation of cat-tree yet).

Side note. It should be renamed "split-object" or something if we go the
route you suggest, as "tree"-ness of the actual representation is not
essential.

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
  2011-10-30  7:11     ` Junio C Hamano
  2011-10-30  7:13       ` Junio C Hamano
@ 2011-10-30  9:31       ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 12+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-10-30  9:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nicolas Pitre, Shawn O. Pearce, Jeff King

On Sun, Oct 30, 2011 at 2:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Not sure if it's related to representation types, but is there any way
>> (perhaps FLAT_BLOB type?) we can mark an object uncompressed, so we
>> can mmap() and access it directly?
>
> In pack? Loose? Both?
>
> What kind of payload and use case do you have in mind?

Hmm.. big files in general with partial file transfer (e.g. git
torrent). But if CAT_TREE is properly used, then all blob pieces
should be reasonably small and decompress time should not be a
problem. I think we won't need this. Sorry for the noise.
-- 
Duy
