From: Jeff Hostetler <git@jeffhostetler.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Alex Vandiver" <alexmv@dropbox.com>,
	git@vger.kernel.org, jonathantanmy@google.com, bmwill@google.com,
	stolee@gmail.com, sbeller@google.com, peff@peff.net,
	johannes.schindelin@gmx.de,
	"Michael Haggerty" <mhagger@alum.mit.edu>
Subject: Re: Including object type and size in object id (Re: Git Merge contributor summit notes)
Date: Mon, 26 Mar 2018 17:42:15 -0400
Message-ID: <b7b6d617-1951-5934-5b1d-bb1a300006ef@jeffhostetler.com>
In-Reply-To: <20180326210039.GB21735@aiede.svl.corp.google.com>



On 3/26/2018 5:00 PM, Jonathan Nieder wrote:
> Jeff Hostetler wrote:
> [long quote snipped]
> 
>> While we are converting to a new hash function, it would be nice
>> if we could add a couple of fields to the end of the OID:  the object
>> type and the raw uncompressed object size.
>>
>> It would be nice if we could extend the OID to include 6 bytes of data
>> (4 or 8 bits for the type and the rest for the raw object size), and
>> just say that an OID is a {hash,type,size} tuple.
>>
>> There are lots of places where we open an object to see what type it is
>> or how big it is.  This requires uncompressing/undeltafying the object
>> (or at least decoding enough to get the header).  In the case of missing
>> objects (partial clone or a gvfs-like projection) it requires either
>> dynamically fetching the object or asking an object-size-server for the
>> data.
>>
>> All of these cases could be eliminated if the type/size were available
>> in the OID.
> 
> This implies a limit on the object size (e.g. 5 bytes in your
> example).  What happens when someone wants to encode an object larger
> than that limit?

I could say add a full uint64 to the tail end of the hash, but
we don't currently handle blobs/objects larger than 4GB
anyway, right?

5 bytes for the size is just a compromise -- 1TB blobs would be
terrible to think about...
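
For reference, the arithmetic behind that compromise can be sketched
as follows. This is only an illustration in Python; the helper name
max_size_for_field is made up here and is not anything in git.

```python
# Hypothetical helper: largest object size (in bytes) that fits in an
# unsigned size field that is `nbytes` wide.
def max_size_for_field(nbytes: int) -> int:
    return (1 << (8 * nbytes)) - 1


GIB = 1 << 30
TIB = 1 << 40

# A 4-byte field tops out just under 4 GiB, which matches the limit
# mentioned above; a 5-byte field tops out just under 1 TiB.
four_byte_limit = max_size_for_field(4)
five_byte_limit = max_size_for_field(5)

print(four_byte_limit == 4 * GIB - 1)
print(five_byte_limit == TIB - 1)
```

So going from 4 to 5 bytes raises the ceiling by a factor of 256, and
a full uint64 would only matter for objects of 1 TiB or more.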
  
> 
> This also decreases the number of bits available for the hash, but
> that shouldn't be a big issue.

I was suggesting extending the OIDs by 6 bytes while we are changing
the hash function.
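
A minimal sketch of what such a {hash,type,size} OID could look like,
assuming a 1-byte type and a 5-byte big-endian size appended to the
hash. Everything here is hypothetical: the function names, the byte
order, the type codes, and the use of SHA-256 as a stand-in for
whatever new hash gets picked. Only the "type size\0" header that git
hashes in front of the content reflects how objects are hashed today.

```python
import hashlib

HASH_LEN = 32      # assumption: a 256-bit hash (SHA-256 as a stand-in)
TRAILER_LEN = 6    # 1 byte of type + 5 bytes of size, as proposed
MAX_SIZE = (1 << 40) - 1

# Illustrative type codes; any stable numbering would do.
TYPE_CODES = {"commit": 1, "tree": 2, "blob": 3, "tag": 4}
TYPE_NAMES = {code: name for name, code in TYPE_CODES.items()}


def make_extended_oid(obj_type: str, data: bytes) -> bytes:
    """Return hash || type || size for one object."""
    size = len(data)
    if size > MAX_SIZE:
        raise ValueError("object too large for a 5-byte size field")
    # git hashes "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {size}".encode() + b"\0"
    digest = hashlib.sha256(header + data).digest()
    trailer = bytes([TYPE_CODES[obj_type]]) + size.to_bytes(5, "big")
    return digest + trailer


def split_extended_oid(oid: bytes):
    """Recover (hash, type, size) without opening the object."""
    if len(oid) != HASH_LEN + TRAILER_LEN:
        raise ValueError("unexpected OID length")
    digest, trailer = oid[:HASH_LEN], oid[HASH_LEN:]
    return digest, TYPE_NAMES[trailer[0]], int.from_bytes(trailer[1:], "big")


oid = make_extended_oid("blob", b"hello\n")
digest, obj_type, size = split_extended_oid(oid)
```

With this layout the hash keeps its full width, and the type and size
can be read straight from the OID, with no need to inflate the object
or fetch a missing one.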

> Aside from those two, I don't see any downsides.  It would mean that
> tree objects contain information about the sizes of blobs contained
> there, which helps with virtual file systems.  It's also possible to
> do that without putting the size in the object id, but maybe having it
> in the object id is simpler.
> 
> Will think more about this.
> 
> Thanks for the idea,
> Jonathan
> 

Thanks
Jeff

