All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Ben Peart <peartben@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Jonathan Nieder <jrnieder@gmail.com>,
	christian.couder@gmail.com
Subject: Re: Partial clone design (with connectivity check for locally-created objects)
Date: Mon, 7 Aug 2017 16:10:31 -0700	[thread overview]
Message-ID: <20170807161031.7c4eae50@twelve2.svl.corp.google.com> (raw)
In-Reply-To: <0633771f-ce19-6211-fabe-3f7f676e53ab@gmail.com>

On Mon, 7 Aug 2017 15:12:11 -0400
Ben Peart <peartben@gmail.com> wrote:

> I missed the offline discussion and so am trying to piece together what 
> this latest design is trying to do.  Please let me know if I'm not 
> understanding something correctly.
> 
>  From what I can tell, objects are going to be segmented into two 
> "types" - those that were fetched from a remote source that allows 
> partial clones/fetches (lazyobject/imported) and those that come from 
> "regular" remote sources (homegrown) that requires all objects to exist 
> locally.
> 
> FWIW, the names here are not making things clearer for me. If I'm 
> correct perhaps "partial" and "normal" would be better to indicate the 
> type of the source? Anyway...

That's right. As for names, I'm leaning now towards "imported" and
"non-imported". "Partial" is a bit strange because such an object is
fully available; it's just that the objects that it references are
promised by the server.

> Once the objects are segmented into the 2 types, the fsck connectivity 
> check code is updated to ignore missing objects from "partial" remotes 
> but still expect/validate them from "normal" remotes.
> 
> This compromise seems reasonable - don't generate errors for missing 
> objects for remotes that returned a partial clone but do generate errors 
> for missing objects from normal clones as a missing object is always an 
> error in this case.

Yes. In addition, the references of "imported" objects are also
potentially used when connectivity-checking "non-imported" objects - if
a "non-imported" object refers to an object that an "imported" object
refers to, that is fine, even though we don't have that object.

> This segmentation is what is driving the need for the object loader to 
> build a new local pack file for every command that has to fetch a 
> missing object.  For example, we can't just write a tree object from a 
> "partial" clone into the loose object store as we have no way for fsck 
> to treat them differently and ignore any missing objects referenced by 
> that tree object.
> 
> My concern with this proposal is the combination of 1) writing a new 
> pack file for every git command that ends up bringing down a missing 
> object and 2) gc not compressing those pack files into a single pack file.
> 
> We all know that git doesn't scale well with a lot of pack files as it 
> has to do a linear search through all the pack files when attempting to 
> find an object.  I can see that very quickly, there would be a lot of 
> pack files generated and with gc ignoring "partial" pack files, this 
> would never get corrected.
> 
> In our usage scenarios, _all_ of the objects come from "partial" clones 
> so all of our objects would end up in a series of "partial" pack files 
> and would have pretty poor performance as a result.

One possible solution...would support for annotating loose objects with
".remote" be sufficient? (That is, for each loose object file created,
create another of the same name but with ".remote" appended.) This means
that a loose-object-creating lazy loader would need to create 2 files
per object instead of one.

The lazy loader protocol will thus be updated to something resembling a
prior version with the loader writing objects directly to the object
database, but now the loader is also responsible for creating the
".remote" files.  (In the Android use case, we probably won't need the
writing-to-partial-packfile mechanism anymore since only comparatively
few and large blobs will go in there.)

  parent reply	other threads:[~2017-08-07 23:10 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-04 21:51 Partial clone design (with connectivity check for locally-created objects) Jonathan Tan
2017-08-04 22:51 ` Junio C Hamano
2017-08-05  0:21   ` Jonathan Tan
2017-08-07 19:12     ` Ben Peart
2017-08-07 19:21       ` Jonathan Nieder
2017-08-08 14:18         ` Ben Peart
2017-08-07 19:41       ` Junio C Hamano
2017-08-08 16:45         ` Ben Peart
2017-08-08 17:03           ` Jonathan Nieder
2017-08-07 23:10       ` Jonathan Tan [this message]
2017-08-16  0:32 ` [RFC PATCH] Updated "imported object" design Jonathan Tan
2017-08-16 20:32   ` Junio C Hamano
2017-08-16 21:35     ` Jonathan Tan
2017-08-17 20:50       ` Ben Peart
2017-08-17 21:39         ` Jonathan Tan
2017-08-18 14:18           ` Ben Peart
2017-08-18 23:33             ` Jonathan Tan
2017-08-17 20:07   ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170807161031.7c4eae50@twelve2.svl.corp.google.com \
    --to=jonathantanmy@google.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=peartben@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.