git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Cc: git@vger.kernel.org, christian.couder@gmail.com,
	jonathantanmy@google.com,
	Marc Strapetz <marc.strapetz@syntevo.com>
Subject: Re: Questions about partial clone with '--filter=tree:0'
Date: Tue, 20 Oct 2020 18:29:34 -0400	[thread overview]
Message-ID: <20201020222934.GB93217@nand.local> (raw)
In-Reply-To: <aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com>

Hi Alexandr,

On Tue, Oct 20, 2020 at 07:09:36PM +0200, Alexandr Miloslavskiy wrote:
> This is a edited copy of message I sent 2 weeks ago, which unfortunately
> didn't receive any replies. I tried to make make it shorter this time :)

Oops. That can happen sometimes, but thanks for re-sending. I'll try to
answer the basic points below.

> ----
>
> We are implementing a git UI. One interesting case is the repository
> cloned with '--filter=tree:0', because it makes it a lot harder to
> run basic git operations such as file log and blame.
>
> The problems and potential solutions are outlined below. We should be
> able to make patches for (2) and (3) if it makes sense to patch these.
>
> (1) Is it even considered a realistic use case?
> -----------------------------------------------
> Summary: is '--filter=tree:0' a realistic or "crazy" scenario that is
> not considered worthy of supporting?

It's not an unrealistic scenario, but it might be for what you're trying
to build. If your UI needs to run, say, 'git log --patch' to show a
historical revision, then you're going to need to fault in a lot of
missing objects.

If that's not something that you need to do often or ever, then having
'--filter=tree:0' is a good way to get the least amount of data possible
when using a partial clone. But if you're going to be performing
operations that need those missing objects, you're probably better eat
the network/storage cost of it all at once, rather than making the user
wait for Git to fault in the set of missing objects that it happens to
need.

> (2) A command to enrich repo with trees
> ---------------------------------------
> There is no good way to "un-partial" repository that was cloned with
> '--filter=tree:0' to have all trees, but no blobs.

There is no command to do that directly, but it is something that Git is
capable of.

It would look something like:

  $ git config remote.origin.partialclonefilter 'blob:none'

Now your repository is in a state where it has no blobs or trees, but
the filter does not prohibit it from getting the trees, so you can ask
it to grab everything you're missing with:

  $ git fetch origin

This should even be a pretty fast operation for repositories that have
bitmaps due to some topics that Peff and I sent to the list a while ago.
If it isn't, please let me know.

> There seems to be a dirty way of doing that by abusing 'fetch --deepen'
> which happens to skip "ref tip already present locally" check, but
> it will also re-download all commits, which means extra ~0.5gb network
> in case of Linux repo.

Mmm, this is probably not what you're looking for. You may be confusing
shallow clones (of which --deepen is relevant) with partial clones
(to which --deepen is irrelevant).

> (3) A command to download ALL trees and/or blobs for a subpath
> -----------------------------------------------
> Summary: Running a Blame or file log in '--filter=tree:0' repo is
> currently very inefficient, up to a point where it can be discussed
> as not really working.

This may be a "don't hold it that way" kind of response, but I don't
think that this is quite what you want. Recall that cloning a
repository with an object filter happens in two steps: first, an initial
download of all of the objects that it thinks you need, and then
(second) a follow-up fetch requesting the objects that you need to
populate your checkout.

I think what you probably want is a step 1.5 to tell Git "I'm not going
to ask for or care about the entirety of my working copy, I really just
want objects in path...", and you can do that with sparse checkouts. See
https://git-scm.com/docs/git-sparse-checkout for more.

The flow might be something like:

  $ git clone --sparse --filter=tree:0 git@yourhost.com:repo.git

and then:

  $ cd repo
  $ git sparse-checkout add foo bar baz
  $ git checkout .

Thanks,
Taylor

  reply	other threads:[~2020-10-20 22:29 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-20 17:09 Alexandr Miloslavskiy
2020-10-20 22:29 ` Taylor Blau [this message]
2020-10-21 17:10   ` Alexandr Miloslavskiy
2020-10-21 17:31     ` Taylor Blau
2020-10-21 17:46       ` Alexandr Miloslavskiy
2020-10-26 18:24 ` Jonathan Tan
2020-10-26 18:44   ` Alexandr Miloslavskiy
2020-10-26 19:46     ` Jonathan Tan
2020-10-26 20:08       ` Alexandr Miloslavskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201020222934.GB93217@nand.local \
    --to=me@ttaylorr.com \
    --cc=alexandr.miloslavskiy@syntevo.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=marc.strapetz@syntevo.com \
    --subject='Re: Questions about partial clone with '\''--filter=tree:0'\''' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).