From: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, christian.couder@gmail.com,
jonathantanmy@google.com,
Marc Strapetz <marc.strapetz@syntevo.com>
Subject: Re: Questions about partial clone with '--filter=tree:0'
Date: Wed, 21 Oct 2020 19:10:02 +0200
Message-ID: <a4a20c67-4ee3-77b2-8d57-f30843572aa4@syntevo.com>
In-Reply-To: <20201020222934.GB93217@nand.local>
On 21.10.2020 0:29, Taylor Blau wrote:
> Oops. That can happen sometimes, but thanks for re-sending. I'll try to
> answer the basic points below.
Thanks for stepping in!
>> (1) Is it even considered a realistic use case?
>> -----------------------------------------------
>> Summary: is '--filter=tree:0' a realistic or "crazy" scenario that is
>> not considered worthy of supporting?
>
> It's not an unrealistic scenario, but it might be for what you're trying
> to build. If your UI needs to run, say, 'git log --patch' to show a
> historical revision, then you're going to need to fault in a lot of
> missing objects.
>
> If that's not something that you need to do often or ever, then having
> '--filter=tree:0' is a good way to get the least amount of data possible
> when using a partial clone. But if you're going to be performing
> operations that need those missing objects, you're probably better off eating
> the network/storage cost of it all at once, rather than making the user
> wait for Git to fault in the set of missing objects that it happens to
> need.
We currently do not intend to use '--filter=tree:0' ourselves, but we are
trying to support all kinds of user repositories with our UI. So we
basically have these choices:
A) Declare '--filter=tree:0' repos as completely wrong and unsupported
in our UI, while also giving an option to "un-partial" them.
B) Support '--filter=tree:0' repos, but don't support operations such
as blame and file log.
C) Use some magic to efficiently download objects that will be needed
for a command such as Blame, while keeping the rest of the repository
partial. This is where the command described in (3) will help a lot.
We would of course prefer (C) if it's reasonably possible.
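For choice (B), the UI would first have to notice which filter a repository was cloned with. A minimal sketch, assuming the promisor remote is named 'origin' and that only the exact 'tree:0' spec needs special handling; both function names are hypothetical:

```shell
# Decide whether history-walking operations (blame, file log) should be
# offered for a given partial clone filter spec.
# Returns 0 (offer them) or 1 (disable them).
# Assumption: only the exact spec 'tree:0' is treated as tree-less; other
# 'tree:<depth>' values are not handled in this sketch.
history_ops_supported() {
    case "$1" in
        tree:0) return 1 ;;
        *)      return 0 ;;
    esac
}

# Read the filter recorded for a remote (default: origin) in the current
# repository. Prints nothing if no filter is configured.
current_filter() {
    git config --get "remote.${1:-origin}.partialclonefilter" || true
}

# Usage, inside a repository:
#   if history_ops_supported "$(current_filter)"; then
#       echo "blame and file log enabled"
#   fi
```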
>> (2) A command to enrich repo with trees
>> ---------------------------------------
>> There is no good way to "un-partial" repository that was cloned with
>> '--filter=tree:0' to have all trees, but no blobs.
>
> There is no command to do that directly, but it is something that Git is
> capable of.
>
> It would look something like:
>
> $ git config remote.origin.partialclonefilter 'blob:none'
>
> Now your repository is in a state where it has no blobs or trees, but
> the filter does not prohibit it from getting the trees, so you can ask
> it to grab everything you're missing with:
>
> $ git fetch origin
>
> This should even be a pretty fast operation for repositories that have
> bitmaps due to some topics that Peff and I sent to the list a while ago.
> If it isn't, please let me know.
Unfortunately this does not work as expected. Try the following steps:
A) Clone repo with '--filter=tree:0'
$ git clone --bare --filter=tree:0 --branch master \
    https://github.com/git/git.git
B) Change filter to 'blob:none'
$ cd git.git
$ git config remote.origin.partialclonefilter 'blob:none'
C) fetch
$ git fetch origin
Note that there is no 'Receiving objects:' output.
D) Verify that trees were downloaded
$ git cat-file -p HEAD | grep tree
tree ee5b5b41305cda618862beebc9c94859ae276e5a
$ git cat-file -t ee5b5b41305cda618862beebc9c94859ae276e5a
Note that 1 object gets downloaded on demand. This confirms that the
fetch in (C) didn't achieve the goal.
This happens because of the 'check_exist_and_connected()' test in
'fetch_refs()'. Since the tip of the ref is already available locally
(even though it is missing all trees), nothing is downloaded.
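The check in (D) fetches the tree on demand, so it changes the state it is inspecting. A sketch of a check that should not fetch, relying on 'git rev-list --missing=print', which (to my understanding) lists absent objects with a leading '?' instead of fetching them. The helper names are hypothetical:

```shell
# Count missing objects in `git rev-list --objects --missing=print`
# output read from stdin. Missing objects are the lines starting with '?'.
# grep -c exits non-zero when the count is 0, hence the `|| true`.
count_missing() {
    grep -c '^?' || true
}

# Count the objects reachable from a revision (default: HEAD) that are
# absent from the local repository.
missing_objects_at() {
    git rev-list --objects --missing=print "${1:-HEAD}" | count_missing
}

# Usage, inside the partial clone, before and after step (C):
#   missing_objects_at HEAD
```

If the count is the same before and after 'git fetch origin', the fetch did not bring in the trees.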
>> There seems to be a dirty way of doing that by abusing 'fetch --deepen'
>> which happens to skip "ref tip already present locally" check, but
>> it will also re-download all commits, which means extra ~0.5gb network
>> in case of Linux repo.
>
> Mmm, this is probably not what you're looking for. You may be confusing
> shallow clones (of which --deepen is relevant) with partial clones
> (to which --deepen is irrelevant).
Yes, '--deepen' is intended for shallow clones. But abusing it for
partial clones makes it possible to skip the
'check_exist_and_connected()' test.
However, I did more testing today, and in many cases the server itself
refuses to send objects, probably because of the 'HAVE' lines that were
sent, or something else. So even '--deepen' doesn't really help.
> I think what you probably want is a step 1.5 to tell Git "I'm not going
> to ask for or care about the entirety of my working copy, I really just
> want objects in path...", and you can do that with sparse checkouts. See
> https://git-scm.com/docs/git-sparse-checkout for more.
For simplicity of discussion, let's focus on the problem of running
Blame efficiently in a repo that was cloned with '--filter=tree:0'. In
order to blame file '/1/2/Foo.txt', we will need the following:
* Trees '/1'
* Trees '/1/2'
* Blobs '/1/2/Foo.txt'
All of these will be needed to an unknown commit depth. For simplicity,
the proposed command will download these for all commits. Specifying
a range of revisions could be nice, but I feel that it's not worth the
complexity.
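To make the list above concrete, here is a small sketch that derives the per-commit tree and blob paths from a file path. The function name is hypothetical, and the commit's root tree, which is also needed to reach '/1', is left implicit as in the list:

```shell
# Print the tree and blob paths needed (per commit) to blame one file.
# Input:  a repository path such as /1/2/Foo.txt
# Output: one "tree <dir>" line per parent directory, top-down,
#         followed by one "blob <file>" line.
needed_objects() {
    rest=${1#/}      # drop a leading slash, if any
    prefix=
    # While $rest still contains a '/', peel off its first component.
    while [ "${rest#*/}" != "$rest" ]; do
        prefix="$prefix/${rest%%/*}"
        printf 'tree %s\n' "$prefix"
        rest=${rest#*/}
    done
    printf 'blob %s/%s\n' "$prefix" "$rest"
}

# Usage:
#   needed_objects /1/2/Foo.txt
```

The number of objects per commit grows only with the directory depth of the file, which is why fetching them for all commits stays far cheaper than fetching every tree.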
Correct me if I'm wrong: I think that sparse checkout will not help
achieve this goal?
This is why I suggest a command that will accept paths and send the
requested objects, while also forcing the server to assume that all of
them are missing from the client's repository.