From: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, jonathantanmy@google.com,
	Marc Strapetz <marc.strapetz@syntevo.com>
Subject: Questions about partial clone with '--filter=tree:0'
Date: Tue, 20 Oct 2020 19:09:36 +0200
Message-ID: <aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com>

This is an edited copy of a message I sent 2 weeks ago, which
unfortunately didn't receive any replies. I tried to make it shorter
this time :)

----

We are implementing a git UI. One interesting case is a repository
cloned with '--filter=tree:0', because it makes it a lot harder to
run basic git operations such as file log and blame.

The problems and potential solutions are outlined below. We should be
able to make patches for (2) and (3) if it makes sense to patch these.

(1) Is it even considered a realistic use case?
-----------------------------------------------
Summary: is '--filter=tree:0' a realistic or "crazy" scenario that is
not considered worthy of supporting?

I decided to use the Linux repo, which is reasonably large. There,
'--filter=tree:0' could be desirable because, compared to
'--filter=blob:none', it saves disk space (~0.66gb) and network
traffic (~0.54gb):

https://github.com/torvalds/linux.git
   951025 commits total.

   git clone --bare <url>
	7'624'042 objects
	   2.86gb network
	   3.10gb disk
   git clone --bare --filter=blob:none <url>
	5'484'714 (71.9%) objects
	   1.01gb (35.3%) network
	   1.16gb (37.4%) disk
   git clone --bare --filter=tree:0 <url>
	  951'693 (12.5%) objects
	   0.47gb (16.4%) network
	   0.50gb (16.1%) disk
   git clone --bare --depth 1 --branch master <url>
	   74'380 ( 0.9%) objects
	   0.19gb ( 6.6%) network
	   0.19gb ( 6.1%) disk
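
For anyone who wants to repeat this kind of comparison without cloning
Linux, here is a minimal sketch on a throwaway local repository. The
temporary paths, the 'dir/file' layout and the count_objects helper are
made up for illustration, and the toy numbers say nothing about the
Linux ratios above. It assumes a git recent enough to understand
'--filter=tree:0' and a "server" repository with
uploadpack.allowFilter enabled.

```shell
#!/bin/sh
# Sketch: compare object counts of full, blobless and treeless bare
# clones of a throwaway repository (hypothetical paths and names).
set -eu

work=$(mktemp -d)
src="$work/src"

# Build a tiny source repository: three commits touching dir/file.
git init -q "$src"
mkdir "$src/dir"
for i in 1 2 3; do
    echo "revision $i" > "$src/dir/file"
    git -C "$src" add dir/file
    git -C "$src" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "commit $i"
done

# Let the local "server" honor --filter requests.
git -C "$src" config uploadpack.allowFilter true

# Sum loose and packed objects; this does not trigger lazy fetches.
count_objects() {
    git -C "$1" count-objects -v |
        awk '/^(count|in-pack):/ { n += $2 } END { print n }'
}

git clone -q --bare "file://$src" "$work/full.git"
git clone -q --bare --filter=blob:none "file://$src" "$work/blobless.git"
git clone -q --bare --filter=tree:0 "file://$src" "$work/treeless.git"

full=$(count_objects "$work/full.git")
blobless=$(count_objects "$work/blobless.git")
treeless=$(count_objects "$work/treeless.git")

echo "full:     $full objects"
echo "blobless: $blobless objects"
echo "treeless: $treeless objects"
```

Against a real server, 'du -sh' on the clone directory gives the disk
figure in the same way.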

(2) A command to enrich repo with trees
---------------------------------------
There is no good way to "un-partial" a repository that was cloned with
'--filter=tree:0' so that it has all trees, but no blobs.

There seems to be a dirty way of doing that by abusing 'fetch --deepen',
which happens to skip the "ref tip already present locally" check, but
it will also re-download all commits, which means an extra ~0.5gb of
network traffic in the case of the Linux repo.
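
As a point of comparison, newer git releases (2.36 and later, as far as
I know) have 'git fetch --refetch', which fetches everything again as a
fresh clone would and can be combined with a different '--filter'. It
has the same drawback of re-downloading all commits. The sketch below
tries it on a throwaway local repository; the paths and the
count_objects helper are hypothetical, and the refetch step is skipped
when the installed git does not list the option.

```shell
#!/bin/sh
# Sketch: widen a treeless bare clone to "all trees, no blobs" with
# 'git fetch --refetch' (hypothetical paths; needs a git that has it).
set -eu

work=$(mktemp -d)
src="$work/src"
treeless="$work/treeless.git"

# Tiny source repository: three commits touching dir/file.
git init -q "$src"
mkdir "$src/dir"
for i in 1 2 3; do
    echo "revision $i" > "$src/dir/file"
    git -C "$src" add dir/file
    git -C "$src" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "commit $i"
done
git -C "$src" config uploadpack.allowFilter true

# Sum loose and packed objects; this does not trigger lazy fetches.
count_objects() {
    git -C "$1" count-objects -v |
        awk '/^(count|in-pack):/ { n += $2 } END { print n }'
}

git clone -q --bare --filter=tree:0 "file://$src" "$treeless"
before=$(count_objects "$treeless")

# Only attempt the refetch if this git advertises the option.
if git fetch -h 2>&1 | grep -q -e '--refetch'; then
    git -C "$treeless" fetch -q --refetch --filter=blob:none origin
    refetched=yes
else
    refetched=no
fi
after=$(count_objects "$treeless")

echo "refetch supported: $refetched"
echo "objects before:    $before"
echo "objects after:     $after"
```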

(3) A command to download ALL trees and/or blobs for a subpath
--------------------------------------------------------------
Summary: Running a blame or file log in a '--filter=tree:0' repo is
currently very inefficient, to the point where it arguably does not
really work.
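
The effect can be observed on a small scale with the sketch below,
which runs a file log in a treeless bare clone of a throwaway local
repository and compares the object and pack counts before and after.
The paths and the stat_field helper are hypothetical. The file log has
to fetch the missing trees on demand from the promisor remote, and on a
large history that means a great many small fetches.

```shell
#!/bin/sh
# Sketch: show that a file log in a treeless clone fetches trees on
# demand (hypothetical paths; toy repository, not Linux).
set -eu

work=$(mktemp -d)
src="$work/src"
treeless="$work/treeless.git"

# Tiny source repository: three commits touching dir/file.
git init -q "$src"
mkdir "$src/dir"
for i in 1 2 3; do
    echo "revision $i" > "$src/dir/file"
    git -C "$src" add dir/file
    git -C "$src" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "commit $i"
done
git -C "$src" config uploadpack.allowFilter true
git -C "$src" config uploadpack.allowAnySHA1InWant true

# Print one numeric field of 'git count-objects -v' for a repository.
stat_field() {
    git -C "$1" count-objects -v |
        awk -v key="$2:" '$1 == key { print $2 }'
}
# Loose plus packed objects.
count_objects() {
    echo $(( $(stat_field "$1" count) + $(stat_field "$1" in-pack) ))
}

git clone -q --bare --filter=tree:0 "file://$src" "$treeless"
objects_before=$(count_objects "$treeless")
packs_before=$(stat_field "$treeless" packs)

# The file log needs trees for every commit it walks.
git -C "$treeless" log --oneline -- dir/file >/dev/null

objects_after=$(count_objects "$treeless")
packs_after=$(stat_field "$treeless" packs)

echo "objects: $objects_before -> $objects_after"
echo "packs:   $packs_before -> $packs_after"
```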

The suggested command would accept a path and download ALL trees
and/or blobs that match it.

This will solve many problems at once:
* Solve (2)
* Make it possible to prepare for efficient blame and file log
* Enable a new workflow for super-mono-repos, where the user can
   download only a part of the repository by path.

Currently '--filter=sparse:oid' is there to support that, but it is
very hard to use on client side, because it requires paths to be
already present in a commit on server.

For a possible solution, it sounds reasonable to have a filter such as:
   '--filter=sparse:pathlist=/1/2'
Path list could be delimited with some special character, and paths
themselves could be escaped.
