From: Taylor Blau <firstname.lastname@example.org>
To: Alexandr Miloslavskiy <email@example.com>
Cc: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, Marc Strapetz <email@example.com>
Subject: Re: Questions about partial clone with '--filter=tree:0'
Date: Tue, 20 Oct 2020 18:29:34 -0400
Message-ID: <20201020222934.GB93217@nand.local>
In-Reply-To: <firstname.lastname@example.org>

Hi Alexandr,

On Tue, Oct 20, 2020 at 07:09:36PM +0200, Alexandr Miloslavskiy wrote:
> This is an edited copy of a message I sent 2 weeks ago, which unfortunately
> didn't receive any replies. I tried to make it shorter this time :)

Oops. That can happen sometimes, but thanks for re-sending. I'll try to
answer the basic points below.

> ----
>
> We are implementing a git UI. One interesting case is the repository
> cloned with '--filter=tree:0', because it makes it a lot harder to
> run basic git operations such as file log and blame.
>
> The problems and potential solutions are outlined below. We should be
> able to make patches for (2) and (3) if it makes sense to patch these.
>
> (1) Is it even considered a realistic use case?
> -----------------------------------------------
> Summary: is '--filter=tree:0' a realistic or "crazy" scenario that is
> not considered worthy of supporting?

It's not an unrealistic scenario, but it might be for what you're trying
to build. If your UI needs to run, say, 'git log --patch' to show a
historical revision, then you're going to need to fault in a lot of
missing objects.

If that's not something that you need to do often or ever, then having
'--filter=tree:0' is a good way to get the least amount of data possible
when using a partial clone. But if you're going to be performing
operations that need those missing objects, you're probably better off
eating the network/storage cost of it all at once, rather than making
the user wait for Git to fault in the set of missing objects that it
happens to need.
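One way to gauge how much an operation would have to fault in is to
count the objects that are reachable from a revision but absent from the
local object store. The helper below is only a sketch: the name
count_missing is made up for illustration, and it relies on
'git rev-list --missing=print', which lists missing objects with a
leading '?' and, as far as I understand, does not fetch them. In a
'--filter=tree:0' clone the walk cannot look inside trees it does not
have, so treat the number as a lower bound.

```shell
# Hypothetical helper: count objects reachable from a revision
# (default: HEAD) that are missing from the local object store.
count_missing() {
    rev=${1:-HEAD}
    # --missing=print reports each missing object as "?<oid>".
    # grep -c counts those lines and prints 0 when there are none.
    git rev-list --objects --missing=print "$rev" | grep -c '^?'
}
```

Run inside the partial clone, 'count_missing HEAD' prints a single
number; comparing it before and after an operation such as a file log
gives a rough idea of how many objects that operation pulled in.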

> (2) A command to enrich repo with trees
> ---------------------------------------
> There is no good way to "un-partial" repository that was cloned with
> '--filter=tree:0' to have all trees, but no blobs.

There is no command to do that directly, but it is something that Git is
capable of.

It would look something like:

  $ git config remote.origin.partialclonefilter 'blob:none'

Now your repository is in a state where it has no blobs or trees, but
the filter does not prohibit it from getting the trees, so you can ask
it to grab everything you're missing with:

  $ git fetch origin

This should even be a pretty fast operation for repositories that have
bitmaps due to some topics that Peff and I sent to the list a while ago.
If it isn't, please let me know.

> There seems to be a dirty way of doing that by abusing 'fetch --deepen'
> which happens to skip "ref tip already present locally" check, but
> it will also re-download all commits, which means extra ~0.5gb network
> in case of Linux repo.

Mmm, this is probably not what you're looking for. You may be confusing
shallow clones (to which --deepen is relevant) with partial clones (to
which --deepen is irrelevant).

> (3) A command to download ALL trees and/or blobs for a subpath
> -----------------------------------------------
> Summary: Running a Blame or file log in '--filter=tree:0' repo is
> currently very inefficient, up to a point where it can be discussed
> as not really working.

This may be a "don't hold it that way" kind of response, but I don't
think that this is quite what you want. Recall that cloning a repository
with an object filter happens in two steps: first, an initial download
of all of the objects that it thinks you need, and then (second) a
follow-up fetch requesting the objects that you need to populate your
checkout.
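
The two steps can be made visible by running them by hand. This is a
rough sketch, not a description of how 'git clone' is implemented: the
function name and its two arguments are placeholders, and it assumes the
server accepts object filters. With '--no-checkout' the clone stops
after the initial download; the later 'git checkout' is then what has to
request the trees and blobs for HEAD.

```shell
# Hypothetical helper: split a filtered clone into its two phases.
#   $1: repository URL (placeholder)
#   $2: target directory (placeholder)
clone_in_two_steps() {
    url=$1
    dir=$2
    # Step 1: initial download only; no working tree is populated yet.
    git clone --no-checkout --filter=tree:0 "$url" "$dir" || return 1
    # Step 2: populating the checkout makes Git fetch what HEAD needs.
    git -C "$dir" checkout
}
```

Timing the two commands separately on a large repository should show
where the waiting actually happens.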

I think what you probably want is a step 1.5 to tell Git "I'm not going
to ask for or care about the entirety of my working copy, I really just
want objects in path...", and you can do that with sparse checkouts. See
https://git-scm.com/docs/git-sparse-checkout for more.

The flow might be something like:

  $ git clone --sparse --filter=tree:0 email@example.com:repo.git

and then:

  $ cd repo
  $ git sparse-checkout add foo bar baz
  $ git checkout .

Thanks,
Taylor