From: Jonathan Tan <email@example.com>
To: firstname.lastname@example.org
Cc: email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com
Subject: Re: Questions about partial clone with '--filter=tree:0'
Date: Mon, 26 Oct 2020 11:24:17 -0700
Message-ID: <firstname.lastname@example.org>
In-Reply-To: <email@example.com>

> (1) Is it even considered a realistic use case?
> -----------------------------------------------
> Summary: is '--filter=tree:0' a realistic or "crazy" scenario that is
> not considered worthy of supporting?
>
> I decided to use Linux repo, which is reasonably large, and it seems
> that '--filter=tree:0' could be desired because it helps with disk
> space (~0.66gb) and network (~0.54gb):

Sorry for the late reply - I have been out of office for a while.

As Taylor said in another email, it's good for some use cases but
perhaps not for the "blame" one that you describe later.

> (2) A command to enrich repo with trees
> ---------------------------------------
> There is no good way to "un-partial" repository that was cloned with
> '--filter=tree:0' to have all trees, but no blobs.
>
> There seems to be a dirty way of doing that by abusing 'fetch --deepen'
> which happens to skip "ref tip already present locally" check, but
> it will also re-download all commits, which means extra ~0.5gb network
> in case of Linux repo.

That's true. I made some progress with cbe566a071 ("negotiator/noop: add
noop fetch negotiator", 2020-08-18) (which adds a no-op negotiator, so
the client never reports its own commits as "have") but as you said in
another email, we still run into the problem that if we already have the
commit that we're fetching, we still won't fetch it.

> (3) A command to download ALL trees and/or blobs for a subpath
> -----------------------------------------------
> Summary: Running a Blame or file log in '--filter=tree:0' repo is
> currently very inefficient, up to a point where it can be discussed
> as not really working.
>
> The suggested command will be able to accept a path and download ALL
> trees and/or blobs that match it.
>
> This will solve many problems at once:
> * Solve (2)
> * Make it possible to prepare for efficient blame and file log
> * Make a new experience with super-mono-repos, where user will now
>   be able to only download a part of it by path.

To clarify: we partially support the last point - "git clone" now
supports "--sparse". When used with "--filter", only the blobs in the
sparse checkout specification will be fetched, so users are already able
to download only the objects in a specific path. Having said that, I
think you also want the histories of these objects, so admittedly this
is not complete for your use case.

> Currently '--filter=sparse:oid' is there to support that, but it is
> very hard to use on client side, because it requires paths to be
> already present in a commit on server.
>
> For a possible solution, it sounds reasonable to have such filter:
> --filter=sparse:pathlist=/1/2'
> Path list could be delimited with some special character, and paths
> themselves could be escaped.

Having such an option (and teaching "blame" to use it to prefetch) would
indeed speed up "blame". But if we implement this, what would happen if
the user ran "blame" on the same file twice? I can't think of a way of
preventing the same fetch from happening twice except by checking the
existence of, say, the last 10 OIDs corresponding to that path. But if
we have the list of those 10 OIDs, we could just prefetch those 10 OIDs
without needing a new filter.

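To make the "check the last 10 OIDs" idea concrete, here is a rough
sketch in Python rather than anything from Git's own code. It assumes
the candidate OIDs for the path are already known, and that their
presence was probed with "git cat-file --batch-check", which prints
"<oid> <type> <size>" for objects it has and "<oid> missing" otherwise.
The function names are hypothetical. In a partial clone the probe itself
would presumably have to run with lazy fetching disabled, since
otherwise looking up a missing object triggers the very fetch we are
trying to avoid.

```python
from typing import Iterable, List, Set


def parse_batch_check(output: str) -> Set[str]:
    """Return the OIDs that `git cat-file --batch-check` reported as present."""
    present = set()
    for line in output.splitlines():
        fields = line.split()
        # Present objects print "<oid> <type> <size>";
        # absent ones print "<oid> missing" (two fields only).
        if len(fields) == 3 and fields[1] != "missing":
            present.add(fields[0])
    return present


def oids_to_prefetch(candidates: Iterable[str], present: Set[str]) -> List[str]:
    """Keep candidate order, dropping duplicates and anything already local."""
    seen = set()
    wanted = []
    for oid in candidates:
        if oid not in present and oid not in seen:
            seen.add(oid)
            wanted.append(oid)
    return wanted
```

If oids_to_prefetch() returns an empty list, a second "blame" on the
same file has nothing to fetch; otherwise only the returned OIDs need to
be requested, which is the point above about not needing a new filter.
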
Another issue (but a smaller one) is that this does not fetch all the
necessary objects if the file being "blame"d has been renamed, but that
is probably solvable - we can just refetch with the old name.

Another possible solution that has been discussed before (but a much
more involved one) is to teach Git to be able to serve results of
computations, and then have "blame" be able to stitch that with local
data. (For example, "blame" could check the history of a certain path to
find the commit(s) that the remote has information about, query the
remote for those commits, and then stitch the results together with
local history.) This scheme would work not only for "blame" but also for
things like "grep" (with history) and "log -S", whereas
"--filter=sparse:pathlist" would only work with "blame". But admittedly,
this solution is more involved.