From: Jonathan Tan <email@example.com>
To: firstname.lastname@example.org
Cc: email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com
Subject: Re: Questions about partial clone with '--filter=tree:0'
Date: Mon, 26 Oct 2020 12:46:35 -0700
Message-ID: <firstname.lastname@example.org>
In-Reply-To: <email@example.com>

> > Having such an option (and teaching "blame" to use it to prefetch) would
> > indeed speed up "blame". But if we implement this, what would happen if
> > the user ran "blame" on the same file twice? I can't think of a way of
> > preventing the same fetch from happening twice except by checking the
> > existence of, say, the last 10 OIDs corresponding to that path. But if
> > we have the list of those 10 OIDs, we could just prefetch those 10 OIDs
> > without needing a new filter.
>
> I must admit that I didn't notice this problem. Still, it seems easy
> enough to solve with this approach:
>
> 1) Estimate number of missing things
> 2) If "many", just download everything for <path> as described before
>    and consider it done.
> 3) If "not so many", assemble a list of OIDs on the boundary of unknown
>    (for example, all root tree OIDs for commits that are missing any
>    trees) and use the usual fetch to download all OIDs in one go.
> 4) Repeat step 3 multiple times. Only N=<maximum tree depth> requests
>    are needed, regardless of the number of commits.

My point was that if you can estimate it ("have the list of those 10
OIDs"), then you can just fetch it. This does send "quite a bit of
OIDs", as you said below - I'll address it below.

> > Another possible solution that has been discussed before (but a much
> > more involved one) is to teach Git to be able to serve results of
> > computations, and then have "blame" be able to stitch that with local
> > data. (For example, "blame" could check the history of a certain path to
> > find the commit(s) that the remote has information of, query the remote
> > for those commits, and then stitch the results together with local
> > history.) This scheme would work not only for "blame" but for things
> > like "grep" (with history) and "log -S", whereas
> > "--filter=sparse:parthlist" would only work with "blame". But
> > admittedly, this solution is more involved.
>
> I understand that you're basically talking about implementing
> prefetching in git itself?

No - I did talk about prefetching earlier, but here I mean having Git on
the server perform the "blame" computation itself. For example, let's
say I want to run "blame" on foo.txt at HEAD. HEAD and HEAD^ are commits
that only the local client has, whereas HEAD^^ was fetched from the
remote. By comparing HEAD, HEAD^, and HEAD^^, Git knows which lines come
from HEAD and HEAD^. For the rest, Git would make a request to the
server, passing the commit ID and the path, and would get back a list of
line numbers and commits.

> To my understanding, this will still need
> either the command I suggested, or implement graph walking with massive
> OID requests as described above in 1)2)3)4). The latter will not require
> protocol changes, but will involve sending quite a bit of OIDs around.

Yes, prefetching will require graph walking with large OID requests but
will not require protocol changes, as you say. I'm not too worried about
the large numbers of OIDs - Git servers already have to support
relatively large numbers of OIDs to support the bulk prefetch we do
during things like checkout and diff.
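
To make the graph-walking prefetch a bit more concrete, here is a rough
sketch of the level-by-level batching from steps 3) and 4) above. It is
only an illustration under assumptions: the in-memory REMOTE store, the
fetch() callback, and prefetch_path() are hypothetical names and not Git
APIs, and trees are modeled as plain dicts. Each round collects every
missing OID at the current tree level across all commits and requests
them in one batch, so the number of requests depends on the depth of the
path and not on the number of commits.

```python
# Hypothetical in-memory model; none of these names are Git APIs.
# REMOTE maps an OID to its object: a tree is a dict {name: oid}, a blob is a str.
REMOTE = {
    "t1": {"src": "t1s"}, "t1s": {"foo.txt": "b1"}, "b1": "v1\n",
    "t2": {"src": "t2s"}, "t2s": {"foo.txt": "b2"}, "b2": "v2\n",
    "t3": {"src": "t2s"},  # third commit shares the src/ tree with the second
}


def fetch(oids):
    """Stand-in for one batched fetch request to the remote."""
    return {oid: REMOTE[oid] for oid in oids}


def prefetch_path(root_trees, path, local, fetch):
    """Make `path` readable in every root tree, using one batch per tree level.

    Returns the number of fetch requests that were actually sent.
    """
    frontier = set(root_trees)
    requests = 0
    # One round per path component, plus a final round for the blobs.
    for name in path.split("/") + [None]:
        missing = sorted(oid for oid in frontier if oid not in local)
        if missing:
            local.update(fetch(missing))
            requests += 1
        if name is None:
            break
        # Descend one level: follow `name` in every tree that has it.
        frontier = {
            local[oid][name]
            for oid in frontier
            if isinstance(local[oid], dict) and name in local[oid]
        }
    return requests


local = {}
first = prefetch_path(["t1", "t2", "t3"], "src/foo.txt", local, fetch)
second = prefetch_path(["t1", "t2", "t3"], "src/foo.txt", local, fetch)
print(first, second)
```

In this toy model the first call sends three batches (root trees, src/
trees, blobs) for three commits, and the second call sends none because
every OID on the path is already present locally. That is the same
existence check discussed above for avoiding a repeated fetch when
"blame" is run twice on the same file.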