From: Elijah Newren <firstname.lastname@example.org> To: Ben Peart <email@example.com> Cc: "Nguyễn Thái Ngọc" <firstname.lastname@example.org>, "Ben Peart" <Ben.Peart@microsoft.com>, "Git Mailing List" <email@example.com>, "Junio C Hamano" <firstname.lastname@example.org>, "Eric Sunshine" <email@example.com> Subject: Re: [PATCH v3] checkout: optimize "git checkout -b <new_branch>" Date: Mon, 20 Aug 2018 11:16:39 -0700 Message-ID: <CABPp-BFKf2N6TYzCCneRwWUektMzRMnHLZ8JT64q=MGj5WQZkA@mail.gmail.com> (raw) In-Reply-To: <firstname.lastname@example.org> On Mon, Aug 20, 2018 at 6:40 AM Ben Peart <email@example.com> wrote: > On 8/18/2018 9:44 PM, Elijah Newren wrote: ... > > == My opinions == > > - The performance wins are big enough that I can see why Ben is pushing this. > > - I totally see Duy's point that more general optimizations would be > > really nice. > > - I really dislike the "giant list of checks" and the ongoing > > maintenance burden it implies. > > > > Overall, I have to side with Duy and say I don't think we should take > > the patch as-is. Since that'll be frustrating for Ben to hear, I've > > tried to come up with some additional alternatives: > > Thank you for the thorough review and summary. From my perspective, it > is both fair and accurate. > ... > > - Rewrite this patch so it instead does a very small set of checks at > > the beginning of cmd_checkout(); e.g. check if argc == 3 and argv > > == "-b" and if so then perform the minimum operations needed to create > > and checkout the new branch (maybe even calling in to cmd_branch() and > > cmd_symbolic_ref() and calling some reflog update function). Preface > > it with a comment that it's a performance hack that might eventually > > be able to go away. > > I'm happy to do this if it would make the patch better/more acceptable > to the community. For whatever it's worth, it'd change my vote. While I would like general optimizations more, I do think that's overall a longer term prospect, and the wins here are big enough to justify a performance hack -- if the amount of new code and added maintenance overhead is minimized. I think that checking for the args matching checkout -b rather than checking all flags that might imply something other than args being checkout -b would be a good way to help minimize both. > One other change I could do would be to address Duy's concern about the > config option would be to remove that entirely. That means those using > sparse-checkout don't have the option of getting the performance win but > the benefit is that there isn't any documentation or behavior changes > that need to be made - things just get a lot faster if you happen to use > the (rather common) "checkout -b" option. Playing with sparse-checkout, it feels to me like a half-baked feature. It seems like it required too much manual work, and it was sometimes hard to tell if I was misunderstanding configuration rules or I was just running into bugs in the code. I think I hit both but I didn't really want to get side-tracked further, yet. (I do want to eventually come back to it.) The only reason someone would go through that pain is if it provided massive performance benefits. Viewing things when-in-doubt through the lens of strict backwards compatibility may inadvertently mean enforcing buggy and/or slow behavior with sparse-checkouts. So, while git usually focuses strongly on backwards compatibility (which is a good thing), that might actually be wrong in the specific case of the sparse-checkout feature. In particular, since the whole point of the feature is essentially a performance hack, if the initial implementation has ways in which it really hurts performance, it seems like it'd be more in keeping with the point of the feature to take performance fixes and swallow subtle behavioral changes than to require special flags to get decent performance. That might sound like a long winded way of saying I'm totally in favor of dropping the config option in this case, but I'm also trying to build a case that I think there will be other situations where we want to subtly and maybe even not-so-subtly change behavior of sparse-checkouts to make them both more performant and far nicer for end users. I'm not sure others will agree with me on this argument, but sparse checkouts felt painful to me and I think we need something better. > If making these two changes would resolve the remaining concerns, I'm > happy to make them and send another patch series. If it will just > result in another round of "I'm still not comfortable with this" then > I'd rather spend my time elsewhere. :) That's fair. The first change would be enough to resolve the concern for me if others strongly push back on the second (config-option-removal) change, but I'd definitely prefer both changes -- fewer options means less code, less ongoing maintenance work, and less manual setup needed by users. > > - Possibly crazy idea for massive global performance win: modify how > > the cache works so that it can have tree entries instead of just file > > entries; make use of that with sparse checkouts (and partial clones?) > > so that the cache only stores a tree when that entire tree is > > "unwanted". Suddenly, git status, git add, git checkout, etc., etc. > > are all much faster. merge needs some special work to operate, > > though. > > Not crazy at all! It's something we've investigated and discussed > several times over the past couple of years. This is the largest > potential optimization that I am aware of as it has the potential to > dramatically speed up most git operations as they become O(what I care > about) vs O(size of repo/tree) - especially when paired with partial clone. > > The downside is pretty obvious, pulling this off means rewriting an > awful lot of the code! The index is central to so much of git and the > assumption is everywhere that the index is simply a (global) array of > cache entries to be looped through and manipulated at will. > > There would be a lot of refactoring work, creating a new index > abstraction API, updating the code to utilize it, then updating the > internal representation of the index to optimize it around the concept > of being sparse/partial. Likely, we'd want/need a new on disk format > that would support the new structures in an optimal way. Finally, the > various algorithms (like unpack_trees()) could be rewritten to take > advantage of the sparseness. > > Not something that can be taken on lightly - but certainly something > that has huge potential to enable git to scale to much larger repos. Well, the amount of work involved might qualify it as a "crazy" idea. :-) But, I'm glad others think it sounds like a useful path to purse. Not only would it be a tremendous help performance-wise, it might also be beneficial for another actually-crazy feature idea of mine.
next prev parent reply index Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-07-24 18:01 [PATCH v1] checkout: optionally speed up "git checkout -b foo" Ben Peart 2018-07-24 18:42 ` Eric Sunshine 2018-07-24 19:45 ` Ben Peart 2018-07-26 15:04 ` Junio C Hamano 2018-07-26 18:59 ` Eric Sunshine 2018-07-26 19:08 ` Eric Sunshine 2018-07-24 19:21 ` Junio C Hamano 2018-07-24 20:47 ` Ben Peart 2018-07-31 16:39 ` [PATCH v2] checkout: optimize "git checkout -b <new_branch>" Ben Peart 2018-07-31 20:01 ` Junio C Hamano 2018-08-01 15:10 ` Duy Nguyen 2018-08-02 18:02 ` Ben Peart 2018-08-03 15:58 ` Duy Nguyen 2018-08-06 14:25 ` Ben Peart 2018-08-15 21:05 ` Ben Peart 2018-08-05 8:57 ` Duy Nguyen 2018-08-16 18:27 ` [PATCH v3] " Ben Peart 2018-08-16 18:37 ` Duy Nguyen 2018-08-17 12:37 ` Ben Peart 2018-08-19 1:44 ` Elijah Newren 2018-08-20 13:40 ` Ben Peart 2018-08-20 18:16 ` Elijah Newren [this message] 2018-08-21 14:51 ` Duy Nguyen 2018-08-30 17:22 ` Elijah Newren 2018-09-04 16:46 ` Duy Nguyen 2018-08-20 18:31 ` Junio C Hamano 2018-09-18 5:34 ` [PATCH] config doc: add missing list separator for checkout.optimizeNewBranch Ævar Arnfjörð Bjarmason 2018-09-18 16:57 ` Taylor Blau 2018-09-18 17:16 ` Jeff King 2018-09-18 17:20 ` Taylor Blau 2018-09-18 17:13 ` Jeff King 2018-09-19 4:41 ` Ævar Arnfjörð Bjarmason
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CABPp-BFKf2N6TYzCCneRwWUektMzRMnHLZ8JT64q=MGj5WQZkA@mail.gmail.com' \ --firstname.lastname@example.org \ --cc=Ben.Peart@microsoft.com \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Git Mailing List Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/git/0 git/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 git git/ https://lore.kernel.org/git \ firstname.lastname@example.org public-inbox-index git Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.git AGPL code for this site: git clone https://public-inbox.org/public-inbox.git