* Re: [PATCH 0/2] Sparse checkout status @ 2020-06-17 7:40 Son Luong Ngoc 2020-06-17 16:48 ` Elijah Newren 0 siblings, 1 reply; 5+ messages in thread From: Son Luong Ngoc @ 2020-06-17 7:40 UTC (permalink / raw) To: gitgitgadget; +Cc: dstolee, git, newren Hi Elijah, > Some of the feedback of folks trying out sparse-checkouts at $dayjob is that > sparse checkouts can sometimes be disorienting; users can forget that they > had a sparse-checkout and then wonder where files went. I agree with this observation: that the current 'git sparse-checkout' experience could be a bit 'lost' for end users, who may or may not be familiar with git's 'arcane magic'. Currently the only way to verify what's going on is to either run 'tree <repo-root-dir>' or 'cat .git/info/sparse-checkout' (human-readable but not easy). > This series adds some output to 'git status' and modifies git-prompt slightly as an attempt > to help. This is a great idea but I suggest to put a config/flag to let users enable/disable this. Git status is often utilized in automated commands (IDE, shell prompt, etc...) and there may be performance implications down the line not being able to skip this bit of information out. > For reference, I suspect that in repositories that are large enough that > people always use sparse-checkouts (e.g. windows or office repos), that this > isn't a problem. But when the repository is approximately > linux-kernel-sized, then it is reasonable for some folks to have a full > checkout. sparse-checkouts, however, can provide various build system and > IDE performance improvements, so we have a split of users who have > sparse-checkouts and those who have full checkouts. It's easy for users who > are bridging in between the two worlds or just trying out sparse-checkouts > for the first time to get confused. One of our users noted that the experience is improved when combining 'git worktree' with sparse-checkout. That way you get the correct sparsity for the topic that you are working on. In a way, the current sparse-checkout experience is similar to a user running 'git checkout <rev>' directly instead of checking out a branch. It does not feel tangible and reproducible. I was hoping that these concerns will be addressed once the In-Tree Sparse-Checkout Definition RFC[1] patch landed. We should then be able to print out which Definition File(s) (we often call it manifests) were used, and ideally, only the top most file(s) in the inheritance tree. So the ideal experience, in my mind, is something of this sort: git sc init --cone # assuming a inherited from b and c git sc add --in-tree manifest-dir/module-a.manifest git sc add --in-tree manifest-dir/module-d.manifest git sc status Your sparse checkout includes following definition(s): (1) manifest-dir/module-a.manifest (2) manifest-dir/module-d.manifest git sc status --all Your sparse checkout includes following definition(s): (1) manifest-dir/module-a.manifest (2) manifest-dir/module-d.manifest (3) manifest-dir/module-b.manifest (included by 1) (4) manifest-dir/module-c.manifest (included by 1) I have a feeling that the current file skipped percentage prompt is not that useful or actionable to end-users, and they would still end up feeling lost/disoriented at the end. Thanks, Son Luong. [1]: https://lore.kernel.org/git/pull.627.git.1588857462.gitgitgadget@gmail.com/T/#u ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/2] Sparse checkout status 2020-06-17 7:40 [PATCH 0/2] Sparse checkout status Son Luong Ngoc @ 2020-06-17 16:48 ` Elijah Newren 2020-06-17 17:58 ` Son Luong Ngoc 0 siblings, 1 reply; 5+ messages in thread From: Elijah Newren @ 2020-06-17 16:48 UTC (permalink / raw) To: Son Luong Ngoc Cc: Johannes Schindelin via GitGitGadget, Derrick Stolee, Git Mailing List Hi Son, Thanks for the feedback. On Wed, Jun 17, 2020 at 12:40 AM Son Luong Ngoc <sluongng@gmail.com> wrote: > > Hi Elijah, > > > Some of the feedback of folks trying out sparse-checkouts at $dayjob is that > > sparse checkouts can sometimes be disorienting; users can forget that they > > had a sparse-checkout and then wonder where files went. > > I agree with this observation: that the current 'git sparse-checkout' experience > could be a bit 'lost' for end users, who may or may not be familiar > with git's 'arcane magic'. > > Currently the only way to verify what's going on is to either run > 'tree <repo-root-dir>' > or 'cat .git/info/sparse-checkout' (human-readable but not easy). Well, there is `git sparse-checkout list`, assuming users know they are in a sparse-checkout, but the whole point of my suggested change is that they sometimes don't. > > This series adds some output to 'git status' and modifies git-prompt slightly as an attempt > > to help. > > This is a great idea but I suggest to put a config/flag to let users > enable/disable this. > > Git status is often utilized in automated commands (IDE, shell prompt, > etc...) and there may be > performance implications down the line not being able to skip this bit > of information out. This surprises me; I considered performance while writing it and kept it simple on that basis. In particular: * This does not cause any reading or writing of any extra files; it is done solely with information that is already loaded. * If users aren't in a sparse-checkout, their performance overhead is a single if-check, which I doubt anyone can measure. * If they are in a sparse-checkout, then they'd get one extra loop over files in the index to check the SKIP_WORKTREE bit. In which cases would performance implications be a concern? For a very simple point of reference, in a sparse-checkout of the linux kernel (using --cone mode and only selecting the drivers/ directory), I see the following timings for 'git status' in a clean checkout: Without my change: [newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup 1 'git status' Benchmark #1: git status Time (mean ± σ): 78.8 ms ± 2.8 ms [User: 48.9 ms, System: 76.9 ms] Range (min … max): 74.0 ms … 84.0 ms 38 runs With my change: [newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup 1 'git status' Benchmark #1: git status Time (mean ± σ): 79.8 ms ± 2.7 ms [User: 49.3 ms, System: 77.7 ms] Range (min … max): 74.8 ms … 84.5 ms 37 runs I know the linux kernel is tiny compared to repos like Windows or Office, but the relative scaling considerations are identical: it's one extra loop through the cached entries checking a bit for each entry. If people are worried about the "extra loop", I could find an existing loop to modify and just add an extra if-block in it so that we have the same number of loops. (I'm doubtful that'd actually help, but if the concern is an extra loop, it'd certainly avoid that.) Anyway, if you've got more information about it being too costly, I'm happy to listen. Otherwise, the overhead seems pretty small to me and it's only paid by those who would benefit from the information. However, all that said, I have good news: Peff already implemented the flag users can use to avoid this extra output, and did so back in September of 2009. It's called "--porcelain". Automated commands should already be using it, and if they aren't, they are what needs fixing -- not the long form status output. > > For reference, I suspect that in repositories that are large enough that > > people always use sparse-checkouts (e.g. windows or office repos), that this > > isn't a problem. But when the repository is approximately > > linux-kernel-sized, then it is reasonable for some folks to have a full > > checkout. sparse-checkouts, however, can provide various build system and > > IDE performance improvements, so we have a split of users who have > > sparse-checkouts and those who have full checkouts. It's easy for users who > > are bridging in between the two worlds or just trying out sparse-checkouts > > for the first time to get confused. > > One of our users noted that the experience is improved when combining > 'git worktree' with sparse-checkout. > That way you get the correct sparsity for the topic that you are working on. > > In a way, the current sparse-checkout experience is similar to a user > running 'git checkout <rev>' directly > instead of checking out a branch. > It does not feel tangible and reproducible. > > I was hoping that these concerns will be addressed once the In-Tree > Sparse-Checkout Definition RFC[1] patch landed. > We should then be able to print out which Definition File(s) (we often > call it manifests) were used, > and ideally, only the top most file(s) in the inheritance tree. > > So the ideal experience, in my mind, is something of this sort: > > git sc init --cone > > # assuming a inherited from b and c > git sc add --in-tree manifest-dir/module-a.manifest > git sc add --in-tree manifest-dir/module-d.manifest > > git sc status > Your sparse checkout includes following definition(s): > (1) manifest-dir/module-a.manifest > (2) manifest-dir/module-d.manifest > > git sc status --all > Your sparse checkout includes following definition(s): > (1) manifest-dir/module-a.manifest > (2) manifest-dir/module-d.manifest > (3) manifest-dir/module-b.manifest (included by 1) > (4) manifest-dir/module-c.manifest (included by 1) I think having a 'git sparse-checkout status' would be a fine subcommand, and output like the above -- possibly also including other bits Stolee or I mentioned elsewhere in this thread -- would be cool and would be helpful; it'd complement what I'm doing here quite nicely. But you're solving a related problem rather than the one I was focusing on, and you have left the issue I was focusing on unaddressed. In particular, if users forgot that they sparsified in the first place, how are they going to know to run `git sparse-checkout status [--all]`? I think having a simple line of output in `git status` would remind them. With that reminder, they could today then go run 'git sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses internally) or './sparsify --info' (as I use internally) to get more info. In the future we could provide additional things for them as well, such as your 'git sparse-checkout status'. An aside, though, since you linked to the in-tree sparse-checkout definitions: When I reviewed that series, the possibility of merge conflicts and not knowing what sparse-checkout should have checked out when the in-tree defintions themselves were in a conflicted state seemed to me to be a pretty tough sticking point. I'm hoping someone has a clever solution, but I still don't yet. Do you? Thanks, Elijah ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/2] Sparse checkout status 2020-06-17 16:48 ` Elijah Newren @ 2020-06-17 17:58 ` Son Luong Ngoc 2020-06-17 22:36 ` Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status) Elijah Newren 0 siblings, 1 reply; 5+ messages in thread From: Son Luong Ngoc @ 2020-06-17 17:58 UTC (permalink / raw) To: Elijah Newren Cc: Johannes Schindelin via GitGitGadget, Derrick Stolee, Git Mailing List Hi Elijah, On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote: > > Well, there is `git sparse-checkout list`, assuming users know they > are in a sparse-checkout, but the whole point of my suggested change > is that they sometimes don't. Ah thats true. This was added recently and definitely slipped my mind often. > > This surprises me; I considered performance while writing it and kept > it simple on that basis. In particular: > * This does not cause any reading or writing of any extra files; it > is done solely with information that is already loaded. > * If users aren't in a sparse-checkout, their performance overhead > is a single if-check, which I doubt anyone can measure. > * If they are in a sparse-checkout, then they'd get one extra loop > over files in the index to check the SKIP_WORKTREE bit. > > In which cases would performance implications be a concern? For a > very simple point of reference, in a sparse-checkout of the linux > kernel (using --cone mode and only selecting the drivers/ directory), > I see the following timings for 'git status' in a clean checkout: > > Without my change: > [newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup > 1 'git status' > Benchmark #1: git status > Time (mean ± σ): 78.8 ms ± 2.8 ms [User: 48.9 ms, System: 76.9 ms] > Range (min … max): 74.0 ms … 84.0 ms 38 runs > > With my change: > [newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup > 1 'git status' > Benchmark #1: git status > Time (mean ± σ): 79.8 ms ± 2.7 ms [User: 49.3 ms, System: 77.7 ms] > Range (min … max): 74.8 ms … 84.5 ms 37 runs > > I know the linux kernel is tiny compared to repos like Windows or > Office, but the relative scaling considerations are identical: it's > one extra loop through the cached entries checking a bit for each > entry. If people are worried about the "extra loop", I could find an > existing loop to modify and just add an extra if-block in it so that > we have the same number of loops. (I'm doubtful that'd actually help, > but if the concern is an extra loop, it'd certainly avoid that.) > Anyway, if you've got more information about it being too costly, I'm > happy to listen. Otherwise, the overhead seems pretty small to me and > it's only paid by those who would benefit from the information. > > However, all that said, I have good news: Peff already implemented the > flag users can use to avoid this extra output, and did so back in > September of 2009. It's called "--porcelain". Automated commands > should already be using it, and if they aren't, they are what needs > fixing -- not the long form status output. When I wrote my initial reaction, the idea of having more than just a percentage reported back stuck in my mind, specifically with using the in-tree checkout that I mentioned. But yeah, that's something down the line to address, you are absolutely correct that the current patch has no performance impact. Thanks for the reminder about '--porcelain'. > > I think having a 'git sparse-checkout status' would be a fine > subcommand, and output like the above -- possibly also including other > bits Stolee or I mentioned elsewhere in this thread -- would be cool > and would be helpful; it'd complement what I'm doing here quite > nicely. > > But you're solving a related problem rather than the one I was > focusing on, and you have left the issue I was focusing on > unaddressed. In particular, if users forgot that they sparsified in > the first place, how are they going to know to run `git > sparse-checkout status [--all]`? > > I think having a simple line of output in `git status` would remind > them. With that reminder, they could today then go run 'git > sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses > internally) or './sparsify --info' (as I use internally) to get more > info. In the future we could provide additional things for them as > well, such as your 'git sparse-checkout status'. > I do concede that this point could be a separate problem set and addressed separately in the future. > > An aside, though, since you linked to the in-tree sparse-checkout > definitions: When I reviewed that series, the possibility of merge > conflicts and not knowing what sparse-checkout should have checked out > when the in-tree defintions themselves were in a conflicted state > seemed to me to be a pretty tough sticking point. I'm hoping someone > has a clever solution, but I still don't yet. Do you? I am no clever person, but I often take great pleasure in reading up works of smarter people. One of which is the Google's and Facebook's Mercurial extension sets that they opensourced a while ago to support large repos. The test suite for FB's 'sparse' extension[1] may address your concerns? The 'sparse' extension defines the sparse checkout definition of a working repository. It supports '--enable-profile' which take in definition files ('.sparse'). These profiles are often checked into the root dir of the repo. > > Thanks, > Elijah Regards, Son Luong. [1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status) 2020-06-17 17:58 ` Son Luong Ngoc @ 2020-06-17 22:36 ` Elijah Newren 0 siblings, 0 replies; 5+ messages in thread From: Elijah Newren @ 2020-06-17 22:36 UTC (permalink / raw) To: Son Luong Ngoc; +Cc: Derrick Stolee, Git Mailing List Hi Son, On Wed, Jun 17, 2020 at 10:58 AM Son Luong Ngoc <sluongng@gmail.com> wrote: > > Hi Elijah, > > On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote: > > > > An aside, though, since you linked to the in-tree sparse-checkout > > definitions: When I reviewed that series, the possibility of merge > > conflicts and not knowing what sparse-checkout should have checked out > > when the in-tree defintions themselves were in a conflicted state > > seemed to me to be a pretty tough sticking point. I'm hoping someone > > has a clever solution, but I still don't yet. Do you? > > I am no clever person, but I often take great pleasure in reading up > works of smarter people. One of which is the Google's and Facebook's Mercurial > extension sets that they opensourced a while ago to support large repos. > > The test suite for FB's 'sparse' extension[1] may address your concerns? > > The 'sparse' extension defines the sparse checkout definition of a > working repository. It supports '--enable-profile' which take in definition > files ('.sparse'). These profiles are often checked into the root dir > of the repo. > > [1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115 Ooh, interesting; thanks for the link. It provides an idea, though I'm not completely sure how it maps to our implementation. The test file says that during a merge you get "unioned files". It's not fully clear what union means, especially when the files have both includes and excludes. For example, does the union of matches mean a union of includes and an intersection of excludes? Also, digging a bit further, it appears mercurial requires all includes to be before all excludes[2]. But git's pattern specification used in .git/info/sparse-checkout (taken from .gitignore rules) allows includes and excludes to be arbitrarily interspersed, so what is an appropriate union in our case? (Can we sidestep this question by limiting the in-tree sparsity definitions to cone mode only, which then only have includes in the form of directory names, since that'd allow easy "unioning"?) A little more digging suggests that mercurial also only allows sparse definitions to be read from commits, not from the working tree[3]. That seems bad to me; it's too much of a pain for users who want to edit and test changes. Sure, if their first commit is bad they could `git commit --amend` after the fact, but I don't like forcing them through that workflow. (This is perhaps especially true if they're trying to fix the definition during a rebase; they shouldn't have to commit first to get a corrected sparsity definition, especially as that can easily mess up rebase state.) However, although I don't like reading sparsity definition from commits rather than the working tree, it probably did have an advantage in that it made it easier for mercurial folks to notice the union idea: since they only get sparsity patterns from revisions, they are kind of forced into thinking about getting them from both parents and then "doing a union". Anyway, following that logic, it'd be tempting to say that we limit the in-tree definitions to cone mode, and then if any of the definitions have conflicts then we just load stages 2 and 3 of the file and union them. But...what if stages 2 and 3 also have conflict markers in them (either because of recursive merges or the more involved rename/rename(2to1) cases)? How do we ensure a well defined "union" of values? I guess a similar question is what if users, while editing, fill the sparse definition file with syntax errors -- and maybe even commit it. Do we sparsify down to nothing? Expand out to everything? Ignore the lines that don't otherwise parse and just use the rest? Something else? The one other thing I noticed of interest from mercurial's sparsify was that it apparently suffers from the same problems we used to in git < 2.27.0: inability to update sparsity definitions when there are any dirty changes[4]. That was a huge pain point; I'm glad we're not stuck with that anymore. Anyway, the mercurial link certainly provides some ideas even if it doesn't answer all the questions. Thanks for pointing it out. Elijah [2] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_59 [3] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_123 [4] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_485 https://fossies.org/linux/mercurial/mercurial/sparse.py#l_526 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 0/2] Sparse checkout status @ 2020-06-16 23:33 Elijah Newren via GitGitGadget 0 siblings, 0 replies; 5+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-06-16 23:33 UTC (permalink / raw) To: git; +Cc: dstolee, Elijah Newren Some of the feedback of folks trying out sparse-checkouts at $dayjob is that sparse checkouts can sometimes be disorienting; users can forget that they had a sparse-checkout and then wonder where files went. This series adds some output to 'git status' and modifies git-prompt slightly as an attempt to help. For reference, I suspect that in repositories that are large enough that people always use sparse-checkouts (e.g. windows or office repos), that this isn't a problem. But when the repository is approximately linux-kernel-sized, then it is reasonable for some folks to have a full checkout. sparse-checkouts, however, can provide various build system and IDE performance improvements, so we have a split of users who have sparse-checkouts and those who have full checkouts. It's easy for users who are bridging in between the two worlds or just trying out sparse-checkouts for the first time to get confused. Elijah Newren (2): [RFC] wt-status: show sparse checkout status as well [RFC] git-prompt: include sparsity state as well contrib/completion/git-prompt.sh | 7 ++++++- wt-status.c | 35 ++++++++++++++++++++++++++++++++ wt-status.h | 1 + 3 files changed, 42 insertions(+), 1 deletion(-) base-commit: eebb51ba8cab97c0b3f3f18eaab7796803b8494b Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-808%2Fnewren%2Fsparse-checkout-status-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-808/newren/sparse-checkout-status-v1 Pull-Request: https://github.com/git/git/pull/808 -- gitgitgadget ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-06-17 22:36 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-17 7:40 [PATCH 0/2] Sparse checkout status Son Luong Ngoc 2020-06-17 16:48 ` Elijah Newren 2020-06-17 17:58 ` Son Luong Ngoc 2020-06-17 22:36 ` Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status) Elijah Newren -- strict thread matches above, loose matches on Subject: below -- 2020-06-16 23:33 [PATCH 0/2] Sparse checkout status Elijah Newren via GitGitGadget
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.