* [PATCH 00/20] Sparse Index: Design, Format, Tests

From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee

Here is the first full patch series submission coming out of the
sparse-index RFC [1].

[1] https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

I won't waste too much space here, because PATCH 1 includes a sizeable
design document that describes the feature, the reasoning behind it, and
my plan for getting this implemented widely throughout the codebase.

There are some new things here that were not in the RFC:

 * Design doc and format updates. (Patch 1)
 * Performance test script. (Patches 2 and 20)

Notably missing in this series from the RFC:

 * The mega-patch inserting ensure_full_index() throughout the codebase.
   That will be a follow-up series to this one.
 * The integrations with git status and git add to demonstrate the
   improved performance. Those will also appear in their own series later.

I plan to keep my latest work in this area in my 'sparse-index/wip'
branch [2]. It includes all of the work from the RFC right now, updated
with the work from this series.
[2] https://github.com/derrickstolee/git/tree/sparse-index/wip

Thanks,
-Stolee

Derrick Stolee (20):
  sparse-index: design doc and format update
  t/perf: add performance test for sparse operations
  t1092: clean up script quoting
  sparse-index: add guard to ensure full index
  sparse-index: implement ensure_full_index()
  t1092: compare sparse-checkout to sparse-index
  test-read-cache: print cache entries with --table
  test-tool: don't force full index
  unpack-trees: ensure full index
  sparse-checkout: hold pattern list in index
  sparse-index: convert from full to sparse
  submodule: sparse-index should not collapse links
  unpack-trees: allow sparse directories
  sparse-index: check index conversion happens
  sparse-index: create extension for compatibility
  sparse-checkout: toggle sparse index from builtin
  sparse-checkout: disable sparse-index
  cache-tree: integrate with sparse directory entries
  sparse-index: loose integration with cache_tree_verify()
  p2000: add sparse-index repos

 Documentation/config/extensions.txt      |   7 +
 Documentation/git-sparse-checkout.txt    |  14 ++
 Documentation/technical/index-format.txt |   7 +
 Documentation/technical/sparse-index.txt | 167 +++++++++++++
 Makefile                                 |   1 +
 builtin/sparse-checkout.c                |  44 +++-
 cache-tree.c                             |  40 ++++
 cache.h                                  |  12 +-
 read-cache.c                             |  35 ++-
 repo-settings.c                          |  15 ++
 repository.c                             |  11 +-
 repository.h                             |   3 +
 setup.c                                  |   3 +
 sparse-index.c                           | 290 +++++++++++++++++++++++
 sparse-index.h                           |  11 +
 t/README                                 |   3 +
 t/helper/test-read-cache.c               |  61 ++++-
 t/perf/p2000-sparse-operations.sh        | 104 ++++++++
 t/t1091-sparse-checkout-builtin.sh       |  13 +
 t/t1092-sparse-checkout-compatibility.sh | 136 +++++++++--
 unpack-trees.c                           |  16 +-
 21 files changed, 953 insertions(+), 40 deletions(-)
 create mode 100644 Documentation/technical/sparse-index.txt
 create mode 100644 sparse-index.c
 create mode 100644 sparse-index.h
 create mode 100755 t/perf/p2000-sparse-operations.sh


base-commit: 966e671106b2fd38301e7c344c754fd118d0bb07
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/883
-- 
gitgitgadget
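The cover letter defers the mega-patch that inserts ensure_full_index() calls. As a rough illustration of what such a guard does, here is a minimal C sketch over a toy in-memory index. Everything in it is hypothetical: struct toy_index, toy_ensure_full_index(), toy_after_read() and the tree-listing callback are not Git's struct index_state or the code in sparse-index.c. The sketch replaces each sparse-directory entry (mode 0040000, SKIP_WORKTREE set, path ending in '/') with the files its tree lists. It also shows where a command_requires_full_index-style setting would trigger that expansion right after the index is read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT Git's cache_entry/index_state. */
#define TOY_MAX_ENTRIES 64
#define TOY_MAX_CHILDREN 32

struct toy_entry {
	char path[256];
	unsigned mode;     /* 0100644 for a file, 0040000 for a sparse directory */
	int skip_worktree; /* stands in for the SKIP_WORKTREE bit */
};

struct toy_index {
	struct toy_entry entries[TOY_MAX_ENTRIES];
	int nr;
	int sparse; /* non-zero if sparse-directory entries may be present */
};

/* Hypothetical tree lookup: fills names[] with the files recorded under dir. */
typedef int (*toy_tree_lister)(const char *dir, const char **names, int max);

static int toy_is_sparse_dir(const struct toy_entry *e)
{
	size_t len = strlen(e->path);

	return e->mode == 0040000 && e->skip_worktree &&
	       len > 0 && e->path[len - 1] == '/';
}

/*
 * Replace every sparse-directory entry with the files listed for its tree.
 * Assumes the lister returns names in sorted order, so the expanded index
 * stays sorted. Returns 0 on success, -1 if the toy index would overflow
 * (in which case istate is left untouched).
 */
static int toy_ensure_full_index(struct toy_index *istate, toy_tree_lister list_tree)
{
	static struct toy_index full; /* static to keep it off the stack */

	if (!istate->sparse)
		return 0;
	full.nr = 0;
	for (int i = 0; i < istate->nr; i++) {
		const struct toy_entry *e = &istate->entries[i];
		const char *names[TOY_MAX_CHILDREN];
		int n;

		if (!toy_is_sparse_dir(e)) {
			if (full.nr == TOY_MAX_ENTRIES)
				return -1;
			full.entries[full.nr++] = *e;
			continue;
		}
		n = list_tree(e->path, names, TOY_MAX_CHILDREN);
		for (int j = 0; j < n; j++) {
			struct toy_entry *f;

			if (full.nr == TOY_MAX_ENTRIES)
				return -1;
			f = &full.entries[full.nr++];
			snprintf(f->path, sizeof(f->path), "%s%s", e->path, names[j]);
			f->mode = 0100644;
			f->skip_worktree = 1; /* still outside the sparse-checkout cone */
		}
	}
	full.sparse = 0;
	*istate = full;
	return 0;
}

/* Where a command_requires_full_index-style setting would apply. */
static int toy_after_read(struct toy_index *istate, int command_requires_full_index,
			  toy_tree_lister list_tree)
{
	if (command_requires_full_index)
		return toy_ensure_full_index(istate, list_tree);
	return 0; /* this command is assumed to tolerate sparse entries */
}

/* Example lister: pretend the tree at "lib/" holds a.c and b.c. */
static int demo_list_tree(const char *dir, const char **names, int max)
{
	if (strcmp(dir, "lib/") != 0 || max < 2)
		return 0;
	names[0] = "a.c";
	names[1] = "b.c";
	return 2;
}
```

With an index of README, lib/ (a sparse directory) and src.c, calling toy_after_read() with the setting on yields README, lib/a.c, lib/b.c, src.c, with the two new entries still marked SKIP_WORKTREE. A real expansion would also recurse into nested trees, keep each file's mode and object ID, and keep the cache-tree consistent; the toy skips all of that.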
* [PATCH 01/20] sparse-index: design doc and format update

From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This begins a long effort to update the index format to allow sparse
directory entries. This should result in a significant improvement to
Git commands when HEAD contains millions of files, but the user has
selected many fewer files to keep in their sparse-checkout definition.

Currently, the index format is only updated in the presence of
extensions.sparseIndex instead of increasing a file format version
number. This is temporary, and index v5 is part of the plan for future
work in this area.

The design document details many of the reasons for embarking on this
work, and also the plan for completing it safely.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/index-format.txt |   7 +
 Documentation/technical/sparse-index.txt | 167 +++++++++++++++++++++++
 2 files changed, 174 insertions(+)
 create mode 100644 Documentation/technical/sparse-index.txt

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index b633482b1bdf..387126582556 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -44,6 +44,13 @@ Git index format
   localization, no special casing of directory separator '/'). Entries
   with the same name are sorted by their stage field.
 
+  An index entry typically represents a file. However, if sparse-checkout
+  is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the
+  `extensions.sparseIndex` extension is enabled, then the index may
+  contain entries for directories outside of the sparse-checkout definition.
+  These entries have mode `0040000`, include the `SKIP_WORKTREE` bit, and
+  the path ends in a directory separator.
+
   32-bit ctime seconds, the last time a file's metadata changed
     this is stat(2) data
 
diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt
new file mode 100644
index 000000000000..9070836f0655
--- /dev/null
+++ b/Documentation/technical/sparse-index.txt
@@ -0,0 +1,167 @@
+Git Sparse-Index Design Document
+================================
+
+The sparse-checkout feature allows users to focus a working directory on
+a subset of the files at HEAD. The cone mode patterns, enabled by
+`core.sparseCheckoutCone`, allow for very fast pattern matching to
+discover which files at HEAD belong in the sparse-checkout cone.
+
+Three important scale dimensions for a Git worktree are:
+
+* `HEAD`: How many files are present at `HEAD`?
+
+* Populated: How many files are within the sparse-checkout cone.
+
+* Modified: How many files has the user modified in the working directory?
+
+We will use big-O notation -- O(X) -- to denote how expensive certain
+operations are in terms of these dimensions.
+
+These dimensions are ordered by their magnitude: users (typically) modify
+fewer files than are populated, and we can only populate files at `HEAD`.
+These dimensions are also ordered by how expensive they are per item: it
+is expensive to detect a modified file than it is to write one that we
+know must be populated; changing `HEAD` only really requires updating the
+index.
+
+Problems occur if there is an extreme imbalance in these dimensions. For
+example, if `HEAD` contains millions of paths but the populated set has
+only tens of thousands, then commands like `git status` and `git add` can
+be dominated by operations that require O(`HEAD`) operations instead of
+O(Populated). Primarily, the cost is in parsing and rewriting the index,
+which is filled primarily with files at `HEAD` that are marked with the
+`SKIP_WORKTREE` bit.
+
+The sparse-index intends to take these commands that read and modify the
+index from O(`HEAD`) to O(Populated). To do this, we need to modify the
+index format in a significant way: add "sparse directory" entries.
+
+With cone mode patterns, it is possible to detect when an entire
+directory will have its contents outside of the sparse-checkout definition.
+Instead of listing all of the files it contains as individual entries, a
+sparse-index contains an entry with the directory name, referencing the
+object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit.
+If we need to discover the details for paths within that directory, we
+can parse trees to find that list.
+
+This addition of sparse-directory entries violates expectations about the
+index format and its in-memory data structure. There are many consumers in
+the codebase that expect to iterate through all of the index entries and
+see only files. In addition, they expect to see all files at `HEAD`. One
+way to handle this is to parse trees to replace a sparse-directory entry
+with all of the files within that tree as the index is loaded. However,
+parsing trees is slower than parsing the index format, so that is a slower
+operation than if we left the index alone.
+
+The implementation plan below follows four phases to slowly integrate with
+the sparse-index. The intention is to incrementally update Git commands to
+interact safely with the sparse-index without significant slowdowns. This
+may not always be possible, but the hope is that the primary commands that
+users need in their daily work are dramatically improved.
+
+Phase I: Format and initial speedups
+------------------------------------
+
+During this phase, Git learns to enable the sparse-index and safely parse
+one. Protections are put in place so that every consumer of the in-memory
+data structure can operate with its current assumption of every file at
+`HEAD`.
+
+At first, every index parse will expand the sparse-directory entries into
+the full list of paths at `HEAD`. This will be slower in all cases. The
+only noticable change in behavior will be that the serialized index file
+contains sparse-directory entries.
+
+To start, we use a new repository extension, `extensions.sparseIndex`, to
+allow inserting sparse-directory entries into indexes with file format
+versions 2, 3, and 4. This prevents Git versions that do not understand
+the sparse-index from operating on one, but it also prevents other
+operations that do not use the index at all. A new format, index v5, will
+be introduced that includes sparse-directory entries by default. It might
+also introduce other features that have been considered for improving the
+index, as well.
+
+Next, consumers of the index will be guarded against operating on a
+sparse-index by inserting calls to `ensure_full_index()` or
+`expand_index_to_path()`. After these guards are in place, we can begin
+leaving sparse-directory entries in the in-memory index structure.
+
+Even after inserting these guards, we will keep expanding sparse-indexes
+for most Git commands using the `command_requires_full_index` repository
+setting. This setting will be on by default and disabled one builtin at a
+time until we have sufficient confidence that all of the index operations
+are properly guarded.
+
+To complete this phase, the commands `git status` and `git add` will be
+integrated with the sparse-index so that they operate with O(Populated)
+performance. They will be carefully tested for operations within and
+outside the sparse-checkout definition.
+
+Phase II: Careful integrations
+------------------------------
+
+This phase focuses on ensuring that all index extensions and APIs work
+well with a sparse-index. This requires significant increases to our test
+coverage, especially for operations that interact with the working
+directory outside of the sparse-checkout definition. Some of these
+behaviors may not be the desirable ones, such as some tests already
+marked for failure in `t1092-sparse-checkout-compatibility.sh`.
+
+The index extensions that may require special integrations are:
+
+* FS Monitor
+* Untracked cache
+
+While integrating with these features, we should look for patterns that
+might lead to better APIs for interacting with the index. Coalescing
+common usage patterns into an API call can reduce the number of places
+where sparse-directories need to be handled carefully.
+
+Phase III: Important command speedups
+-------------------------------------
+
+At this point, the patterns for testing and implementing sparse-directory
+logic should be relatively stable. This phase focuses on updating some of
+the most common builtins that use the index to operate as O(Populated).
+Here is a potential list of commands that could be valuable to integrate
+at this point:
+
+* `git commit`
+* `git checkout`
+* `git merge`
+* `git rebase`
+
+Along with `git status` and `git add`, these commands cover the majority
+of users' interactions with the working directory. In addition, we can
+integrate with these commands:
+
+* `git grep`
+* `git rm`
+
+These have been proposed as some whose behavior could change when in a
+repo with a sparse-checkout definition. It would be good to include this
+behavior automatically when using a sparse-index. Some clarity is needed
+to make the behavior switch clear to the user.
+
+This phase is the first where parallel work might be possible without too
+much conflicts between topics.
+
+Phase IV: The long tail
+-----------------------
+
+This last phase is less a "phase" and more "the new normal" after all of
+the previous work.
+
+To start, the `command_requires_full_index` option could be removed in
+favor of expanding only when hitting an API guard.
+
+There are many Git commands that could use special attention to operate as
+O(Populated), while some might be so rare that it is acceptable to leave
+them with additional overhead when a sparse-index is present.
+
+Here are some commands that might be useful to update:
+
+* `git sparse-checkout set`
+* `git am`
+* `git clean`
+* `git stash`
-- 
gitgitgadget
* Re: [PATCH 01/20] sparse-index: design doc and format update

From: Elijah Newren @ 2021-02-24 1:13 UTC
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc,
    Jonathan Nieder, Derrick Stolee, Derrick Stolee,
    Matheus Tavares Bernardino

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> This begins a long effort to update the index format to allow sparse
> directory entries. This should result in a significant improvement to
> Git commands when HEAD contains millions of files, but the user has
> selected many fewer files to keep in their sparse-checkout definition.
>
> Currently, the index format is only updated in the presence of
> extensions.sparseIndex instead of increasing a file format version
> number. This is temporary, and index v5 is part of the plan for future
> work in this area.
>
> The design document details many of the reasons for embarking on this
> work, and also the plan for completing it safely.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/technical/index-format.txt |   7 +
>  Documentation/technical/sparse-index.txt | 167 +++++++++++++++++++++++
>  2 files changed, 174 insertions(+)
>  create mode 100644 Documentation/technical/sparse-index.txt
>
> diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
> index b633482b1bdf..387126582556 100644
> --- a/Documentation/technical/index-format.txt
> +++ b/Documentation/technical/index-format.txt
> @@ -44,6 +44,13 @@ Git index format
>    localization, no special casing of directory separator '/'). Entries
>    with the same name are sorted by their stage field.
>
> +  An index entry typically represents a file. However, if sparse-checkout
> +  is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the
> +  `extensions.sparseIndex` extension is enabled, then the index may
> +  contain entries for directories outside of the sparse-checkout definition.
> +  These entries have mode `0040000`, include the `SKIP_WORKTREE` bit, and
> +  the path ends in a directory separator.
> +
>    32-bit ctime seconds, the last time a file's metadata changed
>      this is stat(2) data
>
> diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt
> new file mode 100644
> index 000000000000..9070836f0655
> --- /dev/null
> +++ b/Documentation/technical/sparse-index.txt
> @@ -0,0 +1,167 @@
> +Git Sparse-Index Design Document
> +================================
> +
> +The sparse-checkout feature allows users to focus a working directory on
> +a subset of the files at HEAD. The cone mode patterns, enabled by
> +`core.sparseCheckoutCone`, allow for very fast pattern matching to
> +discover which files at HEAD belong in the sparse-checkout cone.
> +
> +Three important scale dimensions for a Git worktree are:
> +
> +* `HEAD`: How many files are present at `HEAD`?
> +
> +* Populated: How many files are within the sparse-checkout cone.
> +
> +* Modified: How many files has the user modified in the working directory?
> +
> +We will use big-O notation -- O(X) -- to denote how expensive certain
> +operations are in terms of these dimensions.
> +
> +These dimensions are ordered by their magnitude: users (typically) modify
> +fewer files than are populated, and we can only populate files at `HEAD`.
> +These dimensions are also ordered by how expensive they are per item: it
> +is expensive to detect a modified file than it is to write one that we
> +know must be populated; changing `HEAD` only really requires updating the
> +index.
> +
> +Problems occur if there is an extreme imbalance in these dimensions. For
> +example, if `HEAD` contains millions of paths but the populated set has
> +only tens of thousands, then commands like `git status` and `git add` can
> +be dominated by operations that require O(`HEAD`) operations instead of
> +O(Populated). Primarily, the cost is in parsing and rewriting the index,
> +which is filled primarily with files at `HEAD` that are marked with the
> +`SKIP_WORKTREE` bit.
> +
> +The sparse-index intends to take these commands that read and modify the
> +index from O(`HEAD`) to O(Populated). To do this, we need to modify the
> +index format in a significant way: add "sparse directory" entries.
> +
> +With cone mode patterns, it is possible to detect when an entire
> +directory will have its contents outside of the sparse-checkout definition.
> +Instead of listing all of the files it contains as individual entries, a
> +sparse-index contains an entry with the directory name, referencing the
> +object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit.
> +If we need to discover the details for paths within that directory, we
> +can parse trees to find that list.
> +
> +This addition of sparse-directory entries violates expectations about the

Violates current expectations, yes. Documentation tends to live a
long time, and I suspect that 2-3 years from now reading this sentence
might be jarring since we'll have modified the code to have an updated
set of expectations. Maybe a simple "As of time of writing, ..." at
the beginning of the sentence here? Or maybe I'm just being overly
worried...

> +index format and its in-memory data structure. There are many consumers in
> +the codebase that expect to iterate through all of the index entries and
> +see only files. In addition, they expect to see all files at `HEAD`. One
> +way to handle this is to parse trees to replace a sparse-directory entry
> +with all of the files within that tree as the index is loaded. However,
> +parsing trees is slower than parsing the index format, so that is a slower
> +operation than if we left the index alone.
> +
> +The implementation plan below follows four phases to slowly integrate with
> +the sparse-index. The intention is to incrementally update Git commands to
> +interact safely with the sparse-index without significant slowdowns. This
> +may not always be possible, but the hope is that the primary commands that
> +users need in their daily work are dramatically improved.
> +
> +Phase I: Format and initial speedups
> +------------------------------------
> +
> +During this phase, Git learns to enable the sparse-index and safely parse
> +one. Protections are put in place so that every consumer of the in-memory
> +data structure can operate with its current assumption of every file at
> +`HEAD`.
> +
> +At first, every index parse will expand the sparse-directory entries into
> +the full list of paths at `HEAD`. This will be slower in all cases. The
> +only noticable change in behavior will be that the serialized index file

noticeable

> +contains sparse-directory entries.
> +
> +To start, we use a new repository extension, `extensions.sparseIndex`, to
> +allow inserting sparse-directory entries into indexes with file format
> +versions 2, 3, and 4. This prevents Git versions that do not understand
> +the sparse-index from operating on one, but it also prevents other
> +operations that do not use the index at all. A new format, index v5, will
> +be introduced that includes sparse-directory entries by default. It might
> +also introduce other features that have been considered for improving the
> +index, as well.
> +
> +Next, consumers of the index will be guarded against operating on a
> +sparse-index by inserting calls to `ensure_full_index()` or
> +`expand_index_to_path()`. After these guards are in place, we can begin
> +leaving sparse-directory entries in the in-memory index structure.
> +
> +Even after inserting these guards, we will keep expanding sparse-indexes
> +for most Git commands using the `command_requires_full_index` repository
> +setting. This setting will be on by default and disabled one builtin at a
> +time until we have sufficient confidence that all of the index operations
> +are properly guarded.
> +
> +To complete this phase, the commands `git status` and `git add` will be
> +integrated with the sparse-index so that they operate with O(Populated)
> +performance. They will be carefully tested for operations within and
> +outside the sparse-checkout definition.

Good plan so far, but there's something else bugging me a little here.
One thing we noticed with our usage of `sparse-checkout` is that
although unimportant _tracked_ files go away, leftover build files and
other untracked files stick around. So, although 'git status'
shouldn't have to check the tracked files anymore, it is still going
to have to look at each of the *untracked* files and compare to
.gitignore files in order to correctly classify each file as ignored
or just plain untracked. Our `sparsify` tool has for a long time
tried to warn about such files when changing the sparsity
patterns/modules and had a --remove-old-ignores option for clearing
out ignored files that are found within directories that are sparse
(Meaning the directories where all files under them are marked
SKIP_WORKTREE.). I was never sure whether a warning was enough, or if
pushing that option more made sense, but about a month ago my
colleagues made the tool just auto-invoke that option from other
sparsify invocations. To my knowledge, there have been no complaints
about that being automatically turned on; but there were
complaints/confusion before about the directories being left around.
(Of course, non-ignored files are still left around by that option.)
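The cleanup described above, which removes ignored files found inside directories whose tracked files are all SKIP_WORKTREE, comes down to a per-path classification. Here is a minimal C sketch of that classification. It is not the `sparsify` tool's code, which is not shown in this thread. The types and functions are hypothetical, and the linear scans favor clarity over the sorted-index lookups that real code would use.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical view of one tracked index entry. */
struct toy_tracked {
	const char *path;
	int skip_worktree;
};

/*
 * A directory (ending in '/') counts as sparse if it contains at least
 * one tracked file and every tracked file under it is SKIP_WORKTREE.
 */
static int toy_dir_is_sparse(const char *dir, const struct toy_tracked *files, int nr)
{
	size_t len = strlen(dir);
	int seen = 0;

	for (int i = 0; i < nr; i++) {
		if (strncmp(files[i].path, dir, len) != 0)
			continue;
		if (!files[i].skip_worktree)
			return 0;
		seen = 1;
	}
	return seen;
}

/*
 * Return 1 if an ignored path sits under any sparse ancestor directory,
 * i.e. it is a candidate for automatic removal.
 */
static int toy_ignored_in_sparse_dir(const char *path,
				     const struct toy_tracked *files, int nr)
{
	char prefix[256];

	for (const char *slash = strchr(path, '/'); slash; slash = strchr(slash + 1, '/')) {
		size_t len = (size_t)(slash - path) + 1; /* keep the '/' */

		if (len >= sizeof(prefix))
			return 0;
		memcpy(prefix, path, len);
		prefix[len] = '\0';
		if (toy_dir_is_sparse(prefix, files, nr))
			return 1;
	}
	return 0;
}
```

For example, if `lib/a.c` and `lib/b.c` are both SKIP_WORKTREE, an ignored `lib/build/a.o` qualifies. An ignored `src/main.o` next to a populated `src/main.c` does not. Ancestors are checked from the shallowest down, so an untracked build directory such as `lib/build/`, which has no tracked files of its own, is still caught through `lib/`.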
I'm worried that since sparse-checkout doesn't do anything to help
with all these untracked/ignored files, we might not get all the
performance improvements we want from a `git status` with sparse
directories. We'll be dropping from walking O(2*HEAD) files (1 source
+ 1 object file) down to O(HEAD) files (just the object files) rather
than actually getting down to O(Populated).

> +
> +Phase II: Careful integrations
> +------------------------------
> +
> +This phase focuses on ensuring that all index extensions and APIs work
> +well with a sparse-index. This requires significant increases to our test
> +coverage, especially for operations that interact with the working
> +directory outside of the sparse-checkout definition. Some of these
> +behaviors may not be the desirable ones, such as some tests already
> +marked for failure in `t1092-sparse-checkout-compatibility.sh`.
> +
> +The index extensions that may require special integrations are:
> +
> +* FS Monitor
> +* Untracked cache
> +
> +While integrating with these features, we should look for patterns that
> +might lead to better APIs for interacting with the index. Coalescing
> +common usage patterns into an API call can reduce the number of places
> +where sparse-directories need to be handled carefully.

Makes sense.

> +Phase III: Important command speedups
> +-------------------------------------
> +
> +At this point, the patterns for testing and implementing sparse-directory
> +logic should be relatively stable. This phase focuses on updating some of
> +the most common builtins that use the index to operate as O(Populated).
> +Here is a potential list of commands that could be valuable to integrate
> +at this point:
> +
> +* `git commit`
> +* `git checkout`
> +* `git merge`
> +* `git rebase`
> +
> +Along with `git status` and `git add`, these commands cover the majority
> +of users' interactions with the working directory.

Sounds like a good plan as well.
I hope we get to make this specific to the merge-ort backend. It
localizes the index-related code to (a) a call to unpack_trees()
called from checkout-like code (which would probably automatically be
handled by your updates to git checkout), and (b) a single function
named record_conflicted_index_entries(). I feel it should be pretty
easy to update.

In contrast, the idea of attempting to update merge-recursive with
this kind of change sounds overwhelming.

> In addition, we can
> +integrate with these commands:
> +
> +* `git grep`
> +* `git rm`
> +
> +These have been proposed as some whose behavior could change when in a
> +repo with a sparse-checkout definition. It would be good to include this
> +behavior automatically when using a sparse-index. Some clarity is needed
> +to make the behavior switch clear to the user.

Is this leftover from before recent events? I think this portion of
the document should just be stricken.

I argued about how these were buggy as-is due to SKIP_WORKTREE always
having been an incomplete implementation of an idea at [1], but didn't
hear a further response from you. I'm curious if you disagreed with
my reasoning, or it just slipped through the cracks in a busy schedule
and this portion of the document was leftover from before.

In my opinion, both commands are just buggy and should be fixed for
general sparse-checkout usage cases, not just for sparse-index.

As for git grep, it has options for searching the working tree
(default) OR searching the index (--cached) OR searching an old commit
(passing a REVISION). But never some combination or more than one of
these. The fact that it combined some of them in the case of
SKIP_WORKTREE entries looks entirely like a bug to me. For the same
reasons I argued that --untracked and --cached are incompatible[2], we
shouldn't be combining results from searching the working tree and
searching the index. Luckily, this fix has already been submitted[3]
and picked up in mt/grep-sparse-checkout and is marked in the cooking
emails as "Will merge to next".

As for git rm, I'll quote from my email to Matheus:

"""
As far as the longer term discussion about making git rm
configurable... _If_ it comes up again in the future, I will argue
that if git rm should have configuration to delete paths outside the
sparsity specification, then git add should have configuration to add
paths outside the sparsity specification that happen to be present
despite being SKIP_WORKTREE, that git diff with no revision arguments
(nor --cached) should have configuration to diff against paths that
are SKIP_WORKTREE but happen to be present, that git status should
have configuration to report on changes to paths that are
SKIP_WORKTREE but happen to be present, that git checkout should have
configuration to write files to the working tree despite matching
sparsity paths, etc. And I'll argue that you do ALL of those or
you're being inconsistent. I hope that people see these are actually
all the same request and that it is horribly inconsistent to do some
of these and not others, and that at least by the time I get to
mentioning checkout that they realize it's a crazy request. We should
just tell users to extend their sparsity if they want the working copy
(and commands that interact with the working copy) to handle the
additional paths.

Maybe I'm just really biased, but I don't see how this makes sense.
I would argue more about it, but no one has responded. My plan was
to just fix the default behavior, and then see if anyone ever actually
cared enough to come back and ask for more configurability.
"""

Also, for rm, Matheus has already submitted the fix[4], though at
Junio's request he separated out some fixes for git-add as a separate
preliminary series[5] and then will resubmit the other `add` and `rm`
fixes.
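The one-source-per-mode rule argued for above can be shown in a small C sketch. Each search mode reads from exactly one source. In the default working-tree mode, a SKIP_WORKTREE entry has no file in the working tree, so it is skipped rather than searched through its index copy. The types and functions are hypothetical, the sketch leaves out the revision mode, and it is not the code from [3].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_grep_source {
	TOY_GREP_WORKTREE, /* default: search files in the working tree */
	TOY_GREP_CACHED    /* --cached: search the blobs recorded in the index */
};

struct toy_grep_entry {
	const char *path;
	int skip_worktree;
	const char *index_blob;       /* content recorded in the index */
	const char *worktree_content; /* NULL if no file in the working tree */
};

/* Pick the single source to search for one entry, or NULL to skip it. */
static const char *toy_grep_text(const struct toy_grep_entry *e,
				 enum toy_grep_source src)
{
	if (src == TOY_GREP_CACHED)
		return e->index_blob;
	/*
	 * Working-tree mode: a SKIP_WORKTREE path is not checked out, so it
	 * is skipped rather than silently replaced by its index version.
	 */
	if (e->skip_worktree)
		return NULL;
	return e->worktree_content;
}

/* Count entries whose chosen source contains needle. */
static int toy_grep_count(const struct toy_grep_entry *entries, int nr,
			  enum toy_grep_source src, const char *needle)
{
	int hits = 0;

	for (int i = 0; i < nr; i++) {
		const char *text = toy_grep_text(&entries[i], src);

		if (text && strstr(text, needle))
			hits++;
	}
	return hits;
}
```

With one populated file and one SKIP_WORKTREE file that both contain "foo" in the index, a working-tree search reports one hit and a `--cached`-style search reports two. The two modes never mix results.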
[1] https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/
[2] https://lore.kernel.org/git/xmqqtuql0yfp.fsf@gitster.c.googlers.com/
[3] https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/
[4] https://lore.kernel.org/git/61a77cd5f45ba02c7dff4b7932abdebb17c1667f.1613593946.git.matheus.bernardino@usp.br/
[5] https://lore.kernel.org/git/cover.1614037664.git.matheus.bernardino@usp.br/

Anyway, that's a long way of saying I think this section of your
document is already obsolete. (Which is a good thing -- less work to
do to get sparse-index working. Thanks, Matheus!).

> +This phase is the first where parallel work might be possible without too
> +much conflicts between topics.
> +
> +Phase IV: The long tail
> +-----------------------
> +
> +This last phase is less a "phase" and more "the new normal" after all of
> +the previous work.
> +
> +To start, the `command_requires_full_index` option could be removed in
> +favor of expanding only when hitting an API guard.
> +
> +There are many Git commands that could use special attention to operate as
> +O(Populated), while some might be so rare that it is acceptable to leave
> +them with additional overhead when a sparse-index is present.
> +
> +Here are some commands that might be useful to update:
> +
> +* `git sparse-checkout set`
> +* `git am`
> +* `git clean`
> +* `git stash`

Oh, man, git stash is definitely in need of work. It's still a
minimalistic transliteration of shell to C, complete with lots of
process forking and piping output between various low-level commands.
It might be interesting to rewrite this in terms of the merge
machinery, though its separate stashing of staged stuff, unstaged
stuff, and possibly untracked stuff means that there is a sequence of
two or three merges needed and interesting failure handling to do if
those merges fail, especially if the user uses --index. But I
digress...
Anyway, overall, very nicely written and planned out. Thanks for
taking the time to write this all up.
* Re: [PATCH 01/20] sparse-index: design doc and format update 2021-02-24 1:13 ` Elijah Newren @ 2021-02-25 15:29 ` Derrick Stolee 2021-02-25 20:14 ` Elijah Newren 0 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee @ 2021-02-25 15:29 UTC (permalink / raw) To: Elijah Newren, Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee, Matheus Tavares Bernardino On 2/23/2021 8:13 PM, Elijah Newren wrote: > On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote:>> +This addition of sparse-directory entries violates expectations about the > > Violates current expectations, yes. Documentation tends to live a > long time, and I suspect that 2-3 years from now reading this sentence > might be jarring since we'll have modified the code to have an updated > set of expectations. Maybe a simple "As of time of writing, ..." at > the beginning of the sentence here? Or maybe I'm just being overly > worried... I was hoping that the phrase "this addition of" places this statement in a moment of time where sparse-directory entries didn't exist, but now they will. I'm open to alternatives and will give this some thought. >> +To complete this phase, the commands `git status` and `git add` will be >> +integrated with the sparse-index so that they operate with O(Populated) >> +performance. They will be carefully tested for operations within and >> +outside the sparse-checkout definition. > > Good plan so far, but there's something else bugging me a little here. > One thing we noticed with our usage of `sparse-checkout` is that > although unimportant _tracked_ files go away, leftover build files and > other untracked files stick around. 
> So, although 'git status'
> shouldn't have to check the tracked files anymore, it is still going
> to have to look at each of the *untracked* files and compare to
> .gitignore files in order to correctly classify each file as ignored
> or just plain untracked. Our `sparsify` tool has for a long time
> tried to warn about such files when changing the sparsity
> patterns/modules and had an --remove-old-ignores option for clearing
> out ignored files that are found within directories that are sparse
> (Meaning the directories where all files under them are marked
> SKIP_WORKTREE.). I was never sure whether a warning was enough, or if
> pushing that option more made sense, but about a month ago my
> colleagues made the tool just auto-invoke that option from other
> sparsify invocations. To my knowledge, there have been no complaints
> about that being automatically turned on; but there were
> complaints/confusion before about the directories being left around.
> (Of course, non-ignored files are still left around by that option.)
>
> I'm worried that since sparse-checkout doesn't do anything to help
> with all these untracked/ignored files, we might not get all the
> performance improvements we want from a `git status` with sparse
> directories. We'll be dropping from walking O(2*HEAD) files (1 source
> + 1 object file) down to O(HEAD) files (just the object files) rather
> than actually getting down to O(Populated).

This definitely seems like a valuable _enhancement_ to sparse-checkout
that could occur in parallel.

For a workaround in the moment: is "git clean -xdf" sufficient to help
these users?

>> +Phase III: Important command speedups
>> +-------------------------------------
>> +
>> +At this point, the patterns for testing and implementing sparse-directory
>> +logic should be relatively stable. This phase focuses on updating some of
>> +the most common builtins that use the index to operate as O(Populated).
>> +Here is a potential list of commands that could be valuable to integrate
>> +at this point:
>> +
>> +* `git commit`
>> +* `git checkout`
>> +* `git merge`
>> +* `git rebase`
>> +
>> +Along with `git status` and `git add`, these commands cover the majority
>> +of users' interactions with the working directory.
>
> Sounds like a good plan as well.
>
> I hope we get to make this specific to the merge-ort backend. It
> localizes the index-related code to (a) a call to unpack_trees()
> called from checkout-like code (which would probably automatically be
> handled by your updates to git checkout), and (b) a single function
> named record_conflicted_index_entries(). I feel it should be pretty
> easy to update.
>
> In contrast, the idea of attempting to update merge-recursive with
> this kind of change sounds overwhelming.

Yes, I'm hoping to eventually say "if you are in a sparse-checkout, then
you should use ORT by default." Then, if someone opts to do merge-recursive
instead, then they pay the index expansion cost.

While this is very clear in my head, it might be worth mentioning that
explicitly here.

>> In addition, we can
>> +integrate with these commands:
>> +
>> +* `git grep`
>> +* `git rm`
>> +
>> +These have been proposed as some whose behavior could change when in a
>> +repo with a sparse-checkout definition. It would be good to include this
>> +behavior automatically when using a sparse-index. Some clarity is needed
>> +to make the behavior switch clear to the user.
>
> Is this leftover from before recent events? I think this portion of
> the document should just be stricken.
>
> I argued about how these were buggy as-is due SKIP_WORKTREE always
> having been an incomplete implementation of an idea at [1], but didn't
> hear a further response from you. I'm curious if you disagreed with
> my reasoning, or it just slipped through the cracks in a busy schedule
> and this portion of the document was leftover from before.
> In my opinion, both commands are just buggy and should be fixed for general
> sparse-checkout usage cases, not just for sparse-index.

This is definitely a case of "I've been too busy to read those topics
in detail." I figured that there was something going on that was relevant
to the sparse-checkout and would affect the sparse-index, but I shelved
it in my mind until I had space to think about it directly.

> Anyway, that's a long way of saying I think this section of your
> document is already obsolete. (Which is a good thing -- less work to
> do to get sparse-index working. Thanks, Matheus!).

Thank you for your summary, which helps a lot. Thanks, Matheus, too!
If those fixes already include coverage for the behavior, then I'll see
if they also translate to sparse-index tests easily.

I feel like a lot of these later contributions will be more about adding
tests than actually writing a lot of code.

>> +This phase is the first where parallel work might be possible without too
>> +much conflicts between topics.
>> +
>> +Phase IV: The long tail
>> +-----------------------
>> +
>> +This last phase is less a "phase" and more "the new normal" after all of
>> +the previous work.
>> +
>> +To start, the `command_requires_full_index` option could be removed in
>> +favor of expanding only when hitting an API guard.
>> +
>> +There are many Git commands that could use special attention to operate as
>> +O(Populated), while some might be so rare that it is acceptable to leave
>> +them with additional overhead when a sparse-index is present.
>> +
>> +Here are some commands that might be useful to update:
>> +
>> +* `git sparse-checkout set`
>> +* `git am`
>> +* `git clean`
>> +* `git stash`
>
> Oh, man, git stash is definitely in need of work. It's still a
> minimalistic transliteration of shell to C, complete with lots of
> process forking and piping output between various low-level commands.
> It might be interesting to rewrite this in terms of the merge
> machinery, though its separate stashing of staged stuff, unstaged
> stuff, and possibly untracked stuff means that there is a sequence of
> two or three merges needed and interesting failure handling to do if
> those merges fail, especially if the user uses --index. But I
> digress...

I would prefer to leave 'git stash' alone, but it's used by enough
people that I need to care about it eventually.

> Anyway, overall, very nicely written and planned out. Thanks for
> taking the time to write this all up.

Thanks for your detailed comments!
-Stolee
* Re: [PATCH 01/20] sparse-index: design doc and format update
From: Elijah Newren @ 2021-02-25 20:14 UTC
To: Derrick Stolee
Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee, Matheus Tavares Bernardino

On Thu, Feb 25, 2021 at 7:29 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 2/23/2021 8:13 PM, Elijah Newren wrote:
> > On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> +This addition of sparse-directory entries violates expectations about the
> >
> > Violates current expectations, yes. Documentation tends to live a
> > long time, and I suspect that 2-3 years from now reading this sentence
> > might be jarring since we'll have modified the code to have an updated
> > set of expectations. Maybe a simple "As of time of writing, ..." at
> > the beginning of the sentence here? Or maybe I'm just being overly
> > worried...
>
> I was hoping that the phrase "this addition of" places this statement in
> a moment of time where sparse-directory entries didn't exist, but now they
> will. I'm open to alternatives and will give this some thought.

I already listed my only suggestion -- adding a "As of time of writing," at the beginning. I'm totally open to other proposals/suggestions, and it's admittedly a minor point so you can feel free to just ignore it if we can't come up with wording everyone likes.

>
> >> +To complete this phase, the commands `git status` and `git add` will be
> >> +integrated with the sparse-index so that they operate with O(Populated)
> >> +performance. They will be carefully tested for operations within and
> >> +outside the sparse-checkout definition.
> >
> > Good plan so far, but there's something else bugging me a little here.
> > One thing we noticed with our usage of `sparse-checkout` is that
> > although unimportant _tracked_ files go away, leftover build files and
> > other untracked files stick around. So, although 'git status'
> > shouldn't have to check the tracked files anymore, it is still going
> > to have to look at each of the *untracked* files and compare to
> > .gitignore files in order to correctly classify each file as ignored
> > or just plain untracked. Our `sparsify` tool has for a long time
> > tried to warn about such files when changing the sparsity
> > patterns/modules and had an --remove-old-ignores option for clearing
> > out ignored files that are found within directories that are sparse
> > (Meaning the directories where all files under them are marked
> > SKIP_WORKTREE.). I was never sure whether a warning was enough, or if
> > pushing that option more made sense, but about a month ago my
> > colleagues made the tool just auto-invoke that option from other
> > sparsify invocations. To my knowledge, there have been no complaints
> > about that being automatically turned on; but there were
> > complaints/confusion before about the directories being left around.
> > (Of course, non-ignored files are still left around by that option.)
> >
> > I'm worried that since sparse-checkout doesn't do anything to help
> > with all these untracked/ignored files, we might not get all the
> > performance improvements we want from a `git status` with sparse
> > directories. We'll be dropping from walking O(2*HEAD) files (1 source
> > + 1 object file) down to O(HEAD) files (just the object files) rather
> > than actually getting down to O(Populated).
>
> This definitely seems like a valuable _enhancement_ to sparse-checkout
> that could occur in parallel.

Yes, indeed. Your discussion of performance just reminded me of it, and since this idea might be important in order to drive the costs down to O(populated) in practice, I thought I'd mention it.
> For a workaround in the moment: is "git clean -xdf" sufficient to help
> these users?

Not really; that wouldn't remove the ignored stuff (build files) under sparsified directories, which is the point. (Builds build everything over here; once you sparsify you have leftover build files from projects you now don't care about.) If you convert it to "git clean -Xdf" then you're closer, but that wouldn't just remove build info from sparse projects, it'd force users to rebuild all the stuff they're interested in. It's close though; what's wanted is basically a special flag that runs "git clean -Xf <long list of sparsified directories>", without the user having to specify 300 directories. However, for now, since I've got a 'sparsify' script anyway (needed for determining inter-module dependencies and certain directories that always need to be present, etc.), it just has a flag for running "git clean -Xf <long list of sparsified directories>" since it has logic to compute what all those directories are anyway.

> >> +Phase III: Important command speedups
> >> +-------------------------------------
> >> +
> >> +At this point, the patterns for testing and implementing sparse-directory
> >> +logic should be relatively stable. This phase focuses on updating some of
> >> +the most common builtins that use the index to operate as O(Populated).
> >> +Here is a potential list of commands that could be valuable to integrate
> >> +at this point:
> >> +
> >> +* `git commit`
> >> +* `git checkout`
> >> +* `git merge`
> >> +* `git rebase`
> >> +
> >> +Along with `git status` and `git add`, these commands cover the majority
> >> +of users' interactions with the working directory.
> >
> > Sounds like a good plan as well.
> >
> > I hope we get to make this specific to the merge-ort backend.
> > It localizes the index-related code to (a) a call to unpack_trees()
> > called from checkout-like code (which would probably automatically be
> > handled by your updates to git checkout), and (b) a single function
> > named record_conflicted_index_entries(). I feel it should be pretty
> > easy to update.
> >
> > In contrast, the idea of attempting to update merge-recursive with
> > this kind of change sounds overwhelming.
>
> Yes, I'm hoping to eventually say "if you are in a sparse-checkout, then
> you should use ORT by default." Then, if someone opts to do merge-recursive
> instead, then they pay the index expansion cost.
>
> While this is very clear in my head, it might be worth mentioning that
> explicitly here.

:+1:

> >> In addition, we can
> >> +integrate with these commands:
> >> +
> >> +* `git grep`
> >> +* `git rm`
> >> +
> >> +These have been proposed as some whose behavior could change when in a
> >> +repo with a sparse-checkout definition. It would be good to include this
> >> +behavior automatically when using a sparse-index. Some clarity is needed
> >> +to make the behavior switch clear to the user.
> >
> > Is this leftover from before recent events? I think this portion of
> > the document should just be stricken.
> >
> > I argued about how these were buggy as-is due SKIP_WORKTREE always
> > having been an incomplete implementation of an idea at [1], but didn't
> > hear a further response from you. I'm curious if you disagreed with
> > my reasoning, or it just slipped through the cracks in a busy schedule
> > and this portion of the document was leftover from before. In my
> > opinion, both commands are just buggy and should be fixed for general
> > sparse-checkout usage cases, not just for sparse-index.
>
> This is definitely a case of "I've been too busy to read those topics
> in detail."
> I figured that there was something going on that was relevant
> to the sparse-checkout and would affect the sparse-index, but I shelved
> it in my mind until I had space to think about it directly.
>
> > Anyway, that's a long way of saying I think this section of your
> > document is already obsolete. (Which is a good thing -- less work to
> > do to get sparse-index working. Thanks, Matheus!).
>
> Thank you for your summary, which helps a lot. Thanks, Matheus, too!
> If those fixes already include coverage for the behavior, then I'll see
> if they also translate to sparse-index tests easily.
>
> I feel like a lot of these later contributions will be more about adding
> tests than actually writing a lot of code.
>
> >> +This phase is the first where parallel work might be possible without too
> >> +much conflicts between topics.
> >> +
> >> +Phase IV: The long tail
> >> +-----------------------
> >> +
> >> +This last phase is less a "phase" and more "the new normal" after all of
> >> +the previous work.
> >> +
> >> +To start, the `command_requires_full_index` option could be removed in
> >> +favor of expanding only when hitting an API guard.
> >> +
> >> +There are many Git commands that could use special attention to operate as
> >> +O(Populated), while some might be so rare that it is acceptable to leave
> >> +them with additional overhead when a sparse-index is present.
> >> +
> >> +Here are some commands that might be useful to update:
> >> +
> >> +* `git sparse-checkout set`
> >> +* `git am`
> >> +* `git clean`
> >> +* `git stash`
> >
> > Oh, man, git stash is definitely in need of work. It's still a
> > minimalistic transliteration of shell to C, complete with lots of
> > process forking and piping output between various low-level commands.
> > It might be interesting to rewrite this in terms of the merge
> > machinery, though its separate stashing of staged stuff, unstaged
> > stuff, and possibly untracked stuff means that there is a sequence of
> > two or three merges needed and interesting failure handling to do if
> > those merges fail, especially if the user uses --index. But I
> > digress...
>
> I would prefer to leave 'git stash' alone, but it's used by enough
> people that I need to care about it eventually.

Oh, it can definitely come later. And I agree about the desirability of touching that code; I was avoiding it for a long time, but there was one important sparse-checkout-related bug recently[1] so I've already been forced to touch it once. That might mean I'm (eventually) on the hook to make it sparse-index friendly, especially since it might involve using merge-ort to do so...

[1] https://lore.kernel.org/git/pull.919.git.git.1605891222.gitgitgadget@gmail.com/

> > Anyway, overall, very nicely written and planned out. Thanks for
> > taking the time to write this all up.
>
> Thanks for your detailed comments!
> -Stolee
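Elijah describes wanting the equivalent of "git clean -Xf <long list of sparsified directories>" without typing hundreds of paths. A minimal POSIX shell sketch of the filtering step follows. The function name `outside_cone` is hypothetical (it is not part of Git or of the `sparsify` tool), and it assumes cone-mode directory names without whitespace or trailing slashes. It reads candidate directories on standard input and prints only those that are neither a cone directory, nor inside one, nor an ancestor of one.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): print each candidate directory
# read from stdin that lies entirely outside every cone directory given
# as an argument. Assumes no whitespace and no trailing slash in paths.
outside_cone () {
	while read -r dir
	do
		keep=yes
		for cone in "$@"
		do
			# dir is the cone directory itself, or lies inside it
			case "$dir/" in
			"$cone"/*) keep=no ;;
			esac
			# dir is an ancestor of the cone directory
			case "$cone/" in
			"$dir"/*) keep=no ;;
			esac
		done
		if [ "$keep" = yes ]
		then
			echo "$dir"
		fi
	done
}
```

Appending "/" before matching keeps a cone such as `f2` from accidentally matching a sibling like `f20`. In a cone-mode checkout, one might combine it with real Git commands like this, guarding against an empty list because `git clean -Xf --` with no paths would clean ignored files across the whole worktree:

    dirs=$(git ls-tree -d --name-only HEAD | outside_cone $(git sparse-checkout list))
    test -n "$dirs" && git clean -Xf -- $dirs

This only considers top-level directories; sparse siblings deeper inside a partially populated directory would still need to be listed separately.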
* [PATCH 02/20] t/perf: add performance test for sparse operations
From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Create a test script that takes the default performance test (the Git
codebase) and multiplies it by 256 using four layers of duplicated
trees of width four. This results in nearly one million blob entries in
the index. Then, we can clone this repository with sparse-checkout
patterns that demonstrate four copies of the initial repository. Each
clone will use a different index format or mode so performance can be
tested across the different options.

Note that the initial repo is stripped of submodules before doing the
copies. This preserves the expected data shape of the sparse index,
because directories containing submodules are not collapsed to a sparse
directory entry.

Run a few Git commands on these clones, especially those that use the
index (status, add, commit).

Here are the results on my Linux machine:

Test
--------------------------------------------------------------
2000.2: git status (full-index-v3)            0.37(0.30+0.09)
2000.3: git status (full-index-v4)            0.39(0.32+0.10)
2000.4: git add -A (full-index-v3)            1.42(1.06+0.20)
2000.5: git add -A (full-index-v4)            1.26(0.98+0.16)
2000.6: git add . (full-index-v3)             1.40(1.04+0.18)
2000.7: git add . (full-index-v4)             1.26(0.98+0.17)
2000.8: git commit -a -m A (full-index-v3)    1.42(1.11+0.16)
2000.9: git commit -a -m A (full-index-v4)    1.33(1.08+0.16)

It is perhaps noteworthy that there is an improvement when using index
version 4. This is because the v3 index uses 108 MiB while the v4
index uses 80 MiB. Since the repeated portions of the directories are
very short (f3/f1/f2, for example) this ratio is less pronounced than in
similarly-sized real repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p2000-sparse-operations.sh | 87 +++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100755 t/perf/p2000-sparse-operations.sh

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
new file mode 100755
index 000000000000..52597683376e
--- /dev/null
+++ b/t/perf/p2000-sparse-operations.sh
@@ -0,0 +1,87 @@
+#!/bin/sh
+
+test_description="test performance of Git operations using the index"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+SPARSE_CONE=f2/f4/f1
+
+test_expect_success 'setup repo and indexes' '
+	git reset --hard HEAD &&
+	# Remove submodules from the example repo, because our
+	# duplication of the entire repo creates an unlikely data shape.
+	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
+	rm -f .gitmodules &&
+	git add .gitmodules &&
+	for module in $(awk "{print \$2}" modules)
+	do
+		git rm $module || return 1
+	done &&
+	git add . &&
+	git commit -m "remove submodules" &&
+
+	echo bogus >a &&
+	cp a b &&
+	git add a b &&
+	git commit -m "level 0" &&
+	BLOB=$(git rev-parse HEAD:a) &&
+	OLD_COMMIT=$(git rev-parse HEAD) &&
+	OLD_TREE=$(git rev-parse HEAD^{tree}) &&
+
+	for i in $(test_seq 1 4)
+	do
+		cat >in <<-EOF &&
+			100755 blob $BLOB	a
+			040000 tree $OLD_TREE	f1
+			040000 tree $OLD_TREE	f2
+			040000 tree $OLD_TREE	f3
+			040000 tree $OLD_TREE	f4
+		EOF
+		NEW_TREE=$(git mktree <in) &&
+		NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") &&
+		OLD_TREE=$NEW_TREE &&
+		OLD_COMMIT=$NEW_COMMIT || return 1
+	done &&
+
+	git sparse-checkout init --cone &&
+	git branch -f wide $OLD_COMMIT &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 &&
+	(
+		cd full-index-v3 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 3 &&
+		git update-index --index-version=3
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 &&
+	(
+		cd full-index-v4 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 4 &&
+		git update-index --index-version=4
+	)
+'
+
+test_perf_on_all () {
+	command="$@"
+	for repo in full-index-v3 full-index-v4
+	do
+		test_perf "$command ($repo)" "
+			(
+				cd $repo &&
+				echo >>$SPARSE_CONE/a &&
+				$command
+			)
+		"
+	done
+}
+
+test_perf_on_all git status
+test_perf_on_all git add -A
+test_perf_on_all git add .
+test_perf_on_all git commit -a -m A
+
+test_done
-- 
gitgitgadget
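The "multiplies it by 256" and "four copies" figures follow from the setup loop: each of the four levels writes a tree whose `f1` through `f4` entries all point at the previous level's tree, so the number of copies of the original repository grows fourfold per level. A small sketch of that arithmetic follows; `copies_under` is a hypothetical helper, not part of the perf suite, and it ignores the extra `a` blob added at each level (and `b` at level 0).

```shell
#!/bin/sh
# Hypothetical helper (not part of t/perf): number of copies of the
# level-0 tree reachable under a directory at the given depth, when
# `levels` levels are stacked and each level fans out `width` ways.
copies_under () {
	width=$1
	levels=$2
	depth=$3
	n=1
	i=$depth
	# One multiplication per level below the chosen directory.
	while [ "$i" -lt "$levels" ]
	do
		n=$((n * width))
		i=$((i + 1))
	done
	echo "$n"
}

copies_under 4 4 0   # whole repository: 4^4 = 256 copies
copies_under 4 4 3   # inside SPARSE_CONE=f2/f4/f1 (depth 3): 4 copies
```

So the cone `f2/f4/f1` populates 4 of the 256 copies, while a full index still has to list entries for all of them.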
* Re: [PATCH 02/20] t/perf: add performance test for sparse operations
From: Elijah Newren @ 2021-02-24 2:30 UTC
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Create a test script that takes the default performance test (the Git
> codebase) and multiplies it by 256 using four layers of duplicated
> trees of width four. This results in nearly one million blob entries in
> the index. Then, we can clone this repository with sparse-checkout
> patterns that demonstrate four copies of the initial repository. Each
> clone will use a different index format or mode so peformance can be
> tested across the different options.
>
> Note that the initial repo is stripped of submodules before doing the
> copies. This preserves the expected data shape of the sparse index,
> because directories containing submodules are not collapsed to a sparse
> directory entry.
>
> Run a few Git commands on these clones, especially those that use the
> index (status, add, commit).
>
> Here are the results on my Linux machine:
>
> Test
> --------------------------------------------------------------
> 2000.2: git status (full-index-v3)            0.37(0.30+0.09)
> 2000.3: git status (full-index-v4)            0.39(0.32+0.10)
> 2000.4: git add -A (full-index-v3)            1.42(1.06+0.20)
> 2000.5: git add -A (full-index-v4)            1.26(0.98+0.16)
> 2000.6: git add . (full-index-v3)             1.40(1.04+0.18)
> 2000.7: git add . (full-index-v4)             1.26(0.98+0.17)
> 2000.8: git commit -a -m A (full-index-v3)    1.42(1.11+0.16)
> 2000.9: git commit -a -m A (full-index-v4)    1.33(1.08+0.16)
>
> It is perhaps noteworthy that there is an improvement when using index
> version 4. This is because the v3 index uses 108 MiB while the v4
> index uses 80 MiB. Since the repeated portions of the directories are
> very short (f3/f1/f2, for example) this ratio is less pronounced than in
> similarly-sized real repositories.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/perf/p2000-sparse-operations.sh | 87 +++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
>  create mode 100755 t/perf/p2000-sparse-operations.sh
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> new file mode 100755
> index 000000000000..52597683376e
> --- /dev/null
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -0,0 +1,87 @@
> +#!/bin/sh
> +
> +test_description="test performance of Git operations using the index"
> +
> +. ./perf-lib.sh
> +
> +test_perf_default_repo
> +
> +SPARSE_CONE=f2/f4/f1
> +
> +test_expect_success 'setup repo and indexes' '
> +	git reset --hard HEAD &&
> +	# Remove submodules from the example repo, because our
> +	# duplication of the entire repo creates an unlikly data shape.
> +	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
> +	rm -f .gitmodules &&
> +	git add .gitmodules &&

Why not `git rm [-f] .gitmodules` instead of these two commands? Is
there something special about .gitmodules that requires this special
handling?

> +	for module in $(awk "{print \$2}" modules)
> +	do
> +		git rm $module || return 1
> +	done &&
> +	git add . &&

What does the `git add .` do? I don't see any changes there weren't
already git-add'ed or git-rm'ed.

> +	git commit -m "remove submodules" &&
> +
> +	echo bogus >a &&
> +	cp a b &&
> +	git add a b &&
> +	git commit -m "level 0" &&
> +	BLOB=$(git rev-parse HEAD:a) &&
> +	OLD_COMMIT=$(git rev-parse HEAD) &&
> +	OLD_TREE=$(git rev-parse HEAD^{tree}) &&
> +
> +	for i in $(test_seq 1 4)
> +	do
> +		cat >in <<-EOF &&
> +			100755 blob $BLOB	a
> +			040000 tree $OLD_TREE	f1
> +			040000 tree $OLD_TREE	f2
> +			040000 tree $OLD_TREE	f3
> +			040000 tree $OLD_TREE	f4
> +		EOF
> +		NEW_TREE=$(git mktree <in) &&
> +		NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") &&
> +		OLD_TREE=$NEW_TREE &&
> +		OLD_COMMIT=$NEW_COMMIT || return 1
> +	done &&
> +
> +	git sparse-checkout init --cone &&
> +	git branch -f wide $OLD_COMMIT &&
> +	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 &&
> +	(
> +		cd full-index-v3 &&
> +		git sparse-checkout init --cone &&
> +		git sparse-checkout set $SPARSE_CONE &&
> +		git config index.version 3 &&
> +		git update-index --index-version=3
> +	) &&
> +	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 &&
> +	(
> +		cd full-index-v4 &&
> +		git sparse-checkout init --cone &&
> +		git sparse-checkout set $SPARSE_CONE &&
> +		git config index.version 4 &&
> +		git update-index --index-version=4
> +	)
> +'
> +
> +test_perf_on_all () {
> +	command="$@"
> +	for repo in full-index-v3 full-index-v4
> +	do
> +		test_perf "$command ($repo)" "
> +			(
> +				cd $repo &&
> +				echo >>$SPARSE_CONE/a &&
> +				$command
> +			)
> +		"
> +	done
> +}
> +
> +test_perf_on_all git status
> +test_perf_on_all git add -A
> +test_perf_on_all git add .
> +test_perf_on_all git commit -a -m A
> +
> +test_done
> --
> gitgitgadget

Other than the two minor questions, the rest looks good to me.
* Re: [PATCH 02/20] t/perf: add performance test for sparse operations
From: Derrick Stolee @ 2021-03-09 20:03 UTC
To: Elijah Newren, Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee

On 2/23/2021 9:30 PM, Elijah Newren wrote:
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> +test_expect_success 'setup repo and indexes' '
> +	git reset --hard HEAD &&
> +	# Remove submodules from the example repo, because our
> +	# duplication of the entire repo creates an unlikly data shape.
> +	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
> +	rm -f .gitmodules &&
> +	git add .gitmodules &&
> Why not `git rm [-f] .gitmodules` instead of these two commands? Is
> there something special about .gitmodules that requires this special
> handling?

No, I'm just being sloppy. Will clean up.

>> +	for module in $(awk "{print \$2}" modules)
>> +	do
>> +		git rm $module || return 1
>> +	done &&
>> +	git add . &&
> What does the `git add .` do? I don't see any changes there weren't
> already git-add'ed or git-rm'ed.

Same here.

Thanks.
-Stolee
* [PATCH 03/20] t1092: clean up script quoting
From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This test was introduced in 19a0acc83e4 (t1092: test interesting sparse-checkout scenarios, 2021-01-23), but these issues with quoting were not noticed until starting this follow-up series. The old mechanism would drop quoting such as in

	test_all_match git commit -m "touch README.md"

The above happened to work because README.md is a file in the repository, so 'git commit -m touch README.md' would succeed by accident. Other cases included quoting for no good reason, so clean that up now.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 8cd3e5a8d227..3725d3997e70 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -96,20 +96,20 @@ init_repos () {
 run_on_sparse () {
 	(
 		cd sparse-checkout &&
-		$* >../sparse-checkout-out 2>../sparse-checkout-err
+		"$@" >../sparse-checkout-out 2>../sparse-checkout-err
 	)
 }
 
 run_on_all () {
 	(
 		cd full-checkout &&
-		$* >../full-checkout-out 2>../full-checkout-err
+		"$@" >../full-checkout-out 2>../full-checkout-err
 	) &&
-	run_on_sparse $*
+	run_on_sparse "$@"
 }
 
 test_all_match () {
-	run_on_all $* &&
+	run_on_all "$@" &&
 	test_cmp full-checkout-out sparse-checkout-out &&
 	test_cmp full-checkout-err sparse-checkout-err
 }
@@ -119,7 +119,7 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 &&
 	test_all_match git status --porcelain=v2 -z -u &&
 	test_all_match git status --porcelain=v2 -uno &&
-	run_on_all "touch README.md" &&
+	run_on_all touch README.md &&
 	test_all_match git status --porcelain=v2 &&
 	test_all_match git status --porcelain=v2 -z -u &&
 	test_all_match git status --porcelain=v2 -uno &&
@@ -135,7 +135,7 @@ test_expect_success 'add, commit, checkout' '
 	write_script edit-contents <<-\EOF &&
 	echo text >>$1
 	EOF
-	run_on_all "../edit-contents README.md" &&
+	run_on_all ../edit-contents README.md &&
 
 	test_all_match git add README.md &&
 	test_all_match git status --porcelain=v2 &&
@@ -144,7 +144,7 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout HEAD~1 &&
 	test_all_match git checkout - &&
 
-	run_on_all "../edit-contents README.md" &&
+	run_on_all ../edit-contents README.md &&
 
 	test_all_match git add -A &&
 	test_all_match git status --porcelain=v2 &&
@@ -153,7 +153,7 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout HEAD~1 &&
 	test_all_match git checkout - &&
 
-	run_on_all "../edit-contents deep/newfile" &&
+	run_on_all ../edit-contents deep/newfile &&
 
 	test_all_match git status --porcelain=v2 -uno &&
 	test_all_match git status --porcelain=v2 &&
@@ -186,7 +186,7 @@ test_expect_success 'diff --staged' '
 	write_script edit-contents <<-\EOF &&
 	echo text >>README.md
 	EOF
-	run_on_all "../edit-contents" &&
+	run_on_all ../edit-contents &&
 
 	test_all_match git diff &&
 	test_all_match git diff --staged &&
@@ -280,7 +280,7 @@ test_expect_success 'clean' '
 	echo bogus >>.gitignore &&
 	run_on_all cp ../.gitignore . &&
 	test_all_match git add .gitignore &&
-	test_all_match git commit -m ignore-bogus-files &&
+	test_all_match git commit -m "ignore bogus files" &&
 
 	run_on_sparse mkdir folder1 &&
 	run_on_all touch folder1/bogus &&
-- 
gitgitgadget
* [PATCH 04/20] sparse-index: add guard to ensure full index 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (2 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-24 2:44 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget ` (17 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw)
To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Upcoming changes will introduce modifications to the index format that allow sparse directories. It will be useful to have a mechanism for converting those sparse index files into full indexes by walking the tree at those sparse directories. Name this method ensure_full_index() as it will guarantee that the index is fully expanded.

This method is not implemented yet, and instead we focus on the scaffolding to declare it and call it at the appropriate time.

Add a 'command_requires_full_index' member to struct repo_settings. This will be an indicator that we need the index in full mode to do certain index operations. This starts as being true for every command, then we will set it to false as some commands integrate with sparse indexes.

If 'command_requires_full_index' is true, then we will immediately expand a sparse index to a full one upon reading from disk. This suffices for now, but we will want to add more callers to ensure_full_index() later.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Makefile | 1 + repo-settings.c | 8 ++++++++ repository.c | 11 ++++++++++- repository.h | 2 ++ sparse-index.c | 8 ++++++++ sparse-index.h | 7 +++++++ 6 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 sparse-index.c create mode 100644 sparse-index.h diff --git a/Makefile b/Makefile index 5a239cac20e3..3bf61699238d 100644 --- a/Makefile +++ b/Makefile @@ -980,6 +980,7 @@ LIB_OBJS += setup.o LIB_OBJS += shallow.o LIB_OBJS += sideband.o LIB_OBJS += sigchain.o +LIB_OBJS += sparse-index.o LIB_OBJS += split-index.o LIB_OBJS += stable-qsort.o LIB_OBJS += strbuf.o diff --git a/repo-settings.c b/repo-settings.c index f7fff0f5ab83..d63569e4041e 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -77,4 +77,12 @@ void prepare_repo_settings(struct repository *r) UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_KEEP); UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_DEFAULT); + + /* + * This setting guards all index reads to require a full index + * over a sparse index. After suitable guards are placed in the + * codebase around uses of the index, this setting will be + * removed. 
+ */ + r->settings.command_requires_full_index = 1; } diff --git a/repository.c b/repository.c index c98298acd017..a8acae002f71 100644 --- a/repository.c +++ b/repository.c @@ -10,6 +10,7 @@ #include "object.h" #include "lockfile.h" #include "submodule-config.h" +#include "sparse-index.h" /* The main repository */ static struct repository the_repo; @@ -261,6 +262,8 @@ void repo_clear(struct repository *repo) int repo_read_index(struct repository *repo) { + int res; + if (!repo->index) repo->index = xcalloc(1, sizeof(*repo->index)); @@ -270,7 +273,13 @@ int repo_read_index(struct repository *repo) else if (repo->index->repo != repo) BUG("repo's index should point back at itself"); - return read_index_from(repo->index, repo->index_file, repo->gitdir); + res = read_index_from(repo->index, repo->index_file, repo->gitdir); + + prepare_repo_settings(repo); + if (repo->settings.command_requires_full_index) + ensure_full_index(repo->index); + + return res; } int repo_hold_locked_index(struct repository *repo, diff --git a/repository.h b/repository.h index b385ca3c94b6..e06a23015697 100644 --- a/repository.h +++ b/repository.h @@ -41,6 +41,8 @@ struct repo_settings { enum fetch_negotiation_setting fetch_negotiation_algorithm; int core_multi_pack_index; + + unsigned command_requires_full_index:1; }; struct repository { diff --git a/sparse-index.c b/sparse-index.c new file mode 100644 index 000000000000..82183ead563b --- /dev/null +++ b/sparse-index.c @@ -0,0 +1,8 @@ +#include "cache.h" +#include "repository.h" +#include "sparse-index.h" + +void ensure_full_index(struct index_state *istate) +{ + /* intentionally left blank */ +} diff --git a/sparse-index.h b/sparse-index.h new file mode 100644 index 000000000000..09a20d036c46 --- /dev/null +++ b/sparse-index.h @@ -0,0 +1,7 @@ +#ifndef SPARSE_INDEX_H__ +#define SPARSE_INDEX_H__ + +struct index_state; +void ensure_full_index(struct index_state *istate); + +#endif -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ 
messages in thread
* Re: [PATCH 04/20] sparse-index: add guard to ensure full index 2021-02-23 20:14 ` [PATCH 04/20] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget @ 2021-02-24 2:44 ` Elijah Newren 0 siblings, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-02-24 2:44 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > Upcoming changes will introduce modifications to the index format that > allow sparse directories. It will be useful to have a mechanism for > converting those sparse index files into full indexes by walking the > tree at those sparse directories. Name this method ensure_full_index() > as it will guarantee that the index is fully expanded. > > This method is not implemented yet, and instead we focus on the > scaffolding to declare it and call it at the appropriate time. > > Add a 'command_requires_full_index' member to struct repo_settings. This > will be an indicator that we need the index in full mode to do certain > index operations. This starts as being true for every command, then we > will set it to false as some commands integrate with sparse indexes. > > If 'command_requires_full_index' is true, then we will immediately > expand a sparse index to a full one upon reading from disk. This > suffices for now, but we will want to add more callers to > ensure_full_index() later. Same as 01/27 of your RFC series; looks good. 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > Makefile | 1 + > repo-settings.c | 8 ++++++++ > repository.c | 11 ++++++++++- > repository.h | 2 ++ > sparse-index.c | 8 ++++++++ > sparse-index.h | 7 +++++++ > 6 files changed, 36 insertions(+), 1 deletion(-) > create mode 100644 sparse-index.c > create mode 100644 sparse-index.h > > diff --git a/Makefile b/Makefile > index 5a239cac20e3..3bf61699238d 100644 > --- a/Makefile > +++ b/Makefile > @@ -980,6 +980,7 @@ LIB_OBJS += setup.o > LIB_OBJS += shallow.o > LIB_OBJS += sideband.o > LIB_OBJS += sigchain.o > +LIB_OBJS += sparse-index.o > LIB_OBJS += split-index.o > LIB_OBJS += stable-qsort.o > LIB_OBJS += strbuf.o > diff --git a/repo-settings.c b/repo-settings.c > index f7fff0f5ab83..d63569e4041e 100644 > --- a/repo-settings.c > +++ b/repo-settings.c > @@ -77,4 +77,12 @@ void prepare_repo_settings(struct repository *r) > UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_KEEP); > > UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_DEFAULT); > + > + /* > + * This setting guards all index reads to require a full index > + * over a sparse index. After suitable guards are placed in the > + * codebase around uses of the index, this setting will be > + * removed. 
> + */ > + r->settings.command_requires_full_index = 1; > } > diff --git a/repository.c b/repository.c > index c98298acd017..a8acae002f71 100644 > --- a/repository.c > +++ b/repository.c > @@ -10,6 +10,7 @@ > #include "object.h" > #include "lockfile.h" > #include "submodule-config.h" > +#include "sparse-index.h" > > /* The main repository */ > static struct repository the_repo; > @@ -261,6 +262,8 @@ void repo_clear(struct repository *repo) > > int repo_read_index(struct repository *repo) > { > + int res; > + > if (!repo->index) > repo->index = xcalloc(1, sizeof(*repo->index)); > > @@ -270,7 +273,13 @@ int repo_read_index(struct repository *repo) > else if (repo->index->repo != repo) > BUG("repo's index should point back at itself"); > > - return read_index_from(repo->index, repo->index_file, repo->gitdir); > + res = read_index_from(repo->index, repo->index_file, repo->gitdir); > + > + prepare_repo_settings(repo); > + if (repo->settings.command_requires_full_index) > + ensure_full_index(repo->index); > + > + return res; > } > > int repo_hold_locked_index(struct repository *repo, > diff --git a/repository.h b/repository.h > index b385ca3c94b6..e06a23015697 100644 > --- a/repository.h > +++ b/repository.h > @@ -41,6 +41,8 @@ struct repo_settings { > enum fetch_negotiation_setting fetch_negotiation_algorithm; > > int core_multi_pack_index; > + > + unsigned command_requires_full_index:1; > }; > > struct repository { > diff --git a/sparse-index.c b/sparse-index.c > new file mode 100644 > index 000000000000..82183ead563b > --- /dev/null > +++ b/sparse-index.c > @@ -0,0 +1,8 @@ > +#include "cache.h" > +#include "repository.h" > +#include "sparse-index.h" > + > +void ensure_full_index(struct index_state *istate) > +{ > + /* intentionally left blank */ > +} > diff --git a/sparse-index.h b/sparse-index.h > new file mode 100644 > index 000000000000..09a20d036c46 > --- /dev/null > +++ b/sparse-index.h > @@ -0,0 +1,7 @@ > +#ifndef SPARSE_INDEX_H__ > +#define SPARSE_INDEX_H__ > + 
> +struct index_state; > +void ensure_full_index(struct index_state *istate); > + > +#endif > -- > gitgitgadget > ^ permalink raw reply [flat|nested] 203+ messages in thread
* [PATCH 05/20] sparse-index: implement ensure_full_index() 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (3 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 04/20] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-24 3:20 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget ` (16 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw)
To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

We will mark an in-memory index_state as having sparse directory entries with the sparse_index bit. These currently cannot exist, but we will add a mechanism for collapsing a full index to a sparse one in a later change. That will happen at write time, so we must first allow parsing the format before writing it.

Commands or methods that require a full index in order to operate can call ensure_full_index() to expand that index in-memory. This requires parsing trees using that index's repository.

Sparse directory entries have a specific 'ce_mode' value. The macro S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. This ce_mode is not possible with the existing index formats, so we don't also verify all properties of a sparse-directory entry, which are:

 1. ce->ce_mode == 0040000
 2. ce->ce_flags & CE_SKIP_WORKTREE is true
 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
 4. ce->oid references a tree object.

These are all semi-enforced in ensure_full_index() to some extent. Any deviation will cause a warning at minimum or a failure in the worst case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache.h | 7 +++- read-cache.c | 9 +++++ sparse-index.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 109 insertions(+), 2 deletions(-) diff --git a/cache.h b/cache.h index d92814961405..1336c8d7435e 100644 --- a/cache.h +++ b/cache.h @@ -204,6 +204,8 @@ struct cache_entry { #error "CE_EXTENDED_FLAGS out of range" #endif +#define S_ISSPARSEDIR(m) ((m) == S_IFDIR) + /* Forward structure decls */ struct pathspec; struct child_process; @@ -319,7 +321,8 @@ struct index_state { drop_cache_tree : 1, updated_workdir : 1, updated_skipworktree : 1, - fsmonitor_has_run_once : 1; + fsmonitor_has_run_once : 1, + sparse_index : 1; struct hashmap name_hash; struct hashmap dir_hash; struct object_id oid; @@ -722,6 +725,8 @@ int read_index_from(struct index_state *, const char *path, const char *gitdir); int is_index_unborn(struct index_state *); +void ensure_full_index(struct index_state *istate); + /* For use with `write_locked_index()`. 
*/ #define COMMIT_LOCK (1 << 0) #define SKIP_IF_UNCHANGED (1 << 1) diff --git a/read-cache.c b/read-cache.c index 29144cf879e7..97dbf2434f30 100644 --- a/read-cache.c +++ b/read-cache.c @@ -101,6 +101,9 @@ static const char *alternate_index_output; static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) { + if (S_ISSPARSEDIR(ce->ce_mode)) + istate->sparse_index = 1; + istate->cache[nr] = ce; add_name_hash(istate, ce); } @@ -2255,6 +2258,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) trace2_data_intmax("index", the_repository, "read/cache_nr", istate->cache_nr); + if (!istate->repo) + istate->repo = the_repository; + prepare_repo_settings(istate->repo); + if (istate->repo->settings.command_requires_full_index) + ensure_full_index(istate); + return istate->cache_nr; unmap: diff --git a/sparse-index.c b/sparse-index.c index 82183ead563b..316cb949b74b 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -1,8 +1,101 @@ #include "cache.h" #include "repository.h" #include "sparse-index.h" +#include "tree.h" +#include "pathspec.h" +#include "trace2.h" + +static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) +{ + ALLOC_GROW(istate->cache, nr + 1, istate->cache_alloc); + + istate->cache[nr] = ce; + add_name_hash(istate, ce); +} + +static int add_path_to_index(const struct object_id *oid, + struct strbuf *base, const char *path, + unsigned int mode, int stage, void *context) +{ + struct index_state *istate = (struct index_state *)context; + struct cache_entry *ce; + size_t len = base->len; + + if (S_ISDIR(mode)) + return READ_TREE_RECURSIVE; + + strbuf_addstr(base, path); + + ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0); + ce->ce_flags |= CE_SKIP_WORKTREE; + set_index_entry(istate, istate->cache_nr++, ce); + + strbuf_setlen(base, len); + return 0; +} void ensure_full_index(struct index_state *istate) { - /* intentionally left blank */ + int i; + struct 
index_state *full; + + if (!istate || !istate->sparse_index) + return; + + if (!istate->repo) + istate->repo = the_repository; + + trace2_region_enter("index", "ensure_full_index", istate->repo); + + /* initialize basics of new index */ + full = xcalloc(1, sizeof(struct index_state)); + memcpy(full, istate, sizeof(struct index_state)); + + /* then change the necessary things */ + full->sparse_index = 0; + full->cache_alloc = (3 * istate->cache_alloc) / 2; + full->cache_nr = 0; + ALLOC_ARRAY(full->cache, full->cache_alloc); + + for (i = 0; i < istate->cache_nr; i++) { + struct cache_entry *ce = istate->cache[i]; + struct tree *tree; + struct pathspec ps; + + if (!S_ISSPARSEDIR(ce->ce_mode)) { + set_index_entry(full, full->cache_nr++, ce); + continue; + } + if (!(ce->ce_flags & CE_SKIP_WORKTREE)) + warning(_("index entry is a directory, but not sparse (%08x)"), + ce->ce_flags); + + /* recursively walk into cd->name */ + tree = lookup_tree(istate->repo, &ce->oid); + + memset(&ps, 0, sizeof(ps)); + ps.recursive = 1; + ps.has_wildcard = 1; + ps.max_depth = -1; + + read_tree_recursive(istate->repo, tree, + ce->name, strlen(ce->name), + 0, &ps, + add_path_to_index, full); + + /* free directory entries. full entries are re-used */ + discard_cache_entry(ce); + } + + /* Copy back into original index. */ + memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); + istate->sparse_index = 0; + free(istate->cache); + istate->cache = full->cache; + istate->cache_nr = full->cache_nr; + istate->cache_alloc = full->cache_alloc; + + free(full); + + trace2_region_leave("index", "ensure_full_index", istate->repo); } -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* Re: [PATCH 05/20] sparse-index: implement ensure_full_index() 2021-02-23 20:14 ` [PATCH 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget @ 2021-02-24 3:20 ` Elijah Newren 0 siblings, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-02-24 3:20 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > We will mark an in-memory index_state as having sparse directory entries > with the sparse_index bit. These currently cannot exist, but we will add > a mechanism for collapsing a full index to a sparse one in a later > change. That will happen at write time, so we must first allow parsing > the format before writing it. > > Commands or methods that require a full index in order to operate can > call ensure_full_index() to expand that index in-memory. This requires > parsing trees using that index's repository. > > Sparse directory entries have a specific 'ce_mode' value. The macro > S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. > This ce_mode is not possible with the existing index formats, so we don't > also verify all properties of a sparse-directory entry, which are: > > 1. ce->ce_mode == 0040000 > 2. ce->flags & CE_SKIP_WORKTREE is true > 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator) > 4. ce->oid references a tree object. > > These are all semi-enforced in ensure_full_index() to some extent. Any > deviation will cause a warning at minimum or a failure in the worst > case. Thanks for spelling these all out; looks good. 
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > cache.h | 7 +++- > read-cache.c | 9 +++++ > sparse-index.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++- > 3 files changed, 109 insertions(+), 2 deletions(-) > > diff --git a/cache.h b/cache.h > index d92814961405..1336c8d7435e 100644 > --- a/cache.h > +++ b/cache.h > @@ -204,6 +204,8 @@ struct cache_entry { > #error "CE_EXTENDED_FLAGS out of range" > #endif > > +#define S_ISSPARSEDIR(m) ((m) == S_IFDIR) Much nicer, thanks. :-) > + > /* Forward structure decls */ > struct pathspec; > struct child_process; > @@ -319,7 +321,8 @@ struct index_state { > drop_cache_tree : 1, > updated_workdir : 1, > updated_skipworktree : 1, > - fsmonitor_has_run_once : 1; > + fsmonitor_has_run_once : 1, > + sparse_index : 1; > struct hashmap name_hash; > struct hashmap dir_hash; > struct object_id oid; > @@ -722,6 +725,8 @@ int read_index_from(struct index_state *, const char *path, > const char *gitdir); > int is_index_unborn(struct index_state *); > > +void ensure_full_index(struct index_state *istate); > + > /* For use with `write_locked_index()`. */ > #define COMMIT_LOCK (1 << 0) > #define SKIP_IF_UNCHANGED (1 << 1) > diff --git a/read-cache.c b/read-cache.c > index 29144cf879e7..97dbf2434f30 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -101,6 +101,9 @@ static const char *alternate_index_output; > > static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) > { > + if (S_ISSPARSEDIR(ce->ce_mode)) > + istate->sparse_index = 1; A very minor question -- someone who sees "sparse_index" could probably easily think either "index with missing entries, due to having a SKIP_WORKTREE directory instead" or perhaps "index when using the sparse-checkout feature, i.e. it has some SKIP_WORKTREE entries in it". From the code here, clearly the former is your intent. I wonder if it'd help to have a small comment near the declaration of sparse_index to mention its intent. 
> + > istate->cache[nr] = ce; > add_name_hash(istate, ce); > } > @@ -2255,6 +2258,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) > trace2_data_intmax("index", the_repository, "read/cache_nr", > istate->cache_nr); > > + if (!istate->repo) > + istate->repo = the_repository; > + prepare_repo_settings(istate->repo); > + if (istate->repo->settings.command_requires_full_index) > + ensure_full_index(istate); > + > return istate->cache_nr; > > unmap: > diff --git a/sparse-index.c b/sparse-index.c > index 82183ead563b..316cb949b74b 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -1,8 +1,101 @@ > #include "cache.h" > #include "repository.h" > #include "sparse-index.h" > +#include "tree.h" > +#include "pathspec.h" > +#include "trace2.h" > + > +static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) > +{ > + ALLOC_GROW(istate->cache, nr + 1, istate->cache_alloc); > + > + istate->cache[nr] = ce; > + add_name_hash(istate, ce); > +} > + > +static int add_path_to_index(const struct object_id *oid, > + struct strbuf *base, const char *path, > + unsigned int mode, int stage, void *context) > +{ > + struct index_state *istate = (struct index_state *)context; > + struct cache_entry *ce; > + size_t len = base->len; > + > + if (S_ISDIR(mode)) > + return READ_TREE_RECURSIVE; > + > + strbuf_addstr(base, path); > + > + ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0); > + ce->ce_flags |= CE_SKIP_WORKTREE; > + set_index_entry(istate, istate->cache_nr++, ce); > + > + strbuf_setlen(base, len); > + return 0; > +} > > void ensure_full_index(struct index_state *istate) > { > - /* intentionally left blank */ > + int i; > + struct index_state *full; > + > + if (!istate || !istate->sparse_index) > + return; > + > + if (!istate->repo) > + istate->repo = the_repository; > + > + trace2_region_enter("index", "ensure_full_index", istate->repo); > + > + /* initialize basics of new index */ > + full = xcalloc(1, 
sizeof(struct index_state)); > + memcpy(full, istate, sizeof(struct index_state)); > + > + /* then change the necessary things */ > + full->sparse_index = 0; > + full->cache_alloc = (3 * istate->cache_alloc) / 2; > + full->cache_nr = 0; > + ALLOC_ARRAY(full->cache, full->cache_alloc); > + > + for (i = 0; i < istate->cache_nr; i++) { > + struct cache_entry *ce = istate->cache[i]; > + struct tree *tree; > + struct pathspec ps; > + > + if (!S_ISSPARSEDIR(ce->ce_mode)) { > + set_index_entry(full, full->cache_nr++, ce); > + continue; > + } > + if (!(ce->ce_flags & CE_SKIP_WORKTREE)) > + warning(_("index entry is a directory, but not sparse (%08x)"), > + ce->ce_flags); > + > + /* recursively walk into cd->name */ > + tree = lookup_tree(istate->repo, &ce->oid); > + > + memset(&ps, 0, sizeof(ps)); > + ps.recursive = 1; > + ps.has_wildcard = 1; > + ps.max_depth = -1; > + > + read_tree_recursive(istate->repo, tree, > + ce->name, strlen(ce->name), > + 0, &ps, > + add_path_to_index, full); > + > + /* free directory entries. full entries are re-used */ > + discard_cache_entry(ce); > + } > + > + /* Copy back into original index. */ > + memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); > + istate->sparse_index = 0; > + free(istate->cache); Thanks for fixing that leak from the RFC series. > + istate->cache = full->cache; > + istate->cache_nr = full->cache_nr; > + istate->cache_alloc = full->cache_alloc; > + > + free(full); > + > + trace2_region_leave("index", "ensure_full_index", istate->repo); > } > -- > gitgitgadget Looks good to me. ^ permalink raw reply [flat|nested] 203+ messages in thread
* [PATCH 06/20] t1092: compare sparse-checkout to sparse-index 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (4 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-25 6:37 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget ` (15 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Add a new 'sparse-index' repo alongside the 'full-checkout' and 'sparse-checkout' repos in t1092-sparse-checkout-compatibility.sh. Also add run_on_sparse and test_sparse_match helpers. These helpers will be used when the sparse index is implemented. Add GIT_TEST_SPARSE_INDEX environment variable to enable the sparse-index by default. This will be intended to use across the entire test suite, except that it will only affect cases where the sparse-checkout feature is enabled. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/README | 3 +++ t/t1092-sparse-checkout-compatibility.sh | 24 ++++++++++++++++++++---- 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/t/README b/t/README index 593d4a4e270c..b98bc563aab5 100644 --- a/t/README +++ b/t/README @@ -439,6 +439,9 @@ and "sha256". GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the 'pack.writeReverseIndex' setting. +GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the +sparse-index format by default. 
+ Naming Tests ------------ diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 3725d3997e70..71d6f9e4c014 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -7,6 +7,7 @@ test_description='compare full workdir to sparse workdir' test_expect_success 'setup' ' git init initial-repo && ( + GIT_TEST_SPARSE_INDEX=0 && cd initial-repo && echo a >a && echo "after deep" >e && @@ -87,23 +88,32 @@ init_repos () { cp -r initial-repo sparse-checkout && git -C sparse-checkout reset --hard && - git -C sparse-checkout sparse-checkout init --cone && + + cp -r initial-repo sparse-index && + git -C sparse-index reset --hard && # initialize sparse-checkout definitions - git -C sparse-checkout sparse-checkout set deep + git -C sparse-checkout sparse-checkout init --cone && + git -C sparse-checkout sparse-checkout set deep && + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep } run_on_sparse () { ( cd sparse-checkout && - "$@" >../sparse-checkout-out 2>../sparse-checkout-err + GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err + ) && + ( + cd sparse-index && + GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err ) } run_on_all () { ( cd full-checkout && - "$@" >../full-checkout-out 2>../full-checkout-err + GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err ) && run_on_sparse "$@" } @@ -114,6 +124,12 @@ test_all_match () { test_cmp full-checkout-err sparse-checkout-err } +test_sparse_match () { + run_on_sparse $* && + test_cmp sparse-checkout-out sparse-index-out && + test_cmp sparse-checkout-err sparse-index-err +} + test_expect_success 'status with options' ' init_repos && test_all_match git status --porcelain=v2 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* Re: [PATCH 06/20] t1092: compare sparse-checkout to sparse-index 2021-02-23 20:14 ` [PATCH 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget @ 2021-02-25 6:37 ` Elijah Newren 0 siblings, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-02-25 6:37 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > Add a new 'sparse-index' repo alongside the 'full-checkout' and > 'sparse-checkout' repos in t1092-sparse-checkout-compatibility.sh. Also > add run_on_sparse and test_sparse_match helpers. These helpers will be > used when the sparse index is implemented. > > Add GIT_TEST_SPARSE_INDEX environment variable to enable the > sparse-index by default. This will be intended to use across the entire > test suite, except that it will only affect cases where the > sparse-checkout feature is enabled. This last sentence was a bit awkward to read. "will be intended to use" -> "is intended to be used"? > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > t/README | 3 +++ > t/t1092-sparse-checkout-compatibility.sh | 24 ++++++++++++++++++++---- > 2 files changed, 23 insertions(+), 4 deletions(-) > > diff --git a/t/README b/t/README > index 593d4a4e270c..b98bc563aab5 100644 > --- a/t/README > +++ b/t/README > @@ -439,6 +439,9 @@ and "sha256". > GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the > 'pack.writeReverseIndex' setting. > > +GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the > +sparse-index format by default. 
> + > Naming Tests > ------------ > > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 3725d3997e70..71d6f9e4c014 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -7,6 +7,7 @@ test_description='compare full workdir to sparse workdir' > test_expect_success 'setup' ' > git init initial-repo && > ( > + GIT_TEST_SPARSE_INDEX=0 && > cd initial-repo && > echo a >a && > echo "after deep" >e && > @@ -87,23 +88,32 @@ init_repos () { > > cp -r initial-repo sparse-checkout && > git -C sparse-checkout reset --hard && > - git -C sparse-checkout sparse-checkout init --cone && > + > + cp -r initial-repo sparse-index && > + git -C sparse-index reset --hard && > > # initialize sparse-checkout definitions > - git -C sparse-checkout sparse-checkout set deep > + git -C sparse-checkout sparse-checkout init --cone && > + git -C sparse-checkout sparse-checkout set deep && > + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && > + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep > } > > run_on_sparse () { > ( > cd sparse-checkout && > - "$@" >../sparse-checkout-out 2>../sparse-checkout-err > + GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err > + ) && > + ( > + cd sparse-index && > + GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err > ) > } > > run_on_all () { > ( > cd full-checkout && > - "$@" >../full-checkout-out 2>../full-checkout-err > + GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err > ) && > run_on_sparse "$@" > } > @@ -114,6 +124,12 @@ test_all_match () { > test_cmp full-checkout-err sparse-checkout-err > } > > +test_sparse_match () { > + run_on_sparse $* && Should this be run_on_sparse "$@" in order to allow arguments with spaces? 
> + test_cmp sparse-checkout-out sparse-index-out && > + test_cmp sparse-checkout-err sparse-index-err > +} > + > test_expect_success 'status with options' ' > init_repos && > test_all_match git status --porcelain=v2 && > -- > gitgitgadget Other than those minor comments, looks good to me.
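GIT_TEST_SPARSE_INDEX is documented in t/README as a <boolean> environment variable, and a later patch in this series reads it with git_env_bool("GIT_TEST_SPARSE_INDEX", 0). As a rough illustration of how such a toggle can be parsed, here is a minimal sketch loosely modeled on that helper. The function names and the exact set of accepted spellings are assumptions; git's own parser differs in details (for example, it also accepts other integers, and it rejects invalid values instead of returning -1).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Case-insensitive string equality using only standard C. */
static int eq_nocase(const char *a, const char *b)
{
	while (*a && *b &&
	       tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
		a++;
		b++;
	}
	return *a == *b;
}

/*
 * Hypothetical helper: 1 for true, 0 for false, def when the value is
 * unset (NULL), and -1 for a spelling this sketch does not recognize.
 */
static int parse_bool_sketch(const char *value, int def)
{
	static const char *const truthy[] = { "1", "true", "yes", "on" };
	static const char *const falsy[] = { "", "0", "false", "no", "off" };
	size_t i;

	if (!value)
		return def;
	for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++)
		if (eq_nocase(value, truthy[i]))
			return 1;
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++)
		if (eq_nocase(value, falsy[i]))
			return 0;
	return -1;
}

/* Hypothetical stand-in for git_env_bool(name, def). */
static int env_bool_sketch(const char *name, int def)
{
	assert(name != NULL);
	return parse_bool_sketch(getenv(name), def);
}
```

In the test script this corresponds to prefixing a command with GIT_TEST_SPARSE_INDEX=1 or GIT_TEST_SPARSE_INDEX=0, as the run_on_sparse helper does for its two repos.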
* [PATCH 07/20] test-read-cache: print cache entries with --table 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (5 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-25 7:02 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget ` (14 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> This table is helpful for discovering data in the index to ensure it is being written correctly, especially as we build and test the sparse-index. This table includes an output format similar to 'git ls-tree', but should not be compared to that directly. The biggest reasons are that 'git ls-tree' includes a tree entry for every subdirectory, even those that would not appear as a sparse directory in a sparse-index. Further, 'git ls-tree' does not use a trailing directory separator for its tree rows. This does not print the stat() information for the blobs. That could be added in a future change with another option. The tests that are added in the next few changes care only about the object types and IDs. To make the option parsing slightly more robust, wrap the string comparisons in a loop adapted from test-dir-iterator.c. Care must be taken with the final check for the 'cnt' variable. We continue the expectation that the numerical value is the final argument. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/helper/test-read-cache.c | 50 ++++++++++++++++++++++++++++++-------- 1 file changed, 40 insertions(+), 10 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 244977a29bdf..e4c3492f7d3e 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -2,35 +2,65 @@ #include "cache.h" #include "config.h" +static void print_cache_entry(struct cache_entry *ce) +{ + printf("%06o ", ce->ce_mode & 0777777); + + if (S_ISSPARSEDIR(ce->ce_mode)) + printf("tree "); + else if (S_ISGITLINK(ce->ce_mode)) + printf("commit "); + else + printf("blob "); + + printf("%s\t%s\n", + oid_to_hex(&ce->oid), + ce->name); +} + +static void print_cache(struct index_state *cache) +{ + int i; + for (i = 0; i < the_index.cache_nr; i++) + print_cache_entry(the_index.cache[i]); +} + int cmd__read_cache(int argc, const char **argv) { + struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; + int table = 0; - if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) { - argc--; - argv++; + for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { + if (skip_prefix(*argv, "--print-and-refresh=", &name)) + continue; + if (!strcmp(*argv, "--table")) + table = 1; } - if (argc == 2) - cnt = strtol(argv[1], NULL, 0); + if (argc == 1) + cnt = strtol(argv[0], NULL, 0); setup_git_directory(); git_config(git_default_config, NULL); + for (i = 0; i < cnt; i++) { - read_cache(); + repo_read_index(r); if (name) { int pos; - refresh_index(&the_index, REFRESH_QUIET, + refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL); - pos = index_name_pos(&the_index, name, strlen(name)); + pos = index_name_pos(r->index, name, strlen(name)); if (pos < 0) die("%s not in index", name); printf("%s is%s up to date\n", name, - ce_uptodate(the_index.cache[pos]) ? "" : " not"); + ce_uptodate(r->index->cache[pos]) ? 
"" : " not"); write_file(name, "%d\n", i); } - discard_cache(); + if (table) + print_cache(r->index); + discard_index(r->index); } return 0; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* Re: [PATCH 07/20] test-read-cache: print cache entries with --table 2021-02-23 20:14 ` [PATCH 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget @ 2021-02-25 7:02 ` Elijah Newren 2021-03-09 21:00 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Elijah Newren @ 2021-02-25 7:02 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > This table is helpful for discovering data in the index to ensure it is > being written correctly, especially as we build and test the > sparse-index. This table includes an output format similar to 'git > ls-tree', but should not be compared to that directly. The biggest > reasons are that 'git ls-tree' includes a tree entry for every > subdirectory, even those that would not appear as a sparse directory in > a sparse-index. Further, 'git ls-tree' does not use a trailing directory > separator for its tree rows. > > This does not print the stat() information for the blobs. That could be > added in a future change with another option. The tests that are added > in the next few changes care only about the object types and IDs. > > To make the option parsing slightly more robust, wrap the string > comparisons in a loop adapted from test-dir-iterator.c. > > Care must be taken with the final check for the 'cnt' variable. We > continue the expectation that the numerical value is the final argument. 
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > t/helper/test-read-cache.c | 50 ++++++++++++++++++++++++++++++-------- > 1 file changed, 40 insertions(+), 10 deletions(-) > > diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c > index 244977a29bdf..e4c3492f7d3e 100644 > --- a/t/helper/test-read-cache.c > +++ b/t/helper/test-read-cache.c > @@ -2,35 +2,65 @@ > #include "cache.h" > #include "config.h" > > +static void print_cache_entry(struct cache_entry *ce) > +{ > + printf("%06o ", ce->ce_mode & 0777777); This constant is curious. I think it's because you want to strip off the special in-memory bits of the ce_mode where git stores extra data, which would be everything beyond the first 16 bits (as noted in a comment near the beginning of cache.h). But here you keep the first 18 bits. Is CE_UPDATE and CE_REMOVE just 0 in the cases you've viewed so this works (but you really should use 0177777 or 0xFFFF), or am I off in my guess of what you're trying to do and you do want to see those two flags? It also seems surprising to me that this constant isn't already defined somewhere in cache.h or as some variant of S_IFMT, though I'm not finding it. > + > + if (S_ISSPARSEDIR(ce->ce_mode)) > + printf("tree "); > + else if (S_ISGITLINK(ce->ce_mode)) > + printf("commit "); > + else > + printf("blob "); Perhaps make use of the tree_type, commit_type, and blob_type global constants? > + > + printf("%s\t%s\n", > + oid_to_hex(&ce->oid), > + ce->name); > +} > + > +static void print_cache(struct index_state *cache) > +{ > + int i; > + for (i = 0; i < the_index.cache_nr; i++) > + print_cache_entry(the_index.cache[i]); Why are you passing cache as a parameter, then ignoring it and using the_index? 
> +} > + > int cmd__read_cache(int argc, const char **argv) > { > + struct repository *r = the_repository; > int i, cnt = 1; > const char *name = NULL; > + int table = 0; > > - if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) { > - argc--; > - argv++; > + for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { > + if (skip_prefix(*argv, "--print-and-refresh=", &name)) > + continue; > + if (!strcmp(*argv, "--table")) > + table = 1; > } > > - if (argc == 2) > - cnt = strtol(argv[1], NULL, 0); > + if (argc == 1) > + cnt = strtol(argv[0], NULL, 0); > setup_git_directory(); > git_config(git_default_config, NULL); > + > for (i = 0; i < cnt; i++) { > - read_cache(); > + repo_read_index(r); > if (name) { > int pos; > > - refresh_index(&the_index, REFRESH_QUIET, > + refresh_index(r->index, REFRESH_QUIET, > NULL, NULL, NULL); > - pos = index_name_pos(&the_index, name, strlen(name)); > + pos = index_name_pos(r->index, name, strlen(name)); > if (pos < 0) > die("%s not in index", name); > printf("%s is%s up to date\n", name, > - ce_uptodate(the_index.cache[pos]) ? "" : " not"); > + ce_uptodate(r->index->cache[pos]) ? "" : " not"); > write_file(name, "%d\n", i); > } > - discard_cache(); > + if (table) > + print_cache(r->index); > + discard_index(r->index); > } > return 0; > } > -- > gitgitgadget ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [PATCH 07/20] test-read-cache: print cache entries with --table 2021-02-25 7:02 ` Elijah Newren @ 2021-03-09 21:00 ` Derrick Stolee 0 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-09 21:00 UTC (permalink / raw) To: Elijah Newren, Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On 2/25/2021 2:02 AM, Elijah Newren wrote: > On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <dstolee@microsoft.com> >> >> This table is helpful for discovering data in the index to ensure it is >> being written correctly, especially as we build and test the >> sparse-index. This table includes an output format similar to 'git >> ls-tree', but should not be compared to that directly. The biggest >> reasons are that 'git ls-tree' includes a tree entry for every >> subdirectory, even those that would not appear as a sparse directory in >> a sparse-index. Further, 'git ls-tree' does not use a trailing directory >> separator for its tree rows. >> >> This does not print the stat() information for the blobs. That could be >> added in a future change with another option. The tests that are added >> in the next few changes care only about the object types and IDs. >> >> To make the option parsing slightly more robust, wrap the string >> comparisons in a loop adapted from test-dir-iterator.c. >> >> Care must be taken with the final check for the 'cnt' variable. We >> continue the expectation that the numerical value is the final argument. 
>> >> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> >> --- >> t/helper/test-read-cache.c | 50 ++++++++++++++++++++++++++++++-------- >> 1 file changed, 40 insertions(+), 10 deletions(-) >> >> diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c >> index 244977a29bdf..e4c3492f7d3e 100644 >> --- a/t/helper/test-read-cache.c >> +++ b/t/helper/test-read-cache.c >> @@ -2,35 +2,65 @@ >> #include "cache.h" >> #include "config.h" >> >> +static void print_cache_entry(struct cache_entry *ce) >> +{ >> + printf("%06o ", ce->ce_mode & 0777777); > > This constant is curious. I think it's because you want to strip off > the special in-memory bits of the ce_mode where git stores extra data, > which would be everything beyond the first 16 bits (as noted in a > comment near the beginning of cache.h). But here you keep the first > 18 bits. Is CE_UPDATE and CE_REMOVE just 0 in the cases you've viewed > so this works (but you really should use 0177777 or 0xFFFF), or am I > off in my guess of what you're trying to do and you do want to see > those two flags? You're right, 0177777 is what I want. I'm focusing only on the file permissions bits that are reported by ls-tree. > It also seems surprising to me that this constant isn't already > defined somewhere in cache.h or as some variant of S_IFMT, though I'm > not finding it. I'm not, either. >> + >> + if (S_ISSPARSEDIR(ce->ce_mode)) >> + printf("tree "); >> + else if (S_ISGITLINK(ce->ce_mode)) >> + printf("commit "); >> + else >> + printf("blob "); > > Perhaps make use of the tree_type, commit_type, and blob_type global constants? Today I Learned... >> + >> + printf("%s\t%s\n", >> + oid_to_hex(&ce->oid), >> + ce->name); >> +} >> + >> +static void print_cache(struct index_state *cache) >> +{ >> + int i; >> + for (i = 0; i < the_index.cache_nr; i++) >> + print_cache_entry(the_index.cache[i]); > > Why are you passing cache as a parameter, then ignoring it and using the_index? Good catch! 
Thanks, -Stolee
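The review above proposes three fixes to print_cache_entry() and print_cache(): mask ce_mode with 0177777 instead of 0777777, reuse the existing tree/commit/blob type names, and iterate over the index that was passed in rather than the_index. A minimal standalone sketch with those changes applied follows. The structs, mode constants and function names are hypothetical stand-ins, not git's real cache_entry, S_ISSPARSEDIR() or tree_type/commit_type/blob_type, and the sketch formats into a buffer instead of printing so its output is easy to check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for git's mode bits (octal, as in ls-tree). */
#define SK_IFMT      0170000
#define SK_IFDIR     0040000
#define SK_IFGITLNK  0160000
#define SK_MODE_MASK 0177777 /* keep only the low 16 bits of the mode */

static const char sk_tree_type[] = "tree";
static const char sk_commit_type[] = "commit";
static const char sk_blob_type[] = "blob";

struct sk_cache_entry {
	unsigned int ce_mode;
	const char *hex;  /* object ID, already in hex */
	const char *name;
};

struct sk_index {
	struct sk_cache_entry **cache;
	unsigned int cache_nr;
};

static const char *sk_type_name(unsigned int mode)
{
	if ((mode & SK_IFMT) == SK_IFDIR)
		return sk_tree_type;   /* sparse directory entry */
	if ((mode & SK_IFMT) == SK_IFGITLNK)
		return sk_commit_type; /* submodule */
	return sk_blob_type;
}

/* Write one "<mode> <type> <oid>\t<name>\n" row; returns snprintf()'s result. */
static int sk_format_entry(char *buf, size_t len,
			   const struct sk_cache_entry *ce)
{
	return snprintf(buf, len, "%06o %s %s\t%s\n",
			ce->ce_mode & SK_MODE_MASK,
			sk_type_name(ce->ce_mode), ce->hex, ce->name);
}

/* Format every entry of the index that was passed in (not a global). */
static size_t sk_format_index(char *buf, size_t len,
			      const struct sk_index *istate)
{
	size_t used = 0;
	unsigned int i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < istate->cache_nr; i++) {
		int n = sk_format_entry(buf + used, len - used,
					istate->cache[i]);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0'; /* drop a partially written row */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

With the 0777777 mask, a mode value with bit 16 (0200000) set would print as 300644; masking with 0177777 prints 100644, which is what git ls-tree shows for a regular file.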
* [PATCH 08/20] test-tool: don't force full index 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (6 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget ` (13 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We will use 'test-tool read-cache --table' to check that a sparse index is written as part of init_repos. Since we will no longer always expand a sparse index into a full index, add an '--expand' parameter that adds a call to ensure_full_index() so we can compare a sparse index directly against a full index, or at least what the in-memory index looks like when expanded in this way. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/helper/test-read-cache.c | 13 ++++++++++++- t/t1092-sparse-checkout-compatibility.sh | 5 +++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index e4c3492f7d3e..4780429dca6b 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -1,6 +1,7 @@ #include "test-tool.h" #include "cache.h" #include "config.h" +#include "sparse-index.h" static void print_cache_entry(struct cache_entry *ce) { @@ -30,13 +31,19 @@ int cmd__read_cache(int argc, const char **argv) struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; - int table = 0; + int table = 0, expand = 0; + + initialize_the_repository(); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) continue; if (!strcmp(*argv, "--table")) table = 1; + else if (!strcmp(*argv, "--expand")) + expand = 1; } if (argc == 1) @@ -46,6 +53,10 @@ int cmd__read_cache(int argc, const char **argv) for (i = 0; i < cnt; i++) { repo_read_index(r); + + if (expand) + ensure_full_index(r->index); + if (name) { int pos; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 71d6f9e4c014..4d789fe86b9d 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -130,6 +130,11 @@ test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'expanded in-memory index matches full index' ' + init_repos && + test_sparse_match test-tool read-cache --expand --table +' + test_expect_success 'status with options' ' init_repos && test_all_match git status --porcelain=v2 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
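The option loop that [PATCH 07/20] added to cmd__read_cache(), and that this patch extends with --expand, consumes leading "--" arguments and then treats a single leftover argument as the iteration count. A minimal standalone sketch of that flow follows. The struct and function names are hypothetical, and plain strncmp()/strcmp() stand in for git's starts_with() and skip_prefix(). Like the patch, it relies on argv being NULL-terminated and silently ignores unrecognized options.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the helper's options. */
struct read_cache_opts {
	const char *name; /* --print-and-refresh=<name> */
	int table;        /* --table */
	int expand;       /* --expand */
	int cnt;          /* trailing iteration count, default 1 */
};

/* argv must be NULL-terminated, as main()'s argv is. */
static void parse_read_cache_args(int argc, const char **argv,
				  struct read_cache_opts *opts)
{
	static const char refresh[] = "--print-and-refresh=";
	const size_t rlen = sizeof(refresh) - 1;

	assert(argc >= 1 && argv[0] != NULL);
	opts->name = NULL;
	opts->table = 0;
	opts->expand = 0;
	opts->cnt = 1;

	/* Skip argv[0], then consume every leading "--" argument. */
	for (++argv, --argc; *argv && !strncmp(*argv, "--", 2); ++argv, --argc) {
		if (!strncmp(*argv, refresh, rlen))
			opts->name = *argv + rlen;
		else if (!strcmp(*argv, "--table"))
			opts->table = 1;
		else if (!strcmp(*argv, "--expand"))
			opts->expand = 1;
		/* Anything else is silently ignored, as in the patch. */
	}

	/* Exactly one leftover argument is taken as the count. */
	if (argc == 1)
		opts->cnt = (int)strtol(argv[0], NULL, 0);
}
```

With this ordering, "read-cache --expand --table 3" sets the count to 3, while "read-cache 3 --table" stops at "3", leaves two arguments, and keeps the default count of 1 without enabling the table. That is the "numerical value is the final argument" expectation described in [PATCH 07/20].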
* [PATCH 09/20] unpack-trees: ensure full index 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (7 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget ` (12 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The next change will translate full indexes into sparse indexes at write time. The existing logic provides a way for every sparse index to be expanded to a full index at read time. However, there are cases where an index is written and then continues to be used in-memory to perform further updates. unpack_trees() is frequently called after such a write. In particular, commands like 'git reset' do this double-update of the index. Ensure that we have a full index when entering unpack_trees(), but only when command_requires_full_index is true. This is always true at the moment, but we will later relax that after unpack_trees() is updated to handle sparse directory entries. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/unpack-trees.c b/unpack-trees.c index f5f668f532d8..4dd99219073a 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1567,6 +1567,7 @@ static int verify_absent(const struct cache_entry *, */ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o) { + struct repository *repo = the_repository; int i, ret; static struct cache_entry *dfc; struct pattern_list pl; @@ -1578,6 +1579,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options trace_performance_enter(); trace2_region_enter("unpack_trees", "unpack_trees", the_repository); + prepare_repo_settings(repo); + if (repo->settings.command_requires_full_index) { + ensure_full_index(o->src_index); + ensure_full_index(o->dst_index); + } + if (!core_apply_sparse_checkout || !o->update) o->skip_sparse_checkout = 1; if (!o->skip_sparse_checkout && !o->pl) { -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* [PATCH 10/20] sparse-checkout: hold pattern list in index 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (8 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-25 7:14 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 11/20] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget ` (11 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> As we modify the sparse-checkout definition, we perform index operations on a pattern_list that only exists in-memory. This allows easy backing out in case the index update fails. However, if the index write itself cares about the sparse-checkout pattern set, we need access to that in-memory copy. Place a pointer to a 'struct pattern_list' in the index so we can access this on-demand. This will be used in the next change which uses the sparse-checkout definition to filter out directories that are outsie the sparse cone. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/sparse-checkout.c | 17 ++++++++++------- cache.h | 2 ++ 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index 2306a9ad98e0..e00b82af727b 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl) if (is_index_unborn(r->index)) return UPDATE_SPARSITY_SUCCESS; + r->index->sparse_checkout_patterns = pl; + memset(&o, 0, sizeof(o)); o.verbose_update = isatty(2); o.update = 1; @@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl) else rollback_lock_file(&lock_file); + r->index->sparse_checkout_patterns = NULL; return result; } @@ -517,19 +520,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) { int result; int changed_config = 0; - struct pattern_list pl; - memset(&pl, 0, sizeof(pl)); + struct pattern_list *pl = xcalloc(1, sizeof(*pl)); switch (m) { case ADD: if (core_sparse_checkout_cone) - add_patterns_cone_mode(argc, argv, &pl); + add_patterns_cone_mode(argc, argv, pl); else - add_patterns_literal(argc, argv, &pl); + add_patterns_literal(argc, argv, pl); break; case REPLACE: - add_patterns_from_input(&pl, argc, argv); + add_patterns_from_input(pl, argc, argv); break; } @@ -539,12 +541,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) changed_config = 1; } - result = write_patterns_and_update(&pl); + result = write_patterns_and_update(pl); if (result && changed_config) set_config(MODE_NO_PATTERNS); - clear_pattern_list(&pl); + clear_pattern_list(pl); + free(pl); return result; } diff --git a/cache.h b/cache.h index 1336c8d7435e..d75b352f38d3 100644 --- a/cache.h +++ b/cache.h @@ -307,6 +307,7 @@ static inline unsigned int canon_mode(unsigned int mode) struct split_index; struct untracked_cache; struct progress; +struct pattern_list; struct 
index_state { struct cache_entry **cache; @@ -332,6 +333,7 @@ struct index_state { struct mem_pool *ce_mem_pool; struct progress *progress; struct repository *repo; + struct pattern_list *sparse_checkout_patterns; }; /* Name hashing */ -- gitgitgadget
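This patch lends the in-memory pattern list to the index only for the duration of update_working_directory(), clearing the pointer before returning so that the index never keeps a reference to the list that modify_pattern_list() later frees. A minimal sketch of that borrow-then-clear pattern follows; the structs and functions are hypothetical stand-ins, not git's real pattern_list, index_state or index-writing code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not git's struct pattern_list/index_state. */
struct sk_pattern_list {
	int use_cone_patterns;
};

struct sk_index_state {
	struct sk_pattern_list *sparse_checkout_patterns;
};

/* Stand-in for an index write that consults the lent patterns. */
static int sk_write_index(const struct sk_index_state *istate)
{
	const struct sk_pattern_list *pl = istate->sparse_checkout_patterns;

	/* Returns 1 when a cone-mode list is available to filter with. */
	return pl && pl->use_cone_patterns;
}

/* Lend pl to the index only while the update runs. */
static int sk_update_working_directory(struct sk_index_state *istate,
				       struct sk_pattern_list *pl)
{
	int result;

	assert(pl != NULL);
	istate->sparse_checkout_patterns = pl;
	result = sk_write_index(istate);
	/* Clear the borrowed pointer before the caller frees pl. */
	istate->sparse_checkout_patterns = NULL;
	return result;
}

/* Mirrors modify_pattern_list(): heap-allocate, update, then free. */
static int sk_modify_pattern_list(struct sk_index_state *istate, int cone)
{
	int result;
	struct sk_pattern_list *pl = calloc(1, sizeof(*pl));

	if (!pl)
		return -1;
	pl->use_cone_patterns = cone;
	result = sk_update_working_directory(istate, pl);
	free(pl); /* safe: the index no longer points at pl */
	return result;
}
```

Resetting the pointer inside the callee keeps ownership with the caller, so freeing the list afterwards cannot leave a dangling reference in the index.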
* Re: [PATCH 10/20] sparse-checkout: hold pattern list in index 2021-02-23 20:14 ` [PATCH 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget @ 2021-02-25 7:14 ` Elijah Newren 0 siblings, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-02-25 7:14 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > As we modify the sparse-checkout definition, we perform index operations > on a pattern_list that only exists in-memory. This allows easy backing > out in case the index update fails. > > However, if the index write itself cares about the sparse-checkout > pattern set, we need access to that in-memory copy. Place a pointer to > a 'struct pattern_list' in the index so we can access this on-demand. > This will be used in the next change which uses the sparse-checkout > definition to filter out directories that are outsie the sparse cone. Looks like you still have the "outsie" typo. 
;-) > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > builtin/sparse-checkout.c | 17 ++++++++++------- > cache.h | 2 ++ > 2 files changed, 12 insertions(+), 7 deletions(-) > > diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c > index 2306a9ad98e0..e00b82af727b 100644 > --- a/builtin/sparse-checkout.c > +++ b/builtin/sparse-checkout.c > @@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl) > if (is_index_unborn(r->index)) > return UPDATE_SPARSITY_SUCCESS; > > + r->index->sparse_checkout_patterns = pl; > + > memset(&o, 0, sizeof(o)); > o.verbose_update = isatty(2); > o.update = 1; > @@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl) > else > rollback_lock_file(&lock_file); > > + r->index->sparse_checkout_patterns = NULL; > return result; > } > > @@ -517,19 +520,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) > { > int result; > int changed_config = 0; > - struct pattern_list pl; > - memset(&pl, 0, sizeof(pl)); > + struct pattern_list *pl = xcalloc(1, sizeof(*pl)); > > switch (m) { > case ADD: > if (core_sparse_checkout_cone) > - add_patterns_cone_mode(argc, argv, &pl); > + add_patterns_cone_mode(argc, argv, pl); > else > - add_patterns_literal(argc, argv, &pl); > + add_patterns_literal(argc, argv, pl); > break; > > case REPLACE: > - add_patterns_from_input(&pl, argc, argv); > + add_patterns_from_input(pl, argc, argv); > break; > } > > @@ -539,12 +541,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) > changed_config = 1; > } > > - result = write_patterns_and_update(&pl); > + result = write_patterns_and_update(pl); > > if (result && changed_config) > set_config(MODE_NO_PATTERNS); > > - clear_pattern_list(&pl); > + clear_pattern_list(pl); > + free(pl); > return result; > } > > diff --git a/cache.h b/cache.h > index 1336c8d7435e..d75b352f38d3 100644 > --- a/cache.h > +++ b/cache.h > @@ -307,6 +307,7 @@ static 
inline unsigned int canon_mode(unsigned int mode) > struct split_index; > struct untracked_cache; > struct progress; > +struct pattern_list; > > struct index_state { > struct cache_entry **cache; > @@ -332,6 +333,7 @@ struct index_state { > struct mem_pool *ce_mem_pool; > struct progress *progress; > struct repository *repo; > + struct pattern_list *sparse_checkout_patterns; > }; > > /* Name hashing */ > -- > gitgitgadget
* [PATCH 11/20] sparse-index: convert from full to sparse 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (9 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-25 7:33 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 12/20] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget ` (10 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. 
We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compatibility.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index.
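The conversion described above can be modeled in a few dozen lines. The sketch below is a simplified illustration, not the patch's convert_to_sparse_rec(): it finds each directory's span by scanning the sorted paths instead of using cache-tree entry counts, writes into a separate output array instead of compacting istate->cache in place, and replaces path_matches_pattern_list() with a hypothetical cone containing only "deep/" (plus its parents and everything below it). All type and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified index entry; not git's struct cache_entry. */
struct sk_entry {
	const char *name;  /* full path; entries are sorted by name */
	int skip_worktree; /* stands in for CE_SKIP_WORKTREE */
	int stage;         /* nonzero means unmerged */
};

/* One row of the result; a collapsed directory ends in '/'. */
struct sk_out {
	const char *name;
	size_t len;
};

/*
 * Hypothetical cone: "deep/", its parent directories, and everything
 * below it. The real code asks path_matches_pattern_list() instead.
 */
static int sk_dir_in_cone(const char *dir, size_t len)
{
	static const char cone[] = "deep/";
	size_t clen = sizeof(cone) - 1;

	if (len <= clen)
		return !strncmp(cone, dir, len); /* "deep/" or a parent */
	return !strncmp(dir, cone, clen);        /* below "deep/" */
}

/*
 * Convert in[start, end), whose names all begin with dir[0, dlen),
 * appending rows to out. Returns the number of rows written.
 */
static size_t sk_convert_rec(const struct sk_entry *in, size_t start,
			     size_t end, const char *dir, size_t dlen,
			     struct sk_out *out)
{
	size_t i, n = 0;
	/* The root (dlen == 0) is never collapsed. */
	int can_convert = dlen && !sk_dir_in_cone(dir, dlen);

	assert(start <= end);

	/* Everything below dir must be skip-worktree and merged. */
	for (i = start; can_convert && i < end; i++)
		if (in[i].stage || !in[i].skip_worktree)
			can_convert = 0;

	if (can_convert) {
		out[0].name = dir;
		out[0].len = dlen;
		return 1;
	}

	for (i = start; i < end; ) {
		const char *slash = strchr(in[i].name + dlen, '/');
		size_t plen, span;

		if (!slash) {
			/* A file directly inside dir: keep it unchanged. */
			out[n].name = in[i].name;
			out[n].len = strlen(in[i].name);
			n++;
			i++;
			continue;
		}

		/* Sorted order keeps each subdirectory's entries contiguous. */
		plen = (size_t)(slash - in[i].name) + 1;
		span = i + 1;
		while (span < end && !strncmp(in[span].name, in[i].name, plen))
			span++;

		/* Try to collapse the subdirectory; it recurses if that fails. */
		n += sk_convert_rec(in, i, span, in[i].name, plen, out + n);
		i = span;
	}
	return n;
}
```

For example, given the sorted paths a, deep/a, folder1/a and folder1/b/c, where only the two folder1 entries are skip-worktree, the result is a, deep/a, folder1/. If folder1/a were unmerged or not skip-worktree, folder1/ would stay expanded and folder1/b/ alone could still collapse, matching the "keep looking deeper" rule above.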
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 3 + cache.h | 2 + read-cache.c | 26 ++++- sparse-index.c | 139 +++++++++++++++++++++++ sparse-index.h | 1 + t/t1092-sparse-checkout-compatibility.sh | 61 +++++++++- 6 files changed, 227 insertions(+), 5 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 2fb483d3c083..5f07a39e501e 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -6,6 +6,7 @@ #include "object-store.h" #include "replace-object.h" #include "promisor-remote.h" +#include "sparse-index.h" #ifndef DEBUG_CACHE_TREE #define DEBUG_CACHE_TREE 0 @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags) if (i) return i; + ensure_full_index(istate); + if (!istate->cache_tree) istate->cache_tree = cache_tree(); diff --git a/cache.h b/cache.h index d75b352f38d3..e8b7d3b4fb33 100644 --- a/cache.h +++ b/cache.h @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) { if (S_ISLNK(mode)) return S_IFLNK; + if (mode == S_IFDIR) + return S_IFDIR; if (S_ISDIR(mode) || S_ISGITLINK(mode)) return S_IFGITLINK; return S_IFREG | ce_permissions(mode); diff --git a/read-cache.c b/read-cache.c index 97dbf2434f30..67acbf202f4e 100644 --- a/read-cache.c +++ b/read-cache.c @@ -25,6 +25,7 @@ #include "fsmonitor.h" #include "thread-utils.h" #include "progress.h" +#include "sparse-index.h" /* Mask for the name length in ce_flags in the on-disk index */ @@ -1002,8 +1003,14 @@ int verify_path(const char *path, unsigned mode) c = *path++; if ((c == '.' && !verify_dotfile(path, mode)) || - is_dir_sep(c) || c == '\0') + is_dir_sep(c)) return 0; + /* + * allow terminating directory separators for + * sparse directory entries.
+ */ + if (c == '\0') + return S_ISDIR(mode); } else if (c == '\\' && protect_ntfs) { if (is_ntfs_dotgit(path)) return 0; @@ -3061,6 +3068,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l unsigned flags) { int ret; + int was_full = !istate->sparse_index; + + ret = convert_to_sparse(istate); + + if (ret) { + warning(_("failed to convert to a sparse-index")); + return ret; + } /* * TODO trace2: replace "the_repository" with the actual repo instance @@ -3072,6 +3087,9 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l trace2_region_leave_printf("index", "do_write_index", the_repository, "%s", get_lock_file_path(lock)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; if (flags & COMMIT_LOCK) @@ -3162,9 +3180,10 @@ static int write_shared_index(struct index_state *istate, struct tempfile **temp) { struct split_index *si = istate->split_index; - int ret; + int ret, was_full = !istate->sparse_index; move_cache_to_base_index(istate); + convert_to_sparse(istate); trace2_region_enter_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); @@ -3172,6 +3191,9 @@ static int write_shared_index(struct index_state *istate, trace2_region_leave_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; ret = adjust_shared_perm(get_tempfile_path(*temp)); diff --git a/sparse-index.c b/sparse-index.c index 316cb949b74b..cb1f85635fbc 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -4,6 +4,145 @@ #include "tree.h" #include "pathspec.h" #include "trace2.h" +#include "cache-tree.h" +#include "config.h" +#include "dir.h" +#include "fsmonitor.h" + +static struct cache_entry *construct_sparse_dir_entry( + struct index_state *istate, + const char *sparse_dir, + struct cache_tree *tree) +{ + struct cache_entry *de; + + de = make_cache_entry(istate, S_IFDIR, &tree->oid, 
sparse_dir, 0, 0); + + de->ce_flags |= CE_SKIP_WORKTREE; + return de; +} + +/* + * Returns the number of entries "inserted" into the index. + */ +static int convert_to_sparse_rec(struct index_state *istate, + int num_converted, + int start, int end, + const char *ct_path, size_t ct_pathlen, + struct cache_tree *ct) +{ + int i, can_convert = 1; + int start_converted = num_converted; + enum pattern_match_result match; + int dtype; + struct strbuf child_path = STRBUF_INIT; + struct pattern_list *pl = istate->sparse_checkout_patterns; + + /* + * Is the current path outside of the sparse cone? + * Then check if the region can be replaced by a sparse + * directory entry (everything is sparse and merged). + */ + match = path_matches_pattern_list(ct_path, ct_pathlen, + NULL, &dtype, pl, istate); + if (match != NOT_MATCHED) + can_convert = 0; + + for (i = start; can_convert && i < end; i++) { + struct cache_entry *ce = istate->cache[i]; + + if (ce_stage(ce) || + !(ce->ce_flags & CE_SKIP_WORKTREE)) + can_convert = 0; + } + + if (can_convert) { + struct cache_entry *se; + se = construct_sparse_dir_entry(istate, ct_path, ct); + + istate->cache[num_converted++] = se; + return 1; + } + + for (i = start; i < end; ) { + int count, span, pos = -1; + const char *base, *slash; + struct cache_entry *ce = istate->cache[i]; + + /* + * Detect if this is a normal entry oustide of any subtree + * entry. 
+ */ + base = ce->name + ct_pathlen; + slash = strchr(base, '/'); + + if (slash) + pos = cache_tree_subtree_pos(ct, base, slash - base); + + if (pos < 0) { + istate->cache[num_converted++] = ce; + i++; + continue; + } + + strbuf_setlen(&child_path, 0); + strbuf_add(&child_path, ce->name, slash - ce->name + 1); + + span = ct->down[pos]->cache_tree->entry_count; + count = convert_to_sparse_rec(istate, + num_converted, i, i + span, + child_path.buf, child_path.len, + ct->down[pos]->cache_tree); + num_converted += count; + i += span; + } + + strbuf_release(&child_path); + return num_converted - start_converted; +} + +int convert_to_sparse(struct index_state *istate) +{ + if (istate->split_index || istate->sparse_index || + !core_apply_sparse_checkout || !core_sparse_checkout_cone) + return 0; + + /* + * For now, only create a sparse index with the + * GIT_TEST_SPARSE_INDEX environment variable. We will relax + * this once we have a proper way to opt-in (and later still, + * opt-out). + */ + if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + return 0; + + if (!istate->sparse_checkout_patterns) { + istate->sparse_checkout_patterns = xcalloc(1, sizeof(struct pattern_list)); + if (get_sparse_checkout_patterns(istate->sparse_checkout_patterns) < 0) + return 0; + } + + if (!istate->sparse_checkout_patterns->use_cone_patterns) { + warning(_("attempting to use sparse-index without cone mode")); + return -1; + } + + if (cache_tree_update(istate, 0)) { + warning(_("unable to update cache-tree, staying full")); + return -1; + } + + remove_fsmonitor(istate); + + trace2_region_enter("index", "convert_to_sparse", istate->repo); + istate->cache_nr = convert_to_sparse_rec(istate, + 0, 0, istate->cache_nr, + "", 0, istate->cache_tree); + istate->drop_cache_tree = 1; + istate->sparse_index = 1; + trace2_region_leave("index", "convert_to_sparse", istate->repo); + return 0; +} static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) { diff --git 
a/sparse-index.h b/sparse-index.h index 09a20d036c46..64380e121d80 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -3,5 +3,6 @@ struct index_state; void ensure_full_index(struct index_state *istate); +int convert_to_sparse(struct index_state *istate); #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 4d789fe86b9d..ca87033d30b0 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -2,6 +2,9 @@ test_description='compare full workdir to sparse workdir' +GIT_TEST_CHECK_CACHE_TREE=0 +GIT_TEST_SPLIT_INDEX=0 + . ./test-lib.sh test_expect_success 'setup' ' @@ -121,15 +124,49 @@ run_on_all () { test_all_match () { run_on_all "$@" && test_cmp full-checkout-out sparse-checkout-out && - test_cmp full-checkout-err sparse-checkout-err + test_cmp full-checkout-out sparse-index-out && + test_cmp full-checkout-err sparse-checkout-err && + test_cmp full-checkout-err sparse-index-err } test_sparse_match () { - run_on_sparse $* && + run_on_sparse "$@" && test_cmp sparse-checkout-out sparse-index-out && test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'sparse-index contents' ' + init_repos && + + test-tool -C sparse-index read-cache --table >cache && + for dir in folder1 folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep/deeper2 folder1 folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE 
$dir/" cache \ + || return 1 + done +' + test_expect_success 'expanded in-memory index matches full index' ' init_repos && test_sparse_match test-tool read-cache --expand --table @@ -137,6 +174,7 @@ test_expect_success 'expanded in-memory index matches full index' ' test_expect_success 'status with options' ' init_repos && + test_sparse_match ls && test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && @@ -273,6 +311,17 @@ test_expect_failure 'checkout and reset (mixed)' ' test_all_match git reset update-folder2 ' +# Ensure that sparse-index behaves identically to +# sparse-checkout with a full index. +test_expect_success 'checkout and reset (mixed) [sparse]' ' + init_repos && + + test_sparse_match git checkout -b reset-test update-deep && + test_sparse_match git reset deepest && + test_sparse_match git reset update-folder1 && + test_sparse_match git reset update-folder2 +' + test_expect_success 'merge' ' init_repos && @@ -309,14 +358,20 @@ test_expect_success 'clean' ' test_all_match git status --porcelain=v2 && test_all_match git clean -f && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xdf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && - test_path_is_dir sparse-checkout/folder1 + test_sparse_match test_path_is_dir folder1 ' test_done -- gitgitgadget
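The core of convert_to_sparse_rec() is a recursive walk. A directory outside the cone collapses to a single entry only when every entry beneath it is at stage 0 and marked skip-worktree. Otherwise, the walk keeps the directory's own files and recurses into its subdirectories. The sketch below models that walk over a plain sorted array of paths. It is an illustration under stated assumptions, not Git's code: `struct entry`, `dir_in_cone()` (which supports only a single recursive cone directory), and `collapse()` are hypothetical. Also, where the patch uses the cache-tree to find each subtree's span and tree OID, the sketch scans path prefixes instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a cache entry (not Git's struct). */
struct entry {
	const char *path;  /* full path, e.g. "folder1/a"; array is sorted */
	int skip_worktree; /* stand-in for CE_SKIP_WORKTREE */
	int stage;         /* nonzero for an unmerged entry */
};

/* One entry of the result; a sparse directory keeps its trailing '/'. */
struct out_entry {
	const char *name;
	size_t len;
};

/*
 * Hypothetical cone check for one recursive cone directory such as
 * "folder1/": "dir" (length "len", ending in '/', or empty for the root)
 * is in the cone if it is a parent of the cone directory or lies inside
 * it. Such directories are never collapsed.
 */
static int dir_in_cone(const char *dir, size_t len, const char *cone)
{
	size_t clen = strlen(cone);

	return !strncmp(dir, cone, len < clen ? len : clen);
}

/*
 * Collapse entries [start, end), which all share the directory prefix of
 * length "plen", appending the result to "out". Returns the new count.
 */
static size_t collapse(const struct entry *e, size_t start, size_t end,
		       size_t plen, const char *cone,
		       struct out_entry *out, size_t nout)
{
	size_t i;
	int can_convert;

	assert(start < end);
	can_convert = !dir_in_cone(e[start].path, plen, cone);

	/* Any conflict or non-skip-worktree entry blocks the collapse. */
	for (i = start; can_convert && i < end; i++)
		if (e[i].stage || !e[i].skip_worktree)
			can_convert = 0;

	if (can_convert) {
		out[nout].name = e[start].path;
		out[nout].len = plen;
		return nout + 1;
	}

	for (i = start; i < end; ) {
		const char *slash = strchr(e[i].path + plen, '/');
		size_t sublen, j;

		if (!slash) {
			/* A file directly in this directory: keep it as-is. */
			out[nout].name = e[i].path;
			out[nout].len = strlen(e[i].path);
			nout++;
			i++;
			continue;
		}

		/* Sorted order keeps each subdirectory's entries contiguous. */
		sublen = (size_t)(slash - e[i].path) + 1;
		for (j = i + 1; j < end && !strncmp(e[j].path, e[i].path, sublen); j++)
			;
		nout = collapse(e, i, j, sublen, cone, out, nout);
		i = j;
	}
	return nout;
}
```

For example, assume the cone is "folder1/" and the index contains a, deep/a, deep/deeper1/a, folder1/a, folder1/b, folder2/a, and an unmerged x/a. The sketch yields a, deep/, folder1/a, folder1/b, folder2/, x/a. The x/ directory stays expanded because of the conflict, which mirrors the ce_stage() check in the patch.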
* Re: [PATCH 11/20] sparse-index: convert from full to sparse From: Elijah Newren @ 2021-02-25 7:33 UTC To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > If we have a full index, then we can convert it to a sparse index by > replacing directories outside of the sparse cone with sparse directory > entries. The convert_to_sparse() method does this, when the situation is > appropriate. > > For now, we avoid converting the index to a sparse index if: > > 1. the index is split. > 2. the index is already sparse. > 3. sparse-checkout is disabled. > 4. sparse-checkout does not use cone mode. > > Finally, we currently limit the conversion to when the > GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git > config will be added in a later change. > > The trickiest thing about this conversion is that we might not be able > to mark a directory as a sparse directory just because it is outside the > sparse cone. There might be unmerged files within that directory, so we > need to look for those. Also, if there is some strange reason why a file > is not marked with CE_SKIP_WORKTREE, then we should give up on > converting that directory. There is still hope that some of its > subdirectories might be able to convert to sparse, so we keep looking > deeper. > > The conversion process is assisted by the cache-tree extension. This is > calculated from the full index if it does not already exist. We then > abandon the cache-tree as it no longer applies to the newly-sparse > index.
Thus, this cache-tree will be recalculated in every > sparse-full-sparse round-trip until we integrate the cache-tree > extension with the sparse index. > > Some Git commands use the index after writing it. For example, 'git add' > will update the index, then write it to disk, then read its entries to > report information. To keep the in-memory index in a full state after > writing, we re-expand it to a full one after the write. This is wasteful > for commands that only write the index and do not read from it again, > but that is only the case until we make those commands "sparse aware." > > We can compare the behavior of the sparse-index in > t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 > when operating on the 'sparse-index' repo. We can also compare the two > sparse repos directly, such as comparing their indexes (when expanded to > full in the case of the 'sparse-index' repo). We also verify that the > index is actually populated with sparse directory entries. > > The 'checkout and reset (mixed)' test is marked for failure when > comparing a sparse repo to a full repo, but we can compare the two > sparse-checkout cases directly to ensure that we are not changing the > behavior when using a sparse index. 
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > cache-tree.c | 3 + > cache.h | 2 + > read-cache.c | 26 ++++- > sparse-index.c | 139 +++++++++++++++++++++++ > sparse-index.h | 1 + > t/t1092-sparse-checkout-compatibility.sh | 61 +++++++++- > 6 files changed, 227 insertions(+), 5 deletions(-) > > diff --git a/cache-tree.c b/cache-tree.c > index 2fb483d3c083..5f07a39e501e 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -6,6 +6,7 @@ > #include "object-store.h" > #include "replace-object.h" > #include "promisor-remote.h" > +#include "sparse-index.h" > > #ifndef DEBUG_CACHE_TREE > #define DEBUG_CACHE_TREE 0 > @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags) > if (i) > return i; > > + ensure_full_index(istate); > + > if (!istate->cache_tree) > istate->cache_tree = cache_tree(); > > diff --git a/cache.h b/cache.h > index d75b352f38d3..e8b7d3b4fb33 100644 > --- a/cache.h > +++ b/cache.h > @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) > { > if (S_ISLNK(mode)) > return S_IFLNK; > + if (mode == S_IFDIR) > + return S_IFDIR; > if (S_ISDIR(mode) || S_ISGITLINK(mode)) > return S_IFGITLINK; > return S_IFREG | ce_permissions(mode); > diff --git a/read-cache.c b/read-cache.c > index 97dbf2434f30..67acbf202f4e 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -25,6 +25,7 @@ > #include "fsmonitor.h" > #include "thread-utils.h" > #include "progress.h" > +#include "sparse-index.h" > > /* Mask for the name length in ce_flags in the on-disk index */ > > @@ -1002,8 +1003,14 @@ int verify_path(const char *path, unsigned mode) > > c = *path++; > if ((c == '.' && !verify_dotfile(path, mode)) || > - is_dir_sep(c) || c == '\0') > + is_dir_sep(c)) > return 0; > + /* > + * allow terminating directory separators for > + * sparse directory enries. enries -> entries > + */ > + if (c == '\0') > + return S_ISDIR(mode); Yaay, much simpler (than the RFC version). 
> } else if (c == '\\' && protect_ntfs) { > if (is_ntfs_dotgit(path)) > return 0; > @@ -3061,6 +3068,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l > unsigned flags) > { > int ret; > + int was_full = !istate->sparse_index; > + > + ret = convert_to_sparse(istate); > + > + if (ret) { > + warning(_("failed to convert to a sparse-index")); > + return ret; > + } > > /* > * TODO trace2: replace "the_repository" with the actual repo instance > @@ -3072,6 +3087,9 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l > trace2_region_leave_printf("index", "do_write_index", the_repository, > "%s", get_lock_file_path(lock)); > > + if (was_full) > + ensure_full_index(istate); > + > if (ret) > return ret; > if (flags & COMMIT_LOCK) > @@ -3162,9 +3180,10 @@ static int write_shared_index(struct index_state *istate, > struct tempfile **temp) > { > struct split_index *si = istate->split_index; > - int ret; > + int ret, was_full = !istate->sparse_index; > > move_cache_to_base_index(istate); > + convert_to_sparse(istate); > > trace2_region_enter_printf("index", "shared/do_write_index", > the_repository, "%s", get_tempfile_path(*temp)); > @@ -3172,6 +3191,9 @@ static int write_shared_index(struct index_state *istate, > trace2_region_leave_printf("index", "shared/do_write_index", > the_repository, "%s", get_tempfile_path(*temp)); > > + if (was_full) > + ensure_full_index(istate); > + > if (ret) > return ret; > ret = adjust_shared_perm(get_tempfile_path(*temp)); > diff --git a/sparse-index.c b/sparse-index.c > index 316cb949b74b..cb1f85635fbc 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -4,6 +4,145 @@ > #include "tree.h" > #include "pathspec.h" > #include "trace2.h" > +#include "cache-tree.h" > +#include "config.h" > +#include "dir.h" > +#include "fsmonitor.h" > + > +static struct cache_entry *construct_sparse_dir_entry( > + struct index_state *istate, > + const char *sparse_dir, > + struct cache_tree 
*tree) > +{ > + struct cache_entry *de; > + > + de = make_cache_entry(istate, S_IFDIR, &tree->oid, sparse_dir, 0, 0); > + > + de->ce_flags |= CE_SKIP_WORKTREE; > + return de; > +} > + > +/* > + * Returns the number of entries "inserted" into the index. > + */ > +static int convert_to_sparse_rec(struct index_state *istate, > + int num_converted, > + int start, int end, > + const char *ct_path, size_t ct_pathlen, > + struct cache_tree *ct) > +{ > + int i, can_convert = 1; > + int start_converted = num_converted; > + enum pattern_match_result match; > + int dtype; > + struct strbuf child_path = STRBUF_INIT; > + struct pattern_list *pl = istate->sparse_checkout_patterns; > + > + /* > + * Is the current path outside of the sparse cone? > + * Then check if the region can be replaced by a sparse > + * directory entry (everything is sparse and merged). > + */ > + match = path_matches_pattern_list(ct_path, ct_pathlen, > + NULL, &dtype, pl, istate); > + if (match != NOT_MATCHED) > + can_convert = 0; Not sure if you saw my comments on the flow control at https://lore.kernel.org/git/CABPp-BE9wPwmC0=pA4p1_QSRDHrO8RzqfJQdE2NxYZsYL_Rcig@mail.gmail.com/ (the typos elsewhere seem to still be present). If you saw it and decided against it, that's fine, just wanted the idea to at least be floated. > + > + for (i = start; can_convert && i < end; i++) { > + struct cache_entry *ce = istate->cache[i]; > + > + if (ce_stage(ce) || > + !(ce->ce_flags & CE_SKIP_WORKTREE)) > + can_convert = 0; > + } > + > + if (can_convert) { > + struct cache_entry *se; > + se = construct_sparse_dir_entry(istate, ct_path, ct); > + > + istate->cache[num_converted++] = se; > + return 1; > + } > + > + for (i = start; i < end; ) { > + int count, span, pos = -1; > + const char *base, *slash; > + struct cache_entry *ce = istate->cache[i]; > + > + /* > + * Detect if this is a normal entry oustide of any subtree s/oustide/outside/ > + * entry. 
> + */ > + base = ce->name + ct_pathlen; > + slash = strchr(base, '/'); > + > + if (slash) > + pos = cache_tree_subtree_pos(ct, base, slash - base); > + > + if (pos < 0) { > + istate->cache[num_converted++] = ce; > + i++; > + continue; > + } > + > + strbuf_setlen(&child_path, 0); > + strbuf_add(&child_path, ce->name, slash - ce->name + 1); > + > + span = ct->down[pos]->cache_tree->entry_count; > + count = convert_to_sparse_rec(istate, > + num_converted, i, i + span, > + child_path.buf, child_path.len, > + ct->down[pos]->cache_tree); > + num_converted += count; > + i += span; > + } > + > + strbuf_release(&child_path); > + return num_converted - start_converted; > +} > + > +int convert_to_sparse(struct index_state *istate) > +{ > + if (istate->split_index || istate->sparse_index || > + !core_apply_sparse_checkout || !core_sparse_checkout_cone) > + return 0; > + > + /* > + * For now, only create a sparse index with the > + * GIT_TEST_SPARSE_INDEX environment variable. We will relax > + * this once we have a proper way to opt-in (and later still, > + * opt-out). 
> + */ > + if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) > + return 0; > + > + if (!istate->sparse_checkout_patterns) { > + istate->sparse_checkout_patterns = xcalloc(1, sizeof(struct pattern_list)); > + if (get_sparse_checkout_patterns(istate->sparse_checkout_patterns) < 0) > + return 0; > + } > + > + if (!istate->sparse_checkout_patterns->use_cone_patterns) { > + warning(_("attempting to use sparse-index without cone mode")); > + return -1; > + } > + > + if (cache_tree_update(istate, 0)) { > + warning(_("unable to update cache-tree, staying full")); > + return -1; > + } > + > + remove_fsmonitor(istate); > + > + trace2_region_enter("index", "convert_to_sparse", istate->repo); > + istate->cache_nr = convert_to_sparse_rec(istate, > + 0, 0, istate->cache_nr, > + "", 0, istate->cache_tree); > + istate->drop_cache_tree = 1; > + istate->sparse_index = 1; > + trace2_region_leave("index", "convert_to_sparse", istate->repo); > + return 0; > +} > > static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) > { > diff --git a/sparse-index.h b/sparse-index.h > index 09a20d036c46..64380e121d80 100644 > --- a/sparse-index.h > +++ b/sparse-index.h > @@ -3,5 +3,6 @@ > > struct index_state; > void ensure_full_index(struct index_state *istate); > +int convert_to_sparse(struct index_state *istate); > > #endif > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 4d789fe86b9d..ca87033d30b0 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -2,6 +2,9 @@ > > test_description='compare full workdir to sparse workdir' > > +GIT_TEST_CHECK_CACHE_TREE=0 Same question as I posted for the RFC series: Why do you need to set this? I vaguely remember needing to mess with this when working with sparse checkouts because it did weird stuff but I don't remember details. But since your patch touches cache_trees, it seems weird to show up without explanation. 
> +GIT_TEST_SPLIT_INDEX=0 > + > . ./test-lib.sh > > test_expect_success 'setup' ' > @@ -121,15 +124,49 @@ run_on_all () { > test_all_match () { > run_on_all "$@" && > test_cmp full-checkout-out sparse-checkout-out && > - test_cmp full-checkout-err sparse-checkout-err > + test_cmp full-checkout-out sparse-index-out && > + test_cmp full-checkout-err sparse-checkout-err && > + test_cmp full-checkout-err sparse-index-err > } > > test_sparse_match () { > - run_on_sparse $* && > + run_on_sparse "$@" && > test_cmp sparse-checkout-out sparse-index-out && > test_cmp sparse-checkout-err sparse-index-err > } > > +test_expect_success 'sparse-index contents' ' > + init_repos && > + > + test-tool -C sparse-index read-cache --table >cache && > + for dir in folder1 folder2 x > + do > + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && > + grep "040000 tree $TREE $dir/" cache \ > + || return 1 > + done && Thanks for making the output look more like ls-tree output; it's easier to parse that way, at least for me. 
> + > + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && > + > + test-tool -C sparse-index read-cache --table >cache && > + for dir in deep folder2 x > + do > + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && > + grep "040000 tree $TREE $dir/" cache \ > + || return 1 > + done && > + > + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && > + > + test-tool -C sparse-index read-cache --table >cache && > + for dir in deep/deeper2 folder1 folder2 x > + do > + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && > + grep "040000 tree $TREE $dir/" cache \ > + || return 1 > + done > +' > + > test_expect_success 'expanded in-memory index matches full index' ' > init_repos && > test_sparse_match test-tool read-cache --expand --table > @@ -137,6 +174,7 @@ test_expect_success 'expanded in-memory index matches full index' ' > > test_expect_success 'status with options' ' > init_repos && > + test_sparse_match ls && > test_all_match git status --porcelain=v2 && > test_all_match git status --porcelain=v2 -z -u && > test_all_match git status --porcelain=v2 -uno && > @@ -273,6 +311,17 @@ test_expect_failure 'checkout and reset (mixed)' ' > test_all_match git reset update-folder2 > ' > > +# Ensure that sparse-index behaves identically to > +# sparse-checkout with a full index. 
> +test_expect_success 'checkout and reset (mixed) [sparse]' ' > + init_repos && > + > + test_sparse_match git checkout -b reset-test update-deep && > + test_sparse_match git reset deepest && > + test_sparse_match git reset update-folder1 && > + test_sparse_match git reset update-folder2 > +' > + > test_expect_success 'merge' ' > init_repos && > > @@ -309,14 +358,20 @@ test_expect_success 'clean' ' > test_all_match git status --porcelain=v2 && > test_all_match git clean -f && > test_all_match git status --porcelain=v2 && > + test_sparse_match ls && > + test_sparse_match ls folder1 && > > test_all_match git clean -xf && > test_all_match git status --porcelain=v2 && > + test_sparse_match ls && > + test_sparse_match ls folder1 && > > test_all_match git clean -xdf && > test_all_match git status --porcelain=v2 && > + test_sparse_match ls && > + test_sparse_match ls folder1 && > > - test_path_is_dir sparse-checkout/folder1 > + test_sparse_match test_path_is_dir folder1 > ' > > test_done > -- > gitgitgadget I mostly read over the range-diff since it was much shorter. You've addressed a number of questions/comments I had on the RFC version, but there's still some I didn't see a response to so I reposted them here.
* Re: [PATCH 11/20] sparse-index: convert from full to sparse From: Derrick Stolee @ 2021-03-09 21:13 UTC To: Elijah Newren, Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On 2/25/2021 2:33 AM, Elijah Newren wrote: > On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> + /* >> + * allow terminating directory separators for >> + * sparse directory enries. > > enries -> entries Thanks. >> + */ >> + if (c == '\0') >> + return S_ISDIR(mode); > > Yaay, much simpler (than the RFC version). >> + /* >> + * Is the current path outside of the sparse cone? >> + * Then check if the region can be replaced by a sparse >> + * directory entry (everything is sparse and merged). >> + */ >> + match = path_matches_pattern_list(ct_path, ct_pathlen, >> + NULL, &dtype, pl, istate); >> + if (match != NOT_MATCHED) >> + can_convert = 0; > > Not sure if you saw my comments on the flow control at > https://lore.kernel.org/git/CABPp-BE9wPwmC0=pA4p1_QSRDHrO8RzqfJQdE2NxYZsYL_Rcig@mail.gmail.com/ > (the typos elsewhere seem to still be present). If you saw it and > decided against it, that's fine, just wanted the idea to at least be > floated. Sorry for dropping this one. I _did_ decide against it, and primarily because the "if (can_convert)" condition contains a return statement. I like to use 'gotos' for blocks that will eventually be entered by all paths through the code, such as "goto cleanup;" but here I find the "can_convert" check to be clearer. >> + /* >> + * Detect if this is a normal entry oustide of any subtree > > s/oustide/outside/ Got it.
>> +test_expect_success 'sparse-index contents' ' >> + init_repos && >> + >> + test-tool -C sparse-index read-cache --table >cache && >> + for dir in folder1 folder2 x >> + do >> + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && >> + grep "040000 tree $TREE $dir/" cache \ >> + || return 1 >> + done && > > Thanks for making the output look more like ls-tree output; it's > easier to parse that way, at least for me. Excellent. > I mostly read over the range-diff since it was much shorter. You've > addressed a number of questions/comments I had on the RFC version, but > there's still some I didn't see a response to so I reposted them here. Thanks for being diligent! -Stolee
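The write paths in patch 11 (do_write_locked_index() and write_shared_index()) follow one pattern. They record whether the in-memory index was full, collapse it to sparse just for the write, and then re-expand it so callers that keep reading the index still see full entries. The toy model below shows that bookkeeping. `struct sketch_index` and all `sketch_*` functions are hypothetical stand-ins that track only entry counts. Unlike the real convert_to_sparse(), which returns 0 without converting when cone mode or GIT_TEST_SPARSE_INDEX is off, this toy always converts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in: tracks only sparseness and entry counts. */
struct sketch_index {
	int sparse;    /* nonzero once collapsed */
	int nr;        /* entries currently in memory */
	int full_nr;   /* entry count when fully expanded */
	int sparse_nr; /* entry count after collapsing */
};

static int sketch_convert_to_sparse(struct sketch_index *ix)
{
	if (!ix->sparse) {
		ix->nr = ix->sparse_nr;
		ix->sparse = 1;
	}
	return 0;
}

static void sketch_ensure_full(struct sketch_index *ix)
{
	if (ix->sparse) {
		ix->nr = ix->full_nr;
		ix->sparse = 0;
	}
}

/* Stand-in for do_write_index(): records how many entries hit "disk". */
static int sketch_do_write(const struct sketch_index *ix, int *on_disk_nr)
{
	*on_disk_nr = ix->nr;
	return 0;
}

static int sketch_write_index(struct sketch_index *ix, int *on_disk_nr)
{
	int was_full = !ix->sparse;
	int ret = sketch_convert_to_sparse(ix);

	if (ret) {
		fprintf(stderr, "warning: failed to convert to a sparse-index\n");
		return ret;
	}
	ret = sketch_do_write(ix, on_disk_nr);
	/* Re-expand before returning, even if the write itself failed. */
	if (was_full)
		sketch_ensure_full(ix);
	return ret;
}
```

In this model, a full index with 10 entries is written to disk as 4 entries and remains at 10 in memory afterward. That matches the commit message's note that the re-expansion is wasteful for commands that never read the index again.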
* [PATCH 12/20] submodule: sparse-index should not collapse links From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> A submodule is stored as a "Git link" that actually points to a commit within a submodule. Submodules are populated or not depending on submodule configuration, not sparse-checkout. To ensure that the sparse-index feature integrates correctly with submodules, we should not collapse a directory if there is a Git link within its range.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- sparse-index.c | 1 + t/t1092-sparse-checkout-compatibility.sh | 17 +++++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/sparse-index.c b/sparse-index.c index cb1f85635fbc..14029fafc750 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -52,6 +52,7 @@ static int convert_to_sparse_rec(struct index_state *istate, struct cache_entry *ce = istate->cache[i]; if (ce_stage(ce) || + S_ISGITLINK(ce->ce_mode) || !(ce->ce_flags & CE_SKIP_WORKTREE)) can_convert = 0; } diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index ca87033d30b0..b38fab6455d9 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -374,4 +374,21 @@ test_expect_success 'clean' ' test_sparse_match test_path_is_dir folder1 ' +test_expect_success 'submodule handling' ' + init_repos && + + test_all_match mkdir modules && + test_all_match touch modules/a && + test_all_match git add modules && + test_all_match git commit -m "add modules directory" && + + run_on_all git submodule add "$(pwd)/initial-repo" modules/sub && + test_all_match git commit -m "add submodule" && + + # having a submodule prevents "modules" from collapsing + test-tool -C sparse-index read-cache --table >cache && + grep "100644 blob .* modules/a" cache && + grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache +' + test_done -- gitgitgadget
* [PATCH 13/20] unpack-trees: allow sparse directories From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The index_pos_by_traverse_info() currently throws a BUG() when a directory entry exists exactly in the index. We need to consider that it is possible to have a directory in a sparse index as long as that entry is itself marked with the skip-worktree bit. The negation of the 'pos' variable must be conditioned to only when it starts as negative. This is identical behavior as before when the index is full.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 4dd99219073a..b324eec2a5d1 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -746,9 +746,12 @@ static int index_pos_by_traverse_info(struct name_entry *names, strbuf_make_traverse_path(&name, info, names->path, names->pathlen); strbuf_addch(&name, '/'); pos = index_name_pos(o->src_index, name.buf, name.len); - if (pos >= 0) - BUG("This is a directory and should not exist in index"); - pos = -pos - 1; + if (pos >= 0) { + if (!o->src_index->sparse_index || + !(o->src_index->cache[pos]->ce_flags & CE_SKIP_WORKTREE)) + BUG("This is a directory and should not exist in index"); + } else + pos = -pos - 1; if (pos >= o->src_index->cache_nr || !starts_with(o->src_index->cache[pos]->name, name.buf) || (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name.buf))) -- gitgitgadget
* Re: [PATCH 13/20] unpack-trees: allow sparse directories From: Elijah Newren @ 2021-02-25 7:40 UTC To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > The index_pos_by_traverse_info() currently throws a BUG() when a > directory entry exists exactly in the index. We need to consider that it > is possible to have a directory in a sparse index as long as that entry > is itself marked with the skip-worktree bit. > > The negation of the 'pos' variable must be conditioned to only when it > starts as negative. This is identical behavior as before when the index > is full. Same comment on the second paragraph as I made in the RFC series -- https://lore.kernel.org/git/CABPp-BGPJgA4guWHVm3AVS=hM0fTixUpRvJe5i9NnHT-3QJMfw@mail.gmail.com/. I apologize if I'm repeating stuff you chose to not change, but I didn't see a response and given the three typos left in previous patches, I'm unsure whether it was unaddressed on purpose or on accident.
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > unpack-trees.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/unpack-trees.c b/unpack-trees.c > index 4dd99219073a..b324eec2a5d1 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -746,9 +746,12 @@ static int index_pos_by_traverse_info(struct name_entry *names, > strbuf_make_traverse_path(&name, info, names->path, names->pathlen); > strbuf_addch(&name, '/'); > pos = index_name_pos(o->src_index, name.buf, name.len); > - if (pos >= 0) > - BUG("This is a directory and should not exist in index"); > - pos = -pos - 1; > + if (pos >= 0) { > + if (!o->src_index->sparse_index || > + !(o->src_index->cache[pos]->ce_flags & CE_SKIP_WORKTREE)) > + BUG("This is a directory and should not exist in index"); > + } else > + pos = -pos - 1; > if (pos >= o->src_index->cache_nr || > !starts_with(o->src_index->cache[pos]->name, name.buf) || > (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name.buf))) > -- > gitgitgadget >
* Re: [PATCH 13/20] unpack-trees: allow sparse directories 2021-02-25 7:40 ` Elijah Newren @ 2021-03-09 21:35 ` Derrick Stolee 2021-03-09 21:39 ` Elijah Newren 0 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee @ 2021-03-09 21:35 UTC (permalink / raw) To: Elijah Newren, Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On 2/25/2021 2:40 AM, Elijah Newren wrote: > On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <dstolee@microsoft.com> >> >> The index_pos_by_traverse_info() currently throws a BUG() when a >> directory entry exists exactly in the index. We need to consider that it >> is possible to have a directory in a sparse index as long as that entry >> is itself marked with the skip-worktree bit. >> >> The negation of the 'pos' variable must be conditioned to only when it >> starts as negative. This is identical behavior as before when the index >> is full. > > Same comment on the second paragraph as I made in the RFC series -- > https://lore.kernel.org/git/CABPp-BGPJgA4guWHVm3AVS=hM0fTixUpRvJe5i9NnHT-3QJMfw@mail.gmail.com/. > I apologize if I'm repeating stuff you chose to not change, but I > didn't see a response and given the three typos left in previous > patches, I'm unsure whether it was unaddressed on purpose or on > accident. Yes, I dropped this one. How about this? The 'pos' variable is assigned a negative value if an exact match is not found. Since a directory name can be an exact match, it is no longer an error to have a nonnegative 'pos' value. Thanks, -Stolee
* Re: [PATCH 13/20] unpack-trees: allow sparse directories 2021-03-09 21:35 ` Derrick Stolee @ 2021-03-09 21:39 ` Elijah Newren 0 siblings, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-03-09 21:39 UTC (permalink / raw) To: Derrick Stolee Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Mar 9, 2021 at 1:35 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 2/25/2021 2:40 AM, Elijah Newren wrote: > > On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget > > <gitgitgadget@gmail.com> wrote: > >> > >> From: Derrick Stolee <dstolee@microsoft.com> > >> > >> The index_pos_by_traverse_info() currently throws a BUG() when a > >> directory entry exists exactly in the index. We need to consider that it > >> is possible to have a directory in a sparse index as long as that entry > >> is itself marked with the skip-worktree bit. > >> > >> The negation of the 'pos' variable must be conditioned to only when it > >> starts as negative. This is identical behavior as before when the index > >> is full. > > > > Same comment on the second paragraph as I made in the RFC series -- > > https://lore.kernel.org/git/CABPp-BGPJgA4guWHVm3AVS=hM0fTixUpRvJe5i9NnHT-3QJMfw@mail.gmail.com/. > > I apologize if I'm repeating stuff you chose to not change, but I > > didn't see a response and given the three typos left in previous > > patches, I'm unsure whether it was unaddressed on purpose or on > > accident. > > Yes, I dropped this one. How about this? > > The 'pos' variable is assigned a negative value if an exact match is not > found. Since a directory name can be an exact match, it is no longer an > error to have a nonnegative 'pos' value. I like it!
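The 'pos' convention in the revised commit message can be illustrated with a small, self-contained sketch. It assumes a plain sorted array of path strings compared with strcmp(), which simplifies the real index ordering (Git's own comparison also takes the stage into account), and the helpers sorted_name_pos() and first_pos_for_dir() are hypothetical. A lookup returns the match position when the name is found, and otherwise -(insertion point) - 1. Only a negative result therefore needs decoding, while an exact match on "dir/" is acceptable only when it is a sparse directory entry.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical, simplified index lookup: binary-search a sorted array of
 * path names. Exact match: return its position (>= 0). Miss: return
 * -(insertion point) - 1, which is always negative.
 */
static int sorted_name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	assert(nr >= 0);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/*
 * Mirrors the shape of the patched logic: a nonnegative result means the
 * directory itself is an entry, which is allowed only when exact_allowed
 * is set (standing in for "sparse index and CE_SKIP_WORKTREE"); a
 * negative result is decoded back into the insertion point.
 */
static int first_pos_for_dir(const char **names, int nr,
			     const char *dir_slash, int exact_allowed)
{
	int pos = sorted_name_pos(names, nr, dir_slash);

	if (pos >= 0) {
		if (!exact_allowed)
			return -1; /* the real code calls BUG() here */
	} else {
		pos = -pos - 1;
	}
	return pos;
}
```

In a full index such as {"a", "deep/a", "deep/b", "x"}, looking up "deep/" misses and decodes to position 1, the first entry under that directory; in a sparse index such as {"a", "deep/", "x"}, the same lookup is an exact match at position 1.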
* [PATCH 14/20] sparse-index: check index conversion happens 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (12 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 13/20] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 15/20] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget ` (7 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Add a test case that uses test_region to ensure that we are truly expanding a sparse index to a full one, then converting back to sparse when writing the index. As we integrate more Git commands with the sparse index, we will convert these commands to check that we do _not_ convert the sparse index to a full index and instead stay sparse the entire time. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/t1092-sparse-checkout-compatibility.sh | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index b38fab6455d9..bfc9e28ef0e1 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -391,4 +391,22 @@ test_expect_success 'submodule handling' ' grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache ' +test_expect_success 'sparse-index is expanded and converted back' ' + init_repos && + + ( + GIT_TEST_SPARSE_INDEX=1 && + export GIT_TEST_SPARSE_INDEX && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt + ) +' + test_done -- gitgitgadget
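This test depends on the test-library helper test_region to inspect the GIT_TRACE2_EVENT output, and it raises GIT_TRACE2_EVENT_NESTING so that nested regions are still written. The C sketch below only illustrates the kind of check involved: test_region itself is a shell function, and is_region_event() and saw_region() are hypothetical helpers. The sketch assumes one JSON event per line carrying "event", "category" and "label" fields, and that a region counts as seen only when both its region_enter and region_leave events are present.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if one trace2 event line marks the given region boundary. */
static int is_region_event(const char *line, const char *event,
			   const char *category, const char *label)
{
	char want_event[64], want_cat[64], want_label[128];

	/* Build the "key":"value" fragments to look for (order-independent). */
	snprintf(want_event, sizeof(want_event), "\"event\":\"%s\"", event);
	snprintf(want_cat, sizeof(want_cat), "\"category\":\"%s\"", category);
	snprintf(want_label, sizeof(want_label), "\"label\":\"%s\"", label);
	return strstr(line, want_event) && strstr(line, want_cat) &&
	       strstr(line, want_label);
}

/* Return 1 only if the region was both entered and left in lines[]. */
static int saw_region(const char **lines, size_t nr,
		      const char *category, const char *label)
{
	int entered = 0, left = 0;
	size_t i;

	assert(lines || !nr);
	for (i = 0; i < nr; i++) {
		if (is_region_event(lines[i], "region_enter", category, label))
			entered = 1;
		else if (is_region_event(lines[i], "region_leave", category, label))
			left = 1;
	}
	return entered && left;
}
```

Matching each field separately keeps the sketch independent of the order in which fields appear within an event line.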
* [PATCH 15/20] sparse-index: create extension for compatibility 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (13 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 14/20] sparse-index: check index conversion happens Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-25 7:45 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 16/20] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget ` (6 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Previously, we enabled the sparse index format only using GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to actually select this mode. Further, sparse directory entries are not understood by the index formats as advertised. We _could_ add a new index version that explicitly adds these capabilities, but there are nuances to index formats 2, 3, and 4 that are still valuable to select as options. For now, create a repo extension, "extensions.sparseIndex", that specifies that the tool reading this repository must understand sparse directory entries. This change only encodes the extension and enables it when GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI mechanism. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/config/extensions.txt | 7 ++++++ cache.h | 1 + repo-settings.c | 7 ++++++ repository.h | 3 ++- setup.c | 3 +++ sparse-index.c | 38 +++++++++++++++++++++++++---- 6 files changed, 53 insertions(+), 6 deletions(-) diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index 4e23d73cdcad..5c86b3648732 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -6,3 +6,10 @@ extensions.objectFormat:: Note that this setting should only be set by linkgit:git-init[1] or linkgit:git-clone[1]. Trying to change it after initialization will not work and will produce hard-to-diagnose issues. + +extensions.sparseIndex:: + When combined with `core.sparseCheckout=true` and + `core.sparseCheckoutCone=true`, the index may contain entries + corresponding to directories outside of the sparse-checkout + definition. Versions of Git that do not understand this extension + do not expect directory entries in the index. diff --git a/cache.h b/cache.h index e8b7d3b4fb33..eea61fba7568 100644 --- a/cache.h +++ b/cache.h @@ -1053,6 +1053,7 @@ struct repository_format { int worktree_config; int is_bare; int hash_algo; + int sparse_index; char *work_tree; struct string_list unknown_extensions; struct string_list v1_only_extensions; diff --git a/repo-settings.c b/repo-settings.c index d63569e4041e..9677d50f9238 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -85,4 +85,11 @@ void prepare_repo_settings(struct repository *r) * removed. */ r->settings.command_requires_full_index = 1; + + /* + * Initialize this as off. 
+ */ + r->settings.sparse_index = 0; + if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value) + r->settings.sparse_index = 1; } diff --git a/repository.h b/repository.h index e06a23015697..a45f7520fd9e 100644 --- a/repository.h +++ b/repository.h @@ -42,7 +42,8 @@ struct repo_settings { int core_multi_pack_index; - unsigned command_requires_full_index:1; + unsigned command_requires_full_index:1, + sparse_index:1; }; struct repository { diff --git a/setup.c b/setup.c index c04cd25a30df..cd8394564613 100644 --- a/setup.c +++ b/setup.c @@ -500,6 +500,9 @@ static enum extension_result handle_extension(const char *var, return error("invalid value for 'extensions.objectformat'"); data->hash_algo = format; return EXTENSION_OK; + } else if (!strcmp(ext, "sparseindex")) { + data->sparse_index = 1; + return EXTENSION_OK; } return EXTENSION_UNKNOWN; } diff --git a/sparse-index.c b/sparse-index.c index 14029fafc750..97b0d0c57857 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -102,19 +102,47 @@ static int convert_to_sparse_rec(struct index_state *istate, return num_converted - start_converted; } +static int enable_sparse_index(struct repository *repo) +{ + const char *config_path = repo_git_path(repo, "config.worktree"); + + if (upgrade_repository_format(1) < 0) { + warning(_("unable to upgrade repository format to enable sparse-index")); + return -1; + } + git_config_set_in_file_gently(config_path, + "extensions.sparseIndex", + "true"); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 1; + return 0; +} + int convert_to_sparse(struct index_state *istate) { if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; + if (!istate->repo) + istate->repo = the_repository; + + /* + * The GIT_TEST_SPARSE_INDEX environment variable triggers the + * extensions.sparseIndex config variable to be on. 
+ */ + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { + int err = enable_sparse_index(istate->repo); + if (err < 0) + return err; + } + /* - * For now, only create a sparse index with the - * GIT_TEST_SPARSE_INDEX environment variable. We will relax - * this once we have a proper way to opt-in (and later still, - * opt-out). + * Only convert to sparse if extensions.sparseIndex is set. */ - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + prepare_repo_settings(istate->repo); + if (!istate->repo->settings.sparse_index) return 0; if (!istate->sparse_checkout_patterns) { -- gitgitgadget
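With this patch, the decision in convert_to_sparse() becomes a conjunction of the existing early exits and the new extension check. The sketch below reduces that decision to a pure predicate. It is hypothetical: struct sparse_settings and its fields stand in for istate->split_index, istate->sparse_index, core_apply_sparse_checkout, core_sparse_checkout_cone and repo->settings.sparse_index. It also shows why setup.c compares against the lowercase "sparseindex": Git passes config keys to callbacks with the section and variable names lowercased, so extensions.sparseIndex arrives as extensions.sparseindex.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the state convert_to_sparse() consults. */
struct sparse_settings {
	int split_index;          /* istate->split_index */
	int already_sparse;       /* istate->sparse_index */
	int sparse_checkout;      /* core.sparseCheckout */
	int sparse_checkout_cone; /* core.sparseCheckoutCone */
	int sparse_index_ext;     /* extensions.sparseIndex */
};

/* Return 1 when the index would be converted to a sparse index. */
static int should_convert_to_sparse(const struct sparse_settings *s)
{
	assert(s);
	/* Early exits that convert_to_sparse() already had. */
	if (s->split_index || s->already_sparse ||
	    !s->sparse_checkout || !s->sparse_checkout_cone)
		return 0;
	/* New gate: the repository must opt in via the extension. */
	return s->sparse_index_ext ? 1 : 0;
}

/* Match the canonical (lowercased) key for the new extension. */
static int is_sparse_index_extension(const char *key)
{
	static const char prefix[] = "extensions.";
	size_t len = strlen(prefix);

	if (strncmp(key, prefix, len))
		return 0;
	return !strcmp(key + len, "sparseindex");
}
```

In the real code, GIT_TEST_SPARSE_INDEX=1 first turns the extension on through enable_sparse_index(), so the test suite exercises the same gate that users will reach through configuration.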
* Re: [PATCH 15/20] sparse-index: create extension for compatibility 2021-02-23 20:14 ` [PATCH 15/20] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget @ 2021-02-25 7:45 ` Elijah Newren 2021-03-09 21:45 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Elijah Newren @ 2021-02-25 7:45 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > Previously, we enabled the sparse index format only using > GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to > actually select this mode. Further, sparse directory entries are not > understood by the index formats as advertised. > > We _could_ add a new index version that explicitly adds these > capabilities, but there are nuances to index formats 2, 3, and 4 that > are still valuable to select as options. For now, create a repo > extension, "extensions.sparseIndex", that specifies that the tool > reading this repository must understand sparse directory entries. This commit is unchanged from the RFC series, but given your comments in the design document about how you do intend to create an index format v5 now, do you want to reference that here? > > This change only encodes the extension and enables it when > GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI > mechanism. 
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > Documentation/config/extensions.txt | 7 ++++++ > cache.h | 1 + > repo-settings.c | 7 ++++++ > repository.h | 3 ++- > setup.c | 3 +++ > sparse-index.c | 38 +++++++++++++++++++++++++---- > 6 files changed, 53 insertions(+), 6 deletions(-) > > diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt > index 4e23d73cdcad..5c86b3648732 100644 > --- a/Documentation/config/extensions.txt > +++ b/Documentation/config/extensions.txt > @@ -6,3 +6,10 @@ extensions.objectFormat:: > Note that this setting should only be set by linkgit:git-init[1] or > linkgit:git-clone[1]. Trying to change it after initialization will not > work and will produce hard-to-diagnose issues. > + > +extensions.sparseIndex:: > + When combined with `core.sparseCheckout=true` and > + `core.sparseCheckoutCone=true`, the index may contain entries > + corresponding to directories outside of the sparse-checkout > + definition. Versions of Git that do not understand this extension > + do not expect directory entries in the index. I had a wording suggestion for this paragraph in the RFC series -- https://lore.kernel.org/git/CABPp-BFEJE82k4VgkR=Jf7V7sZxZzo2pHMfAGshhi9_vV6iK0w@mail.gmail.com/. Let me know if you just decided to leave it out so I don't bug you about stuff you already considered. > diff --git a/cache.h b/cache.h > index e8b7d3b4fb33..eea61fba7568 100644 > --- a/cache.h > +++ b/cache.h > @@ -1053,6 +1053,7 @@ struct repository_format { > int worktree_config; > int is_bare; > int hash_algo; > + int sparse_index; > char *work_tree; > struct string_list unknown_extensions; > struct string_list v1_only_extensions; > diff --git a/repo-settings.c b/repo-settings.c > index d63569e4041e..9677d50f9238 100644 > --- a/repo-settings.c > +++ b/repo-settings.c > @@ -85,4 +85,11 @@ void prepare_repo_settings(struct repository *r) > * removed. 
> */ > r->settings.command_requires_full_index = 1; > + > + /* > + * Initialize this as off. > + */ > + r->settings.sparse_index = 0; > + if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value) > + r->settings.sparse_index = 1; > } > diff --git a/repository.h b/repository.h > index e06a23015697..a45f7520fd9e 100644 > --- a/repository.h > +++ b/repository.h > @@ -42,7 +42,8 @@ struct repo_settings { > > int core_multi_pack_index; > > - unsigned command_requires_full_index:1; > + unsigned command_requires_full_index:1, > + sparse_index:1; > }; > > struct repository { > diff --git a/setup.c b/setup.c > index c04cd25a30df..cd8394564613 100644 > --- a/setup.c > +++ b/setup.c > @@ -500,6 +500,9 @@ static enum extension_result handle_extension(const char *var, > return error("invalid value for 'extensions.objectformat'"); > data->hash_algo = format; > return EXTENSION_OK; > + } else if (!strcmp(ext, "sparseindex")) { > + data->sparse_index = 1; > + return EXTENSION_OK; > } > return EXTENSION_UNKNOWN; > } > diff --git a/sparse-index.c b/sparse-index.c > index 14029fafc750..97b0d0c57857 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -102,19 +102,47 @@ static int convert_to_sparse_rec(struct index_state *istate, > return num_converted - start_converted; > } > > +static int enable_sparse_index(struct repository *repo) > +{ > + const char *config_path = repo_git_path(repo, "config.worktree"); > + > + if (upgrade_repository_format(1) < 0) { > + warning(_("unable to upgrade repository format to enable sparse-index")); > + return -1; > + } > + git_config_set_in_file_gently(config_path, > + "extensions.sparseIndex", > + "true"); > + > + prepare_repo_settings(repo); > + repo->settings.sparse_index = 1; > + return 0; > +} > + > int convert_to_sparse(struct index_state *istate) > { > if (istate->split_index || istate->sparse_index || > !core_apply_sparse_checkout || !core_sparse_checkout_cone) > return 0; > > + if (!istate->repo) > + istate->repo = 
the_repository; > + > + /* > + * The GIT_TEST_SPARSE_INDEX environment variable triggers the > + * extensions.sparseIndex config variable to be on. > + */ > + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { > + int err = enable_sparse_index(istate->repo); > + if (err < 0) > + return err; > + } > + > /* > - * For now, only create a sparse index with the > - * GIT_TEST_SPARSE_INDEX environment variable. We will relax > - * this once we have a proper way to opt-in (and later still, > - * opt-out). > + * Only convert to sparse if extensions.sparseIndex is set. > */ > - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) > + prepare_repo_settings(istate->repo); > + if (!istate->repo->settings.sparse_index) > return 0; > > if (!istate->sparse_checkout_patterns) { > -- > gitgitgadget
* Re: [PATCH 15/20] sparse-index: create extension for compatibility 2021-02-25 7:45 ` Elijah Newren @ 2021-03-09 21:45 ` Derrick Stolee 0 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-09 21:45 UTC (permalink / raw) To: Elijah Newren, Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee, Derrick Stolee On 2/25/2021 2:45 AM, Elijah Newren wrote: > On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <dstolee@microsoft.com> >> >> Previously, we enabled the sparse index format only using >> GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to >> actually select this mode. Further, sparse directory entries are not >> understood by the index formats as advertised. >> >> We _could_ add a new index version that explicitly adds these >> capabilities, but there are nuances to index formats 2, 3, and 4 that >> are still valuable to select as options. For now, create a repo >> extension, "extensions.sparseIndex", that specifies that the tool >> reading this repository must understand sparse directory entries. > > This commit is unchanged from the RFC series, but given your comments > in the design document about how you do intend to create an index > format v5 now, do you want to reference that here? I'll insert detail about v5. >> +extensions.sparseIndex:: >> + When combined with `core.sparseCheckout=true` and >> + `core.sparseCheckoutCone=true`, the index may contain entries >> + corresponding to directories outside of the sparse-checkout >> + definition. Versions of Git that do not understand this extension >> + do not expect directory entries in the index. > > I had a wording suggestion for this paragraph in the RFC series -- > https://lore.kernel.org/git/CABPp-BFEJE82k4VgkR=Jf7V7sZxZzo2pHMfAGshhi9_vV6iK0w@mail.gmail.com/. 
> Let me know if you just decided to leave it out so I don't bug you > about stuff you already considered. I'll take your suggestion, thanks. -Stolee
* [PATCH 16/20] sparse-checkout: toggle sparse index from builtin 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (14 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 15/20] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-24 19:11 ` Martin Ågren 2021-02-23 20:14 ` [PATCH 17/20] sparse-checkout: disable sparse-index Derrick Stolee via GitGitGadget ` (5 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The sparse index extension is used to signal that index writes should be in sparse mode. This was only updated using GIT_TEST_SPARSE_INDEX=1. Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that specifies if the sparse index should be used. It also updates the index to use the correct format, either way. Add a warning in the documentation that the use of a repository extension might reduce compatibility with third-party tools. 'git sparse-checkout init' already sets extension.worktreeConfig, which places most sparse-checkout users outside of the scope of most third-party tools. Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of GIT_TEST_SPARSE_INDEX=1. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/git-sparse-checkout.txt | 14 +++++++++ builtin/sparse-checkout.c | 17 ++++++++++- sparse-index.c | 37 +++++++++++++++-------- sparse-index.h | 3 ++ t/t1092-sparse-checkout-compatibility.sh | 38 +++++++++++------------- 5 files changed, 76 insertions(+), 33 deletions(-) diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index a0eeaeb02ee3..b51b8450cfd9 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -45,6 +45,20 @@ To avoid interfering with other worktrees, it first enables the When `--cone` is provided, the `core.sparseCheckoutCone` setting is also set, allowing for better performance with a limited set of patterns (see 'CONE PATTERN SET' below). ++ +Use the `--[no-]sparse-index` option to toggle the use of the sparse +index format. This reduces the size of the index to be more closely +aligned with your sparse-checkout definition. This can have significant +performance advantages for commands such as `git status` or `git add`. +This feature is still experimental. Some commands might be slower with +a sparse index until they are properly integrated with the feature. ++ +**WARNING:** Using a sparse index requires modifying the index in a way +that is not completely understood by other tools. Enabling sparse index +enables the `extensions.spareseIndex` config value, which might cause +other tools to stop working with your repository. If you have trouble with +this compatibility, then run `git sparse-checkout sparse-index disable` to +remove this config and rewrite your index to not be sparse. 
'set':: Write a set of patterns to the sparse-checkout file, as given as diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index e00b82af727b..ca63e2c64e95 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -14,6 +14,7 @@ #include "unpack-trees.h" #include "wt-status.h" #include "quote.h" +#include "sparse-index.h" static const char *empty_base = ""; @@ -283,12 +284,13 @@ static int set_config(enum sparse_checkout_mode mode) } static char const * const builtin_sparse_checkout_init_usage[] = { - N_("git sparse-checkout init [--cone]"), + N_("git sparse-checkout init [--cone] [--[no-]sparse-index]"), NULL }; static struct sparse_checkout_init_opts { int cone_mode; + int sparse_index; } init_opts; static int sparse_checkout_init(int argc, const char **argv) @@ -303,11 +305,15 @@ static int sparse_checkout_init(int argc, const char **argv) static struct option builtin_sparse_checkout_init_options[] = { OPT_BOOL(0, "cone", &init_opts.cone_mode, N_("initialize the sparse-checkout in cone mode")), + OPT_BOOL(0, "sparse-index", &init_opts.sparse_index, + N_("toggle the use of a sparse index")), OPT_END(), }; repo_read_index(the_repository); + init_opts.sparse_index = -1; + argc = parse_options(argc, argv, NULL, builtin_sparse_checkout_init_options, builtin_sparse_checkout_init_usage, 0); @@ -326,6 +332,15 @@ static int sparse_checkout_init(int argc, const char **argv) sparse_filename = get_sparse_checkout_filename(); res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL); + if (init_opts.sparse_index >= 0) { + if (set_sparse_index_config(the_repository, init_opts.sparse_index) < 0) + die(_("failed to modify sparse-index config")); + + /* force an index rewrite */ + repo_read_index(the_repository); + the_repository->index->updated_workdir = 1; + } + /* If we already have a sparse-checkout file, use it. 
*/ if (res >= 0) { free(sparse_filename); diff --git a/sparse-index.c b/sparse-index.c index 97b0d0c57857..a991c5331e9e 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -104,23 +104,37 @@ static int convert_to_sparse_rec(struct index_state *istate, static int enable_sparse_index(struct repository *repo) { - const char *config_path = repo_git_path(repo, "config.worktree"); + int res; if (upgrade_repository_format(1) < 0) { warning(_("unable to upgrade repository format to enable sparse-index")); return -1; } - git_config_set_in_file_gently(config_path, - "extensions.sparseIndex", - "true"); + res = git_config_set_gently("extensions.sparseindex", "true"); prepare_repo_settings(repo); repo->settings.sparse_index = 1; - return 0; + return res; +} + +int set_sparse_index_config(struct repository *repo, int enable) +{ + int res; + + if (enable) + return enable_sparse_index(repo); + + /* Don't downgrade repository format, just remove the extension. */ + res = git_config_set_gently("extensions.sparseindex", NULL); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 0; + return res; } int convert_to_sparse(struct index_state *istate) { + int test_env; if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; @@ -129,14 +143,13 @@ int convert_to_sparse(struct index_state *istate) istate->repo = the_repository; /* - * The GIT_TEST_SPARSE_INDEX environment variable triggers the - * extensions.sparseIndex config variable to be on. + * If GIT_TEST_SPARSE_INDEX=1, then trigger extensions.sparseIndex + * to be fully enabled. If GIT_TEST_SPARSE_INDEX=0 (set explicitly), + * then purposefully disable the setting. 
*/ - if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { - int err = enable_sparse_index(istate->repo); - if (err < 0) - return err; - } + test_env = git_env_bool("GIT_TEST_SPARSE_INDEX", -1); + if (test_env >= 0) + set_sparse_index_config(istate->repo, test_env); /* * Only convert to sparse if extensions.sparseIndex is set. diff --git a/sparse-index.h b/sparse-index.h index 64380e121d80..39dcc859735e 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -5,4 +5,7 @@ struct index_state; void ensure_full_index(struct index_state *istate); int convert_to_sparse(struct index_state *istate); +struct repository; +int set_sparse_index_config(struct repository *repo, int enable); + #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index bfc9e28ef0e1..9c2bc4d25f66 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -4,6 +4,7 @@ test_description='compare full workdir to sparse workdir' GIT_TEST_CHECK_CACHE_TREE=0 GIT_TEST_SPLIT_INDEX=0 +GIT_TEST_SPARSE_INDEX= . 
./test-lib.sh @@ -98,25 +99,26 @@ init_repos () { # initialize sparse-checkout definitions git -C sparse-checkout sparse-checkout init --cone && git -C sparse-checkout sparse-checkout set deep && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep + git -C sparse-index sparse-checkout init --cone --sparse-index && + test_cmp_config -C sparse-index true extensions.sparseindex && + git -C sparse-index sparse-checkout set deep } run_on_sparse () { ( cd sparse-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err + "$@" >../sparse-checkout-out 2>../sparse-checkout-err ) && ( cd sparse-index && - GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err + "$@" >../sparse-index-out 2>../sparse-index-err ) } run_on_all () { ( cd full-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err + "$@" >../full-checkout-out 2>../full-checkout-err ) && run_on_sparse "$@" } @@ -146,7 +148,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && + git -C sparse-index sparse-checkout set folder1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep folder2 x @@ -156,7 +158,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + git -C sparse-index sparse-checkout set deep/deeper1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep/deeper2 folder1 folder2 x @@ -394,19 +396,15 @@ test_expect_success 'submodule handling' ' test_expect_success 'sparse-index is expanded and converted back' ' init_repos && - ( - GIT_TEST_SPARSE_INDEX=1 && - export GIT_TEST_SPARSE_INDEX && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" 
reset --hard && - test_region index convert_to_sparse trace2.txt && - test_region index ensure_full_index trace2.txt && - - rm trace2.txt && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" status -uno && - test_region index ensure_full_index trace2.txt - ) + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt ' test_done -- gitgitgadget
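Both the new --[no-]sparse-index option (init_opts.sparse_index starts at -1) and the GIT_TEST_SPARSE_INDEX check (git_env_bool() with a default of -1) rely on a tri-state value. In that scheme, -1 means "not specified", 0 means off and 1 means on, and only a nonnegative value leads to set_sparse_index_config(). The following is a minimal sketch of the pattern. parse_tristate(), env_tristate(), apply_tristate() and the sparse_index_ext flag are hypothetical, and the accepted spellings are a simplified subset, not the full parsing rules of Git's git_env_bool().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified tri-state parse: NULL (unset) keeps the default, a few
 * common spellings map to 1 or 0, anything else keeps the default.
 */
static int parse_tristate(const char *value, int def)
{
	static const char *const on[] = { "1", "true", "yes", "on" };
	static const char *const off[] = { "0", "false", "no", "off" };
	size_t i;

	if (!value)
		return def;
	for (i = 0; i < sizeof(on) / sizeof(on[0]); i++)
		if (!strcmp(value, on[i]))
			return 1;
	for (i = 0; i < sizeof(off) / sizeof(off[0]); i++)
		if (!strcmp(value, off[i]))
			return 0;
	return def;
}

/* Read a tri-state value from the environment. */
static int env_tristate(const char *name, int def)
{
	return parse_tristate(getenv(name), def);
}

/* Hypothetical stand-in for the extensions.sparseIndex config value. */
static int sparse_index_ext;

/* Change the setting only when the caller asked for a specific value. */
static void apply_tristate(int requested)
{
	assert(requested >= -1 && requested <= 1);
	if (requested < 0)
		return; /* neither --sparse-index nor --no-sparse-index given */
	sparse_index_ext = requested;
}
```

Starting from -1 rather than 0 is what lets a plain `git sparse-checkout init --cone` leave an existing extensions.sparseIndex setting alone, while --no-sparse-index actively removes it.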
* Re: [PATCH 16/20] sparse-checkout: toggle sparse index from builtin 2021-02-23 20:14 ` [PATCH 16/20] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget @ 2021-02-24 19:11 ` Martin Ågren 2021-03-09 20:52 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Martin Ågren @ 2021-02-24 19:11 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Elijah Newren, Junio C Hamano, Nguyễn Thái Ngọc Duy, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Wed, 24 Feb 2021 at 00:57, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > +that is not completely understood by other tools. Enabling sparse index > +enables the `extensions.spareseIndex` config value, which might cause s/sparese/sparse > +other tools to stop working with your repository. If you have trouble with > +this compatibility, then run `git sparse-checkout sparse-index disable` to > +remove this config and rewrite your index to not be sparse. While I'm commenting on this..: There are several "layers" here, for lack of a better term. "Enabling foo enables bar which may cause baz. If you fail due to baz, try dropping bar by dropping foo." If I remove any mention of the config variable from your text, I get the following. Enabling sparse index might cause other tools to stop working with your repository. If you have trouble with this compatibility, then run `git sparse-checkout sparse-index disable` to rewrite your index to not be sparse. I'm tempted to suggest such a rewrite to relieve readers of knowing of the middle step, which you could say is more of an implementation detail. But if we think that the symptoms / error messages might involve "extensions.sparseIndex" or, as would be the case with an older Git installation, fatal: unknown repository extensions found: sparseindex maybe there is some value in mentioning the config item by name. Just thinking out loud, really, and I don't have any strong opinion. 
I only came here to point out the typo in the docs. Martin
* Re: [PATCH 16/20] sparse-checkout: toggle sparse index from builtin 2021-02-24 19:11 ` Martin Ågren @ 2021-03-09 20:52 ` Derrick Stolee 2021-03-09 21:03 ` Elijah Newren 2021-03-14 20:08 ` Martin Ågren 0 siblings, 2 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-09 20:52 UTC (permalink / raw) To: Martin Ågren, Derrick Stolee via GitGitGadget Cc: Git Mailing List, Elijah Newren, Junio C Hamano, Nguyễn Thái Ngọc Duy, Jonathan Nieder, Derrick Stolee, Derrick Stolee On 2/24/2021 2:11 PM, Martin Ågren wrote: > On Wed, 24 Feb 2021 at 00:57, Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> +that is not completely understood by other tools. Enabling sparse index >> +enables the `extensions.spareseIndex` config value, which might cause > > s/sparese/sparse Thanks! >> +other tools to stop working with your repository. If you have trouble with >> +this compatibility, then run `git sparse-checkout sparse-index disable` to >> +remove this config and rewrite your index to not be sparse. > > While I'm commenting on this..: > > There are several "layers" here, for lack of a better term. "Enabling foo > enables bar which may cause baz. If you fail due to baz, try dropping > bar by dropping foo." If I remove any mention of the config variable from > your text, I get the following. > > Enabling sparse index might cause other tools to stop working with your > repository. If you have trouble with this compatibility, then run `git > sparse-checkout sparse-index disable` to rewrite your index to not be > sparse. > > I'm tempted to suggest such a rewrite to relieve readers of knowing of > the middle step, which you could say is more of an implementation > detail. But if we think that the symptoms / error messages might involve > "extensions.sparseIndex" or, as would be the case with an older Git > installation, > > fatal: unknown repository extensions found: > sparseindex > > maybe there is some value in mentioning the config item by name. 
Just > thinking out loud, really, and I don't have any strong opinion. I only > came here to point out the typo in the docs. I agree that the layers are confusing. We could rearrange and have a similar flow to what you recommend by mentioning the extension at the end: **WARNING:** Using a sparse index requires modifying the index in a way that is not completely understood by other tools. If you have trouble with this compatibility, then run `git sparse-checkout sparse-index disable` to rewrite your index to not be sparse. Older versions of Git will not understand the `sparseIndex` repository extension and may fail to interact with your repository until it is disabled. Thanks, -Stolee
* Re: [PATCH 16/20] sparse-checkout: toggle sparse index from builtin 2021-03-09 20:52 ` Derrick Stolee @ 2021-03-09 21:03 ` Elijah Newren 2021-03-09 21:10 ` Derrick Stolee 2021-03-14 20:08 ` Martin Ågren 1 sibling, 1 reply; 203+ messages in thread From: Elijah Newren @ 2021-03-09 21:03 UTC (permalink / raw) To: Derrick Stolee Cc: Martin Ågren, Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc Duy, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Mar 9, 2021 at 12:52 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 2/24/2021 2:11 PM, Martin Ågren wrote: > > On Wed, 24 Feb 2021 at 00:57, Derrick Stolee via GitGitGadget > > <gitgitgadget@gmail.com> wrote: > >> +that is not completely understood by other tools. Enabling sparse index > >> +enables the `extensions.spareseIndex` config value, which might cause > > > > s/sparese/sparse > > Thanks! > > > >> +other tools to stop working with your repository. If you have trouble with > >> +this compatibility, then run `git sparse-checkout sparse-index disable` to > >> +remove this config and rewrite your index to not be sparse. > > > > While I'm commenting on this..: > > > > There are several "layers" here, for lack of a better term. "Enabling foo > > enables bar which may cause baz. If you fail due to baz, try dropping > > bar by dropping foo." If I remove any mention of the config variable from > > your text, I get the following. > > > > Enabling sparse index might cause other tools to stop working with your > > repository. If you have trouble with this compatibility, then run `git > > sparse-checkout sparse-index disable` to rewrite your index to not be > > sparse. > > > > I'm tempted to suggest such a rewrite to relieve readers of knowing of > > the middle step, which you could say is more of an implementation > > detail. 
But if we think that the symptoms / error messages might involve > > "extensions.sparseIndex" or, as would be the case with an older Git > > installation, > > > > fatal: unknown repository extensions found: > > sparseindex > > > > maybe there is some value in mentioning the config item by name. Just > > thinking out loud, really, and I don't have any strong opinion. I only > > came here to point out the typo in the docs. > > I agree that the layers are confusing. We could rearrange and have > a similar flow to what you recommend by mentioning the extension at > the end: > > **WARNING:** Using a sparse index requires modifying the index in a way > that is not completely understood by other tools. If you have trouble with > this compatibility, then run `git sparse-checkout sparse-index disable` to > rewrite your index to not be sparse. Older versions of Git will not > understand the `sparseIndex` repository extension and may fail to interact > with your repository until it is disabled. > > Thanks, > -Stolee This looks pretty good to me, but could we change the first sentence to read "...modifying the index in a way that may not yet be understood by external tools." ? I'm worried "other tools" might make people worry about different builtin commands (e.g. fast-export, log). I also prefer "may" and "yet" because I suspect most external tools (e.g. git filter-repo just to name a personal example) won't need to read an index format and will thus be unaffected, and any tools that do read the index format will probably eventually learn how to work with the new format.
* Re: [PATCH 16/20] sparse-checkout: toggle sparse index from builtin 2021-03-09 21:03 ` Elijah Newren @ 2021-03-09 21:10 ` Derrick Stolee 2021-03-09 21:38 ` Elijah Newren 0 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee @ 2021-03-09 21:10 UTC (permalink / raw) To: Elijah Newren Cc: Martin Ågren, Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc Duy, Jonathan Nieder, Derrick Stolee, Derrick Stolee On 3/9/2021 4:03 PM, Elijah Newren wrote: > On Tue, Mar 9, 2021 at 12:52 PM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 2/24/2021 2:11 PM, Martin Ågren wrote: >>> There are several "layers" here, for lack of a better term. "Enabling foo >>> enables bar which may cause baz. If you fail due to baz, try dropping >>> bar by dropping foo." If I remove any mention of the config variable from >>> your text, I get the following. >>> >>> Enabling sparse index might cause other tools to stop working with your >>> repository. If you have trouble with this compatibility, then run `git >>> sparse-checkout sparse-index disable` to rewrite your index to not be >>> sparse. >>> >>> I'm tempted to suggest such a rewrite to relieve readers of knowing of >>> the middle step, which you could say is more of an implementation >>> detail. But if we think that the symptoms / error messages might involve >>> "extensions.sparseIndex" or, as would be the case with an older Git >>> installation, >>> >>> fatal: unknown repository extensions found: >>> sparseindex >>> >>> maybe there is some value in mentioning the config item by name. Just >>> thinking out loud, really, and I don't have any strong opinion. I only >>> came here to point out the typo in the docs. >> >> I agree that the layers are confusing. We could rearrange and have >> a similar flow to what you recommend by mentioning the extension at >> the end: >> >> **WARNING:** Using a sparse index requires modifying the index in a way >> that is not completely understood by other tools. 
If you have trouble with >> this compatibility, then run `git sparse-checkout sparse-index disable` to >> rewrite your index to not be sparse. Older versions of Git will not >> understand the `sparseIndex` repository extension and may fail to interact >> with your repository until it is disabled. >> >> Thanks, >> -Stolee > > This looks pretty good to me, but could we change the first sentence > to read "...modifying the index in a way that may not yet be > understood by external tools." ? I'm worried "other tools" might make > people worry about different builtin commands (e.g. fast-export, log). > I also prefer "may" and "yet" because I suspect most external tools > (e.g. git filter-repo just to name a personal example) won't need to > read an index format and will thus be unaffected, and any tools that > do read the index format will probably eventually learn how to work > with the new format. I can make the change, but I do want to point out that the current use of a repository extension _does_ mean that tools that (correctly) interact with a Git repository should fail even if they don't try to access the index file. This is only something to make this work until we introduce a new index file format version and then can drop the extension. "git filter-repo" _should_ be safe because it's really just shelling to Git, right? I'm more concerned about tools like libgit2. Thanks, -Stolee
* Re: [PATCH 16/20] sparse-checkout: toggle sparse index from builtin 2021-03-09 21:10 ` Derrick Stolee @ 2021-03-09 21:38 ` Elijah Newren 0 siblings, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-03-09 21:38 UTC (permalink / raw) To: Derrick Stolee Cc: Martin Ågren, Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc Duy, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, Mar 9, 2021 at 1:10 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 3/9/2021 4:03 PM, Elijah Newren wrote: > > On Tue, Mar 9, 2021 at 12:52 PM Derrick Stolee <stolee@gmail.com> wrote: > >> > >> On 2/24/2021 2:11 PM, Martin Ågren wrote: > >>> There are several "layers" here, for lack of a better term. "Enabling foo > >>> enables bar which may cause baz. If you fail due to baz, try dropping > >>> bar by dropping foo." If I remove any mention of the config variable from > >>> your text, I get the following. > >>> > >>> Enabling sparse index might cause other tools to stop working with your > >>> repository. If you have trouble with this compatibility, then run `git > >>> sparse-checkout sparse-index disable` to rewrite your index to not be > >>> sparse. > >>> > >>> I'm tempted to suggest such a rewrite to relieve readers of knowing of > >>> the middle step, which you could say is more of an implementation > >>> detail. But if we think that the symptoms / error messages might involve > >>> "extensions.sparseIndex" or, as would be the case with an older Git > >>> installation, > >>> > >>> fatal: unknown repository extensions found: > >>> sparseindex > >>> > >>> maybe there is some value in mentioning the config item by name. Just > >>> thinking out loud, really, and I don't have any strong opinion. I only > >>> came here to point out the typo in the docs. > >> > >> I agree that the layers are confusing. 
We could rearrange and have > >> a similar flow to what you recommend by mentioning the extension at > >> the end: > >> > >> **WARNING:** Using a sparse index requires modifying the index in a way > >> that is not completely understood by other tools. If you have trouble with > >> this compatibility, then run `git sparse-checkout sparse-index disable` to > >> rewrite your index to not be sparse. Older versions of Git will not > >> understand the `sparseIndex` repository extension and may fail to interact > >> with your repository until it is disabled. > >> > >> Thanks, > >> -Stolee > > > > This looks pretty good to me, but could we change the first sentence > > to read "...modifying the index in a way that may not yet be > > understood by external tools." ? I'm worried "other tools" might make > > people worry about different builtin commands (e.g. fast-export, log). > > I also prefer "may" and "yet" because I suspect most external tools > > (e.g. git filter-repo just to name a personal example) won't need to > > read an index format and will thus be unaffected, and any tools that > > do read the index format will probably eventually learn how to work > > with the new format. > > I can make the change, but I do want to point out that the current > use of a repository extension _does_ mean that tools that (correctly) > interact with a Git repository should fail even if they don't try to > access the index file. This is only something to make this work until > we introduce a new index file format version and then can drop the > extension. Good point, though... > "git filter-repo" _should_ be safe because it's really just shelling > to Git, right? I'm more concerned about tools like libgit2. Yes, libgit2 and jgit and similar tools are clearly going to be affected and deeply. 
Those are of concern, but I suspect most users when they see "external tools" will be thinking of the large multitude of scripts out there that just shell out to git under the hood to provide some higher level wrapper of some sort. And anything that operates that way won't be affected directly by the repository extension. I'm not sure I'd even mark things that shell out to git as _should_ be safe. In general, scripts can make all kinds of assumptions on interpreting output, and I suspect some of those may become invalidated by this new feature. We have a recent guidepost that's very close to home on that too -- git stash had *3* different bugs in it once sparse-checkouts were introduced, based on the fact that it was designed as a just-shell-out-to-low-level-git-commands script and it made assumptions on how those commands worked together. See https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/. Sure git-stash is a builtin (supposedly, anyway), but external tools can make similar logical jumps.
* Re: [PATCH 16/20] sparse-checkout: toggle sparse index from builtin 2021-03-09 20:52 ` Derrick Stolee 2021-03-09 21:03 ` Elijah Newren @ 2021-03-14 20:08 ` Martin Ågren 2021-03-15 13:36 ` Derrick Stolee 1 sibling, 1 reply; 203+ messages in thread From: Martin Ågren @ 2021-03-14 20:08 UTC (permalink / raw) To: Derrick Stolee Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Elijah Newren, Junio C Hamano, Nguyễn Thái Ngọc Duy, Jonathan Nieder, Derrick Stolee, Derrick Stolee On Tue, 9 Mar 2021 at 21:52, Derrick Stolee <stolee@gmail.com> wrote: > > I agree that the layers are confusing. We could rearrange and have > a similar flow to what you recommend by mentioning the extension at > the end: > > **WARNING:** Using a sparse index requires modifying the index in a way > that is not completely understood by other tools. If you have trouble with > this compatibility, then run `git sparse-checkout sparse-index disable` to > rewrite your index to not be sparse. Older versions of Git will not > understand the `sparseIndex` repository extension and may fail to interact > with your repository until it is disabled. I like it. I find this easier to read than the previous version. That said, is `git sparse-index sparse-checkout disable` really the way to do this? I don't see a "sparse-index" subcommand of git-sparse-checkout. ... Hmm, no, after building and installing your patches, I get $ git sparse-checkout sparse-index disable usage: git sparse-checkout (init|list|set|add|reapply|disable) <options> Should that be `git sparse-checkout init --no-sparse-index`? I just tried that on a fresh, empty repo. It seems to work in the sense that it drops the config item. I'm guessing re-initing a sparse checkout is a safe and sane thing to do? I don't find any tests for this. If re-initing should be ok and in particular if it should allow toggling the use of sparse index, it might be good having a test. 
At a minimum to see that the command passes and that the config item goes away? And check that the actual index is rewritten back to the "old" format? (Sorry if you have that already and I'm just bad at finding it.) Martin
* Re: [PATCH 16/20] sparse-checkout: toggle sparse index from builtin 2021-03-14 20:08 ` Martin Ågren @ 2021-03-15 13:36 ` Derrick Stolee 0 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-15 13:36 UTC (permalink / raw) To: Martin Ågren Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Elijah Newren, Junio C Hamano, Nguyễn Thái Ngọc Duy, Jonathan Nieder, Derrick Stolee, Derrick Stolee On 3/14/2021 4:08 PM, Martin Ågren wrote: > On Tue, 9 Mar 2021 at 21:52, Derrick Stolee <stolee@gmail.com> wrote: >> >> I agree that the layers are confusing. We could rearrange and have >> a similar flow to what you recommend by mentioning the extension at >> the end: >> >> **WARNING:** Using a sparse index requires modifying the index in a way >> that is not completely understood by other tools. If you have trouble with >> this compatibility, then run `git sparse-checkout sparse-index disable` to >> rewrite your index to not be sparse. Older versions of Git will not >> understand the `sparseIndex` repository extension and may fail to interact >> with your repository until it is disabled. > > I like it. I find this easier to read than the previous version. That > said, is `git sparse-index sparse-checkout disable` really the way to do > this? I don't see a "sparse-index" subcommand of git-sparse-checkout. > ... Hmm, no, after building and installing your patches, I get > > $ git sparse-checkout sparse-index disable > usage: git sparse-checkout (init|list|set|add|reapply|disable) <options> > > Should that be `git sparse-checkout init --no-sparse-index`? I just > tried that on a fresh, empty repo. It seems to work in the sense that it > drops the config item. I'm guessing re-initing a sparse checkout is a > safe and sane thing to do? Yes! Sorry I missed updating this instance when changing the design. Your suggestion is indeed the proper way to disable the sparse-index. > I don't find any tests for this. 
If re-initing should be ok and in > particular if it should allow toggling the use of sparse index, it might > be good having a test. At a minimum to see that the command passes and > that the config item goes away? And check that the actual index is > rewritten back to the "old" format? (Sorry if you have that already and > I'm just bad at finding it.) We have tests already that 'git sparse-checkout init' will preserve existing sparse-checkout patterns. I should definitely have a test to ensure that '--no-sparse-index' rewrites the index to be a full one. Thanks! -Stolee
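A minimal sketch of the test discussed above, written in the style of t1091-sparse-checkout-builtin.sh. It assumes it runs immediately after the 'sparse-index enabled and disabled' test from PATCH 17 (the test name and placement are hypothetical). It uses only commands that already appear in this series: `init --no-sparse-index` (which Martin reports drops the config item), `test_cmp_config`, and `test-tool read-cache --table`. The last three assertions encode the expected behavior (a full index with no extension), not verified results.

```shell
# Hypothetical test for t1091-sparse-checkout-builtin.sh: re-running
# "init" with --no-sparse-index should keep the existing patterns, drop
# extensions.sparseIndex, and expand sparse directory entries again.
test_expect_success 'init --no-sparse-index rewrites a full index' '
	git -C repo sparse-checkout init --cone --sparse-index &&
	test_cmp_config -C repo true extensions.sparseIndex &&
	test-tool -C repo read-cache --table >cache &&
	grep " tree " cache &&

	git -C repo sparse-checkout init --cone --no-sparse-index &&
	test-tool -C repo read-cache --table >cache &&
	! grep " tree " cache &&
	git -C repo config --list >config &&
	! grep extensions.sparseindex config
'
```

Checking the `--table` output for `tree` entries follows the approach of the existing PATCH 17 test. It verifies that the on-disk index was actually rewritten, not only that the config changed.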
* [PATCH 17/20] sparse-checkout: disable sparse-index 2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget ` (15 preceding siblings ...) 2021-02-23 20:14 ` [PATCH 16/20] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 ` Derrick Stolee via GitGitGadget 2021-02-27 12:32 ` SZEDER Gábor 2021-02-23 20:14 ` [PATCH 18/20] cache-tree: integrate with sparse directory entries Derrick Stolee via GitGitGadget ` (4 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC (permalink / raw) To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We use 'git sparse-checkout init --cone --sparse-index' to toggle the sparse-index feature. It makes sense to also disable it when running 'git sparse-checkout disable'. This is particularly important because it removes the extensions.sparseIndex config option, allowing other tools to use this Git repository again. This does mean that 'git sparse-checkout init' will not re-enable the sparse-index feature, even if it was previously enabled. While testing this feature, I noticed that the sparse-index was not being written on the first run, but by a second. This was caught by the call to 'test-tool read-cache --table'. This requires adjusting some assignments to core_apply_sparse_checkout and pl.use_cone_patterns in the sparse_checkout_init() logic. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/sparse-checkout.c | 10 +++++++++- t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index ca63e2c64e95..585343fa1972 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -280,6 +280,9 @@ static int set_config(enum sparse_checkout_mode mode) "core.sparseCheckoutCone", mode == MODE_CONE_PATTERNS ? "true" : NULL); + if (mode == MODE_NO_PATTERNS) + set_sparse_index_config(the_repository, 0); + return 0; } @@ -341,10 +344,11 @@ static int sparse_checkout_init(int argc, const char **argv) the_repository->index->updated_workdir = 1; } + core_apply_sparse_checkout = 1; + /* If we already have a sparse-checkout file, use it. */ if (res >= 0) { free(sparse_filename); - core_apply_sparse_checkout = 1; return update_working_directory(NULL); } @@ -366,6 +370,7 @@ static int sparse_checkout_init(int argc, const char **argv) add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); strbuf_addstr(&pattern, "!/*/"); add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); + pl.use_cone_patterns = init_opts.cone_mode; return write_patterns_and_update(&pl); } @@ -632,6 +637,9 @@ static int sparse_checkout_disable(int argc, const char **argv) strbuf_addstr(&match_all, "/*"); add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0); + prepare_repo_settings(the_repository); + the_repository->settings.sparse_index = 0; + if (update_working_directory(&pl)) die(_("error while refreshing working directory")); diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh index fc64e9ed99f4..ff1ad570a255 100755 --- a/t/t1091-sparse-checkout-builtin.sh +++ b/t/t1091-sparse-checkout-builtin.sh @@ -205,6 +205,19 @@ test_expect_success 'sparse-checkout disable' ' check_files repo a deep folder1 folder2 ' +test_expect_success 'sparse-index 
enabled and disabled' ' + git -C repo sparse-checkout init --cone --sparse-index && + test_cmp_config -C repo true extensions.sparseIndex && + test-tool -C repo read-cache --table >cache && + grep " tree " cache && + + git -C repo sparse-checkout disable && + test-tool -C repo read-cache --table >cache && + ! grep " tree " cache && + git -C repo config --list >config && + ! grep extensions.sparseindex config +' + test_expect_success 'cone mode: init and set' ' git -C repo sparse-checkout init --cone && git -C repo config --list >config && -- gitgitgadget
* Re: [PATCH 17/20] sparse-checkout: disable sparse-index 2021-02-23 20:14 ` [PATCH 17/20] sparse-checkout: disable sparse-index Derrick Stolee via GitGitGadget @ 2021-02-27 12:32 ` SZEDER Gábor 2021-03-09 20:20 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: SZEDER Gábor @ 2021-02-27 12:32 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee On Tue, Feb 23, 2021 at 08:14:26PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> > > We use 'git sparse-checkout init --cone --sparse-index' to toggle the > sparse-index feature. It makes sense to also disable it when running > 'git sparse-checkout disable'. This is particularly important because it > removes the extensions.sparseIndex config option, allowing other tools > to use this Git repository again. > > This does mean that 'git sparse-checkout init' will not re-enable the > sparse-index feature, even if it was previously enabled. > > While testing this feature, I noticed that the sparse-index was not > being written on the first run, but by a second. This was caught by the > call to 'test-tool read-cache --table'. This requires adjusting some > assignments to core_apply_sparse_checkout and pl.use_cone_patterns in > the sparse_checkout_init() logic. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > builtin/sparse-checkout.c | 10 +++++++++- > t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++ > 2 files changed, 22 insertions(+), 1 deletion(-) > > diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c > index ca63e2c64e95..585343fa1972 100644 > --- a/builtin/sparse-checkout.c > +++ b/builtin/sparse-checkout.c > @@ -280,6 +280,9 @@ static int set_config(enum sparse_checkout_mode mode) > "core.sparseCheckoutCone", > mode == MODE_CONE_PATTERNS ? 
"true" : NULL); > > + if (mode == MODE_NO_PATTERNS) > + set_sparse_index_config(the_repository, 0); > + > return 0; > } > > @@ -341,10 +344,11 @@ static int sparse_checkout_init(int argc, const char **argv) > the_repository->index->updated_workdir = 1; > } > > + core_apply_sparse_checkout = 1; > + > /* If we already have a sparse-checkout file, use it. */ > if (res >= 0) { > free(sparse_filename); > - core_apply_sparse_checkout = 1; > return update_working_directory(NULL); > } > > @@ -366,6 +370,7 @@ static int sparse_checkout_init(int argc, const char **argv) > add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); > strbuf_addstr(&pattern, "!/*/"); > add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); > + pl.use_cone_patterns = init_opts.cone_mode; > > return write_patterns_and_update(&pl); > } > @@ -632,6 +637,9 @@ static int sparse_checkout_disable(int argc, const char **argv) > strbuf_addstr(&match_all, "/*"); > add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0); > > + prepare_repo_settings(the_repository); > + the_repository->settings.sparse_index = 0; > + > if (update_working_directory(&pl)) > die(_("error while refreshing working directory")); > > diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh > index fc64e9ed99f4..ff1ad570a255 100755 > --- a/t/t1091-sparse-checkout-builtin.sh > +++ b/t/t1091-sparse-checkout-builtin.sh > @@ -205,6 +205,19 @@ test_expect_success 'sparse-checkout disable' ' > check_files repo a deep folder1 folder2 > ' > > +test_expect_success 'sparse-index enabled and disabled' ' > + git -C repo sparse-checkout init --cone --sparse-index && > + test_cmp_config -C repo true extensions.sparseIndex && > + test-tool -C repo read-cache --table >cache && > + grep " tree " cache && > + > + git -C repo sparse-checkout disable && > + test-tool -C repo read-cache --table >cache && > + ! grep " tree " cache && > + git -C repo config --list >config && > + ! 
grep extensions.sparseindex config > +' This test passes with GIT_TEST_SPLIT_INDEX=1 at the moment, because, unfortunately, GIT_TEST_SPLIT_INDEX has been broken for the past two years. However, if I run it with my WIP fixes for that issue [1], then it will fail: +git -C repo sparse-checkout init --cone --sparse-index +test_cmp_config -C repo true extensions.sparseIndex +test-tool -C repo read-cache --table +grep tree cache error: last command exited with $?=1 not ok 16 - sparse-index enabled and disabled https://travis-ci.com/github/szeder/git-cooking-topics-for-travis-ci/jobs/486702444#L2594 [1] Try to run it with: https://github.com/szeder/git split-index-fixes The code is, I believe, close to final, the commit messages, however, are far from being finished. > + > test_expect_success 'cone mode: init and set' ' > git -C repo sparse-checkout init --cone && > git -C repo config --list >config && > -- > gitgitgadget >
* Re: [PATCH 17/20] sparse-checkout: disable sparse-index 2021-02-27 12:32 ` SZEDER Gábor @ 2021-03-09 20:20 ` Derrick Stolee 2021-03-10 18:20 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee @ 2021-03-09 20:20 UTC (permalink / raw) To: SZEDER Gábor, Derrick Stolee via GitGitGadget Cc: git, newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee On 2/27/2021 7:32 AM, SZEDER Gábor wrote: > On Tue, Feb 23, 2021 at 08:14:26PM +0000, Derrick Stolee via GitGitGadget wrote: >> +test_expect_success 'sparse-index enabled and disabled' ' >> + git -C repo sparse-checkout init --cone --sparse-index && >> + test_cmp_config -C repo true extensions.sparseIndex && >> + test-tool -C repo read-cache --table >cache && >> + grep " tree " cache && >> + >> + git -C repo sparse-checkout disable && >> + test-tool -C repo read-cache --table >cache && >> + ! grep " tree " cache && >> + git -C repo config --list >config && >> + ! grep extensions.sparseindex config >> +' > > This test passes with GIT_TEST_SPLIT_INDEX=1 at the moment, because, > unfortunately, GIT_TEST_SPLIT_INDEX has been broken for the past two > years. However, if I run it with my WIP fixes for that issue [1], > then it will fail: > > +git -C repo sparse-checkout init --cone --sparse-index > +test_cmp_config -C repo true extensions.sparseIndex > +test-tool -C repo read-cache --table > +grep tree cache > error: last command exited with $?=1 > not ok 16 - sparse-index enabled and disabled > > https://travis-ci.com/github/szeder/git-cooking-topics-for-travis-ci/jobs/486702444#L2594 > > [1] Try to run it with: > > https://github.com/szeder/git split-index-fixes > > The code is, I believe, close to final, the commit messages, > however, are far from being finished. I'll keep that in mind. I should have added a variable that disables GIT_TEST_SPLIT_INDEX for this test script, since the sparse-index is (currently) incompatible with the split-index. 
I bet that the test is failing because it isn't actually writing the sparse-directory entry due to that short-circuit check. Thanks, -Stolee
* Re: [PATCH 17/20] sparse-checkout: disable sparse-index
From: Derrick Stolee @ 2021-03-10 18:20 UTC
To: SZEDER Gábor, Derrick Stolee via GitGitGadget
Cc: git, newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee

On 3/9/2021 3:20 PM, Derrick Stolee wrote:
> On 2/27/2021 7:32 AM, SZEDER Gábor wrote:
>> On Tue, Feb 23, 2021 at 08:14:26PM +0000, Derrick Stolee via GitGitGadget wrote:
>>> +test_expect_success 'sparse-index enabled and disabled' '
>>> +	git -C repo sparse-checkout init --cone --sparse-index &&
>>> +	test_cmp_config -C repo true extensions.sparseIndex &&
>>> +	test-tool -C repo read-cache --table >cache &&
>>> +	grep " tree " cache &&
>>> +
>>> +	git -C repo sparse-checkout disable &&
>>> +	test-tool -C repo read-cache --table >cache &&
>>> +	! grep " tree " cache &&
>>> +	git -C repo config --list >config &&
>>> +	! grep extensions.sparseindex config
>>> +'
>>
>> This test passes with GIT_TEST_SPLIT_INDEX=1 at the moment, because,
>> unfortunately, GIT_TEST_SPLIT_INDEX has been broken for the past two
>> years. However, if I run it with my WIP fixes for that issue [1],
>> then it will fail:
>>
>>   +git -C repo sparse-checkout init --cone --sparse-index
>>   +test_cmp_config -C repo true extensions.sparseIndex
>>   +test-tool -C repo read-cache --table
>>   +grep tree cache
>>   error: last command exited with $?=1
>>   not ok 16 - sparse-index enabled and disabled
>>
>> https://travis-ci.com/github/szeder/git-cooking-topics-for-travis-ci/jobs/486702444#L2594
>>
>> [1] Try to run it with:
>>
>>       https://github.com/szeder/git split-index-fixes
>>
>>     The code is, I believe, close to final, the commit messages,
>>     however, are far from being finished.
>
> I'll keep that in mind. I should have added a variable
> that disables GIT_TEST_SPLIT_INDEX for this test script,
> since the sparse-index is (currently) incompatible with
> the split-index. I bet that the test is failing because
> it isn't actually writing the sparse-directory entry due
> to that short-circuit check.

The next version will include GIT_TEST_SPLIT_INDEX=0 at the start
and that will make it work with your branch.

Thanks,
-Stolee
* [PATCH 18/20] cache-tree: integrate with sparse directory entries
From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC
To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c   | 18 ++++++++++++++++++
 sparse-index.c | 10 +++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 5f07a39e501e..950a9615db8f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -256,6 +256,24 @@ static int update_one(struct cache_tree *it,
 
 	*skip_count = 0;
 
+	/*
+	 * If the first entry of this region is a sparse directory
+	 * entry corresponding exactly to 'base', then this cache_tree
+	 * struct is a "leaf" in the data structure, pointing to the
+	 * tree OID specified in the entry.
+	 */
+	if (entries > 0) {
+		const struct cache_entry *ce = cache[0];
+
+		if (S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce->ce_namelen == baselen &&
+		    !strncmp(ce->name, base, baselen)) {
+			it->entry_count = 1;
+			oidcpy(&it->oid, &ce->oid);
+			return 1;
+		}
+	}
+
 	if (0 <= it->entry_count && has_object_file(&it->oid))
 		return it->entry_count;
 
diff --git a/sparse-index.c b/sparse-index.c
index a991c5331e9e..e541f251b37a 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -180,7 +180,11 @@ int convert_to_sparse(struct index_state *istate)
 	istate->cache_nr = convert_to_sparse_rec(istate,
 						 0, 0, istate->cache_nr,
 						 "", 0, istate->cache_tree);
-	istate->drop_cache_tree = 1;
+
+	/* Clear and recompute the cache-tree */
+	cache_tree_free(&istate->cache_tree);
+	cache_tree_update(istate, 0);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -278,5 +282,9 @@ void ensure_full_index(struct index_state *istate)
 
 	free(full);
 
+	/* Clear and recompute the cache-tree */
+	cache_tree_free(&istate->cache_tree);
+	cache_tree_update(istate, 0);
+
 	trace2_region_leave("index", "ensure_full_index", istate->repo);
 }
-- 
gitgitgadget
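To illustrate the leaf check added to update_one() above, here is a minimal standalone sketch. It models only the fields the check reads. The struct `toy_entry`, the function `is_sparse_leaf()`, and the `TOY_IS_SPARSEDIR()` macro are hypothetical simplifications, not Git's real types. The macro stands in for `S_ISSPARSEDIR()` by testing for the directory mode `0040000` that the index-format update in this series assigns to sparse-directory entries. The sketch also assumes, as the `ce_namelen == baselen` comparison implies, that both `base` and a sparse-directory name end with '/'.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for S_ISSPARSEDIR(): sparse-directory entries
 * carry the directory mode 0040000.
 */
#define TOY_MODE_DIR 0040000u
#define TOY_IS_SPARSEDIR(m) ((m) == TOY_MODE_DIR)

/* Simplified, hypothetical index entry holding only what the check needs. */
struct toy_entry {
	unsigned int mode;
	const char *name;	/* sparse directories end in '/' */
	size_t namelen;
};

/*
 * Mirrors the leaf test in update_one(): the region cache[0..entries)
 * is a cache-tree leaf when its first entry is a sparse directory whose
 * name equals 'base' exactly (trailing '/' included).
 * Returns 1 for a leaf (entry_count would become 1), else 0.
 */
static int is_sparse_leaf(const struct toy_entry *cache, int entries,
			  const char *base, size_t baselen)
{
	const struct toy_entry *ce;

	assert(entries >= 0);
	if (entries == 0)
		return 0;

	ce = &cache[0];
	return TOY_IS_SPARSEDIR(ce->mode) &&
	       ce->namelen == baselen &&
	       !strncmp(ce->name, base, baselen);
}
```

For example, a region that starts with the entry `{0040000, "folder1/"}` is a leaf for `base = "folder1/"`. The same region is not a leaf for `"folder1/sub/"`, because the name lengths differ. A regular file such as `"folder1/a"` with mode `0100644` never qualifies. Because the recomputation starts from an empty cache-tree, this one early return is all the integration the cache-tree needs.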
* [PATCH 19/20] sparse-index: loose integration with cache_tree_verify()
From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC
To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_tree_verify() method is run when GIT_TEST_CHECK_CACHE_TREE
is enabled, which it is by default in the test suite. The logic must
be adjusted for the presence of these directory entries.

For now, leave the test as a simple check for whether the directory
entry is sparse. Do not go any further until needed.

This allows us to re-enable GIT_TEST_CHECK_CACHE_TREE in
t1092-sparse-checkout-compatibility.sh. Further,
p2000-sparse-operations.sh uses the test suite and hence this is
enabled for all tests. We need to integrate with it before we run our
performance tests with a sparse-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c                             | 19 +++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh |  1 -
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 950a9615db8f..11bf1fcae6e1 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -808,6 +808,19 @@ int cache_tree_matches_traversal(struct cache_tree *root,
 	return 0;
 }
 
+static void verify_one_sparse(struct repository *r,
+			      struct index_state *istate,
+			      struct cache_tree *it,
+			      struct strbuf *path,
+			      int pos)
+{
+	struct cache_entry *ce = istate->cache[pos];
+
+	if (!S_ISSPARSEDIR(ce->ce_mode))
+		BUG("directory '%s' is present in index, but not sparse",
+		    path->buf);
+}
+
 static void verify_one(struct repository *r,
 		       struct index_state *istate,
 		       struct cache_tree *it,
@@ -830,6 +843,12 @@ static void verify_one(struct repository *r,
 
 	if (path->len) {
 		pos = index_name_pos(istate, path->buf, path->len);
+
+		if (pos >= 0) {
+			verify_one_sparse(r, istate, it, path, pos);
+			return;
+		}
+
 		pos = -pos - 1;
 	} else {
 		pos = 0;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 9c2bc4d25f66..c2624176c2e0 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2,7 +2,6 @@
 
 test_description='compare full workdir to sparse workdir'
 
-GIT_TEST_CHECK_CACHE_TREE=0
 GIT_TEST_SPLIT_INDEX=0
 GIT_TEST_SPARSE_INDEX=
 
-- 
gitgitgadget
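The `pos >= 0` branch added to verify_one() relies on the return convention of index_name_pos(). A non-negative value is the position of an exact match. A miss is reported as `-(insertion point) - 1`, which the caller undoes with `pos = -pos - 1`. Before sparse directories existed, a directory path could never match exactly, so only the negative case occurred.

The following is a minimal sketch of that convention over a sorted array of names. `toy_name_pos()` is a hypothetical helper that uses plain strcmp() ordering. Git's real lookup also takes merge stages into account.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper modelling index_name_pos()'s return convention
 * on a sorted array of entry names:
 *   exact match -> its index (>= 0)
 *   no match    -> -(insertion point) - 1 (always < 0)
 */
static int toy_name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	assert(nr >= 0);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}
```

The names `{"a", "folder1/", "folder2/x", "z"}` illustrate both cases. Looking up `"folder1/"` hits a sparse-directory entry at index 1, which is the case verify_one() now hands to verify_one_sparse(). Looking up `"folder2/"` misses and returns -3. Applying `-pos - 1` then yields 2, the position where the entries under `folder2/` begin.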
* [PATCH 20/20] p2000: add sparse-index repos
From: Derrick Stolee via GitGitGadget @ 2021-02-23 20:14 UTC
To: git; +Cc: newren, gitster, pclouds, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

p2000-sparse-operations.sh compares different Git commands in
repositories with many files at HEAD but using sparse-checkout to focus
on a small portion of those files.

Add extra copies of the repository that use the sparse-index format so
we can track how that affects the performance of different commands.

At this point in time, the sparse-index is pure overhead on the CPU
front, and this is measurable in these tests:

Test
---------------------------------------------------------------
2000.2: git status (full-index-v3)              0.59(0.51+0.12)
2000.3: git status (full-index-v4)              0.59(0.52+0.11)
2000.4: git status (sparse-index-v3)            1.40(1.32+0.12)
2000.5: git status (sparse-index-v4)            1.41(1.36+0.08)
2000.6: git add -A (full-index-v3)              2.32(1.97+0.19)
2000.7: git add -A (full-index-v4)              2.17(1.92+0.14)
2000.8: git add -A (sparse-index-v3)            2.31(2.21+0.15)
2000.9: git add -A (sparse-index-v4)            2.30(2.20+0.13)
2000.10: git add . (full-index-v3)              2.39(2.02+0.20)
2000.11: git add . (full-index-v4)              2.20(1.94+0.16)
2000.12: git add . (sparse-index-v3)            2.36(2.27+0.12)
2000.13: git add . (sparse-index-v4)            2.33(2.21+0.16)
2000.14: git commit -a -m A (full-index-v3)     2.47(2.12+0.20)
2000.15: git commit -a -m A (full-index-v4)     2.26(2.00+0.17)
2000.16: git commit -a -m A (sparse-index-v3)   3.01(2.92+0.16)
2000.17: git commit -a -m A (sparse-index-v4)   3.01(2.94+0.15)

Note that there is very little difference between the v3 and v4 index
formats when the sparse-index is enabled. This is primarily due to the
fact that the relative file sizes are the same, and the command time is
mostly taken up by parsing tree objects to expand the sparse index into
a full one.

With the current file layout, the index file sizes are given by this
table:

   | full index  | sparse index |
   +-------------+--------------+
v3 |     108 MiB |      1.6 MiB |
v4 |      80 MiB |      1.2 MiB |

Future updates will improve the performance of Git commands when the
index is sparse.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p2000-sparse-operations.sh | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 52597683376e..f9c7f3c6e27e 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -62,12 +62,29 @@ test_expect_success 'setup repo and indexes' '
 		git sparse-checkout set $SPARSE_CONE &&
 		git config index.version 4 &&
 		git update-index --index-version=4
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . sparse-index-v3 &&
+	(
+		cd sparse-index-v3 &&
+		git sparse-checkout init --cone --sparse-index &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 3 &&
+		git update-index --index-version=3
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . sparse-index-v4 &&
+	(
+		cd sparse-index-v4 &&
+		git sparse-checkout init --cone --sparse-index &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 4 &&
+		git update-index --index-version=4
 	)
 '
 
 test_perf_on_all () {
 	command="$@"
-	for repo in full-index-v3 full-index-v4
+	for repo in full-index-v3 full-index-v4 \
+		sparse-index-v3 sparse-index-v4
 	do
 		test_perf "$command ($repo)" "
 			(
-- 
gitgitgadget
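The drop in index size shown in the table in this commit message comes from replacing every path under an out-of-cone directory with one sparse-directory entry. The sketch below shows that collapsing step in its simplest form. It is a hypothetical simplification that handles only one level:

* Only top-level directories are considered.
* `toy_in_cone()` is an assumed predicate over a fixed list of cone directories.
* Files at the repository root are always kept, as cone mode does.

The real convert_to_sparse() walks the cache-tree recursively and also records each collapsed directory's tree OID.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_NAME_MAX 64

/* Hypothetical cone definition: top-level directories kept in full. */
static const char *toy_cone_dirs[] = { "a" };

static int toy_in_cone(const char *dir, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(toy_cone_dirs) / sizeof(toy_cone_dirs[0]); i++)
		if (strlen(toy_cone_dirs[i]) == len &&
		    !strncmp(toy_cone_dirs[i], dir, len))
			return 1;
	return 0;
}

/*
 * Collapse a sorted list of file paths: every path under an out-of-cone
 * top-level directory is replaced by a single "dir/" entry. Writes at
 * most 'max' names into 'out' and returns how many were written.
 */
static int toy_collapse(const char **paths, int nr,
			char out[][TOY_NAME_MAX], int max)
{
	int i, n = 0;

	for (i = 0; i < nr; i++) {
		const char *slash = strchr(paths[i], '/');
		size_t dirlen;

		if (!slash || toy_in_cone(paths[i], (size_t)(slash - paths[i]))) {
			/* Root file or in-cone path: keep it as-is. */
			assert(n < max);
			snprintf(out[n++], TOY_NAME_MAX, "%s", paths[i]);
			continue;
		}

		/* Out of cone: emit "dir/" once, then skip the rest of dir. */
		dirlen = (size_t)(slash - paths[i]) + 1;
		assert(n < max);
		snprintf(out[n++], TOY_NAME_MAX, "%.*s", (int)dirlen, paths[i]);
		while (i + 1 < nr && !strncmp(paths[i + 1], paths[i], dirlen))
			i++;
	}
	return n;
}
```

Consider the sorted paths `README`, `a/x`, `a/y`, `b/c/d`, `b/e`, `c.txt` with only `a` in the cone. They collapse to `README`, `a/x`, `a/y`, `b/`, `c.txt`. The skip loop depends on the index being sorted: sorting keeps all paths that share the `b/` prefix contiguous.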
* Re: [PATCH 00/20] Sparse Index: Design, Format, Tests
From: Elijah Newren @ 2021-02-23 23:49 UTC
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee

On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the first full patch series submission coming out of the
> sparse-index RFC [1].

Wahoo! I'll be reading these over the next few days.

> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
* Re: [PATCH 00/20] Sparse Index: Design, Format, Tests
From: Elijah Newren @ 2021-02-26 21:28 UTC
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Derrick Stolee

On Tue, Feb 23, 2021 at 3:49 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > Here is the first full patch series submission coming out of the
> > sparse-index RFC [1].
>
> Wahoo! I'll be reading these over the next few days.

I finally finished the last five patches today, and didn't spot
anything on those to comment on.

Overall, I find the series well constructed, motivated, and explained.
I've left various comments on individual patches, but they're mostly
all minor things that should be easy to clean up and/or just comment
on.
* [PATCH v2 00/20] Sparse Index: Design, Format, Tests
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee

Here is the first full patch series submission coming out of the
sparse-index RFC [1].

[1] https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

I won't waste too much space here, because PATCH 1 includes a sizeable
design document that describes the feature, the reasoning behind it, and
my plan for getting this implemented widely throughout the codebase.

There are some new things here that were not in the RFC:

* Design doc and format updates. (Patch 1)
* Performance test script. (Patches 2 and 20)

Notably missing in this series from the RFC:

* The mega-patch inserting ensure_full_index() throughout the codebase.
  That will be a follow-up series to this one.
* The integrations with git status and git add to demonstrate the
  improved performance. Those will also appear in their own series later.

I plan to keep my latest work in this area in my 'sparse-index/wip'
branch [2]. It includes all of the work from the RFC right now, updated
with the work from this series.

[2] https://github.com/derrickstolee/git/tree/sparse-index/wip

Updates in V2
=============

* Various typos and awkward grammar are fixed.
* Cleaned up unnecessary commands in p2000-sparse-operations.sh
* Added a comment to the sparse_index member of struct index_state.
* Used tree_type, commit_type, and blob_type in test-read-cache.c. Thanks, -Stolee Derrick Stolee (20): sparse-index: design doc and format update t/perf: add performance test for sparse operations t1092: clean up script quoting sparse-index: add guard to ensure full index sparse-index: implement ensure_full_index() t1092: compare sparse-checkout to sparse-index test-read-cache: print cache entries with --table test-tool: don't force full index unpack-trees: ensure full index sparse-checkout: hold pattern list in index sparse-index: convert from full to sparse submodule: sparse-index should not collapse links unpack-trees: allow sparse directories sparse-index: check index conversion happens sparse-index: create extension for compatibility sparse-checkout: toggle sparse index from builtin sparse-checkout: disable sparse-index cache-tree: integrate with sparse directory entries sparse-index: loose integration with cache_tree_verify() p2000: add sparse-index repos Documentation/config/extensions.txt | 8 + Documentation/git-sparse-checkout.txt | 14 ++ Documentation/technical/index-format.txt | 7 + Documentation/technical/sparse-index.txt | 173 ++++++++++++++ Makefile | 1 + builtin/sparse-checkout.c | 44 +++- cache-tree.c | 40 ++++ cache.h | 18 +- read-cache.c | 35 ++- repo-settings.c | 15 ++ repository.c | 11 +- repository.h | 3 + setup.c | 3 + sparse-index.c | 290 +++++++++++++++++++++++ sparse-index.h | 11 + t/README | 3 + t/helper/test-read-cache.c | 66 +++++- t/perf/p2000-sparse-operations.sh | 102 ++++++++ t/t1091-sparse-checkout-builtin.sh | 13 + t/t1092-sparse-checkout-compatibility.sh | 136 +++++++++-- unpack-trees.c | 16 +- 21 files changed, 969 insertions(+), 40 deletions(-) create mode 100644 Documentation/technical/sparse-index.txt create mode 100644 sparse-index.c create mode 100644 sparse-index.h create mode 100755 t/perf/p2000-sparse-operations.sh base-commit: 966e671106b2fd38301e7c344c754fd118d0bb07 Published-As: 
https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/883 Range-diff vs v1: 1: daa9a6bcefbc ! 1: 2fe413fdac80 sparse-index: design doc and format update @@ Documentation/technical/sparse-index.txt (new) +If we need to discover the details for paths within that directory, we +can parse trees to find that list. + -+This addition of sparse-directory entries violates expectations about the ++At the time of writing, sparse-directory entries violate expectations about the +index format and its in-memory data structure. There are many consumers in +the codebase that expect to iterate through all of the index entries and +see only files. In addition, they expect to see all files at `HEAD`. One @@ Documentation/technical/sparse-index.txt (new) +* `git merge` +* `git rebase` + ++Hopefully, commands such as `git merge` and `git rebase` can benefit ++instead from merge algorithms that do not use the index as a data ++structure, such as the merge-ORT strategy. As these topics mature, we ++may enable the ORT strategy by default for repositories using the ++sparse-index feature. ++ +Along with `git status` and `git add`, these commands cover the majority +of users' interactions with the working directory. In addition, we can +integrate with these commands: 2: a8c6322a3dbe ! 2: 540ab5495065 t/perf: add performance test for sparse operations @@ t/perf/p2000-sparse-operations.sh (new) + # Remove submodules from the example repo, because our + # duplication of the entire repo creates an unlikely data shape. + git config --file .gitmodules --get-regexp "submodule.*.path" >modules && -+ rm -f .gitmodules && -+ git add .gitmodules && ++ git rm -f .gitmodules && + for module in $(awk "{print \$2}" modules) + do + git rm $module || return 1 + done && -+ git add . 
&& + git commit -m "remove submodules" && + + echo bogus >a && 3: 6e783c88821e = 3: 5cbedb377b37 t1092: clean up script quoting 4: 01da4c48a1fa = 4: 6e21f776e883 sparse-index: add guard to ensure full index 5: 2b83989fbcd3 ! 5: 399ddb0bad56 sparse-index: implement ensure_full_index() @@ cache.h: struct index_state { updated_skipworktree : 1, - fsmonitor_has_run_once : 1; + fsmonitor_has_run_once : 1, ++ ++ /* ++ * sparse_index == 1 when sparse-directory ++ * entries exist. Requires sparse-checkout ++ * in cone mode. ++ */ + sparse_index : 1; struct hashmap name_hash; struct hashmap dir_hash; 6: c9910a37579c = 6: eac2db5efc22 t1092: compare sparse-checkout to sparse-index 7: 3d92df7a0cf9 ! 7: e9c82d2eda82 test-read-cache: print cache entries with --table @@ Commit message ## t/helper/test-read-cache.c ## @@ + #include "test-tool.h" #include "cache.h" #include "config.h" - ++#include "blob.h" ++#include "commit.h" ++#include "tree.h" ++ +static void print_cache_entry(struct cache_entry *ce) +{ -+ printf("%06o ", ce->ce_mode & 0777777); ++ const char *type; ++ printf("%06o ", ce->ce_mode & 0177777); + + if (S_ISSPARSEDIR(ce->ce_mode)) -+ printf("tree "); ++ type = tree_type; + else if (S_ISGITLINK(ce->ce_mode)) -+ printf("commit "); ++ type = commit_type; + else -+ printf("blob "); ++ type = blob_type; + -+ printf("%s\t%s\n", ++ printf("%s %s\t%s\n", ++ type, + oid_to_hex(&ce->oid), + ce->name); +} + -+static void print_cache(struct index_state *cache) ++static void print_cache(struct index_state *istate) +{ + int i; -+ for (i = 0; i < the_index.cache_nr; i++) -+ print_cache_entry(the_index.cache[i]); ++ for (i = 0; i < istate->cache_nr; i++) ++ print_cache_entry(istate->cache[i]); +} -+ + int cmd__read_cache(int argc, const char **argv) { + struct repository *r = the_repository; 8: 94373e2bfbbc ! 
8: 243541fc5820 test-tool: don't force full index @@ Commit message ## t/helper/test-read-cache.c ## @@ - #include "test-tool.h" - #include "cache.h" - #include "config.h" + #include "blob.h" + #include "commit.h" + #include "tree.h" +#include "sparse-index.h" static void print_cache_entry(struct cache_entry *ce) 9: e71f033c2871 = 9: 48f65093b3da unpack-trees: ensure full index 10: f86d3dc154d1 ! 10: 83aac8b7a1ec sparse-checkout: hold pattern list in index @@ Commit message pattern set, we need access to that in-memory copy. Place a pointer to a 'struct pattern_list' in the index so we can access this on-demand. This will be used in the next change which uses the sparse-checkout - definition to filter out directories that are outsie the sparse cone. + definition to filter out directories that are outside the sparse cone. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> 11: a2d77c23a0cb ! 11: f6db0c27a285 sparse-index: convert from full to sparse @@ read-cache.c: int verify_path(const char *path, unsigned mode) return 0; + /* + * allow terminating directory separators for -+ * sparse directory enries. ++ * sparse directory entries. + */ + if (c == '\0') + return S_ISDIR(mode); @@ sparse-index.c + struct cache_entry *ce = istate->cache[i]; + + /* -+ * Detect if this is a normal entry oustide of any subtree ++ * Detect if this is a normal entry outside of any subtree + * entry. + */ + base = ce->name + ct_pathlen; 12: 4405a9115c3b = 12: f2a3e7298798 submodule: sparse-index should not collapse links 13: fda23f07e6a2 ! 13: 6f1ebe6ccc08 unpack-trees: allow sparse directories @@ Commit message is possible to have a directory in a sparse index as long as that entry is itself marked with the skip-worktree bit. - The negation of the 'pos' variable must be conditioned to only when it - starts as negative. This is identical behavior as before when the index - is full. + The 'pos' variable is assigned a negative value if an exact match is not + found. 
Since a directory name can be an exact match, it is no longer an + error to have a nonnegative 'pos' value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> 14: 7d4627574bb8 = 14: 3fa684b315fb sparse-index: check index conversion happens 15: 564503f78784 ! 15: d74576d677f6 sparse-index: create extension for compatibility @@ Commit message We _could_ add a new index version that explicitly adds these capabilities, but there are nuances to index formats 2, 3, and 4 that - are still valuable to select as options. For now, create a repo - extension, "extensions.sparseIndex", that specifies that the tool - reading this repository must understand sparse directory entries. + are still valuable to select as options. Until we add index format + version 5, create a repo extension, "extensions.sparseIndex", that + specifies that the tool reading this repository must understand sparse + directory entries. This change only encodes the extension and enables it when GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI @@ Documentation/config/extensions.txt: extensions.objectFormat:: + When combined with `core.sparseCheckout=true` and + `core.sparseCheckoutCone=true`, the index may contain entries + corresponding to directories outside of the sparse-checkout -+ definition. Versions of Git that do not understand this extension -+ do not expect directory entries in the index. ++ definition in lieu of containing each path under such directories. ++ Versions of Git that do not understand this extension do not ++ expect directory entries in the index. ## cache.h ## @@ cache.h: struct repository_format { 16: 6d6b230e3318 ! 16: e530ca5f668d sparse-checkout: toggle sparse index from builtin @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees +a sparse index until they are properly integrated with the feature. ++ +**WARNING:** Using a sparse index requires modifying the index in a way -+that is not completely understood by other tools. 
Enabling sparse index -+enables the `extensions.spareseIndex` config value, which might cause -+other tools to stop working with your repository. If you have trouble with -+this compatibility, then run `git sparse-checkout sparse-index disable` to -+remove this config and rewrite your index to not be sparse. ++that is not completely understood by external tools. If you have trouble ++with this compatibility, then run `git sparse-checkout sparse-index disable` ++to rewrite your index to not be sparse. Older versions of Git will not ++understand the `sparseIndex` repository extension and may fail to interact ++with your repository until it is disabled. 'set':: Write a set of patterns to the sparse-checkout file, as given as 17: bcf960ef2362 = 17: 42d0da9c5def sparse-checkout: disable sparse-index 18: e6afec58674e = 18: 6bb0976a6295 cache-tree: integrate with sparse directory entries 19: 2be4981fe698 = 19: 07f34e80609a sparse-index: loose integration with cache_tree_verify() 20: a738b0ba8ab4 = 20: 41e3b56b9c17 p2000: add sparse-index repos -- gitgitgadget
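In the v2 range-diff for patch 7, `test-read-cache --table` switches its type labels to `tree_type`, `commit_type` and `blob_type`, and masks the mode with `0177777`. That output is what the t1091 test greps for `" tree "`.

The following is a standalone sketch of that classification. `toy_entry_type()` and `toy_format_row()` are hypothetical, not Git functions. The mode constants are written out by hand rather than taken from Git's headers, and the object ID is a placeholder string rather than a real hash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hand-written constants assumed to match Git's conventions:
 * 040000 marks a sparse-directory entry, 0160000 a gitlink.
 */
#define TOY_IFMT      0170000u
#define TOY_IFDIR     0040000u
#define TOY_IFGITLINK 0160000u

/* Map an index entry mode to the object type label printed by --table. */
static const char *toy_entry_type(unsigned int mode)
{
	if (mode == TOY_IFDIR)
		return "tree";		/* sparse directory */
	if ((mode & TOY_IFMT) == TOY_IFGITLINK)
		return "commit";	/* submodule */
	return "blob";			/* regular file or symlink */
}

/* Format one row as "<mode> <type> <oid>\t<name>" into buf. */
static int toy_format_row(char *buf, size_t len, unsigned int mode,
			  const char *oid_hex, const char *name)
{
	return snprintf(buf, len, "%06o %s %s\t%s",
			mode & 0177777u, toy_entry_type(mode), oid_hex, name);
}
```

With these definitions, a sparse-directory entry `folder1/` formats as `040000 tree <oid>` followed by a tab and the name. That is why `grep " tree "` is enough to tell whether the written index is sparse.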
* [PATCH v2 01/20] sparse-index: design doc and format update 2021-03-10 19:30 ` [PATCH v2 " Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 ` Derrick Stolee via GitGitGadget 2021-03-10 22:19 ` Elijah Newren 2021-03-10 19:30 ` [PATCH v2 02/20] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget ` (20 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> This begins a long effort to update the index format to allow sparse directory entries. This should result in a significant improvement to Git commands when HEAD contains millions of files, but the user has selected many fewer files to keep in their sparse-checkout definition. Currently, the index format is only updated in the presence of extensions.sparseIndex instead of increasing a file format version number. This is temporary, and index v5 is part of the plan for future work in this area. The design document details many of the reasons for embarking on this work, and also the plan for completing it safely. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/technical/index-format.txt | 7 + Documentation/technical/sparse-index.txt | 173 +++++++++++++++++++++++ 2 files changed, 180 insertions(+) create mode 100644 Documentation/technical/sparse-index.txt diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index b633482b1bdf..387126582556 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -44,6 +44,13 @@ Git index format localization, no special casing of directory separator '/'). Entries with the same name are sorted by their stage field. + An index entry typically represents a file. 
However, if sparse-checkout + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the + `extensions.sparseIndex` extension is enabled, then the index may + contain entries for directories outside of the sparse-checkout definition. + These entries have mode `0040000`, include the `SKIP_WORKTREE` bit, and + the path ends in a directory separator. + 32-bit ctime seconds, the last time a file's metadata changed this is stat(2) data diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt new file mode 100644 index 000000000000..787a2a0b3b81 --- /dev/null +++ b/Documentation/technical/sparse-index.txt @@ -0,0 +1,173 @@ +Git Sparse-Index Design Document +================================ + +The sparse-checkout feature allows users to focus a working directory on +a subset of the files at HEAD. The cone mode patterns, enabled by +`core.sparseCheckoutCone`, allow for very fast pattern matching to +discover which files at HEAD belong in the sparse-checkout cone. + +Three important scale dimensions for a Git worktree are: + +* `HEAD`: How many files are present at `HEAD`? + +* Populated: How many files are within the sparse-checkout cone? + +* Modified: How many files has the user modified in the working directory? + +We will use big-O notation -- O(X) -- to denote how expensive certain +operations are in terms of these dimensions. + +These dimensions are ordered by their magnitude: users (typically) modify +fewer files than are populated, and we can only populate files at `HEAD`. +These dimensions are also ordered by how expensive they are per item: it +is more expensive to detect a modified file than it is to write one that we +know must be populated; changing `HEAD` only really requires updating the +index. + +Problems occur if there is an extreme imbalance in these dimensions. For
For +example, if `HEAD` contains millions of paths but the populated set has +only tens of thousands, then commands like `git status` and `git add` can +be dominated by O(`HEAD`) operations instead of O(Populated) operations. +Most of that cost is in parsing and rewriting the index, +which is filled primarily with files at `HEAD` that are marked with the +`SKIP_WORKTREE` bit. + +The sparse-index intends to take these commands that read and modify the +index from O(`HEAD`) to O(Populated). To do this, we need to modify the +index format in a significant way: add "sparse directory" entries. + +With cone mode patterns, it is possible to detect when an entire +directory will have its contents outside of the sparse-checkout definition. +Instead of listing all of the files it contains as individual entries, a +sparse-index contains an entry with the directory name, referencing the +object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit. +If we need to discover the details for paths within that directory, we +can parse trees to find that list. + +At the time of writing, sparse-directory entries violate expectations about the +index format and its in-memory data structure. There are many consumers in +the codebase that expect to iterate through all of the index entries and +see only files. In addition, they expect to see all files at `HEAD`. One +way to handle this is to parse trees to replace a sparse-directory entry +with all of the files within that tree as the index is loaded. However, +parsing trees is slower than parsing the index format, so expanding the +index this way is slower than leaving it alone. + +The implementation plan below follows four phases to slowly integrate with +the sparse-index. The intention is to incrementally update Git commands to +interact safely with the sparse-index without significant slowdowns.
This +may not always be possible, but the hope is that the primary commands that +users need in their daily work are dramatically improved. + +Phase I: Format and initial speedups +------------------------------------ + +During this phase, Git learns to enable the sparse-index and safely parse +one. Protections are put in place so that every consumer of the in-memory +data structure can operate with its current assumption of every file at +`HEAD`. + +At first, every index parse will expand the sparse-directory entries into +the full list of paths at `HEAD`. This will be slower in all cases. The +only noticeable change in behavior will be that the serialized index file +contains sparse-directory entries. + +To start, we use a new repository extension, `extensions.sparseIndex`, to +allow inserting sparse-directory entries into indexes with file format +versions 2, 3, and 4. This prevents Git versions that do not understand +the sparse-index from operating on one, but it also prevents those versions +from performing other operations that do not use the index at all. A new format, index v5, will +be introduced that includes sparse-directory entries by default. It might +also introduce other features that have been considered for improving the +index. + +Next, consumers of the index will be guarded against operating on a +sparse-index by inserting calls to `ensure_full_index()` or +`expand_index_to_path()`. After these guards are in place, we can begin +leaving sparse-directory entries in the in-memory index structure. + +Even after inserting these guards, we will keep expanding sparse-indexes +for most Git commands using the `command_requires_full_index` repository +setting. This setting will be on by default and disabled one builtin at a +time until we have sufficient confidence that all of the index operations +are properly guarded.
+ +To complete this phase, the commands `git status` and `git add` will be +integrated with the sparse-index so that they operate with O(Populated) +performance. They will be carefully tested for operations within and +outside the sparse-checkout definition. + +Phase II: Careful integrations +------------------------------ + +This phase focuses on ensuring that all index extensions and APIs work +well with a sparse-index. This requires significant increases to our test +coverage, especially for operations that interact with the working +directory outside of the sparse-checkout definition. Some current +behaviors may not be the desired ones, as shown by some tests already +marked for failure in `t1092-sparse-checkout-compatibility.sh`. + +The index extensions that may require special integrations are: + +* FS Monitor +* Untracked cache + +While integrating with these features, we should look for patterns that +might lead to better APIs for interacting with the index. Coalescing +common usage patterns into an API call can reduce the number of places +where sparse-directories need to be handled carefully. + +Phase III: Important command speedups +------------------------------------- + +At this point, the patterns for testing and implementing sparse-directory +logic should be relatively stable. This phase focuses on updating some of +the most common builtins that use the index to operate as O(Populated). +Here is a potential list of commands that could be valuable to integrate +at this point: + +* `git commit` +* `git checkout` +* `git merge` +* `git rebase` + +Hopefully, commands such as `git merge` and `git rebase` can benefit +instead from merge algorithms that do not use the index as a data +structure, such as the merge-ORT strategy. As these topics mature, we +may enable the ORT strategy by default for repositories using the +sparse-index feature.
+ +Along with `git status` and `git add`, these commands cover the majority +of users' interactions with the working directory. In addition, we can +integrate with these commands: + +* `git grep` +* `git rm` + +These commands have been proposed as ones whose behavior could change when in a +repo with a sparse-checkout definition. It would be good to include this +behavior automatically when using a sparse-index. Some care is needed +to make the behavior change clear to the user. + +This phase is the first where parallel work might be possible without too +many conflicts between topics. + +Phase IV: The long tail +----------------------- + +This last phase is less a "phase" and more "the new normal" after all of +the previous work. + +To start, the `command_requires_full_index` option could be removed in +favor of expanding only when hitting an API guard. + +There are many Git commands that could use special attention to operate as +O(Populated), while some might be so rare that it is acceptable to leave +them with additional overhead when a sparse-index is present. + +Here are some commands that might be useful to update: + +* `git sparse-checkout set` +* `git am` +* `git clean` +* `git stash` -- gitgitgadget
* Re: [PATCH v2 01/20] sparse-index: design doc and format update 2021-03-10 19:30 ` [PATCH v2 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget @ 2021-03-10 22:19 ` Elijah Newren 0 siblings, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-03-10 22:19 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 10, 2021 at 11:31 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > This begins a long effort to update the index format to allow sparse > directory entries. This should result in a significant improvement to > Git commands when HEAD contains millions of files, but the user has > selected many fewer files to keep in their sparse-checkout definition. > > Currently, the index format is only updated in the presence of > extensions.sparseIndex instead of increasing a file format version > number. This is temporary, and index v5 is part of the plan for future > work in this area. > > The design document details many of the reasons for embarking on this > work, and also the plan for completing it safely. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > Documentation/technical/index-format.txt | 7 + > Documentation/technical/sparse-index.txt | 173 +++++++++++++++++++++++ > 2 files changed, 180 insertions(+) > create mode 100644 Documentation/technical/sparse-index.txt > > diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt > index b633482b1bdf..387126582556 100644 > --- a/Documentation/technical/index-format.txt > +++ b/Documentation/technical/index-format.txt > @@ -44,6 +44,13 @@ Git index format > localization, no special casing of directory separator '/'). Entries > with the same name are sorted by their stage field. 
> > + An index entry typically represents a file. However, if sparse-checkout > + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the > + `extensions.sparseIndex` extension is enabled, then the index may > + contain entries for directories outside of the sparse-checkout definition. > + These entries have mode `0040000`, include the `SKIP_WORKTREE` bit, and > + the path ends in a directory separator. > + > 32-bit ctime seconds, the last time a file's metadata changed > this is stat(2) data > > diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt > new file mode 100644 > index 000000000000..787a2a0b3b81 > --- /dev/null > +++ b/Documentation/technical/sparse-index.txt > @@ -0,0 +1,173 @@ > +Git Sparse-Index Design Document > +================================ > + > +The sparse-checkout feature allows users to focus a working directory on > +a subset of the files at HEAD. The cone mode patterns, enabled by > +`core.sparseCheckoutCone`, allow for very fast pattern matching to > +discover which files at HEAD belong in the sparse-checkout cone. > + > +Three important scale dimensions for a Git worktree are: > + > +* `HEAD`: How many files are present at `HEAD`? > + > +* Populated: How many files are within the sparse-checkout cone. > + > +* Modified: How many files has the user modified in the working directory? > + > +We will use big-O notation -- O(X) -- to denote how expensive certain > +operations are in terms of these dimensions. > + > +These dimensions are ordered by their magnitude: users (typically) modify > +fewer files than are populated, and we can only populate files at `HEAD`. > +These dimensions are also ordered by how expensive they are per item: it > +is expensive to detect a modified file than it is to write one that we > +know must be populated; changing `HEAD` only really requires updating the > +index. > + > +Problems occur if there is an extreme imbalance in these dimensions. 
For > +example, if `HEAD` contains millions of paths but the populated set has > +only tens of thousands, then commands like `git status` and `git add` can > +be dominated by operations that require O(`HEAD`) operations instead of > +O(Populated). Primarily, the cost is in parsing and rewriting the index, > +which is filled primarily with files at `HEAD` that are marked with the > +`SKIP_WORKTREE` bit. > + > +The sparse-index intends to take these commands that read and modify the > +index from O(`HEAD`) to O(Populated). To do this, we need to modify the > +index format in a significant way: add "sparse directory" entries. > + > +With cone mode patterns, it is possible to detect when an entire > +directory will have its contents outside of the sparse-checkout definition. > +Instead of listing all of the files it contains as individual entries, a > +sparse-index contains an entry with the directory name, referencing the > +object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit. > +If we need to discover the details for paths within that directory, we > +can parse trees to find that list. > + > +At time of writing, sparse-directory entries violate expectations about the > +index format and its in-memory data structure. There are many consumers in > +the codebase that expect to iterate through all of the index entries and > +see only files. In addition, they expect to see all files at `HEAD`. One > +way to handle this is to parse trees to replace a sparse-directory entry > +with all of the files within that tree as the index is loaded. However, > +parsing trees is slower than parsing the index format, so that is a slower > +operation than if we left the index alone. > + > +The implementation plan below follows four phases to slowly integrate with > +the sparse-index. The intention is to incrementally update Git commands to > +interact safely with the sparse-index without significant slowdowns. 
This > +may not always be possible, but the hope is that the primary commands that > +users need in their daily work are dramatically improved. > + > +Phase I: Format and initial speedups > +------------------------------------ > + > +During this phase, Git learns to enable the sparse-index and safely parse > +one. Protections are put in place so that every consumer of the in-memory > +data structure can operate with its current assumption of every file at > +`HEAD`. > + > +At first, every index parse will expand the sparse-directory entries into > +the full list of paths at `HEAD`. This will be slower in all cases. The > +only noticable change in behavior will be that the serialized index file > +contains sparse-directory entries. > + > +To start, we use a new repository extension, `extensions.sparseIndex`, to > +allow inserting sparse-directory entries into indexes with file format > +versions 2, 3, and 4. This prevents Git versions that do not understand > +the sparse-index from operating on one, but it also prevents other > +operations that do not use the index at all. A new format, index v5, will > +be introduced that includes sparse-directory entries by default. It might > +also introduce other features that have been considered for improving the > +index, as well. > + > +Next, consumers of the index will be guarded against operating on a > +sparse-index by inserting calls to `ensure_full_index()` or > +`expand_index_to_path()`. After these guards are in place, we can begin > +leaving sparse-directory entries in the in-memory index structure. > + > +Even after inserting these guards, we will keep expanding sparse-indexes > +for most Git commands using the `command_requires_full_index` repository > +setting. This setting will be on by default and disabled one builtin at a > +time until we have sufficient confidence that all of the index operations > +are properly guarded. 
> + > +To complete this phase, the commands `git status` and `git add` will be > +integrated with the sparse-index so that they operate with O(Populated) > +performance. They will be carefully tested for operations within and > +outside the sparse-checkout definition. > + > +Phase II: Careful integrations > +------------------------------ > + > +This phase focuses on ensuring that all index extensions and APIs work > +well with a sparse-index. This requires significant increases to our test > +coverage, especially for operations that interact with the working > +directory outside of the sparse-checkout definition. Some of these > +behaviors may not be the desirable ones, such as some tests already > +marked for failure in `t1092-sparse-checkout-compatibility.sh`. > + > +The index extensions that may require special integrations are: > + > +* FS Monitor > +* Untracked cache > + > +While integrating with these features, we should look for patterns that > +might lead to better APIs for interacting with the index. Coalescing > +common usage patterns into an API call can reduce the number of places > +where sparse-directories need to be handled carefully. > + > +Phase III: Important command speedups > +------------------------------------- > + > +At this point, the patterns for testing and implementing sparse-directory > +logic should be relatively stable. This phase focuses on updating some of > +the most common builtins that use the index to operate as O(Populated). > +Here is a potential list of commands that could be valuable to integrate > +at this point: > + > +* `git commit` > +* `git checkout` > +* `git merge` > +* `git rebase` > + > +Hopefully, commands such as `git merge` and `git rebase` can benefit > +instead from merge algorithms that do not use the index as a data > +structure, such as the merge-ORT strategy. As these topics mature, we > +may enalbe the ORT strategy by default for repositories using the s/enalbe/enable/ > +sparse-index feature. 
> + > +Along with `git status` and `git add`, these commands cover the majority > +of users' interactions with the working directory. In addition, we can > +integrate with these commands: > + > +* `git grep` > +* `git rm` > + > +These have been proposed as some whose behavior could change when in a > +repo with a sparse-checkout definition. It would be good to include this > +behavior automatically when using a sparse-index. Some clarity is needed > +to make the behavior switch clear to the user. > + > +This phase is the first where parallel work might be possible without too > +much conflicts between topics. > + > +Phase IV: The long tail > +----------------------- > + > +This last phase is less a "phase" and more "the new normal" after all of > +the previous work. > + > +To start, the `command_requires_full_index` option could be removed in > +favor of expanding only when hitting an API guard. > + > +There are many Git commands that could use special attention to operate as > +O(Populated), while some might be so rare that it is acceptable to leave > +them with additional overhead when a sparse-index is present. > + > +Here are some commands that might be useful to update: > + > +* `git sparse-checkout set` > +* `git am` > +* `git clean` > +* `git stash` > -- > gitgitgadget > ^ permalink raw reply [flat|nested] 203+ messages in thread
* [PATCH v2 02/20] t/perf: add performance test for sparse operations 2021-03-10 19:30 ` [PATCH v2 " Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 ` Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget ` (19 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Create a test script that takes the default performance test repository (the Git codebase) and multiplies it by 256 using four layers of duplicated trees of width four. This results in nearly one million blob entries in the index. Then, we can clone this repository with sparse-checkout patterns that demonstrate four copies of the initial repository. Each clone will use a different index format or mode so performance can be tested across the different options. Note that the initial repo is stripped of submodules before doing the copies. This preserves the expected data shape of the sparse index, because directories containing submodules are not collapsed to a sparse directory entry. Run a few Git commands on these clones, especially those that use the index (status, add, commit). Here are the results on my Linux machine:

    Test
    --------------------------------------------------------------
    2000.2: git status (full-index-v3)          0.37(0.30+0.09)
    2000.3: git status (full-index-v4)          0.39(0.32+0.10)
    2000.4: git add -A (full-index-v3)          1.42(1.06+0.20)
    2000.5: git add -A (full-index-v4)          1.26(0.98+0.16)
    2000.6: git add . (full-index-v3)           1.40(1.04+0.18)
    2000.7: git add . (full-index-v4)           1.26(0.98+0.17)
    2000.8: git commit -a -m A (full-index-v3)  1.42(1.11+0.16)
    2000.9: git commit -a -m A (full-index-v4)  1.33(1.08+0.16)

It is perhaps noteworthy that there is an improvement when using index version 4. This is because the v3 index uses 108 MiB while the v4 index uses 80 MiB. Since the repeated portions of the directories are very short (f3/f1/f2, for example), this ratio is less pronounced than in similarly-sized real repositories. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/perf/p2000-sparse-operations.sh | 85 +++++++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) create mode 100755 t/perf/p2000-sparse-operations.sh diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh new file mode 100755 index 000000000000..2fbc81b22119 --- /dev/null +++ b/t/perf/p2000-sparse-operations.sh @@ -0,0 +1,85 @@ +#!/bin/sh + +test_description="test performance of Git operations using the index" + +. ./perf-lib.sh + +test_perf_default_repo + +SPARSE_CONE=f2/f4/f1 + +test_expect_success 'setup repo and indexes' ' + git reset --hard HEAD && + # Remove submodules from the example repo, because our + # duplication of the entire repo creates an unlikely data shape.
+ git config --file .gitmodules --get-regexp "submodule.*.path" >modules && + git rm -f .gitmodules && + for module in $(awk "{print \$2}" modules) + do + git rm $module || return 1 + done && + git commit -m "remove submodules" && + + echo bogus >a && + cp a b && + git add a b && + git commit -m "level 0" && + BLOB=$(git rev-parse HEAD:a) && + OLD_COMMIT=$(git rev-parse HEAD) && + OLD_TREE=$(git rev-parse HEAD^{tree}) && + + for i in $(test_seq 1 4) + do + cat >in <<-EOF && + 100755 blob $BLOB a + 040000 tree $OLD_TREE f1 + 040000 tree $OLD_TREE f2 + 040000 tree $OLD_TREE f3 + 040000 tree $OLD_TREE f4 + EOF + NEW_TREE=$(git mktree <in) && + NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") && + OLD_TREE=$NEW_TREE && + OLD_COMMIT=$NEW_COMMIT || return 1 + done && + + git sparse-checkout init --cone && + git branch -f wide $OLD_COMMIT && + git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 && + ( + cd full-index-v3 && + git sparse-checkout init --cone && + git sparse-checkout set $SPARSE_CONE && + git config index.version 3 && + git update-index --index-version=3 + ) && + git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 && + ( + cd full-index-v4 && + git sparse-checkout init --cone && + git sparse-checkout set $SPARSE_CONE && + git config index.version 4 && + git update-index --index-version=4 + ) +' + +test_perf_on_all () { + command="$@" + for repo in full-index-v3 full-index-v4 + do + test_perf "$command ($repo)" " + ( + cd $repo && + echo >>$SPARSE_CONE/a && + $command + ) + " + done +} + +test_perf_on_all git status +test_perf_on_all git add -A +test_perf_on_all git add . +test_perf_on_all git commit -a -m A + +test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* [PATCH v2 03/20] t1092: clean up script quoting 2021-03-10 19:30 ` [PATCH v2 " Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 02/20] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 ` Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 04/20] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget ` (18 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> This test was introduced in 19a0acc83e4 (t1092: test interesting sparse-checkout scenarios, 2021-01-23), but these issues with quoting were not noticed until starting this follow-up series. The old mechanism would drop quoting such as in test_all_match git commit -m "touch README.md" The above happened to work because README.md is a file in the repository, so 'git commit -m touch README.md' would succeed by accident. Other cases included quoting for no good reason, so clean that up now.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/t1092-sparse-checkout-compatibility.sh | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 8cd3e5a8d227..3725d3997e70 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -96,20 +96,20 @@ init_repos () { run_on_sparse () { ( cd sparse-checkout && - $* >../sparse-checkout-out 2>../sparse-checkout-err + "$@" >../sparse-checkout-out 2>../sparse-checkout-err ) } run_on_all () { ( cd full-checkout && - $* >../full-checkout-out 2>../full-checkout-err + "$@" >../full-checkout-out 2>../full-checkout-err ) && - run_on_sparse $* + run_on_sparse "$@" } test_all_match () { - run_on_all $* && + run_on_all "$@" && test_cmp full-checkout-out sparse-checkout-out && test_cmp full-checkout-err sparse-checkout-err } @@ -119,7 +119,7 @@ test_expect_success 'status with options' ' test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && - run_on_all "touch README.md" && + run_on_all touch README.md && test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && @@ -135,7 +135,7 @@ test_expect_success 'add, commit, checkout' ' write_script edit-contents <<-\EOF && echo text >>$1 EOF - run_on_all "../edit-contents README.md" && + run_on_all ../edit-contents README.md && test_all_match git add README.md && test_all_match git status --porcelain=v2 && @@ -144,7 +144,7 @@ test_expect_success 'add, commit, checkout' ' test_all_match git checkout HEAD~1 && test_all_match git checkout - && - run_on_all "../edit-contents README.md" && + run_on_all ../edit-contents README.md && test_all_match git add -A && test_all_match git status --porcelain=v2 && @@ -153,7 +153,7 @@ test_expect_success 'add, 
commit, checkout' ' test_all_match git checkout HEAD~1 && test_all_match git checkout - && - run_on_all "../edit-contents deep/newfile" && + run_on_all ../edit-contents deep/newfile && test_all_match git status --porcelain=v2 -uno && test_all_match git status --porcelain=v2 && @@ -186,7 +186,7 @@ test_expect_success 'diff --staged' ' write_script edit-contents <<-\EOF && echo text >>README.md EOF - run_on_all "../edit-contents" && + run_on_all ../edit-contents && test_all_match git diff && test_all_match git diff --staged && @@ -280,7 +280,7 @@ test_expect_success 'clean' ' echo bogus >>.gitignore && run_on_all cp ../.gitignore . && test_all_match git add .gitignore && - test_all_match git commit -m ignore-bogus-files && + test_all_match git commit -m "ignore bogus files" && run_on_sparse mkdir folder1 && run_on_all touch folder1/bogus && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* [PATCH v2 04/20] sparse-index: add guard to ensure full index 2021-03-10 19:30 ` [PATCH v2 " Derrick Stolee via GitGitGadget ` (2 preceding siblings ...) 2021-03-10 19:30 ` [PATCH v2 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 ` Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget ` (17 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Upcoming changes will introduce modifications to the index format that allow sparse directories. It will be useful to have a mechanism for converting those sparse index files into full indexes by walking the tree at those sparse directories. Name this method ensure_full_index() as it will guarantee that the index is fully expanded. This method is not implemented yet, and instead we focus on the scaffolding to declare it and call it at the appropriate time. Add a 'command_requires_full_index' member to struct repo_settings. This will be an indicator that we need the index in full mode to do certain index operations. This starts as being true for every command, then we will set it to false as some commands integrate with sparse indexes. If 'command_requires_full_index' is true, then we will immediately expand a sparse index to a full one upon reading from disk. This suffices for now, but we will want to add more callers to ensure_full_index() later. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Makefile | 1 + repo-settings.c | 8 ++++++++ repository.c | 11 ++++++++++- repository.h | 2 ++ sparse-index.c | 8 ++++++++ sparse-index.h | 7 +++++++ 6 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 sparse-index.c create mode 100644 sparse-index.h diff --git a/Makefile b/Makefile index 5a239cac20e3..3bf61699238d 100644 --- a/Makefile +++ b/Makefile @@ -980,6 +980,7 @@ LIB_OBJS += setup.o LIB_OBJS += shallow.o LIB_OBJS += sideband.o LIB_OBJS += sigchain.o +LIB_OBJS += sparse-index.o LIB_OBJS += split-index.o LIB_OBJS += stable-qsort.o LIB_OBJS += strbuf.o diff --git a/repo-settings.c b/repo-settings.c index f7fff0f5ab83..d63569e4041e 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -77,4 +77,12 @@ void prepare_repo_settings(struct repository *r) UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_KEEP); UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_DEFAULT); + + /* + * This setting guards all index reads to require a full index + * over a sparse index. After suitable guards are placed in the + * codebase around uses of the index, this setting will be + * removed. 
+ */ + r->settings.command_requires_full_index = 1; } diff --git a/repository.c b/repository.c index c98298acd017..a8acae002f71 100644 --- a/repository.c +++ b/repository.c @@ -10,6 +10,7 @@ #include "object.h" #include "lockfile.h" #include "submodule-config.h" +#include "sparse-index.h" /* The main repository */ static struct repository the_repo; @@ -261,6 +262,8 @@ void repo_clear(struct repository *repo) int repo_read_index(struct repository *repo) { + int res; + if (!repo->index) repo->index = xcalloc(1, sizeof(*repo->index)); @@ -270,7 +273,13 @@ int repo_read_index(struct repository *repo) else if (repo->index->repo != repo) BUG("repo's index should point back at itself"); - return read_index_from(repo->index, repo->index_file, repo->gitdir); + res = read_index_from(repo->index, repo->index_file, repo->gitdir); + + prepare_repo_settings(repo); + if (repo->settings.command_requires_full_index) + ensure_full_index(repo->index); + + return res; } int repo_hold_locked_index(struct repository *repo, diff --git a/repository.h b/repository.h index b385ca3c94b6..e06a23015697 100644 --- a/repository.h +++ b/repository.h @@ -41,6 +41,8 @@ struct repo_settings { enum fetch_negotiation_setting fetch_negotiation_algorithm; int core_multi_pack_index; + + unsigned command_requires_full_index:1; }; struct repository { diff --git a/sparse-index.c b/sparse-index.c new file mode 100644 index 000000000000..82183ead563b --- /dev/null +++ b/sparse-index.c @@ -0,0 +1,8 @@ +#include "cache.h" +#include "repository.h" +#include "sparse-index.h" + +void ensure_full_index(struct index_state *istate) +{ + /* intentionally left blank */ +} diff --git a/sparse-index.h b/sparse-index.h new file mode 100644 index 000000000000..09a20d036c46 --- /dev/null +++ b/sparse-index.h @@ -0,0 +1,7 @@ +#ifndef SPARSE_INDEX_H__ +#define SPARSE_INDEX_H__ + +struct index_state; +void ensure_full_index(struct index_state *istate); + +#endif -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ 
messages in thread
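The guard added to repo_read_index() follows a default-safe pattern: prepare_repo_settings() turns command_requires_full_index on, and only a command that has been taught about sparse-directory entries clears it. The sketch below models just that control flow. It assumes toy types (mock_settings, mock_index, mock_repo) and a placeholder expansion step; none of these names exist in git, and a real expansion rebuilds the entry array rather than flipping a flag.

```c
#include <assert.h>

/* Hypothetical stand-ins; not git's real repo_settings/index_state. */
struct mock_settings {
	int initialized;
	unsigned command_requires_full_index : 1;
};

struct mock_index {
	int sparse_index; /* 1 while sparse-directory entries are present */
};

struct mock_repo {
	struct mock_settings settings;
	struct mock_index index;
};

/* Like prepare_repo_settings(): runs once and defaults to "needs full index". */
static void mock_prepare_settings(struct mock_repo *r)
{
	if (r->settings.initialized)
		return;
	r->settings.initialized = 1;
	r->settings.command_requires_full_index = 1;
}

/* Placeholder for ensure_full_index(): only records that expansion happened. */
static void mock_ensure_full(struct mock_index *istate)
{
	istate->sparse_index = 0;
}

/* Like repo_read_index(): load the index, then expand only if required. */
static void mock_read_index(struct mock_repo *r)
{
	assert(r);
	r->index.sparse_index = 1; /* pretend the on-disk index was sparse */
	mock_prepare_settings(r);
	if (r->settings.command_requires_full_index)
		mock_ensure_full(&r->index);
}
```

A command that can work with sparse entries opts out before reading: it prepares the settings first, clears the flag, and then reads. Because the preparation step is idempotent, the second call made inside the read does not turn the flag back on.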
* [PATCH v2 05/20] sparse-index: implement ensure_full_index() 2021-03-10 19:30 ` [PATCH v2 " Derrick Stolee via GitGitGadget ` (3 preceding siblings ...) 2021-03-10 19:30 ` [PATCH v2 04/20] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 ` Derrick Stolee via GitGitGadget 2021-03-12 6:50 ` Junio C Hamano 2021-03-10 19:30 ` [PATCH v2 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget ` (16 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We will mark an in-memory index_state as having sparse directory entries with the sparse_index bit. These currently cannot exist, but we will add a mechanism for collapsing a full index to a sparse one in a later change. That will happen at write time, so we must first allow parsing the format before writing it. Commands or methods that require a full index in order to operate can call ensure_full_index() to expand that index in-memory. This requires parsing trees using that index's repository. Sparse directory entries have a specific 'ce_mode' value. The macro S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. This ce_mode is not possible with the existing index formats, so we don't also verify all properties of a sparse-directory entry, which are:
1. ce->ce_mode == 0040000
2. ce->ce_flags & CE_SKIP_WORKTREE is true
3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
4. ce->oid references a tree object.
These are all semi-enforced in ensure_full_index() to some extent. Any deviation will cause a warning at minimum or a failure in the worst case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache.h | 13 ++++++- read-cache.c | 9 +++++ sparse-index.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 115 insertions(+), 2 deletions(-) diff --git a/cache.h b/cache.h index d92814961405..1f0b42264606 100644 --- a/cache.h +++ b/cache.h @@ -204,6 +204,8 @@ struct cache_entry { #error "CE_EXTENDED_FLAGS out of range" #endif +#define S_ISSPARSEDIR(m) ((m) == S_IFDIR) + /* Forward structure decls */ struct pathspec; struct child_process; @@ -319,7 +321,14 @@ struct index_state { drop_cache_tree : 1, updated_workdir : 1, updated_skipworktree : 1, - fsmonitor_has_run_once : 1; + fsmonitor_has_run_once : 1, + + /* + * sparse_index == 1 when sparse-directory + * entries exist. Requires sparse-checkout + * in cone mode. + */ + sparse_index : 1; struct hashmap name_hash; struct hashmap dir_hash; struct object_id oid; @@ -722,6 +731,8 @@ int read_index_from(struct index_state *, const char *path, const char *gitdir); int is_index_unborn(struct index_state *); +void ensure_full_index(struct index_state *istate); + /* For use with `write_locked_index()`. 
*/ #define COMMIT_LOCK (1 << 0) #define SKIP_IF_UNCHANGED (1 << 1) diff --git a/read-cache.c b/read-cache.c index 29144cf879e7..97dbf2434f30 100644 --- a/read-cache.c +++ b/read-cache.c @@ -101,6 +101,9 @@ static const char *alternate_index_output; static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) { + if (S_ISSPARSEDIR(ce->ce_mode)) + istate->sparse_index = 1; + istate->cache[nr] = ce; add_name_hash(istate, ce); } @@ -2255,6 +2258,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) trace2_data_intmax("index", the_repository, "read/cache_nr", istate->cache_nr); + if (!istate->repo) + istate->repo = the_repository; + prepare_repo_settings(istate->repo); + if (istate->repo->settings.command_requires_full_index) + ensure_full_index(istate); + return istate->cache_nr; unmap: diff --git a/sparse-index.c b/sparse-index.c index 82183ead563b..316cb949b74b 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -1,8 +1,101 @@ #include "cache.h" #include "repository.h" #include "sparse-index.h" +#include "tree.h" +#include "pathspec.h" +#include "trace2.h" + +static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) +{ + ALLOC_GROW(istate->cache, nr + 1, istate->cache_alloc); + + istate->cache[nr] = ce; + add_name_hash(istate, ce); +} + +static int add_path_to_index(const struct object_id *oid, + struct strbuf *base, const char *path, + unsigned int mode, int stage, void *context) +{ + struct index_state *istate = (struct index_state *)context; + struct cache_entry *ce; + size_t len = base->len; + + if (S_ISDIR(mode)) + return READ_TREE_RECURSIVE; + + strbuf_addstr(base, path); + + ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0); + ce->ce_flags |= CE_SKIP_WORKTREE; + set_index_entry(istate, istate->cache_nr++, ce); + + strbuf_setlen(base, len); + return 0; +} void ensure_full_index(struct index_state *istate) { - /* intentionally left blank */ + int i; + struct 
index_state *full; + + if (!istate || !istate->sparse_index) + return; + + if (!istate->repo) + istate->repo = the_repository; + + trace2_region_enter("index", "ensure_full_index", istate->repo); + + /* initialize basics of new index */ + full = xcalloc(1, sizeof(struct index_state)); + memcpy(full, istate, sizeof(struct index_state)); + + /* then change the necessary things */ + full->sparse_index = 0; + full->cache_alloc = (3 * istate->cache_alloc) / 2; + full->cache_nr = 0; + ALLOC_ARRAY(full->cache, full->cache_alloc); + + for (i = 0; i < istate->cache_nr; i++) { + struct cache_entry *ce = istate->cache[i]; + struct tree *tree; + struct pathspec ps; + + if (!S_ISSPARSEDIR(ce->ce_mode)) { + set_index_entry(full, full->cache_nr++, ce); + continue; + } + if (!(ce->ce_flags & CE_SKIP_WORKTREE)) + warning(_("index entry is a directory, but not sparse (%08x)"), + ce->ce_flags); + + /* recursively walk into ce->name */ + tree = lookup_tree(istate->repo, &ce->oid); + + memset(&ps, 0, sizeof(ps)); + ps.recursive = 1; + ps.has_wildcard = 1; + ps.max_depth = -1; + + read_tree_recursive(istate->repo, tree, + ce->name, strlen(ce->name), + 0, &ps, + add_path_to_index, full); + + /* free directory entries. full entries are re-used */ + discard_cache_entry(ce); + } + + /* Copy back into original index. */ + memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); + istate->sparse_index = 0; + free(istate->cache); + istate->cache = full->cache; + istate->cache_nr = full->cache_nr; + istate->cache_alloc = full->cache_alloc; + + free(full); + + trace2_region_leave("index", "ensure_full_index", istate->repo); } -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
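The commit message lists four properties of a sparse-directory entry, while ensure_full_index() only checks the mode and warns when the skip-worktree bit is missing. A stricter check could combine all four conditions as in the sketch below. It uses a simplified hypothetical entry struct (mock_cache_entry), an object-type field that stands in for looking up ce->oid, and a placeholder value for MOCK_CE_SKIP_WORKTREE rather than git's own definition.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_S_IFDIR 0040000u              /* property 1: directory mode */
#define MOCK_CE_SKIP_WORKTREE (1u << 30)   /* placeholder flag bit */
#define MOCK_S_ISSPARSEDIR(m) ((m) == MOCK_S_IFDIR)

enum mock_object_type { MOCK_OBJ_BLOB, MOCK_OBJ_TREE, MOCK_OBJ_COMMIT };

/* Hypothetical, simplified cache entry. */
struct mock_cache_entry {
	unsigned int ce_mode;
	unsigned int ce_flags;
	size_t namelen;
	const char *name;
	enum mock_object_type oid_type; /* stands in for the type of ce->oid */
};

/* Return 1 only if all four listed properties hold, else 0. */
static int mock_is_valid_sparse_dir(const struct mock_cache_entry *ce)
{
	assert(ce && ce->name);
	if (!MOCK_S_ISSPARSEDIR(ce->ce_mode))                  /* 1. mode */
		return 0;
	if (!(ce->ce_flags & MOCK_CE_SKIP_WORKTREE))          /* 2. skip-worktree */
		return 0;
	if (!ce->namelen || ce->name[ce->namelen - 1] != '/') /* 3. trailing '/' */
		return 0;
	if (ce->oid_type != MOCK_OBJ_TREE)                    /* 4. tree object */
		return 0;
	return 1;
}
```

Whether a violation should warn or fail is a policy decision for the real code; the sketch only shows how the four conditions compose.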
* Re: [PATCH v2 05/20] sparse-index: implement ensure_full_index() 2021-03-10 19:30 ` [PATCH v2 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget @ 2021-03-12 6:50 ` Junio C Hamano 2021-03-12 13:56 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Junio C Hamano @ 2021-03-12 6:50 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, newren, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee, Ævar Arnfjörð Bjarmason "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > void ensure_full_index(struct index_state *istate) > { > ... > + int i; > + tree = lookup_tree(istate->repo, &ce->oid); > + > + memset(&ps, 0, sizeof(ps)); > + ps.recursive = 1; > + ps.has_wildcard = 1; > + ps.max_depth = -1; > + > + read_tree_recursive(istate->repo, tree, > + ce->name, strlen(ce->name), > + 0, &ps, > + add_path_to_index, full); Ævar, the assumption that led to your e68237bb (tree.h API: remove support for starting at prefix != "", 2021-03-08) closes the door for this code rather badly. Please work with Derrick to figure out what the best course of action would be. Thanks. > + /* free directory entries. full entries are re-used */ > + discard_cache_entry(ce); > + } > + > + /* Copy back into original index. */ > + memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); > + istate->sparse_index = 0; > + free(istate->cache); > + istate->cache = full->cache; > + istate->cache_nr = full->cache_nr; > + istate->cache_alloc = full->cache_alloc; > + > + free(full); > + > + trace2_region_leave("index", "ensure_full_index", istate->repo); > } ^ permalink raw reply [flat|nested] 203+ messages in thread
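The loop in ensure_full_index() copies ordinary entries unchanged and swaps each sparse-directory entry for one skip-worktree entry per file beneath it; the prefix passed to the tree walk is what turns a tree-relative path into a full index path. Below is a toy model of only that rewrite. It assumes a hypothetical in-memory store (mock_dir_contents) in place of read_tree_recursive() and the object database, and a caller-provided output array instead of ALLOC_GROW().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MOCK_SKIP_WORKTREE 0x1u /* placeholder flag bit */

/* Hypothetical, simplified index entry. */
struct mock_entry {
	char name[64];
	int is_sparse_dir; /* stands in for S_ISSPARSEDIR(ce->ce_mode) */
	unsigned flags;
};

/* Toy replacement for the object store: the files under one directory. */
struct mock_dir_contents {
	const char *dir;    /* directory name with trailing '/', e.g. "folder1/" */
	const char **files; /* paths relative to dir, assumed sorted */
	size_t nr;
};

static const struct mock_dir_contents *
mock_find_dir(const struct mock_dir_contents *store, size_t store_nr,
	      const char *dir)
{
	size_t i;

	for (i = 0; i < store_nr; i++)
		if (!strcmp(store[i].dir, dir))
			return &store[i];
	return NULL;
}

/*
 * Copy regular entries unchanged and replace each sparse-directory entry
 * with one skip-worktree entry per file beneath it (prefix + relative path).
 * Returns the number of entries written, or -1 if a directory is unknown
 * or 'out' is too small.
 */
static int mock_expand_entries(const struct mock_entry *in, size_t in_nr,
			       const struct mock_dir_contents *store,
			       size_t store_nr,
			       struct mock_entry *out, size_t out_alloc)
{
	size_t i, j, nr = 0;

	assert(in || !in_nr);
	for (i = 0; i < in_nr; i++) {
		const struct mock_dir_contents *d;

		if (!in[i].is_sparse_dir) {
			if (nr == out_alloc)
				return -1;
			out[nr++] = in[i];
			continue;
		}

		d = mock_find_dir(store, store_nr, in[i].name);
		if (!d)
			return -1;
		for (j = 0; j < d->nr; j++) {
			if (nr == out_alloc)
				return -1;
			snprintf(out[nr].name, sizeof(out[nr].name), "%s%s",
				 d->dir, d->files[j]);
			out[nr].is_sparse_dir = 0;
			out[nr].flags = MOCK_SKIP_WORKTREE;
			nr++;
		}
	}
	return (int)nr;
}
```

Replacing each directory entry in place keeps the output sorted in this model only because the store lists each directory's files in sorted order.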
* Re: [PATCH v2 05/20] sparse-index: implement ensure_full_index() 2021-03-12 6:50 ` Junio C Hamano @ 2021-03-12 13:56 ` Derrick Stolee 2021-03-12 20:08 ` Junio C Hamano 0 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee @ 2021-03-12 13:56 UTC (permalink / raw) To: Junio C Hamano, Derrick Stolee via GitGitGadget Cc: git, newren, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee, Ævar Arnfjörð Bjarmason On 3/12/2021 1:50 AM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> void ensure_full_index(struct index_state *istate) >> { >> ... >> + int i; >> + tree = lookup_tree(istate->repo, &ce->oid); >> + >> + memset(&ps, 0, sizeof(ps)); >> + ps.recursive = 1; >> + ps.has_wildcard = 1; >> + ps.max_depth = -1; >> + >> + read_tree_recursive(istate->repo, tree, >> + ce->name, strlen(ce->name), >> + 0, &ps, >> + add_path_to_index, full); > > Ævar, the assumption that led to your e68237bb (tree.h API: remove > support for starting at prefix != "", 2021-03-08) closes the door > for this code rather badly. Please work with Derrick to figure out > what the best course of action would be. Thanks for pointing this out, Junio. My preference would be to drop "tree.h API: remove support for starting at prefix != """, but it should be OK to keep "tree.h API: remove "stage" parameter from read_tree_recursive()" (currently b3a078863f6), even though it introduces a semantic conflict here. Since I haven't seen my sparse-index topic get picked up by a tracking branch, I'd be happy to rebase on top of Ævar's topic if I can still set a non-root prefix. Thanks, -Stolee ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [PATCH v2 05/20] sparse-index: implement ensure_full_index() 2021-03-12 13:56 ` Derrick Stolee @ 2021-03-12 20:08 ` Junio C Hamano 2021-03-12 20:11 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Junio C Hamano @ 2021-03-12 20:08 UTC (permalink / raw) To: Derrick Stolee Cc: Derrick Stolee via GitGitGadget, git, newren, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee, Ævar Arnfjörð Bjarmason Derrick Stolee <stolee@gmail.com> writes: >> Ævar, the assumption that led to your e68237bb (tree.h API: remove >> support for starting at prefix != "", 2021-03-08) closes the door >> for this code rather badly. Please work with Derrick to figure out >> what the best course of action would be. > > Thanks for pointing this out, Junio. > > My preference would be to drop "tree.h API: remove support for > starting at prefix != """, but it should be OK to keep "tree.h API: > remove "stage" parameter from read_tree_recursive()" (currently > b3a078863f6), even though it introduces a semantic conflict here. > > Since I haven't seen my sparse-index topic get picked up by a > tracking branch, I'd be happy to rebase on top of Ævar's topic if > I can still set a non-root prefix. I did try to have both in 'seen' (after all, that is the primary way I find out these conflicts early---no one can keep all the details of all the topics in flight in one's head), and saw that we now have a need for non-empty prefix that we thought we no longer have in the other topic --- I think we should probably keep support of non-empty prefix (as the primary reason why that patch exists is because we saw no in-tree users---now if your 05/20 proves to be a good use of the feature, there is one fewer reasons to remove the support) in some form, so discarding e68237bb certainly is an option. If we were to base the sparse-index topic on top of ab/read-tree, we may be able to gain further simplification and clean-up of the API. 
I think all the clean-up value e68237bb has are on the calling side (they no longer have to pass constant ("", 0) to the function), and we could rewrite e68237bb by - renaming "read_tree_recursive()" to "read_tree_at()", with the non-empty prefix support. - creating a new function "read_tree()", which lacks the support for prefix, as a thin-wrapper around "read_tree_at()". - modifying the callers of "read_tree_recursive()" changed by e68237bb to instead call "read_tree()" (without prefix). to simplify majority of calling sites without losing functionality. Then your [05/20] can use the read_tree_at() to read with a prefix. But that kind of details, I'd want to see you two figure out yourselves. Thanks. ^ permalink raw reply [flat|nested] 203+ messages in thread
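Junio's proposal keeps one prefix-aware walker and adds a prefix-less convenience wrapper for the common case. A minimal sketch of that shape follows. The functions mock_read_tree_at() and mock_read_tree(), the callback type, and the array-backed mock_tree are all hypothetical; the re-rolled API that Ævar describes passes the prefix in a strbuf rather than a plain string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type, loosely modeled on a per-path tree visitor. */
typedef int (*mock_read_tree_fn)(const char *fullpath, void *context);

/* A toy "tree": entries relative to whatever directory it represents. */
struct mock_tree {
	const char **paths;
	size_t nr;
};

/* Prefix-aware walker: every callback sees prefix + relative path. */
static int mock_read_tree_at(const struct mock_tree *tree, const char *prefix,
			     mock_read_tree_fn fn, void *context)
{
	char buf[256];
	size_t i;

	for (i = 0; i < tree->nr; i++) {
		int n = snprintf(buf, sizeof(buf), "%s%s", prefix, tree->paths[i]);

		if (n < 0 || (size_t)n >= sizeof(buf))
			return -1; /* path too long for this toy buffer */
		if (fn(buf, context))
			return -1;
	}
	return 0;
}

/* Thin wrapper for the common "start at the root" case. */
static int mock_read_tree(const struct mock_tree *tree,
			  mock_read_tree_fn fn, void *context)
{
	return mock_read_tree_at(tree, "", fn, context);
}

/* Example callback: count paths and remember the last one seen. */
struct mock_collect {
	int count;
	char last[256];
};

static int mock_collect_path(const char *fullpath, void *context)
{
	struct mock_collect *c = context;

	c->count++;
	snprintf(c->last, sizeof(c->last), "%s", fullpath);
	return 0;
}
```

Under this shape, most callers would use the wrapper and stop passing an empty prefix, while expanding a sparse directory would call the prefixed variant with the directory entry's name.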
* Re: [PATCH v2 05/20] sparse-index: implement ensure_full_index() 2021-03-12 20:08 ` Junio C Hamano @ 2021-03-12 20:11 ` Derrick Stolee 2021-03-15 23:52 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee @ 2021-03-12 20:11 UTC (permalink / raw) To: Junio C Hamano Cc: Derrick Stolee via GitGitGadget, git, newren, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee, Ævar Arnfjörð Bjarmason On 3/12/2021 3:08 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >>> Ævar, the assumption that led to your e68237bb (tree.h API: remove >>> support for starting at prefix != "", 2021-03-08) closes the door >>> for this code rather badly. Please work with Derrick to figure out >>> what the best course of action would be. >> >> Thanks for pointing this out, Junio. >> >> My preference would be to drop "tree.h API: remove support for >> starting at prefix != """, but it should be OK to keep "tree.h API: >> remove "stage" parameter from read_tree_recursive()" (currently >> b3a078863f6), even though it introduces a semantic conflict here. >> >> Since I haven't seen my sparse-index topic get picked up by a >> tracking branch, I'd be happy to rebase on top of Ævar's topic if >> I can still set a non-root prefix. > I think all the clean-up value e68237bb has are on the calling side > (they no longer have to pass constant ("", 0) to the function), and > we could rewrite e68237bb by > > - renaming "read_tree_recursive()" to "read_tree_at()", with the > non-empty prefix support. > > - creating a new function "read_tree()", which lacks the support > for prefix, as a thin-wrapper around "read_tree_at()". > > - modifying the callers of "read_tree_recursive()" changed by > e68237bb to instead call "read_tree()" (without prefix). > > to simplify majority of calling sites without losing functionality. > > Then your [05/20] can use the read_tree_at() to read with a prefix. 
> > > But that kind of details, I'd want to see you two figure out > yourselves. You've given us a great proposal. I'll wait for Ævar to chime in (and probably update his topic) before I submit a new version. Thanks, -Stolee ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [PATCH v2 05/20] sparse-index: implement ensure_full_index() 2021-03-12 20:11 ` Derrick Stolee @ 2021-03-15 23:52 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-15 23:52 UTC (permalink / raw) To: Derrick Stolee Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git, newren, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Fri, Mar 12 2021, Derrick Stolee wrote: > On 3/12/2021 3:08 PM, Junio C Hamano wrote: >> Derrick Stolee <stolee@gmail.com> writes: >> >>>> Ævar, the assumption that led to your e68237bb (tree.h API: remove >>>> support for starting at prefix != "", 2021-03-08) closes the door >>>> for this code rather badly. Please work with Derrick to figure out >>>> what the best course of action would be. >>> >>> Thanks for pointing this out, Junio. >>> >>> My preference would be to drop "tree.h API: remove support for >>> starting at prefix != """, but it should be OK to keep "tree.h API: >>> remove "stage" parameter from read_tree_recursive()" (currently >>> b3a078863f6), even though it introduces a semantic conflict here. >>> >>> Since I haven't seen my sparse-index topic get picked up by a >>> tracking branch, I'd be happy to rebase on top of Ævar's topic if >>> I can still set a non-root prefix. >> I think all the clean-up value e68237bb has are on the calling side >> (they no longer have to pass constant ("", 0) to the function), and >> we could rewrite e68237bb by >> >> - renaming "read_tree_recursive()" to "read_tree_at()", with the >> non-empty prefix support. >> >> - creating a new function "read_tree()", which lacks the support >> for prefix, as a thin-wrapper around "read_tree_at()". >> >> - modifying the callers of "read_tree_recursive()" changed by >> e68237bb to instead call "read_tree()" (without prefix). >> >> to simplify majority of calling sites without losing functionality. 
>> >> Then your [05/20] can use the read_tree_at() to read with a prefix. >> >> >> But that kind of details, I'd want to see you two figure out >> yourselves. > > You've given us a great proposal. I'll wait for Ævar to chime in > (and probably update his topic) before I submit a new version. I've re-rolled my series just now at https://lore.kernel.org/git/20210315234344.28427-1-avarab@gmail.com/ Sorry for the delay. You should be able to rebase easily on top of it, although note that the new read_tree_at() uses a strbuf, but is otherwise the same as the old read_tree_recursive(). Note that the pathspec can also be used to get to where read_tree_recursive() would have brought you. I haven't looked at whether there are reasons to convert in-tree (or this) code to pathspec use, or vice-versa convert some things that use pathspecs now (e.g. ls-tree with a path) to providing a prefix via the strbuf. ^ permalink raw reply [flat|nested] 203+ messages in thread
* [PATCH v2 06/20] t1092: compare sparse-checkout to sparse-index 2021-03-10 19:30 ` [PATCH v2 " Derrick Stolee via GitGitGadget ` (4 preceding siblings ...) 2021-03-10 19:30 ` [PATCH v2 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 ` Derrick Stolee via GitGitGadget 2021-03-10 23:04 ` Elijah Newren 2021-03-10 19:30 ` [PATCH v2 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget ` (15 subsequent siblings) 21 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Add a new 'sparse-index' repo alongside the 'full-checkout' and 'sparse-checkout' repos in t1092-sparse-checkout-compatibility.sh. Also add run_on_sparse and test_sparse_match helpers. These helpers will be used when the sparse index is implemented. Add GIT_TEST_SPARSE_INDEX environment variable to enable the sparse-index by default. This will be intended to use across the entire test suite, except that it will only affect cases where the sparse-checkout feature is enabled. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/README | 3 +++ t/t1092-sparse-checkout-compatibility.sh | 24 ++++++++++++++++++++---- 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/t/README b/t/README index 593d4a4e270c..b98bc563aab5 100644 --- a/t/README +++ b/t/README @@ -439,6 +439,9 @@ and "sha256". GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the 'pack.writeReverseIndex' setting. +GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the +sparse-index format by default. 
+ Naming Tests ------------ diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 3725d3997e70..71d6f9e4c014 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -7,6 +7,7 @@ test_description='compare full workdir to sparse workdir' test_expect_success 'setup' ' git init initial-repo && ( + GIT_TEST_SPARSE_INDEX=0 && cd initial-repo && echo a >a && echo "after deep" >e && @@ -87,23 +88,32 @@ init_repos () { cp -r initial-repo sparse-checkout && git -C sparse-checkout reset --hard && - git -C sparse-checkout sparse-checkout init --cone && + + cp -r initial-repo sparse-index && + git -C sparse-index reset --hard && # initialize sparse-checkout definitions - git -C sparse-checkout sparse-checkout set deep + git -C sparse-checkout sparse-checkout init --cone && + git -C sparse-checkout sparse-checkout set deep && + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep } run_on_sparse () { ( cd sparse-checkout && - "$@" >../sparse-checkout-out 2>../sparse-checkout-err + GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err + ) && + ( + cd sparse-index && + GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err ) } run_on_all () { ( cd full-checkout && - "$@" >../full-checkout-out 2>../full-checkout-err + GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err ) && run_on_sparse "$@" } @@ -114,6 +124,12 @@ test_all_match () { test_cmp full-checkout-err sparse-checkout-err } +test_sparse_match () { + run_on_sparse $* && + test_cmp sparse-checkout-out sparse-index-out && + test_cmp sparse-checkout-err sparse-index-err +} + test_expect_success 'status with options' ' init_repos && test_all_match git status --porcelain=v2 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* Re: [PATCH v2 06/20] t1092: compare sparse-checkout to sparse-index 2021-03-10 19:30 ` [PATCH v2 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget @ 2021-03-10 23:04 ` Elijah Newren 2021-03-11 14:17 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Elijah Newren @ 2021-03-10 23:04 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 10, 2021 at 11:31 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > Add a new 'sparse-index' repo alongside the 'full-checkout' and > 'sparse-checkout' repos in t1092-sparse-checkout-compatibility.sh. Also > add run_on_sparse and test_sparse_match helpers. These helpers will be > used when the sparse index is implemented. > > Add GIT_TEST_SPARSE_INDEX environment variable to enable the > sparse-index by default. This will be intended to use across the entire > test suite, except that it will only affect cases where the > sparse-checkout feature is enabled. This last sentence was a bit awkward to read. "will be intended to use" -> "is intended to be used"? > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > t/README | 3 +++ > t/t1092-sparse-checkout-compatibility.sh | 24 ++++++++++++++++++++---- > 2 files changed, 23 insertions(+), 4 deletions(-) > > diff --git a/t/README b/t/README > index 593d4a4e270c..b98bc563aab5 100644 > --- a/t/README > +++ b/t/README > @@ -439,6 +439,9 @@ and "sha256". > GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the > 'pack.writeReverseIndex' setting. > > +GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the > +sparse-index format by default. 
> + > Naming Tests > ------------ > > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 3725d3997e70..71d6f9e4c014 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -7,6 +7,7 @@ test_description='compare full workdir to sparse workdir' > test_expect_success 'setup' ' > git init initial-repo && > ( > + GIT_TEST_SPARSE_INDEX=0 && > cd initial-repo && > echo a >a && > echo "after deep" >e && > @@ -87,23 +88,32 @@ init_repos () { > > cp -r initial-repo sparse-checkout && > git -C sparse-checkout reset --hard && > - git -C sparse-checkout sparse-checkout init --cone && > + > + cp -r initial-repo sparse-index && > + git -C sparse-index reset --hard && > > # initialize sparse-checkout definitions > - git -C sparse-checkout sparse-checkout set deep > + git -C sparse-checkout sparse-checkout init --cone && > + git -C sparse-checkout sparse-checkout set deep && > + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && > + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep > } > > run_on_sparse () { > ( > cd sparse-checkout && > - "$@" >../sparse-checkout-out 2>../sparse-checkout-err > + GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err > + ) && > + ( > + cd sparse-index && > + GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err > ) > } > > run_on_all () { > ( > cd full-checkout && > - "$@" >../full-checkout-out 2>../full-checkout-err > + GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err > ) && > run_on_sparse "$@" > } > @@ -114,6 +124,12 @@ test_all_match () { > test_cmp full-checkout-err sparse-checkout-err > } > > +test_sparse_match () { > + run_on_sparse $* && Should this be run_on_sparse "$@" in order to allow arguments with spaces? 
> + test_cmp sparse-checkout-out sparse-index-out && > + test_cmp sparse-checkout-err sparse-index-err > +} > + > test_expect_success 'status with options' ' > init_repos && > test_all_match git status --porcelain=v2 && > -- > gitgitgadget > ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [PATCH v2 06/20] t1092: compare sparse-checkout to sparse-index 2021-03-10 23:04 ` Elijah Newren @ 2021-03-11 14:17 ` Derrick Stolee 0 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-11 14:17 UTC (permalink / raw) To: Elijah Newren, Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On 3/10/2021 6:04 PM, Elijah Newren wrote: > On Wed, Mar 10, 2021 at 11:31 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> Add GIT_TEST_SPARSE_INDEX environment variable to enable the >> sparse-index by default. This will be intended to use across the entire >> test suite, except that it will only affect cases where the >> sparse-checkout feature is enabled. > > This last sentence was a bit awkward to read. "will be intended to > use" -> "is intended to be used"? Fixed locally to: Add the GIT_TEST_SPARSE_INDEX environment variable to enable the sparse-index by default. This can be enabled across all tests, but that will only affect cases where the sparse-checkout feature is enabled. >> +test_sparse_match () { >> + run_on_sparse $* && > > Should this be > run_on_sparse "$@" > in order to allow arguments with spaces? Sorry I missed this one. It was fixed to the right use in "sparse-index: convert from full to sparse" so I thought I had already covered this one when looking at the tip of my branch. Thanks, -Stolee ^ permalink raw reply [flat|nested] 203+ messages in thread
* [PATCH v2 07/20] test-read-cache: print cache entries with --table 2021-03-10 19:30 ` [PATCH v2 " Derrick Stolee via GitGitGadget ` (5 preceding siblings ...) 2021-03-10 19:30 ` [PATCH v2 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 ` Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget ` (14 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> This table is helpful for discovering data in the index to ensure it is being written correctly, especially as we build and test the sparse-index. This table includes an output format similar to 'git ls-tree', but should not be compared to that directly. The biggest reasons are that 'git ls-tree' includes a tree entry for every subdirectory, even those that would not appear as a sparse directory in a sparse-index. Further, 'git ls-tree' does not use a trailing directory separator for its tree rows. This does not print the stat() information for the blobs. That could be added in a future change with another option. The tests that are added in the next few changes care only about the object types and IDs. To make the option parsing slightly more robust, wrap the string comparisons in a loop adapted from test-dir-iterator.c. Care must be taken with the final check for the 'cnt' variable. We continue the expectation that the numerical value is the final argument. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/helper/test-read-cache.c | 55 +++++++++++++++++++++++++++++++------- 1 file changed, 45 insertions(+), 10 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 244977a29bdf..6cfd8f2de71c 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -1,36 +1,71 @@ #include "test-tool.h" #include "cache.h" #include "config.h" +#include "blob.h" +#include "commit.h" +#include "tree.h" + +static void print_cache_entry(struct cache_entry *ce) +{ + const char *type; + printf("%06o ", ce->ce_mode & 0177777); + + if (S_ISSPARSEDIR(ce->ce_mode)) + type = tree_type; + else if (S_ISGITLINK(ce->ce_mode)) + type = commit_type; + else + type = blob_type; + + printf("%s %s\t%s\n", + type, + oid_to_hex(&ce->oid), + ce->name); +} + +static void print_cache(struct index_state *istate) +{ + int i; + for (i = 0; i < istate->cache_nr; i++) + print_cache_entry(istate->cache[i]); +} int cmd__read_cache(int argc, const char **argv) { + struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; + int table = 0; - if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) { - argc--; - argv++; + for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { + if (skip_prefix(*argv, "--print-and-refresh=", &name)) + continue; + if (!strcmp(*argv, "--table")) + table = 1; } - if (argc == 2) - cnt = strtol(argv[1], NULL, 0); + if (argc == 1) + cnt = strtol(argv[0], NULL, 0); setup_git_directory(); git_config(git_default_config, NULL); + for (i = 0; i < cnt; i++) { - read_cache(); + repo_read_index(r); if (name) { int pos; - refresh_index(&the_index, REFRESH_QUIET, + refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL); - pos = index_name_pos(&the_index, name, strlen(name)); + pos = index_name_pos(r->index, name, strlen(name)); if (pos < 0) die("%s not in index", name); printf("%s is%s up to date\n", name, - ce_uptodate(the_index.cache[pos]) ? 
"" : " not"); + ce_uptodate(r->index->cache[pos]) ? "" : " not"); write_file(name, "%d\n", i); } - discard_cache(); + if (table) + print_cache(r->index); + discard_index(r->index); } return 0; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* [PATCH v2 08/20] test-tool: don't force full index 2021-03-10 19:30 ` [PATCH v2 " Derrick Stolee via GitGitGadget ` (6 preceding siblings ...) 2021-03-10 19:30 ` [PATCH v2 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 ` Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget ` (13 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We will use 'test-tool read-cache --table' to check that a sparse index is written as part of init_repos. Since we will no longer always expand a sparse index into a full index, add an '--expand' parameter that adds a call to ensure_full_index() so we can compare a sparse index directly against a full index, or at least what the in-memory index looks like when expanded in this way. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/helper/test-read-cache.c | 13 ++++++++++++- t/t1092-sparse-checkout-compatibility.sh | 5 +++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 6cfd8f2de71c..b52c174acc7a 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -4,6 +4,7 @@ #include "blob.h" #include "commit.h" #include "tree.h" +#include "sparse-index.h" static void print_cache_entry(struct cache_entry *ce) { @@ -35,13 +36,19 @@ int cmd__read_cache(int argc, const char **argv) struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; - int table = 0; + int table = 0, expand = 0; + + initialize_the_repository(); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) continue; if (!strcmp(*argv, "--table")) table = 1; + else if (!strcmp(*argv, "--expand")) + expand = 1; } if (argc == 1) @@ -51,6 +58,10 @@ int cmd__read_cache(int argc, const char **argv) for (i = 0; i < cnt; i++) { repo_read_index(r); + + if (expand) + ensure_full_index(r->index); + if (name) { int pos; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 71d6f9e4c014..4d789fe86b9d 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -130,6 +130,11 @@ test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'expanded in-memory index matches full index' ' + init_repos && + test_sparse_match test-tool read-cache --expand --table +' + test_expect_success 'status with options' ' init_repos && test_all_match git status --porcelain=v2 && -- gitgitgadget
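The option loop in cmd__read_cache() consumes every leading `--` argument and then treats a single leftover argument as the iteration count. Below is a minimal standalone sketch of that parsing. The `struct read_cache_opts` and the `has_prefix()` helper (a stand-in for Git's skip_prefix()) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the options read-cache understands. */
struct read_cache_opts {
	const char *refresh_name; /* --print-and-refresh=<path> */
	int table;                /* --table */
	int expand;               /* --expand */
	int count;                /* optional trailing iteration count */
};

/* Hypothetical stand-in for Git's skip_prefix(). */
static int has_prefix(const char *s, const char *prefix, const char **rest)
{
	size_t n = strlen(prefix);

	if (strncmp(s, prefix, n))
		return 0;
	*rest = s + n;
	return 1;
}

/* argv must be NULL-terminated, as it is for main(). */
static void parse_read_cache_args(int argc, const char **argv,
				  struct read_cache_opts *o)
{
	*o = (struct read_cache_opts){ NULL, 0, 0, 1 };

	/* Skip the command name, then consume every leading "--" option. */
	for (++argv, --argc; *argv && !strncmp(*argv, "--", 2); ++argv, --argc) {
		if (has_prefix(*argv, "--print-and-refresh=", &o->refresh_name))
			continue;
		if (!strcmp(*argv, "--table"))
			o->table = 1;
		else if (!strcmp(*argv, "--expand"))
			o->expand = 1;
	}

	/* Exactly one argument left: the number of read iterations. */
	if (argc == 1)
		o->count = (int)strtol(argv[0], NULL, 0);
}
```

As in the patch, unrecognized `--` options are silently ignored rather than rejected. Options may appear in any order before the count, so `--expand --table 3` and `--table --expand 3` parse identically.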
* [PATCH v2 09/20] unpack-trees: ensure full index From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The next change will translate full indexes into sparse indexes at write time. The existing logic expands every sparse index to a full index at read time. However, there are cases where an index is written and then continues to be used in memory to perform further updates, and unpack_trees() is frequently called after such a write. In particular, commands like 'git reset' update the index twice in this way. Ensure that we have a full index when entering unpack_trees(), but only when command_requires_full_index is true. This is always true at the moment, but we will relax it later, once unpack_trees() is updated to handle sparse directory entries.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/unpack-trees.c b/unpack-trees.c index f5f668f532d8..4dd99219073a 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1567,6 +1567,7 @@ static int verify_absent(const struct cache_entry *, */ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o) { + struct repository *repo = the_repository; int i, ret; static struct cache_entry *dfc; struct pattern_list pl; @@ -1578,6 +1579,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options trace_performance_enter(); trace2_region_enter("unpack_trees", "unpack_trees", the_repository); + prepare_repo_settings(repo); + if (repo->settings.command_requires_full_index) { + ensure_full_index(o->src_index); + ensure_full_index(o->dst_index); + } + if (!core_apply_sparse_checkout || !o->update) o->skip_sparse_checkout = 1; if (!o->skip_sparse_checkout && !o->pl) { -- gitgitgadget
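When a command requires a full index, the guard in unpack_trees() calls ensure_full_index() on both src_index and dst_index unconditionally. That is only cheap if expanding an index that is already full does nothing. The toy sketch below illustrates that assumed idempotent contract. `struct toy_index`, `toy_ensure_full()` and `toy_unpack_entry()` are hypothetical stand-ins, not Git APIs, and the toy simply tolerates a NULL index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy index: only tracks whether it holds sparse entries. */
struct toy_index {
	bool sparse;    /* contains sparse directory entries */
	int expansions; /* how many times it was actually expanded */
};

/* Assumed contract: expanding a full (or absent) index does nothing. */
static void toy_ensure_full(struct toy_index *idx)
{
	if (!idx || !idx->sparse)
		return;
	idx->sparse = false;
	idx->expansions++;
}

/* Entry guard: expand both indexes unless the command is sparse-aware. */
static void toy_unpack_entry(bool requires_full, struct toy_index *src,
			     struct toy_index *dst)
{
	if (requires_full) {
		toy_ensure_full(src);
		toy_ensure_full(dst);
	}
	/* ... the tree-walking merge would follow here ... */
}
```

Calling the guard twice on the same sparse index expands it only once. When the command does not require a full index, the index is left sparse.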
* [PATCH v2 10/20] sparse-checkout: hold pattern list in index From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> As we modify the sparse-checkout definition, we perform index operations on a pattern_list that exists only in memory. This allows easy backing out in case the index update fails. However, if the index write itself depends on the sparse-checkout pattern set, it needs access to that in-memory copy. Place a pointer to a 'struct pattern_list' in the index so we can access it on demand. The next change will use this to filter out directories that are outside the sparse cone defined by the sparse-checkout patterns.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/sparse-checkout.c | 17 ++++++++++------- cache.h | 2 ++ 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index 2306a9ad98e0..e00b82af727b 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl) if (is_index_unborn(r->index)) return UPDATE_SPARSITY_SUCCESS; + r->index->sparse_checkout_patterns = pl; + memset(&o, 0, sizeof(o)); o.verbose_update = isatty(2); o.update = 1; @@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl) else rollback_lock_file(&lock_file); + r->index->sparse_checkout_patterns = NULL; return result; } @@ -517,19 +520,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) { int result; int changed_config = 0; - struct pattern_list pl; - memset(&pl, 0, sizeof(pl)); + struct pattern_list *pl = xcalloc(1, sizeof(*pl)); switch (m) { case ADD: if (core_sparse_checkout_cone) - add_patterns_cone_mode(argc, argv, &pl); + add_patterns_cone_mode(argc, argv, pl); else - add_patterns_literal(argc, argv, &pl); + add_patterns_literal(argc, argv, pl); break; case REPLACE: - add_patterns_from_input(&pl, argc, argv); + add_patterns_from_input(pl, argc, argv); break; } @@ -539,12 +541,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) changed_config = 1; } - result = write_patterns_and_update(&pl); + result = write_patterns_and_update(pl); if (result && changed_config) set_config(MODE_NO_PATTERNS); - clear_pattern_list(&pl); + clear_pattern_list(pl); + free(pl); return result; } diff --git a/cache.h b/cache.h index 1f0b42264606..303411726e10 100644 --- a/cache.h +++ b/cache.h @@ -307,6 +307,7 @@ static inline unsigned int canon_mode(unsigned int mode) struct split_index; struct untracked_cache; struct progress; +struct pattern_list; struct 
index_state { struct cache_entry **cache; @@ -338,6 +339,7 @@ struct index_state { struct mem_pool *ce_mem_pool; struct progress *progress; struct repository *repo; + struct pattern_list *sparse_checkout_patterns; }; /* Name hashing */ -- gitgitgadget
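In this patch, the index only borrows the caller's pattern list for the duration of update_working_directory(). The pointer is set before the index update and reset to NULL before returning, so modify_pattern_list() still clears and frees the list itself. Below is a toy sketch of that ownership pattern. `struct toy_patterns`, `struct toy_index` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct pattern_list and struct index_state. */
struct toy_patterns { int nr; };
struct toy_index { struct toy_patterns *sparse_checkout_patterns; };

/* A write path that wants the in-memory pattern set, if one is attached. */
static int toy_write_index(const struct toy_index *idx)
{
	const struct toy_patterns *pl = idx->sparse_checkout_patterns;

	return pl ? pl->nr : -1;
}

/*
 * The index only borrows 'pl': it is attached for the duration of the
 * update and detached before returning, so the caller still owns it.
 */
static int toy_update_working_directory(struct toy_index *idx,
					struct toy_patterns *pl)
{
	int result;

	idx->sparse_checkout_patterns = pl;
	result = toy_write_index(idx);
	idx->sparse_checkout_patterns = NULL;
	return result;
}
```

Because the pointer is detached before returning, freeing the list afterwards cannot leave the index holding a dangling reference.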
* [PATCH v2 11/20] sparse-index: convert from full to sparse From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method performs this conversion when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently convert only when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest part of this conversion is that being outside the sparse cone is not, by itself, enough to mark a directory as a sparse directory. There might be unmerged files within that directory, so we need to look for those. Also, if for some unexpected reason a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. Some of its subdirectories might still be convertible, so we keep looking deeper. The conversion process is assisted by the cache-tree extension, which is calculated from the full index if it does not already exist.
We then abandon the cache-tree, as it no longer applies to the newly sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but only until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compatibility.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 3 + cache.h | 2 + read-cache.c | 26 ++++- sparse-index.c | 139 +++++++++++++++++++++++ sparse-index.h | 1 + t/t1092-sparse-checkout-compatibility.sh | 61 +++++++++- 6 files changed, 227 insertions(+), 5 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 2fb483d3c083..5f07a39e501e 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -6,6 +6,7 @@ #include "object-store.h" #include "replace-object.h" #include "promisor-remote.h" +#include "sparse-index.h" #ifndef DEBUG_CACHE_TREE #define DEBUG_CACHE_TREE 0 @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags) if (i) return i; + ensure_full_index(istate); + if (!istate->cache_tree) istate->cache_tree = cache_tree(); diff --git a/cache.h b/cache.h index 303411726e10..9217d405b9b8 100644 --- a/cache.h +++ b/cache.h @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) { if (S_ISLNK(mode)) return S_IFLNK; + if (mode == S_IFDIR) + return S_IFDIR; if (S_ISDIR(mode) || S_ISGITLINK(mode)) return S_IFGITLINK; return S_IFREG | ce_permissions(mode); diff --git a/read-cache.c b/read-cache.c index 97dbf2434f30..92126b9d23c9 100644 --- a/read-cache.c +++ b/read-cache.c @@ -25,6 +25,7 @@ #include "fsmonitor.h" #include "thread-utils.h" #include "progress.h" +#include "sparse-index.h" /* Mask for the name length in ce_flags in the on-disk index */ @@ -1002,8 +1003,14 @@ int verify_path(const char *path, unsigned mode) c = *path++; if ((c == '.' && !verify_dotfile(path, mode)) || - is_dir_sep(c) || c == '\0') + is_dir_sep(c)) return 0; + /* + * allow terminating directory separators for + * sparse directory entries. 
+ */ + if (c == '\0') + return S_ISDIR(mode); } else if (c == '\\' && protect_ntfs) { if (is_ntfs_dotgit(path)) return 0; @@ -3061,6 +3068,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l unsigned flags) { int ret; + int was_full = !istate->sparse_index; + + ret = convert_to_sparse(istate); + + if (ret) { + warning(_("failed to convert to a sparse-index")); + return ret; + } /* * TODO trace2: replace "the_repository" with the actual repo instance @@ -3072,6 +3087,9 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l trace2_region_leave_printf("index", "do_write_index", the_repository, "%s", get_lock_file_path(lock)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; if (flags & COMMIT_LOCK) @@ -3162,9 +3180,10 @@ static int write_shared_index(struct index_state *istate, struct tempfile **temp) { struct split_index *si = istate->split_index; - int ret; + int ret, was_full = !istate->sparse_index; move_cache_to_base_index(istate); + convert_to_sparse(istate); trace2_region_enter_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); @@ -3172,6 +3191,9 @@ static int write_shared_index(struct index_state *istate, trace2_region_leave_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; ret = adjust_shared_perm(get_tempfile_path(*temp)); diff --git a/sparse-index.c b/sparse-index.c index 316cb949b74b..5eb561259bb1 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -4,6 +4,145 @@ #include "tree.h" #include "pathspec.h" #include "trace2.h" +#include "cache-tree.h" +#include "config.h" +#include "dir.h" +#include "fsmonitor.h" + +static struct cache_entry *construct_sparse_dir_entry( + struct index_state *istate, + const char *sparse_dir, + struct cache_tree *tree) +{ + struct cache_entry *de; + + de = make_cache_entry(istate, S_IFDIR, &tree->oid, 
sparse_dir, 0, 0); + + de->ce_flags |= CE_SKIP_WORKTREE; + return de; +} + +/* + * Returns the number of entries "inserted" into the index. + */ +static int convert_to_sparse_rec(struct index_state *istate, + int num_converted, + int start, int end, + const char *ct_path, size_t ct_pathlen, + struct cache_tree *ct) +{ + int i, can_convert = 1; + int start_converted = num_converted; + enum pattern_match_result match; + int dtype; + struct strbuf child_path = STRBUF_INIT; + struct pattern_list *pl = istate->sparse_checkout_patterns; + + /* + * Is the current path outside of the sparse cone? + * Then check if the region can be replaced by a sparse + * directory entry (everything is sparse and merged). + */ + match = path_matches_pattern_list(ct_path, ct_pathlen, + NULL, &dtype, pl, istate); + if (match != NOT_MATCHED) + can_convert = 0; + + for (i = start; can_convert && i < end; i++) { + struct cache_entry *ce = istate->cache[i]; + + if (ce_stage(ce) || + !(ce->ce_flags & CE_SKIP_WORKTREE)) + can_convert = 0; + } + + if (can_convert) { + struct cache_entry *se; + se = construct_sparse_dir_entry(istate, ct_path, ct); + + istate->cache[num_converted++] = se; + return 1; + } + + for (i = start; i < end; ) { + int count, span, pos = -1; + const char *base, *slash; + struct cache_entry *ce = istate->cache[i]; + + /* + * Detect if this is a normal entry outside of any subtree + * entry. 
+ */ + base = ce->name + ct_pathlen; + slash = strchr(base, '/'); + + if (slash) + pos = cache_tree_subtree_pos(ct, base, slash - base); + + if (pos < 0) { + istate->cache[num_converted++] = ce; + i++; + continue; + } + + strbuf_setlen(&child_path, 0); + strbuf_add(&child_path, ce->name, slash - ce->name + 1); + + span = ct->down[pos]->cache_tree->entry_count; + count = convert_to_sparse_rec(istate, + num_converted, i, i + span, + child_path.buf, child_path.len, + ct->down[pos]->cache_tree); + num_converted += count; + i += span; + } + + strbuf_release(&child_path); + return num_converted - start_converted; +} + +int convert_to_sparse(struct index_state *istate) +{ + if (istate->split_index || istate->sparse_index || + !core_apply_sparse_checkout || !core_sparse_checkout_cone) + return 0; + + /* + * For now, only create a sparse index with the + * GIT_TEST_SPARSE_INDEX environment variable. We will relax + * this once we have a proper way to opt-in (and later still, + * opt-out). + */ + if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + return 0; + + if (!istate->sparse_checkout_patterns) { + istate->sparse_checkout_patterns = xcalloc(1, sizeof(struct pattern_list)); + if (get_sparse_checkout_patterns(istate->sparse_checkout_patterns) < 0) + return 0; + } + + if (!istate->sparse_checkout_patterns->use_cone_patterns) { + warning(_("attempting to use sparse-index without cone mode")); + return -1; + } + + if (cache_tree_update(istate, 0)) { + warning(_("unable to update cache-tree, staying full")); + return -1; + } + + remove_fsmonitor(istate); + + trace2_region_enter("index", "convert_to_sparse", istate->repo); + istate->cache_nr = convert_to_sparse_rec(istate, + 0, 0, istate->cache_nr, + "", 0, istate->cache_tree); + istate->drop_cache_tree = 1; + istate->sparse_index = 1; + trace2_region_leave("index", "convert_to_sparse", istate->repo); + return 0; +} static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) { diff --git 
a/sparse-index.h b/sparse-index.h index 09a20d036c46..64380e121d80 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -3,5 +3,6 @@ struct index_state; void ensure_full_index(struct index_state *istate); +int convert_to_sparse(struct index_state *istate); #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 4d789fe86b9d..ca87033d30b0 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -2,6 +2,9 @@ test_description='compare full workdir to sparse workdir' +GIT_TEST_CHECK_CACHE_TREE=0 +GIT_TEST_SPLIT_INDEX=0 + . ./test-lib.sh test_expect_success 'setup' ' @@ -121,15 +124,49 @@ run_on_all () { test_all_match () { run_on_all "$@" && test_cmp full-checkout-out sparse-checkout-out && - test_cmp full-checkout-err sparse-checkout-err + test_cmp full-checkout-out sparse-index-out && + test_cmp full-checkout-err sparse-checkout-err && + test_cmp full-checkout-err sparse-index-err } test_sparse_match () { - run_on_sparse $* && + run_on_sparse "$@" && test_cmp sparse-checkout-out sparse-index-out && test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'sparse-index contents' ' + init_repos && + + test-tool -C sparse-index read-cache --table >cache && + for dir in folder1 folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep/deeper2 folder1 folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE 
$dir/" cache \ + || return 1 + done +' + test_expect_success 'expanded in-memory index matches full index' ' init_repos && test_sparse_match test-tool read-cache --expand --table @@ -137,6 +174,7 @@ test_expect_success 'expanded in-memory index matches full index' ' test_expect_success 'status with options' ' init_repos && + test_sparse_match ls && test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && @@ -273,6 +311,17 @@ test_expect_failure 'checkout and reset (mixed)' ' test_all_match git reset update-folder2 ' +# Ensure that sparse-index behaves identically to +# sparse-checkout with a full index. +test_expect_success 'checkout and reset (mixed) [sparse]' ' + init_repos && + + test_sparse_match git checkout -b reset-test update-deep && + test_sparse_match git reset deepest && + test_sparse_match git reset update-folder1 && + test_sparse_match git reset update-folder2 +' + test_expect_success 'merge' ' init_repos && @@ -309,14 +358,20 @@ test_expect_success 'clean' ' test_all_match git status --porcelain=v2 && test_all_match git clean -f && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xdf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && - test_path_is_dir sparse-checkout/folder1 + test_sparse_match test_path_is_dir folder1 ' test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* Re: [PATCH v2 11/20] sparse-index: convert from full to sparse From: Elijah Newren @ 2021-03-10 23:44 UTC To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 10, 2021 at 11:31 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > If we have a full index, then we can convert it to a sparse index by > replacing directories outside of the sparse cone with sparse directory > entries. The convert_to_sparse() method does this, when the situation is > appropriate. > > For now, we avoid converting the index to a sparse index if: > > 1. the index is split. > 2. the index is already sparse. > 3. sparse-checkout is disabled. > 4. sparse-checkout does not use cone mode. > > Finally, we currently limit the conversion to when the > GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git > config will be added in a later change. > > The trickiest thing about this conversion is that we might not be able > to mark a directory as a sparse directory just because it is outside the > sparse cone. There might be unmerged files within that directory, so we > need to look for those. Also, if there is some strange reason why a file > is not marked with CE_SKIP_WORKTREE, then we should give up on > converting that directory. There is still hope that some of its > subdirectories might be able to convert to sparse, so we keep looking > deeper. > > The conversion process is assisted by the cache-tree extension. This is > calculated from the full index if it does not already exist.
We then > abandon the cache-tree as it no longer applies to the newly-sparse > index. Thus, this cache-tree will be recalculated in every > sparse-full-sparse round-trip until we integrate the cache-tree > extension with the sparse index. > > Some Git commands use the index after writing it. For example, 'git add' > will update the index, then write it to disk, then read its entries to > report information. To keep the in-memory index in a full state after > writing, we re-expand it to a full one after the write. This is wasteful > for commands that only write the index and do not read from it again, > but that is only the case until we make those commands "sparse aware." > > We can compare the behavior of the sparse-index in > t1092-sparse-checkout-compatibility.sh by using GIT_TEST_SPARSE_INDEX=1 > when operating on the 'sparse-index' repo. We can also compare the two > sparse repos directly, such as comparing their indexes (when expanded to > full in the case of the 'sparse-index' repo). We also verify that the > index is actually populated with sparse directory entries. > > The 'checkout and reset (mixed)' test is marked for failure when > comparing a sparse repo to a full repo, but we can compare the two > sparse-checkout cases directly to ensure that we are not changing the > behavior when using a sparse index.
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > cache-tree.c | 3 + > cache.h | 2 + > read-cache.c | 26 ++++- > sparse-index.c | 139 +++++++++++++++++++++++ > sparse-index.h | 1 + > t/t1092-sparse-checkout-compatibility.sh | 61 +++++++++- > 6 files changed, 227 insertions(+), 5 deletions(-) > > diff --git a/cache-tree.c b/cache-tree.c > index 2fb483d3c083..5f07a39e501e 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -6,6 +6,7 @@ > #include "object-store.h" > #include "replace-object.h" > #include "promisor-remote.h" > +#include "sparse-index.h" > > #ifndef DEBUG_CACHE_TREE > #define DEBUG_CACHE_TREE 0 > @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags) > if (i) > return i; > > + ensure_full_index(istate); > + > if (!istate->cache_tree) > istate->cache_tree = cache_tree(); > > diff --git a/cache.h b/cache.h > index 303411726e10..9217d405b9b8 100644 > --- a/cache.h > +++ b/cache.h > @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) > { > if (S_ISLNK(mode)) > return S_IFLNK; > + if (mode == S_IFDIR) > + return S_IFDIR; > if (S_ISDIR(mode) || S_ISGITLINK(mode)) > return S_IFGITLINK; > return S_IFREG | ce_permissions(mode); > diff --git a/read-cache.c b/read-cache.c > index 97dbf2434f30..92126b9d23c9 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -25,6 +25,7 @@ > #include "fsmonitor.h" > #include "thread-utils.h" > #include "progress.h" > +#include "sparse-index.h" > > /* Mask for the name length in ce_flags in the on-disk index */ > > @@ -1002,8 +1003,14 @@ int verify_path(const char *path, unsigned mode) > > c = *path++; > if ((c == '.' && !verify_dotfile(path, mode)) || > - is_dir_sep(c) || c == '\0') > + is_dir_sep(c)) > return 0; > + /* > + * allow terminating directory separators for > + * sparse directory entries. 
> + */ > + if (c == '\0') > + return S_ISDIR(mode); > } else if (c == '\\' && protect_ntfs) { > if (is_ntfs_dotgit(path)) > return 0; > @@ -3061,6 +3068,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l > unsigned flags) > { > int ret; > + int was_full = !istate->sparse_index; > + > + ret = convert_to_sparse(istate); > + > + if (ret) { > + warning(_("failed to convert to a sparse-index")); > + return ret; > + } > > /* > * TODO trace2: replace "the_repository" with the actual repo instance > @@ -3072,6 +3087,9 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l > trace2_region_leave_printf("index", "do_write_index", the_repository, > "%s", get_lock_file_path(lock)); > > + if (was_full) > + ensure_full_index(istate); > + > if (ret) > return ret; > if (flags & COMMIT_LOCK) > @@ -3162,9 +3180,10 @@ static int write_shared_index(struct index_state *istate, > struct tempfile **temp) > { > struct split_index *si = istate->split_index; > - int ret; > + int ret, was_full = !istate->sparse_index; > > move_cache_to_base_index(istate); > + convert_to_sparse(istate); > > trace2_region_enter_printf("index", "shared/do_write_index", > the_repository, "%s", get_tempfile_path(*temp)); > @@ -3172,6 +3191,9 @@ static int write_shared_index(struct index_state *istate, > trace2_region_leave_printf("index", "shared/do_write_index", > the_repository, "%s", get_tempfile_path(*temp)); > > + if (was_full) > + ensure_full_index(istate); > + > if (ret) > return ret; > ret = adjust_shared_perm(get_tempfile_path(*temp)); > diff --git a/sparse-index.c b/sparse-index.c > index 316cb949b74b..5eb561259bb1 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -4,6 +4,145 @@ > #include "tree.h" > #include "pathspec.h" > #include "trace2.h" > +#include "cache-tree.h" > +#include "config.h" > +#include "dir.h" > +#include "fsmonitor.h" > + > +static struct cache_entry *construct_sparse_dir_entry( > + struct index_state *istate, > 
+ const char *sparse_dir, > + struct cache_tree *tree) > +{ > + struct cache_entry *de; > + > + de = make_cache_entry(istate, S_IFDIR, &tree->oid, sparse_dir, 0, 0); > + > + de->ce_flags |= CE_SKIP_WORKTREE; > + return de; > +} > + > +/* > + * Returns the number of entries "inserted" into the index. > + */ > +static int convert_to_sparse_rec(struct index_state *istate, > + int num_converted, > + int start, int end, > + const char *ct_path, size_t ct_pathlen, > + struct cache_tree *ct) > +{ > + int i, can_convert = 1; > + int start_converted = num_converted; > + enum pattern_match_result match; > + int dtype; > + struct strbuf child_path = STRBUF_INIT; > + struct pattern_list *pl = istate->sparse_checkout_patterns; > + > + /* > + * Is the current path outside of the sparse cone? > + * Then check if the region can be replaced by a sparse > + * directory entry (everything is sparse and merged). > + */ > + match = path_matches_pattern_list(ct_path, ct_pathlen, > + NULL, &dtype, pl, istate); > + if (match != NOT_MATCHED) > + can_convert = 0; > + > + for (i = start; can_convert && i < end; i++) { > + struct cache_entry *ce = istate->cache[i]; > + > + if (ce_stage(ce) || > + !(ce->ce_flags & CE_SKIP_WORKTREE)) > + can_convert = 0; > + } > + > + if (can_convert) { > + struct cache_entry *se; > + se = construct_sparse_dir_entry(istate, ct_path, ct); > + > + istate->cache[num_converted++] = se; > + return 1; > + } > + > + for (i = start; i < end; ) { > + int count, span, pos = -1; > + const char *base, *slash; > + struct cache_entry *ce = istate->cache[i]; > + > + /* > + * Detect if this is a normal entry outside of any subtree > + * entry. 
> + */ > + base = ce->name + ct_pathlen; > + slash = strchr(base, '/'); > + > + if (slash) > + pos = cache_tree_subtree_pos(ct, base, slash - base); > + > + if (pos < 0) { > + istate->cache[num_converted++] = ce; > + i++; > + continue; > + } > + > + strbuf_setlen(&child_path, 0); > + strbuf_add(&child_path, ce->name, slash - ce->name + 1); > + > + span = ct->down[pos]->cache_tree->entry_count; > + count = convert_to_sparse_rec(istate, > + num_converted, i, i + span, > + child_path.buf, child_path.len, > + ct->down[pos]->cache_tree); > + num_converted += count; > + i += span; > + } > + > + strbuf_release(&child_path); > + return num_converted - start_converted; > +} > + > +int convert_to_sparse(struct index_state *istate) > +{ > + if (istate->split_index || istate->sparse_index || > + !core_apply_sparse_checkout || !core_sparse_checkout_cone) > + return 0; > + > + /* > + * For now, only create a sparse index with the > + * GIT_TEST_SPARSE_INDEX environment variable. We will relax > + * this once we have a proper way to opt-in (and later still, > + * opt-out). 
> + */ > + if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) > + return 0; > + > + if (!istate->sparse_checkout_patterns) { > + istate->sparse_checkout_patterns = xcalloc(1, sizeof(struct pattern_list)); > + if (get_sparse_checkout_patterns(istate->sparse_checkout_patterns) < 0) > + return 0; > + } > + > + if (!istate->sparse_checkout_patterns->use_cone_patterns) { > + warning(_("attempting to use sparse-index without cone mode")); > + return -1; > + } > + > + if (cache_tree_update(istate, 0)) { > + warning(_("unable to update cache-tree, staying full")); > + return -1; > + } > + > + remove_fsmonitor(istate); > + > + trace2_region_enter("index", "convert_to_sparse", istate->repo); > + istate->cache_nr = convert_to_sparse_rec(istate, > + 0, 0, istate->cache_nr, > + "", 0, istate->cache_tree); > + istate->drop_cache_tree = 1; > + istate->sparse_index = 1; > + trace2_region_leave("index", "convert_to_sparse", istate->repo); > + return 0; > +} > > static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) > { > diff --git a/sparse-index.h b/sparse-index.h > index 09a20d036c46..64380e121d80 100644 > --- a/sparse-index.h > +++ b/sparse-index.h > @@ -3,5 +3,6 @@ > > struct index_state; > void ensure_full_index(struct index_state *istate); > +int convert_to_sparse(struct index_state *istate); > > #endif > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 4d789fe86b9d..ca87033d30b0 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -2,6 +2,9 @@ > > test_description='compare full workdir to sparse workdir' > > +GIT_TEST_CHECK_CACHE_TREE=0 I still think it'd be nice to get a comment, either in the code or the commit message, explaining why your series needs to set GIT_TEST_CHECK_CACHE_TREE to 0. I feel like I should almost know the answer (was this just a preliminary step and it'll later be turned on? 
did the cache-tree checking do stuff that assumes no sparse directory entries? is it really slow?), but I don't. > +GIT_TEST_SPLIT_INDEX=0 > + > . ./test-lib.sh > > test_expect_success 'setup' ' > @@ -121,15 +124,49 @@ run_on_all () { > test_all_match () { > run_on_all "$@" && > test_cmp full-checkout-out sparse-checkout-out && > - test_cmp full-checkout-err sparse-checkout-err > + test_cmp full-checkout-out sparse-index-out && > + test_cmp full-checkout-err sparse-checkout-err && > + test_cmp full-checkout-err sparse-index-err > } > > test_sparse_match () { > - run_on_sparse $* && > + run_on_sparse "$@" && > test_cmp sparse-checkout-out sparse-index-out && > test_cmp sparse-checkout-err sparse-index-err > } > > +test_expect_success 'sparse-index contents' ' > + init_repos && > + > + test-tool -C sparse-index read-cache --table >cache && > + for dir in folder1 folder2 x > + do > + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && > + grep "040000 tree $TREE $dir/" cache \ > + || return 1 > + done && > + > + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && > + > + test-tool -C sparse-index read-cache --table >cache && > + for dir in deep folder2 x > + do > + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && > + grep "040000 tree $TREE $dir/" cache \ > + || return 1 > + done && > + > + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && > + > + test-tool -C sparse-index read-cache --table >cache && > + for dir in deep/deeper2 folder1 folder2 x > + do > + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && > + grep "040000 tree $TREE $dir/" cache \ > + || return 1 > + done > +' > + > test_expect_success 'expanded in-memory index matches full index' ' > init_repos && > test_sparse_match test-tool read-cache --expand --table > @@ -137,6 +174,7 @@ test_expect_success 'expanded in-memory index matches full index' ' > > test_expect_success 'status with options' ' > init_repos && > + test_sparse_match ls && > 
test_all_match git status --porcelain=v2 && > test_all_match git status --porcelain=v2 -z -u && > test_all_match git status --porcelain=v2 -uno && > @@ -273,6 +311,17 @@ test_expect_failure 'checkout and reset (mixed)' ' > test_all_match git reset update-folder2 > ' > > +# Ensure that sparse-index behaves identically to > +# sparse-checkout with a full index. > +test_expect_success 'checkout and reset (mixed) [sparse]' ' > + init_repos && > + > + test_sparse_match git checkout -b reset-test update-deep && > + test_sparse_match git reset deepest && > + test_sparse_match git reset update-folder1 && > + test_sparse_match git reset update-folder2 > +' > + > test_expect_success 'merge' ' > init_repos && > > @@ -309,14 +358,20 @@ test_expect_success 'clean' ' > test_all_match git status --porcelain=v2 && > test_all_match git clean -f && > test_all_match git status --porcelain=v2 && > + test_sparse_match ls && > + test_sparse_match ls folder1 && > > test_all_match git clean -xf && > test_all_match git status --porcelain=v2 && > + test_sparse_match ls && > + test_sparse_match ls folder1 && > > test_all_match git clean -xdf && > test_all_match git status --porcelain=v2 && > + test_sparse_match ls && > + test_sparse_match ls folder1 && > > - test_path_is_dir sparse-checkout/folder1 > + test_sparse_match test_path_is_dir folder1 > ' > > test_done > -- > gitgitgadget >
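For readers following the recursion in convert_to_sparse_rec() quoted above, here is a simplified model in plain C. It is a sketch, not Git's code: the struct and function names are hypothetical. It scans the sorted entry list to find each subdirectory's span (the patch reads that span from the cache-tree's entry_count instead), and it collapses a directory whenever every entry under it has the skip-worktree bit. It omits the cone-pattern, stage, and gitlink checks that the real code performs.

```c
#include <stdio.h>
#include <string.h>

#define MAX_OUT 64
#define MAX_PATH_LEN 256

/* Hypothetical stand-in for struct cache_entry: a path plus one flag. */
struct sketch_entry {
	const char *name;  /* full path, e.g. "folder1/a" */
	int skip_worktree; /* 1 if outside the sparse-checkout cone */
};

/*
 * Collapse the sorted entries e[start, end), which all share the
 * directory prefix of length plen, appending results to out[] from
 * index nout. Returns the new number of output entries. The caller
 * must size out[] for the worst case (nothing collapses).
 */
static int sketch_collapse(const struct sketch_entry *e, int start, int end,
			   const char *prefix, size_t plen,
			   char out[][MAX_PATH_LEN], int nout)
{
	int i, all_skipped = 1;

	for (i = start; i < end; i++)
		if (!e[i].skip_worktree)
			all_skipped = 0;

	/* A non-root directory entirely outside the cone becomes "dir/". */
	if (plen > 0 && all_skipped) {
		snprintf(out[nout], MAX_PATH_LEN, "%.*s", (int)plen, prefix);
		return nout + 1;
	}

	i = start;
	while (i < end) {
		const char *slash = strchr(e[i].name + plen, '/');
		size_t child_len;
		int span_end;

		if (!slash) {
			/* A file directly inside this directory stays as-is. */
			snprintf(out[nout++], MAX_PATH_LEN, "%s", e[i].name);
			i++;
			continue;
		}

		/* Paths under "<child>/" are contiguous in sorted order. */
		child_len = (size_t)(slash - e[i].name) + 1;
		span_end = i + 1;
		while (span_end < end &&
		       !strncmp(e[span_end].name, e[i].name, child_len))
			span_end++;

		nout = sketch_collapse(e, i, span_end, e[i].name, child_len,
				       out, nout);
		i = span_end;
	}
	return nout;
}
```

For example, given the sorted paths a, deep/a, deep/deeper1/a, deep/deeper2/a, folder1/a, folder1/b and x/a, where only the first three are inside the cone, the output is a, deep/a, deep/deeper1/a, deep/deeper2/, folder1/ and x/. That mirrors what the 'sparse-index contents' test expects after 'sparse-checkout set deep/deeper1'.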
* Re: [PATCH v2 11/20] sparse-index: convert from full to sparse
From: Derrick Stolee @ 2021-03-11 14:13 UTC
To: Elijah Newren, Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee

On 3/10/2021 6:44 PM, Elijah Newren wrote:
> On Wed, Mar 10, 2021 at 11:31 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> +GIT_TEST_CHECK_CACHE_TREE=0
>
> I still think it'd be nice to get a comment, either in the code or the
> commit message, explaining why your series needs to set
> GIT_TEST_CHECK_CACHE_TREE to 0. I feel like I should almost know the
> answer (was this just a preliminary step and it'll later be turned on?
> did the cache-tree checking do stuff that assumes no sparse directory
> entries? is it really slow?), but I don't.

Sorry I missed commenting on this earlier.

The GIT_TEST_CHECK_CACHE_TREE environment variable is enabled by the test suite by default, and it does extra validation to check that the cache-tree extension exists and matches the index contents. Since at this point the cache-tree extension is not enabled with the sparse index, those checks would start failing.

The check is re-enabled in "sparse-index: loose integration with cache_tree_verify()", so everything is verified by the end of the series.

Thanks,
-Stolee
* [PATCH v2 12/20] submodule: sparse-index should not collapse links
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A submodule is stored as a "Git link" that actually points to a commit within a submodule. Submodules are populated or not depending on submodule configuration, not sparse-checkout. To ensure that the sparse-index feature integrates correctly with submodules, we should not collapse a directory if there is a Git link within its range.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- sparse-index.c | 1 + t/t1092-sparse-checkout-compatibility.sh | 17 +++++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/sparse-index.c b/sparse-index.c index 5eb561259bb1..36b4dde7eeda 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -52,6 +52,7 @@ static int convert_to_sparse_rec(struct index_state *istate, struct cache_entry *ce = istate->cache[i]; if (ce_stage(ce) || + S_ISGITLINK(ce->ce_mode) || !(ce->ce_flags & CE_SKIP_WORKTREE)) can_convert = 0; } diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index ca87033d30b0..b38fab6455d9 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -374,4 +374,21 @@ test_expect_success 'clean' ' test_sparse_match test_path_is_dir folder1 ' +test_expect_success 'submodule handling' ' + init_repos && + + test_all_match mkdir modules && + test_all_match touch modules/a && + test_all_match git add modules && + test_all_match git commit -m "add modules directory" && + + run_on_all git submodule add "$(pwd)/initial-repo" modules/sub && + test_all_match git commit -m "add submodule" && + + # having a submodule prevents "modules" from collapsing + test-tool -C sparse-index read-cache --table >cache && + grep "100644 blob .* modules/a" cache && + grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache +' + test_done -- gitgitgadget
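The three conditions that now block collapsing a directory can be modelled in isolation. This is a minimal sketch that assumes a hypothetical entry struct. The mode constants mirror Git's S_IFGITLINK definition (octal 0160000) but are redefined locally under hypothetical names so that the example compiles on its own.

```c
#include <stddef.h>

/* Local copies of the file-type mask and Git's gitlink mode. */
#define SKETCH_S_IFMT      0170000
#define SKETCH_S_IFGITLINK 0160000
#define SKETCH_S_ISGITLINK(m) (((m) & SKETCH_S_IFMT) == SKETCH_S_IFGITLINK)

/* Hypothetical, reduced view of a cache entry. */
struct sketch_ce {
	unsigned int mode; /* e.g. 0100644 for a blob, 0160000 for a gitlink */
	int stage;         /* nonzero only for unmerged (conflicted) entries */
	int skip_worktree; /* 1 if CE_SKIP_WORKTREE is set */
};

/* Returns 1 only if no entry in the range blocks collapsing it. */
static int sketch_can_collapse(const struct sketch_ce *e, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (e[i].stage ||
		    SKETCH_S_ISGITLINK(e[i].mode) ||
		    !e[i].skip_worktree)
			return 0;
	}
	return 1;
}
```

A single gitlink anywhere in the range keeps the whole directory expanded, which is what the 'submodule handling' test checks for modules/.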
* [PATCH v2 13/20] unpack-trees: allow sparse directories
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

index_pos_by_traverse_info() currently throws a BUG() when a directory name has an exact match in the index. In a sparse index, such an entry is valid as long as it is itself marked with the skip-worktree bit.

The 'pos' variable is assigned a negative value if an exact match is not found. Since a directory name can now be an exact match, a nonnegative 'pos' value is no longer an error in that case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 4dd99219073a..b324eec2a5d1 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -746,9 +746,12 @@ static int index_pos_by_traverse_info(struct name_entry *names, strbuf_make_traverse_path(&name, info, names->path, names->pathlen); strbuf_addch(&name, '/'); pos = index_name_pos(o->src_index, name.buf, name.len); - if (pos >= 0) - BUG("This is a directory and should not exist in index"); - pos = -pos - 1; + if (pos >= 0) { + if (!o->src_index->sparse_index || + !(o->src_index->cache[pos]->ce_flags & CE_SKIP_WORKTREE)) + BUG("This is a directory and should not exist in index"); + } else + pos = -pos - 1; if (pos >= o->src_index->cache_nr || !starts_with(o->src_index->cache[pos]->name, name.buf) || (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name.buf))) -- gitgitgadget
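The sign convention that index_pos_by_traverse_info() relies on is easiest to see in a stand-alone binary search. This sketch is hypothetical and compares names with plain strcmp(), whereas Git's index_name_pos() also takes the name length and stage into account. The return convention is the one the patch decodes with pos = -pos - 1: an exact match yields its position, and a miss yields -(insertion point) - 1.

```c
#include <string.h>

/*
 * Binary search over sorted names. Returns the index of an exact
 * match, or -(insertion point) - 1 when 'name' is absent.
 */
static int sketch_name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/* The decoding step from the patch: where the directory's entries start. */
static int sketch_first_entry_at_or_after(int pos)
{
	return pos >= 0 ? pos : -pos - 1;
}
```

With the sorted names a, deep/a, folder1/ and x/, looking up "folder1/" now returns a nonnegative position because the sparse directory entry itself is stored under that name, while looking up "deep/" still misses and decodes to the first entry under deep/.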
* [PATCH v2 14/20] sparse-index: check index conversion happens
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add a test case that uses test_region to ensure that we are truly expanding a sparse index to a full one, then converting back to sparse when writing the index.

As we integrate more Git commands with the sparse index, we will change these checks to verify that those commands do _not_ expand the sparse index to a full index and instead stay sparse the entire time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/t1092-sparse-checkout-compatibility.sh | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index b38fab6455d9..bfc9e28ef0e1 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -391,4 +391,22 @@ test_expect_success 'submodule handling' ' grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache ' +test_expect_success 'sparse-index is expanded and converted back' ' + init_repos && + + ( + GIT_TEST_SPARSE_INDEX=1 && + export GIT_TEST_SPARSE_INDEX && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt + ) +' + test_done -- gitgitgadget
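test_region looks for matching region_enter and region_leave events in the trace2 event log. What follows is a rough C equivalent of that check, written as a sketch. It assumes one JSON event object per line, with the "category" field immediately followed by "label", which is the layout that test_region's grep pattern also relies on. The function names are hypothetical. The test raises GIT_TRACE2_EVENT_NESTING because regions nested deeper than the configured limit are not written to the event target.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if 'line' records 'event' for the given category and label. */
static int sketch_line_matches(const char *line, const char *event,
			       const char *category, const char *label)
{
	char needle[256];

	if (!strstr(line, event))
		return 0;
	snprintf(needle, sizeof(needle),
		 "\"category\":\"%s\",\"label\":\"%s\"", category, label);
	return strstr(line, needle) != NULL;
}

/* Returns 1 only if the region was both entered and left. */
static int sketch_saw_region(const char **lines, size_t nr,
			     const char *category, const char *label)
{
	int entered = 0, left = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (sketch_line_matches(lines[i], "\"region_enter\"",
					category, label))
			entered = 1;
		if (sketch_line_matches(lines[i], "\"region_leave\"",
					category, label))
			left = 1;
	}
	return entered && left;
}
```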
* [PATCH v2 15/20] sparse-index: create extension for compatibility
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Previously, the sparse index format could only be enabled with GIT_TEST_SPARSE_INDEX=1, which is not a feasible way for users to select this mode. Further, sparse directory entries are not understood by the index formats as currently documented.

We _could_ add a new index version that explicitly adds these capabilities, but there are nuances to index formats 2, 3, and 4 that are still valuable to select as options. Until we add index format version 5, create a repository extension, "extensions.sparseIndex", that specifies that any tool reading this repository must understand sparse directory entries.

This change only encodes the extension and enables it when GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI mechanism.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/config/extensions.txt | 8 ++++++ cache.h | 1 + repo-settings.c | 7 ++++++ repository.h | 3 ++- setup.c | 3 +++ sparse-index.c | 38 +++++++++++++++++++++++++---- 6 files changed, 54 insertions(+), 6 deletions(-) diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index 4e23d73cdcad..c02e09af0046 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -6,3 +6,11 @@ extensions.objectFormat:: Note that this setting should only be set by linkgit:git-init[1] or linkgit:git-clone[1]. Trying to change it after initialization will not work and will produce hard-to-diagnose issues. + +extensions.sparseIndex:: + When combined with `core.sparseCheckout=true` and + `core.sparseCheckoutCone=true`, the index may contain entries + corresponding to directories outside of the sparse-checkout + definition in lieu of containing each path under such directories. + Versions of Git that do not understand this extension do not + expect directory entries in the index. diff --git a/cache.h b/cache.h index 9217d405b9b8..03f931c5f34d 100644 --- a/cache.h +++ b/cache.h @@ -1059,6 +1059,7 @@ struct repository_format { int worktree_config; int is_bare; int hash_algo; + int sparse_index; char *work_tree; struct string_list unknown_extensions; struct string_list v1_only_extensions; diff --git a/repo-settings.c b/repo-settings.c index d63569e4041e..9677d50f9238 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -85,4 +85,11 @@ void prepare_repo_settings(struct repository *r) * removed. */ r->settings.command_requires_full_index = 1; + + /* + * Initialize this as off. 
+ */ + r->settings.sparse_index = 0; + if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value) + r->settings.sparse_index = 1; } diff --git a/repository.h b/repository.h index e06a23015697..a45f7520fd9e 100644 --- a/repository.h +++ b/repository.h @@ -42,7 +42,8 @@ struct repo_settings { int core_multi_pack_index; - unsigned command_requires_full_index:1; + unsigned command_requires_full_index:1, + sparse_index:1; }; struct repository { diff --git a/setup.c b/setup.c index c04cd25a30df..cd8394564613 100644 --- a/setup.c +++ b/setup.c @@ -500,6 +500,9 @@ static enum extension_result handle_extension(const char *var, return error("invalid value for 'extensions.objectformat'"); data->hash_algo = format; return EXTENSION_OK; + } else if (!strcmp(ext, "sparseindex")) { + data->sparse_index = 1; + return EXTENSION_OK; } return EXTENSION_UNKNOWN; } diff --git a/sparse-index.c b/sparse-index.c index 36b4dde7eeda..b9c14ef7ab50 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -102,19 +102,47 @@ static int convert_to_sparse_rec(struct index_state *istate, return num_converted - start_converted; } +static int enable_sparse_index(struct repository *repo) +{ + const char *config_path = repo_git_path(repo, "config.worktree"); + + if (upgrade_repository_format(1) < 0) { + warning(_("unable to upgrade repository format to enable sparse-index")); + return -1; + } + git_config_set_in_file_gently(config_path, + "extensions.sparseIndex", + "true"); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 1; + return 0; +} + int convert_to_sparse(struct index_state *istate) { if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; + if (!istate->repo) + istate->repo = the_repository; + + /* + * The GIT_TEST_SPARSE_INDEX environment variable triggers the + * extensions.sparseIndex config variable to be on. 
+ */ + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { + int err = enable_sparse_index(istate->repo); + if (err < 0) + return err; + } + /* - * For now, only create a sparse index with the - * GIT_TEST_SPARSE_INDEX environment variable. We will relax - * this once we have a proper way to opt-in (and later still, - * opt-out). + * Only convert to sparse if extensions.sparseIndex is set. */ - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + prepare_repo_settings(istate->repo); + if (!istate->repo->settings.sparse_index) return 0; if (!istate->sparse_checkout_patterns) { -- gitgitgadget
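The reason enable_sparse_index() first calls upgrade_repository_format(1) follows from how readers treat extensions.* keys. Here is a simplified model of that rule. It is a sketch: the function names are hypothetical, and the real checks live in setup.c, which collects the unknown_extensions and v1_only_extensions lists visible in the cache.h context of this patch. In a repository with repositoryFormatVersion=1, a reader refuses the repository if it sees an unknown extension. In version 0, a reader ignores unknown extensions, so an older Git would silently misread sparse directory entries. The known[] parameter stands in for "which Git version is reading".

```c
#include <stddef.h>
#include <string.h>

static int sketch_is_known(const char *ext, const char **known, size_t nknown)
{
	size_t i;

	for (i = 0; i < nknown; i++)
		if (!strcmp(ext, known[i]))
			return 1;
	return 0;
}

/*
 * Returns 1 if a reader that understands the v1-only extensions in
 * known[] may use a repository with this format version and these
 * extensions.* keys (lowercased, as the config parser delivers them),
 * or 0 if it must refuse.
 */
static int sketch_check_repo_format(int version,
				    const char **exts, size_t nexts,
				    const char **known, size_t nknown)
{
	size_t i;

	if (version > 1)
		return 0; /* newer format than this reader supports */

	for (i = 0; i < nexts; i++) {
		int is_known = sketch_is_known(exts[i], known, nknown);

		/* Version 1: an unknown extension means "do not touch". */
		if (version == 1 && !is_known)
			return 0;
		/* Version 0: a known v1-only extension is inconsistent. */
		if (version == 0 && is_known)
			return 0;
		/* Version 0 silently ignores extensions it does not know. */
	}
	return 1;
}
```

With exts = { "sparseindex" }, a reader that does not know the extension refuses a version 1 repository but would happily use a version 0 one, which is exactly the silent misreading the format upgrade prevents.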
* [PATCH v2 16/20] sparse-checkout: toggle sparse index from builtin
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:30 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The sparse index extension is used to signal that index writes should be in sparse mode. Until now, it could only be enabled with GIT_TEST_SPARSE_INDEX=1.

Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that specifies whether the sparse index should be used. Either way, it also rewrites the index to use the corresponding format.

Add a warning in the documentation that using a repository extension might reduce compatibility with third-party tools. 'git sparse-checkout init' already sets extensions.worktreeConfig, which places most sparse-checkout users outside the scope of most third-party tools.

Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of GIT_TEST_SPARSE_INDEX=1.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/git-sparse-checkout.txt | 14 +++++++++ builtin/sparse-checkout.c | 17 ++++++++++- sparse-index.c | 37 +++++++++++++++-------- sparse-index.h | 3 ++ t/t1092-sparse-checkout-compatibility.sh | 38 +++++++++++------------- 5 files changed, 76 insertions(+), 33 deletions(-) diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index a0eeaeb02ee3..4a8343cf7fa4 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -45,6 +45,20 @@ To avoid interfering with other worktrees, it first enables the When `--cone` is provided, the `core.sparseCheckoutCone` setting is also set, allowing for better performance with a limited set of patterns (see 'CONE PATTERN SET' below). ++ +Use the `--[no-]sparse-index` option to toggle the use of the sparse +index format. This reduces the size of the index to be more closely +aligned with your sparse-checkout definition. This can have significant +performance advantages for commands such as `git status` or `git add`. +This feature is still experimental. Some commands might be slower with +a sparse index until they are properly integrated with the feature. ++ +**WARNING:** Using a sparse index requires modifying the index in a way +that is not completely understood by external tools. If you have trouble +with this compatibility, then run `git sparse-checkout init --no-sparse-index` +to rewrite your index to not be sparse. Older versions of Git will not +understand the `sparseIndex` repository extension and may fail to interact +with your repository until it is disabled.
'set':: Write a set of patterns to the sparse-checkout file, as given as diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index e00b82af727b..ca63e2c64e95 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -14,6 +14,7 @@ #include "unpack-trees.h" #include "wt-status.h" #include "quote.h" +#include "sparse-index.h" static const char *empty_base = ""; @@ -283,12 +284,13 @@ static int set_config(enum sparse_checkout_mode mode) } static char const * const builtin_sparse_checkout_init_usage[] = { - N_("git sparse-checkout init [--cone]"), + N_("git sparse-checkout init [--cone] [--[no-]sparse-index]"), NULL }; static struct sparse_checkout_init_opts { int cone_mode; + int sparse_index; } init_opts; static int sparse_checkout_init(int argc, const char **argv) @@ -303,11 +305,15 @@ static int sparse_checkout_init(int argc, const char **argv) static struct option builtin_sparse_checkout_init_options[] = { OPT_BOOL(0, "cone", &init_opts.cone_mode, N_("initialize the sparse-checkout in cone mode")), + OPT_BOOL(0, "sparse-index", &init_opts.sparse_index, + N_("toggle the use of a sparse index")), OPT_END(), }; repo_read_index(the_repository); + init_opts.sparse_index = -1; + argc = parse_options(argc, argv, NULL, builtin_sparse_checkout_init_options, builtin_sparse_checkout_init_usage, 0); @@ -326,6 +332,15 @@ static int sparse_checkout_init(int argc, const char **argv) sparse_filename = get_sparse_checkout_filename(); res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL); + if (init_opts.sparse_index >= 0) { + if (set_sparse_index_config(the_repository, init_opts.sparse_index) < 0) + die(_("failed to modify sparse-index config")); + + /* force an index rewrite */ + repo_read_index(the_repository); + the_repository->index->updated_workdir = 1; + } + /* If we already have a sparse-checkout file, use it. 
*/ if (res >= 0) { free(sparse_filename); diff --git a/sparse-index.c b/sparse-index.c index b9c14ef7ab50..1c84cac255bf 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -104,23 +104,37 @@ static int convert_to_sparse_rec(struct index_state *istate, static int enable_sparse_index(struct repository *repo) { - const char *config_path = repo_git_path(repo, "config.worktree"); + int res; if (upgrade_repository_format(1) < 0) { warning(_("unable to upgrade repository format to enable sparse-index")); return -1; } - git_config_set_in_file_gently(config_path, - "extensions.sparseIndex", - "true"); + res = git_config_set_gently("extensions.sparseindex", "true"); prepare_repo_settings(repo); repo->settings.sparse_index = 1; - return 0; + return res; +} + +int set_sparse_index_config(struct repository *repo, int enable) +{ + int res; + + if (enable) + return enable_sparse_index(repo); + + /* Don't downgrade repository format, just remove the extension. */ + res = git_config_set_gently("extensions.sparseindex", NULL); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 0; + return res; } int convert_to_sparse(struct index_state *istate) { + int test_env; if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; @@ -129,14 +143,13 @@ int convert_to_sparse(struct index_state *istate) istate->repo = the_repository; /* - * The GIT_TEST_SPARSE_INDEX environment variable triggers the - * extensions.sparseIndex config variable to be on. + * If GIT_TEST_SPARSE_INDEX=1, then trigger extensions.sparseIndex + * to be fully enabled. If GIT_TEST_SPARSE_INDEX=0 (set explicitly), + * then purposefully disable the setting. 
*/ - if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { - int err = enable_sparse_index(istate->repo); - if (err < 0) - return err; - } + test_env = git_env_bool("GIT_TEST_SPARSE_INDEX", -1); + if (test_env >= 0) + set_sparse_index_config(istate->repo, test_env); /* * Only convert to sparse if extensions.sparseIndex is set. diff --git a/sparse-index.h b/sparse-index.h index 64380e121d80..39dcc859735e 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -5,4 +5,7 @@ struct index_state; void ensure_full_index(struct index_state *istate); int convert_to_sparse(struct index_state *istate); +struct repository; +int set_sparse_index_config(struct repository *repo, int enable); + #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index bfc9e28ef0e1..9c2bc4d25f66 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -4,6 +4,7 @@ test_description='compare full workdir to sparse workdir' GIT_TEST_CHECK_CACHE_TREE=0 GIT_TEST_SPLIT_INDEX=0 +GIT_TEST_SPARSE_INDEX= . 
./test-lib.sh @@ -98,25 +99,26 @@ init_repos () { # initialize sparse-checkout definitions git -C sparse-checkout sparse-checkout init --cone && git -C sparse-checkout sparse-checkout set deep && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep + git -C sparse-index sparse-checkout init --cone --sparse-index && + test_cmp_config -C sparse-index true extensions.sparseindex && + git -C sparse-index sparse-checkout set deep } run_on_sparse () { ( cd sparse-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err + "$@" >../sparse-checkout-out 2>../sparse-checkout-err ) && ( cd sparse-index && - GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err + "$@" >../sparse-index-out 2>../sparse-index-err ) } run_on_all () { ( cd full-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err + "$@" >../full-checkout-out 2>../full-checkout-err ) && run_on_sparse "$@" } @@ -146,7 +148,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && + git -C sparse-index sparse-checkout set folder1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep folder2 x @@ -156,7 +158,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + git -C sparse-index sparse-checkout set deep/deeper1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep/deeper2 folder1 folder2 x @@ -394,19 +396,15 @@ test_expect_success 'submodule handling' ' test_expect_success 'sparse-index is expanded and converted back' ' init_repos && - ( - GIT_TEST_SPARSE_INDEX=1 && - export GIT_TEST_SPARSE_INDEX && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" 
reset --hard && - test_region index convert_to_sparse trace2.txt && - test_region index ensure_full_index trace2.txt && - - rm trace2.txt && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" status -uno && - test_region index ensure_full_index trace2.txt - ) + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt ' test_done -- gitgitgadget
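The new option is a three-state value. init_opts.sparse_index starts at -1, so 'git sparse-checkout init' without either flag leaves extensions.sparseIndex untouched. Below is a hypothetical stand-alone version of that logic. It uses a hand-rolled argv loop instead of Git's parse-options API, and the function names are illustrative only.

```c
#include <string.h>

/*
 * Returns -1 if neither flag was given, 1 for --sparse-index, and 0 for
 * --no-sparse-index; as with parse-options' OPT_BOOL, the last one wins.
 */
static int sketch_sparse_index_flag(int argc, const char **argv)
{
	int value = -1; /* "unset", like init_opts.sparse_index = -1 */
	int i;

	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--sparse-index"))
			value = 1;
		else if (!strcmp(argv[i], "--no-sparse-index"))
			value = 0;
	}
	return value;
}

/* Describes what the patch's init logic does for each value. */
static const char *sketch_config_action(int value)
{
	if (value < 0)
		return "leave extensions.sparseIndex alone";
	return value ? "set extensions.sparseIndex=true"
		     : "remove extensions.sparseIndex";
}
```

Only the explicit 0 and 1 cases reach set_sparse_index_config(); the -1 default is what keeps a plain 'init' from silently flipping the setting either way.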
* [PATCH v2 17/20] sparse-checkout: disable sparse-index
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:31 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

We use 'git sparse-checkout init --cone --sparse-index' to toggle the sparse-index feature. It makes sense to also disable it when running 'git sparse-checkout disable'. This is particularly important because it removes the extensions.sparseIndex config option, allowing other tools to use this Git repository again.

This does mean that a later 'git sparse-checkout init' without '--sparse-index' will not re-enable the sparse-index feature, even if it was previously enabled.

While testing this feature, I noticed that the sparse index was not written on the first run, but only on a second one. This was caught by the call to 'test-tool read-cache --table'. Fixing it requires adjusting some assignments to core_apply_sparse_checkout and pl.use_cone_patterns in the sparse_checkout_init() logic.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/sparse-checkout.c | 10 +++++++++- t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index ca63e2c64e95..585343fa1972 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -280,6 +280,9 @@ static int set_config(enum sparse_checkout_mode mode) "core.sparseCheckoutCone", mode == MODE_CONE_PATTERNS ? "true" : NULL); + if (mode == MODE_NO_PATTERNS) + set_sparse_index_config(the_repository, 0); + return 0; } @@ -341,10 +344,11 @@ static int sparse_checkout_init(int argc, const char **argv) the_repository->index->updated_workdir = 1; } + core_apply_sparse_checkout = 1; + /* If we already have a sparse-checkout file, use it. */ if (res >= 0) { free(sparse_filename); - core_apply_sparse_checkout = 1; return update_working_directory(NULL); } @@ -366,6 +370,7 @@ static int sparse_checkout_init(int argc, const char **argv) add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); strbuf_addstr(&pattern, "!/*/"); add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); + pl.use_cone_patterns = init_opts.cone_mode; return write_patterns_and_update(&pl); } @@ -632,6 +637,9 @@ static int sparse_checkout_disable(int argc, const char **argv) strbuf_addstr(&match_all, "/*"); add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0); + prepare_repo_settings(the_repository); + the_repository->settings.sparse_index = 0; + if (update_working_directory(&pl)) die(_("error while refreshing working directory")); diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh index fc64e9ed99f4..ff1ad570a255 100755 --- a/t/t1091-sparse-checkout-builtin.sh +++ b/t/t1091-sparse-checkout-builtin.sh @@ -205,6 +205,19 @@ test_expect_success 'sparse-checkout disable' ' check_files repo a deep folder1 folder2 ' +test_expect_success 'sparse-index 
enabled and disabled' ' + git -C repo sparse-checkout init --cone --sparse-index && + test_cmp_config -C repo true extensions.sparseIndex && + test-tool -C repo read-cache --table >cache && + grep " tree " cache && + + git -C repo sparse-checkout disable && + test-tool -C repo read-cache --table >cache && + ! grep " tree " cache && + git -C repo config --list >config && + ! grep extensions.sparseindex config +' + test_expect_success 'cone mode: init and set' ' git -C repo sparse-checkout init --cone && git -C repo config --list >config && -- gitgitgadget
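As a rough illustration of the control flow described in PATCH 17/20, the following C sketch models the two commands with a plain struct. Everything in it is hypothetical: `struct sc_state`, `model_init()` and `model_disable()` are not Git functions, and the `-1/0/1` encoding of `--[no-]sparse-index` is an assumption of the sketch. It only shows why setting the apply flag before the early return fixes the first-run case, and why disabling also drops the extension.

```c
#include <stdbool.h>

/* Hypothetical model of the state touched by PATCH 17/20; these are
 * not Git's real structures. */
struct sc_state {
	bool apply_sparse_checkout; /* stands in for core.sparseCheckout */
	bool sparse_index_ext;      /* stands in for extensions.sparseIndex */
};

/*
 * Model of 'git sparse-checkout init'. sparse_index_opt is 1 for
 * --sparse-index, 0 for --no-sparse-index and -1 when neither option
 * is given (an assumed encoding).
 */
static void model_init(struct sc_state *s, int sparse_index_opt,
		       bool have_patterns_file)
{
	/* Set before the early return so the first run is not skipped. */
	s->apply_sparse_checkout = true;

	if (sparse_index_opt >= 0)
		s->sparse_index_ext = sparse_index_opt;

	if (have_patterns_file)
		return; /* reuse the existing sparse-checkout file */

	/* otherwise, default cone patterns would be written here */
}

/* Model of 'git sparse-checkout disable': MODE_NO_PATTERNS now also
 * turns off the sparse-index extension. */
static void model_disable(struct sc_state *s)
{
	s->apply_sparse_checkout = false;
	s->sparse_index_ext = false;
}
```

In this model, calling `model_init()` without the option after `model_disable()` leaves `sparse_index_ext` false, which matches the commit message's note that `init` does not re-enable the feature.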
* [PATCH v2 18/20] cache-tree: integrate with sparse directory entries
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:31 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache-tree extension was previously disabled with sparse indexes. However, the cache-tree is an important performance feature for commands like 'git status' and 'git add'. Integrate it with sparse directory entries.

When writing a sparse index, completely clear and recalculate the cache tree. By starting from scratch, the only integration necessary is to check if we hit a sparse directory entry and create a leaf of the cache-tree that has an entry_count of one and no subtrees.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 18 ++++++++++++++++++ sparse-index.c | 10 +++++++++- 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/cache-tree.c b/cache-tree.c index 5f07a39e501e..950a9615db8f 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -256,6 +256,24 @@ static int update_one(struct cache_tree *it, *skip_count = 0; + /* + * If the first entry of this region is a sparse directory + * entry corresponding exactly to 'base', then this cache_tree + * struct is a "leaf" in the data structure, pointing to the + * tree OID specified in the entry.
+ */ + if (entries > 0) { + const struct cache_entry *ce = cache[0]; + + if (S_ISSPARSEDIR(ce->ce_mode) && + ce->ce_namelen == baselen && + !strncmp(ce->name, base, baselen)) { + it->entry_count = 1; + oidcpy(&it->oid, &ce->oid); + return 1; + } + } + if (0 <= it->entry_count && has_object_file(&it->oid)) return it->entry_count; diff --git a/sparse-index.c b/sparse-index.c index 1c84cac255bf..ea603201a323 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -180,7 +180,11 @@ int convert_to_sparse(struct index_state *istate) istate->cache_nr = convert_to_sparse_rec(istate, 0, 0, istate->cache_nr, "", 0, istate->cache_tree); - istate->drop_cache_tree = 1; + + /* Clear and recompute the cache-tree */ + cache_tree_free(&istate->cache_tree); + cache_tree_update(istate, 0); + istate->sparse_index = 1; trace2_region_leave("index", "convert_to_sparse", istate->repo); return 0; @@ -278,5 +282,9 @@ void ensure_full_index(struct index_state *istate) free(full); + /* Clear and recompute the cache-tree */ + cache_tree_free(&istate->cache_tree); + cache_tree_update(istate, 0); + trace2_region_leave("index", "ensure_full_index", istate->repo); } -- gitgitgadget
* [PATCH v2 19/20] sparse-index: loose integration with cache_tree_verify()
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:31 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_tree_verify() method is run when GIT_TEST_CHECK_CACHE_TREE is enabled, which it is by default in the test suite. The logic must be adjusted for the presence of sparse directory entries.

For now, leave the test as a simple check for whether the directory entry is sparse. Do not go any further until needed.

This allows us to re-enable GIT_TEST_CHECK_CACHE_TREE in t1092-sparse-checkout-compatibility.sh. Further, p2000-sparse-operations.sh uses the test suite and hence this check is enabled for all of its tests. We need to integrate with it before we run our performance tests with a sparse-index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 19 +++++++++++++++++++ t/t1092-sparse-checkout-compatibility.sh | 1 - 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/cache-tree.c b/cache-tree.c index 950a9615db8f..11bf1fcae6e1 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -808,6 +808,19 @@ int cache_tree_matches_traversal(struct cache_tree *root, return 0; } +static void verify_one_sparse(struct repository *r, + struct index_state *istate, + struct cache_tree *it, + struct strbuf *path, + int pos) +{ + struct cache_entry *ce = istate->cache[pos]; + + if (!S_ISSPARSEDIR(ce->ce_mode)) + BUG("directory '%s' is present in index, but not sparse", + path->buf); +} + static void verify_one(struct repository *r, struct index_state *istate, struct cache_tree *it, @@ -830,6 +843,12 @@ static void verify_one(struct repository *r, if (path->len) { pos = index_name_pos(istate, path->buf, path->len); + + if (pos >= 0) { + verify_one_sparse(r, istate, it, path, pos); + return; + } + pos = -pos - 1; } else { pos = 0; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 9c2bc4d25f66..c2624176c2e0 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -2,7 +2,6 @@ test_description='compare full workdir to sparse workdir' -GIT_TEST_CHECK_CACHE_TREE=0 GIT_TEST_SPLIT_INDEX=0 GIT_TEST_SPARSE_INDEX= -- gitgitgadget
* [PATCH v2 20/20] p2000: add sparse-index repos
From: Derrick Stolee via GitGitGadget @ 2021-03-10 19:31 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

p2000-sparse-operations.sh compares different Git commands in repositories with many files at HEAD but using sparse-checkout to focus on a small portion of those files.

Add extra copies of the repository that use the sparse-index format so we can track how that affects the performance of different commands. At this point, the sparse-index is pure overhead on the CPU side, and this is measurable in these tests:

    Test
    ---------------------------------------------------------------
    2000.2: git status (full-index-v3)              0.59(0.51+0.12)
    2000.3: git status (full-index-v4)              0.59(0.52+0.11)
    2000.4: git status (sparse-index-v3)            1.40(1.32+0.12)
    2000.5: git status (sparse-index-v4)            1.41(1.36+0.08)
    2000.6: git add -A (full-index-v3)              2.32(1.97+0.19)
    2000.7: git add -A (full-index-v4)              2.17(1.92+0.14)
    2000.8: git add -A (sparse-index-v3)            2.31(2.21+0.15)
    2000.9: git add -A (sparse-index-v4)            2.30(2.20+0.13)
    2000.10: git add . (full-index-v3)              2.39(2.02+0.20)
    2000.11: git add . (full-index-v4)              2.20(1.94+0.16)
    2000.12: git add . (sparse-index-v3)            2.36(2.27+0.12)
    2000.13: git add . (sparse-index-v4)            2.33(2.21+0.16)
    2000.14: git commit -a -m A (full-index-v3)     2.47(2.12+0.20)
    2000.15: git commit -a -m A (full-index-v4)     2.26(2.00+0.17)
    2000.16: git commit -a -m A (sparse-index-v3)   3.01(2.92+0.16)
    2000.17: git commit -a -m A (sparse-index-v4)   3.01(2.94+0.15)

Note that there is very little difference between the v3 and v4 index formats when the sparse-index is enabled. This is primarily because the relative file sizes are the same, and the command time is mostly taken up by parsing tree objects to expand the sparse index into a full one.

With the current file layout, the index file sizes are given by this table:

       |  full index | sparse index |
       +-------------+--------------+
    v3 |     108 MiB |      1.6 MiB |
    v4 |      80 MiB |      1.2 MiB |

Future updates will improve the performance of Git commands when the index is sparse.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/perf/p2000-sparse-operations.sh | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 2fbc81b22119..e527316e66d6 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -60,12 +60,29 @@ test_expect_success 'setup repo and indexes' ' git sparse-checkout set $SPARSE_CONE && git config index.version 4 && git update-index --index-version=4 + ) && + git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . sparse-index-v3 && + ( + cd sparse-index-v3 && + git sparse-checkout init --cone --sparse-index && + git sparse-checkout set $SPARSE_CONE && + git config index.version 3 && + git update-index --index-version=3 + ) && + git -c core.sparseCheckoutCone=true clone --branch=wide --sparse .
sparse-index-v4 && + ( + cd sparse-index-v4 && + git sparse-checkout init --cone --sparse-index && + git sparse-checkout set $SPARSE_CONE && + git config index.version 4 && + git update-index --index-version=4 ) ' test_perf_on_all () { command="$@" - for repo in full-index-v3 full-index-v4 + for repo in full-index-v3 full-index-v4 \ + sparse-index-v3 sparse-index-v4 do test_perf "$command ($repo)" " ( -- gitgitgadget
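The claim that the relative file sizes are the same can be checked against the index-size table in PATCH 20/20. In both formats, v4 is roughly three quarters the size of v3, and the sparse index is about 1.5% of the full one. The small C helpers below compute these ratios from the table's figures. The struct and function names are illustrative only.

```c
/* Index sizes in MiB, copied from the table in PATCH 20/20. */
struct index_sizes {
	double full_mib;
	double sparse_mib;
};

static const struct index_sizes v3 = { 108.0, 1.6 };
static const struct index_sizes v4 = { 80.0, 1.2 };

/* Fraction of the full index that the sparse index occupies. */
static double sparse_fraction(struct index_sizes s)
{
	return s.sparse_mib / s.full_mib;
}

/* Size of index format v4 relative to v3 for one kind of index. */
static double v4_over_v3(double v4_mib, double v3_mib)
{
	return v4_mib / v3_mib;
}
```

The ratios come out to about 0.741 (full) and 0.75 (sparse) for v4 over v3, and about 0.0148 (v3) and 0.015 (v4) for sparse over full.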
* Re: [PATCH v2 00/20] Sparse Index: Design, Format, Tests
From: Elijah Newren @ 2021-03-11 0:07 UTC
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee

On Wed, Mar 10, 2021 at 11:31 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>
> Here is the first full patch series submission coming out of the
> sparse-index RFC [1].
>
> [1]
> https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/
>
> I won't waste too much space here, because PATCH 1 includes a sizeable
> design document that describes the feature, the reasoning behind it, and my
> plan for getting this implemented widely throughout the codebase.
>
> There are some new things here that were not in the RFC:
>
> * Design doc and format updates. (Patch 1)
> * Performance test script. (Patches 2 and 20)
>
> Notably missing in this series from the RFC:
>
> * The mega-patch inserting ensure_full_index() throughout the codebase.
>   That will be a follow-up series to this one.
> * The integrations with git status and git add to demonstrate the improved
>   performance. Those will also appear in their own series later.
>
> I plan to keep my latest work in this area in my 'sparse-index/wip' branch
> [2]. It includes all of the work from the RFC right now, updated with the
> work from this series.
>
> [2] https://github.com/derrickstolee/git/tree/sparse-index/wip
>
>
> Updates in V2
> =============
>
> * Various typos and awkward grammar is fixed.
> * Cleaned up unnecessary commands in p2000-sparse-operations.sh > * Added a comment to the sparse_index member of struct index_state. > * Used tree_type, commit_type, and blob_type in test-read-cache.c. I read through the range-diff and comments from the previous series. There's only a few things left (as I noted in comments), but they're all pretty trivial so this one is: Reviewed-by: Elijah Newren <newren@gmail.com> > > Thanks, -Stolee > > Derrick Stolee (20): > sparse-index: design doc and format update > t/perf: add performance test for sparse operations > t1092: clean up script quoting > sparse-index: add guard to ensure full index > sparse-index: implement ensure_full_index() > t1092: compare sparse-checkout to sparse-index > test-read-cache: print cache entries with --table > test-tool: don't force full index > unpack-trees: ensure full index > sparse-checkout: hold pattern list in index > sparse-index: convert from full to sparse > submodule: sparse-index should not collapse links > unpack-trees: allow sparse directories > sparse-index: check index conversion happens > sparse-index: create extension for compatibility > sparse-checkout: toggle sparse index from builtin > sparse-checkout: disable sparse-index > cache-tree: integrate with sparse directory entries > sparse-index: loose integration with cache_tree_verify() > p2000: add sparse-index repos > > Documentation/config/extensions.txt | 8 + > Documentation/git-sparse-checkout.txt | 14 ++ > Documentation/technical/index-format.txt | 7 + > Documentation/technical/sparse-index.txt | 173 ++++++++++++++ > Makefile | 1 + > builtin/sparse-checkout.c | 44 +++- > cache-tree.c | 40 ++++ > cache.h | 18 +- > read-cache.c | 35 ++- > repo-settings.c | 15 ++ > repository.c | 11 +- > repository.h | 3 + > setup.c | 3 + > sparse-index.c | 290 +++++++++++++++++++++++ > sparse-index.h | 11 + > t/README | 3 + > t/helper/test-read-cache.c | 66 +++++- > t/perf/p2000-sparse-operations.sh | 102 ++++++++ > 
t/t1091-sparse-checkout-builtin.sh | 13 + > t/t1092-sparse-checkout-compatibility.sh | 136 +++++++++-- > unpack-trees.c | 16 +- > 21 files changed, 969 insertions(+), 40 deletions(-) > create mode 100644 Documentation/technical/sparse-index.txt > create mode 100644 sparse-index.c > create mode 100644 sparse-index.h > create mode 100755 t/perf/p2000-sparse-operations.sh > > > base-commit: 966e671106b2fd38301e7c344c754fd118d0bb07 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v2 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v2 > Pull-Request: https://github.com/gitgitgadget/git/pull/883 > > Range-diff vs v1: > > 1: daa9a6bcefbc ! 1: 2fe413fdac80 sparse-index: design doc and format update > @@ Documentation/technical/sparse-index.txt (new) > +If we need to discover the details for paths within that directory, we > +can parse trees to find that list. > + > -+This addition of sparse-directory entries violates expectations about the > ++At time of writing, sparse-directory entries violate expectations about the > +index format and its in-memory data structure. There are many consumers in > +the codebase that expect to iterate through all of the index entries and > +see only files. In addition, they expect to see all files at `HEAD`. One > @@ Documentation/technical/sparse-index.txt (new) > +* `git merge` > +* `git rebase` > + > ++Hopefully, commands such as `git merge` and `git rebase` can benefit > ++instead from merge algorithms that do not use the index as a data > ++structure, such as the merge-ORT strategy. As these topics mature, we > ++may enalbe the ORT strategy by default for repositories using the > ++sparse-index feature. > ++ > +Along with `git status` and `git add`, these commands cover the majority > +of users' interactions with the working directory. In addition, we can > +integrate with these commands: > 2: a8c6322a3dbe ! 
2: 540ab5495065 t/perf: add performance test for sparse operations > @@ t/perf/p2000-sparse-operations.sh (new) > + # Remove submodules from the example repo, because our > + # duplication of the entire repo creates an unlikly data shape. > + git config --file .gitmodules --get-regexp "submodule.*.path" >modules && > -+ rm -f .gitmodules && > -+ git add .gitmodules && > ++ git rm -f .gitmodules && > + for module in $(awk "{print \$2}" modules) > + do > + git rm $module || return 1 > + done && > -+ git add . && > + git commit -m "remove submodules" && > + > + echo bogus >a && > 3: 6e783c88821e = 3: 5cbedb377b37 t1092: clean up script quoting > 4: 01da4c48a1fa = 4: 6e21f776e883 sparse-index: add guard to ensure full index > 5: 2b83989fbcd3 ! 5: 399ddb0bad56 sparse-index: implement ensure_full_index() > @@ cache.h: struct index_state { > updated_skipworktree : 1, > - fsmonitor_has_run_once : 1; > + fsmonitor_has_run_once : 1, > ++ > ++ /* > ++ * sparse_index == 1 when sparse-directory > ++ * entries exist. Requires sparse-checkout > ++ * in cone mode. > ++ */ > + sparse_index : 1; > struct hashmap name_hash; > struct hashmap dir_hash; > 6: c9910a37579c = 6: eac2db5efc22 t1092: compare sparse-checkout to sparse-index > 7: 3d92df7a0cf9 ! 
7: e9c82d2eda82 test-read-cache: print cache entries with --table > @@ Commit message > > ## t/helper/test-read-cache.c ## > @@ > + #include "test-tool.h" > #include "cache.h" > #include "config.h" > - > ++#include "blob.h" > ++#include "commit.h" > ++#include "tree.h" > ++ > +static void print_cache_entry(struct cache_entry *ce) > +{ > -+ printf("%06o ", ce->ce_mode & 0777777); > ++ const char *type; > ++ printf("%06o ", ce->ce_mode & 0177777); > + > + if (S_ISSPARSEDIR(ce->ce_mode)) > -+ printf("tree "); > ++ type = tree_type; > + else if (S_ISGITLINK(ce->ce_mode)) > -+ printf("commit "); > ++ type = commit_type; > + else > -+ printf("blob "); > ++ type = blob_type; > + > -+ printf("%s\t%s\n", > ++ printf("%s %s\t%s\n", > ++ type, > + oid_to_hex(&ce->oid), > + ce->name); > +} > + > -+static void print_cache(struct index_state *cache) > ++static void print_cache(struct index_state *istate) > +{ > + int i; > -+ for (i = 0; i < the_index.cache_nr; i++) > -+ print_cache_entry(the_index.cache[i]); > ++ for (i = 0; i < istate->cache_nr; i++) > ++ print_cache_entry(istate->cache[i]); > +} > -+ > + > int cmd__read_cache(int argc, const char **argv) > { > + struct repository *r = the_repository; > 8: 94373e2bfbbc ! 8: 243541fc5820 test-tool: don't force full index > @@ Commit message > > ## t/helper/test-read-cache.c ## > @@ > - #include "test-tool.h" > - #include "cache.h" > - #include "config.h" > + #include "blob.h" > + #include "commit.h" > + #include "tree.h" > +#include "sparse-index.h" > > static void print_cache_entry(struct cache_entry *ce) > 9: e71f033c2871 = 9: 48f65093b3da unpack-trees: ensure full index > 10: f86d3dc154d1 ! 10: 83aac8b7a1ec sparse-checkout: hold pattern list in index > @@ Commit message > pattern set, we need access to that in-memory copy. Place a pointer to > a 'struct pattern_list' in the index so we can access this on-demand. 
> This will be used in the next change which uses the sparse-checkout > - definition to filter out directories that are outsie the sparse cone. > + definition to filter out directories that are outside the sparse cone. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > > 11: a2d77c23a0cb ! 11: f6db0c27a285 sparse-index: convert from full to sparse > @@ read-cache.c: int verify_path(const char *path, unsigned mode) > return 0; > + /* > + * allow terminating directory separators for > -+ * sparse directory enries. > ++ * sparse directory entries. > + */ > + if (c == '\0') > + return S_ISDIR(mode); > @@ sparse-index.c > + struct cache_entry *ce = istate->cache[i]; > + > + /* > -+ * Detect if this is a normal entry oustide of any subtree > ++ * Detect if this is a normal entry outside of any subtree > + * entry. > + */ > + base = ce->name + ct_pathlen; > 12: 4405a9115c3b = 12: f2a3e7298798 submodule: sparse-index should not collapse links > 13: fda23f07e6a2 ! 13: 6f1ebe6ccc08 unpack-trees: allow sparse directories > @@ Commit message > is possible to have a directory in a sparse index as long as that entry > is itself marked with the skip-worktree bit. > > - The negation of the 'pos' variable must be conditioned to only when it > - starts as negative. This is identical behavior as before when the index > - is full. > + The 'pos' variable is assigned a negative value if an exact match is not > + found. Since a directory name can be an exact match, it is no longer an > + error to have a nonnegative 'pos' value. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > > 14: 7d4627574bb8 = 14: 3fa684b315fb sparse-index: check index conversion happens > 15: 564503f78784 ! 15: d74576d677f6 sparse-index: create extension for compatibility > @@ Commit message > > We _could_ add a new index version that explicitly adds these > capabilities, but there are nuances to index formats 2, 3, and 4 that > - are still valuable to select as options. 
For now, create a repo > - extension, "extensions.sparseIndex", that specifies that the tool > - reading this repository must understand sparse directory entries. > + are still valuable to select as options. Until we add index format > + version 5, create a repo extension, "extensions.sparseIndex", that > + specifies that the tool reading this repository must understand sparse > + directory entries. > > This change only encodes the extension and enables it when > GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI > @@ Documentation/config/extensions.txt: extensions.objectFormat:: > + When combined with `core.sparseCheckout=true` and > + `core.sparseCheckoutCone=true`, the index may contain entries > + corresponding to directories outside of the sparse-checkout > -+ definition. Versions of Git that do not understand this extension > -+ do not expect directory entries in the index. > ++ definition in lieu of containing each path under such directories. > ++ Versions of Git that do not understand this extension do not > ++ expect directory entries in the index. > > ## cache.h ## > @@ cache.h: struct repository_format { > 16: 6d6b230e3318 ! 16: e530ca5f668d sparse-checkout: toggle sparse index from builtin > @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees > +a sparse index until they are properly integrated with the feature. > ++ > +**WARNING:** Using a sparse index requires modifying the index in a way > -+that is not completely understood by other tools. Enabling sparse index > -+enables the `extensions.spareseIndex` config value, which might cause > -+other tools to stop working with your repository. If you have trouble with > -+this compatibility, then run `git sparse-checkout sparse-index disable` to > -+remove this config and rewrite your index to not be sparse. > ++that is not completely understood by external tools. 
If you have trouble > ++with this compatibility, then run `git sparse-checkout sparse-index disable` > ++to rewrite your index to not be sparse. Older versions of Git will not > ++understand the `sparseIndex` repository extension and may fail to interact > ++with your repository until it is disabled. > > 'set':: > Write a set of patterns to the sparse-checkout file, as given as > 17: bcf960ef2362 = 17: 42d0da9c5def sparse-checkout: disable sparse-index > 18: e6afec58674e = 18: 6bb0976a6295 cache-tree: integrate with sparse directory entries > 19: 2be4981fe698 = 19: 07f34e80609a sparse-index: loose integration with cache_tree_verify() > 20: a738b0ba8ab4 = 20: 41e3b56b9c17 p2000: add sparse-index repos > > -- > gitgitgadget ^ permalink raw reply [flat|nested] 203+ messages in thread
* [PATCH v3 00/20] Sparse Index: Design, Format, Tests
From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee

Here is the first full patch series submission coming out of the sparse-index RFC [1].

[1] https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

I won't waste too much space here, because PATCH 1 includes a sizeable design document that describes the feature, the reasoning behind it, and my plan for getting this implemented widely throughout the codebase.

There are some new things here that were not in the RFC:

 * Design doc and format updates. (Patch 1)
 * Performance test script. (Patches 2 and 20)

Notably missing in this series from the RFC:

 * The mega-patch inserting ensure_full_index() throughout the codebase. That will be a follow-up series to this one.
 * The integrations with git status and git add to demonstrate the improved performance. Those will also appear in their own series later.

I plan to keep my latest work in this area in my 'sparse-index/wip' branch [2]. It includes all of the work from the RFC right now, updated with the work from this series.

[2] https://github.com/derrickstolee/git/tree/sparse-index/wip


Updates in V3
=============

For this version, I took Ævar's latest patches and applied them to v2.31.0 and rebased this series on top. It uses his new "read_tree_at()" helper and the associated changes to the function pointer type.

 * Fixed more typos.
   Thanks Martin and Elijah!
 * Updated the test_sparse_match() helper to use "$@" instead of $*.
 * Added a test that git sparse-checkout init --no-sparse-index rewrites the index to be full.


Updates in V2
=============

 * Various typos and awkward grammar are fixed.
 * Cleaned up unnecessary commands in p2000-sparse-operations.sh.
 * Added a comment to the sparse_index member of struct index_state.
 * Used tree_type, commit_type, and blob_type in test-read-cache.c.

Thanks,
-Stolee

Derrick Stolee (20):
  sparse-index: design doc and format update
  t/perf: add performance test for sparse operations
  t1092: clean up script quoting
  sparse-index: add guard to ensure full index
  sparse-index: implement ensure_full_index()
  t1092: compare sparse-checkout to sparse-index
  test-read-cache: print cache entries with --table
  test-tool: don't force full index
  unpack-trees: ensure full index
  sparse-checkout: hold pattern list in index
  sparse-index: convert from full to sparse
  submodule: sparse-index should not collapse links
  unpack-trees: allow sparse directories
  sparse-index: check index conversion happens
  sparse-index: create extension for compatibility
  sparse-checkout: toggle sparse index from builtin
  sparse-checkout: disable sparse-index
  cache-tree: integrate with sparse directory entries
  sparse-index: loose integration with cache_tree_verify()
  p2000: add sparse-index repos

 Documentation/config/extensions.txt      |   8 +
 Documentation/git-sparse-checkout.txt    |  14 ++
 Documentation/technical/index-format.txt |   7 +
 Documentation/technical/sparse-index.txt | 173 +++++++++++++
 Makefile                                 |   1 +
 builtin/sparse-checkout.c                |  44 +++-
 cache-tree.c                             |  40 ++++
 cache.h                                  |  18 +-
 read-cache.c                             |  35 ++-
 repo-settings.c                          |  15 ++
 repository.c                             |  11 +-
 repository.h                             |   3 +
 setup.c                                  |   3 +
 sparse-index.c                           | 293 +++++++++++++++++++++++
 sparse-index.h                           |  11 +
 t/README                                 |   3 +
 t/helper/test-read-cache.c               |  66 ++++-
 t/perf/p2000-sparse-operations.sh        | 102 ++++++++
 t/t1091-sparse-checkout-builtin.sh       |  13 +
t/t1092-sparse-checkout-compatibility.sh | 143 +++++++++-- unpack-trees.c | 16 +- 21 files changed, 979 insertions(+), 40 deletions(-) create mode 100644 Documentation/technical/sparse-index.txt create mode 100644 sparse-index.c create mode 100644 sparse-index.h create mode 100755 t/perf/p2000-sparse-operations.sh base-commit: 9c34e7ffd7b544199d889e2f3f7d9ba663c4357d Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/883 Range-diff vs v2: 1: 2fe413fdac80 ! 1: 62ac13945bec sparse-index: design doc and format update @@ Documentation/technical/sparse-index.txt (new) +Hopefully, commands such as `git merge` and `git rebase` can benefit +instead from merge algorithms that do not use the index as a data +structure, such as the merge-ORT strategy. As these topics mature, we -+may enalbe the ORT strategy by default for repositories using the ++may enable the ORT strategy by default for repositories using the +sparse-index feature. + +Along with `git status` and `git add`, these commands cover the majority 2: 540ab5495065 = 2: d2197e895e4d t/perf: add performance test for sparse operations 3: 5cbedb377b37 = 3: d3cfd34b8418 t1092: clean up script quoting 4: 6e21f776e883 = 4: 4472118cf903 sparse-index: add guard to ensure full index 5: 399ddb0bad56 ! 
5: 99292cdbaae4 sparse-index: implement ensure_full_index() @@ sparse-index.c +} + +static int add_path_to_index(const struct object_id *oid, -+ struct strbuf *base, const char *path, -+ unsigned int mode, int stage, void *context) ++ struct strbuf *base, const char *path, ++ unsigned int mode, void *context) +{ + struct index_state *istate = (struct index_state *)context; + struct cache_entry *ce; @@ sparse-index.c - /* intentionally left blank */ + int i; + struct index_state *full; ++ struct strbuf base = STRBUF_INIT; + + if (!istate || !istate->sparse_index) + return; @@ sparse-index.c + ps.has_wildcard = 1; + ps.max_depth = -1; + -+ read_tree_recursive(istate->repo, tree, -+ ce->name, strlen(ce->name), -+ 0, &ps, -+ add_path_to_index, full); ++ strbuf_setlen(&base, 0); ++ strbuf_add(&base, ce->name, strlen(ce->name)); ++ ++ read_tree_at(istate->repo, tree, &base, &ps, ++ add_path_to_index, full); + + /* free directory entries. full entries are re-used */ + discard_cache_entry(ce); @@ sparse-index.c + istate->cache_nr = full->cache_nr; + istate->cache_alloc = full->cache_alloc; + ++ strbuf_release(&base); + free(full); + + trace2_region_leave("index", "ensure_full_index", istate->repo); 6: eac2db5efc22 ! 6: fae5663a17bb t1092: compare sparse-checkout to sparse-index @@ Commit message add run_on_sparse and test_sparse_match helpers. These helpers will be used when the sparse index is implemented. - Add GIT_TEST_SPARSE_INDEX environment variable to enable the - sparse-index by default. This will be intended to use across the entire - test suite, except that it will only affect cases where the - sparse-checkout feature is enabled. + Add the GIT_TEST_SPARSE_INDEX environment variable to enable the + sparse-index by default. This can be enabled across all tests, but that + will only affect cases where the sparse-checkout feature is enabled. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> @@ t/t1092-sparse-checkout-compatibility.sh: test_all_match () { } +test_sparse_match () { -+ run_on_sparse $* && ++ run_on_sparse "$@" && + test_cmp sparse-checkout-out sparse-index-out && + test_cmp sparse-checkout-err sparse-index-err +} 7: e9c82d2eda82 = 7: dffe8821fde2 test-read-cache: print cache entries with --table 8: 243541fc5820 = 8: f4ad081f25bb test-tool: don't force full index 9: 48f65093b3da = 9: 4780076a50df unpack-trees: ensure full index 10: 83aac8b7a1ec = 10: 33fdba2b8cfd sparse-checkout: hold pattern list in index 11: f6db0c27a285 ! 11: e41b14e03ebb sparse-index: convert from full to sparse @@ t/t1092-sparse-checkout-compatibility.sh test_description='compare full workdir to sparse workdir' ++# The verify_cache_tree() check is not sparse-aware (yet). ++# So, disable the check until that integration is complete. +GIT_TEST_CHECK_CACHE_TREE=0 +GIT_TEST_SPLIT_INDEX=0 + @@ t/t1092-sparse-checkout-compatibility.sh: run_on_all () { } test_sparse_match () { -- run_on_sparse $* && -+ run_on_sparse "$@" && - test_cmp sparse-checkout-out sparse-index-out && +@@ t/t1092-sparse-checkout-compatibility.sh: test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } 12: f2a3e7298798 = 12: b77cd6b02265 submodule: sparse-index should not collapse links 13: 6f1ebe6ccc08 = 13: 4000c5cdd4cf unpack-trees: allow sparse directories 14: 3fa684b315fb = 14: 1a2be38b2ca7 sparse-index: check index conversion happens 15: d74576d677f6 = 15: f89891b0ae4e sparse-index: create extension for compatibility 16: e530ca5f668d ! 16: bd703c76c859 sparse-checkout: toggle sparse index from builtin @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees ++ +**WARNING:** Using a sparse index requires modifying the index in a way +that is not completely understood by external tools. 
If you have trouble -+with this compatibility, then run `git sparse-checkout sparse-index disable` ++with this compatibility, then run `git sparse-checkout init --no-sparse-index` +to rewrite your index to not be sparse. Older versions of Git will not +understand the `sparseIndex` repository extension and may fail to interact +with your repository until it is disabled. @@ sparse-index.h: struct index_state; ## t/t1092-sparse-checkout-compatibility.sh ## @@ t/t1092-sparse-checkout-compatibility.sh: test_description='compare full workdir to sparse workdir' - + # So, disable the check until that integration is complete. GIT_TEST_CHECK_CACHE_TREE=0 GIT_TEST_SPLIT_INDEX=0 +GIT_TEST_SPARSE_INDEX= @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index cont test-tool -C sparse-index read-cache --table >cache && for dir in deep/deeper2 folder1 folder2 x +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index contents' ' + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 +- done ++ done && ++ ++ # Disabling the sparse-index removes tree entries with full ones ++ git -C sparse-index sparse-checkout init --no-sparse-index && ++ ++ test-tool -C sparse-index read-cache --table >cache && ++ ! grep "040000 tree" cache && ++ test_sparse_match test-tool read-cache --table + ' + + test_expect_success 'expanded in-memory index matches full index' ' @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'submodule handling' ' test_expect_success 'sparse-index is expanded and converted back' ' init_repos && 17: 42d0da9c5def = 17: 598557f90a2a sparse-checkout: disable sparse-index 18: 6bb0976a6295 ! 
18: c2d0c17db31a cache-tree: integrate with sparse directory entries @@ sparse-index.c: int convert_to_sparse(struct index_state *istate) trace2_region_leave("index", "convert_to_sparse", istate->repo); return 0; @@ sparse-index.c: void ensure_full_index(struct index_state *istate) - + strbuf_release(&base); free(full); + /* Clear and recompute the cache-tree */ 19: 07f34e80609a ! 19: 6fdd9323c14e sparse-index: loose integration with cache_tree_verify() @@ t/t1092-sparse-checkout-compatibility.sh test_description='compare full workdir to sparse workdir' +-# The verify_cache_tree() check is not sparse-aware (yet). +-# So, disable the check until that integration is complete. -GIT_TEST_CHECK_CACHE_TREE=0 GIT_TEST_SPLIT_INDEX=0 GIT_TEST_SPARSE_INDEX= 20: 41e3b56b9c17 = 20: 3db06ac46dd5 p2000: add sparse-index repos -- gitgitgadget
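The range-diff above changes how `ensure_full_index()` walks each sparse directory. A minimal toy model of what that expansion achieves: every sparse-directory entry (SKIP_WORKTREE set, path ending in `/`) is replaced by one SKIP_WORKTREE entry per file in its tree, while ordinary entries carry over unchanged. This is a sketch under stated assumptions, not Git's code: `struct toy_entry`, `struct toy_tree`, `join_path()` and `expand_sparse_dirs()` are hypothetical, and the flat `files` list stands in for the recursive tree walk the patch does with `read_tree_at()` and a reused `strbuf` base.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an index entry; not Git's struct cache_entry. */
struct toy_entry {
	const char *name;    /* path; sparse directories end in '/' */
	int skip_worktree;   /* stands in for the SKIP_WORKTREE bit */
};

/* Toy stand-in for a tree lookup: the files under one sparse directory. */
struct toy_tree {
	const char *dir;       /* e.g. "folder1/" */
	const char *files[4];  /* paths relative to dir, NULL-terminated */
};

/* Concatenate base and rest into a newly allocated string. */
static char *join_path(const char *base, const char *rest)
{
	size_t a = strlen(base), b = strlen(rest);
	char *p = malloc(a + b + 1);

	assert(p);
	memcpy(p, base, a);
	memcpy(p + a, rest, b + 1);  /* also copies the terminating NUL */
	return p;
}

static int is_sparse_dir(const struct toy_entry *e)
{
	size_t len = strlen(e->name);

	return e->skip_worktree && len > 0 && e->name[len - 1] == '/';
}

static const struct toy_tree *find_tree(const struct toy_tree *trees,
					size_t nr, const char *dir)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(trees[i].dir, dir))
			return &trees[i];
	return NULL;
}

/*
 * Replace each sparse-directory entry with one SKIP_WORKTREE entry per
 * file in its tree; other entries are copied through (sharing their name
 * pointer, i.e. re-used). Returns the number of entries written to out.
 */
static size_t expand_sparse_dirs(const struct toy_entry *in, size_t in_nr,
				 const struct toy_tree *trees, size_t trees_nr,
				 struct toy_entry *out, size_t out_alloc)
{
	size_t out_nr = 0;

	for (size_t i = 0; i < in_nr; i++) {
		const struct toy_tree *t;

		if (!is_sparse_dir(&in[i])) {
			assert(out_nr < out_alloc);
			out[out_nr++] = in[i];
			continue;
		}
		t = find_tree(trees, trees_nr, in[i].name);
		for (size_t j = 0; t && t->files[j]; j++) {
			assert(out_nr < out_alloc);
			out[out_nr].name = join_path(in[i].name, t->files[j]);
			out[out_nr].skip_worktree = 1;  /* still outside the cone */
			out_nr++;
		}
	}
	return out_nr;
}
```

Copying ordinary entries by pointer loosely mirrors the patch's "full entries are re-used" comment; only the expanded paths need fresh allocations.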
* [PATCH v3 01/20] sparse-index: design doc and format update 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-19 23:43 ` Junio C Hamano 2021-03-16 16:42 ` [PATCH v3 02/20] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget ` (22 subsequent siblings) 23 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> This begins a long effort to update the index format to allow sparse directory entries. This should result in a significant improvement to Git commands when HEAD contains millions of files, but the user has selected many fewer files to keep in their sparse-checkout definition. Currently, the index format is only updated in the presence of extensions.sparseIndex instead of increasing a file format version number. This is temporary, and index v5 is part of the plan for future work in this area. The design document details many of the reasons for embarking on this work, and also the plan for completing it safely. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/technical/index-format.txt | 7 + Documentation/technical/sparse-index.txt | 173 +++++++++++++++++++++++ 2 files changed, 180 insertions(+) create mode 100644 Documentation/technical/sparse-index.txt diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index d363a71c37ec..cc548eaa0e97 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -44,6 +44,13 @@ Git index format localization, no special casing of directory separator '/'). Entries with the same name are sorted by their stage field. + An index entry typically represents a file. 
However, if sparse-checkout + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the + `extensions.sparseIndex` extension is enabled, then the index may + contain entries for directories outside of the sparse-checkout definition. + These entries have mode `0040000`, include the `SKIP_WORKTREE` bit, and + the path ends in a directory separator. + 32-bit ctime seconds, the last time a file's metadata changed this is stat(2) data diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt new file mode 100644 index 000000000000..aa116406a016 --- /dev/null +++ b/Documentation/technical/sparse-index.txt @@ -0,0 +1,173 @@ +Git Sparse-Index Design Document +================================ + +The sparse-checkout feature allows users to focus a working directory on +a subset of the files at HEAD. The cone mode patterns, enabled by +`core.sparseCheckoutCone`, allow for very fast pattern matching to +discover which files at HEAD belong in the sparse-checkout cone. + +Three important scale dimensions for a Git worktree are: + +* `HEAD`: How many files are present at `HEAD`? + +* Populated: How many files are within the sparse-checkout cone. + +* Modified: How many files has the user modified in the working directory? + +We will use big-O notation -- O(X) -- to denote how expensive certain +operations are in terms of these dimensions. + +These dimensions are ordered by their magnitude: users (typically) modify +fewer files than are populated, and we can only populate files at `HEAD`. +These dimensions are also ordered by how expensive they are per item: it +is expensive to detect a modified file than it is to write one that we +know must be populated; changing `HEAD` only really requires updating the +index. + +Problems occur if there is an extreme imbalance in these dimensions. 
For +example, if `HEAD` contains millions of paths but the populated set has +only tens of thousands, then commands like `git status` and `git add` can +be dominated by operations that require O(`HEAD`) operations instead of +O(Populated). Primarily, the cost is in parsing and rewriting the index, +which is filled primarily with files at `HEAD` that are marked with the +`SKIP_WORKTREE` bit. + +The sparse-index intends to take these commands that read and modify the +index from O(`HEAD`) to O(Populated). To do this, we need to modify the +index format in a significant way: add "sparse directory" entries. + +With cone mode patterns, it is possible to detect when an entire +directory will have its contents outside of the sparse-checkout definition. +Instead of listing all of the files it contains as individual entries, a +sparse-index contains an entry with the directory name, referencing the +object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit. +If we need to discover the details for paths within that directory, we +can parse trees to find that list. + +At time of writing, sparse-directory entries violate expectations about the +index format and its in-memory data structure. There are many consumers in +the codebase that expect to iterate through all of the index entries and +see only files. In addition, they expect to see all files at `HEAD`. One +way to handle this is to parse trees to replace a sparse-directory entry +with all of the files within that tree as the index is loaded. However, +parsing trees is slower than parsing the index format, so that is a slower +operation than if we left the index alone. + +The implementation plan below follows four phases to slowly integrate with +the sparse-index. The intention is to incrementally update Git commands to +interact safely with the sparse-index without significant slowdowns. 
This +may not always be possible, but the hope is that the primary commands that +users need in their daily work are dramatically improved. + +Phase I: Format and initial speedups +------------------------------------ + +During this phase, Git learns to enable the sparse-index and safely parse +one. Protections are put in place so that every consumer of the in-memory +data structure can operate with its current assumption of every file at +`HEAD`. + +At first, every index parse will expand the sparse-directory entries into +the full list of paths at `HEAD`. This will be slower in all cases. The +only noticable change in behavior will be that the serialized index file +contains sparse-directory entries. + +To start, we use a new repository extension, `extensions.sparseIndex`, to +allow inserting sparse-directory entries into indexes with file format +versions 2, 3, and 4. This prevents Git versions that do not understand +the sparse-index from operating on one, but it also prevents other +operations that do not use the index at all. A new format, index v5, will +be introduced that includes sparse-directory entries by default. It might +also introduce other features that have been considered for improving the +index, as well. + +Next, consumers of the index will be guarded against operating on a +sparse-index by inserting calls to `ensure_full_index()` or +`expand_index_to_path()`. After these guards are in place, we can begin +leaving sparse-directory entries in the in-memory index structure. + +Even after inserting these guards, we will keep expanding sparse-indexes +for most Git commands using the `command_requires_full_index` repository +setting. This setting will be on by default and disabled one builtin at a +time until we have sufficient confidence that all of the index operations +are properly guarded. 
+ +To complete this phase, the commands `git status` and `git add` will be +integrated with the sparse-index so that they operate with O(Populated) +performance. They will be carefully tested for operations within and +outside the sparse-checkout definition. + +Phase II: Careful integrations +------------------------------ + +This phase focuses on ensuring that all index extensions and APIs work +well with a sparse-index. This requires significant increases to our test +coverage, especially for operations that interact with the working +directory outside of the sparse-checkout definition. Some of these +behaviors may not be the desirable ones, such as some tests already +marked for failure in `t1092-sparse-checkout-compatibility.sh`. + +The index extensions that may require special integrations are: + +* FS Monitor +* Untracked cache + +While integrating with these features, we should look for patterns that +might lead to better APIs for interacting with the index. Coalescing +common usage patterns into an API call can reduce the number of places +where sparse-directories need to be handled carefully. + +Phase III: Important command speedups +------------------------------------- + +At this point, the patterns for testing and implementing sparse-directory +logic should be relatively stable. This phase focuses on updating some of +the most common builtins that use the index to operate as O(Populated). +Here is a potential list of commands that could be valuable to integrate +at this point: + +* `git commit` +* `git checkout` +* `git merge` +* `git rebase` + +Hopefully, commands such as `git merge` and `git rebase` can benefit +instead from merge algorithms that do not use the index as a data +structure, such as the merge-ORT strategy. As these topics mature, we +may enable the ORT strategy by default for repositories using the +sparse-index feature. 
+ +Along with `git status` and `git add`, these commands cover the majority +of users' interactions with the working directory. In addition, we can +integrate with these commands: + +* `git grep` +* `git rm` + +These have been proposed as some whose behavior could change when in a +repo with a sparse-checkout definition. It would be good to include this +behavior automatically when using a sparse-index. Some clarity is needed +to make the behavior switch clear to the user. + +This phase is the first where parallel work might be possible without too +many conflicts between topics. + +Phase IV: The long tail +----------------------- + +This last phase is less a "phase" and more "the new normal" after all of +the previous work. + +To start, the `command_requires_full_index` option could be removed in +favor of expanding only when hitting an API guard. + +There are many Git commands that could use special attention to operate as +O(Populated), while some might be so rare that it is acceptable to leave +them with additional overhead when a sparse-index is present. + +Here are some commands that might be useful to update: + +* `git sparse-checkout set` +* `git am` +* `git clean` +* `git stash` -- gitgitgadget
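The design document relies on cone-mode patterns making it possible to tell when an entire directory lies outside the sparse-checkout definition. A minimal sketch of that test, under a simplified model of cone mode that is an assumption here: `cone` lists the recursively included directories, each ending in `/`, and files directly under the root and under any ancestor of those directories are also populated. A directory can then become a single sparse-directory entry only if it is not equal to, inside, or an ancestor of any cone directory. `can_collapse()` and `has_prefix()` are hypothetical helpers, not Git functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'path' begins with 'prefix'. */
static bool has_prefix(const char *path, const char *prefix)
{
	return strncmp(path, prefix, strlen(prefix)) == 0;
}

/*
 * Simplified cone-mode model (an assumption, not Git's pattern code).
 * Returns true when 'dir' (ending in '/') could be stored as one
 * sparse-directory entry, i.e. nothing beneath it is populated.
 */
static bool can_collapse(const char *dir, const char *const *cone, size_t nr)
{
	size_t len = strlen(dir);

	assert(len > 0 && dir[len - 1] == '/');
	for (size_t i = 0; i < nr; i++) {
		/* dir is a cone directory or lies inside one */
		if (has_prefix(dir, cone[i]))
			return false;
		/* dir is an ancestor of a cone directory, so its files are populated */
		if (has_prefix(cone[i], dir))
			return false;
	}
	return true;
}
```

The trailing `/` on every path matters: it keeps `folder10/` from being mistaken for something inside `folder1/`.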
* Re: [PATCH v3 01/20] sparse-index: design doc and format update 2021-03-16 16:42 ` [PATCH v3 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget @ 2021-03-19 23:43 ` Junio C Hamano 2021-03-23 11:16 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Junio C Hamano @ 2021-03-19 23:43 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, newren, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <dstolee@microsoft.com> > > This begins a long effort to update the index format to allow sparse > directory entries. This should result in a significant improvement to > Git commands when HEAD contains millions of files, but the user has > selected many fewer files to keep in their sparse-checkout definition. This compromise makes sense. In the past, we often dreamed of recording trees in the index (instead of using a bolted on extension like cache-tree, treating trees as first-class citizens) and lazily expanding it only when the user starts modifying the paths within the subdirectory. But such an optimization never materialized, as the dual and conflicting nature of the index to keep track of the contents for the "next" commit (for which it is sufficient to just record trees for parts that have not been modified) and to cache stat information to detect which working tree paths may possibly have modifications (for which, we used the one-entry-per-path nature of the cache entries so far) was never resolved. But if we limit the use of trees-in-index for sparse/cone checkout case, we do not even have to worry about having to cache the stat information for those paths that we are not going to populate in the working tree at all. It is a great simplification of the problem. 
> + These entries have mode `0040000`, include the `SKIP_WORKTREE` bit, and > + the path ends in a directory separator. > + Why leading two 0's? At the tree object level, we do not 0-pad blob mode word, and if you are writing for C programmers, you need only one '0' prefix to signal that it is in octal (in the on-disk index file, the blob mode word is stored in a be16 word). > diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt > new file mode 100644 > index 000000000000..aa116406a016 > --- /dev/null > +++ b/Documentation/technical/sparse-index.txt > @@ -0,0 +1,173 @@ > +Git Sparse-Index Design Document > +================================ > + > +The sparse-checkout feature allows users to focus a working directory on > +a subset of the files at HEAD. The cone mode patterns, enabled by > +`core.sparseCheckoutCone`, allow for very fast pattern matching to > +discover which files at HEAD belong in the sparse-checkout cone. > + > +Three important scale dimensions for a Git worktree are: s/worktree/working tree/; The former is the thing the "git worktree" command deals with. The latter is relevant even when "git worktree" is not used (the traditional "git clone and you get a working tree to work in"). > +* `HEAD`: How many files are present at `HEAD`? > + > +* Populated: How many files are within the sparse-checkout cone. > + > +* Modified: How many files has the user modified in the working directory? > + > +We will use big-O notation -- O(X) -- to denote how expensive certain > +operations are in terms of these dimensions. > + > +These dimensions are ordered by their magnitude: users (typically) modify > +fewer files than are populated, and we can only populate files at `HEAD`. OK. > +These dimensions are also ordered by how expensive they are per item: it > +is expensive to detect a modified file than it is to write one that we > +know must be populated; changing `HEAD` only really requires updating the > +index. 
This is a bit too dense to grok. Among Populated, there are some Modified but it takes lstat(2) per path or fsmonitor listening to inotify to know which ones are in the Modified set. Is that the "expensive" you are referring to here? I am not sure how you compared the cost to know if a path is modified or merely populated with the cost of "write one that we know must be populated" (which I take as "given a populated file, make modification to it"). Also it is unclear what you mean by "changing HEAD only require updating the index". Certainly when "git switch" flips HEAD from one commit to another, you'd update the index and update the files in the working tree (in the Populated part that is in the sparse-checkout cone) to match, no? > +Problems occur if there is an extreme imbalance in these dimensions. For > +example, if `HEAD` contains millions of paths but the populated set has > +only tens of thousands, then commands like `git status` and `git add` can > +be dominated by operations that require O(`HEAD`) operations instead of > +O(Populated). Primarily, the cost is in parsing and rewriting the index, > +which is filled primarily with files at `HEAD` that are marked with the > +`SKIP_WORKTREE` bit. > + > +The sparse-index intends to take these commands that read and modify the > +index from O(`HEAD`) to O(Populated). To do this, we need to modify the > +index format in a significant way: add "sparse directory" entries. OK. > +With cone mode patterns, it is possible to detect when an entire > +directory will have its contents outside of the sparse-checkout definition. > +Instead of listing all of the files it contains as individual entries, a > +sparse-index contains an entry with the directory name, referencing the > +object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit. > +If we need to discover the details for paths within that directory, we > +can parse trees to find that list. 
;-) > +At time of writing, sparse-directory entries violate expectations about the > +index format and its in-memory data structure. There are many consumers in > +the codebase that expect to iterate through all of the index entries and > +see only files. True. > In addition, they expect to see all files at `HEAD`. It is not clear to me what this means. After "git add", "git ls-files" would expect to see a file that may not even be in HEAD. After "git rm", it would expect to see some file missing from the set of paths in HEAD. While I do not think that is what you meant here, it is hard to guess what you wanted to say. > One > +way to handle this is to parse trees to replace a sparse-directory entry > +with all of the files within that tree as the index is loaded. However, > +parsing trees is slower than parsing the index format, so that is a slower > +operation than if we left the index alone. Besides, that would leave in-core index fully populated, so I would suspect that you'd lose a lot of benefit that comes from having to keep much fewer entries in the in-core index than what is in HEAD. It would be nice for "git diff-index --cached" (which is part of "git status") to be able to skip a single "tree" entry in the sparse index as "known to be untouched", than skipping thousands of paths in that single subdirectory (in a mega monorepo project) as "these are marked with SKIP_WORKTREE so ignore what is in the working tree". > +The implementation plan below follows four phases to slowly integrate with > +the sparse-index. The intention is to incrementally update Git commands to > +interact safely with the sparse-index without significant slowdowns. This > +may not always be possible, but the hope is that the primary commands that > +users need in their daily work are dramatically improved. OK. > +Phase I: Format and initial speedups > +------------------------------------ > + > +During this phase, Git learns to enable the sparse-index and safely parse > +one.
Protections are put in place so that every consumer of the in-memory > +data structure can operate with its current assumption of every file at > +`HEAD`. IOW, before they iterate over the in-core index, tree entries are expanded into a bunch of individual entries with SKIP_WORKTREE bit? Makes sense. > +At first, every index parse will expand the sparse-directory entries into > +the full list of paths at `HEAD`. This will be slower in all cases. The > +only noticeable change in behavior will be that the serialized index file > +contains sparse-directory entries. Hmph, do you mean that the expansion is done by not replacing each "tree" entry with blob entries for the contents of the directory, but the original "tree" entry is still left in the in-core index? It is not immediately clear what we are trying to gain by leaving it in, but let's read on. Perhaps we can get rid of cache-tree extension and replace its use with these "tree" entries whose content paths are populated in the index? > +To start, we use a new repository extension, `extensions.sparseIndex`, to > +allow inserting sparse-directory entries into indexes with file format > +versions 2, 3, and 4. This prevents Git versions that do not understand > +the sparse-index from operating on one, but it also prevents other > +operations that do not use the index at all. A new format, index v5, will > +be introduced that includes sparse-directory entries by default. It might > +also introduce other features that have been considered for improving the > +index, as well. OK. > +Next, consumers of the index will be guarded against operating on a > +sparse-index by inserting calls to `ensure_full_index()` or > +`expand_index_to_path()`. After these guards are in place, we can begin > +leaving sparse-directory entries in the in-memory index structure. It is unclear why "we can begin leaving"; an iterator that only expects to see blobs would need to be updated to skip them, too, no?
They would probably be already skipping blob entries that are marked with the SKIP_WORKTREE bit, so it may be just a matter of skipping more things than the current code. Or did I misread the design presented earlier, and when a directory that is outside the cone is expanded into the paths of blobs in the directory, the "tree" entry is removed from the in-core index? > +Even after inserting these guards, we will keep expanding sparse-indexes > +for most Git commands using the `command_requires_full_index` repository > +setting. This setting will be on by default and disabled one builtin at a > +time until we have sufficient confidence that all of the index operations > +are properly guarded. OK. > +To complete this phase, the commands `git status` and `git add` will be > +integrated with the sparse-index so that they operate with O(Populated) > +performance. They will be carefully tested for operations within and > +outside the sparse-checkout definition. ;-)
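Junio's remark about `0040000` is easy to check. `Documentation/technical/index-format.txt` describes the mode word as a 4-bit object type (binary 1000 regular file, 1010 symbolic link, 1110 gitlink), 3 unused bits and 9 permission bits, so the type sits in bits 12 to 15. A sparse-directory mode of `040000` therefore carries the type nibble 0100, and in C the extra leading zero changes nothing. The sketch decodes that nibble and combines it with the other two conditions from the proposed format text (SKIP_WORKTREE set, path ending in `/`). `TOY_SKIP_WORKTREE`, `mode_type()`, `mode_kind()` and `is_sparse_dir_entry()` are made up for illustration; Git's in-memory `CE_SKIP_WORKTREE` bit has a different value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up flag bit for this sketch; not Git's CE_SKIP_WORKTREE value. */
#define TOY_SKIP_WORKTREE 0x1u

/* Object-type nibble: bits 12..15 of the index entry's mode word. */
static unsigned mode_type(uint32_t mode)
{
	return (mode >> 12) & 0xF;
}

static const char *mode_kind(uint32_t mode)
{
	switch (mode_type(mode)) {
	case 0x8: return "regular file";   /* 1000, e.g. 0100644 */
	case 0xA: return "symbolic link";  /* 1010, 0120000 */
	case 0xE: return "gitlink";        /* 1110, 0160000 */
	case 0x4: return "directory";      /* 0100, 040000 (sparse dir) */
	default:  return "unknown";
	}
}

/* All three conditions from the proposed index-format text. */
static int is_sparse_dir_entry(uint32_t mode, unsigned flags, const char *name)
{
	size_t len;

	assert(name);
	len = strlen(name);
	return mode == 040000 && (flags & TOY_SKIP_WORKTREE) &&
	       len > 0 && name[len - 1] == '/';
}
```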
* Re: [PATCH v3 01/20] sparse-index: design doc and format update 2021-03-19 23:43 ` Junio C Hamano @ 2021-03-23 11:16 ` Derrick Stolee 2021-03-23 20:10 ` Junio C Hamano 0 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee @ 2021-03-23 11:16 UTC (permalink / raw) To: Junio C Hamano, Derrick Stolee via GitGitGadget Cc: git, newren, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee On 3/19/2021 7:43 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Derrick Stolee <dstolee@microsoft.com> >> >> This begins a long effort to update the index format to allow sparse >> directory entries. This should result in a significant improvement to >> Git commands when HEAD contains millions of files, but the user has >> selected many fewer files to keep in their sparse-checkout definition. > > This compromise makes sense. > > In the past, we often dreamed of recording trees in the index > (instead of using a bolted on extension like cache-tree, treating > trees as first-class citizens) and lazily expanding it only when the > user starts modifying the paths within the subdirectory. > > But such an optimization never materialized, as the dual and > conflicting nature of the index to keep track of the contents for > the "next" commit (for which it is sufficient to just record trees > for parts that have not been modified) and to cache stat information > to detect which working tree paths may possibly have modifications > (for which, we used the one-entry-per-path nature of the cache > entries so far) was never resolved. > > But if we limit the use of trees-in-index for sparse/cone checkout > case, we do not even have to worry about having to cache the stat > information for those paths that we are not going to populate in the > working tree at all. It is a great simplification of the problem. Thanks. I appreciate your input here. 
>> + These entries have mode `0040000`, include the `SKIP_WORKTREE` bit, and >> + the path ends in a directory separator. >> + > > Why leading two 0's? At the tree object level, we do not 0-pad blob > mode word, and if you are writing for C programmers, you need only > one '0' prefix to signal that it is in octal (in the on-disk index > file, the blob mode word is stored in a be16 word). Fixed. >> diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt >> new file mode 100644 >> index 000000000000..aa116406a016 >> --- /dev/null >> +++ b/Documentation/technical/sparse-index.txt >> @@ -0,0 +1,173 @@ >> +Git Sparse-Index Design Document >> +================================ >> + >> +The sparse-checkout feature allows users to focus a working directory on >> +a subset of the files at HEAD. The cone mode patterns, enabled by >> +`core.sparseCheckoutCone`, allow for very fast pattern matching to >> +discover which files at HEAD belong in the sparse-checkout cone. >> + >> +Three important scale dimensions for a Git worktree are: > > s/worktree/working tree/; The former is the thing the "git worktree" > command deals with. The latter is relevant even when "git worktree" > is not used (the traditional "git clone and you get a working tree > to work in"). I guess I'm distracted by using SKIP_WORKTREE a lot, but "working directory" is more specific and hence better. >> +These dimensions are also ordered by how expensive they are per item: it >> +is expensive to detect a modified file than it is to write one that we >> +know must be populated; changing `HEAD` only really requires updating the >> +index. > > This is a bit too dense to grok. Among Populated, there are some > Modified but it takes lstat(2) per path or fsmonitor listening to > inotify to know which ones are in the Modified set. Is that the > "expensive" you are referring to here? 
I am not sure how you > compared the cost to know if a path is modified or merely populated > with the cost of "write one that we know must be populated" (which I > take as "given a populated file, make modification to it"). I could rearrange things here. The important things to note are: 1. Updating index entries is very fast, but adds up at large scale. 2. It is faster to write a file to disk from Git's object database than it is to compare a file on disk to the copy in the database, which is frequently necessary when the mtime on disk doesn't match the mtime in the index. > Also it > is unclear what you mean by "changing HEAD only require updating the > index". Certainly when "git switch" flips HEAD from one commit to > another, you'd update the index and update the files in the working > tree (in the Populated part that is in the sparse-checkout cone) to > match, no? That was unclear of me. I was thinking more along the lines of "git reset" (soft mode) which updates HEAD without changing the files on disk. After all of this postulating, I think that the offending sentences are better off deleted. They don't add clarity over what can be inferred by an interested reader. >> In addition, they expect to see all files at `HEAD`. > > It is not clear to me what this means. After "git add", "git > ls-files" would expect to see a file that may not even be in HEAD. > After "git rm", it would expect to see some file missing from the > set of paths in HEAD. While I do not think that is what you meant > here, it is hard to guess what you wanted to say. I'm mixing terms incorrectly. I think what I really mean is: "In fact, these loops expect to see a reference to every staged file." >> One >> +way to handle this is to parse trees to replace a sparse-directory entry >> +with all of the files within that tree as the index is loaded. However, >> +parsing trees is slower than parsing the index format, so that is a slower >> +operation than if we left the index alone.
> > Besides, that would leave in-core index fully populated, so I would > suspect that you'd lose a lot of benefit that comes from having to > keep much fewer entries in the in-core index than what is in HEAD. > It would be nice for "git diff-index --cached" (which is part of > "git status") to be able to skip a single "tree" entry in the sparse > index as "known to be untouched", than skipping thousands of paths > in that single subdirectory (in a mega monorepo project) as "these > are marked with SKIP_WORKTREE so ignore what is in the working tree". Absolutely! I'm burying the lede here, so I should get to the real point by adding this to the end: The plan is to make all of these integrations "sparse aware" so this expansion through tree parsing is unnecessary and they use fewer resources than when using a full index. >> +Phase I: Format and initial speedups >> +------------------------------------ >> + >> +During this phase, Git learns to enable the sparse-index and safely parse >> +one. Protections are put in place so that every consumer of the in-memory >> +data structure can operate with its current assumption of every file at >> +`HEAD`. > > IOW, before they iterate over the in-core index, tree entries are expanded > into a bunch of individual entries with SKIP_WORKTREE bit? Makes sense. > >> +At first, every index parse will expand the sparse-directory entries into >> +the full list of paths at `HEAD`. This will be slower in all cases. The >> +only noticeable change in behavior will be that the serialized index file >> +contains sparse-directory entries. > > Hmph, do you mean that the expansion is done by not replacing each > "tree" entry with blob entries for the contents of the directory, > but the original "tree" entry is still left in the in-core index? What I meant by "serialized index file" is that the file written to disk has the sparse directory entries, but the in-core copy will not (except for a very brief moment in time, during do_read_index()).
The intention at this point in time is that all code behaves identically to the full index case, except that the index file itself is smaller due to these sparse directory entries. > It is not immediately clear what we are trying to gain by leaving it > in, but let's read on. Perhaps we can get rid of cache-tree > extension and replace its use with these "tree" entries whose > content paths are populated in the index? This is an interesting idea, but not one I plan to pursue with this work. >> +Next, consumers of the index will be guarded against operating on a >> +sparse-index by inserting calls to `ensure_full_index()` or >> +`expand_index_to_path()`. After these guards are in place, we can begin >> +leaving sparse-directory entries in the in-memory index structure. > > It is unclear why "we can begin leaving"; an iterator that only > expects to see blobs would need to be updated to skip them, too, no? > They would probably be already skipping blob entries that are marked > with the SKIP_WORKTREE bit, so it may be just a matter of skipping > more things than the current code. > > Or did I misread the design presented earlier, and when a directory > that is outside the cone is expanded into the paths of blobs in the > directory, the "tree" entry is removed from the in-core index? I will make this more explicit. Thanks for your help improving this doc! Hopefully the plan is a little more clear, now. -Stolee ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [PATCH v3 01/20] sparse-index: design doc and format update
  2021-03-23 11:16 ` Derrick Stolee
@ 2021-03-23 20:10 ` Junio C Hamano
  2021-03-23 20:42 ` Derrick Stolee
  0 siblings, 1 reply; 203+ messages in thread
From: Junio C Hamano @ 2021-03-23 20:10 UTC
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, newren, pclouds, jrnieder,
	Martin Ågren, SZEDER Gábor,
	Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

>>> +Three important scale dimensions for a Git worktree are:
>>
>> s/worktree/working tree/; The former is the thing the "git worktree"
>> command deals with. The latter is relevant even when "git worktree"
>> is not used (the traditional "git clone and you get a working tree
>> to work in").
>
> I guess I'm distracted by using SKIP_WORKTREE a lot, but "working
> directory" is more specific and hence better.

Since the user's current working directory can be outside any
working tree that is governed by any git repository, "working
directory" is a term I try to avoid when describing the directory
where a checkout of a revision lives.

Documentation/glossary-content.txt is where the suggestion for
"working tree" comes from.

> I could rearrange things here. The important things to note are:
>
> 1. Updating index entries is very fast, but adds up at large scale.

This is the "checkout to match the index to the tree of HEAD" part,
ignoring the cost of writing working tree files out?

> 2. It is faster to write a file to disk from Git's object database
> than it is to compare a file on disk to the copy in the database,
> which is frequently necessary when the mtime on disk doesn't match
> the mtime in the index.

True. But of course, not having to do either (i.e. having a fresh
cached stat info) would be even faster ;-).

>> Also it
>> is unclear what you mean by "changing HEAD only require updating the
>> index". Certainly when "git switch" flips HEAD from one commit to
>> another, you'd update the index and update the files in the working
>> tree (in the Populated part that is in the sparse-checkout cone) to
>> match, no?
>
> This is unclear of me. I was thinking more on the lines of "git reset"
> (soft mode) which updates HEAD without changing the files on disk.

OK, and that is in line with your "updating index entries is very
fast (but adds up)".

> After all of this postulating, I think that the offending sentences
> are better off deleted. They don't add clarity over what can be
> inferred by an interested reader.

OK.

> I'm mixing terms incorrectly. I think what I really mean is
>
> 	In fact, these loops expect to see a reference to every
> 	staged file.

OK.

> The plan is to make all of these integrations "sparse aware" so
> this expansion through tree parsing is unnecessary and they use
> fewer resources than when using a full index.

;-)

> I meant by "serialized index file" is that the file written to disk has
> the sparse directory entries, but the in-core copy will not (except for
> a very brief moment in time, during do_read_index()).

Nice. That would probably mean cache-tree extension on-disk can go
away, because we can populate in-core cache-tree from these entries.
I've always hated the on-disk encoding of that extension.

Or we are not doing this "extra tree" everywhere (i.e. limited only
to the parts that are marked for "sparse checkout")?

Thanks.
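Point 2 above and Junio's reply describe a decision Git makes for each tracked file. The shell function below is a deliberately simplified, hypothetical model of that decision and is not Git's implementation. It looks only at mtime and size. A real index entry caches more stat fields (ctime, inode, uid/gid and others), and Git also has to handle racy timestamps.

```shell
# needs_content_check CACHED_MTIME CACHED_SIZE DISK_MTIME DISK_SIZE
# Hypothetical toy model: decide how much work one file needs, using only
# the mtime and size cached in the index versus what lstat() reports now.
needs_content_check () {
	cached_mtime=$1 cached_size=$2 disk_mtime=$3 disk_size=$4
	if [ "$cached_mtime" = "$disk_mtime" ] && [ "$cached_size" = "$disk_size" ]
	then
		echo clean            # fresh stat info: the file is not even read
	elif [ "$cached_size" != "$disk_size" ]
	then
		echo modified         # a size change alone proves the contents changed
	else
		echo compare-contents # same size, new mtime: read, hash, and compare
	fi
}

needs_content_check 1616000000 512 1616000000 512   # clean
needs_content_check 1616000000 512 1616500000 600   # modified
needs_content_check 1616000000 512 1616500000 512   # compare-contents
```

The last branch is the expensive case Stolee describes. Keeping the cached stat info fresh, as Junio notes, keeps files in the first branch and avoids that cost.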
* Re: [PATCH v3 01/20] sparse-index: design doc and format update
  2021-03-23 20:10 ` Junio C Hamano
@ 2021-03-23 20:42 ` Derrick Stolee
  0 siblings, 0 replies; 203+ messages in thread
From: Derrick Stolee @ 2021-03-23 20:42 UTC
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, newren, pclouds, jrnieder,
	Martin Ågren, SZEDER Gábor,
	Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

On 3/23/2021 4:10 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>>>> +Three important scale dimensions for a Git worktree are:
>>>
>>> s/worktree/working tree/; The former is the thing the "git worktree"
>>> command deals with. The latter is relevant even when "git worktree"
>>> is not used (the traditional "git clone and you get a working tree
>>> to work in").
>>
>> I guess I'm distracted by using SKIP_WORKTREE a lot, but "working
>> directory" is more specific and hence better.
>
> Since the user's current working directory can be outside any
> working tree that is governed by any git repository, "working
> directory" is a term I try to avoid when describing the directory
> where a checkout of a revision lives.
>
> Documentation/glossary-content.txt is where the suggestion for
> "working tree" comes from.

Whoops. Somehow I read that wrong. Thanks for pointing out my error.

>> I meant by "serialized index file" is that the file written to disk has
>> the sparse directory entries, but the in-core copy will not (except for
>> a very brief moment in time, during do_read_index()).
>
> Nice. That would probably mean cache-tree extension on-disk can go
> away, because we can populate in-core cache-tree from these entries.
> I've always hated the on-disk encoding of that extension.
>
> Or we are not doing this "extra tree" everywhere (i.e. limited only
> to the parts that are marked for "sparse checkout")?

The current design is to only have these entries when all paths within
the directory are marked with SKIP_WORKTREE. This pairs with the
cache-tree extension, which has these directories as nodes, but only
consuming one cache entry (for itself).

I haven't considered the idea of inserting trees for other reasons.
Seems like a valuable experiment.

Thanks,
-Stolee
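The rule stated above is that a directory becomes a sparse-directory entry only when every path below it has SKIP_WORKTREE. The p2000 discussion later in this thread adds that a directory containing a submodule is never collapsed. As a toy model of those two rules, the awk sketch below picks the outermost directories that qualify. The `MODE SKIP PATH` input format and the function name are made up for illustration. Git's real conversion is C code in sparse-index.c and is not reproduced here.

```shell
# collapsible_dirs: read "MODE SKIP PATH" lines on stdin (SKIP is 1 when the
# entry has SKIP_WORKTREE set) and print the outermost directories that
# could be stored as a single sparse-directory entry.
# Hypothetical toy model of the rule discussed in this thread.
collapsible_dirs () {
	awk '
	{
		mode = $1; skip = $2; path = $3
		n = split(path, part, "/")
		dir = ""
		# Visit every ancestor directory of this path.
		for (i = 1; i < n; i++) {
			dir = (i == 1) ? part[1] : dir "/" part[i]
			seen[dir] = 1
			# One populated entry or one gitlink blocks the whole directory.
			if (skip != 1 || mode == "160000")
				blocked[dir] = 1
		}
	}
	END {
		for (d in seen) {
			if (d in blocked)
				continue
			parent = d
			sub("/[^/]*$", "", parent)
			# Print only outermost candidates: top-level directories, or
			# directories whose parent cannot be collapsed.
			if (parent == d || (parent in blocked))
				print d
		}
	}' | sort
}

printf '%s\n' \
	'100644 0 README' \
	'100644 1 a/x.c' \
	'100644 1 a/b/y.c' \
	'100644 1 c/z.c' \
	'160000 1 c/mod' \
	'100644 0 d/keep.c' \
	'100644 1 d/e/w.c' | collapsible_dirs
```

With this input the sketch prints `a` and `d/e`. The directory `c` stays expanded because it holds a gitlink, and `d` stays expanded because `d/keep.c` is populated. Its fully skipped subdirectory `d/e` can still collapse on its own.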
* [PATCH v3 02/20] t/perf: add performance test for sparse operations
  2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2021-03-16 16:42 ` [PATCH v3 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget
@ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget
  2021-03-17 8:41 ` Ævar Arnfjörð Bjarmason
  2021-03-16 16:42 ` [PATCH v3 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget
  ` (21 subsequent siblings)
  23 siblings, 1 reply; 203+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC
  To: git
  Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
	SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Create a test script that takes the default performance test (the Git
codebase) and multiplies it by 256 using four layers of duplicated
trees of width four. This results in nearly one million blob entries in
the index. Then, we can clone this repository with sparse-checkout
patterns that demonstrate four copies of the initial repository. Each
clone will use a different index format or mode so performance can be
tested across the different options.

Note that the initial repo is stripped of submodules before doing the
copies. This preserves the expected data shape of the sparse index,
because directories containing submodules are not collapsed to a sparse
directory entry.

Run a few Git commands on these clones, especially those that use the
index (status, add, commit).

Here are the results on my Linux machine:

  Test
  --------------------------------------------------------------
  2000.2: git status (full-index-v3)             0.37(0.30+0.09)
  2000.3: git status (full-index-v4)             0.39(0.32+0.10)
  2000.4: git add -A (full-index-v3)             1.42(1.06+0.20)
  2000.5: git add -A (full-index-v4)             1.26(0.98+0.16)
  2000.6: git add . (full-index-v3)              1.40(1.04+0.18)
  2000.7: git add . (full-index-v4)              1.26(0.98+0.17)
  2000.8: git commit -a -m A (full-index-v3)     1.42(1.11+0.16)
  2000.9: git commit -a -m A (full-index-v4)     1.33(1.08+0.16)

It is perhaps noteworthy that there is an improvement when using index
version 4. This is because the v3 index uses 108 MiB while the v4
index uses 80 MiB. Since the repeated portions of the directories are
very short (f3/f1/f2, for example) this ratio is less pronounced than in
similarly-sized real repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p2000-sparse-operations.sh | 85 +++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100755 t/perf/p2000-sparse-operations.sh

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
new file mode 100755
index 000000000000..2fbc81b22119
--- /dev/null
+++ b/t/perf/p2000-sparse-operations.sh
@@ -0,0 +1,85 @@
+#!/bin/sh
+
+test_description="test performance of Git operations using the index"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+SPARSE_CONE=f2/f4/f1
+
+test_expect_success 'setup repo and indexes' '
+	git reset --hard HEAD &&
+	# Remove submodules from the example repo, because our
+	# duplication of the entire repo creates an unlikly data shape.
+	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
+	git rm -f .gitmodules &&
+	for module in $(awk "{print \$2}" modules)
+	do
+		git rm $module || return 1
+	done &&
+	git commit -m "remove submodules" &&
+
+	echo bogus >a &&
+	cp a b &&
+	git add a b &&
+	git commit -m "level 0" &&
+	BLOB=$(git rev-parse HEAD:a) &&
+	OLD_COMMIT=$(git rev-parse HEAD) &&
+	OLD_TREE=$(git rev-parse HEAD^{tree}) &&
+
+	for i in $(test_seq 1 4)
+	do
+		cat >in <<-EOF &&
+			100755 blob $BLOB	a
+			040000 tree $OLD_TREE	f1
+			040000 tree $OLD_TREE	f2
+			040000 tree $OLD_TREE	f3
+			040000 tree $OLD_TREE	f4
+		EOF
+		NEW_TREE=$(git mktree <in) &&
+		NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") &&
+		OLD_TREE=$NEW_TREE &&
+		OLD_COMMIT=$NEW_COMMIT || return 1
+	done &&
+
+	git sparse-checkout init --cone &&
+	git branch -f wide $OLD_COMMIT &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 &&
+	(
+		cd full-index-v3 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 3 &&
+		git update-index --index-version=3
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 &&
+	(
+		cd full-index-v4 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 4 &&
+		git update-index --index-version=4
+	)
+'
+
+test_perf_on_all () {
+	command="$@"
+	for repo in full-index-v3 full-index-v4
+	do
+		test_perf "$command ($repo)" "
+			(
+				cd $repo &&
+				echo >>$SPARSE_CONE/a &&
+				$command
+			)
+		"
+	done
+}
+
+test_perf_on_all git status
+test_perf_on_all git add -A
+test_perf_on_all git add .
+test_perf_on_all git commit -a -m A
+
+test_done
-- 
gitgitgadget
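The factor of 256 in the commit message comes from the loop in the setup step. Each of the four iterations builds a tree that holds the previous tree four times, as f1 to f4. The hypothetical helper below, which is not part of the patch, only does that arithmetic.

It also shows why the cone f2/f4/f1 covers four copies. That path descends three of the four levels and lands on the level-1 tree, which holds 4^1 = 4 copies of the original. That is 4 of 256, or about 1/64 of all copies.

```shell
# copies_of_base WIDTH LEVELS
# Hypothetical helper: number of copies of the original tree reachable
# after LEVELS rounds of wrapping the previous tree WIDTH times (f1..fWIDTH).
copies_of_base () {
	width=$1
	levels=$2
	copies=1
	i=0
	while [ "$i" -lt "$levels" ]
	do
		copies=$((copies * width))
		i=$((i + 1))
	done
	echo "$copies"
}

total=$(copies_of_base 4 4)    # root tree of the "wide" branch
in_cone=$(copies_of_base 4 1)  # f2/f4/f1 is the level-1 tree
echo "$in_cone of $total copies are inside the sparse-checkout cone"
```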
* Re: [PATCH v3 02/20] t/perf: add performance test for sparse operations
  2021-03-16 16:42 ` [PATCH v3 02/20] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget
@ 2021-03-17 8:41 ` Ævar Arnfjörð Bjarmason
  2021-03-17 13:05 ` Derrick Stolee
  0 siblings, 1 reply; 203+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 8:41 UTC
  To: Derrick Stolee via GitGitGadget
  Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren,
	Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee


On Tue, Mar 16 2021, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Create a test script that takes the default performance test (the Git
> codebase) and multiplies it by 256 using four layers of duplicated
> trees of width four. This results in nearly one million blob entries in
> the index. Then, we can clone this repository with sparse-checkout
> patterns that demonstrate four copies of the initial repository. Each
> clone will use a different index format or mode so performance can be
> tested across the different options.
>
> Note that the initial repo is stripped of submodules before doing the
> copies. This preserves the expected data shape of the sparse index,
> because directories containing submodules are not collapsed to a sparse
> directory entry.
>
> Run a few Git commands on these clones, especially those that use the
> index (status, add, commit).
>
> Here are the results on my Linux machine:
>
>   Test
>   --------------------------------------------------------------
>   2000.2: git status (full-index-v3)             0.37(0.30+0.09)
>   2000.3: git status (full-index-v4)             0.39(0.32+0.10)
>   2000.4: git add -A (full-index-v3)             1.42(1.06+0.20)
>   2000.5: git add -A (full-index-v4)             1.26(0.98+0.16)
>   2000.6: git add . (full-index-v3)              1.40(1.04+0.18)
>   2000.7: git add . (full-index-v4)              1.26(0.98+0.17)
>   2000.8: git commit -a -m A (full-index-v3)     1.42(1.11+0.16)
>   2000.9: git commit -a -m A (full-index-v4)     1.33(1.08+0.16)
>
> It is perhaps noteworthy that there is an improvement when using index
> version 4. This is because the v3 index uses 108 MiB while the v4
> index uses 80 MiB. Since the repeated portions of the directories are
> very short (f3/f1/f2, for example) this ratio is less pronounced than in
> similarly-sized real repositories.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/perf/p2000-sparse-operations.sh | 85 +++++++++++++++++++++++++++++++
>  1 file changed, 85 insertions(+)
>  create mode 100755 t/perf/p2000-sparse-operations.sh
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> new file mode 100755
> index 000000000000..2fbc81b22119
> --- /dev/null
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -0,0 +1,85 @@
> +#!/bin/sh
> +
> +test_description="test performance of Git operations using the index"
> +
> +. ./perf-lib.sh
> +
> +test_perf_default_repo
> +
> +SPARSE_CONE=f2/f4/f1
> +
> +test_expect_success 'setup repo and indexes' '
> +	git reset --hard HEAD &&
> +	# Remove submodules from the example repo, because our
> +	# duplication of the entire repo creates an unlikly data shape.
> +	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
> +	git rm -f .gitmodules &&
> +	for module in $(awk "{print \$2}" modules)
> +	do
> +		git rm $module || return 1
> +	done &&
> +	git commit -m "remove submodules" &&

Paradoxically with this you can no longer use a repo that's not git.git
or another repo that has submodules, since we'll die in trying to remove
them.

Also you don't have to "git rm .gitmodules", the "git rm" command
removes submodule entries.

Perhaps just:

    for module in $(git ls-files --stage | grep ^160000 | awk -F '\t' '{ print $2 }')
    do
        git rm "$module"
    done

Or another way of guarding against rm getting the empty list && commit?

But it seems odd to be doing this at all, the point of the perf
framework is that you can point it at any repo, and some repos you want
to test will have submodules.

Seems like something like the WIP patch at the end on top would be
better.

> +	echo bogus >a &&
> +	cp a b &&
> +	git add a b &&
> +	git commit -m "level 0" &&
> +	BLOB=$(git rev-parse HEAD:a) &&

Isn't the way we're getting this $BLOB equivalent to just 'echo bogus |
git hash-object --stdin -w' why commit it?

> +	OLD_COMMIT=$(git rev-parse HEAD) &&
> +	OLD_TREE=$(git rev-parse HEAD^{tree}) &&
> +
> +	for i in $(test_seq 1 4)
> +	do
> +		cat >in <<-EOF &&
> +			100755 blob $BLOB	a
> +			040000 tree $OLD_TREE	f1
> +			040000 tree $OLD_TREE	f2
> +			040000 tree $OLD_TREE	f3
> +			040000 tree $OLD_TREE	f4
> +		EOF
> +		NEW_TREE=$(git mktree <in) &&
> +		NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") &&
> +		OLD_TREE=$NEW_TREE &&
> +		OLD_COMMIT=$NEW_COMMIT || return 1
> +	done &&
> +
> +	git sparse-checkout init --cone &&
> +	git branch -f wide $OLD_COMMIT &&
> +	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 &&
> +	(
> +		cd full-index-v3 &&
> +		git sparse-checkout init --cone &&
> +		git sparse-checkout set $SPARSE_CONE &&
> +		git config index.version 3 &&
> +		git update-index --index-version=3
> +	) &&
> +	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 &&
> +	(
> +		cd full-index-v4 &&
> +		git sparse-checkout init --cone &&
> +		git sparse-checkout set $SPARSE_CONE &&
> +		git config index.version 4 &&
> +		git update-index --index-version=4
> +	)
> +'

This whole thing makes me think you just wanted a test_perf_fresh_repo
all along, but I think this would be much more useful if you took the
default repo and multiplied the size in its tree by some multiple.

E.g. take the files we have in git.git, write a copy at prefix-1/,
prefix-2/ etc.

The whole point of test_perf_{default,large}_repo is being able to point
them at a local repo you're testing for performance and get numbers
representative of that repo.

So maybe that's not what's wanted here at all, but that brings us back
to test_perf_fresh_repo...

> +test_perf_on_all () {
> +	command="$@"
> +	for repo in full-index-v3 full-index-v4
> +	do
> +		test_perf "$command ($repo)" "
> +			(
> +				cd $repo &&
> +				echo >>$SPARSE_CONE/a &&
> +				$command
> +			)
> +		"
> +	done
> +}
> +
> +test_perf_on_all git status
> +test_perf_on_all git add -A
> +test_perf_on_all git add .
> +test_perf_on_all git commit -a -m A
> +
> +test_done

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index e527316e66..2c07b04159 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -4,22 +4,11 @@ test_description="test performance of Git operations using the index"
 
 . ./perf-lib.sh
 
-test_perf_default_repo
+test_perf_nosubodules_repo
 
 SPARSE_CONE=f2/f4/f1
 
 test_expect_success 'setup repo and indexes' '
-	git reset --hard HEAD &&
-	# Remove submodules from the example repo, because our
-	# duplication of the entire repo creates an unlikly data shape.
-	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
-	git rm -f .gitmodules &&
-	for module in $(awk "{print \$2}" modules)
-	do
-		git rm $module || return 1
-	done &&
-	git commit -m "remove submodules" &&
-
 	echo bogus >a &&
 	cp a b &&
 	git add a b &&
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index e385c6896f..86b716ce8f 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -128,6 +128,15 @@ test_perf_large_repo () {
 	fi
 	test_perf_create_repo_from "${1:-$TRASH_DIRECTORY}" "$GIT_PERF_LARGE_REPO"
 }
+test_perf_nosubodules_repo () {
+	if test "$GIT_PERF_NOSUBMODULES_REPO" = "$GIT_BUILD_DIR"; then
+		echo "warning: \$GIT_PERF_NOSUBMODULES_REPO is \$GIT_BUILD_DIR." >&2
+		echo "warning: This will probably work, but it has a submodule!" >&2
+		echo "warning: point to another repo for representative measurements." >&2
+		# git rm dance here? optionally?
+	fi
+	test_perf_create_repo_from "${1:-$TRASH_DIRECTORY}" "$GIT_PERF_NOSUBMODULES_REPO"
+}
 test_checkout_worktree () {
 	git checkout-index -u -a ||
 		error "git checkout-index failed"
@@ -196,7 +205,7 @@ test_perf_ () {
 	else
 		echo "perf $test_count - $1:"
 	fi
-	for i in $(test_seq 1 $GIT_PERF_REPEAT_COUNT); do
+	for i in $(test_seq 1 $GIT_PERF_REP
 		say >&3 "running: $2"
 		if test_run_perf_ "$2"
 		then
* Re: [PATCH v3 02/20] t/perf: add performance test for sparse operations
  2021-03-17 8:41 ` Ævar Arnfjörð Bjarmason
@ 2021-03-17 13:05 ` Derrick Stolee
  2021-03-17 13:21 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 203+ messages in thread
From: Derrick Stolee @ 2021-03-17 13:05 UTC
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren,
	SZEDER Gábor, Derrick Stolee, Derrick Stolee

On 3/17/2021 4:41 AM, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Mar 16 2021, Derrick Stolee via GitGitGadget wrote:
>> +test_expect_success 'setup repo and indexes' '
>> +	git reset --hard HEAD &&
>> +	# Remove submodules from the example repo, because our
>> +	# duplication of the entire repo creates an unlikly data shape.
>> +	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
>> +	git rm -f .gitmodules &&
>> +	for module in $(awk "{print \$2}" modules)
>> +	do
>> +		git rm $module || return 1
>> +	done &&
>> +	git commit -m "remove submodules" &&
>
> Paradoxically with this you can no longer use a repo that's not git.git
> or another repo that has submodules, since we'll die in trying to remove
> them.

Good point.

> Also you don't have to "git rm .gitmodules", the "git rm" command
> removes submodule entries.

Sure.

> Perhaps just:
>
>     for module in $(git ls-files --stage | grep ^160000 | awk -F '\t' '{ print $2 }')
>     do
>         git rm "$module"
>     done
>
> Or another way of guarding against rm getting the empty list && commit?
>
> But it seems odd to be doing this at all, the point of the perf
> framework is that you can point it at any repo, and some repos you want
> to test will have submodules.

You're right that it should handle all repos. However, the point of
the test is to have many copies of the repo, but most of them are
excluded by sparse-directory entries. We don't collapse sparse-directory
entries if there is a submodule inside, so the data shape is wrong after
making all the copies.

So, I disagree with your approach in your suggested diff, and instead
offer this one. I've tested this with git.git and another local repo
without submodules and checked that everything works as expected.

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index e527316e66d..5c0d78eeeea 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -10,15 +10,17 @@ SPARSE_CONE=f2/f4/f1
 
 test_expect_success 'setup repo and indexes' '
 	git reset --hard HEAD &&
+
 	# Remove submodules from the example repo, because our
-	# duplication of the entire repo creates an unlikly data shape.
-	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
-	git rm -f .gitmodules &&
-	for module in $(awk "{print \$2}" modules)
-	do
-		git rm $module || return 1
-	done &&
-	git commit -m "remove submodules" &&
+	# duplication of the entire repo creates an unlikely data shape.
+	if (git config --file .gitmodules --get-regexp "submodule.*.path" >modules)
+	then
+		for module in $(awk "{print \$2}" modules)
+		do
+			git rm $module || return 1
+		done &&
+		git commit -m "remove submodules" || return 1
+	fi &&
 
 	echo bogus >a &&
 	cp a b &&

> Seems like something like the WIP patch at the end on top would be
> better.
>
>> +	echo bogus >a &&
>> +	cp a b &&
>> +	git add a b &&
>> +	git commit -m "level 0" &&
>> +	BLOB=$(git rev-parse HEAD:a) &&
>
> Isn't the way we're getting this $BLOB equivalent to just 'echo bogus |
> git hash-object --stdin -w' why commit it?

We are committing it so we can add commits that deepen the copies,
but within those copies we have these known file paths.

> This whole thing makes me think you just wanted a test_perf_fresh_repo
> all along, but I think this would be much more useful if you took the
> default repo and multiplied the size in its tree by some multiple.
>
> E.g. take the files we have in git.git, write a copy at prefix-1/,
> prefix-2/ etc.

That is essentially what is happening here, but using multiple levels
of directories. Using these multiple levels presents extra tree
lookups and parsing in the event of expanding a sparse index to a
full one.

Thanks,
-Stolee
* Re: [PATCH v3 02/20] t/perf: add performance test for sparse operations
  2021-03-17 13:05 ` Derrick Stolee
@ 2021-03-17 13:21 ` Ævar Arnfjörð Bjarmason
  2021-03-17 18:02 ` Derrick Stolee
  0 siblings, 1 reply; 203+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:21 UTC
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, newren, gitster, pclouds,
	jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee,
	Derrick Stolee


On Wed, Mar 17 2021, Derrick Stolee wrote:

> On 3/17/2021 4:41 AM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Tue, Mar 16 2021, Derrick Stolee via GitGitGadget wrote:
>>> +test_expect_success 'setup repo and indexes' '
>>> +	git reset --hard HEAD &&
>>> +	# Remove submodules from the example repo, because our
>>> +	# duplication of the entire repo creates an unlikly data shape.
>>> +	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
>>> +	git rm -f .gitmodules &&
>>> +	for module in $(awk "{print \$2}" modules)
>>> +	do
>>> +		git rm $module || return 1
>>> +	done &&
>>> +	git commit -m "remove submodules" &&
>>
>> Paradoxically with this you can no longer use a repo that's not git.git
>> or another repo that has submodules, since we'll die in trying to remove
>> them.
>
> Good point.
>
>> Also you don't have to "git rm .gitmodules", the "git rm" command
>> removes submodule entries.
>
> Sure.
>
>> Perhaps just:
>>
>>     for module in $(git ls-files --stage | grep ^160000 | awk -F '\t' '{ print $2 }')
>>     do
>>         git rm "$module"
>>     done
>>
>> Or another way of guarding against rm getting the empty list && commit?
>>
>> But it seems odd to be doing this at all, the point of the perf
>> framework is that you can point it at any repo, and some repos you want
>> to test will have submodules.
>
> You're right that it should handle all repos. However, the point of
> the test is to have many copies of the repo, but most of them are
> excluded by sparse-directory entries. We don't collapse sparse-directory
> entries if there is a submodule inside, so the data shape is wrong after
> making all the copies.
>
> So, I disagree with your approach in your suggested diff, and instead
> offer this one. I've tested this with git.git and another local repo
> without submodules and checked that everything works as expected.

What's got me confused here is that there's two uses for the perf
framework in this context.

It's to use an empty/git.git as a test repo to demonstrate something,
but then also that you can run it in your arbitrary repo, and e.g. see
how much a given feature might benefit you.

Hence suggesting that maybe test_perf_fresh_repo is better here, because
by using test_perf_default_repo you're creating the expectation that you
can run the perf test, observe an %X difference, and that'll be
give-or-take what you'll get for that use case if you enable the feature.

Except it won't because the repo has submodules, which we deleted for
the perf test...

> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index e527316e66d..5c0d78eeeea 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -10,15 +10,17 @@ SPARSE_CONE=f2/f4/f1
>
>  test_expect_success 'setup repo and indexes' '
>  	git reset --hard HEAD &&
> +
>  	# Remove submodules from the example repo, because our
> -	# duplication of the entire repo creates an unlikly data shape.
> -	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
> -	git rm -f .gitmodules &&
> -	for module in $(awk "{print \$2}" modules)
> -	do
> -		git rm $module || return 1
> -	done &&
> -	git commit -m "remove submodules" &&
> +	# duplication of the entire repo creates an unlikely data shape.
> +	if (git config --file .gitmodules --get-regexp "submodule.*.path" >modules)

A subshell isn't needed here.

FWIW the reason I got this out of ls-files is because you can have
submodules without .gitmodules entries, rare and broken, but seemed more
direct to grep the mode bits.

> +	then
> +		for module in $(awk "{print \$2}" modules)
> +		do
> +			git rm $module || return 1
> +		done &&

Once we know we have submodules we can just do this without the loop.

	git rm $(awk "{print \$2}" modules)

> +		git commit -m "remove submodules" || return 1
> +	fi &&
>
>  	echo bogus >a &&
>  	cp a b &&
>
>> Seems like something like the WIP patch at the end on top would be
>> better.
>>
>>> +	echo bogus >a &&
>>> +	cp a b &&
>>> +	git add a b &&
>>> +	git commit -m "level 0" &&
>>> +	BLOB=$(git rev-parse HEAD:a) &&
>>
>> Isn't the way we're getting this $BLOB equivalent to just 'echo bogus |
>> git hash-object --stdin -w' why commit it?
>
> We are committing it so we can add commits that deepen the copies,
> but within those copies we have these known file paths.
>
>> This whole thing makes me think you just wanted a test_perf_fresh_repo
>> all along, but I think this would be much more useful if you took the
>> default repo and multiplied the size in its tree by some multiple.
>>
>> E.g. take the files we have in git.git, write a copy at prefix-1/,
>> prefix-2/ etc.
>
> That is essentially what is happening here, but using multiple levels
> of directories. Using these multiple levels presents extra tree
> lookups and parsing in the event of expanding a sparse index to a
> full one.

*nod*

Anyway, this thread's a bit of a bikeshed on my part, I was just
wondering if & what part of the test relied on the existing repo if it
was mostly setting up its own test data.
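Ævar points out that a repository can contain gitlinks without .gitmodules entries, which argues for keying off the mode bits instead of the config file. The helper below sketches that idea; the function name is hypothetical. It reads `git ls-files --stage` output, whose lines have the form `<mode> <object> <stage><TAB><path>`, and prints the paths of mode-160000 (gitlink) entries.

```shell
# gitlink_paths: read `git ls-files --stage` output on stdin and print the
# path of every gitlink (submodule) entry, identified by mode 160000.
gitlink_paths () {
	awk -F '\t' '{
		split($1, meta, " ")  # meta[1] = mode, meta[2] = object, meta[3] = stage
		if (meta[1] == "160000")
			print $2
	}'
}
```

In the setup step it could be used as `paths=$(git ls-files --stage | gitlink_paths)`, running `git rm $paths` and the commit only when `$paths` is non-empty. That guard also answers the question above about running `git rm` and `git commit` with an empty list.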
* Re: [PATCH v3 02/20] t/perf: add performance test for sparse operations
  2021-03-17 13:21 ` Ævar Arnfjörð Bjarmason
@ 2021-03-17 18:02 ` Derrick Stolee
  0 siblings, 0 replies; 203+ messages in thread
From: Derrick Stolee @ 2021-03-17 18:02 UTC
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, newren, gitster, pclouds,
	jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee,
	Derrick Stolee

On 3/17/2021 9:21 AM, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Mar 17 2021, Derrick Stolee wrote:
>
>> On 3/17/2021 4:41 AM, Ævar Arnfjörð Bjarmason wrote:
>>> But it seems odd to be doing this at all, the point of the perf
>>> framework is that you can point it at any repo, and some repos you want
>>> to test will have submodules.
>>
>> You're right that it should handle all repos. However, the point of
>> the test is to have many copies of the repo, but most of them are
>> excluded by sparse-directory entries. We don't collapse sparse-directory
>> entries if there is a submodule inside, so the data shape is wrong after
>> making all the copies.
>>
>> So, I disagree with your approach in your suggested diff, and instead
>> offer this one. I've tested this with git.git and another local repo
>> without submodules and checked that everything works as expected.
>
> What's got me confused here is that there's two uses for the perf
> framework in this context.
>
> It's to use an empty/git.git as a test repo to demonstrate something,
> but then also that you can run it in your arbitrary repo, and e.g. see
> how much a given feature might benefit you.
>
> Hence suggesting that maybe test_perf_fresh_repo is better here, because
> by using test_perf_default_repo you're creating the expectation that you
> can run the perf test, observe an %X difference, and that'll be
> give-or-take what you'll get for that use case if you enable the feature.
>
> Except it won't because the repo has submodules, which we deleted for
> the perf test...

I'm also dramatically changing the repository shape to expose index
reads and writes as a bottleneck. The benefit of using other repos
(like git.git or optionally choosing the Linux kernel repo) is to
change how much of the time is spent crawling the populated set.

>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index e527316e66d..5c0d78eeeea 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -10,15 +10,17 @@ SPARSE_CONE=f2/f4/f1
>>
>>  test_expect_success 'setup repo and indexes' '
>>  	git reset --hard HEAD &&
>> +
>>  	# Remove submodules from the example repo, because our
>> -	# duplication of the entire repo creates an unlikly data shape.
>> -	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
>> -	git rm -f .gitmodules &&
>> -	for module in $(awk "{print \$2}" modules)
>> -	do
>> -		git rm $module || return 1
>> -	done &&
>> -	git commit -m "remove submodules" &&
>> +	# duplication of the entire repo creates an unlikely data shape.
>> +	if (git config --file .gitmodules --get-regexp "submodule.*.path" >modules)
>
> A subshell isn't needed here.
>
> FWIW the reason I got this out of ls-files is because you can have
> submodules without .gitmodules entries, rare and broken, but seemed more
> direct to grep the mode bits.

I'd prefer to do something (textually) simpler, expecting the input
repos to have correct data.

>> +	then
>> +		for module in $(awk "{print \$2}" modules)
>> +		do
>> +			git rm $module || return 1
>> +		done &&
>
> Once we know we have submodules we can just do this without the loop.
>
> 	git rm $(awk "{print \$2}" modules)

Ok. That works for me.

>>> Seems like something like the WIP patch at the end on top would be
>>> better.
>>>
>>>> +	echo bogus >a &&
>>>> +	cp a b &&
>>>> +	git add a b &&
>>>> +	git commit -m "level 0" &&
>>>> +	BLOB=$(git rev-parse HEAD:a) &&
>>>
>>> Isn't the way we're getting this $BLOB equivalent to just 'echo bogus |
>>> git hash-object --stdin -w' why commit it?
>>
>> We are committing it so we can add commits that deepen the copies,
>> but within those copies we have these known file paths.
>>
>>> This whole thing makes me think you just wanted a test_perf_fresh_repo
>>> all along, but I think this would be much more useful if you took the
>>> default repo and multiplied the size in its tree by some multiple.
>>>
>>> E.g. take the files we have in git.git, write a copy at prefix-1/,
>>> prefix-2/ etc.
>>
>> That is essentially what is happening here, but using multiple levels
>> of directories. Using these multiple levels presents extra tree
>> lookups and parsing in the event of expanding a sparse index to a
>> full one.
>
> *nod*
>
> Anyway, this thread's a bit of a bikeshed on my part, I was just
> wondering if & what part of the test relied on the existing repo if it
> was mostly setting up its own test data.

Again, the benefit is to depend on the repo shape in some aspects, while
exaggerating the data shape to make the non-populated set extremely large.
This presents different aspects that are worth examining. For example, git.git is much smaller than linux.git, and that difference is noticeable in these performance numbers (taken at the end of this series):

git.git

Test                                            this tree
---------------------------------------------------------------
2000.2: git status (full-index-v3)              0.39(0.35+0.08)
2000.3: git status (full-index-v4)              0.39(0.34+0.09)
2000.4: git status (sparse-index-v3)            2.46(2.33+0.16)
2000.5: git status (sparse-index-v4)            2.42(2.31+0.15)
2000.6: git add -A (full-index-v3)              1.35(0.98+0.20)
2000.7: git add -A (full-index-v4)              1.25(0.96+0.18)
2000.8: git add -A (sparse-index-v3)            2.39(2.26+0.17)
2000.9: git add -A (sparse-index-v4)            2.35(2.29+0.11)
2000.10: git add . (full-index-v3)              1.39(1.01+0.19)
2000.11: git add . (full-index-v4)              1.31(1.00+0.19)
2000.12: git add . (sparse-index-v3)            2.41(2.28+0.16)
2000.13: git add . (sparse-index-v4)            2.45(2.32+0.16)
2000.14: git commit -a -m A (full-index-v3)     1.44(1.08+0.21)
2000.15: git commit -a -m A (full-index-v4)     1.31(1.04+0.19)
2000.16: git commit -a -m A (sparse-index-v3)   2.44(2.35+0.16)
2000.17: git commit -a -m A (sparse-index-v4)   2.44(2.36+0.16)

linux.git

Test                                            this tree
-----------------------------------------------------------------
2000.2: git status (full-index-v3)              7.14(6.06+1.79)
2000.3: git status (full-index-v4)              7.01(6.16+1.60)
2000.4: git status (sparse-index-v3)            58.50(56.86+2.34)
2000.5: git status (sparse-index-v4)            57.52(55.80+2.45)
2000.6: git add -A (full-index-v3)              25.52(18.70+3.18)
2000.7: git add -A (full-index-v4)              22.26(17.52+2.72)
2000.8: git add -A (sparse-index-v3)            56.65(55.00+2.35)
2000.9: git add -A (sparse-index-v4)            56.56(54.98+2.29)
2000.10: git add . (full-index-v3)              25.87(19.12+3.15)
2000.11: git add . (full-index-v4)              22.56(17.85+2.71)
2000.12: git add . (sparse-index-v3)            57.01(55.28+2.42)
2000.13: git add . (sparse-index-v4)            56.84(55.38+2.19)
2000.14: git commit -a -m A (full-index-v3)     26.83(20.69+3.24)
2000.15: git commit -a -m A (full-index-v4)     24.04(19.86+2.65)
2000.16: git commit -a -m A (sparse-index-v3)   60.23(58.99+2.44)
2000.17: git commit -a -m A (sparse-index-v4)   60.52(59.09+2.74)

The intention is to make these numbers improve in the future so that the sparse-index is a better approach.

Thanks,
-Stolee
^ permalink raw reply [flat|nested] 203+ messages in thread
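As simple arithmetic on the "git status" rows with the v4 index format, the sparse-index run is roughly 6.2 times slower than the full-index run on git.git (2.42 / 0.39) and roughly 8.2 times slower on linux.git (57.52 / 7.01) at this point in the series. The snippet below only illustrates that division; the struct and function names are hypothetical:

```c
#include <assert.h>

/* "git status" wall-clock seconds, copied from the v4 rows above. */
struct toy_timing {
	const char *repo;
	double full_index;
	double sparse_index;
};

/* How many times slower the sparse-index run is than the full-index run. */
static double toy_slowdown(const struct toy_timing *t)
{
	return t->sparse_index / t->full_index;
}

static const struct toy_timing toy_git_status[] = {
	{ "git.git", 0.39, 2.42 },
	{ "linux.git", 7.01, 57.52 },
};
```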
* [PATCH v3 03/20] t1092: clean up script quoting 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 02/20] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-17 8:47 ` Ævar Arnfjörð Bjarmason 2021-03-16 16:42 ` [PATCH v3 04/20] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget ` (20 subsequent siblings) 23 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> This test was introduced in 19a0acc83e4 (t1092: test interesting sparse-checkout scenarios, 2021-01-23), but these issues with quoting were not noticed until starting this follow-up series. The old mechanism would drop quoting such as in: test_all_match git commit -m "touch README.md" The above happened to work because README.md is a file in the repository, so 'git commit -m touch README.md' would succeed by accident. Other cases included quoting for no good reason, so clean that up now.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/t1092-sparse-checkout-compatibility.sh | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 8cd3e5a8d227..3725d3997e70 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -96,20 +96,20 @@ init_repos () { run_on_sparse () { ( cd sparse-checkout && - $* >../sparse-checkout-out 2>../sparse-checkout-err + "$@" >../sparse-checkout-out 2>../sparse-checkout-err ) } run_on_all () { ( cd full-checkout && - $* >../full-checkout-out 2>../full-checkout-err + "$@" >../full-checkout-out 2>../full-checkout-err ) && - run_on_sparse $* + run_on_sparse "$@" } test_all_match () { - run_on_all $* && + run_on_all "$@" && test_cmp full-checkout-out sparse-checkout-out && test_cmp full-checkout-err sparse-checkout-err } @@ -119,7 +119,7 @@ test_expect_success 'status with options' ' test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && - run_on_all "touch README.md" && + run_on_all touch README.md && test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && @@ -135,7 +135,7 @@ test_expect_success 'add, commit, checkout' ' write_script edit-contents <<-\EOF && echo text >>$1 EOF - run_on_all "../edit-contents README.md" && + run_on_all ../edit-contents README.md && test_all_match git add README.md && test_all_match git status --porcelain=v2 && @@ -144,7 +144,7 @@ test_expect_success 'add, commit, checkout' ' test_all_match git checkout HEAD~1 && test_all_match git checkout - && - run_on_all "../edit-contents README.md" && + run_on_all ../edit-contents README.md && test_all_match git add -A && test_all_match git status --porcelain=v2 && @@ -153,7 +153,7 @@ test_expect_success 'add, 
commit, checkout' ' test_all_match git checkout HEAD~1 && test_all_match git checkout - && - run_on_all "../edit-contents deep/newfile" && + run_on_all ../edit-contents deep/newfile && test_all_match git status --porcelain=v2 -uno && test_all_match git status --porcelain=v2 && @@ -186,7 +186,7 @@ test_expect_success 'diff --staged' ' write_script edit-contents <<-\EOF && echo text >>README.md EOF - run_on_all "../edit-contents" && + run_on_all ../edit-contents && test_all_match git diff && test_all_match git diff --staged && @@ -280,7 +280,7 @@ test_expect_success 'clean' ' echo bogus >>.gitignore && run_on_all cp ../.gitignore . && test_all_match git add .gitignore && - test_all_match git commit -m ignore-bogus-files && + test_all_match git commit -m "ignore bogus files" && run_on_sparse mkdir folder1 && run_on_all touch folder1/bogus && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
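To make the quoting bug concrete, here is a toy C model of how an unquoted $* is field-split. It is a hypothetical helper, not shell source, and it assumes the default IFS (space, tab, newline) while ignoring pathname expansion. With "$@", the four arguments git, commit, -m, "touch README.md" pass through unchanged; under $* they become five words, so the commit message is just "touch" and README.md turns into a pathspec. The same model shows why the old run_on_all "touch README.md" calls only worked under $*:

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_FIELDS 16
#define TOY_FIELD_LEN 64

static int toy_is_ifs(char c)
{
	/* Default IFS: space, tab, newline. */
	return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Toy model of an unquoted $*: every positional parameter is split on
 * IFS whitespace, so one argument containing a space becomes two words.
 * Pathname expansion is ignored. Returns the number of resulting words.
 */
static int toy_split_unquoted_star(int argc, const char **argv,
				   char out[][TOY_FIELD_LEN], int max)
{
	int n = 0;

	for (int i = 0; i < argc && n < max; i++) {
		const char *s = argv[i];

		while (*s && n < max) {
			int len = 0;

			while (toy_is_ifs(*s))
				s++;
			if (!*s)
				break;
			while (*s && !toy_is_ifs(*s)) {
				if (len < TOY_FIELD_LEN - 1)
					out[n][len++] = *s;
				s++;
			}
			out[n++][len] = '\0';
		}
	}
	return n;
}
```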
* Re: [PATCH v3 03/20] t1092: clean up script quoting 2021-03-16 16:42 ` [PATCH v3 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget @ 2021-03-17 8:47 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 8:47 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Tue, Mar 16 2021, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> > > This test was introduced in 19a0acc83e4 (t1092: test interesting > sparse-checkout scenarios, 2021-01-23), but these issues with quoting > were not noticed until starting this follow-up series. The old mechanism > would drop quoting such as in the "but these issues" follows a partial sentence where we haven't introduced "what issues?". Perhaps leading with some summary about $@ v.s. $*: Fix a bug in the sparse checkout tests of "$@" being conflated with "$*". The bug was introduced in 19a0acc83e4 ([...]), but had no effect until now because XYZ ... > test_all_match git commit -m "touch README.md" > > The above happened to work because README.md is a file in the > repository, so 'git commit -m touch README.md' would succeed by > accident. > > Other cases included quoting for no good reason, so clean that up now. Maybe just my taste, per your comment on another series of mine we might not have the same sense of splitting up commits, but... I think in this case it's clearer to have these be two commits. We have 3 hunks fixing the bug, and 6 on an unrelated cleanup. It's a lot easier for eyeballing a fix to be able to glance just at the 3, especially with something like $@ v.s. $*.
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > t/t1092-sparse-checkout-compatibility.sh | 20 ++++++++++---------- > 1 file changed, 10 insertions(+), 10 deletions(-) > > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 8cd3e5a8d227..3725d3997e70 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -96,20 +96,20 @@ init_repos () { > run_on_sparse () { > ( > cd sparse-checkout && > - $* >../sparse-checkout-out 2>../sparse-checkout-err > + "$@" >../sparse-checkout-out 2>../sparse-checkout-err > ) > } > > run_on_all () { > ( > cd full-checkout && > - $* >../full-checkout-out 2>../full-checkout-err > + "$@" >../full-checkout-out 2>../full-checkout-err > ) && > - run_on_sparse $* > + run_on_sparse "$@" > } > > test_all_match () { > - run_on_all $* && > + run_on_all "$@" && > test_cmp full-checkout-out sparse-checkout-out && > test_cmp full-checkout-err sparse-checkout-err > } > @@ -119,7 +119,7 @@ test_expect_success 'status with options' ' > test_all_match git status --porcelain=v2 && > test_all_match git status --porcelain=v2 -z -u && > test_all_match git status --porcelain=v2 -uno && > - run_on_all "touch README.md" && > + run_on_all touch README.md && > test_all_match git status --porcelain=v2 && > test_all_match git status --porcelain=v2 -z -u && > test_all_match git status --porcelain=v2 -uno && > @@ -135,7 +135,7 @@ test_expect_success 'add, commit, checkout' ' > write_script edit-contents <<-\EOF && > echo text >>$1 > EOF > - run_on_all "../edit-contents README.md" && > + run_on_all ../edit-contents README.md && > > test_all_match git add README.md && > test_all_match git status --porcelain=v2 && > @@ -144,7 +144,7 @@ test_expect_success 'add, commit, checkout' ' > test_all_match git checkout HEAD~1 && > test_all_match git checkout - && > > - run_on_all "../edit-contents README.md" && > + run_on_all ../edit-contents README.md && > > 
test_all_match git add -A && > test_all_match git status --porcelain=v2 && > @@ -153,7 +153,7 @@ test_expect_success 'add, commit, checkout' ' > test_all_match git checkout HEAD~1 && > test_all_match git checkout - && > > - run_on_all "../edit-contents deep/newfile" && > + run_on_all ../edit-contents deep/newfile && > > test_all_match git status --porcelain=v2 -uno && > test_all_match git status --porcelain=v2 && > @@ -186,7 +186,7 @@ test_expect_success 'diff --staged' ' > write_script edit-contents <<-\EOF && > echo text >>README.md > EOF > - run_on_all "../edit-contents" && > + run_on_all ../edit-contents && > > test_all_match git diff && > test_all_match git diff --staged && > @@ -280,7 +280,7 @@ test_expect_success 'clean' ' > echo bogus >>.gitignore && > run_on_all cp ../.gitignore . && > test_all_match git add .gitignore && > - test_all_match git commit -m ignore-bogus-files && > + test_all_match git commit -m "ignore bogus files" && > > run_on_sparse mkdir folder1 && > run_on_all touch folder1/bogus && ^ permalink raw reply [flat|nested] 203+ messages in thread
* [PATCH v3 04/20] sparse-index: add guard to ensure full index 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (2 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget ` (19 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Upcoming changes will introduce modifications to the index format that allow sparse directories. It will be useful to have a mechanism for converting those sparse index files into full indexes by walking the tree at those sparse directories. Name this method ensure_full_index() as it will guarantee that the index is fully expanded. This method is not implemented yet, and instead we focus on the scaffolding to declare it and call it at the appropriate time. Add a 'command_requires_full_index' member to struct repo_settings. This will be an indicator that we need the index in full mode to do certain index operations. This starts as being true for every command, then we will set it to false as some commands integrate with sparse indexes. If 'command_requires_full_index' is true, then we will immediately expand a sparse index to a full one upon reading from disk. This suffices for now, but we will want to add more callers to ensure_full_index() later. 
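The guard described above can be modeled with hypothetical, heavily simplified stand-ins for struct repo_settings and struct index_state. This sketch shows only the control flow (read the index, then expand it unless the command has opted in), not Git's actual code:

```c
#include <assert.h>

/* Hypothetical stand-ins; the real structs carry many more fields. */
struct toy_repo_settings {
	unsigned command_requires_full_index:1;
};

struct toy_index_state {
	unsigned sparse_index:1;  /* set while sparse-directory entries exist */
	int expansions;           /* how often an expansion actually ran */
};

static void toy_ensure_full_index(struct toy_index_state *istate)
{
	/* No-op unless the index really contains sparse directories. */
	if (!istate || !istate->sparse_index)
		return;
	/* ... the real function walks trees here ... */
	istate->sparse_index = 0;
	istate->expansions++;
}

static int toy_repo_read_index(struct toy_index_state *istate,
			       const struct toy_repo_settings *settings,
			       int on_disk_is_sparse)
{
	/* Stand-in for parsing the index file from disk. */
	istate->sparse_index = on_disk_is_sparse ? 1 : 0;

	/* The guard: commands that have not opted in always see a full index. */
	if (settings->command_requires_full_index)
		toy_ensure_full_index(istate);
	return 0;
}
```

Because the flag starts as 1 for every command, each command behaves as before until it is audited and explicitly clears the flag.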
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Makefile | 1 + repo-settings.c | 8 ++++++++ repository.c | 11 ++++++++++- repository.h | 2 ++ sparse-index.c | 8 ++++++++ sparse-index.h | 7 +++++++ 6 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 sparse-index.c create mode 100644 sparse-index.h diff --git a/Makefile b/Makefile index dfb0f1000fa3..89b1d5374107 100644 --- a/Makefile +++ b/Makefile @@ -985,6 +985,7 @@ LIB_OBJS += setup.o LIB_OBJS += shallow.o LIB_OBJS += sideband.o LIB_OBJS += sigchain.o +LIB_OBJS += sparse-index.o LIB_OBJS += split-index.o LIB_OBJS += stable-qsort.o LIB_OBJS += strbuf.o diff --git a/repo-settings.c b/repo-settings.c index f7fff0f5ab83..d63569e4041e 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -77,4 +77,12 @@ void prepare_repo_settings(struct repository *r) UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_KEEP); UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_DEFAULT); + + /* + * This setting guards all index reads to require a full index + * over a sparse index. After suitable guards are placed in the + * codebase around uses of the index, this setting will be + * removed. 
+ */ + r->settings.command_requires_full_index = 1; } diff --git a/repository.c b/repository.c index c98298acd017..a8acae002f71 100644 --- a/repository.c +++ b/repository.c @@ -10,6 +10,7 @@ #include "object.h" #include "lockfile.h" #include "submodule-config.h" +#include "sparse-index.h" /* The main repository */ static struct repository the_repo; @@ -261,6 +262,8 @@ void repo_clear(struct repository *repo) int repo_read_index(struct repository *repo) { + int res; + if (!repo->index) repo->index = xcalloc(1, sizeof(*repo->index)); @@ -270,7 +273,13 @@ int repo_read_index(struct repository *repo) else if (repo->index->repo != repo) BUG("repo's index should point back at itself"); - return read_index_from(repo->index, repo->index_file, repo->gitdir); + res = read_index_from(repo->index, repo->index_file, repo->gitdir); + + prepare_repo_settings(repo); + if (repo->settings.command_requires_full_index) + ensure_full_index(repo->index); + + return res; } int repo_hold_locked_index(struct repository *repo, diff --git a/repository.h b/repository.h index b385ca3c94b6..e06a23015697 100644 --- a/repository.h +++ b/repository.h @@ -41,6 +41,8 @@ struct repo_settings { enum fetch_negotiation_setting fetch_negotiation_algorithm; int core_multi_pack_index; + + unsigned command_requires_full_index:1; }; struct repository { diff --git a/sparse-index.c b/sparse-index.c new file mode 100644 index 000000000000..82183ead563b --- /dev/null +++ b/sparse-index.c @@ -0,0 +1,8 @@ +#include "cache.h" +#include "repository.h" +#include "sparse-index.h" + +void ensure_full_index(struct index_state *istate) +{ + /* intentionally left blank */ +} diff --git a/sparse-index.h b/sparse-index.h new file mode 100644 index 000000000000..09a20d036c46 --- /dev/null +++ b/sparse-index.h @@ -0,0 +1,7 @@ +#ifndef SPARSE_INDEX_H__ +#define SPARSE_INDEX_H__ + +struct index_state; +void ensure_full_index(struct index_state *istate); + +#endif -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ 
messages in thread
* [PATCH v3 05/20] sparse-index: implement ensure_full_index() 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (3 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 04/20] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-17 13:03 ` Ævar Arnfjörð Bjarmason 2021-03-16 16:42 ` [PATCH v3 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget ` (18 subsequent siblings) 23 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We will mark an in-memory index_state as having sparse directory entries with the sparse_index bit. These currently cannot exist, but we will add a mechanism for collapsing a full index to a sparse one in a later change. That will happen at write time, so we must first allow parsing the format before writing it. Commands or methods that require a full index in order to operate can call ensure_full_index() to expand that index in-memory. This requires parsing trees using that index's repository. Sparse directory entries have a specific 'ce_mode' value. The macro S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. This ce_mode is not possible with the existing index formats, so we don't also verify all properties of a sparse-directory entry, which are: 1. ce->ce_mode == 0040000 2. ce->flags & CE_SKIP_WORKTREE is true 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator) 4. ce->oid references a tree object. These are all semi-enforced in ensure_full_index() to some extent. Any deviation will cause a warning at minimum or a failure in the worst case. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache.h | 13 ++++++- read-cache.c | 9 +++++ sparse-index.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 118 insertions(+), 2 deletions(-) diff --git a/cache.h b/cache.h index c2f8a8eadf67..abb00a068e5d 100644 --- a/cache.h +++ b/cache.h @@ -204,6 +204,8 @@ struct cache_entry { #error "CE_EXTENDED_FLAGS out of range" #endif +#define S_ISSPARSEDIR(m) ((m) == S_IFDIR) + /* Forward structure decls */ struct pathspec; struct child_process; @@ -319,7 +321,14 @@ struct index_state { drop_cache_tree : 1, updated_workdir : 1, updated_skipworktree : 1, - fsmonitor_has_run_once : 1; + fsmonitor_has_run_once : 1, + + /* + * sparse_index == 1 when sparse-directory + * entries exist. Requires sparse-checkout + * in cone mode. + */ + sparse_index : 1; struct hashmap name_hash; struct hashmap dir_hash; struct object_id oid; @@ -722,6 +731,8 @@ int read_index_from(struct index_state *, const char *path, const char *gitdir); int is_index_unborn(struct index_state *); +void ensure_full_index(struct index_state *istate); + /* For use with `write_locked_index()`. 
*/ #define COMMIT_LOCK (1 << 0) #define SKIP_IF_UNCHANGED (1 << 1) diff --git a/read-cache.c b/read-cache.c index 1e9a50c6c734..dd3980c12b53 100644 --- a/read-cache.c +++ b/read-cache.c @@ -101,6 +101,9 @@ static const char *alternate_index_output; static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) { + if (S_ISSPARSEDIR(ce->ce_mode)) + istate->sparse_index = 1; + istate->cache[nr] = ce; add_name_hash(istate, ce); } @@ -2273,6 +2276,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) trace2_data_intmax("index", the_repository, "read/cache_nr", istate->cache_nr); + if (!istate->repo) + istate->repo = the_repository; + prepare_repo_settings(istate->repo); + if (istate->repo->settings.command_requires_full_index) + ensure_full_index(istate); + return istate->cache_nr; unmap: diff --git a/sparse-index.c b/sparse-index.c index 82183ead563b..7095378a1b28 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -1,8 +1,104 @@ #include "cache.h" #include "repository.h" #include "sparse-index.h" +#include "tree.h" +#include "pathspec.h" +#include "trace2.h" + +static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) +{ + ALLOC_GROW(istate->cache, nr + 1, istate->cache_alloc); + + istate->cache[nr] = ce; + add_name_hash(istate, ce); +} + +static int add_path_to_index(const struct object_id *oid, + struct strbuf *base, const char *path, + unsigned int mode, void *context) +{ + struct index_state *istate = (struct index_state *)context; + struct cache_entry *ce; + size_t len = base->len; + + if (S_ISDIR(mode)) + return READ_TREE_RECURSIVE; + + strbuf_addstr(base, path); + + ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0); + ce->ce_flags |= CE_SKIP_WORKTREE; + set_index_entry(istate, istate->cache_nr++, ce); + + strbuf_setlen(base, len); + return 0; +} void ensure_full_index(struct index_state *istate) { - /* intentionally left blank */ + int i; + struct index_state *full; 
+ struct strbuf base = STRBUF_INIT; + + if (!istate || !istate->sparse_index) + return; + + if (!istate->repo) + istate->repo = the_repository; + + trace2_region_enter("index", "ensure_full_index", istate->repo); + + /* initialize basics of new index */ + full = xcalloc(1, sizeof(struct index_state)); + memcpy(full, istate, sizeof(struct index_state)); + + /* then change the necessary things */ + full->sparse_index = 0; + full->cache_alloc = (3 * istate->cache_alloc) / 2; + full->cache_nr = 0; + ALLOC_ARRAY(full->cache, full->cache_alloc); + + for (i = 0; i < istate->cache_nr; i++) { + struct cache_entry *ce = istate->cache[i]; + struct tree *tree; + struct pathspec ps; + + if (!S_ISSPARSEDIR(ce->ce_mode)) { + set_index_entry(full, full->cache_nr++, ce); + continue; + } + if (!(ce->ce_flags & CE_SKIP_WORKTREE)) + warning(_("index entry is a directory, but not sparse (%08x)"), + ce->ce_flags); + + /* recursively walk into cd->name */ + tree = lookup_tree(istate->repo, &ce->oid); + + memset(&ps, 0, sizeof(ps)); + ps.recursive = 1; + ps.has_wildcard = 1; + ps.max_depth = -1; + + strbuf_setlen(&base, 0); + strbuf_add(&base, ce->name, strlen(ce->name)); + + read_tree_at(istate->repo, tree, &base, &ps, + add_path_to_index, full); + + /* free directory entries. full entries are re-used */ + discard_cache_entry(ce); + } + + /* Copy back into original index. */ + memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); + istate->sparse_index = 0; + free(istate->cache); + istate->cache = full->cache; + istate->cache_nr = full->cache_nr; + istate->cache_alloc = full->cache_alloc; + + strbuf_release(&base); + free(full); + + trace2_region_leave("index", "ensure_full_index", istate->repo); } -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* Re: [PATCH v3 05/20] sparse-index: implement ensure_full_index() 2021-03-16 16:42 ` [PATCH v3 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget @ 2021-03-17 13:03 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:03 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Tue, Mar 16 2021, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> > [...] > +static int add_path_to_index(const struct object_id *oid, > + struct strbuf *base, const char *path, > + unsigned int mode, void *context) > +{ > + struct index_state *istate = (struct index_state *)context; > + struct cache_entry *ce; > + size_t len = base->len; > + > + if (S_ISDIR(mode)) > + return READ_TREE_RECURSIVE; > + > + strbuf_addstr(base, path); > + > + ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0); > + ce->ce_flags |= CE_SKIP_WORKTREE; > + set_index_entry(istate, istate->cache_nr++, ce); > + > + strbuf_setlen(base, len); > + return 0; > +} > > void ensure_full_index(struct index_state *istate) > { > - /* intentionally left blank */ > + int i; > + struct index_state *full; > + struct strbuf base = STRBUF_INIT; > + > + if (!istate || !istate->sparse_index) > + return; > + > + if (!istate->repo) > + istate->repo = the_repository; > + > + trace2_region_enter("index", "ensure_full_index", istate->repo); > + > + /* initialize basics of new index */ > + full = xcalloc(1, sizeof(struct index_state)); > + memcpy(full, istate, sizeof(struct index_state)); > + > + /* then change the necessary things */ > + full->sparse_index = 0; > + full->cache_alloc = (3 * istate->cache_alloc) / 2; > + full->cache_nr = 0; > + ALLOC_ARRAY(full->cache, full->cache_alloc); > + > + for (i = 0; i < istate->cache_nr; i++) { > + struct cache_entry *ce = istate->cache[i]; 
> + struct tree *tree; > + struct pathspec ps; > + > + if (!S_ISSPARSEDIR(ce->ce_mode)) { > + set_index_entry(full, full->cache_nr++, ce); > + continue; > + } > + if (!(ce->ce_flags & CE_SKIP_WORKTREE)) > + warning(_("index entry is a directory, but not sparse (%08x)"), > + ce->ce_flags); > + > + /* recursively walk into cd->name */ > + tree = lookup_tree(istate->repo, &ce->oid); > + > + memset(&ps, 0, sizeof(ps)); > + ps.recursive = 1; > + ps.has_wildcard = 1; > + ps.max_depth = -1; > + > + strbuf_setlen(&base, 0); > + strbuf_add(&base, ce->name, strlen(ce->name)); > + > + read_tree_at(istate->repo, tree, &base, &ps, > + add_path_to_index, full); > + > + /* free directory entries. full entries are re-used */ > + discard_cache_entry(ce); > + } > + > + /* Copy back into original index. */ > + memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); > + istate->sparse_index = 0; > + free(istate->cache); > + istate->cache = full->cache; > + istate->cache_nr = full->cache_nr; > + istate->cache_alloc = full->cache_alloc; > + > + strbuf_release(&base); > + free(full); > + > + trace2_region_leave("index", "ensure_full_index", istate->repo); > } Not that I mind having added the read_tree_at() again, but just thinking aloud here. So we need this loop here because there's nothing like a read_tree_at() that knows how to start at the non-tree root of the index, and then for each directory there we're going to perform the equivalent of a read_tree() there, but we need to set the base for add_path_to_index() since we started at subdirs, not the root. That's fine, but grepping around a bit I wonder if we shouldn't eventually have some slightly fancier API that just works like read_tree() but takes an optional "start at the index's root" instead. Well, things that want that usually care about the index-specific bits, whereas this "I just care about the tree for these" is more of a special case I guess. ^ permalink raw reply [flat|nested] 203+ messages in thread
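The loop Ævar describes copies ordinary entries and expands each sparse directory by walking its tree, with the directory name as the base prefix. It can be sketched with a toy in-memory tree. All names here are hypothetical. Unlike ensure_full_index(), which reuses the existing cache entries and reads real tree objects through read_tree_at(), this sketch copies path strings and omits freeing for brevity:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PATH_MAX 4096

/* Hypothetical in-memory tree: a blob (is_dir == 0) or a directory. */
struct toy_tree {
	const char *name;                 /* single path component */
	int is_dir;
	const struct toy_tree *children;  /* sorted by name, like a Git tree */
	size_t nr;
};

struct toy_entry {
	char *path;                   /* sparse directories end in '/' */
	const struct toy_tree *tree;  /* non-NULL only for sparse directories */
};

struct toy_index {
	struct toy_entry *entries;
	size_t nr, alloc;
};

static char *toy_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (!p)
		abort();
	return memcpy(p, s, n);
}

static void toy_append(struct toy_index *ix, char *path,
		       const struct toy_tree *tree)
{
	if (ix->nr == ix->alloc) {
		ix->alloc = ix->alloc ? 2 * ix->alloc : 8;
		ix->entries = realloc(ix->entries, ix->alloc * sizeof(*ix->entries));
		if (!ix->entries)
			abort();
	}
	ix->entries[ix->nr].path = path;
	ix->entries[ix->nr].tree = tree;
	ix->nr++;
}

/* Emit one entry per blob below 't'; base[0..len) is the prefix. */
static void toy_walk(const struct toy_tree *t, char *base, size_t len,
		     struct toy_index *out)
{
	for (size_t i = 0; i < t->nr; i++) {
		const struct toy_tree *c = &t->children[i];
		size_t n = strlen(c->name);

		if (len + n + 2 > TOY_PATH_MAX)
			abort();
		memcpy(base + len, c->name, n);
		if (c->is_dir) {
			base[len + n] = '/';
			toy_walk(c, base, len + n + 1, out);
		} else {
			base[len + n] = '\0';
			toy_append(out, toy_strdup(base), NULL);
		}
	}
}

/* Copy regular entries; replace each sparse directory by its blobs. */
static struct toy_index toy_ensure_full(const struct toy_index *sparse)
{
	struct toy_index full = { NULL, 0, 0 };
	char base[TOY_PATH_MAX];

	for (size_t i = 0; i < sparse->nr; i++) {
		const struct toy_entry *e = &sparse->entries[i];
		size_t len = strlen(e->path);

		if (!e->tree) {
			toy_append(&full, toy_strdup(e->path), NULL);
			continue;
		}
		if (len + 1 > TOY_PATH_MAX)
			abort();
		memcpy(base, e->path, len);  /* seed with "dir/", like ce->name */
		toy_walk(e->tree, base, len, &full);
	}
	return full;
}
```

Seeding the prefix with the directory name is the part that a plain read_tree() from the root would not need, which matches Ævar's observation about why the loop sets the base before each walk.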
* [PATCH v3 06/20] t1092: compare sparse-checkout to sparse-index 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (4 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget ` (17 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Add a new 'sparse-index' repo alongside the 'full-checkout' and 'sparse-checkout' repos in t1092-sparse-checkout-compatibility.sh. Also add run_on_sparse and test_sparse_match helpers. These helpers will be used when the sparse index is implemented. Add the GIT_TEST_SPARSE_INDEX environment variable to enable the sparse-index by default. This can be enabled across all tests, but that will only affect cases where the sparse-checkout feature is enabled. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/README | 3 +++ t/t1092-sparse-checkout-compatibility.sh | 24 ++++++++++++++++++++---- 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/t/README b/t/README index 593d4a4e270c..b98bc563aab5 100644 --- a/t/README +++ b/t/README @@ -439,6 +439,9 @@ and "sha256". GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the 'pack.writeReverseIndex' setting. +GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the +sparse-index format by default. 
+ Naming Tests ------------ diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 3725d3997e70..de5d8461c993 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -7,6 +7,7 @@ test_description='compare full workdir to sparse workdir' test_expect_success 'setup' ' git init initial-repo && ( + GIT_TEST_SPARSE_INDEX=0 && cd initial-repo && echo a >a && echo "after deep" >e && @@ -87,23 +88,32 @@ init_repos () { cp -r initial-repo sparse-checkout && git -C sparse-checkout reset --hard && - git -C sparse-checkout sparse-checkout init --cone && + + cp -r initial-repo sparse-index && + git -C sparse-index reset --hard && # initialize sparse-checkout definitions - git -C sparse-checkout sparse-checkout set deep + git -C sparse-checkout sparse-checkout init --cone && + git -C sparse-checkout sparse-checkout set deep && + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep } run_on_sparse () { ( cd sparse-checkout && - "$@" >../sparse-checkout-out 2>../sparse-checkout-err + GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err + ) && + ( + cd sparse-index && + GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err ) } run_on_all () { ( cd full-checkout && - "$@" >../full-checkout-out 2>../full-checkout-err + GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err ) && run_on_sparse "$@" } @@ -114,6 +124,12 @@ test_all_match () { test_cmp full-checkout-err sparse-checkout-err } +test_sparse_match () { + run_on_sparse "$@" && + test_cmp sparse-checkout-out sparse-index-out && + test_cmp sparse-checkout-err sparse-index-err +} + test_expect_success 'status with options' ' init_repos && test_all_match git status --porcelain=v2 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
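The `GIT_TEST_SPARSE_INDEX=<boolean>` knob follows the usual pattern for test variables. When it is unset, the default applies; when it is set, a boolean string overrides the default. The sketch below is a simplified, hypothetical reader (`env_bool_sketch()` is not a git function). Git parses such variables with its own, stricter boolean helper, so treat the accepted spellings here as an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Simplified stand-in for reading a boolean test knob such as
 * GIT_TEST_SPARSE_INDEX: unset returns 'def'; a few common spellings
 * map to 1 or 0 (case-insensitively); any other value also falls back
 * to 'def', whereas git's own parser is stricter.
 */
static int env_bool_sketch(const char *name, int def)
{
	static const char *truthy[] = { "1", "true", "yes", "on" };
	static const char *falsy[] = { "0", "false", "no", "off" };
	const char *v = getenv(name);
	size_t i;

	if (!v)
		return def;
	for (i = 0; i < sizeof(truthy) / sizeof(*truthy); i++)
		if (!strcasecmp(v, truthy[i]))
			return 1;
	for (i = 0; i < sizeof(falsy) / sizeof(*falsy); i++)
		if (!strcasecmp(v, falsy[i]))
			return 0;
	return def;
}
```

With a default of 0, a test run that never sets the variable keeps the full-index format. This matches the t1092 helpers, which set the variable explicitly for each repo they compare.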
* [PATCH v3 07/20] test-read-cache: print cache entries with --table 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (5 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-17 13:28 ` [RFC/PATCH 0/5] " Ævar Arnfjörð Bjarmason ` (5 more replies) 2021-03-16 16:42 ` [PATCH v3 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget ` (16 subsequent siblings) 23 siblings, 6 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> This table is helpful for discovering data in the index to ensure it is being written correctly, especially as we build and test the sparse-index. This table includes an output format similar to 'git ls-tree', but should not be compared to that directly. The biggest reasons are that 'git ls-tree' includes a tree entry for every subdirectory, even those that would not appear as a sparse directory in a sparse-index. Further, 'git ls-tree' does not use a trailing directory separator for its tree rows. This does not print the stat() information for the blobs. That could be added in a future change with another option. The tests that are added in the next few changes care only about the object types and IDs. To make the option parsing slightly more robust, wrap the string comparisons in a loop adapted from test-dir-iterator.c. Care must be taken with the final check for the 'cnt' variable. We continue the expectation that the numerical value is the final argument. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/helper/test-read-cache.c | 55 +++++++++++++++++++++++++++++++------- 1 file changed, 45 insertions(+), 10 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 244977a29bdf..6cfd8f2de71c 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -1,36 +1,71 @@ #include "test-tool.h" #include "cache.h" #include "config.h" +#include "blob.h" +#include "commit.h" +#include "tree.h" + +static void print_cache_entry(struct cache_entry *ce) +{ + const char *type; + printf("%06o ", ce->ce_mode & 0177777); + + if (S_ISSPARSEDIR(ce->ce_mode)) + type = tree_type; + else if (S_ISGITLINK(ce->ce_mode)) + type = commit_type; + else + type = blob_type; + + printf("%s %s\t%s\n", + type, + oid_to_hex(&ce->oid), + ce->name); +} + +static void print_cache(struct index_state *istate) +{ + int i; + for (i = 0; i < istate->cache_nr; i++) + print_cache_entry(istate->cache[i]); +} int cmd__read_cache(int argc, const char **argv) { + struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; + int table = 0; - if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) { - argc--; - argv++; + for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { + if (skip_prefix(*argv, "--print-and-refresh=", &name)) + continue; + if (!strcmp(*argv, "--table")) + table = 1; } - if (argc == 2) - cnt = strtol(argv[1], NULL, 0); + if (argc == 1) + cnt = strtol(argv[0], NULL, 0); setup_git_directory(); git_config(git_default_config, NULL); + for (i = 0; i < cnt; i++) { - read_cache(); + repo_read_index(r); if (name) { int pos; - refresh_index(&the_index, REFRESH_QUIET, + refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL); - pos = index_name_pos(&the_index, name, strlen(name)); + pos = index_name_pos(r->index, name, strlen(name)); if (pos < 0) die("%s not in index", name); printf("%s is%s up to date\n", name, - ce_uptodate(the_index.cache[pos]) ? 
"" : " not"); + ce_uptodate(r->index->cache[pos]) ? "" : " not"); write_file(name, "%d\n", i); } - discard_cache(); + if (table) + print_cache(r->index); + discard_index(r->index); } return 0; } -- gitgitgadget
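The note about the final `cnt` check follows from how the loop moves `argv`. The initial `++argv, --argc` skips the program name. Once all leading `--` options have been consumed, a single leftover argument therefore means `argc == 1`, and the count lives in `argv[0]` rather than `argv[1]` as before. The sketch below restates that loop as a standalone function. `struct read_cache_opts` is hypothetical, and `strncmp()` stands in for git's `skip_prefix()`. Like the patch, it relies on `argv` being NULL-terminated and silently skips unrecognized `--` words.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the options test-read-cache understands. */
struct read_cache_opts {
	const char *name;   /* value of --print-and-refresh=<name> */
	int table;          /* set by --table */
	int cnt;            /* trailing numeric argument, default 1 */
};

static void parse_read_cache_opts(int argc, const char **argv,
				  struct read_cache_opts *opts)
{
	static const char refresh[] = "--print-and-refresh=";

	opts->name = NULL;
	opts->table = 0;
	opts->cnt = 1;

	/* skip argv[0], then consume every leading "--" word */
	for (++argv, --argc; *argv && !strncmp(*argv, "--", 2); ++argv, --argc) {
		if (!strncmp(*argv, refresh, strlen(refresh))) {
			opts->name = *argv + strlen(refresh);
			continue;
		}
		if (!strcmp(*argv, "--table"))
			opts->table = 1;
		/* any other "--" word is ignored, as in the patch */
	}

	/* argv[0] is now the first non-option argument, if any */
	if (argc == 1)
		opts->cnt = (int)strtol(argv[0], NULL, 0);
}
```

For example, `read-cache --table 3` sets `table` and a count of 3, while `read-cache --print-and-refresh=foo` leaves the count at its default of 1.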
* [RFC/PATCH 0/5] Re: [PATCH v3 07/20] test-read-cache: print cache entries with --table 2021-03-16 16:42 ` [PATCH v3 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget @ 2021-03-17 13:28 ` Ævar Arnfjörð Bjarmason 2021-03-17 18:28 ` Elijah Newren 2021-03-17 13:28 ` [RFC/PATCH 1/5] ls-files: defer read_index() after parse_options() etc Ævar Arnfjörð Bjarmason ` (4 subsequent siblings) 5 siblings, 1 reply; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, dstolee > From: Derrick Stolee <dstolee@microsoft.com> > > This table is helpful for discovering data in the index to ensure it is > being written correctly, especially as we build and test the > sparse-index. This table includes an output format similar to 'git > ls-tree', but should not be compared to that directly. The biggest > reasons are that 'git ls-tree' includes a tree entry for every > subdirectory, even those that would not appear as a sparse directory in > a sparse-index. Further, 'git ls-tree' does not use a trailing directory > separator for its tree rows. > > This does not print the stat() information for the blobs. That could be > added in a future change with another option. The tests that are added > in the next few changes care only about the object types and IDs. > > To make the option parsing slightly more robust, wrap the string > comparisons in a loop adapted from test-dir-iterator.c. > > Care must be taken with the final check for the 'cnt' variable. We > continue the expectation that the numerical value is the final argument. 
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > t/helper/test-read-cache.c | 55 +++++++++++++++++++++++++++++++------- > 1 file changed, 45 insertions(+), 10 deletions(-) > > diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c > index 244977a29bdf..6cfd8f2de71c 100644 > --- a/t/helper/test-read-cache.c > +++ b/t/helper/test-read-cache.c > @@ -1,36 +1,71 @@ > #include "test-tool.h" > #include "cache.h" > #include "config.h" > +#include "blob.h" > +#include "commit.h" > +#include "tree.h" > + > +static void print_cache_entry(struct cache_entry *ce) > +{ > + const char *type; > + printf("%06o ", ce->ce_mode & 0177777); > + > + if (S_ISSPARSEDIR(ce->ce_mode)) > + type = tree_type; > + else if (S_ISGITLINK(ce->ce_mode)) > + type = commit_type; > + else > + type = blob_type; > + > + printf("%s %s\t%s\n", > + type, > + oid_to_hex(&ce->oid), > + ce->name); > +} > + So we have a test tool that's mostly ls-files but mocks the output ls-tree would emit, won't these tests eventually care about what stage things are in? What follows is an RFC series on top that's the result of me wondering why if we're adding new index constructs we aren't updating our plumbing to emit that data, can we just add this to ls-files and drop this test helper? Turns out: Yes we can. Ævar Arnfjörð Bjarmason (5): ls-files: defer read_index() after parse_options() etc. 
ls-files: make "mode" in show_ce() loop a variable ls-files: add and use a new --sparse option test-tool read-cache: --table is redundant to ls-files test-tool: split up test-tool read-cache Documentation/git-ls-files.txt | 4 ++ Makefile | 3 +- builtin/ls-files.c | 29 +++++++-- t/helper/test-read-cache-again.c | 31 +++++++++ t/helper/test-read-cache-perf.c | 21 ++++++ t/helper/test-read-cache.c | 82 ------------------------ t/helper/test-tool.c | 3 +- t/helper/test-tool.h | 3 +- t/perf/p0002-read-cache.sh | 2 +- t/t1091-sparse-checkout-builtin.sh | 9 +-- t/t1092-sparse-checkout-compatibility.sh | 57 ++++++++++------ t/t7519-status-fsmonitor.sh | 2 +- 12 files changed, 131 insertions(+), 115 deletions(-) create mode 100644 t/helper/test-read-cache-again.c create mode 100644 t/helper/test-read-cache-perf.c delete mode 100644 t/helper/test-read-cache.c -- 2.31.0.260.g719c683c1d ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [RFC/PATCH 0/5] Re: [PATCH v3 07/20] test-read-cache: print cache entries with --table 2021-03-17 13:28 ` [RFC/PATCH 0/5] " Ævar Arnfjörð Bjarmason @ 2021-03-17 18:28 ` Elijah Newren 2021-03-17 19:46 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Elijah Newren @ 2021-03-17 18:28 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 17, 2021 at 6:28 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > From: Derrick Stolee <dstolee@microsoft.com> > > > > This table is helpful for discovering data in the index to ensure it is > > being written correctly, especially as we build and test the > > sparse-index. This table includes an output format similar to 'git > > ls-tree', but should not be compared to that directly. The biggest > > reasons are that 'git ls-tree' includes a tree entry for every > > subdirectory, even those that would not appear as a sparse directory in > > a sparse-index. Further, 'git ls-tree' does not use a trailing directory > > separator for its tree rows. > > > > This does not print the stat() information for the blobs. That could be > > added in a future change with another option. The tests that are added > > in the next few changes care only about the object types and IDs. > > > > To make the option parsing slightly more robust, wrap the string > > comparisons in a loop adapted from test-dir-iterator.c. > > > > Care must be taken with the final check for the 'cnt' variable. We > > continue the expectation that the numerical value is the final argument. 
> > > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > > --- > > t/helper/test-read-cache.c | 55 +++++++++++++++++++++++++++++++------- > > 1 file changed, 45 insertions(+), 10 deletions(-) > > > > diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c > > index 244977a29bdf..6cfd8f2de71c 100644 > > --- a/t/helper/test-read-cache.c > > +++ b/t/helper/test-read-cache.c > > @@ -1,36 +1,71 @@ > > #include "test-tool.h" > > #include "cache.h" > > #include "config.h" > > +#include "blob.h" > > +#include "commit.h" > > +#include "tree.h" > > + > > +static void print_cache_entry(struct cache_entry *ce) > > +{ > > + const char *type; > > + printf("%06o ", ce->ce_mode & 0177777); > > + > > + if (S_ISSPARSEDIR(ce->ce_mode)) > > + type = tree_type; > > + else if (S_ISGITLINK(ce->ce_mode)) > > + type = commit_type; > > + else > > + type = blob_type; > > + > > + printf("%s %s\t%s\n", > > + type, > > + oid_to_hex(&ce->oid), > > + ce->name); > > +} > > + > > So we have a test tool that's mostly ls-files but mocks the output > ls-tree would emit, won't these tests eventually care about what stage > things are in? > > What follows is an RFC series on top that's the result of me wondering > why if we're adding new index constructs we aren't updating our > plumbing to emit that data, can we just add this to ls-files and drop > this test helper? > > Turns out: Yes we can. I like the idea of having ls-files be usable to show the entries that are in the index; that seems great to me. I very much dislike the --sparse flag to ls-files, as noted on that commit. Also, as a minor point, the first two patches seemed a bit confusing to me. The first commit said that it was there solely to make "the next commit" easier, and the second was worded as just making the next patch easier, which made me wonder if the wording in the first commit message was referring to 3/5 when it said "the next commit". 
Both of the first two commits were so tiny that if they are both prep for 3/5, maybe it makes sense to combine them (together or both to 3/5)? If not, maybe the commit messages could be cleaned up or clarified a bit? > Ævar Arnfjörð Bjarmason (5): > ls-files: defer read_index() after parse_options() etc. > ls-files: make "mode" in show_ce() loop a variable > ls-files: add and use a new --sparse option > test-tool read-cache: --table is redundant to ls-files > test-tool: split up test-tool read-cache > > Documentation/git-ls-files.txt | 4 ++ > Makefile | 3 +- > builtin/ls-files.c | 29 +++++++-- > t/helper/test-read-cache-again.c | 31 +++++++++ > t/helper/test-read-cache-perf.c | 21 ++++++ > t/helper/test-read-cache.c | 82 ------------------------ > t/helper/test-tool.c | 3 +- > t/helper/test-tool.h | 3 +- > t/perf/p0002-read-cache.sh | 2 +- > t/t1091-sparse-checkout-builtin.sh | 9 +-- > t/t1092-sparse-checkout-compatibility.sh | 57 ++++++++++------ > t/t7519-status-fsmonitor.sh | 2 +- > 12 files changed, 131 insertions(+), 115 deletions(-) > create mode 100644 t/helper/test-read-cache-again.c > create mode 100644 t/helper/test-read-cache-perf.c > delete mode 100644 t/helper/test-read-cache.c > > -- > 2.31.0.260.g719c683c1d ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [RFC/PATCH 0/5] Re: [PATCH v3 07/20] test-read-cache: print cache entries with --table 2021-03-17 18:28 ` Elijah Newren @ 2021-03-17 19:46 ` Derrick Stolee 2021-03-17 20:26 ` Elijah Newren 0 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee @ 2021-03-17 19:46 UTC (permalink / raw) To: Elijah Newren, Ævar Arnfjörð Bjarmason Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On 3/17/2021 2:28 PM, Elijah Newren wrote: > On Wed, Mar 17, 2021 at 6:28 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >>> From: Derrick Stolee <dstolee@microsoft.com> >> >> So we have a test tool that's mostly ls-files but mocks the output >> ls-tree would emit, won't these tests eventually care about what stage >> things are in? >> >> What follows is an RFC series on top that's the result of me wondering >> why if we're adding new index constructs we aren't updating our >> plumbing to emit that data, can we just add this to ls-files and drop >> this test helper? >> >> Turns out: Yes we can. > > I like the idea of having ls-files be usable to show the entries that > are in the index; that seems great to me. I very much dislike the > --sparse flag to ls-files, as noted on that commit. I don't like this idea. I don't think exposing internal structures like this is something we want to do so quickly. Further, I intend to use this test tool in the future to _also_ show the stored stat() data, which would be inappropriate here in ls-files. I would prefer to continue using the test helper here and leave functional changes to ls-files be considered independently. Thanks, -Stolee ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [RFC/PATCH 0/5] Re: [PATCH v3 07/20] test-read-cache: print cache entries with --table 2021-03-17 19:46 ` Derrick Stolee @ 2021-03-17 20:26 ` Elijah Newren 2021-03-17 20:34 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Elijah Newren @ 2021-03-17 20:26 UTC (permalink / raw) To: Derrick Stolee Cc: Ævar Arnfjörð Bjarmason, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 17, 2021 at 12:46 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 3/17/2021 2:28 PM, Elijah Newren wrote: > > On Wed, Mar 17, 2021 at 6:28 AM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > >> > >>> From: Derrick Stolee <dstolee@microsoft.com> > > >> > >> So we have a test tool that's mostly ls-files but mocks the output > >> ls-tree would emit, won't these tests eventually care about what stage > >> things are in? > >> > >> What follows is an RFC series on top that's the result of me wondering > >> why if we're adding new index constructs we aren't updating our > >> plumbing to emit that data, can we just add this to ls-files and drop > >> this test helper? > >> > >> Turns out: Yes we can. > > > > I like the idea of having ls-files be usable to show the entries that > > are in the index; that seems great to me. I very much dislike the > > --sparse flag to ls-files, as noted on that commit. > > I don't like this idea. I don't think exposing internal structures > like this is something we want to do so quickly. Not sure I follow; ls-files was already about exposing three bits of internal structures for index entries: mode, hash, and stage number. These are quantities that are well-defined for sparse directories too. It would not be exposing any new or different internal structures, nor changing the output format. (Ævar changed the tests to not look for "tree" but to look for the "040000" mode number.) 
> Further, I intend > to use this test tool in the future to _also_ show the stored stat() > data, which would be inappropriate here in ls-files. > > I would prefer to continue using the test helper here and leave > functional changes to ls-files be considered independently. Well, I was okay with it being in a test helper regardless of whether it could be done with ls-files, and then just circling back and fixing up ls-files later. But perhaps it's worth calling out in the commit message about your plans to add stat() data and how that future piece can't be done in ls-files (without functional changes of some sort) just to make it clearer why we're using a test helper instead of front-loading the port of ls-files over to sparse-indexes?
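For reference, the three fields Elijah mentions are the ones `git ls-files --stage` prints before each path. `show_ce()` produces them in the shape `<mode> <object> <stage>\t<path>`. The helper below is a hypothetical, standalone formatter that takes plain arguments instead of a `struct cache_entry` and uses placeholder object names instead of full hashes. It shows that a sparse directory fits the existing format unchanged: it appears as an `040000` entry whose path ends in `/`, which is the shape the reworked t1092 test greps for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one index entry in the "<mode> <object> <stage>\t<path>"
 * shape printed by 'git ls-files --stage'. Returns what snprintf()
 * returns: the length the full line would have.
 */
static int format_stage_line(char *buf, size_t len, unsigned int mode,
			     const char *oid_hex, int stage, const char *path)
{
	return snprintf(buf, len, "%06o %s %d\t%s",
			mode, oid_hex, stage, path);
}
```

The `%06o` conversion is what turns the sparse-directory mode into the six-digit `040000` that the tests match on.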
* Re: [RFC/PATCH 0/5] Re: [PATCH v3 07/20] test-read-cache: print cache entries with --table 2021-03-17 20:26 ` Elijah Newren @ 2021-03-17 20:34 ` Derrick Stolee 0 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-17 20:34 UTC (permalink / raw) To: Elijah Newren Cc: Ævar Arnfjörð Bjarmason, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On 3/17/2021 4:26 PM, Elijah Newren wrote: > On Wed, Mar 17, 2021 at 12:46 PM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 3/17/2021 2:28 PM, Elijah Newren wrote: >>> On Wed, Mar 17, 2021 at 6:28 AM Ævar Arnfjörð Bjarmason >>> <avarab@gmail.com> wrote: >>>> >>>>> From: Derrick Stolee <dstolee@microsoft.com> >> >>>> >>>> So we have a test tool that's mostly ls-files but mocks the output >>>> ls-tree would emit, won't these tests eventually care about what stage >>>> things are in? >>>> >>>> What follows is an RFC series on top that's the result of me wondering >>>> why if we're adding new index constructs we aren't updating our >>>> plumbing to emit that data, can we just add this to ls-files and drop >>>> this test helper? >>>> >>>> Turns out: Yes we can. >>> >>> I like the idea of having ls-files be usable to show the entries that >>> are in the index; that seems great to me. I very much dislike the >>> --sparse flag to ls-files, as noted on that commit. >> >> I don't like this idea. I don't think exposing internal structures >> like this is something we want to do so quickly. > > Not sure I follow; ls-files was already about exposing three bits of > internal structures for index entries: mode, hash, and stage number. > These are quantities that are well-defined for sparse directories too. > It would not be exposing any new or different internal structures, nor > changing the output format. (Ævar changed the tests to not look for > "tree" but to look for the "040000" mode number.) 
True, that is some internal information already. >> Further, I intend >> to use this test tool in the future to _also_ show the stored stat() >> data, which would be inappropriate here in ls-files. >> >> I would prefer to continue using the test helper here and leave >> functional changes to ls-files be considered independently. > > Well, I was okay with it being in a test helper regardless of whether > it could be done with ls-files, and then just circling back and fixing > up ls-files later. But perhaps it's worth calling out in the commit > message about your plans to add stat() data and how that future piece > can't be done in ls-files (without functional changes of some sort) > just to make it clearer why we're using a test helper instead of > front-loading the port of ls-files over to sparse-indexes? Adding this justification to the commit message would definitely be helpful, so I will do that. Thanks, -Stolee
* [RFC/PATCH 1/5] ls-files: defer read_index() after parse_options() etc. 2021-03-16 16:42 ` [PATCH v3 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget 2021-03-17 13:28 ` [RFC/PATCH 0/5] " Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 ` Ævar Arnfjörð Bjarmason 2021-03-17 13:28 ` [RFC/PATCH 2/5] ls-files: make "mode" in show_ce() loop a variable Ævar Arnfjörð Bjarmason ` (3 subsequent siblings) 5 siblings, 0 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, dstolee Move the reading of the index below the parsing of options. We'll need to setup some index options in the next commit after option parsing, but in any case it makes sense to give parse_options() handling a chance to die early before we perform the more expensive operation of reading the index. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/ls-files.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 13bcc2d847..eb72d16493 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -681,9 +681,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) prefix_len = strlen(prefix); git_config(git_default_config, NULL); - if (repo_read_index(the_repository) < 0) - die("index file corrupt"); - argc = parse_options(argc, argv, prefix, builtin_ls_files_options, ls_files_usage, 0); pl = add_pattern_list(&dir, EXC_CMDL, "--exclude option"); @@ -743,6 +740,12 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) max_prefix = common_prefix(&pathspec); max_prefix_len = get_common_prefix_len(max_prefix); + /* + * Read the index after parse options etc. have had a chance + * to die early. 
+ */ + if (repo_read_index(the_repository) < 0) + die("index file corrupt"); prune_index(the_repository->index, max_prefix, max_prefix_len); /* Treat unmatching pathspec elements as errors */ -- 2.31.0.260.g719c683c1d
* [RFC/PATCH 2/5] ls-files: make "mode" in show_ce() loop a variable 2021-03-16 16:42 ` [PATCH v3 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget 2021-03-17 13:28 ` [RFC/PATCH 0/5] " Ævar Arnfjörð Bjarmason 2021-03-17 13:28 ` [RFC/PATCH 1/5] ls-files: defer read_index() after parse_options() etc Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 ` Ævar Arnfjörð Bjarmason 2021-03-17 18:11 ` Elijah Newren 2021-03-17 13:28 ` [RFC/PATCH 3/5] ls-files: add and use a new --sparse option Ævar Arnfjörð Bjarmason ` (2 subsequent siblings) 5 siblings, 1 reply; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, dstolee In a subsequent commit I'll optionally change the mode in a new sparse mode, let's do this first to make that change smaller. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/ls-files.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index eb72d16493..4db75351f2 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -242,9 +242,17 @@ static void show_ce(struct repository *repo, struct dir_struct *dir, if (!show_stage) { fputs(tag, stdout); } else { + unsigned int mode = ce->ce_mode; + if (show_sparse && S_ISSPARSEDIR(mode)) + /* + * We could just do & 0177777 all the + * time, just make it clear this is + * for --stage-sparse. + */ + mode &= 0177777; printf("%s%06o %s %d\t", tag, - ce->ce_mode, + mode, find_unique_abbrev(&ce->oid, abbrev), ce_stage(ce)); } -- 2.31.0.260.g719c683c1d ^ permalink raw reply related [flat|nested] 203+ messages in thread
* Re: [RFC/PATCH 2/5] ls-files: make "mode" in show_ce() loop a variable 2021-03-17 13:28 ` [RFC/PATCH 2/5] ls-files: make "mode" in show_ce() loop a variable Ævar Arnfjörð Bjarmason @ 2021-03-17 18:11 ` Elijah Newren 2021-03-24 0:46 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 203+ messages in thread From: Elijah Newren @ 2021-03-17 18:11 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 17, 2021 at 6:28 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > In a subsequent commit I'll optionally change the mode in a new sparse > mode, let's do this first to make that change smaller. > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > builtin/ls-files.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/builtin/ls-files.c b/builtin/ls-files.c > index eb72d16493..4db75351f2 100644 > --- a/builtin/ls-files.c > +++ b/builtin/ls-files.c > @@ -242,9 +242,17 @@ static void show_ce(struct repository *repo, struct dir_struct *dir, > if (!show_stage) { > fputs(tag, stdout); > } else { > + unsigned int mode = ce->ce_mode; > + if (show_sparse && S_ISSPARSEDIR(mode)) > + /* > + * We could just do & 0177777 all the > + * time, just make it clear this is > + * for --stage-sparse. > + */ > + mode &= 0177777; I could kind of see referencing the magic constant 0177777 in a test-* source file, but it really needs an explanation when showing up in actual git source code. At least reference something about how cache.h mentions these are the mode bits, or better yet #define this constant somewhere in cache.h with an explanation. Also, what is --stage-sparse? > printf("%s%06o %s %d\t", > tag, > - ce->ce_mode, > + mode, > find_unique_abbrev(&ce->oid, abbrev), > ce_stage(ce)); > } > -- > 2.31.0.260.g719c683c1d ^ permalink raw reply [flat|nested] 203+ messages in thread
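One way to follow Elijah's suggestion is to give the mask a name next to the other mode definitions. The macro `CE_MODE_PRINT_MASK` and the helper below are hypothetical; nothing in this series defines them. The point is that 0177777 equals 0xffff, so the mask keeps the low 16 bits and discards anything stored above them. Those 16 bits already cover the modes git writes to the index: 100644, 100755, 120000 and 160000, plus 040000 for a sparse directory in this series.

```c
#include <assert.h>

/*
 * Hypothetical name for the magic number: 0177777 == 0xffff, the low
 * 16 bits holding the file-type and permission bits of an index mode.
 */
#define CE_MODE_PRINT_MASK 0177777

/* Strip any bits above the 16-bit mode before printing it with %06o. */
static unsigned int ce_mode_for_display(unsigned int mode)
{
	return mode & CE_MODE_PRINT_MASK;
}
```

A named constant also answers the "what is this number?" question at every call site, so the explanatory comment has to be written only once.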
* Re: [RFC/PATCH 2/5] ls-files: make "mode" in show_ce() loop a variable 2021-03-17 18:11 ` Elijah Newren @ 2021-03-24 0:46 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-24 0:46 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 17 2021, Elijah Newren wrote: > On Wed, Mar 17, 2021 at 6:28 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> In a subsequent commit I'll optionally change the mode in a new sparse >> mode, let's do this first to make that change smaller. >> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >> --- >> builtin/ls-files.c | 10 +++++++++- >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/builtin/ls-files.c b/builtin/ls-files.c >> index eb72d16493..4db75351f2 100644 >> --- a/builtin/ls-files.c >> +++ b/builtin/ls-files.c >> @@ -242,9 +242,17 @@ static void show_ce(struct repository *repo, struct dir_struct *dir, >> if (!show_stage) { >> fputs(tag, stdout); >> } else { >> + unsigned int mode = ce->ce_mode; >> + if (show_sparse && S_ISSPARSEDIR(mode)) >> + /* >> + * We could just do & 0177777 all the >> + * time, just make it clear this is >> + * for --stage-sparse. >> + */ >> + mode &= 0177777; > > I could kind of see referencing the magic constant 0177777 in a test-* > source file, but it really needs an explanation when showing up in > actual git source code. At least reference something about how > cache.h mentions these are the mode bits, or better yet #define this > constant somewhere in cache.h with an explanation. > > Also, what is --stage-sparse? A relic from a WIP version of this patch. I ended up just calling it --sparse in 3/5. 
>> printf("%s%06o %s %d\t", >> tag, >> - ce->ce_mode, >> + mode, >> find_unique_abbrev(&ce->oid, abbrev), >> ce_stage(ce)); >> } >> -- >> 2.31.0.260.g719c683c1d
* [RFC/PATCH 3/5] ls-files: add and use a new --sparse option 2021-03-16 16:42 ` [PATCH v3 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget ` (2 preceding siblings ...) 2021-03-17 13:28 ` [RFC/PATCH 2/5] ls-files: make "mode" in show_ce() loop a variable Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 ` Ævar Arnfjörð Bjarmason 2021-03-17 18:19 ` Elijah Newren 2021-03-17 20:43 ` Derrick Stolee 2021-03-17 13:28 ` [RFC/PATCH 4/5] test-tool read-cache: --table is redundant to ls-files Ævar Arnfjörð Bjarmason 2021-03-17 13:28 ` [RFC/PATCH 5/5] test-tool: split up test-tool read-cache Ævar Arnfjörð Bjarmason 5 siblings, 2 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, dstolee Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Documentation/git-ls-files.txt | 4 ++ builtin/ls-files.c | 10 ++++- t/t1091-sparse-checkout-builtin.sh | 9 ++-- t/t1092-sparse-checkout-compatibility.sh | 57 ++++++++++++++++-------- 4 files changed, 56 insertions(+), 24 deletions(-) diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt index 6d11ab506b..1145e960a4 100644 --- a/Documentation/git-ls-files.txt +++ b/Documentation/git-ls-files.txt @@ -71,6 +71,10 @@ OPTIONS --unmerged:: Show unmerged files in the output (forces --stage) +--sparse:: + Show sparse directories in the output instead of expanding + them (forces --stage) + -k:: --killed:: Show files on the filesystem that need to be removed due diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 4db75351f2..1ebbb63c10 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -26,6 +26,7 @@ static int show_deleted; static int show_cached; static int show_others; static int show_stage; +static int show_sparse; static int show_unmerged; static int show_resolve_undo; static int 
show_modified; @@ -639,6 +640,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) DIR_SHOW_IGNORED), OPT_BOOL('s', "stage", &show_stage, N_("show staged contents' object name in the output")), + OPT_BOOL(0, "sparse", &show_sparse, + N_("show unexpanded sparse directories in the output")), OPT_BOOL('k', "killed", &show_killed, N_("show files on the filesystem that need to be removed")), OPT_BIT(0, "directory", &dir.flags, @@ -705,12 +708,17 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) tag_skip_worktree = "S "; tag_resolve_undo = "U "; } + if (show_sparse) { + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + } if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed) require_work_tree = 1; - if (show_unmerged) + if (show_unmerged || show_sparse) /* * There's no point in showing unmerged unless * you also show the stage information. + * The same goes for the --sparse option. */ show_stage = 1; if (show_tag || show_stage) diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh index ff1ad570a2..c823df423c 100755 --- a/t/t1091-sparse-checkout-builtin.sh +++ b/t/t1091-sparse-checkout-builtin.sh @@ -208,12 +208,13 @@ test_expect_success 'sparse-checkout disable' ' test_expect_success 'sparse-index enabled and disabled' ' git -C repo sparse-checkout init --cone --sparse-index && test_cmp_config -C repo true extensions.sparseIndex && - test-tool -C repo read-cache --table >cache && - grep " tree " cache && + git -C repo ls-files --sparse >cache && + grep "^040000 " cache >lines && + test_line_count = 3 lines && git -C repo sparse-checkout disable && - test-tool -C repo read-cache --table >cache && - ! grep " tree " cache && + git -C repo ls-files --sparse >cache && + ! grep "^040000 " cache && git -C repo config --list >config && ! 
grep extensions.sparseindex config ' diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index d97bf9b645..48d3920490 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -136,48 +136,67 @@ test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } +test_index_entry_like () { + dir=$1 + shift + fmt=$1 + shift + rev=$1 + shift + entry=$1 + shift + file=$1 + shift + hash=$(git -C "$dir" rev-parse "$rev") && + printf "$fmt\n" "$hash" "$entry" >expected && + if grep "$entry" "$file" >line + then + test_cmp expected line + else + cat cache && + false + fi +} + test_expect_success 'sparse-index contents' ' init_repos && - test-tool -C sparse-index read-cache --table >cache && + git -C sparse-index ls-files --sparse >cache && for dir in folder1 folder2 x do - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && - grep "040000 tree $TREE $dir/" cache \ - || return 1 + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 done && git -C sparse-index sparse-checkout set folder1 && - test-tool -C sparse-index read-cache --table >cache && + git -C sparse-index ls-files --sparse >cache && for dir in deep folder2 x do - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && - grep "040000 tree $TREE $dir/" cache \ - || return 1 + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 done && git -C sparse-index sparse-checkout set deep/deeper1 && - test-tool -C sparse-index read-cache --table >cache && + git -C sparse-index ls-files --sparse >cache && for dir in deep/deeper2 folder1 folder2 x do - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && - grep "040000 tree $TREE $dir/" cache \ - || return 1 + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 done && + grep 040000 cache >lines && + test_line_count = 4 lines && + # Disabling the sparse-index removes tree 
entries with full ones git -C sparse-index sparse-checkout init --no-sparse-index && - test-tool -C sparse-index read-cache --table >cache && - ! grep "040000 tree" cache && - test_sparse_match test-tool read-cache --table + git -C sparse-index ls-files --sparse >cache && + ! grep "^040000 " cache >lines && + test_sparse_match git ls-tree -r HEAD ' test_expect_success 'expanded in-memory index matches full index' ' init_repos && - test_sparse_match test-tool read-cache --expand --table + test_sparse_match git ls-tree -r HEAD ' test_expect_success 'status with options' ' @@ -394,9 +413,9 @@ test_expect_success 'submodule handling' ' test_all_match git commit -m "add submodule" && # having a submodule prevents "modules" from collapse - test-tool -C sparse-index read-cache --table >cache && - grep "100644 blob .* modules/a" cache && - grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache + git -C sparse-index ls-files --sparse >cache && + test_index_entry_like sparse-index "100644 %s 0\t%s" "HEAD:modules/a" "modules/a" cache && + test_index_entry_like sparse-index "160000 %s 0\t%s" "HEAD:modules/sub" "modules/sub" cache ' test_expect_success 'sparse-index is expanded and converted back' ' -- 2.31.0.260.g719c683c1d ^ permalink raw reply related [flat|nested] 203+ messages in thread
* Re: [RFC/PATCH 3/5] ls-files: add and use a new --sparse option 2021-03-17 13:28 ` [RFC/PATCH 3/5] ls-files: add and use a new --sparse option Ævar Arnfjörð Bjarmason @ 2021-03-17 18:19 ` Elijah Newren 2021-03-17 18:27 ` Ævar Arnfjörð Bjarmason 2021-03-17 20:43 ` Derrick Stolee 1 sibling, 1 reply; 203+ messages in thread From: Elijah Newren @ 2021-03-17 18:19 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 17, 2021 at 6:28 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > Documentation/git-ls-files.txt | 4 ++ > builtin/ls-files.c | 10 ++++- > t/t1091-sparse-checkout-builtin.sh | 9 ++-- > t/t1092-sparse-checkout-compatibility.sh | 57 ++++++++++++++++-------- > 4 files changed, 56 insertions(+), 24 deletions(-) > > diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt > index 6d11ab506b..1145e960a4 100644 > --- a/Documentation/git-ls-files.txt > +++ b/Documentation/git-ls-files.txt > @@ -71,6 +71,10 @@ OPTIONS > --unmerged:: > Show unmerged files in the output (forces --stage) > > +--sparse:: > + Show sparse directories in the output instead of expanding > + them (forces --stage) > + > -k:: > --killed:: > Show files on the filesystem that need to be removed due > diff --git a/builtin/ls-files.c b/builtin/ls-files.c > index 4db75351f2..1ebbb63c10 100644 > --- a/builtin/ls-files.c > +++ b/builtin/ls-files.c > @@ -26,6 +26,7 @@ static int show_deleted; > static int show_cached; > static int show_others; > static int show_stage; > +static int show_sparse; > static int show_unmerged; > static int show_resolve_undo; > static int show_modified; > @@ -639,6 +640,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > DIR_SHOW_IGNORED), > OPT_BOOL('s', "stage", &show_stage, > N_("show staged 
contents' object name in the output")), > + OPT_BOOL(0, "sparse", &show_sparse, > + N_("show unexpanded sparse directories in the output")), > OPT_BOOL('k', "killed", &show_killed, > N_("show files on the filesystem that need to be removed")), > OPT_BIT(0, "directory", &dir.flags, > @@ -705,12 +708,17 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > tag_skip_worktree = "S "; > tag_resolve_undo = "U "; > } > + if (show_sparse) { > + prepare_repo_settings(the_repository); > + the_repository->settings.command_requires_full_index = 0; > + } > if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed) > require_work_tree = 1; > - if (show_unmerged) > + if (show_unmerged || show_sparse) > /* > * There's no point in showing unmerged unless > * you also show the stage information. > + * The same goes for the --sparse option. Yuck, haven't you just made --sparse an alias for --stage? Why does it need an alias? Was the goal just to get a quick way to make the command run under repo->settings.command_requires_full_index = 0 without auditing the codepaths? It seems to rely on them having been audited anyway, since it just falls back to the code used for --stage, so I don't see how it helps. It also suggests the command might do unexpected or weird things if run without the --sparse option? If people manually configure a sparse-checkout and cone mode AND a sparse-index (it's annoying how they have to specify all three instead of having to just pass one flag somewhere), then now we also need to force them to remember to pass extra flags to random various commands for them to operate in a sane manner in their environment?? I think this is a bad path to go down. However, if you want to write the necessary tests to make it so that ls-files can operate with command_requires_full_index = 0, then I think that's useful. 
If you want to add a special flag so that folks in a sparse-checkout-with-cone-mode-with-sparse-index setup want to operate densely (i.e. to show what files would be in the index if it were fully populated), then I think that's useful. But having sparse-yes-with-cone-yes-very-sparse folks need to specify an extra flag to commands to get sparse behavior just seems wrong to me. > */ > show_stage = 1; > if (show_tag || show_stage) > diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh > index ff1ad570a2..c823df423c 100755 > --- a/t/t1091-sparse-checkout-builtin.sh > +++ b/t/t1091-sparse-checkout-builtin.sh > @@ -208,12 +208,13 @@ test_expect_success 'sparse-checkout disable' ' > test_expect_success 'sparse-index enabled and disabled' ' > git -C repo sparse-checkout init --cone --sparse-index && > test_cmp_config -C repo true extensions.sparseIndex && > - test-tool -C repo read-cache --table >cache && > - grep " tree " cache && > + git -C repo ls-files --sparse >cache && > + grep "^040000 " cache >lines && > + test_line_count = 3 lines && > > git -C repo sparse-checkout disable && > - test-tool -C repo read-cache --table >cache && > - ! grep " tree " cache && > + git -C repo ls-files --sparse >cache && > + ! grep "^040000 " cache && > git -C repo config --list >config && > ! 
grep extensions.sparseindex config > ' > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index d97bf9b645..48d3920490 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -136,48 +136,67 @@ test_sparse_match () { > test_cmp sparse-checkout-err sparse-index-err > } > > +test_index_entry_like () { > + dir=$1 > + shift > + fmt=$1 > + shift > + rev=$1 > + shift > + entry=$1 > + shift > + file=$1 > + shift > + hash=$(git -C "$dir" rev-parse "$rev") && > + printf "$fmt\n" "$hash" "$entry" >expected && > + if grep "$entry" "$file" >line > + then > + test_cmp expected line > + else > + cat cache && > + false > + fi > +} > + > test_expect_success 'sparse-index contents' ' > init_repos && > > - test-tool -C sparse-index read-cache --table >cache && > + git -C sparse-index ls-files --sparse >cache && > for dir in folder1 folder2 x > do > - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && > - grep "040000 tree $TREE $dir/" cache \ > - || return 1 > + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 > done && > > git -C sparse-index sparse-checkout set folder1 && > > - test-tool -C sparse-index read-cache --table >cache && > + git -C sparse-index ls-files --sparse >cache && > for dir in deep folder2 x > do > - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && > - grep "040000 tree $TREE $dir/" cache \ > - || return 1 > + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 > done && > > git -C sparse-index sparse-checkout set deep/deeper1 && > > - test-tool -C sparse-index read-cache --table >cache && > + git -C sparse-index ls-files --sparse >cache && > for dir in deep/deeper2 folder1 folder2 x > do > - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && > - grep "040000 tree $TREE $dir/" cache \ > - || return 1 > + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" 
cache || return 1 > done && > > + grep 040000 cache >lines && > + test_line_count = 4 lines && > + > # Disabling the sparse-index removes tree entries with full ones > git -C sparse-index sparse-checkout init --no-sparse-index && > > - test-tool -C sparse-index read-cache --table >cache && > - ! grep "040000 tree" cache && > - test_sparse_match test-tool read-cache --table > + git -C sparse-index ls-files --sparse >cache && > + ! grep "^040000 " cache >lines && > + test_sparse_match git ls-tree -r HEAD > ' > > test_expect_success 'expanded in-memory index matches full index' ' > init_repos && > - test_sparse_match test-tool read-cache --expand --table > + test_sparse_match git ls-tree -r HEAD > ' > > test_expect_success 'status with options' ' > @@ -394,9 +413,9 @@ test_expect_success 'submodule handling' ' > test_all_match git commit -m "add submodule" && > > # having a submodule prevents "modules" from collapse > - test-tool -C sparse-index read-cache --table >cache && > - grep "100644 blob .* modules/a" cache && > - grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache > + git -C sparse-index ls-files --sparse >cache && > + test_index_entry_like sparse-index "100644 %s 0\t%s" "HEAD:modules/a" "modules/a" cache && > + test_index_entry_like sparse-index "160000 %s 0\t%s" "HEAD:modules/sub" "modules/sub" cache > ' > > test_expect_success 'sparse-index is expanded and converted back' ' > -- > 2.31.0.260.g719c683c1d I do like the tests and your idea that we can use ls-files to list whatever entries are in the index, I just think the tests should use --stage to do that. ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [RFC/PATCH 3/5] ls-files: add and use a new --sparse option 2021-03-17 18:19 ` Elijah Newren @ 2021-03-17 18:27 ` Ævar Arnfjörð Bjarmason 2021-03-17 18:44 ` Elijah Newren 0 siblings, 1 reply; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 18:27 UTC (permalink / raw) To: Elijah Newren Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 17 2021, Elijah Newren wrote: > On Wed, Mar 17, 2021 at 6:28 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >> --- >> Documentation/git-ls-files.txt | 4 ++ >> builtin/ls-files.c | 10 ++++- >> t/t1091-sparse-checkout-builtin.sh | 9 ++-- >> t/t1092-sparse-checkout-compatibility.sh | 57 ++++++++++++++++-------- >> 4 files changed, 56 insertions(+), 24 deletions(-) >> >> diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt >> index 6d11ab506b..1145e960a4 100644 >> --- a/Documentation/git-ls-files.txt >> +++ b/Documentation/git-ls-files.txt >> @@ -71,6 +71,10 @@ OPTIONS >> --unmerged:: >> Show unmerged files in the output (forces --stage) >> >> +--sparse:: >> + Show sparse directories in the output instead of expanding >> + them (forces --stage) >> + >> -k:: >> --killed:: >> Show files on the filesystem that need to be removed due >> diff --git a/builtin/ls-files.c b/builtin/ls-files.c >> index 4db75351f2..1ebbb63c10 100644 >> --- a/builtin/ls-files.c >> +++ b/builtin/ls-files.c >> @@ -26,6 +26,7 @@ static int show_deleted; >> static int show_cached; >> static int show_others; >> static int show_stage; >> +static int show_sparse; >> static int show_unmerged; >> static int show_resolve_undo; >> static int show_modified; >> @@ -639,6 +640,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) >> DIR_SHOW_IGNORED), >> OPT_BOOL('s', "stage", &show_stage, >> N_("show staged contents' object name in 
the output")), >> + OPT_BOOL(0, "sparse", &show_sparse, >> + N_("show unexpanded sparse directories in the output")), >> OPT_BOOL('k', "killed", &show_killed, >> N_("show files on the filesystem that need to be removed")), >> OPT_BIT(0, "directory", &dir.flags, >> @@ -705,12 +708,17 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) >> tag_skip_worktree = "S "; >> tag_resolve_undo = "U "; >> } >> + if (show_sparse) { >> + prepare_repo_settings(the_repository); >> + the_repository->settings.command_requires_full_index = 0; >> + } >> if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed) >> require_work_tree = 1; >> - if (show_unmerged) >> + if (show_unmerged || show_sparse) >> /* >> * There's no point in showing unmerged unless >> * you also show the stage information. >> + * The same goes for the --sparse option. > > Yuck, haven't you just made --sparse an alias for --stage? Why does > it need an alias? It doesn't, but --unmerged, the one other option which purely modifies --stage output implies --stage. So it's in line with existing UI convention in the command, it's probably better to keep following that than have new options behave differently. But yeah, we could spell out --stage --sparse in the tests. > Was the goal just to get a quick way to make the command run under > repo->settings.command_requires_full_index = 0 without auditing the > codepaths? It seems to rely on them having been audited anyway, since > it just falls back to the code used for --stage, so I don't see how it > helps. It also suggests the command might do unexpected or weird > things if run without the --sparse option? 
If people manually > configure a sparse-checkout and cone mode AND a sparse-index (it's > annoying how they have to specify all three instead of having to just > pass one flag somewhere), then now we also need to force them to > remember to pass extra flags to random various commands for them to > operate in a sane manner in their environment?? > > I think this is a bad path to go down. Those are probably good points, I don't have enough overview of the whole sparse thing yet to say. I just thought it didn't make sense to have a series changing the nature of the index without corresponding tooling changes to interrogate the state of the index. > However, if you want to write the necessary tests to make it so that > ls-files can operate with command_requires_full_index = 0, then I > think that's useful. If you want to add a special flag so that folks > in a sparse-checkout-with-cone-mode-with-sparse-index setup want to > operate densely (i.e. to show what files would be in the index if it > were fully populated), then I think that's useful. But having > sparse-yes-with-cone-yes-very-sparse folks need to specify an extra > flag to commands to get sparse behavior just seems wrong to me. Maybe, but what else do you suggest for getting this information out of the index? 
>> */ >> show_stage = 1; >> if (show_tag || show_stage) >> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh >> index ff1ad570a2..c823df423c 100755 >> --- a/t/t1091-sparse-checkout-builtin.sh >> +++ b/t/t1091-sparse-checkout-builtin.sh >> @@ -208,12 +208,13 @@ test_expect_success 'sparse-checkout disable' ' >> test_expect_success 'sparse-index enabled and disabled' ' >> git -C repo sparse-checkout init --cone --sparse-index && >> test_cmp_config -C repo true extensions.sparseIndex && >> - test-tool -C repo read-cache --table >cache && >> - grep " tree " cache && >> + git -C repo ls-files --sparse >cache && >> + grep "^040000 " cache >lines && >> + test_line_count = 3 lines && >> >> git -C repo sparse-checkout disable && >> - test-tool -C repo read-cache --table >cache && >> - ! grep " tree " cache && >> + git -C repo ls-files --sparse >cache && >> + ! grep "^040000 " cache && >> git -C repo config --list >config && >> ! grep extensions.sparseindex config >> ' >> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh >> index d97bf9b645..48d3920490 100755 >> --- a/t/t1092-sparse-checkout-compatibility.sh >> +++ b/t/t1092-sparse-checkout-compatibility.sh >> @@ -136,48 +136,67 @@ test_sparse_match () { >> test_cmp sparse-checkout-err sparse-index-err >> } >> >> +test_index_entry_like () { >> + dir=$1 >> + shift >> + fmt=$1 >> + shift >> + rev=$1 >> + shift >> + entry=$1 >> + shift >> + file=$1 >> + shift >> + hash=$(git -C "$dir" rev-parse "$rev") && >> + printf "$fmt\n" "$hash" "$entry" >expected && >> + if grep "$entry" "$file" >line >> + then >> + test_cmp expected line >> + else >> + cat cache && >> + false >> + fi >> +} >> + >> test_expect_success 'sparse-index contents' ' >> init_repos && >> >> - test-tool -C sparse-index read-cache --table >cache && >> + git -C sparse-index ls-files --sparse >cache && >> for dir in folder1 folder2 x >> do >> - TREE=$(git -C sparse-index rev-parse 
HEAD:$dir) && >> - grep "040000 tree $TREE $dir/" cache \ >> - || return 1 >> + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 >> done && >> >> git -C sparse-index sparse-checkout set folder1 && >> >> - test-tool -C sparse-index read-cache --table >cache && >> + git -C sparse-index ls-files --sparse >cache && >> for dir in deep folder2 x >> do >> - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && >> - grep "040000 tree $TREE $dir/" cache \ >> - || return 1 >> + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 >> done && >> >> git -C sparse-index sparse-checkout set deep/deeper1 && >> >> - test-tool -C sparse-index read-cache --table >cache && >> + git -C sparse-index ls-files --sparse >cache && >> for dir in deep/deeper2 folder1 folder2 x >> do >> - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && >> - grep "040000 tree $TREE $dir/" cache \ >> - || return 1 >> + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 >> done && >> >> + grep 040000 cache >lines && >> + test_line_count = 4 lines && >> + >> # Disabling the sparse-index removes tree entries with full ones >> git -C sparse-index sparse-checkout init --no-sparse-index && >> >> - test-tool -C sparse-index read-cache --table >cache && >> - ! grep "040000 tree" cache && >> - test_sparse_match test-tool read-cache --table >> + git -C sparse-index ls-files --sparse >cache && >> + ! 
grep "^040000 " cache >lines && >> + test_sparse_match git ls-tree -r HEAD >> ' >> >> test_expect_success 'expanded in-memory index matches full index' ' >> init_repos && >> - test_sparse_match test-tool read-cache --expand --table >> + test_sparse_match git ls-tree -r HEAD >> ' >> >> test_expect_success 'status with options' ' >> @@ -394,9 +413,9 @@ test_expect_success 'submodule handling' ' >> test_all_match git commit -m "add submodule" && >> >> # having a submodule prevents "modules" from collapse >> - test-tool -C sparse-index read-cache --table >cache && >> - grep "100644 blob .* modules/a" cache && >> - grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache >> + git -C sparse-index ls-files --sparse >cache && >> + test_index_entry_like sparse-index "100644 %s 0\t%s" "HEAD:modules/a" "modules/a" cache && >> + test_index_entry_like sparse-index "160000 %s 0\t%s" "HEAD:modules/sub" "modules/sub" cache >> ' >> >> test_expect_success 'sparse-index is expanded and converted back' ' >> -- >> 2.31.0.260.g719c683c1d > > I do like the tests and your idea that we can use ls-files to list > whatever entries are in the index, I just think the tests should use > --stage to do that. ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [RFC/PATCH 3/5] ls-files: add and use a new --sparse option 2021-03-17 18:27 ` Ævar Arnfjörð Bjarmason @ 2021-03-17 18:44 ` Elijah Newren 0 siblings, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-03-17 18:44 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Wed, Mar 17, 2021 at 11:27 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Wed, Mar 17 2021, Elijah Newren wrote: > > > On Wed, Mar 17, 2021 at 6:28 AM Ævar Arnfjörð Bjarmason > > <avarab@gmail.com> wrote: > >> > >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > >> --- > >> Documentation/git-ls-files.txt | 4 ++ > >> builtin/ls-files.c | 10 ++++- > >> t/t1091-sparse-checkout-builtin.sh | 9 ++-- > >> t/t1092-sparse-checkout-compatibility.sh | 57 ++++++++++++++++-------- > >> 4 files changed, 56 insertions(+), 24 deletions(-) > >> > >> diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt > >> index 6d11ab506b..1145e960a4 100644 > >> --- a/Documentation/git-ls-files.txt > >> +++ b/Documentation/git-ls-files.txt > >> @@ -71,6 +71,10 @@ OPTIONS > >> --unmerged:: > >> Show unmerged files in the output (forces --stage) > >> > >> +--sparse:: > >> + Show sparse directories in the output instead of expanding > >> + them (forces --stage) > >> + > >> -k:: > >> --killed:: > >> Show files on the filesystem that need to be removed due > >> diff --git a/builtin/ls-files.c b/builtin/ls-files.c > >> index 4db75351f2..1ebbb63c10 100644 > >> --- a/builtin/ls-files.c > >> +++ b/builtin/ls-files.c > >> @@ -26,6 +26,7 @@ static int show_deleted; > >> static int show_cached; > >> static int show_others; > >> static int show_stage; > >> +static int show_sparse; > >> static int show_unmerged; > >> static int show_resolve_undo; > >> static int show_modified; > >> @@ -639,6 +640,8 @@ int cmd_ls_files(int argc, const char 
**argv, const char *cmd_prefix) > >> DIR_SHOW_IGNORED), > >> OPT_BOOL('s', "stage", &show_stage, > >> N_("show staged contents' object name in the output")), > >> + OPT_BOOL(0, "sparse", &show_sparse, > >> + N_("show unexpanded sparse directories in the output")), > >> OPT_BOOL('k', "killed", &show_killed, > >> N_("show files on the filesystem that need to be removed")), > >> OPT_BIT(0, "directory", &dir.flags, > >> @@ -705,12 +708,17 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > >> tag_skip_worktree = "S "; > >> tag_resolve_undo = "U "; > >> } > >> + if (show_sparse) { > >> + prepare_repo_settings(the_repository); > >> + the_repository->settings.command_requires_full_index = 0; > >> + } > >> if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed) > >> require_work_tree = 1; > >> - if (show_unmerged) > >> + if (show_unmerged || show_sparse) > >> /* > >> * There's no point in showing unmerged unless > >> * you also show the stage information. > >> + * The same goes for the --sparse option. > > > > Yuck, haven't you just made --sparse an alias for --stage? Why does > > it need an alias? > > It doesn't, but --unmerged, the one other option which purely modifies > --stage output implies --stage. --unmerged modifies --stage output. --sparse won't. (Maybe it does _now_ because the command doesn't yet support sparse-indexes, but that's a temporary artifact. Long term, there should be no difference in the output.) > So it's in line with existing UI convention in the command, it's > probably better to keep following that than have new options behave > differently. > > But yeah, we could spell out --stage --sparse in the tests. There should not be a --sparse option. The index is _already_ sparse and users had to take multiple steps to make it so; users shouldn't have to repeat themselves with each and every command they ever type when they've created a sparse index that they want sparse behavior. 
You should just spell it "--stage".

> > Was the goal just to get a quick way to make the command run under
> > repo->settings.command_requires_full_index = 0 without auditing the
> > codepaths? It seems to rely on them having been audited anyway, since
> > it just falls back to the code used for --stage, so I don't see how it
> > helps. It also suggests the command might do unexpected or weird
> > things if run without the --sparse option? If people manually
> > configure a sparse-checkout and cone mode AND a sparse-index (it's
> > annoying how they have to specify all three instead of having to just
> > pass one flag somewhere), then now we also need to force them to
> > remember to pass extra flags to random various commands for them to
> > operate in a sane manner in their environment??
> >
> > I think this is a bad path to go down.
>
> Those are probably good points, I don't have enough overview of the
> whole sparse thing yet to say.
>
> I just thought it didn't make sense to have a series changing the nature
> of the index without corresponding tooling changes to interrogate the
> state of the index.

That makes sense to me; I agree with you on that point.

> > However, if you want to write the necessary tests to make it so that
> > ls-files can operate with command_requires_full_index = 0, then I
> > think that's useful. If you want to add a special flag so that folks
> > in a sparse-checkout-with-cone-mode-with-sparse-index setup want to
> > operate densely (i.e. to show what files would be in the index if it
> > were fully populated), then I think that's useful. But having
> > sparse-yes-with-cone-yes-very-sparse folks need to specify an extra
> > flag to commands to get sparse behavior just seems wrong to me.
>
> Maybe, but what else do you suggest for getting this information out of
> the index?

Use git ls-files without new options...as I stated here:

...
> > I do like the tests and your idea that we can use ls-files to list
> > whatever entries are in the index, I just think the tests should use
> > --stage to do that.

In other words, I think making "git ls-files" the first, or at least
one of the first, commands to be modified to behave properly in a
sparse-index world is what you should be aiming for, not some
new-option-shortcut that'll make no sense long term and persist
indefinitely.

List the entries in the index: `git ls-files`
List the entries in the index with their hash, mode, and stage: `git ls-files --stage`
List all the entries that would be in the index if it weren't sparse: `git ls-files --$SOME_NEW_OPTION_NAME`

You don't need to implement the --$SOME_NEW_OPTION_NAME yet, of
course. We can just note that it's the plan to add it later.
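Under Elijah's proposal, a test would read sparse directories straight from plain `git ls-files --stage` once that command works without expanding the index. Assuming, as the tests in this thread do, that such entries are printed with mode 040000 and a path ending in "/", the exact count that the RFC checks with `grep 040000 cache >lines && test_line_count = 4 lines` could be computed by a small helper such as the hypothetical count_sparse_dirs below.

```sh
# Hypothetical helper: print how many sparse-directory entries a saved
# "git ls-files --stage" listing contains. Each line is assumed to be
#   <mode> <object name> <stage>\t<path>
# and a sparse directory is assumed to have mode 040000 and a path
# ending in "/".
count_sparse_dirs () {
	awk -F '\t' '
		{ split($1, meta, " ") }          # meta[1] = mode, meta[2] = object, meta[3] = stage
		meta[1] == "040000" && $2 ~ /\/$/ { n++ }
		END { print n + 0 }               # prints 0 for a listing with no matches
	' "$1"
}
```

Matching the mode field, rather than running an unanchored grep for 040000 over the whole line, also keeps an object name that happens to contain those digits from being counted. In t1092 it might be used as `git -C sparse-index ls-files --stage >cache && test "$(count_sparse_dirs cache)" = 4`.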
* Re: [RFC/PATCH 3/5] ls-files: add and use a new --sparse option
From: Derrick Stolee @ 2021-03-17 20:43 UTC
To: Ævar Arnfjörð Bjarmason, git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, dstolee

On 3/17/2021 9:28 AM, Ævar Arnfjörð Bjarmason wrote:
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh

I want to learn from your suggested changes to the test, here,
so forgive my questions here:

> +test_index_entry_like () {
> + dir=$1
> + shift
> + fmt=$1
> + shift
> + rev=$1
> + shift
> + entry=$1
> + shift
> + file=$1
> + shift

Why all the shifts? Why not just use $1, $2, $3,...? My
guess is that you want to be able to insert a new parameter
in the middle in the future without changing the later
numbers, but that seems unlikely, and we could just add
the parameter at the end.
> + hash=$(git -C "$dir" rev-parse "$rev") &&
> + printf "$fmt\n" "$hash" "$entry" >expected &&
> + if grep "$entry" "$file" >line
> + then
> + test_cmp expected line
> + else
> + cat cache &&
> + false
> + fi
> +}
> +
> test_expect_success 'sparse-index contents' '
> init_repos &&
>
> - test-tool -C sparse-index read-cache --table >cache &&
> + git -C sparse-index ls-files --sparse >cache &&
> for dir in folder1 folder2 x
> do
> - TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
> - grep "040000 tree $TREE $dir/" cache \
> - || return 1
> + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1

I see how this uses only one line, but it seems like the
test_index_entry_like is too generic to make it not a
complicated mess of format strings that need to copy
over and over again.

Perhaps instead it could be a "test_entry_is_tree"
and it only passes "$dir" and "cache"? Then we could drop the loop and
just have

	test_entry_is_tree cache folder1 &&
	test_entry_is_tree cache folder2 &&
	test_entry_is_tree cache x &&

or we could still use the loop, especially when we test for four trees.

> - test-tool -C sparse-index read-cache --table >cache &&
> + git -C sparse-index ls-files --sparse >cache &&
> for dir in deep/deeper2 folder1 folder2 x
> do
> - TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
> - grep "040000 tree $TREE $dir/" cache \
> - || return 1
> + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1
> done &&
>
> + grep 040000 cache >lines &&
> + test_line_count = 4 lines &&
> +

The point here is to check that no other entries are trees?
We know that this number will be _at least_ 4 based on the
loop above.

Thanks,
-Stolee
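Stolee's test_entry_is_tree suggestion could look roughly like the sketch below. It is an illustration only: the optional third argument for the expected object name is an addition of this sketch (the RFC helper obtained the name with git rev-parse), and the line layout "<mode> <object> <stage>\t<path>" is assumed from the ls-files output used in these tests.

```sh
# Hypothetical sketch of the suggested test_entry_is_tree helper.
# Usage: test_entry_is_tree <file> <dir> [<oid>]
# Succeeds if <file> (saved ls-files output) has a stage-0 entry with
# mode 040000 for "<dir>/" and, when <oid> is given, that object name.
test_entry_is_tree () {
	awk -F '\t' -v dir="$2/" -v oid="${3-}" '
		$2 == dir {
			split($1, meta, " ")
			if (meta[1] == "040000" && meta[3] == "0" &&
			    (oid == "" || meta[2] == oid))
				found = 1
		}
		END { exit (found ? 0 : 1) }
	' "$1"
}

# Possible use in t1092, keeping the loop and the object-name check:
#	for dir in folder1 folder2 x
#	do
#		test_entry_is_tree cache $dir \
#			"$(git -C sparse-index rev-parse HEAD:$dir)" || return 1
#	done
```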
* Re: [RFC/PATCH 3/5] ls-files: add and use a new --sparse option 2021-03-17 20:43 ` Derrick Stolee @ 2021-03-24 0:52 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-24 0:52 UTC (permalink / raw) To: Derrick Stolee Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, dstolee On Wed, Mar 17 2021, Derrick Stolee wrote: > On 3/17/2021 9:28 AM, Ævar Arnfjörð Bjarmason wrote: >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > > I want to learn from your suggested changes to the test, here, > so forgive my questions here: > >> +test_index_entry_like () { >> + dir=$1 >> + shift >> + fmt=$1 >> + shift >> + rev=$1 >> + shift >> + entry=$1 >> + shift >> + file=$1 >> + shift > > Why all the shifts? Why not just use $1, $2, $3,...? My > guess is that you want to be able to insert a new parameter > in the middle in the future without changing the later > numbers, but that seems unlikely, and we could just add > the parameter at the end. It's just crappy RFC-quality code. I probably copied some other function and went with it. No good reason. Yeah it's ugly. 
>> + hash=$(git -C "$dir" rev-parse "$rev") && >> + printf "$fmt\n" "$hash" "$entry" >expected && >> + if grep "$entry" "$file" >line >> + then >> + test_cmp expected line >> + else >> + cat cache && >> + false >> + fi >> +} >> + >> test_expect_success 'sparse-index contents' ' >> init_repos && >> >> - test-tool -C sparse-index read-cache --table >cache && >> + git -C sparse-index ls-files --sparse >cache && >> for dir in folder1 folder2 x >> do >> - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && >> - grep "040000 tree $TREE $dir/" cache \ >> - || return 1 >> + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 > > I see how this uses only one line, but it seems like the > test_index_entry_like is too generic to make it not a > complicated mess of format strings that need to copy > over and over again. > > Perhaps instead it could be a "test_entry_is_tree" > and it only passes "$dir" and "cache"? Then we could drop the loop and > just have > > test_entry_is_tree cache folder1 && > test_entry_is_tree cache folder2 && > test_entry_is_tree cache x && > > or we could still use the loop, especially when we test for four trees. Yeah that sounds good. Personally I don't mind 4x similar lines copy/pasted over a for-loop in the tests. You don't need to worry about the || return doing the right thing, and just setting up the for-loop is already 3 lines... >> - test-tool -C sparse-index read-cache --table >cache && >> + git -C sparse-index ls-files --sparse >cache && >> for dir in deep/deeper2 folder1 folder2 x >> do >> - TREE=$(git -C sparse-index rev-parse HEAD:$dir) && >> - grep "040000 tree $TREE $dir/" cache \ >> - || return 1 >> + test_index_entry_like sparse-index "040000 %s 0\t%s" "HEAD:$dir" "$dir/" cache || return 1 >> done && >> >> + grep 040000 cache >lines && >> + test_line_count = 4 lines && >> + > > The point here is to check that no other entries are trees? 
We know > that this number will be _at least_ 4 based on the loop above. It's exactly 4 because we have 4 folders we're checking. But you tell me. I was just trying to refactor this dependence on the ls-tree format while moving it over to ls-files without spending too much time on understanding all the specifics. ^ permalink raw reply [flat|nested] 203+ messages in thread
* [RFC/PATCH 4/5] test-tool read-cache: --table is redundant to ls-files 2021-03-16 16:42 ` [PATCH v3 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget ` (3 preceding siblings ...) 2021-03-17 13:28 ` [RFC/PATCH 3/5] ls-files: add and use a new --sparse option Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 ` Ævar Arnfjörð Bjarmason 2021-03-17 13:28 ` [RFC/PATCH 5/5] test-tool: split up test-tool read-cache Ævar Arnfjörð Bjarmason 5 siblings, 0 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, dstolee Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- t/helper/test-read-cache.c | 43 -------------------------------------- 1 file changed, 43 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index b52c174acc..2499999af3 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -1,54 +1,16 @@ #include "test-tool.h" #include "cache.h" #include "config.h" -#include "blob.h" -#include "commit.h" -#include "tree.h" -#include "sparse-index.h" - -static void print_cache_entry(struct cache_entry *ce) -{ - const char *type; - printf("%06o ", ce->ce_mode & 0177777); - - if (S_ISSPARSEDIR(ce->ce_mode)) - type = tree_type; - else if (S_ISGITLINK(ce->ce_mode)) - type = commit_type; - else - type = blob_type; - - printf("%s %s\t%s\n", - type, - oid_to_hex(&ce->oid), - ce->name); -} - -static void print_cache(struct index_state *istate) -{ - int i; - for (i = 0; i < istate->cache_nr; i++) - print_cache_entry(istate->cache[i]); -} int cmd__read_cache(int argc, const char **argv) { struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; - int table = 0, expand = 0; - - initialize_the_repository(); - prepare_repo_settings(r); - r->settings.command_requires_full_index = 0; for (++argv, 
--argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) continue; - if (!strcmp(*argv, "--table")) - table = 1; - else if (!strcmp(*argv, "--expand")) - expand = 1; } if (argc == 1) @@ -59,9 +21,6 @@ int cmd__read_cache(int argc, const char **argv) for (i = 0; i < cnt; i++) { repo_read_index(r); - if (expand) - ensure_full_index(r->index); - if (name) { int pos; @@ -74,8 +33,6 @@ int cmd__read_cache(int argc, const char **argv) ce_uptodate(r->index->cache[pos]) ? "" : " not"); write_file(name, "%d\n", i); } - if (table) - print_cache(r->index); discard_index(r->index); } return 0; -- 2.31.0.260.g719c683c1d
* [RFC/PATCH 5/5] test-tool: split up test-tool read-cache 2021-03-16 16:42 ` [PATCH v3 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget ` (4 preceding siblings ...) 2021-03-17 13:28 ` [RFC/PATCH 4/5] test-tool read-cache: --table is redundant to ls-files Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 ` Ævar Arnfjörð Bjarmason 2021-03-17 13:32 ` Ævar Arnfjörð Bjarmason 5 siblings, 1 reply; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:28 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, dstolee Since the "test-tool read-cache" was originally added back in 1ecb5ff141 (read-cache: add simple performance test, 2013-06-09) it's been growing all sorts of bells and whistles that aren't very conducive to performance testing the index, e.g. it learned how to read config. Let's split what remains of the "test-tool read-cache" into the two narrow use-cases it's used for. 
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Makefile | 3 ++- t/helper/test-read-cache-again.c | 31 +++++++++++++++++++++++++ t/helper/test-read-cache-perf.c | 21 +++++++++++++++++ t/helper/test-read-cache.c | 39 -------------------------------- t/helper/test-tool.c | 3 ++- t/helper/test-tool.h | 3 ++- t/perf/p0002-read-cache.sh | 2 +- t/t7519-status-fsmonitor.sh | 2 +- 8 files changed, 60 insertions(+), 44 deletions(-) create mode 100644 t/helper/test-read-cache-again.c create mode 100644 t/helper/test-read-cache-perf.c delete mode 100644 t/helper/test-read-cache.c diff --git a/Makefile b/Makefile index 89b1d53741..a1bbb818d9 100644 --- a/Makefile +++ b/Makefile @@ -724,7 +724,8 @@ TEST_BUILTINS_OBJS += test-prio-queue.o TEST_BUILTINS_OBJS += test-proc-receive.o TEST_BUILTINS_OBJS += test-progress.o TEST_BUILTINS_OBJS += test-reach.o -TEST_BUILTINS_OBJS += test-read-cache.o +TEST_BUILTINS_OBJS += test-read-cache-again.o +TEST_BUILTINS_OBJS += test-read-cache-perf.o TEST_BUILTINS_OBJS += test-read-graph.o TEST_BUILTINS_OBJS += test-read-midx.o TEST_BUILTINS_OBJS += test-ref-store.o diff --git a/t/helper/test-read-cache-again.c b/t/helper/test-read-cache-again.c new file mode 100644 index 0000000000..5e20ca1c8f --- /dev/null +++ b/t/helper/test-read-cache-again.c @@ -0,0 +1,31 @@ +#include "test-tool.h" +#include "cache.h" + +int cmd__read_cache_again(int argc, const char **argv) +{ + struct repository *r = the_repository; + int cnt; + const char *name; + + if (argc != 2) + die("usage: test-tool read-cache-again <count> <file>"); + + cnt = strtol(argv[0], NULL, 0); + name = argv[2]; + + setup_git_directory(); + while (cnt--) { + int pos; + repo_read_index(r); + refresh_index(r->index, REFRESH_QUIET, + NULL, NULL, NULL); + pos = index_name_pos(r->index, name, strlen(name)); + if (pos < 0) + die("%s not in index", name); + printf("%s is%s up to date\n", name, + ce_uptodate(r->index->cache[pos]) ? 
"" : " not"); + write_file(name, "%d\n", cnt); + discard_index(r->index); + } + return 0; +} diff --git a/t/helper/test-read-cache-perf.c b/t/helper/test-read-cache-perf.c new file mode 100644 index 0000000000..ac9c297efa --- /dev/null +++ b/t/helper/test-read-cache-perf.c @@ -0,0 +1,21 @@ +#include "test-tool.h" +#include "cache.h" + +int cmd__read_cache_perf(int argc, const char **argv) +{ + struct repository *r = the_repository; + int cnt = 1000; + + if (argc == 1) + cnt = strtol(argv[0], NULL, 0); + else if (argc) + die("usage: test-tool read-cache-perf [<count>]"); + + setup_git_directory(); + while (cnt--) { + repo_read_index(r); + discard_index(r->index); + } + + return 0; +} diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c deleted file mode 100644 index 2499999af3..0000000000 --- a/t/helper/test-read-cache.c +++ /dev/null @@ -1,39 +0,0 @@ -#include "test-tool.h" -#include "cache.h" -#include "config.h" - -int cmd__read_cache(int argc, const char **argv) -{ - struct repository *r = the_repository; - int i, cnt = 1; - const char *name = NULL; - - for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { - if (skip_prefix(*argv, "--print-and-refresh=", &name)) - continue; - } - - if (argc == 1) - cnt = strtol(argv[0], NULL, 0); - setup_git_directory(); - git_config(git_default_config, NULL); - - for (i = 0; i < cnt; i++) { - repo_read_index(r); - - if (name) { - int pos; - - refresh_index(r->index, REFRESH_QUIET, - NULL, NULL, NULL); - pos = index_name_pos(r->index, name, strlen(name)); - if (pos < 0) - die("%s not in index", name); - printf("%s is%s up to date\n", name, - ce_uptodate(r->index->cache[pos]) ? 
"" : " not"); - write_file(name, "%d\n", i); - } - discard_index(r->index); - } - return 0; -} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index f97cd9f48a..1334fa25ba 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -52,7 +52,8 @@ static struct test_cmd cmds[] = { { "proc-receive", cmd__proc_receive}, { "progress", cmd__progress }, { "reach", cmd__reach }, - { "read-cache", cmd__read_cache }, + { "read-cache-again", cmd__read_cache_again }, + { "read-cache-perf", cmd__read_cache_perf }, { "read-graph", cmd__read_graph }, { "read-midx", cmd__read_midx }, { "ref-store", cmd__ref_store }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 28072c0ad5..d70cde8574 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -41,7 +41,8 @@ int cmd__prio_queue(int argc, const char **argv); int cmd__proc_receive(int argc, const char **argv); int cmd__progress(int argc, const char **argv); int cmd__reach(int argc, const char **argv); -int cmd__read_cache(int argc, const char **argv); +int cmd__read_cache_again(int argc, const char **argv); +int cmd__read_cache_perf(int argc, const char **argv); int cmd__read_graph(int argc, const char **argv); int cmd__read_midx(int argc, const char **argv); int cmd__ref_store(int argc, const char **argv); diff --git a/t/perf/p0002-read-cache.sh b/t/perf/p0002-read-cache.sh index cdd105a594..d0ba5173fb 100755 --- a/t/perf/p0002-read-cache.sh +++ b/t/perf/p0002-read-cache.sh @@ -8,7 +8,7 @@ test_perf_default_repo count=1000 test_perf "read_cache/discard_cache $count times" " - test-tool read-cache $count + test-tool read-cache-perf $count " test_done diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh index 45d025f960..3761a8781d 100755 --- a/t/t7519-status-fsmonitor.sh +++ b/t/t7519-status-fsmonitor.sh @@ -359,7 +359,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' test_expect_success 'discard_index() also discards fsmonitor info' ' 
test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" && test_might_fail git update-index --refresh && - test-tool read-cache --print-and-refresh=tracked 2 >actual && + test-tool read-cache-again 2 tracked >actual && printf "tracked is%s up to date\n" "" " not" >expect && test_cmp expect actual ' -- 2.31.0.260.g719c683c1d
* Re: [RFC/PATCH 5/5] test-tool: split up test-tool read-cache From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:32 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, dstolee On Wed, Mar 17 2021, Ævar Arnfjörð Bjarmason wrote: > + if (argc != 2) > + die("usage: test-tool read-cache-again <count> <file>"); > + > + cnt = strtol(argv[0], NULL, 0); > + name = argv[2]; This is needed on top, the perils of sending out ad-hoc RFC patches from the working tree..: diff --git a/t/helper/test-read-cache-again.c b/t/helper/test-read-cache-again.c index 5e20ca1c8f..aa97b3aaf3 100644 --- a/t/helper/test-read-cache-again.c +++ b/t/helper/test-read-cache-again.c @@ -7,10 +7,9 @@ int cmd__read_cache_again(int argc, const char **argv) int cnt; const char *name; - if (argc != 2) + if (argc != 3) die("usage: test-tool read-cache-again <count> <file>"); - - cnt = strtol(argv[0], NULL, 0); + cnt = strtol(argv[1], NULL, 0); name = argv[2]; setup_git_directory();
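Ævar's fix implies that `argv[0]` is the subcommand name when a `cmd__*()` helper runs. So `<count>` and `<file>` sit at `argv[1]` and `argv[2]`, and the expected `argc` is 3. The sketch below isolates that check. `parse_read_cache_again_args()` is a hypothetical name. Unlike the patch, which calls `strtol()` unchecked, this version also rejects a non-numeric or negative count.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical standalone version of the usage check for
 * "test-tool read-cache-again <count> <file>". As in the fixed
 * patch, argv[0] is assumed to be the subcommand name.
 * Returns 0 and fills *cnt and *name on success, -1 on bad usage.
 */
static int parse_read_cache_again_args(int argc, const char **argv,
				       long *cnt, const char **name)
{
	char *end;

	assert(argv && cnt && name);
	if (argc != 3)
		return -1;

	errno = 0;
	*cnt = strtol(argv[1], &end, 0);
	/* Reject empty, partially numeric, overflowing or negative counts. */
	if (errno || end == argv[1] || *end || *cnt < 0)
		return -1;

	*name = argv[2];
	return 0;
}
```

The same convention would seem to affect `cmd__read_cache_perf()`. Its `argc == 1` / `argv[0]` test would appear to need the same off-by-one adjustment, because `test-tool read-cache-perf $count` would reach it with `argc == 2`.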
* [PATCH v3 08/20] test-tool: don't force full index 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (6 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget ` (15 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We will use 'test-tool read-cache --table' to check that a sparse index is written as part of init_repos. Since we will no longer always expand a sparse index into a full index, add an '--expand' parameter that adds a call to ensure_full_index() so we can compare a sparse index directly against a full index, or at least what the in-memory index looks like when expanded in this way. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/helper/test-read-cache.c | 13 ++++++++++++- t/t1092-sparse-checkout-compatibility.sh | 5 +++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 6cfd8f2de71c..b52c174acc7a 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -4,6 +4,7 @@ #include "blob.h" #include "commit.h" #include "tree.h" +#include "sparse-index.h" static void print_cache_entry(struct cache_entry *ce) { @@ -35,13 +36,19 @@ int cmd__read_cache(int argc, const char **argv) struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; - int table = 0; + int table = 0, expand = 0; + + initialize_the_repository(); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) continue; if (!strcmp(*argv, "--table")) table = 1; + else if (!strcmp(*argv, "--expand")) + expand = 1; } if (argc == 1) @@ -51,6 +58,10 @@ int cmd__read_cache(int argc, const char **argv) for (i = 0; i < cnt; i++) { repo_read_index(r); + + if (expand) + ensure_full_index(r->index); + if (name) { int pos; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index de5d8461c993..a1aea141c62c 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -130,6 +130,11 @@ test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'expanded in-memory index matches full index' ' + init_repos && + test_sparse_match test-tool read-cache --expand --table +' + test_expect_success 'status with options' ' init_repos && test_all_match git status --porcelain=v2 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
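The option loop in this patch works in three steps:

* It skips `argv[0]`.
* It consumes leading `--` arguments.
* It stops at the first non-option, which is the optional count.

The standalone sketch below reproduces that loop so the parsing can be exercised outside Git. `starts_with()` and `skip_prefix()` are re-implemented locally as stand-ins for Git's helpers of the same names. `struct read_cache_opts` is invented to hold the flags the real function keeps in local variables.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for Git's starts_with(). */
static int starts_with(const char *str, const char *prefix)
{
	return !strncmp(str, prefix, strlen(prefix));
}

/* Local stand-in for Git's skip_prefix(): on match, *out points past prefix. */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/* Hypothetical container for the flags cmd__read_cache() keeps in locals. */
struct read_cache_opts {
	const char *name;	/* --print-and-refresh=<name> */
	int table;		/* --table */
	int expand;		/* --expand */
};

/*
 * Parse leading "--" options like the patched loop. argv must be
 * NULL-terminated; argv[0] (the subcommand) is skipped.
 * Returns the number of arguments left after the options.
 */
static int parse_read_cache_opts(int argc, const char **argv,
				 struct read_cache_opts *opts)
{
	assert(argv && opts);
	memset(opts, 0, sizeof(*opts));

	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
		if (skip_prefix(*argv, "--print-and-refresh=", &opts->name))
			continue;
		if (!strcmp(*argv, "--table"))
			opts->table = 1;
		else if (!strcmp(*argv, "--expand"))
			opts->expand = 1;
	}
	return argc;
}
```

As in the patch, an unrecognized `--` option is skipped silently rather than rejected.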
* [PATCH v3 09/20] unpack-trees: ensure full index 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (7 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget ` (14 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The next change will translate full indexes into sparse indexes at write time. The existing logic provides a way for every sparse index to be expanded to a full index at read time. However, there are cases where an index is written and then continues to be used in-memory to perform further updates. unpack_trees() is frequently called after such a write. In particular, commands like 'git reset' do this double-update of the index. Ensure that we have a full index when entering unpack_trees(), but only when command_requires_full_index is true. This is always true at the moment, but we will later relax that after unpack_trees() is updated to handle sparse directory entries. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/unpack-trees.c b/unpack-trees.c index eb8fcda31ba7..2da3e5ec77a1 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1570,6 +1570,7 @@ static int verify_absent(const struct cache_entry *, */ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o) { + struct repository *repo = the_repository; int i, ret; static struct cache_entry *dfc; struct pattern_list pl; @@ -1581,6 +1582,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options trace_performance_enter(); trace2_region_enter("unpack_trees", "unpack_trees", the_repository); + prepare_repo_settings(repo); + if (repo->settings.command_requires_full_index) { + ensure_full_index(o->src_index); + ensure_full_index(o->dst_index); + } + if (!core_apply_sparse_checkout || !o->update) o->skip_sparse_checkout = 1; if (!o->skip_sparse_checkout && !o->pl) { -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
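The guard added to `unpack_trees()` can be modelled with a toy index that only records whether it is sparse. This sketch uses hypothetical `toy_*` types rather than Git's `struct repository` and `struct index_state`. Expansion happens only when `command_requires_full_index` is set. Expanding an already-full index is a no-op in the toy version, so passing the same index as both source and destination is harmless.

```c
#include <assert.h>

/* Toy stand-ins; the real structures live in repository.h and cache.h. */
struct toy_settings {
	int command_requires_full_index;
};

struct toy_index {
	int sparse_index;	/* 1 if sparse directory entries are present */
	int expansions;		/* number of expansions, for illustration */
};

/* Toy ensure_full_index(): does nothing if the index is already full. */
static void toy_ensure_full_index(struct toy_index *istate)
{
	if (!istate->sparse_index)
		return;
	istate->sparse_index = 0;
	istate->expansions++;
}

/*
 * Mirrors the guard at the top of unpack_trees(): expand both the
 * source and destination index only when the command has not been
 * marked as able to handle a sparse index.
 */
static void toy_unpack_trees_prologue(const struct toy_settings *s,
				      struct toy_index *src,
				      struct toy_index *dst)
{
	assert(s && src && dst);
	if (s->command_requires_full_index) {
		toy_ensure_full_index(src);
		toy_ensure_full_index(dst);
	}
}
```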
* [PATCH v3 10/20] sparse-checkout: hold pattern list in index 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (8 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 11/20] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget ` (13 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> As we modify the sparse-checkout definition, we perform index operations on a pattern_list that only exists in-memory. This allows easy backing out in case the index update fails. However, if the index write itself cares about the sparse-checkout pattern set, we need access to that in-memory copy. Place a pointer to a 'struct pattern_list' in the index so we can access this on-demand. This will be used in the next change which uses the sparse-checkout definition to filter out directories that are outside the sparse cone. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/sparse-checkout.c | 17 ++++++++++------- cache.h | 2 ++ 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index 2306a9ad98e0..e00b82af727b 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl) if (is_index_unborn(r->index)) return UPDATE_SPARSITY_SUCCESS; + r->index->sparse_checkout_patterns = pl; + memset(&o, 0, sizeof(o)); o.verbose_update = isatty(2); o.update = 1; @@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl) else rollback_lock_file(&lock_file); + r->index->sparse_checkout_patterns = NULL; return result; } @@ -517,19 +520,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) { int result; int changed_config = 0; - struct pattern_list pl; - memset(&pl, 0, sizeof(pl)); + struct pattern_list *pl = xcalloc(1, sizeof(*pl)); switch (m) { case ADD: if (core_sparse_checkout_cone) - add_patterns_cone_mode(argc, argv, &pl); + add_patterns_cone_mode(argc, argv, pl); else - add_patterns_literal(argc, argv, &pl); + add_patterns_literal(argc, argv, pl); break; case REPLACE: - add_patterns_from_input(&pl, argc, argv); + add_patterns_from_input(pl, argc, argv); break; } @@ -539,12 +541,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) changed_config = 1; } - result = write_patterns_and_update(&pl); + result = write_patterns_and_update(pl); if (result && changed_config) set_config(MODE_NO_PATTERNS); - clear_pattern_list(&pl); + clear_pattern_list(pl); + free(pl); return result; } diff --git a/cache.h b/cache.h index abb00a068e5d..759ca92e2ecc 100644 --- a/cache.h +++ b/cache.h @@ -307,6 +307,7 @@ static inline unsigned int canon_mode(unsigned int mode) struct split_index; struct untracked_cache; struct progress; +struct pattern_list; struct 
index_state { struct cache_entry **cache; @@ -338,6 +339,7 @@ struct index_state { struct mem_pool *ce_mem_pool; struct progress *progress; struct repository *repo; + struct pattern_list *sparse_checkout_patterns; }; /* Name hashing */ -- gitgitgadget
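In this patch the index only borrows the pattern list. `update_working_directory()` attaches it before doing the index work and resets the pointer to NULL before returning. A toy version of that attach/use/detach pattern is sketched below. The `toy_*` types are illustrative assumptions, not Git code, and so is the cone-mode check inside `toy_write_index()`.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pattern_list {
	int use_cone_patterns;
};

struct toy_index {
	/* Borrowed, not owned: only valid while the caller attaches it. */
	struct toy_pattern_list *sparse_checkout_patterns;
};

/* Hypothetical index write that consults the in-flight patterns. */
static int toy_write_index(const struct toy_index *istate)
{
	const struct toy_pattern_list *pl = istate->sparse_checkout_patterns;

	/* Report a sparse write only when cone-mode patterns are attached. */
	return pl && pl->use_cone_patterns;
}

/*
 * Mirrors update_working_directory(): attach the in-memory pattern
 * list, do the work, then detach it so the index never keeps a
 * pointer to memory the caller is about to free.
 */
static int toy_update_working_directory(struct toy_index *istate,
					struct toy_pattern_list *pl)
{
	int wrote_sparse;

	assert(istate && pl);
	istate->sparse_checkout_patterns = pl;
	wrote_sparse = toy_write_index(istate);
	istate->sparse_checkout_patterns = NULL;
	return wrote_sparse;
}
```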
* [PATCH v3 11/20] sparse-index: convert from full to sparse 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (9 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-17 13:43 ` Ævar Arnfjörð Bjarmason 2021-03-16 16:42 ` [PATCH v3 12/20] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget ` (12 subsequent siblings) 23 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. 
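The whole-index preconditions and the per-directory test described above can be sketched as two small predicates. This is a simplified model, not the code in sparse-index.c:

* The structs, field names and the `TOY_SKIP_WORKTREE` value are hypothetical.
* The cone-mode check is folded into a flag, whereas the patch reads it from the pattern list.
* The `path_matches_pattern_list()` test that decides whether a directory lies outside the cone is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy flag mirroring CE_SKIP_WORKTREE; the value here is arbitrary. */
#define TOY_SKIP_WORKTREE 0x1

struct toy_entry {
	const char *name;
	int stage;		/* 0 = merged; 1..3 = unmerged stages */
	unsigned flags;
};

struct toy_state {
	int split_index;
	int sparse_index;
	int apply_sparse_checkout;	/* sparse-checkout enabled */
	int cone_mode;			/* cone-mode patterns in use */
	int test_env_enabled;		/* GIT_TEST_SPARSE_INDEX=1 */
};

/* Whole-index preconditions listed in the commit message. */
static int toy_may_convert(const struct toy_state *s)
{
	assert(s);
	return !s->split_index && !s->sparse_index &&
	       s->apply_sparse_checkout && s->cone_mode &&
	       s->test_env_enabled;
}

/*
 * A directory outside the cone may collapse into one sparse directory
 * entry only if every entry in entries[start..end) is merged (stage 0)
 * and marked skip-worktree; otherwise the caller recurses into its
 * subdirectories instead.
 */
static int toy_can_collapse(const struct toy_entry *entries,
			    size_t start, size_t end)
{
	size_t i;

	for (i = start; i < end; i++)
		if (entries[i].stage || !(entries[i].flags & TOY_SKIP_WORKTREE))
			return 0;
	return 1;
}
```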
This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compatibility.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index.
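The write path described above reduces to remembering one bit across the write. It converts the index to sparse, writes it, then re-expands it if the caller started with a full index. In the toy sketch below the conversion always succeeds. The real `convert_to_sparse()` may leave the index full or fail, and in the failure case `do_write_locked_index()` returns early with a warning. All `toy_*` names are hypothetical.

```c
#include <assert.h>

struct toy_index {
	int sparse_index;	/* current in-memory form */
	int writes;		/* number of (pretend) on-disk writes */
	int written_sparse;	/* was the last write in sparse form? */
};

static void toy_convert_to_sparse(struct toy_index *istate)
{
	istate->sparse_index = 1;
}

static void toy_ensure_full_index(struct toy_index *istate)
{
	istate->sparse_index = 0;
}

/*
 * Mirrors the do_write_locked_index() change: remember whether the
 * caller had a full in-memory index, write the sparse form, then
 * expand again so code that keeps reading the index after the write
 * (as 'git add' does when reporting) still sees every entry.
 */
static void toy_write_index(struct toy_index *istate)
{
	int was_full;

	assert(istate);
	was_full = !istate->sparse_index;

	toy_convert_to_sparse(istate);
	istate->writes++;
	istate->written_sparse = istate->sparse_index;

	if (was_full)
		toy_ensure_full_index(istate);
}
```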
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 3 + cache.h | 2 + read-cache.c | 26 ++++- sparse-index.c | 139 +++++++++++++++++++++++ sparse-index.h | 1 + t/t1092-sparse-checkout-compatibility.sh | 61 +++++++++- 6 files changed, 228 insertions(+), 4 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 2fb483d3c083..5f07a39e501e 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -6,6 +6,7 @@ #include "object-store.h" #include "replace-object.h" #include "promisor-remote.h" +#include "sparse-index.h" #ifndef DEBUG_CACHE_TREE #define DEBUG_CACHE_TREE 0 @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags) if (i) return i; + ensure_full_index(istate); + if (!istate->cache_tree) istate->cache_tree = cache_tree(); diff --git a/cache.h b/cache.h index 759ca92e2ecc..69a32146cd77 100644 --- a/cache.h +++ b/cache.h @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) { if (S_ISLNK(mode)) return S_IFLNK; + if (mode == S_IFDIR) + return S_IFDIR; if (S_ISDIR(mode) || S_ISGITLINK(mode)) return S_IFGITLINK; return S_IFREG | ce_permissions(mode); diff --git a/read-cache.c b/read-cache.c index dd3980c12b53..b9c08773466c 100644 --- a/read-cache.c +++ b/read-cache.c @@ -25,6 +25,7 @@ #include "fsmonitor.h" #include "thread-utils.h" #include "progress.h" +#include "sparse-index.h" /* Mask for the name length in ce_flags in the on-disk index */ @@ -1002,8 +1003,14 @@ int verify_path(const char *path, unsigned mode) c = *path++; if ((c == '.' && !verify_dotfile(path, mode)) || - is_dir_sep(c) || c == '\0') + is_dir_sep(c)) return 0; + /* + * allow terminating directory separators for + * sparse directory entries. 
+ */ + if (c == '\0') + return S_ISDIR(mode); } else if (c == '\\' && protect_ntfs) { if (is_ntfs_dotgit(path)) return 0; @@ -3079,6 +3086,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l unsigned flags) { int ret; + int was_full = !istate->sparse_index; + + ret = convert_to_sparse(istate); + + if (ret) { + warning(_("failed to convert to a sparse-index")); + return ret; + } /* * TODO trace2: replace "the_repository" with the actual repo instance @@ -3090,6 +3105,9 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l trace2_region_leave_printf("index", "do_write_index", the_repository, "%s", get_lock_file_path(lock)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; if (flags & COMMIT_LOCK) @@ -3180,9 +3198,10 @@ static int write_shared_index(struct index_state *istate, struct tempfile **temp) { struct split_index *si = istate->split_index; - int ret; + int ret, was_full = !istate->sparse_index; move_cache_to_base_index(istate); + convert_to_sparse(istate); trace2_region_enter_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); @@ -3190,6 +3209,9 @@ static int write_shared_index(struct index_state *istate, trace2_region_leave_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; ret = adjust_shared_perm(get_tempfile_path(*temp)); diff --git a/sparse-index.c b/sparse-index.c index 7095378a1b28..619ff7c2e217 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -4,6 +4,145 @@ #include "tree.h" #include "pathspec.h" #include "trace2.h" +#include "cache-tree.h" +#include "config.h" +#include "dir.h" +#include "fsmonitor.h" + +static struct cache_entry *construct_sparse_dir_entry( + struct index_state *istate, + const char *sparse_dir, + struct cache_tree *tree) +{ + struct cache_entry *de; + + de = make_cache_entry(istate, S_IFDIR, &tree->oid, 
sparse_dir, 0, 0); + + de->ce_flags |= CE_SKIP_WORKTREE; + return de; +} + +/* + * Returns the number of entries "inserted" into the index. + */ +static int convert_to_sparse_rec(struct index_state *istate, + int num_converted, + int start, int end, + const char *ct_path, size_t ct_pathlen, + struct cache_tree *ct) +{ + int i, can_convert = 1; + int start_converted = num_converted; + enum pattern_match_result match; + int dtype; + struct strbuf child_path = STRBUF_INIT; + struct pattern_list *pl = istate->sparse_checkout_patterns; + + /* + * Is the current path outside of the sparse cone? + * Then check if the region can be replaced by a sparse + * directory entry (everything is sparse and merged). + */ + match = path_matches_pattern_list(ct_path, ct_pathlen, + NULL, &dtype, pl, istate); + if (match != NOT_MATCHED) + can_convert = 0; + + for (i = start; can_convert && i < end; i++) { + struct cache_entry *ce = istate->cache[i]; + + if (ce_stage(ce) || + !(ce->ce_flags & CE_SKIP_WORKTREE)) + can_convert = 0; + } + + if (can_convert) { + struct cache_entry *se; + se = construct_sparse_dir_entry(istate, ct_path, ct); + + istate->cache[num_converted++] = se; + return 1; + } + + for (i = start; i < end; ) { + int count, span, pos = -1; + const char *base, *slash; + struct cache_entry *ce = istate->cache[i]; + + /* + * Detect if this is a normal entry outside of any subtree + * entry. 
+ */ + base = ce->name + ct_pathlen; + slash = strchr(base, '/'); + + if (slash) + pos = cache_tree_subtree_pos(ct, base, slash - base); + + if (pos < 0) { + istate->cache[num_converted++] = ce; + i++; + continue; + } + + strbuf_setlen(&child_path, 0); + strbuf_add(&child_path, ce->name, slash - ce->name + 1); + + span = ct->down[pos]->cache_tree->entry_count; + count = convert_to_sparse_rec(istate, + num_converted, i, i + span, + child_path.buf, child_path.len, + ct->down[pos]->cache_tree); + num_converted += count; + i += span; + } + + strbuf_release(&child_path); + return num_converted - start_converted; +} + +int convert_to_sparse(struct index_state *istate) +{ + if (istate->split_index || istate->sparse_index || + !core_apply_sparse_checkout || !core_sparse_checkout_cone) + return 0; + + /* + * For now, only create a sparse index with the + * GIT_TEST_SPARSE_INDEX environment variable. We will relax + * this once we have a proper way to opt-in (and later still, + * opt-out). + */ + if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + return 0; + + if (!istate->sparse_checkout_patterns) { + istate->sparse_checkout_patterns = xcalloc(1, sizeof(struct pattern_list)); + if (get_sparse_checkout_patterns(istate->sparse_checkout_patterns) < 0) + return 0; + } + + if (!istate->sparse_checkout_patterns->use_cone_patterns) { + warning(_("attempting to use sparse-index without cone mode")); + return -1; + } + + if (cache_tree_update(istate, 0)) { + warning(_("unable to update cache-tree, staying full")); + return -1; + } + + remove_fsmonitor(istate); + + trace2_region_enter("index", "convert_to_sparse", istate->repo); + istate->cache_nr = convert_to_sparse_rec(istate, + 0, 0, istate->cache_nr, + "", 0, istate->cache_tree); + istate->drop_cache_tree = 1; + istate->sparse_index = 1; + trace2_region_leave("index", "convert_to_sparse", istate->repo); + return 0; +} static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) { diff --git 
a/sparse-index.h b/sparse-index.h index 09a20d036c46..64380e121d80 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -3,5 +3,6 @@ struct index_state; void ensure_full_index(struct index_state *istate); +int convert_to_sparse(struct index_state *istate); #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index a1aea141c62c..1e888d195122 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -2,6 +2,11 @@ test_description='compare full workdir to sparse workdir' +# The verify_cache_tree() check is not sparse-aware (yet). +# So, disable the check until that integration is complete. +GIT_TEST_CHECK_CACHE_TREE=0 +GIT_TEST_SPLIT_INDEX=0 + . ./test-lib.sh test_expect_success 'setup' ' @@ -121,7 +126,9 @@ run_on_all () { test_all_match () { run_on_all "$@" && test_cmp full-checkout-out sparse-checkout-out && - test_cmp full-checkout-err sparse-checkout-err + test_cmp full-checkout-out sparse-index-out && + test_cmp full-checkout-err sparse-checkout-err && + test_cmp full-checkout-err sparse-index-err } test_sparse_match () { @@ -130,6 +137,38 @@ test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'sparse-index contents' ' + init_repos && + + test-tool -C sparse-index read-cache --table >cache && + for dir in folder1 folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep/deeper2 folder1 folder2 x + do + TREE=$(git 
-C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done +' + test_expect_success 'expanded in-memory index matches full index' ' init_repos && test_sparse_match test-tool read-cache --expand --table @@ -137,6 +176,7 @@ test_expect_success 'expanded in-memory index matches full index' ' test_expect_success 'status with options' ' init_repos && + test_sparse_match ls && test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && @@ -273,6 +313,17 @@ test_expect_failure 'checkout and reset (mixed)' ' test_all_match git reset update-folder2 ' +# Ensure that sparse-index behaves identically to +# sparse-checkout with a full index. +test_expect_success 'checkout and reset (mixed) [sparse]' ' + init_repos && + + test_sparse_match git checkout -b reset-test update-deep && + test_sparse_match git reset deepest && + test_sparse_match git reset update-folder1 && + test_sparse_match git reset update-folder2 +' + test_expect_success 'merge' ' init_repos && @@ -309,14 +360,20 @@ test_expect_success 'clean' ' test_all_match git status --porcelain=v2 && test_all_match git clean -f && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xdf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && - test_path_is_dir sparse-checkout/folder1 + test_sparse_match test_path_is_dir folder1 ' test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
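The collapsing step in convert_to_sparse_rec() can be illustrated with a minimal, single-level sketch. It makes several simplifying assumptions. It uses a hypothetical `struct toy_entry` in place of `struct cache_entry`. It replaces `path_matches_pattern_list()` with a plain list of in-cone top-level directories. It also ignores the cache-tree entirely: the real code recurses through `ct->down[]` and reuses each subtree's OID, which this sketch does not model. A directory span is collapsed only when it lies outside the cone and every entry in it is at stage 0 and marked skip-worktree.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified index entry (not Git's struct cache_entry). */
struct toy_entry {
	const char *name;   /* full path; collapsed directories end in '/' */
	int skip_worktree;  /* analogue of CE_SKIP_WORKTREE */
	int stage;          /* nonzero means a merge conflict */
	int is_sparse_dir;  /* set on collapsed directory entries */
};

/* Length of the top-level "dir/" prefix, or 0 for a file at the root. */
static size_t top_dir_len(const char *name)
{
	const char *slash = strchr(name, '/');
	return slash ? (size_t)(slash - name) + 1 : 0;
}

/* Stand-in for cone-pattern matching: exact top-level "dir/" names. */
static int in_cone(const char *dir, size_t len, const char **cone, int ncone)
{
	for (int i = 0; i < ncone; i++)
		if (strlen(cone[i]) == len && !strncmp(cone[i], dir, len))
			return 1;
	return 0;
}

/*
 * Compacts sorted entries[0..nr) in place and returns the new count.
 * dir_names must have one slot per directory that may be collapsed.
 */
static int collapse_top_level(struct toy_entry *entries, int nr,
			      const char **cone, int ncone,
			      char dir_names[][64])
{
	int out = 0, ndirs = 0;

	for (int i = 0; i < nr; ) {
		size_t len = top_dir_len(entries[i].name);
		int end = i + 1, can_collapse;

		if (!len) {
			/* root-level files are always kept as-is */
			entries[out++] = entries[i++];
			continue;
		}
		/* sorted order keeps everything under "dir/" contiguous */
		while (end < nr && !strncmp(entries[end].name, entries[i].name, len))
			end++;

		can_collapse = !in_cone(entries[i].name, len, cone, ncone);
		for (int j = i; can_collapse && j < end; j++)
			if (entries[j].stage || !entries[j].skip_worktree)
				can_collapse = 0;

		if (can_collapse) {
			assert(len < 64);
			memcpy(dir_names[ndirs], entries[i].name, len);
			dir_names[ndirs][len] = '\0';
			entries[out].name = dir_names[ndirs++];
			entries[out].skip_worktree = 1;
			entries[out].stage = 0;
			entries[out].is_sparse_dir = 1;
			out++;
		} else {
			for (int j = i; j < end; j++)
				entries[out++] = entries[j];
		}
		i = end;
	}
	return out;
}
```

Because `out` never passes `i`, the pass can overwrite the array in place, much as the patch reuses `istate->cache[num_converted++]`. In this sketch, a conflicted entry such as stage 2 under `x/` keeps the whole `x/` span expanded.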
* Re: [PATCH v3 11/20] sparse-index: convert from full to sparse 2021-03-16 16:42 ` [PATCH v3 11/20] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget @ 2021-03-17 13:43 ` Ævar Arnfjörð Bjarmason 2021-03-17 19:55 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:43 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Tue, Mar 16 2021, Derrick Stolee via GitGitGadget wrote: > diff --git a/cache-tree.c b/cache-tree.c > index 2fb483d3c083..5f07a39e501e 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -6,6 +6,7 @@ > #include "object-store.h" > #include "replace-object.h" > #include "promisor-remote.h" > +#include "sparse-index.h" > > #ifndef DEBUG_CACHE_TREE > #define DEBUG_CACHE_TREE 0 > @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags) > if (i) > return i; > > + ensure_full_index(istate); > + > if (!istate->cache_tree) > istate->cache_tree = cache_tree(); > > diff --git a/cache.h b/cache.h > index 759ca92e2ecc..69a32146cd77 100644 > --- a/cache.h > +++ b/cache.h > @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) > { > if (S_ISLNK(mode)) > return S_IFLNK; > + if (mode == S_IFDIR) > + return S_IFDIR; Does this actually need to be mode == S_IFDIR v.s. S_ISDIR(mode)? Those aren't the same thing... > if (S_ISDIR(mode) || S_ISGITLINK(mode)) > return S_IFGITLINK; ...and if it can be S_ISDIR(mode) then this becomes just S_ISGITLINK(mode), but losing the "if" there makes me suspect that some dir == submodule heuristic is being broken somewhere..
* Re: [PATCH v3 11/20] sparse-index: convert from full to sparse 2021-03-17 13:43 ` Ævar Arnfjörð Bjarmason @ 2021-03-17 19:55 ` Derrick Stolee 2021-03-18 13:38 ` Derrick Stolee 0 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee @ 2021-03-17 19:55 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On 3/17/2021 9:43 AM, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Mar 16 2021, Derrick Stolee via GitGitGadget wrote: >> @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) >> { >> if (S_ISLNK(mode)) >> return S_IFLNK; >> + if (mode == S_IFDIR) >> + return S_IFDIR; > > Does this actually need to be mode == S_IFDIR v.s. S_ISDIR(mode)? Those > aren't the same thing... > >> if (S_ISDIR(mode) || S_ISGITLINK(mode)) >> return S_IFGITLINK; > > ...and if it can be S_ISDIR(mode) then this becomes just > S_ISGITLINK(mode), but losing the "if" there makes me suspect that some > dir == submodule heuristic is being broken somewhere.. I have a vague recollection that I did that at one point, and it didn't work. However, using the simpler if (S_ISDIR(mode)) return S_IFDIR; if (S_ISGITLINK(mode)) return S_IFGITLINK; passes all of my tests. Looking at the history of create_ce_mode(), this "||" condition was created in this commit: commit 9eec4795d44439cd170fb52c73827c728252648d Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Mon Apr 9 21:14:58 2007 -0700 Add "S_IFDIRLNK" file mode infrastructure for git links This just adds the basic helper functions to recognize and work with git tree entries that are links to other git repositories ("subprojects"). They still aren't actually connected up to any of the code-paths, but now all the infrastructure is in place. The next commit will start actually adding actual subproject support. 
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> There isn't any justification of why S_ISDIR() is there. Perhaps it was defensive programming? If that is the case, then this simpler logic will work. Thanks, -Stolee
* Re: [PATCH v3 11/20] sparse-index: convert from full to sparse 2021-03-17 19:55 ` Derrick Stolee @ 2021-03-18 13:38 ` Derrick Stolee 0 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-18 13:38 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On 3/17/2021 3:55 PM, Derrick Stolee wrote: > On 3/17/2021 9:43 AM, Ævar Arnfjörð Bjarmason wrote: >> >> On Tue, Mar 16 2021, Derrick Stolee via GitGitGadget wrote: >>> @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) >>> { >>> if (S_ISLNK(mode)) >>> return S_IFLNK; >>> + if (mode == S_IFDIR) >>> + return S_IFDIR; >> >> Does this actually need to be mode == S_IFDIR v.s. S_ISDIR(mode)? Those >> aren't the same thing... >> >>> if (S_ISDIR(mode) || S_ISGITLINK(mode)) >>> return S_IFGITLINK; >> >> ...and if it can be S_ISDIR(mode) then this becomes just >> S_ISGITLINK(mode), but losing the "if" there makes me suspect that some >> dir == submodule heuristic is being broken somewhere.. > > I have a vague recollection that I did that at one point, and > it didn't work. However, using the simpler > > if (S_ISDIR(mode)) > return S_IFDIR; > if (S_ISGITLINK(mode)) > return S_IFGITLINK; > > passes all of my tests. 
I'm not sure why it was passing yesterday (maybe I was in the wrong worktree) but I _do_ get failures, such as this one in t2105: expecting success of 2105.4 'add gitlink to relative .git file': git update-index --add -- sub2 + git update-index --add -- sub2 warning: index entry is a directory, but not sparse (00000000) error: Could not read 50e526bb426771f6036ad3a8b0c81d511d91fc2a BUG: read-cache.c:324: unsupported ce_mode: 40000 Aborted (core dumped) error: last command exited with $?=134 not ok 4 - add gitlink to relative .git file # # git update-index --add -- sub2 # In this case, the mode that is specified is equal to 040775, so we need to use the permission bits outside of __S_IFMT (0170000) to determine if this is a sparse directory or a submodule entry. Submodules will never be sparse, so permissions matter. Sparse directories never actually exist, so permissions don't matter. Playing around with it, I still only see the exact equality as working for me. I can, however, use this format for these if statements: if (S_ISSPARSEDIR(mode)) return S_IFDIR; if (S_ISDIR(mode) || S_ISGITLINK(mode)) return S_IFGITLINK; The S_ISSPARSEDIR macro expands to the exact equality. Now, if we intended to make this work differently, then a change would be required to construct_sparse_dir_entry() in sparse-index.c: static struct cache_entry *construct_sparse_dir_entry( struct index_state *istate, const char *sparse_dir, struct cache_tree *tree) { struct cache_entry *de; de = make_cache_entry(istate, S_IFDIR, &tree->oid, sparse_dir, 0, 0); de->ce_flags |= CE_SKIP_WORKTREE; return de; } For instance, we could at this point assign de->ce_mode to be S_IFDIR directly. It seems like the wrong place to do that to me, but I'm open to suggestions. Thanks, -Stolee
* [PATCH v3 12/20] submodule: sparse-index should not collapse links 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (10 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 11/20] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 13/20] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget ` (11 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> A submodule is stored as a "Git link" that actually points to a commit within a submodule. Submodules are populated or not depending on submodule configuration, not sparse-checkout. To ensure that the sparse-index feature integrates correctly with submodules, we should not collapse a directory if there is a Git link within its range. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- sparse-index.c | 1 + t/t1092-sparse-checkout-compatibility.sh | 17 +++++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/sparse-index.c b/sparse-index.c index 619ff7c2e217..7631f7bd00b7 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -52,6 +52,7 @@ static int convert_to_sparse_rec(struct index_state *istate, struct cache_entry *ce = istate->cache[i]; if (ce_stage(ce) || + S_ISGITLINK(ce->ce_mode) || !(ce->ce_flags & CE_SKIP_WORKTREE)) can_convert = 0; } diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 1e888d195122..cba5f89b1e96 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -376,4 +376,21 @@ test_expect_success 'clean' ' test_sparse_match test_path_is_dir folder1 ' +test_expect_success 'submodule handling' ' + init_repos && + + test_all_match mkdir modules && + test_all_match touch modules/a && + test_all_match git add modules && + test_all_match git commit -m "add modules directory" && + + run_on_all git submodule add "$(pwd)/initial-repo" modules/sub && + test_all_match git commit -m "add submodule" && + + # having a submodule prevents "modules" from collapse + test-tool -C sparse-index read-cache --table >cache && + grep "100644 blob .* modules/a" cache && + grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache +' + test_done -- gitgitgadget
* [PATCH v3 13/20] unpack-trees: allow sparse directories 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (11 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 12/20] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-17 13:35 ` Ævar Arnfjörð Bjarmason 2021-03-16 16:42 ` [PATCH v3 14/20] sparse-index: check index conversion happens Derrick Stolee via GitGitGadget ` (10 subsequent siblings) 23 siblings, 1 reply; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The index_pos_by_traverse_info() currently throws a BUG() when a directory entry exists exactly in the index. We need to consider that it is possible to have a directory in a sparse index as long as that entry is itself marked with the skip-worktree bit. The 'pos' variable is assigned a negative value if an exact match is not found. Since a directory name can be an exact match, it is no longer an error to have a nonnegative 'pos' value. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 2da3e5ec77a1..e81d82d72d89 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -749,9 +749,12 @@ static int index_pos_by_traverse_info(struct name_entry *names, strbuf_make_traverse_path(&name, info, names->path, names->pathlen); strbuf_addch(&name, '/'); pos = index_name_pos(o->src_index, name.buf, name.len); - if (pos >= 0) - BUG("This is a directory and should not exist in index"); - pos = -pos - 1; + if (pos >= 0) { + if (!o->src_index->sparse_index || + !(o->src_index->cache[pos]->ce_flags & CE_SKIP_WORKTREE)) + BUG("This is a directory and should not exist in index"); + } else + pos = -pos - 1; if (pos >= o->src_index->cache_nr || !starts_with(o->src_index->cache[pos]->name, name.buf) || (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name.buf))) -- gitgitgadget
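The sign convention the commit message relies on can be shown with a small binary search. `toy_name_pos()` is a hypothetical stand-in for `index_name_pos()`: it ignores stages and compares names with `strcmp()`. It returns the position of an exact match, or `-(insertion point) - 1` otherwise, which the caller decodes with `pos = -pos - 1`. In a full index, `folder1/` is never present, so the decoded value is the first entry under that directory. In a sparse index, `folder1/` can be an exact hit on the collapsed directory entry.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for index_name_pos(): binary search over sorted
 * names. Returns the index of an exact match, or -(insertion point) - 1.
 */
static int toy_name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/* Decodes the result the way index_pos_by_traverse_info() does after this patch. */
static int first_pos_for_dir(const char **names, int nr, const char *dir)
{
	int pos = toy_name_pos(names, nr, dir);

	if (pos < 0)
		pos = -pos - 1; /* not present: first entry sorting after "dir/" */
	/* else: exact hit, which the patch allows only for a skip-worktree sparse entry */
	return pos;
}
```

Either way, the caller ends up at the first index entry that belongs to the directory. That is why the patch only needs to skip the decoding step, not change the `starts_with()` checks that follow.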
* Re: [PATCH v3 13/20] unpack-trees: allow sparse directories 2021-03-16 16:42 ` [PATCH v3 13/20] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget @ 2021-03-17 13:35 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-17 13:35 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Tue, Mar 16 2021, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> > > The index_pos_by_traverse_info() currently throws a BUG() when a > directory entry exists exactly in the index. We need to consider that it > is possible to have a directory in a sparse index as long as that entry > is itself marked with the skip-worktree bit. > > The 'pos' variable is assigned a negative value if an exact match is not > found. Since a directory name can be an exact match, it is no longer an > error to have a nonnegative 'pos' value. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > unpack-trees.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/unpack-trees.c b/unpack-trees.c > index 2da3e5ec77a1..e81d82d72d89 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -749,9 +749,12 @@ static int index_pos_by_traverse_info(struct name_entry *names, > strbuf_make_traverse_path(&name, info, names->path, names->pathlen); > strbuf_addch(&name, '/'); > pos = index_name_pos(o->src_index, name.buf, name.len); > - if (pos >= 0) > - BUG("This is a directory and should not exist in index"); > - pos = -pos - 1; > + if (pos >= 0) { > + if (!o->src_index->sparse_index || > + !(o->src_index->cache[pos]->ce_flags & CE_SKIP_WORKTREE)) > + BUG("This is a directory and should not exist in index"); > + } else > + pos = -pos - 1; Style nit: add {}'s to the "else" once the "if" gets one. 
> if (pos >= o->src_index->cache_nr || > !starts_with(o->src_index->cache[pos]->name, name.buf) || > (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name.buf)))
* [PATCH v3 14/20] sparse-index: check index conversion happens 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (12 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 13/20] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 15/20] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget ` (9 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Add a test case that uses test_region to ensure that we are truly expanding a sparse index to a full one, then converting back to sparse when writing the index. As we integrate more Git commands with the sparse index, we will convert these commands to check that we do _not_ convert the sparse index to a full index and instead stay sparse the entire time. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/t1092-sparse-checkout-compatibility.sh | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index cba5f89b1e96..47f983217852 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -393,4 +393,22 @@ test_expect_success 'submodule handling' ' grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache ' +test_expect_success 'sparse-index is expanded and converted back' ' + init_repos && + + ( + GIT_TEST_SPARSE_INDEX=1 && + export GIT_TEST_SPARSE_INDEX && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt + ) +' + test_done -- gitgitgadget
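`test_region` inspects the GIT_TRACE2_EVENT output for region markers. A rough C equivalent of that kind of check is sketched below. The helper name is hypothetical. The sketch assumes each event is a single JSON line that contains `"event":"region_enter"` along with `"category"` and `"label"` keys, and it matches those by substring rather than by parsing the JSON.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if one trace2 event line records entering
 * the given region. Substring matching is a simplification; a robust
 * checker would parse the JSON object instead.
 */
static int line_enters_region(const char *line, const char *category,
			      const char *label)
{
	char cat_key[128], label_key[128];

	if (!strstr(line, "\"event\":\"region_enter\""))
		return 0;
	/* include the closing quote so "convert_to_sparse" != "convert_to_sparse_rec" */
	snprintf(cat_key, sizeof(cat_key), "\"category\":\"%s\"", category);
	snprintf(label_key, sizeof(label_key), "\"label\":\"%s\"", label);
	return strstr(line, cat_key) != NULL && strstr(line, label_key) != NULL;
}
```

A test like the one above would run a line-by-line scan of `trace2.txt` with such a predicate. The second half of the test only asserts `ensure_full_index`, because `status -uno` is not yet sparse-aware at this point in the series.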
* [PATCH v3 15/20] sparse-index: create extension for compatibility 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (13 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 14/20] sparse-index: check index conversion happens Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 16/20] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget ` (8 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Previously, we enabled the sparse index format only using GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to actually select this mode. Further, sparse directory entries are not understood by the index formats as advertised. We _could_ add a new index version that explicitly adds these capabilities, but there are nuances to index formats 2, 3, and 4 that are still valuable to select as options. Until we add index format version 5, create a repo extension, "extensions.sparseIndex", that specifies that the tool reading this repository must understand sparse directory entries. This change only encodes the extension and enables it when GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI mechanism. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/config/extensions.txt | 8 ++++++ cache.h | 1 + repo-settings.c | 7 ++++++ repository.h | 3 ++- setup.c | 3 +++ sparse-index.c | 38 +++++++++++++++++++++++++---- 6 files changed, 54 insertions(+), 6 deletions(-) diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index 4e23d73cdcad..c02e09af0046 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -6,3 +6,11 @@ extensions.objectFormat:: Note that this setting should only be set by linkgit:git-init[1] or linkgit:git-clone[1]. Trying to change it after initialization will not work and will produce hard-to-diagnose issues. + +extensions.sparseIndex:: + When combined with `core.sparseCheckout=true` and + `core.sparseCheckoutCone=true`, the index may contain entries + corresponding to directories outside of the sparse-checkout + definition in lieu of containing each path under such directories. + Versions of Git that do not understand this extension do not + expect directory entries in the index. diff --git a/cache.h b/cache.h index 69a32146cd77..4ca6cd7f782c 100644 --- a/cache.h +++ b/cache.h @@ -1059,6 +1059,7 @@ struct repository_format { int worktree_config; int is_bare; int hash_algo; + int sparse_index; char *work_tree; struct string_list unknown_extensions; struct string_list v1_only_extensions; diff --git a/repo-settings.c b/repo-settings.c index d63569e4041e..9677d50f9238 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -85,4 +85,11 @@ void prepare_repo_settings(struct repository *r) * removed. */ r->settings.command_requires_full_index = 1; + + /* + * Initialize this as off. 
+ */ + r->settings.sparse_index = 0; + if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value) + r->settings.sparse_index = 1; } diff --git a/repository.h b/repository.h index e06a23015697..a45f7520fd9e 100644 --- a/repository.h +++ b/repository.h @@ -42,7 +42,8 @@ struct repo_settings { int core_multi_pack_index; - unsigned command_requires_full_index:1; + unsigned command_requires_full_index:1, + sparse_index:1; }; struct repository { diff --git a/setup.c b/setup.c index c04cd25a30df..cd8394564613 100644 --- a/setup.c +++ b/setup.c @@ -500,6 +500,9 @@ static enum extension_result handle_extension(const char *var, return error("invalid value for 'extensions.objectformat'"); data->hash_algo = format; return EXTENSION_OK; + } else if (!strcmp(ext, "sparseindex")) { + data->sparse_index = 1; + return EXTENSION_OK; } return EXTENSION_UNKNOWN; } diff --git a/sparse-index.c b/sparse-index.c index 7631f7bd00b7..3a6df66faeab 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -102,19 +102,47 @@ static int convert_to_sparse_rec(struct index_state *istate, return num_converted - start_converted; } +static int enable_sparse_index(struct repository *repo) +{ + const char *config_path = repo_git_path(repo, "config.worktree"); + + if (upgrade_repository_format(1) < 0) { + warning(_("unable to upgrade repository format to enable sparse-index")); + return -1; + } + git_config_set_in_file_gently(config_path, + "extensions.sparseIndex", + "true"); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 1; + return 0; +} + int convert_to_sparse(struct index_state *istate) { if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; + if (!istate->repo) + istate->repo = the_repository; + + /* + * The GIT_TEST_SPARSE_INDEX environment variable triggers the + * extensions.sparseIndex config variable to be on. 
+ */ + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { + int err = enable_sparse_index(istate->repo); + if (err < 0) + return err; + } + /* - * For now, only create a sparse index with the - * GIT_TEST_SPARSE_INDEX environment variable. We will relax - * this once we have a proper way to opt-in (and later still, - * opt-out). + * Only convert to sparse if extensions.sparseIndex is set. */ - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + prepare_repo_settings(istate->repo); + if (!istate->repo->settings.sparse_index) return 0; if (!istate->sparse_checkout_patterns) { -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
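After this patch, the decision inside `convert_to_sparse()` can be summarized as a predicate. The sketch flattens the relevant state into a hypothetical `struct toy_state`. It omits the cone-pattern check, the cache-tree update, and the fsmonitor removal that the real function performs once this gate passes.

```c
#include <assert.h>

/* Hypothetical flattening of the state convert_to_sparse() consults. */
struct toy_state {
	int split_index;            /* split index in use */
	int already_sparse;         /* istate->sparse_index */
	int sparse_checkout;        /* core.sparseCheckout */
	int cone_mode;              /* core.sparseCheckoutCone */
	int test_env;               /* GIT_TEST_SPARSE_INDEX=1 */
	int extension_sparse_index; /* extensions.sparseIndex */
};

/* Returns 1 if the index should be collapsed on write; may enable the extension. */
static int should_convert(struct toy_state *s)
{
	if (s->split_index || s->already_sparse ||
	    !s->sparse_checkout || !s->cone_mode)
		return 0;
	if (s->test_env)
		s->extension_sparse_index = 1; /* analogue of enable_sparse_index() */
	return s->extension_sparse_index;
}
```

The key change from the previous patch is the last line. The environment variable no longer decides conversion directly. It only turns on `extensions.sparseIndex`, and the extension is what gates conversion.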
* [PATCH v3 16/20] sparse-checkout: toggle sparse index from builtin 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (14 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 15/20] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 ` Derrick Stolee via GitGitGadget 2021-03-16 16:43 ` [PATCH v3 17/20] sparse-checkout: disable sparse-index Derrick Stolee via GitGitGadget ` (7 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:42 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The sparse index extension is used to signal that index writes should be in sparse mode. This was only updated using GIT_TEST_SPARSE_INDEX=1. Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that specifies if the sparse index should be used. It also updates the index to use the correct format, either way. Add a warning in the documentation that the use of a repository extension might reduce compatibility with third-party tools. 'git sparse-checkout init' already sets extensions.worktreeConfig, which places most sparse-checkout users outside of the scope of most third-party tools. Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of GIT_TEST_SPARSE_INDEX=1.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/git-sparse-checkout.txt | 14 +++++++ builtin/sparse-checkout.c | 17 ++++++++- sparse-index.c | 37 +++++++++++++------ sparse-index.h | 3 ++ t/t1092-sparse-checkout-compatibility.sh | 47 +++++++++++++----------- 5 files changed, 84 insertions(+), 34 deletions(-) diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index a0eeaeb02ee3..2ff66c5a4e41 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -45,6 +45,20 @@ To avoid interfering with other worktrees, it first enables the When `--cone` is provided, the `core.sparseCheckoutCone` setting is also set, allowing for better performance with a limited set of patterns (see 'CONE PATTERN SET' below). ++ +Use the `--[no-]sparse-index` option to toggle the use of the sparse +index format. This reduces the size of the index to be more closely +aligned with your sparse-checkout definition. This can have significant +performance advantages for commands such as `git status` or `git add`. +This feature is still experimental. Some commands might be slower with +a sparse index until they are properly integrated with the feature. ++ +**WARNING:** Using a sparse index requires modifying the index in a way +that is not completely understood by external tools. If you have trouble +with this compatibility, then run `git sparse-checkout init --no-sparse-index` +to rewrite your index to not be sparse. Older versions of Git will not +understand the `sparseIndex` repository extension and may fail to interact +with your repository until it is disabled. 
'set':: Write a set of patterns to the sparse-checkout file, as given as diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index e00b82af727b..ca63e2c64e95 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -14,6 +14,7 @@ #include "unpack-trees.h" #include "wt-status.h" #include "quote.h" +#include "sparse-index.h" static const char *empty_base = ""; @@ -283,12 +284,13 @@ static int set_config(enum sparse_checkout_mode mode) } static char const * const builtin_sparse_checkout_init_usage[] = { - N_("git sparse-checkout init [--cone]"), + N_("git sparse-checkout init [--cone] [--[no-]sparse-index]"), NULL }; static struct sparse_checkout_init_opts { int cone_mode; + int sparse_index; } init_opts; static int sparse_checkout_init(int argc, const char **argv) @@ -303,11 +305,15 @@ static int sparse_checkout_init(int argc, const char **argv) static struct option builtin_sparse_checkout_init_options[] = { OPT_BOOL(0, "cone", &init_opts.cone_mode, N_("initialize the sparse-checkout in cone mode")), + OPT_BOOL(0, "sparse-index", &init_opts.sparse_index, + N_("toggle the use of a sparse index")), OPT_END(), }; repo_read_index(the_repository); + init_opts.sparse_index = -1; + argc = parse_options(argc, argv, NULL, builtin_sparse_checkout_init_options, builtin_sparse_checkout_init_usage, 0); @@ -326,6 +332,15 @@ static int sparse_checkout_init(int argc, const char **argv) sparse_filename = get_sparse_checkout_filename(); res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL); + if (init_opts.sparse_index >= 0) { + if (set_sparse_index_config(the_repository, init_opts.sparse_index) < 0) + die(_("failed to modify sparse-index config")); + + /* force an index rewrite */ + repo_read_index(the_repository); + the_repository->index->updated_workdir = 1; + } + /* If we already have a sparse-checkout file, use it. 
*/ if (res >= 0) { free(sparse_filename); diff --git a/sparse-index.c b/sparse-index.c index 3a6df66faeab..30c1a11fd62d 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -104,23 +104,37 @@ static int convert_to_sparse_rec(struct index_state *istate, static int enable_sparse_index(struct repository *repo) { - const char *config_path = repo_git_path(repo, "config.worktree"); + int res; if (upgrade_repository_format(1) < 0) { warning(_("unable to upgrade repository format to enable sparse-index")); return -1; } - git_config_set_in_file_gently(config_path, - "extensions.sparseIndex", - "true"); + res = git_config_set_gently("extensions.sparseindex", "true"); prepare_repo_settings(repo); repo->settings.sparse_index = 1; - return 0; + return res; +} + +int set_sparse_index_config(struct repository *repo, int enable) +{ + int res; + + if (enable) + return enable_sparse_index(repo); + + /* Don't downgrade repository format, just remove the extension. */ + res = git_config_set_gently("extensions.sparseindex", NULL); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 0; + return res; } int convert_to_sparse(struct index_state *istate) { + int test_env; if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; @@ -129,14 +143,13 @@ int convert_to_sparse(struct index_state *istate) istate->repo = the_repository; /* - * The GIT_TEST_SPARSE_INDEX environment variable triggers the - * extensions.sparseIndex config variable to be on. + * If GIT_TEST_SPARSE_INDEX=1, then trigger extensions.sparseIndex + * to be fully enabled. If GIT_TEST_SPARSE_INDEX=0 (set explicitly), + * then purposefully disable the setting. 
*/ - if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { - int err = enable_sparse_index(istate->repo); - if (err < 0) - return err; - } + test_env = git_env_bool("GIT_TEST_SPARSE_INDEX", -1); + if (test_env >= 0) + set_sparse_index_config(istate->repo, test_env); /* * Only convert to sparse if extensions.sparseIndex is set. diff --git a/sparse-index.h b/sparse-index.h index 64380e121d80..39dcc859735e 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -5,4 +5,7 @@ struct index_state; void ensure_full_index(struct index_state *istate); int convert_to_sparse(struct index_state *istate); +struct repository; +int set_sparse_index_config(struct repository *repo, int enable); + #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 47f983217852..f14dc48924d2 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -6,6 +6,7 @@ test_description='compare full workdir to sparse workdir' # So, disable the check until that integration is complete. GIT_TEST_CHECK_CACHE_TREE=0 GIT_TEST_SPLIT_INDEX=0 +GIT_TEST_SPARSE_INDEX= . 
./test-lib.sh @@ -100,25 +101,26 @@ init_repos () { # initialize sparse-checkout definitions git -C sparse-checkout sparse-checkout init --cone && git -C sparse-checkout sparse-checkout set deep && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep + git -C sparse-index sparse-checkout init --cone --sparse-index && + test_cmp_config -C sparse-index true extensions.sparseindex && + git -C sparse-index sparse-checkout set deep } run_on_sparse () { ( cd sparse-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err + "$@" >../sparse-checkout-out 2>../sparse-checkout-err ) && ( cd sparse-index && - GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err + "$@" >../sparse-index-out 2>../sparse-index-err ) } run_on_all () { ( cd full-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err + "$@" >../full-checkout-out 2>../full-checkout-err ) && run_on_sparse "$@" } @@ -148,7 +150,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && + git -C sparse-index sparse-checkout set folder1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep folder2 x @@ -158,7 +160,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + git -C sparse-index sparse-checkout set deep/deeper1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep/deeper2 folder1 folder2 x @@ -166,7 +168,14 @@ test_expect_success 'sparse-index contents' ' TREE=$(git -C sparse-index rev-parse HEAD:$dir) && grep "040000 tree $TREE $dir/" cache \ || return 1 - done + done && + + # Disabling the sparse-index replaces tree entries with full ones + git -C sparse-index sparse-checkout init --no-sparse-index && +
test-tool -C sparse-index read-cache --table >cache && + ! grep "040000 tree" cache && + test_sparse_match test-tool read-cache --table ' test_expect_success 'expanded in-memory index matches full index' ' @@ -396,19 +405,15 @@ test_expect_success 'submodule handling' ' test_expect_success 'sparse-index is expanded and converted back' ' init_repos && - ( - GIT_TEST_SPARSE_INDEX=1 && - export GIT_TEST_SPARSE_INDEX && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" reset --hard && - test_region index convert_to_sparse trace2.txt && - test_region index ensure_full_index trace2.txt && - - rm trace2.txt && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" status -uno && - test_region index ensure_full_index trace2.txt - ) + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt ' test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
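The reworked check in convert_to_sparse() above treats `GIT_TEST_SPARSE_INDEX` as a tri-state: unset leaves the repository config alone, a true value forces the extension on, and an explicit false value forces it off. A minimal sketch of that decision follows. `env_tristate()` and `test_env_action()` are hypothetical names, and the parsing is a simplified assumption, not Git's `git_env_bool()` (which also treats other integers as booleans and dies on malformed values).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <strings.h>

/*
 * Hypothetical, simplified stand-in for git_env_bool(name, def):
 * unset -> def; "", "0", "false", "no", "off" -> 0;
 * "1", "true", "yes", "on" -> 1; anything else -> def.
 */
static int env_tristate(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;
	if (!*v || !strcasecmp(v, "0") || !strcasecmp(v, "false") ||
	    !strcasecmp(v, "no") || !strcasecmp(v, "off"))
		return 0;
	if (!strcasecmp(v, "1") || !strcasecmp(v, "true") ||
	    !strcasecmp(v, "yes") || !strcasecmp(v, "on"))
		return 1;
	return def;
}

enum sparse_index_action {
	SPARSE_INDEX_LEAVE_ALONE, /* variable unset: respect the repo config */
	SPARSE_INDEX_FORCE_ON,    /* like set_sparse_index_config(repo, 1) */
	SPARSE_INDEX_FORCE_OFF    /* like set_sparse_index_config(repo, 0) */
};

/* Mirrors the shape of the "test_env >= 0" check in convert_to_sparse(). */
static enum sparse_index_action test_env_action(void)
{
	int test_env = env_tristate("GIT_TEST_SPARSE_INDEX", -1);

	if (test_env < 0)
		return SPARSE_INDEX_LEAVE_ALONE;
	return test_env ? SPARSE_INDEX_FORCE_ON : SPARSE_INDEX_FORCE_OFF;
}
```

Using -1 as the default is what lets an explicit `GIT_TEST_SPARSE_INDEX=0` differ from leaving the variable unset.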
* [PATCH v3 17/20] sparse-checkout: disable sparse-index 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (15 preceding siblings ...) 2021-03-16 16:42 ` [PATCH v3 16/20] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget @ 2021-03-16 16:43 ` Derrick Stolee via GitGitGadget 2021-03-16 16:43 ` [PATCH v3 18/20] cache-tree: integrate with sparse directory entries Derrick Stolee via GitGitGadget ` (6 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:43 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We use 'git sparse-checkout init --cone --sparse-index' to toggle the sparse-index feature. It makes sense to also disable it when running 'git sparse-checkout disable'. This is particularly important because it removes the extensions.sparseIndex config option, allowing other tools to use this Git repository again. This does mean that a later 'git sparse-checkout init' without '--sparse-index' will not re-enable the sparse-index feature, even if it was previously enabled. While testing this feature, I noticed that the sparse-index was not being written on the first run, but only on a second run. This was caught by the call to 'test-tool read-cache --table'. Fixing this requires adjusting some assignments to core_apply_sparse_checkout and pl.use_cone_patterns in the sparse_checkout_init() logic.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/sparse-checkout.c | 10 +++++++++- t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index ca63e2c64e95..585343fa1972 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -280,6 +280,9 @@ static int set_config(enum sparse_checkout_mode mode) "core.sparseCheckoutCone", mode == MODE_CONE_PATTERNS ? "true" : NULL); + if (mode == MODE_NO_PATTERNS) + set_sparse_index_config(the_repository, 0); + return 0; } @@ -341,10 +344,11 @@ static int sparse_checkout_init(int argc, const char **argv) the_repository->index->updated_workdir = 1; } + core_apply_sparse_checkout = 1; + /* If we already have a sparse-checkout file, use it. */ if (res >= 0) { free(sparse_filename); - core_apply_sparse_checkout = 1; return update_working_directory(NULL); } @@ -366,6 +370,7 @@ static int sparse_checkout_init(int argc, const char **argv) add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); strbuf_addstr(&pattern, "!/*/"); add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); + pl.use_cone_patterns = init_opts.cone_mode; return write_patterns_and_update(&pl); } @@ -632,6 +637,9 @@ static int sparse_checkout_disable(int argc, const char **argv) strbuf_addstr(&match_all, "/*"); add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0); + prepare_repo_settings(the_repository); + the_repository->settings.sparse_index = 0; + if (update_working_directory(&pl)) die(_("error while refreshing working directory")); diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh index fc64e9ed99f4..ff1ad570a255 100755 --- a/t/t1091-sparse-checkout-builtin.sh +++ b/t/t1091-sparse-checkout-builtin.sh @@ -205,6 +205,19 @@ test_expect_success 'sparse-checkout disable' ' check_files repo a deep folder1 folder2 ' +test_expect_success 'sparse-index 
enabled and disabled' ' + git -C repo sparse-checkout init --cone --sparse-index && + test_cmp_config -C repo true extensions.sparseIndex && + test-tool -C repo read-cache --table >cache && + grep " tree " cache && + + git -C repo sparse-checkout disable && + test-tool -C repo read-cache --table >cache && + ! grep " tree " cache && + git -C repo config --list >config && + ! grep extensions.sparseindex config +' + test_expect_success 'cone mode: init and set' ' git -C repo sparse-checkout init --cone && git -C repo config --list >config && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
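The interaction between `init` and `disable` described in this patch can be summarized as a small state model: only an explicit `--[no-]sparse-index` flag touches the extension during `init`, while `disable` now always drops it. The sketch below is a hypothetical toy model of the three config keys, not Git's config code; 1 stands for "set to true" and 0 for "unset".

```c
#include <assert.h>

/* Hypothetical toy model of the config keys touched by init/disable. */
struct toy_config {
	int sparse_checkout;   /* core.sparseCheckout */
	int cone;              /* core.sparseCheckoutCone */
	int sparse_index_ext;  /* extensions.sparseIndex */
};

/*
 * sparse_index_opt: -1 when neither --sparse-index nor
 * --no-sparse-index is given, otherwise 0 or 1.
 */
static void toy_init(struct toy_config *c, int cone, int sparse_index_opt)
{
	c->sparse_checkout = 1;
	c->cone = cone;
	/* Only an explicit flag changes the extension. */
	if (sparse_index_opt >= 0)
		c->sparse_index_ext = sparse_index_opt;
}

static void toy_disable(struct toy_config *c)
{
	/* set_config(MODE_NO_PATTERNS) now also removes the extension. */
	c->sparse_checkout = 0;
	c->cone = 0;
	c->sparse_index_ext = 0;
}
```

Running `toy_init(&c, 1, 1)`, then `toy_disable(&c)`, then `toy_init(&c, 1, -1)` leaves the extension off, matching the commit message's note that a plain `init` does not re-enable the sparse index.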
* [PATCH v3 18/20] cache-tree: integrate with sparse directory entries 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (16 preceding siblings ...) 2021-03-16 16:43 ` [PATCH v3 17/20] sparse-checkout: disable sparse-index Derrick Stolee via GitGitGadget @ 2021-03-16 16:43 ` Derrick Stolee via GitGitGadget 2021-03-16 16:43 ` [PATCH v3 19/20] sparse-index: loose integration with cache_tree_verify() Derrick Stolee via GitGitGadget ` (5 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:43 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The cache-tree extension was previously disabled with sparse indexes. However, the cache-tree is an important performance feature for commands like 'git status' and 'git add'. Integrate it with sparse directory entries. When writing a sparse index, completely clear and recalculate the cache tree. By starting from scratch, the only integration necessary is to check if we hit a sparse directory entry and create a leaf of the cache-tree that has an entry_count of one and no subtrees. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 18 ++++++++++++++++++ sparse-index.c | 10 +++++++++- 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/cache-tree.c b/cache-tree.c index 5f07a39e501e..950a9615db8f 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -256,6 +256,24 @@ static int update_one(struct cache_tree *it, *skip_count = 0; + /* + * If the first entry of this region is a sparse directory + * entry corresponding exactly to 'base', then this cache_tree + * struct is a "leaf" in the data structure, pointing to the + * tree OID specified in the entry. 
+ */ + if (entries > 0) { + const struct cache_entry *ce = cache[0]; + + if (S_ISSPARSEDIR(ce->ce_mode) && + ce->ce_namelen == baselen && + !strncmp(ce->name, base, baselen)) { + it->entry_count = 1; + oidcpy(&it->oid, &ce->oid); + return 1; + } + } + if (0 <= it->entry_count && has_object_file(&it->oid)) return it->entry_count; diff --git a/sparse-index.c b/sparse-index.c index 30c1a11fd62d..56313e805d9d 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -180,7 +180,11 @@ int convert_to_sparse(struct index_state *istate) istate->cache_nr = convert_to_sparse_rec(istate, 0, 0, istate->cache_nr, "", 0, istate->cache_tree); - istate->drop_cache_tree = 1; + + /* Clear and recompute the cache-tree */ + cache_tree_free(&istate->cache_tree); + cache_tree_update(istate, 0); + istate->sparse_index = 1; trace2_region_leave("index", "convert_to_sparse", istate->repo); return 0; @@ -281,5 +285,9 @@ void ensure_full_index(struct index_state *istate) strbuf_release(&base); free(full); + /* Clear and recompute the cache-tree */ + cache_tree_free(&istate->cache_tree); + cache_tree_update(istate, 0); + trace2_region_leave("index", "ensure_full_index", istate->repo); } -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
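The new early return in update_one() marks a cache-tree node as a leaf when the first entry of its region is a sparse directory whose name is exactly the node's base path (both end in '/'). The predicate can be sketched on its own as below. The macro and struct names are hypothetical, and recognizing a sparse directory by mode 040000 is an assumption taken from the index-format description in this series.

```c
#include <assert.h>
#include <string.h>

/* Assumption: sparse directory entries carry the tree mode 040000. */
#define TOY_S_IFDIR 0040000u
#define TOY_IS_SPARSE_DIR(m) ((m) == TOY_S_IFDIR)

struct toy_entry {
	const char *name;   /* sparse directory names end in '/' */
	unsigned int mode;
};

/*
 * Hypothetical helper: does the region starting at `first` describe
 * a cache-tree leaf for `base` (which also ends in '/')? Mirrors the
 * S_ISSPARSEDIR/namelen/strncmp check added to update_one().
 */
static int is_cache_tree_leaf(const struct toy_entry *first,
			      const char *base, size_t baselen)
{
	return TOY_IS_SPARSE_DIR(first->mode) &&
	       strlen(first->name) == baselen &&
	       !strncmp(first->name, base, baselen);
}
```

When the predicate holds, the node simply copies the entry's tree OID and reports an entry_count of one, so no subtrees are built beneath it.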
* [PATCH v3 19/20] sparse-index: loose integration with cache_tree_verify() 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (17 preceding siblings ...) 2021-03-16 16:43 ` [PATCH v3 18/20] cache-tree: integrate with sparse directory entries Derrick Stolee via GitGitGadget @ 2021-03-16 16:43 ` Derrick Stolee via GitGitGadget 2021-03-16 16:43 ` [PATCH v3 20/20] p2000: add sparse-index repos Derrick Stolee via GitGitGadget ` (4 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:43 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The cache_tree_verify() method is run when GIT_TEST_CHECK_CACHE_TREE is enabled, which it is by default in the test suite. The logic must be adjusted for the presence of these directory entries. For now, leave the test as a simple check for whether the directory entry is sparse. Do not go any further until needed. This allows us to re-enable GIT_TEST_CHECK_CACHE_TREE in t1092-sparse-checkout-compatibility.sh. Further, p2000-sparse-operations.sh uses the test suite and hence this is enabled for all tests. We need to integrate with it before we run our performance tests with a sparse-index. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 19 +++++++++++++++++++ t/t1092-sparse-checkout-compatibility.sh | 3 --- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 950a9615db8f..11bf1fcae6e1 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -808,6 +808,19 @@ int cache_tree_matches_traversal(struct cache_tree *root, return 0; } +static void verify_one_sparse(struct repository *r, + struct index_state *istate, + struct cache_tree *it, + struct strbuf *path, + int pos) +{ + struct cache_entry *ce = istate->cache[pos]; + + if (!S_ISSPARSEDIR(ce->ce_mode)) + BUG("directory '%s' is present in index, but not sparse", + path->buf); +} + static void verify_one(struct repository *r, struct index_state *istate, struct cache_tree *it, @@ -830,6 +843,12 @@ static void verify_one(struct repository *r, if (path->len) { pos = index_name_pos(istate, path->buf, path->len); + + if (pos >= 0) { + verify_one_sparse(r, istate, it, path, pos); + return; + } + pos = -pos - 1; } else { pos = 0; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index f14dc48924d2..d97bf9b64527 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -2,9 +2,6 @@ test_description='compare full workdir to sparse workdir' -# The verify_cache_tree() check is not sparse-aware (yet). -# So, disable the check until that integration is complete. -GIT_TEST_CHECK_CACHE_TREE=0 GIT_TEST_SPLIT_INDEX=0 GIT_TEST_SPARSE_INDEX= -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
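The change to verify_one() relies on the return convention of index_name_pos(): a non-negative value means the path (here a directory path ending in '/') exists as an index entry, which can only be a sparse directory; a negative value encodes the insertion point as `-pos - 1`, where that directory's files begin. The sketch below reproduces that convention with a hypothetical binary search over sorted stage-0 names; it is not Git's implementation, which also compares stages.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for index_name_pos() over a sorted array of
 * stage-0 names: returns the index of an exact match, otherwise
 * -(insertion point) - 1.
 */
static int toy_name_pos(const char *const *names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}
```

With the names `a`, `deep/a`, `folder1/`, `x/a`, looking up `folder1/` hits the sparse directory entry exactly, while looking up `deep/` misses and decodes to the position of `deep/a`, the first file under that directory.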
* [PATCH v3 20/20] p2000: add sparse-index repos 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (18 preceding siblings ...) 2021-03-16 16:43 ` [PATCH v3 19/20] sparse-index: loose integration with cache_tree_verify() Derrick Stolee via GitGitGadget @ 2021-03-16 16:43 ` Derrick Stolee via GitGitGadget 2021-03-16 16:59 ` [PATCH v3 00/20] Sparse Index: Design, Format, Tests Derrick Stolee ` (3 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-16 16:43 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> p2000-sparse-operations.sh compares different Git commands in repositories with many files at HEAD but using sparse-checkout to focus on a small portion of those files. Add extra copies of the repository that use the sparse-index format so we can track how that affects the performance of different commands. At this point in time, the sparse-index is pure overhead in terms of CPU time, and this is measurable in these tests:

Test
---------------------------------------------------------------
2000.2: git status (full-index-v3)              0.59(0.51+0.12)
2000.3: git status (full-index-v4)              0.59(0.52+0.11)
2000.4: git status (sparse-index-v3)            1.40(1.32+0.12)
2000.5: git status (sparse-index-v4)            1.41(1.36+0.08)
2000.6: git add -A (full-index-v3)              2.32(1.97+0.19)
2000.7: git add -A (full-index-v4)              2.17(1.92+0.14)
2000.8: git add -A (sparse-index-v3)            2.31(2.21+0.15)
2000.9: git add -A (sparse-index-v4)            2.30(2.20+0.13)
2000.10: git add . (full-index-v3)              2.39(2.02+0.20)
2000.11: git add . (full-index-v4)              2.20(1.94+0.16)
2000.12: git add . (sparse-index-v3)            2.36(2.27+0.12)
2000.13: git add . (sparse-index-v4)            2.33(2.21+0.16)
2000.14: git commit -a -m A (full-index-v3)     2.47(2.12+0.20)
2000.15: git commit -a -m A (full-index-v4)     2.26(2.00+0.17)
2000.16: git commit -a -m A (sparse-index-v3)   3.01(2.92+0.16)
2000.17: git commit -a -m A (sparse-index-v4)   3.01(2.94+0.15)

Note that there is very little difference between the v3 and v4 index formats when the sparse-index is enabled. This is primarily because the relative file sizes are the same, and the command time is mostly taken up by parsing tree objects to expand the sparse index into a full one. With the current file layout, the index file sizes are given by this table:

   | full index  | sparse index |
   +-------------+--------------+
v3 |     108 MiB |      1.6 MiB |
v4 |      80 MiB |      1.2 MiB |

Future updates will improve the performance of Git commands when the index is sparse. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/perf/p2000-sparse-operations.sh | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index 2fbc81b22119..e527316e66d6 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -60,12 +60,29 @@ test_expect_success 'setup repo and indexes' ' git sparse-checkout set $SPARSE_CONE && git config index.version 4 && git update-index --index-version=4 + ) && + git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . sparse-index-v3 && + ( + cd sparse-index-v3 && + git sparse-checkout init --cone --sparse-index && + git sparse-checkout set $SPARSE_CONE && + git config index.version 3 && + git update-index --index-version=3 + ) && + git -c core.sparseCheckoutCone=true clone --branch=wide --sparse .
sparse-index-v4 && + ( + cd sparse-index-v4 && + git sparse-checkout init --cone --sparse-index && + git sparse-checkout set $SPARSE_CONE && + git config index.version 4 && + git update-index --index-version=4 ) ' test_perf_on_all () { command="$@" - for repo in full-index-v3 full-index-v4 + for repo in full-index-v3 full-index-v4 \ + sparse-index-v3 sparse-index-v4 do test_perf "$command ($repo)" " ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
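The claim that v3 and v4 behave alike because "the relative file sizes are the same" can be checked with a little arithmetic on the reported numbers: both sparse indexes are roughly 1.5% of their full counterparts (1.6/108 ≈ 0.0148 and 1.2/80 = 0.015), while `git status` on the v3 sparse index takes about 2.37 times as long as on the full index (1.40/0.59). The struct and helper names below are hypothetical; the figures are copied from the tables in the commit message above.

```c
#include <assert.h>

/* Index sizes (MiB) copied from the p2000 commit message. */
struct size_row {
	const char *version;
	double full_mib;
	double sparse_mib;
};

static const struct size_row p2000_sizes[] = {
	{ "v3", 108.0, 1.6 },
	{ "v4", 80.0, 1.2 },
};

/* Sparse index size as a fraction of the full index size. */
static double sparse_fraction(const struct size_row *r)
{
	return r->sparse_mib / r->full_mib;
}

/* Wall-clock slowdown of a sparse-index run relative to a full-index run. */
static double slowdown(double sparse_secs, double full_secs)
{
	return sparse_secs / full_secs;
}
```

The near-identical fractions support the explanation that the index format version barely matters here; the cost is dominated by expanding trees, not by reading the index file.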
* Re: [PATCH v3 00/20] Sparse Index: Design, Format, Tests 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (19 preceding siblings ...) 2021-03-16 16:43 ` [PATCH v3 20/20] p2000: add sparse-index repos Derrick Stolee via GitGitGadget @ 2021-03-16 16:59 ` Derrick Stolee 2021-03-16 21:18 ` Elijah Newren ` (2 subsequent siblings) 23 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-16 16:59 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget, git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee On 3/16/2021 12:42 PM, Derrick Stolee via GitGitGadget wrote:> Updates in V3 > ============= > > For this version, I took Ævar's latest patches and applied them to v2.31.0 > and rebased this series on top. It uses his new "read_tree_at()" helper and > the associated changes to the function pointer type. Junio, I wanted to call your attention to this change in base. Here is the relevant part of the range-diff: > 5: 399ddb0bad56 ! 
5: 99292cdbaae4 sparse-index: implement ensure_full_index() > @@ sparse-index.c > +} > + > +static int add_path_to_index(const struct object_id *oid, > -+ struct strbuf *base, const char *path, > -+ unsigned int mode, int stage, void *context) > ++ struct strbuf *base, const char *path, > ++ unsigned int mode, void *context) > +{ > + struct index_state *istate = (struct index_state *)context; > + struct cache_entry *ce; > @@ sparse-index.c > - /* intentionally left blank */ > + int i; > + struct index_state *full; > ++ struct strbuf base = STRBUF_INIT; > + > + if (!istate || !istate->sparse_index) > + return; > @@ sparse-index.c > + ps.has_wildcard = 1; > + ps.max_depth = -1; > + > -+ read_tree_recursive(istate->repo, tree, > -+ ce->name, strlen(ce->name), > -+ 0, &ps, > -+ add_path_to_index, full); > ++ strbuf_setlen(&base, 0); > ++ strbuf_add(&base, ce->name, strlen(ce->name)); > ++ > ++ read_tree_at(istate->repo, tree, &base, &ps, > ++ add_path_to_index, full); > + > + /* free directory entries. full entries are re-used */ > + discard_cache_entry(ce); > @@ sparse-index.c > + istate->cache_nr = full->cache_nr; > + istate->cache_alloc = full->cache_alloc; > + > ++ strbuf_release(&base); > + free(full); > + > + trace2_region_leave("index", "ensure_full_index", istate->repo); Thanks, -Stolee ^ permalink raw reply [flat|nested] 203+ messages in thread
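The range-diff above replaces read_tree_recursive()'s `(base, baselen)` arguments with a single `struct strbuf base` that ensure_full_index() initializes once, resets with `strbuf_setlen()` and refills with `strbuf_add()` for each sparse directory, and frees with `strbuf_release()` at the end. The reuse pattern looks roughly like the sketch below. `toy_buf` and `visit_path()` are hypothetical stand-ins, not Git's strbuf API or the real add_path_to_index() callback; the "append a name, use the full path, restore the prefix" step is an assumed illustration of how a callback can share the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable buffer, loosely modeled on Git's strbuf. */
struct toy_buf {
	char *buf;
	size_t len, alloc;
};

static void toy_buf_init(struct toy_buf *sb)
{
	sb->alloc = 16;
	sb->len = 0;
	sb->buf = malloc(sb->alloc);
	if (!sb->buf)
		abort();
	sb->buf[0] = '\0';
}

static void toy_buf_add(struct toy_buf *sb, const char *data, size_t n)
{
	if (sb->len + n + 1 > sb->alloc) {
		size_t want = (sb->len + n + 1) * 2;
		char *p = realloc(sb->buf, want);

		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = want;
	}
	memcpy(sb->buf + sb->len, data, n);
	sb->len += n;
	sb->buf[sb->len] = '\0';
}

/* Shrink (never grow) the contents; the allocation is kept for reuse. */
static void toy_buf_setlen(struct toy_buf *sb, size_t len)
{
	sb->len = len;
	sb->buf[len] = '\0';
}

static void toy_buf_release(struct toy_buf *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
}

/*
 * Hypothetical callback: build "<base><path>" in place, copy it to
 * `out`, then restore `base` to the directory prefix.
 */
static void visit_path(struct toy_buf *base, const char *path,
		       char *out, size_t outsz)
{
	size_t saved = base->len;

	toy_buf_add(base, path, strlen(path));
	if (outsz) {
		size_t n = base->len < outsz - 1 ? base->len : outsz - 1;

		memcpy(out, base->buf, n);
		out[n] = '\0';
	}
	toy_buf_setlen(base, saved);
}
```

Reusing one buffer avoids an allocation per sparse directory, which is why the range-diff adds a matching `strbuf_release(&base)` before `free(full)`.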
* Re: [PATCH v3 00/20] Sparse Index: Design, Format, Tests 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (20 preceding siblings ...) 2021-03-16 16:59 ` [PATCH v3 00/20] Sparse Index: Design, Format, Tests Derrick Stolee @ 2021-03-16 21:18 ` Elijah Newren 2021-03-18 21:50 ` Junio C Hamano 2021-03-23 13:44 ` [PATCH v4 " Derrick Stolee via GitGitGadget 23 siblings, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-03-16 21:18 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee On Tue, Mar 16, 2021 at 9:43 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > Here is the first full patch series submission coming out of the > sparse-index RFC [1]. > > [1] > https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/ > > I won't waste too much space here, because PATCH 1 includes a sizeable > design document that describes the feature, the reasoning behind it, and my > plan for getting this implemented widely throughout the codebase. > > There are some new things here that were not in the RFC: > > * Design doc and format updates. (Patch 1) > * Performance test script. (Patches 2 and 20) > > Notably missing in this series from the RFC: > > * The mega-patch inserting ensure_full_index() throughout the codebase. > That will be a follow-up series to this one. > * The integrations with git status and git add to demonstrate the improved > performance. Those will also appear in their own series later. > > I plan to keep my latest work in this area in my 'sparse-index/wip' branch > [2]. It includes all of the work from the RFC right now, updated with the > work from this series. 
> > [2] https://github.com/derrickstolee/git/tree/sparse-index/wip > > > Updates in V3 > ============= > > For this version, I took Ævar's latest patches and applied them to v2.31.0 > and rebased this series on top. It uses his new "read_tree_at()" helper and > the associated changes to the function pointer type. > > * Fixed more typos. Thanks Martin and Elijah! > * Updated the test_sparse_match() macro to use "$@" instead of $* > * Added a test that git sparse-checkout init --no-sparse-index rewrites the > index to be full. I've read through the range-diff. Sorry for not spotting the conflict with Ævar's series (that I also reviewed). Anyway, my Reviewed-by from the last series still holds. ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [PATCH v3 00/20] Sparse Index: Design, Format, Tests 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (21 preceding siblings ...) 2021-03-16 21:18 ` Elijah Newren @ 2021-03-18 21:50 ` Junio C Hamano 2021-03-19 13:00 ` Derrick Stolee 2021-03-23 13:44 ` [PATCH v4 " Derrick Stolee via GitGitGadget 23 siblings, 1 reply; 203+ messages in thread From: Junio C Hamano @ 2021-03-18 21:50 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget Cc: git, newren, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > For this version, I took Ævar's latest patches and applied them to v2.31.0 > and rebased this series on top. It uses his new "read_tree_at()" helper and > the associated changes to the function pointer type. > > * Fixed more typos. Thanks Martin and Elijah! > * Updated the test_sparse_match() macro to use "$@" instead of $* > * Added a test that git sparse-checkout init --no-sparse-index rewrites the > index to be full. Thanks. I expect ab/read-tree would be rerolled at least one more time, if only to straighten out the "oops #5 was screwy, let's patch it up on top with three more steps", but I do not expect the end state would be all that different, so tentatively I'll queue these patches on top of the latest iteration of the topic for now and hope that the other topic will be updated soonish. ^ permalink raw reply [flat|nested] 203+ messages in thread
* Re: [PATCH v3 00/20] Sparse Index: Design, Format, Tests 2021-03-18 21:50 ` Junio C Hamano @ 2021-03-19 13:00 ` Derrick Stolee 0 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-19 13:00 UTC (permalink / raw) To: Junio C Hamano, Derrick Stolee via GitGitGadget Cc: git, newren, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee On 3/18/2021 5:50 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> For this version, I took Ævar's latest patches and applied them to v2.31.0 >> and rebased this series on top. It uses his new "read_tree_at()" helper and >> the associated changes to the function pointer type. >> >> * Fixed more typos. Thanks Martin and Elijah! >> * Updated the test_sparse_match() macro to use "$@" instead of $* >> * Added a test that git sparse-checkout init --no-sparse-index rewrites the >> index to be full. > > Thanks. I expect ab/read-tree would be rerolled at least one more > time, if only to straighten out the "oops #5 was screwy, let's patch > it up on top with three more steps", but I do not expect the end > state would be all that different, so tentatively I'll queue these > patches on top of the latest iteration of the topic for now and > hope that the other topic will be updated soonish. Thanks. I'm grateful that it can spend some time in 'seen' if only to avoid these conflicts in the meantime. I'm waiting for that reroll of ab/read-tree before updating this version with the feedback from v3. Thanks, -Stolee ^ permalink raw reply [flat|nested] 203+ messages in thread
* [PATCH v4 00/20] Sparse Index: Design, Format, Tests 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget ` (22 preceding siblings ...) 2021-03-18 21:50 ` Junio C Hamano @ 2021-03-23 13:44 ` Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget ` (21 more replies) 23 siblings, 22 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee Here is the first full patch series submission coming out of the sparse-index RFC [1]. [1] https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/ I won't waste too much space here, because PATCH 1 includes a sizeable design document that describes the feature, the reasoning behind it, and my plan for getting this implemented widely throughout the codebase. There are some new things here that were not in the RFC: * Design doc and format updates. (Patch 1) * Performance test script. (Patches 2 and 20) Notably missing in this series from the RFC: * The mega-patch inserting ensure_full_index() throughout the codebase. That will be a follow-up series to this one. * The integrations with git status and git add to demonstrate the improved performance. Those will also appear in their own series later. I plan to keep my latest work in this area in my 'sparse-index/wip' branch [2]. It includes all of the work from the RFC right now, updated with the work from this series. [2] https://github.com/derrickstolee/git/tree/sparse-index/wip Updates in V4 ============= * Rebased onto the latest copy of ab/read-tree. * Updated the design document as per Junio's comments. * Updated the submodule handling in the performance test. * Followed up on some other review from Ævar, mostly style or commit message things. 
Updates in V3 ============= For this version, I took Ævar's latest patches and applied them to v2.31.0 and rebased this series on top. It uses his new "read_tree_at()" helper and the associated changes to the function pointer type. * Fixed more typos. Thanks Martin and Elijah! * Updated the test_sparse_match() macro to use "$@" instead of $* * Added a test that git sparse-checkout init --no-sparse-index rewrites the index to be full. Updates in V2 ============= * Various typos and awkward grammar are fixed. * Cleaned up unnecessary commands in p2000-sparse-operations.sh * Added a comment to the sparse_index member of struct index_state. * Used tree_type, commit_type, and blob_type in test-read-cache.c. Thanks, -Stolee Derrick Stolee (20): sparse-index: design doc and format update t/perf: add performance test for sparse operations t1092: clean up script quoting sparse-index: add guard to ensure full index sparse-index: implement ensure_full_index() t1092: compare sparse-checkout to sparse-index test-read-cache: print cache entries with --table test-tool: don't force full index unpack-trees: ensure full index sparse-checkout: hold pattern list in index sparse-index: convert from full to sparse submodule: sparse-index should not collapse links unpack-trees: allow sparse directories sparse-index: check index conversion happens sparse-index: create extension for compatibility sparse-checkout: toggle sparse index from builtin sparse-checkout: disable sparse-index cache-tree: integrate with sparse directory entries sparse-index: loose integration with cache_tree_verify() p2000: add sparse-index repos Documentation/config/extensions.txt | 8 + Documentation/git-sparse-checkout.txt | 14 ++ Documentation/technical/index-format.txt | 7 + Documentation/technical/sparse-index.txt | 174 ++++++++++++++ Makefile | 1 + builtin/sparse-checkout.c | 44 +++- cache-tree.c | 40 ++++ cache.h | 18 +- read-cache.c | 35 ++- repo-settings.c | 15 ++ repository.c | 11 +- repository.h | 3 +
setup.c | 3 + sparse-index.c | 293 +++++++++++++++++++++++ sparse-index.h | 11 + t/README | 3 + t/helper/test-read-cache.c | 66 ++++- t/perf/p2000-sparse-operations.sh | 101 ++++++++ t/t1091-sparse-checkout-builtin.sh | 13 + t/t1092-sparse-checkout-compatibility.sh | 143 +++++++++-- unpack-trees.c | 17 +- 21 files changed, 980 insertions(+), 40 deletions(-) create mode 100644 Documentation/technical/sparse-index.txt create mode 100644 sparse-index.c create mode 100644 sparse-index.h create mode 100755 t/perf/p2000-sparse-operations.sh base-commit: 47957485b3b731a7860e0554d2bd12c0dce1c75a Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v4 Pull-Request: https://github.com/gitgitgadget/git/pull/883 Range-diff vs v3: 1: 62ac13945bec ! 1: 6426a5c60e53 sparse-index: design doc and format update @@ Documentation/technical/index-format.txt: Git index format + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the + `extensions.sparseIndex` extension is enabled, then the index may + contain entries for directories outside of the sparse-checkout definition. -+ These entries have mode `0040000`, include the `SKIP_WORKTREE` bit, and ++ These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and + the path ends in a directory separator. + 32-bit ctime seconds, the last time a file's metadata changed @@ Documentation/technical/sparse-index.txt (new) +`core.sparseCheckoutCone`, allow for very fast pattern matching to +discover which files at HEAD belong in the sparse-checkout cone. + -+Three important scale dimensions for a Git worktree are: ++Three important scale dimensions for a Git working directory are: + +* `HEAD`: How many files are present at `HEAD`? 
+ @@ Documentation/technical/sparse-index.txt (new) + +These dimensions are ordered by their magnitude: users (typically) modify +fewer files than are populated, and we can only populate files at `HEAD`. -+These dimensions are also ordered by how expensive they are per item: it -+is expensive to detect a modified file than it is to write one that we -+know must be populated; changing `HEAD` only really requires updating the -+index. + +Problems occur if there is an extreme imbalance in these dimensions. For +example, if `HEAD` contains millions of paths but the populated set has @@ Documentation/technical/sparse-index.txt (new) +At time of writing, sparse-directory entries violate expectations about the +index format and its in-memory data structure. There are many consumers in +the codebase that expect to iterate through all of the index entries and -+see only files. In addition, they expect to see all files at `HEAD`. One -+way to handle this is to parse trees to replace a sparse-directory entry -+with all of the files within that tree as the index is loaded. However, -+parsing trees is slower than parsing the index format, so that is a slower -+operation than if we left the index alone. ++see only files. In fact, these loops expect to see a reference to every ++staged file. One way to handle this is to parse trees to replace a ++sparse-directory entry with all of the files within that tree as the index ++is loaded. However, parsing trees is slower than parsing the index format, ++so that is a slower operation than if we left the index alone. The plan is ++to make all of these integrations "sparse aware" so this expansion through ++tree parsing is unnecessary and they use fewer resources than when using a ++full index. + +The implementation plan below follows four phases to slowly integrate with +the sparse-index. 
The intention is to incrementally update Git commands to @@ Documentation/technical/sparse-index.txt (new) +data structure can operate with its current assumption of every file at +`HEAD`. + -+At first, every index parse will expand the sparse-directory entries into -+the full list of paths at `HEAD`. This will be slower in all cases. The -+only noticable change in behavior will be that the serialized index file -+contains sparse-directory entries. ++At first, every index parse will call a helper method, ++`ensure_full_index()`, which scans the index for sparse-directory entries ++(pointing to trees) and replaces them with the full list of paths (with ++blob contents) by parsing tree objects. This will be slower in all cases. ++The only noticeable change in behavior will be that the serialized index ++file contains sparse-directory entries. + +To start, we use a new repository extension, `extensions.sparseIndex`, to +allow inserting sparse-directory entries into indexes with file format 2: d2197e895e4d ! 2: 7eabc1d0586c t/perf: add performance test for sparse operations @@ t/perf/p2000-sparse-operations.sh (new) + +test_expect_success 'setup repo and indexes' ' + git reset --hard HEAD && ++ + # Remove submodules from the example repo, because our -+ # duplication of the entire repo creates an unlikly data shape. -+ git config --file .gitmodules --get-regexp "submodule.*.path" >modules && -+ git rm -f .gitmodules && -+ for module in $(awk "{print \$2}" modules) -+ do -+ git rm $module || return 1 -+ done && -+ git commit -m "remove submodules" && ++ # duplication of the entire repo creates an unlikely data shape. ++ if git config --file .gitmodules --get-regexp "submodule.*.path" >modules ++ then ++ git rm $(awk "{print \$2}" modules) && ++ git commit -m "remove submodules" || return 1 ++ fi && + + echo bogus >a && + cp a b && 3: d3cfd34b8418 ! 
3: c9e21d78ecba t1092: clean up script quoting @@ Commit message t1092: clean up script quoting This test was introduced in 19a0acc83e4 (t1092: test interesting - sparse-checkout scenarios, 2021-01-23), but these issues with quoting - were not noticed until starting this follow-up series. The old mechanism - would drop quoting such as in + sparse-checkout scenarios, 2021-01-23), but it contains issues with quoting + that were not noticed until starting this follow-up series. The old + mechanism would drop quoting such as in test_all_match git commit -m "touch README.md" 4: 4472118cf903 = 4: 03cdde756563 sparse-index: add guard to ensure full index 5: 99292cdbaae4 = 5: 6b3b6d86385d sparse-index: implement ensure_full_index() 6: fae5663a17bb = 6: 7f67adba0498 t1092: compare sparse-checkout to sparse-index 7: dffe8821fde2 ! 7: 7ebd9570b1ad test-read-cache: print cache entries with --table @@ Commit message a sparse-index. Further, 'git ls-tree' does not use a trailing directory separator for its tree rows. - This does not print the stat() information for the blobs. That could be + This does not print the stat() information for the blobs. That will be added in a future change with another option. The tests that are added in the next few changes care only about the object types and IDs. + However, this future need for full index information justifies the need + for this test helper over extending a user-facing feature, such as 'git + ls-files'. To make the option parsing slightly more robust, wrap the string comparisons in a loop adapted from test-dir-iterator.c. 8: f4ad081f25bb = 8: db7bbd06dbcc test-tool: don't force full index 9: 4780076a50df = 9: 3ddd5e794b5e unpack-trees: ensure full index 10: 33fdba2b8cfd = 10: 7308c87697f1 sparse-checkout: hold pattern list in index 11: e41b14e03ebb ! 
11: 7c10d653ca6b sparse-index: convert from full to sparse @@ cache.h: static inline unsigned int create_ce_mode(unsigned int mode) { if (S_ISLNK(mode)) return S_IFLNK; -+ if (mode == S_IFDIR) ++ if (S_ISSPARSEDIR(mode)) + return S_IFDIR; if (S_ISDIR(mode) || S_ISGITLINK(mode)) return S_IFGITLINK; 12: b77cd6b02265 = 12: 6db36f33e960 submodule: sparse-index should not collapse links 13: 4000c5cdd4cf ! 13: d24bd3348d98 unpack-trees: allow sparse directories @@ unpack-trees.c: static int index_pos_by_traverse_info(struct name_entry *names, + if (!o->src_index->sparse_index || + !(o->src_index->cache[pos]->ce_flags & CE_SKIP_WORKTREE)) + BUG("This is a directory and should not exist in index"); -+ } else ++ } else { + pos = -pos - 1; ++ } if (pos >= o->src_index->cache_nr || !starts_with(o->src_index->cache[pos]->name, name.buf) || (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name.buf))) 14: 1a2be38b2ca7 = 14: 08d9f5f3c0d1 sparse-index: check index conversion happens 15: f89891b0ae4e = 15: 6f38cef196b0 sparse-index: create extension for compatibility 16: bd703c76c859 = 16: 923081e7e079 sparse-checkout: toggle sparse index from builtin 17: 598557f90a2a = 17: 6f1ad72c390d sparse-checkout: disable sparse-index 18: c2d0c17db31a = 18: bd94e6b7d089 cache-tree: integrate with sparse directory entries 19: 6fdd9323c14e = 19: e7190376b806 sparse-index: loose integration with cache_tree_verify() 20: 3db06ac46dd5 = 20: bcf0a58eb38c p2000: add sparse-index repos -- gitgitgadget
* [PATCH v4 01/20] sparse-index: design doc and format update
From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This begins a long effort to update the index format to allow sparse
directory entries. This should result in a significant improvement to Git
commands when HEAD contains millions of files, but the user has selected
many fewer files to keep in their sparse-checkout definition.

Currently, the index format is only updated in the presence of
extensions.sparseIndex instead of increasing a file format version number.
This is temporary, and index v5 is part of the plan for future work in this
area.

The design document details many of the reasons for embarking on this work,
and also the plan for completing it safely.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/index-format.txt |   7 +
 Documentation/technical/sparse-index.txt | 174 +++++++++++++++++++++++
 2 files changed, 181 insertions(+)
 create mode 100644 Documentation/technical/sparse-index.txt

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index d363a71c37ec..3b74c05647db 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -44,6 +44,13 @@ Git index format
   localization, no special casing of directory separator '/'). Entries
   with the same name are sorted by their stage field.
 
+  An index entry typically represents a file. However, if sparse-checkout
However, if sparse-checkout + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the + `extensions.sparseIndex` extension is enabled, then the index may + contain entries for directories outside of the sparse-checkout definition. + These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and + the path ends in a directory separator. + 32-bit ctime seconds, the last time a file's metadata changed this is stat(2) data diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt new file mode 100644 index 000000000000..62f6dc225a44 --- /dev/null +++ b/Documentation/technical/sparse-index.txt @@ -0,0 +1,174 @@ +Git Sparse-Index Design Document +================================ + +The sparse-checkout feature allows users to focus a working directory on +a subset of the files at HEAD. The cone mode patterns, enabled by +`core.sparseCheckoutCone`, allow for very fast pattern matching to +discover which files at HEAD belong in the sparse-checkout cone. + +Three important scale dimensions for a Git working directory are: + +* `HEAD`: How many files are present at `HEAD`? + +* Populated: How many files are within the sparse-checkout cone. + +* Modified: How many files has the user modified in the working directory? + +We will use big-O notation -- O(X) -- to denote how expensive certain +operations are in terms of these dimensions. + +These dimensions are ordered by their magnitude: users (typically) modify +fewer files than are populated, and we can only populate files at `HEAD`. + +Problems occur if there is an extreme imbalance in these dimensions. For +example, if `HEAD` contains millions of paths but the populated set has +only tens of thousands, then commands like `git status` and `git add` can +be dominated by operations that require O(`HEAD`) operations instead of +O(Populated). 
+Most of that cost comes from parsing and rewriting the index, which
+consists mostly of files at `HEAD` marked with the `SKIP_WORKTREE`
+bit.
+
+The sparse-index intends to take these commands that read and modify the
+index from O(`HEAD`) to O(Populated). To do this, we need to modify the
+index format in a significant way: add "sparse directory" entries.
+
+With cone mode patterns, it is possible to detect when an entire
+directory will have its contents outside of the sparse-checkout definition.
+Instead of listing all of the files it contains as individual entries, a
+sparse-index contains an entry with the directory name, referencing the
+object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit.
+If we need to discover the details for paths within that directory, we
+can parse trees to find that list.
+
+At time of writing, sparse-directory entries violate expectations about the
+index format and its in-memory data structure. There are many consumers in
+the codebase that expect to iterate through all of the index entries and
+see only files. In fact, these loops expect to see a reference to every
+staged file. One way to handle this is to parse trees to replace a
+sparse-directory entry with all of the files within that tree as the index
+is loaded. However, parsing trees is slower than parsing the index format,
+so that is a slower operation than if we left the index alone. The plan is
+to make all of these integrations "sparse aware" so this expansion through
+tree parsing is unnecessary and they use fewer resources than when using a
+full index.
+
+The implementation plan below follows four phases to slowly integrate with
+the sparse-index. The intention is to incrementally update Git commands to
+interact safely with the sparse-index without significant slowdowns. This
+may not always be possible, but the hope is that the primary commands that
+users need in their daily work are dramatically improved.
+
+Phase I: Format and initial speedups
+------------------------------------
+
+During this phase, Git learns to enable the sparse-index and safely parse
+one. Protections are put in place so that every consumer of the in-memory
+data structure can operate with its current assumption of every file at
+`HEAD`.
+
+At first, every index parse will call a helper method,
+`ensure_full_index()`, which scans the index for sparse-directory entries
+(pointing to trees) and replaces them with the full list of paths (with
+blob contents) by parsing tree objects. This will be slower in all cases.
+The only noticeable change in behavior will be that the serialized index
+file contains sparse-directory entries.
+
+To start, we use a new repository extension, `extensions.sparseIndex`, to
+allow inserting sparse-directory entries into indexes with file format
+versions 2, 3, and 4. This prevents Git versions that do not understand
+the sparse-index from operating on one, but it also prevents other
+operations that do not use the index at all. A new format, index v5, will
+be introduced that includes sparse-directory entries by default. It might
+also introduce other features that have been considered for improving the
+index, as well.
+
+Next, consumers of the index will be guarded against operating on a
+sparse-index by inserting calls to `ensure_full_index()` or
+`expand_index_to_path()`. After these guards are in place, we can begin
+leaving sparse-directory entries in the in-memory index structure.
+
+Even after inserting these guards, we will keep expanding sparse-indexes
+for most Git commands using the `command_requires_full_index` repository
+setting. This setting will be on by default and disabled one builtin at a
+time until we have sufficient confidence that all of the index operations
+are properly guarded.
+
+To complete this phase, the commands `git status` and `git add` will be
+integrated with the sparse-index so that they operate with O(Populated)
+performance.
+They will be carefully tested for operations within and
+outside the sparse-checkout definition.
+
+Phase II: Careful integrations
+------------------------------
+
+This phase focuses on ensuring that all index extensions and APIs work
+well with a sparse-index. This requires significant increases to our test
+coverage, especially for operations that interact with the working
+directory outside of the sparse-checkout definition. Some of these
+behaviors may not be the desirable ones, such as some tests already
+marked for failure in `t1092-sparse-checkout-compatibility.sh`.
+
+The index extensions that may require special integrations are:
+
+* FS Monitor
+* Untracked cache
+
+While integrating with these features, we should look for patterns that
+might lead to better APIs for interacting with the index. Coalescing
+common usage patterns into an API call can reduce the number of places
+where sparse-directories need to be handled carefully.
+
+Phase III: Important command speedups
+-------------------------------------
+
+At this point, the patterns for testing and implementing sparse-directory
+logic should be relatively stable. This phase focuses on updating some of
+the most common builtins that use the index to operate as O(Populated).
+Here is a potential list of commands that could be valuable to integrate
+at this point:
+
+* `git commit`
+* `git checkout`
+* `git merge`
+* `git rebase`
+
+Hopefully, commands such as `git merge` and `git rebase` can benefit
+instead from merge algorithms that do not use the index as a data
+structure, such as the merge-ORT strategy. As these topics mature, we
+may enable the ORT strategy by default for repositories using the
+sparse-index feature.
+
+Along with `git status` and `git add`, these commands cover the majority
+of users' interactions with the working directory. In addition, we can
In addition, we can +integrate with these commands: + +* `git grep` +* `git rm` + +These have been proposed as some whose behavior could change when in a +repo with a sparse-checkout definition. It would be good to include this +behavior automatically when using a sparse-index. Some clarity is needed +to make the behavior switch clear to the user. + +This phase is the first where parallel work might be possible without too +much conflicts between topics. + +Phase IV: The long tail +----------------------- + +This last phase is less a "phase" and more "the new normal" after all of +the previous work. + +To start, the `command_requires_full_index` option could be removed in +favor of expanding only when hitting an API guard. + +There are many Git commands that could use special attention to operate as +O(Populated), while some might be so rare that it is acceptable to leave +them with additional overhead when a sparse-index is present. + +Here are some commands that might be useful to update: + +* `git sparse-checkout set` +* `git am` +* `git clean` +* `git stash` -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* Re: [PATCH v4 01/20] sparse-index: design doc and format update
From: SZEDER Gábor @ 2021-03-26 20:29 UTC
To: Derrick Stolee via GitGitGadget
Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

On Tue, Mar 23, 2021 at 01:44:09PM +0000, Derrick Stolee via GitGitGadget wrote:
> Currently, the index format is only updated in the presence of
> extensions.sparseIndex instead of increasing a file format version
> number. This is temporary, and index v5 is part of the plan for future
> work in this area.
> diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt
> new file mode 100644
> index 000000000000..62f6dc225a44
> --- /dev/null
> +++ b/Documentation/technical/sparse-index.txt
> +To start, we use a new repository extension, `extensions.sparseIndex`, to
> +allow inserting sparse-directory entries into indexes with file format
> +versions 2, 3, and 4. This prevents Git versions that do not understand
> +the sparse-index from operating on one, but it also prevents other
> +operations that do not use the index at all.

Why is this not a non-optional index extension? That would allow
older Git versions and other implementations not understanding sparse
directory entries to still perform any operation that doesn't involve
the index.

More importantly, that would prevent older Git versions and other
implementations not understanding repository extensions from
potentially wreaking havoc when the index contains sparse directory
entries. Notably JGit's current version (5.11.0.202103091610-r) does
still completely ignore repository extensions, and e.g.
happily attempts any operations on a SHA256 repository, so it would do
the same in the presence of 'extensions.sparseIndex' as well. JGit does
respect non-optional index extensions, see e.g. 87a6bb701a
(t5310-pack-bitmaps: make JGit tests work with GIT_TEST_SPLIT_INDEX,
2018-05-10).

This really should be a non-optional index extension.

> A new format, index v5, will
> +be introduced that includes sparse-directory entries by default. It might
> +also introduce other features that have been considered for improving the
> +index, as well.
* Re: [PATCH v4 01/20] sparse-index: design doc and format update
From: Junio C Hamano @ 2021-03-28 1:47 UTC
To: SZEDER Gábor
Cc: Derrick Stolee via GitGitGadget, git, newren, pclouds, jrnieder, Martin Ågren, Derrick Stolee, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

SZEDER Gábor <szeder.dev@gmail.com> writes:

>> +To start, we use a new repository extension, `extensions.sparseIndex`, to
>> +allow inserting sparse-directory entries into indexes with file format
>> +versions 2, 3, and 4. This prevents Git versions that do not understand
>> +the sparse-index from operating on one, but it also prevents other
>> +operations that do not use the index at all.
>
> Why is this not a non-optional index extension? ...
> This really should be a non-optional index extension.

Yeah, the index extension mechanism was designed with optional and
required kinds because we wanted to support exactly a use case like
this one.

Thanks for pointing it out.
* Re: [PATCH v4 01/20] sparse-index: design doc and format update
From: Derrick Stolee @ 2021-03-29 14:32 UTC
To: Junio C Hamano, SZEDER Gábor
Cc: Derrick Stolee via GitGitGadget, git, newren, pclouds, jrnieder, Martin Ågren, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

On 3/27/2021 9:47 PM, Junio C Hamano wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
>
>>> +To start, we use a new repository extension, `extensions.sparseIndex`, to
>>> +allow inserting sparse-directory entries into indexes with file format
>>> +versions 2, 3, and 4. This prevents Git versions that do not understand
>>> +the sparse-index from operating on one, but it also prevents other
>>> +operations that do not use the index at all.
>>
>> Why is this not a non-optional index extension? ...
>> This really should be a non-optional index extension.
>
> Yeah, the index extension mechanism was designed with optional and
> required kinds because we wanted to support exactly a use case like
> this one.
>
> Thanks for pointing it out.

Ok, so let me be sure I understand the request, as I believe it is a
very good one:

Using a REQUIRED index extension that says "this index has
sparse-directory entries" will allow tools that don't touch the index
to be compatible with repos using the sparse-index, while also avoiding
a new index version.

I'll work on this right away.

Thanks!
-Stolee
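SZEDER's suggestion relies on a rule from `Documentation/technical/index-format.txt`. An extension whose 4-byte signature starts with an uppercase letter `A`..`Z` is optional and may be skipped. Any other signature is required, so a reader that does not understand it must refuse the index. The sketch below models only that decision. The `known_exts` table and `handle_extension()` are hypothetical. `TREE` (cache tree), `REUC` (resolve undo), `UNTR` (untracked cache) and `link` (split index) are existing Git signatures used as examples. The error text is modeled on the message Git prints, not copied from its source.

```c
#include <stdio.h>
#include <string.h>

/* Signatures this illustrative reader understands (an assumption of the
 * sketch, not Git's full list). */
static const char *const known_exts[] = { "TREE", "REUC", "link" };

/* index-format.txt: if the first byte is 'A'..'Z', the extension is optional. */
static int ext_is_optional(const char *sig)
{
	return sig[0] >= 'A' && sig[0] <= 'Z';
}

static int ext_is_known(const char *sig)
{
	size_t i;

	for (i = 0; i < sizeof(known_exts) / sizeof(known_exts[0]); i++)
		if (!memcmp(sig, known_exts[i], 4))
			return 1;
	return 0;
}

/*
 * Decide what to do with one extension block, given its 4-byte signature.
 * Returns 1 = parse it, 0 = skip its payload, -1 = refuse the whole index.
 */
static int handle_extension(const char *sig)
{
	if (ext_is_known(sig))
		return 1;
	if (ext_is_optional(sig))
		return 0;
	fprintf(stderr, "index uses %.4s extension, which we do not understand\n", sig);
	return -1;
}
```

A reader built this way parses `TREE`, skips `UNTR` without complaint, and rejects a lowercase signature it has never seen. That rejection is what protects older clients from sparse-directory entries, while commands that never read the index are unaffected.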
* [PATCH v4 02/20] t/perf: add performance test for sparse operations
From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Create a test script that takes the default performance test (the Git
codebase) and multiplies it by 256 using four layers of duplicated trees
of width four. This results in nearly one million blob entries in the
index. Then, we can clone this repository with sparse-checkout patterns
that demonstrate four copies of the initial repository. Each clone will
use a different index format or mode so performance can be tested across
the different options.

Note that the initial repo is stripped of submodules before doing the
copies. This preserves the expected data shape of the sparse index,
because directories containing submodules are not collapsed to a sparse
directory entry.

Run a few Git commands on these clones, especially those that use the
index (status, add, commit).

Here are the results on my Linux machine:

Test
--------------------------------------------------------------
2000.2: git status (full-index-v3)             0.37(0.30+0.09)
2000.3: git status (full-index-v4)             0.39(0.32+0.10)
2000.4: git add -A (full-index-v3)             1.42(1.06+0.20)
2000.5: git add -A (full-index-v4)             1.26(0.98+0.16)
2000.6: git add . (full-index-v3)              1.40(1.04+0.18)
2000.7: git add . (full-index-v4)              1.26(0.98+0.17)
2000.8: git commit -a -m A (full-index-v3)     1.42(1.11+0.16)
2000.9: git commit -a -m A (full-index-v4)     1.33(1.08+0.16)

It is perhaps noteworthy that there is an improvement when using index
version 4. This is because the v3 index uses 108 MiB while the v4
index uses 80 MiB. Since the repeated portions of the directories are
very short (f3/f1/f2, for example), this ratio is less pronounced than in
similarly-sized real repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p2000-sparse-operations.sh | 84 +++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100755 t/perf/p2000-sparse-operations.sh

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
new file mode 100755
index 000000000000..dddd527b6330
--- /dev/null
+++ b/t/perf/p2000-sparse-operations.sh
@@ -0,0 +1,84 @@
+#!/bin/sh
+
+test_description="test performance of Git operations using the index"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+SPARSE_CONE=f2/f4/f1
+
+test_expect_success 'setup repo and indexes' '
+	git reset --hard HEAD &&
+
+	# Remove submodules from the example repo, because our
+	# duplication of the entire repo creates an unlikely data shape.
+	if git config --file .gitmodules --get-regexp "submodule.*.path" >modules
+	then
+		git rm $(awk "{print \$2}" modules) &&
+		git commit -m "remove submodules" || return 1
+	fi &&
+
+	echo bogus >a &&
+	cp a b &&
+	git add a b &&
+	git commit -m "level 0" &&
+	BLOB=$(git rev-parse HEAD:a) &&
+	OLD_COMMIT=$(git rev-parse HEAD) &&
+	OLD_TREE=$(git rev-parse HEAD^{tree}) &&
+
+	for i in $(test_seq 1 4)
+	do
+		cat >in <<-EOF &&
+			100755 blob $BLOB	a
+			040000 tree $OLD_TREE	f1
+			040000 tree $OLD_TREE	f2
+			040000 tree $OLD_TREE	f3
+			040000 tree $OLD_TREE	f4
+		EOF
+		NEW_TREE=$(git mktree <in) &&
+		NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") &&
+		OLD_TREE=$NEW_TREE &&
+		OLD_COMMIT=$NEW_COMMIT || return 1
+	done &&
+
+	git sparse-checkout init --cone &&
+	git branch -f wide $OLD_COMMIT &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 &&
+	(
+		cd full-index-v3 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 3 &&
+		git update-index --index-version=3
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 &&
+	(
+		cd full-index-v4 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 4 &&
+		git update-index --index-version=4
+	)
+'
+
+test_perf_on_all () {
+	command="$@"
+	for repo in full-index-v3 full-index-v4
+	do
+		test_perf "$command ($repo)" "
+			(
+				cd $repo &&
+				echo >>$SPARSE_CONE/a &&
+				$command
+			)
+		"
+	done
+}
+
+test_perf_on_all git status
+test_perf_on_all git add -A
+test_perf_on_all git add .
+test_perf_on_all git commit -a -m A
+
+test_done
-- 
gitgitgadget
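The setup loop in this script can be checked with a little arithmetic. This is a minimal sketch, assuming `base` counts the blobs in the `level 0` commit (the submodule-free Git tree plus `a` and `b`). Each round's root holds one new blob `a` plus four copies of the previous root. Cone mode populates the `f2/f4/f1` subtree plus the files that sit directly in each of its parent directories. The function names are hypothetical.

```c
/* Blob count at HEAD after `levels` rounds of the mktree loop: each new
 * root holds one blob `a` plus four copies (f1..f4) of the previous root. */
static unsigned long head_blobs(unsigned long base, int levels)
{
	unsigned long n = base;
	int i;

	for (i = 0; i < levels; i++)
		n = 4 * n + 1;
	return n;
}

/* Files populated by the cone f2/f4/f1 after four rounds. That directory is
 * a level-1 tree (four copies of the base plus its own `a`). Cone mode also
 * populates the files directly inside each parent directory (the root, f2,
 * and f2/f4), and each of those holds exactly one blob, `a`. */
static unsigned long cone_blobs(unsigned long base)
{
	return head_blobs(base, 1) + 3;
}
```

Unrolling the recurrence gives 256 * base + 85 blobs at `HEAD` and 4 * base + 4 populated files. For any realistic base, the full index is therefore about 64 times larger than the populated set, which matches the commit message's "multiplies it by 256" and "four copies" descriptions.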
* [PATCH v4 03/20] t1092: clean up script quoting
From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This test was introduced in 19a0acc83e4 (t1092: test interesting
sparse-checkout scenarios, 2021-01-23), but it contains issues with quoting
that were not noticed until starting this follow-up series. The old
mechanism would drop quoting such as in

   test_all_match git commit -m "touch README.md"

The above happened to work because README.md is a file in the
repository, so 'git commit -m touch README.md' would succeed by
accident. Other cases included quoting for no good reason, so clean that
up now.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 8cd3e5a8d227..3725d3997e70 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -96,20 +96,20 @@ init_repos () {
 run_on_sparse () {
 	(
 		cd sparse-checkout &&
-		$* >../sparse-checkout-out 2>../sparse-checkout-err
+		"$@" >../sparse-checkout-out 2>../sparse-checkout-err
 	)
 }
 
 run_on_all () {
 	(
 		cd full-checkout &&
-		$* >../full-checkout-out 2>../full-checkout-err
+		"$@" >../full-checkout-out 2>../full-checkout-err
 	) &&
-	run_on_sparse $*
+	run_on_sparse "$@"
 }
 
 test_all_match () {
-	run_on_all $* &&
+	run_on_all "$@" &&
 	test_cmp full-checkout-out sparse-checkout-out &&
 	test_cmp full-checkout-err sparse-checkout-err
 }
@@ -119,7 +119,7 @@ test_expect_success 'status with options' '
 	test_all_match git status --porcelain=v2 &&
 	test_all_match git status --porcelain=v2 -z -u &&
 	test_all_match git status --porcelain=v2 -uno &&
-	run_on_all "touch README.md" &&
+	run_on_all touch README.md &&
 	test_all_match git status --porcelain=v2 &&
 	test_all_match git status --porcelain=v2 -z -u &&
 	test_all_match git status --porcelain=v2 -uno &&
@@ -135,7 +135,7 @@ test_expect_success 'add, commit, checkout' '
 	write_script edit-contents <<-\EOF &&
 	echo text >>$1
 	EOF
-	run_on_all "../edit-contents README.md" &&
+	run_on_all ../edit-contents README.md &&
 
 	test_all_match git add README.md &&
 	test_all_match git status --porcelain=v2 &&
@@ -144,7 +144,7 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout HEAD~1 &&
 	test_all_match git checkout - &&
 
-	run_on_all "../edit-contents README.md" &&
+	run_on_all ../edit-contents README.md &&
 
 	test_all_match git add -A &&
 	test_all_match git status --porcelain=v2 &&
@@ -153,7 +153,7 @@ test_expect_success 'add, commit, checkout' '
 	test_all_match git checkout HEAD~1 &&
 	test_all_match git checkout - &&
 
-	run_on_all "../edit-contents deep/newfile" &&
+	run_on_all ../edit-contents deep/newfile &&
 
 	test_all_match git status --porcelain=v2 -uno &&
 	test_all_match git status --porcelain=v2 &&
@@ -186,7 +186,7 @@ test_expect_success 'diff --staged' '
 	write_script edit-contents <<-\EOF &&
 	echo text >>README.md
 	EOF
-	run_on_all "../edit-contents" &&
+	run_on_all ../edit-contents &&
 
 	test_all_match git diff &&
 	test_all_match git diff --staged &&
@@ -280,7 +280,7 @@ test_expect_success 'clean' '
 	echo bogus >>.gitignore &&
 	run_on_all cp ../.gitignore . &&
 	test_all_match git add .gitignore &&
-	test_all_match git commit -m ignore-bogus-files &&
+	test_all_match git commit -m "ignore bogus files" &&
 
 	run_on_sparse mkdir folder1 &&
 	run_on_all touch folder1/bogus &&
-- 
gitgitgadget
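The quoting bug fixed by the t1092 patch comes from the shell's field splitting of an unquoted `$*`: every positional parameter is split on whitespace, so a single argument such as "touch README.md" becomes two words. The hypothetical C helper below (not Git code) models that splitting for the default IFS only and ignores pathname expansion, which a real shell would also apply to an unquoted `$*`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD_LEN 256

/*
 * Hypothetical model of how an unquoted $* is expanded with the default
 * IFS: each positional parameter is split on spaces, tabs and newlines
 * into separate words. Globbing is ignored. Returns the number of words
 * stored in out[].
 */
static int split_like_unquoted_star(int argc, const char **argv,
				    char out[][MAX_WORD_LEN], int max_words)
{
	int n = 0;

	for (int i = 0; i < argc; i++) {
		char buf[MAX_WORD_LEN];
		char *tok;

		/* keep the sketch simple: arguments must fit the buffer */
		assert(strlen(argv[i]) < sizeof(buf));
		strcpy(buf, argv[i]);
		for (tok = strtok(buf, " \t\n"); tok && n < max_words;
		     tok = strtok(NULL, " \t\n"))
			snprintf(out[n++], MAX_WORD_LEN, "%s", tok);
	}
	return n;
}
```

Given the four arguments that `test_all_match git commit -m "touch README.md"` receives, the helper yields five words (git, commit, -m, touch, README.md). The commit message silently becomes "touch" and README.md turns into a pathspec, which is why the old test passed by accident. A quoted "$@" passes the original four arguments through unchanged.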
* [PATCH v4 04/20] sparse-index: add guard to ensure full index
From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Upcoming changes will introduce modifications to the index format that
allow sparse directories. It will be useful to have a mechanism for
converting those sparse index files into full indexes by walking the
tree at those sparse directories. Name this method ensure_full_index(),
as it guarantees that the index is fully expanded.

This method is not implemented yet; instead, focus on the scaffolding
that declares it and calls it at the appropriate time.

Add a 'command_requires_full_index' member to struct repo_settings. It
indicates that we need the index in full mode to do certain index
operations. It starts as true for every command, and we will set it to
false as individual commands integrate with sparse indexes.

If 'command_requires_full_index' is true, then we immediately expand a
sparse index to a full one upon reading it from disk. This suffices for
now, but we will want to add more callers to ensure_full_index() later.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Makefile        |  1 +
 repo-settings.c |  8 ++++++++
 repository.c    | 11 ++++++++++-
 repository.h    |  2 ++
 sparse-index.c  |  8 ++++++++
 sparse-index.h  |  7 +++++++
 6 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 sparse-index.c
 create mode 100644 sparse-index.h

diff --git a/Makefile b/Makefile
index dfb0f1000fa3..89b1d5374107 100644
--- a/Makefile
+++ b/Makefile
@@ -985,6 +985,7 @@ LIB_OBJS += setup.o
 LIB_OBJS += shallow.o
 LIB_OBJS += sideband.o
 LIB_OBJS += sigchain.o
+LIB_OBJS += sparse-index.o
 LIB_OBJS += split-index.o
 LIB_OBJS += stable-qsort.o
 LIB_OBJS += strbuf.o
diff --git a/repo-settings.c b/repo-settings.c
index f7fff0f5ab83..d63569e4041e 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -77,4 +77,12 @@ void prepare_repo_settings(struct repository *r)
 	UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_KEEP);
 
 	UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_DEFAULT);
+
+	/*
+	 * This setting guards all index reads to require a full index
+	 * over a sparse index. After suitable guards are placed in the
+	 * codebase around uses of the index, this setting will be
+	 * removed.
+	 */
+	r->settings.command_requires_full_index = 1;
 }
diff --git a/repository.c b/repository.c
index c98298acd017..a8acae002f71 100644
--- a/repository.c
+++ b/repository.c
@@ -10,6 +10,7 @@
 #include "object.h"
 #include "lockfile.h"
 #include "submodule-config.h"
+#include "sparse-index.h"
 
 /* The main repository */
 static struct repository the_repo;
@@ -261,6 +262,8 @@ void repo_clear(struct repository *repo)
 
 int repo_read_index(struct repository *repo)
 {
+	int res;
+
 	if (!repo->index)
 		repo->index = xcalloc(1, sizeof(*repo->index));
 
@@ -270,7 +273,13 @@ int repo_read_index(struct repository *repo)
 	else if (repo->index->repo != repo)
 		BUG("repo's index should point back at itself");
 
-	return read_index_from(repo->index, repo->index_file, repo->gitdir);
+	res = read_index_from(repo->index, repo->index_file, repo->gitdir);
+
+	prepare_repo_settings(repo);
+	if (repo->settings.command_requires_full_index)
+		ensure_full_index(repo->index);
+
+	return res;
 }
 
 int repo_hold_locked_index(struct repository *repo,
diff --git a/repository.h b/repository.h
index b385ca3c94b6..e06a23015697 100644
--- a/repository.h
+++ b/repository.h
@@ -41,6 +41,8 @@ struct repo_settings {
 	enum fetch_negotiation_setting fetch_negotiation_algorithm;
 
 	int core_multi_pack_index;
+
+	unsigned command_requires_full_index:1;
 };
 
 struct repository {
diff --git a/sparse-index.c b/sparse-index.c
new file mode 100644
index 000000000000..82183ead563b
--- /dev/null
+++ b/sparse-index.c
@@ -0,0 +1,8 @@
+#include "cache.h"
+#include "repository.h"
+#include "sparse-index.h"
+
+void ensure_full_index(struct index_state *istate)
+{
+	/* intentionally left blank */
+}
diff --git a/sparse-index.h b/sparse-index.h
new file mode 100644
index 000000000000..09a20d036c46
--- /dev/null
+++ b/sparse-index.h
@@ -0,0 +1,7 @@
+#ifndef SPARSE_INDEX_H__
+#define SPARSE_INDEX_H__
+
+struct index_state;
+void ensure_full_index(struct index_state *istate);
+
+#endif
-- 
gitgitgadget
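In this patch ensure_full_index() is still a stub, and prepare_repo_settings() sets command_requires_full_index for every command, so the guard in repo_read_index() always fires but has no effect yet. The toy model below sketches the behavior the guard is meant to have once commands start clearing the flag. All toy_* names are hypothetical stand-ins, not Git's API, and the "expansion" merely counts calls instead of walking trees.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct index_state. */
struct toy_index {
	bool sparse_index;	/* true while sparse-directory entries exist */
	int nr_expansions;	/* how often the index has been expanded */
};

/* Hypothetical stand-in for the new repo_settings member. */
struct toy_settings {
	unsigned command_requires_full_index:1;
};

static void toy_ensure_full_index(struct toy_index *istate)
{
	/* Same early return as the real helper: nothing to do if full. */
	if (!istate || !istate->sparse_index)
		return;
	istate->nr_expansions++;	/* stands in for walking the trees */
	istate->sparse_index = false;
}

/* Mirrors the shape of repo_read_index(): read, then maybe expand. */
static void toy_read_index(struct toy_index *istate,
			   const struct toy_settings *settings,
			   bool on_disk_is_sparse)
{
	assert(istate && settings);
	istate->sparse_index = on_disk_is_sparse;	/* "parse" the file */
	if (settings->command_requires_full_index)
		toy_ensure_full_index(istate);
}
```

A command that leaves the flag set always sees a fully expanded index after reading it, while a command that clears it keeps the sparse-directory entries. Calling the expansion twice is harmless because of the early return.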
* [PATCH v4 05/20] sparse-index: implement ensure_full_index()
From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

We will mark an in-memory index_state as having sparse directory entries
with the sparse_index bit. These currently cannot exist, but we will add
a mechanism for collapsing a full index to a sparse one in a later
change. That will happen at write time, so we must first allow parsing
the format before writing it.

Commands or methods that require a full index in order to operate can
call ensure_full_index() to expand that index in-memory. This requires
parsing trees using that index's repository.

Sparse directory entries have a specific 'ce_mode' value. The macro
S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this
type. This ce_mode is not possible with the existing index formats, so
we don't also verify all properties of a sparse-directory entry, which
are:

 1. ce->ce_mode == 0040000
 2. ce->ce_flags & CE_SKIP_WORKTREE is true
 3. ce->name[ce->ce_namelen - 1] == '/' (ends in dir separator)
 4. ce->oid references a tree object.

ensure_full_index() enforces each of these only partially. Any deviation
will cause a warning at minimum or a failure in the worst case.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache.h        | 13 ++++++-
 read-cache.c   |  9 +++++
 sparse-index.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index bb317abc91fb..136dd496c95d 100644
--- a/cache.h
+++ b/cache.h
@@ -204,6 +204,8 @@ struct cache_entry {
 #error "CE_EXTENDED_FLAGS out of range"
 #endif
 
+#define S_ISSPARSEDIR(m) ((m) == S_IFDIR)
+
 /* Forward structure decls */
 struct pathspec;
 struct child_process;
@@ -319,7 +321,14 @@ struct index_state {
 		 drop_cache_tree : 1,
 		 updated_workdir : 1,
 		 updated_skipworktree : 1,
-		 fsmonitor_has_run_once : 1;
+		 fsmonitor_has_run_once : 1,
+
+		 /*
+		  * sparse_index == 1 when sparse-directory
+		  * entries exist. Requires sparse-checkout
+		  * in cone mode.
+		  */
+		 sparse_index : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	struct object_id oid;
@@ -722,6 +731,8 @@ int read_index_from(struct index_state *, const char *path,
 		    const char *gitdir);
 int is_index_unborn(struct index_state *);
 
+void ensure_full_index(struct index_state *istate);
+
 /* For use with `write_locked_index()`. */
 #define COMMIT_LOCK		(1 << 0)
 #define SKIP_IF_UNCHANGED	(1 << 1)
diff --git a/read-cache.c b/read-cache.c
index 1e9a50c6c734..dd3980c12b53 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -101,6 +101,9 @@ static const char *alternate_index_output;
 
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		istate->sparse_index = 1;
+
 	istate->cache[nr] = ce;
 	add_name_hash(istate, ce);
 }
@@ -2273,6 +2276,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	trace2_data_intmax("index", the_repository,
 			   "read/cache_nr", istate->cache_nr);
 
+	if (!istate->repo)
+		istate->repo = the_repository;
+	prepare_repo_settings(istate->repo);
+	if (istate->repo->settings.command_requires_full_index)
+		ensure_full_index(istate);
+
 	return istate->cache_nr;
 
 unmap:
diff --git a/sparse-index.c b/sparse-index.c
index 82183ead563b..7095378a1b28 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -1,8 +1,104 @@
 #include "cache.h"
 #include "repository.h"
 #include "sparse-index.h"
+#include "tree.h"
+#include "pathspec.h"
+#include "trace2.h"
+
+static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
+{
+	ALLOC_GROW(istate->cache, nr + 1, istate->cache_alloc);
+
+	istate->cache[nr] = ce;
+	add_name_hash(istate, ce);
+}
+
+static int add_path_to_index(const struct object_id *oid,
+			     struct strbuf *base, const char *path,
+			     unsigned int mode, void *context)
+{
+	struct index_state *istate = (struct index_state *)context;
+	struct cache_entry *ce;
+	size_t len = base->len;
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	strbuf_addstr(base, path);
+
+	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
+	ce->ce_flags |= CE_SKIP_WORKTREE;
+	set_index_entry(istate, istate->cache_nr++, ce);
+
+	strbuf_setlen(base, len);
+	return 0;
+}
 
 void ensure_full_index(struct index_state *istate)
 {
-	/* intentionally left blank */
+	int i;
+	struct index_state *full;
+	struct strbuf base = STRBUF_INIT;
+
+	if (!istate || !istate->sparse_index)
+		return;
+
+	if (!istate->repo)
+		istate->repo = the_repository;
+
+	trace2_region_enter("index", "ensure_full_index", istate->repo);
+
+	/* initialize basics of new index */
+	full = xcalloc(1, sizeof(struct index_state));
+	memcpy(full, istate, sizeof(struct index_state));
+
+	/* then change the necessary things */
+	full->sparse_index = 0;
+	full->cache_alloc = (3 * istate->cache_alloc) / 2;
+	full->cache_nr = 0;
+	ALLOC_ARRAY(full->cache, full->cache_alloc);
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		struct tree *tree;
+		struct pathspec ps;
+
+		if (!S_ISSPARSEDIR(ce->ce_mode)) {
+			set_index_entry(full, full->cache_nr++, ce);
+			continue;
+		}
+		if (!(ce->ce_flags & CE_SKIP_WORKTREE))
+			warning(_("index entry is a directory, but not sparse (%08x)"),
+				ce->ce_flags);
+
+		/* recursively walk into cd->name */
+		tree = lookup_tree(istate->repo, &ce->oid);
+
+		memset(&ps, 0, sizeof(ps));
+		ps.recursive = 1;
+		ps.has_wildcard = 1;
+		ps.max_depth = -1;
+
+		strbuf_setlen(&base, 0);
+		strbuf_add(&base, ce->name, strlen(ce->name));
+
+		read_tree_at(istate->repo, tree, &base, &ps,
+			     add_path_to_index, full);
+
+		/* free directory entries. full entries are re-used */
+		discard_cache_entry(ce);
+	}
+
+	/* Copy back into original index. */
+	memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash));
+	istate->sparse_index = 0;
+	free(istate->cache);
+	istate->cache = full->cache;
+	istate->cache_nr = full->cache_nr;
+	istate->cache_alloc = full->cache_alloc;
+
+	strbuf_release(&base);
+	free(full);
+
+	trace2_region_leave("index", "ensure_full_index", istate->repo);
 }
-- 
gitgitgadget
* [PATCH v4 06/20] t1092: compare sparse-checkout to sparse-index
From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add a new 'sparse-index' repo alongside the 'full-checkout' and
'sparse-checkout' repos in t1092-sparse-checkout-compatibility.sh.
Extend the run_on_sparse helper to also run in the new repo, and add a
test_sparse_match helper. These helpers will be used when the sparse
index is implemented.

Add the GIT_TEST_SPARSE_INDEX environment variable to enable the
sparse-index by default. This can be enabled across all tests, but it
only affects cases where the sparse-checkout feature is enabled.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/README                                 |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 24 ++++++++++++++++++++----
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/t/README b/t/README
index 593d4a4e270c..b98bc563aab5 100644
--- a/t/README
+++ b/t/README
@@ -439,6 +439,9 @@ and "sha256".
 GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
 'pack.writeReverseIndex' setting.
 
+GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the
+sparse-index format by default.
+
 Naming Tests
 ------------
 
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 3725d3997e70..de5d8461c993 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -7,6 +7,7 @@ test_description='compare full workdir to sparse workdir'
 test_expect_success 'setup' '
 	git init initial-repo &&
 	(
+		GIT_TEST_SPARSE_INDEX=0 &&
 		cd initial-repo &&
 		echo a >a &&
 		echo "after deep" >e &&
@@ -87,23 +88,32 @@ init_repos () {
 
 	cp -r initial-repo sparse-checkout &&
 	git -C sparse-checkout reset --hard &&
-	git -C sparse-checkout sparse-checkout init --cone &&
+
+	cp -r initial-repo sparse-index &&
+	git -C sparse-index reset --hard &&
 
 	# initialize sparse-checkout definitions
-	git -C sparse-checkout sparse-checkout set deep
+	git -C sparse-checkout sparse-checkout init --cone &&
+	git -C sparse-checkout sparse-checkout set deep &&
+	GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone &&
+	GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep
 }
 
 run_on_sparse () {
 	(
 		cd sparse-checkout &&
-		"$@" >../sparse-checkout-out 2>../sparse-checkout-err
+		GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err
+	) &&
+	(
+		cd sparse-index &&
+		GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err
 	)
 }
 
 run_on_all () {
 	(
 		cd full-checkout &&
-		"$@" >../full-checkout-out 2>../full-checkout-err
+		GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err
 	) &&
 	run_on_sparse "$@"
 }
@@ -114,6 +124,12 @@ test_all_match () {
 	test_cmp full-checkout-err sparse-checkout-err
 }
 
+test_sparse_match () {
+	run_on_sparse "$@" &&
+	test_cmp sparse-checkout-out sparse-index-out &&
+	test_cmp sparse-checkout-err sparse-index-err
+}
+
 test_expect_success 'status with options' '
 	init_repos &&
 	test_all_match git status --porcelain=v2 &&
-- 
gitgitgadget
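The t1092 harness toggles GIT_TEST_SPARSE_INDEX per repository, and t/README documents it as a <boolean>. Git reads knobs like this through git_env_bool() in config.c, which accepts the same spellings as boolean config values. The sketch below is a simplified, hypothetical parser in that spirit (the toy_* names are not Git's). Unlike Git, it falls back to the default for unrecognized values instead of reporting an error, and it does not treat arbitrary non-zero integers as true.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive string equality. */
static int toy_streq_icase(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/*
 * Hypothetical, simplified boolean parser: returns 1 or 0 for the
 * recognized spellings and `def` when the value is unset or unknown.
 */
static int toy_parse_bool(const char *value, int def)
{
	static const char *const truthy[] = { "true", "yes", "on", "1" };
	static const char *const falsy[] = { "false", "no", "off", "0", "" };
	size_t i;

	if (!value)
		return def;
	for (i = 0; i < sizeof(truthy) / sizeof(*truthy); i++)
		if (toy_streq_icase(value, truthy[i]))
			return 1;
	for (i = 0; i < sizeof(falsy) / sizeof(*falsy); i++)
		if (toy_streq_icase(value, falsy[i]))
			return 0;
	return def;
}

/* Read a GIT_TEST_*-style knob such as GIT_TEST_SPARSE_INDEX. */
static int toy_env_bool(const char *name, int def)
{
	assert(name != NULL);
	return toy_parse_bool(getenv(name), def);
}
```

The t1092 harness only ever writes GIT_TEST_SPARSE_INDEX=0 or =1, so both spellings map to an explicit value. An unset variable keeps whatever default the caller chose.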
* [PATCH v4 07/20] test-read-cache: print cache entries with --table
From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This table is helpful for discovering data in the index to ensure it is
being written correctly, especially as we build and test the
sparse-index. This table includes an output format similar to 'git
ls-tree', but should not be compared to that directly. The biggest
reasons are that 'git ls-tree' includes a tree entry for every
subdirectory, even those that would not appear as a sparse directory in
a sparse-index. Further, 'git ls-tree' does not use a trailing directory
separator for its tree rows.

This does not print the stat() information for the blobs. That will be
added in a future change with another option. The tests that are added
in the next few changes care only about the object types and IDs.
However, this future need for full index information justifies the need
for this test helper over extending a user-facing feature, such as 'git
ls-files'.

To make the option parsing slightly more robust, wrap the string
comparisons in a loop adapted from test-dir-iterator.c.

Care must be taken with the final check for the 'cnt' variable. We
continue the expectation that the numerical value is the final argument.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/helper/test-read-cache.c | 55 +++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 10 deletions(-)

diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
index 244977a29bdf..6cfd8f2de71c 100644
--- a/t/helper/test-read-cache.c
+++ b/t/helper/test-read-cache.c
@@ -1,36 +1,71 @@
 #include "test-tool.h"
 #include "cache.h"
 #include "config.h"
+#include "blob.h"
+#include "commit.h"
+#include "tree.h"
+
+static void print_cache_entry(struct cache_entry *ce)
+{
+	const char *type;
+	printf("%06o ", ce->ce_mode & 0177777);
+
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		type = tree_type;
+	else if (S_ISGITLINK(ce->ce_mode))
+		type = commit_type;
+	else
+		type = blob_type;
+
+	printf("%s %s\t%s\n",
+	       type,
+	       oid_to_hex(&ce->oid),
+	       ce->name);
+}
+
+static void print_cache(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++)
+		print_cache_entry(istate->cache[i]);
+}
 
 int cmd__read_cache(int argc, const char **argv)
 {
+	struct repository *r = the_repository;
 	int i, cnt = 1;
 	const char *name = NULL;
+	int table = 0;
 
-	if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
-		argc--;
-		argv++;
+	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
+		if (skip_prefix(*argv, "--print-and-refresh=", &name))
+			continue;
+		if (!strcmp(*argv, "--table"))
+			table = 1;
 	}
 
-	if (argc == 2)
-		cnt = strtol(argv[1], NULL, 0);
+	if (argc == 1)
+		cnt = strtol(argv[0], NULL, 0);
 	setup_git_directory();
 	git_config(git_default_config, NULL);
+
 	for (i = 0; i < cnt; i++) {
-		read_cache();
+		repo_read_index(r);
 		if (name) {
 			int pos;
 
-			refresh_index(&the_index, REFRESH_QUIET,
+			refresh_index(r->index, REFRESH_QUIET,
 				      NULL, NULL, NULL);
-			pos = index_name_pos(&the_index, name, strlen(name));
+			pos = index_name_pos(r->index, name, strlen(name));
 			if (pos < 0)
 				die("%s not in index", name);
 			printf("%s is%s up to date\n", name,
-			       ce_uptodate(the_index.cache[pos]) ? "" : " not");
+			       ce_uptodate(r->index->cache[pos]) ? "" : " not");
 			write_file(name, "%d\n", i);
 		}
-		discard_cache();
+		if (table)
+			print_cache(r->index);
+		discard_index(r->index);
 	}
 	return 0;
 }
-- 
gitgitgadget
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table
From: Ævar Arnfjörð Bjarmason @ 2021-03-24 1:24 UTC
To: Derrick Stolee via GitGitGadget
Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Derrick Stolee, Derrick Stolee


On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> This table is helpful for discovering data in the index to ensure it is
> being written correctly, especially as we build and test the
> sparse-index. This table includes an output format similar to 'git
> ls-tree', but should not be compared to that directly. The biggest
> reasons are that 'git ls-tree' includes a tree entry for every
> subdirectory, even those that would not appear as a sparse directory in
> a sparse-index. Further, 'git ls-tree' does not use a trailing directory
> separator for its tree rows.
>
> This does not print the stat() information for the blobs. That will be
> added in a future change with another option. The tests that are added
> in the next few changes care only about the object types and IDs.
> However, this future need for full index information justifies the need
> for this test helper over extending a user-facing feature, such as 'git
> ls-files'.

Is that stat() information that's going to be essential to grab in the
same process that runs the "for (i = 0; i < istate->cache_nr; i++)"
for-loop, or stat() information that could be grabbed as:

    git ls-files -z --stage | some-program-that-stats-all-listed-blobs

It's not so much that I still disagree as I feel like I'm missing
something. I haven't gone through this topic with a fine toothed comb,
so ...
If and when these patches land and I'm using this nascent sparse
checkout support why wouldn't I want ls-files or another not-a-test-tool
to support extracting this new information that's in the index?

That's why I sent the RFC patches at
https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to
roll this functionality into ls-files.

Still, if there's a good reason why we want this in the index but never
want our plumbing to be able to dump it in some user-facing way, I think
that, just as a matter of reviewing this code, it would be much simpler
if it were in ls-files behind some git_env_bool("GIT_TEST_...") flag or
something.

Or maybe I'm the only one who spends a lot of time with both ls-files.c
and test-read-cache.c open while reviewing this, trying to keep track
of if and how this helper is and isn't subtly different from ls-files
(as my RFC series shows, not really that different at all...).

Especially with the really-just-ls-files-plus-one-thing tool mimicking
ls-tree output, for reasons I still don't get...

> To make the option parsing slightly more robust, wrap the string
> comparisons in a loop adapted from test-dir-iterator.c.
>
> Care must be taken with the final check for the 'cnt' variable. We
> continue the expectation that the numerical value is the final argument.

I think even if you're set on not having this exposed in some
builtin/*.c command, this code would be much clearer based on some
version of my
https://lore.kernel.org/git/20210317132814.30175-6-avarab@gmail.com/
i.e. the part that isn't entirely deleting t/helper/test-read-cache.c,
which would survive as t/helper/test-read-cache-sparse.c or something.

As that patch shows, this code is needlessly convoluted because it's
serving 3x wildly different in-tree use-cases. I don't see how the very
small amount of de-duplication we're getting is worth the complexity.
At that point we don't need any care with the cnt variable, because
we're not combining the fsmonitor and perf use-cases of reading the
index in some loop with the ls-files-alike.
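The --table row format and the option loop from the test-read-cache patch can be exercised outside Git with a small stand-alone model. The names below (toy_entry, toy_format_row, toy_parse_args) are hypothetical. strncmp() stands in for Git's starts_with() and skip_prefix(), and the mode constants mirror POSIX S_IFMT/S_IFDIR and Git's S_IFGITLINK. As in the patch, unknown "--" options are silently skipped, and a count is accepted only when exactly one argument remains.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Mode values: S_IFMT/S_IFDIR as in POSIX, S_IFGITLINK as in Git. */
#define TOY_S_IFMT      0170000
#define TOY_S_IFDIR     0040000
#define TOY_S_IFGITLINK 0160000

/* Hypothetical minimal index entry: mode, hex object ID, path. */
struct toy_entry {
	unsigned int mode;
	const char *hex_oid;
	const char *name;
};

/* Format one row the way the helper's print_cache_entry() prints it. */
static int toy_format_row(char *buf, size_t len, const struct toy_entry *ce)
{
	const char *type;

	assert(buf && ce);
	if (ce->mode == TOY_S_IFDIR)				/* S_ISSPARSEDIR() */
		type = "tree";
	else if ((ce->mode & TOY_S_IFMT) == TOY_S_IFGITLINK)	/* S_ISGITLINK() */
		type = "commit";
	else
		type = "blob";
	return snprintf(buf, len, "%06o %s %s\t%s\n",
			ce->mode & 0177777, type, ce->hex_oid, ce->name);
}

struct toy_opts {
	const char *refresh_name;	/* from --print-and-refresh=<name> */
	int table;			/* set by --table */
	int cnt;			/* trailing count, default 1 */
};

/*
 * Mirror of the patch's loop: skip argv[0], consume leading "--" options,
 * then read the count only if exactly one argument is left. argv must be
 * NULL-terminated, as a real main() argv is.
 */
static void toy_parse_args(int argc, const char **argv, struct toy_opts *o)
{
	static const char refresh[] = "--print-and-refresh=";

	o->refresh_name = NULL;
	o->table = 0;
	o->cnt = 1;

	for (++argv, --argc; *argv && !strncmp(*argv, "--", 2); ++argv, --argc) {
		if (!strncmp(*argv, refresh, strlen(refresh)))
			o->refresh_name = *argv + strlen(refresh);
		else if (!strcmp(*argv, "--table"))
			o->table = 1;
		/* any other "--" option is ignored, as in the patch */
	}
	if (argc == 1)
		o->cnt = (int)strtol(argv[0], NULL, 0);
}
```

For a sparse directory such as 'folder1/', the row starts with "040000 tree", keeps the trailing slash, and separates the path with a tab. The trailing slash is one of the differences from 'git ls-tree' that the commit message points out.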
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table
From: Derrick Stolee @ 2021-03-24 12:33 UTC
To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
Cc: git, newren, gitster, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee

On 3/23/21 9:24 PM, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> This table is helpful for discovering data in the index to ensure it is
>> being written correctly, especially as we build and test the
>> sparse-index. This table includes an output format similar to 'git
>> ls-tree', but should not be compared to that directly. The biggest
>> reasons are that 'git ls-tree' includes a tree entry for every
>> subdirectory, even those that would not appear as a sparse directory in
>> a sparse-index. Further, 'git ls-tree' does not use a trailing directory
>> separator for its tree rows.
>>
>> This does not print the stat() information for the blobs. That will be
>> added in a future change with another option. The tests that are added
>> in the next few changes care only about the object types and IDs.
>> However, this future need for full index information justifies the need
>> for this test helper over extending a user-facing feature, such as 'git
>> ls-files'.
>
> Is that stat() information that's going to be essential to grab in the
> same process that runs the "for (i = 0; i < istate->cache_nr; i++)"
> for-loop, or stat() information that could be grabbed as:
>
>     git ls-files -z --stage | some-program-that-stats-all-listed-blobs

The point is not to find the stat() data from disk, but to ensure that
the stat() data is correctly stored in the index (say, after converting
an existing index from another format). This pipe strategy does not
allow for that scenario.

> It's not so much that I still disagree as I feel like I'm missing
> something. I haven't gone through this topic with a fine toothed comb,
> so ...
>
> If and when these patches land and I'm using this nascent sparse
> checkout support why wouldn't I want ls-files or another not-a-test-tool
> to support extracting this new information that's in the index?
>
> That's why I sent the RFC patches at
> https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to
> roll this functionality into ls-files.

And I recommend that you continue to pursue them as an independent
series, but I'm not going to incorporate them into this one. I'm
not going to distract from this internal data structure with changes
to user-facing commands until I think it's ready to use. As the design
document describes the plan, I don't expect this to be something I
will recommend to users until most of "Phase 3" is complete, making
the most common Git commands aware of a sparse index. (I expect to
fast-track a prototype to willing users that covers that functionality
while review continues on the mailing list.)

Making a change to a builtin is _forever_, and since the only
purpose right now is to expose the data in a test environment, I
don't want to adjust the builtin until either there is a real user
need or the feature has otherwise stabilized. If you want to take on
that responsibility, then please do.
Otherwise, I will eventually need to make "git ls-files" sparse-aware
when removing 'command_requires_full_index' (Phase 4), so that would be
a good opportunity to adjust the expectations.

Thanks,
-Stolee
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table
From: Ævar Arnfjörð Bjarmason @ 2021-03-25 3:41 UTC
To: Derrick Stolee
Cc: Derrick Stolee via GitGitGadget, git, newren, gitster, pclouds, jrnieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee


On Wed, Mar 24 2021, Derrick Stolee wrote:

> On 3/23/21 9:24 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote:
>>
>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>
>>> This table is helpful for discovering data in the index to ensure it is
>>> being written correctly, especially as we build and test the
>>> sparse-index. This table includes an output format similar to 'git
>>> ls-tree', but should not be compared to that directly. The biggest
>>> reasons are that 'git ls-tree' includes a tree entry for every
>>> subdirectory, even those that would not appear as a sparse directory in
>>> a sparse-index. Further, 'git ls-tree' does not use a trailing directory
>>> separator for its tree rows.
>>>
>>> This does not print the stat() information for the blobs. That will be
>>> added in a future change with another option. The tests that are added
>>> in the next few changes care only about the object types and IDs.
>>> However, this future need for full index information justifies the need
>>> for this test helper over extending a user-facing feature, such as 'git
>>> ls-files'.
>> >> Is that stat() information that's going to be essential to grab in the >> same process that runs the "for (i = 0; i < istate->cache_nr; i++)" >> for-loop, or stat() information that could be grabbed as: >> >> git ls-files -z --stage | some-program-that-stats-all-listed-blobs > > The point is not to find the stat() data from disk, but to ensure that > the stat() data is correctly stored in the index (say, after converting > an existing index from another format). This pipe strategy does not > allow for that scenario. So a dump of ce->ce_stat_data, i.e. the same thing ls-files --debug prints out now, or...? >> It's not so much that I still disagree as I feel like I'm missing >> something. I haven't gone through this topic with a fine toothed comb, >> so ... >> >> If and when these patches land and I'm using this nascent sparse >> checkout support why wouldn't I want ls-files or another not-a-test-tool >> to support extracting this new information that's in the index? >> >> That's why I sent the RFC patches at >> https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to >> roll this functionality into ls-files. > > And I recommend that you continue to pursue them as an independent > series, but I'm not going to incorporate them into this one. I'm > not going to distract from this internal data structure with changes > to user-facing commands until I think it's ready to use. As the design > document describes the plan, I don't expect this to be something I > will recommend to users until most of "Phase 3" is complete, making > the most common Git commands aware of a sparse index. (I expect to > fast-track a prototype to willing users that covers that functionality > while review continues on the mailing list.) This series is 20 patches. Your current derrickstolee/sparse-index/wip is another 36, and from skimming those patches & your design doc those 56 seem to be partway into Phase I of IV. 
So at the rate things tend to get reviewed / re-rolled & land in git.git, it seems exceedingly likely that we'll have some part-way implementation of this for at least a major release or two. No? Which is why I'm suggesting/asking if we shouldn't have something like this debugging helper as part of installed tooling, because people are going to try it, it's probably going to have bugs and do other weird things, and I'd rather not have to manually build some test-tool to debug some local sparse checkout somewhere. > Making a change to a builtin is _forever_, and since the only > purpose right now is to expose the data in a test environment, I > don't want to adjust the builtin until either there is a real user > need or the feature has otherwise stabilized. If you want to take on > that responsibility, then please do. That's just not the case; we have plenty of unstable debug-esque options in various built-in commands. In fact, ls-files already has a --debug option whose docs say: "This is intended to show as much information as possible for manual inspection; the exact format may change at any time." It was added in 84974217151 (ls-files: learn a debugging dump format, 2010-07-31) and "just tacks all available data from the cache onto each file's line", so in a way not adjusting it and using it would be a regression; after all, this is new data in the cache, so it should print it :) There's also PARSE_OPT_HIDDEN for other such in-tree use. Whatever the sanity/merits of my suggestion that this specific thing be in ls-files instead of a test-helper, it seems far-fetched that something like that, hidden behind a GIT_TEST_* env var (or a hidden option, --debug etc.), is something we'd need to worry about backwards compatibility for. So, whatever you think about the merits of including this functionality in ls-files, I think your stance of this being a no-go for adding to the builtin is based on a false premise. It's fine to have unstable/transitory/debug output in the builtins.
We just name & document them as such. I also had some feedback in that series and on the earlier iteration that I think is appropriate to be incorporated into a re-roll of this one, which doesn't have anything to do with the question of whether we use ls-files or the helper in the tests. For example, shoving more stuff into the read-cache.c test-tool, rather than splitting it up, makes that code needlessly convoluted. I don't see how recommending that I pursue that as an independent series is productive for anyone. So as you re-roll this I should submit another series on top to refactor your in-flight code & tests? Either my suggestions are just bad, and we shouldn't do them at all, or it makes sense to incorporate relevant feedback in re-rolls. I'll let other reviewers draw their own conclusions on that. That's not a snarky "I'm right" b.t.w.; I may honestly be full of it on this particular topic. But if those suggested changes are worth doing at all, then doing them in that way seems like a massive waste of time for everyone involved, or maybe I'm not getting what you're suggesting by pursuing them as an independent series. > Otherwise, I will need to eventually handle "git ls-files" being > sparse-aware when eventually removing 'command_requires_full_index', > (Phase 4) so that would be a good opportunity to adjust the > expectations. At which point you'd be adjusting your tests that expect ls-tree format output to using ls-files output, instead of using ls-files-like output from the beginning? At the end of this E-Mail is a patch on top that adds an undocumented --debug-sparse in addition to the existing --debug.
Running that in the middle of one of your tests:

    $ ~/g/git/git ls-files --debug -- a folder1
    a
      ctime: 1616641434:474004002
      mtime: 1616641434:474004002
      dev: 2306 ino: 28576528
      uid: 1001 gid: 1001
      size: 8 flags: 0
    folder1/a
      ctime: 0:0
      mtime: 0:0
      dev: 0 ino: 0
      uid: 0 gid: 0
      size: 0 flags: 40000000
    $ ~/g/git/git ls-files --debug --debug-sparse -- a folder1
    a
      ctime: 1616641434:474004002
      mtime: 1616641434:474004002
      dev: 2306 ino: 28576528
      uid: 1001 gid: 1001
      size: 8 flags: 0
    folder1/
      ctime: 0:0
      mtime: 0:0
      dev: 0 ino: 0
      uid: 0 gid: 0
      size: 0 flags: 40004000
    $ ~/g/git/git ls-files --stage -- a folder1
    100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0 a
    100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0 folder1/a
    $ ~/g/git/git ls-files --stage --debug-sparse -- a folder1
    100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0 a
    040000 f203181537ff55dcf7896bf8c5b5c35af1514421 0 folder1/

I.e. it gives you everything your helper does and more with the trivial addition of a --debug-sparse option (which we can later just remove; it's a debug option...). See e.g. my recent 15c9649730d (grep/log: remove hidden --debug and --grep-debug options, 2021-01-26), which is already in a release, and AFAICT nobody has noticed or cared.

I don't know if that's the stat() information you wanted (your WIP branch doesn't have such a change), but presumably it either is the info you want, or ls-files's --debug would want to emit any such info that's now missing too.
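For reference, a dump of the cached ce_stat_data in the layout shown above can be sketched as follows. `struct cached_stat` and `format_debug_entry()` are hypothetical stand-ins, not git's own types; the two-space indentation and tab separators are assumptions about the original whitespace, and the flags word appears to be printed in hexadecimal (0x40004000 shows up as `40004000`). The point it illustrates is that every value comes from the index entry, not from a fresh stat() of the working tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the stat fields an index entry caches. */
struct cached_time {
	unsigned int sec, nsec;
};

struct cached_stat {
	struct cached_time ctime, mtime;
	unsigned int dev, ino, uid, gid, size;
};

/*
 * Format one entry the way the `ls-files --debug` output above is laid
 * out. Every value comes from the cached index data; nothing here calls
 * stat(). Returns what snprintf() returns.
 */
int format_debug_entry(char *buf, size_t len, const char *path,
		       const struct cached_stat *sd, unsigned int flags)
{
	assert(buf != NULL && path != NULL && sd != NULL);
	return snprintf(buf, len,
			"%s\n"
			"  ctime: %u:%u\n"
			"  mtime: %u:%u\n"
			"  dev: %u\tino: %u\n"
			"  uid: %u\tgid: %u\n"
			"  size: %u\tflags: %x\n",
			path,
			sd->ctime.sec, sd->ctime.nsec,
			sd->mtime.sec, sd->mtime.nsec,
			sd->dev, sd->ino,
			sd->uid, sd->gid,
			sd->size, flags);
}
```

A sparse directory entry has no working-tree file behind it, which is consistent with the all-zero stat fields in the transcript.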
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 13bcc2d8473..e691512d4f8 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -34,6 +34,7 @@ static int show_valid_bit;
 static int show_fsmonitor_bit;
 static int line_terminator = '\n';
 static int debug_mode;
+static int debug_sparse_mode;
 static int show_eol;
 static int recurse_submodules;
 static int skipping_duplicates;
@@ -242,9 +243,17 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
 		if (!show_stage) {
 			fputs(tag, stdout);
 		} else {
+			unsigned int mode = ce->ce_mode;
+			if (debug_sparse_mode && S_ISSPARSEDIR(mode))
+				/*
+				 * We could just do & 0177777 all the
+				 * time, just make it clear this is
+				 * for --debug-sparse.
+				 */
+				mode &= 0177777;
 			printf("%s%06o %s %d\t",
 			       tag,
-			       ce->ce_mode,
+			       mode,
 			       find_unique_abbrev(&ce->oid, abbrev),
 			       ce_stage(ce));
 		}
@@ -667,6 +676,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0, "debug-sparse", &debug_sparse_mode, N_("show sparse debugging data")),
 		OPT_BOOL(0, "deduplicate", &skipping_duplicates,
 			 N_("suppress duplicate entries")),
 		OPT_END()
@@ -681,9 +691,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		prefix_len = strlen(prefix);
 	git_config(git_default_config, NULL);
 
-	if (repo_read_index(the_repository) < 0)
-		die("index file corrupt");
-
 	argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
 			ls_files_usage, 0);
 	pl = add_pattern_list(&dir, EXC_CMDL, "--exclude option");
@@ -700,6 +707,10 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		tag_skip_worktree = "S ";
 		tag_resolve_undo = "U ";
 	}
+	if (debug_sparse_mode) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
 	if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed)
 		require_work_tree = 1;
 	if (show_unmerged)
@@ -743,6 +754,12 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	max_prefix = common_prefix(&pathspec);
 	max_prefix_len = get_common_prefix_len(max_prefix);
 
+	/*
+	 * Read the index after parse options etc. have had a chance
+	 * to die early.
+	 */
+	if (repo_read_index(the_repository) < 0)
+		die("index file corrupt");
 	prune_index(the_repository->index, max_prefix, max_prefix_len);
 
 	/* Treat unmatching pathspec elements as errors */
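The `mode &= 0177777` step in the patch above keeps only the low 16 bits of the in-memory mode before printing it with `%06o`. A minimal sketch, assuming (hypothetically) that a sparse directory entry carries an extra marker bit above bit 15 on top of the directory type bits; `HYPOTHETICAL_SPARSE_DIR_BIT` and `format_stage_mode()` are illustrative names, not the constants or functions the series defines:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Directory type bits, as printed by `ls-tree` for trees. */
#define DIR_TYPE_BITS 040000u
/* HYPOTHETICAL marker bit above bit 15; not the series' real constant. */
#define HYPOTHETICAL_SPARSE_DIR_BIT 01000000u

/*
 * Print a mode like `ls-files --stage` does ("%06o"). With debug_sparse
 * set, first drop everything above the low 16 bits, mirroring the
 * patch's `mode &= 0177777`.
 */
int format_stage_mode(char *buf, size_t len, unsigned int ce_mode,
		      int debug_sparse)
{
	unsigned int mode = ce_mode;

	assert(buf != NULL);
	if (debug_sparse)
		mode &= 0177777;
	return snprintf(buf, len, "%06o", mode);
}
```

Under these assumptions the unmasked entry would print as `1040000`, while the masked one prints `040000`, matching the `--stage --debug-sparse` transcript. Ordinary modes such as 100644 already fit in 16 bits, which is why the patch comment says the mask could be applied all the time.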
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table 2021-03-25 3:41 ` Ævar Arnfjörð Bjarmason @ 2021-03-26 0:12 ` Elijah Newren 2021-03-28 15:31 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 203+ messages in thread From: Elijah Newren @ 2021-03-26 0:12 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee Hi, On Wed, Mar 24, 2021 at 8:41 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Wed, Mar 24 2021, Derrick Stolee wrote: > > > On 3/23/21 9:24 PM, Ævar Arnfjörð Bjarmason wrote: > >> > >> On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote: > >> > >>> From: Derrick Stolee <dstolee@microsoft.com> > >>> ... > >> It's not so much that I still disagree as I feel like I'm missing > >> something. I haven't gone through this topic with a fine toothed comb, > >> so ... > >> > >> If and when these patches land and I'm using this nascent sparse > >> checkout support why wouldn't I want ls-files or another not-a-test-tool > >> to support extracting this new information that's in the index? > >> > >> That's why I sent the RFC patches at > >> https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to > >> roll this functionality into ls-files. > > > > And I recommend that you continue to pursue them as an independent > > series, but I'm not going to incorporate them into this one. I'm > > not going to distract from this internal data structure with changes > > to user-facing commands until I think it's ready to use. As the design > > document describes the plan, I don't expect this to be something I > > will recommend to users until most of "Phase 3" is complete, making > > the most common Git commands aware of a sparse index. 
(I expect to > > fast-track a prototype to willing users that covers that functionality > > while review continues on the mailing list.) > > This series is 20 patches. Your current derrickstolee/sparse-index/wip > is another 36, and from skimming those patches & your design doc those > 56 seem to be partway into Phase I of IV. > > So at the rate things tend to get reviewed / re-rolled & land in git.git > it seems exceedingly likely that we'll have some part-way implementation > of this for at least a major release or two. No? > > Which is why I'm suggesting/asking if we shouldn't have something like > this debugging helper as part of installed tooling, because people are > going to try it, it's probably going to have bugs and do other weird > things, and I'd rather not have to manually build some test-tool to > debug some local sparse checkout somewhere. I'm curious why you feel it's critical that this particular piece of debugging machinery needs to be prioritized early and exposed; in particular, I'm not sure I follow the "people are going to try it" assertion. Are you the one who is going to try it, or are you going to give it to your users? If so, what do you need out of the debugging tool? You are correct that this will span multiple releases; Stolee already said he was planning to work on this for most of 2021. But just because pieces of the code exist and are shipped doesn't mean it'll be announced or supported. For example, the git-2.30 and git-2.31 release notes were completely silent about merge-ort. It existed in both releases; in fact, the version that ships in git-2.31 could theoretically be used successfully by the vast majority of users for their daily workflow. (But it does have known shortcomings and test failures, so I definitely did *not* want it to be announced at that time.)
> > Making a change to a builtin is _forever_, and since the only > > purpose right now is to expose the data in a test environment, I > > don't want to adjust the builtin until either there is a real user > > need or the feature has otherwise stabilized. If you want to take on > > that responsibility, then please do. > > That's just not the case, we have plenty of unstable debug-esque options > in various built-in commands, in fact ls-files already has a --debug > option whose docs say: > > This is intended to show as much information as possible for manual > inspection; the exact format may change at any time. > > It was added in 84974217151 (ls-files: learn a debugging dump format, > 2010-07-31) and "just tacks all available data from the cache onto each > file's line" so in a way not adjusting it and using it would be a > regression, after all this is new data in the cache, so it should print > it :) > > There's also PARSE_OPT_HIDDEN for other such in-tree use. Whatever the > sanity/merits of me suggesting that this specific thing be in ls-files > instead of a test-helper, it seems far fetched that something like that > hidden behind a GIT_TEST_* env var (or hidden option, --debug etc.) is > something we'd need to worry about backwards compatibility for. > > So, whatever you think about the merits of including this functionality > in ls-files I think your stance of this being a no-go for adding to the > builtin is based on a false premise. It's fine to have > unstable/transitory/debug output in the builtins. We just name & > document them as such. > > I also had some feedback in that series and on the earlier iteration > that I think is appropriate to be incorporated into a re-roll of this > one, which doesn't have anything to do with the question of whether we > use ls-files or the helper in the tests. Such as us showing more stuff > into the read-cache.c test-tool v.s. splitting it up making that code > needlessly convoluted. 
Well:

  * you seem to be strongly opposed to test-read-cache.c containing this code (though I don't quite follow why)
  * Stolee seems to be strongly opposed to modifying builtin/ls-files.c until he has time to think through how builtins should work.

So putting it in another test file that looks slightly duplicative of test-read-cache.c might indeed be a good way out of this conundrum. :-)

(I'm not opposed to any of the three solutions; I'm mostly chiming in here because I'm worried about possible bubbling frustration; see below.)

> I don't see how recommending that I pursue that as an independent series
> is productive for anyone. So as you re-roll this I should submit another
> series on top to refactor your in-flight code & tests?

Your tone suggests some frustration; I have a suspicion there's some lack of understanding or misreading that has occurred (perhaps on my part too), and before that misunderstanding morphs into motive questioning, let me see if I might be able to help...

So far, you have advocated for:
  A) Moving the checks to ls-files with a permanent new flag (--sparse)
  B) Duplicating test-read-cache.c (which is admittedly pretty small) and then modifying the duplicate to have the new behavior, or alternatively:
  C) Just stat()-ing files to get the information
  D) Creating new debug option(s) to ls-files so that end users can use this in the next few releases before the feature is ready for prime time
You also mentioned you had read just part of the series.

Option D comes with the problem that it's not at all clear who these end-users are, why they want the option, or how we should design it. Personally, I'm totally on board that ls-files should generally have the ability to show information in the index (e.g.
if there are tree entries in addition to blob entries, it should be able to show both), but I'm not following the reasoning for why it needs to be there as part of the early stages of development of the sparse-index feature and who it's supposed to be helping in these next few releases. The progression also suggests that Option B might have just been a step along the way and that you were advocating for Option D now. I think it'd be easy to miss that you still had option B open and considered it equivalently good to option D (or am I misreading?), much like you missed how option C wasn't even relevant to the problem at hand or option A would have introduced perpetual confusion as a mere duplicate of --stage (in the best case scenario, anyway). They're all easy misunderstandings. > Either my suggestions are just bad, and we shouldn't do them at all, or > it makes sense to incorporate relevant feedback in re-rolls. I'll let > other reviewers draw their own conclusions on that. I think that's a bit unfair; Stolee has been incorporating feedback. He even called out fixing up things at your suggestion in v4 of his re-roll. > That's not a snarky "I'm right" b.t.w., I may honestly be full of it on > this particular topic. > > But if those suggested changes are worth doing at all, then doing them > in that way seems like a massive waste of time for everyone involved, or > maybe I'm not getting what you're suggesting by pursuing them as an > independent series. I think you should instead read it as he has no idea why this needs to be exposed in ls-files, who these users are you are asserting will be using it, or how to cater for their needs. Shouldn't the person who implements this understand those pieces to avoid a massive waste of time? > > Otherwise, I will need to eventually handle "git ls-files" being > > sparse-aware when eventually removing 'command_requires_full_index', > > (Phase 4) so that would be a good opportunity to adjust the > > expectations. 
> > At which point you'd be adjusting your tests that expect ls-tree format > output to using ls-files output, instead of using ls-files-like output > from the beginning? I don't understand what you're getting at here. I was the one who requested Stolee make the output look like ls-tree in his original RFC series, so if there's a problem with this style of output, I'm to blame. But what exactly is the problem? Old-style ls-files output just isn't relevant anymore. ls-tree prints four things: mode, type, hash, and filename. ls-files prints all of those except "type". ls-files never included the type before because it was always "blob". This series changes that and adds "tree" to the mix. Once you have different types included in the index, then ls-files has to print all the same fields that ls-tree does...so why not make it look similar? > At the end of this E-Mail is a patch on top that adds an undocumented > --debug-sparse in addition to the existing --debug. Running that in the > middle of one of your tests: > > $ ~/g/git/git ls-files --debug -- a folder1 > a > ctime: 1616641434:474004002 > mtime: 1616641434:474004002 > dev: 2306 ino: 28576528 > uid: 1001 gid: 1001 > size: 8 flags: 0 > folder1/a > ctime: 0:0 > mtime: 0:0 > dev: 0 ino: 0 > uid: 0 gid: 0 > size: 0 flags: 40000000 > $ ~/g/git/git ls-files --debug --debug-sparse -- a folder1 > a > ctime: 1616641434:474004002 > mtime: 1616641434:474004002 > dev: 2306 ino: 28576528 > uid: 1001 gid: 1001 > size: 8 flags: 0 > folder1/ > ctime: 0:0 > mtime: 0:0 > dev: 0 ino: 0 > uid: 0 gid: 0 > size: 0 flags: 40004000 > $ ~/g/git/git ls-files --stage -- a folder1 > 100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0 a > 100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0 folder1/a > $ ~/g/git/git ls-files --stage --debug-sparse -- a folder1 > 100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0 a > 040000 f203181537ff55dcf7896bf8c5b5c35af1514421 0 folder1/ > > I.e.
it gives you everything your helper does and more with a trivial > addition of a --debug-sparse (which we can later just remove, it's a > debug option...). > > See e.g. my recent 15c9649730d (grep/log: remove hidden --debug and > --grep-debug options, 2021-01-26) which is already in a release, and > AFAICT nobody has noticed or cared. > > I don't know if that's the stat() information you wanted (your WIP > branch doesn't have such a change), but presumably it either is the info > you want, or ls-files's --debug would want to emit any such such info > that's now missing too. > > diff --git a/builtin/ls-files.c b/builtin/ls-files.c > index 13bcc2d8473..e691512d4f8 100644 > --- a/builtin/ls-files.c > +++ b/builtin/ls-files.c > @@ -34,6 +34,7 @@ static int show_valid_bit; > static int show_fsmonitor_bit; > static int line_terminator = '\n'; > static int debug_mode; > +static int debug_sparse_mode; > static int show_eol; > static int recurse_submodules; > static int skipping_duplicates; > @@ -242,9 +243,17 @@ static void show_ce(struct repository *repo, struct dir_struct *dir, > if (!show_stage) { > fputs(tag, stdout); > } else { > + unsigned int mode = ce->ce_mode; > + if (debug_sparse_mode && S_ISSPARSEDIR(mode)) > + /* > + * We could just do & 0177777 all the > + * time, just make it clear this is > + * for --debug-sparse. 
> + */ > + mode &= 0177777; > printf("%s%06o %s %d\t", > tag, > - ce->ce_mode, > + mode, > find_unique_abbrev(&ce->oid, abbrev), > ce_stage(ce)); > } > @@ -667,6 +676,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > N_("pretend that paths removed since <tree-ish> are still present")), > OPT__ABBREV(&abbrev), > OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")), > + OPT_BOOL(0, "debug-sparse", &debug_sparse_mode, N_("show sparse debugging data")), > OPT_BOOL(0, "deduplicate", &skipping_duplicates, > N_("suppress duplicate entries")), > OPT_END() > @@ -681,9 +691,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > prefix_len = strlen(prefix); > git_config(git_default_config, NULL); > > - if (repo_read_index(the_repository) < 0) > - die("index file corrupt"); > - > argc = parse_options(argc, argv, prefix, builtin_ls_files_options, > ls_files_usage, 0); > pl = add_pattern_list(&dir, EXC_CMDL, "--exclude option"); > @@ -700,6 +707,10 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > tag_skip_worktree = "S "; > tag_resolve_undo = "U "; > } > + if (debug_sparse_mode) { > + prepare_repo_settings(the_repository); > + the_repository->settings.command_requires_full_index = 0; > + } > if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed) > require_work_tree = 1; > if (show_unmerged) > @@ -743,6 +754,12 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > max_prefix = common_prefix(&pathspec); > max_prefix_len = get_common_prefix_len(max_prefix); > > + /* > + * Read the index after parse options etc. have had a chance > + * to die early. > + */ > + if (repo_read_index(the_repository) < 0) > + die("index file corrupt"); > prune_index(the_repository->index, max_prefix, max_prefix_len); > > /* Treat unmatching pathspec elements as errors */
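Elijah's point about the fields can be sketched in C: once the index can hold tree entries, the object type has to be derived from the mode rather than assumed to be "blob". The helper names below are hypothetical, not git code. The mode-to-type mapping (040000 for a tree, 160000 for a gitlink shown as "commit", anything else a blob) follows what `git ls-tree` output uses, and the line layout is ls-tree's `<mode> SP <type> SP <object> TAB <file>`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Object type implied by an index/tree mode (hypothetical helper). */
const char *type_for_mode(unsigned int mode)
{
	switch (mode & 0170000) {
	case 0040000:
		return "tree";    /* directory, e.g. a sparse directory entry */
	case 0160000:
		return "commit";  /* gitlink (submodule) */
	default:
		return "blob";    /* regular file or symlink */
	}
}

/* Format "<mode> SP <type> SP <oid> TAB <path>", the ls-tree line layout. */
int format_ls_tree_line(char *buf, size_t len, unsigned int mode,
			const char *oid, const char *path)
{
	assert(buf != NULL && oid != NULL && path != NULL);
	return snprintf(buf, len, "%06o %s %s\t%s",
			mode, type_for_mode(mode), oid, path);
}
```

With the entries from the transcript, `a` would print as a `100644 blob` line and `folder1/` as a `040000 tree` line, which is the extra column `ls-files --stage` has never needed.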
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table 2021-03-26 0:12 ` Elijah Newren @ 2021-03-28 15:31 ` Ævar Arnfjörð Bjarmason 2021-03-29 19:46 ` Derrick Stolee 2021-03-29 22:02 ` Elijah Newren 0 siblings, 2 replies; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-28 15:31 UTC (permalink / raw) To: Elijah Newren Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Fri, Mar 26 2021, Elijah Newren wrote: > Hi, > > On Wed, Mar 24, 2021 at 8:41 PM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> On Wed, Mar 24 2021, Derrick Stolee wrote: >> >> > On 3/23/21 9:24 PM, Ævar Arnfjörð Bjarmason wrote: >> >> >> >> On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote: >> >> >> >>> From: Derrick Stolee <dstolee@microsoft.com> >> >>> > ... >> >> It's not so much that I still disagree as I feel like I'm missing >> >> something. I haven't gone through this topic with a fine toothed comb, >> >> so ... >> >> >> >> If and when these patches land and I'm using this nascent sparse >> >> checkout support why wouldn't I want ls-files or another not-a-test-tool >> >> to support extracting this new information that's in the index? >> >> >> >> That's why I sent the RFC patches at >> >> https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to >> >> roll this functionality into ls-files. >> > >> > And I recommend that you continue to pursue them as an independent >> > series, but I'm not going to incorporate them into this one. I'm >> > not going to distract from this internal data structure with changes >> > to user-facing commands until I think it's ready to use. As the design >> > document describes the plan, I don't expect this to be something I >> > will recommend to users until most of "Phase 3" is complete, making >> > the most common Git commands aware of a sparse index. 
(I expect to >> > fast-track a prototype to willing users that covers that functionality >> > while review continues on the mailing list.) >> >> This series is 20 patches. Your current derrickstolee/sparse-index/wip >> is another 36, and from skimming those patches & your design doc those >> 56 seem to be partway into Phase I of IV. >> >> So at the rate things tend to get reviewed / re-rolled & land in git.git >> it seems exceedingly likely that we'll have some part-way implementation >> of this for at least a major release or two. No? >> >> Which is why I'm suggesting/asking if we shouldn't have something like >> this debugging helper as part of installed tooling, because people are >> going to try it, it's probably going to have bugs and do other weird >> things, and I'd rather not have to manually build some test-tool to >> debug some local sparse checkout somewhere. > > I'm curious why you feel it's critical that this particular piece of > debugging machinery needs to be prioritized early and exposed; in > particular, I'm not sure I follow the "people are going to try it" > assertion. The debugging machinery's already there; the question is why we need to duplicate code in-tree. I just did a cursory review of this topic, and wondered why its tests couldn't use a builtin instead of (mostly) reinventing the wheel. It seems to me that the reason for that state is a misunderstanding about what we would and wouldn't add to builtin/*.c, i.e. that we wouldn't have something like a --debug option, but as ls-files shows, that's not a problem. So my interest is twofold: * Just a comment on "can we avoid this code duplication" * The related one of not wanting to re-learn some custom test helper as (presumably) we get N large patch series on this topic, if it turns out that we can use an existing well-known tool with minimal changes. > Are you the one who is going to try it or are you going to > give it to your users?
If so, what do you need out of the debugging > tool? I haven't understood the sparse index feature well enough to know whether anyone would ever want to run this --debug-sparse outside of the test suite. Isn't extracting info about its internal state going to be useful sooner rather than later in the scenarios where you'd care enough to run "ls-files --stage" now? Maybe I've misunderstood this feature and it's going to be so transparent that nobody will ever have any reason to dump how it's working out of the index... > You are correct that this will span multiple releases; Stolee already > said he was planning to be working on this for most of 2021. But just > because pieces of the code exist and are shipped doesn't mean it'll be > announced or supported. For example, the git-2.30 and git-2.31 > release notes were completely silent about merge-ort. It existed in > both releases; in fact, the version that ships in git-2.31, could > theoretically be used successfully by the vast majority of users for > their daily workflow. (But it does have known shortcomings and test > failures so I definitely did *not* want it to be announced at that > time.) Yes, and that's fine. But if you'd been bending over backwards to add merge-ort to t/helper/ "because it's not ready yet" or something, I'd probably have commented to the effect of "can't we just add it as part of builtins but not advertise it?", which is what you did :) >> > Making a change to a builtin is _forever_, and since the only >> > purpose right now is to expose the data in a test environment, I >> > don't want to adjust the builtin until either there is a real user >> > need or the feature has otherwise stabilized. If you want to take on >> > that responsibility, then please do.
>> >> That's just not the case, we have plenty of unstable debug-esque options >> in various built-in commands, in fact ls-files already has a --debug >> option whose docs say: >> >> This is intended to show as much information as possible for manual >> inspection; the exact format may change at any time. >> >> It was added in 84974217151 (ls-files: learn a debugging dump format, >> 2010-07-31) and "just tacks all available data from the cache onto each >> file's line" so in a way not adjusting it and using it would be a >> regression, after all this is new data in the cache, so it should print >> it :) >> >> There's also PARSE_OPT_HIDDEN for other such in-tree use. Whatever the >> sanity/merits of me suggesting that this specific thing be in ls-files >> instead of a test-helper, it seems far fetched that something like that >> hidden behind a GIT_TEST_* env var (or hidden option, --debug etc.) is >> something we'd need to worry about backwards compatibility for. >> >> So, whatever you think about the merits of including this functionality >> in ls-files I think your stance of this being a no-go for adding to the >> builtin is based on a false premise. It's fine to have >> unstable/transitory/debug output in the builtins. We just name & >> document them as such. >> >> I also had some feedback in that series and on the earlier iteration >> that I think is appropriate to be incorporated into a re-roll of this >> one, which doesn't have anything to do with the question of whether we >> use ls-files or the helper in the tests. Such as us showing more stuff >> into the read-cache.c test-tool v.s. splitting it up making that code >> needlessly convoluted. > > Well: > * you seem to be strongly opposed to test-read-cache.c containing > this code (though I don't quite follow why) See above. > * Stolee seems to be strongly opposed to modifying > builtin/ls-files.c until he has time to think through how builtins > should work. 
As noted above, my reading of the upthread discussion is that those reasons basically boil down to not knowing that "git ls-files --debug" exists, and that we can extend it. > So putting it in another test file that looks slightly duplicative of > test-read-cache.c might indeed be a good way out of this conundrum. > :-) FWIW I think that read-cache.c split is worth doing even if this series doesn't modify t/helper/test-read-cache.c. The "this is for fsmonitor" and "this is for the perf test" use-cases are (as I think my RFC patch shows) clearer once they're split up. > (I'm not opposed to any of the three solutions, I'm mostly chiming in > here because I'm worried about possible bubbling frustration; see > below.) > >> I don't see how recommending that I pursue that as an independent series >> is productive for anyone. So as you re-roll this I should submit another >> series on top to refactor your in-flight code & tests? > > Your tone suggests some frustration; I have a suspicion there's some > lack of understanding or misreading that has occurred (perhaps on my > part too), and before that misunderstanding morphs into motive > questioning, let me see if I might be able to help... Honestly, I'm more flabbergasted than anything, so I'm trying to clarify what the author thinks of this direction. I mean it's fine if it's just an "I don't think this is important and don't want to spend time on it, but it seems like a good idea", in which case others have the option of re-rolling some of these patches if they care (at this point I wouldn't). Or "this is just a bad idea for XYZ reason", which is also fine, and even more valuable to document for future work in the area. But to have another series built on this with refactorings back and forth before the code has landed on master just seems like needless churn.
> So far, you have advocated for: > A) Moving the checks to ls-files with a permanent new flag (--sparse) > B) Duplicating test-read-cache.c (which is admittedly pretty small) > and then modifying the duplicate to have the new behavior, or > alternatively: > C) Just stating files to get the information > D) Creating new debug option(s) to ls-files so that end users can > use this in the next few releases before the feature is ready for > prime time > You also mentioned you had read just part of the series. > > Option D comes with the problem that it's not at all clear who these > end-users are, why they want the option, or how we should design it. [...] I think s/advocated/read the series and sent some flow-of-thought, not-ready-for-anything RFC patches on top/ would be more accurate :) I.e. the A) --sparse thing was just reading the patch and seeing if ls-files couldn't be made to do this, but yes, having the documented --sparse interface might not make sense. We discussed B) above. C) was a question to clarify what was meant by stat data, since it's an offhand comment in the commit message. Does it mean "stat after the fact" or "this will have a mode like ls-files --debug has now"? Right now I'm just suggesting with D) that this might be rolled into the dev-only-not-for-end-users --debug mode. > I'm totally onboard that ls-files should generally have > the ability to show information in the index (e.g. if there are tree > entries in addition to blob entries, it should be able to show both), > but I'm not following the reasoning for why it needs to be there as > part of the early stages of development of the sparse-index feature > and who it's supposed to be helping in these next few releases. We are already extracting the info at this early stage, just with a custom helper.
All I'm suggesting right now is that if the motivation for the custom helper is "this isn't for end users", then surely a patch around 1/2 the size that adds it to already reviewed/tested ls-files code under a --debug option makes more sense. Especially since the upthread commit mentions wanting to incorporate stat() data. I'm not sure how exactly (there are no outstanding patches, even on a WIP branch for it, AFAICT), but most likely it's further duplication of data "ls-files --debug" already spews out. So the patch would be 1/2 the size, and instead of saying "let's do stat stuff in the future" it would get it for free. Or not; part of that's speculation on information that's just in Stolee's head. Hence this side-discussion. > [...] [Cut parts hopefully all clarified with the above comments] >> [..] >> At which point you'd be adjusting your tests that expect ls-tree format >> output to using ls-files output, instead of using ls-files-like output >> from the beginning? > > I don't understand what you're getting at here. I was the one who > requested Stolee make the output look like ls-trees in his original > RFC series, so if there's a problem with this style of output, I'm to > blame. I didn't read the RFC series, so I missed that there was past discussion on this point. Perhaps something to roll into an updated commit message? My reading of the current version is that it suggests that the ls-tree-like output is important to get at the data we need, which my patch-for-discussion shows isn't the case. > [...] Once you have different types included in the index, then > ls-files has to print all the same fields that ls-tree does...so why > not make it look similar? I don't have a problem with how the output looks, I happen to like the ls-tree output better, I've just been suggesting that differing output == code duplication. In any case. I'm sorry about any comments I've made that came across as snarky or whatever.
Since we're talking in a text-based medium I'm going to take the reading of a third-party native speaker (you) over mine. I didn't mean any comments I've made that way; I'm very interested in seeing this feature land, and just want to try to help it along. Given the size of this thread over a relatively trivial matter I think that "help" is probably counterproductive at this point. I don't think this is critical or needs to be done or whatever. I've only kept up this thread for the reasons stated above, i.e. it seeming to me to be based on the premise that we can't add certain code to builtin/*.c, and if we can get around that we can make this simpler.
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table 2021-03-28 15:31 ` Ævar Arnfjörð Bjarmason @ 2021-03-29 19:46 ` Derrick Stolee 2021-03-29 21:44 ` Junio C Hamano 2021-03-29 23:06 ` Ævar Arnfjörð Bjarmason 2021-03-29 22:02 ` Elijah Newren 1 sibling, 2 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-29 19:46 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Elijah Newren Cc: Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On 3/28/2021 11:31 AM, Ævar Arnfjörð Bjarmason wrote:> It seems to me that the reason for that state is based on a > misunderstanding about what we would and wouldn't add to builtin/*.c, > i.e. that we wouldn't have something like a --debug option, but as > ls-files shows that's not a problem. I feel _strongly_ that a change to the user-facing CLI should come with a good reason and care about how it locks-in behavior for the future. Any adjustment to 'git ls-files' deserves its own series and attention, not in an already-too-large series like this one. I'm not happy that this series and the next are so long, but that's the best I can do to make them reviewable and still capture a complete scenario. Hopefully the remaining series after these first two are smaller. Things like "what should 'git ls-files' do with a sparse index?" can fit cleanly on top once the core functionality of the internals are stable. I have an _opinion_ that the ls-files output is not well-suited to testing because the --debug output splits details across multiple lines. This is a minor point that could probably be corrected by a complicated script method, but that's why I list this as an opinion. 
> I mean it's fine if it's just a "I don't think this is important and > don't want to spend time on it, but it seems like a good idea", in which > case others have the option of re-rolling some of these patches if they > care (at this point I wouldn't). > > Or "this is just a bad idea for XYZ reason", which is also fine, and > even more valuable to document for future work in the area. > > But to have another series built on this with refactorings back and > forth before code's landed on master just seems like needless churn. I think changing 'ls-files' before the sparse index has stabilized is premature. I said that a series like the RFC you sent would be appropriate after this concept is more stable. I do _not_ recommend trying to juggle it on top of the work while the patches are in flight. Thanks, -Stolee
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table 2021-03-29 19:46 ` Derrick Stolee @ 2021-03-29 21:44 ` Junio C Hamano 2021-03-30 11:28 ` Derrick Stolee 2021-03-29 23:06 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 203+ messages in thread From: Junio C Hamano @ 2021-03-29 21:44 UTC (permalink / raw) To: Derrick Stolee Cc: Ævar Arnfjörð Bjarmason, Elijah Newren, Derrick Stolee via GitGitGadget, Git Mailing List, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee Derrick Stolee <stolee@gmail.com> writes: > I think changing 'ls-files' before the sparse index has stabilized is > premature. I said that a series like the RFC you sent would be > appropriate after this concept is more stable. I do _not_ recommend > trying to juggle it on top of the work while the patches are in flight. I do not have a problem with either of approaches to help debugging (i.e. extending "ls-files --debug" or a new test helper), but I am curious to be reminded what the plan for "git ls-files [-s]" output is, when run in a repository in which sparse cone checkout is used. Do we see trees and paths outside the cone omitted, or does the act of running "ls-files" dehydrate the trees into their constituent blobs? Thanks.
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table 2021-03-29 21:44 ` Junio C Hamano @ 2021-03-30 11:28 ` Derrick Stolee 0 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-30 11:28 UTC (permalink / raw) To: Junio C Hamano Cc: Ævar Arnfjörð Bjarmason, Elijah Newren, Derrick Stolee via GitGitGadget, Git Mailing List, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On 3/29/2021 5:44 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >> I think changing 'ls-files' before the sparse index has stabilized is >> premature. I said that a series like the RFC you sent would be >> appropriate after this concept is more stable. I do _not_ recommend >> trying to juggle it on top of the work while the patches are in flight. > > I do not have a problem with either of approaches to help debugging > (i.e. extending "ls-files --debug" or a new test helper), but I am > curious to be reminded what the plan for "git ls-files [-s]" output > is, when run in a repository in which sparse cone checkout is used. > > Do we see trees and paths outside the cone omitted, or does the act > of running "ls-files" dehydrate the trees into their constituent > blobs? At the moment, end-to-end behavior is identical as before: sparse directory entries are expanded to all of the contained blobs instead of writing the tree entries. The sparse-index work will not be complete until every command is audited for potential behavior change when disabling the command_requires_full_index setting. That includes deciding what is the best decision for ls-files, and will likely include an option for both possible outputs (tree entries, or expanding to blobs). The interesting discussion that is worth its own topic is whether or not the tree entries should be displayed by default. So the plan is: this _will_ be addressed, but in the future after the core functionality and value of the sparse-index is set. 
Thanks, -Stolee
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table 2021-03-29 19:46 ` Derrick Stolee 2021-03-29 21:44 ` Junio C Hamano @ 2021-03-29 23:06 ` Ævar Arnfjörð Bjarmason 2021-03-30 11:41 ` Derrick Stolee 1 sibling, 1 reply; 203+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-03-29 23:06 UTC (permalink / raw) To: Derrick Stolee Cc: Elijah Newren, Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On Mon, Mar 29 2021, Derrick Stolee wrote: > On 3/28/2021 11:31 AM, Ævar Arnfjörð Bjarmason wrote:> It seems to me that the reason for that state is based on a >> misunderstanding about what we would and wouldn't add to builtin/*.c, >> i.e. that we wouldn't have something like a --debug option, but as >> ls-files shows that's not a problem. At the risk of going in circles here... > I feel _strongly_ that a change to the user-facing CLI should come > with a good reason and care about how it locks-in behavior for the > future. And I agree with you. Where we disagree is whether something that lives in builtin/*.c == user-facing. I think --debug options are != that. It seems Junio downthread agrees with that. > Any adjustment to 'git ls-files' deserves its own series and > attention[...] A user-facing change to it, yes, but I don't see how use of an (even existing) --debug option would warrant any more attention than a new test helper; less, actually, since it's less new code. > [...] not in an already-too-large series like this one. The alternative way of doing it at the end of https://lore.kernel.org/git/874kgzq4qi.fsf@evledraar.gmail.com would make this series smaller. Anyway. As I noted in the E-Mail you're replying to (https://lore.kernel.org/git/87eeg0ng78.fsf@evledraar.gmail.com/) I really don't care that much.
I'm just still perplexed at how you keep bringing up use of an internal-only --debug option as "user-facing", and here "already too large" when we're talking about a proposed alternate direction that would reduce the size. > I'm not happy that this series and the next are so long, but that's > the best I can do to make them reviewable and still capture a > complete scenario. Hopefully the remaining series after these first > two are smaller. Things like "what should 'git ls-files' do with a > sparse index?" can fit cleanly on top once the core functionality > of the internals are stable. Sure. I'm fully on board with just moving forward with this in some manner. I'm not on board with the part of this that seems like it could just be rephrased/understood as "...and we're not touching ls-files even with a --debug option now because that would be user-facing[...]". > I have an _opinion_ that the ls-files output is not well-suited to > testing because the --debug output splits details across multiple > lines. This is a minor point that could probably be corrected by > a complicated script method, but that's why I list this as an > opinion. If the output --debug is spewing now isn't handy, we can just change the output format. The docs say: This is intended to show as much information as possible for manual inspection; the exact format may change at any time.
And we don't have existing in-tree users, something like this would make it rather trivial: diff --git a/builtin/ls-files.c b/builtin/ls-files.c index f6f9e483b27..7596edc9f9d 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -113,11 +113,11 @@ static void print_debug(const struct cache_entry *ce) if (debug_mode) { const struct stat_data *sd = &ce->ce_stat_data; - printf(" ctime: %u:%u\n", sd->sd_ctime.sec, sd->sd_ctime.nsec); - printf(" mtime: %u:%u\n", sd->sd_mtime.sec, sd->sd_mtime.nsec); - printf(" dev: %u\tino: %u\n", sd->sd_dev, sd->sd_ino); - printf(" uid: %u\tgid: %u\n", sd->sd_uid, sd->sd_gid); - printf(" size: %u\tflags: %x\n", sd->sd_size, ce->ce_flags); + printf(" ctime: %u:%u%c", sd->sd_ctime.sec, sd->sd_ctime.nsec, line_terminator); + printf(" mtime: %u:%u%c", sd->sd_mtime.sec, sd->sd_mtime.nsec, line_terminator); + printf(" dev: %u\tino: %u%c", sd->sd_dev, sd->sd_ino, line_terminator); + printf(" uid: %u\tgid: %u%c", sd->sd_uid, sd->sd_gid, line_terminator); + printf(" size: %u\tflags: %x%c", sd->sd_size, ce->ce_flags, line_terminator); } } But even without that it wouldn't be some complicated post-processing, just a pipe to a small perl or awk process. >> I mean it's fine if it's just a "I don't think this is important and >> don't want to spend time on it, but it seems like a good idea", in which >> case others have the option of re-rolling some of these patches if they >> care (at this point I wouldn't). >> >> Or "this is just a bad idea for XYZ reason", which is also fine, and >> even more valuable to document for future work in the area. >> >> But to have another series built on this with refactorings back and >> forth before code's landed on master just seems like needless churn. > > I think changing 'ls-files' before the sparse index has stabilized is > premature. I said that a series like the RFC you sent would be > appropriate after this concept is more stable. 
I do _not_ recommend > trying to juggle it on top of the work while the patches are in flight. Just to clarify, upthread in [1] you said: And I recommend that you continue to pursue [these RFC patches] as an independent series, but I'm not going to incorporate them into this one[...] So do I understand it right that you're referring to phase IV in your opinion being the first point where we'd consider piggy-backing on anything in builtin (that "user-facing" dilemma again...). But at that point wouldn't you have your own ideas about some user-facing ls-files or other porcelain for this, so I'm not sure where to place the encouragement that I continue to pursue that RFC series, other than setting a reminder in my calendar for 6-12 months in the future :) 1. https://lore.kernel.org/git/ca8a96a4-5897-2484-b195-57e5b3820576@gmail.com/
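Ævar's point that the multi-line `--debug` output needs only light post-processing ("a pipe to a small perl or awk process") can be sketched in C as well. This is a minimal sketch of a hypothetical helper, `fold_debug_output()`, which is not part of Git. It assumes, as in `print_debug()` above, that every stat line belonging to a path is indented with whitespace, and folds each such line onto the preceding path line with a tab, so each index entry ends up on one line that a line-based diff can compare.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fold the indented continuation lines that
 * "git ls-files --debug" prints after each path onto that path's line.
 * A newline followed by an indented line becomes a single tab and the
 * indentation is dropped; tabs already inside a line are kept as-is.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * `out` is too small.
 */
static long fold_debug_output(const char *in, char *out, size_t outsz)
{
	size_t n = 0;

	for (const char *p = in; *p; p++) {
		char c = *p;

		if (c == '\n' && (p[1] == ' ' || p[1] == '\t')) {
			c = '\t';
			/* skip the continuation line's indentation */
			while (p[1] == ' ' || p[1] == '\t')
				p++;
		}
		if (n + 1 >= outsz)
			return -1;	/* keep room for the terminating NUL */
		out[n++] = c;
	}
	if (outsz == 0)
		return -1;
	out[n] = '\0';
	return (long)n;
}
```

For example, the input `"a.txt\n  ctime: 1:2\n  mtime: 3:4\n"` becomes `"a.txt\tctime: 1:2\tmtime: 3:4\n"`. The alternative Ævar shows in the diff above, printing `line_terminator` instead of `\n`, avoids the post-processing step entirely.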
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table 2021-03-29 23:06 ` Ævar Arnfjörð Bjarmason @ 2021-03-30 11:41 ` Derrick Stolee 0 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee @ 2021-03-30 11:41 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Elijah Newren, Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee On 3/29/2021 7:06 PM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Mar 29 2021, Derrick Stolee wrote: > >> On 3/28/2021 11:31 AM, Ævar Arnfjörð Bjarmason wrote:> It seems to me that the reason for that state is based on a >>> misunderstanding about what we would and wouldn't add to builtin/*.c, >>> i.e. that we wouldn't have something like a --debug option, but as >>> ls-files shows that's not a problem. > > At the risk of going in circles here... > >> I feel _strongly_ that a change to the user-facing CLI should come >> with a good reason and care about how it locks-in behavior for the >> future. > > And I agree with you. Where we disagree is whether lives in builtin/*.c > == user-facing. I think --debug options are != that. It seems Junio > downthread agrees with that. > >> Any adjustment to 'git ls-files' deserves its own series and >> attention[...] > > A user-facing change to it yes, but I don't see how use of an (existing > even) --debug option would warrant any more attention than a new test > helper, less actually, it's less new code. I disagree that we can change the expected output of --debug so quickly, despite warnings in the documentation. Changing that format or creating a new output format requires cognitive load, and we have enough of that going on in this area as it is. >> [...] not in an already-too-large series like this one. ... 
> I'm just still perplexed at how you keep bringing up use of an > internal-only --debug option as "user-facing", and here "already too > large" when we're talking about a proposed alternate direction that > would reduce the size. I'm not saying "patch size" or "code size" but instead thinking of it in terms of how many decisions need to be made. Changing a builtin when it's not necessary adds to the complexity of the series and interrupts its core goals. Finally, I have mentioned that I will need extra data for testing a new index format. I don't want to modify the builtin now in a way that is insufficient for the needs in that future series. > Just to clarify, upthread in [1] you said: > > And I recommend that you continue to pursue [these RFC patches] as > an independent series, but I'm not going to incorporate them into > this one[...] > > So do I understand it right that you're referring to phase IV in your > opinion being the first point where we'd consider piggy-backing on > anything in builtin (that "user-facing" dilemma again...). I'm saying that if you feel strongly about it, then please pursue the changes to ls-files any time after this series (but probably after the next) solidifies. Having the changes be in a separate series allows time to inspect the behavior change to the builtin in a focused way. > But at that point wouldn't you have your own ideas about some > user-facing ls-files or other porcelain for this, so I'm not sure where > to place the encouragement that I continue to pursue that RFC series, > other than setting a reminder in my calendar for 6-12 months in the > future :) Otherwise, I will modify ls-files myself in this 6-12 month timeframe, based on the established plan to remove the command_requires_full_index setting. Thanks, -Stolee
* Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table 2021-03-28 15:31 ` Ævar Arnfjörð Bjarmason 2021-03-29 19:46 ` Derrick Stolee @ 2021-03-29 22:02 ` Elijah Newren 1 sibling, 0 replies; 203+ messages in thread From: Elijah Newren @ 2021-03-29 22:02 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor, Derrick Stolee, Derrick Stolee Hi, On Sun, Mar 28, 2021 at 8:31 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Fri, Mar 26 2021, Elijah Newren wrote: > [...] > > You are correct that this will span multiple releases; Stolee already > > said he was planning to be working on this for most of 2021. But just > > because pieces of the code exist and are shipped doesn't mean it'll be > > announced or supported. For example, the git-2.30 and git-2.31 > > release notes were completely silent about merge-ort. It existed in > > both releases; in fact, the version that ships in git-2.31, could > > theoretically be used successfully by the vast majority of users for > > their daily workflow. (But it does have known shortcomings and test > > failures so I definitely did *not* want it to be announced at that > > time.) > > Yes, and that's fine. But if you'd been bending over backwards to add > merge-ort to t/helper/ "because it's not ready yet" or something I'd > have probably commented to the effect of "can't we just add it as part > of builtins but not advertise it?" which is what you did :) Actually, I did add a t/helper/test-fast-rebase.c (which is a few hundred lines long) as part of the work on merge-ort, because merge-ort wasn't ready and because rewiring sequencer.c was a huge amount of work that I didn't want to get distracted by at the time. I originally suggested making fast-rebase a non-advertised builtin, but multiple reviewers suggested the test helper route instead. 
¯\_(ツ)_/¯
* [PATCH v4 08/20] test-tool: don't force full index 2021-03-23 13:44 ` [PATCH v4 " Derrick Stolee via GitGitGadget ` (6 preceding siblings ...) 2021-03-23 13:44 ` [PATCH v4 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 ` Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget ` (13 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We will use 'test-tool read-cache --table' to check that a sparse index is written as part of init_repos. Since we will no longer always expand a sparse index into a full index, add an '--expand' parameter that adds a call to ensure_full_index() so we can compare a sparse index directly against a full index, or at least what the in-memory index looks like when expanded in this way. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/helper/test-read-cache.c | 13 ++++++++++++- t/t1092-sparse-checkout-compatibility.sh | 5 +++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 6cfd8f2de71c..b52c174acc7a 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -4,6 +4,7 @@ #include "blob.h" #include "commit.h" #include "tree.h" +#include "sparse-index.h" static void print_cache_entry(struct cache_entry *ce) { @@ -35,13 +36,19 @@ int cmd__read_cache(int argc, const char **argv) struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; - int table = 0; + int table = 0, expand = 0; + + initialize_the_repository(); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) continue; if (!strcmp(*argv, "--table")) table = 1; + else if (!strcmp(*argv, "--expand")) + expand = 1; } if (argc == 1) @@ -51,6 +58,10 @@ int cmd__read_cache(int argc, const char **argv) for (i = 0; i < cnt; i++) { repo_read_index(r); + + if (expand) + ensure_full_index(r->index); + if (name) { int pos; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index de5d8461c993..a1aea141c62c 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -130,6 +130,11 @@ test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'expanded in-memory index matches full index' ' + init_repos && + test_sparse_match test-tool read-cache --expand --table +' + test_expect_success 'status with options' ' init_repos && test_all_match git status --porcelain=v2 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
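To make the `--expand` comparison concrete, here is a toy model of what expanding a sparse index means: each sparse directory entry (a path ending in `/`) is replaced by the file entries it stands for, each marked skip-worktree, so the result can be compared entry by entry with a full index. This is only a sketch under stated assumptions. `struct toy_entry`, `struct toy_tree` and `toy_expand()` are hypothetical, and the real ensure_full_index() reads tree objects from the object database rather than a lookup table.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy index entry; not Git's struct cache_entry. */
struct toy_entry {
	const char *path;	/* a sparse directory entry ends with '/' */
	int skip_worktree;	/* models CE_SKIP_WORKTREE */
};

/* Hypothetical stand-in for a tree object: the files under one directory. */
struct toy_tree {
	const char *dir;	/* e.g. "folder1/" */
	const char *files[4];	/* full paths in index order, NULL-terminated */
};

static int is_sparse_dir(const struct toy_entry *e)
{
	size_t len = strlen(e->path);

	return len > 0 && e->path[len - 1] == '/';
}

/*
 * Copy `in` to `out`, replacing each sparse directory entry with one
 * skip-worktree entry per file listed for it in `trees`. Returns the
 * number of entries written, or -1 if `out` is too small or a sparse
 * directory has no matching tree.
 */
static int toy_expand(const struct toy_entry *in, int nr,
		      const struct toy_tree *trees, int nr_trees,
		      struct toy_entry *out, int alloc)
{
	int n = 0;

	for (int i = 0; i < nr; i++) {
		const struct toy_tree *t = NULL;

		if (!is_sparse_dir(&in[i])) {
			if (n >= alloc)
				return -1;
			out[n++] = in[i];	/* regular entries pass through */
			continue;
		}
		for (int j = 0; j < nr_trees && !t; j++)
			if (!strcmp(trees[j].dir, in[i].path))
				t = &trees[j];
		if (!t)
			return -1;
		for (int k = 0; k < 4 && t->files[k]; k++) {
			if (n >= alloc)
				return -1;
			out[n].path = t->files[k];
			out[n].skip_worktree = 1;
			n++;
		}
	}
	return n;
}
```

With the entries `a`, `folder1/` and `z`, and a tree for `folder1/` listing `folder1/0/a` and `folder1/a`, `toy_expand()` yields four entries in index order. The two new ones are marked skip-worktree. Because a sparse directory entry sorts where its contents would, splicing the contents in place keeps the list sorted.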
* [PATCH v4 09/20] unpack-trees: ensure full index 2021-03-23 13:44 ` [PATCH v4 " Derrick Stolee via GitGitGadget ` (7 preceding siblings ...) 2021-03-23 13:44 ` [PATCH v4 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 ` Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget ` (12 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The next change will translate full indexes into sparse indexes at write time. The existing logic provides a way for every sparse index to be expanded to a full index at read time. However, there are cases where an index is written and then continues to be used in-memory to perform further updates. unpack_trees() is frequently called after such a write. In particular, commands like 'git reset' do this double-update of the index. Ensure that we have a full index when entering unpack_trees(), but only when command_requires_full_index is true. This is always true at the moment, but we will later relax that after unpack_trees() is updated to handle sparse directory entries. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/unpack-trees.c b/unpack-trees.c index f5f668f532d8..4dd99219073a 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1567,6 +1567,7 @@ static int verify_absent(const struct cache_entry *, */ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o) { + struct repository *repo = the_repository; int i, ret; static struct cache_entry *dfc; struct pattern_list pl; @@ -1578,6 +1579,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options trace_performance_enter(); trace2_region_enter("unpack_trees", "unpack_trees", the_repository); + prepare_repo_settings(repo); + if (repo->settings.command_requires_full_index) { + ensure_full_index(o->src_index); + ensure_full_index(o->dst_index); + } + if (!core_apply_sparse_checkout || !o->update) o->skip_sparse_checkout = 1; if (!o->skip_sparse_checkout && !o->pl) { -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
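The guard added to unpack_trees() follows a simple pattern. It expands only when the command has not declared itself sparse-aware, and it relies on expansion being idempotent, so calling it on an index that is already full is harmless. The sketch below models that pattern with hypothetical toy types (`toy_settings`, `toy_index`); it is not Git's code. It also tolerates a NULL destination index, which is an assumption of this sketch rather than a claim about unpack_trees().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not Git's struct repo_settings or index_state. */
struct toy_settings {
	bool command_requires_full_index;
};

struct toy_index {
	bool sparse;		/* true if it contains sparse directory entries */
	int expand_calls;	/* counts real expansions, for illustration */
};

/* Idempotent: expanding an already-full (or missing) index does nothing. */
static void toy_ensure_full_index(struct toy_index *istate)
{
	if (!istate || !istate->sparse)
		return;
	istate->sparse = false;
	istate->expand_calls++;
}

/* Mirrors the shape of the guard added at the top of unpack_trees(). */
static void toy_unpack_trees_prologue(const struct toy_settings *s,
				      struct toy_index *src,
				      struct toy_index *dst)
{
	if (s->command_requires_full_index) {
		toy_ensure_full_index(src);
		toy_ensure_full_index(dst);
	}
}
```

While every command still has `command_requires_full_index` set, the guard always fires; once unpack_trees() learns to handle sparse directory entries, flipping that setting for a command is enough to skip the expansion.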
* [PATCH v4 10/20] sparse-checkout: hold pattern list in index 2021-03-23 13:44 ` [PATCH v4 " Derrick Stolee via GitGitGadget ` (8 preceding siblings ...) 2021-03-23 13:44 ` [PATCH v4 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 ` Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 11/20] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget ` (11 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> As we modify the sparse-checkout definition, we perform index operations on a pattern_list that only exists in-memory. This allows easy backing out in case the index update fails. However, if the index write itself cares about the sparse-checkout pattern set, we need access to that in-memory copy. Place a pointer to a 'struct pattern_list' in the index so we can access this on-demand. This will be used in the next change which uses the sparse-checkout definition to filter out directories that are outside the sparse cone. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/sparse-checkout.c | 17 ++++++++++------- cache.h | 2 ++ 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index 2306a9ad98e0..e00b82af727b 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl) if (is_index_unborn(r->index)) return UPDATE_SPARSITY_SUCCESS; + r->index->sparse_checkout_patterns = pl; + memset(&o, 0, sizeof(o)); o.verbose_update = isatty(2); o.update = 1; @@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl) else rollback_lock_file(&lock_file); + r->index->sparse_checkout_patterns = NULL; return result; } @@ -517,19 +520,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) { int result; int changed_config = 0; - struct pattern_list pl; - memset(&pl, 0, sizeof(pl)); + struct pattern_list *pl = xcalloc(1, sizeof(*pl)); switch (m) { case ADD: if (core_sparse_checkout_cone) - add_patterns_cone_mode(argc, argv, &pl); + add_patterns_cone_mode(argc, argv, pl); else - add_patterns_literal(argc, argv, &pl); + add_patterns_literal(argc, argv, pl); break; case REPLACE: - add_patterns_from_input(&pl, argc, argv); + add_patterns_from_input(pl, argc, argv); break; } @@ -539,12 +541,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) changed_config = 1; } - result = write_patterns_and_update(&pl); + result = write_patterns_and_update(pl); if (result && changed_config) set_config(MODE_NO_PATTERNS); - clear_pattern_list(&pl); + clear_pattern_list(pl); + free(pl); return result; } diff --git a/cache.h b/cache.h index 136dd496c95d..8c4464420d0a 100644 --- a/cache.h +++ b/cache.h @@ -307,6 +307,7 @@ static inline unsigned int canon_mode(unsigned int mode) struct split_index; struct untracked_cache; struct progress; +struct pattern_list; struct 
index_state { struct cache_entry **cache; @@ -338,6 +339,7 @@ struct index_state { struct mem_pool *ce_mem_pool; struct progress *progress; struct repository *repo; + struct pattern_list *sparse_checkout_patterns; }; /* Name hashing */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
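The ownership rule in this patch is that the index only borrows the pattern list. `update_working_directory()` points `sparse_checkout_patterns` at the caller's list for the duration of the update and resets it to NULL before returning, while `modify_pattern_list()` still clears and frees the list itself. Below is a minimal sketch of that borrow-and-reset pattern, using hypothetical toy types rather than Git's `struct index_state` and `struct pattern_list`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct pattern_list and struct index_state. */
struct toy_patterns {
	int nr;
};

struct toy_index {
	struct toy_patterns *sparse_checkout_patterns;	/* borrowed, never owned */
};

/* A write step that consults the borrowed patterns when they are present. */
static int toy_write_index(const struct toy_index *istate)
{
	if (!istate->sparse_checkout_patterns)
		return -1;	/* no in-memory patterns available */
	return istate->sparse_checkout_patterns->nr;
}

static int toy_update_working_directory(struct toy_index *istate,
					struct toy_patterns *pl)
{
	int result;

	istate->sparse_checkout_patterns = pl;		/* lend the caller's list */
	result = toy_write_index(istate);
	istate->sparse_checkout_patterns = NULL;	/* drop it before returning */
	return result;
}
```

Resetting the pointer before returning means the caller remains free to clear and free the list afterwards without leaving the index holding a dangling pointer.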
* [PATCH v4 11/20] sparse-index: convert from full to sparse 2021-03-23 13:44 ` [PATCH v4 " Derrick Stolee via GitGitGadget ` (9 preceding siblings ...) 2021-03-23 13:44 ` [PATCH v4 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 ` Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 12/20] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget ` (10 subsequent siblings) 21 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. 
We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compatibility.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 3 + cache.h | 2 + read-cache.c | 26 ++++- sparse-index.c | 139 +++++++++++++++++++++++ sparse-index.h | 1 + t/t1092-sparse-checkout-compatibility.sh | 61 +++++++++- 6 files changed, 228 insertions(+), 4 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 2fb483d3c083..5f07a39e501e 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -6,6 +6,7 @@ #include "object-store.h" #include "replace-object.h" #include "promisor-remote.h" +#include "sparse-index.h" #ifndef DEBUG_CACHE_TREE #define DEBUG_CACHE_TREE 0 @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags) if (i) return i; + ensure_full_index(istate); + if (!istate->cache_tree) istate->cache_tree = cache_tree(); diff --git a/cache.h b/cache.h index 8c4464420d0a..74b43aaa2bd1 100644 --- a/cache.h +++ b/cache.h @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) { if (S_ISLNK(mode)) return S_IFLNK; + if (S_ISSPARSEDIR(mode)) + return S_IFDIR; if (S_ISDIR(mode) || S_ISGITLINK(mode)) return S_IFGITLINK; return S_IFREG | ce_permissions(mode); diff --git a/read-cache.c b/read-cache.c index dd3980c12b53..b9c08773466c 100644 --- a/read-cache.c +++ b/read-cache.c @@ -25,6 +25,7 @@ #include "fsmonitor.h" #include "thread-utils.h" #include "progress.h" +#include "sparse-index.h" /* Mask for the name length in ce_flags in the on-disk index */ @@ -1002,8 +1003,14 @@ int verify_path(const char *path, unsigned mode) c = *path++; if ((c == '.' && !verify_dotfile(path, mode)) || - is_dir_sep(c) || c == '\0') + is_dir_sep(c)) return 0; + /* + * allow terminating directory separators for + * sparse directory entries. 
+ */ + if (c == '\0') + return S_ISDIR(mode); } else if (c == '\\' && protect_ntfs) { if (is_ntfs_dotgit(path)) return 0; @@ -3079,6 +3086,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l unsigned flags) { int ret; + int was_full = !istate->sparse_index; + + ret = convert_to_sparse(istate); + + if (ret) { + warning(_("failed to convert to a sparse-index")); + return ret; + } /* * TODO trace2: replace "the_repository" with the actual repo instance @@ -3090,6 +3105,9 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l trace2_region_leave_printf("index", "do_write_index", the_repository, "%s", get_lock_file_path(lock)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; if (flags & COMMIT_LOCK) @@ -3180,9 +3198,10 @@ static int write_shared_index(struct index_state *istate, struct tempfile **temp) { struct split_index *si = istate->split_index; - int ret; + int ret, was_full = !istate->sparse_index; move_cache_to_base_index(istate); + convert_to_sparse(istate); trace2_region_enter_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); @@ -3190,6 +3209,9 @@ static int write_shared_index(struct index_state *istate, trace2_region_leave_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; ret = adjust_shared_perm(get_tempfile_path(*temp)); diff --git a/sparse-index.c b/sparse-index.c index 7095378a1b28..619ff7c2e217 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -4,6 +4,145 @@ #include "tree.h" #include "pathspec.h" #include "trace2.h" +#include "cache-tree.h" +#include "config.h" +#include "dir.h" +#include "fsmonitor.h" + +static struct cache_entry *construct_sparse_dir_entry( + struct index_state *istate, + const char *sparse_dir, + struct cache_tree *tree) +{ + struct cache_entry *de; + + de = make_cache_entry(istate, S_IFDIR, &tree->oid, 
sparse_dir, 0, 0); + + de->ce_flags |= CE_SKIP_WORKTREE; + return de; +} + +/* + * Returns the number of entries "inserted" into the index. + */ +static int convert_to_sparse_rec(struct index_state *istate, + int num_converted, + int start, int end, + const char *ct_path, size_t ct_pathlen, + struct cache_tree *ct) +{ + int i, can_convert = 1; + int start_converted = num_converted; + enum pattern_match_result match; + int dtype; + struct strbuf child_path = STRBUF_INIT; + struct pattern_list *pl = istate->sparse_checkout_patterns; + + /* + * Is the current path outside of the sparse cone? + * Then check if the region can be replaced by a sparse + * directory entry (everything is sparse and merged). + */ + match = path_matches_pattern_list(ct_path, ct_pathlen, + NULL, &dtype, pl, istate); + if (match != NOT_MATCHED) + can_convert = 0; + + for (i = start; can_convert && i < end; i++) { + struct cache_entry *ce = istate->cache[i]; + + if (ce_stage(ce) || + !(ce->ce_flags & CE_SKIP_WORKTREE)) + can_convert = 0; + } + + if (can_convert) { + struct cache_entry *se; + se = construct_sparse_dir_entry(istate, ct_path, ct); + + istate->cache[num_converted++] = se; + return 1; + } + + for (i = start; i < end; ) { + int count, span, pos = -1; + const char *base, *slash; + struct cache_entry *ce = istate->cache[i]; + + /* + * Detect if this is a normal entry outside of any subtree + * entry. 
+ */ + base = ce->name + ct_pathlen; + slash = strchr(base, '/'); + + if (slash) + pos = cache_tree_subtree_pos(ct, base, slash - base); + + if (pos < 0) { + istate->cache[num_converted++] = ce; + i++; + continue; + } + + strbuf_setlen(&child_path, 0); + strbuf_add(&child_path, ce->name, slash - ce->name + 1); + + span = ct->down[pos]->cache_tree->entry_count; + count = convert_to_sparse_rec(istate, + num_converted, i, i + span, + child_path.buf, child_path.len, + ct->down[pos]->cache_tree); + num_converted += count; + i += span; + } + + strbuf_release(&child_path); + return num_converted - start_converted; +} + +int convert_to_sparse(struct index_state *istate) +{ + if (istate->split_index || istate->sparse_index || + !core_apply_sparse_checkout || !core_sparse_checkout_cone) + return 0; + + /* + * For now, only create a sparse index with the + * GIT_TEST_SPARSE_INDEX environment variable. We will relax + * this once we have a proper way to opt-in (and later still, + * opt-out). + */ + if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + return 0; + + if (!istate->sparse_checkout_patterns) { + istate->sparse_checkout_patterns = xcalloc(1, sizeof(struct pattern_list)); + if (get_sparse_checkout_patterns(istate->sparse_checkout_patterns) < 0) + return 0; + } + + if (!istate->sparse_checkout_patterns->use_cone_patterns) { + warning(_("attempting to use sparse-index without cone mode")); + return -1; + } + + if (cache_tree_update(istate, 0)) { + warning(_("unable to update cache-tree, staying full")); + return -1; + } + + remove_fsmonitor(istate); + + trace2_region_enter("index", "convert_to_sparse", istate->repo); + istate->cache_nr = convert_to_sparse_rec(istate, + 0, 0, istate->cache_nr, + "", 0, istate->cache_tree); + istate->drop_cache_tree = 1; + istate->sparse_index = 1; + trace2_region_leave("index", "convert_to_sparse", istate->repo); + return 0; +} static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) { diff --git 
a/sparse-index.h b/sparse-index.h index 09a20d036c46..64380e121d80 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -3,5 +3,6 @@ struct index_state; void ensure_full_index(struct index_state *istate); +int convert_to_sparse(struct index_state *istate); #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index a1aea141c62c..1e888d195122 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -2,6 +2,11 @@ test_description='compare full workdir to sparse workdir' +# The verify_cache_tree() check is not sparse-aware (yet). +# So, disable the check until that integration is complete. +GIT_TEST_CHECK_CACHE_TREE=0 +GIT_TEST_SPLIT_INDEX=0 + . ./test-lib.sh test_expect_success 'setup' ' @@ -121,7 +126,9 @@ run_on_all () { test_all_match () { run_on_all "$@" && test_cmp full-checkout-out sparse-checkout-out && - test_cmp full-checkout-err sparse-checkout-err + test_cmp full-checkout-out sparse-index-out && + test_cmp full-checkout-err sparse-checkout-err && + test_cmp full-checkout-err sparse-index-err } test_sparse_match () { @@ -130,6 +137,38 @@ test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'sparse-index contents' ' + init_repos && + + test-tool -C sparse-index read-cache --table >cache && + for dir in folder1 folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep/deeper2 folder1 folder2 x + do + TREE=$(git 
-C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done +' + test_expect_success 'expanded in-memory index matches full index' ' init_repos && test_sparse_match test-tool read-cache --expand --table @@ -137,6 +176,7 @@ test_expect_success 'expanded in-memory index matches full index' ' test_expect_success 'status with options' ' init_repos && + test_sparse_match ls && test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && @@ -273,6 +313,17 @@ test_expect_failure 'checkout and reset (mixed)' ' test_all_match git reset update-folder2 ' +# Ensure that sparse-index behaves identically to +# sparse-checkout with a full index. +test_expect_success 'checkout and reset (mixed) [sparse]' ' + init_repos && + + test_sparse_match git checkout -b reset-test update-deep && + test_sparse_match git reset deepest && + test_sparse_match git reset update-folder1 && + test_sparse_match git reset update-folder2 +' + test_expect_success 'merge' ' init_repos && @@ -309,14 +360,20 @@ test_expect_success 'clean' ' test_all_match git status --porcelain=v2 && test_all_match git clean -f && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xdf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && - test_path_is_dir sparse-checkout/folder1 + test_sparse_match test_path_is_dir folder1 ' test_done -- gitgitgadget
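The collapse rule described in the commit message above can be illustrated with a small, self-contained C sketch. It is not Git's convert_to_sparse_rec(): it assumes a flat, sorted array of hypothetical toy_entry records, only looks at top-level directories, and treats a single directory prefix as the whole sparse cone instead of matching cone patterns through the cache-tree. It does show the two key ideas: a directory collapses only when every entry under it is merged (stage 0) and marked skip-worktree, and the array can be rewritten in place because the write position never passes the read position.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a cache entry (not Git's struct). */
struct toy_entry {
	char name[64];      /* full path; a sparse directory ends in '/' */
	int stage;          /* nonzero if the path is unmerged */
	int skip_worktree;  /* nonzero if CE_SKIP_WORKTREE would be set */
};

/* Length of the leading "dir/" component, or 0 for a top-level file. */
static size_t top_dir_len(const char *name)
{
	const char *slash = strchr(name, '/');
	return slash ? (size_t)(slash - name) + 1 : 0;
}

/*
 * Collapse every top-level directory that is outside the cone (here:
 * its prefix differs from "cone_dir") and whose entries are all merged
 * and skip-worktree into a single "dir/" entry. The array is rewritten
 * in place and the new entry count is returned.
 */
static int toy_convert_to_sparse(struct toy_entry *e, int nr,
				 const char *cone_dir)
{
	int i = 0, out = 0;

	while (i < nr) {
		size_t len = top_dir_len(e[i].name);
		int end = i + 1, collapse;

		if (!len) {	/* top-level file: always kept */
			e[out++] = e[i++];
			continue;
		}
		/* Entries are sorted, so one directory is a contiguous run. */
		while (end < nr && !strncmp(e[end].name, e[i].name, len))
			end++;

		/* Only directories outside the cone are candidates. */
		collapse = strlen(cone_dir) != len ||
			   strncmp(cone_dir, e[i].name, len);
		for (int j = i; collapse && j < end; j++)
			if (e[j].stage || !e[j].skip_worktree)
				collapse = 0;

		if (collapse) {
			e[out] = e[i];
			e[out].name[len] = '\0';  /* "dir/a" becomes "dir/" */
			out++;
		} else {
			/* Git recurses into subtrees here; the sketch keeps them. */
			for (int j = i; j < end; j++)
				e[out++] = e[j];
		}
		assert(out <= end);  /* the writer never overtakes the reader */
		i = end;
	}
	return out;
}
```

For example, with "deep/" as the cone, an index holding a.txt, deep/a, folder1/a, folder1/b and two conflicted stages of folder2/a shrinks to five entries: only folder1/ collapses, because folder2/ still contains unmerged stages.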
* [PATCH v4 12/20] submodule: sparse-index should not collapse links From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> A submodule is stored as a "Git link" that actually points to a commit within a submodule. Submodules are populated or not depending on submodule configuration, not sparse-checkout. To ensure that the sparse-index feature integrates correctly with submodules, we should not collapse a directory if there is a Git link within its range.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- sparse-index.c | 1 + t/t1092-sparse-checkout-compatibility.sh | 17 +++++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/sparse-index.c b/sparse-index.c index 619ff7c2e217..7631f7bd00b7 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -52,6 +52,7 @@ static int convert_to_sparse_rec(struct index_state *istate, struct cache_entry *ce = istate->cache[i]; if (ce_stage(ce) || + S_ISGITLINK(ce->ce_mode) || !(ce->ce_flags & CE_SKIP_WORKTREE)) can_convert = 0; } diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 1e888d195122..cba5f89b1e96 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -376,4 +376,21 @@ test_expect_success 'clean' ' test_sparse_match test_path_is_dir folder1 ' +test_expect_success 'submodule handling' ' + init_repos && + + test_all_match mkdir modules && + test_all_match touch modules/a && + test_all_match git add modules && + test_all_match git commit -m "add modules directory" && + + run_on_all git submodule add "$(pwd)/initial-repo" modules/sub && + test_all_match git commit -m "add submodule" && + + # having a submodule prevents "modules" from collapse + test-tool -C sparse-index read-cache --table >cache && + grep "100644 blob .* modules/a" cache && + grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache +' + test_done -- gitgitgadget
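The extra S_ISGITLINK() condition in this patch amounts to a per-entry predicate that blocks collapsing. The sketch below restates it in isolation. The toy_mode_entry struct, the TOY_* macros and both functions are hypothetical stand-ins; the mode values follow Git's convention of recording a submodule (gitlink) entry with mode 160000, which is also what the new test greps for.

```c
#include <assert.h>

/* Mode constants in the octal form Git records in the index. */
#define TOY_S_IFMT      0170000
#define TOY_S_IFGITLINK 0160000  /* submodule "commit" entry */
#define TOY_S_ISGITLINK(m) (((m) & TOY_S_IFMT) == TOY_S_IFGITLINK)

/* Hypothetical, minimal entry: only the fields the predicate needs. */
struct toy_mode_entry {
	unsigned mode;
	int stage;
	int skip_worktree;
};

/* Returns 1 if this entry does not prevent collapsing its parent. */
static int toy_entry_allows_collapse(const struct toy_mode_entry *ce)
{
	if (ce->stage)                  /* unmerged: must stay visible */
		return 0;
	if (TOY_S_ISGITLINK(ce->mode))  /* submodules follow submodule config */
		return 0;
	return ce->skip_worktree != 0;
}

/* Returns 1 if every entry in [start, end) allows collapsing. */
static int toy_range_collapsible(const struct toy_mode_entry *e,
				 int start, int end)
{
	for (int i = start; i < end; i++)
		if (!toy_entry_allows_collapse(&e[i]))
			return 0;
	return 1;
}
```

A regular file (mode 100644) masks to 100000 and passes; a gitlink blocks collapsing even when it carries the skip-worktree bit, mirroring the "modules" directory in the test above.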
* [PATCH v4 13/20] unpack-trees: allow sparse directories From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The index_pos_by_traverse_info() currently throws a BUG() when a directory entry exists exactly in the index. We need to consider that it is possible to have a directory in a sparse index as long as that entry is itself marked with the skip-worktree bit. The 'pos' variable is assigned a negative value if an exact match is not found. Since a directory name can be an exact match, it is no longer an error to have a nonnegative 'pos' value.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 4dd99219073a..0b888dab2246 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -746,9 +746,13 @@ static int index_pos_by_traverse_info(struct name_entry *names, strbuf_make_traverse_path(&name, info, names->path, names->pathlen); strbuf_addch(&name, '/'); pos = index_name_pos(o->src_index, name.buf, name.len); - if (pos >= 0) - BUG("This is a directory and should not exist in index"); - pos = -pos - 1; + if (pos >= 0) { + if (!o->src_index->sparse_index || + !(o->src_index->cache[pos]->ce_flags & CE_SKIP_WORKTREE)) + BUG("This is a directory and should not exist in index"); + } else { + pos = -pos - 1; + } if (pos >= o->src_index->cache_nr || !starts_with(o->src_index->cache[pos]->name, name.buf) || (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name.buf))) -- gitgitgadget
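This change relies on the return convention of index_name_pos(): the position of an exact match, or -(insertion point) - 1 when there is none. A minimal sketch of that convention and of the patched branch follows, over a plain sorted array of names. The toy_* helpers are hypothetical; Git's real lookup also takes the stage into account, and returning -1 here stands in for the BUG() call.

```c
#include <assert.h>
#include <string.h>

/*
 * Binary search over sorted names. Returns the index of an exact match,
 * otherwise -(insert_position) - 1, the same encoding index_name_pos() uses.
 */
static int toy_name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(names[mid], name);

		if (!cmp)
			return mid;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -lo - 1;
}

/*
 * Mirror of the patched logic for a directory name ending in '/':
 * an exact hit is only legal for a skip-worktree sparse directory entry;
 * otherwise the negative result is decoded into an insertion point.
 */
static int toy_dir_pos(const char **names, const int *skip_worktree,
		       int nr, int sparse_index, const char *dir)
{
	int pos = toy_name_pos(names, nr, dir);

	if (pos >= 0) {
		if (!sparse_index || !skip_worktree[pos])
			return -1;  /* where Git would call BUG() */
	} else {
		pos = -pos - 1;
	}
	return pos;
}
```

With the sorted names a.txt, deep/ and folder1/a, looking up "deep/" hits the sparse directory entry at position 1, while "folder1/" has no exact match and decodes to position 2, the first entry under that directory.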
* [PATCH v4 14/20] sparse-index: check index conversion happens From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Add a test case that uses test_region to ensure that we are truly expanding a sparse index to a full one, then converting back to sparse when writing the index. As we integrate more Git commands with the sparse index, we will convert these commands to check that we do _not_ convert the sparse index to a full index and instead stay sparse the entire time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/t1092-sparse-checkout-compatibility.sh | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index cba5f89b1e96..47f983217852 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -393,4 +393,22 @@ test_expect_success 'submodule handling' ' grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache ' +test_expect_success 'sparse-index is expanded and converted back' ' + init_repos && + + ( + GIT_TEST_SPARSE_INDEX=1 && + export GIT_TEST_SPARSE_INDEX && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt + ) +' + test_done -- gitgitgadget
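The test relies on test_region finding both the start and the end of a named trace2 region in the GIT_TRACE2_EVENT output. As a rough illustration of that check, and not a port of the shell helper, the C sketch below scans newline-separated JSON events for a "region_enter" and a "region_leave" line carrying a given category and label. It assumes the "category" and "label" fields appear adjacent and in that order; treat that layout as an assumption, not a guarantee of the trace2 format. The toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if some line of "trace" contains both "event" and the
 * "category":"<cat>","label":"<label>" pair (assumed field layout).
 */
static int toy_line_has(const char *trace, const char *event,
			const char *cat, const char *label)
{
	char needle[256];
	const char *line = trace;

	snprintf(needle, sizeof(needle),
		 "\"category\":\"%s\",\"label\":\"%s\"", cat, label);
	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		char buf[1024];

		/* Copy one line so strstr() cannot match across lines. */
		if (len >= sizeof(buf))
			len = sizeof(buf) - 1;
		memcpy(buf, line, len);
		buf[len] = '\0';
		if (strstr(buf, event) && strstr(buf, needle))
			return 1;
		line = eol ? eol + 1 : NULL;
	}
	return 0;
}

/* The region must be both entered and left to count as present. */
static int toy_test_region(const char *trace, const char *cat,
			   const char *label)
{
	return toy_line_has(trace, "\"region_enter\"", cat, label) &&
	       toy_line_has(trace, "\"region_leave\"", cat, label);
}
```

Requiring both events means a region that was entered but never left (for example, a command that died mid-conversion) does not satisfy the check.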
* [PATCH v4 15/20] sparse-index: create extension for compatibility From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Previously, we enabled the sparse index format only using GIT_TEST_SPARSE_INDEX=1. This is not a feasible way for users to select this mode. Further, sparse directory entries are not understood by the index formats as advertised. We _could_ add a new index version that explicitly adds these capabilities, but there are nuances to index formats 2, 3, and 4 that are still valuable to select as options. Until we add index format version 5, create a repo extension, "extensions.sparseIndex", that specifies that the tool reading this repository must understand sparse directory entries. This change only encodes the extension and enables it when GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI mechanism.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/config/extensions.txt | 8 ++++++ cache.h | 1 + repo-settings.c | 7 ++++++ repository.h | 3 ++- setup.c | 3 +++ sparse-index.c | 38 +++++++++++++++++++++++++---- 6 files changed, 54 insertions(+), 6 deletions(-) diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index 4e23d73cdcad..c02e09af0046 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -6,3 +6,11 @@ extensions.objectFormat:: Note that this setting should only be set by linkgit:git-init[1] or linkgit:git-clone[1]. Trying to change it after initialization will not work and will produce hard-to-diagnose issues. + +extensions.sparseIndex:: + When combined with `core.sparseCheckout=true` and + `core.sparseCheckoutCone=true`, the index may contain entries + corresponding to directories outside of the sparse-checkout + definition in lieu of containing each path under such directories. + Versions of Git that do not understand this extension do not + expect directory entries in the index. diff --git a/cache.h b/cache.h index 74b43aaa2bd1..8aede373aeb3 100644 --- a/cache.h +++ b/cache.h @@ -1059,6 +1059,7 @@ struct repository_format { int worktree_config; int is_bare; int hash_algo; + int sparse_index; char *work_tree; struct string_list unknown_extensions; struct string_list v1_only_extensions; diff --git a/repo-settings.c b/repo-settings.c index d63569e4041e..9677d50f9238 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -85,4 +85,11 @@ void prepare_repo_settings(struct repository *r) * removed. */ r->settings.command_requires_full_index = 1; + + /* + * Initialize this as off. 
+ */ + r->settings.sparse_index = 0; + if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value) + r->settings.sparse_index = 1; } diff --git a/repository.h b/repository.h index e06a23015697..a45f7520fd9e 100644 --- a/repository.h +++ b/repository.h @@ -42,7 +42,8 @@ struct repo_settings { int core_multi_pack_index; - unsigned command_requires_full_index:1; + unsigned command_requires_full_index:1, + sparse_index:1; }; struct repository { diff --git a/setup.c b/setup.c index c04cd25a30df..cd8394564613 100644 --- a/setup.c +++ b/setup.c @@ -500,6 +500,9 @@ static enum extension_result handle_extension(const char *var, return error("invalid value for 'extensions.objectformat'"); data->hash_algo = format; return EXTENSION_OK; + } else if (!strcmp(ext, "sparseindex")) { + data->sparse_index = 1; + return EXTENSION_OK; } return EXTENSION_UNKNOWN; } diff --git a/sparse-index.c b/sparse-index.c index 7631f7bd00b7..3a6df66faeab 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -102,19 +102,47 @@ static int convert_to_sparse_rec(struct index_state *istate, return num_converted - start_converted; } +static int enable_sparse_index(struct repository *repo) +{ + const char *config_path = repo_git_path(repo, "config.worktree"); + + if (upgrade_repository_format(1) < 0) { + warning(_("unable to upgrade repository format to enable sparse-index")); + return -1; + } + git_config_set_in_file_gently(config_path, + "extensions.sparseIndex", + "true"); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 1; + return 0; +} + int convert_to_sparse(struct index_state *istate) { if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; + if (!istate->repo) + istate->repo = the_repository; + + /* + * The GIT_TEST_SPARSE_INDEX environment variable triggers the + * extensions.sparseIndex config variable to be on. 
+ */ + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { + int err = enable_sparse_index(istate->repo); + if (err < 0) + return err; + } + /* - * For now, only create a sparse index with the - * GIT_TEST_SPARSE_INDEX environment variable. We will relax - * this once we have a proper way to opt-in (and later still, - * opt-out). + * Only convert to sparse if extensions.sparseIndex is set. */ - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + prepare_repo_settings(istate->repo); + if (!istate->repo->settings.sparse_index) return 0; if (!istate->sparse_checkout_patterns) { -- gitgitgadget
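The point of a repository extension is that a reader which does not recognize it refuses the repository instead of misreading sparse directory entries; that is why enable_sparse_index() first upgrades the repository format to version 1. The sketch below models that gate with hypothetical toy_* names. It compares lowercased keys, matching the "sparseindex" string in handle_extension(), and reduces the version-0 behavior (where extensions.* keys are not enforced) to a single check.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if "key" (already lowercased) is in the reader's known list. */
static int toy_is_known(const char *key, const char *const *known,
			int n_known)
{
	for (int k = 0; k < n_known; k++)
		if (!strcmp(key, known[k]))
			return 1;
	return 0;
}

/*
 * Returns 1 if a reader that understands the "known" extensions may use
 * a repository with the given format version and extensions.* keys.
 */
static int toy_can_open_repo(int format_version,
			     const char *const *exts, int n_exts,
			     const char *const *known, int n_known)
{
	if (format_version < 1)
		return 1;  /* simplification: version 0 does not enforce extensions */
	for (int i = 0; i < n_exts; i++)
		if (!toy_is_known(exts[i], known, n_known))
			return 0;  /* unknown extension: refuse rather than misread */
	return 1;
}
```

Under this model, a reader whose known list lacks "sparseindex" refuses a version-1 repository that sets it, but would silently open a version-0 repository, which is exactly the case the format upgrade rules out.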
* [PATCH v4 16/20] sparse-checkout: toggle sparse index from builtin From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The sparse index extension is used to signal that index writes should be in sparse mode. This was only updated using GIT_TEST_SPARSE_INDEX=1. Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that specifies if the sparse index should be used. It also updates the index to use the correct format, either way. Add a warning in the documentation that the use of a repository extension might reduce compatibility with third-party tools. 'git sparse-checkout init' already sets extensions.worktreeConfig, which places most sparse-checkout users outside of the scope of most third-party tools. Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of GIT_TEST_SPARSE_INDEX=1.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/git-sparse-checkout.txt | 14 +++++++ builtin/sparse-checkout.c | 17 ++++++++- sparse-index.c | 37 +++++++++++++------ sparse-index.h | 3 ++ t/t1092-sparse-checkout-compatibility.sh | 47 +++++++++++++----------- 5 files changed, 84 insertions(+), 34 deletions(-) diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index a0eeaeb02ee3..2ff66c5a4e41 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -45,6 +45,20 @@ To avoid interfering with other worktrees, it first enables the When `--cone` is provided, the `core.sparseCheckoutCone` setting is also set, allowing for better performance with a limited set of patterns (see 'CONE PATTERN SET' below). ++ +Use the `--[no-]sparse-index` option to toggle the use of the sparse +index format. This reduces the size of the index to be more closely +aligned with your sparse-checkout definition. This can have significant +performance advantages for commands such as `git status` or `git add`. +This feature is still experimental. Some commands might be slower with +a sparse index until they are properly integrated with the feature. ++ +**WARNING:** Using a sparse index requires modifying the index in a way +that is not completely understood by external tools. If you have trouble +with this compatibility, then run `git sparse-checkout init --no-sparse-index` +to rewrite your index to not be sparse. Older versions of Git will not +understand the `sparseIndex` repository extension and may fail to interact +with your repository until it is disabled. 
'set':: Write a set of patterns to the sparse-checkout file, as given as diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index e00b82af727b..ca63e2c64e95 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -14,6 +14,7 @@ #include "unpack-trees.h" #include "wt-status.h" #include "quote.h" +#include "sparse-index.h" static const char *empty_base = ""; @@ -283,12 +284,13 @@ static int set_config(enum sparse_checkout_mode mode) } static char const * const builtin_sparse_checkout_init_usage[] = { - N_("git sparse-checkout init [--cone]"), + N_("git sparse-checkout init [--cone] [--[no-]sparse-index]"), NULL }; static struct sparse_checkout_init_opts { int cone_mode; + int sparse_index; } init_opts; static int sparse_checkout_init(int argc, const char **argv) @@ -303,11 +305,15 @@ static int sparse_checkout_init(int argc, const char **argv) static struct option builtin_sparse_checkout_init_options[] = { OPT_BOOL(0, "cone", &init_opts.cone_mode, N_("initialize the sparse-checkout in cone mode")), + OPT_BOOL(0, "sparse-index", &init_opts.sparse_index, + N_("toggle the use of a sparse index")), OPT_END(), }; repo_read_index(the_repository); + init_opts.sparse_index = -1; + argc = parse_options(argc, argv, NULL, builtin_sparse_checkout_init_options, builtin_sparse_checkout_init_usage, 0); @@ -326,6 +332,15 @@ static int sparse_checkout_init(int argc, const char **argv) sparse_filename = get_sparse_checkout_filename(); res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL); + if (init_opts.sparse_index >= 0) { + if (set_sparse_index_config(the_repository, init_opts.sparse_index) < 0) + die(_("failed to modify sparse-index config")); + + /* force an index rewrite */ + repo_read_index(the_repository); + the_repository->index->updated_workdir = 1; + } + /* If we already have a sparse-checkout file, use it. 
*/ if (res >= 0) { free(sparse_filename); diff --git a/sparse-index.c b/sparse-index.c index 3a6df66faeab..30c1a11fd62d 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -104,23 +104,37 @@ static int convert_to_sparse_rec(struct index_state *istate, static int enable_sparse_index(struct repository *repo) { - const char *config_path = repo_git_path(repo, "config.worktree"); + int res; if (upgrade_repository_format(1) < 0) { warning(_("unable to upgrade repository format to enable sparse-index")); return -1; } - git_config_set_in_file_gently(config_path, - "extensions.sparseIndex", - "true"); + res = git_config_set_gently("extensions.sparseindex", "true"); prepare_repo_settings(repo); repo->settings.sparse_index = 1; - return 0; + return res; +} + +int set_sparse_index_config(struct repository *repo, int enable) +{ + int res; + + if (enable) + return enable_sparse_index(repo); + + /* Don't downgrade repository format, just remove the extension. */ + res = git_config_set_gently("extensions.sparseindex", NULL); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 0; + return res; } int convert_to_sparse(struct index_state *istate) { + int test_env; if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; @@ -129,14 +143,13 @@ int convert_to_sparse(struct index_state *istate) istate->repo = the_repository; /* - * The GIT_TEST_SPARSE_INDEX environment variable triggers the - * extensions.sparseIndex config variable to be on. + * If GIT_TEST_SPARSE_INDEX=1, then trigger extensions.sparseIndex + * to be fully enabled. If GIT_TEST_SPARSE_INDEX=0 (set explicitly), + * then purposefully disable the setting. 
 	 */
-	if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) {
-		int err = enable_sparse_index(istate->repo);
-		if (err < 0)
-			return err;
-	}
+	test_env = git_env_bool("GIT_TEST_SPARSE_INDEX", -1);
+	if (test_env >= 0)
+		set_sparse_index_config(istate->repo, test_env);
 
 	/*
 	 * Only convert to sparse if extensions.sparseIndex is set.
diff --git a/sparse-index.h b/sparse-index.h
index 64380e121d80..39dcc859735e 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -5,4 +5,7 @@ struct index_state;
 void ensure_full_index(struct index_state *istate);
 int convert_to_sparse(struct index_state *istate);
 
+struct repository;
+int set_sparse_index_config(struct repository *repo, int enable);
+
 #endif
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 47f983217852..f14dc48924d2 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -6,6 +6,7 @@ test_description='compare full workdir to sparse workdir'
 # So, disable the check until that integration is complete.
 GIT_TEST_CHECK_CACHE_TREE=0
 GIT_TEST_SPLIT_INDEX=0
+GIT_TEST_SPARSE_INDEX=
 
 . ./test-lib.sh
 
@@ -100,25 +101,26 @@ init_repos () {
 	# initialize sparse-checkout definitions
 	git -C sparse-checkout sparse-checkout init --cone &&
 	git -C sparse-checkout sparse-checkout set deep &&
-	GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone &&
-	GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep
+	git -C sparse-index sparse-checkout init --cone --sparse-index &&
+	test_cmp_config -C sparse-index true extensions.sparseindex &&
+	git -C sparse-index sparse-checkout set deep
 }
 
 run_on_sparse () {
 	(
 		cd sparse-checkout &&
-		GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err
+		"$@" >../sparse-checkout-out 2>../sparse-checkout-err
 	) &&
 	(
 		cd sparse-index &&
-		GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err
+		"$@" >../sparse-index-out 2>../sparse-index-err
 	)
 }
 
 run_on_all () {
 	(
 		cd full-checkout &&
-		GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err
+		"$@" >../full-checkout-out 2>../full-checkout-err
 	) &&
 	run_on_sparse "$@"
 }
@@ -148,7 +150,7 @@ test_expect_success 'sparse-index contents' '
 			|| return 1
 	done &&
 
-	GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 &&
+	git -C sparse-index sparse-checkout set folder1 &&
 
 	test-tool -C sparse-index read-cache --table >cache &&
 	for dir in deep folder2 x
@@ -158,7 +160,7 @@ test_expect_success 'sparse-index contents' '
 			|| return 1
 	done &&
 
-	GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 &&
+	git -C sparse-index sparse-checkout set deep/deeper1 &&
 
 	test-tool -C sparse-index read-cache --table >cache &&
 	for dir in deep/deeper2 folder1 folder2 x
@@ -166,7 +168,14 @@ test_expect_success 'sparse-index contents' '
 		TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
 		grep "040000 tree $TREE $dir/" cache \
 			|| return 1
-	done
+	done &&
+
+	# Disabling the sparse-index removes tree entries with full ones
+	git -C sparse-index sparse-checkout init --no-sparse-index &&
+
+	test-tool -C sparse-index read-cache --table >cache &&
+	! grep "040000 tree" cache &&
+	test_sparse_match test-tool read-cache --table
 '
 
 test_expect_success 'expanded in-memory index matches full index' '
@@ -396,19 +405,15 @@ test_expect_success 'submodule handling' '
 test_expect_success 'sparse-index is expanded and converted back' '
 	init_repos &&
 
-	(
-		GIT_TEST_SPARSE_INDEX=1 &&
-		export GIT_TEST_SPARSE_INDEX &&
-		GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-			git -C sparse-index -c core.fsmonitor="" reset --hard &&
-		test_region index convert_to_sparse trace2.txt &&
-		test_region index ensure_full_index trace2.txt &&
-
-		rm trace2.txt &&
-		GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
-			git -C sparse-index -c core.fsmonitor="" status -uno &&
-		test_region index ensure_full_index trace2.txt
-	)
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index -c core.fsmonitor="" reset --hard &&
+	test_region index convert_to_sparse trace2.txt &&
+	test_region index ensure_full_index trace2.txt &&
+
+	rm trace2.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
+		git -C sparse-index -c core.fsmonitor="" status -uno &&
+	test_region index ensure_full_index trace2.txt
 '
 
 test_done
-- 
gitgitgadget
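With this patch, convert_to_sparse() treats GIT_TEST_SPARSE_INDEX as a tri-state: unset leaves the configured value alone, 1 enables the setting, and an explicit 0 disables it. The sketch below models only that decision. It is a simplified assumption, not Git's code: parse_tristate(), sparse_test_action_for() and enum sparse_test_action are hypothetical names, and the parser accepts just a few spellings, whereas git_env_bool() accepts every boolean form that git config does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for the three outcomes of the tri-state check. */
enum sparse_test_action {
	SPARSE_TEST_LEAVE_CONFIG, /* variable unset: keep whatever is configured */
	SPARSE_TEST_ENABLE,       /* GIT_TEST_SPARSE_INDEX=1 */
	SPARSE_TEST_DISABLE,      /* GIT_TEST_SPARSE_INDEX=0, set explicitly */
};

/*
 * Simplified stand-in for git_env_bool(name, -1): NULL (unset) yields
 * 'def'; a few common spellings map to 1 or 0, and an empty value
 * counts as false. Unrecognized values fall back to 'def' in this sketch.
 */
static int parse_tristate(const char *value, int def)
{
	if (!value)
		return def;
	if (!*value || !strcmp(value, "0") || !strcmp(value, "false"))
		return 0;
	if (!strcmp(value, "1") || !strcmp(value, "true"))
		return 1;
	return def;
}

/* Map the parsed value to the action convert_to_sparse() would take. */
static enum sparse_test_action sparse_test_action_for(const char *value)
{
	int test_env = parse_tristate(value, -1);

	if (test_env < 0)
		return SPARSE_TEST_LEAVE_CONFIG;
	return test_env ? SPARSE_TEST_ENABLE : SPARSE_TEST_DISABLE;
}

/* Typical call site: look the variable up in the real environment. */
static enum sparse_test_action sparse_test_action_from_env(void)
{
	return sparse_test_action_for(getenv("GIT_TEST_SPARSE_INDEX"));
}
```

The three-way split is what lets a test script force the setting in either direction without affecting runs where the variable is not set at all.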
* [PATCH v4 17/20] sparse-checkout: disable sparse-index

From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee,
    Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

We use 'git sparse-checkout init --cone --sparse-index' to toggle the
sparse-index feature. It makes sense to also disable it when running
'git sparse-checkout disable'. This is particularly important because it
removes the extensions.sparseIndex config option, allowing other tools
to use this Git repository again.

This does mean that 'git sparse-checkout init' will not re-enable the
sparse-index feature, even if it was previously enabled.

While testing this feature, I noticed that the sparse-index was not
written on the first run, but only on the second. This was caught by the
call to 'test-tool read-cache --table'. Fixing it requires adjusting some
assignments to core_apply_sparse_checkout and pl.use_cone_patterns in
the sparse_checkout_init() logic.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c          | 10 +++++++++-
 t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index ca63e2c64e95..585343fa1972 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -280,6 +280,9 @@ static int set_config(enum sparse_checkout_mode mode)
 				      "core.sparseCheckoutCone",
 				      mode == MODE_CONE_PATTERNS ? "true" : NULL);
 
+	if (mode == MODE_NO_PATTERNS)
+		set_sparse_index_config(the_repository, 0);
+
 	return 0;
 }
 
@@ -341,10 +344,11 @@ static int sparse_checkout_init(int argc, const char **argv)
 		the_repository->index->updated_workdir = 1;
 	}
 
+	core_apply_sparse_checkout = 1;
+
 	/* If we already have a sparse-checkout file, use it. */
 	if (res >= 0) {
 		free(sparse_filename);
-		core_apply_sparse_checkout = 1;
 		return update_working_directory(NULL);
 	}
 
@@ -366,6 +370,7 @@ static int sparse_checkout_init(int argc, const char **argv)
 	add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0);
 	strbuf_addstr(&pattern, "!/*/");
 	add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0);
+	pl.use_cone_patterns = init_opts.cone_mode;
 
 	return write_patterns_and_update(&pl);
 }
@@ -632,6 +637,9 @@ static int sparse_checkout_disable(int argc, const char **argv)
 	strbuf_addstr(&match_all, "/*");
 	add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.sparse_index = 0;
+
 	if (update_working_directory(&pl))
 		die(_("error while refreshing working directory"));
 
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index fc64e9ed99f4..ff1ad570a255 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -205,6 +205,19 @@ test_expect_success 'sparse-checkout disable' '
 	check_files repo a deep folder1 folder2
 '
 
+test_expect_success 'sparse-index enabled and disabled' '
+	git -C repo sparse-checkout init --cone --sparse-index &&
+	test_cmp_config -C repo true extensions.sparseIndex &&
+	test-tool -C repo read-cache --table >cache &&
+	grep " tree " cache &&
+
+	git -C repo sparse-checkout disable &&
+	test-tool -C repo read-cache --table >cache &&
+	! grep " tree " cache &&
+	git -C repo config --list >config &&
+	! grep extensions.sparseindex config
+'
+
 test_expect_success 'cone mode: init and set' '
 	git -C repo sparse-checkout init --cone &&
 	git -C repo config --list >config &&
-- 
gitgitgadget
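The config side of this patch can be read as a small state transition: set_config() writes core.sparseCheckoutCone as "true" only in cone mode, and now also drops the sparse-index setting when the mode is MODE_NO_PATTERNS (the 'disable' path). The sketch below is hypothetical: struct toy_sparse_config and toy_set_config() stand in for the real config writes, only the two effects visible in the diff are modelled, and the numeric enum values are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Mode names as used in builtin/sparse-checkout.c; numeric values are assumed. */
enum sparse_checkout_mode {
	MODE_NO_PATTERNS = 0,
	MODE_ALL_PATTERNS = 1,
	MODE_CONE_PATTERNS = 2,
};

/* Hypothetical in-memory stand-in for the config keys touched here. */
struct toy_sparse_config {
	const char *sparse_checkout_cone; /* NULL means the key is unset */
	int sparse_index;                 /* 1 if the sparse-index setting is on */
};

static void toy_set_config(struct toy_sparse_config *cfg,
			   enum sparse_checkout_mode mode)
{
	/* Same ternary as the patch: "true" only in cone mode, else unset. */
	cfg->sparse_checkout_cone = mode == MODE_CONE_PATTERNS ? "true" : NULL;

	/* New in this patch: 'disable' also turns the sparse index off. */
	if (mode == MODE_NO_PATTERNS)
		cfg->sparse_index = 0;
}
```

Running cone-mode init again after a disable leaves sparse_index at 0 in this model, which matches the commit message: a plain 'git sparse-checkout init' does not re-enable the sparse index.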
* [PATCH v4 18/20] cache-tree: integrate with sparse directory entries

From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee,
    Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the cache
tree. By starting from scratch, the only integration necessary is to
check if we hit a sparse directory entry and create a leaf of the
cache-tree that has an entry_count of one and no subtrees.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c   | 18 ++++++++++++++++++
 sparse-index.c | 10 +++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 5f07a39e501e..950a9615db8f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -256,6 +256,24 @@ static int update_one(struct cache_tree *it,
 
 	*skip_count = 0;
 
+	/*
+	 * If the first entry of this region is a sparse directory
+	 * entry corresponding exactly to 'base', then this cache_tree
+	 * struct is a "leaf" in the data structure, pointing to the
+	 * tree OID specified in the entry.
+	 */
+	if (entries > 0) {
+		const struct cache_entry *ce = cache[0];
+
+		if (S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce->ce_namelen == baselen &&
+		    !strncmp(ce->name, base, baselen)) {
+			it->entry_count = 1;
+			oidcpy(&it->oid, &ce->oid);
+			return 1;
+		}
+	}
+
 	if (0 <= it->entry_count && has_object_file(&it->oid))
 		return it->entry_count;
 
diff --git a/sparse-index.c b/sparse-index.c
index 30c1a11fd62d..56313e805d9d 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -180,7 +180,11 @@ int convert_to_sparse(struct index_state *istate)
 	istate->cache_nr = convert_to_sparse_rec(istate,
 						 0, 0, istate->cache_nr,
 						 "", 0, istate->cache_tree);
-	istate->drop_cache_tree = 1;
+
+	/* Clear and recompute the cache-tree */
+	cache_tree_free(&istate->cache_tree);
+	cache_tree_update(istate, 0);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -281,5 +285,9 @@ void ensure_full_index(struct index_state *istate)
 	strbuf_release(&base);
 	free(full);
 
+	/* Clear and recompute the cache-tree */
+	cache_tree_free(&istate->cache_tree);
+	cache_tree_update(istate, 0);
+
 	trace2_region_leave("index", "ensure_full_index", istate->repo);
 }
-- 
gitgitgadget
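Below is a reduced model of the new early return in update_one(). When a region of the index begins with a sparse directory entry whose name equals the cache-tree path, that node becomes a leaf with entry_count 1 and the entry's tree OID. The structs and toy_sparse_leaf() are hypothetical simplifications: OIDs are plain strings, and a mode comparison stands in for S_ISSPARSEDIR(). The sketch also assumes, as the ce_namelen == baselen check suggests, that base carries the same trailing '/' as the sparse directory name.

```c
#include <assert.h>
#include <string.h>

/* Mode of a tree entry, as printed in "040000 tree <oid> <dir>/" lines. */
#define TOY_TREE_MODE 0040000

/* Hypothetical, simplified stand-ins for cache_entry and cache_tree. */
struct toy_entry {
	unsigned int mode;
	const char *name; /* sparse directories end in '/' */
	const char *oid;
};

struct toy_cache_tree {
	int entry_count;
	const char *oid;
};

/*
 * If the first entry of the region is a sparse directory whose name
 * equals 'base' exactly, turn 'it' into a leaf pointing at that tree
 * and return the number of index entries it covers (1). Otherwise
 * return -1 so the caller can recurse as usual.
 */
static int toy_sparse_leaf(struct toy_cache_tree *it,
			   const struct toy_entry *cache, int entries,
			   const char *base, size_t baselen)
{
	const struct toy_entry *ce;

	if (entries <= 0)
		return -1;
	ce = &cache[0];
	if (ce->mode == TOY_TREE_MODE &&
	    strlen(ce->name) == baselen &&
	    !strncmp(ce->name, base, baselen)) {
		it->entry_count = 1;
		it->oid = ce->oid;
		return 1;
	}
	return -1;
}
```

Because the leaf never has subtrees, recomputing the whole cache-tree from scratch after conversion (as convert_to_sparse() and ensure_full_index() now do) needs no other special handling in this model.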
* [PATCH v4 19/20] sparse-index: loose integration with cache_tree_verify()

From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee,
    Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_tree_verify() method is run when GIT_TEST_CHECK_CACHE_TREE is
enabled, which it is by default in the test suite. The logic must be
adjusted for the presence of sparse directory entries.

For now, leave the test as a simple check for whether the directory
entry is sparse. Do not go any further until needed.

This allows us to re-enable GIT_TEST_CHECK_CACHE_TREE in
t1092-sparse-checkout-compatibility.sh. Further,
p2000-sparse-operations.sh uses the test suite and hence has this check
enabled for all of its tests. We need to integrate with it before we run
our performance tests with a sparse-index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c                             | 19 +++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh |  3 ---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 950a9615db8f..11bf1fcae6e1 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -808,6 +808,19 @@ int cache_tree_matches_traversal(struct cache_tree *root,
 	return 0;
 }
 
+static void verify_one_sparse(struct repository *r,
+			      struct index_state *istate,
+			      struct cache_tree *it,
+			      struct strbuf *path,
+			      int pos)
+{
+	struct cache_entry *ce = istate->cache[pos];
+
+	if (!S_ISSPARSEDIR(ce->ce_mode))
+		BUG("directory '%s' is present in index, but not sparse",
+		    path->buf);
+}
+
 static void verify_one(struct repository *r,
 		       struct index_state *istate,
 		       struct cache_tree *it,
@@ -830,6 +843,12 @@ static void verify_one(struct repository *r,
 
 	if (path->len) {
 		pos = index_name_pos(istate, path->buf, path->len);
+
+		if (pos >= 0) {
+			verify_one_sparse(r, istate, it, path, pos);
+			return;
+		}
+
 		pos = -pos - 1;
 	} else {
 		pos = 0;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f14dc48924d2..d97bf9b64527 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2,9 +2,6 @@
 
 test_description='compare full workdir to sparse workdir'
 
-# The verify_cache_tree() check is not sparse-aware (yet).
-# So, disable the check until that integration is complete.
-GIT_TEST_CHECK_CACHE_TREE=0
 GIT_TEST_SPLIT_INDEX=0
 GIT_TEST_SPARSE_INDEX=
 
-- 
gitgitgadget
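verify_one() now branches on the sign of index_name_pos(). A non-negative result means an entry named exactly like the directory path exists, which in a sparse index must be a sparse directory entry. A negative result encodes where the path's children would start. Here is a minimal sketch of that encoding, assuming plain strcmp() ordering and ignoring stages. toy_name_pos() is a hypothetical helper, not Git's function.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of the return convention verify_one() relies on:
 * binary search over sorted names, returning the position of an exact
 * match, or -(insertion point) - 1 when the name is absent.
 * Real index entries also sort by stage; this sketch ignores stages.
 */
static int toy_name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}
```

For the names {"a", "folder1/", "x/y"}, looking up "folder1/" returns 1, which takes the new sparse branch. Looking up "deep/" returns -2, and -pos - 1 recovers 1: the slot where entries under "deep/" would begin.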
* [PATCH v4 20/20] p2000: add sparse-index repos

From: Derrick Stolee via GitGitGadget @ 2021-03-23 13:44 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee,
    Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

p2000-sparse-operations.sh compares different Git commands in
repositories with many files at HEAD but using sparse-checkout to focus
on a small portion of those files.

Add extra copies of the repository that use the sparse-index format so
that we can track how that affects the performance of different
commands. At this point in time, the sparse-index is pure overhead on
the CPU front, and this is measurable in these tests:

Test
---------------------------------------------------------------
2000.2: git status (full-index-v3)             0.59(0.51+0.12)
2000.3: git status (full-index-v4)             0.59(0.52+0.11)
2000.4: git status (sparse-index-v3)           1.40(1.32+0.12)
2000.5: git status (sparse-index-v4)           1.41(1.36+0.08)
2000.6: git add -A (full-index-v3)             2.32(1.97+0.19)
2000.7: git add -A (full-index-v4)             2.17(1.92+0.14)
2000.8: git add -A (sparse-index-v3)           2.31(2.21+0.15)
2000.9: git add -A (sparse-index-v4)           2.30(2.20+0.13)
2000.10: git add . (full-index-v3)             2.39(2.02+0.20)
2000.11: git add . (full-index-v4)             2.20(1.94+0.16)
2000.12: git add . (sparse-index-v3)           2.36(2.27+0.12)
2000.13: git add . (sparse-index-v4)           2.33(2.21+0.16)
2000.14: git commit -a -m A (full-index-v3)    2.47(2.12+0.20)
2000.15: git commit -a -m A (full-index-v4)    2.26(2.00+0.17)
2000.16: git commit -a -m A (sparse-index-v3)  3.01(2.92+0.16)
2000.17: git commit -a -m A (sparse-index-v4)  3.01(2.94+0.15)

Note that there is very little difference between the v3 and v4 index
formats when the sparse-index is enabled. This is primarily because the
relative file sizes are the same, and the command time is mostly taken
up by parsing tree objects to expand the sparse index into a full one.

With the current file layout, the index file sizes are given by this
table:

       |  full index | sparse index |
       +-------------+--------------+
    v3 |     108 MiB |      1.6 MiB |
    v4 |      80 MiB |      1.2 MiB |

Future updates will improve the performance of Git commands when the
index is sparse.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p2000-sparse-operations.sh | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index dddd527b6330..94513c977489 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -59,12 +59,29 @@ test_expect_success 'setup repo and indexes' '
 		git sparse-checkout set $SPARSE_CONE &&
 		git config index.version 4 &&
 		git update-index --index-version=4
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . sparse-index-v3 &&
+	(
+		cd sparse-index-v3 &&
+		git sparse-checkout init --cone --sparse-index &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 3 &&
+		git update-index --index-version=3
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . sparse-index-v4 &&
+	(
+		cd sparse-index-v4 &&
+		git sparse-checkout init --cone --sparse-index &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 4 &&
+		git update-index --index-version=4
 	)
 '
 
 test_perf_on_all () {
 	command="$@"
-	for repo in full-index-v3 full-index-v4
+	for repo in full-index-v3 full-index-v4 \
+		sparse-index-v3 sparse-index-v4
 	do
 		test_perf "$command ($repo)" "
 			(
-- 
gitgitgadget
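The two tables are easier to compare with a little arithmetic. For the v3 index, git status on the sparse index takes roughly 2.4 times as long as on the full index, while the index file itself shrinks by a factor of about 67. The sketch below makes that arithmetic explicit. The struct and helpers are hypothetical, and it uses only the first (elapsed) figure of each result above.

```c
#include <assert.h>

/* Elapsed seconds from the p2000 results above (index v3 only). */
struct perf_row {
	const char *command;
	double full_v3;
	double sparse_v3;
};

static const struct perf_row p2000_v3[] = {
	{ "git status",         0.59, 1.40 },
	{ "git add -A",         2.32, 2.31 },
	{ "git add .",          2.39, 2.36 },
	{ "git commit -a -m A", 2.47, 3.01 },
};

/* A result above 1.0 means the sparse-index run was slower. */
static double slowdown(const struct perf_row *row)
{
	return row->sparse_v3 / row->full_v3;
}

/* How many times smaller the sparse index file is than the full one. */
static double shrink_factor(double full_mib, double sparse_mib)
{
	return full_mib / sparse_mib;
}
```

By these numbers, only git status and git commit show a clear slowdown. The two git add variants differ from the full-index runs by less than 2%.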
* Re: [PATCH v4 00/20] Sparse Index: Design, Format, Tests

From: Elijah Newren @ 2021-03-23 16:16 UTC
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder,
    Martin Ågren, Derrick Stolee, SZEDER Gábor,
    Ævar Arnfjörð Bjarmason, Derrick Stolee

On Tue, Mar 23, 2021 at 6:44 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Here is the first full patch series submission coming out of the
> sparse-index RFC [1].
> ...
>
> Updates in V4
> =============
>
>  * Rebased onto the latest copy of ab/read-tree.
>  * Updated the design document as per Junio's comments.
>  * Updated the submodule handling in the performance test.
>  * Followed up on some other review from Ævar, mostly style or commit
>    message things.

Range-diff looks good to me; my Reviewed-by still holds.
* [PATCH v5 00/21] Sparse Index: Design, Format, Tests

From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee

Here is the first full patch series submission coming out of the
sparse-index RFC [1].

[1] https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/

I won't waste too much space here, because PATCH 1 includes a sizeable
design document that describes the feature, the reasoning behind it, and
my plan for getting this implemented widely throughout the codebase.

There are some new things here that were not in the RFC:

 * Design doc and format updates. (Patch 1)
 * Performance test script. (Patches 2 and 21)

Notably missing in this series from the RFC:

 * The mega-patch inserting ensure_full_index() throughout the codebase.
   That will be a follow-up series to this one.
 * The integrations with git status and git add to demonstrate the
   improved performance. Those will also appear in their own series
   later.

I plan to keep my latest work in this area in my 'sparse-index/wip'
branch [2]. It includes all of the work from the RFC right now, updated
with the work from this series.

[2] https://github.com/derrickstolee/git/tree/sparse-index/wip

Updates in V5
=============

This version is updated to use an index extension instead of a
repository format extension. Thanks, Szeder! This one change affects the
range-diff quite a bit, so please review those changes carefully.
In particular: git sparse-checkout init --cone --sparse-index now sets a
new index.sparse config option as an indicator that we should attempt
writing the index in sparse form.

Updates in V4
=============

 * Rebased onto the latest copy of ab/read-tree.
 * Updated the design document as per Junio's comments.
 * Updated the submodule handling in the performance test.
 * Followed up on some other review from Ævar, mostly style or commit
   message things.

Updates in V3
=============

For this version, I took Ævar's latest patches and applied them to
v2.31.0 and rebased this series on top. It uses his new "read_tree_at()"
helper and the associated changes to the function pointer type.

 * Fixed more typos. Thanks Martin and Elijah!
 * Updated the test_sparse_match() helper to use "$@" instead of $*
 * Added a test that git sparse-checkout init --no-sparse-index rewrites
   the index to be full.

Updates in V2
=============

 * Various typos and awkward grammar are fixed.
 * Cleaned up unnecessary commands in p2000-sparse-operations.sh
 * Added a comment to the sparse_index member of struct index_state.
 * Used tree_type, commit_type, and blob_type in test-read-cache.c.
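Moving from a repository format extension to the `sdir` index extension relies on the index format's rule for unknown extensions. A signature whose first byte is an uppercase ASCII letter ('A'..'Z') marks an optional extension that readers may skip. Any other first byte, such as the lowercase `s` in `sdir`, marks an extension a reader must understand, or else it refuses the index. The rule is the one stated in Documentation/technical/index-format.txt. The helper names below are hypothetical.

```c
#include <assert.h>

/* Rule from index-format.txt: a first byte of 'A'..'Z' means optional. */
static int extension_is_optional(const char sig[4])
{
	return sig[0] >= 'A' && sig[0] <= 'Z';
}

/*
 * Hypothetical reader decision: an extension the reader does not
 * recognize can be skipped only if the extension is optional.
 */
static int reader_can_proceed(const char sig[4], int recognized)
{
	return recognized || extension_is_optional(sig);
}
```

This is why an older Git refuses to read a sparse index, while tools that never open the index are unaffected. That is the trade-off the updated design document describes.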
Thanks,
-Stolee

Derrick Stolee (21):
  sparse-index: design doc and format update
  t/perf: add performance test for sparse operations
  t1092: clean up script quoting
  sparse-index: add guard to ensure full index
  sparse-index: implement ensure_full_index()
  t1092: compare sparse-checkout to sparse-index
  test-read-cache: print cache entries with --table
  test-tool: don't force full index
  unpack-trees: ensure full index
  sparse-checkout: hold pattern list in index
  sparse-index: add 'sdir' index extension
  sparse-index: convert from full to sparse
  submodule: sparse-index should not collapse links
  unpack-trees: allow sparse directories
  sparse-index: check index conversion happens
  sparse-index: add index.sparse config option
  sparse-checkout: toggle sparse index from builtin
  sparse-checkout: disable sparse-index
  cache-tree: integrate with sparse directory entries
  sparse-index: loose integration with cache_tree_verify()
  p2000: add sparse-index repos

 Documentation/config/index.txt           |   5 +
 Documentation/git-sparse-checkout.txt    |  14 ++
 Documentation/technical/index-format.txt |  19 ++
 Documentation/technical/sparse-index.txt | 175 ++++++++++++++
 Makefile                                 |   1 +
 builtin/sparse-checkout.c                |  44 +++-
 cache-tree.c                             |  40 ++++
 cache.h                                  |  18 +-
 read-cache.c                             |  44 +++-
 repo-settings.c                          |  15 ++
 repository.c                             |  11 +-
 repository.h                             |   3 +
 sparse-index.c                           | 285 +++++++++++++++++++++++
 sparse-index.h                           |  11 +
 t/README                                 |   3 +
 t/helper/test-read-cache.c               |  66 +++++-
 t/perf/p2000-sparse-operations.sh        | 101 ++++++++
 t/t1091-sparse-checkout-builtin.sh       |  13 ++
 t/t1092-sparse-checkout-compatibility.sh | 143 ++++++++++--
 unpack-trees.c                           |  17 +-
 20 files changed, 988 insertions(+), 40 deletions(-)
 create mode 100644 Documentation/technical/sparse-index.txt
 create mode 100644 sparse-index.c
 create mode 100644 sparse-index.h
 create mode 100755 t/perf/p2000-sparse-operations.sh


base-commit: 47957485b3b731a7860e0554d2bd12c0dce1c75a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/883

Range-diff vs v4:

  1:  6426a5c60e53 !  1:  7b600d536c6e sparse-index: design doc and format update
    @@ Documentation/technical/sparse-index.txt (new)
     +The only noticeable change in behavior will be that the serialized index
     +file contains sparse-directory entries.
     +
    -+To start, we use a new repository extension, `extensions.sparseIndex`, to
    -+allow inserting sparse-directory entries into indexes with file format
    ++To start, we use a new required index extension, `sdir`, to allow
    ++inserting sparse-directory entries into indexes with file format
     +versions 2, 3, and 4. This prevents Git versions that do not understand
    -+the sparse-index from operating on one, but it also prevents other
    -+operations that do not use the index at all. A new format, index v5, will
    -+be introduced that includes sparse-directory entries by default. It might
    -+also introduce other features that have been considered for improving the
    ++the sparse-index from operating on one, while allowing tools that do not
    ++understand the sparse-index to operate on repositories as long as they do
    ++not interact with the index. A new format, index v5, will be introduced
    ++that includes sparse-directory entries by default. It might also
    ++introduce other features that have been considered for improving the
     +index, as well.
+ +Next, consumers of the index will be guarded against operating on a 2: 7eabc1d0586c = 2: 202253ec82f3 t/perf: add performance test for sparse operations 3: c9e21d78ecba = 3: 437a0f144e57 t1092: clean up script quoting 4: 03cdde756563 = 4: b7e1bf5c55a7 sparse-index: add guard to ensure full index 5: 6b3b6d86385d = 5: e41d55d2cca9 sparse-index: implement ensure_full_index() 6: 7f67adba0498 = 6: 7bfbfbd17321 t1092: compare sparse-checkout to sparse-index 7: 7ebd9570b1ad = 7: a1b8135c0fc8 test-read-cache: print cache entries with --table 8: db7bbd06dbcc = 8: dd84a2a9121b test-tool: don't force full index 9: 3ddd5e794b5e = 9: b276d2ed5323 unpack-trees: ensure full index 10: 7308c87697f1 = 10: c3651e26dc3a sparse-checkout: hold pattern list in index -: ------------ > 11: f926cf8b2e01 sparse-index: add 'sdir' index extension 11: 7c10d653ca6b = 12: c870ae5e8749 sparse-index: convert from full to sparse 12: 6db36f33e960 = 13: bcf0da959ef3 submodule: sparse-index should not collapse links 13: d24bd3348d98 = 14: 7191b48237de unpack-trees: allow sparse directories 14: 08d9f5f3c0d1 = 15: 57be9b4a728b sparse-index: check index conversion happens 15: 6f38cef196b0 ! 16: c22b4111e49e sparse-index: create extension for compatibility @@ Metadata Author: Derrick Stolee <dstolee@microsoft.com> ## Commit message ## - sparse-index: create extension for compatibility + sparse-index: add index.sparse config option - Previously, we enabled the sparse index format only using - GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to - actually select this mode. Further, sparse directory entries are not - understood by the index formats as advertised. - - We _could_ add a new index version that explicitly adds these - capabilities, but there are nuances to index formats 2, 3, and 4 that - are still valuable to select as options. 
Until we add index format - version 5, create a repo extension, "extensions.sparseIndex", that - specifies that the tool reading this repository must understand sparse - directory entries. - - This change only encodes the extension and enables it when - GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI - mechanism. + When enabled, this config option signals that index writes should + attempt to use sparse-directory entries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> - ## Documentation/config/extensions.txt ## -@@ Documentation/config/extensions.txt: extensions.objectFormat:: - Note that this setting should only be set by linkgit:git-init[1] or - linkgit:git-clone[1]. Trying to change it after initialization will not - work and will produce hard-to-diagnose issues. + ## Documentation/config/index.txt ## +@@ Documentation/config/index.txt: index.recordOffsetTable:: + Defaults to 'true' if index.threads has been explicitly enabled, + 'false' otherwise. + ++index.sparse:: ++ When enabled, write the index using sparse-directory entries. This ++ has no effect unless `core.sparseCheckout` and ++ `core.sparseCheckoutCone` are both enabled. Defaults to 'false'. + -+extensions.sparseIndex:: -+ When combined with `core.sparseCheckout=true` and -+ `core.sparseCheckoutCone=true`, the index may contain entries -+ corresponding to directories outside of the sparse-checkout -+ definition in lieu of containing each path under such directories. -+ Versions of Git that do not understand this extension do not -+ expect directory entries in the index. + index.threads:: + Specifies the number of threads to spawn when loading the index. + This is meant to reduce index load time on multiprocessor machines. ## cache.h ## @@ cache.h: struct repository_format { @@ repo-settings.c: void prepare_repo_settings(struct repository *r) + * Initialize this as off. 
+ */ + r->settings.sparse_index = 0; -+ if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value) ++ if (!repo_config_get_bool(r, "index.sparse", &value) && value) + r->settings.sparse_index = 1; } @@ repository.h: struct repo_settings { struct repository { - ## setup.c ## -@@ setup.c: static enum extension_result handle_extension(const char *var, - return error("invalid value for 'extensions.objectformat'"); - data->hash_algo = format; - return EXTENSION_OK; -+ } else if (!strcmp(ext, "sparseindex")) { -+ data->sparse_index = 1; -+ return EXTENSION_OK; - } - return EXTENSION_UNKNOWN; - } - ## sparse-index.c ## @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, return num_converted - start_converted; @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, +{ + const char *config_path = repo_git_path(repo, "config.worktree"); + -+ if (upgrade_repository_format(1) < 0) { -+ warning(_("unable to upgrade repository format to enable sparse-index")); -+ return -1; -+ } + git_config_set_in_file_gently(config_path, -+ "extensions.sparseIndex", ++ "index.sparse", + "true"); + + prepare_repo_settings(repo); @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, + + /* + * The GIT_TEST_SPARSE_INDEX environment variable triggers the -+ * extensions.sparseIndex config variable to be on. ++ * index.sparse config variable to be on. + */ + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { + int err = enable_sparse_index(istate->repo); @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, - * GIT_TEST_SPARSE_INDEX environment variable. We will relax - * this once we have a proper way to opt-in (and later still, - * opt-out). -+ * Only convert to sparse if extensions.sparseIndex is set. ++ * Only convert to sparse if index.sparse is set. */ - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + prepare_repo_settings(istate->repo); 16: 923081e7e079 ! 
17: 75fe9b0f57da sparse-checkout: toggle sparse index from builtin @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees +that is not completely understood by external tools. If you have trouble +with this compatibility, then run `git sparse-checkout init --no-sparse-index` +to rewrite your index to not be sparse. Older versions of Git will not -+understand the `sparseIndex` repository extension and may fail to interact -+with your repository until it is disabled. ++understand the sparse directory entries index extension and may fail to ++interact with your repository until it is disabled. 'set':: Write a set of patterns to the sparse-checkout file, as given as @@ builtin/sparse-checkout.c: static int sparse_checkout_init(int argc, const char ## sparse-index.c ## @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, + return num_converted - start_converted; + } - static int enable_sparse_index(struct repository *repo) +-static int enable_sparse_index(struct repository *repo) ++static int set_index_sparse_config(struct repository *repo, int enable) { - const char *config_path = repo_git_path(repo, "config.worktree"); -+ int res; - - if (upgrade_repository_format(1) < 0) { - warning(_("unable to upgrade repository format to enable sparse-index")); - return -1; - } +- - git_config_set_in_file_gently(config_path, -- "extensions.sparseIndex", +- "index.sparse", - "true"); -+ res = git_config_set_gently("extensions.sparseindex", "true"); ++ int res; ++ char *config_path = repo_git_path(repo, "config.worktree"); ++ res = git_config_set_in_file_gently(config_path, ++ "index.sparse", ++ enable ? 
"true" : NULL); ++ free(config_path); prepare_repo_settings(repo); repo->settings.sparse_index = 1; @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, + +int set_sparse_index_config(struct repository *repo, int enable) +{ -+ int res; -+ -+ if (enable) -+ return enable_sparse_index(repo); -+ -+ /* Don't downgrade repository format, just remove the extension. */ -+ res = git_config_set_gently("extensions.sparseindex", NULL); ++ int res = set_index_sparse_config(repo, enable); + + prepare_repo_settings(repo); -+ repo->settings.sparse_index = 0; ++ repo->settings.sparse_index = enable; + return res; } @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; @@ sparse-index.c: int convert_to_sparse(struct index_state *istate) - istate->repo = the_repository; - - /* -- * The GIT_TEST_SPARSE_INDEX environment variable triggers the -- * extensions.sparseIndex config variable to be on. -+ * If GIT_TEST_SPARSE_INDEX=1, then trigger extensions.sparseIndex -+ * to be fully enabled. If GIT_TEST_SPARSE_INDEX=0 (set explicitly), -+ * then purposefully disable the setting. + * The GIT_TEST_SPARSE_INDEX environment variable triggers the + * index.sparse config variable to be on. */ - if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { - int err = enable_sparse_index(istate->repo); @@ sparse-index.c: int convert_to_sparse(struct index_state *istate) + set_sparse_index_config(istate->repo, test_env); /* - * Only convert to sparse if extensions.sparseIndex is set. + * Only convert to sparse if index.sparse is set. 
## sparse-index.h ## @@ sparse-index.h: struct index_state; @@ t/t1092-sparse-checkout-compatibility.sh: init_repos () { - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep + git -C sparse-index sparse-checkout init --cone --sparse-index && -+ test_cmp_config -C sparse-index true extensions.sparseindex && ++ test_cmp_config -C sparse-index true index.sparse && + git -C sparse-index sparse-checkout set deep } 17: 6f1ad72c390d ! 18: 7f55a232e647 sparse-checkout: disable sparse-index @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'sparse-checkout disable +test_expect_success 'sparse-index enabled and disabled' ' + git -C repo sparse-checkout init --cone --sparse-index && -+ test_cmp_config -C repo true extensions.sparseIndex && ++ test_cmp_config -C repo true index.sparse && + test-tool -C repo read-cache --table >cache && + grep " tree " cache && + @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'sparse-checkout disable + test-tool -C repo read-cache --table >cache && + ! grep " tree " cache && + git -C repo config --list >config && -+ ! grep extensions.sparseindex config ++ ! grep index.sparse config +' + test_expect_success 'cone mode: init and set' ' 18: bd94e6b7d089 = 19: 365901809d9d cache-tree: integrate with sparse directory entries 19: e7190376b806 = 20: 9b068c458898 sparse-index: loose integration with cache_tree_verify() 20: bcf0a58eb38c = 21: 66602733cc95 p2000: add sparse-index repos -- gitgitgadget ^ permalink raw reply [flat|nested] 203+ messages in thread
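In the range-diff above, `convert_to_sparse()` ends up calling `set_sparse_index_config(istate->repo, test_env)`, but the line that computes `test_env` is not shown. If `test_env` is a tri-state read of `GIT_TEST_SPARSE_INDEX`, then an unset variable leaves `index.sparse` alone, a true value enables it, and a false value disables it. The sketch below shows that decision in plain C. `parse_tristate()` and `sparse_index_test_request()` are hypothetical names. Git's own `git_env_bool()` is not reproduced here: it accepts more forms, such as arbitrary integers, and does not silently fall back when it cannot parse a value.

```c
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive comparison of two ASCII strings; 1 if equal. */
static int eq_nocase(const char *a, const char *b)
{
	while (*a && *b &&
	       tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
		a++;
		b++;
	}
	return *a == '\0' && *b == '\0';
}

/*
 * Hypothetical tri-state parser: NULL (variable unset) or an
 * unrecognized spelling yields `def`; common true/false spellings
 * yield 1 or 0.
 */
static int parse_tristate(const char *value, int def)
{
	static const char *const yes[] = { "1", "true", "yes", "on" };
	static const char *const no[] = { "0", "false", "no", "off" };
	size_t i;

	if (!value)
		return def;
	for (i = 0; i < sizeof(yes) / sizeof(yes[0]); i++)
		if (eq_nocase(value, yes[i]))
			return 1;
	for (i = 0; i < sizeof(no) / sizeof(no[0]); i++)
		if (eq_nocase(value, no[i]))
			return 0;
	return def;
}

/* -1: leave index.sparse alone, 1: enable it, 0: disable it. */
static int sparse_index_test_request(void)
{
	return parse_tristate(getenv("GIT_TEST_SPARSE_INDEX"), -1);
}
```

Because -1 is returned when the variable is unset, a test run that never mentions `GIT_TEST_SPARSE_INDEX` does not touch the user's configuration.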
* [PATCH v5 01/21] sparse-index: design doc and format update 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 02/21] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget ` (21 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> This begins a long effort to update the index format to allow sparse directory entries. This should result in a significant improvement to Git commands when HEAD contains millions of files, but the user has selected many fewer files to keep in their sparse-checkout definition. Currently, the index format is only updated in the presence of extensions.sparseIndex instead of increasing a file format version number. This is temporary, and index v5 is part of the plan for future work in this area. The design document details many of the reasons for embarking on this work, and also the plan for completing it safely. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/technical/index-format.txt | 7 + Documentation/technical/sparse-index.txt | 175 +++++++++++++++++++++++ 2 files changed, 182 insertions(+) create mode 100644 Documentation/technical/sparse-index.txt diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index d363a71c37ec..3b74c05647db 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -44,6 +44,13 @@ Git index format localization, no special casing of directory separator '/'). Entries with the same name are sorted by their stage field. + An index entry typically represents a file. 
However, if sparse-checkout + is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the + `extensions.sparseIndex` extension is enabled, then the index may + contain entries for directories outside of the sparse-checkout definition. + These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and + the path ends in a directory separator. + 32-bit ctime seconds, the last time a file's metadata changed this is stat(2) data diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt new file mode 100644 index 000000000000..8d3d80804604 --- /dev/null +++ b/Documentation/technical/sparse-index.txt @@ -0,0 +1,175 @@ +Git Sparse-Index Design Document +================================ + +The sparse-checkout feature allows users to focus a working directory on +a subset of the files at HEAD. The cone mode patterns, enabled by +`core.sparseCheckoutCone`, allow for very fast pattern matching to +discover which files at HEAD belong in the sparse-checkout cone. + +Three important scale dimensions for a Git working directory are: + +* `HEAD`: How many files are present at `HEAD`? + +* Populated: How many files are within the sparse-checkout cone? + +* Modified: How many files has the user modified in the working directory? + +We will use big-O notation -- O(X) -- to denote how expensive certain +operations are in terms of these dimensions. + +These dimensions are ordered by their magnitude: users (typically) modify +fewer files than are populated, and we can only populate files at `HEAD`. + +Problems occur if there is an extreme imbalance in these dimensions. For +example, if `HEAD` contains millions of paths but the populated set has +only tens of thousands, then commands like `git status` and `git add` can +be dominated by operations that cost O(`HEAD`) instead of +O(Populated).
Most of that cost is in parsing and rewriting the index, +which is filled primarily with files at `HEAD` that are marked with the +`SKIP_WORKTREE` bit. + +The sparse-index intends to take these commands that read and modify the +index from O(`HEAD`) to O(Populated). To do this, we need to modify the +index format in a significant way: add "sparse directory" entries. + +With cone mode patterns, it is possible to detect when an entire +directory will have its contents outside of the sparse-checkout definition. +Instead of listing all of the files it contains as individual entries, a +sparse-index contains an entry with the directory name, referencing the +object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit. +If we need to discover the details for paths within that directory, we +can parse trees to find that list. + +At the time of writing, sparse-directory entries violate expectations about the +index format and its in-memory data structure. There are many consumers in +the codebase that expect to iterate through all of the index entries and +see only files. In fact, these loops expect to see a reference to every +staged file. One way to handle this is to parse trees to replace a +sparse-directory entry with all of the files within that tree as the index +is loaded. However, parsing trees is slower than parsing the index format, +so this expansion is slower than leaving the index alone. The plan is +to make all of these integrations "sparse aware" so this expansion through +tree parsing is unnecessary and they use fewer resources than when using a +full index. + +The implementation plan below follows four phases to gradually integrate with +the sparse-index. The intention is to incrementally update Git commands to +interact safely with the sparse-index without significant slowdowns. This +may not always be possible, but the hope is that the primary commands that +users need in their daily work are dramatically improved.
+ +Phase I: Format and initial speedups +------------------------------------ + +During this phase, Git learns to enable the sparse-index and safely parse +one. Protections are put in place so that every consumer of the in-memory +data structure can operate with its current assumption of every file at +`HEAD`. + +At first, every index parse will call a helper method, +`ensure_full_index()`, which scans the index for sparse-directory entries +(pointing to trees) and replaces them with the full list of paths (with +blob contents) by parsing tree objects. This will be slower in all cases. +The only noticeable change in behavior will be that the serialized index +file contains sparse-directory entries. + +To start, we use a new required index extension, `sdir`, to allow +inserting sparse-directory entries into indexes with file format +versions 2, 3, and 4. This prevents Git versions that do not understand +the sparse-index from operating on one, while allowing tools that do not +understand the sparse-index to operate on repositories as long as they do +not interact with the index. A new format, index v5, will be introduced +that includes sparse-directory entries by default. It might also +introduce other features that have been considered for improving the +index, as well. + +Next, consumers of the index will be guarded against operating on a +sparse-index by inserting calls to `ensure_full_index()` or +`expand_index_to_path()`. After these guards are in place, we can begin +leaving sparse-directory entries in the in-memory index structure. + +Even after inserting these guards, we will keep expanding sparse-indexes +for most Git commands using the `command_requires_full_index` repository +setting. This setting will be on by default and disabled one builtin at a +time until we have sufficient confidence that all of the index operations +are properly guarded. 
+ +To complete this phase, the commands `git status` and `git add` will be +integrated with the sparse-index so that they operate with O(Populated) +performance. They will be carefully tested for operations within and +outside the sparse-checkout definition. + +Phase II: Careful integrations +------------------------------ + +This phase focuses on ensuring that all index extensions and APIs work +well with a sparse-index. This requires significant increases to our test +coverage, especially for operations that interact with the working +directory outside of the sparse-checkout definition. Some of these +behaviors may not be the desirable ones, such as some tests already +marked for failure in `t1092-sparse-checkout-compatibility.sh`. + +The index extensions that may require special integrations are: + +* FS Monitor +* Untracked cache + +While integrating with these features, we should look for patterns that +might lead to better APIs for interacting with the index. Coalescing +common usage patterns into an API call can reduce the number of places +where sparse-directories need to be handled carefully. + +Phase III: Important command speedups +------------------------------------- + +At this point, the patterns for testing and implementing sparse-directory +logic should be relatively stable. This phase focuses on updating some of +the most common builtins that use the index to operate as O(Populated). +Here is a potential list of commands that could be valuable to integrate +at this point: + +* `git commit` +* `git checkout` +* `git merge` +* `git rebase` + +Hopefully, commands such as `git merge` and `git rebase` can benefit +instead from merge algorithms that do not use the index as a data +structure, such as the merge-ORT strategy. As these topics mature, we +may enable the ORT strategy by default for repositories using the +sparse-index feature. 
+ +Along with `git status` and `git add`, these commands cover the majority +of users' interactions with the working directory. In addition, we can +integrate with these commands: + +* `git grep` +* `git rm` + +These have been proposed as commands whose behavior could change when in a +repo with a sparse-checkout definition. It would be good to include this +behavior automatically when using a sparse-index. The behavior change +must be made clear to the user. + +This phase is the first where parallel work might be possible without too +many conflicts between topics. + +Phase IV: The long tail +----------------------- + +This last phase is less a "phase" and more "the new normal" after all of +the previous work. + +To start, the `command_requires_full_index` option could be removed in +favor of expanding only when hitting an API guard. + +There are many Git commands that could use special attention to operate as +O(Populated), while some might be so rare that it is acceptable to leave +them with additional overhead when a sparse-index is present. + +Here are some commands that might be useful to update: + +* `git sparse-checkout set` +* `git am` +* `git clean` +* `git stash` -- gitgitgadget
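The design document says that, with cone-mode patterns, Git can detect when an entire directory lies outside the sparse-checkout definition. The sketch below assumes the cone is given as a plain list of recursively included directories, slash-separated and without trailing slashes. Under that assumption, a directory can become a single sparse-directory entry only if two conditions hold:

* it is not inside (or equal to) any cone directory;
* it is not an ancestor of a cone directory, because cone mode also populates the files directly inside every ancestor.

The function names are hypothetical. This illustrates the rule and is not Git's implementation.

```c
#include <stddef.h>
#include <string.h>

/* 1 if `path` equals `dir` or lies beneath it (no trailing slashes). */
static int is_at_or_below(const char *path, const char *dir)
{
	size_t n = strlen(dir);

	return !strncmp(path, dir, n) && (path[n] == '\0' || path[n] == '/');
}

/*
 * 1 if directory `dir` could be stored as a single sparse-directory
 * entry: it must not be inside (or equal to) any recursive cone
 * directory, and must not be an ancestor of one, because cone mode
 * also populates the files directly inside every ancestor.
 */
static int can_collapse(const char *dir, const char *const *cones, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (is_at_or_below(dir, cones[i]) ||
		    is_at_or_below(cones[i], dir))
			return 0;
	return 1;
}
```

Take a cone of `f2/f4/f1`. Here `f1` and `f2/f3` collapse, `f2` and `f2/f4` stay expanded, and the `/` boundary check keeps `f2/f4/f10` collapsible even though its name starts with `f2/f4/f1`.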
* [PATCH v5 02/21] t/perf: add performance test for sparse operations

From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee,
    Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Create a test script that takes the default performance test (the Git
codebase) and multiplies it by 256 using four layers of duplicated trees
of width four. This results in nearly one million blob entries in the
index.

Then, we can clone this repository with sparse-checkout patterns that
demonstrate four copies of the initial repository. Each clone will use a
different index format or mode so performance can be tested across the
different options.

Note that the initial repo is stripped of submodules before doing the
copies. This preserves the expected data shape of the sparse index,
because directories containing submodules are not collapsed to a sparse
directory entry.

Run a few Git commands on these clones, especially those that use the
index (status, add, commit).

Here are the results on my Linux machine:

    Test
    --------------------------------------------------------------
    2000.2: git status (full-index-v3)             0.37(0.30+0.09)
    2000.3: git status (full-index-v4)             0.39(0.32+0.10)
    2000.4: git add -A (full-index-v3)             1.42(1.06+0.20)
    2000.5: git add -A (full-index-v4)             1.26(0.98+0.16)
    2000.6: git add . (full-index-v3)              1.40(1.04+0.18)
    2000.7: git add . (full-index-v4)              1.26(0.98+0.17)
    2000.8: git commit -a -m A (full-index-v3)     1.42(1.11+0.16)
    2000.9: git commit -a -m A (full-index-v4)     1.33(1.08+0.16)

It is perhaps noteworthy that there is an improvement when using index
version 4. This is because the v3 index uses 108 MiB while the v4
index uses 80 MiB. Since the repeated portions of the directories are
very short (f3/f1/f2, for example), this ratio is less pronounced than
in similarly-sized real repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p2000-sparse-operations.sh | 84 +++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100755 t/perf/p2000-sparse-operations.sh

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
new file mode 100755
index 000000000000..dddd527b6330
--- /dev/null
+++ b/t/perf/p2000-sparse-operations.sh
@@ -0,0 +1,84 @@
+#!/bin/sh
+
+test_description="test performance of Git operations using the index"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+SPARSE_CONE=f2/f4/f1
+
+test_expect_success 'setup repo and indexes' '
+	git reset --hard HEAD &&
+
+	# Remove submodules from the example repo, because our
+	# duplication of the entire repo creates an unlikely data shape.
+	if git config --file .gitmodules --get-regexp "submodule.*.path" >modules
+	then
+		git rm $(awk "{print \$2}" modules) &&
+		git commit -m "remove submodules" || return 1
+	fi &&
+
+	echo bogus >a &&
+	cp a b &&
+	git add a b &&
+	git commit -m "level 0" &&
+	BLOB=$(git rev-parse HEAD:a) &&
+	OLD_COMMIT=$(git rev-parse HEAD) &&
+	OLD_TREE=$(git rev-parse HEAD^{tree}) &&
+
+	for i in $(test_seq 1 4)
+	do
+		cat >in <<-EOF &&
+			100755 blob $BLOB	a
+			040000 tree $OLD_TREE	f1
+			040000 tree $OLD_TREE	f2
+			040000 tree $OLD_TREE	f3
+			040000 tree $OLD_TREE	f4
+		EOF
+		NEW_TREE=$(git mktree <in) &&
+		NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") &&
+		OLD_TREE=$NEW_TREE &&
+		OLD_COMMIT=$NEW_COMMIT || return 1
+	done &&
+
+	git sparse-checkout init --cone &&
+	git branch -f wide $OLD_COMMIT &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 &&
+	(
+		cd full-index-v3 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 3 &&
+		git update-index --index-version=3
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 &&
+	(
+		cd full-index-v4 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 4 &&
+		git update-index --index-version=4
+	)
+'
+
+test_perf_on_all () {
+	command="$@"
+	for repo in full-index-v3 full-index-v4
+	do
+		test_perf "$command ($repo)" "
+			(
+				cd $repo &&
+				echo >>$SPARSE_CONE/a &&
+				$command
+			)
+		"
+	done
+}
+
+test_perf_on_all git status
+test_perf_on_all git add -A
+test_perf_on_all git add .
+test_perf_on_all git commit -a -m A
+
+test_done
-- 
gitgitgadget
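The entry counts behind "multiplies it by 256" can be checked with a little arithmetic. Each level's tree holds one file `a` plus four copies of the previous tree. A level-0 tree with `base` blobs therefore grows to 256 * base + 85 blobs after four levels.

With `SPARSE_CONE=f2/f4/f1`, a sparse index would keep the level-1 tree under the cone fully expanded. Each of the three trees above it would contribute only its file `a` and three collapsed sibling directories.

The sketch below encodes that reasoning and nothing more. The function names are hypothetical, and the `base` value in the usage note is an illustrative assumption, not a number from the patch.

```c
/*
 * Blobs in the tree built at `level`: level 0 holds `base` blobs and
 * each later level holds one file `a` plus `width` copies of the
 * previous level's tree.
 */
static unsigned long tree_blobs(unsigned long base, unsigned width,
				unsigned level)
{
	unsigned long n = base;

	while (level-- > 0)
		n = 1 + (unsigned long)width * n;
	return n;
}

/* A full index has one entry per blob at HEAD. */
static unsigned long full_entries(unsigned long base, unsigned width,
				  unsigned levels)
{
	return tree_blobs(base, width, levels);
}

/*
 * A sparse index whose cone descends one child per level down to the
 * level-1 tree (like SPARSE_CONE=f2/f4/f1 with four levels): each of
 * the levels - 1 trees above the cone contributes its file `a` plus
 * width - 1 collapsed sparse-directory entries, and the cone itself
 * is fully expanded.
 */
static unsigned long sparse_entries(unsigned long base, unsigned width,
				    unsigned levels)
{
	return (unsigned long)(levels - 1) * width + tree_blobs(base, width, 1);
}
```

Suppose the original checkout has 3,800 files, so the level-0 tree adds `a` and `b` for a `base` of 3802. Then `full_entries(3802, 4, 4)` is 973,397 while `sparse_entries(3802, 4, 4)` is 15,221. The sparse count tracks O(Populated) rather than O(`HEAD`).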
* [PATCH v5 03/21] t1092: clean up script quoting

From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee,
    Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

This test was introduced in 19a0acc83e4 (t1092: test interesting
sparse-checkout scenarios, 2021-01-23), but it contains issues with
quoting that were not noticed until starting this follow-up series.
The old mechanism would drop quoting such as in

    test_all_match git commit -m "touch README.md"

The above happened to work because README.md is a file in the
repository, so 'git commit -m touch README.md' would succeed by
accident (committing README.md with the message "touch").

Other cases included quoting for no good reason, so clean that up now.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/t1092-sparse-checkout-compatibility.sh | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 8cd3e5a8d227..3725d3997e70 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -96,20 +96,20 @@ init_repos () { run_on_sparse () { ( cd sparse-checkout && - $* >../sparse-checkout-out 2>../sparse-checkout-err + "$@" >../sparse-checkout-out 2>../sparse-checkout-err ) } run_on_all () { ( cd full-checkout && - $* >../full-checkout-out 2>../full-checkout-err + "$@" >../full-checkout-out 2>../full-checkout-err ) && - run_on_sparse $* + run_on_sparse "$@" } test_all_match () { - run_on_all $* && + run_on_all "$@" && test_cmp full-checkout-out sparse-checkout-out && test_cmp full-checkout-err sparse-checkout-err } @@ -119,7 +119,7 @@ test_expect_success 'status with options' ' test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && - run_on_all "touch README.md" && + run_on_all touch README.md && test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && @@ -135,7 +135,7 @@ test_expect_success 'add, commit, checkout' ' write_script edit-contents <<-\EOF && echo text >>$1 EOF - run_on_all "../edit-contents README.md" && + run_on_all ../edit-contents README.md && test_all_match git add README.md && test_all_match git status --porcelain=v2 && @@ -144,7 +144,7 @@ test_expect_success 'add, commit, checkout' ' test_all_match git checkout HEAD~1 && test_all_match git checkout - && - run_on_all "../edit-contents README.md" && + run_on_all ../edit-contents README.md && test_all_match git add -A && test_all_match git status --porcelain=v2 && @@ -153,7 +153,7 @@ test_expect_success 'add, 
commit, checkout' ' test_all_match git checkout HEAD~1 && test_all_match git checkout - && - run_on_all "../edit-contents deep/newfile" && + run_on_all ../edit-contents deep/newfile && test_all_match git status --porcelain=v2 -uno && test_all_match git status --porcelain=v2 && @@ -186,7 +186,7 @@ test_expect_success 'diff --staged' ' write_script edit-contents <<-\EOF && echo text >>README.md EOF - run_on_all "../edit-contents" && + run_on_all ../edit-contents && test_all_match git diff && test_all_match git diff --staged && @@ -280,7 +280,7 @@ test_expect_success 'clean' ' echo bogus >>.gitignore && run_on_all cp ../.gitignore . && test_all_match git add .gitignore && - test_all_match git commit -m ignore-bogus-files && + test_all_match git commit -m "ignore bogus files" && run_on_sparse mkdir folder1 && run_on_all touch folder1/bogus && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
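The quoting bug fixed above comes from unquoted `$*`. Each positional parameter is expanded and then split again on whitespace, so a single argument such as `touch README.md` becomes two words. `"$@"` passes every argument through unchanged.

The C function below is a simplified model of that field splitting, assuming the default `IFS` and ignoring pathname expansion. It is only an illustration of the shell behavior, not part of the test suite, and `split_unquoted()` is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64

/*
 * Simplified model of an unquoted $* with the default IFS: every
 * argument is split again on whitespace and the pieces are collected
 * in order. Pathname expansion is ignored. Words that do not fit in
 * `out` (too many, or longer than MAX_WORD - 1) are dropped.
 * Returns the number of words stored.
 */
static size_t split_unquoted(const char *const *args, size_t nargs,
			     char out[][MAX_WORD], size_t max_out)
{
	size_t nout = 0, i;

	for (i = 0; i < nargs; i++) {
		const char *p = args[i];

		while (*p) {
			size_t len = 0;

			while (*p && isspace((unsigned char)*p))
				p++;		/* skip separators */
			if (!*p)
				break;
			while (p[len] && !isspace((unsigned char)p[len]))
				len++;		/* measure one word */
			if (nout < max_out && len < MAX_WORD) {
				memcpy(out[nout], p, len);
				out[nout][len] = '\0';
				nout++;
			}
			p += len;
		}
	}
	return nout;
}
```

Feeding it `git`, `commit`, `-m`, `touch README.md` yields five words, with `-m` followed by `touch` alone and `README.md` left over as a pathspec. With `"$@"` the same four arguments would reach `git commit` intact.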
* [PATCH v5 04/21] sparse-index: add guard to ensure full index 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget ` (2 preceding siblings ...) 2021-03-30 13:10 ` [PATCH v5 03/21] t1092: clean up script quoting Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 05/21] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget ` (18 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Upcoming changes will introduce modifications to the index format that allow sparse directories. It will be useful to have a mechanism for converting those sparse index files into full indexes by walking the tree at those sparse directories. Name this method ensure_full_index() as it will guarantee that the index is fully expanded. This method is not implemented yet, and instead we focus on the scaffolding to declare it and call it at the appropriate time. Add a 'command_requires_full_index' member to struct repo_settings. This will be an indicator that we need the index in full mode to do certain index operations. This starts as being true for every command, then we will set it to false as some commands integrate with sparse indexes. If 'command_requires_full_index' is true, then we will immediately expand a sparse index to a full one upon reading from disk. This suffices for now, but we will want to add more callers to ensure_full_index() later. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Makefile | 1 + repo-settings.c | 8 ++++++++ repository.c | 11 ++++++++++- repository.h | 2 ++ sparse-index.c | 8 ++++++++ sparse-index.h | 7 +++++++ 6 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 sparse-index.c create mode 100644 sparse-index.h diff --git a/Makefile b/Makefile index dfb0f1000fa3..89b1d5374107 100644 --- a/Makefile +++ b/Makefile @@ -985,6 +985,7 @@ LIB_OBJS += setup.o LIB_OBJS += shallow.o LIB_OBJS += sideband.o LIB_OBJS += sigchain.o +LIB_OBJS += sparse-index.o LIB_OBJS += split-index.o LIB_OBJS += stable-qsort.o LIB_OBJS += strbuf.o diff --git a/repo-settings.c b/repo-settings.c index f7fff0f5ab83..d63569e4041e 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -77,4 +77,12 @@ void prepare_repo_settings(struct repository *r) UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_KEEP); UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_DEFAULT); + + /* + * This setting guards all index reads to require a full index + * over a sparse index. After suitable guards are placed in the + * codebase around uses of the index, this setting will be + * removed. 
+	 */
+	r->settings.command_requires_full_index = 1;
 }
diff --git a/repository.c b/repository.c
index c98298acd017..a8acae002f71 100644
--- a/repository.c
+++ b/repository.c
@@ -10,6 +10,7 @@
 #include "object.h"
 #include "lockfile.h"
 #include "submodule-config.h"
+#include "sparse-index.h"
 
 /* The main repository */
 static struct repository the_repo;
@@ -261,6 +262,8 @@ void repo_clear(struct repository *repo)
 
 int repo_read_index(struct repository *repo)
 {
+	int res;
+
 	if (!repo->index)
 		repo->index = xcalloc(1, sizeof(*repo->index));
 
@@ -270,7 +273,13 @@ int repo_read_index(struct repository *repo)
 	else if (repo->index->repo != repo)
 		BUG("repo's index should point back at itself");
 
-	return read_index_from(repo->index, repo->index_file, repo->gitdir);
+	res = read_index_from(repo->index, repo->index_file, repo->gitdir);
+
+	prepare_repo_settings(repo);
+	if (repo->settings.command_requires_full_index)
+		ensure_full_index(repo->index);
+
+	return res;
 }
 
 int repo_hold_locked_index(struct repository *repo,
diff --git a/repository.h b/repository.h
index b385ca3c94b6..e06a23015697 100644
--- a/repository.h
+++ b/repository.h
@@ -41,6 +41,8 @@ struct repo_settings {
 	enum fetch_negotiation_setting fetch_negotiation_algorithm;
 
 	int core_multi_pack_index;
+
+	unsigned command_requires_full_index:1;
 };
 
 struct repository {
diff --git a/sparse-index.c b/sparse-index.c
new file mode 100644
index 000000000000..82183ead563b
--- /dev/null
+++ b/sparse-index.c
@@ -0,0 +1,8 @@
+#include "cache.h"
+#include "repository.h"
+#include "sparse-index.h"
+
+void ensure_full_index(struct index_state *istate)
+{
+	/* intentionally left blank */
+}
diff --git a/sparse-index.h b/sparse-index.h
new file mode 100644
index 000000000000..09a20d036c46
--- /dev/null
+++ b/sparse-index.h
@@ -0,0 +1,7 @@
+#ifndef SPARSE_INDEX_H__
+#define SPARSE_INDEX_H__
+
+struct index_state;
+void ensure_full_index(struct index_state *istate);
+
+#endif
-- 
gitgitgadget
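The guard added to `repo_read_index()` follows a simple pattern. The index is read as stored, and it is expanded only when the running command has not declared itself sparse-aware.

The self-contained sketch below mirrors that control flow. Hypothetical toy types (`struct toy_index`, `struct toy_settings`) stand in for `struct index_state` and `struct repo_settings`. The expansion step is faked with a precomputed entry count, since real expansion needs tree objects.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct index_state and struct repo_settings. */
struct toy_index {
	int nr_entries;
	bool sparse;	/* true when sparse-directory entries are present */
};

struct toy_settings {
	bool command_requires_full_index;
};

/* Fake expansion: a sparse index becomes `expanded_nr` file entries. */
static void toy_ensure_full_index(struct toy_index *idx, int expanded_nr)
{
	if (!idx->sparse)
		return;		/* already full, nothing to do */
	idx->nr_entries = expanded_nr;
	idx->sparse = false;
}

/*
 * Load the index "from disk", then apply the guard: commands that have
 * not opted in to sparse indexes always see a fully expanded index.
 * Returns the number of entries visible to the caller.
 */
static int toy_read_index(struct toy_index *idx,
			  const struct toy_settings *settings,
			  int on_disk_nr, bool on_disk_sparse, int expanded_nr)
{
	idx->nr_entries = on_disk_nr;
	idx->sparse = on_disk_sparse;

	if (settings->command_requires_full_index)
		toy_ensure_full_index(idx, expanded_nr);
	return idx->nr_entries;
}
```

Defaulting the flag to "requires full index" keeps every existing caller safe. Each command can then opt out individually once its index loops are known to handle sparse-directory entries.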
* [PATCH v5 05/21] sparse-index: implement ensure_full_index() 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget ` (3 preceding siblings ...) 2021-03-30 13:10 ` [PATCH v5 04/21] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 06/21] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget ` (17 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We will mark an in-memory index_state as having sparse directory entries with the sparse_index bit. These currently cannot exist, but we will add a mechanism for collapsing a full index to a sparse one in a later change. That will happen at write time, so we must first allow parsing the format before writing it. Commands or methods that require a full index in order to operate can call ensure_full_index() to expand that index in-memory. This requires parsing trees using that index's repository. Sparse directory entries have a specific 'ce_mode' value. The macro S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. This ce_mode is not possible with the existing index formats, so we don't also verify all properties of a sparse-directory entry, which are: 1. ce->ce_mode == 0040000 2. ce->flags & CE_SKIP_WORKTREE is true 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator) 4. ce->oid references a tree object. These are all semi-enforced in ensure_full_index() to some extent. Any deviation will cause a warning at minimum or a failure in the worst case. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache.h | 13 ++++++- read-cache.c | 9 +++++ sparse-index.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 118 insertions(+), 2 deletions(-) diff --git a/cache.h b/cache.h index bb317abc91fb..136dd496c95d 100644 --- a/cache.h +++ b/cache.h @@ -204,6 +204,8 @@ struct cache_entry { #error "CE_EXTENDED_FLAGS out of range" #endif +#define S_ISSPARSEDIR(m) ((m) == S_IFDIR) + /* Forward structure decls */ struct pathspec; struct child_process; @@ -319,7 +321,14 @@ struct index_state { drop_cache_tree : 1, updated_workdir : 1, updated_skipworktree : 1, - fsmonitor_has_run_once : 1; + fsmonitor_has_run_once : 1, + + /* + * sparse_index == 1 when sparse-directory + * entries exist. Requires sparse-checkout + * in cone mode. + */ + sparse_index : 1; struct hashmap name_hash; struct hashmap dir_hash; struct object_id oid; @@ -722,6 +731,8 @@ int read_index_from(struct index_state *, const char *path, const char *gitdir); int is_index_unborn(struct index_state *); +void ensure_full_index(struct index_state *istate); + /* For use with `write_locked_index()`. 
*/ #define COMMIT_LOCK (1 << 0) #define SKIP_IF_UNCHANGED (1 << 1) diff --git a/read-cache.c b/read-cache.c index 1e9a50c6c734..dd3980c12b53 100644 --- a/read-cache.c +++ b/read-cache.c @@ -101,6 +101,9 @@ static const char *alternate_index_output; static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) { + if (S_ISSPARSEDIR(ce->ce_mode)) + istate->sparse_index = 1; + istate->cache[nr] = ce; add_name_hash(istate, ce); } @@ -2273,6 +2276,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) trace2_data_intmax("index", the_repository, "read/cache_nr", istate->cache_nr); + if (!istate->repo) + istate->repo = the_repository; + prepare_repo_settings(istate->repo); + if (istate->repo->settings.command_requires_full_index) + ensure_full_index(istate); + return istate->cache_nr; unmap: diff --git a/sparse-index.c b/sparse-index.c index 82183ead563b..7095378a1b28 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -1,8 +1,104 @@ #include "cache.h" #include "repository.h" #include "sparse-index.h" +#include "tree.h" +#include "pathspec.h" +#include "trace2.h" + +static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) +{ + ALLOC_GROW(istate->cache, nr + 1, istate->cache_alloc); + + istate->cache[nr] = ce; + add_name_hash(istate, ce); +} + +static int add_path_to_index(const struct object_id *oid, + struct strbuf *base, const char *path, + unsigned int mode, void *context) +{ + struct index_state *istate = (struct index_state *)context; + struct cache_entry *ce; + size_t len = base->len; + + if (S_ISDIR(mode)) + return READ_TREE_RECURSIVE; + + strbuf_addstr(base, path); + + ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0); + ce->ce_flags |= CE_SKIP_WORKTREE; + set_index_entry(istate, istate->cache_nr++, ce); + + strbuf_setlen(base, len); + return 0; +} void ensure_full_index(struct index_state *istate) { - /* intentionally left blank */ + int i; + struct index_state *full; 
+ struct strbuf base = STRBUF_INIT; + + if (!istate || !istate->sparse_index) + return; + + if (!istate->repo) + istate->repo = the_repository; + + trace2_region_enter("index", "ensure_full_index", istate->repo); + + /* initialize basics of new index */ + full = xcalloc(1, sizeof(struct index_state)); + memcpy(full, istate, sizeof(struct index_state)); + + /* then change the necessary things */ + full->sparse_index = 0; + full->cache_alloc = (3 * istate->cache_alloc) / 2; + full->cache_nr = 0; + ALLOC_ARRAY(full->cache, full->cache_alloc); + + for (i = 0; i < istate->cache_nr; i++) { + struct cache_entry *ce = istate->cache[i]; + struct tree *tree; + struct pathspec ps; + + if (!S_ISSPARSEDIR(ce->ce_mode)) { + set_index_entry(full, full->cache_nr++, ce); + continue; + } + if (!(ce->ce_flags & CE_SKIP_WORKTREE)) + warning(_("index entry is a directory, but not sparse (%08x)"), + ce->ce_flags); + + /* recursively walk into ce->name */ + tree = lookup_tree(istate->repo, &ce->oid); + + memset(&ps, 0, sizeof(ps)); + ps.recursive = 1; + ps.has_wildcard = 1; + ps.max_depth = -1; + + strbuf_setlen(&base, 0); + strbuf_add(&base, ce->name, strlen(ce->name)); + + read_tree_at(istate->repo, tree, &base, &ps, + add_path_to_index, full); + + /* free directory entries. full entries are re-used */ + discard_cache_entry(ce); + } + + /* Copy back into original index. */ + memcpy(&istate->name_hash, &full->name_hash, sizeof(full->name_hash)); + istate->sparse_index = 0; + free(istate->cache); + istate->cache = full->cache; + istate->cache_nr = full->cache_nr; + istate->cache_alloc = full->cache_alloc; + + strbuf_release(&base); + free(full); + + trace2_region_leave("index", "ensure_full_index", istate->repo); } -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
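A self-contained toy model can illustrate the expansion that ensure_full_index() performs above. This is a minimal sketch under stated assumptions, not Git's code. The toy_tree, toy_entry and toy_index types and the fixed TOY_MAX capacity are hypothetical stand-ins for struct tree, struct cache_entry and struct index_state, and a plain recursive walk replaces read_tree_at(). As in the patch, ordinary entries are copied through unchanged. Each sparse directory entry is replaced by every file below its tree, using the directory name (which ends in '/') as the path prefix and setting the skip-worktree bit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX 64 /* hypothetical fixed capacity for the sketch */

/* Hypothetical tree node: a directory has children, a file has none. */
struct toy_tree {
	const char *name;
	int is_dir;
	int nr_children;
	const struct toy_tree *children;
};

/* Hypothetical index entry; 'tree' is only set for sparse directories. */
struct toy_entry {
	char name[256];
	int sparse_dir;
	int skip_worktree;
	const struct toy_tree *tree;
};

struct toy_index {
	struct toy_entry entries[TOY_MAX];
	int nr;
};

/* Append every file below 'tree' to 'out', using 'base' as the path prefix. */
static void toy_add_tree_paths(struct toy_index *out, const char *base,
			       const struct toy_tree *tree)
{
	int i;

	for (i = 0; i < tree->nr_children; i++) {
		const struct toy_tree *child = &tree->children[i];

		if (child->is_dir) {
			char path[256];

			snprintf(path, sizeof(path), "%s%s/", base, child->name);
			toy_add_tree_paths(out, path, child);
		} else {
			struct toy_entry *e;

			assert(out->nr < TOY_MAX);
			e = &out->entries[out->nr++];
			snprintf(e->name, sizeof(e->name), "%s%s", base, child->name);
			e->sparse_dir = 0;
			e->skip_worktree = 1; /* still outside the sparse cone */
			e->tree = NULL;
		}
	}
}

/* Replace each sparse directory entry with the files it summarizes. */
static void toy_ensure_full_index(struct toy_index *istate)
{
	struct toy_index full;
	int i;

	full.nr = 0;
	for (i = 0; i < istate->nr; i++) {
		const struct toy_entry *ce = &istate->entries[i];

		if (!ce->sparse_dir) {
			assert(full.nr < TOY_MAX);
			full.entries[full.nr++] = *ce;
			continue;
		}
		/* ce->name already ends in '/', e.g. "deep/" */
		toy_add_tree_paths(&full, ce->name, ce->tree);
	}
	*istate = full;
}
```

With an entry "a" followed by a sparse "deep/" entry whose tree holds a file "a" and a subdirectory "deeper1" containing "b", the expanded list is "a", "deep/a", "deep/deeper1/b". The real function also registers each new entry in the name hash through add_name_hash() and discards the replaced directory entries.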
* [PATCH v5 06/21] t1092: compare sparse-checkout to sparse-index 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget ` (4 preceding siblings ...) 2021-03-30 13:10 ` [PATCH v5 05/21] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 07/21] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget ` (16 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Add a new 'sparse-index' repo alongside the 'full-checkout' and 'sparse-checkout' repos in t1092-sparse-checkout-compatibility.sh. Also add run_on_sparse and test_sparse_match helpers. These helpers will be used when the sparse index is implemented. Add the GIT_TEST_SPARSE_INDEX environment variable to enable the sparse-index by default. This can be enabled across all tests, but that will only affect cases where the sparse-checkout feature is enabled. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/README | 3 +++ t/t1092-sparse-checkout-compatibility.sh | 24 ++++++++++++++++++++---- 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/t/README b/t/README index 593d4a4e270c..b98bc563aab5 100644 --- a/t/README +++ b/t/README @@ -439,6 +439,9 @@ and "sha256". GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the 'pack.writeReverseIndex' setting. +GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the +sparse-index format by default. 
+ Naming Tests ------------ diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 3725d3997e70..de5d8461c993 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -7,6 +7,7 @@ test_description='compare full workdir to sparse workdir' test_expect_success 'setup' ' git init initial-repo && ( + GIT_TEST_SPARSE_INDEX=0 && cd initial-repo && echo a >a && echo "after deep" >e && @@ -87,23 +88,32 @@ init_repos () { cp -r initial-repo sparse-checkout && git -C sparse-checkout reset --hard && - git -C sparse-checkout sparse-checkout init --cone && + + cp -r initial-repo sparse-index && + git -C sparse-index reset --hard && # initialize sparse-checkout definitions - git -C sparse-checkout sparse-checkout set deep + git -C sparse-checkout sparse-checkout init --cone && + git -C sparse-checkout sparse-checkout set deep && + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep } run_on_sparse () { ( cd sparse-checkout && - "$@" >../sparse-checkout-out 2>../sparse-checkout-err + GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err + ) && + ( + cd sparse-index && + GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err ) } run_on_all () { ( cd full-checkout && - "$@" >../full-checkout-out 2>../full-checkout-err + GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err ) && run_on_sparse "$@" } @@ -114,6 +124,12 @@ test_all_match () { test_cmp full-checkout-err sparse-checkout-err } +test_sparse_match () { + run_on_sparse "$@" && + test_cmp sparse-checkout-out sparse-index-out && + test_cmp sparse-checkout-err sparse-index-err +} + test_expect_success 'status with options' ' init_repos && test_all_match git status --porcelain=v2 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* [PATCH v5 07/21] test-read-cache: print cache entries with --table 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget ` (5 preceding siblings ...) 2021-03-30 13:10 ` [PATCH v5 06/21] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 08/21] test-tool: don't force full index Derrick Stolee via GitGitGadget ` (15 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> This table is helpful for discovering data in the index to ensure it is being written correctly, especially as we build and test the sparse-index. This table includes an output format similar to 'git ls-tree', but should not be compared to that directly. The biggest reasons are that 'git ls-tree' includes a tree entry for every subdirectory, even those that would not appear as a sparse directory in a sparse-index. Further, 'git ls-tree' does not use a trailing directory separator for its tree rows. This does not print the stat() information for the blobs. That will be added in a future change with another option. The tests that are added in the next few changes care only about the object types and IDs. However, this future need for full index information justifies the need for this test helper over extending a user-facing feature, such as 'git ls-files'. To make the option parsing slightly more robust, wrap the string comparisons in a loop adapted from test-dir-iterator.c. Care must be taken with the final check for the 'cnt' variable. We continue the expectation that the numerical value is the final argument. 
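The sketch below makes the --table row format concrete. It is a minimal standalone version that builds one row from an entry's mode, object ID and name. The toy_* names and octal constants are assumptions chosen to mirror the patch. A mode equal to the directory bit stands in for S_ISSPARSEDIR(), a gitlink mode stands in for S_ISGITLINK(), and every other mode is reported as a blob. Unlike 'git ls-tree', a sparse directory row keeps its trailing '/', because that is how the name is stored in the index.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical octal mode constants, assumed to mirror Git's cache.h. */
#define TOY_S_IFMT      0170000
#define TOY_S_IFDIR     0040000
#define TOY_S_IFGITLINK 0160000

static const char *toy_entry_type(unsigned int mode)
{
	if (mode == TOY_S_IFDIR)                     /* sparse directory entry */
		return "tree";
	if ((mode & TOY_S_IFMT) == TOY_S_IFGITLINK)  /* submodule link */
		return "commit";
	return "blob";
}

/* Format one row as "<mode> <type> <oid>\t<name>\n" into 'buf'. */
static int toy_format_row(char *buf, size_t len, unsigned int mode,
			  const char *oid_hex, const char *name)
{
	return snprintf(buf, len, "%06o %s %s\t%s\n",
			mode & 0177777, toy_entry_type(mode), oid_hex, name);
}
```

For example, a sparse directory with mode 040000 and name "folder1/" produces a row that begins with "040000 tree". A regular file with mode 0100644 produces a row that begins with "100644 blob".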
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/helper/test-read-cache.c | 55 +++++++++++++++++++++++++++++++------- 1 file changed, 45 insertions(+), 10 deletions(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 244977a29bdf..6cfd8f2de71c 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -1,36 +1,71 @@ #include "test-tool.h" #include "cache.h" #include "config.h" +#include "blob.h" +#include "commit.h" +#include "tree.h" + +static void print_cache_entry(struct cache_entry *ce) +{ + const char *type; + printf("%06o ", ce->ce_mode & 0177777); + + if (S_ISSPARSEDIR(ce->ce_mode)) + type = tree_type; + else if (S_ISGITLINK(ce->ce_mode)) + type = commit_type; + else + type = blob_type; + + printf("%s %s\t%s\n", + type, + oid_to_hex(&ce->oid), + ce->name); +} + +static void print_cache(struct index_state *istate) +{ + int i; + for (i = 0; i < istate->cache_nr; i++) + print_cache_entry(istate->cache[i]); +} int cmd__read_cache(int argc, const char **argv) { + struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; + int table = 0; - if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) { - argc--; - argv++; + for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { + if (skip_prefix(*argv, "--print-and-refresh=", &name)) + continue; + if (!strcmp(*argv, "--table")) + table = 1; } - if (argc == 2) - cnt = strtol(argv[1], NULL, 0); + if (argc == 1) + cnt = strtol(argv[0], NULL, 0); setup_git_directory(); git_config(git_default_config, NULL); + for (i = 0; i < cnt; i++) { - read_cache(); + repo_read_index(r); if (name) { int pos; - refresh_index(&the_index, REFRESH_QUIET, + refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL); - pos = index_name_pos(&the_index, name, strlen(name)); + pos = index_name_pos(r->index, name, strlen(name)); if (pos < 0) die("%s not in index", name); printf("%s is%s up to date\n", name, - ce_uptodate(the_index.cache[pos]) ? 
"" : " not"); + ce_uptodate(r->index->cache[pos]) ? "" : " not"); write_file(name, "%d\n", i); } - discard_cache(); + if (table) + print_cache(r->index); + discard_index(r->index); } return 0; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* [PATCH v5 08/21] test-tool: don't force full index 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget ` (6 preceding siblings ...) 2021-03-30 13:10 ` [PATCH v5 07/21] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 09/21] unpack-trees: ensure " Derrick Stolee via GitGitGadget ` (14 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We will use 'test-tool read-cache --table' to check that a sparse index is written as part of init_repos. Since we will no longer always expand a sparse index into a full index, add an '--expand' parameter that adds a call to ensure_full_index() so we can compare a sparse index directly against a full index, or at least what the in-memory index looks like when expanded in this way. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/helper/test-read-cache.c | 13 ++++++++++++- t/t1092-sparse-checkout-compatibility.sh | 5 +++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c index 6cfd8f2de71c..b52c174acc7a 100644 --- a/t/helper/test-read-cache.c +++ b/t/helper/test-read-cache.c @@ -4,6 +4,7 @@ #include "blob.h" #include "commit.h" #include "tree.h" +#include "sparse-index.h" static void print_cache_entry(struct cache_entry *ce) { @@ -35,13 +36,19 @@ int cmd__read_cache(int argc, const char **argv) struct repository *r = the_repository; int i, cnt = 1; const char *name = NULL; - int table = 0; + int table = 0, expand = 0; + + initialize_the_repository(); + prepare_repo_settings(r); + r->settings.command_requires_full_index = 0; for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) { if (skip_prefix(*argv, "--print-and-refresh=", &name)) continue; if (!strcmp(*argv, "--table")) table = 1; + else if (!strcmp(*argv, "--expand")) + expand = 1; } if (argc == 1) @@ -51,6 +58,10 @@ int cmd__read_cache(int argc, const char **argv) for (i = 0; i < cnt; i++) { repo_read_index(r); + + if (expand) + ensure_full_index(r->index); + if (name) { int pos; diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index de5d8461c993..a1aea141c62c 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -130,6 +130,11 @@ test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'expanded in-memory index matches full index' ' + init_repos && + test_sparse_match test-tool read-cache --expand --table +' + test_expect_success 'status with options' ' init_repos && test_all_match git status --porcelain=v2 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
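The option loop in cmd__read_cache() consumes every leading "--" argument and then treats a single remaining argument as the iteration count. The sketch below restates that loop in plain C so that it can run outside Git. It assumes strncmp() in place of Git's starts_with() and skip_prefix() helpers, and the toy_opts struct and toy_parse() function are hypothetical. Like the patch, it relies on argv being NULL-terminated and silently ignores unknown "--" options.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed read-cache options. */
struct toy_opts {
	const char *name; /* from --print-and-refresh=<name> */
	int table;
	int expand;
	int cnt;
};

static void toy_parse(int argc, const char **argv, struct toy_opts *o)
{
	static const char refresh[] = "--print-and-refresh=";

	o->name = NULL;
	o->table = 0;
	o->expand = 0;
	o->cnt = 1;

	/* Skip argv[0], then consume every leading "--" option. */
	for (++argv, --argc; *argv && !strncmp(*argv, "--", 2); ++argv, --argc) {
		if (!strncmp(*argv, refresh, strlen(refresh)))
			o->name = *argv + strlen(refresh);
		else if (!strcmp(*argv, "--table"))
			o->table = 1;
		else if (!strcmp(*argv, "--expand"))
			o->expand = 1;
	}

	/* As in the patch, the iteration count must be the final argument. */
	if (argc == 1)
		o->cnt = (int)strtol(argv[0], NULL, 0);
}
```

For instance, "read-cache --expand --table 3" sets both flags and a count of 3. "read-cache --print-and-refresh=foo" leaves the count at its default of 1.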
* [PATCH v5 09/21] unpack-trees: ensure full index 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget ` (7 preceding siblings ...) 2021-03-30 13:10 ` [PATCH v5 08/21] test-tool: don't force full index Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 10/21] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget ` (13 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The next change will translate full indexes into sparse indexes at write time. The existing logic provides a way for every sparse index to be expanded to a full index at read time. However, there are cases where an index is written and then continues to be used in-memory to perform further updates. unpack_trees() is frequently called after such a write. In particular, commands like 'git reset' do this double-update of the index. Ensure that we have a full index when entering unpack_trees(), but only when command_requires_full_index is true. This is always true at the moment, but we will later relax that after unpack_trees() is updated to handle sparse directory entries. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/unpack-trees.c b/unpack-trees.c index f5f668f532d8..4dd99219073a 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1567,6 +1567,7 @@ static int verify_absent(const struct cache_entry *, */ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o) { + struct repository *repo = the_repository; int i, ret; static struct cache_entry *dfc; struct pattern_list pl; @@ -1578,6 +1579,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options trace_performance_enter(); trace2_region_enter("unpack_trees", "unpack_trees", the_repository); + prepare_repo_settings(repo); + if (repo->settings.command_requires_full_index) { + ensure_full_index(o->src_index); + ensure_full_index(o->dst_index); + } + if (!core_apply_sparse_checkout || !o->update) o->skip_sparse_checkout = 1; if (!o->skip_sparse_checkout && !o->pl) { -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
* [PATCH v5 10/21] sparse-checkout: hold pattern list in index 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget ` (8 preceding siblings ...) 2021-03-30 13:10 ` [PATCH v5 09/21] unpack-trees: ensure " Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 11/21] sparse-index: add 'sdir' index extension Derrick Stolee via GitGitGadget ` (12 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> As we modify the sparse-checkout definition, we perform index operations on a pattern_list that only exists in-memory. This allows easy backing out in case the index update fails. However, if the index write itself cares about the sparse-checkout pattern set, we need access to that in-memory copy. Place a pointer to a 'struct pattern_list' in the index so we can access this on-demand. This will be used in the next change which uses the sparse-checkout definition to filter out directories that are outside the sparse cone. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/sparse-checkout.c | 17 ++++++++++------- cache.h | 2 ++ 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index 2306a9ad98e0..e00b82af727b 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -110,6 +110,8 @@ static int update_working_directory(struct pattern_list *pl) if (is_index_unborn(r->index)) return UPDATE_SPARSITY_SUCCESS; + r->index->sparse_checkout_patterns = pl; + memset(&o, 0, sizeof(o)); o.verbose_update = isatty(2); o.update = 1; @@ -138,6 +140,7 @@ static int update_working_directory(struct pattern_list *pl) else rollback_lock_file(&lock_file); + r->index->sparse_checkout_patterns = NULL; return result; } @@ -517,19 +520,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) { int result; int changed_config = 0; - struct pattern_list pl; - memset(&pl, 0, sizeof(pl)); + struct pattern_list *pl = xcalloc(1, sizeof(*pl)); switch (m) { case ADD: if (core_sparse_checkout_cone) - add_patterns_cone_mode(argc, argv, &pl); + add_patterns_cone_mode(argc, argv, pl); else - add_patterns_literal(argc, argv, &pl); + add_patterns_literal(argc, argv, pl); break; case REPLACE: - add_patterns_from_input(&pl, argc, argv); + add_patterns_from_input(pl, argc, argv); break; } @@ -539,12 +541,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m) changed_config = 1; } - result = write_patterns_and_update(&pl); + result = write_patterns_and_update(pl); if (result && changed_config) set_config(MODE_NO_PATTERNS); - clear_pattern_list(&pl); + clear_pattern_list(pl); + free(pl); return result; } diff --git a/cache.h b/cache.h index 136dd496c95d..8c4464420d0a 100644 --- a/cache.h +++ b/cache.h @@ -307,6 +307,7 @@ static inline unsigned int canon_mode(unsigned int mode) struct split_index; struct untracked_cache; struct progress; +struct pattern_list; struct 
index_state { struct cache_entry **cache; @@ -338,6 +339,7 @@ struct index_state { struct mem_pool *ce_mem_pool; struct progress *progress; struct repository *repo; + struct pattern_list *sparse_checkout_patterns; }; /* Name hashing */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ messages in thread
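This patch uses a borrow-then-detach ownership rule. update_working_directory() stores the caller's pattern list in the index only for the duration of the update, so that index-writing code can consult the in-memory patterns. It then clears the pointer before returning. modify_pattern_list() remains responsible for clearing and freeing the list. Below is a minimal sketch of that pattern. The toy_* types are hypothetical stand-ins for struct pattern_list and struct index_state, and a stub replaces the real index write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pattern_list and struct index_state. */
struct toy_pattern_list {
	int nr;
};

struct toy_index_state {
	struct toy_pattern_list *sparse_checkout_patterns;
};

/* Stub index write: reports the pattern count it can see, or -1 if none. */
static int toy_write_index(struct toy_index_state *istate)
{
	return istate->sparse_checkout_patterns ?
		istate->sparse_checkout_patterns->nr : -1;
}

static int toy_update_working_directory(struct toy_index_state *istate,
					struct toy_pattern_list *pl)
{
	int result;

	istate->sparse_checkout_patterns = pl;   /* borrow, do not own */
	result = toy_write_index(istate);
	istate->sparse_checkout_patterns = NULL; /* the caller still frees 'pl' */
	return result;
}
```

Clearing the pointer before returning ensures that the index never keeps a reference to a list that the caller is about to free.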
* [PATCH v5 11/21] sparse-index: add 'sdir' index extension 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget ` (9 preceding siblings ...) 2021-03-30 13:10 ` [PATCH v5 10/21] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 12/21] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget ` (11 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The index format does not currently allow for sparse directory entries. Older versions of Git and third-party tools do not expect such entries, so an index containing them would violate their assumptions. We need an indicator inside the index file to warn these tools not to interact with a sparse index unless they are aware of sparse directory entries. Add a new _required_ index extension, 'sdir', that indicates that the index may contain sparse directory entries. This allows us to continue to use the differences in index formats 2, 3, and 4 before we create a new index version 5 in a later change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/technical/index-format.txt | 12 ++++++++++++ read-cache.c | 9 +++++++++ 2 files changed, 21 insertions(+) diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index 3b74c05647db..65da0daaa563 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -392,3 +392,15 @@ The remaining data of each directory block is grouped by type: in this block of entries.
- 32-bit count of cache entries in this block + +== Sparse Directory Entries + + When using sparse-checkout in cone mode, some entire directories within + the index can be summarized by pointing to a tree object instead of the + entire expanded list of paths within that tree. An index containing such + entries is a "sparse index". Index format versions 4 and less were not + implemented with such entries in mind. Thus, for these versions, an + index containing sparse directory entries will include this extension + with signature { 's', 'd', 'i', 'r' }. Like the split-index extension, + tools should avoid interacting with a sparse index unless they understand + this extension. diff --git a/read-cache.c b/read-cache.c index dd3980c12b53..b8f092d1b7eb 100644 --- a/read-cache.c +++ b/read-cache.c @@ -47,6 +47,7 @@ #define CACHE_EXT_FSMONITOR 0x46534D4E /* "FSMN" */ #define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945 /* "EOIE" */ #define CACHE_EXT_INDEXENTRYOFFSETTABLE 0x49454F54 /* "IEOT" */ +#define CACHE_EXT_SPARSE_DIRECTORIES 0x73646972 /* "sdir" */ /* changes that can be kept in $GIT_DIR/index (basically all extensions) */ #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \ @@ -1763,6 +1764,10 @@ static int read_index_extension(struct index_state *istate, case CACHE_EXT_INDEXENTRYOFFSETTABLE: /* already handled in do_read_index() */ break; + case CACHE_EXT_SPARSE_DIRECTORIES: + /* no content, only an indicator */ + istate->sparse_index = 1; + break; default: if (*ext < 'A' || 'Z' < *ext) return error(_("index uses %.4s extension, which we do not understand"), @@ -3020,6 +3025,10 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, if (err) return -1; } + if (istate->sparse_index) { + if (write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_SPARSE_DIRECTORIES, 0) < 0) + return -1; + } /* * CACHE_EXT_ENDOFINDEXENTRIES must be written as the last entry before the SHA1 -- gitgitgadget ^ permalink raw reply related [flat|nested] 203+ 
messages in thread
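On disk, every index extension starts with a 4-byte signature followed by a 32-bit size, both in network byte order. The 'sdir' extension written by do_write_index() has a size of zero. Its first byte is a lowercase letter, so a reader that does not recognize it must refuse the index instead of skipping it. The default branch of read_index_extension() enforces this rule. Below is a minimal standalone sketch of both rules. The toy_* helpers are hypothetical and only mirror the behavior described here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same value as CACHE_EXT_SPARSE_DIRECTORIES in the patch: "sdir". */
#define TOY_EXT_SPARSE_DIRECTORIES 0x73646972u

/* Store a 32-bit value in big-endian (network) byte order. */
static void toy_put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/* Extension header: 4-byte signature, then 32-bit payload size. */
static void toy_write_ext_header(unsigned char out[8], uint32_t sig, uint32_t size)
{
	toy_put_be32(out, sig);
	toy_put_be32(out + 4, size);
}

/* Unknown extensions may be skipped only if the signature starts with 'A'..'Z'. */
static int toy_ext_is_optional(const unsigned char *sig)
{
	return sig[0] >= 'A' && sig[0] <= 'Z';
}
```

The resulting header is the eight bytes 's', 'd', 'i', 'r', 0, 0, 0, 0. An optional extension such as "TREE" can be ignored by readers that do not understand it, but "sdir" cannot.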
* [PATCH v5 12/21] sparse-index: convert from full to sparse 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget ` (10 preceding siblings ...) 2021-03-30 13:10 ` [PATCH v5 11/21] sparse-index: add 'sdir' index extension Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 ` Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 13/21] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget ` (10 subsequent siblings) 22 siblings, 0 replies; 203+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC (permalink / raw) To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently only convert when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest part of this conversion is that being outside the sparse cone is not, by itself, enough to mark a directory as a sparse directory. There might be unmerged files within that directory, so we need to look for those. Also, if for some reason a file in that directory is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist.
We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compatibility.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index.
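The recursive conversion described above can be sketched without a cache-tree. This toy version is an illustration that rests on several assumptions and is not convert_to_sparse_rec(). It finds each subdirectory's span by scanning the sorted entries for a shared prefix instead of asking the cache-tree. It writes into a separate output array instead of compacting istate->cache in place. It also uses a hypothetical list of recursive cone directories instead of path_matches_pattern_list(). The collapse rule is the one stated in this message. A directory outside the cone becomes a single sparse directory entry only if every entry below it is merged (stage 0) and marked skip-worktree. Otherwise, the walk descends into its subdirectories.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical index entry; directory names end in '/'. */
struct toy_entry {
	char name[256];
	int stage;          /* non-zero means unmerged */
	int skip_worktree;
	int sparse_dir;
};

/* Hypothetical cone: recursive directories, each ending in '/'. */
static const char *toy_cone[] = { "deep/", NULL };

/* Outside the cone unless 'dir' is inside, or a parent of, a cone directory. */
static int toy_outside_cone(const char *dir)
{
	size_t len = strlen(dir);
	int k;

	for (k = 0; toy_cone[k]; k++) {
		size_t clen = strlen(toy_cone[k]);

		if (!strncmp(dir, toy_cone[k], len < clen ? len : clen))
			return 0;
	}
	return 1;
}

static void toy_convert_rec(const struct toy_entry *in, int start, int end,
			    const char *dir, size_t dirlen,
			    struct toy_entry *out, int *nout)
{
	int i, can_convert = toy_outside_cone(dir);

	for (i = start; can_convert && i < end; i++)
		if (in[i].stage || !in[i].skip_worktree)
			can_convert = 0;

	if (can_convert) {
		/* Replace the whole range with one sparse directory entry. */
		struct toy_entry *se = &out[(*nout)++];

		snprintf(se->name, sizeof(se->name), "%s", dir);
		se->stage = 0;
		se->skip_worktree = 1;
		se->sparse_dir = 1;
		return;
	}

	for (i = start; i < end; ) {
		const char *slash = strchr(in[i].name + dirlen, '/');
		char child[256];
		size_t childlen;
		int span = 0;

		if (!slash) {
			/* A file directly inside 'dir' is kept as-is. */
			out[(*nout)++] = in[i++];
			continue;
		}

		/* Child directory name, including its trailing '/'. */
		childlen = (size_t)(slash - in[i].name) + 1;
		memcpy(child, in[i].name, childlen);
		child[childlen] = '\0';

		/* Sorted entries sharing this prefix form one contiguous span. */
		while (i + span < end && !strncmp(in[i + span].name, child, childlen))
			span++;

		toy_convert_rec(in, i, i + span, child, childlen, out, nout);
		i += span;
	}
}

/* Convert 'nr' sorted entries; returns the number of entries written to 'out'. */
static int toy_convert_to_sparse(const struct toy_entry *in, int nr,
				 struct toy_entry *out)
{
	int nout = 0;

	toy_convert_rec(in, 0, nr, "", 0, out, &nout);
	return nout;
}
```

Take a cone of "deep/". The sorted entries folder1/a and folder1/sub/b, both skip-worktree and at stage 0, collapse into one "folder1/" entry. An unmerged folder2/c keeps folder2/ expanded, and the in-cone entries under deep/ are copied through unchanged.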
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 3 + cache.h | 2 + read-cache.c | 26 ++++- sparse-index.c | 139 +++++++++++++++++++++++ sparse-index.h | 1 + t/t1092-sparse-checkout-compatibility.sh | 61 +++++++++- 6 files changed, 228 insertions(+), 4 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 2fb483d3c083..5f07a39e501e 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -6,6 +6,7 @@ #include "object-store.h" #include "replace-object.h" #include "promisor-remote.h" +#include "sparse-index.h" #ifndef DEBUG_CACHE_TREE #define DEBUG_CACHE_TREE 0 @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags) if (i) return i; + ensure_full_index(istate); + if (!istate->cache_tree) istate->cache_tree = cache_tree(); diff --git a/cache.h b/cache.h index 8c4464420d0a..74b43aaa2bd1 100644 --- a/cache.h +++ b/cache.h @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode) { if (S_ISLNK(mode)) return S_IFLNK; + if (S_ISSPARSEDIR(mode)) + return S_IFDIR; if (S_ISDIR(mode) || S_ISGITLINK(mode)) return S_IFGITLINK; return S_IFREG | ce_permissions(mode); diff --git a/read-cache.c b/read-cache.c index b8f092d1b7eb..2410e6e0df13 100644 --- a/read-cache.c +++ b/read-cache.c @@ -25,6 +25,7 @@ #include "fsmonitor.h" #include "thread-utils.h" #include "progress.h" +#include "sparse-index.h" /* Mask for the name length in ce_flags in the on-disk index */ @@ -1003,8 +1004,14 @@ int verify_path(const char *path, unsigned mode) c = *path++; if ((c == '.' && !verify_dotfile(path, mode)) || - is_dir_sep(c) || c == '\0') + is_dir_sep(c)) return 0; + /* + * allow terminating directory separators for + * sparse directory entries. 
+ */ + if (c == '\0') + return S_ISDIR(mode); } else if (c == '\\' && protect_ntfs) { if (is_ntfs_dotgit(path)) return 0; @@ -3088,6 +3095,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l unsigned flags) { int ret; + int was_full = !istate->sparse_index; + + ret = convert_to_sparse(istate); + + if (ret) { + warning(_("failed to convert to a sparse-index")); + return ret; + } /* * TODO trace2: replace "the_repository" with the actual repo instance @@ -3099,6 +3114,9 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l trace2_region_leave_printf("index", "do_write_index", the_repository, "%s", get_lock_file_path(lock)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; if (flags & COMMIT_LOCK) @@ -3189,9 +3207,10 @@ static int write_shared_index(struct index_state *istate, struct tempfile **temp) { struct split_index *si = istate->split_index; - int ret; + int ret, was_full = !istate->sparse_index; move_cache_to_base_index(istate); + convert_to_sparse(istate); trace2_region_enter_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); @@ -3199,6 +3218,9 @@ static int write_shared_index(struct index_state *istate, trace2_region_leave_printf("index", "shared/do_write_index", the_repository, "%s", get_tempfile_path(*temp)); + if (was_full) + ensure_full_index(istate); + if (ret) return ret; ret = adjust_shared_perm(get_tempfile_path(*temp)); diff --git a/sparse-index.c b/sparse-index.c index 7095378a1b28..619ff7c2e217 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -4,6 +4,145 @@ #include "tree.h" #include "pathspec.h" #include "trace2.h" +#include "cache-tree.h" +#include "config.h" +#include "dir.h" +#include "fsmonitor.h" + +static struct cache_entry *construct_sparse_dir_entry( + struct index_state *istate, + const char *sparse_dir, + struct cache_tree *tree) +{ + struct cache_entry *de; + + de = make_cache_entry(istate, S_IFDIR, &tree->oid, 
sparse_dir, 0, 0); + + de->ce_flags |= CE_SKIP_WORKTREE; + return de; +} + +/* + * Returns the number of entries "inserted" into the index. + */ +static int convert_to_sparse_rec(struct index_state *istate, + int num_converted, + int start, int end, + const char *ct_path, size_t ct_pathlen, + struct cache_tree *ct) +{ + int i, can_convert = 1; + int start_converted = num_converted; + enum pattern_match_result match; + int dtype; + struct strbuf child_path = STRBUF_INIT; + struct pattern_list *pl = istate->sparse_checkout_patterns; + + /* + * Is the current path outside of the sparse cone? + * Then check if the region can be replaced by a sparse + * directory entry (everything is sparse and merged). + */ + match = path_matches_pattern_list(ct_path, ct_pathlen, + NULL, &dtype, pl, istate); + if (match != NOT_MATCHED) + can_convert = 0; + + for (i = start; can_convert && i < end; i++) { + struct cache_entry *ce = istate->cache[i]; + + if (ce_stage(ce) || + !(ce->ce_flags & CE_SKIP_WORKTREE)) + can_convert = 0; + } + + if (can_convert) { + struct cache_entry *se; + se = construct_sparse_dir_entry(istate, ct_path, ct); + + istate->cache[num_converted++] = se; + return 1; + } + + for (i = start; i < end; ) { + int count, span, pos = -1; + const char *base, *slash; + struct cache_entry *ce = istate->cache[i]; + + /* + * Detect if this is a normal entry outside of any subtree + * entry. 
+ */ + base = ce->name + ct_pathlen; + slash = strchr(base, '/'); + + if (slash) + pos = cache_tree_subtree_pos(ct, base, slash - base); + + if (pos < 0) { + istate->cache[num_converted++] = ce; + i++; + continue; + } + + strbuf_setlen(&child_path, 0); + strbuf_add(&child_path, ce->name, slash - ce->name + 1); + + span = ct->down[pos]->cache_tree->entry_count; + count = convert_to_sparse_rec(istate, + num_converted, i, i + span, + child_path.buf, child_path.len, + ct->down[pos]->cache_tree); + num_converted += count; + i += span; + } + + strbuf_release(&child_path); + return num_converted - start_converted; +} + +int convert_to_sparse(struct index_state *istate) +{ + if (istate->split_index || istate->sparse_index || + !core_apply_sparse_checkout || !core_sparse_checkout_cone) + return 0; + + /* + * For now, only create a sparse index with the + * GIT_TEST_SPARSE_INDEX environment variable. We will relax + * this once we have a proper way to opt-in (and later still, + * opt-out). + */ + if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + return 0; + + if (!istate->sparse_checkout_patterns) { + istate->sparse_checkout_patterns = xcalloc(1, sizeof(struct pattern_list)); + if (get_sparse_checkout_patterns(istate->sparse_checkout_patterns) < 0) + return 0; + } + + if (!istate->sparse_checkout_patterns->use_cone_patterns) { + warning(_("attempting to use sparse-index without cone mode")); + return -1; + } + + if (cache_tree_update(istate, 0)) { + warning(_("unable to update cache-tree, staying full")); + return -1; + } + + remove_fsmonitor(istate); + + trace2_region_enter("index", "convert_to_sparse", istate->repo); + istate->cache_nr = convert_to_sparse_rec(istate, + 0, 0, istate->cache_nr, + "", 0, istate->cache_tree); + istate->drop_cache_tree = 1; + istate->sparse_index = 1; + trace2_region_leave("index", "convert_to_sparse", istate->repo); + return 0; +} static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce) { diff --git 
a/sparse-index.h b/sparse-index.h index 09a20d036c46..64380e121d80 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -3,5 +3,6 @@ struct index_state; void ensure_full_index(struct index_state *istate); +int convert_to_sparse(struct index_state *istate); #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index a1aea141c62c..1e888d195122 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -2,6 +2,11 @@ test_description='compare full workdir to sparse workdir' +# The verify_cache_tree() check is not sparse-aware (yet). +# So, disable the check until that integration is complete. +GIT_TEST_CHECK_CACHE_TREE=0 +GIT_TEST_SPLIT_INDEX=0 + . ./test-lib.sh test_expect_success 'setup' ' @@ -121,7 +126,9 @@ run_on_all () { test_all_match () { run_on_all "$@" && test_cmp full-checkout-out sparse-checkout-out && - test_cmp full-checkout-err sparse-checkout-err + test_cmp full-checkout-out sparse-index-out && + test_cmp full-checkout-err sparse-checkout-err && + test_cmp full-checkout-err sparse-index-err } test_sparse_match () { @@ -130,6 +137,38 @@ test_sparse_match () { test_cmp sparse-checkout-err sparse-index-err } +test_expect_success 'sparse-index contents' ' + init_repos && + + test-tool -C sparse-index read-cache --table >cache && + for dir in folder1 folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep folder2 x + do + TREE=$(git -C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done && + + GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + + test-tool -C sparse-index read-cache --table >cache && + for dir in deep/deeper2 folder1 folder2 x + do + TREE=$(git 
-C sparse-index rev-parse HEAD:$dir) && + grep "040000 tree $TREE $dir/" cache \ + || return 1 + done +' + test_expect_success 'expanded in-memory index matches full index' ' init_repos && test_sparse_match test-tool read-cache --expand --table @@ -137,6 +176,7 @@ test_expect_success 'expanded in-memory index matches full index' ' test_expect_success 'status with options' ' init_repos && + test_sparse_match ls && test_all_match git status --porcelain=v2 && test_all_match git status --porcelain=v2 -z -u && test_all_match git status --porcelain=v2 -uno && @@ -273,6 +313,17 @@ test_expect_failure 'checkout and reset (mixed)' ' test_all_match git reset update-folder2 ' +# Ensure that sparse-index behaves identically to +# sparse-checkout with a full index. +test_expect_success 'checkout and reset (mixed) [sparse]' ' + init_repos && + + test_sparse_match git checkout -b reset-test update-deep && + test_sparse_match git reset deepest && + test_sparse_match git reset update-folder1 && + test_sparse_match git reset update-folder2 +' + test_expect_success 'merge' ' init_repos && @@ -309,14 +360,20 @@ test_expect_success 'clean' ' test_all_match git status --porcelain=v2 && test_all_match git clean -f && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && test_all_match git clean -xdf && test_all_match git status --porcelain=v2 && + test_sparse_match ls && + test_sparse_match ls folder1 && - test_path_is_dir sparse-checkout/folder1 + test_sparse_match test_path_is_dir folder1 ' test_done -- gitgitgadget
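The recursion in convert_to_sparse_rec() can be modeled without a cache-tree. The C sketch below is a simplified, hypothetical model rather than Git's implementation: struct entry, outside_cone() and collapse() are illustrative names, the cone is reduced to a single directory, and merge stages and gitlinks are ignored. It walks a byte-sorted path list and, whenever every entry under a directory has skip-worktree set and that directory lies outside the cone, emits one "dir/" entry in place of the whole range.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 256

/* Hypothetical stand-in for a cache entry; not Git's struct cache_entry. */
struct entry {
	const char *name;  /* full path, e.g. "folder1/a" */
	int skip_worktree; /* 1 if CE_SKIP_WORKTREE would be set */
};

/*
 * Simplified cone check for a single cone directory `cone` (ending in '/').
 * A directory `dir` (length len, ending in '/') is outside the cone when it
 * is neither an ancestor of `cone` nor inside it.
 */
int outside_cone(const char *cone, const char *dir, size_t len)
{
	size_t clen = strlen(cone);

	return strncmp(cone, dir, len < clen ? len : clen) != 0;
}

/*
 * Collapse e[start, end), whose names all begin with `base` (length
 * baselen, either empty or ending in '/'). Appends the resulting names
 * to out[] starting at index nr and returns the new count.
 */
int collapse(const struct entry *e, int start, int end,
	     const char *base, size_t baselen, const char *cone,
	     char out[][MAX_NAME], int nr)
{
	int i, all_skipped = 1;

	assert(baselen == 0 || base[baselen - 1] == '/');

	for (i = start; i < end; i++)
		if (!e[i].skip_worktree)
			all_skipped = 0;

	/* Replace the whole range with one "dir/" entry (never the root). */
	if (baselen && all_skipped && outside_cone(cone, base, baselen)) {
		snprintf(out[nr], MAX_NAME, "%.*s", (int)baselen, base);
		return nr + 1;
	}

	for (i = start; i < end; ) {
		const char *slash = strchr(e[i].name + baselen, '/');
		size_t childlen;
		int j;

		if (!slash) { /* a file directly inside `base` */
			snprintf(out[nr++], MAX_NAME, "%s", e[i].name);
			i++;
			continue;
		}

		/* In byte order, entries under the same child are contiguous. */
		childlen = (size_t)(slash - e[i].name) + 1;
		for (j = i + 1; j < end && !strncmp(e[j].name, e[i].name, childlen); j++)
			;
		nr = collapse(e, i, j, e[i].name, childlen, cone, out, nr);
		i = j;
	}
	return nr;
}
```

For example, with the cone "deep/deeper1/" and the sorted paths a, deep/a, deep/deeper1/a, deep/deeper2/a, deep/deeper2/b, folder1/a and folder2/a (skip-worktree set on the last four), the sketch yields a, deep/a, deep/deeper1/a, deep/deeper2/, folder1/ and folder2/. The real code avoids rescanning for child ranges by reading each subtree's entry_count from the cache-tree instead.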
* [PATCH v5 13/21] submodule: sparse-index should not collapse links From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> A submodule is stored as a "Git link" that actually points to a commit within a submodule. Submodules are populated or not depending on submodule configuration, not sparse-checkout. To ensure that the sparse-index feature integrates correctly with submodules, we should not collapse a directory if there is a Git link within its range.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- sparse-index.c | 1 + t/t1092-sparse-checkout-compatibility.sh | 17 +++++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/sparse-index.c b/sparse-index.c index 619ff7c2e217..7631f7bd00b7 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -52,6 +52,7 @@ static int convert_to_sparse_rec(struct index_state *istate, struct cache_entry *ce = istate->cache[i]; if (ce_stage(ce) || + S_ISGITLINK(ce->ce_mode) || !(ce->ce_flags & CE_SKIP_WORKTREE)) can_convert = 0; } diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 1e888d195122..cba5f89b1e96 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -376,4 +376,21 @@ test_expect_success 'clean' ' test_sparse_match test_path_is_dir folder1 ' +test_expect_success 'submodule handling' ' + init_repos && + + test_all_match mkdir modules && + test_all_match touch modules/a && + test_all_match git add modules && + test_all_match git commit -m "add modules directory" && + + run_on_all git submodule add "$(pwd)/initial-repo" modules/sub && + test_all_match git commit -m "add submodule" && + + # having a submodule prevents "modules" from collapse + test-tool -C sparse-index read-cache --table >cache && + grep "100644 blob .* modules/a" cache && + grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache +' + test_done -- gitgitgadget
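The extra S_ISGITLINK() condition can be illustrated in isolation. In this hypothetical sketch, struct mini_entry and range_can_collapse() are illustrative names; the octal modes follow the entries the test greps for ("100644 blob", "160000 commit"), and the type mask 0170000 is the usual S_IFMT value.

```c
#include <assert.h>
#include <stdint.h>

/* File-type bits of an index entry's mode (octal). */
#define MODE_TYPE_MASK 0170000u
#define MODE_GITLINK   0160000u /* "commit" entries, i.e. submodules */

/* Hypothetical, trimmed-down view of a cache entry. */
struct mini_entry {
	uint32_t mode;
	int stage;         /* nonzero while the path is in a merge conflict */
	int skip_worktree; /* 1 if CE_SKIP_WORKTREE would be set */
};

/* Return 1 if every entry in the range may hide behind one sparse directory. */
int range_can_collapse(const struct mini_entry *e, int nr)
{
	int i;

	assert(nr >= 0);
	for (i = 0; i < nr; i++) {
		if (e[i].stage ||
		    (e[i].mode & MODE_TYPE_MASK) == MODE_GITLINK ||
		    !e[i].skip_worktree)
			return 0;
	}
	return 1;
}
```

A single gitlink anywhere in the range keeps the parent directory expanded, which is what the 'submodule handling' test checks for "modules".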
* [PATCH v5 14/21] unpack-trees: allow sparse directories From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The index_pos_by_traverse_info() currently throws a BUG() when a directory entry exists exactly in the index. We need to consider that it is possible to have a directory in a sparse index as long as that entry is itself marked with the skip-worktree bit. The 'pos' variable is assigned a negative value if an exact match is not found. Since a directory name can be an exact match, it is no longer an error to have a nonnegative 'pos' value.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- unpack-trees.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 4dd99219073a..0b888dab2246 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -746,9 +746,13 @@ static int index_pos_by_traverse_info(struct name_entry *names, strbuf_make_traverse_path(&name, info, names->path, names->pathlen); strbuf_addch(&name, '/'); pos = index_name_pos(o->src_index, name.buf, name.len); - if (pos >= 0) - BUG("This is a directory and should not exist in index"); - pos = -pos - 1; + if (pos >= 0) { + if (!o->src_index->sparse_index || + !(o->src_index->cache[pos]->ce_flags & CE_SKIP_WORKTREE)) + BUG("This is a directory and should not exist in index"); + } else { + pos = -pos - 1; + } if (pos >= o->src_index->cache_nr || !starts_with(o->src_index->cache[pos]->name, name.buf) || (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name.buf))) -- gitgitgadget
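The -pos - 1 arithmetic relies on index_name_pos() returning the position of an exact match, or a negative value that encodes the insertion point when there is none. The sketch below reproduces that convention with a plain binary search over sorted names. name_pos() and dir_start() are hypothetical helpers; unlike Git's function they ignore merge stages, and dir_start() returns -1 where the real code would call BUG().

```c
#include <assert.h>
#include <string.h>

/* Position of `name` if present, otherwise -(insertion point) - 1. */
int name_pos(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	assert(nr >= 0);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/*
 * First position at or after the directory `dir` ("name/"). An exact
 * match is only acceptable when it is a sparse directory entry.
 */
int dir_start(const char **names, int nr, const char *dir, int sparse_dir_ok)
{
	int pos = name_pos(names, nr, dir);

	if (pos >= 0)
		return sparse_dir_ok ? pos : -1;
	return -pos - 1;
}
```

For the sorted names a, folder1/, folder2/a and x, looking up "folder1/" finds the sparse directory entry at position 1, while "folder2/" is absent and decodes to position 2, the first entry under that directory.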
* [PATCH v5 15/21] sparse-index: check index conversion happens From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> Add a test case that uses test_region to ensure that we are truly expanding a sparse index to a full one, then converting back to sparse when writing the index. As we integrate more Git commands with the sparse index, we will convert these commands to check that we do _not_ convert the sparse index to a full index and instead stay sparse the entire time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- t/t1092-sparse-checkout-compatibility.sh | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index cba5f89b1e96..47f983217852 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -393,4 +393,22 @@ test_expect_success 'submodule handling' ' grep "160000 commit $(git -C initial-repo rev-parse HEAD) modules/sub" cache ' +test_expect_success 'sparse-index is expanded and converted back' ' + init_repos && + + ( + GIT_TEST_SPARSE_INDEX=1 && + export GIT_TEST_SPARSE_INDEX && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt + ) +' + test_done -- gitgitgadget
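test_region inspects the trace2 event stream written to the file named by GIT_TRACE2_EVENT. As a rough illustration only, the C sketch below performs a similar check over lines of that output: it requires both a region_enter and a region_leave event carrying the given category and label. The helper names are hypothetical, and the assumption that "label" immediately follows "category" in each JSON line is an assumption about the event format, not a documented guarantee.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `line` looks like a trace2 event of type `event`
 * (e.g. "region_enter") for the given category and label.
 * Assumes "category" is immediately followed by "label" in the JSON.
 */
int is_region_event(const char *line, const char *event,
		    const char *category, const char *label)
{
	char ev[128], cl[256];

	assert(line && event && category && label);
	snprintf(ev, sizeof(ev), "\"event\":\"%s\"", event);
	snprintf(cl, sizeof(cl), "\"category\":\"%s\",\"label\":\"%s\"",
		 category, label);
	return strstr(line, ev) != NULL && strstr(line, cl) != NULL;
}

/* Both the enter and the leave event must appear somewhere in the trace. */
int saw_region(const char **lines, int nr, const char *category,
	       const char *label)
{
	int i, entered = 0, left = 0;

	for (i = 0; i < nr; i++) {
		entered |= is_region_event(lines[i], "region_enter", category, label);
		left |= is_region_event(lines[i], "region_leave", category, label);
	}
	return entered && left;
}
```

Requiring both events means a region that was entered but never left (for example, because the command died midway) does not count as observed.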
* [PATCH v5 16/21] sparse-index: add index.sparse config option From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:10 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> When enabled, this config option signals that index writes should attempt to use sparse-directory entries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/config/index.txt | 5 +++++ cache.h | 1 + repo-settings.c | 7 +++++++ repository.h | 3 ++- sparse-index.c | 34 +++++++++++++++++++++++++++++----- 5 files changed, 44 insertions(+), 6 deletions(-) diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt index 7cb50b37e98d..75f3a2d10541 100644 --- a/Documentation/config/index.txt +++ b/Documentation/config/index.txt @@ -14,6 +14,11 @@ index.recordOffsetTable:: Defaults to 'true' if index.threads has been explicitly enabled, 'false' otherwise. +index.sparse:: + When enabled, write the index using sparse-directory entries. This + has no effect unless `core.sparseCheckout` and + `core.sparseCheckoutCone` are both enabled. Defaults to 'false'. + index.threads:: Specifies the number of threads to spawn when loading the index. This is meant to reduce index load time on multiprocessor machines.
diff --git a/cache.h b/cache.h index 74b43aaa2bd1..8aede373aeb3 100644 --- a/cache.h +++ b/cache.h @@ -1059,6 +1059,7 @@ struct repository_format { int worktree_config; int is_bare; int hash_algo; + int sparse_index; char *work_tree; struct string_list unknown_extensions; struct string_list v1_only_extensions; diff --git a/repo-settings.c b/repo-settings.c index d63569e4041e..0cfe8b787db2 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -85,4 +85,11 @@ void prepare_repo_settings(struct repository *r) * removed. */ r->settings.command_requires_full_index = 1; + + /* + * Initialize this as off. + */ + r->settings.sparse_index = 0; + if (!repo_config_get_bool(r, "index.sparse", &value) && value) + r->settings.sparse_index = 1; } diff --git a/repository.h b/repository.h index e06a23015697..a45f7520fd9e 100644 --- a/repository.h +++ b/repository.h @@ -42,7 +42,8 @@ struct repo_settings { int core_multi_pack_index; - unsigned command_requires_full_index:1; + unsigned command_requires_full_index:1, + sparse_index:1; }; struct repository { diff --git a/sparse-index.c b/sparse-index.c index 7631f7bd00b7..6f4d95d35b1e 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -102,19 +102,43 @@ static int convert_to_sparse_rec(struct index_state *istate, return num_converted - start_converted; } +static int enable_sparse_index(struct repository *repo) +{ + const char *config_path = repo_git_path(repo, "config.worktree"); + + git_config_set_in_file_gently(config_path, + "index.sparse", + "true"); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 1; + return 0; +} + int convert_to_sparse(struct index_state *istate) { if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; + if (!istate->repo) + istate->repo = the_repository; + + /* + * The GIT_TEST_SPARSE_INDEX environment variable triggers the + * index.sparse config variable to be on. 
+ */ + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { + int err = enable_sparse_index(istate->repo); + if (err < 0) + return err; + } + /* - * For now, only create a sparse index with the - * GIT_TEST_SPARSE_INDEX environment variable. We will relax - * this once we have a proper way to opt-in (and later still, - * opt-out). + * Only convert to sparse if index.sparse is set. */ - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) + prepare_repo_settings(istate->repo); + if (!istate->repo->settings.sparse_index) return 0; if (!istate->sparse_checkout_patterns) { -- gitgitgadget
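After this patch, convert_to_sparse() returns early unless several conditions hold, and GIT_TEST_SPARSE_INDEX merely switches index.sparse on. The hypothetical sketch below flattens those checks into one predicate. struct sparse_state and should_convert_to_sparse() are illustrative only, and the later checks in the real function (cone-mode patterns and a successful cache_tree_update()) are omitted.

```c
#include <assert.h>

/* Hypothetical flattened view of what convert_to_sparse() consults. */
struct sparse_state {
	int split_index;     /* istate->split_index is set */
	int already_sparse;  /* istate->sparse_index */
	int sparse_checkout; /* core.sparseCheckout */
	int cone_mode;       /* core.sparseCheckoutCone */
	int index_sparse;    /* index.sparse, as read by prepare_repo_settings() */
	int test_env;        /* GIT_TEST_SPARSE_INDEX parsed as a boolean */
};

/* Return 1 if this index write should use sparse-directory entries. */
int should_convert_to_sparse(const struct sparse_state *s)
{
	assert(s);

	if (s->split_index || s->already_sparse ||
	    !s->sparse_checkout || !s->cone_mode)
		return 0;

	/* In this version the test variable can only switch index.sparse on. */
	if (s->test_env)
		return 1;

	return s->index_sparse ? 1 : 0;
}
```

Note that a false GIT_TEST_SPARSE_INDEX does not turn the feature off here; that asymmetry is what the next patch replaces with a tri-state reading.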
* [PATCH v5 17/21] sparse-checkout: toggle sparse index from builtin From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:11 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The sparse index extension is used to signal that index writes should be in sparse mode. Until now, this was only updated via GIT_TEST_SPARSE_INDEX=1. Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that specifies whether the sparse index should be used. Either way, it also rewrites the index in the corresponding format. Add a warning in the documentation that the use of a repository extension might reduce compatibility with third-party tools. 'git sparse-checkout init' already sets extensions.worktreeConfig, which places most sparse-checkout users outside of the scope of most third-party tools. Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of GIT_TEST_SPARSE_INDEX=1.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- Documentation/git-sparse-checkout.txt | 14 +++++++ builtin/sparse-checkout.c | 17 ++++++++- sparse-index.c | 33 +++++++++++------ sparse-index.h | 3 ++ t/t1092-sparse-checkout-compatibility.sh | 47 +++++++++++++----------- 5 files changed, 80 insertions(+), 34 deletions(-) diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index a0eeaeb02ee3..fdcf43f87cb3 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -45,6 +45,20 @@ To avoid interfering with other worktrees, it first enables the When `--cone` is provided, the `core.sparseCheckoutCone` setting is also set, allowing for better performance with a limited set of patterns (see 'CONE PATTERN SET' below). ++ +Use the `--[no-]sparse-index` option to toggle the use of the sparse +index format. This reduces the size of the index to be more closely +aligned with your sparse-checkout definition. This can have significant +performance advantages for commands such as `git status` or `git add`. +This feature is still experimental. Some commands might be slower with +a sparse index until they are properly integrated with the feature. ++ +**WARNING:** Using a sparse index requires modifying the index in a way +that is not completely understood by external tools. If you have trouble +with this compatibility, then run `git sparse-checkout init --no-sparse-index` +to rewrite your index to not be sparse. Older versions of Git will not +understand the sparse directory entries index extension and may fail to +interact with your repository until it is disabled. 
'set':: Write a set of patterns to the sparse-checkout file, as given as diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index e00b82af727b..ca63e2c64e95 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -14,6 +14,7 @@ #include "unpack-trees.h" #include "wt-status.h" #include "quote.h" +#include "sparse-index.h" static const char *empty_base = ""; @@ -283,12 +284,13 @@ static int set_config(enum sparse_checkout_mode mode) } static char const * const builtin_sparse_checkout_init_usage[] = { - N_("git sparse-checkout init [--cone]"), + N_("git sparse-checkout init [--cone] [--[no-]sparse-index]"), NULL }; static struct sparse_checkout_init_opts { int cone_mode; + int sparse_index; } init_opts; static int sparse_checkout_init(int argc, const char **argv) @@ -303,11 +305,15 @@ static int sparse_checkout_init(int argc, const char **argv) static struct option builtin_sparse_checkout_init_options[] = { OPT_BOOL(0, "cone", &init_opts.cone_mode, N_("initialize the sparse-checkout in cone mode")), + OPT_BOOL(0, "sparse-index", &init_opts.sparse_index, + N_("toggle the use of a sparse index")), OPT_END(), }; repo_read_index(the_repository); + init_opts.sparse_index = -1; + argc = parse_options(argc, argv, NULL, builtin_sparse_checkout_init_options, builtin_sparse_checkout_init_usage, 0); @@ -326,6 +332,15 @@ static int sparse_checkout_init(int argc, const char **argv) sparse_filename = get_sparse_checkout_filename(); res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL); + if (init_opts.sparse_index >= 0) { + if (set_sparse_index_config(the_repository, init_opts.sparse_index) < 0) + die(_("failed to modify sparse-index config")); + + /* force an index rewrite */ + repo_read_index(the_repository); + the_repository->index->updated_workdir = 1; + } + /* If we already have a sparse-checkout file, use it. 
*/ if (res >= 0) { free(sparse_filename); diff --git a/sparse-index.c b/sparse-index.c index 6f4d95d35b1e..4c73772c6d6c 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -102,21 +102,32 @@ static int convert_to_sparse_rec(struct index_state *istate, return num_converted - start_converted; } -static int enable_sparse_index(struct repository *repo) +static int set_index_sparse_config(struct repository *repo, int enable) { - const char *config_path = repo_git_path(repo, "config.worktree"); - - git_config_set_in_file_gently(config_path, - "index.sparse", - "true"); + int res; + char *config_path = repo_git_path(repo, "config.worktree"); + res = git_config_set_in_file_gently(config_path, + "index.sparse", + enable ? "true" : NULL); + free(config_path); prepare_repo_settings(repo); repo->settings.sparse_index = 1; - return 0; + return res; +} + +int set_sparse_index_config(struct repository *repo, int enable) +{ + int res = set_index_sparse_config(repo, enable); + + prepare_repo_settings(repo); + repo->settings.sparse_index = enable; + return res; } int convert_to_sparse(struct index_state *istate) { + int test_env; if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; @@ -128,11 +139,9 @@ int convert_to_sparse(struct index_state *istate) * The GIT_TEST_SPARSE_INDEX environment variable triggers the * index.sparse config variable to be on. */ - if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { - int err = enable_sparse_index(istate->repo); - if (err < 0) - return err; - } + test_env = git_env_bool("GIT_TEST_SPARSE_INDEX", -1); + if (test_env >= 0) + set_sparse_index_config(istate->repo, test_env); /* * Only convert to sparse if index.sparse is set. 
diff --git a/sparse-index.h b/sparse-index.h index 64380e121d80..39dcc859735e 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -5,4 +5,7 @@ struct index_state; void ensure_full_index(struct index_state *istate); int convert_to_sparse(struct index_state *istate); +struct repository; +int set_sparse_index_config(struct repository *repo, int enable); + #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 47f983217852..472c5337de1b 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -6,6 +6,7 @@ test_description='compare full workdir to sparse workdir' # So, disable the check until that integration is complete. GIT_TEST_CHECK_CACHE_TREE=0 GIT_TEST_SPLIT_INDEX=0 +GIT_TEST_SPARSE_INDEX= . ./test-lib.sh @@ -100,25 +101,26 @@ init_repos () { # initialize sparse-checkout definitions git -C sparse-checkout sparse-checkout init --cone && git -C sparse-checkout sparse-checkout set deep && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep + git -C sparse-index sparse-checkout init --cone --sparse-index && + test_cmp_config -C sparse-index true index.sparse && + git -C sparse-index sparse-checkout set deep } run_on_sparse () { ( cd sparse-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err + "$@" >../sparse-checkout-out 2>../sparse-checkout-err ) && ( cd sparse-index && - GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err + "$@" >../sparse-index-out 2>../sparse-index-err ) } run_on_all () { ( cd full-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err + "$@" >../full-checkout-out 2>../full-checkout-err ) && run_on_sparse "$@" } @@ -148,7 +150,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set 
folder1 && + git -C sparse-index sparse-checkout set folder1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep folder2 x @@ -158,7 +160,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + git -C sparse-index sparse-checkout set deep/deeper1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep/deeper2 folder1 folder2 x @@ -166,7 +168,14 @@ test_expect_success 'sparse-index contents' ' TREE=$(git -C sparse-index rev-parse HEAD:$dir) && grep "040000 tree $TREE $dir/" cache \ || return 1 - done + done && + + # Disabling the sparse-index removes tree entries with full ones + git -C sparse-index sparse-checkout init --no-sparse-index && + + test-tool -C sparse-index read-cache --table >cache && + ! grep "040000 tree" cache && + test_sparse_match test-tool read-cache --table ' test_expect_success 'expanded in-memory index matches full index' ' @@ -396,19 +405,15 @@ test_expect_success 'submodule handling' ' test_expect_success 'sparse-index is expanded and converted back' ' init_repos && - ( - GIT_TEST_SPARSE_INDEX=1 && - export GIT_TEST_SPARSE_INDEX && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" reset --hard && - test_region index convert_to_sparse trace2.txt && - test_region index ensure_full_index trace2.txt && - - rm trace2.txt && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" status -uno && - test_region index ensure_full_index trace2.txt - ) + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c 
core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt ' test_done -- gitgitgadget
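This patch turns two inputs into tri-states: init_opts.sparse_index starts at -1 so that omitting --[no-]sparse-index leaves index.sparse untouched, and git_env_bool("GIT_TEST_SPARSE_INDEX", -1) distinguishes an unset variable from an explicit false. A minimal sketch of that convention follows. parse_tristate(), env_tristate() and resolve_sparse_index() are hypothetical, and the accepted spellings are a simplification of Git's boolean parsing, which is case-insensitive, also accepts other integers, and reports invalid values as errors.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* -1: unset or unrecognized; 0: false; 1: true (case-sensitive here). */
int parse_tristate(const char *v)
{
	if (!v)
		return -1;
	if (!strcmp(v, "1") || !strcmp(v, "true") ||
	    !strcmp(v, "yes") || !strcmp(v, "on"))
		return 1;
	if (!*v || !strcmp(v, "0") || !strcmp(v, "false") ||
	    !strcmp(v, "no") || !strcmp(v, "off"))
		return 0;
	return -1; /* simplification: Git reports an error instead */
}

/* Read a boolean environment variable, keeping "unset" distinct. */
int env_tristate(const char *name)
{
	assert(name);
	return parse_tristate(getenv(name));
}

/* An explicit --[no-]sparse-index (0 or 1) wins; -1 keeps the current value. */
int resolve_sparse_index(int cli_opt, int current)
{
	return cli_opt >= 0 ? cli_opt : current;
}
```

Using -1 as "not specified" lets a single int carry both the user's choice and the absence of one, which is why the option variable is reset to -1 before parse_options() runs.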
* [PATCH v5 18/21] sparse-checkout: disable sparse-index From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:11 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> We use 'git sparse-checkout init --cone --sparse-index' to toggle the sparse-index feature. It makes sense to also disable it when running 'git sparse-checkout disable'. This is particularly important because it removes the index.sparse config option, allowing other tools to use this Git repository again. This does mean that 'git sparse-checkout init' will not re-enable the sparse-index feature, even if it was previously enabled. While testing this feature, I noticed that the sparse-index was not being written on the first run, but only on a second. This was caught by the call to 'test-tool read-cache --table'. Fixing this requires adjusting some assignments to core_apply_sparse_checkout and pl.use_cone_patterns in the sparse_checkout_init() logic.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- builtin/sparse-checkout.c | 10 +++++++++- t/t1091-sparse-checkout-builtin.sh | 13 +++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index ca63e2c64e95..585343fa1972 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -280,6 +280,9 @@ static int set_config(enum sparse_checkout_mode mode) "core.sparseCheckoutCone", mode == MODE_CONE_PATTERNS ? "true" : NULL); + if (mode == MODE_NO_PATTERNS) + set_sparse_index_config(the_repository, 0); + return 0; } @@ -341,10 +344,11 @@ static int sparse_checkout_init(int argc, const char **argv) the_repository->index->updated_workdir = 1; } + core_apply_sparse_checkout = 1; + /* If we already have a sparse-checkout file, use it. */ if (res >= 0) { free(sparse_filename); - core_apply_sparse_checkout = 1; return update_working_directory(NULL); } @@ -366,6 +370,7 @@ static int sparse_checkout_init(int argc, const char **argv) add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); strbuf_addstr(&pattern, "!/*/"); add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0); + pl.use_cone_patterns = init_opts.cone_mode; return write_patterns_and_update(&pl); } @@ -632,6 +637,9 @@ static int sparse_checkout_disable(int argc, const char **argv) strbuf_addstr(&match_all, "/*"); add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0); + prepare_repo_settings(the_repository); + the_repository->settings.sparse_index = 0; + if (update_working_directory(&pl)) die(_("error while refreshing working directory")); diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh index fc64e9ed99f4..38fc8340f5c9 100755 --- a/t/t1091-sparse-checkout-builtin.sh +++ b/t/t1091-sparse-checkout-builtin.sh @@ -205,6 +205,19 @@ test_expect_success 'sparse-checkout disable' ' check_files repo a deep folder1 folder2 ' +test_expect_success 'sparse-index 
enabled and disabled' ' + git -C repo sparse-checkout init --cone --sparse-index && + test_cmp_config -C repo true index.sparse && + test-tool -C repo read-cache --table >cache && + grep " tree " cache && + + git -C repo sparse-checkout disable && + test-tool -C repo read-cache --table >cache && + ! grep " tree " cache && + git -C repo config --list >config && + ! grep index.sparse config +' + test_expect_success 'cone mode: init and set' ' git -C repo sparse-checkout init --cone && git -C repo config --list >config && -- gitgitgadget
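With this patch, set_config() also clears index.sparse whenever sparse-checkout is turned off. The sketch below summarizes the resulting per-mode values as a lookup. It is a hypothetical model: config_value_for() and the KEEP sentinel are illustrative, the enum's numeric values are assumed, and the core.sparseCheckout row assumes that key is set to "true" for every mode except MODE_NO_PATTERNS. A NULL result stands for removing the key, which is how set_sparse_index_config(..., 0) passes a NULL value to git_config_set_in_file_gently().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mode names from builtin/sparse-checkout.c; numeric values assumed. */
enum sparse_checkout_mode {
	MODE_NO_PATTERNS = 0,
	MODE_ALL_PATTERNS = 1,
	MODE_CONE_PATTERNS = 2,
};

/* Hypothetical sentinel: leave this key exactly as it is. */
static const char KEEP[] = "(keep)";

/*
 * Value that set_config(mode) would write for `key`: "true",
 * NULL (remove the key), or KEEP (not touched).
 */
const char *config_value_for(enum sparse_checkout_mode mode, const char *key)
{
	assert(key);
	if (!strcmp(key, "core.sparseCheckout"))
		return mode != MODE_NO_PATTERNS ? "true" : NULL;
	if (!strcmp(key, "core.sparseCheckoutCone"))
		return mode == MODE_CONE_PATTERNS ? "true" : NULL;
	if (!strcmp(key, "index.sparse"))
		return mode == MODE_NO_PATTERNS ? NULL : KEEP;
	return KEEP;
}
```

Because index.sparse is only ever removed here, a later 'git sparse-checkout init' without --sparse-index leaves the feature off, matching the behavior described in the commit message.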
* [PATCH v5 19/21] cache-tree: integrate with sparse directory entries From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:11 UTC To: git Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The cache-tree extension was previously disabled with sparse indexes. However, the cache-tree is an important performance feature for commands like 'git status' and 'git add'. Integrate it with sparse directory entries. When writing a sparse index, completely clear and recalculate the cache tree. By starting from scratch, the only integration necessary is to check if we hit a sparse directory entry and create a leaf of the cache-tree that has an entry_count of one and no subtrees. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- cache-tree.c | 18 ++++++++++++++++++ sparse-index.c | 10 +++++++++- 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/cache-tree.c b/cache-tree.c index 5f07a39e501e..950a9615db8f 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -256,6 +256,24 @@ static int update_one(struct cache_tree *it, *skip_count = 0; + /* + * If the first entry of this region is a sparse directory + * entry corresponding exactly to 'base', then this cache_tree + * struct is a "leaf" in the data structure, pointing to the + * tree OID specified in the entry.
+	 */
+	if (entries > 0) {
+		const struct cache_entry *ce = cache[0];
+
+		if (S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce->ce_namelen == baselen &&
+		    !strncmp(ce->name, base, baselen)) {
+			it->entry_count = 1;
+			oidcpy(&it->oid, &ce->oid);
+			return 1;
+		}
+	}
+
 	if (0 <= it->entry_count && has_object_file(&it->oid))
 		return it->entry_count;
 
diff --git a/sparse-index.c b/sparse-index.c
index 4c73772c6d6c..95ea17174da3 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -172,7 +172,11 @@ int convert_to_sparse(struct index_state *istate)
 	istate->cache_nr = convert_to_sparse_rec(istate,
 						 0, 0, istate->cache_nr,
 						 "", 0, istate->cache_tree);
-	istate->drop_cache_tree = 1;
+
+	/* Clear and recompute the cache-tree */
+	cache_tree_free(&istate->cache_tree);
+	cache_tree_update(istate, 0);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -273,5 +277,9 @@ void ensure_full_index(struct index_state *istate)
 	strbuf_release(&base);
 	free(full);
 
+	/* Clear and recompute the cache-tree */
+	cache_tree_free(&istate->cache_tree);
+	cache_tree_update(istate, 0);
+
 	trace2_region_leave("index", "ensure_full_index", istate->repo);
 }
-- 
gitgitgadget
* [PATCH v5 20/21] sparse-index: loose integration with cache_tree_verify()
From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:11 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee,
    Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The cache_tree_verify() method is run when GIT_TEST_CHECK_CACHE_TREE is
enabled, which it is by default in the test suite. The logic must be
adjusted for the presence of sparse directory entries. For now, leave
the test as a simple check for whether the directory entry is sparse.
Do not go any further until needed.

This allows us to re-enable GIT_TEST_CHECK_CACHE_TREE in
t1092-sparse-checkout-compatibility.sh. Further,
p2000-sparse-operations.sh is built on the same test library, so this
check is enabled for all of its tests too. We need to integrate with it
before we run our performance tests with a sparse-index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 cache-tree.c                             | 19 +++++++++++++++++++
 t/t1092-sparse-checkout-compatibility.sh |  3 ---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 950a9615db8f..11bf1fcae6e1 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -808,6 +808,19 @@ int cache_tree_matches_traversal(struct cache_tree *root,
 	return 0;
 }
 
+static void verify_one_sparse(struct repository *r,
+			      struct index_state *istate,
+			      struct cache_tree *it,
+			      struct strbuf *path,
+			      int pos)
+{
+	struct cache_entry *ce = istate->cache[pos];
+
+	if (!S_ISSPARSEDIR(ce->ce_mode))
+		BUG("directory '%s' is present in index, but not sparse",
+		    path->buf);
+}
+
 static void verify_one(struct repository *r,
 		       struct index_state *istate,
 		       struct cache_tree *it,
@@ -830,6 +843,12 @@ static void verify_one(struct repository *r,
 
 	if (path->len) {
 		pos = index_name_pos(istate, path->buf, path->len);
+
+		if (pos >= 0) {
+			verify_one_sparse(r, istate, it, path, pos);
+			return;
+		}
+
 		pos = -pos - 1;
 	} else {
 		pos = 0;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 472c5337de1b..12e6c453024f 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -2,9 +2,6 @@
 
 test_description='compare full workdir to sparse workdir'
 
-# The verify_cache_tree() check is not sparse-aware (yet).
-# So, disable the check until that integration is complete.
-GIT_TEST_CHECK_CACHE_TREE=0
 GIT_TEST_SPLIT_INDEX=0
 GIT_TEST_SPARSE_INDEX=
 
-- 
gitgitgadget
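verify_one() relies on the return convention of index_name_pos(). A non-negative value is the position of an exact match. A negative value encodes where the path would be inserted, and the position is recovered with -pos - 1. The sketch below illustrates that convention with a plain binary search over sorted path strings. toy_name_pos() and toy_first_pos() are hypothetical helpers. They compare with strcmp() only, whereas Git's own lookup also takes the entry's stage into account.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical model of the index_name_pos() convention: binary-search a
 * sorted array of paths; return the position of an exact match, or
 * -(insertion point) - 1 when the path is absent.
 */
static int toy_name_pos(const char **names, int nr, const char *path)
{
	int lo = 0, hi = nr;

	assert(nr >= 0 && path);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(path, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/*
 * A non-negative result is already a position; for a negative result,
 * recover the insertion point the way verify_one() does with
 * "pos = -pos - 1".
 */
static int toy_first_pos(int pos)
{
	return pos >= 0 ? pos : -pos - 1;
}
```

As we read the hunk above, the path being verified names a directory. An exact hit (pos >= 0) is therefore only legitimate for a sparse directory entry, and that is the case handed to verify_one_sparse(). A negative result means the directory is expanded and the entries under it start at -pos - 1.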
* [PATCH v5 21/21] p2000: add sparse-index repos
From: Derrick Stolee via GitGitGadget @ 2021-03-30 13:11 UTC
To: git
Cc: newren, gitster, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee,
    Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

p2000-sparse-operations.sh compares different Git commands in
repositories with many files at HEAD but using sparse-checkout to focus
on a small portion of those files.

Add extra copies of the repository that use the sparse-index format so
we can track how that affects the performance of different commands.
At this point in time, the sparse-index is pure overhead in terms of
CPU time, and this is measurable in these tests:

Test
---------------------------------------------------------------
2000.2: git status (full-index-v3)             0.59(0.51+0.12)
2000.3: git status (full-index-v4)             0.59(0.52+0.11)
2000.4: git status (sparse-index-v3)           1.40(1.32+0.12)
2000.5: git status (sparse-index-v4)           1.41(1.36+0.08)
2000.6: git add -A (full-index-v3)             2.32(1.97+0.19)
2000.7: git add -A (full-index-v4)             2.17(1.92+0.14)
2000.8: git add -A (sparse-index-v3)           2.31(2.21+0.15)
2000.9: git add -A (sparse-index-v4)           2.30(2.20+0.13)
2000.10: git add . (full-index-v3)             2.39(2.02+0.20)
2000.11: git add . (full-index-v4)             2.20(1.94+0.16)
2000.12: git add . (sparse-index-v3)           2.36(2.27+0.12)
2000.13: git add . (sparse-index-v4)           2.33(2.21+0.16)
2000.14: git commit -a -m A (full-index-v3)    2.47(2.12+0.20)
2000.15: git commit -a -m A (full-index-v4)    2.26(2.00+0.17)
2000.16: git commit -a -m A (sparse-index-v3)  3.01(2.92+0.16)
2000.17: git commit -a -m A (sparse-index-v4)  3.01(2.94+0.15)

Note that there is very little difference between the v3 and v4 index
formats when the sparse-index is enabled. This is primarily because the
relative file sizes are the same, and the command time is mostly taken
up by parsing tree objects to expand the sparse index into a full one.

With the current file layout, the index file sizes are given by this
table:

       |  full index | sparse index |
       +-------------+--------------+
    v3 |     108 MiB |      1.6 MiB |
    v4 |      80 MiB |      1.2 MiB |

Future updates will improve the performance of Git commands when the
index is sparse.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p2000-sparse-operations.sh | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index dddd527b6330..94513c977489 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -59,12 +59,29 @@ test_expect_success 'setup repo and indexes' '
 		git sparse-checkout set $SPARSE_CONE &&
 		git config index.version 4 &&
 		git update-index --index-version=4
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . sparse-index-v3 &&
+	(
+		cd sparse-index-v3 &&
+		git sparse-checkout init --cone --sparse-index &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 3 &&
+		git update-index --index-version=3
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . sparse-index-v4 &&
+	(
+		cd sparse-index-v4 &&
+		git sparse-checkout init --cone --sparse-index &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 4 &&
+		git update-index --index-version=4
 	)
 '
 
 test_perf_on_all () {
 	command="$@"
-	for repo in full-index-v3 full-index-v4
+	for repo in full-index-v3 full-index-v4 \
+		    sparse-index-v3 sparse-index-v4
 	do
 		test_perf "$command ($repo)" "
 			(
-- 
gitgitgadget
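The two tables above are easier to compare as ratios. The helpers below only recompute proportions from the quoted numbers; no new measurements are involved. toy_overhead_pct() and toy_ratio() are hypothetical names.

```c
#include <assert.h>

/* Percentage by which 'sparse' exceeds 'full' (negative if it is faster). */
static double toy_overhead_pct(double full, double sparse)
{
	assert(full > 0.0);
	return (sparse - full) / full * 100.0;
}

/* How many times larger 'full' is than 'sparse'. */
static double toy_ratio(double full, double sparse)
{
	assert(sparse > 0.0);
	return full / sparse;
}
```

On the v3 index, git status goes from 0.59 s to 1.40 s, an overhead of roughly 137%. git commit -a goes from 2.47 s to 3.01 s, roughly 22%. git add -A is essentially unchanged. In both formats the sparse index file is about 67 times smaller than the full one (108 / 1.6 = 67.5, 80 / 1.2 ≈ 66.7).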
* Re: [PATCH v5 00/21] Sparse Index: Design, Format, Tests
From: Junio C Hamano @ 2021-03-30 20:11 UTC
To: Derrick Stolee via GitGitGadget
Cc: git, newren, pclouds, jrnieder, Martin Ågren, Derrick Stolee,
    SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ repo-settings.c: void prepare_repo_settings(struct repository *r)
> +	 * Initialize this as off.
> +	 */
> +	r->settings.sparse_index = 0;
> -+	if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value)
> ++	if (!repo_config_get_bool(r, "index.sparse", &value) && value)
> +		r->settings.sparse_index = 1;
>  }

It would be helpful to have a way for the repository owner to say
"Even if the version of Git may be capable of handling 'sdir'
extension, and my checkout uses sparse-cone settings, I do not want
to use it", and the other way around, i.e. "Even if my checkout
currently does not use sparse-cone settings, do use 'sdir'
extension". But for that, .sparse_index member may need to be
tristate (i.e. forbidden, enable-if-needed, use-even-unneeded)?

We have a similar setting in index.version; I believe we always
auto-demote 3 down to 2 when extended flags are not used, and
I think "always auto-demote" would be sufficient (iow,
"use-even-unneeded" may not be necessary, even though that might
help debugging).

Thanks.
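The tristate suggested above can be sketched concretely. This is hypothetical: the series as posted stores a boolean in r->settings.sparse_index, and neither enum toy_sparse_index_mode nor toy_should_write_sdir() exists in Git. The sketch only shows how the three policies named in the reply would map onto the decision to write the 'sdir' extension.

```c
#include <assert.h>

/* Hypothetical tristate for the three policies named in the reply. */
enum toy_sparse_index_mode {
	TOY_SPARSE_INDEX_FORBIDDEN,	/* never write 'sdir' */
	TOY_SPARSE_INDEX_IF_NEEDED,	/* write it only when sparse entries exist */
	TOY_SPARSE_INDEX_ALWAYS		/* "use-even-unneeded", e.g. for debugging */
};

/*
 * Decide whether an index write should carry the 'sdir' extension.
 * 'has_sparse_dirs' is nonzero when the in-memory index contains at
 * least one sparse directory entry.
 */
static int toy_should_write_sdir(enum toy_sparse_index_mode mode,
				 int has_sparse_dirs)
{
	switch (mode) {
	case TOY_SPARSE_INDEX_FORBIDDEN:
		return 0;
	case TOY_SPARSE_INDEX_IF_NEEDED:
		return has_sparse_dirs != 0;
	case TOY_SPARSE_INDEX_ALWAYS:
		return 1;
	}
	assert(!"unknown sparse-index mode");
	return 0;
}
```

Under the "forbidden" policy, an index that still holds sparse directory entries would first have to be expanded (for example with ensure_full_index()). The sketch models only the decision, not the expansion. The IF_NEEDED branch corresponds to the "always auto-demote" behavior Junio considers sufficient.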
* Re: [PATCH v5 00/21] Sparse Index: Design, Format, Tests
From: Derrick Stolee @ 2021-03-30 21:31 UTC
To: Junio C Hamano, Derrick Stolee via GitGitGadget
Cc: git, newren, pclouds, jrnieder, Martin Ågren, SZEDER Gábor,
    Ævar Arnfjörð Bjarmason, Derrick Stolee

On 3/30/2021 4:11 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> @@ repo-settings.c: void prepare_repo_settings(struct repository *r)
>> +	 * Initialize this as off.
>> +	 */
>> +	r->settings.sparse_index = 0;
>> -+	if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value)
>> ++	if (!repo_config_get_bool(r, "index.sparse", &value) && value)
>> +		r->settings.sparse_index = 1;
>>  }
>
> It would be helpful to have a way for the repository owner to say
> "Even if the version of Git may be capable of handling 'sdir'
> extension, and my checkout uses sparse-cone settings, I do not want
> to use it", and the other way around, i.e. "Even if my checkout
> currently does not use sparse-cone settings, do use 'sdir'
> extension". But for that, .sparse_index member may need to be
> tristate (i.e. forbidden, enable-if-needed, use-even-unneeded)?

I believe as presented, index.sparse=false will prevent the sdir
extension from being used. If index.sparse=true, then it will only
be used if sparse-checkout is enabled in cone mode.

I don't see the value in using the 'sdir' extension when not using
sparse-checkout in cone mode (and hence there are no sparse directory
entries in the index). What am I missing?
> We have a similar setting in index.version; I believe we always
> auto-demote 3 down to 2 when extended flags are not used, and
> I think "always auto-demote" would be sufficient (iow,
> "use-even-unneeded" may not be necessary, even though that might
> help debugging).

Yes, the same is happening here: we auto-demote to not use 'sdir'
if the other settings are not configured as well.

There is the rare scenario where these things all occur:

1. index.sparse = true
2. core.sparseCheckout = true
3. core.sparseCheckoutCone = true
4. Every path in the index matches the cone-mode patterns.

In this case, convert_to_sparse() is called and the istate->sparse
bit is set, telling do_write_index() to add the 'sdir' extension.

This seems like a rare occurrence. Is it still worth adding logic
to avoid 'sdir' when these are all true?

Thanks,
-Stolee
* Re: [PATCH v5 00/21] Sparse Index: Design, Format, Tests
From: Junio C Hamano @ 2021-03-30 21:49 UTC
To: Derrick Stolee
Cc: Derrick Stolee via GitGitGadget, git, newren, pclouds, jrnieder,
    Martin Ågren, SZEDER Gábor, Ævar Arnfjörð Bjarmason,
    Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

>> We have a similar setting in index.version; I believe we always
>> auto-demote 3 down to 2 when extended flags are not used, and
>> I think "always auto-demote" would be sufficient (iow,
>> "use-even-unneeded" may not be necessary, even though that might
>> help debugging).
>
> Yes, the same is happening here: we auto-demote to not use 'sdir'
> if the other settings are not configured as well.
>
> There is the rare scenario where these things all occur:
> ...
> This seems like a rare occurrence. Is it still worth adding logic
> to avoid 'sdir' when these are all true?

You'd be the primary one who will be debugging the system while and
after this goes through the stabilization effort, so whichever you
find is more convenient is good enough for us, I guess.

Thanks.
* Re: [PATCH v5 00/21] Sparse Index: Design, Format, Tests
From: Elijah Newren @ 2021-04-01 5:59 UTC
To: Derrick Stolee
Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, Git Mailing List,
    Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, SZEDER Gábor,
    Ævar Arnfjörð Bjarmason, Derrick Stolee

On Tue, Mar 30, 2021 at 2:31 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/30/2021 4:11 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> @@ repo-settings.c: void prepare_repo_settings(struct repository *r)
> >> +	 * Initialize this as off.
> >> +	 */
> >> +	r->settings.sparse_index = 0;
> >> -+	if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value)
> >> ++	if (!repo_config_get_bool(r, "index.sparse", &value) && value)
> >> +		r->settings.sparse_index = 1;
> >>  }
> >
> > It would be helpful to have a way for the repository owner to say
> > "Even if the version of Git may be capable of handling 'sdir'
> > extension, and my checkout uses sparse-cone settings, I do not want
> > to use it", and the other way around, i.e. "Even if my checkout
> > currently does not use sparse-cone settings, do use 'sdir'
> > extension". But for that, .sparse_index member may need to be
> > tristate (i.e. forbidden, enable-if-needed, use-even-unneeded)?
>
> I believe as presented, index.sparse=false will prevent the sdir
> extension from being used. If index.sparse=true, then it will only
> be used if sparse-checkout is enabled in cone mode.
>
> I don't see the value in using the 'sdir' extension when not using
> sparse-checkout in cone mode (and hence there are no sparse directory
> entries in the index). What am I missing?
>
> > We have a similar setting in index.version; I believe we always
> > auto-demote 3 down to 2 when extended flags are not used, and
> > I think "always auto-demote" would be sufficient (iow,
> > "use-even-unneeded" may not be necessary, even though that might
> > help debugging).
>
> Yes, the same is happening here: we auto-demote to not use 'sdir'
> if the other settings are not configured as well.
>
> There is the rare scenario where these things all occur:
>
> 1. index.sparse = true
> 2. core.sparseCheckout = true
> 3. core.sparseCheckoutCone = true
> 4. Every path in the index matches the cone-mode patterns.
>
> In this case, convert_to_sparse() is called and the istate->sparse
> bit is set, telling do_write_index() to add the 'sdir' extension.
>
> This seems like a rare occurrence. Is it still worth adding logic
> to avoid 'sdir' when these are all true?

I'd agree that this would be very rare; probably indicative of someone
either having a bug in their sparsity patterns or making a simplistic
testcase to see how things operate.
* Re: [PATCH v5 00/21] Sparse Index: Design, Format, Tests
From: Elijah Newren @ 2021-04-01 4:38 UTC
To: Derrick Stolee via GitGitGadget
Cc: Git Mailing List, Junio C Hamano, Nguyễn Thái Ngọc, Jonathan Nieder, Martin Ågren, Derrick Stolee, SZEDER Gábor, Ævar Arnfjörð Bjarmason, Derrick Stolee

On Tue, Mar 30, 2021 at 6:11 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > Here is the first full patch series submission coming out of the > sparse-index RFC [1]. > > [1] > https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@gmail.com/ > > I won't waste too much space here, because PATCH 1 includes a sizeable > design document that describes the feature, the reasoning behind it, and my > plan for getting this implemented widely throughout the codebase. > > There are some new things here that were not in the RFC: > > * Design doc and format updates. (Patch 1) > * Performance test script. (Patches 2 and 20) > > Notably missing in this series from the RFC: > > * The mega-patch inserting ensure_full_index() throughout the codebase. > That will be a follow-up series to this one. > * The integrations with git status and git add to demonstrate the improved > performance. Those will also appear in their own series later. > > I plan to keep my latest work in this area in my 'sparse-index/wip' branch > [2]. It includes all of the work from the RFC right now, updated with the > work from this series. > > [2] https://github.com/derrickstolee/git/tree/sparse-index/wip > > > Updates in V5 > ============= > > This version is updated to use an index extension instead of a repository > format extension. Thanks, Szeder!
This one change affects the range-diff > quite a bit, so please review those changes carefully. > > In particular: git sparse-checkout init --cone --sparse-index now sets a new > index.sparse config option as an indicator that we should attempt writing > the index in sparse form. > > > Updates in V4 > ============= > > * Rebased onto the latest copy of ab/read-tree. > * Updated the design document as per Junio's comments. > * Updated the submodule handling in the performance test. > * Followed up on some other review from Ævar, mostly style or commit > message things. > > > Updates in V3 > ============= > > For this version, I took Ævar's latest patches and applied them to v2.31.0 > and rebased this series on top. It uses his new "read_tree_at()" helper and > the associated changes to the function pointer type. > > * Fixed more typos. Thanks Martin and Elijah! > * Updated the test_sparse_match() macro to use "$@" instead of $* > * Added a test that git sparse-checkout init --no-sparse-index rewrites the > index to be full. > > > Updates in V2 > ============= > > * Various typos and awkward grammar is fixed. > * Cleaned up unnecessary commands in p2000-sparse-operations.sh > * Added a comment to the sparse_index member of struct index_state. > * Used tree_type, commit_type, and blob_type in test-read-cache.c. 
> > Thanks, -Stolee > > Derrick Stolee (21): > sparse-index: design doc and format update > t/perf: add performance test for sparse operations > t1092: clean up script quoting > sparse-index: add guard to ensure full index > sparse-index: implement ensure_full_index() > t1092: compare sparse-checkout to sparse-index > test-read-cache: print cache entries with --table > test-tool: don't force full index > unpack-trees: ensure full index > sparse-checkout: hold pattern list in index > sparse-index: add 'sdir' index extension > sparse-index: convert from full to sparse > submodule: sparse-index should not collapse links > unpack-trees: allow sparse directories > sparse-index: check index conversion happens > sparse-index: add index.sparse config option > sparse-checkout: toggle sparse index from builtin > sparse-checkout: disable sparse-index > cache-tree: integrate with sparse directory entries > sparse-index: loose integration with cache_tree_verify() > p2000: add sparse-index repos > > Documentation/config/index.txt | 5 + > Documentation/git-sparse-checkout.txt | 14 ++ > Documentation/technical/index-format.txt | 19 ++ > Documentation/technical/sparse-index.txt | 175 ++++++++++++++ > Makefile | 1 + > builtin/sparse-checkout.c | 44 +++- > cache-tree.c | 40 ++++ > cache.h | 18 +- > read-cache.c | 44 +++- > repo-settings.c | 15 ++ > repository.c | 11 +- > repository.h | 3 + > sparse-index.c | 285 +++++++++++++++++++++++ > sparse-index.h | 11 + > t/README | 3 + > t/helper/test-read-cache.c | 66 +++++- > t/perf/p2000-sparse-operations.sh | 101 ++++++++ > t/t1091-sparse-checkout-builtin.sh | 13 ++ > t/t1092-sparse-checkout-compatibility.sh | 143 ++++++++++-- > unpack-trees.c | 17 +- > 20 files changed, 988 insertions(+), 40 deletions(-) > create mode 100644 Documentation/technical/sparse-index.txt > create mode 100644 sparse-index.c > create mode 100644 sparse-index.h > create mode 100755 t/perf/p2000-sparse-operations.sh > > > base-commit: 
47957485b3b731a7860e0554d2bd12c0dce1c75a > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v5 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v5 > Pull-Request: https://github.com/gitgitgadget/git/pull/883 > > Range-diff vs v4: > > 1: 6426a5c60e53 ! 1: 7b600d536c6e sparse-index: design doc and format update > @@ Documentation/technical/sparse-index.txt (new) > +The only noticeable change in behavior will be that the serialized index > +file contains sparse-directory entries. > + > -+To start, we use a new repository extension, `extensions.sparseIndex`, to > -+allow inserting sparse-directory entries into indexes with file format > ++To start, we use a new required index extension, `sdir`, to allow > ++inserting sparse-directory entries into indexes with file format > +versions 2, 3, and 4. This prevents Git versions that do not understand > -+the sparse-index from operating on one, but it also prevents other > -+operations that do not use the index at all. A new format, index v5, will > -+be introduced that includes sparse-directory entries by default. It might > -+also introduce other features that have been considered for improving the > ++the sparse-index from operating on one, while allowing tools that do not > ++understand the sparse-index to operate on repositories as long as they do > ++not interact with the index. A new format, index v5, will be introduced > ++that includes sparse-directory entries by default. It might also > ++introduce other features that have been considered for improving the > +index, as well. 
> + > +Next, consumers of the index will be guarded against operating on a > 2: 7eabc1d0586c = 2: 202253ec82f3 t/perf: add performance test for sparse operations > 3: c9e21d78ecba = 3: 437a0f144e57 t1092: clean up script quoting > 4: 03cdde756563 = 4: b7e1bf5c55a7 sparse-index: add guard to ensure full index > 5: 6b3b6d86385d = 5: e41d55d2cca9 sparse-index: implement ensure_full_index() > 6: 7f67adba0498 = 6: 7bfbfbd17321 t1092: compare sparse-checkout to sparse-index > 7: 7ebd9570b1ad = 7: a1b8135c0fc8 test-read-cache: print cache entries with --table > 8: db7bbd06dbcc = 8: dd84a2a9121b test-tool: don't force full index > 9: 3ddd5e794b5e = 9: b276d2ed5323 unpack-trees: ensure full index > 10: 7308c87697f1 = 10: c3651e26dc3a sparse-checkout: hold pattern list in index > -: ------------ > 11: f926cf8b2e01 sparse-index: add 'sdir' index extension > 11: 7c10d653ca6b = 12: c870ae5e8749 sparse-index: convert from full to sparse > 12: 6db36f33e960 = 13: bcf0da959ef3 submodule: sparse-index should not collapse links > 13: d24bd3348d98 = 14: 7191b48237de unpack-trees: allow sparse directories > 14: 08d9f5f3c0d1 = 15: 57be9b4a728b sparse-index: check index conversion happens > 15: 6f38cef196b0 ! 16: c22b4111e49e sparse-index: create extension for compatibility > @@ Metadata > Author: Derrick Stolee <dstolee@microsoft.com> > > ## Commit message ## > - sparse-index: create extension for compatibility > + sparse-index: add index.sparse config option > > - Previously, we enabled the sparse index format only using > - GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to > - actually select this mode. Further, sparse directory entries are not > - understood by the index formats as advertised. > - > - We _could_ add a new index version that explicitly adds these > - capabilities, but there are nuances to index formats 2, 3, and 4 that > - are still valuable to select as options. 
Until we add index format > - version 5, create a repo extension, "extensions.sparseIndex", that > - specifies that the tool reading this repository must understand sparse > - directory entries. > - > - This change only encodes the extension and enables it when > - GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI > - mechanism. > + When enabled, this config option signals that index writes should > + attempt to use sparse-directory entries. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > > - ## Documentation/config/extensions.txt ## > -@@ Documentation/config/extensions.txt: extensions.objectFormat:: > - Note that this setting should only be set by linkgit:git-init[1] or > - linkgit:git-clone[1]. Trying to change it after initialization will not > - work and will produce hard-to-diagnose issues. > + ## Documentation/config/index.txt ## > +@@ Documentation/config/index.txt: index.recordOffsetTable:: > + Defaults to 'true' if index.threads has been explicitly enabled, > + 'false' otherwise. > + > ++index.sparse:: > ++ When enabled, write the index using sparse-directory entries. This > ++ has no effect unless `core.sparseCheckout` and > ++ `core.sparseCheckoutCone` are both enabled. Defaults to 'false'. > + > -+extensions.sparseIndex:: > -+ When combined with `core.sparseCheckout=true` and > -+ `core.sparseCheckoutCone=true`, the index may contain entries > -+ corresponding to directories outside of the sparse-checkout > -+ definition in lieu of containing each path under such directories. > -+ Versions of Git that do not understand this extension do not > -+ expect directory entries in the index. > + index.threads:: > + Specifies the number of threads to spawn when loading the index. > + This is meant to reduce index load time on multiprocessor machines. > > ## cache.h ## > @@ cache.h: struct repository_format { > @@ repo-settings.c: void prepare_repo_settings(struct repository *r) > + * Initialize this as off. 
> + */ > + r->settings.sparse_index = 0; > -+ if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value) > ++ if (!repo_config_get_bool(r, "index.sparse", &value) && value) > + r->settings.sparse_index = 1; > } > > @@ repository.h: struct repo_settings { > > struct repository { > > - ## setup.c ## > -@@ setup.c: static enum extension_result handle_extension(const char *var, > - return error("invalid value for 'extensions.objectformat'"); > - data->hash_algo = format; > - return EXTENSION_OK; > -+ } else if (!strcmp(ext, "sparseindex")) { > -+ data->sparse_index = 1; > -+ return EXTENSION_OK; > - } > - return EXTENSION_UNKNOWN; > - } > - > ## sparse-index.c ## > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > return num_converted - start_converted; > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > +{ > + const char *config_path = repo_git_path(repo, "config.worktree"); > + > -+ if (upgrade_repository_format(1) < 0) { > -+ warning(_("unable to upgrade repository format to enable sparse-index")); > -+ return -1; > -+ } > + git_config_set_in_file_gently(config_path, > -+ "extensions.sparseIndex", > ++ "index.sparse", > + "true"); > + > + prepare_repo_settings(repo); > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > + > + /* > + * The GIT_TEST_SPARSE_INDEX environment variable triggers the > -+ * extensions.sparseIndex config variable to be on. > ++ * index.sparse config variable to be on. > + */ > + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { > + int err = enable_sparse_index(istate->repo); > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > - * GIT_TEST_SPARSE_INDEX environment variable. We will relax > - * this once we have a proper way to opt-in (and later still, > - * opt-out). > -+ * Only convert to sparse if extensions.sparseIndex is set. > ++ * Only convert to sparse if index.sparse is set. 
> */ > - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) > + prepare_repo_settings(istate->repo); > 16: 923081e7e079 ! 17: 75fe9b0f57da sparse-checkout: toggle sparse index from builtin > @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees > +that is not completely understood by external tools. If you have trouble > +with this compatibility, then run `git sparse-checkout init --no-sparse-index` > +to rewrite your index to not be sparse. Older versions of Git will not > -+understand the `sparseIndex` repository extension and may fail to interact > -+with your repository until it is disabled. > ++understand the sparse directory entries index extension and may fail to > ++interact with your repository until it is disabled. > > 'set':: > Write a set of patterns to the sparse-checkout file, as given as > @@ builtin/sparse-checkout.c: static int sparse_checkout_init(int argc, const char > > ## sparse-index.c ## > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > + return num_converted - start_converted; > + } > > - static int enable_sparse_index(struct repository *repo) > +-static int enable_sparse_index(struct repository *repo) > ++static int set_index_sparse_config(struct repository *repo, int enable) > { > - const char *config_path = repo_git_path(repo, "config.worktree"); > -+ int res; > - > - if (upgrade_repository_format(1) < 0) { > - warning(_("unable to upgrade repository format to enable sparse-index")); > - return -1; > - } > +- > - git_config_set_in_file_gently(config_path, > -- "extensions.sparseIndex", > +- "index.sparse", > - "true"); > -+ res = git_config_set_gently("extensions.sparseindex", "true"); > ++ int res; > ++ char *config_path = repo_git_path(repo, "config.worktree"); > ++ res = git_config_set_in_file_gently(config_path, > ++ "index.sparse", > ++ enable ? 
"true" : NULL); > ++ free(config_path); > > prepare_repo_settings(repo); > repo->settings.sparse_index = 1; > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > + > +int set_sparse_index_config(struct repository *repo, int enable) > +{ > -+ int res; > -+ > -+ if (enable) > -+ return enable_sparse_index(repo); > -+ > -+ /* Don't downgrade repository format, just remove the extension. */ > -+ res = git_config_set_gently("extensions.sparseindex", NULL); > ++ int res = set_index_sparse_config(repo, enable); > + > + prepare_repo_settings(repo); > -+ repo->settings.sparse_index = 0; > ++ repo->settings.sparse_index = enable; > + return res; > } > > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > !core_apply_sparse_checkout || !core_sparse_checkout_cone) > return 0; > @@ sparse-index.c: int convert_to_sparse(struct index_state *istate) > - istate->repo = the_repository; > - > - /* > -- * The GIT_TEST_SPARSE_INDEX environment variable triggers the > -- * extensions.sparseIndex config variable to be on. > -+ * If GIT_TEST_SPARSE_INDEX=1, then trigger extensions.sparseIndex > -+ * to be fully enabled. If GIT_TEST_SPARSE_INDEX=0 (set explicitly), > -+ * then purposefully disable the setting. > + * The GIT_TEST_SPARSE_INDEX environment variable triggers the > + * index.sparse config variable to be on. > */ > - if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { > - int err = enable_sparse_index(istate->repo); > @@ sparse-index.c: int convert_to_sparse(struct index_state *istate) > + set_sparse_index_config(istate->repo, test_env); > > /* > - * Only convert to sparse if extensions.sparseIndex is set. > + * Only convert to sparse if index.sparse is set. 
>
> ## sparse-index.h ##
> @@ sparse-index.h: struct index_state;
> @@ t/t1092-sparse-checkout-compatibility.sh: init_repos () {
> - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone &&
> - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep
> + git -C sparse-index sparse-checkout init --cone --sparse-index &&
> -+ test_cmp_config -C sparse-index true extensions.sparseindex &&
> ++ test_cmp_config -C sparse-index true index.sparse &&
> + git -C sparse-index sparse-checkout set deep
> }
>
> 17: 6f1ad72c390d ! 18: 7f55a232e647 sparse-checkout: disable sparse-index
> @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'sparse-checkout disable
>
> +test_expect_success 'sparse-index enabled and disabled' '
> + git -C repo sparse-checkout init --cone --sparse-index &&
> -+ test_cmp_config -C repo true extensions.sparseIndex &&
> ++ test_cmp_config -C repo true index.sparse &&
> + test-tool -C repo read-cache --table >cache &&
> + grep " tree " cache &&
> +
> @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'sparse-checkout disable
> + test-tool -C repo read-cache --table >cache &&
> + ! grep " tree " cache &&
> + git -C repo config --list >config &&
> -+ ! grep extensions.sparseindex config
> ++ ! grep index.sparse config
> +'
> +
> test_expect_success 'cone mode: init and set' '
> 18: bd94e6b7d089 = 19: 365901809d9d cache-tree: integrate with sparse directory entries
> 19: e7190376b806 = 20: 9b068c458898 sparse-index: loose integration with cache_tree_verify()
> 20: bcf0a58eb38c = 21: 66602733cc95 p2000: add sparse-index repos

I've read through the range-diff and individually read through the new
patch 11. Perhaps unsurprisingly, since you had addressed all my feedback
by about round 3, I didn't find any problems with this new version.

Looks good to me.
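The toggle in the quoted `sparse-index.c` hunks comes down to one rule: enabling writes `index.sparse = true` to the worktree config, disabling passes a NULL value so the key is removed, and the in-memory `sparse_index` setting is updated to match `enable`. Below is a minimal, hypothetical C sketch of just that rule. It assumes a toy in-memory key/value store in place of `config.worktree`, and `toy_config`, `toy_config_set()`, `toy_config_get()` and `toy_set_sparse_index_config()` are illustrative names, not Git APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the index.sparse toggle; none of these names
 * are Git APIs. A small fixed-size key/value store stands in for the
 * worktree config file.
 */
#define TOY_MAX_KEYS 8

struct toy_config {
	const char *keys[TOY_MAX_KEYS];
	const char *values[TOY_MAX_KEYS];
	int nr;
};

struct toy_settings {
	int sparse_index; /* in-memory copy of the setting */
};

/*
 * Set key to value. A NULL value removes the key instead, mirroring the
 * NULL passed to git_config_set_in_file_gently() in the diff above.
 * Unsetting a missing key is treated as success here; Git's real helper
 * may report that case differently. Returns 0 on success, -1 when full.
 */
static int toy_config_set(struct toy_config *cfg, const char *key,
			  const char *value)
{
	assert(key != NULL);
	for (int i = 0; i < cfg->nr; i++) {
		if (strcmp(cfg->keys[i], key))
			continue;
		if (value) {
			cfg->values[i] = value;
		} else {
			/* Drop entry i by moving the last entry into its slot. */
			cfg->nr--;
			cfg->keys[i] = cfg->keys[cfg->nr];
			cfg->values[i] = cfg->values[cfg->nr];
		}
		return 0;
	}
	if (!value)
		return 0;
	if (cfg->nr == TOY_MAX_KEYS)
		return -1;
	cfg->keys[cfg->nr] = key;
	cfg->values[cfg->nr] = value;
	cfg->nr++;
	return 0;
}

/* Return the value stored for key, or NULL if it is unset. */
static const char *toy_config_get(const struct toy_config *cfg,
				  const char *key)
{
	for (int i = 0; i < cfg->nr; i++)
		if (!strcmp(cfg->keys[i], key))
			return cfg->values[i];
	return NULL;
}

/*
 * Loosely follows set_sparse_index_config(): write or remove
 * index.sparse, then make the in-memory setting follow "enable".
 */
static int toy_set_sparse_index_config(struct toy_config *worktree_cfg,
				       struct toy_settings *settings,
				       int enable)
{
	int res = toy_config_set(worktree_cfg, "index.sparse",
				 enable ? "true" : NULL);
	settings->sparse_index = enable;
	return res;
}
```

Enabling and then disabling through the sketch leaves the store without an `index.sparse` key and `sparse_index` at 0. That is roughly the end state the quoted t1091 test asserts with `! grep index.sparse config`.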
Thread overview: 203+ messages
2021-02-23 20:14 [PATCH 00/20] Sparse Index: Design, Format, Tests Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget 2021-02-24 1:13 ` Elijah Newren 2021-02-25 15:29 ` Derrick Stolee 2021-02-25 20:14 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 02/20] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget 2021-02-24 2:30 ` Elijah Newren 2021-03-09 20:03 ` Derrick Stolee 2021-02-23 20:14 ` [PATCH 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 04/20] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget 2021-02-24 2:44 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget 2021-02-24 3:20 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget 2021-02-25 6:37 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget 2021-02-25 7:02 ` Elijah Newren 2021-03-09 21:00 ` Derrick Stolee 2021-02-23 20:14 ` [PATCH 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget 2021-02-25 7:14 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 11/20] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget 2021-02-25 7:33 ` Elijah Newren 2021-03-09 21:13 ` Derrick Stolee 2021-02-23 20:14 ` [PATCH 12/20] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget
2021-02-23 20:14 ` [PATCH 13/20] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget 2021-02-25 7:40 ` Elijah Newren 2021-03-09 21:35 ` Derrick Stolee 2021-03-09 21:39 ` Elijah Newren 2021-02-23 20:14 ` [PATCH 14/20] sparse-index: check index conversion happens Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 15/20] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget 2021-02-25 7:45 ` Elijah Newren 2021-03-09 21:45 ` Derrick Stolee 2021-02-23 20:14 ` [PATCH 16/20] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget 2021-02-24 19:11 ` Martin Ågren 2021-03-09 20:52 ` Derrick Stolee 2021-03-09 21:03 ` Elijah Newren 2021-03-09 21:10 ` Derrick Stolee 2021-03-09 21:38 ` Elijah Newren 2021-03-14 20:08 ` Martin Ågren 2021-03-15 13:36 ` Derrick Stolee 2021-02-23 20:14 ` [PATCH 17/20] sparse-checkout: disable sparse-index Derrick Stolee via GitGitGadget 2021-02-27 12:32 ` SZEDER Gábor 2021-03-09 20:20 ` Derrick Stolee 2021-03-10 18:20 ` Derrick Stolee 2021-02-23 20:14 ` [PATCH 18/20] cache-tree: integrate with sparse directory entries Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 19/20] sparse-index: loose integration with cache_tree_verify() Derrick Stolee via GitGitGadget 2021-02-23 20:14 ` [PATCH 20/20] p2000: add sparse-index repos Derrick Stolee via GitGitGadget 2021-02-23 23:49 ` [PATCH 00/20] Sparse Index: Design, Format, Tests Elijah Newren 2021-02-26 21:28 ` Elijah Newren 2021-03-10 19:30 ` [PATCH v2 " Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget 2021-03-10 22:19 ` Elijah Newren 2021-03-10 19:30 ` [PATCH v2 02/20] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 04/20] sparse-index: add guard to ensure full index 
Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget 2021-03-12 6:50 ` Junio C Hamano 2021-03-12 13:56 ` Derrick Stolee 2021-03-12 20:08 ` Junio C Hamano 2021-03-12 20:11 ` Derrick Stolee 2021-03-15 23:52 ` Ævar Arnfjörð Bjarmason 2021-03-10 19:30 ` [PATCH v2 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget 2021-03-10 23:04 ` Elijah Newren 2021-03-11 14:17 ` Derrick Stolee 2021-03-10 19:30 ` [PATCH v2 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 11/20] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget 2021-03-10 23:44 ` Elijah Newren 2021-03-11 14:13 ` Derrick Stolee 2021-03-10 19:30 ` [PATCH v2 12/20] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 13/20] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 14/20] sparse-index: check index conversion happens Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 15/20] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget 2021-03-10 19:30 ` [PATCH v2 16/20] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget 2021-03-10 19:31 ` [PATCH v2 17/20] sparse-checkout: disable sparse-index Derrick Stolee via GitGitGadget 2021-03-10 19:31 ` [PATCH v2 18/20] cache-tree: integrate with sparse directory entries Derrick Stolee via GitGitGadget 2021-03-10 19:31 ` [PATCH v2 19/20] sparse-index: loose integration with cache_tree_verify() 
Derrick Stolee via GitGitGadget 2021-03-10 19:31 ` [PATCH v2 20/20] p2000: add sparse-index repos Derrick Stolee via GitGitGadget 2021-03-11 0:07 ` [PATCH v2 00/20] Sparse Index: Design, Format, Tests Elijah Newren 2021-03-16 16:42 ` [PATCH v3 " Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget 2021-03-19 23:43 ` Junio C Hamano 2021-03-23 11:16 ` Derrick Stolee 2021-03-23 20:10 ` Junio C Hamano 2021-03-23 20:42 ` Derrick Stolee 2021-03-16 16:42 ` [PATCH v3 02/20] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget 2021-03-17 8:41 ` Ævar Arnfjörð Bjarmason 2021-03-17 13:05 ` Derrick Stolee 2021-03-17 13:21 ` Ævar Arnfjörð Bjarmason 2021-03-17 18:02 ` Derrick Stolee 2021-03-16 16:42 ` [PATCH v3 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget 2021-03-17 8:47 ` Ævar Arnfjörð Bjarmason 2021-03-16 16:42 ` [PATCH v3 04/20] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget 2021-03-17 13:03 ` Ævar Arnfjörð Bjarmason 2021-03-16 16:42 ` [PATCH v3 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget 2021-03-17 13:28 ` [RFC/PATCH 0/5] " Ævar Arnfjörð Bjarmason 2021-03-17 18:28 ` Elijah Newren 2021-03-17 19:46 ` Derrick Stolee 2021-03-17 20:26 ` Elijah Newren 2021-03-17 20:34 ` Derrick Stolee 2021-03-17 13:28 ` [RFC/PATCH 1/5] ls-files: defer read_index() after parse_options() etc Ævar Arnfjörð Bjarmason 2021-03-17 13:28 ` [RFC/PATCH 2/5] ls-files: make "mode" in show_ce() loop a variable Ævar Arnfjörð Bjarmason 2021-03-17 18:11 ` Elijah Newren 2021-03-24 0:46 ` Ævar Arnfjörð Bjarmason 2021-03-17 13:28 ` [RFC/PATCH 3/5] ls-files: add and use a new 
--sparse option Ævar Arnfjörð Bjarmason 2021-03-17 18:19 ` Elijah Newren 2021-03-17 18:27 ` Ævar Arnfjörð Bjarmason 2021-03-17 18:44 ` Elijah Newren 2021-03-17 20:43 ` Derrick Stolee 2021-03-24 0:52 ` Ævar Arnfjörð Bjarmason 2021-03-17 13:28 ` [RFC/PATCH 4/5] test-tool read-cache: --table is redundant to ls-files Ævar Arnfjörð Bjarmason 2021-03-17 13:28 ` [RFC/PATCH 5/5] test-tool: split up test-tool read-cache Ævar Arnfjörð Bjarmason 2021-03-17 13:32 ` Ævar Arnfjörð Bjarmason 2021-03-16 16:42 ` [PATCH v3 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 11/20] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget 2021-03-17 13:43 ` Ævar Arnfjörð Bjarmason 2021-03-17 19:55 ` Derrick Stolee 2021-03-18 13:38 ` Derrick Stolee 2021-03-16 16:42 ` [PATCH v3 12/20] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 13/20] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget 2021-03-17 13:35 ` Ævar Arnfjörð Bjarmason 2021-03-16 16:42 ` [PATCH v3 14/20] sparse-index: check index conversion happens Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 15/20] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget 2021-03-16 16:42 ` [PATCH v3 16/20] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget 2021-03-16 16:43 ` [PATCH v3 17/20] sparse-checkout: disable sparse-index Derrick Stolee via GitGitGadget 2021-03-16 16:43 ` [PATCH v3 18/20] cache-tree: integrate with sparse directory entries Derrick Stolee via GitGitGadget 2021-03-16 16:43 ` [PATCH v3 19/20] sparse-index: loose integration with cache_tree_verify() Derrick Stolee via GitGitGadget 2021-03-16 16:43 ` 
[PATCH v3 20/20] p2000: add sparse-index repos Derrick Stolee via GitGitGadget 2021-03-16 16:59 ` [PATCH v3 00/20] Sparse Index: Design, Format, Tests Derrick Stolee 2021-03-16 21:18 ` Elijah Newren 2021-03-18 21:50 ` Junio C Hamano 2021-03-19 13:00 ` Derrick Stolee 2021-03-23 13:44 ` [PATCH v4 " Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 01/20] sparse-index: design doc and format update Derrick Stolee via GitGitGadget 2021-03-26 20:29 ` SZEDER Gábor 2021-03-28 1:47 ` Junio C Hamano 2021-03-29 14:32 ` Derrick Stolee 2021-03-23 13:44 ` [PATCH v4 02/20] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 03/20] t1092: clean up script quoting Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 04/20] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 05/20] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 06/20] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 07/20] test-read-cache: print cache entries with --table Derrick Stolee via GitGitGadget 2021-03-24 1:24 ` Ævar Arnfjörð Bjarmason 2021-03-24 12:33 ` Derrick Stolee 2021-03-25 3:41 ` Ævar Arnfjörð Bjarmason 2021-03-26 0:12 ` Elijah Newren 2021-03-28 15:31 ` Ævar Arnfjörð Bjarmason 2021-03-29 19:46 ` Derrick Stolee 2021-03-29 21:44 ` Junio C Hamano 2021-03-30 11:28 ` Derrick Stolee 2021-03-29 23:06 ` Ævar Arnfjörð Bjarmason 2021-03-30 11:41 ` Derrick Stolee 2021-03-29 22:02 ` Elijah Newren 2021-03-23 13:44 ` [PATCH v4 08/20] test-tool: don't force full index Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 09/20] unpack-trees: ensure " Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 10/20] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 11/20] sparse-index: convert from full to 
sparse Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 12/20] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 13/20] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 14/20] sparse-index: check index conversion happens Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 15/20] sparse-index: create extension for compatibility Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 16/20] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 17/20] sparse-checkout: disable sparse-index Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 18/20] cache-tree: integrate with sparse directory entries Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 19/20] sparse-index: loose integration with cache_tree_verify() Derrick Stolee via GitGitGadget 2021-03-23 13:44 ` [PATCH v4 20/20] p2000: add sparse-index repos Derrick Stolee via GitGitGadget 2021-03-23 16:16 ` [PATCH v4 00/20] Sparse Index: Design, Format, Tests Elijah Newren 2021-03-30 13:10 ` [PATCH v5 00/21] " Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 01/21] sparse-index: design doc and format update Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 02/21] t/perf: add performance test for sparse operations Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 03/21] t1092: clean up script quoting Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 04/21] sparse-index: add guard to ensure full index Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 05/21] sparse-index: implement ensure_full_index() Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 06/21] t1092: compare sparse-checkout to sparse-index Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 07/21] test-read-cache: print cache entries with --table Derrick Stolee via 
GitGitGadget 2021-03-30 13:10 ` [PATCH v5 08/21] test-tool: don't force full index Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 09/21] unpack-trees: ensure " Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 10/21] sparse-checkout: hold pattern list in index Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 11/21] sparse-index: add 'sdir' index extension Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 12/21] sparse-index: convert from full to sparse Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 13/21] submodule: sparse-index should not collapse links Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 14/21] unpack-trees: allow sparse directories Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 15/21] sparse-index: check index conversion happens Derrick Stolee via GitGitGadget 2021-03-30 13:10 ` [PATCH v5 16/21] sparse-index: add index.sparse config option Derrick Stolee via GitGitGadget 2021-03-30 13:11 ` [PATCH v5 17/21] sparse-checkout: toggle sparse index from builtin Derrick Stolee via GitGitGadget 2021-03-30 13:11 ` [PATCH v5 18/21] sparse-checkout: disable sparse-index Derrick Stolee via GitGitGadget 2021-03-30 13:11 ` [PATCH v5 19/21] cache-tree: integrate with sparse directory entries Derrick Stolee via GitGitGadget 2021-03-30 13:11 ` [PATCH v5 20/21] sparse-index: loose integration with cache_tree_verify() Derrick Stolee via GitGitGadget 2021-03-30 13:11 ` [PATCH v5 21/21] p2000: add sparse-index repos Derrick Stolee via GitGitGadget 2021-03-30 20:11 ` [PATCH v5 00/21] Sparse Index: Design, Format, Tests Junio C Hamano 2021-03-30 21:31 ` Derrick Stolee 2021-03-30 21:49 ` Junio C Hamano 2021-04-01 5:59 ` Elijah Newren 2021-04-01 4:38 ` Elijah Newren