* [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive()
@ 2020-01-29 22:03 Elijah Newren via GitGitGadget
  2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 76+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren

This patch series builds on en/fill-directory-fixes-more. This series
should be considered an RFC because of the untracked-cache changes (see
the last two commits), for which I'm hoping to get an untracked-cache
expert to comment. This series does provide some modest speedups (see
second to last commit message), and should allow 'git status --ignored'
to complete in a more reasonable timeframe for Martin Melka (see
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
)

Elijah Newren (6):
  dir: consolidate treat_path() and treat_one_path()
  dir: fix broken comment
  dir: fix confusion based on variable tense
  dir: move setting of nested_repo next to its actual usage
  dir: replace exponential algorithm with a linear one
  t7063: blindly accept diffs

 dir.c                             | 295 +++++++++++++++++-------------
 t/t7063-status-untracked-cache.sh |  50 ++---
 2 files changed, 191 insertions(+), 154 deletions(-)

base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v1
Pull-Request: https://github.com/git/git/pull/700
--
gitgitgadget
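For readers who want an intuition for why repeated recursive calls on the
same path matter, here is a toy model in C. It is not taken from dir.c: the
function names are hypothetical, and it assumes, purely for illustration,
a single chain of nested directories in which each level recurses into its
child twice. Under that assumption the number of visits roughly doubles
with every level of nesting, whereas visiting each child once keeps the
count linear in the depth.

```c
#include <assert.h>

/*
 * Toy model (hypothetical, not git code): a chain of nested directories
 * that is `depth` levels deep below the starting directory.
 */

/* Visits made when every directory recurses into its child twice. */
unsigned long long visits_with_repeat(unsigned depth)
{
	/* Guard against overflowing the 64-bit counter. */
	assert(depth < 64);
	if (depth == 0)
		return 1;
	/* One visit for this directory plus two full walks of the child. */
	return 1 + 2 * visits_with_repeat(depth - 1);
}

/* Visits made when every directory recurses into its child once. */
unsigned long long visits_once(unsigned depth)
{
	if (depth == 0)
		return 1;
	/* One visit for this directory plus a single walk of the child. */
	return 1 + visits_once(depth - 1);
}
```

With ten levels of nesting, the repeated variant makes 2047 visits while the
single-visit variant makes 11. How many repeated calls the real code made
per directory is described in the series itself, not by this sketch.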
* [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() 2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget @ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget 2020-01-29 22:03 ` [PATCH 2/6] dir: fix broken comment Elijah Newren via GitGitGadget ` (5 subsequent siblings) 6 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Commit 16e2cfa90993 ("read_directory(): further split treat_path()", 2010-01-08) split treat_one_path() out of treat_path(), because treat_leading_path() would not have access to a dirent but wanted to re-use as much of treat_path() as possible. Not re-using all of treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir: fix checks on common prefix directory", 2019-12-19). Finally, in commit ad6f2157f951 ("dir: restructure in a way to avoid passing around a struct dirent", 2020-01-16), dirents were removed from treat_path() and other functions entirely. Since the only reason for splitting these functions was the lack of a dirent -- which no longer applies to either function -- and since the split caused problems in the past resulting in us not using treat_one_path() separately anymore, just undo the split. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 121 ++++++++++++++++++++++++++-------------------------------- 1 file changed, 55 insertions(+), 66 deletions(-) diff --git a/dir.c b/dir.c index b460211e61..68c56aeddb 100644 --- a/dir.c +++ b/dir.c @@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate, return dtype; } -static enum path_treatment treat_one_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec, - int dtype) -{ - int exclude; - int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case); +static enum path_treatment treat_path_fast(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + strbuf_setlen(path, baselen); + if (!cdir->ucd) { + strbuf_addstr(path, cdir->file); + return path_untracked; + } + strbuf_addstr(path, cdir->ucd->name); + /* treat_one_path() does this before it calls treat_directory() */ + strbuf_complete(path, '/'); + if (cdir->ucd->check_only) + /* + * check_only is set as a result of treat_directory() getting + * to its bottom. Verify again the same set of directories + * with check_only set. + */ + return read_directory_recursive(dir, istate, path->buf, path->len, + cdir->ucd, 1, 0, pathspec); + /* + * We get path_recurse in the first run when + * directory_exists_in_index() returns index_nonexistent. We + * are sure that new changes in the index does not impact the + * outcome. Return now. 
+ */ + return path_recurse; +} + +static enum path_treatment treat_path(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + int has_path_in_index, dtype, exclude; enum path_treatment path_treatment; - dtype = resolve_dtype(dtype, istate, path->buf, path->len); + if (!cdir->d_name) + return treat_path_fast(dir, untracked, cdir, istate, path, + baselen, pathspec); + if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) + return path_none; + strbuf_setlen(path, baselen); + strbuf_addstr(path, cdir->d_name); + if (simplify_away(path->buf, path->len, pathspec)) + return path_none; + + dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len); /* Always exclude indexed files */ + has_path_in_index = !!index_file_exists(istate, path->buf, path->len, + ignore_case); if (dtype != DT_DIR && has_path_in_index) return path_none; @@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, } } -static enum path_treatment treat_path_fast(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - strbuf_setlen(path, baselen); - if (!cdir->ucd) { - strbuf_addstr(path, cdir->file); - return path_untracked; - } - strbuf_addstr(path, cdir->ucd->name); - /* treat_one_path() does this before it calls treat_directory() */ - strbuf_complete(path, '/'); - if (cdir->ucd->check_only) - /* - * check_only is set as a result of treat_directory() getting - * to its bottom. Verify again the same set of directories - * with check_only set. - */ - return read_directory_recursive(dir, istate, path->buf, path->len, - cdir->ucd, 1, 0, pathspec); - /* - * We get path_recurse in the first run when - * directory_exists_in_index() returns index_nonexistent. 
We - * are sure that new changes in the index does not impact the - * outcome. Return now. - */ - return path_recurse; -} - -static enum path_treatment treat_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - if (!cdir->d_name) - return treat_path_fast(dir, untracked, cdir, istate, path, - baselen, pathspec); - if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) - return path_none; - strbuf_setlen(path, baselen); - strbuf_addstr(path, cdir->d_name); - if (simplify_away(path->buf, path->len, pathspec)) - return path_none; - - return treat_one_path(dir, untracked, istate, path, baselen, pathspec, - cdir->d_type); -} - static void add_untracked(struct untracked_cache_dir *dir, const char *name) { if (!dir) -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH 2/6] dir: fix broken comment
  2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
@ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget
  2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 76+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 68c56aeddb..c358158f55 100644
--- a/dir.c
+++ b/dir.c
@@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 				add_untracked(untracked, path.buf + baselen);
 				break;
 			}
-			/* skip the dir_add_* part */
+			/* skip the add_path_to_appropriate_result_list() */
 			continue;
 		}
 
--
gitgitgadget
* [PATCH 3/6] dir: fix confusion based on variable tense 2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget 2020-01-29 22:03 ` [PATCH 2/6] dir: fix broken comment Elijah Newren via GitGitGadget @ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget 2020-01-30 15:20 ` Derrick Stolee 2020-01-31 18:04 ` SZEDER Gábor 2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget ` (3 subsequent siblings) 6 siblings, 2 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Despite having contributed several fixes in this area, I have for months (years?) assumed that the "exclude" variable was a directive; this caused me to think of it as a different mode we operate in and left me confused as I tried to build up a mental model around why we'd need such a directive. I mostly tried to ignore it while focusing on the pieces I was trying to understand. Then I finally traced this variable all back to a call to is_excluded(), meaning it was actually functioning as an adjective. In particular, it was a checked property ("Does this path match a rule in .gitignore?"), rather than a mode passed in from the caller. Change the variable name to match the part of speech used by the function called to define it, which will hopefully make these bits of code slightly clearer to the next reader. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/dir.c b/dir.c index c358158f55..225f0bc082 100644 --- a/dir.c +++ b/dir.c @@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, static enum path_treatment treat_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked, - const char *dirname, int len, int baselen, int exclude, + const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { int nested_repo = 0; @@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } if (nested_repo) return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (exclude ? path_excluded : path_untracked)); + (excluded ? path_excluded : path_untracked)); if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) break; - if (exclude && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { + if (excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { /* * This is an excluded directory and we are @@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); @@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * the directory contains any files. 
*/ return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, exclude, pathspec); + untracked, 1, excluded, pathspec); } /* @@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { - int has_path_in_index, dtype, exclude; + int has_path_in_index, dtype, excluded; enum path_treatment path_treatment; if (!cdir->d_name) @@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent)) return path_none; - exclude = is_excluded(dir, istate, path->buf, &dtype); + excluded = is_excluded(dir, istate, path->buf, &dtype); /* * Excluded? If we don't explicitly want to show * ignored files, ignore it */ - if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) + if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) return path_excluded; switch (dtype) { @@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, strbuf_addch(path, '/'); path_treatment = treat_directory(dir, istate, untracked, path->buf, path->len, - baselen, exclude, pathspec); + baselen, excluded, pathspec); /* * If 1) we only want to return directories that * match an exclude pattern and 2) this directory does @@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, * recurse into this directory (instead of marking the * directory itself as an ignored path). */ - if (!exclude && + if (!excluded && path_treatment == path_excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) @@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, return path_treatment; case DT_REG: case DT_LNK: - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; } } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* Re: [PATCH 3/6] dir: fix confusion based on variable tense
  2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
@ 2020-01-30 15:20   ` Derrick Stolee
  2020-01-31 18:04   ` SZEDER Gábor
  1 sibling, 0 replies; 76+ messages in thread
From: Derrick Stolee @ 2020-01-30 15:20 UTC
To: Elijah Newren via GitGitGadget, git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren

On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Despite having contributed several fixes in this area, I have for months
> (years?) assumed that the "exclude" variable was a directive; this
> caused me to think of it as a different mode we operate in and left me
> confused as I tried to build up a mental model around why we'd need such
> a directive. I mostly tried to ignore it while focusing on the pieces I
> was trying to understand.
>
> Then I finally traced this variable all back to a call to is_excluded(),
> meaning it was actually functioning as an adjective. In particular, it
> was a checked property ("Does this path match a rule in .gitignore?"),
> rather than a mode passed in from the caller. Change the variable name
> to match the part of speech used by the function called to define it,
> which will hopefully make these bits of code slightly clearer to the
> next reader.

I agree that some of the terminology in the .gitignore is confusing,
especially when the terminology was used in the opposite sense for
the sparse-checkout feature. I think this rename is worth the noise.

For reference, here are some commits from ds/include-exclude that
performed similar refactors:

468ce99b77 unpack-trees: rename 'is_excluded_from_list()'
65edd96aec treewide: rename 'exclude' methods to 'pattern'
4ff89ee52c treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'
caa3d55444 treewide: rename 'struct exclude_list' to 'struct pattern_list'
ab8db61390 treewide: rename 'struct exclude' to 'struct path_pattern'

Thanks,
-Stolee
* Re: [PATCH 3/6] dir: fix confusion based on variable tense
  2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
  2020-01-30 15:20   ` Derrick Stolee
@ 2020-01-31 18:04   ` SZEDER Gábor
  2020-01-31 18:17     ` Elijah Newren
  1 sibling, 1 reply; 76+ messages in thread
From: SZEDER Gábor @ 2020-01-31 18:04 UTC
To: Elijah Newren via GitGitGadget
Cc: git, Martin Melka, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren

On Wed, Jan 29, 2020 at 10:03:40PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Despite having contributed several fixes in this area, I have for months
> (years?) assumed that the "exclude" variable was a directive; this
> caused me to think of it as a different mode we operate in and left me
> confused as I tried to build up a mental model around why we'd need such
> a directive. I mostly tried to ignore it while focusing on the pieces I
> was trying to understand.
>
> Then I finally traced this variable all back to a call to is_excluded(),
> meaning it was actually functioning as an adjective. In particular, it
> was a checked property ("Does this path match a rule in .gitignore?"),
> rather than a mode passed in from the caller. Change the variable name
> to match the part of speech used by the function called to define it,
> which will hopefully make these bits of code slightly clearer to the
> next reader.

Slightly related questions: Does 'excluded' always mean ignored? Or
is it possible for a file to be excluded but for some other reason
than being ignored?

I'm never really sure, and of course it doesn't help that we have both
'.gitignore' and '.git/info/exclude' files and conditions like:

> +	if (excluded &&
> +	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
> +	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
* Re: [PATCH 3/6] dir: fix confusion based on variable tense
  2020-01-31 18:04   ` SZEDER Gábor
@ 2020-01-31 18:17     ` Elijah Newren
  0 siblings, 0 replies; 76+ messages in thread
From: Elijah Newren @ 2020-01-31 18:17 UTC
To: SZEDER Gábor
Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, Samuel Lijin, Nguyễn Thái Ngọc Duy

On Fri, Jan 31, 2020 at 10:04 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Wed, Jan 29, 2020 at 10:03:40PM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Despite having contributed several fixes in this area, I have for months
> > (years?) assumed that the "exclude" variable was a directive; this
> > caused me to think of it as a different mode we operate in and left me
> > confused as I tried to build up a mental model around why we'd need such
> > a directive. I mostly tried to ignore it while focusing on the pieces I
> > was trying to understand.
> >
> > Then I finally traced this variable all back to a call to is_excluded(),
> > meaning it was actually functioning as an adjective. In particular, it
> > was a checked property ("Does this path match a rule in .gitignore?"),
> > rather than a mode passed in from the caller. Change the variable name
> > to match the part of speech used by the function called to define it,
> > which will hopefully make these bits of code slightly clearer to the
> > next reader.
>
> Slightly related questions: Does 'excluded' always mean ignored? Or
> is it possible for a file to be excluded but for some other reason
> than being ignored?
>
> I'm never really sure, and of course it doesn't help that we have both
> '.gitignore' and '.git/info/exclude' files and conditions like:
>
> > +	if (excluded &&
> > +	    (dir->flags & DIR_SHOW_IGNORED_TOO) &&
> > +	    (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
>

Good question; no idea. You can start digging into is_excluded() and
the pattern list stored in the dir struct and try to trace it back to
see if it's just the combination of ignore rules in .gitignore and
.git/info/exclude and core.excludesFile, or if there is something else
meant here.
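To make the first of those two possibilities concrete, here is a toy
sketch of "excluded" as a checked property computed from several pattern
sources. Everything in it is an assumption for illustration: the
toy_* names, the sample patterns, and the use of POSIX fnmatch() are
stand-ins, and real gitignore matching (negation, directory-only
patterns, precedence between sources) is considerably richer than this.
It does not answer whether git's is_excluded() consults anything else.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical stand-in for one source of ignore rules. */
struct toy_pattern_list {
	const char *source;     /* label only, e.g. ".gitignore" */
	const char **patterns;  /* shell-style globs */
	size_t nr;
};

enum toy_treatment { TOY_UNTRACKED, TOY_EXCLUDED };

/* Sample data; the patterns are made up for this example. */
static const char *toy_gitignore[] = { "*.o" };
static const char *toy_info_exclude[] = { "notes.txt" };
static const char *toy_excludes_file[] = { "*.swp" };

static const struct toy_pattern_list toy_lists[] = {
	{ ".gitignore", toy_gitignore, 1 },
	{ ".git/info/exclude", toy_info_exclude, 1 },
	{ "core.excludesFile", toy_excludes_file, 1 },
};
#define TOY_NR_LISTS (sizeof(toy_lists) / sizeof(toy_lists[0]))

/* Returns 1 if any pattern from any source matches the path. */
int toy_is_excluded(const struct toy_pattern_list *lists, size_t nr_lists,
		    const char *path)
{
	size_t i, j;

	assert(path != NULL);
	for (i = 0; i < nr_lists; i++)
		for (j = 0; j < lists[i].nr; j++)
			if (fnmatch(lists[i].patterns[j], path, 0) == 0)
				return 1;
	return 0;
}

/* "excluded" is a property of the path, not a mode chosen by the caller. */
enum toy_treatment toy_classify(const struct toy_pattern_list *lists,
				size_t nr_lists, const char *path)
{
	int excluded = toy_is_excluded(lists, nr_lists, path);

	return excluded ? TOY_EXCLUDED : TOY_UNTRACKED;
}
```

Calling toy_classify(toy_lists, TOY_NR_LISTS, "main.o") yields
TOY_EXCLUDED because of the "*.o" pattern, while "main.c" matches nothing
and comes back as TOY_UNTRACKED.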
* [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
@ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget
  2020-01-30 15:33   ` Derrick Stolee
  2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 76+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 225f0bc082..ef3307718a 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
-	int nested_repo = 0;
+	int nested_repo;
 
 	/* The "len-1" is to strip the final '/' */
 	switch (directory_exists_in_index(istate, dirname, len-1)) {
@@ -1670,6 +1670,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		return path_none;
 
 	case index_nonexistent:
+		nested_repo = 0;
 		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
 		    !(dir->flags & DIR_NO_GITLINKS)) {
 			struct strbuf sb = STRBUF_INIT;
--
gitgitgadget
* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage 2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget @ 2020-01-30 15:33 ` Derrick Stolee 2020-01-30 15:45 ` Elijah Newren 0 siblings, 1 reply; 76+ messages in thread From: Derrick Stolee @ 2020-01-30 15:33 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > dir.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/dir.c b/dir.c > index 225f0bc082..ef3307718a 100644 > --- a/dir.c > +++ b/dir.c > @@ -1659,7 +1659,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > const char *dirname, int len, int baselen, int excluded, > const struct pathspec *pathspec) > { > - int nested_repo = 0; > + int nested_repo; > > /* The "len-1" is to strip the final '/' */ > switch (directory_exists_in_index(istate, dirname, len-1)) { > @@ -1670,6 +1670,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > return path_none; > > case index_nonexistent: > + nested_repo = 0; I had to look at this code in-full from en/fill-directory-fixes-more to be sure that this case was the only use of nested_repo. However, I found that this switch statement is unnecessarily complicated. By converting the switch to multiple "if" statements, I noticed that the third case actually has a "break" statement that can lead to the final "fourth case" outside the switch statement. 
Hopefully the patch below is a worthy replacement for this one: -->8-- From b5c04e6e028cb6c7f9e78fbdd2182383d928fe6d Mon Sep 17 00:00:00 2001 From: Derrick Stolee <dstolee@microsoft.com> Date: Thu, 30 Jan 2020 15:28:39 +0000 Subject: [PATCH] dir: refactor treat_directory to clarify variable scope The nested_repo variable in treat_directory() is created and initialized before a multi-case switch statement, but is only used by one case. In fact, this switch is very asymmetrical, as the first two cases are simple but the third is more complicated than the rest of the method. Extract the switch statement into a series of "if" statements. This simplifies the trivial cases, while highlighting the fact that a "break" statement in a condition of the third case actually leads to jumping to the fourth case (after the switch). This assists a reader who provides an initial scan to notice there is a second way to approach the "show_other_directories" case than simply the response from directory_exists_in_index(). 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- dir.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/dir.c b/dir.c index b460211e61..e48812efe6 100644 --- a/dir.c +++ b/dir.c @@ -1659,17 +1659,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const char *dirname, int len, int baselen, int exclude, const struct pathspec *pathspec) { - int nested_repo = 0; - /* The "len-1" is to strip the final '/' */ - switch (directory_exists_in_index(istate, dirname, len-1)) { - case index_directory: - return path_recurse; + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); - case index_gitdir: + if (status == index_directory) + return path_recurse; + if (status == index_gitdir) return path_none; - case index_nonexistent: + if (status == index_nonexistent) { + int nested_repo = 0; if ((dir->flags & DIR_SKIP_NESTED_GIT) || !(dir->flags & DIR_NO_GITLINKS)) { struct strbuf sb = STRBUF_INIT; @@ -1682,7 +1681,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, (exclude ? path_excluded : path_untracked)); if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) - break; + goto show_other_directories; if (exclude && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { @@ -1711,7 +1710,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } /* This is the "show_other_directories" case */ - +show_other_directories: if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) return exclude ? path_excluded : path_untracked; -- 2.25.0.vfs.1.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage 2020-01-30 15:33 ` Derrick Stolee @ 2020-01-30 15:45 ` Elijah Newren 2020-01-30 16:00 ` Derrick Stolee 0 siblings, 1 reply; 76+ messages in thread From: Elijah Newren @ 2020-01-30 15:45 UTC (permalink / raw) To: Derrick Stolee Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy On Thu, Jan 30, 2020 at 7:33 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote: > > From: Elijah Newren <newren@gmail.com> > > > > Signed-off-by: Elijah Newren <newren@gmail.com> > > --- > > dir.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/dir.c b/dir.c > > index 225f0bc082..ef3307718a 100644 > > --- a/dir.c > > +++ b/dir.c > > @@ -1659,7 +1659,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > > const char *dirname, int len, int baselen, int excluded, > > const struct pathspec *pathspec) > > { > > - int nested_repo = 0; > > + int nested_repo; > > > > /* The "len-1" is to strip the final '/' */ > > switch (directory_exists_in_index(istate, dirname, len-1)) { > > @@ -1670,6 +1670,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > > return path_none; > > > > case index_nonexistent: > > + nested_repo = 0; > > I had to look at this code in-full from en/fill-directory-fixes-more to > be sure that this case was the only use of nested_repo. However, I found > that this switch statement is unnecessarily complicated. By converting > the switch to multiple "if" statements, I noticed that the third case > actually has a "break" statement that can lead to the final "fourth case" > outside the switch statement. 
> > Hopefully the patch below is a worthy replacement for this one: > > -->8-- > > From b5c04e6e028cb6c7f9e78fbdd2182383d928fe6d Mon Sep 17 00:00:00 2001 > From: Derrick Stolee <dstolee@microsoft.com> > Date: Thu, 30 Jan 2020 15:28:39 +0000 > Subject: [PATCH] dir: refactor treat_directory to clarify variable scope > > The nested_repo variable in treat_directory() is created and > initialized before a multi-case switch statement, but is only > used by one case. In fact, this switch is very asymmetrical, > as the first two cases are simple but the third is more > complicated than the rest of the method. > > Extract the switch statement into a series of "if" statements. > This simplifies the trivial cases, while highlighting the fact > that a "break" statement in a condition of the third case > actually leads to jumping to the fourth case (after the switch). > This assists a reader who provides an initial scan to notice > there is a second way to approach the "show_other_directories" > case than simply the response from directory_exists_in_index(). Wait, I'm lost. Wasn't that break statement the only way to get to the "show_other_directories" block of code after the switch statement? I can't see where the second way is; am I missing something? That is, unless directory_exists_in_index() suddenly starts returning some value other than the three current possibilities. Perhaps we should throw a BUG() if we get anything other than index_directory, index_gitdir, or index_nonexistent. 
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > dir.c | 17 ++++++++--------- > 1 file changed, 8 insertions(+), 9 deletions(-) > > diff --git a/dir.c b/dir.c > index b460211e61..e48812efe6 100644 > --- a/dir.c > +++ b/dir.c > @@ -1659,17 +1659,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > const char *dirname, int len, int baselen, int exclude, > const struct pathspec *pathspec) > { > - int nested_repo = 0; > - > /* The "len-1" is to strip the final '/' */ > - switch (directory_exists_in_index(istate, dirname, len-1)) { > - case index_directory: > - return path_recurse; > + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); > > - case index_gitdir: > + if (status == index_directory) > + return path_recurse; > + if (status == index_gitdir) > return path_none; > > - case index_nonexistent: > + if (status == index_nonexistent) { > + int nested_repo = 0; > if ((dir->flags & DIR_SKIP_NESTED_GIT) || > !(dir->flags & DIR_NO_GITLINKS)) { > struct strbuf sb = STRBUF_INIT; > @@ -1682,7 +1681,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > (exclude ? path_excluded : path_untracked)); > > if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) > - break; > + goto show_other_directories; > if (exclude && > (dir->flags & DIR_SHOW_IGNORED_TOO) && > (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { > @@ -1711,7 +1710,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > } I'd say we'd want to add a BUG("Unhandled value for directory_exists_in_index: %d\n", status); right here. > > /* This is the "show_other_directories" case */ > - > +show_other_directories: > if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) > return exclude ? path_excluded : path_untracked; > > -- > 2.25.0.vfs.1.1 Otherwise, the patch looks good to me and I'll be happy to replace my patch with this one. ^ permalink raw reply [flat|nested] 76+ messages in thread
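The control flow under discussion (an if-chain over the three status
values, a goto for the "show other directories" path, and a guard for
values that should never occur) can be sketched in isolation as below.
This is a simplified stand-in rather than the code from dir.c: the toy_*
enums, toy_bug(), and the single boolean flag are hypothetical, and the
return value chosen for the plain index_nonexistent path is only a
placeholder for the logic elided here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for exist_status and path_treatment. */
enum toy_exist_status {
	toy_index_nonexistent = 0,
	toy_index_directory,
	toy_index_gitdir
};

enum toy_path_treatment {
	toy_path_none,
	toy_path_recurse,
	toy_path_untracked
};

/* Stand-in for a BUG()-style helper: report and abort, never return. */
static void toy_bug(const char *msg, int value)
{
	fprintf(stderr, "BUG: %s: %d\n", msg, value);
	abort();
}

enum toy_path_treatment toy_treat_directory(enum toy_exist_status status,
					    int show_other_directories)
{
	if (status == toy_index_directory)
		return toy_path_recurse;
	if (status == toy_index_gitdir)
		return toy_path_none;

	if (status == toy_index_nonexistent) {
		/* The only route to the labelled block below. */
		if (show_other_directories)
			goto show_other_directories;
		/* Placeholder for the remaining index_nonexistent logic. */
		return toy_path_recurse;
	}

	/* Any other value means the enum grew without this code noticing. */
	toy_bug("unhandled value for directory_exists_in_index", status);

show_other_directories:
	return toy_path_untracked;
}
```

With the guard in place, the labelled block is reachable only through the
explicit goto, which is the property the if-chain rewrite is meant to make
visible to a reader.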
* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage 2020-01-30 15:45 ` Elijah Newren @ 2020-01-30 16:00 ` Derrick Stolee 2020-01-30 16:10 ` Derrick Stolee 0 siblings, 1 reply; 76+ messages in thread From: Derrick Stolee @ 2020-01-30 16:00 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy On 1/30/2020 10:45 AM, Elijah Newren wrote: > On Thu, Jan 30, 2020 at 7:33 AM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote: >>> From: Elijah Newren <newren@gmail.com> >>> >>> Signed-off-by: Elijah Newren <newren@gmail.com> >>> --- >>> dir.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/dir.c b/dir.c >>> index 225f0bc082..ef3307718a 100644 >>> --- a/dir.c >>> +++ b/dir.c >>> @@ -1659,7 +1659,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, >>> const char *dirname, int len, int baselen, int excluded, >>> const struct pathspec *pathspec) >>> { >>> - int nested_repo = 0; >>> + int nested_repo; >>> >>> /* The "len-1" is to strip the final '/' */ >>> switch (directory_exists_in_index(istate, dirname, len-1)) { >>> @@ -1670,6 +1670,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, >>> return path_none; >>> >>> case index_nonexistent: >>> + nested_repo = 0; >> >> I had to look at this code in-full from en/fill-directory-fixes-more to >> be sure that this case was the only use of nested_repo. However, I found >> that this switch statement is unnecessarily complicated. By converting >> the switch to multiple "if" statements, I noticed that the third case >> actually has a "break" statement that can lead to the final "fourth case" >> outside the switch statement. 
>>
>> Hopefully the patch below is a worthy replacement for this one:
>>
>> -->8--
>>
>> From b5c04e6e028cb6c7f9e78fbdd2182383d928fe6d Mon Sep 17 00:00:00 2001
>> From: Derrick Stolee <dstolee@microsoft.com>
>> Date: Thu, 30 Jan 2020 15:28:39 +0000
>> Subject: [PATCH] dir: refactor treat_directory to clarify variable scope
>>
>> The nested_repo variable in treat_directory() is created and
>> initialized before a multi-case switch statement, but is only
>> used by one case. In fact, this switch is very asymmetrical,
>> as the first two cases are simple but the third is more
>> complicated than the rest of the method.
>>
>> Extract the switch statement into a series of "if" statements.
>> This simplifies the trivial cases, while highlighting the fact
>> that a "break" statement in a condition of the third case
>> actually leads to jumping to the fourth case (after the switch).
>> This assists a reader who provides an initial scan to notice
>> there is a second way to approach the "show_other_directories"
>> case than simply the response from directory_exists_in_index().
>
> Wait, I'm lost. Wasn't that break statement the only way to get to
> the "show_other_directories" block of code after the switch statement?
> I can't see where the second way is; am I missing something?

Ah, I guess I didn't realize that exist_status didn't have a fourth
mode. I was assuming that falling through the switch without hitting
any of the case statements was the way you would normally _assume_ to
reach the block after the switch.

So yes, my statement is incorrect, but the intention is correct: the
flow of this method is very confusing.

> That is, unless directory_exists_in_index() suddenly starts returning
> some value other than the three current possibilities. Perhaps we
> should throw a BUG() if we get anything other than index_directory,
> index_gitdir, or index_nonexistent.
> >> >> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> >> --- >> dir.c | 17 ++++++++--------- >> 1 file changed, 8 insertions(+), 9 deletions(-) >> >> diff --git a/dir.c b/dir.c >> index b460211e61..e48812efe6 100644 >> --- a/dir.c >> +++ b/dir.c >> @@ -1659,17 +1659,16 @@ static enum path_treatment treat_directory(struct dir_struct *dir, >> const char *dirname, int len, int baselen, int exclude, >> const struct pathspec *pathspec) >> { >> - int nested_repo = 0; >> - >> /* The "len-1" is to strip the final '/' */ >> - switch (directory_exists_in_index(istate, dirname, len-1)) { >> - case index_directory: >> - return path_recurse; >> + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); >> >> - case index_gitdir: >> + if (status == index_directory) >> + return path_recurse; >> + if (status == index_gitdir) >> return path_none; >> >> - case index_nonexistent: >> + if (status == index_nonexistent) { Since exist_status only has three options, this "if" is redundant. >> + int nested_repo = 0; >> if ((dir->flags & DIR_SKIP_NESTED_GIT) || >> !(dir->flags & DIR_NO_GITLINKS)) { >> struct strbuf sb = STRBUF_INIT; >> @@ -1682,7 +1681,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, >> (exclude ? path_excluded : path_untracked)); >> >> if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) >> - break; >> + goto show_other_directories; It would be better to nest the rest of this block in an if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) >> if (exclude && >> (dir->flags & DIR_SHOW_IGNORED_TOO) && >> (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { >> @@ -1711,7 +1710,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, >> } > > I'd say we'd want to add a BUG("Unhandled value for > directory_exists_in_index: %d\n", status); right here. > >> >> /* This is the "show_other_directories" case */ >> - >> +show_other_directories: ...allowing us to drop this. 
>> if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) >> return exclude ? path_excluded : path_untracked; >> >> -- >> 2.25.0.vfs.1.1 > > Otherwise, the patch looks good to me and I'll be happy to replace my > patch with this one. Let me send a v2 of this patch now that you've pointed out my error. It is worth making this method clearer before you expand substantially on this final case. Thanks, -Stolee ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-30 16:00 ` Derrick Stolee
@ 2020-01-30 16:10 ` Derrick Stolee
  2020-01-30 16:20 ` Elijah Newren
  0 siblings, 1 reply; 76+ messages in thread
From: Derrick Stolee @ 2020-01-30 16:10 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy

On 1/30/2020 11:00 AM, Derrick Stolee wrote:
>
> Let me send a v2 of this patch now that you've pointed out my error. It
> is worth making this method clearer before you expand substantially on
> this final case.

Here we are:

-->8--

From 3fb4fdda25affe9fe6b3e91050e8ad105bcb6fe0 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Thu, 30 Jan 2020 15:28:39 +0000
Subject: [PATCH v2] dir: refactor treat_directory to clarify control flow

The logic in treat_directory() is handled by a multi-case
switch statement, but this switch is very asymmetrical, as
the first two cases are simple but the third is more
complicated than the rest of the method. In fact, the third
case includes a "break" statement that leads to the block
of code outside the switch statement. That is the only way
to reach that block, as the switch handles all possible
values from directory_exists_in_index().

Extract the switch statement into a series of "if" statements.
This simplifies the trivial cases, while clarifying how to
reach the "show_other_directories" case. This is particularly
important as the "show_other_directories" case will expand
in a later change.
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- dir.c | 33 +++++++++++++++------------------ 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/dir.c b/dir.c index b460211e61..0989558ae6 100644 --- a/dir.c +++ b/dir.c @@ -1660,29 +1660,26 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const struct pathspec *pathspec) { int nested_repo = 0; - /* The "len-1" is to strip the final '/' */ - switch (directory_exists_in_index(istate, dirname, len-1)) { - case index_directory: - return path_recurse; + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); - case index_gitdir: + if (status == index_directory) + return path_recurse; + if (status == index_gitdir) return path_none; - case index_nonexistent: - if ((dir->flags & DIR_SKIP_NESTED_GIT) || - !(dir->flags & DIR_NO_GITLINKS)) { - struct strbuf sb = STRBUF_INIT; - strbuf_addstr(&sb, dirname); - nested_repo = is_nonbare_repository_dir(&sb); - strbuf_release(&sb); - } - if (nested_repo) - return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (exclude ? path_excluded : path_untracked)); + if ((dir->flags & DIR_SKIP_NESTED_GIT) || + !(dir->flags & DIR_NO_GITLINKS)) { + struct strbuf sb = STRBUF_INIT; + strbuf_addstr(&sb, dirname); + nested_repo = is_nonbare_repository_dir(&sb); + strbuf_release(&sb); + } + if (nested_repo) + return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : + (exclude ? path_excluded : path_untracked)); - if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) - break; + if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) { if (exclude && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { -- 2.25.0.vfs.1.1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage 2020-01-30 16:10 ` Derrick Stolee @ 2020-01-30 16:20 ` Elijah Newren 2020-01-30 18:17 ` Derrick Stolee 0 siblings, 1 reply; 76+ messages in thread From: Elijah Newren @ 2020-01-30 16:20 UTC (permalink / raw) To: Derrick Stolee Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy On Thu, Jan 30, 2020 at 8:10 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 1/30/2020 11:00 AM, Derrick Stolee wrote: > > > > Let me send a v2 of this patch now that you've pointed out my error. It > > is worth making this method clearer before you expand substantially on > > this final case. > > Here we are: > > -->8-- > > From 3fb4fdda25affe9fe6b3e91050e8ad105bcb6fe0 Mon Sep 17 00:00:00 2001 > From: Derrick Stolee <dstolee@microsoft.com> > Date: Thu, 30 Jan 2020 15:28:39 +0000 > Subject: [PATCH v2] dir: refactor treat_directory to clarify control flow > > The logic in treat_directory() is handled by a multi-case > switch statement, but this switch is very asymmetrical, as > the first two cases are simple but the third is more > complicated than the rest of the method. In fact, the third > case includes a "break" statement that leads to the block > of code outside the switch statement. That is the only way > to reach that block, as the switch handles all possible > values from directory_exists_in_index(); > > Extract the switch statement into a series of "if" statements. > This simplifies the trivial cases, while clarifying how to > reach the "show_other_directories" case. This is particularly > important as the "show_other_directories" case will expand > in a later change. 
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  dir.c | 33 +++++++++++++++------------------
>  1 file changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index b460211e61..0989558ae6 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1660,29 +1660,26 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>          const struct pathspec *pathspec)
>  {
>          int nested_repo = 0;
> -
>          /* The "len-1" is to strip the final '/' */
> -        switch (directory_exists_in_index(istate, dirname, len-1)) {
> -        case index_directory:
> -                return path_recurse;
> +        enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
>
> -        case index_gitdir:
> +        if (status == index_directory)
> +                return path_recurse;
> +        if (status == index_gitdir)
>                  return path_none;

I think right here we should add:

    if (status != index_nonexistent)
        BUG("Unhandled value for directory_exists_in_index: %d\n", status);

for future-proofing, since both you and I had to look up what
possibilities existed as a return status from
directory_exists_in_index(), and I'd rather a large warning was thrown
if someone ever adds a fourth option to that function than have
everyone assume treat_directory() is fine and only needs to special
case two choices.

Or we could add an assert or a code comment, just so long as we
document to future readers that the remainder of the code is assuming
status==index_nonexistent.

> -        case index_nonexistent:
> -                if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
> -                    !(dir->flags & DIR_NO_GITLINKS)) {
> -                        struct strbuf sb = STRBUF_INIT;
> -                        strbuf_addstr(&sb, dirname);
> -                        nested_repo = is_nonbare_repository_dir(&sb);
> -                        strbuf_release(&sb);
> -                }
> -                if (nested_repo)
> -                        return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
> -                                (exclude ? path_excluded : path_untracked));
> +        if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
> +            !(dir->flags & DIR_NO_GITLINKS)) {
> +                struct strbuf sb = STRBUF_INIT;
> +                strbuf_addstr(&sb, dirname);
> +                nested_repo = is_nonbare_repository_dir(&sb);
> +                strbuf_release(&sb);
> +        }
> +        if (nested_repo)
> +                return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none :
> +                        (exclude ? path_excluded : path_untracked));
>
> -                if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
> -                        break;
> +        if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
>                  if (exclude &&
>                          (dir->flags & DIR_SHOW_IGNORED_TOO) &&
>                          (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) {
> --
> 2.25.0.vfs.1.1

Otherwise, I'm quite happy with these changes.

^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 4/6] dir: move setting of nested_repo next to its actual usage
  2020-01-30 16:20 ` Elijah Newren
@ 2020-01-30 18:17 ` Derrick Stolee
  0 siblings, 0 replies; 76+ messages in thread
From: Derrick Stolee @ 2020-01-30 18:17 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy

On 1/30/2020 11:20 AM, Elijah Newren wrote:
> On Thu, Jan 30, 2020 at 8:10 AM Derrick Stolee <stolee@gmail.com> wrote:
>> diff --git a/dir.c b/dir.c
>> index b460211e61..0989558ae6 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1660,29 +1660,26 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>>          const struct pathspec *pathspec)
>>  {
>>          int nested_repo = 0;
>> -
>>          /* The "len-1" is to strip the final '/' */
>> -        switch (directory_exists_in_index(istate, dirname, len-1)) {
>> -        case index_directory:
>> -                return path_recurse;
>> +        enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
>>
>> -        case index_gitdir:
>> +        if (status == index_directory)
>> +                return path_recurse;
>> +        if (status == index_gitdir)
>>                  return path_none;
>
> I think right here we should add:
>
>     if (status != index_nonexistent)
>         BUG("Unhandled value for directory_exists_in_index: %d\n", status);
>
> for future-proofing, since both you and I had to look up what
> possibilities existed as a return status from
> directory_exists_in_index(), and I'd rather a large warning was thrown
> if someone ever adds a fourth option to that function rather than
> assume treat_directory() is fine and only needs to special case two
> choices.
>
> Or we could add an assert or a code comment, just so long as we
> document to future readers that the remainder of the code is assuming
> status==index_nonexistent.

I'm happy if you squash this into the commit.

Thanks!

^ permalink raw reply [flat|nested] 76+ messages in thread
* [PATCH 5/6] dir: replace exponential algorithm with a linear one 2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget @ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget 2020-01-30 15:55 ` Derrick Stolee 2020-01-31 17:13 ` SZEDER Gábor 2020-01-29 22:03 ` [PATCH 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 6 siblings, 2 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> dir's read_directory_recursive() naturally operates recursively in order to walk the directory tree. Treating of directories is sometimes weird because there are so many different permutations about how to handle directories. Some examples: * 'git ls-files -o --directory' only needs to know that a directory itself is untracked; it doesn't need to recurse into it to see what is underneath. * 'git status' needs to recurse into an untracked directory, but only to determine whether or not it is empty. If there are no files underneath, the directory itself will be omitted from the output. If it is not empty, only the directory will be listed. * 'git status --ignored' needs to recurse into untracked directories and report all the ignored entries and then report the directory as untracked -- UNLESS all the entries under the directory are ignored, in which case we don't print any of the entries under the directory and just report the directory itself as ignored. 
* For 'git clean', we may need to recurse into a directory that
  doesn't match any specified pathspecs, if it's possible that there is
  an entry underneath the directory that can match one of the
  pathspecs. In such a case, we need to be careful to omit the
  directory itself from the list of paths (see e.g. commit 404ebceda01c
  ("dir: also check directories for matching pathspecs", 2019-09-17)).

Part of the tension noted above is that the treatment of a directory
can change based on the files within it, and based on the various
settings in dir->flags. When trying to keep this in mind while reading
over the code, it is easy to (accidentally?) think in terms of
"treat_directory() tells us what to do with a directory, and
read_directory_recursive() is the thing that recurses". Since we need
to look into a directory to know how to treat it, though, it was quite
easy to decide to recurse into the directory from treat_directory() by
adding a read_directory_recursive() call. Adding such a call would
actually be fine, IF we didn't also cause read_directory_recursive() to
recurse into the same directory again.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the code,
meaning we'd have two calls to read_directory_recursive() for an
untracked directory. So, if we had a file named
    one/two/three/four/five/somefile.txt
and nothing in one/ was tracked, then 'git status --ignored' would call
read_directory_recursive() twice on the directory 'one/', and each of
those would call read_directory_recursive() twice on the directory
'one/two/', and so on until read_directory_recursive() was called 2^5
times for 'one/two/three/four/five/'.

Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this over
time.
While trying to unravel it, I noticed several instances where the
first call to read_directory_recursive() would return e.g.
path_untracked for some directory and a later one would return e.g.
path_none, and the code relied on the side-effect of the first adding
untracked entries to dir->entries in order to get the correct output
despite the supposed override in return value by the later call.

I am somewhat concerned that there are still bugs and maybe even
testcases with the wrong expectation. I have tried to carefully
document treat_directory() since it becomes more complex after this
change (though much of this complexity came from elsewhere that
probably deserved better comments to begin with). However, much of my
work felt more like a game of whack-a-mole while attempting to make the
code match the existing regression tests than an attempt to create an
implementation that matched some clear design. That seems wrong to me,
but the rules of existing behavior had so many special cases that I had
a hard time coming up with some overarching rules about what correct
behavior is for all cases, forcing me to hope that the regression tests
are correct and sufficient. (I'll note that this turmoil makes working
with dir.c extremely unpleasant for me; I keep hoping it'll get better,
but it never seems to.)

However, on the positive side, it does make the code much faster.
For the following simple shell loop in an empty repository:

    for depth in $(seq 10 25)
    do
      dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
      rm -rf dir
      mkdir -p $dirs
      >$dirs/untracked-file
      /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
    done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

After this fix, those drop to:

    10: 0.00
    11: 0.00
    12: 0.00
    13: 0.00
    14: 0.00
    15: 0.00
    16: 0.00
    17: 0.00
    18: 0.00
    19: 0.00
    20: 0.00
    21: 0.00
    22: 0.00
    23: 0.00
    24: 0.00
    25: 0.00

In fact, it isn't until a depth of 190 nested directories that it
sometimes starts reporting a time of 0.01 seconds and doesn't
consistently report 0.01 seconds until there are 240 nested
directories. The previous code would have taken
    17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
to have completed the 240 nested directories case. It's not often that
you get to speed something up by a factor of 3*10^69.

WARNING: This change breaks t7063. I don't know whether that is to be
expected (I now intentionally visit untracked directories differently
so naturally the untracked cache should change), or if I've broken
something. I'm hoping to get an untracked cache expert to chime in...
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 151 ++++++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 105 insertions(+), 46 deletions(-) diff --git a/dir.c b/dir.c index ef3307718a..aaf038a9c4 100644 --- a/dir.c +++ b/dir.c @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { - int nested_repo; + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. + */ + enum path_treatment state; + int nested_repo, old_ignored_nr, stop_early; /* The "len-1" is to strip the final '/' */ switch (directory_exists_in_index(istate, dirname, len-1)) { @@ -1713,18 +1719,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + /* + * We only need to recurse into untracked/ignored directories if + * either of the following bits is set: + * - DIR_SHOW_IGNORED_TOO (because then we need to determine if + * there are ignored directories below) + * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if + * the directory is empty) + */ + if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) return excluded ? path_excluded : path_untracked; + /* + * If we only want to determine if dirname is empty, then we can + * stop at the first file we find underneath that directory rather + * than continuing to recurse beyond it. If DIR_SHOW_IGNORED_TOO + * is set, then we want MORE than just determining if dirname is + * empty. + */ + stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && + !(dir->flags & DIR_SHOW_IGNORED_TOO)); + + /* + * If /every/ file within an untracked directory is ignored, then + * we want to treat the directory as ignored (for e.g. 
status + * --porcelain), without listing the individual ignored files + * underneath. To do so, we'll save the current ignored_nr, and + * pop all the ones added after it if it turns out the entire + * directory is ignored. + */ + old_ignored_nr = dir->ignored_nr; + + /* Actually recurse into dirname now, we'll fixup the state later. */ untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + state = read_directory_recursive(dir, istate, dirname, len, untracked, + stop_early, stop_early, pathspec); + + /* There are a variety of reasons we may need to fixup the state... */ + if (state == path_excluded) { + int i; + + /* + * When stop_early is set, read_directory_recursive() will + * never return path_untracked regardless of whether + * underlying paths were untracked or ignored (because + * returning early means it excluded some paths, or + * something like that -- see commit 5aaa7fd39aaf ("Improve + * performance of git status --ignored", 2017-09-18)). + * However, we're not really concerned with the status of + * files under the directory, we just wanted to know + * whether the directory was empty (state == path_none) or + * not (state == path_excluded), and if not, we'd return + * our original status based on whether the untracked + * directory matched an exclusion pattern. + */ + if (stop_early) + state = excluded ? path_excluded : path_untracked; + + else { + /* + * When + * !stop_early && state == path_excluded + * then all paths under dirname were ignored. For + * this case, git status --porcelain wants to just + * list the directory itself as ignored and not + * list the individual paths underneath. Remove + * the individual paths underneath. + */ + for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i) + free(dir->ignored[i]); + dir->ignored_nr = old_ignored_nr; + } + } /* - * If this is an excluded directory, then we only need to check if - * the directory contains any files. 
+ * If there is nothing under the current directory and we are not + * hiding empty directories, then we need to report on the + * untracked or ignored status of the directory itself. */ - return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, excluded, pathspec); + if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + state = excluded ? path_excluded : path_untracked; + + /* + * We can recurse into untracked directories that don't match any + * of the given pathspecs when some file underneath the directory + * might match one of the pathspecs. If so, we should make sure + * to note that the directory itself did not match. + */ + if (pathspec && + !match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL, + 0 /* do NOT special case dirs */)) + state = path_none; + + return state; } /* @@ -1872,6 +1961,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. + */ strbuf_setlen(path, baselen); if (!cdir->ucd) { strbuf_addstr(path, cdir->file); @@ -2177,14 +2271,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, int stop_at_first_file, const struct pathspec *pathspec) { /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in treat_leading_path(). See the commit message for the - * commit adding this warning as well as the commit preceding it - * for details. + * WARNING: Do NOT call recurse unless path_recurse is returned + * from treat_path(). Recursing on any other return value + * results in exponential slowdown. 
*/ - struct cached_dir cdir; enum path_treatment state, subdir_state, dir_state = path_none; struct strbuf path = STRBUF_INIT; @@ -2206,13 +2296,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, dir_state = state; /* recurse into subdir if instructed by treat_path */ - if ((state == path_recurse) || - ((state == path_untracked) && - (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) && - ((dir->flags & DIR_SHOW_IGNORED_TOO) || - (pathspec && - do_match_pathspec(istate, pathspec, path.buf, path.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) { + if (state == path_recurse) { struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, @@ -2296,15 +2380,6 @@ static int treat_leading_path(struct dir_struct *dir, const char *path, int len, const struct pathspec *pathspec) { - /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in read_directory_recursive(). See 777b420347 (dir: - * synchronize treat_leading_path() and read_directory_recursive(), - * 2019-12-19) and its parent commit for details. 
- */ - struct strbuf sb = STRBUF_INIT; struct strbuf subdir = STRBUF_INIT; int prevlen, baselen; @@ -2355,23 +2430,7 @@ static int treat_leading_path(struct dir_struct *dir, strbuf_reset(&subdir); strbuf_add(&subdir, path+prevlen, baselen-prevlen); cdir.d_name = subdir.buf; - state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, - pathspec); - if (state == path_untracked && - resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR && - (dir->flags & DIR_SHOW_IGNORED_TOO || - do_match_pathspec(istate, pathspec, sb.buf, sb.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) { - if (!match_pathspec(istate, pathspec, sb.buf, sb.len, - 0 /* prefix */, NULL, - 0 /* do NOT special case dirs */)) - state = path_none; - add_path_to_appropriate_result_list(dir, NULL, &cdir, - istate, - &sb, baselen, - pathspec, state); - state = path_recurse; - } + state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec); if (state != path_recurse) break; /* do not recurse into it */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one 2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget @ 2020-01-30 15:55 ` Derrick Stolee 2020-01-30 17:13 ` Elijah Newren 2020-01-31 17:13 ` SZEDER Gábor 1 sibling, 1 reply; 76+ messages in thread From: Derrick Stolee @ 2020-01-30 15:55 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren, Kevin.Willford I am very enticed by the subject! On 1/29/2020 5:03 PM, Elijah Newren via GitGitGadget wrote: > Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs > for ignored files", 2017-05-18), added exactly such a case to the code, I was disappointed that the commit you mention did not add a test for the new behavior, but then found a test change in the following commit fb89888849 (dir: hide untracked contents of untracked dirs, 2017-05-18). This makes me feel better that your changes are less likely to un-do the intention of df5bcdf83aeb. > meaning we'd have two calls to read_directory_recursive() for an > untracked directory. So, if we had a file named > one/two/three/four/five/somefile.txt > and nothing in one/ was tracked, then 'git status --ignored' would > call read_directory_recursive() twice on the directory 'one/', and > each of those would call read_directory_recursive() twice on the > directory 'one/two/', and so on until read_directory_recursive() was > called 2^5 times for 'one/two/three/four/five/'. Wow! Good find. "Accidentally exponential" is a lot worse than "accidentally quadratic". At least the N here _usually_ does not grow too quickly, but the constant here (lstat-ing directories and files) is significant enough that 2^3 or 2^4 is enough to notice the difference. > Avoid calling read_directory_recursive() twice per level by moving a > lot of the special logic into treat_directory(). 
> > Since dir.c is somewhat complex, extra cruft built up around this over > time. While trying to unravel it, I noticed several instances where the > first call to read_directory_recursive() would return e.g. > path_untracked for a some directory and a later one would return e.g. > path_none, and the code relied on the side-effect of the first adding > untracked entries to dir->entries in order to get the correct output > despite the supposed override in return value by the later call. > > I am somewhat concerned that there are still bugs and maybe even > testcases with the wrong expectation. I have tried to carefully > document treat_directory() since it becomes more complex after this > change (though much of this complexity came from elsewhere that probably > deserved better comments to begin with). However, much of my work felt > more like a game of whackamole while attempting to make the code match > the existing regression tests than an attempt to create an > implementation that matched some clear design. That seems wrong to me, > but the rules of existing behavior had so many special cases that I had > a hard time coming up with some overarching rules about what correct > behavior is for all cases, forcing me to hope that the regression tests > are correct and sufficient. (I'll note that this turmoil makes working > with dir.c extremely unpleasant for me; I keep hoping it'll get better, > but it never seems to.) Keep fighting the good fight! It appears that some of our most-important code has these complicated cases and side-effects because it has grown so organically over time. It's unlikely that someone _could_ rewrite it to avoid that pain, as dir.c contains a lot of accumulated knowledge from the many special-cases Git handles. I suppose the only thing we can do is try to write as many detailed tests as possible. > However, on the positive side, it does make the code much faster. 
For > the following simple shell loop in an empty repository: > > for depth in $(seq 10 25) > do > dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done) > rm -rf dir > mkdir -p $dirs > >$dirs/untracked-file > /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null > done > > I saw the following timings, in seconds (note that the numbers are a > little noisy from run-to-run, but the trend is very clear with every > run): > > 10: 0.03 > 11: 0.05 > 12: 0.08 > 13: 0.19 > 14: 0.29 > 15: 0.50 > 16: 1.05 > 17: 2.11 > 18: 4.11 > 19: 8.60 > 20: 17.55 > 21: 33.87 > 22: 68.71 > 23: 140.05 > 24: 274.45 > 25: 551.15 Are these timings on Linux? I imagine that the timings will increase much more quickly on Windows. > After this fix, those drop to: > > 10: 0.00 ... > 25: 0.00 Nice. I wonder if presenting these 0.00 values as a table is worth the space? At least the effect is dramatic. > In fact, it isn't until a depth of 190 nested directories that it > sometimes starts reporting a time of 0.01 seconds and doesn't > consistently report 0.01 seconds until there are 240 nested directories. > The previous code would have taken > 17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS > to have completed the 240 nested directories case. It's not often > that you get to speed something up by a factor of 3*10^69. Awesome. > WARNING: This change breaks t7063. I don't know whether that is to be expected > (I now intentionally visit untracked directories differently so naturally the > untracked cache should change), or if I've broken something. I'm hoping to get > an untracked cache expert to chime in... I suppose that when the untracked cache is enabled, your expectation that we do not need to recurse into an untracked directory is incorrect: we actually want to explore that directory. Is there a mode we can check to see if we are REALLY REALLY collecting _all_ untracked paths? Perhaps we need to create one? 
I'm CC'ing Kevin Willford because he is more familiar with the Git index than me, and perhaps the untracked cache in particular. > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > dir.c | 151 ++++++++++++++++++++++++++++++++++++++++------------------ > 1 file changed, 105 insertions(+), 46 deletions(-) > > diff --git a/dir.c b/dir.c > index ef3307718a..aaf038a9c4 100644 > --- a/dir.c > +++ b/dir.c > @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > const char *dirname, int len, int baselen, int excluded, > const struct pathspec *pathspec) > { > - int nested_repo; > + /* > + * WARNING: From this function, you can return path_recurse or you > + * can call read_directory_recursive() (or neither), but > + * you CAN'T DO BOTH. > + */ > + enum path_treatment state; > + int nested_repo, old_ignored_nr, stop_early; > > /* The "len-1" is to strip the final '/' */ > switch (directory_exists_in_index(istate, dirname, len-1)) { > @@ -1713,18 +1719,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > > /* This is the "show_other_directories" case */ > > - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) > + /* > + * We only need to recurse into untracked/ignored directories if > + * either of the following bits is set: > + * - DIR_SHOW_IGNORED_TOO (because then we need to determine if > + * there are ignored directories below) > + * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if > + * the directory is empty) Perhaps here is where you could also have a DIR_LIST_ALL_UNTRACKED flag to ensure the untracked cache loads all untracked paths? > + */ > + if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) > return excluded ? path_excluded : path_untracked; > > + /* > + * If we only want to determine if dirname is empty, then we can > + * stop at the first file we find underneath that directory rather > + * than continuing to recurse beyond it. 
If DIR_SHOW_IGNORED_TOO > + * is set, then we want MORE than just determining if dirname is > + * empty. > + */ > + stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && > + !(dir->flags & DIR_SHOW_IGNORED_TOO)); > + > + /* > + * If /every/ file within an untracked directory is ignored, then > + * we want to treat the directory as ignored (for e.g. status > + * --porcelain), without listing the individual ignored files > + * underneath. To do so, we'll save the current ignored_nr, and > + * pop all the ones added after it if it turns out the entire > + * directory is ignored. Here is a question for an untracked cache expert: Do we store ignored paths in the untracked cache? > + */ > + old_ignored_nr = dir->ignored_nr; > + > + /* Actually recurse into dirname now, we'll fixup the state later. */ > untracked = lookup_untracked(dir->untracked, untracked, > dirname + baselen, len - baselen); > + state = read_directory_recursive(dir, istate, dirname, len, untracked, > + stop_early, stop_early, pathspec); > + > + /* There are a variety of reasons we may need to fixup the state... */ > + if (state == path_excluded) { > + int i; > + > + /* > + * When stop_early is set, read_directory_recursive() will > + * never return path_untracked regardless of whether > + * underlying paths were untracked or ignored (because > + * returning early means it excluded some paths, or > + * something like that -- see commit 5aaa7fd39aaf ("Improve > + * performance of git status --ignored", 2017-09-18)). > + * However, we're not really concerned with the status of > + * files under the directory, we just wanted to know > + * whether the directory was empty (state == path_none) or > + * not (state == path_excluded), and if not, we'd return > + * our original status based on whether the untracked > + * directory matched an exclusion pattern. > + */ > + if (stop_early) > + state = excluded ? 
path_excluded : path_untracked; > + > + else { > + /* > + * When > + * !stop_early && state == path_excluded > + * then all paths under dirname were ignored. For > + * this case, git status --porcelain wants to just > + * list the directory itself as ignored and not > + * list the individual paths underneath. Remove > + * the individual paths underneath. > + */ > + for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i) > + free(dir->ignored[i]); > + dir->ignored_nr = old_ignored_nr; > + } > + } > > /* > - * If this is an excluded directory, then we only need to check if > - * the directory contains any files. > + * If there is nothing under the current directory and we are not > + * hiding empty directories, then we need to report on the > + * untracked or ignored status of the directory itself. > */ > - return read_directory_recursive(dir, istate, dirname, len, > - untracked, 1, excluded, pathspec); > + if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) > + state = excluded ? path_excluded : path_untracked; > + > + /* > + * We can recurse into untracked directories that don't match any > + * of the given pathspecs when some file underneath the directory > + * might match one of the pathspecs. If so, we should make sure > + * to note that the directory itself did not match. > + */ > + if (pathspec && > + !match_pathspec(istate, pathspec, dirname, len, > + 0 /* prefix */, NULL, > + 0 /* do NOT special case dirs */)) > + state = path_none; > + > + return state; > } This is certainly a substantial change, and I'm not able to read it carefully right now. I hope to return to it soon, but hopefully I've pointed out some places that may lead you to resolve your untracked cache issues. Thanks, -Stolee ^ permalink raw reply [flat|nested] 76+ messages in thread
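The extrapolation quoted in this message (17.55 * 2^220 seconds expressed in years) can be re-derived with a short awk calculation. This is only a sketch: it assumes, as the commit message does, that the old runtime doubles with each extra directory level, starting from the 17.55 s measured at depth 20. The helper name `extrapolate_years` is made up for illustration and is not part of git.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): extrapolate the old runtime to a
# deeper nesting, assuming it doubles per level from one measured point.
extrapolate_years() {
	# $1 = measured seconds, $2 = measured depth, $3 = target depth
	awk -v secs="$1" -v from="$2" -v to="$3" 'BEGIN {
		total = secs * 2 ^ (to - from)               # seconds at target depth
		printf "%.1e\n", total / (60 * 60 * 24 * 365) # seconds -> years
	}'
}

# Depth 20 took 17.55 s; extrapolate to 240 nested directories.
extrapolate_years 17.55 20 240
```

Under that doubling assumption the call prints 9.4e+59, matching the figure in the quoted commit message.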
* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one 2020-01-30 15:55 ` Derrick Stolee @ 2020-01-30 17:13 ` Elijah Newren 2020-01-30 17:45 ` Elijah Newren 0 siblings, 1 reply; 76+ messages in thread From: Elijah Newren @ 2020-01-30 17:13 UTC (permalink / raw) To: Derrick Stolee Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Kevin.Willford On Thu, Jan 30, 2020 at 7:55 AM Derrick Stolee <stolee@gmail.com> wrote: > > However, on the positive side, it does make the code much faster. For > > the following simple shell loop in an empty repository: > > > > for depth in $(seq 10 25) > > do > > dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done) > > rm -rf dir > > mkdir -p $dirs > > >$dirs/untracked-file > > /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null > > done > > > > I saw the following timings, in seconds (note that the numbers are a > > little noisy from run-to-run, but the trend is very clear with every > > run): > > > > 10: 0.03 > > 11: 0.05 > > 12: 0.08 > > 13: 0.19 > > 14: 0.29 > > 15: 0.50 > > 16: 1.05 > > 17: 2.11 > > 18: 4.11 > > 19: 8.60 > > 20: 17.55 > > 21: 33.87 > > 22: 68.71 > > 23: 140.05 > > 24: 274.45 > > 25: 551.15 > > Are these timings on Linux? I imagine that the timings will increase > much more quickly on Windows. Yes, on Linux, with an SSD for the hard drive in this case (though I suspect OS caching of the directories would probably eliminate any differences between an SSD and a spinny disk since the same directories are visited so many times). > > After this fix, those drop to: > > > > 10: 0.00 > ... > > 25: 0.00 > > Nice. I wonder if presenting these 0.00 values as a table is worth > the space? At least the effect is dramatic. 
I first considered a table, but then noted it didn't match the code snippet I provided and was worried I'd have to spend more time explaining how I post-processed the output from two runs than we'd gain from compressing the number of lines of the commit message. Assuming reader time was more valuable, I opted to just keep the two snippets of output. > > WARNING: This change breaks t7063. I don't know whether that is to be expected > > (I now intentionally visit untracked directories differently so naturally the > > untracked cache should change), or if I've broken something. I'm hoping to get > > an untracked cache expert to chime in... > > I suppose that when the untracked cache is enabled, your expectation that we > do not need to recurse into an untracked directory is incorrect: we actually > want to explore that directory. Is there a mode we can check to see if we > are REALLY REALLY collecting _all_ untracked paths? Perhaps we need to create > one? I don't think I made any significant changes about using the untracked cache versus traversing; the primary differences should be that I traverse each directory once instead of 2^N times. However, the previous code would traverse with both check_only=0 and check_only=1, and to avoid the whole 2^N thing I only traverse once. That fundamentally means I only won't traverse with both settings of that flag. The output in t7063 seems to suggest to me that the check_only flag matters to what the untracked-cache stores ("check_only" literally appears as part of the expected output), and the output also suggests that the untracked-cache is recording when entries are visited multiple times somehow. Or maybe I'm just totally misunderstanding the expected output in t7063. I really have no clue about that stuff. > I'm CC'ing Kevin Willford because he is more familiar with the Git index > than me, and perhaps the untracked cache in particular. 
Getting another set of eyes, even if they only know enough to provide hunches or guesses, would be very welcome. > > Signed-off-by: Elijah Newren <newren@gmail.com> > > --- > > dir.c | 151 ++++++++++++++++++++++++++++++++++++++++------------------ > > 1 file changed, 105 insertions(+), 46 deletions(-) > > > > diff --git a/dir.c b/dir.c > > index ef3307718a..aaf038a9c4 100644 > > --- a/dir.c > > +++ b/dir.c > > @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > > const char *dirname, int len, int baselen, int excluded, > > const struct pathspec *pathspec) > > { > > - int nested_repo; > > + /* > > + * WARNING: From this function, you can return path_recurse or you > > + * can call read_directory_recursive() (or neither), but > > + * you CAN'T DO BOTH. > > + */ > > + enum path_treatment state; > > + int nested_repo, old_ignored_nr, stop_early; > > > > /* The "len-1" is to strip the final '/' */ > > switch (directory_exists_in_index(istate, dirname, len-1)) { > > @@ -1713,18 +1719,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > > > > /* This is the "show_other_directories" case */ > > > > - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) > > + /* > > + * We only need to recurse into untracked/ignored directories if > > + * either of the following bits is set: > > + * - DIR_SHOW_IGNORED_TOO (because then we need to determine if > > + * there are ignored directories below) > > + * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if > > + * the directory is empty) > > Perhaps here is where you could also have a DIR_LIST_ALL_UNTRACKED > flag to ensure the untracked cache loads all untracked paths? Do you mean DIR_KEEP_UNTRACKED_CONTENTS (which is documented in dir.h as only having meaning when DIR_SHOW_IGNORED_TOO is also set, and thus caused me to not list it separately)? 
Speaking of DIR_KEEP_UNTRACKED_CONTENTS, though, its handling as a post-processing step in read_directory() is now inconsistent with how we handle squashing a directory full of ignores into just marking the containing directory as ignored. I think I should move the read_directory() logic for DIR_KEEP_UNTRACKED_CONTENTS to treat_directory() and use another counter similar to old_ignored_nr. It should be more efficient that way, too. > > > + */ > > + if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) > > return excluded ? path_excluded : path_untracked; > > > > + /* > > + * If we only want to determine if dirname is empty, then we can > > + * stop at the first file we find underneath that directory rather > > + * than continuing to recurse beyond it. If DIR_SHOW_IGNORED_TOO > > + * is set, then we want MORE than just determining if dirname is > > + * empty. > > + */ > > + stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && > > + !(dir->flags & DIR_SHOW_IGNORED_TOO)); > > + > > + /* > > + * If /every/ file within an untracked directory is ignored, then > > + * we want to treat the directory as ignored (for e.g. status > > + * --porcelain), without listing the individual ignored files > > + * underneath. To do so, we'll save the current ignored_nr, and > > + * pop all the ones added after it if it turns out the entire > > + * directory is ignored. > > Here is a question for an untracked cache expert: Do we store ignored > paths in the untracked cache? According to 0dcb8d7fe0ec ("untracked cache: record .gitignore information and dir hierarchy", 2015-03-08), no: This cached output is about untracked files only, not ignored files because the number of tracked files is usually small, so small cache overhead, while the number of ignored files could go really high (e.g. *.o files mixing with source code). ...unless, of course, someone came along later and changed the design goals. [...] 
> This is certainly a substantial change, and I'm not able to read it > carefully right now. I hope to return to it soon, but hopefully I've > pointed out some places that may lead you to resolve your untracked > cache issues. Yeah, it's pretty hard to reason about; personally I needed lots of dumps of state during traversals just to partially make sense of it. I had dumps of output from both before and after my changes printing out return values of treat_directory() and paths and a bunch of other stuff and was doing lots of comparisons (and repeatedly did this for many, many different testcases with different toplevel git commands). It was particularly annoying that the old stuff would traverse everything 2^N times, half the time with check_only on and half the time with it off. It would return different state values for the same path from different calls, often depending on the side effects of dir.entries having had more entries added by the first recursion to get the right output, despite the fact that the "wrong" state was returned by treat_directory() for later visits to the same path (e.g. path_untracked returned for the first time it was visited, then path_none later, and it was a case where path_untracked was correct in my view). Despite those difficulties, having an extra set of eyes try to reason about it and pointing out anything that looks amiss or even that just looks hard to understand would be very welcome. ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one
  2020-01-30 17:13     ` Elijah Newren
@ 2020-01-30 17:45       ` Elijah Newren
  0 siblings, 0 replies; 76+ messages in thread
From: Elijah Newren @ 2020-01-30 17:45 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
      SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Kevin.Willford

On Thu, Jan 30, 2020 at 9:13 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Thu, Jan 30, 2020 at 7:55 AM Derrick Stolee <stolee@gmail.com> wrote:
[...]
> > > @@ -1713,18 +1719,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> > >
> > >       /* This is the "show_other_directories" case */
> > >
> > > -     if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
> > > +     /*
> > > +      * We only need to recurse into untracked/ignored directories if
> > > +      * either of the following bits is set:
> > > +      *   - DIR_SHOW_IGNORED_TOO (because then we need to determine if
> > > +      *     there are ignored directories below)
> > > +      *   - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if
> > > +      *     the directory is empty)
> >
> > Perhaps here is where you could also have a DIR_LIST_ALL_UNTRACKED
> > flag to ensure the untracked cache loads all untracked paths?
>
> Do you mean DIR_KEEP_UNTRACKED_CONTENTS (which is documented in dir.h
> as only having meaning when DIR_SHOW_IGNORED_TOO is also set, and thus
> caused me to not list it separately)?
>
> Speaking of DIR_KEEP_UNTRACKED_CONTENTS, though, its handling as a
> post-processing step in read_directory() is now inconsistent with how
> we handle squashing a directory full of ignores into just marking the
> containing directory as ignored.  I think I should move the
> read_directory() logic for DIR_KEEP_UNTRACKED_CONTENTS to
> treat_directory() and use another counter similar to old_ignored_nr.
> It should be more efficient that way, too.

Oh, actually, I think I understand what you're getting at so let me
clear it up.

With DIR_SHOW_IGNORED_TOO, we always recurse to the bottom, because
it's needed to find any files that might be ignored.  (Maybe we could
do something clever with checking .gitignore entries and seeing if
it's impossible for them to match anything below the current
directory, but the code doesn't do anything that clever.)  As a side
effect, we'll get all untracked files whenever that flag is set.  As
such, the only question is whether we want to keep all those extra
untracked files that we found or not, which is the purpose of
DIR_KEEP_UNTRACKED_CONTENTS.

Without DIR_SHOW_IGNORED_TOO, there's no need or want to visit all
untracked files without also learning of all ignored files (and, in
fact, git-clean is currently the only one that wants to know about all
untracked files).

As far as a simple test goes, in a simple repository with a file named
   one/two/three/four/five/untracked-file
and with nothing else under one/:

Before my changes:
    $ strace -e trace=file git status --ignored 2>&1 | grep 'open("one/' | grep -v gitignore.*ENOENT | wc -l
    62

Note that 62 == 2^5 + 2^4 + 2^3 + 2^2 + 2^1, showing how many
directories we open and read.

After my changes:
    $ strace -e trace=file git status --ignored 2>&1 | grep 'open("one/' | grep -v gitignore.*ENOENT | wc -l
    5

showing that it does open and read each directory, but does so only once.

^ permalink raw reply	[flat|nested] 76+ messages in thread
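The open counts reported above (62 before, 5 after, for five nested directories) follow from a simple model that can be sketched in shell. This is an illustrative model only, not git code: the function names `old_opens` and `new_opens` are invented here, and the model assumes each directory level was visited twice as often as its parent under the old algorithm and exactly once under the new one.

```shell
#!/bin/sh
# Illustrative model only (not git code): count directory opens for a
# chain of $1 nested untracked directories.

# Old behavior: every level is visited twice as often as its parent,
# so level k is opened 2^k times.
old_opens() {
	depth=$1 total=0 visits=1 level=1
	while [ "$level" -le "$depth" ]
	do
		visits=$((visits * 2))
		total=$((total + visits))
		level=$((level + 1))
	done
	echo "$total"
}

# New behavior: each level is opened exactly once.
new_opens() {
	echo "$1"
}

echo "old: $(old_opens 5), new: $(new_opens 5)"
```

For a depth of 5 this prints "old: 62, new: 5". The old count is the sum 2^1 + ... + 2^depth, which has the closed form 2^(depth+1) - 2.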
* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one
  2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
  2020-01-30 15:55   ` Derrick Stolee
@ 2020-01-31 17:13   ` SZEDER Gábor
  2020-01-31 17:47     ` Elijah Newren
  1 sibling, 1 reply; 76+ messages in thread
From: SZEDER Gábor @ 2020-01-31 17:13 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Martin Melka, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren

On Wed, Jan 29, 2020 at 10:03:42PM +0000, Elijah Newren via GitGitGadget wrote:
> Part of the tension noted above is that the treatment of a directory can
> changed based on the files within it, and based on the various settings

s/changed/change/, or perhaps s/changed/be changed/ ?

> Since dir.c is somewhat complex, extra cruft built up around this over
> time.  While trying to unravel it, I noticed several instances where the
> first call to read_directory_recursive() would return e.g.
> path_untracked for a some directory and a later one would return e.g.

s/for a some/for some/

> However, on the positive side, it does make the code much faster.  For
> the following simple shell loop in an empty repository:
>
>   for depth in $(seq 10 25)
>   do
>     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
>     rm -rf dir
>     mkdir -p $dirs
>     >$dirs/untracked-file
>     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
>   done
>
> I saw the following timings, in seconds (note that the numbers are a
> little noisy from run-to-run, but the trend is very clear with every
> run):
>
>     10: 0.03
>     11: 0.05
>     12: 0.08
>     13: 0.19
>     14: 0.29
>     15: 0.50
>     16: 1.05
>     17: 2.11
>     18: 4.11
>     19: 8.60
>     20: 17.55
>     21: 33.87
>     22: 68.71
>     23: 140.05
>     24: 274.45
>     25: 551.15
>
> After this fix, those drop to:
>
>     10: 0.00
>     11: 0.00
>     12: 0.00
>     13: 0.00
>     14: 0.00
>     15: 0.00
>     16: 0.00
>     17: 0.00
>     18: 0.00
>     19: 0.00
>     20: 0.00
>     21: 0.00
>     22: 0.00
>     23: 0.00
>     24: 0.00
>     25: 0.00

I agree with Derrick here: if you just said that all these report
0.00, I would have taken your word for it.

Having said that...  I don't know how to get more decimal places out
of /usr/bin/time, but our trace performance facility uses nanosecond
resolution timestamps.  So using this command in the loop above:

  GIT_TRACE_PERFORMANCE=2 git status --ignored 2>&1 >/dev/null |
  sed -n -e "s/.* performance: \(.*\): git command.*/$depth: \1/p"

gave me this:

  1: 0.000574302 s
  2: 0.000584995 s
  3: 0.000608684 s
  4: 0.000951336 s
  5: 0.000762019 s
  6: 0.000816685 s
  7: 0.000672516 s
  8: 0.000912628 s
  9: 0.000661538 s
  10: 0.000687465 s
  11: 0.000708880 s
  12: 0.000693754 s
  13: 0.000726120 s
  14: 0.000737334 s
  15: 0.000787362 s
  16: 0.000856687 s
  17: 0.000780892 s
  18: 0.000790798 s
  19: 0.000834411 s
  20: 0.000859094 s
  21: 0.001230912 s
  22: 0.001048852 s
  23: 0.000891057 s
  24: 0.000934097 s
  25: 0.001051704 s

Not sure it's worth including, though.

^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one
  2020-01-31 17:13   ` SZEDER Gábor
@ 2020-01-31 17:47     ` Elijah Newren
  0 siblings, 0 replies; 76+ messages in thread
From: Elijah Newren @ 2020-01-31 17:47 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
      Samuel Lijin, Nguyễn Thái Ngọc Duy

On Fri, Jan 31, 2020 at 9:13 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Wed, Jan 29, 2020 at 10:03:42PM +0000, Elijah Newren via GitGitGadget wrote:
> > Part of the tension noted above is that the treatment of a directory can
> > changed based on the files within it, and based on the various settings
>
> s/changed/change/, or perhaps s/changed/be changed/ ?
>
> > Since dir.c is somewhat complex, extra cruft built up around this over
> > time.  While trying to unravel it, I noticed several instances where the
> > first call to read_directory_recursive() would return e.g.
> > path_untracked for a some directory and a later one would return e.g.
>
> s/for a some/for some/
>
> > However, on the positive side, it does make the code much faster.  For
> > the following simple shell loop in an empty repository:
> >
> >   for depth in $(seq 10 25)
> >   do
> >     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> >     rm -rf dir
> >     mkdir -p $dirs
> >     >$dirs/untracked-file
> >     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> >   done
> >
> > I saw the following timings, in seconds (note that the numbers are a
> > little noisy from run-to-run, but the trend is very clear with every
> > run):
> >
> >     10: 0.03
> >     11: 0.05
> >     12: 0.08
> >     13: 0.19
> >     14: 0.29
> >     15: 0.50
> >     16: 1.05
> >     17: 2.11
> >     18: 4.11
> >     19: 8.60
> >     20: 17.55
> >     21: 33.87
> >     22: 68.71
> >     23: 140.05
> >     24: 274.45
> >     25: 551.15
> >
> > After this fix, those drop to:
> >
> >     10: 0.00
> >     11: 0.00
> >     12: 0.00
> >     13: 0.00
> >     14: 0.00
> >     15: 0.00
> >     16: 0.00
> >     17: 0.00
> >     18: 0.00
> >     19: 0.00
> >     20: 0.00
> >     21: 0.00
> >     22: 0.00
> >     23: 0.00
> >     24: 0.00
> >     25: 0.00
>
> I agree with Derrick here: if you just said that all these report
> 0.00, I would have taken your word for it.

Thanks, I'll include all these fixes.  Good timing too, as I was about
to send a re-roll.

> Having said that...  I don't know how to get more decimal places out
> of /usr/bin/time, but our trace performance facility uses nanosecond
> resolution timestamps.  So using this command in the loop above:
>
>   GIT_TRACE_PERFORMANCE=2 git status --ignored 2>&1 >/dev/null |
>   sed -n -e "s/.* performance: \(.*\): git command.*/$depth: \1/p"
>
> gave me this:
>
>   1: 0.000574302 s
>   2: 0.000584995 s
>   3: 0.000608684 s
>   4: 0.000951336 s
>   5: 0.000762019 s
>   6: 0.000816685 s
>   7: 0.000672516 s
>   8: 0.000912628 s
>   9: 0.000661538 s
>   10: 0.000687465 s
>   11: 0.000708880 s
>   12: 0.000693754 s
>   13: 0.000726120 s
>   14: 0.000737334 s
>   15: 0.000787362 s
>   16: 0.000856687 s
>   17: 0.000780892 s
>   18: 0.000790798 s
>   19: 0.000834411 s
>   20: 0.000859094 s
>   21: 0.001230912 s
>   22: 0.001048852 s
>   23: 0.000891057 s
>   24: 0.000934097 s
>   25: 0.001051704 s
>
> Not sure it's worth including, though.

Yeah, I'm afraid people will spend time trying to analyze it and the
numbers are extremely noisy.  I instead included some words about
counting the number of untracked files opened according to strace,
which shows before we had 2^(1+$depth)-2 untracked directories get
opened and after we had exactly $depth get opened.

^ permalink raw reply	[flat|nested] 76+ messages in thread
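One way to see the doubling trend in the pre-fix timings quoted above is to print the ratio between consecutive rows. The sketch below is only an illustration: the helper name `ratios` is invented, and it assumes the input keeps the "depth: seconds" layout of the quoted table. It is fed five rows copied from that table.

```shell
#!/bin/sh
# Hypothetical helper: for "depth: seconds" rows on stdin, print each
# row's time divided by the previous row's time.
ratios() {
	awk -F': ' 'NR > 1 { printf "%s: %.2f\n", $1, $2 / prev } { prev = $2 }'
}

# Rows copied from the pre-fix timings in the commit message.
ratios <<\EOF
16: 1.05
17: 2.11
18: 4.11
19: 8.60
20: 17.55
EOF
```

A ratio near 2 per added level is what the 2^(1+$depth)-2 open count predicts for the old code. The post-fix timings are too small and noisy for this kind of comparison to say much.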
* [PATCH 6/6] t7063: blindly accept diffs 2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (4 preceding siblings ...) 2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget @ 2020-01-29 22:03 ` Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 6 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-01-29 22:03 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Assuming that the changes I made in the last commit to drastically modify how and when and especially how frequently untracked paths are visited should result in changes to the untracked-cache, this commit simply updates the t7063 testcases to match what the code now reports. If this is correct, this commit should be squashed into the previous one. It'd be nice if I could get an untracked-cache expert to comment on this... 
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t7063-status-untracked-cache.sh | 50 ++++++++++++------------------- 1 file changed, 19 insertions(+), 31 deletions(-) diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index 190ae149cf..c1b0fd0540 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -85,9 +85,7 @@ dtwo/ three /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -three /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_expect_success 'status first time (empty cache)' ' @@ -140,8 +138,6 @@ test_expect_success 'modify in root directory, one dir invalidation' ' A done/one A one A two -?? dthree/ -?? dtwo/ ?? four ?? three EOF @@ -164,15 +160,11 @@ core.excludesfile 0000000000000000000000000000000000000000 exclude_per_dir .gitignore flags 00000006 / 0000000000000000000000000000000000000000 recurse valid -dthree/ -dtwo/ four three /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -three /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -217,9 +209,7 @@ dtwo/ three /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -three /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -235,6 +225,7 @@ A done/one A one A two ?? .gitignore +?? dthree/ ?? 
dtwo/ EOF test_cmp ../status.expect ../actual && @@ -256,11 +247,11 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore +dthree/ dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -277,7 +268,6 @@ flags 00000006 /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -290,7 +280,6 @@ test_expect_success 'status after the move' ' A done/one A one ?? .gitignore -?? dtwo/ ?? two EOF test_cmp ../status.expect ../actual && @@ -312,12 +301,10 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ two /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -334,7 +321,6 @@ flags 00000006 /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -348,7 +334,6 @@ A done/one A one A two ?? .gitignore -?? 
dtwo/ EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && @@ -369,11 +354,9 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -392,7 +375,6 @@ test_expect_success 'status after commit' ' git status --porcelain >../actual && cat >../status.expect <<EOF && ?? .gitignore -?? dtwo/ EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && @@ -413,11 +395,9 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -451,7 +431,6 @@ test_expect_success 'test sparse status with untracked cache' ' M done/two ?? .gitignore ?? done/five -?? dtwo/ EOF test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && @@ -472,12 +451,10 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -491,7 +468,6 @@ test_expect_success 'test sparse status again with untracked cache' ' M done/two ?? .gitignore ?? done/five -?? 
dtwo/ EOF test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && @@ -519,7 +495,6 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' ?? .gitignore ?? done/five ?? done/sub/ -?? dtwo/ EOF test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && @@ -540,17 +515,13 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five sub/ /done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid -sub/ /done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid -file /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect-from-test-dump ../actual ' @@ -615,6 +586,23 @@ test_expect_success 'setting core.untrackedCache to true and using git status cr test_cmp ../expect-no-uc ../actual && git status && test-tool dump-untracked-cache >../actual && + cat >../expect-from-test-dump <<EOF && +info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 +core.excludesfile 0000000000000000000000000000000000000000 +exclude_per_dir .gitignore +flags 00000006 +/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid +.gitignore +dthree/ +dtwo/ +/done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid +five +sub/ +/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid +/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid +/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +EOF test_cmp ../expect-from-test-dump ../actual ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() 2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (5 preceding siblings ...) 2020-01-29 22:03 ` [PATCH 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget @ 2020-01-31 18:31 ` Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget ` (6 more replies) 6 siblings, 7 replies; 76+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren

This patch series builds on en/fill-directory-fixes-more. This series
should be considered an RFC because of the untracked-cache changes (see
the last two commits), for which I'm hoping to get an untracked-cache
expert to comment.

This series does provide some modest speedups (see second to last
commit message), and should allow 'git status --ignored' to complete in
a more reasonable timeframe for Martin Melka (see
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
)

Changes since v1:

* Replaced patch 4 with improved version from Stolee (with additional
  improvement of my own)
* Clarifications, wording fixes, and more about linear perf in commit
  message to patch 5
* More detail in patch 5 about why "whackamole" particularly makes me
  uneasy for dir.c

Stuff clearly still missing from v2:

* I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in
  https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/
  which I think would make the code cleaner & clearer.
* I still have not addressed the untracked-cache issue mentioned in the
  last two commits.
  I looked at it very, very briefly, but I was really close to doing
  something similar to [1] and just dropping my patches in this series
  before even submitting them on Wednesday[2] (dir.c is really
  unpleasant to work in). Other than wording fixes, I just need a week
  or two off from this area before I dig further, unless someone else
  wants to dive in and needs me to provide pointers on what I've done
  so far.

[1] https://lore.kernel.org/git/pull.676.v3.git.git.1576571586.gitgitgadget@gmail.com/
[2] I was inches from doing that Wednesday morning. I had done several
    rounds of "Okay, I fixed all the tests that broke with my changes
    last time, let's re-run the testsuite -- wow, four totally different
    tests from testfiles I hadn't looked at before now break", and
    decided that I would only do one more before dropping it and maybe
    coming back in a month or two. That time happened to work, minus the
    untracked-cache, so I decided to put it in front of other eyeballs.

Derrick Stolee (1):
  dir: refactor treat_directory to clarify control flow

Elijah Newren (5):
  dir: consolidate treat_path() and treat_one_path()
  dir: fix broken comment
  dir: fix confusion based on variable tense
  dir: replace exponential algorithm with a linear one
  t7063: blindly accept diffs

 dir.c | 331 +++++++++++++++++-------------
 t/t7063-status-untracked-cache.sh | 50 ++---
 2 files changed, 208 insertions(+), 173 deletions(-)

base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v2
Pull-Request: https://github.com/git/git/pull/700

Range-diff vs v1:

1: 27bc135796 = 1: 27bc135796 dir: consolidate treat_path() and treat_one_path()
2: 2ceb64ae61 = 2: 2ceb64ae61 dir: fix broken comment
3: e6d21228d1 = 3: e6d21228d1 dir: fix confusion based on variable tense
4: 3b2ec5eaf6 !
4: f73f0d66d1 dir: move setting of nested_repo next to its actual usage @@ -1,26 +1,73 @@ -Author: Elijah Newren <newren@gmail.com> +Author: Derrick Stolee <dstolee@microsoft.com> - dir: move setting of nested_repo next to its actual usage + dir: refactor treat_directory to clarify control flow + The logic in treat_directory() is handled by a multi-case + switch statement, but this switch is very asymmetrical, as + the first two cases are simple but the third is more + complicated than the rest of the method. In fact, the third + case includes a "break" statement that leads to the block + of code outside the switch statement. That is the only way + to reach that block, as the switch handles all possible + values from directory_exists_in_index(); + + Extract the switch statement into a series of "if" statements. + This simplifies the trivial cases, while clarifying how to + reach the "show_other_directories" case. This is particularly + important as the "show_other_directories" case will expand + in a later change. 
+ + Helped-by: Elijah Newren <newren@gmail.com> + Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> diff --git a/dir.c b/dir.c --- a/dir.c +++ b/dir.c @@ - const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { -- int nested_repo = 0; -+ int nested_repo; - + int nested_repo = 0; +- /* The "len-1" is to strip the final '/' */ - switch (directory_exists_in_index(istate, dirname, len-1)) { -@@ +- switch (directory_exists_in_index(istate, dirname, len-1)) { +- case index_directory: +- return path_recurse; ++ enum exist_status status = directory_exists_in_index(istate, dirname, len-1); + +- case index_gitdir: ++ if (status == index_directory) ++ return path_recurse; ++ if (status == index_gitdir) return path_none; ++ if (status != index_nonexistent) ++ BUG("Unhandled value for directory_exists_in_index: %d\n", status); + +- case index_nonexistent: +- if ((dir->flags & DIR_SKIP_NESTED_GIT) || +- !(dir->flags & DIR_NO_GITLINKS)) { +- struct strbuf sb = STRBUF_INIT; +- strbuf_addstr(&sb, dirname); +- nested_repo = is_nonbare_repository_dir(&sb); +- strbuf_release(&sb); +- } +- if (nested_repo) +- return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : +- (excluded ? path_excluded : path_untracked)); ++ if ((dir->flags & DIR_SKIP_NESTED_GIT) || ++ !(dir->flags & DIR_NO_GITLINKS)) { ++ struct strbuf sb = STRBUF_INIT; ++ strbuf_addstr(&sb, dirname); ++ nested_repo = is_nonbare_repository_dir(&sb); ++ strbuf_release(&sb); ++ } ++ if (nested_repo) ++ return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : ++ (excluded ? 
path_excluded : path_untracked)); - case index_nonexistent: -+ nested_repo = 0; - if ((dir->flags & DIR_SKIP_NESTED_GIT) || - !(dir->flags & DIR_NO_GITLINKS)) { - struct strbuf sb = STRBUF_INIT; +- if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) +- break; ++ if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) { + if (excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { 5: 40b378e7ad ! 5: d3136ef52f dir: replace exponential algorithm with a linear one @@ -20,26 +20,29 @@ and report all the ignored entries and then report the directory as untracked -- UNLESS all the entries under the directory are ignored, in which case we don't print any of the entries under the - directory and just report the directory itself as ignored. + directory and just report the directory itself as ignored. (Note + that although this forces us to walk all untracked files underneath + the directory as well, we strip them from the output, except for + users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.) * For 'git clean', we may need to recurse into a directory that doesn't match any specified pathspecs, if it's possible that there is an entry underneath the directory that can match one of the pathspecs. In such a case, we need to be careful to omit the - directory itself from the list of paths (see e.g. commit - 404ebceda01c ("dir: also check directories for matching pathspecs", - 2019-09-17)) + directory itself from the list of paths (see commit 404ebceda01c + ("dir: also check directories for matching pathspecs", 2019-09-17)) Part of the tension noted above is that the treatment of a directory can - changed based on the files within it, and based on the various settings + change based on the files within it, and based on the various settings in dir->flags. Trying to keep this in mind while reading over the code, - it is easy to (accidentally?) 
think in terms of "treat_directory() tells - us what to do with a directory, and read_directory_recursive() is the - thing that recurses". Since we need to look into a directory to know - how to treat it, though, it was quite easy to decide to recurse into the + it is easy to think in terms of "treat_directory() tells us what to do + with a directory, and read_directory_recursive() is the thing that + recurses". Since we need to look into a directory to know how to treat + it, though, it is quite easy to decide to (also) recurse into the directory from treat_directory() by adding a read_directory_recursive() - call. Adding such a call is actually fine, IF we didn't also cause - read_directory_recursive() to recurse into the same directory again. + call. Adding such a call is actually fine, IF we make sure that + read_directory_recursive() does not also recurse into that same + directory. Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs for ignored files", 2017-05-18), added exactly such a case to the code, @@ -58,10 +61,12 @@ Since dir.c is somewhat complex, extra cruft built up around this over time. While trying to unravel it, I noticed several instances where the first call to read_directory_recursive() would return e.g. - path_untracked for a some directory and a later one would return e.g. - path_none, and the code relied on the side-effect of the first adding - untracked entries to dir->entries in order to get the correct output - despite the supposed override in return value by the later call. + path_untracked for some directory and a later one would return e.g. + path_none, despite the fact that the directory clearly should have been + considered untracked. The code happened to work due to the side-effect + from the first invocation of adding untracked entries to dir->entries; + this allowed it to get the correct output despite the supposed override + in return value by the later call. 
I am somewhat concerned that there are still bugs and maybe even testcases with the wrong expectation. I have tried to carefully @@ -74,9 +79,40 @@ but the rules of existing behavior had so many special cases that I had a hard time coming up with some overarching rules about what correct behavior is for all cases, forcing me to hope that the regression tests - are correct and sufficient. (I'll note that this turmoil makes working - with dir.c extremely unpleasant for me; I keep hoping it'll get better, - but it never seems to.) + are correct and sufficient. Such a hope seems likely to be ill-founded, + given my experience with dir.c-related testcases in the last few months: + + Examples where the documentation was hard to parse or even just wrong: + * 3aca58045f4f (git-clean.txt: do not claim we will delete files with + -n/--dry-run, 2019-09-17) + * 09487f2cbad3 (clean: avoid removing untracked files in a nested git + repository, 2019-09-17) + * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17) + Examples where testcases were declared wrong and changed: + * 09487f2cbad3 (clean: avoid removing untracked files in a nested git + repository, 2019-09-17) + * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17) + * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within + leading directories", 2019-12-10) + Examples where testcases were clearly inadequate: + * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an + ignored file breakage, 2019-08-25) + * 7541cc530239 (t7300: add testcases showing failure to clean specified + pathspecs, 2019-09-17) + * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item, + 2019-09-17) + * 404ebceda01c (dir: also check directories for matching pathspecs, + 2019-09-17) + * 09487f2cbad3 (clean: avoid removing untracked files in a nested git + repository, 2019-09-17) + * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17) + * 452efd11fbf6 (t3011: demonstrate 
directory traversal failures, + 2019-12-10) + * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19) + Examples where "correct behavior" was unclear to everyone: + https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/ + Other commits of note: + * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17) However, on the positive side, it does make the code much faster. For the following simple shell loop in an empty repository: @@ -111,27 +147,14 @@ 24: 274.45 25: 551.15 - After this fix, those drop to: - - 10: 0.00 - 11: 0.00 - 12: 0.00 - 13: 0.00 - 14: 0.00 - 15: 0.00 - 16: 0.00 - 17: 0.00 - 18: 0.00 - 19: 0.00 - 20: 0.00 - 21: 0.00 - 22: 0.00 - 23: 0.00 - 24: 0.00 - 25: 0.00 + For the above run, using strace I can look for the number of untracked + directories opened and can verify that it matches the expected + 2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth). - In fact, it isn't until a depth of 190 nested directories that it - sometimes starts reporting a time of 0.01 seconds and doesn't + After this fix, with strace I can verify that the number of untracked + directories that are opened drops to just $depth, and the timings all + drop to 0.00. In fact, it isn't until a depth of 190 nested directories + that it sometimes starts reporting a time of 0.01 seconds and doesn't consistently report 0.01 seconds until there are 240 nested directories. The previous code would have taken 17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS @@ -152,17 +175,17 @@ const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { -- int nested_repo; +- int nested_repo = 0; + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. 
+ */ + enum path_treatment state; -+ int nested_repo, old_ignored_nr, stop_early; - ++ int nested_repo = 0, old_ignored_nr, stop_early; /* The "len-1" is to strip the final '/' */ - switch (directory_exists_in_index(istate, dirname, len-1)) { + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); + @@ /* This is the "show_other_directories" case */ @@ -292,9 +315,9 @@ - * updates in treat_leading_path(). See the commit message for the - * commit adding this warning as well as the commit preceding it - * for details. -+ * WARNING: Do NOT call recurse unless path_recurse is returned -+ * from treat_path(). Recursing on any other return value -+ * results in exponential slowdown. ++ * WARNING: Do NOT recurse unless path_recurse is returned from ++ * treat_path(). Recursing on any other return value ++ * can result in exponential slowdown. */ - struct cached_dir cdir; 6: 7fb8063541 = 6: 9a3f20656e t7063: blindly accept diffs -- gitgitgadget ^ permalink raw reply [flat|nested] 76+ messages in thread
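The 2^($depth+1)-2 figure quoted in the range-diff above can be checked with a small model. The following is only a sketch under stated assumptions: it is not code from dir.c, and the names `walk()`, `count_opens()` and `fanout` are hypothetical. It models a single chain of `depth` nested untracked directories, where every directory is entered `fanout` times by its parent: `fanout = 2` stands in for the old behavior (treat_directory() recursing and read_directory_recursive() then recursing into the same path again), and `fanout = 1` stands in for the fixed behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter standing in for the number of opendir() calls. */
static uint64_t opens;

/*
 * Visit the directory at nesting level `level` (1-based) in a chain of
 * `depth` nested directories, then enter its only child `fanout` times.
 */
static void walk(int level, int depth, int fanout)
{
	int i;

	opens++; /* one simulated opendir() on this directory */
	if (level == depth)
		return; /* innermost directory: nothing below it */
	for (i = 0; i < fanout; i++)
		walk(level + 1, depth, fanout);
}

/*
 * Count simulated directory opens for the whole chain. The top-level
 * worktree is not counted; it enters the level-1 directory `fanout`
 * times, so level k is opened fanout^k times in total.
 */
uint64_t count_opens(int depth, int fanout)
{
	int i;

	assert(depth >= 1 && fanout >= 1);
	opens = 0;
	for (i = 0; i < fanout; i++)
		walk(1, depth, fanout);
	return opens;
}
```

With `fanout = 2` the level-k directory is opened 2^k times, so `count_opens(depth, 2)` is 2^1 + 2^2 + ... + 2^depth = 2^(depth+1) - 2, while `count_opens(depth, 1)` is just `depth`. The real traversal has many more cases than this model; it is only meant to show where the doubling comes from.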
* [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() 2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget @ 2020-01-31 18:31 ` Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 2/6] dir: fix broken comment Elijah Newren via GitGitGadget ` (5 subsequent siblings) 6 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Commit 16e2cfa90993 ("read_directory(): further split treat_path()", 2010-01-08) split treat_one_path() out of treat_path(), because treat_leading_path() would not have access to a dirent but wanted to re-use as much of treat_path() as possible. Not re-using all of treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir: fix checks on common prefix directory", 2019-12-19). Finally, in commit ad6f2157f951 ("dir: restructure in a way to avoid passing around a struct dirent", 2020-01-16), dirents were removed from treat_path() and other functions entirely. Since the only reason for splitting these functions was the lack of a dirent -- which no longer applies to either function -- and since the split caused problems in the past resulting in us not using treat_one_path() separately anymore, just undo the split. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 121 ++++++++++++++++++++++++++-------------------------------- 1 file changed, 55 insertions(+), 66 deletions(-) diff --git a/dir.c b/dir.c index b460211e61..68c56aeddb 100644 --- a/dir.c +++ b/dir.c @@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate, return dtype; } -static enum path_treatment treat_one_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec, - int dtype) -{ - int exclude; - int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case); +static enum path_treatment treat_path_fast(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + strbuf_setlen(path, baselen); + if (!cdir->ucd) { + strbuf_addstr(path, cdir->file); + return path_untracked; + } + strbuf_addstr(path, cdir->ucd->name); + /* treat_one_path() does this before it calls treat_directory() */ + strbuf_complete(path, '/'); + if (cdir->ucd->check_only) + /* + * check_only is set as a result of treat_directory() getting + * to its bottom. Verify again the same set of directories + * with check_only set. + */ + return read_directory_recursive(dir, istate, path->buf, path->len, + cdir->ucd, 1, 0, pathspec); + /* + * We get path_recurse in the first run when + * directory_exists_in_index() returns index_nonexistent. We + * are sure that new changes in the index does not impact the + * outcome. Return now. 
+ */ + return path_recurse; +} + +static enum path_treatment treat_path(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + int has_path_in_index, dtype, exclude; enum path_treatment path_treatment; - dtype = resolve_dtype(dtype, istate, path->buf, path->len); + if (!cdir->d_name) + return treat_path_fast(dir, untracked, cdir, istate, path, + baselen, pathspec); + if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) + return path_none; + strbuf_setlen(path, baselen); + strbuf_addstr(path, cdir->d_name); + if (simplify_away(path->buf, path->len, pathspec)) + return path_none; + + dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len); /* Always exclude indexed files */ + has_path_in_index = !!index_file_exists(istate, path->buf, path->len, + ignore_case); if (dtype != DT_DIR && has_path_in_index) return path_none; @@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, } } -static enum path_treatment treat_path_fast(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - strbuf_setlen(path, baselen); - if (!cdir->ucd) { - strbuf_addstr(path, cdir->file); - return path_untracked; - } - strbuf_addstr(path, cdir->ucd->name); - /* treat_one_path() does this before it calls treat_directory() */ - strbuf_complete(path, '/'); - if (cdir->ucd->check_only) - /* - * check_only is set as a result of treat_directory() getting - * to its bottom. Verify again the same set of directories - * with check_only set. - */ - return read_directory_recursive(dir, istate, path->buf, path->len, - cdir->ucd, 1, 0, pathspec); - /* - * We get path_recurse in the first run when - * directory_exists_in_index() returns index_nonexistent. 
We - * are sure that new changes in the index does not impact the - * outcome. Return now. - */ - return path_recurse; -} - -static enum path_treatment treat_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - if (!cdir->d_name) - return treat_path_fast(dir, untracked, cdir, istate, path, - baselen, pathspec); - if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) - return path_none; - strbuf_setlen(path, baselen); - strbuf_addstr(path, cdir->d_name); - if (simplify_away(path->buf, path->len, pathspec)) - return path_none; - - return treat_one_path(dir, untracked, istate, path, baselen, pathspec, - cdir->d_type); -} - static void add_untracked(struct untracked_cache_dir *dir, const char *name) { if (!dir) -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v2 2/6] dir: fix broken comment 2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget @ 2020-01-31 18:31 ` Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget ` (4 subsequent siblings) 6 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index 68c56aeddb..c358158f55 100644 --- a/dir.c +++ b/dir.c @@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, add_untracked(untracked, path.buf + baselen); break; } - /* skip the dir_add_* part */ + /* skip the add_path_to_appropriate_result_list() */ continue; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v2 3/6] dir: fix confusion based on variable tense 2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 2/6] dir: fix broken comment Elijah Newren via GitGitGadget @ 2020-01-31 18:31 ` Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget ` (3 subsequent siblings) 6 siblings, 0 replies; 76+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Despite having contributed several fixes in this area, I have for
months (years?) assumed that the "exclude" variable was a directive;
this caused me to think of it as a different mode we operate in and
left me confused as I tried to build up a mental model around why we'd
need such a directive. I mostly tried to ignore it while focusing on
the pieces I was trying to understand.

Then I finally traced this variable all the way back to a call to
is_excluded(), meaning it was actually functioning as an adjective. In
particular, it was a checked property ("Does this path match a rule in
.gitignore?"), rather than a mode passed in from the caller. Change the
variable name to match the part of speech used by the function called
to define it, which will hopefully make these bits of code slightly
clearer to the next reader.
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/dir.c b/dir.c index c358158f55..225f0bc082 100644 --- a/dir.c +++ b/dir.c @@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, static enum path_treatment treat_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked, - const char *dirname, int len, int baselen, int exclude, + const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { int nested_repo = 0; @@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } if (nested_repo) return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (exclude ? path_excluded : path_untracked)); + (excluded ? path_excluded : path_untracked)); if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) break; - if (exclude && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { + if (excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { /* * This is an excluded directory and we are @@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); @@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * the directory contains any files. 
*/ return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, exclude, pathspec); + untracked, 1, excluded, pathspec); } /* @@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { - int has_path_in_index, dtype, exclude; + int has_path_in_index, dtype, excluded; enum path_treatment path_treatment; if (!cdir->d_name) @@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent)) return path_none; - exclude = is_excluded(dir, istate, path->buf, &dtype); + excluded = is_excluded(dir, istate, path->buf, &dtype); /* * Excluded? If we don't explicitly want to show * ignored files, ignore it */ - if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) + if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) return path_excluded; switch (dtype) { @@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, strbuf_addch(path, '/'); path_treatment = treat_directory(dir, istate, untracked, path->buf, path->len, - baselen, exclude, pathspec); + baselen, excluded, pathspec); /* * If 1) we only want to return directories that * match an exclude pattern and 2) this directory does @@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, * recurse into this directory (instead of marking the * directory itself as an ignored path). */ - if (!exclude && + if (!excluded && path_treatment == path_excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) @@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, return path_treatment; case DT_REG: case DT_LNK: - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; } } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow 2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2020-01-31 18:31 ` [PATCH v2 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget @ 2020-01-31 18:31 ` Derrick Stolee via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget ` (2 subsequent siblings) 6 siblings, 0 replies; 76+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-31 18:31 UTC (permalink / raw)
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The logic in treat_directory() is handled by a multi-case
switch statement, but this switch is very asymmetrical, as
the first two cases are simple but the third is more
complicated than the rest of the method. In fact, the third
case includes a "break" statement that leads to the block
of code outside the switch statement. That is the only way
to reach that block, as the switch handles all possible
values from directory_exists_in_index().

Extract the switch statement into a series of "if" statements.
This simplifies the trivial cases, while clarifying how to
reach the "show_other_directories" case. This is particularly
important as the "show_other_directories" case will expand
in a later change.
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 35 +++++++++++++++++------------------ 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/dir.c b/dir.c index 225f0bc082..6867356a31 100644 --- a/dir.c +++ b/dir.c @@ -1660,29 +1660,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const struct pathspec *pathspec) { int nested_repo = 0; - /* The "len-1" is to strip the final '/' */ - switch (directory_exists_in_index(istate, dirname, len-1)) { - case index_directory: - return path_recurse; + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); - case index_gitdir: + if (status == index_directory) + return path_recurse; + if (status == index_gitdir) return path_none; + if (status != index_nonexistent) + BUG("Unhandled value for directory_exists_in_index: %d\n", status); - case index_nonexistent: - if ((dir->flags & DIR_SKIP_NESTED_GIT) || - !(dir->flags & DIR_NO_GITLINKS)) { - struct strbuf sb = STRBUF_INIT; - strbuf_addstr(&sb, dirname); - nested_repo = is_nonbare_repository_dir(&sb); - strbuf_release(&sb); - } - if (nested_repo) - return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (excluded ? path_excluded : path_untracked)); + if ((dir->flags & DIR_SKIP_NESTED_GIT) || + !(dir->flags & DIR_NO_GITLINKS)) { + struct strbuf sb = STRBUF_INIT; + strbuf_addstr(&sb, dirname); + nested_repo = is_nonbare_repository_dir(&sb); + strbuf_release(&sb); + } + if (nested_repo) + return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : + (excluded ? path_excluded : path_untracked)); - if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) - break; + if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) { if (excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v2 5/6] dir: replace exponential algorithm with a linear one
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir's read_directory_recursive() naturally operates recursively in
order to walk the directory tree. The treatment of directories is
sometimes weird because there are so many different permutations of
how to handle them. Some examples:

  * 'git ls-files -o --directory' only needs to know that a directory
    itself is untracked; it doesn't need to recurse into it to see what
    is underneath.

  * 'git status' needs to recurse into an untracked directory, but only
    to determine whether or not it is empty. If there are no files
    underneath, the directory itself will be omitted from the output.
    If it is not empty, only the directory will be listed.

  * 'git status --ignored' needs to recurse into untracked directories
    and report all the ignored entries and then report the directory as
    untracked -- UNLESS all the entries under the directory are
    ignored, in which case we don't print any of the entries under the
    directory and just report the directory itself as ignored.
    (Note that although this forces us to walk all untracked files
    underneath the directory as well, we strip them from the output,
    except for users like 'git clean' who also set
    DIR_KEEP_UNTRACKED_CONTENTS.)

  * For 'git clean', we may need to recurse into a directory that
    doesn't match any specified pathspecs, if it's possible that there
    is an entry underneath the directory that can match one of the
    pathspecs. In such a case, we need to be careful to omit the
    directory itself from the list of paths (see commit 404ebceda01c
    ("dir: also check directories for matching pathspecs",
    2019-09-17)).

Part of the tension noted above is that the treatment of a directory
can change based on the files within it, and based on the various
settings in dir->flags. Trying to keep this in mind while reading over
the code, it is easy to think in terms of "treat_directory() tells us
what to do with a directory, and read_directory_recursive() is the
thing that recurses". Since we need to look into a directory to know
how to treat it, though, it is quite easy to decide to (also) recurse
into the directory from treat_directory() by adding a
read_directory_recursive() call. Adding such a call is actually fine,
IF we make sure that read_directory_recursive() does not also recurse
into that same directory.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the code,
meaning we'd have two calls to read_directory_recursive() for an
untracked directory. So, if we had a file named

    one/two/three/four/five/somefile.txt

and nothing in one/ was tracked, then 'git status --ignored' would
call read_directory_recursive() twice on the directory 'one/', and
each of those would call read_directory_recursive() twice on the
directory 'one/two/', and so on until read_directory_recursive() was
called 2^5 times for 'one/two/three/four/five/'.

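The doubling described above can be modelled with a small shell
sketch. It is only an illustration, not git code: visits_at_depth is a
hypothetical helper, and it assumes that every untracked directory
level is entered exactly twice by its parent, as in the
one/two/three/four/five example.

```shell
#!/bin/sh
# Hypothetical model of the pre-fix behavior (not code from dir.c):
# every untracked directory level is entered twice by its parent, so
# the number of visits to the directory at a given depth doubles with
# each additional level of nesting.
visits_at_depth() {
	depth=$1
	visits=1
	level=0
	while [ "$level" -lt "$depth" ]; do
		visits=$((visits * 2))
		level=$((level + 1))
	done
	echo "$visits"
}

# one/ is depth 1; one/two/three/four/five/ is depth 5.
for d in 1 2 3 4 5; do
	echo "depth $d: $(visits_at_depth "$d") visits"
done
```

Under that assumption the last line printed is "depth 5: 32 visits",
which is the 2^5 figure quoted for 'one/two/three/four/five/'.
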
Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this over
time. While trying to unravel it, I noticed several instances where
the first call to read_directory_recursive() would return e.g.
path_untracked for some directory and a later one would return e.g.
path_none, despite the fact that the directory clearly should have
been considered untracked. The code happened to work due to the
side-effect from the first invocation of adding untracked entries to
dir->entries; this allowed it to get the correct output despite the
supposed override in return value by the later call.

I am somewhat concerned that there are still bugs and maybe even
testcases with the wrong expectation. I have tried to carefully
document treat_directory() since it becomes more complex after this
change (though much of this complexity came from elsewhere that
probably deserved better comments to begin with). However, much of my
work felt more like a game of whack-a-mole while attempting to make
the code match the existing regression tests than an attempt to create
an implementation that matched some clear design. That seems wrong to
me, but the rules of existing behavior had so many special cases that
I had a hard time coming up with some overarching rules about what
correct behavior is for all cases, forcing me to hope that the
regression tests are correct and sufficient.
Such a hope seems likely to be ill-founded, given my experience with
dir.c-related testcases in the last few months:

  Examples where the documentation was hard to parse or even just wrong:
   * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
                   -n/--dry-run, 2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)

  Examples where testcases were declared wrong and changed:
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
                   leading directories", 2019-12-10)

  Examples where testcases were clearly inadequate:
   * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
                   ignored file breakage, 2019-08-25)
   * 7541cc530239 (t7300: add testcases showing failure to clean specified
                   pathspecs, 2019-09-17)
   * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
                   2019-09-17)
   * 404ebceda01c (dir: also check directories for matching pathspecs,
                   2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
                   2019-12-10)
   * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)

  Examples where "correct behavior" was unclear to everyone:
    https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/

  Other commits of note:
   * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)

However, on the positive side, it does make the code much faster.
For the following simple shell loop in an empty repository:

    for depth in $(seq 10 25)
    do
      dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
      rm -rf dir
      mkdir -p $dirs
      >$dirs/untracked-file
      /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
    done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

For the above run, using strace I can look for the number of untracked
directories opened and can verify that it matches the expected
2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).

After this fix, with strace I can verify that the number of untracked
directories that are opened drops to just $depth, and the timings all
drop to 0.00. In fact, it isn't until a depth of 190 nested
directories that it sometimes starts reporting a time of 0.01 seconds
and doesn't consistently report 0.01 seconds until there are 240
nested directories. The previous code would have taken

    17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS

to have completed the 240 nested directories case. It's not often
that you get to speed something up by a factor of 3*10^69.

WARNING: This change breaks t7063. I don't know whether that is to be
expected (I now intentionally visit untracked directories differently
so naturally the untracked cache should change), or if I've broken
something. I'm hoping to get an untracked cache expert to chime in...

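The directory-open count and the 240-level extrapolation quoted in this
message can be re-derived with a short sketch. It assumes, as the
message does, that each extra level of nesting doubles the work;
total_opens and years_for_depth are hypothetical helper names, and awk
is used only because plain shell arithmetic cannot hold 2^220.

```shell
#!/bin/sh
# Illustrative arithmetic only; the helper names are hypothetical.

# Untracked-directory opens under the old code for a chain of $1 nested
# directories: 2^1 + 2^2 + ... + 2^depth, which equals 2^(depth+1) - 2.
total_opens() {
	depth=$1
	total=0
	term=1
	level=0
	while [ "$level" -lt "$depth" ]; do
		term=$((term * 2))
		total=$((total + term))
		level=$((level + 1))
	done
	echo "$total"
}

# Extrapolated runtime in years for depth $1, doubling the measured
# 17.55 seconds at depth 20 once per additional level.
years_for_depth() {
	awk -v depth="$1" 'BEGIN {
		seconds = 17.55 * 2 ^ (depth - 20)
		printf "%.1e\n", seconds / (60 * 60 * 24 * 365)
	}'
}

total_opens 10
total_opens 25
years_for_depth 240
```

Run as written, this should print 2046, 67108862 and 9.4e+59, the last
of which agrees with the 9.4 * 10^59 years figure above.
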
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 151 ++++++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 105 insertions(+), 46 deletions(-) diff --git a/dir.c b/dir.c index 6867356a31..9816fa31d9 100644 --- a/dir.c +++ b/dir.c @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { - int nested_repo = 0; + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. + */ + enum path_treatment state; + int nested_repo = 0, old_ignored_nr, stop_early; /* The "len-1" is to strip the final '/' */ enum exist_status status = directory_exists_in_index(istate, dirname, len-1); @@ -1711,18 +1717,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + /* + * We only need to recurse into untracked/ignored directories if + * either of the following bits is set: + * - DIR_SHOW_IGNORED_TOO (because then we need to determine if + * there are ignored directories below) + * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if + * the directory is empty) + */ + if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) return excluded ? path_excluded : path_untracked; + /* + * If we only want to determine if dirname is empty, then we can + * stop at the first file we find underneath that directory rather + * than continuing to recurse beyond it. If DIR_SHOW_IGNORED_TOO + * is set, then we want MORE than just determining if dirname is + * empty. + */ + stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && + !(dir->flags & DIR_SHOW_IGNORED_TOO)); + + /* + * If /every/ file within an untracked directory is ignored, then + * we want to treat the directory as ignored (for e.g. 
status + * --porcelain), without listing the individual ignored files + * underneath. To do so, we'll save the current ignored_nr, and + * pop all the ones added after it if it turns out the entire + * directory is ignored. + */ + old_ignored_nr = dir->ignored_nr; + + /* Actually recurse into dirname now, we'll fixup the state later. */ untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + state = read_directory_recursive(dir, istate, dirname, len, untracked, + stop_early, stop_early, pathspec); + + /* There are a variety of reasons we may need to fixup the state... */ + if (state == path_excluded) { + int i; + + /* + * When stop_early is set, read_directory_recursive() will + * never return path_untracked regardless of whether + * underlying paths were untracked or ignored (because + * returning early means it excluded some paths, or + * something like that -- see commit 5aaa7fd39aaf ("Improve + * performance of git status --ignored", 2017-09-18)). + * However, we're not really concerned with the status of + * files under the directory, we just wanted to know + * whether the directory was empty (state == path_none) or + * not (state == path_excluded), and if not, we'd return + * our original status based on whether the untracked + * directory matched an exclusion pattern. + */ + if (stop_early) + state = excluded ? path_excluded : path_untracked; + + else { + /* + * When + * !stop_early && state == path_excluded + * then all paths under dirname were ignored. For + * this case, git status --porcelain wants to just + * list the directory itself as ignored and not + * list the individual paths underneath. Remove + * the individual paths underneath. + */ + for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i) + free(dir->ignored[i]); + dir->ignored_nr = old_ignored_nr; + } + } /* - * If this is an excluded directory, then we only need to check if - * the directory contains any files. 
+ * If there is nothing under the current directory and we are not + * hiding empty directories, then we need to report on the + * untracked or ignored status of the directory itself. */ - return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, excluded, pathspec); + if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + state = excluded ? path_excluded : path_untracked; + + /* + * We can recurse into untracked directories that don't match any + * of the given pathspecs when some file underneath the directory + * might match one of the pathspecs. If so, we should make sure + * to note that the directory itself did not match. + */ + if (pathspec && + !match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL, + 0 /* do NOT special case dirs */)) + state = path_none; + + return state; } /* @@ -1870,6 +1959,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. + */ strbuf_setlen(path, baselen); if (!cdir->ucd) { strbuf_addstr(path, cdir->file); @@ -2175,14 +2269,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, int stop_at_first_file, const struct pathspec *pathspec) { /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in treat_leading_path(). See the commit message for the - * commit adding this warning as well as the commit preceding it - * for details. + * WARNING: Do NOT recurse unless path_recurse is returned from + * treat_path(). Recursing on any other return value + * can result in exponential slowdown. 
*/ - struct cached_dir cdir; enum path_treatment state, subdir_state, dir_state = path_none; struct strbuf path = STRBUF_INIT; @@ -2204,13 +2294,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, dir_state = state; /* recurse into subdir if instructed by treat_path */ - if ((state == path_recurse) || - ((state == path_untracked) && - (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) && - ((dir->flags & DIR_SHOW_IGNORED_TOO) || - (pathspec && - do_match_pathspec(istate, pathspec, path.buf, path.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) { + if (state == path_recurse) { struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, @@ -2294,15 +2378,6 @@ static int treat_leading_path(struct dir_struct *dir, const char *path, int len, const struct pathspec *pathspec) { - /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in read_directory_recursive(). See 777b420347 (dir: - * synchronize treat_leading_path() and read_directory_recursive(), - * 2019-12-19) and its parent commit for details. 
- */ - struct strbuf sb = STRBUF_INIT; struct strbuf subdir = STRBUF_INIT; int prevlen, baselen; @@ -2353,23 +2428,7 @@ static int treat_leading_path(struct dir_struct *dir, strbuf_reset(&subdir); strbuf_add(&subdir, path+prevlen, baselen-prevlen); cdir.d_name = subdir.buf; - state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, - pathspec); - if (state == path_untracked && - resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR && - (dir->flags & DIR_SHOW_IGNORED_TOO || - do_match_pathspec(istate, pathspec, sb.buf, sb.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) { - if (!match_pathspec(istate, pathspec, sb.buf, sb.len, - 0 /* prefix */, NULL, - 0 /* do NOT special case dirs */)) - state = path_none; - add_path_to_appropriate_result_list(dir, NULL, &cdir, - istate, - &sb, baselen, - pathspec, state); - state = path_recurse; - } + state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec); if (state != path_recurse) break; /* do not recurse into it */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v2 6/6] t7063: blindly accept diffs
From: Elijah Newren via GitGitGadget @ 2020-01-31 18:31 UTC
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Assuming that the changes I made in the last commit to drastically
modify how and when and especially how frequently untracked paths are
visited should result in changes to the untracked-cache, this commit
simply updates the t7063 testcases to match what the code now reports.

If this is correct, this commit should be squashed into the previous
one.

It'd be nice if I could get an untracked-cache expert to comment on
this...

Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t7063-status-untracked-cache.sh | 50 ++++++++++++------------------- 1 file changed, 19 insertions(+), 31 deletions(-) diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index 190ae149cf..c1b0fd0540 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -85,9 +85,7 @@ dtwo/ three /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -three /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_expect_success 'status first time (empty cache)' ' @@ -140,8 +138,6 @@ test_expect_success 'modify in root directory, one dir invalidation' ' A done/one A one A two -?? dthree/ -?? dtwo/ ?? four ?? three EOF @@ -164,15 +160,11 @@ core.excludesfile 0000000000000000000000000000000000000000 exclude_per_dir .gitignore flags 00000006 / 0000000000000000000000000000000000000000 recurse valid -dthree/ -dtwo/ four three /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -three /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -217,9 +209,7 @@ dtwo/ three /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -three /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -235,6 +225,7 @@ A done/one A one A two ?? .gitignore +?? dthree/ ?? 
dtwo/ EOF test_cmp ../status.expect ../actual && @@ -256,11 +247,11 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore +dthree/ dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -277,7 +268,6 @@ flags 00000006 /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -290,7 +280,6 @@ test_expect_success 'status after the move' ' A done/one A one ?? .gitignore -?? dtwo/ ?? two EOF test_cmp ../status.expect ../actual && @@ -312,12 +301,10 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ two /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -334,7 +321,6 @@ flags 00000006 /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -348,7 +334,6 @@ A done/one A one A two ?? .gitignore -?? 
dtwo/ EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && @@ -369,11 +354,9 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -392,7 +375,6 @@ test_expect_success 'status after commit' ' git status --porcelain >../actual && cat >../status.expect <<EOF && ?? .gitignore -?? dtwo/ EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && @@ -413,11 +395,9 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -451,7 +431,6 @@ test_expect_success 'test sparse status with untracked cache' ' M done/two ?? .gitignore ?? done/five -?? dtwo/ EOF test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && @@ -472,12 +451,10 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -491,7 +468,6 @@ test_expect_success 'test sparse status again with untracked cache' ' M done/two ?? .gitignore ?? done/five -?? 
dtwo/ EOF test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && @@ -519,7 +495,6 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' ?? .gitignore ?? done/five ?? done/sub/ -?? dtwo/ EOF test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && @@ -540,17 +515,13 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore -dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five sub/ /done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid -sub/ /done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid -file /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect-from-test-dump ../actual ' @@ -615,6 +586,23 @@ test_expect_success 'setting core.untrackedCache to true and using git status cr test_cmp ../expect-no-uc ../actual && git status && test-tool dump-untracked-cache >../actual && + cat >../expect-from-test-dump <<EOF && +info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 +core.excludesfile 0000000000000000000000000000000000000000 +exclude_per_dir .gitignore +flags 00000006 +/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid +.gitignore +dthree/ +dtwo/ +/done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid +five +sub/ +/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid +/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid +/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +EOF test_cmp ../expect-from-test-dump ../actual ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive()
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren

Sorry for the really long delay on this series; I think I finally
figured out the untracked cache stuff, though it comes with bad news:
existing users of untracked cache are dealing with a buggy
implementation that may be causing git commands that list untracked
files to omit expected paths from the output. This series fixes it,
though with a big hammer (partial disabling of the cache; see the
final commit in the series).

Also, as before, this series provides some "modest" speedups (see last
commit message), and should allow 'git status --ignored' to complete
in a more reasonable timeframe for Martin Melka (see
https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
).

Changes since v2:

  * Added a patch at the beginning which highlights how the untracked
    cache has been broken from the beginning. Using it will result in
    other commands giving erroneous output. At least, before this
    series it did.

  * Added another patch at the beginning of the series to fix a simple
    comment typo.

  * Dropped the final patch of the previous series, and instead
    squashed in a fix for the untracked cache problems to what is now
    the final patch of the series.
I would have liked to have made that a separate commit earlier in the series, but the fix depended on both disabling the check_only portion of the cache and the avoid-exponential-visiting. If I move the partial disabling earlier, nothing is fixed and stuff is still visited. If I move the partial disabling later, then I have to temporarily mark lots of extra tests with test_expect_failure. It's only three extra one-line changes to dir.c, which you can probably spot in the range-diff. Stuff still missing from v3: * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/ which I think would make the code cleaner & clearer. I guess I'm leaving that for future work. As per the commit message of the final patch, this series has some risk. Extra eyes would be greatly appreciated. Also, we should probably merge it early in some cycle, either this one or a later one. Derrick Stolee (1): dir: refactor treat_directory to clarify control flow Elijah Newren (6): t7063: correct broken test expectation dir: fix simple typo in comment dir: consolidate treat_path() and treat_one_path() dir: fix broken comment dir: fix confusion based on variable tense dir: replace exponential algorithm with a linear one, fix untracked cache dir.c | 339 +++++++++++++++++------------- t/t7063-status-untracked-cache.sh | 79 +++---- 2 files changed, 220 insertions(+), 198 deletions(-) base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v3 Pull-Request: https://github.com/git/git/pull/700 Range-diff vs v2: 6: 9a3f20656e2 ! 
1: d4fe5d33577 t7063: blindly accept diffs @@ -1,17 +1,61 @@ Author: Elijah Newren <newren@gmail.com> - t7063: blindly accept diffs + t7063: correct broken test expectation - Assuming that the changes I made in the last commit to drastically - modify how and when and especially how frequently untracked paths are - visited should result in changes to the untracked-cache, this commit - simply updates the t7063 testcases to match what the code now reports. + The untracked cache is caching wrong information, resulting in commands + like `git status --porcelain` producing erroneous answers. The tests in + t7063 actually have a wide enough test to catch a relevant case, in + particular surrounding the directory 'dthree/', but it appears the + answers were not checked quite closely enough and the tests were coded + with the wrong expectation. Once the wrong info got into the cache in + an early test, since later tests built on it, many others have a wrong + expectation as well. This affects just over a third of the tests in + t7063. - If this is correct, this commit should be squashed into the previous - one. + The error can be seen starting at t7063.12 (the first one switched from + expect_success to expect_failure in this patch). That test runs in a + directory with the following files present: + done/one + dthree/three + dtwo/two + four + .gitignore + one + three + two - It'd be nice if I could get an untracked-cache expert to comment on - this... + Of those files, the following files are tracked: + done/one + one + two + + and the contents of .gitignore are: + four + + and the contents of .git/info/exclude are: + three + + And there is no core.excludesfile. Therefore, the following should be + untracked: + .gitignore + dthree/ + dtwo/ + Indeed, these three paths are reported if you run + git ls-files -o --directory --exclude-standard + within this directory. However, 'git status --porcelain' was reporting + for this test: + A done/one + A one + A two + ?? 
.gitignore + ?? dtwo/ + which was clearly wrong -- dthree/ should also be listed as untracked. + This appears to have been broken since the test was introduced with + commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08). + Correct the test to expect the right output, marking the test as failed + for now. Make the same change throughout the remainder of the testsuite + to reflect that dthree/ remains an untracked directory throughout and + should be recognized as such. Signed-off-by: Elijah Newren <newren@gmail.com> @@ -19,50 +63,14 @@ --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ - three - /done/ 0000000000000000000000000000000000000000 recurse valid - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --three - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - - test_expect_success 'status first time (empty cache)' ' -@@ - A done/one - A one - A two --?? dthree/ --?? dtwo/ - ?? four - ?? three - EOF -@@ - exclude_per_dir .gitignore - flags 00000006 - / 0000000000000000000000000000000000000000 recurse valid --dthree/ --dtwo/ - four - three - /done/ 0000000000000000000000000000000000000000 recurse valid - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --three - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' -@@ - three - /done/ 0000000000000000000000000000000000000000 recurse valid - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --three - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF test_cmp ../expect ../actual ' + +-test_expect_success 'new info/exclude invalidates everything' ' ++test_expect_failure 'new info/exclude invalidates everything' ' + avoid_racy && + echo three >>.git/info/exclude && + : >../trace && @@ A one A two @@ -71,6 +79,15 @@ ?? 
dtwo/ EOF test_cmp ../status.expect ../actual && +@@ + test_cmp ../trace.expect ../trace + ' + +-test_expect_success 'verify untracked cache dump' ' ++test_expect_failure 'verify untracked cache dump' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid @@ -79,164 +96,246 @@ dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' @@ - /done/ 0000000000000000000000000000000000000000 recurse valid - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF test_cmp ../expect ../actual ' + +-test_expect_success 'status after the move' ' ++test_expect_failure 'status after the move' ' + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && @@ A done/one A one ?? .gitignore --?? dtwo/ ++?? dthree/ + ?? dtwo/ ?? 
two EOF - test_cmp ../status.expect ../actual && +@@ + test_cmp ../trace.expect ../trace + ' + +-test_expect_success 'verify untracked cache dump' ' ++test_expect_failure 'verify untracked cache dump' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore --dtwo/ ++dthree/ + dtwo/ two /done/ 0000000000000000000000000000000000000000 recurse valid - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' @@ - /done/ 0000000000000000000000000000000000000000 recurse valid - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF test_cmp ../expect ../actual ' + +-test_expect_success 'status after the move' ' ++test_expect_failure 'status after the move' ' + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && @@ A one A two ?? .gitignore --?? dtwo/ ++?? dthree/ + ?? 
dtwo/ EOF test_cmp ../status.expect ../actual && - cat >../trace.expect <<EOF && +@@ + test_cmp ../trace.expect ../trace + ' + +-test_expect_success 'verify untracked cache dump' ' ++test_expect_failure 'verify untracked cache dump' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore --dtwo/ ++dthree/ + dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' @@ + git commit -m "first commit" + ' + +-test_expect_success 'status after commit' ' ++test_expect_failure 'status after commit' ' + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && cat >../status.expect <<EOF && ?? .gitignore --?? dtwo/ ++?? dthree/ + ?? 
dtwo/ EOF test_cmp ../status.expect ../actual && - cat >../trace.expect <<EOF && +@@ + test_cmp ../trace.expect ../trace + ' + +-test_expect_success 'untracked cache correct after commit' ' ++test_expect_failure 'untracked cache correct after commit' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore --dtwo/ ++dthree/ + dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual +@@ + sync_mtime ' + +-test_expect_success 'test sparse status with untracked cache' ' ++test_expect_failure 'test sparse status with untracked cache' ' + : >../trace && + avoid_racy && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ M done/two ?? .gitignore ?? done/five --?? dtwo/ ++?? dthree/ + ?? 
dtwo/ EOF test_cmp ../status.expect ../status.actual && - cat >../trace.expect <<EOF && +@@ + test_cmp ../trace.expect ../trace + ' + +-test_expect_success 'untracked cache correct after status' ' ++test_expect_failure 'untracked cache correct after status' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore --dtwo/ ++dthree/ + dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF +@@ test_cmp ../expect ../actual ' + +-test_expect_success 'test sparse status again with untracked cache' ' ++test_expect_failure 'test sparse status again with untracked cache' ' + avoid_racy && + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ M done/two ?? .gitignore ?? done/five --?? dtwo/ ++?? dthree/ + ?? dtwo/ EOF test_cmp ../status.expect ../status.actual && - cat >../trace.expect <<EOF && +@@ + echo "sub" > done/sub/sub/file + ' + +-test_expect_success 'test sparse status with untracked cache and subdir' ' ++test_expect_failure 'test sparse status with untracked cache and subdir' ' + avoid_racy && + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ ?? .gitignore ?? done/five ?? done/sub/ --?? dtwo/ ++?? dthree/ + ?? 
dtwo/ EOF test_cmp ../status.expect ../status.actual && - cat >../trace.expect <<EOF && +@@ + test_cmp ../trace.expect ../trace + ' + +-test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' ++test_expect_failure 'verify untracked cache dump (sparse/subdirs)' ' + test-tool dump-untracked-cache >../actual && + cat >../expect-from-test-dump <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore --dtwo/ ++dthree/ + dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five - sub/ - /done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid --sub/ - /done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid --file - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid - /dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF +@@ test_cmp ../expect-from-test-dump ../actual ' + +-test_expect_success 'test sparse status again with untracked cache and subdir' ' ++test_expect_failure 'test sparse status again with untracked cache and subdir' ' + avoid_racy && + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ - test_cmp ../expect-no-uc ../actual && - git status && + test_cmp ../trace.expect ../trace + ' + +-test_expect_success 'move entry in subdir from untracked to cached' ' ++test_expect_failure 'move entry in subdir from untracked to cached' ' + git add dtwo/two && + git status --porcelain >../status.actual && + cat >../status.expect <<EOF && +@@ + ?? .gitignore + ?? done/five + ?? done/sub/ ++?? dthree/ + EOF + test_cmp ../status.expect ../status.actual + ' + +-test_expect_success 'move entry in subdir from cached to untracked' ' ++test_expect_failure 'move entry in subdir from cached to untracked' ' + git rm --cached dtwo/two && + git status --porcelain >../status.actual && + cat >../status.expect <<EOF && +@@ + ?? .gitignore + ?? 
done/five + ?? done/sub/ ++?? dthree/ + ?? dtwo/ + EOF + test_cmp ../status.expect ../status.actual +@@ + test_cmp ../expect-no-uc ../actual + ' + +-test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' ' ++test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' ' + git config core.untrackedCache true && test-tool dump-untracked-cache >../actual && -+ cat >../expect-from-test-dump <<EOF && -+info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -+core.excludesfile 0000000000000000000000000000000000000000 -+exclude_per_dir .gitignore -+flags 00000006 -+/ e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid -+.gitignore -+dthree/ -+dtwo/ -+/done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid -+five -+sub/ -+/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid -+/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid -+/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -+/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -+EOF - test_cmp ../expect-from-test-dump ../actual + test_cmp ../expect-no-uc ../actual && +@@ + test_cmp ../expect-empty ../actual ' +-test_expect_success 'setting core.untrackedCache to keep' ' ++test_expect_failure 'setting core.untrackedCache to keep' ' + git config core.untrackedCache keep && + git update-index --untracked-cache && + test-tool dump-untracked-cache >../actual && -: ----------- > 2: b20bc3b9afd dir: fix simple typo in comment 1: 27bc1357964 = 3: fa9035949e0 dir: consolidate treat_path() and treat_one_path() 2: 2ceb64ae61e = 4: 02e652d1869 dir: fix broken comment 3: e6d21228d12 = 5: 705c008d993 dir: fix confusion based on variable tense 4: f73f0d66d14 = 6: f5d69102946 dir: refactor treat_directory to clarify control flow 5: d3136ef52f3 ! 
7: 6cfca619e2c dir: replace exponential algorithm with a linear one @@ -1,6 +1,6 @@ Author: Elijah Newren <newren@gmail.com> - dir: replace exponential algorithm with a linear one + dir: replace exponential algorithm with a linear one, fix untracked cache dir's read_directory_recursive() naturally operates recursively in order to walk the directory tree. Treating of directories is sometimes weird @@ -161,10 +161,27 @@ to have completed the 240 nested directories case. It's not often that you get to speed something up by a factor of 3*10^69. - WARNING: This change breaks t7063. I don't know whether that is to be expected - (I now intentionally visit untracked directories differently so naturally the - untracked cache should change), or if I've broken something. I'm hoping to get - an untracked cache expert to chime in... + Finally, this also fixes the untracked cache, as noted by the test fixes + in t7063. Unfortunately, it does so by passing stop_at_first_file to + close_cached_dir() in order to disable the caching of whether + directories were empty (this caching was only relevant for directories + that we knew we didn't need to walk all the entries under but just + needed to know whether the directory had any entries within it in order + to know if the directory itself should be marked as path_none or + path_untracked). I'm not convinced that disabling the is-the-dir-empty + check is necessary; there is probably some way to still cache that and + not get erroneous results. However, I have not figured out how to do + so. If I revert the change to close_cached_dir() in this patch (thus + continuing to cache cases where stop_at_first_file is true meaning we + continue to cache whether directories are empty), then the untracked + cache breakage in t7063 becomes more prevalent. 
With my change to + close_cached_dir() and the other changes to avoid traversing directories + 2^n times in this patch, I not only avoid making the untracked_cache + breakage in t7063 worse but actually fix the existing breakage. Update + the test results in t7063 to no longer expect check_only cache entries, + to reflect that we have to do a bit more work in terms of how many + directories we have to open, and to reflect that we fixed the 1/3 of + tests that were broken in that testsuite. Signed-off-by: Elijah Newren <newren@gmail.com> @@ -305,6 +322,24 @@ strbuf_setlen(path, baselen); if (!cdir->ucd) { strbuf_addstr(path, cdir->file); +@@ + return -1; + } + +-static void close_cached_dir(struct cached_dir *cdir) ++static void close_cached_dir(struct cached_dir *cdir, int stop_at_first_file) + { + if (cdir->fdir) + closedir(cdir->fdir); +@@ + * We have gone through this directory and found no untracked + * entries. Mark it valid. + */ +- if (cdir->untracked) { ++ if (!stop_at_first_file && cdir->untracked) { + cdir->untracked->valid = 1; + cdir->untracked->recurse = 1; + } @@ int stop_at_first_file, const struct pathspec *pathspec) { @@ -338,6 +373,15 @@ struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, +@@ + istate, &path, baselen, + pathspec, state); + } +- close_cached_dir(&cdir); ++ close_cached_dir(&cdir, stop_at_first_file); + out: + strbuf_release(&path); + @@ const char *path, int len, const struct pathspec *pathspec) @@ -379,3 +423,342 @@ if (state != path_recurse) break; /* do not recurse into it */ + + diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh + --- a/t/t7063-status-untracked-cache.sh + +++ b/t/t7063-status-untracked-cache.sh +@@ + dtwo/ + three + /done/ 0000000000000000000000000000000000000000 recurse valid +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-three +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only 
valid +-two + EOF + + test_expect_success 'status first time (empty cache)' ' +@@ + EOF + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && +-node creation: 0 ++node creation: 2 + gitignore invalidation: 0 + directory invalidation: 1 +-opendir: 1 ++opendir: 3 + EOF + test_cmp ../trace.expect ../trace + +@@ + four + three + /done/ 0000000000000000000000000000000000000000 recurse valid +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-three +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +-two + EOF + test_cmp ../expect ../actual + ' +@@ + EOF + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && +-node creation: 0 ++node creation: 2 + gitignore invalidation: 1 + directory invalidation: 1 + opendir: 4 +@@ + dtwo/ + three + /done/ 0000000000000000000000000000000000000000 recurse valid +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-three +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +-two + EOF + test_cmp ../expect ../actual + ' + +-test_expect_failure 'new info/exclude invalidates everything' ' ++test_expect_success 'new info/exclude invalidates everything' ' + avoid_racy && + echo three >>.git/info/exclude && + : >../trace && +@@ + EOF + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && +-node creation: 0 ++node creation: 2 + gitignore invalidation: 1 + directory invalidation: 0 + opendir: 4 +@@ + test_cmp ../trace.expect ../trace + ' + +-test_expect_failure 'verify untracked cache dump' ' ++test_expect_success 'verify untracked cache dump' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 +@@ + dthree/ + dtwo/ + /done/ 0000000000000000000000000000000000000000 recurse valid +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-/dtwo/ 0000000000000000000000000000000000000000 recurse 
check_only valid +-two + EOF + test_cmp ../expect ../actual + ' +@@ + flags 00000006 + / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse + /done/ 0000000000000000000000000000000000000000 recurse valid +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +-two + EOF + test_cmp ../expect ../actual + ' + +-test_expect_failure 'status after the move' ' ++test_expect_success 'status after the move' ' + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && +@@ + EOF + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && +-node creation: 0 ++node creation: 2 + gitignore invalidation: 0 + directory invalidation: 0 +-opendir: 1 ++opendir: 3 + EOF + test_cmp ../trace.expect ../trace + ' + +-test_expect_failure 'verify untracked cache dump' ' ++test_expect_success 'verify untracked cache dump' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 +@@ + dtwo/ + two + /done/ 0000000000000000000000000000000000000000 recurse valid +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +-two + EOF + test_cmp ../expect ../actual + ' +@@ + flags 00000006 + / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse + /done/ 0000000000000000000000000000000000000000 recurse valid +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +-two + EOF + test_cmp ../expect ../actual + ' + +-test_expect_failure 'status after the move' ' ++test_expect_success 'status after the move' ' + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && +@@ + EOF + test_cmp ../status.expect ../actual && + cat >../trace.expect 
<<EOF && +-node creation: 0 ++node creation: 2 + gitignore invalidation: 0 + directory invalidation: 0 +-opendir: 1 ++opendir: 3 + EOF + test_cmp ../trace.expect ../trace + ' + +-test_expect_failure 'verify untracked cache dump' ' ++test_expect_success 'verify untracked cache dump' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 +@@ + dthree/ + dtwo/ + /done/ 0000000000000000000000000000000000000000 recurse valid +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +-two + EOF + test_cmp ../expect ../actual + ' +@@ + git commit -m "first commit" + ' + +-test_expect_failure 'status after commit' ' ++test_expect_success 'status after commit' ' + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && +@@ + EOF + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && +-node creation: 0 ++node creation: 2 + gitignore invalidation: 0 + directory invalidation: 0 +-opendir: 2 ++opendir: 4 + EOF + test_cmp ../trace.expect ../trace + ' + +-test_expect_failure 'untracked cache correct after commit' ' ++test_expect_success 'untracked cache correct after commit' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 +@@ + dthree/ + dtwo/ + /done/ 0000000000000000000000000000000000000000 recurse valid +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +-two + EOF + test_cmp ../expect ../actual + ' +@@ + sync_mtime + ' + +-test_expect_failure 'test sparse status with untracked cache' ' ++test_expect_success 'test sparse status with untracked cache' ' + : >../trace && + avoid_racy && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ +@@ + EOF + test_cmp 
../status.expect ../status.actual && + cat >../trace.expect <<EOF && +-node creation: 0 ++node creation: 2 + gitignore invalidation: 1 + directory invalidation: 2 +-opendir: 2 ++opendir: 4 + EOF + test_cmp ../trace.expect ../trace + ' + +-test_expect_failure 'untracked cache correct after status' ' ++test_expect_success 'untracked cache correct after status' ' + test-tool dump-untracked-cache >../actual && + cat >../expect <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 +@@ + dtwo/ + /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid + five +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid +-two + EOF + test_cmp ../expect ../actual + ' + +-test_expect_failure 'test sparse status again with untracked cache' ' ++test_expect_success 'test sparse status again with untracked cache' ' + avoid_racy && + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ +@@ + echo "sub" > done/sub/sub/file + ' + +-test_expect_failure 'test sparse status with untracked cache and subdir' ' ++test_expect_success 'test sparse status with untracked cache and subdir' ' + avoid_racy && + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ +@@ + test_cmp ../trace.expect ../trace + ' + +-test_expect_failure 'verify untracked cache dump (sparse/subdirs)' ' ++test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' + test-tool dump-untracked-cache >../actual && + cat >../expect-from-test-dump <<EOF && + info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 +@@ + /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid + five + sub/ +-/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid +-sub/ +-/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid +-file +-/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid +-/dtwo/ 
0000000000000000000000000000000000000000 recurse check_only valid +-two + EOF + test_cmp ../expect-from-test-dump ../actual + ' + +-test_expect_failure 'test sparse status again with untracked cache and subdir' ' ++test_expect_success 'test sparse status again with untracked cache and subdir' ' + avoid_racy && + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ +@@ + test_cmp ../trace.expect ../trace + ' + +-test_expect_failure 'move entry in subdir from untracked to cached' ' ++test_expect_success 'move entry in subdir from untracked to cached' ' + git add dtwo/two && + git status --porcelain >../status.actual && + cat >../status.expect <<EOF && +@@ + test_cmp ../status.expect ../status.actual + ' + +-test_expect_failure 'move entry in subdir from cached to untracked' ' ++test_expect_success 'move entry in subdir from cached to untracked' ' + git rm --cached dtwo/two && + git status --porcelain >../status.actual && + cat >../status.expect <<EOF && +@@ + test_cmp ../expect-no-uc ../actual + ' + +-test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' ' ++test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' ' + git config core.untrackedCache true && + test-tool dump-untracked-cache >../actual && + test_cmp ../expect-no-uc ../actual && +@@ + test_cmp ../expect-empty ../actual + ' + +-test_expect_failure 'setting core.untrackedCache to keep' ' ++test_expect_success 'setting core.untrackedCache to keep' ' + git config core.untrackedCache keep && + git update-index --untracked-cache && + test-tool dump-untracked-cache >../actual && -- gitgitgadget ^ permalink raw reply [flat|nested] 76+ messages in thread
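The directory layout that the commit message of patch 1/7 describes for t7063.12 can be recreated outside the test suite. What follows is a minimal sketch and not part of the series: it assumes a `git` binary on PATH, works in a throwaway directory from `mktemp -d`, and points core.excludesFile at /dev/null so that no personal ignore file interferes. The variable names `repo` and `untracked` are illustrative only. It prints the paths that `git ls-files -o --directory --exclude-standard` treats as untracked, which the commit message states are .gitignore, dthree/ and dtwo/.

```shell
#!/bin/sh
set -e

# Throwaway repository; the location is arbitrary.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Files present in the worktree, as listed in the commit message.
mkdir done dthree dtwo
: >done/one
: >dthree/three
: >dtwo/two
: >four
: >one
: >three
: >two

# .gitignore ignores "four"; .git/info/exclude ignores "three".
echo four >.gitignore
mkdir -p .git/info
echo three >>.git/info/exclude

# Only these three paths are tracked.
git add done/one one two

# List untracked paths, collapsing untracked directories to a single entry.
untracked=$(git -c core.excludesFile=/dev/null \
	ls-files -o --directory --exclude-standard)
printf '%s\n' "$untracked"
```

Running `git status --porcelain` in the same directory afterwards shows what status reports for this layout on the git version at hand.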
* [PATCH v3 1/7] t7063: correct broken test expectation 2020-03-25 19:31 ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget @ 2020-03-25 19:31 ` Elijah Newren via GitGitGadget 2020-03-26 13:02 ` Derrick Stolee 2020-03-25 19:31 ` [PATCH v3 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget ` (6 subsequent siblings) 7 siblings, 1 reply; 76+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
To: git
Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The untracked cache is caching wrong information, resulting in commands like `git status --porcelain` producing erroneous answers. The tests in t7063 actually have a wide enough test to catch a relevant case, in particular surrounding the directory 'dthree/', but it appears the answers were not checked quite closely enough and the tests were coded with the wrong expectation. Once the wrong info got into the cache in an early test, since later tests built on it, many others have a wrong expectation as well. This affects just over a third of the tests in t7063.

The error can be seen starting at t7063.12 (the first one switched from expect_success to expect_failure in this patch). That test runs in a directory with the following files present:
  done/one
  dthree/three
  dtwo/two
  four
  .gitignore
  one
  three
  two

Of those files, the following files are tracked:
  done/one
  one
  two

and the contents of .gitignore are:
  four

and the contents of .git/info/exclude are:
  three

And there is no core.excludesfile. Therefore, the following should be untracked:
  .gitignore
  dthree/
  dtwo/
Indeed, these three paths are reported if you run
  git ls-files -o --directory --exclude-standard
within this directory. However, 'git status --porcelain' was reporting for this test:
  A  done/one
  A  one
  A  two
  ?? .gitignore
  ?? dtwo/
which was clearly wrong -- dthree/ should also be listed as untracked. This appears to have been broken since the test was introduced with commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08).

Correct the test to expect the right output, marking the test as failed for now. Make the same change throughout the remainder of the testsuite to reflect that dthree/ remains an untracked directory throughout and should be recognized as such.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7063-status-untracked-cache.sh | 51 ++++++++++++++++++++-----------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index 190ae149cf3..41705ec1526 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -224,7 +224,7 @@ EOF test_cmp ../expect ../actual ' -test_expect_success 'new info/exclude invalidates everything' ' +test_expect_failure 'new info/exclude invalidates everything' ' avoid_racy && echo three >>.git/info/exclude && : >../trace && @@ -235,6 +235,7 @@ A done/one A one A two ?? .gitignore +?? dthree/ ?? 
dtwo/ EOF test_cmp ../status.expect ../actual && @@ -247,7 +248,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_success 'verify untracked cache dump' ' +test_expect_failure 'verify untracked cache dump' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -256,6 +257,7 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore +dthree/ dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid @@ -282,7 +284,7 @@ EOF test_cmp ../expect ../actual ' -test_expect_success 'status after the move' ' +test_expect_failure 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && @@ -290,6 +292,7 @@ test_expect_success 'status after the move' ' A done/one A one ?? .gitignore +?? dthree/ ?? dtwo/ ?? two EOF @@ -303,7 +306,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_success 'verify untracked cache dump' ' +test_expect_failure 'verify untracked cache dump' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -312,6 +315,7 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore +dthree/ dtwo/ two /done/ 0000000000000000000000000000000000000000 recurse valid @@ -339,7 +343,7 @@ EOF test_cmp ../expect ../actual ' -test_expect_success 'status after the move' ' +test_expect_failure 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && @@ -348,6 +352,7 @@ A done/one A one A two ?? .gitignore +?? dthree/ ?? 
dtwo/ EOF test_cmp ../status.expect ../actual && @@ -360,7 +365,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_success 'verify untracked cache dump' ' +test_expect_failure 'verify untracked cache dump' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -369,6 +374,7 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore +dthree/ dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid @@ -386,12 +392,13 @@ test_expect_success 'set up for sparse checkout testing' ' git commit -m "first commit" ' -test_expect_success 'status after commit' ' +test_expect_failure 'status after commit' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && cat >../status.expect <<EOF && ?? .gitignore +?? dthree/ ?? dtwo/ EOF test_cmp ../status.expect ../actual && @@ -404,7 +411,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_success 'untracked cache correct after commit' ' +test_expect_failure 'untracked cache correct after commit' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -413,6 +420,7 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore +dthree/ dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid @@ -442,7 +450,7 @@ test_expect_success 'create/modify files, some of which are gitignored' ' sync_mtime ' -test_expect_success 'test sparse status with untracked cache' ' +test_expect_failure 'test sparse status with untracked cache' ' : >../trace && avoid_racy && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ -451,6 +459,7 @@ test_expect_success 'test sparse 
status with untracked cache' ' M done/two ?? .gitignore ?? done/five +?? dthree/ ?? dtwo/ EOF test_cmp ../status.expect ../status.actual && @@ -463,7 +472,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_success 'untracked cache correct after status' ' +test_expect_failure 'untracked cache correct after status' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -472,6 +481,7 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore +dthree/ dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five @@ -482,7 +492,7 @@ EOF test_cmp ../expect ../actual ' -test_expect_success 'test sparse status again with untracked cache' ' +test_expect_failure 'test sparse status again with untracked cache' ' avoid_racy && : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ -491,6 +501,7 @@ test_expect_success 'test sparse status again with untracked cache' ' M done/two ?? .gitignore ?? done/five +?? dthree/ ?? dtwo/ EOF test_cmp ../status.expect ../status.actual && @@ -509,7 +520,7 @@ test_expect_success 'set up for test of subdir and sparse checkouts' ' echo "sub" > done/sub/sub/file ' -test_expect_success 'test sparse status with untracked cache and subdir' ' +test_expect_failure 'test sparse status with untracked cache and subdir' ' avoid_racy && : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ -519,6 +530,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' ?? .gitignore ?? done/five ?? done/sub/ +?? dthree/ ?? 
dtwo/ EOF test_cmp ../status.expect ../status.actual && @@ -531,7 +543,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' +test_expect_failure 'verify untracked cache dump (sparse/subdirs)' ' test-tool dump-untracked-cache >../actual && cat >../expect-from-test-dump <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -540,6 +552,7 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid .gitignore +dthree/ dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five @@ -555,7 +568,7 @@ EOF test_cmp ../expect-from-test-dump ../actual ' -test_expect_success 'test sparse status again with untracked cache and subdir' ' +test_expect_failure 'test sparse status again with untracked cache and subdir' ' avoid_racy && : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ -570,7 +583,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_success 'move entry in subdir from untracked to cached' ' +test_expect_failure 'move entry in subdir from untracked to cached' ' git add dtwo/two && git status --porcelain >../status.actual && cat >../status.expect <<EOF && @@ -579,11 +592,12 @@ A dtwo/two ?? .gitignore ?? done/five ?? done/sub/ +?? dthree/ EOF test_cmp ../status.expect ../status.actual ' -test_expect_success 'move entry in subdir from cached to untracked' ' +test_expect_failure 'move entry in subdir from cached to untracked' ' git rm --cached dtwo/two && git status --porcelain >../status.actual && cat >../status.expect <<EOF && @@ -591,6 +605,7 @@ test_expect_success 'move entry in subdir from cached to untracked' ' ?? .gitignore ?? done/five ?? done/sub/ +?? dthree/ ?? 
dtwo/ EOF test_cmp ../status.expect ../status.actual @@ -609,7 +624,7 @@ test_expect_success 'git status does not change anything' ' test_cmp ../expect-no-uc ../actual ' -test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' ' +test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' ' git config core.untrackedCache true && test-tool dump-untracked-cache >../actual && test_cmp ../expect-no-uc ../actual && @@ -642,7 +657,7 @@ test_expect_success 'using --untracked-cache does not fail when core.untrackedCa test_cmp ../expect-empty ../actual ' -test_expect_success 'setting core.untrackedCache to keep' ' +test_expect_failure 'setting core.untrackedCache to keep' ' git config core.untrackedCache keep && git update-index --untracked-cache && test-tool dump-untracked-cache >../actual && -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* Re: [PATCH v3 1/7] t7063: correct broken test expectation
  2020-03-25 19:31   ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget
@ 2020-03-26 13:02     ` Derrick Stolee
  2020-03-26 21:18       ` Elijah Newren
  0 siblings, 1 reply; 76+ messages in thread
From: Derrick Stolee @ 2020-03-26 13:02 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
	Nguyễn Thái Ngọc Duy, Elijah Newren

On 3/25/2020 3:31 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> The untracked cache is caching wrong information, resulting in commands
> like `git status --porcelain` producing erroneous answers. The tests in
> t7063 actually have a wide enough test to catch a relevant case, in
> particular surrounding the directory 'dthree/', but it appears the
> answers were not checked quite closely enough and the tests were coded
> with the wrong expectation. Once the wrong info got into the cache in
> an early test, since later tests built on it, many others have a wrong
> expectation as well. This affects just over a third of the tests in
> t7063.

Wow. Good find.

> The error can be seen starting at t7063.12 (the first one switched from
> expect_success to expect_failure in this patch). That test runs in a
> directory with the following files present:
>   done/one
>   dthree/three
>   dtwo/two
>   four
>   .gitignore
>   one
>   three
>   two
>
> Of those files, the following files are tracked:
>   done/one
>   one
>   two
>
> and the contents of .gitignore are:
>   four
>
> and the contents of .git/info/exclude are:
>   three
>
> And there is no core.excludesfile. Therefore, the following should be
> untracked:
>   .gitignore
>   dthree/
>   dtwo/
> Indeed, these three paths are reported if you run
>   git ls-files -o --directory --exclude-standard
> within this directory. However, 'git status --porcelain' was reporting
> for this test:
>   A done/one
>   A one
>   A two
>   ?? .gitignore
>   ?? dtwo/
> which was clearly wrong -- dthree/ should also be listed as untracked.
> This appears to have been broken since the test was introduced with
> commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08).
> Correct the test to expect the right output, marking the test as failed
> for now. Make the same change throughout the remainder of the testsuite
> to reflect that dthree/ remains an untracked directory throughout and
> should be recognized as such.

I wonder if we could simultaneously verify these "expected" results match
using another command without the untracked cache? It's good that we have
the expected outputs explicitly, but perhaps double-checking the command
with `-c core.untrackedCache=false` would help us know these are the correct
expected outputs?

Thanks,
-Stolee

^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v3 1/7] t7063: correct broken test expectation
  2020-03-26 13:02     ` Derrick Stolee
@ 2020-03-26 21:18       ` Elijah Newren
  0 siblings, 0 replies; 76+ messages in thread
From: Elijah Newren @ 2020-03-26 21:18 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka,
	SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy

On Thu, Mar 26, 2020 at 6:02 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/25/2020 3:31 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > The untracked cache is caching wrong information, resulting in commands
> > like `git status --porcelain` producing erroneous answers. The tests in
> > t7063 actually have a wide enough test to catch a relevant case, in
> > particular surrounding the directory 'dthree/', but it appears the
> > answers were not checked quite closely enough and the tests were coded
> > with the wrong expectation. Once the wrong info got into the cache in
> > an early test, since later tests built on it, many others have a wrong
> > expectation as well. This affects just over a third of the tests in
> > t7063.
>
> Wow. Good find.

or maybe not...

> > The error can be seen starting at t7063.12 (the first one switched from
> > expect_success to expect_failure in this patch). That test runs in a
> > directory with the following files present:
> >   done/one
> >   dthree/three
> >   dtwo/two
> >   four
> >   .gitignore
> >   one
> >   three
> >   two
> >
> > Of those files, the following files are tracked:
> >   done/one
> >   one
> >   two
> >
> > and the contents of .gitignore are:
> >   four
> >
> > and the contents of .git/info/exclude are:
> >   three
> >
> > And there is no core.excludesfile. Therefore, the following should be
> > untracked:
> >   .gitignore
> >   dthree/
> >   dtwo/
> > Indeed, these three paths are reported if you run
> >   git ls-files -o --directory --exclude-standard
> > within this directory. However, 'git status --porcelain' was reporting
> > for this test:
> >   A done/one
> >   A one
> >   A two
> >   ?? .gitignore
> >   ?? dtwo/
> > which was clearly wrong -- dthree/ should also be listed as untracked.
> > This appears to have been broken since the test was introduced with
> > commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08).
> > Correct the test to expect the right output, marking the test as failed
> > for now. Make the same change throughout the remainder of the testsuite
> > to reflect that dthree/ remains an untracked directory throughout and
> > should be recognized as such.
>
> I wonder if we could simultaneously verify these "expected" results match
> using another command without the untracked cache? It's good that we have
> the expected outputs explicitly, but perhaps double-checking the command
> with `-c core.untrackedCache=false` would help us know these are the correct
> expected outputs?

This was an *awesome* idea, even if the implementation doesn't quite
work. It turns out that -c core.untrackedCache=false does not
instruct status to ignore the untracked cache, it instructs status to
delete it. Since we had subsequent tests that depended on the
untrackedCache created in previous tests, this would break a number of
tests. But I can introduce a helper to workaround that:

    # Ignore_Untracked_Cache, abbreviated to 3 letters because then people can
    # compare commands side-by-side, e.g.
    #    iuc status --porcelain >expect &&
    #    git status --porcelain >actual &&
    #    test_cmp expect actual
    iuc() {
            git ls-files -s >../current-index-entries
            git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries

            GIT_INDEX_FILE=.git/tmp_index
            export GIT_INDEX_FILE
            git update-index --index-info <../current-index-entries
            git update-index --skip-worktree $(cat ../current-sparse-entries)

            git -c core.untrackedCache=false "$@"
            ret=$?

            rm ../current-index-entries
            rm $GIT_INDEX_FILE
            unset GIT_INDEX_FILE

            return $ret
    }

Doing that helped me discover that the test didn't have a wrong
expectation; I did. When a directory that is not tracked is filled
entirely with files that are ignored, then status --porcelain treats
the directory itself as ignored...and thus doesn't display it. (`git
status --porcelain --ignored` will show it). I had seen that
somewhere, but hadn't fully understood the check_only and
stop_at_first_file pieces related to it.

Anyway, with this helpful hint:
  * I can say that there was not a bug in the untracked cache (at
    least not any that I'm aware of)
  * I can update my first patch to do more thorough checking instead
    of changing the expectation
  * I found the bug in my final patch that had been evading me
  * I added a huge comment explaining check_only and
    stop_at_first_file, how they're used, and what they mean for the
    future reader
  * I also no longer need to partially disable the untracked cache in
    my changes.

New patches incoming...

^ permalink raw reply [flat|nested] 76+ messages in thread
* [PATCH v3 2/7] dir: fix simple typo in comment 2020-03-25 19:31 ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget @ 2020-03-25 19:31 ` Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget ` (5 subsequent siblings) 7 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index b460211e614..b505ba747bb 100644 --- a/dir.c +++ b/dir.c @@ -2174,7 +2174,7 @@ static void add_path_to_appropriate_result_list(struct dir_struct *dir, * If 'stop_at_first_file' is specified, 'path_excluded' is returned * to signal that a file was found. This is the least significant value that * indicates that a file was encountered that does not depend on the order of - * whether an untracked or exluded path was encountered first. + * whether an untracked or excluded path was encountered first. * * Returns the most significant path_treatment value encountered in the scan. * If 'stop_at_first_file' is specified, `path_excluded` is the most -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() 2020-03-25 19:31 ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget @ 2020-03-25 19:31 ` Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 4/7] dir: fix broken comment Elijah Newren via GitGitGadget ` (4 subsequent siblings) 7 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Commit 16e2cfa90993 ("read_directory(): further split treat_path()", 2010-01-08) split treat_one_path() out of treat_path(), because treat_leading_path() would not have access to a dirent but wanted to re-use as much of treat_path() as possible. Not re-using all of treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir: fix checks on common prefix directory", 2019-12-19). Finally, in commit ad6f2157f951 ("dir: restructure in a way to avoid passing around a struct dirent", 2020-01-16), dirents were removed from treat_path() and other functions entirely. Since the only reason for splitting these functions was the lack of a dirent -- which no longer applies to either function -- and since the split caused problems in the past resulting in us not using treat_one_path() separately anymore, just undo the split. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 121 ++++++++++++++++++++++++++-------------------------------- 1 file changed, 55 insertions(+), 66 deletions(-) diff --git a/dir.c b/dir.c index b505ba747bb..d0f3d660850 100644 --- a/dir.c +++ b/dir.c @@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate, return dtype; } -static enum path_treatment treat_one_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec, - int dtype) -{ - int exclude; - int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case); +static enum path_treatment treat_path_fast(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + strbuf_setlen(path, baselen); + if (!cdir->ucd) { + strbuf_addstr(path, cdir->file); + return path_untracked; + } + strbuf_addstr(path, cdir->ucd->name); + /* treat_one_path() does this before it calls treat_directory() */ + strbuf_complete(path, '/'); + if (cdir->ucd->check_only) + /* + * check_only is set as a result of treat_directory() getting + * to its bottom. Verify again the same set of directories + * with check_only set. + */ + return read_directory_recursive(dir, istate, path->buf, path->len, + cdir->ucd, 1, 0, pathspec); + /* + * We get path_recurse in the first run when + * directory_exists_in_index() returns index_nonexistent. We + * are sure that new changes in the index does not impact the + * outcome. Return now. 
+ */ + return path_recurse; +} + +static enum path_treatment treat_path(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + int has_path_in_index, dtype, exclude; enum path_treatment path_treatment; - dtype = resolve_dtype(dtype, istate, path->buf, path->len); + if (!cdir->d_name) + return treat_path_fast(dir, untracked, cdir, istate, path, + baselen, pathspec); + if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) + return path_none; + strbuf_setlen(path, baselen); + strbuf_addstr(path, cdir->d_name); + if (simplify_away(path->buf, path->len, pathspec)) + return path_none; + + dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len); /* Always exclude indexed files */ + has_path_in_index = !!index_file_exists(istate, path->buf, path->len, + ignore_case); if (dtype != DT_DIR && has_path_in_index) return path_none; @@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, } } -static enum path_treatment treat_path_fast(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - strbuf_setlen(path, baselen); - if (!cdir->ucd) { - strbuf_addstr(path, cdir->file); - return path_untracked; - } - strbuf_addstr(path, cdir->ucd->name); - /* treat_one_path() does this before it calls treat_directory() */ - strbuf_complete(path, '/'); - if (cdir->ucd->check_only) - /* - * check_only is set as a result of treat_directory() getting - * to its bottom. Verify again the same set of directories - * with check_only set. - */ - return read_directory_recursive(dir, istate, path->buf, path->len, - cdir->ucd, 1, 0, pathspec); - /* - * We get path_recurse in the first run when - * directory_exists_in_index() returns index_nonexistent. 
We - * are sure that new changes in the index does not impact the - * outcome. Return now. - */ - return path_recurse; -} - -static enum path_treatment treat_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - if (!cdir->d_name) - return treat_path_fast(dir, untracked, cdir, istate, path, - baselen, pathspec); - if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) - return path_none; - strbuf_setlen(path, baselen); - strbuf_addstr(path, cdir->d_name); - if (simplify_away(path->buf, path->len, pathspec)) - return path_none; - - return treat_one_path(dir, untracked, istate, path, baselen, pathspec, - cdir->d_type); -} - static void add_untracked(struct untracked_cache_dir *dir, const char *name) { if (!dir) -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 4/7] dir: fix broken comment 2020-03-25 19:31 ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2020-03-25 19:31 ` [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget @ 2020-03-25 19:31 ` Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget ` (3 subsequent siblings) 7 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index d0f3d660850..3a367683661 100644 --- a/dir.c +++ b/dir.c @@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, add_untracked(untracked, path.buf + baselen); break; } - /* skip the dir_add_* part */ + /* skip the add_path_to_appropriate_result_list() */ continue; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 5/7] dir: fix confusion based on variable tense 2020-03-25 19:31 ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2020-03-25 19:31 ` [PATCH v3 4/7] dir: fix broken comment Elijah Newren via GitGitGadget @ 2020-03-25 19:31 ` Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget ` (2 subsequent siblings) 7 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Despite having contributed several fixes in this area, I have for months (years?) assumed that the "exclude" variable was a directive; this caused me to think of it as a different mode we operate in and left me confused as I tried to build up a mental model around why we'd need such a directive. I mostly tried to ignore it while focusing on the pieces I was trying to understand. Then I finally traced this variable all back to a call to is_excluded(), meaning it was actually functioning as an adjective. In particular, it was a checked property ("Does this path match a rule in .gitignore?"), rather than a mode passed in from the caller. Change the variable name to match the part of speech used by the function called to define it, which will hopefully make these bits of code slightly clearer to the next reader. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/dir.c b/dir.c index 3a367683661..8074e651e6f 100644 --- a/dir.c +++ b/dir.c @@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, static enum path_treatment treat_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked, - const char *dirname, int len, int baselen, int exclude, + const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { int nested_repo = 0; @@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } if (nested_repo) return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (exclude ? path_excluded : path_untracked)); + (excluded ? path_excluded : path_untracked)); if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) break; - if (exclude && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { + if (excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { /* * This is an excluded directory and we are @@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); @@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * the directory contains any files. 
*/ return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, exclude, pathspec); + untracked, 1, excluded, pathspec); } /* @@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { - int has_path_in_index, dtype, exclude; + int has_path_in_index, dtype, excluded; enum path_treatment path_treatment; if (!cdir->d_name) @@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent)) return path_none; - exclude = is_excluded(dir, istate, path->buf, &dtype); + excluded = is_excluded(dir, istate, path->buf, &dtype); /* * Excluded? If we don't explicitly want to show * ignored files, ignore it */ - if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) + if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) return path_excluded; switch (dtype) { @@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, strbuf_addch(path, '/'); path_treatment = treat_directory(dir, istate, untracked, path->buf, path->len, - baselen, exclude, pathspec); + baselen, excluded, pathspec); /* * If 1) we only want to return directories that * match an exclude pattern and 2) this directory does @@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, * recurse into this directory (instead of marking the * directory itself as an ignored path). */ - if (!exclude && + if (!excluded && path_treatment == path_excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) @@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, return path_treatment; case DT_REG: case DT_LNK: - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; } } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow 2020-03-25 19:31 ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (4 preceding siblings ...) 2020-03-25 19:31 ` [PATCH v3 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget @ 2020-03-25 19:31 ` Derrick Stolee via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 7 siblings, 0 replies; 76+ messages in thread From: Derrick Stolee via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Derrick Stolee From: Derrick Stolee <dstolee@microsoft.com> The logic in treat_directory() is handled by a multi-case switch statement, but this switch is very asymmetrical, as the first two cases are simple but the third is more complicated than the rest of the method. In fact, the third case includes a "break" statement that leads to the block of code outside the switch statement. That is the only way to reach that block, as the switch handles all possible values from directory_exists_in_index(); Extract the switch statement into a series of "if" statements. This simplifies the trivial cases, while clarifying how to reach the "show_other_directories" case. This is particularly important as the "show_other_directories" case will expand in a later change. 
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 35 +++++++++++++++++------------------ 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/dir.c b/dir.c index 8074e651e6f..d9bcb7e19b6 100644 --- a/dir.c +++ b/dir.c @@ -1660,29 +1660,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const struct pathspec *pathspec) { int nested_repo = 0; - /* The "len-1" is to strip the final '/' */ - switch (directory_exists_in_index(istate, dirname, len-1)) { - case index_directory: - return path_recurse; + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); - case index_gitdir: + if (status == index_directory) + return path_recurse; + if (status == index_gitdir) return path_none; + if (status != index_nonexistent) + BUG("Unhandled value for directory_exists_in_index: %d\n", status); - case index_nonexistent: - if ((dir->flags & DIR_SKIP_NESTED_GIT) || - !(dir->flags & DIR_NO_GITLINKS)) { - struct strbuf sb = STRBUF_INIT; - strbuf_addstr(&sb, dirname); - nested_repo = is_nonbare_repository_dir(&sb); - strbuf_release(&sb); - } - if (nested_repo) - return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (excluded ? path_excluded : path_untracked)); + if ((dir->flags & DIR_SKIP_NESTED_GIT) || + !(dir->flags & DIR_NO_GITLINKS)) { + struct strbuf sb = STRBUF_INIT; + strbuf_addstr(&sb, dirname); + nested_repo = is_nonbare_repository_dir(&sb); + strbuf_release(&sb); + } + if (nested_repo) + return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : + (excluded ? path_excluded : path_untracked)); - if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) - break; + if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) { if (excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache
  2020-03-25 19:31 ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2020-03-25 19:31   ` [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
@ 2020-03-25 19:31   ` Elijah Newren via GitGitGadget
  2020-03-26 13:13     ` Derrick Stolee
  2020-03-26 21:27   ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
  7 siblings, 1 reply; 76+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-25 19:31 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin,
      Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren,
      Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir's read_directory_recursive() naturally operates recursively in order
to walk the directory tree. Treating of directories is sometimes weird
because there are so many different permutations about how to handle
directories. Some examples:

  * 'git ls-files -o --directory' only needs to know that a directory
    itself is untracked; it doesn't need to recurse into it to see what
    is underneath.

  * 'git status' needs to recurse into an untracked directory, but only
    to determine whether or not it is empty. If there are no files
    underneath, the directory itself will be omitted from the output.
    If it is not empty, only the directory will be listed.

  * 'git status --ignored' needs to recurse into untracked directories
    and report all the ignored entries and then report the directory as
    untracked -- UNLESS all the entries under the directory are
    ignored, in which case we don't print any of the entries under the
    directory and just report the directory itself as ignored. (Note
    that although this forces us to walk all untracked files underneath
    the directory as well, we strip them from the output, except for
    users like 'git clean' who also set DIR_KEEP_UNTRACKED_CONTENTS.)

  * For 'git clean', we may need to recurse into a directory that
    doesn't match any specified pathspecs, if it's possible that there
    is an entry underneath the directory that can match one of the
    pathspecs. In such a case, we need to be careful to omit the
    directory itself from the list of paths (see commit 404ebceda01c
    ("dir: also check directories for matching pathspecs", 2019-09-17))

Part of the tension noted above is that the treatment of a directory can
change based on the files within it, and based on the various settings
in dir->flags. Trying to keep this in mind while reading over the code,
it is easy to think in terms of "treat_directory() tells us what to do
with a directory, and read_directory_recursive() is the thing that
recurses". Since we need to look into a directory to know how to treat
it, though, it is quite easy to decide to (also) recurse into the
directory from treat_directory() by adding a read_directory_recursive()
call. Adding such a call is actually fine, IF we make sure that
read_directory_recursive() does not also recurse into that same
directory.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the code,
meaning we'd have two calls to read_directory_recursive() for an
untracked directory. So, if we had a file named
   one/two/three/four/five/somefile.txt
and nothing in one/ was tracked, then 'git status --ignored' would
call read_directory_recursive() twice on the directory 'one/', and
each of those would call read_directory_recursive() twice on the
directory 'one/two/', and so on until read_directory_recursive() was
called 2^5 times for 'one/two/three/four/five/'.

Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this over
time. While trying to unravel it, I noticed several instances where the
first call to read_directory_recursive() would return e.g.
path_untracked for some directory and a later one would return e.g.
path_none, despite the fact that the directory clearly should have been
considered untracked. The code happened to work due to the side-effect
from the first invocation of adding untracked entries to dir->entries;
this allowed it to get the correct output despite the supposed override
in return value by the later call.

I am somewhat concerned that there are still bugs and maybe even
testcases with the wrong expectation. I have tried to carefully
document treat_directory() since it becomes more complex after this
change (though much of this complexity came from elsewhere that probably
deserved better comments to begin with). However, much of my work felt
more like a game of whackamole while attempting to make the code match
the existing regression tests than an attempt to create an
implementation that matched some clear design. That seems wrong to me,
but the rules of existing behavior had so many special cases that I had
a hard time coming up with some overarching rules about what correct
behavior is for all cases, forcing me to hope that the regression tests
are correct and sufficient.
Such a hope seems likely to be ill-founded,
given my experience with dir.c-related testcases in the last few months:

  Examples where the documentation was hard to parse or even just wrong:
   * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
                   -n/--dry-run, 2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
  Examples where testcases were declared wrong and changed:
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
                   leading directories", 2019-12-10)
  Examples where testcases were clearly inadequate:
   * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
                   ignored file breakage, 2019-08-25)
   * 7541cc530239 (t7300: add testcases showing failure to clean specified
                   pathspecs, 2019-09-17)
   * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
                   2019-09-17)
   * 404ebceda01c (dir: also check directories for matching pathspecs,
                   2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
                   2019-12-10)
   * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)
  Examples where "correct behavior" was unclear to everyone:
    https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
  Other commits of note:
   * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)

However, on the positive side, it does make the code much faster.
For the following simple shell loop in an empty repository:

  for depth in $(seq 10 25)
  do
    dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
    rm -rf dir
    mkdir -p $dirs
    >$dirs/untracked-file
    /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
  done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

For the above run, using strace I can look for the number of untracked
directories opened and can verify that it matches the expected
2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).

After this fix, with strace I can verify that the number of untracked
directories that are opened drops to just $depth, and the timings all
drop to 0.00. In fact, it isn't until a depth of 190 nested directories
that it sometimes starts reporting a time of 0.01 seconds and doesn't
consistently report 0.01 seconds until there are 240 nested directories.
The previous code would have taken
  17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
to have completed the 240 nested directories case. It's not often
that you get to speed something up by a factor of 3*10^69.

Finally, this also fixes the untracked cache, as noted by the test fixes
in t7063. Unfortunately, it does so by passing stop_at_first_file to
close_cached_dir() in order to disable the caching of whether
directories were empty (this caching was only relevant for directories
that we knew we didn't need to walk all the entries under but just
needed to know whether the directory had any entries within it in order
to know if the directory itself should be marked as path_none or
path_untracked). I'm not convinced that disabling the is-the-dir-empty
check is necessary; there is probably some way to still cache that and
not get erroneous results.
However, I have not figured out how to do so. If I revert the change to
close_cached_dir() in this patch (thus continuing to cache cases where
stop_at_first_file is true meaning we continue to cache whether
directories are empty), then the untracked cache breakage in t7063
becomes more prevalent. With my change to close_cached_dir() and the
other changes to avoid traversing directories 2^n times in this patch,
I not only avoid making the untracked_cache breakage in t7063 worse but
actually fix the existing breakage.

Update the test results in t7063 to no longer expect check_only cache
entries, to reflect that we have to do a bit more work in terms of how
many directories we have to open, and to reflect that we fixed the 1/3
of tests that were broken in that testsuite.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                             | 157 ++++++++++++++++++++----------
 t/t7063-status-untracked-cache.sh | 100 ++++++-------------
 2 files changed, 138 insertions(+), 119 deletions(-)

diff --git a/dir.c b/dir.c
index d9bcb7e19b6..803e2851964 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { - int nested_repo = 0; + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH.
+ */ + enum path_treatment state; + int nested_repo = 0, old_ignored_nr, stop_early; /* The "len-1" is to strip the final '/' */ enum exist_status status = directory_exists_in_index(istate, dirname, len-1); @@ -1711,18 +1717,101 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + /* + * We only need to recurse into untracked/ignored directories if + * either of the following bits is set: + * - DIR_SHOW_IGNORED_TOO (because then we need to determine if + * there are ignored directories below) + * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if + * the directory is empty) + */ + if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) return excluded ? path_excluded : path_untracked; + /* + * If we only want to determine if dirname is empty, then we can + * stop at the first file we find underneath that directory rather + * than continuing to recurse beyond it. If DIR_SHOW_IGNORED_TOO + * is set, then we want MORE than just determining if dirname is + * empty. + */ + stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && + !(dir->flags & DIR_SHOW_IGNORED_TOO)); + + /* + * If /every/ file within an untracked directory is ignored, then + * we want to treat the directory as ignored (for e.g. status + * --porcelain), without listing the individual ignored files + * underneath. To do so, we'll save the current ignored_nr, and + * pop all the ones added after it if it turns out the entire + * directory is ignored. + */ + old_ignored_nr = dir->ignored_nr; + + /* Actually recurse into dirname now, we'll fixup the state later. */ untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + state = read_directory_recursive(dir, istate, dirname, len, untracked, + stop_early, stop_early, pathspec); + + /* There are a variety of reasons we may need to fixup the state... 
*/ + if (state == path_excluded) { + int i; + + /* + * When stop_early is set, read_directory_recursive() will + * never return path_untracked regardless of whether + * underlying paths were untracked or ignored (because + * returning early means it excluded some paths, or + * something like that -- see commit 5aaa7fd39aaf ("Improve + * performance of git status --ignored", 2017-09-18)). + * However, we're not really concerned with the status of + * files under the directory, we just wanted to know + * whether the directory was empty (state == path_none) or + * not (state == path_excluded), and if not, we'd return + * our original status based on whether the untracked + * directory matched an exclusion pattern. + */ + if (stop_early) + state = excluded ? path_excluded : path_untracked; + + else { + /* + * When + * !stop_early && state == path_excluded + * then all paths under dirname were ignored. For + * this case, git status --porcelain wants to just + * list the directory itself as ignored and not + * list the individual paths underneath. Remove + * the individual paths underneath. + */ + for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i) + free(dir->ignored[i]); + dir->ignored_nr = old_ignored_nr; + } + } /* - * If this is an excluded directory, then we only need to check if - * the directory contains any files. + * If there is nothing under the current directory and we are not + * hiding empty directories, then we need to report on the + * untracked or ignored status of the directory itself. */ - return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, excluded, pathspec); + if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + state = excluded ? path_excluded : path_untracked; + + /* + * We can recurse into untracked directories that don't match any + * of the given pathspecs when some file underneath the directory + * might match one of the pathspecs. 
If so, we should make sure + * to note that the directory itself did not match. + */ + if (pathspec && + !match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL, + 0 /* do NOT special case dirs */)) + state = path_none; + + return state; } /* @@ -1870,6 +1959,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. + */ strbuf_setlen(path, baselen); if (!cdir->ucd) { strbuf_addstr(path, cdir->file); @@ -2102,7 +2196,7 @@ static int read_cached_dir(struct cached_dir *cdir) return -1; } -static void close_cached_dir(struct cached_dir *cdir) +static void close_cached_dir(struct cached_dir *cdir, int stop_at_first_file) { if (cdir->fdir) closedir(cdir->fdir); @@ -2110,7 +2204,7 @@ static void close_cached_dir(struct cached_dir *cdir) * We have gone through this directory and found no untracked * entries. Mark it valid. */ - if (cdir->untracked) { + if (!stop_at_first_file && cdir->untracked) { cdir->untracked->valid = 1; cdir->untracked->recurse = 1; } @@ -2175,14 +2269,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, int stop_at_first_file, const struct pathspec *pathspec) { /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in treat_leading_path(). See the commit message for the - * commit adding this warning as well as the commit preceding it - * for details. + * WARNING: Do NOT recurse unless path_recurse is returned from + * treat_path(). Recursing on any other return value + * can result in exponential slowdown. 
*/ - struct cached_dir cdir; enum path_treatment state, subdir_state, dir_state = path_none; struct strbuf path = STRBUF_INIT; @@ -2204,13 +2294,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, dir_state = state; /* recurse into subdir if instructed by treat_path */ - if ((state == path_recurse) || - ((state == path_untracked) && - (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) && - ((dir->flags & DIR_SHOW_IGNORED_TOO) || - (pathspec && - do_match_pathspec(istate, pathspec, path.buf, path.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) { + if (state == path_recurse) { struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, @@ -2266,7 +2350,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, istate, &path, baselen, pathspec, state); } - close_cached_dir(&cdir); + close_cached_dir(&cdir, stop_at_first_file); out: strbuf_release(&path); @@ -2294,15 +2378,6 @@ static int treat_leading_path(struct dir_struct *dir, const char *path, int len, const struct pathspec *pathspec) { - /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in read_directory_recursive(). See 777b420347 (dir: - * synchronize treat_leading_path() and read_directory_recursive(), - * 2019-12-19) and its parent commit for details. 
- */ - struct strbuf sb = STRBUF_INIT; struct strbuf subdir = STRBUF_INIT; int prevlen, baselen; @@ -2353,23 +2428,7 @@ static int treat_leading_path(struct dir_struct *dir, strbuf_reset(&subdir); strbuf_add(&subdir, path+prevlen, baselen-prevlen); cdir.d_name = subdir.buf; - state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, - pathspec); - if (state == path_untracked && - resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR && - (dir->flags & DIR_SHOW_IGNORED_TOO || - do_match_pathspec(istate, pathspec, sb.buf, sb.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) { - if (!match_pathspec(istate, pathspec, sb.buf, sb.len, - 0 /* prefix */, NULL, - 0 /* do NOT special case dirs */)) - state = path_none; - add_path_to_appropriate_result_list(dir, NULL, &cdir, - istate, - &sb, baselen, - pathspec, state); - state = path_recurse; - } + state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec); if (state != path_recurse) break; /* do not recurse into it */ diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index 41705ec1526..72b6877837b 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -84,10 +84,6 @@ dthree/ dtwo/ three /done/ 0000000000000000000000000000000000000000 recurse valid -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -three -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_expect_success 'status first time (empty cache)' ' @@ -147,10 +143,10 @@ A two EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && -node creation: 0 +node creation: 2 gitignore invalidation: 0 directory invalidation: 1 -opendir: 1 +opendir: 3 EOF test_cmp ../trace.expect ../trace @@ -169,10 +165,6 @@ dtwo/ four three /done/ 0000000000000000000000000000000000000000 recurse valid -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -three -/dtwo/ 
0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -194,7 +186,7 @@ A two EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && -node creation: 0 +node creation: 2 gitignore invalidation: 1 directory invalidation: 1 opendir: 4 @@ -216,15 +208,11 @@ dthree/ dtwo/ three /done/ 0000000000000000000000000000000000000000 recurse valid -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -three -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' -test_expect_failure 'new info/exclude invalidates everything' ' +test_expect_success 'new info/exclude invalidates everything' ' avoid_racy && echo three >>.git/info/exclude && : >../trace && @@ -240,7 +228,7 @@ A two EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && -node creation: 0 +node creation: 2 gitignore invalidation: 1 directory invalidation: 0 opendir: 4 @@ -248,7 +236,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_failure 'verify untracked cache dump' ' +test_expect_success 'verify untracked cache dump' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -260,9 +248,6 @@ flags 00000006 dthree/ dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -277,14 +262,11 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse /done/ 0000000000000000000000000000000000000000 recurse valid -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' -test_expect_failure 'status after the move' ' 
+test_expect_success 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && @@ -298,15 +280,15 @@ A one EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && -node creation: 0 +node creation: 2 gitignore invalidation: 0 directory invalidation: 0 -opendir: 1 +opendir: 3 EOF test_cmp ../trace.expect ../trace ' -test_expect_failure 'verify untracked cache dump' ' +test_expect_success 'verify untracked cache dump' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -319,9 +301,6 @@ dthree/ dtwo/ two /done/ 0000000000000000000000000000000000000000 recurse valid -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -336,14 +315,11 @@ exclude_per_dir .gitignore flags 00000006 / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse /done/ 0000000000000000000000000000000000000000 recurse valid -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' -test_expect_failure 'status after the move' ' +test_expect_success 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && @@ -357,15 +333,15 @@ A two EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && -node creation: 0 +node creation: 2 gitignore invalidation: 0 directory invalidation: 0 -opendir: 1 +opendir: 3 EOF test_cmp ../trace.expect ../trace ' -test_expect_failure 'verify untracked cache dump' ' +test_expect_success 'verify untracked cache dump' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -377,9 +353,6 @@ flags 
00000006 dthree/ dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -392,7 +365,7 @@ test_expect_success 'set up for sparse checkout testing' ' git commit -m "first commit" ' -test_expect_failure 'status after commit' ' +test_expect_success 'status after commit' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && @@ -403,15 +376,15 @@ test_expect_failure 'status after commit' ' EOF test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && -node creation: 0 +node creation: 2 gitignore invalidation: 0 directory invalidation: 0 -opendir: 2 +opendir: 4 EOF test_cmp ../trace.expect ../trace ' -test_expect_failure 'untracked cache correct after commit' ' +test_expect_success 'untracked cache correct after commit' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -423,9 +396,6 @@ flags 00000006 dthree/ dtwo/ /done/ 0000000000000000000000000000000000000000 recurse valid -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' @@ -450,7 +420,7 @@ test_expect_success 'create/modify files, some of which are gitignored' ' sync_mtime ' -test_expect_failure 'test sparse status with untracked cache' ' +test_expect_success 'test sparse status with untracked cache' ' : >../trace && avoid_racy && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ -464,15 +434,15 @@ test_expect_failure 'test sparse status with untracked cache' ' EOF test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && -node creation: 0 +node creation: 2 gitignore invalidation: 1 directory invalidation: 2 
-opendir: 2 +opendir: 4 EOF test_cmp ../trace.expect ../trace ' -test_expect_failure 'untracked cache correct after status' ' +test_expect_success 'untracked cache correct after status' ' test-tool dump-untracked-cache >../actual && cat >../expect <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -485,14 +455,11 @@ dthree/ dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect ../actual ' -test_expect_failure 'test sparse status again with untracked cache' ' +test_expect_success 'test sparse status again with untracked cache' ' avoid_racy && : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ -520,7 +487,7 @@ test_expect_success 'set up for test of subdir and sparse checkouts' ' echo "sub" > done/sub/sub/file ' -test_expect_failure 'test sparse status with untracked cache and subdir' ' +test_expect_success 'test sparse status with untracked cache and subdir' ' avoid_racy && : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ -543,7 +510,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_failure 'verify untracked cache dump (sparse/subdirs)' ' +test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' test-tool dump-untracked-cache >../actual && cat >../expect-from-test-dump <<EOF && info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 @@ -557,18 +524,11 @@ dtwo/ /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid five sub/ -/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid -sub/ -/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid -file -/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid -two EOF test_cmp ../expect-from-test-dump ../actual 
' -test_expect_failure 'test sparse status again with untracked cache and subdir' ' +test_expect_success 'test sparse status again with untracked cache and subdir' ' avoid_racy && : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ @@ -583,7 +543,7 @@ EOF test_cmp ../trace.expect ../trace ' -test_expect_failure 'move entry in subdir from untracked to cached' ' +test_expect_success 'move entry in subdir from untracked to cached' ' git add dtwo/two && git status --porcelain >../status.actual && cat >../status.expect <<EOF && @@ -597,7 +557,7 @@ EOF test_cmp ../status.expect ../status.actual ' -test_expect_failure 'move entry in subdir from cached to untracked' ' +test_expect_success 'move entry in subdir from cached to untracked' ' git rm --cached dtwo/two && git status --porcelain >../status.actual && cat >../status.expect <<EOF && @@ -624,7 +584,7 @@ test_expect_success 'git status does not change anything' ' test_cmp ../expect-no-uc ../actual ' -test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' ' +test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' ' git config core.untrackedCache true && test-tool dump-untracked-cache >../actual && test_cmp ../expect-no-uc ../actual && @@ -657,7 +617,7 @@ test_expect_success 'using --untracked-cache does not fail when core.untrackedCa test_cmp ../expect-empty ../actual ' -test_expect_failure 'setting core.untrackedCache to keep' ' +test_expect_success 'setting core.untrackedCache to keep' ' git config core.untrackedCache keep && git update-index --untracked-cache && test-tool dump-untracked-cache >../actual && -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* Re: [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache 2020-03-25 19:31 ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget @ 2020-03-26 13:13 ` Derrick Stolee 0 siblings, 0 replies; 76+ messages in thread From: Derrick Stolee @ 2020-03-26 13:13 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren On 3/25/2020 3:31 PM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > dir's read_directory_recursive() naturally operates recursively in order > to walk the directory tree. Treating of directories is sometimes weird > because there are so many different permutations about how to handle > directories. Some examples: > > * 'git ls-files -o --directory' only needs to know that a directory > itself is untracked; it doesn't need to recurse into it to see what > is underneath. > > * 'git status' needs to recurse into an untracked directory, but only > to determine whether or not it is empty. If there are no files > underneath, the directory itself will be omitted from the output. > If it is not empty, only the directory will be listed. > > * 'git status --ignored' needs to recurse into untracked directories > and report all the ignored entries and then report the directory as > untracked -- UNLESS all the entries under the directory are > ignored, in which case we don't print any of the entries under the > directory and just report the directory itself as ignored. (Note > that although this forces us to walk all untracked files underneath > the directory as well, we strip them from the output, except for > users like 'git clean' who also set DIR_KEEP_TRACKED_CONTENTS.) 
> > * For 'git clean', we may need to recurse into a directory that > doesn't match any specified pathspecs, if it's possible that there > is an entry underneath the directory that can match one of the > pathspecs. In such a case, we need to be careful to omit the > directory itself from the list of paths (see commit 404ebceda01c > ("dir: also check directories for matching pathspecs", 2019-09-17)) > > Part of the tension noted above is that the treatment of a directory can > change based on the files within it, and based on the various settings > in dir->flags. Trying to keep this in mind while reading over the code, > it is easy to think in terms of "treat_directory() tells us what to do > with a directory, and read_directory_recursive() is the thing that > recurses". Since we need to look into a directory to know how to treat > it, though, it is quite easy to decide to (also) recurse into the > directory from treat_directory() by adding a read_directory_recursive() > call. Adding such a call is actually fine, IF we make sure that > read_directory_recursive() does not also recurse into that same > directory. > > Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs > for ignored files", 2017-05-18), added exactly such a case to the code, > meaning we'd have two calls to read_directory_recursive() for an > untracked directory. So, if we had a file named > one/two/three/four/five/somefile.txt > and nothing in one/ was tracked, then 'git status --ignored' would > call read_directory_recursive() twice on the directory 'one/', and > each of those would call read_directory_recursive() twice on the > directory 'one/two/', and so on until read_directory_recursive() was > called 2^5 times for 'one/two/three/four/five/'. > > Avoid calling read_directory_recursive() twice per level by moving a > lot of the special logic into treat_directory(). > > Since dir.c is somewhat complex, extra cruft built up around this over > time. 
While trying to unravel it, I noticed several instances where the > first call to read_directory_recursive() would return e.g. > path_untracked for some directory and a later one would return e.g. > path_none, despite the fact that the directory clearly should have been > considered untracked. The code happened to work due to the side-effect > from the first invocation of adding untracked entries to dir->entries; > this allowed it to get the correct output despite the supposed override > in return value by the later call. > > I am somewhat concerned that there are still bugs and maybe even > testcases with the wrong expectation. For my part, I recently set up draft PRs to test the 'next' branch in Scalar [1] and VFS for Git [2]. I'll create a Git installer using these patches as well so I can run our functional test suite for a little extra check of the behavior here. [1] https://github.com/microsoft/scalar/pull/354/files [2] https://github.com/microsoft/VFSForGit/pull/1645 > I have tried to carefully > document treat_directory() since it becomes more complex after this > change (though much of this complexity came from elsewhere that probably > deserved better comments to begin with). I do enjoy your warning comments. > However, much of my work felt > more like a game of whackamole while attempting to make the code match > the existing regression tests than an attempt to create an > implementation that matched some clear design. That seems wrong to me, > but the rules of existing behavior had so many special cases that I had > a hard time coming up with some overarching rules about what correct > behavior is for all cases, forcing me to hope that the regression tests > are correct and sufficient. 
Such a hope seems likely to be ill-founded, > given my experience with dir.c-related testcases in the last few months: > > Examples where the documentation was hard to parse or even just wrong: > * 3aca58045f4f (git-clean.txt: do not claim we will delete files with > -n/--dry-run, 2019-09-17) > * 09487f2cbad3 (clean: avoid removing untracked files in a nested git > repository, 2019-09-17) > * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17) > Examples where testcases were declared wrong and changed: > * 09487f2cbad3 (clean: avoid removing untracked files in a nested git > repository, 2019-09-17) > * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17) > * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within > leading directories", 2019-12-10) > Examples where testcases were clearly inadequate: > * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an > ignored file breakage, 2019-08-25) > * 7541cc530239 (t7300: add testcases showing failure to clean specified > pathspecs, 2019-09-17) > * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item, > 2019-09-17) > * 404ebceda01c (dir: also check directories for matching pathspecs, > 2019-09-17) > * 09487f2cbad3 (clean: avoid removing untracked files in a nested git > repository, 2019-09-17) > * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17) > * 452efd11fbf6 (t3011: demonstrate directory traversal failures, > 2019-12-10) > * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19) > Examples where "correct behavior" was unclear to everyone: > https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/ > Other commits of note: > * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17) Thanks for all of these pointers. These will be helpful if we ever do find a regression that bisects to this patch. > However, on the positive side, it does make the code much faster. 
For > the following simple shell loop in an empty repository: > > for depth in $(seq 10 25) > do > dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done) > rm -rf dir > mkdir -p $dirs > >$dirs/untracked-file > /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null > done > > I saw the following timings, in seconds (note that the numbers are a > little noisy from run-to-run, but the trend is very clear with every > run): > > 10: 0.03 > 11: 0.05 > 12: 0.08 > 13: 0.19 > 14: 0.29 > 15: 0.50 > 16: 1.05 > 17: 2.11 > 18: 4.11 > 19: 8.60 > 20: 17.55 > 21: 33.87 > 22: 68.71 > 23: 140.05 > 24: 274.45 > 25: 551.15 These are still impressive. > For the above run, using strace I can look for the number of untracked > directories opened and can verify that it matches the expected > 2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth). > > After this fix, with strace I can verify that the number of untracked > directories that are opened drops to just $depth, and the timings all > drop to 0.00. In fact, it isn't until a depth of 190 nested directories > that it sometimes starts reporting a time of 0.01 seconds and doesn't > consistently report 0.01 seconds until there are 240 nested directories. > The previous code would have taken > 17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS > to have completed the 240 nested directories case. It's not often > that you get to speed something up by a factor of 3*10^69. > > Finally, this also fixes the untracked cache, as noted by the test fixes > in t7063. Unfortunately, it does so by passing stop_at_first_file to > close_cached_dir() in order to disable the caching of whether > directories were empty (this caching was only relevant for directories > that we knew we didn't need to walk all the entries under but just > needed to know whether the directory had any entries within it in order > to know if the directory itself should be marked as path_none or > path_untracked). 
I'm not convinced that disabling the is-the-dir-empty > check is necessary; there is probably some way to still cache that and > not get erroneous results. However, I have not figured out how to do > so. If I revert the change to close_cached_dir() in this patch (thus > continuing to cache cases where stop_at_first_file is true meaning we > continue to cache whether directories are empty), then the untracked > cache breakage in t7063 becomes more prevalent. With my change to > close_cached_dir() and the other changes to avoid traversing directories > 2^n times in this patch, I not only avoid making the untracked_cache > breakage in t7063 worse but actually fix the existing breakage. Update > the test results in t7063 to no longer expect check_only cache entries, > to reflect that we have to do a bit more work in terms of how many > directories we have to open, and to reflect that we fixed the 1/3 of > tests that were broken in that testsuite. We use the untracked cache in Scalar, so we should get some coverage of that, too. I'll let you know when the tests are done, and then do a review. -Stolee ^ permalink raw reply [flat|nested] 76+ messages in thread
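The opendir count quoted above (2^1 + 2^2 + ... + 2^$depth = 2^($depth+1)-2) and the extrapolation to 240 nested directories can be reproduced with a few lines of shell. This is only a rough sketch and not part of the patch; the function names opendirs_by_sum and opendirs_closed_form are made up for illustration, and the awk line assumes, as the quoted commit message does, that each extra level doubles the 17.55 seconds measured at depth 20.

```shell
#!/bin/sh
# Hypothetical helpers, not from the patch series.

# Expected opendir() calls on untracked directories before the fix:
# 2^1 + 2^2 + ... + 2^n, summed term by term.
opendirs_by_sum() {
	n=$1 total=0 term=1 i=1
	while [ "$i" -le "$n" ]
	do
		term=$((term * 2))
		total=$((total + term))
		i=$((i + 1))
	done
	echo "$total"
}

# Closed form of the same geometric sum: 2^(n+1) - 2.
opendirs_closed_form() {
	echo $(( (1 << ($1 + 1)) - 2 ))
}

# Print both values side by side for a few of the benchmarked depths.
for depth in 10 15 20 25
do
	echo "$depth: $(opendirs_by_sum "$depth") $(opendirs_closed_form "$depth")"
done

# Depth 240 is 220 doublings past depth 20, converted from seconds to years.
awk 'BEGIN { printf "%.1e years\n", 17.55 * 2^220 / (60*60*24*365) }'
```

After the fix the corresponding count is just $depth, so the two columns printed by the loop are the "before" numbers only.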
* [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() 2020-03-25 19:31 ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (6 preceding siblings ...) 2020-03-25 19:31 ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget @ 2020-03-26 21:27 ` Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget ` (8 more replies) 7 siblings, 9 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren This series provides some "modest" speedups (see last commit message), and should allow 'git status --ignored' to complete in a more reasonable timeframe for Martin Melka (see https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/ ). Changes since v3: * Turns out I was wrong about the untracked cache stuff and had some bugs around untracked directories with nothing but ignored sub-entries. * First patch now is no longer a change of expectation of the untracked cache, but some more thorough testing/verification in that test that helped explain my misunderstanding and uncover the bug in my refactor. * Corrected the check_only and stop_at_first_file logic in the last patch and added a big comment explaining how/why it all works. Also stopped disabling part of the untracked cache in the same patch, and undid all the changes to t7063 in that patch. Stuff still missing from v4: * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/ which I think would make the code cleaner & clearer. I guess I'm leaving that for future work. 
As per the commit message of the final patch, this series has some risk. Extra eyes would be greatly appreciated; one pair already helped me find one bug. Also, we should probably merge it early in some cycle, either this one or a later one. Derrick Stolee (1): dir: refactor treat_directory to clarify control flow Elijah Newren (6): t7063: more thorough status checking dir: fix simple typo in comment dir: consolidate treat_path() and treat_one_path() dir: fix broken comment dir: fix confusion based on variable tense dir: replace exponential algorithm with a linear one dir.c | 349 ++++++++++++++++++------------ t/t7063-status-untracked-cache.sh | 52 +++++ 2 files changed, 258 insertions(+), 143 deletions(-) base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v4 Pull-Request: https://github.com/git/git/pull/700 Range-diff vs v3: 1: d4fe5d33577 ! 1: 752403e339b t7063: correct broken test expectation @@ -1,61 +1,23 @@ Author: Elijah Newren <newren@gmail.com> - t7063: correct broken test expectation + t7063: more thorough status checking - The untracked cache is caching wrong information, resulting in commands - like `git status --porcelain` producing erroneous answers. The tests in - t7063 actually have a wide enough test to catch a relevant case, in - particular surrounding the directory 'dthree/', but it appears the - answers were not checked quite closely enough and the tests were coded - with the wrong expectation. Once the wrong info got into the cache in - an early test, since later tests built on it, many others have a wrong - expectation as well. This affects just over a third of the tests in - t7063. 
+ It turns out the t7063 has some testcases that even without using the + untracked cache cover situations that nothing else in the testsuite + handles. Checking the results of + git status --porcelain + both with and without the untracked cache, and comparing both against + our expected results helped uncover a critical bug in some dir.c + restructuring. - The error can be seen starting at t7063.12 (the first one switched from - expect_success to expect_failure in this patch). That test runs in a - directory with the following files present: - done/one - dthree/three - dtwo/two - four - .gitignore - one - three - two + Unfortunately, it's not easy to run status and tell it to ignore the + untracked cache; the only knob we have it to instruct it to *delete* + (and ignore) the untracked cache. - Of those files, the following files are tracked: - done/one - one - two - - and the contents of .gitignore are: - four - - and the contents of .git/info/exclude are: - three - - And there is no core.excludesfile. Therefore, the following should be - untracked: - .gitignore - dthree/ - dtwo/ - Indeed, these three paths are reported if you run - git ls-files -o --directory --exclude-standard - within this directory. However, 'git status --porcelain' was reporting - for this test: - A done/one - A one - A two - ?? .gitignore - ?? dtwo/ - which was clearly wrong -- dthree/ should also be listed as untracked. - This appears to have been broken since the test was introduced with - commit a3ddcefd97 ("t7063: tests for untracked cache", 2015-03-08). - Correct the test to expect the right output, marking the test as failed - for now. Make the same change throughout the remainder of the testsuite - to reflect that dthree/ remains an untracked directory throughout and - should be recognized as such. 
+ Create a simple helper that will create a clone of the index that is + missing the untracked cache bits, and use it to compare that the results + with the untracked cache match the results we get without the untracked + cache. Signed-off-by: Elijah Newren <newren@gmail.com> @@ -63,279 +25,230 @@ --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ - test_cmp ../expect ../actual - ' - --test_expect_success 'new info/exclude invalidates everything' ' -+test_expect_failure 'new info/exclude invalidates everything' ' - avoid_racy && - echo three >>.git/info/exclude && + test_must_be_empty ../status.actual + } + ++# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can ++# compare commands side-by-side, e.g. ++# iuc status --porcelain >expect && ++# git status --porcelain >actual && ++# test_cmp expect actual ++iuc() { ++ git ls-files -s >../current-index-entries ++ git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries ++ ++ GIT_INDEX_FILE=.git/tmp_index ++ export GIT_INDEX_FILE ++ git update-index --index-info <../current-index-entries ++ git update-index --skip-worktree $(cat ../current-sparse-entries) ++ ++ git -c core.untrackedCache=false "$@" ++ ret=$? 
++ ++ rm ../current-index-entries ++ rm $GIT_INDEX_FILE ++ unset GIT_INDEX_FILE ++ ++ return $ret ++} ++ + test_lazy_prereq UNTRACKED_CACHE ' + { git update-index --test-untracked-cache; ret=$?; } && + test $ret -ne 1 +@@ : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && ++ iuc status --porcelain >../status.iuc && ++ test_cmp ../status.expect ../status.iuc && + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && + node creation: 3 +@@ + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && ++ iuc status --porcelain >../status.iuc && ++ test_cmp ../status.expect ../status.iuc && + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && ++ iuc status --porcelain >../status.iuc && + cat >../status.expect <<EOF && + A done/one A one - A two - ?? .gitignore -+?? dthree/ - ?? dtwo/ +@@ + ?? four + ?? three EOF ++ test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ - test_cmp ../trace.expect ../trace - ' - --test_expect_success 'verify untracked cache dump' ' -+test_expect_failure 'verify untracked cache dump' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && ++ iuc status --porcelain >../status.iuc && + cat >../status.expect <<EOF && + A done/one + A one @@ - flags 00000006 - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid - .gitignore -+dthree/ - dtwo/ - /done/ 0000000000000000000000000000000000000000 recurse valid - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid + ?? dtwo/ + ?? 
three + EOF ++ test_cmp ../status.expect ../status.iuc && + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ - test_cmp ../expect ../actual - ' - --test_expect_success 'status after the move' ' -+test_expect_failure 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && -@@ ++ iuc status --porcelain >../status.iuc && + cat >../status.expect <<EOF && A done/one A one +@@ ?? .gitignore -+?? dthree/ ?? dtwo/ - ?? two EOF ++ test_cmp ../status.expect ../status.iuc && + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ - test_cmp ../trace.expect ../trace - ' - --test_expect_success 'verify untracked cache dump' ' -+test_expect_failure 'verify untracked cache dump' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 + : >../trace && + GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../actual && ++ iuc status --porcelain >../status.iuc && + cat >../status.expect <<EOF && + A done/one + A one @@ - flags 00000006 - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid - .gitignore -+dthree/ - dtwo/ - two - /done/ 0000000000000000000000000000000000000000 recurse valid + ?? dtwo/ + ?? two + EOF ++ test_cmp ../status.expect ../status.iuc && + test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ - test_cmp ../expect ../actual - ' - --test_expect_success 'status after the move' ' -+test_expect_failure 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && -@@ ++ iuc status --porcelain >../status.iuc && + cat >../status.expect <<EOF && + A done/one A one - A two +@@ ?? .gitignore -+?? dthree/ ?? 
dtwo/ EOF ++ test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ - test_cmp ../trace.expect ../trace - ' - --test_expect_success 'verify untracked cache dump' ' -+test_expect_failure 'verify untracked cache dump' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - flags 00000006 - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid - .gitignore -+dthree/ - dtwo/ - /done/ 0000000000000000000000000000000000000000 recurse valid - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -@@ - git commit -m "first commit" - ' - --test_expect_success 'status after commit' ' -+test_expect_failure 'status after commit' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && ++ iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && ?? .gitignore -+?? dthree/ ?? 
dtwo/ EOF ++ test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ - test_cmp ../trace.expect ../trace - ' - --test_expect_success 'untracked cache correct after commit' ' -+test_expect_failure 'untracked cache correct after commit' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - flags 00000006 - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid - .gitignore -+dthree/ - dtwo/ - /done/ 0000000000000000000000000000000000000000 recurse valid - /dthree/ 0000000000000000000000000000000000000000 recurse check_only valid -@@ - sync_mtime - ' - --test_expect_success 'test sparse status with untracked cache' ' -+test_expect_failure 'test sparse status with untracked cache' ' - : >../trace && avoid_racy && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ -@@ + git status --porcelain >../status.actual && ++ iuc status --porcelain >../status.iuc && + cat >../status.expect <<EOF && M done/two ?? .gitignore ?? done/five -+?? dthree/ ?? 
dtwo/ EOF ++ test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ - test_cmp ../trace.expect ../trace - ' - --test_expect_success 'untracked cache correct after status' ' -+test_expect_failure 'untracked cache correct after status' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - flags 00000006 - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid - .gitignore -+dthree/ - dtwo/ - /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid - five -@@ - test_cmp ../expect ../actual - ' - --test_expect_success 'test sparse status again with untracked cache' ' -+test_expect_failure 'test sparse status again with untracked cache' ' - avoid_racy && : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ -@@ + git status --porcelain >../status.actual && ++ iuc status --porcelain >../status.iuc && + cat >../status.expect <<EOF && M done/two ?? .gitignore ?? done/five -+?? dthree/ ?? dtwo/ EOF ++ test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ - echo "sub" > done/sub/sub/file - ' - --test_expect_success 'test sparse status with untracked cache and subdir' ' -+test_expect_failure 'test sparse status with untracked cache and subdir' ' - avoid_racy && : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ -@@ + git status --porcelain >../status.actual && ++ iuc status --porcelain >../status.iuc && + cat >../status.expect <<EOF && + M done/two ?? .gitignore - ?? done/five +@@ ?? done/sub/ -+?? dthree/ ?? 
dtwo/ EOF ++ test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + cat >../trace.expect <<EOF && + node creation: 2 @@ - test_cmp ../trace.expect ../trace - ' - --test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' -+test_expect_failure 'verify untracked cache dump (sparse/subdirs)' ' - test-tool dump-untracked-cache >../actual && - cat >../expect-from-test-dump <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - flags 00000006 - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse valid - .gitignore -+dthree/ - dtwo/ - /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid - five -@@ - test_cmp ../expect-from-test-dump ../actual - ' - --test_expect_success 'test sparse status again with untracked cache and subdir' ' -+test_expect_failure 'test sparse status again with untracked cache and subdir' ' - avoid_racy && : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + git status --porcelain >../status.actual && ++ iuc status --porcelain >../status.iuc && ++ test_cmp ../status.expect ../status.iuc && + test_cmp ../status.expect ../status.actual && + cat >../trace.expect <<EOF && + node creation: 0 @@ - test_cmp ../trace.expect ../trace - ' - --test_expect_success 'move entry in subdir from untracked to cached' ' -+test_expect_failure 'move entry in subdir from untracked to cached' ' + test_expect_success 'move entry in subdir from untracked to cached' ' git add dtwo/two && git status --porcelain >../status.actual && ++ iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && + M done/two + A dtwo/two @@ - ?? .gitignore ?? done/five ?? done/sub/ -+?? 
dthree/ EOF ++ test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual ' --test_expect_success 'move entry in subdir from cached to untracked' ' -+test_expect_failure 'move entry in subdir from cached to untracked' ' + test_expect_success 'move entry in subdir from cached to untracked' ' git rm --cached dtwo/two && git status --porcelain >../status.actual && ++ iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && -@@ + M done/two ?? .gitignore - ?? done/five +@@ ?? done/sub/ -+?? dthree/ ?? dtwo/ EOF ++ test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual -@@ - test_cmp ../expect-no-uc ../actual - ' - --test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' ' -+test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' ' - git config core.untrackedCache true && - test-tool dump-untracked-cache >../actual && - test_cmp ../expect-no-uc ../actual && -@@ - test_cmp ../expect-empty ../actual ' --test_expect_success 'setting core.untrackedCache to keep' ' -+test_expect_failure 'setting core.untrackedCache to keep' ' - git config core.untrackedCache keep && - git update-index --untracked-cache && - test-tool dump-untracked-cache >../actual && 2: b20bc3b9afd = 2: a4287d690be dir: fix simple typo in comment 3: fa9035949e0 = 3: 48f37e5b114 dir: consolidate treat_path() and treat_one_path() 4: 02e652d1869 = 4: b5ad1939379 dir: fix broken comment 5: 705c008d993 = 5: 2603c1a9d13 dir: fix confusion based on variable tense 6: f5d69102946 = 6: 576f364329d dir: refactor treat_directory to clarify control flow 7: 6cfca619e2c ! 
7: e20525429e5 dir: replace exponential algorithm with a linear one, fix untracked cache @@ -1,6 +1,6 @@ Author: Elijah Newren <newren@gmail.com> - dir: replace exponential algorithm with a linear one, fix untracked cache + dir: replace exponential algorithm with a linear one dir's read_directory_recursive() naturally operates recursively in order to walk the directory tree. Treating of directories is sometimes weird @@ -161,28 +161,6 @@ to have completed the 240 nested directories case. It's not often that you get to speed something up by a factor of 3*10^69. - Finally, this also fixes the untracked cache, as noted by the test fixes - in t7063. Unfortunately, it does so by passing stop_at_first_file to - close_cached_dir() in order to disable the caching of whether - directories were empty (this caching was only relevant for directories - that we knew we didn't need to walk all the entries under but just - needed to know whether the directory had any entries within it in order - to know if the directory itself should be marked as path_none or - path_untracked). I'm not convinced that disabling the is-the-dir-empty - check is necessary; there is probably some way to still cache that and - not get erroneous results. However, I have not figured out how to do - so. If I revert the change to close_cached_dir() in this patch (thus - continuing to cache cases where stop_at_first_file is true meaning we - continue to cache whether directories are empty), then the untracked - cache breakage in t7063 becomes more prevalant. With my change to - close_cached_dir() and the other changes to avoid traversing directories - 2^n times in this patch, I not only avoid making the untracked_cache - breakage in t7063 worse but actually fix the existing breakage. 
Update - the test results in t7063 to no longer expect check_only cache entries, - to reflect that we have to do a bit more work in terms of how many - directories we have to open, and to reflect that we fixed the 1/3 of - tests that were broken in that testsuite. - Signed-off-by: Elijah Newren <newren@gmail.com> diff --git a/dir.c b/dir.c @@ -199,7 +177,7 @@ + * you CAN'T DO BOTH. + */ + enum path_treatment state; -+ int nested_repo = 0, old_ignored_nr, stop_early; ++ int nested_repo = 0, old_ignored_nr, check_only, stop_early; /* The "len-1" is to strip the final '/' */ enum exist_status status = directory_exists_in_index(istate, dirname, len-1); @@ -220,16 +198,32 @@ return excluded ? path_excluded : path_untracked; + /* -+ * If we only want to determine if dirname is empty, then we can -+ * stop at the first file we find underneath that directory rather -+ * than continuing to recurse beyond it. If DIR_SHOW_IGNORED_TOO -+ * is set, then we want MORE than just determining if dirname is -+ * empty. ++ * If we have we don't want to know the all the paths under an ++ * untracked or ignored directory, we still need to go into the ++ * directory to determine if it is empty (because an empty directory ++ * should be path_none instead of path_excluded or path_untracked). 
+ */ -+ stop_early = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && ++ check_only = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && + !(dir->flags & DIR_SHOW_IGNORED_TOO)); + + /* ++ * However, there's another optimization possible as a subset of ++ * check_only, based on the cases we have to consider: ++ * A) Directory matches no exclude patterns: ++ * * Directory is empty => path_none ++ * * Directory has an untracked file under it => path_untracked ++ * * Directory has only ignored files under it => path_excluded ++ * B) Directory matches an exclude pattern: ++ * * Directory is empty => path_none ++ * * Directory has an untracked file under it => path_excluded ++ * * Directory has only ignored files under it => path_excluded ++ * In case A, we can exit as soon as we've found an untracked ++ * file but otherwise have to walk all files. In case B, though, ++ * we can stop at the first file we find under the directory. ++ */ ++ stop_early = check_only && excluded; ++ ++ /* + * If /every/ file within an untracked directory is ignored, then + * we want to treat the directory as ignored (for e.g. status + * --porcelain), without listing the individual ignored files @@ -243,7 +237,7 @@ untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + state = read_directory_recursive(dir, istate, dirname, len, untracked, -+ stop_early, stop_early, pathspec); ++ check_only, stop_early, pathspec); + + /* There are a variety of reasons we may need to fixup the state... */ + if (state == path_excluded) { @@ -281,25 +275,25 @@ + dir->ignored_nr = old_ignored_nr; + } + } - - /* -- * If this is an excluded directory, then we only need to check if -- * the directory contains any files. ++ ++ /* + * If there is nothing under the current directory and we are not + * hiding empty directories, then we need to report on the + * untracked or ignored status of the directory itself. 
- */ -- return read_directory_recursive(dir, istate, dirname, len, -- untracked, 1, excluded, pathspec); ++ */ + if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + state = excluded ? path_excluded : path_untracked; -+ -+ /* + + /* +- * If this is an excluded directory, then we only need to check if +- * the directory contains any files. + * We can recurse into untracked directories that don't match any + * of the given pathspecs when some file underneath the directory + * might match one of the pathspecs. If so, we should make sure + * to note that the directory itself did not match. -+ */ + */ +- return read_directory_recursive(dir, istate, dirname, len, +- untracked, 1, excluded, pathspec); + if (pathspec && + !match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL, @@ -322,24 +316,6 @@ strbuf_setlen(path, baselen); if (!cdir->ucd) { strbuf_addstr(path, cdir->file); -@@ - return -1; - } - --static void close_cached_dir(struct cached_dir *cdir) -+static void close_cached_dir(struct cached_dir *cdir, int stop_at_first_file) - { - if (cdir->fdir) - closedir(cdir->fdir); -@@ - * We have gone through this directory and found no untracked - * entries. Mark it valid. 
- */ -- if (cdir->untracked) { -+ if (!stop_at_first_file && cdir->untracked) { - cdir->untracked->valid = 1; - cdir->untracked->recurse = 1; - } @@ int stop_at_first_file, const struct pathspec *pathspec) { @@ -373,15 +349,6 @@ struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, -@@ - istate, &path, baselen, - pathspec, state); - } -- close_cached_dir(&cdir); -+ close_cached_dir(&cdir, stop_at_first_file); - out: - strbuf_release(&path); - @@ const char *path, int len, const struct pathspec *pathspec) @@ -423,342 +390,3 @@ if (state != path_recurse) break; /* do not recurse into it */ - - diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh - --- a/t/t7063-status-untracked-cache.sh - +++ b/t/t7063-status-untracked-cache.sh -@@ - dtwo/ - three - /done/ 0000000000000000000000000000000000000000 recurse valid --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --three --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - - test_expect_success 'status first time (empty cache)' ' -@@ - EOF - test_cmp ../status.expect ../actual && - cat >../trace.expect <<EOF && --node creation: 0 -+node creation: 2 - gitignore invalidation: 0 - directory invalidation: 1 --opendir: 1 -+opendir: 3 - EOF - test_cmp ../trace.expect ../trace - -@@ - four - three - /done/ 0000000000000000000000000000000000000000 recurse valid --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --three --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' -@@ - EOF - test_cmp ../status.expect ../actual && - cat >../trace.expect <<EOF && --node creation: 0 -+node creation: 2 - gitignore invalidation: 1 - directory invalidation: 1 - opendir: 4 -@@ - dtwo/ - three - /done/ 0000000000000000000000000000000000000000 recurse valid --/dthree/ 0000000000000000000000000000000000000000 recurse 
check_only valid --three --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' - --test_expect_failure 'new info/exclude invalidates everything' ' -+test_expect_success 'new info/exclude invalidates everything' ' - avoid_racy && - echo three >>.git/info/exclude && - : >../trace && -@@ - EOF - test_cmp ../status.expect ../actual && - cat >../trace.expect <<EOF && --node creation: 0 -+node creation: 2 - gitignore invalidation: 1 - directory invalidation: 0 - opendir: 4 -@@ - test_cmp ../trace.expect ../trace - ' - --test_expect_failure 'verify untracked cache dump' ' -+test_expect_success 'verify untracked cache dump' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - dthree/ - dtwo/ - /done/ 0000000000000000000000000000000000000000 recurse valid --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' -@@ - flags 00000006 - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse - /done/ 0000000000000000000000000000000000000000 recurse valid --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' - --test_expect_failure 'status after the move' ' -+test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ - git status --porcelain >../actual && -@@ - EOF - test_cmp ../status.expect ../actual && - cat >../trace.expect <<EOF && --node creation: 0 -+node creation: 2 - gitignore invalidation: 0 - directory invalidation: 0 --opendir: 1 -+opendir: 3 - EOF - test_cmp ../trace.expect ../trace - ' - --test_expect_failure 'verify untracked cache dump' ' -+test_expect_success 'verify untracked 
cache dump' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - dtwo/ - two - /done/ 0000000000000000000000000000000000000000 recurse valid --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' -@@ - flags 00000006 - / e6fcc8f2ee31bae321d66afd183fcb7237afae6e recurse - /done/ 0000000000000000000000000000000000000000 recurse valid --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' - --test_expect_failure 'status after the move' ' -+test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ - git status --porcelain >../actual && -@@ - EOF - test_cmp ../status.expect ../actual && - cat >../trace.expect <<EOF && --node creation: 0 -+node creation: 2 - gitignore invalidation: 0 - directory invalidation: 0 --opendir: 1 -+opendir: 3 - EOF - test_cmp ../trace.expect ../trace - ' - --test_expect_failure 'verify untracked cache dump' ' -+test_expect_success 'verify untracked cache dump' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - dthree/ - dtwo/ - /done/ 0000000000000000000000000000000000000000 recurse valid --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' -@@ - git commit -m "first commit" - ' - --test_expect_failure 'status after commit' ' -+test_expect_success 'status after commit' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ - git status --porcelain >../actual && -@@ - EOF - 
test_cmp ../status.expect ../actual && - cat >../trace.expect <<EOF && --node creation: 0 -+node creation: 2 - gitignore invalidation: 0 - directory invalidation: 0 --opendir: 2 -+opendir: 4 - EOF - test_cmp ../trace.expect ../trace - ' - --test_expect_failure 'untracked cache correct after commit' ' -+test_expect_success 'untracked cache correct after commit' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - dthree/ - dtwo/ - /done/ 0000000000000000000000000000000000000000 recurse valid --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' -@@ - sync_mtime - ' - --test_expect_failure 'test sparse status with untracked cache' ' -+test_expect_success 'test sparse status with untracked cache' ' - : >../trace && - avoid_racy && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ -@@ - EOF - test_cmp ../status.expect ../status.actual && - cat >../trace.expect <<EOF && --node creation: 0 -+node creation: 2 - gitignore invalidation: 1 - directory invalidation: 2 --opendir: 2 -+opendir: 4 - EOF - test_cmp ../trace.expect ../trace - ' - --test_expect_failure 'untracked cache correct after status' ' -+test_expect_success 'untracked cache correct after status' ' - test-tool dump-untracked-cache >../actual && - cat >../expect <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - dtwo/ - /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid - five --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect ../actual - ' - --test_expect_failure 'test sparse status again with untracked cache' ' -+test_expect_success 'test sparse status again with untracked cache' ' - avoid_racy && - : >../trace && - 
GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ -@@ - echo "sub" > done/sub/sub/file - ' - --test_expect_failure 'test sparse status with untracked cache and subdir' ' -+test_expect_success 'test sparse status with untracked cache and subdir' ' - avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ -@@ - test_cmp ../trace.expect ../trace - ' - --test_expect_failure 'verify untracked cache dump (sparse/subdirs)' ' -+test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' - test-tool dump-untracked-cache >../actual && - cat >../expect-from-test-dump <<EOF && - info/exclude 13263c0978fb9fad16b2d580fb800b6d811c3ff0 -@@ - /done/ 1946f0437f90c5005533cbe1736a6451ca301714 recurse valid - five - sub/ --/done/sub/ 0000000000000000000000000000000000000000 recurse check_only valid --sub/ --/done/sub/sub/ 0000000000000000000000000000000000000000 recurse check_only valid --file --/dthree/ 0000000000000000000000000000000000000000 recurse check_only valid --/dtwo/ 0000000000000000000000000000000000000000 recurse check_only valid --two - EOF - test_cmp ../expect-from-test-dump ../actual - ' - --test_expect_failure 'test sparse status again with untracked cache and subdir' ' -+test_expect_success 'test sparse status again with untracked cache and subdir' ' - avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ -@@ - test_cmp ../trace.expect ../trace - ' - --test_expect_failure 'move entry in subdir from untracked to cached' ' -+test_expect_success 'move entry in subdir from untracked to cached' ' - git add dtwo/two && - git status --porcelain >../status.actual && - cat >../status.expect <<EOF && -@@ - test_cmp ../status.expect ../status.actual - ' - --test_expect_failure 'move entry in subdir from cached to untracked' ' -+test_expect_success 'move entry in subdir from cached to untracked' ' - git rm --cached dtwo/two && - git status --porcelain >../status.actual && - cat >../status.expect <<EOF && 
-@@ - test_cmp ../expect-no-uc ../actual - ' - --test_expect_failure 'setting core.untrackedCache to true and using git status creates the cache' ' -+test_expect_success 'setting core.untrackedCache to true and using git status creates the cache' ' - git config core.untrackedCache true && - test-tool dump-untracked-cache >../actual && - test_cmp ../expect-no-uc ../actual && -@@ - test_cmp ../expect-empty ../actual - ' - --test_expect_failure 'setting core.untrackedCache to keep' ' -+test_expect_success 'setting core.untrackedCache to keep' ' - git config core.untrackedCache keep && - git update-index --untracked-cache && - test-tool dump-untracked-cache >../actual && -- gitgitgadget ^ permalink raw reply [flat|nested] 76+ messages in thread
* [PATCH v4 1/7] t7063: more thorough status checking 2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget @ 2020-03-26 21:27 ` Elijah Newren via GitGitGadget 2020-03-27 13:09 ` Derrick Stolee 2020-03-26 21:27 ` [PATCH v4 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget ` (7 subsequent siblings) 8 siblings, 1 reply; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> It turns out that t7063 has some testcases that, even without using the untracked cache, cover situations that nothing else in the testsuite handles. Checking the results of git status --porcelain both with and without the untracked cache, and comparing both against our expected results helped uncover a critical bug in some dir.c restructuring. Unfortunately, it's not easy to run status and tell it to ignore the untracked cache; the only knob we have is to instruct it to *delete* (and ignore) the untracked cache. Create a simple helper that will create a clone of the index that is missing the untracked cache bits, and use it to verify that the results with the untracked cache match the results we get without the untracked cache. Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t7063-status-untracked-cache.sh | 52 +++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index 190ae149cf3..156d06c34e8 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -30,6 +30,30 @@ status_is_clean() { test_must_be_empty ../status.actual } +# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can +# compare commands side-by-side, e.g. 
+# iuc status --porcelain >expect && +# git status --porcelain >actual && +# test_cmp expect actual +iuc() { + git ls-files -s >../current-index-entries + git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries + + GIT_INDEX_FILE=.git/tmp_index + export GIT_INDEX_FILE + git update-index --index-info <../current-index-entries + git update-index --skip-worktree $(cat ../current-sparse-entries) + + git -c core.untrackedCache=false "$@" + ret=$? + + rm ../current-index-entries + rm $GIT_INDEX_FILE + unset GIT_INDEX_FILE + + return $ret +} + test_lazy_prereq UNTRACKED_CACHE ' { git update-index --test-untracked-cache; ret=$?; } && test $ret -ne 1 @@ -95,6 +119,8 @@ test_expect_success 'status first time (empty cache)' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 3 @@ -115,6 +141,8 @@ test_expect_success 'status second time (fully populated cache)' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -136,6 +164,7 @@ test_expect_success 'modify in root directory, one dir invalidation' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -145,6 +174,7 @@ A two ?? four ?? 
three EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -183,6 +213,7 @@ test_expect_success 'new .gitignore invalidates recursively' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -192,6 +223,7 @@ A two ?? dtwo/ ?? three EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -230,6 +262,7 @@ test_expect_success 'new info/exclude invalidates everything' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -237,6 +270,7 @@ A two ?? .gitignore ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -286,6 +320,7 @@ test_expect_success 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -293,6 +328,7 @@ A one ?? dtwo/ ?? two EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -343,6 +379,7 @@ test_expect_success 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -350,6 +387,7 @@ A two ?? .gitignore ?? 
dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -390,10 +428,12 @@ test_expect_success 'status after commit' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && ?? .gitignore ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -447,12 +487,14 @@ test_expect_success 'test sparse status with untracked cache' ' avoid_racy && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two ?? .gitignore ?? done/five ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && node creation: 0 @@ -487,12 +529,14 @@ test_expect_success 'test sparse status again with untracked cache' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two ?? .gitignore ?? done/five ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && node creation: 0 @@ -514,6 +558,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two ?? .gitignore @@ -521,6 +566,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' ?? done/sub/ ?? 
dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && node creation: 2 @@ -560,6 +606,8 @@ test_expect_success 'test sparse status again with untracked cache and subdir' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && node creation: 0 @@ -573,6 +621,7 @@ EOF test_expect_success 'move entry in subdir from untracked to cached' ' git add dtwo/two && git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two A dtwo/two @@ -580,12 +629,14 @@ A dtwo/two ?? done/five ?? done/sub/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual ' test_expect_success 'move entry in subdir from cached to untracked' ' git rm --cached dtwo/two && git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two ?? .gitignore @@ -593,6 +644,7 @@ test_expect_success 'move entry in subdir from cached to untracked' ' ?? done/sub/ ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* Re: [PATCH v4 1/7] t7063: more thorough status checking 2020-03-26 21:27 ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget @ 2020-03-27 13:09 ` Derrick Stolee 2020-03-29 18:18 ` Junio C Hamano 0 siblings, 1 reply; 76+ messages in thread From: Derrick Stolee @ 2020-03-27 13:09 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren On 3/26/2020 5:27 PM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > It turns out the t7063 has some testcases that even without using the > untracked cache cover situations that nothing else in the testsuite > handles. Checking the results of > git status --porcelain > both with and without the untracked cache, and comparing both against > our expected results helped uncover a critical bug in some dir.c > restructuring. > > Unfortunately, it's not easy to run status and tell it to ignore the > untracked cache; the only knob we have it to instruct it to *delete* > (and ignore) the untracked cache. > > Create a simple helper that will create a clone of the index that is > missing the untracked cache bits, and use it to compare that the results > with the untracked cache match the results we get without the untracked > cache. > > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > t/t7063-status-untracked-cache.sh | 52 +++++++++++++++++++++++++++++++ > 1 file changed, 52 insertions(+) > > diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh > index 190ae149cf3..156d06c34e8 100755 > --- a/t/t7063-status-untracked-cache.sh > +++ b/t/t7063-status-untracked-cache.sh > @@ -30,6 +30,30 @@ status_is_clean() { > test_must_be_empty ../status.actual > } > > +# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can > +# compare commands side-by-side, e.g. 
> +# iuc status --porcelain >expect && > +# git status --porcelain >actual && > +# test_cmp expect actual > +iuc() { > + git ls-files -s >../current-index-entries > + git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries > + > + GIT_INDEX_FILE=.git/tmp_index > + export GIT_INDEX_FILE > + git update-index --index-info <../current-index-entries > + git update-index --skip-worktree $(cat ../current-sparse-entries) > + > + git -c core.untrackedCache=false "$@" > + ret=$? > + > + rm ../current-index-entries > + rm $GIT_INDEX_FILE > + unset GIT_INDEX_FILE > + > + return $ret > +} This is a clever way to get around the untracked cache deletion. Thanks for adding these extra comparisons! It really does help guarantee that we are doing the right thing in each case. -Stolee ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v4 1/7] t7063: more thorough status checking 2020-03-27 13:09 ` Derrick Stolee @ 2020-03-29 18:18 ` Junio C Hamano 2020-03-31 20:15 ` Elijah Newren 0 siblings, 1 reply; 76+ messages in thread From: Junio C Hamano @ 2020-03-29 18:18 UTC (permalink / raw) To: Derrick Stolee Cc: Elijah Newren via GitGitGadget, git, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren Derrick Stolee <stolee@gmail.com> writes: >> +# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can >> +# compare commands side-by-side, e.g. >> +# iuc status --porcelain >expect && >> +# git status --porcelain >actual && >> +# test_cmp expect actual ;-) >> +iuc() { Missing SP after "iuc". >> + git ls-files -s >../current-index-entries >> + git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries When you see yourself piping grep output to sed, think twice to see if you can lose one of them. sed -ne 's/^S.//p' perhaps? >> + >> + GIT_INDEX_FILE=.git/tmp_index >> + export GIT_INDEX_FILE >> + git update-index --index-info <../current-index-entries >> + git update-index --skip-worktree $(cat ../current-sparse-entries) Are the dances with ls-files and update-index to prepare us for a possible future in which we do not use .git/index as the index file, or something? IOW, would export GIT_INDEX_FILE=.git/tmp_index && cp .git/index "$GIT_INDEX_FILE" && be insufficient? >> + >> + git -c core.untrackedCache=false "$@" >> + ret=$? >> + >> + rm ../current-index-entries >> + rm $GIT_INDEX_FILE >> + unset GIT_INDEX_FILE >> + >> + return $ret >> +} > > This is a clever way to get around the untracked cache deletion. > > Thanks for adding these extra comparisons! It really does help guarantee > that we are doing the right thing in each case. Yes, I think it is a great idea to see tested commands behave the same way with or without the untracked cache. Thanks. ^ permalink raw reply [flat|nested] 76+ messages in thread
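Editorial illustration of the sed suggestion in the message above. This is a minimal sketch, not part of the patch: the sample input is made up and only mimics the shape of `git ls-files -t` output (a one-letter status tag, a space, then the path, with "S" marking skip-worktree entries). It compares the original grep-then-sed pipeline with the single sed command proposed in the review.

```shell
# Made-up sample in the shape printed by `git ls-files -t`.
sample='H done/one
S done/two
H one
S dtwo/two'

# Original pipeline: select the "S" lines with grep, then strip the tag
# and the following character with sed.
with_grep=$(printf '%s\n' "$sample" | grep '^S' | sed -e 's/^S.//')

# Suggested single command: -n suppresses sed's default printing, and the
# trailing p flag prints only lines where the substitution succeeded.
with_sed=$(printf '%s\n' "$sample" | sed -ne 's/^S.//p')

# Show the paths that would land in ../current-sparse-entries.
printf '%s\n' "$with_sed"

# Both approaches should select the same paths.
test "$with_grep" = "$with_sed" && echo "pipelines agree"
```

With this sample, both pipelines yield the two skip-worktree paths, so the grep can be dropped without changing what the helper records.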
* Re: [PATCH v4 1/7] t7063: more thorough status checking 2020-03-29 18:18 ` Junio C Hamano @ 2020-03-31 20:15 ` Elijah Newren 0 siblings, 0 replies; 76+ messages in thread From: Elijah Newren @ 2020-03-31 20:15 UTC (permalink / raw) To: Junio C Hamano Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy On Sun, Mar 29, 2020 at 11:18 AM Junio C Hamano <gitster@pobox.com> wrote: > > Derrick Stolee <stolee@gmail.com> writes: > > >> +# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can > >> +# compare commands side-by-side, e.g. > >> +# iuc status --porcelain >expect && > >> +# git status --porcelain >actual && > >> +# test_cmp expect actual > > ;-) > > >> +iuc() { > > Missing SP after "iuc". Will fix. > >> + git ls-files -s >../current-index-entries > >> + git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries > > When you see yourself piping grep output to sed, think twice to see > if you can lose one of them. sed -ne 's/^S.//p' perhaps? Ooh, thanks. I have to admit that I don't know sed very well. In fact, 'sed -e s/pattern/replacement/' was the _only_ piece of sed I knew. But the -n flag and p modifier look handy; I think I ran across them in perl before as well. > >> + > >> + GIT_INDEX_FILE=.git/tmp_index > >> + export GIT_INDEX_FILE > >> + git update-index --index-info <../current-index-entries > >> + git update-index --skip-worktree $(cat ../current-sparse-entries) > > Are the dances with ls-files and update-index to prepare us for a > possible future in which we do not use .git/index as the index file, > or something? IOW, would > > export GIT_INDEX_FILE=.git/tmp_index && > cp .git/index "$GIT_INDEX_FILE && > > be insufficient? I guess it's a matter of perspective. 
Do we want to compare to how git behaves when there is no untracked cache (as I was trying to implement), or compare to how git behaves when there is an untracked cache and git is told to remove it? (The documentation for core.untrackedCache doesn't actually say when core.untrackedCache=false that git will ignore it, just that it will delete the untracked cache when that option is set. Perhaps if we do go the route with your alternative, we at least need to update the documentation as well and perhaps also audit the code to make sure it ignores the untracked cache as I'd expect? Or maybe we just need to run two operations, one to delete the untracked cache, and then the second that we are actually comparing to?) > >> + > >> + git -c core.untrackedCache=false "$@" > >> + ret=$? > >> + > >> + rm ../current-index-entries > >> + rm $GIT_INDEX_FILE > >> + unset GIT_INDEX_FILE > >> + > >> + return $ret > >> +} > > > > This is a clever way to get around the untracked cache deletion. > > > > Thanks for adding these extra comparisons! It really does help guarantee > > that we are doing the right thing in each case. > > Yes, I think it is a great idea to see tested commands behave the > same way with or without the untracked cache. Thanks. ^ permalink raw reply [flat|nested] 76+ messages in thread
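Editorial illustration of the "two operations" idea raised at the end of the message above. This is a hypothetical sketch, not the helper from the patch and not necessarily what later versions of the series adopted. It assumes the function is run from the top of the working tree, that the index lives at .git/index, and that removing the cache from a copied index with `git update-index --no-untracked-cache` is an acceptable stand-in for "no untracked cache".

```shell
# Hypothetical variant of the iuc helper (illustration only).
# Assumptions: run from the top of the working tree; index is .git/index.
iuc () {
	# Work on a throwaway copy so the real index keeps its untracked cache.
	GIT_INDEX_FILE=.git/tmp_index &&
	export GIT_INDEX_FILE &&
	cp .git/index "$GIT_INDEX_FILE" &&
	# First operation: strip the untracked cache from the copy only.
	git update-index --no-untracked-cache &&
	# Second operation: the command whose output we compare against.
	git -c core.untrackedCache=false "$@"
	ret=$?

	# Clean up whether or not the steps above succeeded.
	rm -f "$GIT_INDEX_FILE"
	unset GIT_INDEX_FILE
	return $ret
}

# Intended use inside a test, mirroring the comment in the patch:
#   iuc status --porcelain >expect &&
#   git status --porcelain >actual &&
#   test_cmp expect actual
```

Copying the index keeps skip-worktree bits without the ls-files and update-index round trip, at the cost of depending on the on-disk location of the index, which is the trade-off the discussion above is weighing.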
* [PATCH v4 2/7] dir: fix simple typo in comment 2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget @ 2020-03-26 21:27 ` Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget ` (6 subsequent siblings) 8 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index b460211e614..b505ba747bb 100644 --- a/dir.c +++ b/dir.c @@ -2174,7 +2174,7 @@ static void add_path_to_appropriate_result_list(struct dir_struct *dir, * If 'stop_at_first_file' is specified, 'path_excluded' is returned * to signal that a file was found. This is the least significant value that * indicates that a file was encountered that does not depend on the order of - * whether an untracked or exluded path was encountered first. + * whether an untracked or excluded path was encountered first. * * Returns the most significant path_treatment value encountered in the scan. * If 'stop_at_first_file' is specified, `path_excluded` is the most -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() 2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget @ 2020-03-26 21:27 ` Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 4/7] dir: fix broken comment Elijah Newren via GitGitGadget ` (5 subsequent siblings) 8 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Commit 16e2cfa90993 ("read_directory(): further split treat_path()", 2010-01-08) split treat_one_path() out of treat_path(), because treat_leading_path() would not have access to a dirent but wanted to re-use as much of treat_path() as possible. Not re-using all of treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir: fix checks on common prefix directory", 2019-12-19). Finally, in commit ad6f2157f951 ("dir: restructure in a way to avoid passing around a struct dirent", 2020-01-16), dirents were removed from treat_path() and other functions entirely. Since the only reason for splitting these functions was the lack of a dirent -- which no longer applies to either function -- and since the split caused problems in the past resulting in us not using treat_one_path() separately anymore, just undo the split. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 121 ++++++++++++++++++++++++++-------------------------------- 1 file changed, 55 insertions(+), 66 deletions(-) diff --git a/dir.c b/dir.c index b505ba747bb..d0f3d660850 100644 --- a/dir.c +++ b/dir.c @@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate, return dtype; } -static enum path_treatment treat_one_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec, - int dtype) -{ - int exclude; - int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case); +static enum path_treatment treat_path_fast(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + strbuf_setlen(path, baselen); + if (!cdir->ucd) { + strbuf_addstr(path, cdir->file); + return path_untracked; + } + strbuf_addstr(path, cdir->ucd->name); + /* treat_one_path() does this before it calls treat_directory() */ + strbuf_complete(path, '/'); + if (cdir->ucd->check_only) + /* + * check_only is set as a result of treat_directory() getting + * to its bottom. Verify again the same set of directories + * with check_only set. + */ + return read_directory_recursive(dir, istate, path->buf, path->len, + cdir->ucd, 1, 0, pathspec); + /* + * We get path_recurse in the first run when + * directory_exists_in_index() returns index_nonexistent. We + * are sure that new changes in the index does not impact the + * outcome. Return now. 
+ */ + return path_recurse; +} + +static enum path_treatment treat_path(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + int has_path_in_index, dtype, exclude; enum path_treatment path_treatment; - dtype = resolve_dtype(dtype, istate, path->buf, path->len); + if (!cdir->d_name) + return treat_path_fast(dir, untracked, cdir, istate, path, + baselen, pathspec); + if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) + return path_none; + strbuf_setlen(path, baselen); + strbuf_addstr(path, cdir->d_name); + if (simplify_away(path->buf, path->len, pathspec)) + return path_none; + + dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len); /* Always exclude indexed files */ + has_path_in_index = !!index_file_exists(istate, path->buf, path->len, + ignore_case); if (dtype != DT_DIR && has_path_in_index) return path_none; @@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, } } -static enum path_treatment treat_path_fast(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - strbuf_setlen(path, baselen); - if (!cdir->ucd) { - strbuf_addstr(path, cdir->file); - return path_untracked; - } - strbuf_addstr(path, cdir->ucd->name); - /* treat_one_path() does this before it calls treat_directory() */ - strbuf_complete(path, '/'); - if (cdir->ucd->check_only) - /* - * check_only is set as a result of treat_directory() getting - * to its bottom. Verify again the same set of directories - * with check_only set. - */ - return read_directory_recursive(dir, istate, path->buf, path->len, - cdir->ucd, 1, 0, pathspec); - /* - * We get path_recurse in the first run when - * directory_exists_in_index() returns index_nonexistent. 
We - * are sure that new changes in the index does not impact the - * outcome. Return now. - */ - return path_recurse; -} - -static enum path_treatment treat_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - if (!cdir->d_name) - return treat_path_fast(dir, untracked, cdir, istate, path, - baselen, pathspec); - if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) - return path_none; - strbuf_setlen(path, baselen); - strbuf_addstr(path, cdir->d_name); - if (simplify_away(path->buf, path->len, pathspec)) - return path_none; - - return treat_one_path(dir, untracked, istate, path, baselen, pathspec, - cdir->d_type); -} - static void add_untracked(struct untracked_cache_dir *dir, const char *name) { if (!dir) -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v4 4/7] dir: fix broken comment 2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2020-03-26 21:27 ` [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget @ 2020-03-26 21:27 ` Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget ` (4 subsequent siblings) 8 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index d0f3d660850..3a367683661 100644 --- a/dir.c +++ b/dir.c @@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, add_untracked(untracked, path.buf + baselen); break; } - /* skip the dir_add_* part */ + /* skip the add_path_to_appropriate_result_list() */ continue; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v4 5/7] dir: fix confusion based on variable tense 2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2020-03-26 21:27 ` [PATCH v4 4/7] dir: fix broken comment Elijah Newren via GitGitGadget @ 2020-03-26 21:27 ` Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget ` (3 subsequent siblings) 8 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Despite having contributed several fixes in this area, I have for months (years?) assumed that the "exclude" variable was a directive; this caused me to think of it as a different mode we operate in and left me confused as I tried to build up a mental model around why we'd need such a directive. I mostly tried to ignore it while focusing on the pieces I was trying to understand. Then I finally traced this variable all back to a call to is_excluded(), meaning it was actually functioning as an adjective. In particular, it was a checked property ("Does this path match a rule in .gitignore?"), rather than a mode passed in from the caller. Change the variable name to match the part of speech used by the function called to define it, which will hopefully make these bits of code slightly clearer to the next reader. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/dir.c b/dir.c index 3a367683661..8074e651e6f 100644 --- a/dir.c +++ b/dir.c @@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, static enum path_treatment treat_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked, - const char *dirname, int len, int baselen, int exclude, + const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { int nested_repo = 0; @@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } if (nested_repo) return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (exclude ? path_excluded : path_untracked)); + (excluded ? path_excluded : path_untracked)); if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) break; - if (exclude && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { + if (excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { /* * This is an excluded directory and we are @@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); @@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * the directory contains any files. 
*/ return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, exclude, pathspec); + untracked, 1, excluded, pathspec); } /* @@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { - int has_path_in_index, dtype, exclude; + int has_path_in_index, dtype, excluded; enum path_treatment path_treatment; if (!cdir->d_name) @@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent)) return path_none; - exclude = is_excluded(dir, istate, path->buf, &dtype); + excluded = is_excluded(dir, istate, path->buf, &dtype); /* * Excluded? If we don't explicitly want to show * ignored files, ignore it */ - if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) + if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) return path_excluded; switch (dtype) { @@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, strbuf_addch(path, '/'); path_treatment = treat_directory(dir, istate, untracked, path->buf, path->len, - baselen, exclude, pathspec); + baselen, excluded, pathspec); /* * If 1) we only want to return directories that * match an exclude pattern and 2) this directory does @@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, * recurse into this directory (instead of marking the * directory itself as an ignored path). */ - if (!exclude && + if (!excluded && path_treatment == path_excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) @@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, return path_treatment; case DT_REG: case DT_LNK: - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; } } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow
  2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-03-26 21:27   ` [PATCH v4 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
@ 2020-03-26 21:27   ` Derrick Stolee via GitGitGadget
  2020-03-26 21:27   ` [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 76+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The logic in treat_directory() is handled by a multi-case switch
statement, but this switch is very asymmetrical, as the first two
cases are simple but the third is more complicated than the rest of
the method. In fact, the third case includes a "break" statement that
leads to the block of code outside the switch statement. That is the
only way to reach that block, as the switch handles all possible
values from directory_exists_in_index().

Extract the switch statement into a series of "if" statements. This
simplifies the trivial cases, while clarifying how to reach the
"show_other_directories" case. This is particularly important as the
"show_other_directories" case will expand in a later change.

Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 35 +++++++++++++++++------------------ 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/dir.c b/dir.c index 8074e651e6f..d9bcb7e19b6 100644 --- a/dir.c +++ b/dir.c @@ -1660,29 +1660,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const struct pathspec *pathspec) { int nested_repo = 0; - /* The "len-1" is to strip the final '/' */ - switch (directory_exists_in_index(istate, dirname, len-1)) { - case index_directory: - return path_recurse; + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); - case index_gitdir: + if (status == index_directory) + return path_recurse; + if (status == index_gitdir) return path_none; + if (status != index_nonexistent) + BUG("Unhandled value for directory_exists_in_index: %d\n", status); - case index_nonexistent: - if ((dir->flags & DIR_SKIP_NESTED_GIT) || - !(dir->flags & DIR_NO_GITLINKS)) { - struct strbuf sb = STRBUF_INIT; - strbuf_addstr(&sb, dirname); - nested_repo = is_nonbare_repository_dir(&sb); - strbuf_release(&sb); - } - if (nested_repo) - return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (excluded ? path_excluded : path_untracked)); + if ((dir->flags & DIR_SKIP_NESTED_GIT) || + !(dir->flags & DIR_NO_GITLINKS)) { + struct strbuf sb = STRBUF_INIT; + strbuf_addstr(&sb, dirname); + nested_repo = is_nonbare_repository_dir(&sb); + strbuf_release(&sb); + } + if (nested_repo) + return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : + (excluded ? path_excluded : path_untracked)); - if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) - break; + if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) { if (excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v4 7/7] dir: replace exponential algorithm with a linear one
  2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2020-03-26 21:27   ` [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
@ 2020-03-26 21:27   ` Elijah Newren via GitGitGadget
  2020-03-27 13:13   ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee
  2020-04-01  4:17   ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 76+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-03-26 21:27 UTC (permalink / raw)
  To: git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir's read_directory_recursive() naturally operates recursively in
order to walk the directory tree. Treatment of directories is
sometimes weird because there are so many different permutations
about how to handle directories. Some examples:

  * 'git ls-files -o --directory' only needs to know that a directory
    itself is untracked; it doesn't need to recurse into it to see
    what is underneath.

  * 'git status' needs to recurse into an untracked directory, but
    only to determine whether or not it is empty. If there are no
    files underneath, the directory itself will be omitted from the
    output. If it is not empty, only the directory will be listed.

  * 'git status --ignored' needs to recurse into untracked directories
    and report all the ignored entries and then report the directory
    as untracked -- UNLESS all the entries under the directory are
    ignored, in which case we don't print any of the entries under
    the directory and just report the directory itself as ignored.
    (Note that although this forces us to walk all untracked files
    underneath the directory as well, we strip them from the output,
    except for users like 'git clean' who also set
    DIR_KEEP_UNTRACKED_CONTENTS.)

  * For 'git clean', we may need to recurse into a directory that
    doesn't match any specified pathspecs, if it's possible that
    there is an entry underneath the directory that can match one of
    the pathspecs. In such a case, we need to be careful to omit the
    directory itself from the list of paths (see commit 404ebceda01c
    ("dir: also check directories for matching pathspecs",
    2019-09-17))

Part of the tension noted above is that the treatment of a directory
can change based on the files within it, and based on the various
settings in dir->flags. Trying to keep this in mind while reading
over the code, it is easy to think in terms of "treat_directory()
tells us what to do with a directory, and read_directory_recursive()
is the thing that recurses". Since we need to look into a directory
to know how to treat it, though, it is quite easy to decide to (also)
recurse into the directory from treat_directory() by adding a
read_directory_recursive() call. Adding such a call is actually
fine, IF we make sure that read_directory_recursive() does not also
recurse into that same directory.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the
code, meaning we'd have two calls to read_directory_recursive() for
an untracked directory. So, if we had a file named
   one/two/three/four/five/somefile.txt
and nothing in one/ was tracked, then 'git status --ignored' would
call read_directory_recursive() twice on the directory 'one/', and
each of those would call read_directory_recursive() twice on the
directory 'one/two/', and so on until read_directory_recursive() was
called 2^5 times for 'one/two/three/four/five/'.

Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this
over time. While trying to unravel it, I noticed several instances
where the first call to read_directory_recursive() would return e.g.
path_untracked for some directory and a later one would return e.g.
path_none, despite the fact that the directory clearly should have
been considered untracked. The code happened to work due to the
side-effect from the first invocation of adding untracked entries to
dir->entries; this allowed it to get the correct output despite the
supposed override in return value by the later call.

I am somewhat concerned that there are still bugs and maybe even
testcases with the wrong expectation. I have tried to carefully
document treat_directory() since it becomes more complex after this
change (though much of this complexity came from elsewhere that
probably deserved better comments to begin with). However, much of
my work felt more like a game of whack-a-mole while attempting to
make the code match the existing regression tests than an attempt to
create an implementation that matched some clear design. That seems
wrong to me, but the rules of existing behavior had so many special
cases that I had a hard time coming up with some overarching rules
about what correct behavior is for all cases, forcing me to hope that
the regression tests are correct and sufficient. Such a hope seems
likely to be ill-founded, given my experience with dir.c-related
testcases in the last few months:

  Examples where the documentation was hard to parse or even just wrong:
   * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
                   -n/--dry-run, 2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
  Examples where testcases were declared wrong and changed:
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
                   leading directories", 2019-12-10)
  Examples where testcases were clearly inadequate:
   * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
                   ignored file breakage, 2019-08-25)
   * 7541cc530239 (t7300: add testcases showing failure to clean specified
                   pathspecs, 2019-09-17)
   * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
                   2019-09-17)
   * 404ebceda01c (dir: also check directories for matching pathspecs,
                   2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
                   2019-12-10)
   * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)
  Examples where "correct behavior" was unclear to everyone:
    https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
  Other commits of note:
   * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)

However, on the positive side, it does make the code much faster.
For the following simple shell loop in an empty repository:

    for depth in $(seq 10 25)
    do
      dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
      rm -rf dir
      mkdir -p $dirs
      >$dirs/untracked-file
      /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
    done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

For the above run, using strace I can look for the number of
untracked directories opened and can verify that it matches the
expected 2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).

After this fix, with strace I can verify that the number of untracked
directories that are opened drops to just $depth, and the timings all
drop to 0.00. In fact, it isn't until a depth of 190 nested
directories that it sometimes starts reporting a time of 0.01 seconds
and doesn't consistently report 0.01 seconds until there are 240
nested directories. The previous code would have taken
  17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
to have completed the 240 nested directories case. It's not often
that you get to speed something up by a factor of 3*10^69.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 167 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 121 insertions(+), 46 deletions(-)

diff --git a/dir.c b/dir.c
index d9bcb7e19b6..29283fc2588 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
-	int nested_repo = 0;
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+ */ + enum path_treatment state; + int nested_repo = 0, old_ignored_nr, check_only, stop_early; /* The "len-1" is to strip the final '/' */ enum exist_status status = directory_exists_in_index(istate, dirname, len-1); @@ -1711,18 +1717,117 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + /* + * We only need to recurse into untracked/ignored directories if + * either of the following bits is set: + * - DIR_SHOW_IGNORED_TOO (because then we need to determine if + * there are ignored directories below) + * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if + * the directory is empty) + */ + if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) return excluded ? path_excluded : path_untracked; + /* + * If we have we don't want to know the all the paths under an + * untracked or ignored directory, we still need to go into the + * directory to determine if it is empty (because an empty directory + * should be path_none instead of path_excluded or path_untracked). + */ + check_only = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && + !(dir->flags & DIR_SHOW_IGNORED_TOO)); + + /* + * However, there's another optimization possible as a subset of + * check_only, based on the cases we have to consider: + * A) Directory matches no exclude patterns: + * * Directory is empty => path_none + * * Directory has an untracked file under it => path_untracked + * * Directory has only ignored files under it => path_excluded + * B) Directory matches an exclude pattern: + * * Directory is empty => path_none + * * Directory has an untracked file under it => path_excluded + * * Directory has only ignored files under it => path_excluded + * In case A, we can exit as soon as we've found an untracked + * file but otherwise have to walk all files. In case B, though, + * we can stop at the first file we find under the directory. 
+ */ + stop_early = check_only && excluded; + + /* + * If /every/ file within an untracked directory is ignored, then + * we want to treat the directory as ignored (for e.g. status + * --porcelain), without listing the individual ignored files + * underneath. To do so, we'll save the current ignored_nr, and + * pop all the ones added after it if it turns out the entire + * directory is ignored. + */ + old_ignored_nr = dir->ignored_nr; + + /* Actually recurse into dirname now, we'll fixup the state later. */ untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + state = read_directory_recursive(dir, istate, dirname, len, untracked, + check_only, stop_early, pathspec); + + /* There are a variety of reasons we may need to fixup the state... */ + if (state == path_excluded) { + int i; + + /* + * When stop_early is set, read_directory_recursive() will + * never return path_untracked regardless of whether + * underlying paths were untracked or ignored (because + * returning early means it excluded some paths, or + * something like that -- see commit 5aaa7fd39aaf ("Improve + * performance of git status --ignored", 2017-09-18)). + * However, we're not really concerned with the status of + * files under the directory, we just wanted to know + * whether the directory was empty (state == path_none) or + * not (state == path_excluded), and if not, we'd return + * our original status based on whether the untracked + * directory matched an exclusion pattern. + */ + if (stop_early) + state = excluded ? path_excluded : path_untracked; + + else { + /* + * When + * !stop_early && state == path_excluded + * then all paths under dirname were ignored. For + * this case, git status --porcelain wants to just + * list the directory itself as ignored and not + * list the individual paths underneath. Remove + * the individual paths underneath. 
+ */ + for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i) + free(dir->ignored[i]); + dir->ignored_nr = old_ignored_nr; + } + } + + /* + * If there is nothing under the current directory and we are not + * hiding empty directories, then we need to report on the + * untracked or ignored status of the directory itself. + */ + if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + state = excluded ? path_excluded : path_untracked; /* - * If this is an excluded directory, then we only need to check if - * the directory contains any files. + * We can recurse into untracked directories that don't match any + * of the given pathspecs when some file underneath the directory + * might match one of the pathspecs. If so, we should make sure + * to note that the directory itself did not match. */ - return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, excluded, pathspec); + if (pathspec && + !match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL, + 0 /* do NOT special case dirs */)) + state = path_none; + + return state; } /* @@ -1870,6 +1975,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. + */ strbuf_setlen(path, baselen); if (!cdir->ucd) { strbuf_addstr(path, cdir->file); @@ -2175,14 +2285,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, int stop_at_first_file, const struct pathspec *pathspec) { /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in treat_leading_path(). See the commit message for the - * commit adding this warning as well as the commit preceding it - * for details. + * WARNING: Do NOT recurse unless path_recurse is returned from + * treat_path(). 
Recursing on any other return value + * can result in exponential slowdown. */ - struct cached_dir cdir; enum path_treatment state, subdir_state, dir_state = path_none; struct strbuf path = STRBUF_INIT; @@ -2204,13 +2310,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, dir_state = state; /* recurse into subdir if instructed by treat_path */ - if ((state == path_recurse) || - ((state == path_untracked) && - (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) && - ((dir->flags & DIR_SHOW_IGNORED_TOO) || - (pathspec && - do_match_pathspec(istate, pathspec, path.buf, path.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) { + if (state == path_recurse) { struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, @@ -2294,15 +2394,6 @@ static int treat_leading_path(struct dir_struct *dir, const char *path, int len, const struct pathspec *pathspec) { - /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in read_directory_recursive(). See 777b420347 (dir: - * synchronize treat_leading_path() and read_directory_recursive(), - * 2019-12-19) and its parent commit for details. 
- */ - struct strbuf sb = STRBUF_INIT; struct strbuf subdir = STRBUF_INIT; int prevlen, baselen; @@ -2353,23 +2444,7 @@ static int treat_leading_path(struct dir_struct *dir, strbuf_reset(&subdir); strbuf_add(&subdir, path+prevlen, baselen-prevlen); cdir.d_name = subdir.buf; - state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, - pathspec); - if (state == path_untracked && - resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR && - (dir->flags & DIR_SHOW_IGNORED_TOO || - do_match_pathspec(istate, pathspec, sb.buf, sb.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) { - if (!match_pathspec(istate, pathspec, sb.buf, sb.len, - 0 /* prefix */, NULL, - 0 /* do NOT special case dirs */)) - state = path_none; - add_path_to_appropriate_result_list(dir, NULL, &cdir, - istate, - &sb, baselen, - pathspec, state); - state = path_recurse; - } + state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec); if (state != path_recurse) break; /* do not recurse into it */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
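The 2^($depth+1)-2 figure quoted in the commit message of patch 7/7 can
be reproduced with a small toy model. The sketch below is not git code:
count_dir_visits() and its parameters are hypothetical names made up for
illustration, and the model assumes a single chain of nested untracked
directories in which every visit of a directory recurses into its one
subdirectory a fixed number of times.

```c
#include <assert.h>

/*
 * Toy model (NOT git code; names are hypothetical).
 *
 * Models a chain of `depth` nested untracked directories,
 * dir/dir/.../dir/, below a top-level directory that is visited once.
 * `calls_per_level` is how many times each visit recurses into the
 * single subdirectory: 2 models the double recursion described in the
 * commit message, 1 models a walk that recurses once per directory.
 *
 * Returns the total number of visits to the nested directories.
 */
unsigned long long count_dir_visits(int depth, int calls_per_level)
{
	unsigned long long total = 0;
	unsigned long long visits_at_level = 1; /* the top-level directory */
	int level;

	/* Keep the model well inside the range of unsigned long long. */
	assert(depth >= 0 && depth < 60);
	assert(calls_per_level == 1 || calls_per_level == 2);

	for (level = 1; level <= depth; level++) {
		/* Every visit one level up recurses calls_per_level times. */
		visits_at_level *= (unsigned long long)calls_per_level;
		total += visits_at_level;
	}
	return total;
}
```

Under these assumptions, count_dir_visits(10, 2) gives 2046, which is
2^11 - 2, while count_dir_visits(10, 1) gives 10, matching the "drops to
just $depth" observation for the linear case.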
* Re: [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive()
  2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
                     ` (6 preceding siblings ...)
  2020-03-26 21:27   ` [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
@ 2020-03-27 13:13   ` Derrick Stolee
  2020-03-28 17:33     ` Elijah Newren
  2020-04-01  4:17   ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
  8 siblings, 1 reply; 76+ messages in thread
From: Derrick Stolee @ 2020-03-27 13:13 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren

On 3/26/2020 5:27 PM, Elijah Newren via GitGitGadget wrote:
> This series provides some "modest" speedups (see last commit message), and
> should allow 'git status --ignored' to complete in a more reasonable
> timeframe for Martin Melka (see
> https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
> ).
>
> Changes since v3:
>
>  * Turns out I was wrong about the untracked cache stuff and had some bugs
>    around untracked directories with nothing but ignored sub-entries.
>  * First patch now is no longer a change of expectation of the untracked
>    cache, but some more thorough testing/verification in that test that
>    helped explain my misunderstanding and uncover the bug in my refactor.
>  * Corrected the check_only and stop_at_first_file logic in the last patch
>    and added a big comment explaining how/why it all works. Also stopped
>    disabling part of the untracked cache in the same patch, and undid all
>    the changes to t7063 in that patch.
>
> Stuff still missing from v4:
>
>  * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in
>    https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/
>    which I think would make the code cleaner & clearer. I guess I'm leaving
>    that for future work.
>
> As per the commit message of the final patch, this series has some risk.
> Extra eyes would be greatly appreciated; one pair already helped me find one
> bug.

I'm glad that I could help you discover mixed expectations. This pair of eyes
is now satisfied with this series to the extent I can check it.

Adding the previous patch to our microsoft/git fork passes the functional tests
in Scalar and VFS for Git, for what it's worth:

[1] https://github.com/microsoft/scalar/pull/358
[2] https://github.com/microsoft/VFSForGit/pull/1646

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive()
  2020-03-27 13:13   ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee
@ 2020-03-28 17:33     ` Elijah Newren
  2020-03-29 18:20       ` Junio C Hamano
  0 siblings, 1 reply; 76+ messages in thread
From: Elijah Newren @ 2020-03-28 17:33 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy

On Fri, Mar 27, 2020 at 6:13 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/26/2020 5:27 PM, Elijah Newren via GitGitGadget wrote:
> > This series provides some "modest" speedups (see last commit message), and
> > should allow 'git status --ignored' to complete in a more reasonable
> > timeframe for Martin Melka (see
> > https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/
> > ).
> >
> > Changes since v3:
> >
> >  * Turns out I was wrong about the untracked cache stuff and had some bugs
> >    around untracked directories with nothing but ignored sub-entries.
> >  * First patch now is no longer a change of expectation of the untracked
> >    cache, but some more thorough testing/verification in that test that
> >    helped explain my misunderstanding and uncover the bug in my refactor.
> >  * Corrected the check_only and stop_at_first_file logic in the last patch
> >    and added a big comment explaining how/why it all works. Also stopped
> >    disabling part of the untracked cache in the same patch, and undid all
> >    the changes to t7063 in that patch.
> >
> > Stuff still missing from v4:
> >
> >  * I didn't make the DIR_KEEP_UNTRACKED_CONTENTS changes I mentioned in
> >    https://lore.kernel.org/git/CABPp-BEQ5s=+6Rnb-A+pdEaoPXxfo-hMSegSe1eai=RE74A3Og@mail.gmail.com/
> >    which I think would make the code cleaner & clearer. I guess I'm leaving
> >    that for future work.
> >
> > As per the commit message of the final patch, this series has some risk.
> > Extra eyes would be greatly appreciated; one pair already helped me find one
> > bug.
>
> I'm glad that I could help you discover mixed expectations. This pair of eyes
> is now satisfied with this series to the extent I can check it.
>
> Adding the previous patch to our microsoft/git fork passes the functional tests
> in Scalar and VFS for Git, for what it's worth:
>
> [1] https://github.com/microsoft/scalar/pull/358
> [2] https://github.com/microsoft/VFSForGit/pull/1646

Thanks, that helps.

An update of my own for this series: Based on Felipe's reported
bash-completion issue I was modifying commands to try out a number of
other things and discovered some cases that can trigger the die("git
ls-files: internal error - directory entry not superset of prefix")
message from ls-files so there's still some fixes I need to make.
Will send an update when I've got it.

^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() 2020-03-28 17:33 ` Elijah Newren @ 2020-03-29 18:20 ` Junio C Hamano 0 siblings, 0 replies; 76+ messages in thread From: Junio C Hamano @ 2020-03-29 18:20 UTC (permalink / raw) To: Elijah Newren Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy Elijah Newren <newren@gmail.com> writes: > An update of my own for this series: Based on Felipe's reported > bash-completion issue I was modifying commands to try out a number of > other things and discovered some cases that can trigger the die("git > ls-files: internal error - directory entry not superset of prefix") > message from ls-files so there's still some fixes I need to make. > Will send an update when I've got it. Thanks. This is uncomfortably exciting ;-) ^ permalink raw reply [flat|nested] 76+ messages in thread
* [PATCH v5 00/12] Avoid multiple recursive calls for same path in read_directory_recursive() 2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget ` (7 preceding siblings ...) 2020-03-27 13:13 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget ` (11 more replies) 8 siblings, 12 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren This series provides some "modest" speedups (see commit message for patch 8), and should allow 'git status --ignored' to complete in a more reasonable timeframe for Martin Melka (see https://lore.kernel.org/git/CANt4O2L_DZnMqVxZzTBMvr=BTWqB6L0uyORkoN_yMHLmUX7yHw@mail.gmail.com/ ). It also cleans up the fill_directory() code and API, and fixes bash-completion for 'git add untracked-dir/'. Changes since v4: * cleanups suggested by Junio (patch 1) * new testcases that would have displayed multiple bugs with v4 (patch 2) * fixed the bugs with v4 (look for LEADING_PATHSPEC in patch 8) * fixed ANOTHER exponential slowdown codepath (look for MODE_MATCHING in patch 8) * make DIR_KEEP_UNTRACKED_CONTENTS less of a weird one-off (patch 9) * reduce number of calls to [do_]match_pathspec() (patch 10) * fix error-proneness of fill_directory() API (patch 11) * fix bash-completion results for 'git add' on an untracked dir (patch 12) This is one of those rare patchsets that is absolutely perfect and risk-free. That's right, bask in their glory and the ease of conscience from using such solid stuff. 
Using this series will even inoculate you from bugs outside of dir.c, and ones external to git, and even bugs external to your computer. It's just that good. Pay no attention to the man behind the curtain, er, I mean the huge warnings in patch 8, er...I mean what warnings? There's no warnings to view, this stuff is solid as can be. But if an extra pair of eyes wants to look at the commit message in patch 8, or at the new patches (2 and 9-12) and opine on how perfect everything looks and feels, be my guest. Derrick Stolee (1): dir: refactor treat_directory to clarify control flow Elijah Newren (11): t7063: more thorough status checking t3000: add more testcases testing a variety of ls-files issues dir: fix simple typo in comment dir: consolidate treat_path() and treat_one_path() dir: fix broken comment dir: fix confusion based on variable tense dir: replace exponential algorithm with a linear one dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() dir: replace double pathspec matching with single in treat_directory() Fix error-prone fill_directory() API; make it only return matches completion: fix 'git add' on paths under an untracked directory builtin/clean.c | 6 - builtin/grep.c | 2 - builtin/ls-files.c | 5 +- builtin/stash.c | 17 +- contrib/completion/git-completion.bash | 2 +- dir.c | 422 +++++++++++++++---------- t/t3000-ls-files-others.sh | 121 +++++++ t/t7063-status-untracked-cache.sh | 52 +++ t/t9902-completion.sh | 5 + wt-status.c | 6 +- 10 files changed, 437 insertions(+), 201 deletions(-) base-commit: 0cbb60574e741e8255ba457606c4c90898cfc755 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-700%2Fnewren%2Ffill-directory-exponential-v5 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-700/newren/fill-directory-exponential-v5 Pull-Request: https://github.com/git/git/pull/700 Range-diff vs v4: 1: 752403e339b ! 1: e2704245854 t7063: more thorough status checking @@ -11,8 +11,10 @@ restructuring.
Unfortunately, it's not easy to run status and tell it to ignore the - untracked cache; the only knob we have it to instruct it to *delete* - (and ignore) the untracked cache. + untracked cache; the only knob we have is core.untrackedCache=false, + which is used to instruct git to *delete* the untracked cache (which + might also ignore the untracked cache when it operates, but that isn't + specified in the docs). Create a simple helper that will create a clone of the index that is missing the untracked cache bits, and use it to compare that the results @@ -33,9 +35,9 @@ +# iuc status --porcelain >expect && +# git status --porcelain >actual && +# test_cmp expect actual -+iuc() { ++iuc () { + git ls-files -s >../current-index-entries -+ git ls-files -t | grep ^S | sed -e s/^S.// >../current-sparse-entries ++ git ls-files -t | sed -ne s/^S.//p >../current-sparse-entries + + GIT_INDEX_FILE=.git/tmp_index + export GIT_INDEX_FILE -: ----------- > 2: 88e9d5d5dbd t3000: add more testcases testing a variety of ls-files issues 2: a4287d690be = 3: 38d4d5a46b1 dir: fix simple typo in comment 3: 48f37e5b114 = 4: eeb38a25f3a dir: consolidate treat_path() and treat_one_path() 4: b5ad1939379 = 5: 6e29f1f6aec dir: fix broken comment 5: 2603c1a9d13 = 6: 62dae938c8f dir: fix confusion based on variable tense 6: 576f364329d = 7: 25921cb792e dir: refactor treat_directory to clarify control flow 7: e20525429e5 ! 8: b2caa426790 dir: replace exponential algorithm with a linear one @@ -187,8 +187,23 @@ - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + /* -+ * We only need to recurse into untracked/ignored directories if -+ * either of the following bits is set: ++ * If we have a pathspec which could match something _below_ this ++ * directory (e.g. when checking 'subdir/' having a pathspec like ++ * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we ++ * need to recurse. 
++ */ ++ if (pathspec) { ++ int ret = do_match_pathspec(istate, pathspec, dirname, len, ++ 0 /* prefix */, NULL /* seen */, ++ DO_MATCH_LEADING_PATHSPEC); ++ if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC) ++ return path_recurse; ++ } ++ ++ /* ++ * Other than the path_recurse case immediately above, we only need ++ * to recurse into untracked/ignored directories if either of the ++ * following bits is set: + * - DIR_SHOW_IGNORED_TOO (because then we need to determine if + * there are ignored directories below) + * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if @@ -197,6 +212,16 @@ + if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) return excluded ? path_excluded : path_untracked; ++ /* ++ * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid ++ * recursing into ignored directories if the path is excluded and ++ * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. ++ */ ++ if (excluded && ++ (dir->flags & DIR_SHOW_IGNORED_TOO) && ++ (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) ++ return path_excluded; ++ + /* + * If we have we don't want to know the all the paths under an + * untracked or ignored directory, we still need to go into the @@ -241,59 +266,52 @@ + + /* There are a variety of reasons we may need to fixup the state... */ + if (state == path_excluded) { -+ int i; -+ -+ /* -+ * When stop_early is set, read_directory_recursive() will -+ * never return path_untracked regardless of whether -+ * underlying paths were untracked or ignored (because -+ * returning early means it excluded some paths, or -+ * something like that -- see commit 5aaa7fd39aaf ("Improve -+ * performance of git status --ignored", 2017-09-18)). 
-+ * However, we're not really concerned with the status of -+ * files under the directory, we just wanted to know -+ * whether the directory was empty (state == path_none) or -+ * not (state == path_excluded), and if not, we'd return -+ * our original status based on whether the untracked -+ * directory matched an exclusion pattern. ++ /* state == path_excluded implies all paths under ++ * dirname were ignored... ++ * ++ * if running e.g. `git status --porcelain --ignored=matching`, ++ * then we want to see the subpaths that are ignored. ++ * ++ * if running e.g. just `git status --porcelain`, then ++ * we just want the directory itself to be listed as ignored ++ * and not the individual paths underneath. + */ -+ if (stop_early) -+ state = excluded ? path_excluded : path_untracked; ++ int want_ignored_subpaths = ++ ((dir->flags & DIR_SHOW_IGNORED_TOO) && ++ (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)); + -+ else { ++ if (want_ignored_subpaths) { + /* -+ * When -+ * !stop_early && state == path_excluded -+ * then all paths under dirname were ignored. For -+ * this case, git status --porcelain wants to just -+ * list the directory itself as ignored and not -+ * list the individual paths underneath. Remove -+ * the individual paths underneath. ++ * with --ignored=matching, we want the subpaths ++ * INSTEAD of the directory itself. + */ ++ state = path_none; ++ } else { ++ int i; + for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i) -+ free(dir->ignored[i]); ++ FREE_AND_NULL(dir->ignored[i]); + dir->ignored_nr = old_ignored_nr; + } + } -+ -+ /* + + /* +- * If this is an excluded directory, then we only need to check if +- * the directory contains any files. + * If there is nothing under the current directory and we are not + * hiding empty directories, then we need to report on the + * untracked or ignored status of the directory itself. 
-+ */ + */ +- return read_directory_recursive(dir, istate, dirname, len, +- untracked, 1, excluded, pathspec); + if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + state = excluded ? path_excluded : path_untracked; - - /* -- * If this is an excluded directory, then we only need to check if -- * the directory contains any files. ++ ++ /* + * We can recurse into untracked directories that don't match any + * of the given pathspecs when some file underneath the directory + * might match one of the pathspecs. If so, we should make sure + * to note that the directory itself did not match. - */ -- return read_directory_recursive(dir, istate, dirname, len, -- untracked, 1, excluded, pathspec); ++ */ + if (pathspec && + !match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL, @@ -316,6 +334,47 @@ strbuf_setlen(path, baselen); if (!cdir->ucd) { strbuf_addstr(path, cdir->file); +@@ + const struct pathspec *pathspec) + { + int has_path_in_index, dtype, excluded; +- enum path_treatment path_treatment; + + if (!cdir->d_name) + return treat_path_fast(dir, untracked, cdir, istate, path, +@@ + default: + return path_none; + case DT_DIR: +- strbuf_addch(path, '/'); +- path_treatment = treat_directory(dir, istate, untracked, +- path->buf, path->len, +- baselen, excluded, pathspec); + /* +- * If 1) we only want to return directories that +- * match an exclude pattern and 2) this directory does +- * not match an exclude pattern but all of its +- * contents are excluded, then indicate that we should +- * recurse into this directory (instead of marking the +- * directory itself as an ignored path). ++ * WARNING: Do not ignore/amend the return value from ++ * treat_directory(), and especially do not change it to return ++ * path_recurse as that can cause exponential slowdown. ++ * Instead, modify treat_directory() to return the right value. 
+ */ +- if (!excluded && +- path_treatment == path_excluded && +- (dir->flags & DIR_SHOW_IGNORED_TOO) && +- (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) +- return path_recurse; +- return path_treatment; ++ strbuf_addch(path, '/'); ++ return treat_directory(dir, istate, untracked, ++ path->buf, path->len, ++ baselen, excluded, pathspec); + case DT_REG: + case DT_LNK: + return excluded ? path_excluded : path_untracked; @@ int stop_at_first_file, const struct pathspec *pathspec) { -: ----------- > 9: 08a10869816 dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() -: ----------- > 10: cee74871e43 dir: replace double pathspec matching with single in treat_directory() -: ----------- > 11: 61d9c9d758e Fix error-prone fill_directory() API; make it only return matches -: ----------- > 12: 725adf0a9b8 completion: fix 'git add' on paths under an untracked directory -- gitgitgadget ^ permalink raw reply [flat|nested] 76+ messages in thread
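A note on the WARNING comment in the range-diff above, which says that overriding treat_directory()'s result with path_recurse "can cause exponential slowdown". The following is a toy model only, not code from dir.c: it counts how many directory visits happen on a chain of nested directories when every level is walked twice (once by the callee, then again because the caller asks to recurse into the same path), versus once. The names walk_twice, walk_once and count_visits are hypothetical and exist only for this illustration.

```sh
# Toy model, not dir.c: "$1" is the number of nested levels still below us.
visits=0

# Every level is walked twice: the callee already recursed, and then the
# caller recurses into the very same path again.
walk_twice () {
	visits=$((visits + 1))
	if [ "$1" -gt 0 ]
	then
		walk_twice $(($1 - 1))	# the callee's own recursion
		walk_twice $(($1 - 1))	# the caller re-recursing into the same path
	fi
}

# Every level is walked exactly once.
walk_once () {
	visits=$((visits + 1))
	if [ "$1" -gt 0 ]
	then
		walk_once $(($1 - 1))
	fi
}

# count_visits <walker> <depth>: reset the counter, run the walker, print the total.
count_visits () {
	visits=0
	"$1" "$2"
	echo "$visits"
}

count_visits walk_twice 10
count_visits walk_once 10
```

In this model the doubled walk costs 2^(depth+1) - 1 visits while the single walk costs depth + 1, which is the general shape of the problem the series title refers to. The real code paths in dir.c are considerably more involved.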
* [PATCH v5 01/12] t7063: more thorough status checking 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues Elijah Newren via GitGitGadget ` (10 subsequent siblings) 11 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> It turns out that t7063 has some testcases that, even without using the untracked cache, cover situations that nothing else in the testsuite handles. Checking the results of git status --porcelain both with and without the untracked cache, and comparing both against our expected results, helped uncover a critical bug in some dir.c restructuring. Unfortunately, it's not easy to run status and tell it to ignore the untracked cache; the only knob we have is core.untrackedCache=false, which is used to instruct git to *delete* the untracked cache (which might also ignore the untracked cache when it operates, but that isn't specified in the docs). Create a simple helper that will create a clone of the index that is missing the untracked cache bits, and use it to verify that the results with the untracked cache match the results we get without the untracked cache.
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t7063-status-untracked-cache.sh | 52 +++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index 190ae149cf3..69c39ff2e49 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -30,6 +30,30 @@ status_is_clean() { test_must_be_empty ../status.actual } +# Ignore_Untracked_Cache, abbreviated to 3 letters because then people can +# compare commands side-by-side, e.g. +# iuc status --porcelain >expect && +# git status --porcelain >actual && +# test_cmp expect actual +iuc () { + git ls-files -s >../current-index-entries + git ls-files -t | sed -ne s/^S.//p >../current-sparse-entries + + GIT_INDEX_FILE=.git/tmp_index + export GIT_INDEX_FILE + git update-index --index-info <../current-index-entries + git update-index --skip-worktree $(cat ../current-sparse-entries) + + git -c core.untrackedCache=false "$@" + ret=$? 
+ + rm ../current-index-entries + rm $GIT_INDEX_FILE + unset GIT_INDEX_FILE + + return $ret +} + test_lazy_prereq UNTRACKED_CACHE ' { git update-index --test-untracked-cache; ret=$?; } && test $ret -ne 1 @@ -95,6 +119,8 @@ test_expect_success 'status first time (empty cache)' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 3 @@ -115,6 +141,8 @@ test_expect_success 'status second time (fully populated cache)' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -136,6 +164,7 @@ test_expect_success 'modify in root directory, one dir invalidation' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -145,6 +174,7 @@ A two ?? four ?? three EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -183,6 +213,7 @@ test_expect_success 'new .gitignore invalidates recursively' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -192,6 +223,7 @@ A two ?? dtwo/ ?? 
three EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -230,6 +262,7 @@ test_expect_success 'new info/exclude invalidates everything' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -237,6 +270,7 @@ A two ?? .gitignore ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -286,6 +320,7 @@ test_expect_success 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -293,6 +328,7 @@ A one ?? dtwo/ ?? two EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -343,6 +379,7 @@ test_expect_success 'status after the move' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && A done/one A one @@ -350,6 +387,7 @@ A two ?? .gitignore ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -390,10 +428,12 @@ test_expect_success 'status after commit' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && ?? .gitignore ?? 
dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && cat >../trace.expect <<EOF && node creation: 0 @@ -447,12 +487,14 @@ test_expect_success 'test sparse status with untracked cache' ' avoid_racy && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two ?? .gitignore ?? done/five ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && node creation: 0 @@ -487,12 +529,14 @@ test_expect_success 'test sparse status again with untracked cache' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two ?? .gitignore ?? done/five ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && node creation: 0 @@ -514,6 +558,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two ?? .gitignore @@ -521,6 +566,7 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' ?? done/sub/ ?? 
dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && node creation: 2 @@ -560,6 +606,8 @@ test_expect_success 'test sparse status again with untracked cache and subdir' ' : >../trace && GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && cat >../trace.expect <<EOF && node creation: 0 @@ -573,6 +621,7 @@ EOF test_expect_success 'move entry in subdir from untracked to cached' ' git add dtwo/two && git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two A dtwo/two @@ -580,12 +629,14 @@ A dtwo/two ?? done/five ?? done/sub/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual ' test_expect_success 'move entry in subdir from cached to untracked' ' git rm --cached dtwo/two && git status --porcelain >../status.actual && + iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && M done/two ?? .gitignore @@ -593,6 +644,7 @@ test_expect_success 'move entry in subdir from cached to untracked' ' ?? done/sub/ ?? dtwo/ EOF + test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
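An aside on the iuc helper in the patch above: between v4 and v5 the pipeline "grep ^S | sed -e s/^S.//" was folded into a single "sed -ne s/^S.//p". The sketch below shows that the two forms select and strip the same lines. The sample input is made up for illustration (it only imitates the "tag, space, path" shape of "git ls-files -t" output, where S marks skip-worktree entries), and the function names are hypothetical.

```sh
# Hypothetical sample in the shape of "git ls-files -t" output:
# a one-letter status tag, a space, then the path.
sample_entries () {
	printf '%s\n' 'H done/one' 'S done/two' 'H one' 'S dtwo/two'
}

# v4 form: keep the S lines with grep, then strip the tag with sed.
sparse_with_grep () {
	grep '^S' | sed -e 's/^S.//'
}

# v5 form: -n suppresses sed's default output, and the p flag prints
# only those lines on which the substitution actually happened.
sparse_with_sed () {
	sed -ne 's/^S.//p'
}

sample_entries | sparse_with_grep
sample_entries | sparse_with_sed
```

Both pipelines print only the paths of the S-tagged lines; the sed-only form simply saves a process.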
* [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 03/12] dir: fix simple typo in comment Elijah Newren via GitGitGadget ` (9 subsequent siblings) 11 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> This adds seven new ls-files tests. While currently all seven tests pass, my earlier rounds of restructuring dir.c to replace an exponential algorithm with a linear one passed all the tests in the testsuite but failed six of these seven new tests. Add these tests to increase our case coverage.
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t3000-ls-files-others.sh | 121 +++++++++++++++++++++++++++++++++++++ 1 file changed, 121 insertions(+) diff --git a/t/t3000-ls-files-others.sh b/t/t3000-ls-files-others.sh index 0aefadacb05..ffdfb16f580 100755 --- a/t/t3000-ls-files-others.sh +++ b/t/t3000-ls-files-others.sh @@ -91,4 +91,125 @@ test_expect_success SYMLINKS 'ls-files --others with symlinked submodule' ' test_cmp expect actual ' +test_expect_success 'setup nested pathspec search' ' + test_create_repo nested && + ( + cd nested && + + mkdir -p partially_tracked/untracked_dir && + > partially_tracked/content && + > partially_tracked/untracked_dir/file && + + mkdir -p untracked/deep && + > untracked/deep/path && + > untracked/deep/foo.c && + + git add partially_tracked/content + ) +' + +test_expect_success 'ls-files -o --directory with single deep dir pathspec' ' + ( + cd nested && + + git ls-files -o --directory untracked/deep/ >actual && + + cat <<-EOF >expect && + untracked/deep/ + EOF + + test_cmp expect actual + ) +' + +test_expect_success 'ls-files -o --directory with multiple dir pathspecs' ' + ( + cd nested && + + git ls-files -o --directory partially_tracked/ untracked/ >actual && + + cat <<-EOF >expect && + partially_tracked/untracked_dir/ + untracked/ + EOF + + test_cmp expect actual + ) +' + +test_expect_success 'ls-files -o --directory with mix dir/file pathspecs' ' + ( + cd nested && + + git ls-files -o --directory partially_tracked/ untracked/deep/path >actual && + + cat <<-EOF >expect && + partially_tracked/untracked_dir/ + untracked/deep/path + EOF + + test_cmp expect actual + ) +' + +test_expect_success 'ls-files --o --directory with glob filetype match' ' + ( + cd nested && + + # globs kinda defeat --directory, but only for that pathspec + git ls-files --others --directory partially_tracked "untracked/*.c" >actual && + + cat <<-EOF >expect && + partially_tracked/untracked_dir/ + untracked/deep/foo.c + EOF + + test_cmp expect actual 
+ ) +' + +test_expect_success 'ls-files --o --directory with mix of tracked states' ' + ( + cd nested && + + # globs kinda defeat --directory, but only for that pathspec + git ls-files --others --directory partially_tracked/ "untracked/?*" >actual && + + cat <<-EOF >expect && + partially_tracked/untracked_dir/ + untracked/deep/ + EOF + + test_cmp expect actual + ) +' + +test_expect_success 'ls-files --o --directory with glob filetype match only' ' + ( + cd nested && + + git ls-files --others --directory "untracked/*.c" >actual && + + cat <<-EOF >expect && + untracked/deep/foo.c + EOF + + test_cmp expect actual + ) +' + +test_expect_success 'ls-files --o --directory to get immediate paths under one dir only' ' + ( + cd nested && + + git ls-files --others --directory "untracked/?*" >actual && + + cat <<-EOF >expect && + untracked/deep/ + EOF + + test_cmp expect actual + ) +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
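For readers who want to poke at the scenarios in the tests above outside the test harness, here is a minimal sketch that rebuilds the same "nested" layout in a throwaway repository. It assumes git and mktemp are available; make_nested_repo is a hypothetical helper name, not part of the patch.

```sh
# Recreate the layout from the 'setup nested pathspec search' test in a
# temporary repository and print its location.
make_nested_repo () {
	repo=$(mktemp -d) &&
	git init -q "$repo" &&
	(
		cd "$repo" &&
		mkdir -p partially_tracked/untracked_dir untracked/deep &&
		: >partially_tracked/content &&
		: >partially_tracked/untracked_dir/file &&
		: >untracked/deep/path &&
		: >untracked/deep/foo.c &&
		git add partially_tracked/content
	) &&
	echo "$repo"
}

repo=$(make_nested_repo) &&
# Every untracked file, one per line.
git -C "$repo" ls-files -o &&
# With --directory, wholly untracked directories are collapsed to one entry.
git -C "$repo" ls-files -o --directory partially_tracked/ untracked/
```

The second command corresponds to the 'multiple dir pathspecs' test, which expects partially_tracked/untracked_dir/ and untracked/ as the only two entries. Swapping in the other pathspecs from the patch (for example "untracked/*.c" or "untracked/?*") is an easy way to see how globs interact with --directory.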
* [PATCH v5 03/12] dir: fix simple typo in comment 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget ` (8 subsequent siblings) 11 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index b460211e614..b505ba747bb 100644 --- a/dir.c +++ b/dir.c @@ -2174,7 +2174,7 @@ static void add_path_to_appropriate_result_list(struct dir_struct *dir, * If 'stop_at_first_file' is specified, 'path_excluded' is returned * to signal that a file was found. This is the least significant value that * indicates that a file was encountered that does not depend on the order of - * whether an untracked or exluded path was encountered first. + * whether an untracked or excluded path was encountered first. * * Returns the most significant path_treatment value encountered in the scan. * If 'stop_at_first_file' is specified, `path_excluded` is the most -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2020-04-01 4:17 ` [PATCH v5 03/12] dir: fix simple typo in comment Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 05/12] dir: fix broken comment Elijah Newren via GitGitGadget ` (7 subsequent siblings) 11 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Commit 16e2cfa90993 ("read_directory(): further split treat_path()", 2010-01-08) split treat_one_path() out of treat_path(), because treat_leading_path() would not have access to a dirent but wanted to re-use as much of treat_path() as possible. Not re-using all of treat_path() caused other bugs, as noted in commit b9670c1f5e6b ("dir: fix checks on common prefix directory", 2019-12-19). Finally, in commit ad6f2157f951 ("dir: restructure in a way to avoid passing around a struct dirent", 2020-01-16), dirents were removed from treat_path() and other functions entirely. Since the only reason for splitting these functions was the lack of a dirent -- which no longer applies to either function -- and since the split caused problems in the past resulting in us not using treat_one_path() separately anymore, just undo the split. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 121 ++++++++++++++++++++++++++-------------------------------- 1 file changed, 55 insertions(+), 66 deletions(-) diff --git a/dir.c b/dir.c index b505ba747bb..d0f3d660850 100644 --- a/dir.c +++ b/dir.c @@ -1863,21 +1863,65 @@ static int resolve_dtype(int dtype, struct index_state *istate, return dtype; } -static enum path_treatment treat_one_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec, - int dtype) -{ - int exclude; - int has_path_in_index = !!index_file_exists(istate, path->buf, path->len, ignore_case); +static enum path_treatment treat_path_fast(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + strbuf_setlen(path, baselen); + if (!cdir->ucd) { + strbuf_addstr(path, cdir->file); + return path_untracked; + } + strbuf_addstr(path, cdir->ucd->name); + /* treat_one_path() does this before it calls treat_directory() */ + strbuf_complete(path, '/'); + if (cdir->ucd->check_only) + /* + * check_only is set as a result of treat_directory() getting + * to its bottom. Verify again the same set of directories + * with check_only set. + */ + return read_directory_recursive(dir, istate, path->buf, path->len, + cdir->ucd, 1, 0, pathspec); + /* + * We get path_recurse in the first run when + * directory_exists_in_index() returns index_nonexistent. We + * are sure that new changes in the index does not impact the + * outcome. Return now. 
+ */ + return path_recurse; +} + +static enum path_treatment treat_path(struct dir_struct *dir, + struct untracked_cache_dir *untracked, + struct cached_dir *cdir, + struct index_state *istate, + struct strbuf *path, + int baselen, + const struct pathspec *pathspec) +{ + int has_path_in_index, dtype, exclude; enum path_treatment path_treatment; - dtype = resolve_dtype(dtype, istate, path->buf, path->len); + if (!cdir->d_name) + return treat_path_fast(dir, untracked, cdir, istate, path, + baselen, pathspec); + if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) + return path_none; + strbuf_setlen(path, baselen); + strbuf_addstr(path, cdir->d_name); + if (simplify_away(path->buf, path->len, pathspec)) + return path_none; + + dtype = resolve_dtype(cdir->d_type, istate, path->buf, path->len); /* Always exclude indexed files */ + has_path_in_index = !!index_file_exists(istate, path->buf, path->len, + ignore_case); if (dtype != DT_DIR && has_path_in_index) return path_none; @@ -1942,61 +1986,6 @@ static enum path_treatment treat_one_path(struct dir_struct *dir, } } -static enum path_treatment treat_path_fast(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - strbuf_setlen(path, baselen); - if (!cdir->ucd) { - strbuf_addstr(path, cdir->file); - return path_untracked; - } - strbuf_addstr(path, cdir->ucd->name); - /* treat_one_path() does this before it calls treat_directory() */ - strbuf_complete(path, '/'); - if (cdir->ucd->check_only) - /* - * check_only is set as a result of treat_directory() getting - * to its bottom. Verify again the same set of directories - * with check_only set. - */ - return read_directory_recursive(dir, istate, path->buf, path->len, - cdir->ucd, 1, 0, pathspec); - /* - * We get path_recurse in the first run when - * directory_exists_in_index() returns index_nonexistent. 
We - * are sure that new changes in the index does not impact the - * outcome. Return now. - */ - return path_recurse; -} - -static enum path_treatment treat_path(struct dir_struct *dir, - struct untracked_cache_dir *untracked, - struct cached_dir *cdir, - struct index_state *istate, - struct strbuf *path, - int baselen, - const struct pathspec *pathspec) -{ - if (!cdir->d_name) - return treat_path_fast(dir, untracked, cdir, istate, path, - baselen, pathspec); - if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git")) - return path_none; - strbuf_setlen(path, baselen); - strbuf_addstr(path, cdir->d_name); - if (simplify_away(path->buf, path->len, pathspec)) - return path_none; - - return treat_one_path(dir, untracked, istate, path, baselen, pathspec, - cdir->d_type); -} - static void add_untracked(struct untracked_cache_dir *dir, const char *name) { if (!dir) -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v5 05/12] dir: fix broken comment 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2020-04-01 4:17 ` [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 06/12] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget ` (6 subsequent siblings) 11 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index d0f3d660850..3a367683661 100644 --- a/dir.c +++ b/dir.c @@ -2259,7 +2259,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, add_untracked(untracked, path.buf + baselen); break; } - /* skip the dir_add_* part */ + /* skip the add_path_to_appropriate_result_list() */ continue; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v5 06/12] dir: fix confusion based on variable tense 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget ` (4 preceding siblings ...) 2020-04-01 4:17 ` [PATCH v5 05/12] dir: fix broken comment Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget ` (5 subsequent siblings) 11 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Despite having contributed several fixes in this area, I have for months (years?) assumed that the "exclude" variable was a directive; this caused me to think of it as a different mode we operate in and left me confused as I tried to build up a mental model around why we'd need such a directive. I mostly tried to ignore it while focusing on the pieces I was trying to understand. Then I finally traced this variable all back to a call to is_excluded(), meaning it was actually functioning as an adjective. In particular, it was a checked property ("Does this path match a rule in .gitignore?"), rather than a mode passed in from the caller. Change the variable name to match the part of speech used by the function called to define it, which will hopefully make these bits of code slightly clearer to the next reader. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/dir.c b/dir.c index 3a367683661..8074e651e6f 100644 --- a/dir.c +++ b/dir.c @@ -1656,7 +1656,7 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, static enum path_treatment treat_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked, - const char *dirname, int len, int baselen, int exclude, + const char *dirname, int len, int baselen, int excluded, const struct pathspec *pathspec) { int nested_repo = 0; @@ -1679,13 +1679,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } if (nested_repo) return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (exclude ? path_excluded : path_untracked)); + (excluded ? path_excluded : path_untracked)); if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) break; - if (exclude && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { + if (excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { /* * This is an excluded directory and we are @@ -1713,7 +1713,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); @@ -1723,7 +1723,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * the directory contains any files. 
*/ return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, exclude, pathspec); + untracked, 1, excluded, pathspec); } /* @@ -1904,7 +1904,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { - int has_path_in_index, dtype, exclude; + int has_path_in_index, dtype, excluded; enum path_treatment path_treatment; if (!cdir->d_name) @@ -1949,13 +1949,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, (directory_exists_in_index(istate, path->buf, path->len) == index_nonexistent)) return path_none; - exclude = is_excluded(dir, istate, path->buf, &dtype); + excluded = is_excluded(dir, istate, path->buf, &dtype); /* * Excluded? If we don't explicitly want to show * ignored files, ignore it */ - if (exclude && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) + if (excluded && !(dir->flags & (DIR_SHOW_IGNORED|DIR_SHOW_IGNORED_TOO))) return path_excluded; switch (dtype) { @@ -1965,7 +1965,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, strbuf_addch(path, '/'); path_treatment = treat_directory(dir, istate, untracked, path->buf, path->len, - baselen, exclude, pathspec); + baselen, excluded, pathspec); /* * If 1) we only want to return directories that * match an exclude pattern and 2) this directory does @@ -1974,7 +1974,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, * recurse into this directory (instead of marking the * directory itself as an ignored path). */ - if (!exclude && + if (!excluded && path_treatment == path_excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) @@ -1982,7 +1982,7 @@ static enum path_treatment treat_path(struct dir_struct *dir, return path_treatment; case DT_REG: case DT_LNK: - return exclude ? path_excluded : path_untracked; + return excluded ? path_excluded : path_untracked; } } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget ` (5 preceding siblings ...) 2020-04-01 4:17 ` [PATCH v5 06/12] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Derrick Stolee via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget ` (4 subsequent siblings) 11 siblings, 0 replies; 76+ messages in thread From: Derrick Stolee via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The logic in treat_directory() is handled by a multi-case switch
statement, but this switch is very asymmetrical, as the first two
cases are simple but the third is more complicated than the rest of
the method. In fact, the third case includes a "break" statement that
leads to the block of code outside the switch statement. That is the
only way to reach that block, as the switch handles all possible
values from directory_exists_in_index().

Extract the switch statement into a series of "if" statements. This
simplifies the trivial cases, while clarifying how to reach the
"show_other_directories" case. This is particularly important as the
"show_other_directories" case will expand in a later change.

Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 35 +++++++++++++++++------------------ 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/dir.c b/dir.c index 8074e651e6f..d9bcb7e19b6 100644 --- a/dir.c +++ b/dir.c @@ -1660,29 +1660,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir, const struct pathspec *pathspec) { int nested_repo = 0; - /* The "len-1" is to strip the final '/' */ - switch (directory_exists_in_index(istate, dirname, len-1)) { - case index_directory: - return path_recurse; + enum exist_status status = directory_exists_in_index(istate, dirname, len-1); - case index_gitdir: + if (status == index_directory) + return path_recurse; + if (status == index_gitdir) return path_none; + if (status != index_nonexistent) + BUG("Unhandled value for directory_exists_in_index: %d\n", status); - case index_nonexistent: - if ((dir->flags & DIR_SKIP_NESTED_GIT) || - !(dir->flags & DIR_NO_GITLINKS)) { - struct strbuf sb = STRBUF_INIT; - strbuf_addstr(&sb, dirname); - nested_repo = is_nonbare_repository_dir(&sb); - strbuf_release(&sb); - } - if (nested_repo) - return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : - (excluded ? path_excluded : path_untracked)); + if ((dir->flags & DIR_SKIP_NESTED_GIT) || + !(dir->flags & DIR_NO_GITLINKS)) { + struct strbuf sb = STRBUF_INIT; + strbuf_addstr(&sb, dirname); + nested_repo = is_nonbare_repository_dir(&sb); + strbuf_release(&sb); + } + if (nested_repo) + return ((dir->flags & DIR_SKIP_NESTED_GIT) ? path_none : + (excluded ? path_excluded : path_untracked)); - if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES) - break; + if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) { if (excluded && (dir->flags & DIR_SHOW_IGNORED_TOO) && (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) { -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
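The shape of the refactoring described in this commit message can be sketched in isolation. The following is only a minimal illustration, not the code from dir.c: the enums are simplified stand-ins that reuse the names from the patch, classify_dir() is a hypothetical function, and assert() stands in for git's BUG().

```c
#include <assert.h>

/* Simplified stand-ins for the enums used in dir.c (illustrative only). */
enum exist_status { index_nonexistent, index_directory, index_gitdir };
enum path_treatment { path_none, path_recurse, path_excluded, path_untracked };

/*
 * Hypothetical helper showing the "early return" layout: the two trivial
 * cases leave immediately, so everything below them is known to be the
 * index_nonexistent case and no "break" out of a switch is needed.
 */
enum path_treatment classify_dir(enum exist_status status, int excluded)
{
	if (status == index_directory)
		return path_recurse;
	if (status == index_gitdir)
		return path_none;
	/* Stands in for the BUG() call in the patch. */
	assert(status == index_nonexistent);

	/* Formerly the block reached only via "break" from the third case. */
	return excluded ? path_excluded : path_untracked;
}
```

With this layout, code added later for the index_nonexistent case goes at the end of the function instead of being split between a switch arm and the block that follows it.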
* [PATCH v5 08/12] dir: replace exponential algorithm with a linear one 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget ` (6 preceding siblings ...) 2020-04-01 4:17 ` [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 13:57 ` Derrick Stolee 2020-04-01 4:17 ` [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Elijah Newren via GitGitGadget ` (3 subsequent siblings) 11 siblings, 1 reply; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

dir's read_directory_recursive() naturally operates recursively in
order to walk the directory tree. Treatment of directories is
sometimes weird because there are so many different permutations of
how to handle directories. Some examples:

  * 'git ls-files -o --directory' only needs to know that a directory
    itself is untracked; it doesn't need to recurse into it to see what
    is underneath.

  * 'git status' needs to recurse into an untracked directory, but only
    to determine whether or not it is empty. If there are no files
    underneath, the directory itself will be omitted from the output.
    If it is not empty, only the directory will be listed.

  * 'git status --ignored' needs to recurse into untracked directories
    and report all the ignored entries and then report the directory as
    untracked -- UNLESS all the entries under the directory are ignored,
    in which case we don't print any of the entries under the directory
    and just report the directory itself as ignored. (Note that
    although this forces us to walk all untracked files underneath the
    directory as well, we strip them from the output, except for users
    like 'git clean' who also set DIR_KEEP_UNTRACKED_CONTENTS.)

  * For 'git clean', we may need to recurse into a directory that
    doesn't match any specified pathspecs, if it's possible that there
    is an entry underneath the directory that can match one of the
    pathspecs. In such a case, we need to be careful to omit the
    directory itself from the list of paths (see commit 404ebceda01c
    ("dir: also check directories for matching pathspecs", 2019-09-17))

Part of the tension noted above is that the treatment of a directory
can change based on the files within it, and based on the various
settings in dir->flags. Trying to keep this in mind while reading over
the code, it is easy to think in terms of "treat_directory() tells us
what to do with a directory, and read_directory_recursive() is the
thing that recurses". Since we need to look into a directory to know
how to treat it, though, it is quite easy to decide to (also) recurse
into the directory from treat_directory() by adding a
read_directory_recursive() call. Adding such a call is actually fine,
IF we make sure that read_directory_recursive() does not also recurse
into that same directory.

Unfortunately, commit df5bcdf83aeb ("dir: recurse into untracked dirs
for ignored files", 2017-05-18), added exactly such a case to the code,
meaning we'd have two calls to read_directory_recursive() for an
untracked directory. So, if we had a file named
   one/two/three/four/five/somefile.txt
and nothing in one/ was tracked, then 'git status --ignored' would
call read_directory_recursive() twice on the directory 'one/', and
each of those would call read_directory_recursive() twice on the
directory 'one/two/', and so on until read_directory_recursive() was
called 2^5 times for 'one/two/three/four/five/'.

Avoid calling read_directory_recursive() twice per level by moving a
lot of the special logic into treat_directory().

Since dir.c is somewhat complex, extra cruft built up around this over
time.
While trying to unravel it, I noticed several instances where the first call to read_directory_recursive() would return e.g. path_untracked for some directory and a later one would return e.g. path_none, despite the fact that the directory clearly should have been considered untracked. The code happened to work due to the side-effect from the first invocation of adding untracked entries to dir->entries; this allowed it to get the correct output despite the supposed override in return value by the later call. I am somewhat concerned that there are still bugs and maybe even testcases with the wrong expectation. I have tried to carefully document treat_directory() since it becomes more complex after this change (though much of this complexity came from elsewhere that probably deserved better comments to begin with). However, much of my work felt more like a game of whackamole while attempting to make the code match the existing regression tests than an attempt to create an implementation that matched some clear design. That seems wrong to me, but the rules of existing behavior had so many special cases that I had a hard time coming up with some overarching rules about what correct behavior is for all cases, forcing me to hope that the regression tests are correct and sufficient. 
Such a hope seems likely to be ill-founded, given my experience with
dir.c-related testcases in the last few months:

  Examples where the documentation was hard to parse or even just wrong:
   * 3aca58045f4f (git-clean.txt: do not claim we will delete files with
                   -n/--dry-run, 2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)

  Examples where testcases were declared wrong and changed:
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * a2b13367fe55 (Revert "dir.c: make 'git-status --ignored' work within
                   leading directories", 2019-12-10)

  Examples where testcases were clearly inadequate:
   * 502c386ff944 (t7300-clean: demonstrate deleting nested repo with an
                   ignored file breakage, 2019-08-25)
   * 7541cc530239 (t7300: add testcases showing failure to clean specified
                   pathspecs, 2019-09-17)
   * a5e916c7453b (dir: fix off-by-one error in match_pathspec_item,
                   2019-09-17)
   * 404ebceda01c (dir: also check directories for matching pathspecs,
                   2019-09-17)
   * 09487f2cbad3 (clean: avoid removing untracked files in a nested git
                   repository, 2019-09-17)
   * e86bbcf987fa (clean: disambiguate the definition of -d, 2019-09-17)
   * 452efd11fbf6 (t3011: demonstrate directory traversal failures,
                   2019-12-10)
   * b9670c1f5e6b (dir: fix checks on common prefix directory, 2019-12-19)

  Examples where "correct behavior" was unclear to everyone:
    https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/

  Other commits of note:
   * 902b90cf42bc (clean: fix theoretical path corruption, 2019-09-17)

However, on the positive side, it does make the code much faster.
For the following simple shell loop in an empty repository:

    for depth in $(seq 10 25)
    do
      dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
      rm -rf dir
      mkdir -p $dirs
      >$dirs/untracked-file
      /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
    done

I saw the following timings, in seconds (note that the numbers are a
little noisy from run-to-run, but the trend is very clear with every
run):

    10: 0.03
    11: 0.05
    12: 0.08
    13: 0.19
    14: 0.29
    15: 0.50
    16: 1.05
    17: 2.11
    18: 4.11
    19: 8.60
    20: 17.55
    21: 33.87
    22: 68.71
    23: 140.05
    24: 274.45
    25: 551.15

For the above run, using strace I can look for the number of untracked
directories opened and can verify that it matches the expected
2^($depth+1)-2 (the sum of 2^1 + 2^2 + 2^3 + ... + 2^$depth).

After this fix, with strace I can verify that the number of untracked
directories that are opened drops to just $depth, and the timings all
drop to 0.00. In fact, it isn't until a depth of 190 nested
directories that it sometimes starts reporting a time of 0.01 seconds
and doesn't consistently report 0.01 seconds until there are 240 nested
directories. The previous code would have taken
  17.55 * 2^220 / (60*60*24*365) = 9.4 * 10^59 YEARS
to have completed the 240 nested directories case. It's not often
that you get to speed something up by a factor of 3*10^69.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c | 210 ++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 147 insertions(+), 63 deletions(-)

diff --git a/dir.c b/dir.c
index d9bcb7e19b6..1b3c095b5a4 100644
--- a/dir.c
+++ b/dir.c
@@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	const char *dirname, int len, int baselen, int excluded,
 	const struct pathspec *pathspec)
 {
-	int nested_repo = 0;
+	/*
+	 * WARNING: From this function, you can return path_recurse or you
+	 *          can call read_directory_recursive() (or neither), but
+	 *          you CAN'T DO BOTH.
+ */ + enum path_treatment state; + int nested_repo = 0, old_ignored_nr, check_only, stop_early; /* The "len-1" is to strip the final '/' */ enum exist_status status = directory_exists_in_index(istate, dirname, len-1); @@ -1711,18 +1717,135 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* This is the "show_other_directories" case */ - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + /* + * If we have a pathspec which could match something _below_ this + * directory (e.g. when checking 'subdir/' having a pathspec like + * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we + * need to recurse. + */ + if (pathspec) { + int ret = do_match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL /* seen */, + DO_MATCH_LEADING_PATHSPEC); + if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC) + return path_recurse; + } + + /* + * Other than the path_recurse case immediately above, we only need + * to recurse into untracked/ignored directories if either of the + * following bits is set: + * - DIR_SHOW_IGNORED_TOO (because then we need to determine if + * there are ignored directories below) + * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if + * the directory is empty) + */ + if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) return excluded ? path_excluded : path_untracked; + /* + * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid + * recursing into ignored directories if the path is excluded and + * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. + */ + if (excluded && + (dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) + return path_excluded; + + /* + * If we have we don't want to know the all the paths under an + * untracked or ignored directory, we still need to go into the + * directory to determine if it is empty (because an empty directory + * should be path_none instead of path_excluded or path_untracked). 
+ */ + check_only = ((dir->flags & DIR_HIDE_EMPTY_DIRECTORIES) && + !(dir->flags & DIR_SHOW_IGNORED_TOO)); + + /* + * However, there's another optimization possible as a subset of + * check_only, based on the cases we have to consider: + * A) Directory matches no exclude patterns: + * * Directory is empty => path_none + * * Directory has an untracked file under it => path_untracked + * * Directory has only ignored files under it => path_excluded + * B) Directory matches an exclude pattern: + * * Directory is empty => path_none + * * Directory has an untracked file under it => path_excluded + * * Directory has only ignored files under it => path_excluded + * In case A, we can exit as soon as we've found an untracked + * file but otherwise have to walk all files. In case B, though, + * we can stop at the first file we find under the directory. + */ + stop_early = check_only && excluded; + + /* + * If /every/ file within an untracked directory is ignored, then + * we want to treat the directory as ignored (for e.g. status + * --porcelain), without listing the individual ignored files + * underneath. To do so, we'll save the current ignored_nr, and + * pop all the ones added after it if it turns out the entire + * directory is ignored. + */ + old_ignored_nr = dir->ignored_nr; + + /* Actually recurse into dirname now, we'll fixup the state later. */ untracked = lookup_untracked(dir->untracked, untracked, dirname + baselen, len - baselen); + state = read_directory_recursive(dir, istate, dirname, len, untracked, + check_only, stop_early, pathspec); + + /* There are a variety of reasons we may need to fixup the state... */ + if (state == path_excluded) { + /* state == path_excluded implies all paths under + * dirname were ignored... + * + * if running e.g. `git status --porcelain --ignored=matching`, + * then we want to see the subpaths that are ignored. + * + * if running e.g. 
just `git status --porcelain`, then + * we just want the directory itself to be listed as ignored + * and not the individual paths underneath. + */ + int want_ignored_subpaths = + ((dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)); + + if (want_ignored_subpaths) { + /* + * with --ignored=matching, we want the subpaths + * INSTEAD of the directory itself. + */ + state = path_none; + } else { + int i; + for (i = old_ignored_nr + 1; i<dir->ignored_nr; ++i) + FREE_AND_NULL(dir->ignored[i]); + dir->ignored_nr = old_ignored_nr; + } + } /* - * If this is an excluded directory, then we only need to check if - * the directory contains any files. + * If there is nothing under the current directory and we are not + * hiding empty directories, then we need to report on the + * untracked or ignored status of the directory itself. */ - return read_directory_recursive(dir, istate, dirname, len, - untracked, 1, excluded, pathspec); + if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + state = excluded ? path_excluded : path_untracked; + + /* + * We can recurse into untracked directories that don't match any + * of the given pathspecs when some file underneath the directory + * might match one of the pathspecs. If so, we should make sure + * to note that the directory itself did not match. + */ + if (pathspec && + !match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL, + 0 /* do NOT special case dirs */)) + state = path_none; + + return state; } /* @@ -1870,6 +1993,11 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir, int baselen, const struct pathspec *pathspec) { + /* + * WARNING: From this function, you can return path_recurse or you + * can call read_directory_recursive() (or neither), but + * you CAN'T DO BOTH. 
+ */ strbuf_setlen(path, baselen); if (!cdir->ucd) { strbuf_addstr(path, cdir->file); @@ -1904,7 +2032,6 @@ static enum path_treatment treat_path(struct dir_struct *dir, const struct pathspec *pathspec) { int has_path_in_index, dtype, excluded; - enum path_treatment path_treatment; if (!cdir->d_name) return treat_path_fast(dir, untracked, cdir, istate, path, @@ -1961,24 +2088,16 @@ static enum path_treatment treat_path(struct dir_struct *dir, default: return path_none; case DT_DIR: - strbuf_addch(path, '/'); - path_treatment = treat_directory(dir, istate, untracked, - path->buf, path->len, - baselen, excluded, pathspec); /* - * If 1) we only want to return directories that - * match an exclude pattern and 2) this directory does - * not match an exclude pattern but all of its - * contents are excluded, then indicate that we should - * recurse into this directory (instead of marking the - * directory itself as an ignored path). + * WARNING: Do not ignore/amend the return value from + * treat_directory(), and especially do not change it to return + * path_recurse as that can cause exponential slowdown. + * Instead, modify treat_directory() to return the right value. */ - if (!excluded && - path_treatment == path_excluded && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) - return path_recurse; - return path_treatment; + strbuf_addch(path, '/'); + return treat_directory(dir, istate, untracked, + path->buf, path->len, + baselen, excluded, pathspec); case DT_REG: case DT_LNK: return excluded ? path_excluded : path_untracked; @@ -2175,14 +2294,10 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, int stop_at_first_file, const struct pathspec *pathspec) { /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in treat_leading_path(). 
See the commit message for the - * commit adding this warning as well as the commit preceding it - * for details. + * WARNING: Do NOT recurse unless path_recurse is returned from + * treat_path(). Recursing on any other return value + * can result in exponential slowdown. */ - struct cached_dir cdir; enum path_treatment state, subdir_state, dir_state = path_none; struct strbuf path = STRBUF_INIT; @@ -2204,13 +2319,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, dir_state = state; /* recurse into subdir if instructed by treat_path */ - if ((state == path_recurse) || - ((state == path_untracked) && - (resolve_dtype(cdir.d_type, istate, path.buf, path.len) == DT_DIR) && - ((dir->flags & DIR_SHOW_IGNORED_TOO) || - (pathspec && - do_match_pathspec(istate, pathspec, path.buf, path.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) { + if (state == path_recurse) { struct untracked_cache_dir *ud; ud = lookup_untracked(dir->untracked, untracked, path.buf + baselen, @@ -2294,15 +2403,6 @@ static int treat_leading_path(struct dir_struct *dir, const char *path, int len, const struct pathspec *pathspec) { - /* - * WARNING WARNING WARNING: - * - * Any updates to the traversal logic here may need corresponding - * updates in read_directory_recursive(). See 777b420347 (dir: - * synchronize treat_leading_path() and read_directory_recursive(), - * 2019-12-19) and its parent commit for details. 
- */ - struct strbuf sb = STRBUF_INIT; struct strbuf subdir = STRBUF_INIT; int prevlen, baselen; @@ -2353,23 +2453,7 @@ static int treat_leading_path(struct dir_struct *dir, strbuf_reset(&subdir); strbuf_add(&subdir, path+prevlen, baselen-prevlen); cdir.d_name = subdir.buf; - state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, - pathspec); - if (state == path_untracked && - resolve_dtype(cdir.d_type, istate, sb.buf, sb.len) == DT_DIR && - (dir->flags & DIR_SHOW_IGNORED_TOO || - do_match_pathspec(istate, pathspec, sb.buf, sb.len, - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) { - if (!match_pathspec(istate, pathspec, sb.buf, sb.len, - 0 /* prefix */, NULL, - 0 /* do NOT special case dirs */)) - state = path_none; - add_path_to_appropriate_result_list(dir, NULL, &cdir, - istate, - &sb, baselen, - pathspec, state); - state = path_recurse; - } + state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec); if (state != path_recurse) break; /* do not recurse into it */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
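The counting argument from the commit message above (2^($depth+1)-2 directory opens before the fix, $depth after it, and the extrapolation to 240 nested directories) can be checked with a small toy model. This is only a sketch under stated assumptions: it does not touch the filesystem or git, the function names are hypothetical, and the "two descents per level" behaviour is modelled by a plain loop.

```c
#include <assert.h>

/*
 * Toy model, not git code: a chain of nested untracked directories in
 * which every visit of a directory descends into its single
 * subdirectory `calls_per_level` times.  Each descent counts as one
 * "directory open".
 */
static unsigned long long opens_below(int levels_left, int calls_per_level)
{
	unsigned long long total = 0;
	int i;

	if (levels_left == 0)
		return 0;
	for (i = 0; i < calls_per_level; i++)
		total += 1 + opens_below(levels_left - 1, calls_per_level);
	return total;
}

/* calls_per_level == 2 models the old code, 1 models the fixed code. */
unsigned long long count_dir_opens(int depth, int calls_per_level)
{
	assert(depth >= 0 && calls_per_level >= 1);
	return opens_below(depth, calls_per_level);
}

/*
 * Extrapolate a measured time by doubling it once per extra level and
 * convert seconds to years, as in the commit message's
 * 17.55 * 2^220 / (60*60*24*365) estimate.
 */
double extrapolated_years(double measured_seconds, int extra_levels)
{
	double seconds = measured_seconds;
	int i;

	for (i = 0; i < extra_levels; i++)
		seconds *= 2.0;
	return seconds / (60.0 * 60.0 * 24.0 * 365.0);
}
```

In this model count_dir_opens(depth, 2) grows as 2^(depth+1)-2 while count_dir_opens(depth, 1) is just depth, and extrapolated_years(17.55, 220) lands at roughly 9.4 * 10^59, in line with the figures quoted above.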
* Re: [PATCH v5 08/12] dir: replace exponential algorithm with a linear one 2020-04-01 4:17 ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget @ 2020-04-01 13:57 ` Derrick Stolee 2020-04-01 15:59 ` Elijah Newren 0 siblings, 1 reply; 76+ messages in thread From: Derrick Stolee @ 2020-04-01 13:57 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Elijah Newren

On 4/1/2020 12:17 AM, Elijah Newren via GitGitGadget wrote:
> @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  	const char *dirname, int len, int baselen, int excluded,
>  	const struct pathspec *pathspec)
>  {
> -	int nested_repo = 0;
> +	/*
> +	 * WARNING: From this function, you can return path_recurse or you
> +	 *          can call read_directory_recursive() (or neither), but
> +	 *          you CAN'T DO BOTH.
> +	 */
> +	enum path_treatment state;
> +	int nested_repo = 0, old_ignored_nr, check_only, stop_early;
>  	/* The "len-1" is to strip the final '/' */
>  	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
>
> @@ -1711,18 +1717,135 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>
>  	/* This is the "show_other_directories" case */
>
> -	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
> +	/*
> +	 * If we have a pathspec which could match something _below_ this
> +	 * directory (e.g. when checking 'subdir/' having a pathspec like
> +	 * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we
> +	 * need to recurse.

I was extremely skeptical about this approach due to leading wildcards
like "*.c" or "sub*/*.h" but found this comment inside match_pathspec_item():

	/*
	 * Here is where we would perform a wildmatch to check if
	 * "name" can be matched as a directory (or a prefix) against
	 * the pathspec.  Since wildmatch doesn't have this capability
	 * at the present we have to punt and say that it is a match,
	 * potentially returning a false positive
	 * The submodules themselves will be able to perform more
	 * accurate matching to determine if the pathspec matches.
	 */
	return MATCHED_RECURSIVELY_LEADING_PATHSPEC;

So it looks like it will return MATCHED_RECURSIVELY_LEADING_PATHSPEC as
expected by this block below:

> +	 */
> +	if (pathspec) {
> +		int ret = do_match_pathspec(istate, pathspec, dirname, len,
> +					    0 /* prefix */, NULL /* seen */,
> +					    DO_MATCH_LEADING_PATHSPEC);
> +		if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC)
> +			return path_recurse;
> +	}

I can't say that I fully understand the change to this patch yet, but at
least my initial "THAT CAN'T POSSIBLY WORK!" reaction has been tempered.

Thanks,
-Stolee

^ permalink raw reply [flat|nested] 76+ messages in thread
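The "punt and say that it is a match" behaviour discussed in this reply can be illustrated with a deliberately conservative prefix check. This is a hypothetical sketch, not git's do_match_pathspec() or its matching rules: could_match_below() and is_glob_char() are made-up names, and only '*', '?' and '[' are treated as wildcard characters here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: only these characters count as wildcards in this sketch. */
static int is_glob_char(char c)
{
	return c == '*' || c == '?' || c == '[';
}

/*
 * Hypothetical helper: could `pathspec` match something underneath the
 * directory `dirname` (which ends in '/')?  The answer is conservative:
 * once a wildcard is reached before any literal mismatch, return 1 and
 * accept a possible false positive, mirroring the "punt" described in
 * the quoted comment.
 */
int could_match_below(const char *pathspec, const char *dirname)
{
	size_t i;

	assert(pathspec && dirname);
	for (i = 0; dirname[i]; i++) {
		if (!pathspec[i])
			return 0;	/* pathspec is shorter than dirname */
		if (is_glob_char(pathspec[i]))
			return 1;	/* cannot decide cheaply: punt */
		if (pathspec[i] != dirname[i])
			return 0;	/* literal mismatch */
	}
	/* dirname is a literal prefix; "below" needs something after it. */
	return pathspec[i] != '\0';
}
```

A false positive here only costs an extra directory walk, whereas a false negative would skip paths the user asked for, which is why the sketch errs on the side of recursing.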
* Re: [PATCH v5 08/12] dir: replace exponential algorithm with a linear one 2020-04-01 13:57 ` Derrick Stolee @ 2020-04-01 15:59 ` Elijah Newren 0 siblings, 0 replies; 76+ messages in thread From: Elijah Newren @ 2020-04-01 15:59 UTC (permalink / raw) To: Derrick Stolee Cc: Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy On Wed, Apr 1, 2020 at 6:57 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 4/1/2020 12:17 AM, Elijah Newren via GitGitGadget wrote: > > @@ -1659,7 +1659,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > > const char *dirname, int len, int baselen, int excluded, > > const struct pathspec *pathspec) > > { > > - int nested_repo = 0; > > + /* > > + * WARNING: From this function, you can return path_recurse or you > > + * can call read_directory_recursive() (or neither), but > > + * you CAN'T DO BOTH. > > + */ > > + enum path_treatment state; > > + int nested_repo = 0, old_ignored_nr, check_only, stop_early; > > /* The "len-1" is to strip the final '/' */ > > enum exist_status status = directory_exists_in_index(istate, dirname, len-1); > > > > @@ -1711,18 +1717,135 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > > > > /* This is the "show_other_directories" case */ > > > > - if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) > > + /* > > + * If we have a pathspec which could match something _below_ this > > + * directory (e.g. when checking 'subdir/' having a pathspec like > > + * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we > > + * need to recurse. > > I was extremely skeptical about this approach due to leading wildcards > like "*.c" or "sub*/*.h" but found this comment inside math_pathspec_item(): > > /* > * Here is where we would perform a wildmatch to check if > * "name" can be matched as a directory (or a prefix) against > * the pathspec. 
Since wildmatch doesn't have this capability > * at the present we have to punt and say that it is a match, > * potentially returning a false positive > * The submodules themselves will be able to perform more > * accurate matching to determine if the pathspec matches. > */ > return MATCHED_RECURSIVELY_LEADING_PATHSPEC; > > So it looks like it will return MATCHED_RECURSIVELY_LEADING_PATHSPEC as > expected by this block below: > > > + */ > > + if (pathspec) { > > + int ret = do_match_pathspec(istate, pathspec, dirname, len, > > + 0 /* prefix */, NULL /* seen */, > > + DO_MATCH_LEADING_PATHSPEC); > > + if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC) > > + return path_recurse; > > + } > > I can't say that I fully understand the change to this patch yet, but at > least my initial "THAT CAN'T POSSIBLY WORK!" reaction has been tempered. I don't know if it helps you feel better about this block or not, but it existed (in just slightly modified form) in dir.c before patch 7; I just missed it when I was restructuring and thus didn't have it in my first four rounds of this series. (Funnily enough, I was the one who added this LEADING_PATHSPEC logic to dir.c a while back, and you'd think if I was going to overlook some code when I was restructuring, that it surely couldn't be bits that I had added myself.) So, that basically means that dir.c has been relying on this logic for some time, and I just needed to make sure to include it in this restructuring. Elijah ^ permalink raw reply [flat|nested] 76+ messages in thread
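The "could this pathspec match something below the directory?" question discussed above can be illustrated with a small standalone sketch. This is not the code in dir.c: the enum, the function name sketch_match_dir() and the simplified wildcard handling (only '*', '?' and '[' are treated as glob characters) are all assumptions made for illustration. Like the comment quoted from the pathspec code, it punts and reports a possible match as soon as a wildcard appears inside the part of the pathspec that would have to cover the directory name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes; the names only loosely echo the thread. */
enum sketch_match {
	SKETCH_NO_MATCH = 0,
	SKETCH_MATCHED,         /* the directory itself is covered */
	SKETCH_MATCHED_LEADING  /* something below it might match: recurse */
};

/*
 * Decide whether `pathspec` covers `dirname` or could match a path
 * below it. `dirname` must end in '/'.
 */
static enum sketch_match sketch_match_dir(const char *dirname,
					  const char *pathspec)
{
	size_t dlen = strlen(dirname);
	size_t plen = strlen(pathspec);
	size_t lit = strcspn(pathspec, "*?[");  /* literal prefix length */
	int has_wild = pathspec[lit] != '\0';

	assert(dlen > 0 && dirname[dlen - 1] == '/');

	if (has_wild && lit < dlen) {
		/*
		 * The wildcard starts inside the directory part, so we
		 * cannot tell for sure. Only the literal prefix is
		 * checked; anything else is a (possibly false) positive.
		 */
		if (strncmp(dirname, pathspec, lit))
			return SKETCH_NO_MATCH;
		return SKETCH_MATCHED_LEADING;
	}

	if (plen >= dlen) {
		/* The pathspec names the directory or something deeper. */
		if (strncmp(dirname, pathspec, dlen))
			return SKETCH_NO_MATCH;
		return plen == dlen ? SKETCH_MATCHED : SKETCH_MATCHED_LEADING;
	}

	/* Shorter, wildcard-free pathspec: a parent dir or "dir" without '/'. */
	if (!strncmp(dirname, pathspec, plen) &&
	    (plen == 0 || pathspec[plen - 1] == '/' || dirname[plen] == '/'))
		return SKETCH_MATCHED;
	return SKETCH_NO_MATCH;
}
```

With this sketch, both 'subdir/some/deep/path/file' and 'sub*/*.h' report SKETCH_MATCHED_LEADING for 'subdir/', which is the situation where the patch returns path_recurse.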
* [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget ` (7 preceding siblings ...) 2020-04-01 4:17 ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 10/12] dir: replace double pathspec matching with single " Elijah Newren via GitGitGadget ` (2 subsequent siblings) 11 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Handling DIR_KEEP_UNTRACKED_CONTENTS within treat_directory() instead of as a post-processing step in read_directory(): * allows us to directly access and remove the relevant entries instead of needing to calculate which ones need to be removed * keeps the logic for directory handling in one location (and puts it closer to the logic for stripping out extra ignored entries, which seems logical). Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 43 +++++++++++++++++++------------------------ 1 file changed, 19 insertions(+), 24 deletions(-) diff --git a/dir.c b/dir.c index 1b3c095b5a4..8be31df58c2 100644 --- a/dir.c +++ b/dir.c @@ -1665,7 +1665,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * you CAN'T DO BOTH. 
*/ enum path_treatment state; - int nested_repo = 0, old_ignored_nr, check_only, stop_early; + int nested_repo = 0, check_only, stop_early; + int old_ignored_nr, old_untracked_nr; /* The "len-1" is to strip the final '/' */ enum exist_status status = directory_exists_in_index(istate, dirname, len-1); @@ -1785,9 +1786,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * --porcelain), without listing the individual ignored files * underneath. To do so, we'll save the current ignored_nr, and * pop all the ones added after it if it turns out the entire - * directory is ignored. + * directory is ignored. Also, when DIR_SHOW_IGNORED_TOO and + * !DIR_KEEP_UNTRACKED_CONTENTS then we don't want to show + * untracked paths so will need to pop all those off the last + * after we traverse. */ old_ignored_nr = dir->ignored_nr; + old_untracked_nr = dir->nr; /* Actually recurse into dirname now, we'll fixup the state later. */ untracked = lookup_untracked(dir->untracked, untracked, @@ -1825,6 +1830,18 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } } + /* + * We may need to ignore some of the untracked paths we found while + * traversing subdirectories. + */ + if ((dir->flags & DIR_SHOW_IGNORED_TOO) && + !(dir->flags & DIR_KEEP_UNTRACKED_CONTENTS)) { + int i; + for (i = old_untracked_nr + 1; i<dir->nr; ++i) + FREE_AND_NULL(dir->entries[i]); + dir->nr = old_untracked_nr; + } + /* * If there is nothing under the current directory and we are not * hiding empty directories, then we need to report on the @@ -2653,28 +2670,6 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, QSORT(dir->entries, dir->nr, cmp_dir_entry); QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry); - /* - * If DIR_SHOW_IGNORED_TOO is set, read_directory_recursive() will - * also pick up untracked contents of untracked dirs; by default - * we discard these, but given DIR_KEEP_UNTRACKED_CONTENTS we do not. 
- */ - if ((dir->flags & DIR_SHOW_IGNORED_TOO) && - !(dir->flags & DIR_KEEP_UNTRACKED_CONTENTS)) { - int i, j; - - /* remove from dir->entries untracked contents of untracked dirs */ - for (i = j = 0; j < dir->nr; j++) { - if (i && - check_dir_entry_contains(dir->entries[i - 1], dir->entries[j])) { - FREE_AND_NULL(dir->entries[j]); - } else { - dir->entries[i++] = dir->entries[j]; - } - } - - dir->nr = i; - } - trace_performance_leave("read directory %.*s", len, path); if (dir->untracked) { static int force_untracked_cache = -1; -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
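The idea in this patch (remember how many entries existed before descending, then pop whatever the traversal appended) can be shown in isolation. The following is a minimal sketch under stated assumptions: struct sketch_list and the sketch_*() helpers are hypothetical and are not git's dir_struct API, and the sketch frees every entry from the saved index onward rather than copying the loop from the diff.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of owned strings (not git's dir_struct). */
struct sketch_list {
	char **entries;
	size_t nr, alloc;
};

static void sketch_push(struct sketch_list *l, const char *name)
{
	size_t len = strlen(name) + 1;

	if (l->nr == l->alloc) {
		l->alloc = l->alloc ? 2 * l->alloc : 4;
		l->entries = realloc(l->entries,
				     l->alloc * sizeof(*l->entries));
		if (!l->entries)
			abort();
	}
	l->entries[l->nr] = malloc(len);
	if (!l->entries[l->nr])
		abort();
	memcpy(l->entries[l->nr], name, len);
	l->nr++;
}

/* Drop (and free) everything appended since `old_nr` was recorded. */
static void sketch_truncate(struct sketch_list *l, size_t old_nr)
{
	size_t i;

	assert(old_nr <= l->nr);
	for (i = old_nr; i < l->nr; i++) {
		free(l->entries[i]);
		l->entries[i] = NULL;
	}
	l->nr = old_nr;
}

/*
 * Record the children of `dir`; unless `keep_contents` is set, collapse
 * them afterwards into a single entry for the directory itself.
 */
static void sketch_visit_dir(struct sketch_list *l, const char *dir,
			     const char **children, size_t n,
			     int keep_contents)
{
	size_t old_nr = l->nr;  /* saved before the "traversal" */
	size_t i;

	for (i = 0; i < n; i++)
		sketch_push(l, children[i]);

	if (!keep_contents) {
		sketch_truncate(l, old_nr);
		sketch_push(l, dir);
	}
}
```

Because the count is saved right where the traversal starts, no later pass has to work out which entries live under which directory, which is the first benefit the commit message lists.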
* [PATCH v5 10/12] dir: replace double pathspec matching with single in treat_directory() 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget ` (8 preceding siblings ...) 2020-04-01 4:17 ` [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory Elijah Newren via GitGitGadget 11 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> treat_directory() had a call to both do_match_pathspec() and match_pathspec(). These calls have migrated through the code somewhat since their introduction, but we don't actually need both. Replace the two calls with one, and while at it, move the check earlier in order to reduce the need for callers of fill_directory() to do post-filtering of results. The next patch will address post-filtering more forcefully and provide more relevant history and context. Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/dir.c b/dir.c index 8be31df58c2..a67930dcff6 100644 --- a/dir.c +++ b/dir.c @@ -1665,6 +1665,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * you CAN'T DO BOTH. 
*/ enum path_treatment state; + int matches_how = 0; int nested_repo = 0, check_only, stop_early; int old_ignored_nr, old_untracked_nr; /* The "len-1" is to strip the final '/' */ @@ -1677,6 +1678,22 @@ static enum path_treatment treat_directory(struct dir_struct *dir, if (status != index_nonexistent) BUG("Unhandled value for directory_exists_in_index: %d\n", status); + /* + * We don't want to descend into paths that don't match the necessary + * patterns. Clearly, if we don't have a pathspec, then we can't check + * for matching patterns. Also, if (excluded) then we know we matched + * the exclusion patterns so as an optimization we can skip checking + * for matching patterns. + */ + if (pathspec && !excluded) { + matches_how = do_match_pathspec(istate, pathspec, dirname, len, + 0 /* prefix */, NULL /* seen */, + DO_MATCH_LEADING_PATHSPEC); + if (!matches_how) + return path_none; + } + + if ((dir->flags & DIR_SKIP_NESTED_GIT) || !(dir->flags & DIR_NO_GITLINKS)) { struct strbuf sb = STRBUF_INIT; @@ -1724,13 +1741,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir, * 'subdir/some/deep/path/file' or 'subdir/widget-*.c'), then we * need to recurse. */ - if (pathspec) { - int ret = do_match_pathspec(istate, pathspec, dirname, len, - 0 /* prefix */, NULL /* seen */, - DO_MATCH_LEADING_PATHSPEC); - if (ret == MATCHED_RECURSIVELY_LEADING_PATHSPEC) - return path_recurse; - } + if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC) + return path_recurse; /* * Other than the path_recurse case immediately above, we only need @@ -1850,18 +1862,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir, if (state == path_none && !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) state = excluded ? path_excluded : path_untracked; - /* - * We can recurse into untracked directories that don't match any - * of the given pathspecs when some file underneath the directory - * might match one of the pathspecs. 
If so, we should make sure - * to note that the directory itself did not match. - */ - if (pathspec && - !match_pathspec(istate, pathspec, dirname, len, - 0 /* prefix */, NULL, - 0 /* do NOT special case dirs */)) - state = path_none; - return state; } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
* [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget ` (9 preceding siblings ...) 2020-04-01 4:17 ` [PATCH v5 10/12] dir: replace double pathspec matching with single " Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 2020-07-19 6:33 ` Andreas Schwab 2020-04-01 4:17 ` [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory Elijah Newren via GitGitGadget 11 siblings, 1 reply; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Traditionally, the expected calling convention for the dir.c API was: fill_directory(&dir, ..., pathspec) foreach entry in dir->entries: if (dir_path_match(entry, pathspec)) process_or_display(entry) This may have made sense once upon a time, because the fill_directory() call could use cheap checks to avoid doing full pathspec matching, and an external caller may have wanted to do other post-processing of the results anyway. However: * this structure makes it easy for users of the API to get it wrong * this structure actually makes it harder to understand fill_directory() and the functions it uses internally. It has tripped me up several times while trying to fix bugs and restructure things. 
* relying on post-filtering was already found to produce wrong results; pathspec matching had to be added internally for multiple cases in order to get the right results (see commits 404ebceda01c (dir: also check directories for matching pathspecs, 2019-09-17) and 89a1f4aaf765 (dir: if our pathspec might match files under a dir, recurse into it, 2019-09-17)) * it's bad for performance: fill_directory() already has to do lots of checks and knows the subset of cases where it still needs to do more checks. Forcing external callers to do full pathspec matching means they must re-check _every_ path. So, add the pathspec matching within the fill_directory() internals, and remove it from external callers. Signed-off-by: Elijah Newren <newren@gmail.com> --- builtin/clean.c | 6 ------ builtin/grep.c | 2 -- builtin/ls-files.c | 5 +++-- builtin/stash.c | 17 +++++------------ dir.c | 9 ++++++++- wt-status.c | 6 ++---- 6 files changed, 18 insertions(+), 27 deletions(-) diff --git a/builtin/clean.c b/builtin/clean.c index 5abf087e7c4..b189b7b4ea0 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -989,12 +989,6 @@ int cmd_clean(int argc, const char **argv, const char *prefix) if (!cache_name_is_other(ent->name, ent->len)) continue; - if (pathspec.nr) - matches = dir_path_match(&the_index, ent, &pathspec, 0, NULL); - - if (pathspec.nr && !matches) - continue; - if (lstat(ent->name, &st)) die_errno("Cannot lstat '%s'", ent->name); diff --git a/builtin/grep.c b/builtin/grep.c index 50ce8d94612..f3425102999 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -691,8 +691,6 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec, fill_directory(&dir, opt->repo->index, pathspec); for (i = 0; i < dir.nr; i++) { - if (!dir_path_match(opt->repo->index, dir.entries[i], pathspec, 0, NULL)) - continue; hit |= grep_file(opt, dir.entries[i]->name); if (hit && opt->status_only) break; diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 
f069a028cea..b87c22ac240 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -128,8 +128,9 @@ static void show_dir_entry(const struct index_state *istate, if (len > ent->len) die("git ls-files: internal error - directory entry not superset of prefix"); - if (!dir_path_match(istate, ent, &pathspec, len, ps_matched)) - return; + /* If ps_matches is non-NULL, figure out which pathspec(s) match. */ + if (ps_matched) + dir_path_match(istate, ent, &pathspec, len, ps_matched); fputs(tag, stdout); write_eolinfo(istate, NULL, ent->name); diff --git a/builtin/stash.c b/builtin/stash.c index 4ad3adf4ba5..704740b245c 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -856,30 +856,23 @@ static int get_untracked_files(const struct pathspec *ps, int include_untracked, struct strbuf *untracked_files) { int i; - int max_len; int found = 0; - char *seen; struct dir_struct dir; memset(&dir, 0, sizeof(dir)); if (include_untracked != INCLUDE_ALL_FILES) setup_standard_excludes(&dir); - seen = xcalloc(ps->nr, 1); - - max_len = fill_directory(&dir, the_repository->index, ps); + fill_directory(&dir, the_repository->index, ps); for (i = 0; i < dir.nr; i++) { struct dir_entry *ent = dir.entries[i]; - if (dir_path_match(&the_index, ent, ps, max_len, seen)) { - found++; - strbuf_addstr(untracked_files, ent->name); - /* NUL-terminate: will be fed to update-index -z */ - strbuf_addch(untracked_files, '\0'); - } + found++; + strbuf_addstr(untracked_files, ent->name); + /* NUL-terminate: will be fed to update-index -z */ + strbuf_addch(untracked_files, '\0'); free(ent); } - free(seen); free(dir.entries); free(dir.ignored); clear_directory(&dir); diff --git a/dir.c b/dir.c index a67930dcff6..2de64910401 100644 --- a/dir.c +++ b/dir.c @@ -2117,7 +2117,14 @@ static enum path_treatment treat_path(struct dir_struct *dir, baselen, excluded, pathspec); case DT_REG: case DT_LNK: - return excluded ? 
path_excluded : path_untracked; + if (excluded) + return path_excluded; + if (pathspec && + !do_match_pathspec(istate, pathspec, path->buf, path->len, + 0 /* prefix */, NULL /* seen */, + 0 /* flags */)) + return path_none; + return path_untracked; } } diff --git a/wt-status.c b/wt-status.c index cc6f94504d9..98dfa6f73f9 100644 --- a/wt-status.c +++ b/wt-status.c @@ -722,16 +722,14 @@ static void wt_status_collect_untracked(struct wt_status *s) for (i = 0; i < dir.nr; i++) { struct dir_entry *ent = dir.entries[i]; - if (index_name_is_other(istate, ent->name, ent->len) && - dir_path_match(istate, ent, &s->pathspec, 0, NULL)) + if (index_name_is_other(istate, ent->name, ent->len)) string_list_insert(&s->untracked, ent->name); free(ent); } for (i = 0; i < dir.ignored_nr; i++) { struct dir_entry *ent = dir.ignored[i]; - if (index_name_is_other(istate, ent->name, ent->len) && - dir_path_match(istate, ent, &s->pathspec, 0, NULL)) + if (index_name_is_other(istate, ent->name, ent->len)) string_list_insert(&s->ignored, ent->name); free(ent); } -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
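The change of calling convention described in this commit message can be sketched as follows. The names sketch_matches() and sketch_fill() are hypothetical, and a plain leading-directory prefix stands in for a real pathspec; the point is only that the producer filters once, so the consumer's loop no longer needs its own match call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pathspec matching: a plain prefix test. */
static int sketch_matches(const char *name, const char *prefix)
{
	return !prefix || !strncmp(name, prefix, strlen(prefix));
}

/*
 * Copy into `out` only the candidates that match `prefix` and return how
 * many were kept. `out` must have room for `n` pointers. Callers can use
 * every returned entry directly, without re-checking it.
 */
static size_t sketch_fill(const char **candidates, size_t n,
			  const char *prefix, const char **out)
{
	size_t i, kept = 0;

	assert(candidates && out);
	for (i = 0; i < n; i++)
		if (sketch_matches(candidates[i], prefix))
			out[kept++] = candidates[i];
	return kept;
}
```

A caller then iterates over the first `kept` entries of `out` and processes each one, mirroring how the hunks above delete the dir_path_match() calls from the loops in builtin/clean.c, builtin/grep.c and wt-status.c.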
* Re: [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches
  2020-04-01 4:17 ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Elijah Newren via GitGitGadget
@ 2020-07-19 6:33 ` Andreas Schwab
  2020-07-19 12:39 ` Martin Ågren
  0 siblings, 1 reply; 76+ messages in thread
From: Andreas Schwab @ 2020-07-19 6:33 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren

This breaks git status --ignored.

$ ./git status --porcelain --ignored -- a
!! abspath.o
!! add-interactive.o
!! add-patch.o
!! advice.o
!! alias.o
!! alloc.o
!! apply.o
!! archive-tar.o
!! archive-zip.o
!! archive.o
!! argv-array.o
!! attr.o

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches 2020-07-19 6:33 ` Andreas Schwab @ 2020-07-19 12:39 ` Martin Ågren 2020-07-20 15:25 ` Elijah Newren 0 siblings, 1 reply; 76+ messages in thread From: Martin Ågren @ 2020-07-19 12:39 UTC (permalink / raw) To: Andreas Schwab Cc: Elijah Newren via GitGitGadget, git, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren On Sun, 19 Jul 2020 at 08:37, Andreas Schwab <schwab@linux-m68k.org> wrote: > > This breaks git status --ignored. > > $ ./git status --porcelain --ignored -- a > !! abspath.o > !! add-interactive.o ... > !! attr.o Thanks for bisecting. This is 95c11ecc73 ("Fix error-prone fill_directory() API; make it only return matches", 2020-04-01). I wonder if the below makes any sense. It seems to fix this usage and the tests pass, but I have no idea what else this might be breaking... Maybe Elijah has an idea whether this is roughly the right approach? Looking at the commit in question (95c11ecc73), there must have been some reason that it injected the pathspec check between the "path_excluded" and the "path_untracked" cases. The diff below basically undoes that split, so I have a feeling I'm missing something. 
Martin diff --git a/dir.c b/dir.c index 1045cc9c6f..fe64be30ed 100644 --- a/dir.c +++ b/dir.c @@ -2209,13 +2209,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, baselen, excluded, pathspec); case DT_REG: case DT_LNK: - if (excluded) - return path_excluded; if (pathspec && !match_pathspec(istate, pathspec, path->buf, path->len, 0 /* prefix */, NULL /* seen */, 0 /* is_dir */)) return path_none; + if (excluded) + return path_excluded; return path_untracked; } } diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh index e4cf5484f9..2f9bea9793 100755 --- a/t/t7061-wtstatus-ignore.sh +++ b/t/t7061-wtstatus-ignore.sh @@ -30,6 +30,31 @@ test_expect_success 'same with gitignore starting with BOM' ' test_cmp expected actual ' +test_expect_success 'status untracked files --ignored with pathspec (no match)' ' + git status --porcelain --ignored -- untracked/i >actual && + test_must_be_empty actual && + git status --porcelain --ignored -- untracked/u >actual && + test_must_be_empty actual +' + +test_expect_success 'status untracked files --ignored with pathspec (literal match)' ' + git status --porcelain --ignored -- untracked/ignored >actual && + echo "!! untracked/ignored" >expected && + test_cmp expected actual && + git status --porcelain --ignored -- untracked/uncommitted >actual && + echo "?? untracked/uncommitted" >expected && + test_cmp expected actual +' + +test_expect_success 'status untracked files --ignored with pathspec (glob match)' ' + git status --porcelain --ignored -- untracked/i\* >actual && + echo "!! untracked/ignored" >expected && + test_cmp expected actual && + git status --porcelain --ignored -- untracked/u\* >actual && + echo "?? untracked/uncommitted" >expected && + test_cmp expected actual +' + cat >expected <<\EOF ?? .gitignore ?? actual -- 2.28.0.rc1.7.g31f2d237fa ^ permalink raw reply related [flat|nested] 76+ messages in thread
* Re: [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches 2020-07-19 12:39 ` Martin Ågren @ 2020-07-20 15:25 ` Elijah Newren 2020-07-20 18:45 ` [PATCH] dir: check pathspecs before returning `path_excluded` Martin Ågren 2020-07-20 18:58 ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Junio C Hamano 0 siblings, 2 replies; 76+ messages in thread From: Elijah Newren @ 2020-07-20 15:25 UTC (permalink / raw) To: Martin Ågren Cc: Andreas Schwab, Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee Hi, On Sun, Jul 19, 2020 at 5:39 AM Martin Ågren <martin.agren@gmail.com> wrote: > > On Sun, 19 Jul 2020 at 08:37, Andreas Schwab <schwab@linux-m68k.org> wrote: > > > > This breaks git status --ignored. > > > > $ ./git status --porcelain --ignored -- a > > !! abspath.o > > !! add-interactive.o > ... > > !! attr.o > > Thanks for bisecting. This is 95c11ecc73 ("Fix error-prone > fill_directory() API; make it only return matches", 2020-04-01). > > I wonder if the below makes any sense. It seems to fix this usage and > the tests pass, but I have no idea what else this might be breaking... > > Maybe Elijah has an idea whether this is roughly the right approach? > Looking at the commit in question (95c11ecc73), there must have been > some reason that it injected the pathspec check between the > "path_excluded" and the "path_untracked" cases. The diff below > basically undoes that split, so I have a feeling I'm missing something. Awesome, thanks Andreas for the bisected report and Martin for finding and fixing the bug. As for the reason that the old patch injected the pathspec check between the path_excluded and the path_untracked cases, that appears to me to just be "I'm good at making boneheaded mistakes". Your changes here are the right fix. 
As a separate optimization, we could maybe make simplify_away() a bit more involved and have it exclude a few more paths so that fewer make it to this final check, but that's just optimization work that is separate from your fix here. Reviewed-by: Elijah Newren <newren@gmail.com> > Martin > > diff --git a/dir.c b/dir.c > index 1045cc9c6f..fe64be30ed 100644 > --- a/dir.c > +++ b/dir.c > @@ -2209,13 +2209,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, > baselen, excluded, pathspec); > case DT_REG: > case DT_LNK: > - if (excluded) > - return path_excluded; > if (pathspec && > !match_pathspec(istate, pathspec, path->buf, path->len, > 0 /* prefix */, NULL /* seen */, > 0 /* is_dir */)) > return path_none; > + if (excluded) > + return path_excluded; > return path_untracked; > } > } > diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh > index e4cf5484f9..2f9bea9793 100755 > --- a/t/t7061-wtstatus-ignore.sh > +++ b/t/t7061-wtstatus-ignore.sh > @@ -30,6 +30,31 @@ test_expect_success 'same with gitignore starting with BOM' ' > test_cmp expected actual > ' > > +test_expect_success 'status untracked files --ignored with pathspec (no match)' ' > + git status --porcelain --ignored -- untracked/i >actual && > + test_must_be_empty actual && > + git status --porcelain --ignored -- untracked/u >actual && > + test_must_be_empty actual > +' > + > +test_expect_success 'status untracked files --ignored with pathspec (literal match)' ' > + git status --porcelain --ignored -- untracked/ignored >actual && > + echo "!! untracked/ignored" >expected && > + test_cmp expected actual && > + git status --porcelain --ignored -- untracked/uncommitted >actual && > + echo "?? untracked/uncommitted" >expected && > + test_cmp expected actual > +' > + > +test_expect_success 'status untracked files --ignored with pathspec (glob match)' ' > + git status --porcelain --ignored -- untracked/i\* >actual && > + echo "!! 
untracked/ignored" >expected && > + test_cmp expected actual && > + git status --porcelain --ignored -- untracked/u\* >actual && > + echo "?? untracked/uncommitted" >expected && > + test_cmp expected actual > +' > + > cat >expected <<\EOF > ?? .gitignore > ?? actual > -- > 2.28.0.rc1.7.g31f2d237fa > ^ permalink raw reply [flat|nested] 76+ messages in thread
* [PATCH] dir: check pathspecs before returning `path_excluded` 2020-07-20 15:25 ` Elijah Newren @ 2020-07-20 18:45 ` Martin Ågren 2020-07-20 18:49 ` Elijah Newren 2020-07-20 20:25 ` Junio C Hamano 2020-07-20 18:58 ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Junio C Hamano 1 sibling, 2 replies; 76+ messages in thread From: Martin Ågren @ 2020-07-20 18:45 UTC (permalink / raw) To: Elijah Newren Cc: Andreas Schwab, Elijah Newren via GitGitGadget, git, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee On Mon, 20 Jul 2020 at 17:25, Elijah Newren <newren@gmail.com> wrote: > > Awesome, thanks Andreas for the bisected report and Martin for finding > and fixing the bug. As for the reason that the old patch injected the > pathspec check between the path_excluded and the path_untracked cases, > that appears to me to just be "I'm good at making boneheaded > mistakes". Your changes here are the right fix. Ok, here it is as a proper patch. > Reviewed-by: Elijah Newren <newren@gmail.com> Thanks. I've included your reviewed-by below. The log message is obviously new, but the diff is identical to what I posted earlier. BTW, this bug first appeared in v2.27.0, so this is not a regression during the v2.28.0 cycle. Martin -- >8 -- In 95c11ecc73 ("Fix error-prone fill_directory() API; make it only return matches", 2020-04-01), we taught `fill_directory()`, or more specifically `treat_path()`, to check against any pathspecs so that we could simplify the callers. But in doing so, we added a slightly-to-early return for the "excluded" case. We end up not checking the pathspecs, meaning we return `path_excluded` when maybe we should return `path_none`. As a result, `git status --ignored -- pathspec` might show paths that don't actually match "pathspec". Move the "excluded" check down to after we've checked any pathspecs. 
Reported-by: Andreas Schwab <schwab@linux-m68k.org> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> --- dir.c | 4 ++-- t/t7061-wtstatus-ignore.sh | 25 +++++++++++++++++++++++++ 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/dir.c b/dir.c index 1045cc9c6f..fe64be30ed 100644 --- a/dir.c +++ b/dir.c @@ -2209,13 +2209,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, baselen, excluded, pathspec); case DT_REG: case DT_LNK: - if (excluded) - return path_excluded; if (pathspec && !match_pathspec(istate, pathspec, path->buf, path->len, 0 /* prefix */, NULL /* seen */, 0 /* is_dir */)) return path_none; + if (excluded) + return path_excluded; return path_untracked; } } diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh index e4cf5484f9..2f9bea9793 100755 --- a/t/t7061-wtstatus-ignore.sh +++ b/t/t7061-wtstatus-ignore.sh @@ -30,6 +30,31 @@ test_expect_success 'same with gitignore starting with BOM' ' test_cmp expected actual ' +test_expect_success 'status untracked files --ignored with pathspec (no match)' ' + git status --porcelain --ignored -- untracked/i >actual && + test_must_be_empty actual && + git status --porcelain --ignored -- untracked/u >actual && + test_must_be_empty actual +' + +test_expect_success 'status untracked files --ignored with pathspec (literal match)' ' + git status --porcelain --ignored -- untracked/ignored >actual && + echo "!! untracked/ignored" >expected && + test_cmp expected actual && + git status --porcelain --ignored -- untracked/uncommitted >actual && + echo "?? untracked/uncommitted" >expected && + test_cmp expected actual +' + +test_expect_success 'status untracked files --ignored with pathspec (glob match)' ' + git status --porcelain --ignored -- untracked/i\* >actual && + echo "!! untracked/ignored" >expected && + test_cmp expected actual && + git status --porcelain --ignored -- untracked/u\* >actual && + echo "?? 
untracked/uncommitted" >expected && + test_cmp expected actual +' + cat >expected <<\EOF ?? .gitignore ?? actual -- 2.28.0.rc1 ^ permalink raw reply related [flat|nested] 76+ messages in thread
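The effect of moving the "excluded" check can be reproduced in a few lines. This is a hedged sketch rather than the treat_path() code: the enum, both classify functions and the exact-string "pathspec" comparison are hypothetical simplifications. It only shows why returning early for excluded paths lets an ignored file through even when it does not match the pathspec.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for path_none / path_excluded / path_untracked. */
enum sketch_treatment { SKETCH_NONE, SKETCH_EXCLUDED, SKETCH_UNTRACKED };

/* Simplified "pathspec": NULL matches everything, otherwise exact name. */
static int sketch_pathspec_matches(const char *path, const char *pathspec)
{
	assert(path);
	return !pathspec || !strcmp(path, pathspec);
}

/* Order before the fix: excluded paths return before the pathspec check. */
static enum sketch_treatment sketch_classify_early(const char *path,
						   int excluded,
						   const char *pathspec)
{
	if (excluded)
		return SKETCH_EXCLUDED;
	if (!sketch_pathspec_matches(path, pathspec))
		return SKETCH_NONE;
	return SKETCH_UNTRACKED;
}

/* Order after the fix: the pathspec is checked first for every path. */
static enum sketch_treatment sketch_classify_fixed(const char *path,
						   int excluded,
						   const char *pathspec)
{
	if (!sketch_pathspec_matches(path, pathspec))
		return SKETCH_NONE;
	if (excluded)
		return SKETCH_EXCLUDED;
	return SKETCH_UNTRACKED;
}
```

For an ignored file "abspath.o" and the pathspec "a", the first function still reports it as excluded, while the second returns the "none" value. That corresponds to the stray "!! abspath.o" lines in the report and to the "(no match)" test added to t7061.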
* Re: [PATCH] dir: check pathspecs before returning `path_excluded` 2020-07-20 18:45 ` [PATCH] dir: check pathspecs before returning `path_excluded` Martin Ågren @ 2020-07-20 18:49 ` Elijah Newren 2020-07-20 18:51 ` Martin Ågren 2020-07-20 20:25 ` Junio C Hamano 1 sibling, 1 reply; 76+ messages in thread From: Elijah Newren @ 2020-07-20 18:49 UTC (permalink / raw) To: Martin Ågren Cc: Andreas Schwab, Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee On Mon, Jul 20, 2020 at 11:46 AM Martin Ågren <martin.agren@gmail.com> wrote: > > On Mon, 20 Jul 2020 at 17:25, Elijah Newren <newren@gmail.com> wrote: > > > > Awesome, thanks Andreas for the bisected report and Martin for finding > > and fixing the bug. As for the reason that the old patch injected the > > pathspec check between the path_excluded and the path_untracked cases, > > that appears to me to just be "I'm good at making boneheaded > > mistakes". Your changes here are the right fix. > > Ok, here it is as a proper patch. > > > Reviewed-by: Elijah Newren <newren@gmail.com> > > Thanks. I've included your reviewed-by below. The log message is > obviously new, but the diff is identical to what I posted earlier. > > BTW, this bug first appeared in v2.27.0, so this is not a regression > during the v2.28.0 cycle. Looks good other than a minor typo in the new commit message. > Martin > > -- >8 -- > In 95c11ecc73 ("Fix error-prone fill_directory() API; make it only > return matches", 2020-04-01), we taught `fill_directory()`, or more > specifically `treat_path()`, to check against any pathspecs so that we > could simplify the callers. > > But in doing so, we added a slightly-to-early return for the "excluded" s/to/too/ > case. We end up not checking the pathspecs, meaning we return > `path_excluded` when maybe we should return `path_none`. 
As a result, > `git status --ignored -- pathspec` might show paths that don't actually > match "pathspec". > > Move the "excluded" check down to after we've checked any pathspecs. > > Reported-by: Andreas Schwab <schwab@linux-m68k.org> > Reviewed-by: Elijah Newren <newren@gmail.com> > Signed-off-by: Martin Ågren <martin.agren@gmail.com> > --- > dir.c | 4 ++-- > t/t7061-wtstatus-ignore.sh | 25 +++++++++++++++++++++++++ > 2 files changed, 27 insertions(+), 2 deletions(-) > > diff --git a/dir.c b/dir.c > index 1045cc9c6f..fe64be30ed 100644 > --- a/dir.c > +++ b/dir.c > @@ -2209,13 +2209,13 @@ static enum path_treatment treat_path(struct dir_struct *dir, > baselen, excluded, pathspec); > case DT_REG: > case DT_LNK: > - if (excluded) > - return path_excluded; > if (pathspec && > !match_pathspec(istate, pathspec, path->buf, path->len, > 0 /* prefix */, NULL /* seen */, > 0 /* is_dir */)) > return path_none; > + if (excluded) > + return path_excluded; > return path_untracked; > } > } > diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh > index e4cf5484f9..2f9bea9793 100755 > --- a/t/t7061-wtstatus-ignore.sh > +++ b/t/t7061-wtstatus-ignore.sh > @@ -30,6 +30,31 @@ test_expect_success 'same with gitignore starting with BOM' ' > test_cmp expected actual > ' > > +test_expect_success 'status untracked files --ignored with pathspec (no match)' ' > + git status --porcelain --ignored -- untracked/i >actual && > + test_must_be_empty actual && > + git status --porcelain --ignored -- untracked/u >actual && > + test_must_be_empty actual > +' > + > +test_expect_success 'status untracked files --ignored with pathspec (literal match)' ' > + git status --porcelain --ignored -- untracked/ignored >actual && > + echo "!! untracked/ignored" >expected && > + test_cmp expected actual && > + git status --porcelain --ignored -- untracked/uncommitted >actual && > + echo "?? 
untracked/uncommitted" >expected && > + test_cmp expected actual > +' > + > +test_expect_success 'status untracked files --ignored with pathspec (glob match)' ' > + git status --porcelain --ignored -- untracked/i\* >actual && > + echo "!! untracked/ignored" >expected && > + test_cmp expected actual && > + git status --porcelain --ignored -- untracked/u\* >actual && > + echo "?? untracked/uncommitted" >expected && > + test_cmp expected actual > +' > + > cat >expected <<\EOF > ?? .gitignore > ?? actual > -- > 2.28.0.rc1 > ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH] dir: check pathspecs before returning `path_excluded` 2020-07-20 18:49 ` Elijah Newren @ 2020-07-20 18:51 ` Martin Ågren 0 siblings, 0 replies; 76+ messages in thread From: Martin Ågren @ 2020-07-20 18:51 UTC (permalink / raw) To: Elijah Newren Cc: Andreas Schwab, Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee On Mon, 20 Jul 2020 at 20:49, Elijah Newren <newren@gmail.com> wrote: > > On Mon, Jul 20, 2020 at 11:46 AM Martin Ågren <martin.agren@gmail.com> wrote: > > Looks good other than a minor typo in the new commit message. > > > But in doing so, we added a slightly-to-early return for the "excluded" > > s/to/too/ Thanks! Martin ^ permalink raw reply [flat|nested] 76+ messages in thread
* Re: [PATCH] dir: check pathspecs before returning `path_excluded` 2020-07-20 18:45 ` [PATCH] dir: check pathspecs before returning `path_excluded` Martin Ågren 2020-07-20 18:49 ` Elijah Newren @ 2020-07-20 20:25 ` Junio C Hamano 1 sibling, 0 replies; 76+ messages in thread From: Junio C Hamano @ 2020-07-20 20:25 UTC (permalink / raw) To: Martin Ågren Cc: Elijah Newren, Andreas Schwab, Elijah Newren via GitGitGadget, git, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee Martin Ågren <martin.agren@gmail.com> writes: > In 95c11ecc73 ("Fix error-prone fill_directory() API; make it only > return matches", 2020-04-01), we taught `fill_directory()`, or more > specifically `treat_path()`, to check against any pathspecs so that we > could simplify the callers. > > But in doing so, we added a slightly-to-early return for the "excluded" > case. We end up not checking the pathspecs, meaning we return > `path_excluded` when maybe we should return `path_none`. As a result, > `git status --ignored -- pathspec` might show paths that don't actually > match "pathspec". > > Move the "excluded" check down to after we've checked any pathspecs. > > Reported-by: Andreas Schwab <schwab@linux-m68k.org> > Reviewed-by: Elijah Newren <newren@gmail.com> > Signed-off-by: Martin Ågren <martin.agren@gmail.com> > --- Makes sense. Thanks. Will queue. ^ permalink raw reply [flat|nested] 76+ messages in thread
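The ordering issue discussed in this sub-thread can be illustrated with a minimal, standalone sketch. It is not the code from dir.c: the enum is a cut-down stand-in for git's `enum path_treatment`, POSIX `fnmatch()` stands in for `match_pathspec()`, and the names `classify_buggy`, `classify_fixed`, and `check_pathspec_order` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Cut-down stand-in for git's enum path_treatment (assumption). */
enum path_treatment { path_none, path_excluded, path_untracked };

/* Hypothetical: the early "excluded" return skips the pathspec check. */
static enum path_treatment classify_buggy(const char *path,
					  const char *pattern, int excluded)
{
	if (excluded)
		return path_excluded;
	if (pattern && fnmatch(pattern, path, 0) != 0)
		return path_none;
	return path_untracked;
}

/* Hypothetical: the pathspec is checked before "excluded" is reported. */
static enum path_treatment classify_fixed(const char *path,
					  const char *pattern, int excluded)
{
	if (pattern && fnmatch(pattern, path, 0) != 0)
		return path_none;
	if (excluded)
		return path_excluded;
	return path_untracked;
}

/* Mirrors the "no match", "literal match" and "glob match" test cases. */
void check_pathspec_order(void)
{
	/* An ignored file that does not match the pathspec leaks out... */
	assert(classify_buggy("untracked/ignored", "untracked/i", 1) == path_excluded);
	/* ...unless the pathspec is consulted first. */
	assert(classify_fixed("untracked/ignored", "untracked/i", 1) == path_none);
	assert(classify_fixed("untracked/ignored", "untracked/ignored", 1) == path_excluded);
	assert(classify_fixed("untracked/ignored", "untracked/i*", 1) == path_excluded);
	assert(classify_fixed("untracked/uncommitted", "untracked/u*", 0) == path_untracked);
}
```

Calling `check_pathspec_order()` from a small `main()` exercises both orderings; only the second one returns `path_none` for an ignored path that the pathspec does not select.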
* Re: [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches 2020-07-20 15:25 ` Elijah Newren 2020-07-20 18:45 ` [PATCH] dir: check pathspecs before returning `path_excluded` Martin Ågren @ 2020-07-20 18:58 ` Junio C Hamano 1 sibling, 0 replies; 76+ messages in thread From: Junio C Hamano @ 2020-07-20 18:58 UTC (permalink / raw) To: Elijah Newren Cc: Martin Ågren, Andreas Schwab, Elijah Newren via GitGitGadget, Git Mailing List, Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee Elijah Newren <newren@gmail.com> writes: >> Looking at the commit in question (95c11ecc73), there must have been >> some reason that it injected the pathspec check between the >> "path_excluded" and the "path_untracked" cases. The diff below >> basically undoes that split, so I have a feeling I'm missing something. > > Awesome, thanks Andreas for the bisected report and Martin for finding > and fixing the bug. As for the reason that the old patch injected the > pathspec check between the path_excluded and the path_untracked cases, > that appears to me to just be "I'm good at making boneheaded > mistakes". Your changes here are the right fix. As a separate > optimization, we could maybe make simplify_away() a bit more involved > and have it exclude a few more paths so that fewer make it to this > final check, but that's just optimization work that is separate from > your fix here. > > Reviewed-by: Elijah Newren <newren@gmail.com> This is in 2.27-rc0, so it is not ultra-urgent to fix it at the tip of the current development track, but let's make sure we have a fix with a proper log message in a mergeable state (meaning "have already been cooked in 'next' for a week or two") early in the next cycle. Thank you, Andreas, Martin and Elijah, as always. ^ permalink raw reply [flat|nested] 76+ messages in thread
* [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget ` (10 preceding siblings ...) 2020-04-01 4:17 ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Elijah Newren via GitGitGadget @ 2020-04-01 4:17 ` Elijah Newren via GitGitGadget 11 siblings, 0 replies; 76+ messages in thread From: Elijah Newren via GitGitGadget @ 2020-04-01 4:17 UTC (permalink / raw) To: git Cc: Martin Melka, SZEDER Gábor, Samuel Lijin, Nguyễn Thái Ngọc Duy, Derrick Stolee, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> As reported on the git mailing list, since git-2.25, git add untracked-dir/ has been tab completing to git add untracked-dir/./ The cause for this was that with commit b9670c1f5e (dir: fix checks on common prefix directory, 2019-12-19), git ls-files -o --directory untracked-dir/ (or the equivalent `git -C untracked-dir ls-files -o --directory`) began reporting untracked-dir/ instead of listing paths underneath that directory. It may also be worth noting that the real command in question was git -C untracked-dir ls-files -o --directory '*' which is equivalent to git ls-files -o --directory 'untracked-dir/*' which behaves the same for the purposes of this issue (the '*' can match the empty string), but becomes relevant for the proposed fix. At first, based on the report, I decided to try to view this as a regression and tried to find a way to recover the old behavior without breaking other stuff, or at least breaking as little as possible. However, in the end, I couldn't figure out a way to do it that wouldn't just cause lots more problems than it solved. The old behavior was a bug: * Although older git would avoid cleaning anything with `git clean -f .git`, it would wipe out everything under that directory with `git clean -f .git/`. 
Despite the difference in command used, this is relevant because the exact same change that fixed clean changed the behavior of ls-files. * Older git would report different results based solely on presence or absence of a trailing slash for $SUBDIR in the command `git ls-files -o --directory $SUBDIR`. * Older git violated the documented behavior of not recursing into directories that matched the pathspec when --directory was specified. * And, after all, commit b9670c1f5e (dir: fix checks on common prefix directory, 2019-12-19) didn't overlook this issue; it explicitly stated that the behavior of the command was being changed to bring it in line with the docs. (Also, if it helps, despite that commit being merged during the 2.25 series, this bug was not reported during the 2.25 cycle, nor even during most of the 2.26 cycle -- it was reported a day before 2.26 was released. So the impact of the change is at least somewhat small.) Instead of relying on a bug of ls-files in reporting the wrong content, change the invocation of ls-files used by git-completion to make it grab paths one depth deeper. Do this by changing '$DIR/*' (match $DIR/ plus 0 or more characters) into '$DIR/?*' (match $DIR/ plus 1 or more characters). Note that the '?' character should not be added when trying to complete a filename (e.g. 'git ls-files -o --directory "merge.c?*"' would not correctly return "merge.c" when such a file exists), so we have to make sure to add the '?' character only in cases where the path specified so far is a directory. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- contrib/completion/git-completion.bash | 2 +- t/t9902-completion.sh | 5 +++++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index e4d9ff4a95c..1032b642297 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -504,7 +504,7 @@ __git_index_files () { local root="$2" match="$3" - __git_ls_files_helper "$root" "$1" "$match" | + __git_ls_files_helper "$root" "$1" "${match:-?}" | awk -F / -v pfx="${2//\\/\\\\}" '{ paths[$1] = 1 } diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 93877ba9cd6..d9a6425671f 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -1581,6 +1581,11 @@ test_expect_success 'complete files' ' echo modify > modified && test_completion "git add " "modified" && + mkdir -p some/deep && + touch some/deep/path && + test_completion "git add some/" "some/deep" && + git clean -f some && + touch untracked && : TODO .gitignore should not be here && -- gitgitgadget ^ permalink raw reply related [flat|nested] 76+ messages in thread
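The difference between '$DIR/*' and '$DIR/?*' described in this commit message can be checked with a short sketch. It uses POSIX `fnmatch()` purely as a stand-in for shell-style glob semantics; git matches pathspecs with its own code, so this only illustrates the "zero or more" versus "one or more" distinction. The helper names `glob_matches` and `check_completion_globs` are hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>

/* Returns 1 if `path` matches the shell-style `pattern`, 0 otherwise. */
static int glob_matches(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, 0) == 0;
}

void check_completion_globs(void)
{
	/* '*' can match the empty string, so the directory itself matches. */
	assert(glob_matches("untracked-dir/*", "untracked-dir/"));

	/* '?*' requires at least one character after the slash. */
	assert(!glob_matches("untracked-dir/?*", "untracked-dir/"));
	assert(glob_matches("untracked-dir/?*", "untracked-dir/file"));

	/* Adding '?' after a file name would stop the file itself from matching. */
	assert(glob_matches("merge.c*", "merge.c"));
	assert(!glob_matches("merge.c?*", "merge.c"));
}
```

The last two assertions correspond to the caveat in the message: the '?' is only appropriate when the text typed so far names a directory.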
end of thread (newest message: 2020-07-20 20:25 UTC) Thread overview: 76+ messages 2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget 2020-01-29 22:03 ` [PATCH 2/6] dir: fix broken comment Elijah Newren via GitGitGadget 2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget 2020-01-30 15:20 ` Derrick Stolee 2020-01-31 18:04 ` SZEDER Gábor 2020-01-31 18:17 ` Elijah Newren 2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget 2020-01-30 15:33 ` Derrick Stolee 2020-01-30 15:45 ` Elijah Newren 2020-01-30 16:00 ` Derrick Stolee 2020-01-30 16:10 ` Derrick Stolee 2020-01-30 16:20 ` Elijah Newren 2020-01-30 18:17 ` Derrick Stolee 2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget 2020-01-30 15:55 ` Derrick Stolee 2020-01-30 17:13 ` Elijah Newren 2020-01-30 17:45 ` Elijah Newren 2020-01-31 17:13 ` SZEDER Gábor 2020-01-31 17:47 ` Elijah Newren 2020-01-29 22:03 ` [PATCH 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 2/6] dir: fix broken comment Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget 2020-01-31 
18:31 ` [PATCH v2 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget 2020-01-31 18:31 ` [PATCH v2 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget 2020-03-26 13:02 ` Derrick Stolee 2020-03-26 21:18 ` Elijah Newren 2020-03-25 19:31 ` [PATCH v3 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 4/7] dir: fix broken comment Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget 2020-03-25 19:31 ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget 2020-03-26 13:13 ` Derrick Stolee 2020-03-26 21:27 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget 2020-03-27 13:09 ` Derrick Stolee 2020-03-29 18:18 ` Junio C Hamano 2020-03-31 20:15 ` Elijah Newren 2020-03-26 21:27 ` [PATCH v4 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 4/7] dir: fix broken comment Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 
6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget 2020-03-26 21:27 ` [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget 2020-03-27 13:13 ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee 2020-03-28 17:33 ` Elijah Newren 2020-03-29 18:20 ` Junio C Hamano 2020-04-01 4:17 ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 03/12] dir: fix simple typo in comment Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 05/12] dir: fix broken comment Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 06/12] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget 2020-04-01 13:57 ` Derrick Stolee 2020-04-01 15:59 ` Elijah Newren 2020-04-01 4:17 ` [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 10/12] dir: replace double pathspec matching with single " Elijah Newren via GitGitGadget 2020-04-01 4:17 ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Elijah Newren via GitGitGadget 2020-07-19 6:33 ` Andreas Schwab 2020-07-19 12:39 ` Martin Ågren 2020-07-20 15:25 ` Elijah Newren 2020-07-20 18:45 ` [PATCH] dir: check pathspecs before returning `path_excluded` Martin Ågren 
2020-07-20 18:49 ` Elijah Newren 2020-07-20 18:51 ` Martin Ågren 2020-07-20 20:25 ` Junio C Hamano 2020-07-20 18:58 ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Junio C Hamano 2020-04-01 4:17 ` [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory Elijah Newren via GitGitGadget