All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Empty untracked cache performance issue
@ 2021-06-24 18:29 Tao Klerks via GitGitGadget
  2021-06-24 18:29 ` [PATCH 1/3] Add a second's delay to t7519 for untracked cache Tao Klerks via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2021-06-24 18:29 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks

This patchset addresses a performance issue with untracked cache. When a new
untracked cache structure is added to the index but remains empty,
subsequent "git status" calls populate it but do not write the index - so
they perform as though the index were disabled.

This situation can be caused in several different ways:

 * Running "git update-index --untracked-cache" on a repo that did not have
   the untracked cache
 * Modifying the git configuration to enable untracked cache, but then
   immediately running "git status -uall", causing the untracked cache to be
   created, but not used/populated (and the index written).
 * (likely others)

The patchset includes fixes to t7519, which otherwise starts failing because
it wasn't testing what it intended to.

Tao Klerks (3):
  Add a second's delay to t7519 for untracked cache
  In t7519, populate untracked cache before test
  Write index when populating empty untracked cache

 dir.c                       | 14 +++++++++++---
 t/t7519-status-fsmonitor.sh |  6 ++++++
 2 files changed, 17 insertions(+), 3 deletions(-)


base-commit: ebf3c04b262aa27fbb97f8a0156c2347fecafafb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-986%2FTaoK%2Ftaok-empty-untracked-cache-bug-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-986/TaoK/taok-empty-untracked-cache-bug-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/986
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] Add a second's delay to t7519 for untracked cache
  2021-06-24 18:29 [PATCH 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
@ 2021-06-24 18:29 ` Tao Klerks via GitGitGadget
  2021-06-29  4:22   ` Junio C Hamano
  2021-06-24 18:30 ` [PATCH 2/3] In t7519, populate untracked cache before test Tao Klerks via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2021-06-24 18:29 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

In t7519 there is a test that writes files to disk, and immediately
writes the untracked index. Because of mtime-comparison logic
that uses a 1-second resolution, this means the cached entries
are not trusted/used under some circumstances (see
read-cache.c#is_racy_stat()).

Untracked cache tests in t7063 use a 1-second delay to avoid
this issue. We should do the same here.

This change doesn't actually affect the outcome of the test,
but does enhance its validity, and becomes relevant after
later changes

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 t/t7519-status-fsmonitor.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 637391c6ce4..1209fa93499 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -13,6 +13,10 @@ clean_repo () {
 	git clean -fd
 }
 
+avoid_racy() {
+	sleep 1
+}
+
 dirty_repo () {
 	: >untracked &&
 	: >dir1/untracked &&
@@ -332,6 +336,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		: >dir2/modified &&
 		write_integration_script &&
 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
+		avoid_racy &&
 		git update-index --untracked-cache &&
 		git update-index --fsmonitor &&
 		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] In t7519, populate untracked cache before test
  2021-06-24 18:29 [PATCH 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  2021-06-24 18:29 ` [PATCH 1/3] Add a second's delay to t7519 for untracked cache Tao Klerks via GitGitGadget
@ 2021-06-24 18:30 ` Tao Klerks via GitGitGadget
  2021-06-24 18:30 ` [PATCH 3/3] Write index when populating empty untracked cache Tao Klerks via GitGitGadget
  2022-02-25 17:10 ` [PATCH v2 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  3 siblings, 0 replies; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2021-06-24 18:30 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

In its current state, the t7519 test dealing with untracked
cache assumes that
"git update-index --untracked-cache" will *populate*
the untracked cache. This is not correct - it will only add
an empty untracked cache structure to the index.

If we're going to compare two git status runs with
something interesting happening in-between, we
need to ensure that the index is in a stable/steady
state *before* that first run.

We achieve this by adding another prior "git status"
run.

At this stage this change does nothing, because there
is a bug, addressed in the next patch. whereby once
the empty untracked cache structure is added by the
update-index invocation, the untracked cache gets
updated in every subsequent "git status" call, but the
index with these updates does not get written down.

That bug actually invalidates this entire test case -
but we're fixing that next.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 t/t7519-status-fsmonitor.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 1209fa93499..ef75c548d90 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -339,6 +339,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		avoid_racy &&
 		git update-index --untracked-cache &&
 		git update-index --fsmonitor &&
+		git status &&
 		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
 		git status &&
 		test-tool dump-untracked-cache >../before
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] Write index when populating empty untracked cache
  2021-06-24 18:29 [PATCH 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  2021-06-24 18:29 ` [PATCH 1/3] Add a second's delay to t7519 for untracked cache Tao Klerks via GitGitGadget
  2021-06-24 18:30 ` [PATCH 2/3] In t7519, populate untracked cache before test Tao Klerks via GitGitGadget
@ 2021-06-24 18:30 ` Tao Klerks via GitGitGadget
  2021-06-29  4:42   ` Junio C Hamano
  2022-02-25 17:10 ` [PATCH v2 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  3 siblings, 1 reply; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2021-06-24 18:30 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

It is expected that an empty/unpopulated untracked
cache structure can be written to the index - by update-
index, or by a "git status" call that sees the untracked cache
should be enabled, but is running with options that make
it non-applicable in that run.

Currently, if that happens, then subsequent "git status"
calls end up populating the untracked cache, but not
writing the index (not saving their work) - so the
performance outcome is almost identical to the cache
being altogether disabled.

This continues until the index gets written with the cache
populated, for some *other* reason.

In this change, we detect the "existing cache is empty
and it looks like we are using it" condition, and queues
an index write when this happens.

This change depends on previous fixes to t7519 for the
"ignore .git changes when invalidating UNTR" test case to
pass - before this fix, the test never actually did anything
as it was not set up correctly.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 dir.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index ebe5ec046e0..a326e40e1c1 100644
--- a/dir.c
+++ b/dir.c
@@ -2703,7 +2703,8 @@ void remove_untracked_cache(struct index_state *istate)
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
 						      int base_len,
-						      const struct pathspec *pathspec)
+						      const struct pathspec *pathspec,
+						      struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
@@ -2767,8 +2768,15 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
-	if (!dir->untracked->root)
+	if (!dir->untracked->root) {
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
+		/*
+		 * If we've had to initialize the root, then what we had was an
+		 * empty uninitialized untracked cache structure. We will be
+		 * populating it now, so we should trigger an index write.
+		 */
+		istate->cache_changed |= UNTRACKED_CHANGED;
+	}
 
 	/* Validate $GIT_DIR/info/exclude and core.excludesfile */
 	root = dir->untracked->root;
@@ -2838,7 +2846,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] Add a second's delay to t7519 for untracked cache
  2021-06-24 18:29 ` [PATCH 1/3] Add a second's delay to t7519 for untracked cache Tao Klerks via GitGitGadget
@ 2021-06-29  4:22   ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2021-06-29  4:22 UTC (permalink / raw)
  To: Tao Klerks via GitGitGadget; +Cc: git, Tao Klerks

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Tao Klerks <tao@klerks.biz>
>
> In t7519 there is a test that writes files to disk, and immediately
> writes the untracked index. Because of mtime-comparison logic
> that uses a 1-second resolution, this means the cached entries
> are not trusted/used under some circumstances (see
> read-cache.c#is_racy_stat()).
>
> Untracked cache tests in t7063 use a 1-second delay to avoid
> this issue. We should do the same here.

We shouldn't waste wallclock time by slowing down tests, which
already take way too much time.  If you can use "test-tool chmtime"
to pretend as if the index file was written earlier than it actually
is, that would be a better solution.

> +avoid_racy() {

Style (see below).

> +	sleep 1
> +}
> +
>  dirty_repo () {

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] Write index when populating empty untracked cache
  2021-06-24 18:30 ` [PATCH 3/3] Write index when populating empty untracked cache Tao Klerks via GitGitGadget
@ 2021-06-29  4:42   ` Junio C Hamano
  2022-02-24 17:52     ` Tao Klerks
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2021-06-29  4:42 UTC (permalink / raw)
  To: Tao Klerks via GitGitGadget; +Cc: git, Tao Klerks

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Subject: Re: [PATCH 3/3] Write index when populating empty untracked cache

Common to all three patches.  

Look at "git shortlog --no-merges -200 master" and observe the
pattern, i.e. our commit title are <area> prefix, a colon and then a
summary.  Try to make your commits blend in.

> From: Tao Klerks <tao@klerks.biz>
>
> It is expected that an empty/unpopulated untracked
> cache structure can be written to the index - by update-
> index, or by a "git status" call that sees the untracked cache
> should be enabled, but is running with options that make
> it non-applicable in that run.

Would an example be helpful?  What requests the untracked cache to
be created, and what options (or perhaps the lack of what options)
prevent the untracked cache to be "non-applicable"?

> Currently, if that happens, then subsequent "git status"
> calls end up populating the untracked cache, but not
> writing the index (not saving their work) - so the
> performance outcome is almost identical to the cache
> being altogether disabled.
>
> This continues until the index gets written with the cache
> populated, for some *other* reason.

Here (and only here), the word "cache" is used instead of "untracked
cache"---but I suspect that "the cache" here is the same "untracked
cache".  If that is the case, being consistent and spelling it out
would reduce the risk of confusing readers.

> In this change, we detect the "existing cache is empty
> and it looks like we are using it" condition, and queues
> an index write when this happens.

"we detect ... and queues" sounds like a grammo.

In this project, we describe the change being proposed as if we are
giving an order to the codebase to "become like so".  So, perhaps

    Detect the condition where an empty untracked cache exists in
    the index and we collect the list of untracked paths, and queue
    an index write under that condition, so that the collected
    untracked paths can be written out to the untracked cache
    extension in the index.

or something along the line.

> This change depends on previous fixes to t7519 for the
> "ignore .git changes when invalidating UNTR" test case to
> pass - before this fix, the test never actually did anything
> as it was not set up correctly.
>
> Signed-off-by: Tao Klerks <tao@klerks.biz>
> ---
>  dir.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index ebe5ec046e0..a326e40e1c1 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2703,7 +2703,8 @@ void remove_untracked_cache(struct index_state *istate)
>  
>  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
>  						      int base_len,
> -						      const struct pathspec *pathspec)
> +						      const struct pathspec *pathspec,
> +						      struct index_state *istate)
>  {
>  	struct untracked_cache_dir *root;
>  	static int untracked_cache_disabled = -1;
> @@ -2767,8 +2768,15 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  		return NULL;
>  	}
>  
> -	if (!dir->untracked->root)
> +	if (!dir->untracked->root) {
>  		FLEX_ALLOC_STR(dir->untracked->root, name, "");
> +		/*
> +		 * If we've had to initialize the root, then what we had was an
> +		 * empty uninitialized untracked cache structure. We will be
> +		 * populating it now, so we should trigger an index write.
> +		 */
> +		istate->cache_changed |= UNTRACKED_CHANGED;
> +	}

The logic sounds fairly straight-forward.  The fact that we came
this far in the helper function means that we know we want to use
(i.e. untracked cache is not disabled, and various other "we
shouldn't be contaminating the cache with the one-shot information"
cases did not return from the helper) untracked cache, and root
being NULL originally means the untracked cache wasn't populated.

>  	/* Validate $GIT_DIR/info/exclude and core.excludesfile */
>  	root = dir->untracked->root;
> @@ -2838,7 +2846,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  		return dir->nr;
>  	}
>  
> -	untracked = validate_untracked_cache(dir, len, pathspec);
> +	untracked = validate_untracked_cache(dir, len, pathspec, istate);
>  	if (!untracked)
>  		/*
>  		 * make sure untracked cache code path is disabled,

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] Write index when populating empty untracked cache
  2021-06-29  4:42   ` Junio C Hamano
@ 2022-02-24 17:52     ` Tao Klerks
  2022-02-24 20:35       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Tao Klerks @ 2022-02-24 17:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tao Klerks via GitGitGadget, git

Hi Junio,

Thanks so much for reviewing this, and my apologies for not reacting
earlier, after you took the time.

> our commit title are <area> prefix, a colon and then a
> summary.  Try to make your commits blend in.

Noted, will fix, thx. I believe I did this better in an unrelated
patch series today, but I guess I will see what people say :)

> What requests the untracked cache to
> be created, and what options (or perhaps the lack of what options)
> prevent the untracked cache to be "non-applicable"?

Will try to clarify / be more specific.

> I suspect that "the cache" here is the same "untracked
> cache"

Will fix.

> "we detect ... and queues" sounds like a grammo.

Yep, dreaded perspective shift, will fix.

> perhaps [proposed text] or something along the line.

Will incorporate, thx.

> The logic sounds fairly straight-forward.

I didn't understand here whether you were confirming that the change
seems to make sense (yay!), or commenting that the extra comment block
is redundant, stating something obvious, and should better be removed.
Could you confirm please?

In the meantime, I'll "re-roll" (?) with the proposed changes.

Thanks again,
Tao

On Tue, Jun 29, 2021 at 6:42 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Subject: Re: [PATCH 3/3] Write index when populating empty untracked cache
>
> Common to all three patches.
>
> Look at "git shortlog --no-merges -200 master" and observe the
> pattern, i.e. our commit title are <area> prefix, a colon and then a
> summary.  Try to make your commits blend in.
>
> > From: Tao Klerks <tao@klerks.biz>
> >
> > It is expected that an empty/unpopulated untracked
> > cache structure can be written to the index - by update-
> > index, or by a "git status" call that sees the untracked cache
> > should be enabled, but is running with options that make
> > it non-applicable in that run.
>
> Would an example be helpful?  What requests the untracked cache to
> be created, and what options (or perhaps the lack of what options)
> prevent the untracked cache to be "non-applicable"?
>
> > Currently, if that happens, then subsequent "git status"
> > calls end up populating the untracked cache, but not
> > writing the index (not saving their work) - so the
> > performance outcome is almost identical to the cache
> > being altogether disabled.
> >
> > This continues until the index gets written with the cache
> > populated, for some *other* reason.
>
> Here (and only here), the word "cache" is used instead of "untracked
> cache"---but I suspect that "the cache" here is the same "untracked
> cache".  If that is the case, being consistent and spelling it out
> would reduce the risk of confusing readers.
>
> > In this change, we detect the "existing cache is empty
> > and it looks like we are using it" condition, and queues
> > an index write when this happens.
>
> "we detect ... and queues" sounds like a grammo.
>
> In this project, we describe the change being proposed as if we are
> giving an order to the codebase to "become like so".  So, perhaps
>
>     Detect the condition where an empty untracked cache exists in
>     the index and we collect the list of untracked paths, and queue
>     an index write under that condition, so that the collected
>     untracked paths can be written out to the untracked cache
>     extension in the index.
>
> or something along the line.
>
> > This change depends on previous fixes to t7519 for the
> > "ignore .git changes when invalidating UNTR" test case to
> > pass - before this fix, the test never actually did anything
> > as it was not set up correctly.
> >
> > Signed-off-by: Tao Klerks <tao@klerks.biz>
> > ---
> >  dir.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/dir.c b/dir.c
> > index ebe5ec046e0..a326e40e1c1 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -2703,7 +2703,8 @@ void remove_untracked_cache(struct index_state *istate)
> >
> >  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
> >                                                     int base_len,
> > -                                                   const struct pathspec *pathspec)
> > +                                                   const struct pathspec *pathspec,
> > +                                                   struct index_state *istate)
> >  {
> >       struct untracked_cache_dir *root;
> >       static int untracked_cache_disabled = -1;
> > @@ -2767,8 +2768,15 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >               return NULL;
> >       }
> >
> > -     if (!dir->untracked->root)
> > +     if (!dir->untracked->root) {
> >               FLEX_ALLOC_STR(dir->untracked->root, name, "");
> > +             /*
> > +              * If we've had to initialize the root, then what we had was an
> > +              * empty uninitialized untracked cache structure. We will be
> > +              * populating it now, so we should trigger an index write.
> > +              */
> > +             istate->cache_changed |= UNTRACKED_CHANGED;
> > +     }
>
> The logic sounds fairly straight-forward.  The fact that we came
> this far in the helper function means that we know we want to use
> (i.e. untracked cache is not disabled, and various other "we
> shouldn't be contaminating the cache with the one-shot information"
> cases did not return from the helper) untracked cache, and root
> being NULL originally means the untracked cache wasn't populated.
>
> >       /* Validate $GIT_DIR/info/exclude and core.excludesfile */
> >       root = dir->untracked->root;
> > @@ -2838,7 +2846,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
> >               return dir->nr;
> >       }
> >
> > -     untracked = validate_untracked_cache(dir, len, pathspec);
> > +     untracked = validate_untracked_cache(dir, len, pathspec, istate);
> >       if (!untracked)
> >               /*
> >                * make sure untracked cache code path is disabled,

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] Write index when populating empty untracked cache
  2022-02-24 17:52     ` Tao Klerks
@ 2022-02-24 20:35       ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2022-02-24 20:35 UTC (permalink / raw)
  To: Tao Klerks; +Cc: Tao Klerks via GitGitGadget, git

Tao Klerks <tao@klerks.biz> writes:

>> The logic sounds fairly straight-forward.
>
> I didn't understand here whether you were confirming that the change
> seems to make sense (yay!), or commenting that the extra comment block
> is redundant, stating something obvious, and should better be removed.
> Could you confirm please?

I meant the former when I wrote it.

But now you made me re-read the patch, I am becoming slightly
sympathetic to the "do we even need to comment?" interpretation, too
;-)

The question is if the comment to these two statements is redundant.

	if (!dir->untracked->root) {
		/*
		 * If we've had to initialize the root, then what we had was an
		 * empty uninitialized untracked cache structure. We will be
		 * populating it now, so we should trigger an index write.
		 */
		FLEX_ALLOC_STR(dir->untracked->root, name, "");
		istate->cache_changed |= UNTRACKED_CHANGED;
	}

I can be pursuaded either way.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 0/3] Empty untracked cache performance issue
  2021-06-24 18:29 [PATCH 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-06-24 18:30 ` [PATCH 3/3] Write index when populating empty untracked cache Tao Klerks via GitGitGadget
@ 2022-02-25 17:10 ` Tao Klerks via GitGitGadget
  2022-02-25 17:10   ` [PATCH v2 1/3] t7519: avoid file to index mtime race for untracked cache Tao Klerks via GitGitGadget
                     ` (3 more replies)
  3 siblings, 4 replies; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-25 17:10 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks

This patchset addresses a performance issue with untracked cache. When a new
untracked cache structure is added to the index but remains empty,
subsequent "git status" calls populate it but do not write the index - so
they perform as though the index were disabled.

This situation can be caused in several different ways:

 * Running "git update-index --untracked-cache" on a repo that did not have
   the untracked cache
 * Modifying the git configuration to enable untracked cache, but then
   immediately running "git status -uall", causing the untracked cache to be
   created, but not used/populated (and the index written).
 * (likely others)

The patchset includes fixes to t7519, which otherwise starts failing because
it wasn't testing what it intended to.

Tao Klerks (3):
  t7519: avoid file to index mtime race for untracked cache
  t7519: populate untracked cache before test
  untracked-cache: write index when populating empty untracked cache

 dir.c                       | 10 +++++++---
 t/t7519-status-fsmonitor.sh |  7 +++++++
 2 files changed, 14 insertions(+), 3 deletions(-)


base-commit: 4c53a8c20f8984adb226293a3ffd7b88c3f4ac1a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-986%2FTaoK%2Ftaok-empty-untracked-cache-bug-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-986/TaoK/taok-empty-untracked-cache-bug-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/986

Range-diff vs v1:

 1:  c4d2a3cab4b ! 1:  9421b71540d Add a second's delay to t7519 for untracked cache
     @@ Metadata
      Author: Tao Klerks <tao@klerks.biz>
      
       ## Commit message ##
     -    Add a second's delay to t7519 for untracked cache
     +    t7519: avoid file to index mtime race for untracked cache
      
          In t7519 there is a test that writes files to disk, and immediately
     -    writes the untracked index. Because of mtime-comparison logic
     -    that uses a 1-second resolution, this means the cached entries
     -    are not trusted/used under some circumstances (see
     -    read-cache.c#is_racy_stat()).
     +    writes the untracked index. Because of mtime-comparison logic that
     +    uses a 1-second resolution, this means the cached entries are not
     +    trusted/used under some circumstances
     +    (see read-cache.c#is_racy_stat()).
      
     -    Untracked cache tests in t7063 use a 1-second delay to avoid
     -    this issue. We should do the same here.
     +    Untracked cache tests in t7063 use a 1-second delay to avoid this
     +    issue, but we don't want to introduce arbitrary slowdowns, so instead
     +    use test-tool chmtime to backdate the files slightly.
      
     -    This change doesn't actually affect the outcome of the test,
     -    but does enhance its validity, and becomes relevant after
     -    later changes
     +    This change doesn't actually affect the outcome of the test, but does
     +    enhance its validity, and becomes relevant after later changes.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      
       ## t/t7519-status-fsmonitor.sh ##
     -@@ t/t7519-status-fsmonitor.sh: clean_repo () {
     - 	git clean -fd
     - }
     - 
     -+avoid_racy() {
     -+	sleep 1
     -+}
     -+
     - dirty_repo () {
     - 	: >untracked &&
     - 	: >dir1/untracked &&
      @@ t/t7519-status-fsmonitor.sh: test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
     + 		cd dot-git &&
     + 		mkdir -p .git/hooks &&
     + 		: >tracked &&
     ++		test-tool chmtime =-60 tracked &&
     + 		: >modified &&
     ++		test-tool chmtime =-60 modified &&
     + 		mkdir dir1 &&
     + 		: >dir1/tracked &&
     ++		test-tool chmtime =-60 dir1/tracked &&
     + 		: >dir1/modified &&
     ++		test-tool chmtime =-60 dir1/modified &&
     + 		mkdir dir2 &&
     + 		: >dir2/tracked &&
     ++		test-tool chmtime =-60 dir2/tracked &&
       		: >dir2/modified &&
     ++		test-tool chmtime =-60 dir2/modified &&
       		write_integration_script &&
       		git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+		avoid_racy &&
       		git update-index --untracked-cache &&
     - 		git update-index --fsmonitor &&
     - 		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
 2:  d63faad03a4 ! 2:  d29a68e65a0 In t7519, populate untracked cache before test
     @@ Metadata
      Author: Tao Klerks <tao@klerks.biz>
      
       ## Commit message ##
     -    In t7519, populate untracked cache before test
     +    t7519: populate untracked cache before test
      
     -    In its current state, the t7519 test dealing with untracked
     -    cache assumes that
     -    "git update-index --untracked-cache" will *populate*
     -    the untracked cache. This is not correct - it will only add
     -    an empty untracked cache structure to the index.
     +    In its current state, the t7519 test dealing with untracked cache
     +    assumes that "git update-index --untracked-cache" will *populate* the
     +    untracked cache. This is not correct - it will only add an empty
     +    untracked cache structure to the index.
      
     -    If we're going to compare two git status runs with
     -    something interesting happening in-between, we
     -    need to ensure that the index is in a stable/steady
     -    state *before* that first run.
     +    If we're going to compare two git status runs with something
     +    interesting happening in-between, we need to ensure that the index is
     +    in a stable/steady state *before* that first run.
      
     -    We achieve this by adding another prior "git status"
     -    run.
     +    Achieve this by adding another prior "git status" run.
      
     -    At this stage this change does nothing, because there
     -    is a bug, addressed in the next patch. whereby once
     -    the empty untracked cache structure is added by the
     -    update-index invocation, the untracked cache gets
     -    updated in every subsequent "git status" call, but the
     -    index with these updates does not get written down.
     +    At this stage this change does nothing, because there is a bug,
     +    addressed in the next patch, whereby once the empty untracked cache
     +    structure is added by the update-index invocation, the untracked cache
     +    gets updated in every subsequent "git status" call, but the index with
     +    these updates does not get written down.
      
     -    That bug actually invalidates this entire test case -
     -    but we're fixing that next.
     +    That bug actually invalidates this entire test case - but we're fixing
     +    that next.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      
       ## t/t7519-status-fsmonitor.sh ##
      @@ t/t7519-status-fsmonitor.sh: test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
     - 		avoid_racy &&
     + 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
       		git update-index --untracked-cache &&
       		git update-index --fsmonitor &&
      +		git status &&
 3:  627f1952fd8 ! 3:  190b27e518a Write index when populating empty untracked cache
     @@ Metadata
      Author: Tao Klerks <tao@klerks.biz>
      
       ## Commit message ##
     -    Write index when populating empty untracked cache
     +    untracked-cache: write index when populating empty untracked cache
      
     -    It is expected that an empty/unpopulated untracked
     -    cache structure can be written to the index - by update-
     -    index, or by a "git status" call that sees the untracked cache
     -    should be enabled, but is running with options that make
     -    it non-applicable in that run.
     +    It is expected that an empty/unpopulated untracked cache structure can
     +    be written to the index - by update-index, or by a "git status" call
     +    that sees the untracked cache should be enabled and is not, but is
     +    running with options that make the untracked cache non-applicable in
     +    that run (eg a pathspec).
      
     -    Currently, if that happens, then subsequent "git status"
     -    calls end up populating the untracked cache, but not
     -    writing the index (not saving their work) - so the
     -    performance outcome is almost identical to the cache
     -    being altogether disabled.
     +    Currently, if that happens, then subsequent "git status" calls end up
     +    populating the untracked cache, but not writing the index (not saving
     +    their work) - so the performance outcome is almost identical to the
     +    cache being altogether disabled.
      
     -    This continues until the index gets written with the cache
     -    populated, for some *other* reason.
     +    This continues until the index gets written with the untracked cache
     +    populated, for some *other* reason, such as a working tree change.
      
     -    In this change, we detect the "existing cache is empty
     -    and it looks like we are using it" condition, and queues
     -    an index write when this happens.
     +    Detect the condition where an empty untracked cache exists in the
     +    index and we will collect the list of untracked paths, and queue an
     +    index write under that condition, so that the collected untracked
     +    paths can be written out to the untracked cache extension in the
     +    index.
      
     -    This change depends on previous fixes to t7519 for the
     -    "ignore .git changes when invalidating UNTR" test case to
     -    pass - before this fix, the test never actually did anything
     -    as it was not set up correctly.
     +    This change depends on previous fixes to t7519 for the "ignore .git
     +    changes when invalidating UNTR" test case to pass - before this fix,
     +    the test never actually did anything as it was not set up correctly.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
       
      -	if (!dir->untracked->root)
      +	if (!dir->untracked->root) {
     ++		/* Untracked cache existed but is not initialized; fix that */
       		FLEX_ALLOC_STR(dir->untracked->root, name, "");
     -+		/*
     -+		 * If we've had to initialize the root, then what we had was an
     -+		 * empty uninitialized untracked cache structure. We will be
     -+		 * populating it now, so we should trigger an index write.
     -+		 */
      +		istate->cache_changed |= UNTRACKED_CHANGED;
      +	}
       

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/3] t7519: avoid file to index mtime race for untracked cache
  2022-02-25 17:10 ` [PATCH v2 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
@ 2022-02-25 17:10   ` Tao Klerks via GitGitGadget
  2022-02-25 19:07     ` Junio C Hamano
  2022-02-25 17:10   ` [PATCH v2 2/3] t7519: populate untracked cache before test Tao Klerks via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-25 17:10 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

In t7519 there is a test that writes files to disk, and immediately
writes the untracked index. Because of mtime-comparison logic that
uses a 1-second resolution, this means the cached entries are not
trusted/used under some circumstances
(see read-cache.c#is_racy_stat()).

Untracked cache tests in t7063 use a 1-second delay to avoid this
issue, but we don't want to introduce arbitrary slowdowns, so instead
use test-tool chmtime to backdate the files slightly.

This change doesn't actually affect the outcome of the test, but does
enhance its validity, and becomes relevant after later changes.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 t/t7519-status-fsmonitor.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index a6308acf006..3f984136ea9 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -324,13 +324,19 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		cd dot-git &&
 		mkdir -p .git/hooks &&
 		: >tracked &&
+		test-tool chmtime =-60 tracked &&
 		: >modified &&
+		test-tool chmtime =-60 modified &&
 		mkdir dir1 &&
 		: >dir1/tracked &&
+		test-tool chmtime =-60 dir1/tracked &&
 		: >dir1/modified &&
+		test-tool chmtime =-60 dir1/modified &&
 		mkdir dir2 &&
 		: >dir2/tracked &&
+		test-tool chmtime =-60 dir2/tracked &&
 		: >dir2/modified &&
+		test-tool chmtime =-60 dir2/modified &&
 		write_integration_script &&
 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
 		git update-index --untracked-cache &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/3] t7519: populate untracked cache before test
  2022-02-25 17:10 ` [PATCH v2 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  2022-02-25 17:10   ` [PATCH v2 1/3] t7519: avoid file to index mtime race for untracked cache Tao Klerks via GitGitGadget
@ 2022-02-25 17:10   ` Tao Klerks via GitGitGadget
  2022-02-25 17:10   ` [PATCH v2 3/3] untracked-cache: write index when populating empty untracked cache Tao Klerks via GitGitGadget
  2022-02-27 21:56   ` [PATCH v3 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  3 siblings, 0 replies; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-25 17:10 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

In its current state, the t7519 test dealing with untracked cache
assumes that "git update-index --untracked-cache" will *populate* the
untracked cache. This is not correct - it will only add an empty
untracked cache structure to the index.

If we're going to compare two git status runs with something
interesting happening in-between, we need to ensure that the index is
in a stable/steady state *before* that first run.

Achieve this by adding another prior "git status" run.

At this stage this change does nothing, because there is a bug,
addressed in the next patch, whereby once the empty untracked cache
structure is added by the update-index invocation, the untracked cache
gets updated in every subsequent "git status" call, but the index with
these updates does not get written down.

That bug actually invalidates this entire test case - but we're fixing
that next.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 t/t7519-status-fsmonitor.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 3f984136ea9..fffc57120d6 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -341,6 +341,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
 		git update-index --untracked-cache &&
 		git update-index --fsmonitor &&
+		git status &&
 		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
 		git status &&
 		test-tool dump-untracked-cache >../before
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 3/3] untracked-cache: write index when populating empty untracked cache
  2022-02-25 17:10 ` [PATCH v2 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  2022-02-25 17:10   ` [PATCH v2 1/3] t7519: avoid file to index mtime race for untracked cache Tao Klerks via GitGitGadget
  2022-02-25 17:10   ` [PATCH v2 2/3] t7519: populate untracked cache before test Tao Klerks via GitGitGadget
@ 2022-02-25 17:10   ` Tao Klerks via GitGitGadget
  2022-02-25 19:12     ` Junio C Hamano
  2022-02-27 21:56   ` [PATCH v3 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  3 siblings, 1 reply; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-25 17:10 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

It is expected that an empty/unpopulated untracked cache structure can
be written to the index - by update-index, or by a "git status" call
that sees the untracked cache should be enabled and is not, but is
running with options that make the untracked cache non-applicable in
that run (eg a pathspec).

Currently, if that happens, then subsequent "git status" calls end up
populating the untracked cache, but not writing the index (not saving
their work) - so the performance outcome is almost identical to the
cache being altogether disabled.

This continues until the index gets written with the untracked cache
populated, for some *other* reason, such as a working tree change.

Detect the condition where an empty untracked cache exists in the
index and we will collect the list of untracked paths, and queue an
index write under that condition, so that the collected untracked
paths can be written out to the untracked cache extension in the
index.

This change depends on previous fixes to t7519 for the "ignore .git
changes when invalidating UNTR" test case to pass - before this fix,
the test never actually did anything as it was not set up correctly.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 dir.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index d91295f2bcd..4eee45dec91 100644
--- a/dir.c
+++ b/dir.c
@@ -2781,7 +2781,8 @@ void remove_untracked_cache(struct index_state *istate)
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
 						      int base_len,
-						      const struct pathspec *pathspec)
+						      const struct pathspec *pathspec,
+						      struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
@@ -2845,8 +2846,11 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
-	if (!dir->untracked->root)
+	if (!dir->untracked->root) {
+		/* Untracked cache existed but is not initialized; fix that */
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
+		istate->cache_changed |= UNTRACKED_CHANGED;
+	}
 
 	/* Validate $GIT_DIR/info/exclude and core.excludesfile */
 	root = dir->untracked->root;
@@ -2916,7 +2920,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] t7519: avoid file to index mtime race for untracked cache
  2022-02-25 17:10   ` [PATCH v2 1/3] t7519: avoid file to index mtime race for untracked cache Tao Klerks via GitGitGadget
@ 2022-02-25 19:07     ` Junio C Hamano
  2022-02-27 22:12       ` Tao Klerks
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2022-02-25 19:07 UTC (permalink / raw)
  To: Tao Klerks via GitGitGadget; +Cc: git, Tao Klerks

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Tao Klerks <tao@klerks.biz>
>
> In t7519 there is a test that writes files to disk, and immediately
> writes the untracked index. Because of mtime-comparison logic that

"untracked cache", I think.

> uses a 1-second resolution, this means the cached entries are not
> trusted/used under some circumstances
> (see read-cache.c#is_racy_stat()).
>
> Untracked cache tests in t7063 use a 1-second delay to avoid this
> issue, but we don't want to introduce arbitrary slowdowns, so instead
> use test-tool chmtime to backdate the files slightly.

Good approach.  Perhaps fixing 7063 can be marked as #leftoverbit?

> This change doesn't actually affect the outcome of the test, but does
> enhance its validity, and becomes relevant after later changes.
>
> Signed-off-by: Tao Klerks <tao@klerks.biz>
> ---
>  t/t7519-status-fsmonitor.sh | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index a6308acf006..3f984136ea9 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -324,13 +324,19 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
>  		cd dot-git &&
>  		mkdir -p .git/hooks &&
>  		: >tracked &&
> +		test-tool chmtime =-60 tracked &&
>  		: >modified &&
> +		test-tool chmtime =-60 modified &&
>  		mkdir dir1 &&
>  		: >dir1/tracked &&
> +		test-tool chmtime =-60 dir1/tracked &&
>  		: >dir1/modified &&
> +		test-tool chmtime =-60 dir1/modified &&
>  		mkdir dir2 &&
>  		: >dir2/tracked &&
> +		test-tool chmtime =-60 dir2/tracked &&
>  		: >dir2/modified &&
> +		test-tool chmtime =-60 dir2/modified &&
>  		write_integration_script &&
>  		git config core.fsmonitor .git/hooks/fsmonitor-test &&
>  		git update-index --untracked-cache &&

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 3/3] untracked-cache: write index when populating empty untracked cache
  2022-02-25 17:10   ` [PATCH v2 3/3] untracked-cache: write index when populating empty untracked cache Tao Klerks via GitGitGadget
@ 2022-02-25 19:12     ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2022-02-25 19:12 UTC (permalink / raw)
  To: Tao Klerks via GitGitGadget; +Cc: git, Tao Klerks

"Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/dir.c b/dir.c
> index d91295f2bcd..4eee45dec91 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2781,7 +2781,8 @@ void remove_untracked_cache(struct index_state *istate)
>  
>  static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
>  						      int base_len,
> -						      const struct pathspec *pathspec)
> +						      const struct pathspec *pathspec,
> +						      struct index_state *istate)
>  {
>  	struct untracked_cache_dir *root;
>  	static int untracked_cache_disabled = -1;
> @@ -2845,8 +2846,11 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  		return NULL;
>  	}
>  
> -	if (!dir->untracked->root)
> +	if (!dir->untracked->root) {
> +		/* Untracked cache existed but is not initialized; fix that */
>  		FLEX_ALLOC_STR(dir->untracked->root, name, "");
> +		istate->cache_changed |= UNTRACKED_CHANGED;
> +	}
>  
>  	/* Validate $GIT_DIR/info/exclude and core.excludesfile */
>  	root = dir->untracked->root;
> @@ -2916,7 +2920,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  		return dir->nr;
>  	}
>  
> -	untracked = validate_untracked_cache(dir, len, pathspec);
> +	untracked = validate_untracked_cache(dir, len, pathspec, istate);
>  	if (!untracked)
>  		/*
>  		 * make sure untracked cache code path is disabled,

Makes sense.  Will queue.  Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 0/3] Empty untracked cache performance issue
  2022-02-25 17:10 ` [PATCH v2 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-02-25 17:10   ` [PATCH v2 3/3] untracked-cache: write index when populating empty untracked cache Tao Klerks via GitGitGadget
@ 2022-02-27 21:56   ` Tao Klerks via GitGitGadget
  2022-02-27 21:56     ` [PATCH v3 1/3] t7519: avoid file to index mtime race for untracked cache Tao Klerks via GitGitGadget
                       ` (2 more replies)
  3 siblings, 3 replies; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-27 21:56 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks

This patchset addresses a performance issue with untracked cache. When a new
untracked cache structure is added to the index but remains empty,
subsequent "git status" calls populate it but do not write the index - so
they perform as though the index were disabled.

This situation can be caused in several different ways:

 * Running "git update-index --untracked-cache" on a repo that did not have
   the untracked cache
 * Modifying the git configuration to enable untracked cache, but then
   immediately running "git status -uall", causing the untracked cache to be
   created, but not used/populated (and the index written).
 * (likely others)

The patchset includes fixes to t7519, which otherwise starts failing because
it wasn't testing what it intended to.

Tao Klerks (3):
  t7519: avoid file to index mtime race for untracked cache
  t7519: populate untracked cache before test
  untracked-cache: write index when populating empty untracked cache

 dir.c                       | 10 +++++++---
 t/t7519-status-fsmonitor.sh |  7 +++++++
 2 files changed, 14 insertions(+), 3 deletions(-)


base-commit: 4c53a8c20f8984adb226293a3ffd7b88c3f4ac1a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-986%2FTaoK%2Ftaok-empty-untracked-cache-bug-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-986/TaoK/taok-empty-untracked-cache-bug-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/986

Range-diff vs v2:

 1:  9421b71540d ! 1:  8b1f89c259e t7519: avoid file to index mtime race for untracked cache
     @@ Commit message
          t7519: avoid file to index mtime race for untracked cache
      
          In t7519 there is a test that writes files to disk, and immediately
     -    writes the untracked index. Because of mtime-comparison logic that
     -    uses a 1-second resolution, this means the cached entries are not
     -    trusted/used under some circumstances
     +    writes the index with the untracked cache. Because of
     +    mtime-comparison logic that uses a 1-second resolution, this means
     +    the cached entries are not trusted/used under some circumstances
          (see read-cache.c#is_racy_stat()).
      
          Untracked cache tests in t7063 use a 1-second delay to avoid this
          issue, but we don't want to introduce arbitrary slowdowns, so instead
     -    use test-tool chmtime to backdate the files slightly.
     +    use test-tool chmtime to backdate the files slightly. The t7063
     +    delays are a #leftoverbit, to be worked on in a separate series.
      
          This change doesn't actually affect the outcome of the test, but does
          enhance its validity, and becomes relevant after later changes.
 2:  d29a68e65a0 = 2:  c901f9d96ca t7519: populate untracked cache before test
 3:  190b27e518a = 3:  9795a08414a untracked-cache: write index when populating empty untracked cache

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/3] t7519: avoid file to index mtime race for untracked cache
  2022-02-27 21:56   ` [PATCH v3 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
@ 2022-02-27 21:56     ` Tao Klerks via GitGitGadget
  2022-02-27 21:57     ` [PATCH v3 2/3] t7519: populate untracked cache before test Tao Klerks via GitGitGadget
  2022-02-27 21:57     ` [PATCH v3 3/3] untracked-cache: write index when populating empty untracked cache Tao Klerks via GitGitGadget
  2 siblings, 0 replies; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-27 21:56 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

In t7519 there is a test that writes files to disk, and immediately
writes the index with the untracked cache. Because of
mtime-comparison logic that uses a 1-second resolution, this means
the cached entries are not trusted/used under some circumstances
(see read-cache.c#is_racy_stat()).

Untracked cache tests in t7063 use a 1-second delay to avoid this
issue, but we don't want to introduce arbitrary slowdowns, so instead
use test-tool chmtime to backdate the files slightly. The t7063
delays are a #leftoverbit, to be worked on in a separate series.

This change doesn't actually affect the outcome of the test, but does
enhance its validity, and becomes relevant after later changes.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 t/t7519-status-fsmonitor.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index a6308acf006..3f984136ea9 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -324,13 +324,19 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		cd dot-git &&
 		mkdir -p .git/hooks &&
 		: >tracked &&
+		test-tool chmtime =-60 tracked &&
 		: >modified &&
+		test-tool chmtime =-60 modified &&
 		mkdir dir1 &&
 		: >dir1/tracked &&
+		test-tool chmtime =-60 dir1/tracked &&
 		: >dir1/modified &&
+		test-tool chmtime =-60 dir1/modified &&
 		mkdir dir2 &&
 		: >dir2/tracked &&
+		test-tool chmtime =-60 dir2/tracked &&
 		: >dir2/modified &&
+		test-tool chmtime =-60 dir2/modified &&
 		write_integration_script &&
 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
 		git update-index --untracked-cache &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/3] t7519: populate untracked cache before test
  2022-02-27 21:56   ` [PATCH v3 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  2022-02-27 21:56     ` [PATCH v3 1/3] t7519: avoid file to index mtime race for untracked cache Tao Klerks via GitGitGadget
@ 2022-02-27 21:57     ` Tao Klerks via GitGitGadget
  2022-02-27 21:57     ` [PATCH v3 3/3] untracked-cache: write index when populating empty untracked cache Tao Klerks via GitGitGadget
  2 siblings, 0 replies; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-27 21:57 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

In its current state, the t7519 test dealing with untracked cache
assumes that "git update-index --untracked-cache" will *populate* the
untracked cache. This is not correct - it will only add an empty
untracked cache structure to the index.

If we're going to compare two git status runs with something
interesting happening in-between, we need to ensure that the index is
in a stable/steady state *before* that first run.

Achieve this by adding another prior "git status" run.

At this stage this change does nothing, because there is a bug,
addressed in the next patch, whereby once the empty untracked cache
structure is added by the update-index invocation, the untracked cache
gets updated in every subsequent "git status" call, but the index with
these updates does not get written down.

That bug actually invalidates this entire test case - but we're fixing
that next.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 t/t7519-status-fsmonitor.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 3f984136ea9..fffc57120d6 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -341,6 +341,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
 		git update-index --untracked-cache &&
 		git update-index --fsmonitor &&
+		git status &&
 		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
 		git status &&
 		test-tool dump-untracked-cache >../before
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 3/3] untracked-cache: write index when populating empty untracked cache
  2022-02-27 21:56   ` [PATCH v3 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
  2022-02-27 21:56     ` [PATCH v3 1/3] t7519: avoid file to index mtime race for untracked cache Tao Klerks via GitGitGadget
  2022-02-27 21:57     ` [PATCH v3 2/3] t7519: populate untracked cache before test Tao Klerks via GitGitGadget
@ 2022-02-27 21:57     ` Tao Klerks via GitGitGadget
  2 siblings, 0 replies; 19+ messages in thread
From: Tao Klerks via GitGitGadget @ 2022-02-27 21:57 UTC (permalink / raw)
  To: git; +Cc: Tao Klerks, Tao Klerks

From: Tao Klerks <tao@klerks.biz>

It is expected that an empty/unpopulated untracked cache structure can
be written to the index - by update-index, or by a "git status" call
that sees the untracked cache should be enabled and is not, but is
running with options that make the untracked cache non-applicable in
that run (eg a pathspec).

Currently, if that happens, then subsequent "git status" calls end up
populating the untracked cache, but not writing the index (not saving
their work) - so the performance outcome is almost identical to the
cache being altogether disabled.

This continues until the index gets written with the untracked cache
populated, for some *other* reason, such as a working tree change.

Detect the condition where an empty untracked cache exists in the
index and we will collect the list of untracked paths, and queue an
index write under that condition, so that the collected untracked
paths can be written out to the untracked cache extension in the
index.

This change depends on previous fixes to t7519 for the "ignore .git
changes when invalidating UNTR" test case to pass - before this fix,
the test never actually did anything as it was not set up correctly.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
 dir.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index d91295f2bcd..4eee45dec91 100644
--- a/dir.c
+++ b/dir.c
@@ -2781,7 +2781,8 @@ void remove_untracked_cache(struct index_state *istate)
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
 						      int base_len,
-						      const struct pathspec *pathspec)
+						      const struct pathspec *pathspec,
+						      struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
@@ -2845,8 +2846,11 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
-	if (!dir->untracked->root)
+	if (!dir->untracked->root) {
+		/* Untracked cache existed but is not initialized; fix that */
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
+		istate->cache_changed |= UNTRACKED_CHANGED;
+	}
 
 	/* Validate $GIT_DIR/info/exclude and core.excludesfile */
 	root = dir->untracked->root;
@@ -2916,7 +2920,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/3] t7519: avoid file to index mtime race for untracked cache
  2022-02-25 19:07     ` Junio C Hamano
@ 2022-02-27 22:12       ` Tao Klerks
  0 siblings, 0 replies; 19+ messages in thread
From: Tao Klerks @ 2022-02-27 22:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tao Klerks via GitGitGadget, git

On Fri, Feb 25, 2022 at 8:07 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Tao Klerks via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > In t7519 there is a test that writes files to disk, and immediately
> > writes the untracked index. Because of mtime-comparison logic that
>
> "untracked cache", I think.
>

Yep, fixed, thx.

> > uses a 1-second resolution, this means the cached entries are not
> > trusted/used under some circumstances
> > (see read-cache.c#is_racy_stat()).
> >
> > Untracked cache tests in t7063 use a 1-second delay to avoid this
> > issue, but we don't want to introduce arbitrary slowdowns, so instead
> > use test-tool chmtime to backdate the files slightly.
>
> Good approach.  Perhaps fixing 7063 can be marked as #leftoverbit?
>

Sure. Updated the commit msg.

I also started another thread with what that looks like - it's not very pretty
in the end because of Windows compatibility issues, and still incomplete
because (as far as I can tell) the sleep is used for a couple of different
purposes, and only one of them lends itself to chmtime manipulation.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-02-27 22:12 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24 18:29 [PATCH 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
2021-06-24 18:29 ` [PATCH 1/3] Add a second's delay to t7519 for untracked cache Tao Klerks via GitGitGadget
2021-06-29  4:22   ` Junio C Hamano
2021-06-24 18:30 ` [PATCH 2/3] In t7519, populate untracked cache before test Tao Klerks via GitGitGadget
2021-06-24 18:30 ` [PATCH 3/3] Write index when populating empty untracked cache Tao Klerks via GitGitGadget
2021-06-29  4:42   ` Junio C Hamano
2022-02-24 17:52     ` Tao Klerks
2022-02-24 20:35       ` Junio C Hamano
2022-02-25 17:10 ` [PATCH v2 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
2022-02-25 17:10   ` [PATCH v2 1/3] t7519: avoid file to index mtime race for untracked cache Tao Klerks via GitGitGadget
2022-02-25 19:07     ` Junio C Hamano
2022-02-27 22:12       ` Tao Klerks
2022-02-25 17:10   ` [PATCH v2 2/3] t7519: populate untracked cache before test Tao Klerks via GitGitGadget
2022-02-25 17:10   ` [PATCH v2 3/3] untracked-cache: write index when populating empty untracked cache Tao Klerks via GitGitGadget
2022-02-25 19:12     ` Junio C Hamano
2022-02-27 21:56   ` [PATCH v3 0/3] Empty untracked cache performance issue Tao Klerks via GitGitGadget
2022-02-27 21:56     ` [PATCH v3 1/3] t7519: avoid file to index mtime race for untracked cache Tao Klerks via GitGitGadget
2022-02-27 21:57     ` [PATCH v3 2/3] t7519: populate untracked cache before test Tao Klerks via GitGitGadget
2022-02-27 21:57     ` [PATCH v3 3/3] untracked-cache: write index when populating empty untracked cache Tao Klerks via GitGitGadget

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.