From: Ben Peart <peartben@gmail.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
Christian Couder <christian.couder@gmail.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
Ben Peart <benpeart@microsoft.com>,
Alex Vandiver <alexmv@dropbox.com>,
git@vger.kernel.org
Subject: Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
Date: Wed, 7 Feb 2018 11:59:58 -0500
Message-ID: <c755e388-89a5-fc0f-f872-16fd5d5686b0@gmail.com>
In-Reply-To: <20180207092141.4312-2-pclouds@gmail.com>
On 2/7/2018 4:21 AM, Nguyễn Thái Ngọc Duy wrote:
> read_directory() code ignores all paths named ".git" even if it's not
> a valid git repository. See treat_path() for details. Since ".git" is
> basically invisible to read_directory(), when we are asked to
> invalidate a path that contains ".git", we can safely ignore it
> because the slow path would not consider it anyway.
>
> This helps when fsmonitor is used and we have a real ".git" repo at
> worktree top. Occasionally .git/index will be updated and if the
> fsmonitor hook does not filter it, untracked cache is asked to
> invalidate the path ".git/index".
>
> Without this patch, we invalidate the root directory unnecessarily,
> which:
>
> - makes read_directory() fall back to slow path for root directory
> (slower)
>
> - makes the index dirty (because UNTR extension is updated). Depending
> on the index size, writing it down could also be slow.
>
Thank you again, this patch makes much more sense to me.
> A note about the new "safe_path" knob. Since this new check could be
> relatively expensive, avoid it when we know it's not needed. If the
> path comes from the index, it can't contain ".git". If it does
> contain, we may be screwed up at many more levels, not just this one.
>
I do have a simplifying suggestion to make. I noticed that other uses
of verify_path() check the path at the point where a potentially
erroneous path is passed in, so that all the underlying code can assume
it is valid. I think that makes sense here as well, and it makes for a
smaller patch.
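To illustrate the "check once at the entry point" idea, here is a minimal
standalone sketch. It is not git's verify_path(), which performs more
checks than this; has_dot_git_component(), demo_refresh_callback() and
struct demo_state are hypothetical names made up for the example, and the
comparison is deliberately case-sensitive to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT git's verify_path(): returns 1 if any
 * '/'-separated component of `path` is exactly ".git", else 0.
 */
static int has_dot_git_component(const char *path)
{
	assert(path != NULL);
	while (*path) {
		/* Length of the current component, up to '/' or end. */
		size_t len = strcspn(path, "/");

		if (len == 4 && !memcmp(path, ".git", 4))
			return 1;
		path += len;
		if (*path == '/')
			path++;
	}
	return 0;
}

/* Stand-in for the state the real callback would invalidate. */
struct demo_state {
	int invalidations;
};

/*
 * Hypothetical refresh callback: the untrusted name is checked once
 * here, so everything called below it can assume a sane path.
 */
static void demo_refresh_callback(struct demo_state *state, const char *name)
{
	if (has_dot_git_component(name))
		return;
	state->invalidations++;
}
```

With this shape, a name such as ".git/index" coming from the hook is
dropped before any invalidation happens, and callers that take the name
from an index entry need no extra flag.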
> diff --git a/fsmonitor.h b/fsmonitor.h
> index cd3cc0ccf2..65f3743636 100644
> --- a/fsmonitor.h
> +++ b/fsmonitor.h
> @@ -65,7 +65,7 @@ static inline void mark_fsmonitor_invalid(struct index_state *istate, struct cac
> {
> if (core_fsmonitor) {
> ce->ce_flags &= ~CE_FSMONITOR_VALID;
> - untracked_cache_invalidate_path(istate, ce->name);
> + untracked_cache_invalidate_path(istate, ce->name, 1);
This test isn't needed because we're pulling the name right out of the
cache entry so it doesn't need to be verified.
Here is a modified version of your patch for consideration:
================
read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.
This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".
Without this patch, we invalidate the root directory unnecessarily,
which:
- makes read_directory() fall back to slow path for root directory
(slower)
- makes the index dirty (because UNTR extension is updated). Depending
on the index size, writing it down could also be slow.
Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/218a577618
Checkout: git fetch https://github.com/benpeart/git verify_path-v1
&& git checkout 218a577618
dir.c | 2 +-
fsmonitor.c | 6 +++++-
t/t7519-status-fsmonitor.sh | 39 +++++++++++++++++++++++++++++++++++++++
3 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/dir.c b/dir.c
index 7c4b45e30e..d431da46f5 100644
--- a/dir.c
+++ b/dir.c
@@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
if (!de)
return treat_path_fast(dir, untracked, cdir, istate, path,
baselen, pathspec);
- if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
+ if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
return path_none;
strbuf_setlen(path, baselen);
strbuf_addstr(path, de->d_name);
diff --git a/fsmonitor.c b/fsmonitor.c
index 0af7c4edba..019576f306 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -118,8 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
static void fsmonitor_refresh_callback(struct index_state *istate,
const char *name)
{
- int pos = index_name_pos(istate, name, strlen(name));
+ int pos;
+ if (!verify_path(name))
+ return;
+
+ pos = index_name_pos(istate, name, strlen(name));
if (pos >= 0) {
struct cache_entry *ce = istate->cache[pos];
ce->ce_flags &= ~CE_FSMONITOR_VALID;
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index eb2d13bbcf..756beb0d8e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -314,4 +314,43 @@ test_expect_success 'splitting the index results in the same state' '
test_cmp expect actual
'
+test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' '
+ test_create_repo dot-git &&
+ (
+ cd dot-git &&
+ mkdir -p .git/hooks &&
+ : >tracked &&
+ : >modified &&
+ mkdir dir1 &&
+ : >dir1/tracked &&
+ : >dir1/modified &&
+ mkdir dir2 &&
+ : >dir2/tracked &&
+ : >dir2/modified &&
+ write_integration_script &&
+ git config core.fsmonitor .git/hooks/fsmonitor-test &&
+ git update-index --untracked-cache &&
+ git update-index --fsmonitor &&
+ GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
+ git status &&
+ test-dump-untracked-cache >../before
+ ) &&
+ cat >>dot-git/.git/hooks/fsmonitor-test <<-\EOF &&
+ printf ".git\0"
+ printf ".git/index\0"
+ printf "dir1/.git\0"
+ printf "dir1/.git/index\0"
+ EOF
+ (
+ cd dot-git &&
+ GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
+ git status &&
+ test-dump-untracked-cache >../after
+ ) &&
+ grep "directory invalidation" trace-before >>before &&
+ grep "directory invalidation" trace-after >>after &&
+ # UNTR extension unchanged, dir invalidation count unchanged
+ test_cmp before after
+'
+
test_done
base-commit: 5be1f00a9a701532232f57958efab4be8c959a29
--
2.15.0.windows.1