Git Mailing List Archive on lore.kernel.org
From: "Nipunn Koorapati via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <stolee@gmail.com>, Utsav Shah <utsav@dropbox.com>,
	Nipunn Koorapati <nipunn1313@gmail.com>,
	Nipunn Koorapati <nipunn@dropbox.com>,
	Taylor Blau <me@ttaylorr.com>,
	Nipunn Koorapati <nipunn1313@gmail.com>
Subject: [PATCH v2 0/4] use fsmonitor data in git diff eliminating O(num_files) calls to lstat
Date: Mon, 19 Oct 2020 21:35:11 +0000
Message-ID: <pull.756.v2.git.1603143316.gitgitgadget@gmail.com>
In-Reply-To: <pull.756.git.1602968677.gitgitgadget@gmail.com>

Credit to alexmv, who wrote this commit back in December 2017 when he was
at Dropbox. I've rebased it and am submitting it now.

With fsmonitor enabled, git diff currently calls lstat() on every file in
the repo. This series makes use of the fsmonitor extension to skip lstat()
calls on files that fsmonitor judged as unmodified.

I was able to do some testing with/without this change in a large in-house
repo (~ 400k files).

-----------------------------------------
(1) With fsmonitor enabled - on master of git (2.29.0)
-----------------------------------------
../git/bin-wrappers/git checkout HEAD~200
strace -c ../git/bin-wrappers/git diff

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.64    4.358994          10    446257         3 lstat
  0.12    0.005353           7       764       360 open

(A subsequent call)
strace -c ../git/bin-wrappers/git diff

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.84    4.380955          10    444904         3 lstat
  0.06    0.002564         135        19           munmap
...

-----------------------------------------
(2) With fsmonitor enabled - with my patch
-----------------------------------------
../git/bin-wrappers/git checkout HEAD~200
strace -c ../git/bin-wrappers/git diff

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.72    0.003090         163        19           munmap
 19.63    0.001196         598         2           futex
...
  0.00    0.000000           0         4         3 lstat


-----------------------------------------
(3) With fsmonitor disabled entirely
-----------------------------------------

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.52    0.277085       92362         3           futex
  0.27    0.000752           4       191        63 open
...
  0.14    0.000397           3       158         3 lstat

I was able to encode this into a perf test in one of the commits.
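
For anyone repeating the measurement, the lstat row can be pulled out of a
`strace -c` summary mechanically. The following is a minimal sketch and not
part of the series; `lstat_calls` is a hypothetical helper name, and the
sample rows are copied from runs (1) and (2) above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this series): read a `strace -c`
# summary on stdin and print the number of lstat calls.
# The "errors" column may be empty, so the syscall name is matched as
# the last field ($NF); the call count is always the fourth field.
lstat_calls() {
	awk '$NF == "lstat" { print $4 }'
}

# Rows copied from the summaries above: master first, then the patch.
before=$(printf '%s\n' \
	' 99.64    4.358994          10    446257         3 lstat' \
	'  0.12    0.005353           7       764       360 open' |
	lstat_calls)
after=$(printf '%s\n' \
	' 50.72    0.003090         163        19           munmap' \
	'  0.00    0.000000           0         4         3 lstat' |
	lstat_calls)

echo "lstat calls: $before -> $after"
```

On a live repository the same function could presumably be fed with
`strace -c git diff 2>&1 >/dev/null | lstat_calls`, since strace writes its
summary to stderr. On systems where lstat() is implemented through a
differently named syscall, the name to match would need adjusting.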

Changes since Patch Series V1

 * Add git diff -- <pathspec> to perf tests
 * Improve readability of bitwise ops
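
The bitwise change is cosmetic: testing two flags separately and OR-ing the
results gives the same answer as testing once against the combined mask. A
quick sketch of that identity in shell arithmetic, whose `&`, `|` and `||`
operators follow C semantics. The flag values below are placeholders chosen
for illustration, not the values from git's headers, and the function names
are hypothetical.

```shell
#!/bin/sh
# Placeholder bit values, for illustration only.
CE_VALID=1
CE_FSMONITOR_VALID=2

# v1 form: (flags & A) || (flags & B)
check_v1() {
	[ $(( ($1 & CE_VALID) || ($1 & CE_FSMONITOR_VALID) )) -ne 0 ]
}

# v2 form: flags & (A | B)
check_v2() {
	[ $(( $1 & (CE_VALID | CE_FSMONITOR_VALID) )) -ne 0 ]
}

# Compare both forms for every combination of the two bits, plus an
# unrelated bit (4) that neither form should react to.
agree=yes
for flags in 0 1 2 3 4 5 6 7
do
	check_v1 "$flags" && r1=skip || r1=stat
	check_v2 "$flags" && r2=skip || r2=stat
	[ "$r1" = "$r2" ] || agree=no
done
echo "forms agree: $agree"
```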

Alex Vandiver (1):
  fsmonitor: use fsmonitor data in `git diff`

Nipunn Koorapati (3):
  t/perf/README: elaborate on output format
  t/perf/p7519-fsmonitor.sh: warm cache on first git status
  t/perf: add fsmonitor perf test for git diff

 diff-lib.c                | 15 ++++++--
 t/perf/README             |  2 ++
 t/perf/p7519-fsmonitor.sh | 74 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 88 insertions(+), 3 deletions(-)


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-756%2Fnipunn1313%2Fdiff_fsmon-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-756/nipunn1313/diff_fsmon-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/756

Range-diff vs v1:

 1:  13fd992a37 ! 1:  cba03dd40b fsmonitor: use fsmonitor data in `git diff`
     @@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option)
      -		/* If CE_VALID is set, don't look at workdir for file removal */
      -		if (ce->ce_flags & CE_VALID) {
      +		/*
     -+		 * If CE_VALID is set, the user has promised us that the workdir
     -+		 * hasn't changed compared to index, so don't stat workdir
     -+		 * for file removal
     -+		 *  eg - via git udpate-index --assume-unchanged
     -+		 *  eg - via core.ignorestat=true
     -+		 *
     -+		 * When using FSMONITOR:
     -+		 * If CE_FSMONITOR_VALID is set, then we know the metadata on disk
     -+		 * has not changed since the last refresh, and we can skip the
     -+		 * file-removal checks without doing the stat in check_removed.
     ++		 * When CE_VALID is set (via "update-index --assume-unchanged"
     ++		 * or via adding paths while core.ignorestat is set to true),
     ++		 * the user has promised that the working tree file for that
     ++		 * path will not be modified.  When CE_FSMONITOR_VALID is true,
     ++		 * the fsmonitor knows that the path hasn't been modified since
     ++		 * we refreshed the cached stat information.  In either case,
     ++		 * we do not have to stat to see if the path has been removed
     ++		 * or modified.
      +		 */
     -+		if (ce->ce_flags & CE_VALID || ce->ce_flags & CE_FSMONITOR_VALID) {
     ++		if (ce->ce_flags & (CE_VALID | CE_FSMONITOR_VALID)) {
       			changed = 0;
       			newmode = ce->ce_mode;
       		} else {
 2:  024cd07965 = 2:  1c7876166f t/perf/README: elaborate on output format
 3:  6482e372bc = 3:  401f696c81 t/perf/p7519-fsmonitor.sh: warm cache on first git status
 4:  0613b07676 ! 4:  f572e226bb t/perf: add fsmonitor perf test for git diff
     @@ Commit message
          significantly better with this patch series (80% faster on my
          workload)!
      
     -    On master (2.29)
     +    GIT_PERF_LARGE_REPO=~/src/server ./run v2.29.0-rc1 . -- p7519-fsmonitor.sh
      
     -    Test                                                             this tree
     -    --------------------------------------------------------------------------------
     -    7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)         0.39(0.33+0.06)
     -    7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)    0.17(0.13+0.05)
     -    7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   1.34(0.77+0.56)
     -    7519.5: diff (fsmonitor=.git/hooks/fsmonitor-watchman)           0.82(0.24+0.58)
     -    7519.7: status (fsmonitor=)                                      0.70(0.53+0.90)
     -    7519.8: status -uno (fsmonitor=)                                 0.37(0.32+0.78)
     -    7519.9: status -uall (fsmonitor=)                                1.55(1.01+1.25)
     -    7519.10: diff (fsmonitor=)                                       0.34(0.35+0.72)
     +    Test                                                                     v2.29.0-rc1       this tree
     +    -----------------------------------------------------------------------------------------------------------------
     +    7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)                 1.46(0.82+0.64)   1.47(0.83+0.62) +0.7%
     +    7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)            0.16(0.12+0.04)   0.17(0.12+0.05) +6.3%
     +    7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)           1.36(0.73+0.62)   1.37(0.76+0.60) +0.7%
     +    7519.5: diff (fsmonitor=.git/hooks/fsmonitor-watchman)                   0.85(0.22+0.63)   0.14(0.10+0.05) -83.5%
     +    7519.6: diff -- 0_files (fsmonitor=.git/hooks/fsmonitor-watchman)        0.12(0.08+0.05)   0.13(0.11+0.02) +8.3%
     +    7519.7: diff -- 10_files (fsmonitor=.git/hooks/fsmonitor-watchman)       0.12(0.08+0.04)   0.13(0.09+0.04) +8.3%
     +    7519.8: diff -- 100_files (fsmonitor=.git/hooks/fsmonitor-watchman)      0.12(0.07+0.05)   0.13(0.07+0.06) +8.3%
     +    7519.9: diff -- 1000_files (fsmonitor=.git/hooks/fsmonitor-watchman)     0.12(0.09+0.04)   0.13(0.08+0.05) +8.3%
     +    7519.10: diff -- 10000_files (fsmonitor=.git/hooks/fsmonitor-watchman)   0.14(0.09+0.05)   0.13(0.10+0.03) -7.1%
     +    7519.12: status (fsmonitor=)                                             1.67(0.93+1.49)   1.67(0.99+1.42) +0.0%
     +    7519.13: status -uno (fsmonitor=)                                        0.37(0.30+0.82)   0.37(0.33+0.79) +0.0%
     +    7519.14: status -uall (fsmonitor=)                                       1.58(0.97+1.35)   1.57(0.86+1.45) -0.6%
     +    7519.15: diff (fsmonitor=)                                               0.34(0.28+0.83)   0.34(0.27+0.83) +0.0%
     +    7519.16: diff -- 0_files (fsmonitor=)                                    0.09(0.06+0.04)   0.09(0.08+0.02) +0.0%
     +    7519.17: diff -- 10_files (fsmonitor=)                                   0.09(0.07+0.03)   0.09(0.06+0.05) +0.0%
     +    7519.18: diff -- 100_files (fsmonitor=)                                  0.09(0.06+0.04)   0.09(0.06+0.04) +0.0%
     +    7519.19: diff -- 1000_files (fsmonitor=)                                 0.09(0.06+0.04)   0.09(0.05+0.05) +0.0%
     +    7519.20: diff -- 10000_files (fsmonitor=)                                0.10(0.08+0.04)   0.10(0.06+0.05) +0.0%
      
     -    With this patch series
     +    I also added a benchmark for a tiny git diff workload w/ a pathspec.
     +    I see an approximately .02 second overhead added w/ and w/o fsmonitor
      
     -    Test                                                             this tree
     -    --------------------------------------------------------------------------------
     -    7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)         0.39(0.33+0.07)
     -    7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)    0.17(0.12+0.05)
     -    7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   1.35(0.73+0.61)
     -    7519.5: diff (fsmonitor=.git/hooks/fsmonitor-watchman)           0.14(0.10+0.05)
     -    7519.7: status (fsmonitor=)                                      0.70(0.56+0.87)
     -    7519.8: status -uno (fsmonitor=)                                 0.37(0.31+0.79)
     -    7519.9: status -uall (fsmonitor=)                                1.54(0.97+1.29)
     -    7519.10: diff (fsmonitor=)                                       0.34(0.28+0.79)
     +    From looking at these results, I suspected that refresh_fsmonitor
     +    is already happening during git diff - independent of this patch
     +    series' optimization. Confirmed that suspicion by breaking on
     +    refresh_fsmonitor.
     +
     +    (gdb) bt  [simplified]
     +    0  refresh_fsmonitor  at fsmonitor.c:176
     +    1  ie_match_stat  at read-cache.c:375
     +    2  match_stat_with_submodule at diff-lib.c:237
     +    4  builtin_diff_files  at builtin/diff.c:260
     +    5  cmd_diff  at builtin/diff.c:541
     +    6  run_builtin  at git.c:450
     +    7  handle_builtin  at git.c:700
     +    8  run_argv  at git.c:767
     +    9  cmd_main  at git.c:898
     +    10 main  at common-main.c:52
      
          Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
      
       ## t/perf/p7519-fsmonitor.sh ##
     +@@ t/perf/p7519-fsmonitor.sh: test_expect_success "setup for fsmonitor" '
     + 
     + 	git config core.fsmonitor "$INTEGRATION_SCRIPT" &&
     + 	git update-index --fsmonitor &&
     ++	mkdir 1_file 10_files 100_files 1000_files 10000_files &&
     ++	for i in `seq 1 10`; do touch 10_files/$i; done &&
     ++	for i in `seq 1 100`; do touch 100_files/$i; done &&
     ++	for i in `seq 1 1000`; do touch 1000_files/$i; done &&
     ++	for i in `seq 1 10000`; do touch 10000_files/$i; done &&
     ++	git add 1_file 10_files 100_files 1000_files 10000_files &&
     ++	git commit -m "Add files" &&
     + 	git status  # Warm caches
     + '
     + 
      @@ t/perf/p7519-fsmonitor.sh: test_perf "status -uall (fsmonitor=$INTEGRATION_SCRIPT)" '
       	git status -uall
       '
     @@ t/perf/p7519-fsmonitor.sh: test_perf "status -uall (fsmonitor=$INTEGRATION_SCRIP
      +test_perf "diff (fsmonitor=$INTEGRATION_SCRIPT)" '
      +	git diff
      +'
     ++
     ++if test -n "$GIT_PERF_7519_DROP_CACHE"; then
     ++	test-tool drop-caches
     ++fi
     ++
     ++test_perf "diff -- 0_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 1_file
     ++'
     ++
     ++test_perf "diff -- 10_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 10_files
     ++'
     ++
     ++test_perf "diff -- 100_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 100_files
     ++'
     ++
     ++test_perf "diff -- 1000_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 1000_files
     ++'
     ++
     ++test_perf "diff -- 10000_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 10000_files
     ++'
      +
       test_expect_success "setup without fsmonitor" '
       	unset INTEGRATION_SCRIPT &&
     @@ t/perf/p7519-fsmonitor.sh: test_perf "status -uall (fsmonitor=$INTEGRATION_SCRIP
      +test_perf "diff (fsmonitor=$INTEGRATION_SCRIPT)" '
      +	git diff
      +'
     ++
     ++if test -n "$GIT_PERF_7519_DROP_CACHE"; then
     ++	test-tool drop-caches
     ++fi
     ++
     ++test_perf "diff -- 0_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 1_file
     ++'
     ++
     ++test_perf "diff -- 10_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 10_files
     ++'
     ++
     ++test_perf "diff -- 100_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 100_files
     ++'
     ++
     ++test_perf "diff -- 1000_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 1000_files
     ++'
     ++
     ++test_perf "diff -- 10000_files (fsmonitor=$INTEGRATION_SCRIPT)" '
     ++	git diff -- 10000_files
     ++'
      +
       if test_have_prereq WATCHMAN
       then
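
The last column of the perf table in the range-diff above is consistent with
the relative change of the first (elapsed) number between the two revisions.
A minimal sketch of that arithmetic, using awk for the floating-point part;
`rel_change` is a hypothetical helper, not something t/perf provides.

```shell
#!/bin/sh
# Hypothetical helper: percentage change from $1 to $2, one decimal.
rel_change() {
	awk -v old="$1" -v new="$2" \
		'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

# 7519.5: diff with fsmonitor, v2.29.0-rc1 vs this tree.
rel_change 0.85 0.14
# 7519.2: status with fsmonitor, v2.29.0-rc1 vs this tree.
rel_change 1.46 1.47
```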

-- 
gitgitgadget


