* Question about fsmonitor and --untracked-files=all
From: Tao Klerks @ 2020-09-22 11:35 UTC
To: git

Hi folks,

I've got a couple of questions about the "fsmonitor" functionality,
untracked files, and multithreading.

Background:

In a repo with:
* A couple hundred thousand tracked files, and a couple hundred
  thousand .gitignored files, across a few thousand directories
* The --untracked-cache setting, tested and working
* core.fsmonitor set up with watchman (with the sample integration
  script from January)
* Git version 2.27.0.windows.1

"git status" takes about 2s
"git status --untracked-files=all" takes about 20s

When I turn off "core.fsmonitor", the numbers change to something like:
"git status": 8s
"git status --untracked-files=all": 9s

Using Windows' "procmon" to observe git.exe's behavior from outside, I
think I've understood a couple of things that surprise me:
1. When you specify "--untracked-files=all", git scans the entire
   folder tree regardless of the "fsmonitor" hook.
2. When you specify the "fsmonitor" hook, git does any filesystem
   scanning in a single-threaded fashion (as opposed to multi-threaded
   without "fsmonitor" / normally).

These two things combine so that with "fsmonitor" set, normal
command-line git status performance is great, but the performance in
tools that eagerly look for untracked files (like "Git Extensions" on
Windows) actually suffers: it takes twice as long to run the 'git -c
diff.ignoreSubModules=none status --porcelain=2 -z
--untracked-files=all' command that this UI wants (and blocks on, when
you go to a commit dialog).

Questions:

1. Is there a reason "--untracked-files=all" causes a full directory
   tree scan even with the "fsmonitor" hook active, or is this
   accidental?
2. Assuming that the full directory tree scan is indeed necessary even
   with "fsmonitor" (when requesting all untracked files), could it be
   made multithreaded?

(My apologies for the simplistic "outside-in" observations; I don't
feel qualified to attempt to understand the git source code.)

Thanks for any help understanding the optimization opportunities here!

Tao Klerks
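Tao's second question asks whether the full directory scan could be made multithreaded. As a rough illustration of one generic way to parallelize the per-directory listing step, the sketch below hands a fixed list of directories to a small pool of POSIX threads; each worker claims the next unclaimed directory under a mutex and reads it with `opendir()`/`readdir()`. This is not git's code, and every name in it (`parallel_count_entries()`, `struct scan_job`, `MAX_SCAN_THREADS`) is hypothetical. It assumes a POSIX system with pthreads and deliberately leaves out recursion, .gitignore matching, and the index comparison a real `git status` would need.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCAN_THREADS 16   /* arbitrary cap for this sketch */

/* State shared by all workers: the directory list and a cursor into it. */
struct scan_job {
    const char **dirs;        /* directories to list (not recursed into) */
    size_t ndirs;
    size_t next;              /* index of the next unclaimed directory */
    long total_entries;       /* entries found across all directories */
    pthread_mutex_t lock;     /* guards `next` and `total_entries` */
};

/* Count the entries of one directory, skipping "." and "..". */
static long count_entries(const char *path)
{
    DIR *d = opendir(path);
    struct dirent *e;
    long n = 0;

    if (!d)
        return 0;             /* unreadable directories simply count as empty */
    while ((e = readdir(d)) != NULL)
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    closedir(d);
    return n;
}

/* Worker: keep claiming the next directory until the list is exhausted. */
static void *scan_worker(void *arg)
{
    struct scan_job *job = arg;

    for (;;) {
        const char *dir;
        long n;

        pthread_mutex_lock(&job->lock);
        if (job->next == job->ndirs) {
            pthread_mutex_unlock(&job->lock);
            return NULL;
        }
        dir = job->dirs[job->next++];
        pthread_mutex_unlock(&job->lock);

        n = count_entries(dir);       /* filesystem I/O runs outside the lock */

        pthread_mutex_lock(&job->lock);
        job->total_entries += n;
        pthread_mutex_unlock(&job->lock);
    }
}

/* List `ndirs` directories with up to `nthreads` threads; return the total entry count. */
long parallel_count_entries(const char **dirs, size_t ndirs, int nthreads)
{
    pthread_t tids[MAX_SCAN_THREADS];
    struct scan_job job;
    int i, started = 0;

    job.dirs = dirs;
    job.ndirs = ndirs;
    job.next = 0;
    job.total_entries = 0;
    pthread_mutex_init(&job.lock, NULL);

    if (nthreads < 1)
        nthreads = 1;
    if (nthreads > MAX_SCAN_THREADS)
        nthreads = MAX_SCAN_THREADS;
    for (i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, scan_worker, &job) == 0)
            started++;
    if (started == 0)
        scan_worker(&job);            /* no threads available: scan on this thread */
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&job.lock);
    return job.total_entries;
}
```

For example, `parallel_count_entries(dirs, n, 4)` lists `n` directories using four threads (compile with `-pthread`). Claiming directories one at a time from a shared cursor, rather than pre-splitting the list, keeps all threads busy when some directories are much larger than others, at the cost of one lock round-trip per directory.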
* Re: Question about fsmonitor and --untracked-files=all
From: Johannes Schindelin @ 2020-09-23 10:40 UTC
To: Tao Klerks; +Cc: git

Hi Tao,

On Tue, 22 Sep 2020, Tao Klerks wrote:

> Questions:
>
> 1. Is there a reason "--untracked-files=all" causes a full directory
> tree scan even with the "fsmonitor" hook active, or is this
> accidental?

I have a hunch that this might be related to a performance hack we have in
Git for Windows: did you enable FSCache perchance?

If so, I _suspect_ that turning it off would accelerate `git status
--untracked-files=all`.

Ciao,
Johannes
* Re: Question about fsmonitor and --untracked-files=all
From: Tao Klerks @ 2020-09-24 12:14 UTC
To: Johannes Schindelin; +Cc: git

Hi Johannes,

Thanks for the tip; unfortunately, that doesn't seem to have had any
positive effect.

With "git config core.fscache false", everything takes longer except a
simple "git status" with the fsmonitor enabled and the untrackedCache
enabled (in which case I guess nothing ends up needing the filesystem).

This combination (fsmonitor enabled, untrackedCache enabled, and running
a simple "git status") is the *only* combination I've found so far that
doesn't force a directory scan. *When* there is a directory scan
(because of "--untracked-files=all", because the fsmonitor is disabled,
or because the untrackedCache is disabled), having fscache disabled
makes things significantly slower (from 20% slower to double the time,
depending on the exact combination).

I tried to stumble my way around some of the source code, and I suspect
I've found at least one explanation: the untracked cache appears to be
ignored when "--untracked-files=all" is specified, and this appears to
be intentional:

* In wt-status.c#wt_status_collect_untracked(), the "dir.flags" are
  updated to include "DIR_SHOW_OTHER_DIRECTORIES" only when
  "SHOW_ALL_UNTRACKED_FILES" is *not* requested.
* In later logic nested in dir.c#validate_untracked_cache(), the
  absence of the "DIR_SHOW_OTHER_DIRECTORIES" flag causes the
  validation to fail and, up one level in read_directory(), this causes
  the untracked structure to be discarded.

The relevant comment in "validate_untracked_cache()" says "See
treat_directory(), case index_nonexistent. Without this
[DIR_SHOW_OTHER_DIRECTORIES] flag, we may need to also cache .git file
content for the resolve_gitlink_ref() call, which we don't.".
I can't claim to understand the comment, the relationship to gitlinks,
etc. :(

Does this look like something solvable? It looks like supporting the
untrackedCache even with "--untracked-files=all" would make a
(potentially) large difference to git status performance in some
workflows with fsmonitor enabled.

(All that said, I still haven't understood why the presence of the
fsmonitor hook makes the difference, in terms of behavior, between
*multi-threaded* directory tree scanning for all directory contents
(without the fsmonitor), and *single-threaded* directory scanning for
untracked files specifically (with the fsmonitor).)

Thanks for looking; any further thoughts will of course be most
appreciated!

Tao Klerks
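To make the flag interaction in this last message easier to follow, here is a deliberately simplified C model. It is not git's code: the enum values, flag bits, and function names (`untracked_dir_flags()`, `untracked_cache_usable()`, `needs_full_scan()`) are hypothetical stand-ins. It encodes two things: the shape of the check in the git source of that era, where `wt_status_collect_untracked()` adds `DIR_SHOW_OTHER_DIRECTORIES` only when "all" untracked files are *not* requested and `validate_untracked_cache()` gives up when that flag is missing (consistent with the quoted "Without this flag" comment); and Tao's empirical finding that only fsmonitor plus untracked cache plus plain `git status` avoided a full scan. The real `validate_untracked_cache()` checks several other conditions that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; git's real enum and flag values differ. */
enum sketch_untracked_mode {
    SKETCH_NORMAL_UNTRACKED,   /* plain "git status" */
    SKETCH_ALL_UNTRACKED       /* "git status --untracked-files=all" */
};

#define SKETCH_SHOW_OTHER_DIRECTORIES (1u << 0)
#define SKETCH_HIDE_EMPTY_DIRECTORIES (1u << 1)

/* Shape of the wt_status_collect_untracked() logic: the directory flags are
 * added only when "all" untracked files were NOT requested. */
unsigned untracked_dir_flags(enum sketch_untracked_mode mode)
{
    unsigned flags = 0;

    if (mode != SKETCH_ALL_UNTRACKED)
        flags |= SKETCH_SHOW_OTHER_DIRECTORIES | SKETCH_HIDE_EMPTY_DIRECTORIES;
    return flags;
}

/* One of the conditions in validate_untracked_cache(): without
 * SHOW_OTHER_DIRECTORIES the cache is not used for this run. */
bool untracked_cache_usable(bool cache_enabled, unsigned flags)
{
    return cache_enabled && (flags & SKETCH_SHOW_OTHER_DIRECTORIES) != 0;
}

/* Tao's measurement, not a guarantee: a full scan was avoided only with
 * fsmonitor on AND a usable untracked cache. */
bool needs_full_scan(bool fsmonitor, bool cache_enabled,
                     enum sketch_untracked_mode mode)
{
    return !(fsmonitor && untracked_cache_usable(cache_enabled,
                                                 untracked_dir_flags(mode)));
}
```

In this model, `needs_full_scan(true, true, SKETCH_ALL_UNTRACKED)` is true, matching the slow `--untracked-files=all` case: the cache is dropped before fsmonitor's change information can help. Supporting the cache in "all" mode would presumably mean relaxing that flag check, which, per the quoted comment, would also require handling the `.git` file content used by `resolve_gitlink_ref()`.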