* Too many 'stat' calls by git-status on Windows
@ 2009-07-07  0:05 Dmitry Potapov
  2009-07-08 19:49 ` Ramsay Jones
  2009-07-09  2:04 ` Linus Torvalds
  0 siblings, 2 replies; 39+ messages in thread
From: Dmitry Potapov @ 2009-07-07  0:05 UTC (permalink / raw)
  To: git

I have used the Cygwin version of Git on one Windows computer and
noticed that git-status is sluggish. So, I have run the Process Monitor
to see what is going on.

Below you can see the results of testing on Windows and Linux on the
same repository using the same version of Git. They are rather easy to
compare once you notice the following correspondence between syscalls:

Windows         Linux

QueryOpen       lstat or fstat
CreateFile      open
CloseFile       close
QueryDirectory  getdents
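
For illustration, the correspondence above can be applied mechanically when comparing the two logs. This is a minimal sketch, not the tooling used for the measurements in this message: the function name `tally_as_linux` and the sample operation list are hypothetical, and only the four mappings from the table are assumed.

```python
from collections import Counter

# Mapping taken from the table above; anything else is counted as "other".
WINDOWS_TO_LINUX = {
    "QueryOpen": "lstat/fstat",
    "CreateFile": "open",
    "CloseFile": "close",
    "QueryDirectory": "getdents",
}


def tally_as_linux(operations):
    """Count Process Monitor operations under their Linux-equivalent names."""
    counts = Counter()
    for op in operations:
        counts[WINDOWS_TO_LINUX.get(op, "other")] += 1
    return counts


# Hypothetical sample, not taken from the real logs.
sample = ["QueryOpen", "QueryOpen", "CreateFile", "CloseFile", "RegOpenKey"]
counts = tally_as_linux(sample)
print(dict(counts))
```

Operations without a counterpart in the table (registry access, for example) are grouped together so that they do not distort the comparison.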

I have also tested git-diff to verify that the number of system calls
matches pretty well. (In fact, I got practically identical lists of stat
syscalls for files inside the working directory on Windows and Linux
when I ran git-diff.) But something strange is going on with git-status.
The beginning of the log is identical on Windows and Linux, but then I
see more 'stat's in the Windows log that did not happen on Linux.  In
total, I see about a threefold increase in 'stat' calls, with every file
being stat'ed twice and directories (which are numerous) being stat'ed 3
or more times (some of them as many as 39 times...). It seems that every
directory is stat'ed as many times as the number of subdirectories it has
plus 3.
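
As a quick arithmetic check of the two observations above, here is a small sketch. The counts are the QueryOpen and lstat/fstat totals for git-status from the logs further down in this message; the "subdirectories plus 3" rule is only the apparent pattern described above, and the helper name `implied_subdirs` is hypothetical.

```python
# git-status totals taken from the logs in this message.
windows_queryopen = 12984           # QueryOpen on Windows (Cygwin)
linux_lstat, linux_fstat = 3279, 1058

ratio = windows_queryopen / (linux_lstat + linux_fstat)
print(round(ratio, 2))              # roughly a threefold increase


def implied_subdirs(observed_stats, base=3):
    """Invert the apparent rule: stats per directory = subdirectories + base."""
    return observed_stats - base


# A directory that is stat'ed 39 times would have 36 subdirectories
# if the apparent rule holds.
print(implied_subdirs(39))
```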

It appears that the second 'stat' for files on Windows is caused by the
lack of d_type in dirent. When I recompiled the Linux version with
NO_D_TYPE_IN_DIRENT = YesPlease, I got the same result for files.
(I am still not sure what causes the extra stat calls for
directories; maybe it is Cygwin-specific...)
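
To show why a missing d_type costs one extra call per name, here is a minimal sketch in Python rather than Git's C code. It assumes `os.scandir()` as a stand-in for readdir(): with a type hint the kind of each entry comes from the directory entry itself, without it every name needs an explicit lstat(). The function name `classify` and the temporary test tree are hypothetical.

```python
import os
import stat
import tempfile


def classify(dirpath, have_d_type):
    """Return ({name: 'dir' or 'file'}, explicit lstat() calls made)."""
    kinds = {}
    lstat_calls = 0
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if have_d_type:
                # Type information arrives with the directory entry.
                # (Python itself may still fall back to a stat call on
                # filesystems that report an unknown type.)
                is_dir = entry.is_dir(follow_symlinks=False)
            else:
                # No type in the entry: one explicit lstat() per name.
                is_dir = stat.S_ISDIR(os.lstat(entry.path).st_mode)
                lstat_calls += 1
            kinds[entry.name] = "dir" if is_dir else "file"
    return kinds, lstat_calls


# Hypothetical test tree: two files and one subdirectory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "subdir"))
    for name in ("a.txt", "b.txt"):
        open(os.path.join(root, name), "w").close()
    with_type, calls_with = classify(root, have_d_type=True)
    without_type, calls_without = classify(root, have_d_type=False)

print(calls_with, calls_without)
```

Both modes classify the entries identically; the only difference is the number of explicit lstat() calls, which grows with the number of names in the directory.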

The question is whether it is possible to avoid this redundant 'stat'
for files on systems that do not have d_type in dirent, or would that
require too much modification? Is it possible to use the cache where
d_type is not available, provided that the entry is marked as uptodate?
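
The idea in the question can be sketched as follows. This is only an illustration under stated assumptions, not Git's implementation: the `index` dictionary, its `uptodate` and `type` fields, and the function `entry_type` are all hypothetical stand-ins for the real cache.

```python
import os
import stat
import tempfile


def entry_type(path, index, stats):
    """Return 'dir', 'lnk' or 'reg' for path, consulting the cache first."""
    cached = index.get(path)
    if cached is not None and cached["uptodate"]:
        # The entry was already verified, so its type is known for free.
        stats["cache_hits"] += 1
        return cached["type"]
    # Unknown or stale entry: fall back to asking the filesystem.
    stats["lstat_calls"] += 1
    mode = os.lstat(path).st_mode
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISLNK(mode):
        return "lnk"
    return "reg"


# Hypothetical working tree with one cached file and one unknown file.
with tempfile.TemporaryDirectory() as root:
    tracked = os.path.join(root, "tracked.c")
    untracked = os.path.join(root, "untracked.c")
    for p in (tracked, untracked):
        open(p, "w").close()
    index = {tracked: {"uptodate": True, "type": "reg"}}
    stats = {"cache_hits": 0, "lstat_calls": 0}
    types = [entry_type(p, index, stats) for p in (tracked, untracked)]

print(types, stats)
```

Only the file that is missing from the cache costs an lstat(); the file with an up-to-date entry is answered from the cache.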


==== Git on Windows (CYGWIN) =====

$ wc -l git-diff.csv  git-status.csv
   5186 git-diff.csv
  21694 git-status.csv

$ csvtool col 5 git-diff.csv | sort | uniq -c | sort -nr | head -10
   4656 QueryOpen
    100 CreateFile
     94 CloseFile
     80 QuerySecurityFile
     61 ReadFile
     30 QueryInformationVolume
     28 QueryAllInformationFile
     26 RegOpenKey
     24 RegCloseKey
     20 QueryStandardInformationFile

$ csvtool col 5 git-status.csv | sort | uniq -c | sort -nr | head -10
  12984 QueryOpen
   3086 CreateFile
   2103 CloseFile
   1984 QueryDirectory
    988 QueryFileInternalInformationFile
    132 QuerySecurityFile
    100 ReadFile
     77 WriteFile
     55 QueryInformationVolume
     53 QueryAllInformationFile

Successful open:
$ csvtool col 5,7,8 git-diff.csv | grep CreateFile,SUCCESS, | wc -l
94
$ csvtool col 5,7,8 git-status.csv | grep CreateFile,SUCCESS, | wc -l
2103

Successful open for directories:
$ csvtool col 5,7,8 git-diff.csv | grep CreateFile,SUCCESS,.*Options:.*Directory | wc -l
37
$ csvtool col 5,7,8 git-status.csv | grep CreateFile,SUCCESS,.*Options:.*Directory | wc -l
1024

Unsuccessful attempts to open:
$ csvtool col 5,7,8 git-diff.csv | grep CreateFile | grep -v ,SUCCESS, | wc -l
6
$ csvtool col 5,7,8 git-status.csv | grep CreateFile | grep -v ,SUCCESS, | wc -l
983

Attempts to open .gitignore:
$ csvtool col 5,6 git-diff.csv | grep 'CreateFile,.*\\\.gitignore' | wc -l
0
$ csvtool col 5,6 git-status.csv | grep 'CreateFile,.*\\\.gitignore' | wc -l
986

=== Git on Linux ===

$ wc -l linux-git-*
   4674 linux-git-diff.log
   9807 linux-git-status.log

$ sed -e 's/(.*//' < linux-git-diff.log  | sort | uniq -c | sort -rn | head -10
   4237 lstat
     88 mmap
     56 open
     50 close
     50 access
     48 fstat
     45 mprotect
     43 read
     15 stat
     13 munmap

The number of lstat+fstat calls equals 4285 for git-diff.

$ sed -e 's/(.*//' < linux-git-status.log  | sort | uniq -c | sort -rn | head -10
   3279 lstat
   2048 open
   1976 getdents
   1062 close
   1058 fstat
     97 mmap
     67 read
     48 access
     45 mprotect
     40 write

The number of lstat+fstat calls equals 4337 for git-status.

Successful open:
$ grep -c '^open(.*= [^-]' linux-*
linux-git-diff.log:50
linux-git-status.log:1064

Successful open for directories:
$ grep -c '^open(.*O_DIRECTORY.*= [^-]' linux-*
linux-git-diff.log:1
linux-git-status.log:989

Unsuccessful attempts to open:
$ grep -c '^open(.*= -1' linux-*
linux-git-diff.log:6
linux-git-status.log:984

Attempts to open .gitignore:
$ grep -c '^open(.*.\.gitignore"' linux-*
linux-git-diff.log:0
linux-git-status.log:987

=== Linux with NO_D_TYPE_IN_DIRENT = YesPlease ===

$ wc -l linux-git-*no-dtype.log
   4674 linux-git-diff-no-dtype.log
  14040 linux-git-status-no-dtype.log

$ sed -e 's/(.*//' < linux-git-diff-no-dtype.log  | sort | uniq -c | sort -rn | head -10

   4237 lstat
     88 mmap
     56 open
     50 close
     50 access
     48 fstat
     45 mprotect
     43 read
     15 stat
     13 munmap

The number of lstat+fstat calls equals 4285 for git-diff.

$ sed -e 's/(.*//' < linux-git-status-no-dtype.log  | sort | uniq -c | sort -rn | head -10

   7512 lstat
   2048 open
   1976 getdents
   1062 close
   1058 fstat
     97 mmap
     67 read
     48 access
     45 mprotect
     40 write

The number of lstat+fstat calls equals 8570 for git-status.
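
For reference, the effect of dropping d_type can be worked out from the git-status counts in this message. The sketch below uses only numbers from the logs; the comparison with the git-diff lstat count is an observation, not a proven explanation.

```python
# git-status on Linux, from the logs in this message.
lstat_with_d_type = 3279
lstat_without_d_type = 7512
fstat_calls = 1058                  # identical in both runs
git_diff_lstat = 4237               # lstat calls made by git-diff

extra_lstat = lstat_without_d_type - lstat_with_d_type
total_with = lstat_with_d_type + fstat_calls
total_without = lstat_without_d_type + fstat_calls
growth = total_without / total_with

print(extra_lstat)                  # additional lstat calls without d_type
print(round(growth, 2))             # close to a doubling
print(git_diff_lstat - extra_lstat) # gap to the git-diff lstat count
```

The number of additional lstat calls is within a handful of the number git-diff makes, which fits the observation that files are stat'ed twice when d_type is unavailable.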

Successful open:
$ grep -c '^open(.*= [^-]' linux-*-no-dtype.log
linux-git-diff-no-dtype.log:50
linux-git-status-no-dtype.log:1064

Successful open for directories:
$ grep -c '^open(.*O_DIRECTORY.*= [^-]' linux-*-no-dtype.log
linux-git-diff-no-dtype.log:1
linux-git-status-no-dtype.log:989

Unsuccessful attempts to open:
$ grep -c '^open(.*= -1' linux-*-no-dtype.log
linux-git-diff-no-dtype.log:6
linux-git-status-no-dtype.log:984

Attempts to open .gitignore:
$ grep -c '^open(.*.\.gitignore"' linux-*-no-dtype.log
linux-git-diff-no-dtype.log:0
linux-git-status-no-dtype.log:987

=======

Dmitry


Thread overview: 39+ messages
2009-07-07  0:05 Too many 'stat' calls by git-status on Windows Dmitry Potapov
2009-07-08 19:49 ` Ramsay Jones
2009-07-09  2:04 ` Linus Torvalds
2009-07-09  2:35   ` Linus Torvalds
2009-07-09  2:40     ` [PATCH 1/3] Add 'fill_directory()' helper function for directory traversal Linus Torvalds
2009-07-09  2:42       ` [PATCH 2/3] Simplify read_directory[_recursive]() arguments Linus Torvalds
2009-07-09  2:43         ` [PATCH 3/3] Avoid doing extra 'lstat()'s for d_type if we have an up-to-date cache entry Linus Torvalds
2009-07-09  8:18           ` Junio C Hamano
2009-07-09 15:52             ` Linus Torvalds
2009-07-09 16:32               ` Junio C Hamano
2009-07-09 16:59                 ` Linus Torvalds
2009-07-09 18:34                   ` Junio C Hamano
2009-07-09 17:13                 ` Linus Torvalds
2009-07-09 17:18                   ` Linus Torvalds
2009-07-09 18:37                     ` Junio C Hamano
2009-07-09 18:53                       ` Linus Torvalds
2009-07-09 20:44                         ` [PATCH 4/3] Avoid using 'lstat()' to figure out directories Linus Torvalds
2009-07-09 20:47                           ` [PATCH 5/3] Prepare symlink caching for thread-safety Linus Torvalds
2009-07-09 20:48                             ` [PATCH 6/3] Export thread-safe version of 'has_symlink_leading_path()' Linus Torvalds
2009-07-09 20:50                               ` [PATCH 7/3] Make index preloading check the whole path to the file Linus Torvalds
2009-07-09 20:56                                 ` Linus Torvalds
2009-07-10  3:12                                 ` Junio C Hamano
2009-07-10  3:29                                   ` Linus Torvalds
2009-07-10  3:40                                     ` Linus Torvalds
2009-07-11  2:53                                     ` Junio C Hamano
2009-07-11  3:04                                       ` Linus Torvalds
2009-07-12  0:09                               ` [PATCH 6/3] Export thread-safe version of 'has_symlink_leading_path()' Kjetil Barvik
2009-07-12 21:33                                 ` Junio C Hamano
2009-07-09 22:36                           ` [PATCH 4/3] Avoid using 'lstat()' to figure out directories Paolo Bonzini
2009-07-09 23:26                             ` Linus Torvalds
2009-07-09 23:52                               ` Linus Torvalds
2009-07-10  0:13                                 ` Linus Torvalds
2009-07-09 23:37                             ` Junio C Hamano
2009-07-09 21:05                 ` [PATCH 3/3] Avoid doing extra 'lstat()'s for d_type if we have an up-to-date cache entry Dmitry Potapov
2009-07-09 21:52                   ` Eric Blake
2009-07-09 23:30                     ` [PATCH 3/3] Avoid doing extra 'lstat()'s for d_type if we have an " Dmitry Potapov
2009-07-10 13:04                       ` Dmitry Potapov
2009-07-09 23:29                   ` [PATCH 3/3] Avoid doing extra 'lstat()'s for d_type if we have an " Dmitry Potapov
2009-07-09 13:50           ` Dmitry Potapov