git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] help: correct behavior for is_executable on Windows
@ 2017-01-27 13:50 Johannes Schindelin
  2017-01-27 18:43 ` Junio C Hamano
  2017-01-30 12:40 ` [PATCH v2] help: improve is_executable() " Johannes Schindelin
  0 siblings, 2 replies; 5+ messages in thread
From: Johannes Schindelin @ 2017-01-27 13:50 UTC (permalink / raw)
  To: git; +Cc: Heiko Voigt, Junio C Hamano

From: Heiko Voigt <hvoigt@hvoigt.net>

The previous implementation said that the filesystem information on
Windows is not reliable to determine whether a file is executable. To
gather this information it was peeking into the first two bytes of a
file to see whether it looks executable.

Apart from the fact that on Windows executables are defined as such by
their extension (and Git has special code to support "executing" scripts
when they have a she-bang line) it leads to serious performance
problems: not only do we have to open many files now, it gets even
slower when a virus scanner is running.

See the following measurements (in seconds) as an example, where we
execute a simple program that simply lists the directory contents and
calls open() on every listed file:

With virus scanner running (coldcache):

$ ./a.exe /libexec/git-core/
before open (git-add.exe): 0.000000
after open (git-add.exe): 0.412873
before open (git-annotate.exe): 0.000175
after open (git-annotate.exe): 0.397925
before open (git-apply.exe): 0.000243
after open (git-apply.exe): 0.399996
before open (git-archive.exe): 0.000147
after open (git-archive.exe): 0.397783
before open (git-bisect--helper.exe): 0.000160
after open (git-bisect--helper.exe): 0.397700
before open (git-blame.exe): 0.000160
after open (git-blame.exe): 0.399136
...

With virus scanner running (hotcache):

$ ./a.exe /libexec/git-core/
before open (git-add.exe): 0.000000
after open (git-add.exe): 0.000325
before open (git-annotate.exe): 0.000229
after open (git-annotate.exe): 0.000177
before open (git-apply.exe): 0.000167
after open (git-apply.exe): 0.000150
before open (git-archive.exe): 0.000154
after open (git-archive.exe): 0.000156
before open (git-bisect--helper.exe): 0.000132
after open (git-bisect--helper.exe): 0.000180
before open (git-blame.exe): 0.000718
after open (git-blame.exe): 0.000724
...

With this patch I get:

$ time git help git
Launching default browser to display HTML ...

real    0m8.723s
user    0m0.000s
sys     0m0.000s

and without

$ time git help git
Launching default browser to display HTML ...

real    1m37.734s
user    0m0.000s
sys     0m0.031s

both tests with cold cache and giving the machine some time to settle
down after restart.

Signed-off-by: Heiko Voigt <heiko.voigt@mahr.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Published-As: https://github.com/dscho/git/releases/tag/help-is-exe-v1
Fetch-It-Via: git fetch https://github.com/dscho/git help-is-exe-v1

 help.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/help.c b/help.c
index 53e2a67e00..bc6cd19cf3 100644
--- a/help.c
+++ b/help.c
@@ -105,7 +105,22 @@ static int is_executable(const char *name)
 		return 0;
 
 #if defined(GIT_WINDOWS_NATIVE)
-{	/* cannot trust the executable bit, peek into the file instead */
+	/*
+	 * On Windows there is no executable bit. The file extension
+	 * indicates whether it can be run as an executable, and Git
+	 * has special-handling to detect scripts and launch them
+	 * through the indicated script interpreter. We test for the
+	 * file extension first because virus scanners may make
+	 * it quite expensive to open many files.
+	 */
+	if (ends_with(name, ".exe"))
+		return S_IXUSR;
+
+{
+	/*
+	 * Now that we know it does not have an executable extension,
+	 * peek into the file instead.
+	 */
 	char buf[3] = { 0 };
 	int n;
 	int fd = open(name, O_RDONLY);
@@ -113,8 +128,8 @@ static int is_executable(const char *name)
 	if (fd >= 0) {
 		n = read(fd, buf, 2);
 		if (n == 2)
-			/* DOS executables start with "MZ" */
-			if (!strcmp(buf, "#!") || !strcmp(buf, "MZ"))
+			/* look for a she-bang */
+			if (!strcmp(buf, "#!"))
 				st.st_mode |= S_IXUSR;
 		close(fd);
 	}

base-commit: 4e59582ff70d299f5a88449891e78d15b4b3fabe
-- 
2.11.1.windows.prerelease.2.9.g3014b57

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] help: correct behavior for is_executable on Windows
  2017-01-27 13:50 [PATCH] help: correct behavior for is_executable on Windows Johannes Schindelin
@ 2017-01-27 18:43 ` Junio C Hamano
  2017-01-30 12:44   ` Johannes Schindelin
  2017-01-30 12:40 ` [PATCH v2] help: improve is_executable() " Johannes Schindelin
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2017-01-27 18:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Heiko Voigt

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> From: Heiko Voigt <hvoigt@hvoigt.net>
>
> The previous implementation said that the filesystem information on
> Windows is not reliable to determine whether a file is executable. To
> gather this information it was peeking into the first two bytes of a
> file to see whether it looks executable.
>
> Apart from the fact that on Windows executables are defined as such by
> their extension (and Git has special code to support "executing" scripts
> when they have a she-bang line) it leads to serious performance
> problems: not only do we have to open many files now, it gets even
> slower when a virus scanner is running.

Heiko, around here (before going into the details of how severe the
problem is and how wonderful the result applying of this change is)
is the best place to summarize the solution.  E.g.

	Because the definition of "executable" on Windows is based
	on the file extension, update the function to declare that a
	file with ".exe" extension without opening and reading the
	early bytes from it.  This avoids serious performance issues.

I paraphrased the rest only so that the description of the solution
(i.e. "instead of opening and peeking, we trust .exe suffix") fits
well in the surrounding text; the important part is to say what the
change does clearly.

I agree with the reasoning and the execution of the patch, except
that 

 - "correct behaviour" in the title makes it appear that this is a
   correctness thing, but this is primarily a performance fix.

 - It is a bit strange that "MZ" is dropped in the same patch
   without any mention.

The latter may be "correctness" fix, in that earlier we treated any
file that happens to begin with MZ as an executable, regardless of
the filename, which may have misidentified a non-executable file as
an executable.  If that is what is going on, it deserves to be
described in the log message.

> diff --git a/help.c b/help.c
> index 53e2a67e00..bc6cd19cf3 100644
> --- a/help.c
> +++ b/help.c
> @@ -105,7 +105,22 @@ static int is_executable(const char *name)
>  		return 0;
>  
>  #if defined(GIT_WINDOWS_NATIVE)
> -{	/* cannot trust the executable bit, peek into the file instead */
> +	/*
> +	 * On Windows there is no executable bit. The file extension
> +	 * indicates whether it can be run as an executable, and Git
> +	 * has special-handling to detect scripts and launch them
> +	 * through the indicated script interpreter. We test for the
> +	 * file extension first because virus scanners may make
> +	 * it quite expensive to open many files.
> +	 */
> +	if (ends_with(name, ".exe"))
> +		return S_IXUSR;
> +
> +{
> +	/*
> +	 * Now that we know it does not have an executable extension,
> +	 * peek into the file instead.
> +	 */
>  	char buf[3] = { 0 };
>  	int n;
>  	int fd = open(name, O_RDONLY);
> @@ -113,8 +128,8 @@ static int is_executable(const char *name)
>  	if (fd >= 0) {
>  		n = read(fd, buf, 2);
>  		if (n == 2)
> -			/* DOS executables start with "MZ" */
> -			if (!strcmp(buf, "#!") || !strcmp(buf, "MZ"))
> +			/* look for a she-bang */
> +			if (!strcmp(buf, "#!"))
>  				st.st_mode |= S_IXUSR;
>  		close(fd);
>  	}
>
> base-commit: 4e59582ff70d299f5a88449891e78d15b4b3fabe

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2] help: improve is_executable() on Windows
  2017-01-27 13:50 [PATCH] help: correct behavior for is_executable on Windows Johannes Schindelin
  2017-01-27 18:43 ` Junio C Hamano
@ 2017-01-30 12:40 ` Johannes Schindelin
  2017-01-30 17:03   ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Johannes Schindelin @ 2017-01-30 12:40 UTC (permalink / raw)
  To: git; +Cc: Heiko Voigt, Junio C Hamano

From: Heiko Voigt <hvoigt@hvoigt.net>

On Windows, executables need to have the file extension `.exe`, or they
are not executables. Hence, to support scripts, Git for Windows also
looks for a she-bang line by opening the file in question, and executing
it via the specified script interpreter.

To figure out whether files in the `PATH` are executable, `git help` has
code that imitates this behavior. With one exception: it *always* opens
the files and looks for a she-bang line *or* an `MZ` tell-tale
(nevermind that files with the magic `MZ` but without file extension
`.exe` would still not be executable).

Opening this many files leads to performance problems that are even more
serious when a virus scanner is running. Therefore, let's change the
code to look for the file extension `.exe` early, and avoid opening the
file altogether if we already know that it is executable.

See the following measurements (in seconds) as an example, where we
execute a simple program that simply lists the directory contents and
calls open() on every listed file:

With virus scanner running (coldcache):

$ ./a.exe /libexec/git-core/
before open (git-add.exe): 0.000000
after open (git-add.exe): 0.412873
before open (git-annotate.exe): 0.000175
after open (git-annotate.exe): 0.397925
before open (git-apply.exe): 0.000243
after open (git-apply.exe): 0.399996
before open (git-archive.exe): 0.000147
after open (git-archive.exe): 0.397783
before open (git-bisect--helper.exe): 0.000160
after open (git-bisect--helper.exe): 0.397700
before open (git-blame.exe): 0.000160
after open (git-blame.exe): 0.399136
...

With virus scanner running (hotcache):

$ ./a.exe /libexec/git-core/
before open (git-add.exe): 0.000000
after open (git-add.exe): 0.000325
before open (git-annotate.exe): 0.000229
after open (git-annotate.exe): 0.000177
before open (git-apply.exe): 0.000167
after open (git-apply.exe): 0.000150
before open (git-archive.exe): 0.000154
after open (git-archive.exe): 0.000156
before open (git-bisect--helper.exe): 0.000132
after open (git-bisect--helper.exe): 0.000180
before open (git-blame.exe): 0.000718
after open (git-blame.exe): 0.000724
...

With this patch I get:

$ time git help git
Launching default browser to display HTML ...

real    0m8.723s
user    0m0.000s
sys     0m0.000s

and without

$ time git help git
Launching default browser to display HTML ...

real    1m37.734s
user    0m0.000s
sys     0m0.031s

both tests with cold cache and giving the machine some time to settle
down after restart.

[jes: adjusted the commit message]

Signed-off-by: Heiko Voigt <heiko.voigt@mahr.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Published-As: https://github.com/dscho/git/releases/tag/help-is-exe-v2
Fetch-It-Via: git fetch https://github.com/dscho/git help-is-exe-v2

 help.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/help.c b/help.c
index 53e2a67e00..bc6cd19cf3 100644
--- a/help.c
+++ b/help.c
@@ -105,7 +105,22 @@ static int is_executable(const char *name)
 		return 0;
 
 #if defined(GIT_WINDOWS_NATIVE)
-{	/* cannot trust the executable bit, peek into the file instead */
+	/*
+	 * On Windows there is no executable bit. The file extension
+	 * indicates whether it can be run as an executable, and Git
+	 * has special-handling to detect scripts and launch them
+	 * through the indicated script interpreter. We test for the
+	 * file extension first because virus scanners may make
+	 * it quite expensive to open many files.
+	 */
+	if (ends_with(name, ".exe"))
+		return S_IXUSR;
+
+{
+	/*
+	 * Now that we know it does not have an executable extension,
+	 * peek into the file instead.
+	 */
 	char buf[3] = { 0 };
 	int n;
 	int fd = open(name, O_RDONLY);
@@ -113,8 +128,8 @@ static int is_executable(const char *name)
 	if (fd >= 0) {
 		n = read(fd, buf, 2);
 		if (n == 2)
-			/* DOS executables start with "MZ" */
-			if (!strcmp(buf, "#!") || !strcmp(buf, "MZ"))
+			/* look for a she-bang */
+			if (!strcmp(buf, "#!"))
 				st.st_mode |= S_IXUSR;
 		close(fd);
 	}

base-commit: 4e59582ff70d299f5a88449891e78d15b4b3fabe
-- 
2.11.1.windows.prerelease.2.9.g3014b57

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] help: correct behavior for is_executable on Windows
  2017-01-27 18:43 ` Junio C Hamano
@ 2017-01-30 12:44   ` Johannes Schindelin
  0 siblings, 0 replies; 5+ messages in thread
From: Johannes Schindelin @ 2017-01-30 12:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Heiko Voigt

Hi Junio,

On Fri, 27 Jan 2017, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> 
> > From: Heiko Voigt <hvoigt@hvoigt.net>
> >
> > The previous implementation said that the filesystem information on
> > Windows is not reliable to determine whether a file is executable. To
> > gather this information it was peeking into the first two bytes of a
> > file to see whether it looks executable.
> >
> > Apart from the fact that on Windows executables are defined as such by
> > their extension (and Git has special code to support "executing"
> > scripts when they have a she-bang line) it leads to serious
> > performance problems: not only do we have to open many files now, it
> > gets even slower when a virus scanner is running.
> 
> Heiko, around here (before going into the details of how severe the
> problem is and how wonderful the result applying of this change is) is
> the best place to summarize the solution.  E.g.
> 
> 	Because the definition of "executable" on Windows is based
> 	on the file extension, update the function to declare that a
> 	file with ".exe" extension without opening and reading the
> 	early bytes from it.  This avoids serious performance issues.
> 
> I paraphrased the rest only so that the description of the solution
> (i.e. "instead of opening and peeking, we trust .exe suffix") fits well
> in the surrounding text; the important part is to say what the change
> does clearly.

I adjusted the commit message. It was tweaked a little differently from
what you suggested, as I preferred to condense the information a bit more.

> I agree with the reasoning and the execution of the patch, except
> that 
> 
>  - "correct behaviour" in the title makes it appear that this is a
>    correctness thing, but this is primarily a performance fix.

Primarily. But not only. The magic `MZ` without the file extension `.exe`
is pretty useless, as the file could not be executed, still.

To avoid further turnaround, though, I also edited the contentious
"correct" to read "improve" now.

>  - It is a bit strange that "MZ" is dropped in the same patch
>    without any mention.

I fixed that in the commit message.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] help: improve is_executable() on Windows
  2017-01-30 12:40 ` [PATCH v2] help: improve is_executable() " Johannes Schindelin
@ 2017-01-30 17:03   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2017-01-30 17:03 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Heiko Voigt

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> From: Heiko Voigt <hvoigt@hvoigt.net>
>
> On Windows, executables need to have the file extension `.exe`, or they
> are not executables. Hence, to support scripts, Git for Windows also
> looks for a she-bang line by opening the file in question, and executing
> it via the specified script interpreter.
>
> To figure out whether files in the `PATH` are executable, `git help` has
> code that imitates this behavior. With one exception: it *always* opens
> the files and looks for a she-bang line *or* an `MZ` tell-tale
> (nevermind that files with the magic `MZ` but without file extension
> `.exe` would still not be executable).
>
> Opening this many files leads to performance problems that are even more
> serious when a virus scanner is running. Therefore, let's change the
> code to look for the file extension `.exe` early, and avoid opening the
> file altogether if we already know that it is executable.

Much more readable than the initial round.  Will queue; thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-01-30 17:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-27 13:50 [PATCH] help: correct behavior for is_executable on Windows Johannes Schindelin
2017-01-27 18:43 ` Junio C Hamano
2017-01-30 12:44   ` Johannes Schindelin
2017-01-30 12:40 ` [PATCH v2] help: improve is_executable() " Johannes Schindelin
2017-01-30 17:03   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).