* fread reading directories @ 2020-06-06 22:36 Kyle Evans 2020-06-07 17:05 ` Junio C Hamano 0 siblings, 1 reply; 6+ messages in thread From: Kyle Evans @ 2020-06-06 22:36 UTC (permalink / raw) To: git Hi, I was looking at FREAD_READS_DIRECTORIES to measure some performance differences, then stumbled upon [0] that dropped fread() from the autoconf test that causes git to use its git_fopen shim [1] even on Linux. I've read the commit message a couple of times, but I'm really not seeing the rationale for *why* git wants this knob to be set on Linux. Unless I'm missing something, this would seem to regress the almost certainly much-more-common case of fopen() a file and fread() it with an unconditional fstat() from grep_fopen(), rather than just using two syscalls at all times (directories and non-directories) and letting it get rejected in fread(). Thoughts? Thanks, Kyle Evans [0] https://github.com/git/git/commit/3adf9fdecfb0cd31a83ef3af1d8d631a1acd392b [1] https://github.com/git/git/blob/master/compat/fopen.c ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: fread reading directories 2020-06-06 22:36 fread reading directories Kyle Evans @ 2020-06-07 17:05 ` Junio C Hamano 2020-06-07 17:16 ` Kyle Evans 0 siblings, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2020-06-07 17:05 UTC (permalink / raw) To: Kyle Evans; +Cc: git Kyle Evans <kevans@freebsd.org> writes: > I was looking at FREAD_READS_DIRECTORIES to measure some performance > differences, then stumbled upon [0] that dropped fread() from the > autoconf test that causes git to use its git_fopen shim [1] even on > Linux. I thought we saw this mentioned recently? I do not recall if any concrete improvement came out of it. The Makefile defines the macro as such: # Define FREAD_READS_DIRECTORIES if you are on a system which succeeds # when attempting to read from an fopen'ed directory (or even to fopen # it at all). So, the macro is expected to be set if a platform gives back FILE * on a directory, whether it allows fread() on it or not. If it is a good idea is entirely different story, though. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: fread reading directories 2020-06-07 17:05 ` Junio C Hamano @ 2020-06-07 17:16 ` Kyle Evans 2020-06-08 17:18 ` Junio C Hamano 0 siblings, 1 reply; 6+ messages in thread From: Kyle Evans @ 2020-06-07 17:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Sun, Jun 7, 2020 at 12:05 PM Junio C Hamano <gitster@pobox.com> wrote: > > Kyle Evans <kevans@freebsd.org> writes: > > > I was looking at FREAD_READS_DIRECTORIES to measure some performance > > differences, then stumbled upon [0] that dropped fread() from the > > autoconf test that causes git to use its git_fopen shim [1] even on > > Linux. > > I thought we saw this mentioned recently? I do not recall if > any concrete improvement came out of it. > Ah, this is my bad. =-( I had searched the archives (I'm not typically subscribed to this list) and noticed the related patch for GNU/Hurd, but completely missed that a more active discussion had taken place within that thread. I have now read that, and have no further questions. Thanks! Kyle Evans ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: fread reading directories 2020-06-07 17:16 ` Kyle Evans @ 2020-06-08 17:18 ` Junio C Hamano 2020-06-08 19:08 ` Brandon Casey 0 siblings, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2020-06-08 17:18 UTC (permalink / raw) To: git; +Cc: Kyle Evans, Jeff King, Brandon Casey Kyle Evans <kevans@freebsd.org> writes: > On Sun, Jun 7, 2020 at 12:05 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Kyle Evans <kevans@freebsd.org> writes: >> >> > I was looking at FREAD_READS_DIRECTORIES to measure some performance >> > differences, then stumbled upon [0] that dropped fread() from the >> > autoconf test that causes git to use its git_fopen shim [1] even on >> > Linux. >> >> I thought we saw this mentioned recently? I do not recall if >> any concrete improvement came out of it. >> > > Ah, this is my bad. =-( I had searched the archives (I'm not typically > subscribed to this list) and noticed the related patch for GNU/Hurd, > but completely missed that a more active discussion had taken place > within that thread. I have now read that, and have no further > questions. For the benefit of those who are watching from sidelines, the relevant thread ends at https://lore.kernel.org/git/20200424055106.GG1648190@coredump.intra.peff.net/ In short, many callers of fopen() in our code rely on our variant of fopen() that notices that the caller fed us a directory for error reporting. Unless the caller somehow knows the argument it calls fopen() with is a file and not a directory, somebody needs to fopen() and fstat() (or stat() and then fopen()) to catch it as an error to give that caller a directory. In the current arrangement, we let our fopen() wrapper do that task, instead of the callers. It may make sense to do one of the two things: - The lighter weight one is to rename the macro to the reflect the trait we are trying to capture more faithfully: "fopen opens directories" and leave the code and performance characteristics as-is. - Heavier weight one is to audit callers of fopen() and only let those that know they do not have a directory directly call fopen(). The other callers would call our wrapper under a different name. This way, the former won't have to pay the overhead of checking for "you gave me a directory but I only take a file" error twice. This is what Brandon proposed in the thread. Doing neither would leave this seed of confusion for later readers, which is not ideal. I am tempted to say that we for now should do an even lighter variant of the former, which is to give a comment. Thoughts? Makefile | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 09f98b777c..a0bef206a9 100644 --- a/Makefile +++ b/Makefile @@ -19,8 +19,7 @@ all:: # have been written to the final string if enough space had been available. # # Define FREAD_READS_DIRECTORIES if you are on a system which succeeds -# when attempting to read from an fopen'ed directory (or even to fopen -# it at all). +# when an fopen() on a directory does not result in an error. # # Define NO_OPENSSL environment variable if you do not have OpenSSL. # ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: fread reading directories 2020-06-08 17:18 ` Junio C Hamano @ 2020-06-08 19:08 ` Brandon Casey 2020-06-08 19:41 ` Randall S. Becker 0 siblings, 1 reply; 6+ messages in thread From: Brandon Casey @ 2020-06-08 19:08 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Kyle Evans, Jeff King On Mon, Jun 8, 2020 at 10:18 AM Junio C Hamano <gitster@pobox.com> wrote: > > It may make sense to do one of the two things: > > - The lighter weight one is to rename the macro to the reflect the > trait we are trying to capture more faithfully: "fopen opens > directories" and leave the code and performance characteristics > as-is. > > - Heavier weight one is to audit callers of fopen() and only let > those that know they do not have a directory directly call > fopen(). The other callers would call our wrapper under a > different name. This way, the former won't have to pay the > overhead of checking for "you gave me a directory but I only take > a file" error twice. This is what Brandon proposed in the > thread. > > Doing neither would leave this seed of confusion for later readers, > which is not ideal. I am tempted to say that we for now should do > an even lighter variant of the former, which is to give a comment. > > Thoughts? I'd suggest a medium weight approach which would be to introduce a new function with an appropriate name (fopen_file_only()?) that behaves the way we want it to, and replace every existing fopen() call with this new function. We could introduce a new macro, which I think would only be used on Windows, to say "fopen already fails to open directories" (FOPEN_FAILS_ON_DIRECTORIES?) so that fopen_file_only could be simplified to just a bare fopen there. That way it's clear to the reader, at the callsite, that the call does not have the standard behavior of fopen. Then, FREAD_READS_DIRECTORIES could be removed from all but the 1 or 2 platforms that it was originally set for. I'd imagine that we'd basically just promote the git_fopen() function from compat to become the implementation of the first tier fopen_file_only() function. On the FREAD_READS_DIRECTORIES platforms, a bare fopen would also become fopen_file_only(). The call to fopen() within fopen_file_only() would obviously need to take this into account to ensure that it calls the real fopen(). I think this would put the pieces in place for someone to audit all of the existing uses of fopen_file_only() and potentially replace them with a straight fopen() if appropriate. And it would allow future code to explicitly make the choice between fopen_file_only() or just fopen(). None of this would produce any functional change on any of our platforms, but I think it would make things more clear. -Brandon ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: fread reading directories 2020-06-08 19:08 ` Brandon Casey @ 2020-06-08 19:41 ` Randall S. Becker 0 siblings, 0 replies; 6+ messages in thread From: Randall S. Becker @ 2020-06-08 19:41 UTC (permalink / raw) To: 'Brandon Casey', 'Junio C Hamano' Cc: 'git', 'Kyle Evans', 'Jeff King' On June 8, 2020 3:08 PM, Brandon Casey wrote: > To: Junio C Hamano <gitster@pobox.com> > Cc: git <git@vger.kernel.org>; Kyle Evans <kevans@freebsd.org>; Jeff King > <peff@peff.net> > Subject: Re: fread reading directories > > On Mon, Jun 8, 2020 at 10:18 AM Junio C Hamano <gitster@pobox.com> > wrote: > > > > It may make sense to do one of the two things: > > > > - The lighter weight one is to rename the macro to the reflect the > > trait we are trying to capture more faithfully: "fopen opens > > directories" and leave the code and performance characteristics > > as-is. > > > > - Heavier weight one is to audit callers of fopen() and only let > > those that know they do not have a directory directly call > > fopen(). The other callers would call our wrapper under a > > different name. This way, the former won't have to pay the > > overhead of checking for "you gave me a directory but I only take > > a file" error twice. This is what Brandon proposed in the > > thread. > > > > Doing neither would leave this seed of confusion for later readers, > > which is not ideal. I am tempted to say that we for now should do an > > even lighter variant of the former, which is to give a comment. > > > > Thoughts? > > I'd suggest a medium weight approach which would be to introduce a new > function with an appropriate name (fopen_file_only()?) that behaves the way > we want it to, and replace every existing fopen() call with this new function. > We could introduce a new macro, which I think would only be used on > Windows, to say "fopen already fails to open directories" > (FOPEN_FAILS_ON_DIRECTORIES?) so that fopen_file_only could be > simplified to just a bare fopen there. That way it's clear to the reader, at the > callsite, that the call does not have the standard behavior of fopen. > > Then, FREAD_READS_DIRECTORIES could be removed from all but the 1 or 2 > platforms that it was originally set for. I'd imagine that we'd basically just > promote the git_fopen() function from compat to become the > implementation of the first tier fopen_file_only() function. On the > FREAD_READS_DIRECTORIES platforms, a bare fopen would also become > fopen_file_only(). The call to fopen() within fopen_file_only() would > obviously need to take this into account to ensure that it calls the real > fopen(). > > I think this would put the pieces in place for someone to audit all of the > existing uses of fopen_file_only() and potentially replace them with a straight > fopen() if appropriate. And it would allow future code to explicitly make the > choice between fopen_file_only() or just fopen(). > > None of this would produce any functional change on any of our platforms, > but I think it would make things more clear. Please keep me on the loop on this one. The NonStop platforms have FREAD_READS_DIRECTORIES UnfortunatelyYes. We will want to move to the new structure as soon as we can, so compat makes me comfortable. Thanks, Randall ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-06-08 19:41 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-06 22:36 fread reading directories Kyle Evans 2020-06-07 17:05 ` Junio C Hamano 2020-06-07 17:16 ` Kyle Evans 2020-06-08 17:18 ` Junio C Hamano 2020-06-08 19:08 ` Brandon Casey 2020-06-08 19:41 ` Randall S. Becker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).