git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* fread reading directories
@ 2020-06-06 22:36 Kyle Evans
  2020-06-07 17:05 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Kyle Evans @ 2020-06-06 22:36 UTC (permalink / raw)
  To: git

Hi,

I was looking at FREAD_READS_DIRECTORIES to measure some performance
differences, then stumbled upon [0] that dropped fread() from the
autoconf test that causes git to use its git_fopen shim [1] even on
Linux.

I've read the commit message a couple of times, but I'm really not
seeing the rationale for *why* git wants this knob to be set on Linux.

Unless I'm missing something, this would seem to regress the almost
certainly much-more-common case of fopen() a file and fread() it with
an unconditional fstat() from grep_fopen(), rather than just using two
syscalls at all times (directories and non-directories) and letting it
get rejected in fread().

Thoughts?

Thanks,

Kyle Evans

[0] https://github.com/git/git/commit/3adf9fdecfb0cd31a83ef3af1d8d631a1acd392b
[1] https://github.com/git/git/blob/master/compat/fopen.c

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: fread reading directories
  2020-06-06 22:36 fread reading directories Kyle Evans
@ 2020-06-07 17:05 ` Junio C Hamano
  2020-06-07 17:16   ` Kyle Evans
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2020-06-07 17:05 UTC (permalink / raw)
  To: Kyle Evans; +Cc: git

Kyle Evans <kevans@freebsd.org> writes:

> I was looking at FREAD_READS_DIRECTORIES to measure some performance
> differences, then stumbled upon [0] that dropped fread() from the
> autoconf test that causes git to use its git_fopen shim [1] even on
> Linux.

I thought we saw this mentioned recently?  I do not recall if
any concrete improvement came out of it.

The Makefile defines the macro as such:

# Define FREAD_READS_DIRECTORIES if you are on a system which succeeds
# when attempting to read from an fopen'ed directory (or even to fopen
# it at all).

So, the macro is expected to be set if a platform gives back FILE *
on a directory, whether it allows fread() on it or not.

If it is a good idea is entirely different story, though.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: fread reading directories
  2020-06-07 17:05 ` Junio C Hamano
@ 2020-06-07 17:16   ` Kyle Evans
  2020-06-08 17:18     ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Kyle Evans @ 2020-06-07 17:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, Jun 7, 2020 at 12:05 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Kyle Evans <kevans@freebsd.org> writes:
>
> > I was looking at FREAD_READS_DIRECTORIES to measure some performance
> > differences, then stumbled upon [0] that dropped fread() from the
> > autoconf test that causes git to use its git_fopen shim [1] even on
> > Linux.
>
> I thought we saw this mentioned recently?  I do not recall if
> any concrete improvement came out of it.
>

Ah, this is my bad. =-( I had searched the archives (I'm not typically
subscribed to this list) and noticed the related patch for GNU/Hurd,
but completely missed that a more active discussion had taken place
within that thread. I have now read that, and have no further
questions.

Thanks!

Kyle Evans

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: fread reading directories
  2020-06-07 17:16   ` Kyle Evans
@ 2020-06-08 17:18     ` Junio C Hamano
  2020-06-08 19:08       ` Brandon Casey
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2020-06-08 17:18 UTC (permalink / raw)
  To: git; +Cc: Kyle Evans, Jeff King, Brandon Casey

Kyle Evans <kevans@freebsd.org> writes:

> On Sun, Jun 7, 2020 at 12:05 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Kyle Evans <kevans@freebsd.org> writes:
>>
>> > I was looking at FREAD_READS_DIRECTORIES to measure some performance
>> > differences, then stumbled upon [0] that dropped fread() from the
>> > autoconf test that causes git to use its git_fopen shim [1] even on
>> > Linux.
>>
>> I thought we saw this mentioned recently?  I do not recall if
>> any concrete improvement came out of it.
>>
>
> Ah, this is my bad. =-( I had searched the archives (I'm not typically
> subscribed to this list) and noticed the related patch for GNU/Hurd,
> but completely missed that a more active discussion had taken place
> within that thread. I have now read that, and have no further
> questions.

For the benefit of those who are watching from sidelines, the
relevant thread ends at

https://lore.kernel.org/git/20200424055106.GG1648190@coredump.intra.peff.net/

In short, many callers of fopen() in our code rely on our variant of
fopen() that notices that the caller fed us a directory for error
reporting.  Unless the caller somehow knows the argument it calls
fopen() with is a file and not a directory, somebody needs to
fopen() and fstat() (or stat() and then fopen()) to catch it as an
error to give that caller a directory.  In the current arrangement, 
we let our fopen() wrapper do that task, instead of the callers.

It may make sense to do one of the two things:

 - The lighter weight one is to rename the macro to the reflect the
   trait we are trying to capture more faithfully: "fopen opens
   directories" and leave the code and performance characteristics
   as-is.

 - Heavier weight one is to audit callers of fopen() and only let
   those that know they do not have a directory directly call
   fopen().  The other callers would call our wrapper under a
   different name.  This way, the former won't have to pay the
   overhead of checking for "you gave me a directory but I only take
   a file" error twice.  This is what Brandon proposed in the
   thread.

Doing neither would leave this seed of confusion for later readers,
which is not ideal.  I am tempted to say that we for now should do
an even lighter variant of the former, which is to give a comment.

Thoughts?

 Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 09f98b777c..a0bef206a9 100644
--- a/Makefile
+++ b/Makefile
@@ -19,8 +19,7 @@ all::
 # have been written to the final string if enough space had been available.
 #
 # Define FREAD_READS_DIRECTORIES if you are on a system which succeeds
-# when attempting to read from an fopen'ed directory (or even to fopen
-# it at all).
+# when an fopen() on a directory does not result in an error.
 #
 # Define NO_OPENSSL environment variable if you do not have OpenSSL.
 #

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: fread reading directories
  2020-06-08 17:18     ` Junio C Hamano
@ 2020-06-08 19:08       ` Brandon Casey
  2020-06-08 19:41         ` Randall S. Becker
  0 siblings, 1 reply; 6+ messages in thread
From: Brandon Casey @ 2020-06-08 19:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Kyle Evans, Jeff King

On Mon, Jun 8, 2020 at 10:18 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> It may make sense to do one of the two things:
>
>  - The lighter weight one is to rename the macro to the reflect the
>    trait we are trying to capture more faithfully: "fopen opens
>    directories" and leave the code and performance characteristics
>    as-is.
>
>  - Heavier weight one is to audit callers of fopen() and only let
>    those that know they do not have a directory directly call
>    fopen().  The other callers would call our wrapper under a
>    different name.  This way, the former won't have to pay the
>    overhead of checking for "you gave me a directory but I only take
>    a file" error twice.  This is what Brandon proposed in the
>    thread.
>
> Doing neither would leave this seed of confusion for later readers,
> which is not ideal.  I am tempted to say that we for now should do
> an even lighter variant of the former, which is to give a comment.
>
> Thoughts?

I'd suggest a medium weight approach which would be to introduce a new
function with an appropriate name (fopen_file_only()?) that behaves
the way we want it to, and replace every existing fopen() call with
this new function.  We could introduce a new macro, which I think
would only be used on Windows, to say "fopen already fails to open
directories" (FOPEN_FAILS_ON_DIRECTORIES?) so that fopen_file_only
could be simplified to just a bare fopen there. That way it's clear to
the reader, at the callsite, that the call does not have the standard
behavior of fopen.

Then, FREAD_READS_DIRECTORIES could be removed from all but the 1 or 2
platforms that it was originally set for. I'd imagine that we'd
basically just promote the git_fopen() function from compat to become
the implementation of the first tier fopen_file_only() function.  On
the FREAD_READS_DIRECTORIES platforms, a bare fopen would also become
fopen_file_only(). The call to fopen() within fopen_file_only() would
obviously need to take this into account to ensure that it calls the
real fopen().

I think this would put the pieces in place for someone to audit all of
the existing uses of fopen_file_only() and potentially replace them
with a straight fopen() if appropriate. And it would allow future code
to explicitly make the choice between fopen_file_only() or just
fopen().

None of this would produce any functional change on any of our
platforms, but I think it would make things more clear.

-Brandon

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: fread reading directories
  2020-06-08 19:08       ` Brandon Casey
@ 2020-06-08 19:41         ` Randall S. Becker
  0 siblings, 0 replies; 6+ messages in thread
From: Randall S. Becker @ 2020-06-08 19:41 UTC (permalink / raw)
  To: 'Brandon Casey', 'Junio C Hamano'
  Cc: 'git', 'Kyle Evans', 'Jeff King'

On June 8, 2020 3:08 PM, Brandon Casey wrote:
> To: Junio C Hamano <gitster@pobox.com>
> Cc: git <git@vger.kernel.org>; Kyle Evans <kevans@freebsd.org>; Jeff King
> <peff@peff.net>
> Subject: Re: fread reading directories
> 
> On Mon, Jun 8, 2020 at 10:18 AM Junio C Hamano <gitster@pobox.com>
> wrote:
> >
> > It may make sense to do one of the two things:
> >
> >  - The lighter weight one is to rename the macro to the reflect the
> >    trait we are trying to capture more faithfully: "fopen opens
> >    directories" and leave the code and performance characteristics
> >    as-is.
> >
> >  - Heavier weight one is to audit callers of fopen() and only let
> >    those that know they do not have a directory directly call
> >    fopen().  The other callers would call our wrapper under a
> >    different name.  This way, the former won't have to pay the
> >    overhead of checking for "you gave me a directory but I only take
> >    a file" error twice.  This is what Brandon proposed in the
> >    thread.
> >
> > Doing neither would leave this seed of confusion for later readers,
> > which is not ideal.  I am tempted to say that we for now should do an
> > even lighter variant of the former, which is to give a comment.
> >
> > Thoughts?
> 
> I'd suggest a medium weight approach which would be to introduce a new
> function with an appropriate name (fopen_file_only()?) that behaves the way
> we want it to, and replace every existing fopen() call with this new function.
> We could introduce a new macro, which I think would only be used on
> Windows, to say "fopen already fails to open directories"
> (FOPEN_FAILS_ON_DIRECTORIES?) so that fopen_file_only could be
> simplified to just a bare fopen there. That way it's clear to the reader, at the
> callsite, that the call does not have the standard behavior of fopen.
> 
> Then, FREAD_READS_DIRECTORIES could be removed from all but the 1 or 2
> platforms that it was originally set for. I'd imagine that we'd basically just
> promote the git_fopen() function from compat to become the
> implementation of the first tier fopen_file_only() function.  On the
> FREAD_READS_DIRECTORIES platforms, a bare fopen would also become
> fopen_file_only(). The call to fopen() within fopen_file_only() would
> obviously need to take this into account to ensure that it calls the real
> fopen().
> 
> I think this would put the pieces in place for someone to audit all of the
> existing uses of fopen_file_only() and potentially replace them with a straight
> fopen() if appropriate. And it would allow future code to explicitly make the
> choice between fopen_file_only() or just fopen().
> 
> None of this would produce any functional change on any of our platforms,
> but I think it would make things more clear.

Please keep me on the loop on this one. The NonStop platforms have FREAD_READS_DIRECTORIES UnfortunatelyYes. We will want to move to the new structure as soon as we can, so compat makes me comfortable.

Thanks,
Randall


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-06-08 19:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-06 22:36 fread reading directories Kyle Evans
2020-06-07 17:05 ` Junio C Hamano
2020-06-07 17:16   ` Kyle Evans
2020-06-08 17:18     ` Junio C Hamano
2020-06-08 19:08       ` Brandon Casey
2020-06-08 19:41         ` Randall S. Becker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).