git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	blees@dcon.de, Junio C Hamano <gitster@pobox.com>,
	Kyle Meyer <kyle@kyleam.com>, Samuel Lijin <sxlijin@gmail.com>
Subject: Re: [PATCH v2 6/8] dir: fix checks on common prefix directory
Date: Tue, 17 Dec 2019 08:58:49 -0800	[thread overview]
Message-ID: <CABPp-BEkX9cH1=r3dJ4WLzcJKVcF-KpGUkshL34MMp3Xhhhpuw@mail.gmail.com> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.1912171209170.46@tvgsbejvaqbjf.bet>

Hi Dscho,

On Tue, Dec 17, 2019 at 3:16 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> On Mon, 16 Dec 2019, Elijah Newren wrote:
>
> > On Mon, Dec 16, 2019 at 4:04 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > > On Mon, 16 Dec 2019, Elijah Newren wrote:
> > > > On Mon, Dec 16, 2019 at 5:51 AM Elijah Newren <newren@gmail.com> wrote:
> > > > >
> > > > > On Sun, Dec 15, 2019 at 2:29 AM Johannes Schindelin
> > > > > <Johannes.Schindelin@gmx.de> wrote:
> > > > > >
> > > > > > Hi Elijah,
> > > > > >
> > > > > > I have not had time to dive deeply into this, but I know that it _does_
> > > > > > cause a ton of segmentation faults in the `shears/pu` branch (where all of
> > > > > > Git for Windows' patches are rebased on top of `pu`):
> > > > >
> > > > > Weird.  If it's going to cause segmentation faults at all, it would
> > > > > certainly do it all over the place, but I tested the patches on the
> > > > > major platforms using your Azure Pipelines setup on git.git so it
> > > > > should be good on all the platforms.  Did your shears/pu branch make
> > > > > some other changes to the setup?
> > >
> > > Not really.
> > >
> > > >
> > > > Actually, it looks like I looked up the definition of dirent
> > > > previously and forgot by the time you emailed.  On linux, from
> > > > /usr/include/bits/dirent.h:
> > ...
> > > > and from compat/win32/dirent.h defines it as:
> > > >
> > > > struct dirent {
> > > >         unsigned char d_type;      /* file type to prevent lstat after
> > > > readdir */
> > > >         char d_name[MAX_PATH * 3]; /* file name (* 3 for UTF-8 conversion) */
> > > > };
> > ...
> > >
> > > If you care to look at our very own `compat/win32/dirent.h`, you will see
> > > this:
> >
> > Interesting, we both brought up compat/win32/dirent.h and quoted from
> > it in our emails...
> >
> > > struct dirent {
> > >         unsigned char d_type; /* file type to prevent lstat after readdir */
> > >         char *d_name;         /* file name */
> > > };
> >
> > ...but the contents were different?  Looks like git-for-windows forked
> > compat/win32/dirent.h, possibly in a way that violates POSIX as
> > pointed out by Junio.
>
> Yep, I messed that up, sorry.
>
> > Any reason those changes weren't sent back upstream, by chance?  Feels
> > odd having a compat/win32/ directory that our downstream windows users
> > aren't actually using.  It also means the testing I'm getting from
> > gitgitgadget and your Azure setup (which all is really, really nice by
> > the way), is far less reassuring and helpful than I hoped.
>
> Yes. I was ready to submit the FSCache feature to the Git mailing list for
> review some 2.5 years ago when along came Ben Peart, finding ways to speed
> up FSCache even further. That is the reason why I held off, and I still
> have to condense the patches (which currently form a topology of 17 patch
> series!!!) into a nice small patch series that does not reflect the
> meandering history of the FSCache history, but instead presents one neat
> story.
>
> > > And looking at
> > > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/dirent.h.html, I
> > > do not see any guarantee of that `[256]` at all:
> > >
> > > The <dirent.h> header shall [...] define the structure dirent which shall
> > > include the following members:
> > >
> > > [XSI][Option Start]
> > > ino_t  d_ino       File serial number.
> > > [Option End]
> > > char   d_name[]    Filename string of entry.
> > >
> > > You will notice that not even `d_type` is guaranteed.
> >
> > Doh, yeah, I messed that up too.
> >
> > Anyway, as I mentioned to Junio, I'll resubmit after gutting the
> > series.  I'll still include a fix for the issue that a real world user
> > reported, but all the other ancillary bugs I found that have been
> > around for over a decade aren't important enough to merit a major
> > refactor, IMO.
>
> Hmm. I am really sorry that I nudged you to go down this route. Quite
> honestly, I'd rather add an ugly work-around that is Windows-only just so
> that you can fix those ancillary bugs.

You brought up issues; that's what you're supposed to do.  You
shouldn't feel bad about that.  Besides, the d_type one is real, and
means the patches at least need a
    #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
surrounding my explicit setting of d_type.  The problem wasn't what
you brought up or how you brought it up, it's massive fatigue on my
end from dir.c, from before even submitting this series[*].  I'm not
giving up on these changes or trying to discourage anyone else from
picking them up and extending them, I just don't want to touch them
right now and would rather put them on the shelf for a while.

Elijah


[*]  If you're really curious...I got involved in dir.c because of a
simple bug report nearly two years ago[1], and found myself working on
a foundation that was error-prone by design[2], with ambiguous or even
wrong documentation[3] about not just what the code does but the
intent.  Further, it was a place where not only is the correct fix
unclear, and not only is the "right" behavior unclear, but the cases
in question affect so few people that pinging the list periodically
over more than a year can't generate enough interest for anyone else
to hazard a guess as to what "correct" behavior is[4].  Stack on that
the fact that every time I touch this area, I think I'm really close
to having a fix, only to find I never, ever am.  There's always
one-more-thing before I can finally get back to something I really
wanted to work on instead.  Speaking of which, I've only managed to
work on my new merge strategy like once every 3-6 months for a small
amount of time each time.  Yes, part of that's my fault with
git-filter-repo (another case of perpetually thinking I'm close to
done), rebase changes, and whatnot.  But this series arose right when
I had my calendar nearly cleared so that I could work on the merge
strategy again (and of course the rebase bug report came in about the
same time too).  But at least git-filter-repo and rebase are generally
useful; dir.c at most generates "meh, this seems annoying" reports.
And I've already fixed all of those, the remaining fixes are stuff
that it appears I'm the only one to have reported, and I only reported
it because I was digging into the other "meh, seems annoying" reports.
I'm usually happy when I have a patch series ready to submit to git;
it means I think I'll make things better for others.  I didn't feel
that way with this series; I kind of wanted to just drop it entirely
and not even turn it in.  But I figured I should to at least document
my findings, so I pushed myself to submit and hoped no one would
respond.  Then this issue arose and when I mentioned in my
possibilities of fixing it that ripping the usage of dirent out would
be a lot of work and would probably cause me to give up and asked for
ideas, Junio responded that we should rip out dirent.  I think he's
right, and it's important the he defend code quality and point out the
right way to do things, it's just that I want out of this rabbit hole
right now.

[1] https://lore.kernel.org/git/20180405173446.32372-1-newren@gmail.com/
[2] https://lore.kernel.org/git/xmqqefjp6sko.fsf@gitster-ct.c.googlers.com/
[3] e.g. https://lore.kernel.org/git/20190905154735.29784-10-newren@gmail.com/
[4] https://lore.kernel.org/git/20190905154735.29784-1-newren@gmail.com/
and links referenced therein

  reply	other threads:[~2019-12-17 16:59 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-09 20:47 [PATCH 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-09 21:06   ` Denton Liu
2019-12-09 20:47 ` [PATCH 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
2019-12-09 21:32   ` Denton Liu
2019-12-09 21:51     ` Elijah Newren
2019-12-09 22:09     ` Eric Sunshine
2019-12-09 20:47 ` [PATCH 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
2019-12-09 20:47 ` [PATCH 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
2019-12-10 20:00 ` [PATCH v2 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
2019-12-15 10:29     ` Johannes Schindelin
2019-12-16 13:51       ` Elijah Newren
2019-12-16 16:00         ` Elijah Newren
2019-12-16 18:13           ` Junio C Hamano
2019-12-16 21:08             ` Elijah Newren
2019-12-16 21:25               ` Junio C Hamano
2019-12-16 22:39                 ` Elijah Newren
2019-12-17  0:04           ` Johannes Schindelin
2019-12-17  0:14             ` Junio C Hamano
2019-12-17 11:08               ` Johannes Schindelin
2019-12-17 17:33                 ` Junio C Hamano
2019-12-17 19:32                   ` Johannes Schindelin
2019-12-17  5:26             ` Elijah Newren
2019-12-17 11:15               ` Johannes Schindelin
2019-12-17 16:58                 ` Elijah Newren [this message]
2019-12-10 20:00   ` [PATCH v2 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
2019-12-10 20:00   ` [PATCH v2 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
2019-12-17  8:33   ` [PATCH v3 0/3] Directory traversal bugs Elijah Newren via GitGitGadget
2019-12-17  8:33     ` [PATCH v3 1/3] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-17  8:33     ` [PATCH v3 2/3] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-17  8:33     ` [PATCH v3 3/3] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-17 11:18     ` [PATCH v3 0/3] Directory traversal bugs Johannes Schindelin
2019-12-17 18:24       ` Junio C Hamano
2019-12-21 22:05         ` Johannes Schindelin
2019-12-18 19:29     ` [PATCH v4 0/8] " Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
2019-12-18 21:29         ` Junio C Hamano
2019-12-19 20:23           ` Elijah Newren
2019-12-19 22:24             ` Jeff King
2019-12-20 17:00               ` Elijah Newren
2019-12-20 21:14                 ` Jeff King
2019-12-20 18:01               ` Junio C Hamano
2019-12-20 21:15                 ` Jeff King
2019-12-18 19:29       ` [PATCH v4 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
2019-12-18 19:29       ` [PATCH v4 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget
2019-12-19 21:28       ` [PATCH v5 0/8] Directory traversal bugs Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 1/8] t3011: demonstrate directory traversal failures Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 2/8] Revert "dir.c: make 'git-status --ignored' work within leading directories" Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 3/8] dir: remove stray quote character in comment Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 4/8] dir: exit before wildcard fall-through if there is no wildcard Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 5/8] dir: break part of read_directory_recursive() out for reuse Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 6/8] dir: fix checks on common prefix directory Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 7/8] dir: synchronize treat_leading_path() and read_directory_recursive() Elijah Newren via GitGitGadget
2019-12-19 21:28         ` [PATCH v5 8/8] dir: consolidate similar code in treat_directory() Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BEkX9cH1=r3dJ4WLzcJKVcF-KpGUkshL34MMp3Xhhhpuw@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=blees@dcon.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=kyle@kyleam.com \
    --cc=sxlijin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).