git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Martin Melka <martin.melka@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Samuel Lijin <sxlijin@gmail.com>
Subject: Re: git status --ignored hangs when a deep directory structure present in working tree
Date: Mon, 27 Jan 2020 21:06:01 -0800	[thread overview]
Message-ID: <CABPp-BGvU_DHQu66bqPZ+WXg5mL8bCP5Uxp4g5393WnWyO1Dhg@mail.gmail.com> (raw)
In-Reply-To: <20200127120837.GA10482@szeder.dev>

On Mon, Jan 27, 2020 at 4:08 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Mon, Jan 27, 2020 at 11:55:01AM +0100, Martin Melka wrote:
> > Hi all, I have ran across what might be a bug in git. When there is a
> > deep directory structure (tried on 100+ nested dirs), then git status
> > --ignored hangs indefinitely.
> > Discovered this on OSX (Mojave, git 2.20.1 (Apple Git-117)), but it
> > reproduces in Ubuntu (19.04, git 2.25.0) Docker container on OSX and
> > also on baremetal Ubuntu server (16.04, git 2.17.1).
> >
> > Steps to reproduce:
> >
> > 1. Generate the deep dir structure:
> >
> >     mkdir gittest; pushd gittest; git init; for i in $(seq 1 120); do
> > mkdir dir; cd dir; done; touch leaf; popd
> >
> > 2. Try to get git status --ignored
> >
> >     cd gittest && git status --ignored
> >
> >
> > If there is a dir depth limit, git should probably exit with an error
> > rather than getting stuck endlessly.
>
> This is interesting, thanks for the report.

Agreed.

> There is no such directory depth limit, but the runtime of 'git status
> --ignored' grows quickly with the depth of the untracked directory.
> Running this shell loop produces the numbers below:
>
> for depth in $(seq 10 30)
> do
>         dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
>         rm -rf dir
>         mkdir -p $dirs
>         >$dirs/untracked-file
>         /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> done
>
> 10: 0.01
> 11: 0.03
> 12: 0.05
> 13: 0.11
> 14: 0.23
> 15: 0.47
> 16: 0.97
> 17: 1.97
> 18: 3.88
> 19: 7.85
> 20: 16.29
> 21: 32.92
> 22: 76.24

Wow.

Really nice testcase, though, thanks.

> Beautifully quadratic, isn't it? :)

I think you mean exponential (in particular 2^n rather than n^2).

> Unless I messed up my numbers, with a depth of 120 directories it
> would take over 6*10^23 years to complete... so yeah, it does qualify
> as indefinitely.

No comment about how people today are spoiled and addicted to instant
gratification?  I mean, can't folks just be a little patient?  ;-)

> This slowdown was caused by commit df5bcdf83a (dir: recurse into
> untracked dirs for ignored files, 2017-05-18), which was part of a
> patch series to fix 'git clean -d' deleting untracked directories even
> if they contained ignored files.
>
> Cc'ing Samuel, author of that commit, and Elijah, who had quite some
> fun with 'dir.c' recently.

Heh, yes, what "fun" it was.

Anyway, after digging around for quite a bit today... that commit
added calling read_directory_recursive() directly from itself for
certain untracked paths.  This means that read_directory_recursive()
(which I'll abbreviate to r_d_r()), when we're dealing with certain
untracked paths:

  * Calls treat_path() -> treat_one_path() -> treat_directory() -> r_d_r()
  * Calls r_d_r() directly as well

So, from the toplevel directory, r_d_r() will call itself twice on the
next directory down.  For each of those, it'll call r_d_r() twice on
the second directory down.  From each of those, it'll call r_d_r()
twice on the third directory in the hierarchy and so on until we have
2^n calls to r_d_r() for the nth level deep directory.

Trying to back out the underlying problem, I _think_ the cause behind
all this is that r_d_r() and friends all use the path_treatment enum,
which says that the treatment of any path has to be one of those four
types. The dichotomy between path_untracked and path_recurse in
particular means the system has no way to mark that something should
be both marked as untracked and recursed into, yet we definitely need
to have some directories marked as untracked and we also need to
recurse into them.  This, I think led to Samuel's attempt to
workaround that dichotomy by having the code in r_d_r() check for
certain path_untracked cases which should also be recursed into.  I
think adding another type to the enum and shifting the logic elsewhere
might enable us to both simplify the logic and avoid this expensive
exponential behavior, but I haven't gotten it working yet.  We'll see
if my patch goes anywhere, or if it's just another dead-end among
many.


Elijah

  reply	other threads:[~2020-01-28  5:06 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-27 10:55 git status --ignored hangs when a deep directory structure present in working tree Martin Melka
2020-01-27 12:08 ` SZEDER Gábor
2020-01-28  5:06   ` Elijah Newren [this message]
2020-01-28 13:57     ` SZEDER Gábor
2020-01-28 15:00       ` Elijah Newren
2020-01-29 22:10     ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BGvU_DHQu66bqPZ+WXg5mL8bCP5Uxp4g5393WnWyO1Dhg@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=martin.melka@gmail.com \
    --cc=sxlijin@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).