All of lore.kernel.org
 help / color / mirror / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Martin Melka <martin.melka@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Samuel Lijin <sxlijin@gmail.com>
Subject: Re: git status --ignored hangs when a deep directory structure present in working tree
Date: Tue, 28 Jan 2020 14:57:07 +0100	[thread overview]
Message-ID: <20200128135707.GD10482@szeder.dev> (raw)
In-Reply-To: <CABPp-BGvU_DHQu66bqPZ+WXg5mL8bCP5Uxp4g5393WnWyO1Dhg@mail.gmail.com>

On Mon, Jan 27, 2020 at 09:06:01PM -0800, Elijah Newren wrote:
> > the runtime of 'git status
> > --ignored' grows quickly with the depth of the untracked directory.
> > Running this shell loop produces the numbers below:
> >
> > for depth in $(seq 10 30)
> > do
> >         dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> >         rm -rf dir
> >         mkdir -p $dirs
> >         >$dirs/untracked-file
> >         /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> > done
> >
> > 10: 0.01
> > 11: 0.03
> > 12: 0.05
> > 13: 0.11
> > 14: 0.23
> > 15: 0.47
> > 16: 0.97
> > 17: 1.97
> > 18: 3.88
> > 19: 7.85
> > 20: 16.29
> > 21: 32.92
> > 22: 76.24
> 
> Wow.
> 
> Really nice testcase, though, thanks.
> 
> > Beautifully quadratic, isn't it? :)
> 
> I think you mean exponential (in particular 2^n rather than n^2).

Ouch, yeah, indeed.

> > Unless I messed up my numbers, with a depth of 120 directories it
> > would take over 6*10^23 years to complete... so yeah, it does qualify
> > as indefinitely.
> 
> No comment about how people today are spoiled and addicted to instant
> gratification?  I mean, can't folks just be a little patient?  ;-)

Nope.  Notice how my shell loop above goes to 30, but the results only
to 22 :)

> > This slowdown was caused by commit df5bcdf83a (dir: recurse into
> > untracked dirs for ignored files, 2017-05-18), which was part of a
> > patch series to fix 'git clean -d' deleting untracked directories even
> > if they contained ignored files.
> >
> > Cc'ing Samuel, author of that commit, and Elijah, who had quite some
> > fun with 'dir.c' recently.
> 
> Heh, yes, what "fun" it was.
> 
> Anyway, after digging around for quite a bit today... that commit
> added calling read_directory_recursive() directly from itself for
> certain untracked paths.  This means that read_directory_recursive()
> (which I'll abbreviate to r_d_r()), when we're dealing with certain
> untracked paths:
> 
>   * Calls treat_path() -> treat_one_path() -> treat_directory() -> r_d_r()
>   * Calls r_d_r() directly as well
> 
> So, from the toplevel directory, r_d_r() will call itself twice on the
> next directory down.  For each of those, it'll call r_d_r() twice on
> the second directory down.  From each of those, it'll call r_d_r()
> twice on the third directory in the hierarchy and so on until we have
> 2^n calls to r_d_r() for the nth level deep directory.

Got it, thanks.

> Trying to back out the underlying problem, I _think_ the cause behind
> all this is that r_d_r() and friends all use the path_treatment enum,
> which says that the treatment of any path has to be one of those four
> types. The dichotomy between path_untracked and path_recurse in
> particular means the system has no way to mark that something should
> be both marked as untracked and recursed into, yet we definitely need
> to have some directories marked as untracked and we also need to
> recurse into them.  This, I think led to Samuel's attempt to
> workaround that dichotomy by having the code in r_d_r() check for
> certain path_untracked cases which should also be recursed into.  I
> think adding another type to the enum and shifting the logic elsewhere
> might enable us to both simplify the logic and avoid this expensive
> exponential behavior, but I haven't gotten it working yet.  We'll see
> if my patch goes anywhere, or if it's just another dead-end among
> many.

I was wondering whether it would make sense to give the enum contants
power-of-two values, so we could say 'path_recurse | path_untracked'.
But while this particular combination makes sense, others, at least to
my superficial understanding, not at all (e.g. 'path_recurse |
path_exclude').


  reply	other threads:[~2020-01-28 13:57 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-27 10:55 git status --ignored hangs when a deep directory structure present in working tree Martin Melka
2020-01-27 12:08 ` SZEDER Gábor
2020-01-28  5:06   ` Elijah Newren
2020-01-28 13:57     ` SZEDER Gábor [this message]
2020-01-28 15:00       ` Elijah Newren
2020-01-29 22:10     ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200128135707.GD10482@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=martin.melka@gmail.com \
    --cc=newren@gmail.com \
    --cc=sxlijin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.