Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Elijah Newren <newren@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Martin Melka" <martin.melka@gmail.com>,
	"Samuel Lijin" <sxlijin@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one
Date: Fri, 31 Jan 2020 09:47:45 -0800
Message-ID: <CABPp-BG8wAmOL03mU2Tf8woVKdWw7o6muXenMH4G-Z6_kBEajQ@mail.gmail.com> (raw)
In-Reply-To: <20200131171341.GH10482@szeder.dev>

On Fri, Jan 31, 2020 at 9:13 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Wed, Jan 29, 2020 at 10:03:42PM +0000, Elijah Newren via GitGitGadget wrote:
> > Part of the tension noted above is that the treatment of a directory can
> > changed based on the files within it, and based on the various settings
>
> s/changed/change/, or perhaps s/changed/be changed/ ?
>
> > Since dir.c is somewhat complex, extra cruft built up around this over
> > time.  While trying to unravel it, I noticed several instances where the
> > first call to read_directory_recursive() would return e.g.
> > path_untracked for a some directory and a later one would return e.g.
>
> s/for a some/for some/
>
> > However, on the positive side, it does make the code much faster.  For
> > the following simple shell loop in an empty repository:
> >
> >   for depth in $(seq 10 25)
> >   do
> >     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> >     rm -rf dir
> >     mkdir -p $dirs
> >     >$dirs/untracked-file
> >     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> >   done
> >
> > I saw the following timings, in seconds (note that the numbers are a
> > little noisy from run-to-run, but the trend is very clear with every
> > run):
> >
> >     10: 0.03
> >     11: 0.05
> >     12: 0.08
> >     13: 0.19
> >     14: 0.29
> >     15: 0.50
> >     16: 1.05
> >     17: 2.11
> >     18: 4.11
> >     19: 8.60
> >     20: 17.55
> >     21: 33.87
> >     22: 68.71
> >     23: 140.05
> >     24: 274.45
> >     25: 551.15
> >
> > After this fix, those drop to:
> >
> >     10: 0.00
> >     11: 0.00
> >     12: 0.00
> >     13: 0.00
> >     14: 0.00
> >     15: 0.00
> >     16: 0.00
> >     17: 0.00
> >     18: 0.00
> >     19: 0.00
> >     20: 0.00
> >     21: 0.00
> >     22: 0.00
> >     23: 0.00
> >     24: 0.00
> >     25: 0.00
>
> I agree with Derrick here: if you just said that all these report
> 0.00, I would have taken your word for it.

Thanks, I'll include all these fixes.  Good timing too, as I was about
to send a re-roll.

> Having said that...  I don't know how to get more decimal places out
> of /use/bin/time, but our trace performance facility uses nanosecond
> resolution timestamps.  So using this command in the loop above:
>
>   GIT_TRACE_PERFORMANCE=2 git status --ignored 2>&1 >/dev/null |
>     sed -n -e "s/.* performance: \(.*\): git command.*/$depth: \1/p"
>
> gave me this:
>
>   1: 0.000574302 s
>   2: 0.000584995 s
>   3: 0.000608684 s
>   4: 0.000951336 s
>   5: 0.000762019 s
>   6: 0.000816685 s
>   7: 0.000672516 s
>   8: 0.000912628 s
>   9: 0.000661538 s
>   10: 0.000687465 s
>   11: 0.000708880 s
>   12: 0.000693754 s
>   13: 0.000726120 s
>   14: 0.000737334 s
>   15: 0.000787362 s
>   16: 0.000856687 s
>   17: 0.000780892 s
>   18: 0.000790798 s
>   19: 0.000834411 s
>   20: 0.000859094 s
>   21: 0.001230912 s
>   22: 0.001048852 s
>   23: 0.000891057 s
>   24: 0.000934097 s
>   25: 0.001051704 s
>
> Not sure it's worth including, though.

Yeah, I'm afraid people will spend time trying to analyze it and the
numbers are extremely noisy.  I instead included some words about
counting the number of untracked files opened according to strace,
which shows before we had 2^(1+$depth)-2 untracked directories get
opened and after we had exactly $depth get opened.

  reply index

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-29 22:03 [PATCH 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
2020-01-29 22:03 ` [PATCH 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-01-30 15:20   ` Derrick Stolee
2020-01-31 18:04   ` SZEDER Gábor
2020-01-31 18:17     ` Elijah Newren
2020-01-29 22:03 ` [PATCH 4/6] dir: move setting of nested_repo next to its actual usage Elijah Newren via GitGitGadget
2020-01-30 15:33   ` Derrick Stolee
2020-01-30 15:45     ` Elijah Newren
2020-01-30 16:00       ` Derrick Stolee
2020-01-30 16:10         ` Derrick Stolee
2020-01-30 16:20           ` Elijah Newren
2020-01-30 18:17             ` Derrick Stolee
2020-01-29 22:03 ` [PATCH 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-01-30 15:55   ` Derrick Stolee
2020-01-30 17:13     ` Elijah Newren
2020-01-30 17:45       ` Elijah Newren
2020-01-31 17:13   ` SZEDER Gábor
2020-01-31 17:47     ` Elijah Newren [this message]
2020-01-29 22:03 ` [PATCH 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
2020-01-31 18:31 ` [PATCH v2 0/6] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 1/6] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 2/6] dir: fix broken comment Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 3/6] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 4/6] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 5/6] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-01-31 18:31   ` [PATCH v2 6/6] t7063: blindly accept diffs Elijah Newren via GitGitGadget
2020-03-25 19:31   ` [PATCH v3 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 1/7] t7063: correct broken test expectation Elijah Newren via GitGitGadget
2020-03-26 13:02       ` Derrick Stolee
2020-03-26 21:18         ` Elijah Newren
2020-03-25 19:31     ` [PATCH v3 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-03-25 19:31     ` [PATCH v3 7/7] dir: replace exponential algorithm with a linear one, fix untracked cache Elijah Newren via GitGitGadget
2020-03-26 13:13       ` Derrick Stolee
2020-03-26 21:27     ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 1/7] t7063: more thorough status checking Elijah Newren via GitGitGadget
2020-03-27 13:09         ` Derrick Stolee
2020-03-29 18:18           ` Junio C Hamano
2020-03-31 20:15             ` Elijah Newren
2020-03-26 21:27       ` [PATCH v4 2/7] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 3/7] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 4/7] dir: fix broken comment Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 5/7] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 6/7] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-03-26 21:27       ` [PATCH v4 7/7] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-03-27 13:13       ` [PATCH v4 0/7] Avoid multiple recursive calls for same path in read_directory_recursive() Derrick Stolee
2020-03-28 17:33         ` Elijah Newren
2020-03-29 18:20           ` Junio C Hamano
2020-04-01  4:17       ` [PATCH v5 00/12] " Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 01/12] t7063: more thorough status checking Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 02/12] t3000: add more testcases testing a variety of ls-files issues Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 03/12] dir: fix simple typo in comment Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 04/12] dir: consolidate treat_path() and treat_one_path() Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 05/12] dir: fix broken comment Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 06/12] dir: fix confusion based on variable tense Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 07/12] dir: refactor treat_directory to clarify control flow Derrick Stolee via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 08/12] dir: replace exponential algorithm with a linear one Elijah Newren via GitGitGadget
2020-04-01 13:57           ` Derrick Stolee
2020-04-01 15:59             ` Elijah Newren
2020-04-01  4:17         ` [PATCH v5 09/12] dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory() Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 10/12] dir: replace double pathspec matching with single " Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 11/12] Fix error-prone fill_directory() API; make it only return matches Elijah Newren via GitGitGadget
2020-04-01  4:17         ` [PATCH v5 12/12] completion: fix 'git add' on paths under an untracked directory Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BG8wAmOL03mU2Tf8woVKdWw7o6muXenMH4G-Z6_kBEajQ@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=martin.melka@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=sxlijin@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git