From: Jeff King <peff@peff.net>
To: Lars Hjemli <hjemli@gmail.com>
Cc: Steffen Prohaska <prohaska@zib.de>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: limiting rename detection during merge is a really bad idea
Date: Mon, 11 Feb 2008 06:35:16 -0500
Message-ID: <20080211113516.GB6344@coredump.intra.peff.net>
In-Reply-To: <20080211110816.GA6344@coredump.intra.peff.net>

On Mon, Feb 11, 2008 at 06:08:16AM -0500, Jeff King wrote:

> The mega-commit I was playing with that caused Linus to suggest
> diff.renamelimit in the first place is 374 by 641 (src by dest) and
> completes in ~15 minutes. The case recently reported in "git-revert is a
> memory hog" is 3541 by 8043, and doesn't complete ever.  We limit to 100
> by 100 by default.

I put together a simple test script to measure how much processing time
rename detection takes. It runs rename detection on an NxN matrix: N
files are deleted and N similar files are added in the same commit.
Each file is about 21K (the average size of a file in the linux-2.6
repository).
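Because the exhaustive part of rename detection scores every source
against every destination, the work grows with the number of src/dest
pairs rather than with the number of files. A small sketch of that
arithmetic for the matrix sizes quoted above (the helper name
`rename_pairs` is just for illustration):

```shell
#!/bin/sh
# rename_pairs SRC DST: number of source/destination pairs that an
# exhaustive rename matrix has to score (illustrative helper).
rename_pairs() {
  echo $(( $1 * $2 ))
}

rename_pairs 100 100    # default 100x100 limit: prints 10000
rename_pairs 374 641    # the mega-commit: prints 239734
rename_pairs 3541 8043  # the git-revert report: prints 28480263
```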

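The results are what you would expect from an O(n^2) algorithm: once N
is large enough that startup cost stops dominating, doubling N roughly
quadruples the time (going from 400 to 800 takes it from 4.87 to 18.08
seconds):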

   N   CPU seconds
  10          0.43
 100          0.44
 200          1.40
 400          4.87
 800         18.08
1000         27.82
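One way to check the quadratic claim against the table is to divide
each time by N^2: if the cost really is quadratic, the time per src/dest
pair should level off once fixed startup overhead stops dominating. A
sketch, with the helper name `per_pair_cost` chosen for illustration:

```shell
#!/bin/sh
# per_pair_cost: read "N seconds" rows on stdin and print, for each row,
# N and the CPU time per src/dest pair (seconds / N^2) in microseconds.
per_pair_cost() {
  awk '{ printf "%5d %8.1f\n", $1, $2 / ($1 * $1) * 1e6 }'
}

# The measurements from the table above.
per_pair_cost <<'EOF'
10 0.43
100 0.44
200 1.40
400 4.87
800 18.08
1000 27.82
EOF
```

With these numbers the per-pair cost is several milliseconds at N=10,
where startup overhead dominates, and falls to roughly 28 microseconds
per pair at N=800 and N=1000.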

So for average repositories, we could probably raise the default rename
limit by a factor of 10 for merges (though I would keep it under 400 or
so for "git log"). The "mega-commit" I mentioned above doesn't fit this
picture, because that repository has a much larger average file size
(around 1M). But I think it makes sense to base the default on more
typical repositories.
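Later Git releases made the merge limit separately configurable, so the
split suggested here can be expressed in configuration. The values below
only illustrate that split (a higher limit for merges, a lower one for
diff and log); they are not recommended defaults:

```shell
# Rename limit used during merges (falls back to diff.renameLimit if unset).
git config merge.renameLimit 1000
# Rename limit used by "git diff" and "git log" (same as their -l option).
git config diff.renameLimit 400
```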

An alternative would be to set an alarm and simply give up on rename
detection after some number of seconds (which is really what the user
wants anyway).
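Git does not offer such a time budget itself. As a rough user-side
approximation, assuming GNU coreutils `timeout` is available, one can
bound the rename pass from the outside and fall back to a diff without
rename detection. The function `diff_with_rename_budget` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: diff_with_rename_budget SECONDS OLD NEW
# Runs rename detection between two revisions for at most SECONDS;
# if that times out, prints the same diff without rename detection.
diff_with_rename_budget() {
  secs=$1 old=$2 new=$3
  timeout "$secs" git diff-tree -M -l0 --summary "$old" "$new"
  status=$?
  if [ "$status" -eq 124 ]; then
    # GNU timeout exits with 124 when the time limit was hit.
    echo "rename detection exceeded ${secs}s; skipping renames" >&2
    git diff-tree --summary "$old" "$new"
  else
    return "$status"
  fi
}

# Usage: diff_with_rename_budget 10 HEAD^ HEAD
```

Unlike an alarm inside Git, this throws away the partial work and reruns
the diff from scratch, but it bounds how long the user waits.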

My test script is below. It reads a file named 'sample', which is a copy
of arch/m68k/Kconfig; that file just happens to have a size close to the
repository mean.
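To pick a representative sample file, one way to estimate a repository's
mean file size is to average the blob sizes reported by
`git ls-tree -r -l`. This is a sketch under that assumption (it may not
be how the 21K figure above was obtained), and `mean_blob_size` is a
hypothetical name:

```shell
#!/bin/sh
# mean_blob_size: read `git ls-tree -r -l` output on stdin and print the
# average size in bytes of the blobs listed. Fields are: mode, type,
# object name, size, then a tab and the path. Non-blob entries (such as
# submodule commits, whose size is "-") are skipped.
mean_blob_size() {
  awk '$2 == "blob" { sum += $4; n++ } END { if (n) printf "%d\n", sum / n }'
}

# Usage, from inside a repository:
#   git ls-tree -r -l HEAD | mean_blob_size
```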

-Peff

-- >8 --
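#!/bin/sh
# Usage: ./script N
# Expects a file named 'sample' in the current directory.
set -e

n=${1:?usage: $0 N}

rm -rf repo
mkdir repo && cd repo
git init

# mkdata DIR COUNT: create COUNT files in DIR, each a copy of ../sample
# with a per-file prefix on every line and a trailing tag line, so each
# file in new/ closely resembles the same-numbered file in initial/.
mkdata() {
  mkdir "$1"
  for i in $(seq 1 "$2"); do
    (sed "s/^/$i /" <../sample
     echo "tag: $1"
    ) >"$1/$i"
  done
}

# First commit: N files under initial/.
mkdata initial "$n"
git add .
git commit -m initial

# Second commit: N files added under new/ and initial/ deleted, giving
# an NxN rename-detection matrix.
mkdata new "$n"
git add .
rm -rf initial
git commit -a -m new

# -M enables rename detection; -l0 means no rename limit.
time git diff-tree -M -l0 --summary HEAD^ HEAD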
