* how to speed up "git log"?
@ 2007-02-11 11:52 Bruno Haible
  2007-02-11 16:49 ` Johannes Schindelin
       [not found] ` <20070211152840.GA2781@steel.home>
  0 siblings, 2 replies; 23+ messages in thread
From: Bruno Haible @ 2007-02-11 11:52 UTC (permalink / raw)
  To: git

Hi,

Are there some known tricks to speed up the operation of "git log"?

On a file in a local copy of the coreutils git repository,
"git log tr.c > output" takes
  - 33 seconds of CPU time (33 user, 0 system) on a Linux/x86 500MHz system,
  - 24 seconds of CPU time (12 user, 12 system) on a MacOS X PowerPC 1.1 GHz
    system.
The result shows only 147 commits and a total of 40 KB textual output.

1) Why so much user CPU time?
2) Why so much system CPU time, but only on MacOS X?

Bruno


* Re: how to speed up "git log"?
  2007-02-11 11:52 how to speed up "git log"? Bruno Haible
@ 2007-02-11 16:49 ` Johannes Schindelin
  2007-02-11 23:00   ` Shawn O. Pearce
  2007-02-11 23:41   ` Bruno Haible
       [not found] ` <20070211152840.GA2781@steel.home>
  1 sibling, 2 replies; 23+ messages in thread
From: Johannes Schindelin @ 2007-02-11 16:49 UTC (permalink / raw)
  To: Bruno Haible; +Cc: git

Hi,

On Sun, 11 Feb 2007, Bruno Haible wrote:

> Are there some known tricks to speed up the operation of "git log"?
> 
> On a file in a local copy of the coreutils git repository,
> "git log tr.c > output" takes
>   - 33 seconds of CPU time (33 user, 0 system) on a Linux/x86 500MHz system,
>   - 24 seconds of CPU time (12 user, 12 system) on a MacOS X PowerPC 1.1 GHz
>     system.
> The result shows only 147 commits and a total of 40 KB textual output.

Yes, because there were only 147 commits which changed the file. But git 
looked at all commits to find that.

Basically, we don't do file versions. File versions do not make sense, 
since they strip away the context. See also

http://news.gmane.org/group/gmane.comp.version-control.git/thread=37838

for a real flamewar revolving around that very subject.

> 1) Why so much user CPU time?

See above.

Plus, you are probably not really interested in _all_ revisions changing 
that file, are you? Usually the output of git-log -- even with pathname 
filtering -- starts almost instantaneously, and is piped to your pager. So, 
your numbers are misleading.

> 2) Why so much system CPU time, but only on MacOS X?

Probably the mmap() problem. Does it go away when you use git 1.5.0-rc4?

Hth,
Dscho


* Re: how to speed up "git log"?
  2007-02-11 16:49 ` Johannes Schindelin
@ 2007-02-11 23:00   ` Shawn O. Pearce
  2007-02-11 23:08     ` Johannes Schindelin
  2007-02-11 23:41   ` Bruno Haible
  1 sibling, 1 reply; 23+ messages in thread
From: Shawn O. Pearce @ 2007-02-11 23:00 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Bruno Haible, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > 1) Why so much user CPU time?
> 
> See above.

Some of the ideas Nico and I have kicked around for a pack v4 (post
1.5.0, obviously) would speed up revision traversal by bypassing
some of the costly decompression overheads.
 
> > 2) Why so much system CPU time, but only on MacOS X?
> 
> Probably the mmap() problem. Does it go away when you use git 1.5.0-rc4?

What does 1.5.0-rc4 do here that didn't happen before?  Are you
referring to the mmap sliding window?  Because NO_MMAP might be
faster on MacOS X than using mmap (thanks to its slower mmap)... but
I can't say I have performance tested it either way.

-- 
Shawn.


* Re: how to speed up "git log"?
  2007-02-11 23:00   ` Shawn O. Pearce
@ 2007-02-11 23:08     ` Johannes Schindelin
  0 siblings, 0 replies; 23+ messages in thread
From: Johannes Schindelin @ 2007-02-11 23:08 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Bruno Haible, git

Hi,

On Sun, 11 Feb 2007, Shawn O. Pearce wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > 1) Why so much user CPU time?
> > 
> > See above.
> 
> Some of the ideas Nico and I have kicked around for a pack v4 (post
> 1.5.0, obviously) would speed up revision traversal by bypassing
> some of the costly decompression overheads.

Maybe. But my point (which you did not quote) was this: git log _starts_ 
very fast, and the information you are most likely after is shown right 
away. So I don't think it makes sense investing much time to enhance 
performance for a full log.

> > > 2) Why so much system CPU time, but only on MacOS X?
> > 
> > Probably the mmap() problem. Does it go away when you use git 1.5.0-rc4?
> 
> What does 1.5.0-rc4 do here that didn't happen before?  Are you
> referring to the mmap sliding window?

No. I was referring to v1.5.0-rc0~62, but was too lazy to look that up.

Ciao,
Dscho


* Re: how to speed up "git log"?
  2007-02-11 16:49 ` Johannes Schindelin
  2007-02-11 23:00   ` Shawn O. Pearce
@ 2007-02-11 23:41   ` Bruno Haible
  2007-02-11 23:46     ` Shawn O. Pearce
                       ` (3 more replies)
  1 sibling, 4 replies; 23+ messages in thread
From: Bruno Haible @ 2007-02-11 23:41 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hello Johannes,

Thanks for the helpful answer.

> Yes, because there were only 147 commits which changed the file. But git 
> looked at all commits to find that.

Ouch.

> Basically, we don't do file versions. File versions do not make sense, 
> since they strip away the context.

Is there some other concept or command that git offers? I'm in the situation
where I know that 'tr' in coreutils version 5.2.1 had a certain bug and
version 6.4 does not have the bug, and I want to review all commits that
are relevant to this. I know that only the changes in tr.c are relevant
for this, and I'm interested in a display of the minimum amount of relevant
commit messages. If "git log" is not the right command for this question,
which command is it?

> > 2) Why so much system CPU time, but only on MacOS X?
> 
> Probably the mmap() problem. Does it go away when you use git 1.5.0-rc4?

No, it became even worse: git-1.5.0-rc4 is twice as slow as git-1.4.4 for
this command:
  git-1.4.4: 25 seconds real time, 24 seconds of CPU time (12 user, 12 system)
  git-1.5.0: 50 seconds real time, 39 seconds of CPU time (20 user, 19 system)

Bruno


* Re: how to speed up "git log"?
  2007-02-11 23:41   ` Bruno Haible
@ 2007-02-11 23:46     ` Shawn O. Pearce
  2007-02-11 23:56     ` Johannes Schindelin
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Shawn O. Pearce @ 2007-02-11 23:46 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Johannes Schindelin, git

Bruno Haible <bruno@clisp.org> wrote:
> Is there some other concept or command that git offers? I'm in the situation
> where I know that 'tr' in coreutils version 5.2.1 had a certain bug and
> version 6.4 does not have the bug, and I want to review all commits that
> are relevant to this. I know that the only changes in tr.c are relevant
> for this, and I'm interested in a display of the minimum amount of relevant
> commit messages. If "git log" is not the right command for this question,
> which command is it?

Two options come to mind:

  `git log v5.2.1..v6.4 -- tr.c`
  `git bisect`

The former has a few different flavors, e.g. you can run the
same arguments to `gitk` to view the changes in a graphical form.
The latter will help you do a binary search through the commits
which affected tr.c between the known good and known bad revisions,
allowing you to test the possible candidates for the defect.
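
As a rough illustration of the binary search behind `git bisect`, here
is a toy Python sketch over a linear list of commits. The function name
`first_changed`, the predicate `has_new_behaviour` and the commit labels
are invented for this example; real `git bisect` walks the commit graph
and asks you to test each candidate yourself.

```python
def first_changed(commits, has_new_behaviour):
    """Return (commit, tests): the first commit showing the new behaviour.

    Assumes commits are ordered oldest to newest, that commits[0] shows
    the old behaviour, commits[-1] the new one, and that it flips once.
    """
    lo, hi = 0, len(commits) - 1  # lo: known old, hi: known new
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if has_new_behaviour(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


# Hypothetical data: 147 candidate commits, behaviour changes at c97.
commits = [f"c{i}" for i in range(147)]
culprit, tests = first_changed(commits, lambda c: int(c[1:]) >= 97)
print(culprit, tests)  # c97 7
```

With 147 candidates the search needs about seven tests instead of
reading all 147 commit messages.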
 
> > > 2) Why so much system CPU time, but only on MacOS X?
> > 
> > Probably the mmap() problem. Does it go away when you use git 1.5.0-rc4?
> 
> No, it became even worse: git-1.5.0-rc4 is twice as slow as git-1.4.4 for
> this command:
>   git-1.4.4: 25 seconds real time, 24 seconds of CPU time (12 user, 12 system)
>   git-1.5.0: 50 seconds real time, 39 seconds of CPU time (20 user, 19 system)

That's not so good... This is `git log -- tr.c >/dev/null` ?

-- 
Shawn.


* Re: how to speed up "git log"?
       [not found] ` <20070211152840.GA2781@steel.home>
@ 2007-02-11 23:52   ` Bruno Haible
  2007-02-17 19:19   ` Bruno Haible
  1 sibling, 0 replies; 23+ messages in thread
From: Bruno Haible @ 2007-02-11 23:52 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git

Alex Riesen wrote:
> - do not use "tr.c", unless you really need it: git has to read more
>   of a commit in this case. Just "git log" takes only 0.9 sec on the
>   machine above.

"git log" is indeed faster, but is useless for the given task, since it doesn't
show which of the 4 megabytes of commit messages apply to tr.c.

> > On a file in a local copy of the coreutils git repository,
> > "git log tr.c > output" takes
> 
> Why do you need _all_ commits, btw?

I want to quickly find the cause of a behaviour change between tr.c of
coreutils 5.2.1 and the one of coreutils 6.4. It's a period of 1.5 years,
but limited to a single file. Can't git produce this quickly?

> > 2) Why so much system CPU time, but only on MacOS X?
> 
> MacOS X is famous for its bad performance when doing serious work.
> The mmap(2) of it, in particular.

But at least, a MacOS X machine is still interactively usable when it uses
6 times more swap than the machine's RAM size. Whereas a Linux 2.4 machine
is interactively unusable already with 1.5 to 2 times more swap than the
machine has RAM.

Bruno


* Re: how to speed up "git log"?
  2007-02-11 23:41   ` Bruno Haible
  2007-02-11 23:46     ` Shawn O. Pearce
@ 2007-02-11 23:56     ` Johannes Schindelin
  2007-02-11 23:59     ` Robin Rosenberg
  2007-02-12  4:20     ` Linus Torvalds
  3 siblings, 0 replies; 23+ messages in thread
From: Johannes Schindelin @ 2007-02-11 23:56 UTC (permalink / raw)
  To: Bruno Haible; +Cc: git

Hi,

On Mon, 12 Feb 2007, Bruno Haible wrote:

> > Yes, because there were only 147 commits which changed the file. But git 
> > looked at all commits to find that.
> 
> Ouch.

Not so ouch:

> > Basically, we don't do file versions. File versions do not make sense, 
> > since they strip away the context.

You could have it faster, but you'd break a very useful concept by doing 
so.

> Is there some other concept or command that git offers? I'm in the 
> situation where I know that 'tr' in coreutils version 5.2.1 had a 
> certain bug and version 6.4 does not have the bug, and I want to review 
> all commits that are relevant to this.

So, only look at those:

	git log v5.2.1..v6.4 tr.c

(provided you have the tags for the releases). You can start reviewing 
right away, since the output will start very fast (much faster than it 
takes to complete the log!).

If you want to get the patches to tr.c with the logs, just add "-p":

	git log -p v5.2.1..v6.4 tr.c

> > > 2) Why so much system CPU time, but only on MacOS X?
> > 
> > Probably the mmap() problem. Does it go away when you use git 
> > 1.5.0-rc4?
> 
> No, it became even worse: git-1.5.0-rc4 is twice as slow as git-1.4.4 for
> this command:
>   git-1.4.4: 25 seconds real time, 24 seconds of CPU time (12 user, 12 system)
>   git-1.5.0: 50 seconds real time, 39 seconds of CPU time (20 user, 19 system)

Hmmm. I don't have MacOSX any more, so I cannot investigate. You might 
find this the perfect opening into working on git ;-)

Hth,
Dscho


* Re: how to speed up "git log"?
  2007-02-11 23:41   ` Bruno Haible
  2007-02-11 23:46     ` Shawn O. Pearce
  2007-02-11 23:56     ` Johannes Schindelin
@ 2007-02-11 23:59     ` Robin Rosenberg
  2007-02-12  2:02       ` Bruno Haible
  2007-02-12  4:08       ` Junio C Hamano
  2007-02-12  4:20     ` Linus Torvalds
  3 siblings, 2 replies; 23+ messages in thread
From: Robin Rosenberg @ 2007-02-11 23:59 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Johannes Schindelin, git

On Monday 12 February 2007 00:41, Bruno Haible wrote:
> Hello Johannes,
> 
> Thanks for the helpful answer.
> 
> > Yes, because there were only 147 commits which changed the file. But git 
> > looked at all commits to find that.
> 
> Ouch.
> 
> > Basically, we don't do file versions. File versions do not make sense, 
> > since they strip away the context.
> 
> Is there some other concept or command that git offers? I'm in the situation
> where I know that 'tr' in coreutils version 5.2.1 had a certain bug and
> version 6.4 does not have the bug, and I want to review all commits that
> are relevant to this. I know that the only changes in tr.c are relevant
> for this, and I'm interested in a display of the minimum amount of relevant
> commit messages. If "git log" is not the right command for this question,
> which command is it?

Since you know that you are not interested in the whole history, you can limit your scan.

git log COREUTILS-5_2_1..COREUTILS-6_4 src/tr.c

> > > 2) Why so much system CPU time, but only on MacOS X?
> > 
> > Probably the mmap() problem. Does it go away when you use git 1.5.0-rc4?
> 
> No, it became even worse: git-1.5.0-rc4 is twice as slow as git-1.4.4 for
> this command:
>   git-1.4.4: 25 seconds real time, 24 seconds of CPU time (12 user, 12 system)
>   git-1.5.0: 50 seconds real time, 39 seconds of CPU time (20 user, 19 system)

Could the UTF-8 stuff have anything to do with this?

-- robin


* Re: how to speed up "git log"?
  2007-02-11 23:59     ` Robin Rosenberg
@ 2007-02-12  2:02       ` Bruno Haible
  2007-02-12 11:19         ` Johannes Schindelin
  2007-02-12  4:08       ` Junio C Hamano
  1 sibling, 1 reply; 23+ messages in thread
From: Bruno Haible @ 2007-02-12  2:02 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Johannes Schindelin, git

Thanks for the responses.

Robin Rosenberg wrote:
> Since you know that you are not interested in the whole history, you can limit your scan.
> 
> git log COREUTILS-5_2_1..COREUTILS-6_4 src/tr.c

Thanks, that indeed does the trick: it reduces the time from 33 sec to 11 sec.

To reduce the time even more, and to allow more flexibility among the
search criteria (e.g. "I need the commits from date X to date Y, on this
file set, from anyone except me"), I would need to connect git to a database.
git cannot store all kinds of indices and reverse mappings to allow all
kinds of queries; that's really a classical database application area.

> > No, it became even worse: git-1.5.0-rc4 is twice as slow as git-1.4.4 for
> > this command:
> >   git-1.4.4: 25 seconds real time, 24 seconds of CPU time (12 user, 12 system)
> >   git-1.5.0: 50 seconds real time, 39 seconds of CPU time (20 user, 19 system)
> 
> Could the UTF-8 stuff have anything to do with this?

Actually, no. Brown paper bag on me for doing benches in different
conditions. The timing difference is an effect of the buffer cache / page
cache:

  - After the second repetition of the command (i.e. when all files are cached
    in RAM), the timings are
        25 seconds real time, 24 seconds of CPU time (13 user, 11 system)
    both in git-1.4.4 and -1.5.0-rc4.

  - After unmounting and remounting the disk containing the repository (i.e.
    when none of the files are cached in RAM), the timings are
        49 seconds real time, 38 seconds of CPU time (20 user, 18 system)

Sorry for the false alarm.

Bruno


* Re: how to speed up "git log"?
  2007-02-11 23:59     ` Robin Rosenberg
  2007-02-12  2:02       ` Bruno Haible
@ 2007-02-12  4:08       ` Junio C Hamano
  2007-02-12  6:06         ` Shawn O. Pearce
  1 sibling, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2007-02-12  4:08 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Bruno Haible, Johannes Schindelin, git

Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:

>> No, it became even worse: git-1.5.0-rc4 is twice as slow as git-1.4.4 for
>> this command:
>>   git-1.4.4: 25 seconds real time, 24 seconds of CPU time (12 user, 12 system)
>>   git-1.5.0: 50 seconds real time, 39 seconds of CPU time (20 user, 19 system)
>
> Could the UTF-8 stuff have anything to do with this?

I doubt it -- sliding mmap() in the current git, while it is a good
change overall for handling really huge repos, would most likely
perform worse than the fixed mmap() in the 1.4.4 series on
platforms with slow mmap(), most notably on MacOS X.

It _might_ be possible that turning some sliding mmap() calls
into pread() makes it perform better on MacOS X.

I wonder what happens if git is compiled with NO_MMAP there...


* Re: how to speed up "git log"?
  2007-02-11 23:41   ` Bruno Haible
                       ` (2 preceding siblings ...)
  2007-02-11 23:59     ` Robin Rosenberg
@ 2007-02-12  4:20     ` Linus Torvalds
  2007-02-12 11:27       ` Bruno Haible
  3 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2007-02-12  4:20 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Johannes Schindelin, git



On Mon, 12 Feb 2007, Bruno Haible wrote:

> Hello Johannes,
> 
> Thanks for the helpful answer.
> 
> > Yes, because there were only 147 commits which changed the file. But git 
> > looked at all commits to find that.
> 
> Ouch.

This should become a FAQ.

Git simply DOES NOT HAVE per-file history. And having it is actually a 
BUG in other systems.

Not having per-file history is what allows git to do

	git log directory-or-file-set

rather than being able to track just one file. You can't do it sanely 
with per-file history (because to tie the per-file histories back 
together in a logical sequence, you need the global history to sort it 
again!)

So:

 - git is "slow" on single-file things, because such things DON'T EVEN 
   EXIST in git!

   When you do "git log <path-limiter>", it really always ends up being a 
   full git log. 

 - but this is fundamentally what allows you to track multiple directories 
   well. It's what makes things like "gitk drivers/scsi/" actually work, 
   where you really can see the history for a random *collection* of 
   files. Nobody else can do it, afaik, and git just considers a single 
   filename to be a case of the "random collection of files".

The example I gave to corecode was to do

	gitk builtin-rev-list.c
	gitk builtin-rev-parse.c
	gitk builtin-rev-parse.c builtin-rev-list.c

and realize that doing the history for two files together is NOT AT ALL 
EQUIVALENT to doing the history for those files individually and stitching 
it together.

(The reason the above is a great example is that both of the files alone 
have a very simple linear history, but when you look at the *combined* 
history you actually see concurrent development, and merges: you see 
merge commits that simply don't "exist" when only looking at the history 
of one of them separately).
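
The point about path limiting can be sketched with a toy model in
Python. This is not git's implementation: the `Commit` class,
`path_limited_log` and the synthetic history are assumptions made up for
illustration. The sketch only shows that the walk visits every reachable
commit and filters by a *set* of paths, so a single file is just a
one-element set.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    name: str
    parents: list = field(default_factory=list)
    changed: set = field(default_factory=set)  # paths this commit touched


def path_limited_log(head, paths):
    """Visit every commit reachable from head; keep those touching paths."""
    seen, stack, visited, matches = set(), [head], 0, []
    while stack:
        commit = stack.pop()
        if commit.name in seen:
            continue
        seen.add(commit.name)
        visited += 1
        if commit.changed & paths:  # any overlap with the path set
            matches.append(commit.name)
        stack.extend(commit.parents)
    return visited, matches


def build_linear_history(n, path, every):
    """Hypothetical history: every `every`-th commit touches `path`."""
    commit = None
    for i in range(n):
        changed = {path} if i % every == 0 else {"other.c"}
        parents = [commit] if commit is not None else []
        commit = Commit(f"c{i}", parents, changed)
    return commit


head = build_linear_history(1000, "src/tr.c", 100)
visited, matches = path_limited_log(head, {"src/tr.c"})
print(visited, len(matches))  # 1000 10
```

All 1000 commits are visited even though only 10 of them are reported,
which mirrors the 147-commit result in the original question.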

> Is there some other concept or command that git offers? I'm in the situation
> where I know that 'tr' in coreutils version 5.2.1 had a certain bug and
> version 6.4 does not have the bug, and I want to review all commits that
> are relevant to this. I know that the only changes in tr.c are relevant
> for this, and I'm interested in a display of the minimum amount of relevant
> commit messages. If "git log" is not the right command for this question,
> which command is it?

Do

	git log v5.2.1..v6.4 -- tr.c

(or whatever your tag-names for releases are) where you can limit the log 
generation cost by giving the beginning commit. But yeah, it *will* look 
at the whole history in between, so if there is a long long history 
between v5.2.1 and v6.4, you'll still end up using reasonable amounts of 
CPU.
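
To make the meaning of a range such as v5.2.1..v6.4 concrete, here is a
small Python sketch: the range is the set of commits reachable from the
newer tag but not from the older one. The graph, the helper names and
the tag-to-commit mapping are hypothetical. The sketch computes full
reachability sets for clarity, so it illustrates which commits are in
the range, not how git avoids walking the older history.

```python
def reachable(parents, tip):
    """All commits reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def commit_range(parents, old, new):
    """Commits in old..new: reachable from new but not from old."""
    return reachable(parents, new) - reachable(parents, old)


# Hypothetical linear history c0 <- c1 <- ... <- c9 with two tags.
parents = {f"c{i}": [f"c{i - 1}"] for i in range(1, 10)}
parents["c0"] = []
tags = {"v5.2.1": "c3", "v6.4": "c8"}

rng = commit_range(parents, tags["v5.2.1"], tags["v6.4"])
print(sorted(rng))  # ['c4', 'c5', 'c6', 'c7', 'c8']
```

Every commit in that set still has to be examined for changes to the
path, which is why a long range remains expensive.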

> > Probably the mmap() problem. Does it go away when you use git 1.5.0-rc4?
> 
> No, it became even worse: git-1.5.0-rc4 is twice as slow as git-1.4.4 for
> this command:
>   git-1.4.4: 25 seconds real time, 24 seconds of CPU time (12 user, 12 system)
>   git-1.5.0: 50 seconds real time, 39 seconds of CPU time (20 user, 19 system)

That's an interesting fact in itself. Do you have the repo available 
somewhere?

Yes, some of the operations can be improved upon by not wasting quite so 
much time uncompressing stuff, so we could at least help this a bit. But 
that's a long-term thing. The slowdown is bad, and that probably has some 
simple explanation.

		Linus


* Re: how to speed up "git log"?
  2007-02-12  4:08       ` Junio C Hamano
@ 2007-02-12  6:06         ` Shawn O. Pearce
  2007-02-12  6:11           ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Shawn O. Pearce @ 2007-02-12  6:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin Rosenberg, Bruno Haible, Johannes Schindelin, git

Junio C Hamano <junkio@cox.net> wrote:
> I doubt it -- sliding mmap() in the current git, while is a good
> change overall for handling really huge repos, would most likely
> perform poorer than the fixed mmap() in 1.4.4 series on
> platforms with slow mmap(), most notably on MacOS X.
> 
> It _might_ be possible that turning some sliding mmap() calls
> into pread() makes it perform better on MacOS X.
> 
> I wonder what happens if git is compiled with NO_MMAP there...

So I ran three trials, v1.5.0-rc4-26-gcc46a74 with and without
NO_MMAP against v1.4.4.4 on a freshly repacked git.git.

v150-mmap:
        3.33 real         3.12 user         0.05 sys
        3.32 real         3.12 user         0.05 sys
        3.34 real         3.12 user         0.05 sys

v150-nommap:
        3.46 real         3.13 user         0.16 sys
        3.43 real         3.13 user         0.16 sys
        3.46 real         3.13 user         0.16 sys

v1444-mmap:
        3.30 real         3.09 user         0.05 sys
        3.30 real         3.09 user         0.05 sys
        3.25 real         3.09 user         0.04 sys

CFLAGS="-O2"; the above timings are three representative samples
out of 10 runs each, all hot cache.

Clearly the sliding mmap window isn't hurting us in this case by
very much, and NO_MMAP really isn't helping matters at all.

-- 
Shawn.


* Re: how to speed up "git log"?
  2007-02-12  6:06         ` Shawn O. Pearce
@ 2007-02-12  6:11           ` Junio C Hamano
  2007-02-12  6:22             ` Shawn O. Pearce
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2007-02-12  6:11 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Robin Rosenberg, Bruno Haible, Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>> I doubt it -- sliding mmap() in the current git, while is a good
>> change overall for handling really huge repos, would most likely
>> perform poorer than the fixed mmap() in 1.4.4 series on
>> platforms with slow mmap(), most notably on MacOS X.
>> 
>> It _might_ be possible that turning some sliding mmap() calls
>> into pread() makes it perform better on MacOS X.
>> 
>> I wonder what happens if git is compiled with NO_MMAP there...
>
> So I ran three trials, v1.5.0-rc4-26-gcc46a74 with and without
> NO_MMAP against v1.4.4.4 on a freshly repacked git.git.

I do not think freshly repacked git.git is a good test case for
a real-world workload where this really matters.  Isn't your
default pack window large enough to cover it with a single
window, or perhaps two at most?


* Re: how to speed up "git log"?
  2007-02-12  6:11           ` Junio C Hamano
@ 2007-02-12  6:22             ` Shawn O. Pearce
  2007-02-12  6:28               ` Shawn O. Pearce
  0 siblings, 1 reply; 23+ messages in thread
From: Shawn O. Pearce @ 2007-02-12  6:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin Rosenberg, Bruno Haible, Johannes Schindelin, git

Junio C Hamano <junkio@cox.net> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > So I ran three trials, v1.5.0-rc4-26-gcc46a74 with and without
> > NO_MMAP against v1.4.4.4 on a freshly repacked git.git.
> 
> I do not think freshly repacked git.git is a good test case for
> a real-world workload where this really matters.  Doesn't your
> default pack window large enough to cover it with a single
> window, or perhaps two at most?

It's one window, maybe two, as git.git is ~12 MiB and the window
size is 1 MiB (NO_MMAP) or 32 MiB (with mmap).

On linux.git:

v150-mmap:
        2.23 real         1.99 user         0.10 sys
        2.19 real         1.98 user         0.10 sys
        2.19 real         1.98 user         0.10 sys

v150-nommap:
        2.63 real         1.99 user         0.50 sys
        2.67 real         1.98 user         0.51 sys
        2.63 real         1.99 user         0.51 sys

v1444:
        2.15 real         1.94 user         0.09 sys
        2.19 real         1.95 user         0.10 sys
        2.16 real         1.94 user         0.10 sys

Again, we aren't too far away from v1.4.4.4, but the NO_MMAP clearly
is hurting us, even on Mac OS X.

-- 
Shawn.


* Re: how to speed up "git log"?
  2007-02-12  6:22             ` Shawn O. Pearce
@ 2007-02-12  6:28               ` Shawn O. Pearce
  0 siblings, 0 replies; 23+ messages in thread
From: Shawn O. Pearce @ 2007-02-12  6:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin Rosenberg, Bruno Haible, Johannes Schindelin, git

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Junio C Hamano <junkio@cox.net> wrote:
> > "Shawn O. Pearce" <spearce@spearce.org> writes:
> > > So I ran three trials, v1.5.0-rc4-26-gcc46a74 with and without
> > > NO_MMAP against v1.4.4.4 on a freshly repacked git.git.

I probably should have mentioned, my run (in all cases) was:

	git rev-list HEAD -- Makefile 2>/dev/null

cheap, a file that exists pretty much everywhere, and that triggers
the path limiter in the revision walking code.

BTW, I discovered by accident tonight that this works:

	cp git-rev-list ../git-1444
	../git-1444 rev-list

which is so not something I would have expected.  :-) I honestly
expected the wrapper to puke and say it doesn't know what command
1444 is.

-- 
Shawn.


* Re: how to speed up "git log"?
  2007-02-12  2:02       ` Bruno Haible
@ 2007-02-12 11:19         ` Johannes Schindelin
  0 siblings, 0 replies; 23+ messages in thread
From: Johannes Schindelin @ 2007-02-12 11:19 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Robin Rosenberg, git

Hi,

On Mon, 12 Feb 2007, Bruno Haible wrote:

> Robin Rosenberg wrote:
> > Since you know that you are not interested in the whole history, you can limit your scan.
> > 
> > git log COREUTILS-5_2_1..COREUTILS-6_4 src/tr.c
> 
> Thanks, that indeed does the trick: it reduces the time from 33 sec to 11 sec.
> 
> To reduce the time even more, and to allow more flexibility among the 
> search criteria (e.g. "I need the commits from date X to date Y, on this 
> file set, from anyone except me"), I would need to connect git to a 
> database. git cannot store all kinds of indices and reverse mappings to 
> allow all kinds of queries; that's really a classical database 
> application area.

[in the following paragraph, "index" means the index on a classical 
database table]

And -- as everywhere else with classical databases -- you have to ask if 
it is worth it. Given the fact that a one-time use of such an index is 
_worse_ than doing it without index at all (building and writing the 
index is _at least_ as expensive as searching once without an index), I'd 
rather doubt it.

However, if you do similar kinds of searches quite often, it makes tons of 
sense to connect to a database. We already use sqlite in cvsserver, so I'd 
try that.
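
A minimal sketch of what such a database could look like, using
Python's built-in sqlite3 module. The schema (`commits`, `touched`), the
sample rows and the `find_commits` helper are assumptions invented for
this example; nothing here is produced by git or by cvsserver. In
practice the tables would have to be filled from the repository first,
which is exactly the index-building cost mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commits (sha TEXT PRIMARY KEY, author TEXT, date TEXT);
CREATE TABLE touched (sha TEXT REFERENCES commits(sha), path TEXT);
CREATE INDEX touched_path ON touched(path);
CREATE INDEX commits_date ON commits(date);
""")

# Hypothetical sample data standing in for an imported history.
conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", [
    ("a1", "alice", "2005-03-01"),
    ("b2", "me", "2005-06-15"),
    ("c3", "bob", "2006-01-20"),
    ("d4", "alice", "2006-11-05"),
])
conn.executemany("INSERT INTO touched VALUES (?, ?)", [
    ("a1", "src/tr.c"), ("b2", "src/tr.c"),
    ("c3", "src/ls.c"), ("d4", "src/tr.c"),
])


def find_commits(conn, paths, since, until, exclude_author):
    """Commits touching any of paths, dated since..until, not by exclude_author."""
    marks = ",".join("?" for _ in paths)
    sql = f"""
        SELECT DISTINCT c.sha, c.date FROM commits c
        JOIN touched t ON t.sha = c.sha
        WHERE t.path IN ({marks})
          AND c.date BETWEEN ? AND ?
          AND c.author <> ?
        ORDER BY c.date
    """
    params = [*paths, since, until, exclude_author]
    return [row[0] for row in conn.execute(sql, params)]


result = find_commits(conn, ["src/tr.c"], "2005-01-01", "2006-12-31", "me")
print(result)  # ['a1', 'd4']
```

The query covers the example from the quoted message: a date range, a
file set, and everyone except one author.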

Ciao,
Dscho


* Re: how to speed up "git log"?
  2007-02-12  4:20     ` Linus Torvalds
@ 2007-02-12 11:27       ` Bruno Haible
  0 siblings, 0 replies; 23+ messages in thread
From: Bruno Haible @ 2007-02-12 11:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, git

Linus,

> >   git-1.4.4: 25 seconds real time, 24 seconds of CPU time (12 user, 12 system)
> >   git-1.5.0: 50 seconds real time, 39 seconds of CPU time (20 user, 19 system)
> 
> That's an interesting fact in itself.

Sorry, these measurements happened to be done in different conditions:
repo fully cached in RAM vs. repo not yet in buffer cache / page cache.

When measured under the same conditions, no speed difference is visible
between git-1.4.4 and git-1.5.0-rc4.

Bruno


* Re: how to speed up "git log"?
       [not found] ` <20070211152840.GA2781@steel.home>
  2007-02-11 23:52   ` Bruno Haible
@ 2007-02-17 19:19   ` Bruno Haible
  2007-02-17 23:20     ` Johannes Schindelin
  2007-02-18  6:33     ` how to speed up "git log"? Shawn O. Pearce
  1 sibling, 2 replies; 23+ messages in thread
From: Bruno Haible @ 2007-02-17 19:19 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git

Alex Riesen wrote:
> MacOS X is famous for its bad performance when doing serious work.
> The mmap(2) of it, in particular.

You can't blame MacOS X mmap(2) for git's slow execution of "git log".
Here are the execution times of "git log tr.c > output":

  - with git-1.5.0-rc4 built with -DNO_MMAP

      real    0m26.032s
      user    0m13.580s
      sys     0m11.730s

  - with git-1.5.0-rc4 built with the default settings:

      real    0m25.469s
      user    0m13.530s
      sys     0m11.490s

You can see that using mmap() provides a speedup of about 2% on MacOS X,
which is similar to the 4% that Shawn measured on Linux.
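
For reference, the percentages can be recomputed from the wall-clock
("real") times posted in this thread. The helper `speedup_percent` is a
made-up name, and using the first sample of each of Shawn's runs is an
assumption of this sketch.

```python
def speedup_percent(slow, fast):
    """Relative reduction in wall-clock time, in percent of the slow run."""
    return 100.0 * (slow - fast) / slow


# "real" times in seconds: (NO_MMAP build, default mmap build)
bruno_tr_c = speedup_percent(26.032, 25.469)    # git log tr.c on coreutils
shawn_git_git = speedup_percent(3.46, 3.33)     # rev-list on git.git
shawn_linux_git = speedup_percent(2.63, 2.23)   # rev-list on linux.git

print(round(bruno_tr_c, 1))       # 2.2
print(round(shawn_git_git, 1))    # 3.8
print(round(shawn_linux_git, 1))  # 15.2
```

The roughly 4% figure corresponds to the git.git run; the linux.git run
shows a larger gap between the two builds.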

Bruno


* Re: how to speed up "git log"?
  2007-02-17 19:19   ` Bruno Haible
@ 2007-02-17 23:20     ` Johannes Schindelin
  2007-02-18  0:09       ` piped to a pager (was: how to speed up "git log"?) Bruno Haible
  2007-02-18  6:33     ` how to speed up "git log"? Shawn O. Pearce
  1 sibling, 1 reply; 23+ messages in thread
From: Johannes Schindelin @ 2007-02-17 23:20 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Alex Riesen, git

Hi,

On Sat, 17 Feb 2007, Bruno Haible wrote:

> Alex Riesen wrote:
> > MacOS X is famous for its bad performance when doing serious work.
> > The mmap(2) of it, in particular.
> 
> You can't blame MacOS X mmap(2) for git's slow execution of "git log".

No, but you can blame the person calling git log and waiting until it 
finishes. See the list archives for reasons why.

If this comes up one more time, I'm very tempted to write a scathing 
remark in the FAQ.

Ciao,
Dscho


* Re: piped to a pager (was: how to speed up "git log"?)
  2007-02-17 23:20     ` Johannes Schindelin
@ 2007-02-18  0:09       ` Bruno Haible
  2007-02-18  0:10         ` Johannes Schindelin
  0 siblings, 1 reply; 23+ messages in thread
From: Bruno Haible @ 2007-02-18  0:09 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin wrote:
> you can blame the person calling git log and waiting until it 
> finishes. See the list archives for reasons why.
... and earlier:
> Usually the output of git-log -- even with pathname 
> filtering -- starts almost instantaneous, and is piped to your pager.

The pager ('less') in a console is not a good solution for everyone:
  - People used to GUI editors (kate, nedit, ...) miss a scroll bar for
    navigation. You can't use kate or nedit as a pager.
  - PAGER="vi -" also reads all input before it displays anything.
  - PAGER="xless" likewise.
  - In Emacs shell-mode, with PAGER="", you see the output as it is produced,
    but it's disturbing to work in a buffer which is growing, where the scrollbar
    continues to change its position.

It's OK for many people, but not for everyone.

Bruno


* Re: piped to a pager (was: how to speed up "git log"?)
  2007-02-18  0:09       ` piped to a pager (was: how to speed up "git log"?) Bruno Haible
@ 2007-02-18  0:10         ` Johannes Schindelin
  0 siblings, 0 replies; 23+ messages in thread
From: Johannes Schindelin @ 2007-02-18  0:10 UTC (permalink / raw)
  To: Bruno Haible; +Cc: git

Hi,

On Sun, 18 Feb 2007, Bruno Haible wrote:

> Johannes Schindelin wrote:
> > you can blame the person calling git log and waiting until it 
> > finishes. See the list archives for reasons why.
> ... and earlier:
> > Usually the output of git-log -- even with pathname 
> > filtering -- starts almost instantaneous, and is piped to your pager.
> 
> The pager ('less') in a console is not a good solution for everyone:
>   - People used to GUI editors (kate, nedit, ...) miss a scroll bar for
>     navigation. You can't use kate or nedit as a pager.
>   - PAGER="vi -" also reads all input before it displays anything.
>   - PAGER="xless" likewise.
>   - In Emacs shell-mode, with PAGER="", you see the output as it is produced,
>     but it's disturbing to work in a buffer which is growing, where the scrollbar
>     continues to change its position.
> 
> It's OK for many people, but not for everyone.

So why don't you go scratch that itch, and write a decent GUI pager?

Ciao,
Dscho


* Re: how to speed up "git log"?
  2007-02-17 19:19   ` Bruno Haible
  2007-02-17 23:20     ` Johannes Schindelin
@ 2007-02-18  6:33     ` Shawn O. Pearce
  1 sibling, 0 replies; 23+ messages in thread
From: Shawn O. Pearce @ 2007-02-18  6:33 UTC (permalink / raw)
  To: Bruno Haible; +Cc: Alex Riesen, git

Bruno Haible <bruno@clisp.org> wrote:
> Alex Riesen wrote:
> > MacOS X is famous for its bad performance when doing serious work.
> > The mmap(2) of it, in particular.
> 
> You can see that using mmap() provides a speedup of about 2% on MacOS X,
> which is similar to the 4% that Shawn measured on Linux.

Uh, I was testing on Mac OS X (G4 PowerBook).

-- 
Shawn.
