All of lore.kernel.org
 help / color / mirror / Atom feed
* More precise tag following
@ 2007-01-26 11:07 Junio C Hamano
  2007-01-26 11:53 ` Junio C Hamano
  2007-01-27  8:01 ` Shawn O. Pearce
  0 siblings, 2 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-26 11:07 UTC (permalink / raw)
  To: git

What if (I know, this discussion does not belong here until
1.5.0 final) we had a "reverse" database that keeps track of
what tag references which object, and "git rev-list" knows how
to exploit it?  That is, just like generating a list of objects
that are reachable with --objects option, if we can add a new
option --with-tag very cheaply to list tag objects that would
reach what are in the generated list of objects?

The way current git-fetch "follows" tags is very imprecise,
although it is good enough in practice.  If you happen to
locally have an object that is tagged (and currently we get the
list of non-tag objects that tags eventually refer to in an
out-of-band-ish way), then we fetch the tag and everything it
reaches.  This means if you copied a single commit that is tagged
from somewhere without objects that it refers to, we would end
up fetching beyond that commit to complete it.  Which would not
result in a corrupted repository, but ideally we should not be
fetching the tag in such a case.  And with something like
enhanced rev-list that knows --with-tag it might be possible (I
need to think a bit more about have/want exchange and what
should happen later in fetch-pack and push-pack protocol,
though).

The application of this actually may not be limited to tag
following.  We could define a tag-like objects that attaches to
other objects and enhance its meanings (annotates them) and
treat them the same way as tags for objects traversal and
transfer purposes, so if we were to do this, the option to
exploit reverse database would be called --with-annotation and
not --with-tag.

 - If a single-path following turns out to be too expensive
   (there was a longstanding talk about "git log --single-follow
   $path"; "git blame" also follows a single path although the
   target it follows can fork into two or more when following
   cut&pastes) because we need to explode multi-level trees for
   each commit while traversing commit ancestry, we could define
   an annotation to a commit that lists the set of paths the
   commit touches relative to each of its parents (so the object
   contains N lists of paths), so that pathspec limiting needs
   to open and read only one object to figure out that the trees
   do not have to be opened to skip the commit and/or a merge
   can be simplified.

 - We could define an annotation to a commit that describes what
   fake parents it should have instead of the real ones
   (i.e. grafts implemented in the object database).

Just an idle thought.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-26 11:07 More precise tag following Junio C Hamano
@ 2007-01-26 11:53 ` Junio C Hamano
  2007-01-27  8:01 ` Shawn O. Pearce
  1 sibling, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-26 11:53 UTC (permalink / raw)
  To: git

Junio C Hamano <junkio@cox.net> writes:

> What if (I know, this discussion does not belong here until
> 1.5.0 final) we had a "reverse" database that keeps track of
> what tag references which object, and "git rev-list" knows how
> to exploit it?  That is, just like generating a list of objects
> that are reachable with --objects option, if we can add a new
> option --with-tag very cheaply to list tag objects that would
> reach what are in the generated list of objects?
> ...
> Just an idle thought.

Nah, that would not work.  Please disregard.

The thing is, you cannot sanely traverse and transfer objects
that have such reverse connectivity.  The other end can annotate
a commit long after you acquired it, and trying to make
git-fetch to retrieve newly created annotations would mean
breaking "things reachable from your local refs are complete and
there is nothing missing from them" invariant.  Pushing into
somebody else after annotating an old commit that already exists
at the other end has the same issue.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-26 11:07 More precise tag following Junio C Hamano
  2007-01-26 11:53 ` Junio C Hamano
@ 2007-01-27  8:01 ` Shawn O. Pearce
  2007-01-27  8:41   ` Junio C Hamano
                     ` (3 more replies)
  1 sibling, 4 replies; 92+ messages in thread
From: Shawn O. Pearce @ 2007-01-27  8:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> What if (I know, this discussion does not belong here until
> 1.5.0 final) we had a "reverse" database that keeps track of
> what tag references which object, and "git rev-list" knows how
> to exploit it?

It'd be useful.  In as many ways as you suggest.  But it also
would be downright difficult to transfer between repositories,
as you have already pointed out in a public forum to yourself. :-)

>  - If a single-path following turns out to be too expensive
>    (there was a longstanding talk about "git log --single-follow
>    $path"; "git blame" also follows a single path although the
>    target it follows can fork into two or more when following
>    cut&pastes) because we need to explode multi-level trees for
>    each commit while traversing commit ancestry, we could define
>    an annotation to a commit that lists the set of paths the
>    commit touches relative to each of its parents (so the object
>    contains N lists of paths), so that pathspec limiting needs
>    to open and read only one object to figure out that the trees
>    do not have to be opened to skip the commit and/or a merge
>    can be simplified.

_THIS_ is worth doing.  I've been having a lot of discussion on
#git with Simon 'corecode' Schubert and Chris Lee about how poorly
git-blame performs compared to its counterpart in Subversion.

Basically Subversion is storing file-level revision ancestry data
within its commit data, allowing it to quickly skip back through
the commits which modified that file.

Git doesn't have this information and must instead traverse the
entire DAG, at least until the file was initially added to the tree.
With 440,000+ revisions and a file which exists in nearly all
of them, getting anything from "git-blame" or "git-log -- foo.c"
takes ages.

Based on some (limited) profiling with Shark it seems we spend about
50% of our CPU time doing zlib decompression of objects and almost
another 14% parsing the tree objects to apply the path limiter.
The only way to speed up blame/log operations is to reduce the number
of decompressions we need to do to the bare minium, and maybe also
reduce the tree parsing overheads.  Do that and we can maybe drop
the running time to 1/4th the current time.


One idea Simon and I were talking about was to store a reverse
file/tree-level DAG in the header of each tree/blob object in the
pack file.  This additional header would be something like a list
of triplets:

  (descendant-commit, ancestor-commit, ancestor-object)

where:

  descendant-commit: the "current" commit being looked at.
  ancestor-commit:   the commit which descendant-commit
                     derives from (directly or indirectly)
  ancestor-object:   prior version (descendant-commit^:path)

This triplet would probably be encoded with descendant-commit using
OBJ_REF, ancestor-commit being an OBJ_OFS style back reference within
the pack (or OBJ_REF if not in this pack) and ancestor-object would
also be an OBJ_REF.  So a triplet probably would wind up costing
~60 bytes.

Triplets would only be stored if ancestor-object != this-object, so
basically only for the commits which changed the path the tree/blob
is occupying in descendant-commit.

Finding the prior revision of a tree or file would be a matter of
finding the triplet which matches the current commit, then jumping
through to the ancestor-* values.  If no triplet matches the current
commit then we peel back the parents of the current commit and try
again with those.  Worst case we do what we do now, which is walk
the DAG.  ;-)

This of course penalizes objects which don't ever change, as we'd
have to walk back a good chunk of the DAG before we find a matching
triplet.  But I would suspect that files which never change are
also not given to log/blame very often either.  And once we do find
a triplet, we can skip through the DAG in time proportional to the
rate of change for the path, rather than to the entire repository.


Thoughts?

-- 
Shawn.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27  8:01 ` Shawn O. Pearce
@ 2007-01-27  8:41   ` Junio C Hamano
  2007-01-27 13:33     ` Jeff King
  2007-01-27 17:47     ` Nicolas Pitre
  2007-01-27  9:04   ` Simon 'corecode' Schubert
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-27  8:41 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Based on some (limited) profiling with Shark it seems we spend about
> 50% of our CPU time doing zlib decompression of objects and almost
> another 14% parsing the tree objects to apply the path limiter.

I once tried to use zlib compression level 0 for tree objects
and did not see much difference -- maybe I should dig it up and
find out why.

> One idea Simon and I were talking about was to store a reverse
> file/tree-level DAG in the header of each tree/blob object in the
> pack file.
> ...
> Thoughts?

Anything you would do, storing that in tree is wrong.  Tree
object only represents just the contents of a single state and
in itself there should not be any information that describes its
relation with other trees [*1*].

And of course making it pack-only is doubly wrong.


*1* That's why my thinking-aloud talked about "N list of changed
paths recorded in a commit object with N parents".  A commit is
to talk about one particular state (i.e. tree) and its relation
to other commits (and by indirection, other trees), so logically
the information could belong there --- that is merely a "could",
since that is strictly caching for performance.  After finding
where the bottleneck is, obviously finding a way to optimize the
tree pathlimiting with the currently available data without
having such redundant data is more preferable.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27  8:01 ` Shawn O. Pearce
  2007-01-27  8:41   ` Junio C Hamano
@ 2007-01-27  9:04   ` Simon 'corecode' Schubert
  2007-01-27 12:58   ` Johannes Schindelin
  2007-01-27 17:22   ` Linus Torvalds
  3 siblings, 0 replies; 92+ messages in thread
From: Simon 'corecode' Schubert @ 2007-01-27  9:04 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 2828 bytes --]

Shawn O. Pearce wrote:
> This triplet would probably be encoded with descendant-commit using
> OBJ_REF, ancestor-commit being an OBJ_OFS style back reference within
> the pack (or OBJ_REF if not in this pack) and ancestor-object would
> also be an OBJ_REF.  So a triplet probably would wind up costing
> ~60 bytes.

I'd propose the following:

Have a "shadow" tree which is storing DAG information per entry of the original tree.  keep this shadow tree sorted in the same order the original tree object is.  for simplicity i will now add the new information to the tree examples, but of course this new data cannot be stored in the tree itself and can't be hashed.

tree
<filename> <mode> <object> (<new: ancestor commit> <new: ancestor object>)*

the ancestor object isn't necessary, it would just speed up annotations.  considering that we want to look only at the path, it is superfluous.  if we want to traverse copies/moves as well, this could be helpful.

now, this information allows us to build a object-level (read: path identifies object) DAG, by starting out on the current tree and retrieving the associated information.  using this it is possible to jump to the next commit/tree which changed the object and start over.

expense:  8 or 40 bytes per parent, per object, per commit, for each tree modified.

per parent:  we of course store information about all parents
per object:  as this is a "shadow" tree, we need to annotate all entries and not just changed ones
for each tree modified:  (sub)trees not being modified of course do not need annotation *again*

but:  this shadow tree can be deltified quite tightly, i'd say.  possibly with a different, specialized tree delta method.  then this boils down to 8 or 40 bytes per *changed* object, plus delta overhead.

> This of course penalizes objects which don't ever change, as we'd
> have to walk back a good chunk of the DAG before we find a matching
> triplet.  But I would suspect that files which never change are
> also not given to log/blame very often either.  And once we do find
> a triplet, we can skip through the DAG in time proportional to the
> rate of change for the path, rather than to the entire repository.

using my proposal this penalty does not exist.  i think it would be really awkward to have the annotation of a never-changed Makefile to take way longer than the operation on a recently/often changed file.

we'd have to compare the space requirements of both approaches.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27  8:01 ` Shawn O. Pearce
  2007-01-27  8:41   ` Junio C Hamano
  2007-01-27  9:04   ` Simon 'corecode' Schubert
@ 2007-01-27 12:58   ` Johannes Schindelin
  2007-01-27 13:50     ` Simon 'corecode' Schubert
  2007-01-27 17:22   ` Linus Torvalds
  3 siblings, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2007-01-27 12:58 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git

Hi,

On Sat, 27 Jan 2007, Shawn O. Pearce wrote:

> I've been having a lot of discussion on #git with Simon 'corecode' 
> Schubert and Chris Lee about how poorly git-blame performs compared to 
> its counterpart in Subversion.

Well, I don't run git-blame too often, and then it is mostly confined to a 
certain file / lines combo. So git-blame is fast enough for me.

It is slower than Subversion's counterpart, just because SVN's blame 
sucks. You cannot find out the _relevant_ information easily, i.e. once 
you merged something, the _merge_ gets attributed for the change (at least 
the last time I tried it).

So, don't blame blame for being useful in git.

Of course, you could introduce a cache, but then, I don't run blame _that_ 
often.

Besides, we already introduced an orthogonal historisation by reflogs, and 
your method would not cope gracefully with that, would it?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27  8:41   ` Junio C Hamano
@ 2007-01-27 13:33     ` Jeff King
  2007-01-27 17:47     ` Nicolas Pitre
  1 sibling, 0 replies; 92+ messages in thread
From: Jeff King @ 2007-01-27 13:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

On Sat, Jan 27, 2007 at 12:41:54AM -0800, Junio C Hamano wrote:

> > Based on some (limited) profiling with Shark it seems we spend about
> > 50% of our CPU time doing zlib decompression of objects and almost
> > another 14% parsing the tree objects to apply the path limiter.
> 
> I once tried to use zlib compression level 0 for tree objects
> and did not see much difference -- maybe I should dig it up and
> find out why.

I don't know exactly what Shawn meant, but a considerable amount of time
in a blame is spent decompressing the blobs. Just for fun, some numbers:

Fully packed, warm cache, core.compression = -1:
$ time git blame Makefile >/dev/null
real    0m5.537s
user    0m5.500s
sys     0m0.032s

Fully packed, warm cache, core.compression = 0:
$ time git blame Makefile >/dev/null
real    0m3.001s
user    0m2.984s
sys     0m0.012s

That's 45% savings. The resulting pack sizes are 11932K compressed and
22308 uncompressed.

-Peff

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 12:58   ` Johannes Schindelin
@ 2007-01-27 13:50     ` Simon 'corecode' Schubert
  2007-01-27 16:30       ` Jakub Narebski
  2007-01-27 16:46       ` Johannes Schindelin
  0 siblings, 2 replies; 92+ messages in thread
From: Simon 'corecode' Schubert @ 2007-01-27 13:50 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 1942 bytes --]

Johannes Schindelin wrote:
> It is slower than Subversion's counterpart, just because SVN's blame 
> sucks. You cannot find out the _relevant_ information easily, i.e. once 
> you merged something, the _merge_ gets attributed for the change (at least 
> the last time I tried it).
> 
> So, don't blame blame for being useful in git.

Your reasoning is backwards.  Git's blame (or fwiw, rev-list path.name) is not slower because it is doing a better job (I can't tell, I don't use svn), but because it uses an algorithm which doesn't scale.  rev-list and blame are O(number of commits between HEAD and root) and not O(number of commits affecting path).  It might be sufficient for git.git, but certainly not for projects with a long history.  we are talking KDE, FreeBSD, OOo, something like this.  They each got about 400k commits.  It takes literally *minutes* to get a rev-list or a blame for a certain path.  The algorithm simply does not scale.  And this has nothing to do with superior output, because hg does it in O(num_of_file_revs), so it *can* be done.

> Of course, you could introduce a cache, but then, I don't run blame _that_ 
> often.

I don't think a cache is the right way.  I'd call the right idea "auxillary information".

> Besides, we already introduced an orthogonal historisation by reflogs, and 
> your method would not cope gracefully with that, would it?

I don't see how reflogs can play into this.  After all we're talking about the series of commits the blob experienced to get into its current state, not the series of actions it took this repo to contain this blob.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 13:50     ` Simon 'corecode' Schubert
@ 2007-01-27 16:30       ` Jakub Narebski
  2007-01-27 17:36         ` Linus Torvalds
  2007-01-27 16:46       ` Johannes Schindelin
  1 sibling, 1 reply; 92+ messages in thread
From: Jakub Narebski @ 2007-01-27 16:30 UTC (permalink / raw)
  To: git

Simon 'corecode' Schubert wrote:
> Johannes Schindelin wrote:

>> It is slower than Subversion's counterpart, just because SVN's blame 
>> sucks. You cannot find out the _relevant_ information easily, i.e. once 
>> you merged something, the _merge_ gets attributed for the change (at
>> least the last time I tried it).
>> 
>> So, don't blame blame for being useful in git.
> 
> Your reasoning is backwards.  Git's blame (or fwiw, rev-list path.name)
> is not slower because it is doing a better job (I can't tell, I don't
> use svn), but because it uses an algorithm which doesn't scale.
> rev-list and blame are O(number of commits between HEAD and root) and not
> O(number of commits affecting path).  It might be sufficient for git.git,
> but certainly not for projects with a long history.  we are talking KDE,
> FreeBSD, OOo, something like this.  They each got about 400k commits.
> It takes literally *minutes* to get a rev-list or a blame for a certain
> path.  The algorithm simply does not scale.  And this has nothing to do
> with superior output, because hg does it in O(num_of_file_revs), so it
> *can* be done.          

Mercurial (hg) has different repository structure, with changesets in
per filename "buckets", tied together with mainfest file and changelog
file. So it is easy to get per file history in hg, while it is harder
to get per commit (general) history; contrary to git where it is easy
to get per commit (general) history, and it is harder to get per file
history.

On the other hand IIRC Mercurial, due to its repository structure, has some
problems with file copying and renames, not to mention more complicated 
contents movement (of which git-blame is aware of). Perhaps this structure
is/was also the cause that Mercurial is geared towards one branch per
repository workflow.
 
>> Of course, you could introduce a cache, but then, I don't run blame
>> _that_ often.
> 
> I don't think a cache is the right way.  I'd call the right idea
> "auxillary information". 

If the information can be regenerated, this is cache. (Well, this is
one point of view).

P.S. In git we can use so called pickaxe (options to git-diff/git-log)
besides using annotate/blame.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 13:50     ` Simon 'corecode' Schubert
  2007-01-27 16:30       ` Jakub Narebski
@ 2007-01-27 16:46       ` Johannes Schindelin
  2007-01-27 17:12         ` Simon 'corecode' Schubert
  1 sibling, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2007-01-27 16:46 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: Shawn O. Pearce, Junio C Hamano, git

Hi,

On Sat, 27 Jan 2007, Simon 'corecode' Schubert wrote:

> Johannes Schindelin wrote:
> > It is slower than Subversion's counterpart, just because SVN's blame sucks.
> > You cannot find out the _relevant_ information easily, i.e. once you merged
> > something, the _merge_ gets attributed for the change (at least the last
> > time I tried it).
> > 
> > So, don't blame blame for being useful in git.
> 
> Your reasoning is backwards.  Git's blame (or fwiw, rev-list path.name) 
> is not slower because it is doing a better job (I can't tell, I don't 
> use svn), but because it uses an algorithm which doesn't scale.  
> rev-list and blame are O(number of commits between HEAD and root) and 
> not O(number of commits affecting path).

Ah, I think you fall in the "files matter" trap.

My point is: for what git does it does not need information which might or 
might not be present, but it derives that information which was there from 
the beginning: the ancestry path.

Many people don't use or even need blame. And what you want to introduce 
would affect them, too.

That is why I proposed a cache (of precomputed data): you don't have to 
change _anything_ in the file format, but you can speed the processes up 
-- locally! -- if they matter to you.

Which means it works on old repositories, too.

> It might be sufficient for git.git, but certainly not for projects with 
> a long history.  we are talking KDE, FreeBSD, OOo, something like this.  
> They each got about 400k commits.  It takes literally *minutes* to get a 
> rev-list or a blame for a certain path.  The algorithm simply does not 
> scale.  And this has nothing to do with superior output, because hg does 
> it in O(num_of_file_revs), so it *can* be done.

But can hg do it that fast, if you track code _movement_ between files? I 
doubt so.

I don't know if git can, at the moment, but even if it cannot, in future 
versions this may well be possible, exactly because we do _not_ rely on 
metadata to be stored in the objects, which can be derived from the 
history as-is anyway.

> > Of course, you could introduce a cache, but then, I don't run blame 
> > _that_ often.
> 
> I don't think a cache is the right way.  I'd call the right idea 
> "auxillary information".

You can name it "Dirty Harry" if you want.

The important part is that you should not change the file format when you 
do not have to.

Rather, calculate the information you need from the existing data, and if 
you can reuse it, store it locally. _That_ is flexibility.

It also gives me a warm fuzzy feeling that no bogus "auxillary 
information" can be introduced by fetching from somewhere else. (It does 
not matter if intended or unintended.)

And if something is wrong with that "auxillary information", it can be 
regenerated correctly, without touching the real data -- the commit 
ancestry.

Just think of .git/info/refs: this data is derived from the repository, 
but because you need it so often (or it would be prohibitively expensive 
to do otherwise), it is derived only when needed, then stored, and 
retrieved quite often.

> > Besides, we already introduced an orthogonal historisation by reflogs, 
> > and your method would not cope gracefully with that, would it?
> 
> I don't see how reflogs can play into this.  After all we're talking 
> about the series of commits the blob experienced to get into its current 
> state, not the series of actions it took this repo to contain this blob.

My point was that you want to introduce a reverse mapping onto the history 
DAG. But this claims that there is only one history you can possibly look 
at. This assumption is wrong.

It can make a lot of sense to git-blame a change on a pull, maybe because 
you don't want to fix it yourself, but throw it all back to the lieutnant 
whom you pulled that part from.

You could find that pull (in theory; I don't think it works right now) 
with git-blame walking the _reflogs_ instead of the _commit history_.

In this case, your reverse mapping would be wrong.

See?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 16:46       ` Johannes Schindelin
@ 2007-01-27 17:12         ` Simon 'corecode' Schubert
  2007-01-27 19:13           ` Johannes Schindelin
  2007-01-27 19:41           ` Nicolas Pitre
  0 siblings, 2 replies; 92+ messages in thread
From: Simon 'corecode' Schubert @ 2007-01-27 17:12 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 4885 bytes --]

Johannes Schindelin wrote:
> Ah, I think you fall in the "files matter" trap.
> 
> My point is: for what git does it does not need information which might or 
> might not be present, but it derives that information which was there from 
> the beginning: the ancestry path.
> 
> Many people don't use or even need blame. And what you want to introduce 
> would affect them, too.

Many people do not use colored diffs.  Introducing colored diff support affects them, too.  In which way?  Additional command line switches, for example.  I don't think that's a big deal, and neither is a reverse map to create object-level DAGs.

> That is why I proposed a cache (of precomputed data): you don't have to 
> change _anything_ in the file format, but you can speed the processes up 
> -- locally! -- if they matter to you.
> 
> Which means it works on old repositories, too.

Maybe I was not clear enough.  I do not propose to change the file format, but to extend the information stored.  In which way whatsoever.  However I think that keeping this information along with trees in pack files seems very sensible.  Or along pack files, whatever.

>> It might be sufficient for git.git, but certainly not for projects with 
>> a long history.  we are talking KDE, FreeBSD, OOo, something like this.  
>> They each got about 400k commits.  It takes literally *minutes* to get a 
>> rev-list or a blame for a certain path.  The algorithm simply does not 
>> scale.  And this has nothing to do with superior output, because hg does 
>> it in O(num_of_file_revs), so it *can* be done.
> 
> But can hg do it that fast, if you track code _movement_ between files? I 
> doubt so.
> 
> I don't know if git can, at the moment, but even if it cannot, in future 
> versions this may well be possible, exactly because we do _not_ rely on 
> metadata to be stored in the objects, which can be derived from the 
> history as-is anyway.

Please don't take the mentioning of hg as an attack on git.  You don't have to shoot back.  It was just to illustrate that this information can be used to speed up certain operations considerably.  Besides, I don't think that hg's repo format prevents it to do things which git can do.  Just some things might be less elegant or easy.

> The important part is that you should not change the file format when you 
> do not have to.

Do doubt.  Especially not in a way which breaks backwards compatibility.

> Rather, calculate the information you need from the existing data, and if 
> you can reuse it, store it locally. _That_ is flexibility.

Of course this is flexibility.  But this also means that every consumer has to do this for every repo.  Wouldn't it be nice to have it done one time and then stored in a pack?

> It also gives me a warm fuzzy feeling that no bogus "auxillary 
> information" can be introduced by fetching from somewhere else. (It does 
> not matter if intended or unintended.)

I agree on that.

> And if something is wrong with that "auxillary information", it can be 
> regenerated correctly, without touching the real data -- the commit 
> ancestry.

Yes, it always can be regenerated.  I never said it should be made part of the core structure.

>>> Besides, we already introduced an orthogonal historisation by reflogs, 
>>> and your method would not cope gracefully with that, would it?
>> I don't see how reflogs can play into this.  After all we're talking 
>> about the series of commits the blob experienced to get into its current 
>> state, not the series of actions it took this repo to contain this blob.
> My point was that you want to introduce a reverse mapping onto the history 
> DAG. But this claims that there is only one history you can possibly look 
> at. This assumption is wrong.

Then you are reading it wrong.  It is just a way to speed up the common way of operation.  That doesn't mean that other ways stop working.  git-rev-list does one thing and you wouldn't call it not being gracefull, just because it doesn't operate on reflogs?

> It can make a lot of sense to git-blame a change on a pull, maybe because 
> you don't want to fix it yourself, but throw it all back to the lieutnant 
> whom you pulled that part from.
> 
> You could find that pull (in theory; I don't think it works right now) 
> with git-blame walking the _reflogs_ instead of the _commit history_.

Fair enough.  Nobody said that this wouldn't work anymore.  I just said that working on commit history could be sped up considerably.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27  8:01 ` Shawn O. Pearce
                     ` (2 preceding siblings ...)
  2007-01-27 12:58   ` Johannes Schindelin
@ 2007-01-27 17:22   ` Linus Torvalds
  2007-01-27 17:56     ` Linus Torvalds
                       ` (4 more replies)
  3 siblings, 5 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-27 17:22 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git



On Sat, 27 Jan 2007, Shawn O. Pearce wrote:
> 
> _THIS_ is worth doing.  I've been having a lot of discussion on
> #git with Simon 'corecode' Schubert and Chris Lee about how poorly
> git-blame performs compared to its counterpart in Subversion.

I think we're *much* better off trying to get people off the "git-blame" 
mentality entirely.

Don't screw up git in trying to make "git blame" performance better.

Because you *will* screw it up. No ifs, buts, maybes or anything else 
about it. The only way to make "git blame" perform better is to do a 
really really crappy job.

You really have to fundamnetally realize that the reason SVN and CVS can 
do "annotate" really cheaply is BECAUSE THEY ARE BROKEN. Trying to emulate 
them is only going to break git too.

Really. Let that thought sink in, and let it fester in your brain until 
you really really get it.

There simply is no way to get "git blame" to be faster, without screwing 
things up royally in any number of ways:

 - extra (redundant) on-the-fly-generated metadata

   Yes, you can generate caches etc. Now you need to have a way to check 
   the caches, and to make sure that they are in sync. You also need to 
   update them constantly - you can make them look great in *benchmarks*, 
   but I bet that once you actually start developing, and really want to 
   _use_ them, you'll just curse the whole thing, because the caches won't 
   be there for the things you want. People normally don't do a million 
   "blame" operations on the same tree - they do *one*. Caches don't work.

 - extra (redundant) metadata generated at commit time

   Instead of doing caches, you can do it statically at commit time, and 
   now you will screw up all the *other* things, like finding content 
   movement between files, and a dense and efficient repository encoding. 
   You'll need to add tons of crap to the commits (that most ops won't 
   find *any* use for), and you'll also make the operation a lot less 
   useful because it's now static rather than dynamic.

> Thoughts?

Here's a really fundamental suggestion:

Instead of trying to do "git blame" faster, which is a totally broken 
notion, just face the fact that emulating a broken environment will be 
slower. CPU's will get faster and help you in the long run, but more 
importantly, if you just "accept git".

So instead, aim for:

 - "yes, we can do blame, but yes, it will take five seconds for a biggish 
    archive on a reasonable CPU".

   Which implies that with a slow CPU, and a really humongous archive, it 
   will take much more. Is five seconds slow enough that people think it 
   is slow? Yes. Is 30 seconds approaching painful? Yes. But you should 
   try to aim for really just telling people that it's not a common 
   operation.

   For example, I think it is a mistake to expose blame in "gitweb". It's 
   simply not a natural operation for git to do. Don't do it.

   (Side note: for the kernel - which is certainly not a "small" project, 
   even if it's not a humongous one either, and the kinds of machines I 
   work with, git blame usually takes about 1-2 seconds for most files. 
   That is *not* excessive for a developer. It's excessive if you try to 
   do it on a web-server where you've made everybody and his dog press 
   "history" by putting a big red blinking button there..

   In other words, aim for "git blame" being something that you run once a 
   week (which is about as often as I do it) or maybe a couple of times a 
   day if you're really obsessed. At which point a few seconds isn't that 
   horrid.

 - teach people about alternatives. For example, "git log -p filename" is 
   actually a hell of a lot more useful for most things. Yes, it's 
   *different*, but git makes it really easy, and it has the added 
   advantage that you see things in time order, and can very naturally 
   search back through time.

   In a very similar vein, the real problem with "git blame" is not that 
   git cannot do it, but the fact that it's a "whole history in one go" 
   operation. Again, you can actually do a "git blame" that people would 
   probably find much less annoying, if you just did things 
   *incrementally*.

   The reason "git log -p filename" doesn't perform badly is exactly that 
   it is incremental. Try it some time. The cost of

	time git log -p mm/memory.c > /dev/null
	time git blame mm/memory.c > /dev/null

   is almost 100% identical when you run them that way. So why is it that 
   just about everybody would always consider "git log -p" to be 
   instantaneous with git, but "git blame" is slow?

I'd really like people to think about that difference between "git log -p 
filename" and "git blame filename". Because it tells you a lot about the 
*psychology* of the thing. They both take the same amount of time, but one 
is slow as hell, and the other one is so fast that anybody coming from the 
CVS world will just go "Whoah! Magic!".

Really. Think about it.

Now, think about what would happen if you had a graphical "git blame" that 
was a tcl/tk thing (and slowed things down even more), but basically 
filled in the "git blame" information incrementally - the exact same way 
that "git blame" actually calculates it internally?

You know what? I bet that people would LIKE it. They could open up the 
file in that nice graphical interface, and scroll down/search to the part 
they care about, and see how git populates the blame. They'd think it's 
*cool*. And it would feel fast, because there wouldn't be any need to wait 
for *all* the information before it's done.

Here's a patch. Use "git blame --incremental filename" to get the blame 
output in a nicely parseable format that you can now write a simple 
graphical viewer for. 

Please.

		Linus
---
diff --git a/builtin-blame.c b/builtin-blame.c
index 4a1accf..7d97ae9 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -27,6 +27,7 @@ static char blame_usage[] =
 "  -p, --porcelain     Show in a format designed for machine consumption\n"
 "  -L n,m              Process only line range n,m, counting from 1\n"
 "  -M, -C              Find line movements within and across files\n"
+"  --incremental       Show blame entries as we find them, incrementally\n"
 "  -S revs-file        Use revisions from revs-file instead of calling git-rev-list\n";
 
 static int longest_file;
@@ -36,6 +37,7 @@ static int max_digits;
 static int max_score_digits;
 static int show_root;
 static int blank_boundary;
+static int incremental;
 
 #ifndef DEBUG
 #define DEBUG 0
@@ -1069,6 +1071,21 @@ static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt)
 		origin_decref(parent_origin[i]);
 }
 
+static void found_guilty_entry(struct blame_entry *ent)
+{
+	if (ent->guilty)
+		return;
+	ent->guilty = 1;
+	if (incremental) {
+		struct origin *origin = ent->suspect;
+		printf("%d %d %s:%s:%d\n",
+			ent->lno, ent->num_lines,
+			sha1_to_hex(origin->commit->object.sha1),
+			origin->path,
+			ent->s_lno);
+	}
+}
+
 static void assign_blame(struct scoreboard *sb, struct rev_info *revs, int opt)
 {
 	while (1) {
@@ -1102,7 +1119,7 @@ static void assign_blame(struct scoreboard *sb, struct rev_info *revs, int opt)
 		/* Take responsibility for the remaining entries */
 		for (ent = sb->ent; ent; ent = ent->next)
 			if (!cmp_suspect(ent->suspect, suspect))
-				ent->guilty = 1;
+				found_guilty_entry(ent);
 		origin_decref(suspect);
 
 		if (DEBUG) /* sanity */
@@ -1717,6 +1734,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 				die("More than one '-L n,m' option given");
 			bottomtop = arg;
 		}
+		else if (!strcmp("--incremental", arg))
+			incremental = 1;
 		else if (!strcmp("--score-debug", arg))
 			output_option |= OUTPUT_SHOW_SCORE;
 		else if (!strcmp("-f", arg) ||
@@ -1907,6 +1926,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 
 	assign_blame(&sb, &revs, opt);
 
+	if (incremental)
+		return 0;
+
 	coalesce(&sb);
 
 	if (!(output_option & OUTPUT_PORCELAIN))

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 16:30       ` Jakub Narebski
@ 2007-01-27 17:36         ` Linus Torvalds
  0 siblings, 0 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-27 17:36 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git



On Sat, 27 Jan 2007, Jakub Narebski wrote:
> 
> On the other hand IIRC Mercurial, due to its repository structure, has some
> problems with file copying and renames

This is not a hg-only problem.

This is the SAME and FUNDAMENTAL problem that you have any time you think 
"file identity" matters.

Yes, it's what makes "blame/annotate" fast. But I have tried, over and 
over again, to explain why it's fundamentally broken (regardless of any 
blame thing).

So I will  just say once again: don't try to make "blame" faster. You can 
only do so by introducing MUCH MORE serious problems in other parts.

This was a very early design decision in git. It was discussed within 
hours of me releasing the first version of git. Yes, "git blame" is 
relatively slow, and it is very very fundamental. It's fundamental exactly 
because git avoids the mistake that *everybody* else has done.

The upside? I just sent a patch that should make it possible to do a "cool 
blame" that doesn't feel slow, and in fact allows some nice eyecandy 
effects (well, it would allow it, if I just knew tcl/tk or something like 
that: it needs a canvas to draw the existing file into, and then filling 
in the incremental data as git-blame finds it).

It's going to be tons more fun to watch than any CVS/SVN annotate has ever 
been. Trust me. I bet that you'll feel that "git blame" is *too* fast, and 
you'll want the graphical viewer to have a "slow down" flag, just so that 
you can appreciate the blame building up!

[ Ok, that may not be true for everybody, but having played with "git 
   blame --incremental" a bit I really think it would be a bit cool to 
   have that "slow down" mode, and start things with a really tiny font 
   so you could see the blame build up over a file!

   I'm not kidding you. Eyecandy! And it literally would need git to slow 
   down to make it more human-friendly! ]

The other upside? EVERYTHING ELSE IS FASTER. And I really mean 
*everything*. Yes, "git blame" is slower than SVN. It will remain so, 
unless somebody either overrides my objections, or some alien intelligence 
comes up with something _really_ clever. But look at it this way: blame 
may take a few seconds, but that's a big part of why you can do merges of 
things that have tens of thousands of files in half a second.

Things that you would need to go brew a cup of coffee for in some other 
environments are basically _instantaneous_. 

			Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27  8:41   ` Junio C Hamano
  2007-01-27 13:33     ` Jeff King
@ 2007-01-27 17:47     ` Nicolas Pitre
  1 sibling, 0 replies; 92+ messages in thread
From: Nicolas Pitre @ 2007-01-27 17:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

On Sat, 27 Jan 2007, Junio C Hamano wrote:

> Anything you would do, storing that in tree is wrong.  Tree
> object only represents just the contents of a single state and
> in itself there should not be any information that describes its
> relation with other trees [*1*].
> 
> And of course making it pack-only is doubly wrong.
> 
> 
> *1* That's why my thinking-aloud talked about "N list of changed
> paths recorded in a commit object with N parents".  A commit is
> to talk about one particular state (i.e. tree) and its relation
> to other commits (and by indirection, other trees), so logically
> the information could belong there --- that is merely a "could",
> since that is strictly caching for performance.  After finding
> where the bottleneck is, obviously finding a way to optimize the
> tree pathlimiting with the currently available data without
> having such redundant data is more preferable.

I do think, too, that such data is not desirable in the object database.

However there is nothing wrong with a separate "cache", just like the 
pack index, that can be discarded and recreated at any time.  
Especially since this "cache data" might change with time as new tricks 
to speed up things are found.  OTOH it is preferable to keep the object 
database as slick and stable as possible.


Nicolas

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 17:22   ` Linus Torvalds
@ 2007-01-27 17:56     ` Linus Torvalds
  2007-01-27 22:00       ` Junio C Hamano
  2007-01-27 18:40     ` Simon 'corecode' Schubert
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 92+ messages in thread
From: Linus Torvalds @ 2007-01-27 17:56 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git



On Sat, 27 Jan 2007, Linus Torvalds wrote:
> 
> Here's a patch. Use "git blame --incremental filename" to get the blame 
> output in a nicely parseable format that you can now write a simple 
> graphical viewer for. 

Btw, the more I play with this, the more I curse the fact that I don't 
know tcltk enough to actually enjoy the full effect.

But in the kernel, with the patch I just sent out, try

	PAGER= git blame -C --incremental block/ll_rw_blk.c

(because using a pager for the output just detracts from the whole 
experience.

I think it's really cool how it notices the chunks coming in from the 
rename, and starts giving different names - and for some reason I didn't 
delve into it actually seems to find some of the changes from the old 
location before it have pinpointed all of the changes from the new one 
etc..

(That's a fairly expensive file to check, so it takes almost ten seconds 
for me to generate blame for. But with the incremental output, you really 
don't mind, because it finds the "new changes" immediately, so it's really 
the old and relatively uninteresting stuff that will only be found at the 
end).

This should be an example of how important interfaces can be to what feels 
"slow".

Btw, Junio - even if nobody writes a graphical front-end for that 
"--incremental" flag, this should be in 1.5.0. If only so that people 
could write those front-ends and not have to patch git to get the blame 
information out of it.

The exact format for the output of this thing is obviously very debatable: 
it might well be worthwhile to use a more verbose thing that also gives 
the commit name and date etc information (since git-blame knows it), so 
that any graphical front-end doesn't need to look up every commit as it 
comes out of the incremntal blame engine. I just wrote it as a quick 
"proof of concept" thing, and the output is obviously pretty minimalistic 
right now.

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 17:22   ` Linus Torvalds
  2007-01-27 17:56     ` Linus Torvalds
@ 2007-01-27 18:40     ` Simon 'corecode' Schubert
  2007-01-27 19:02       ` Johannes Schindelin
  2007-01-27 19:15       ` Linus Torvalds
  2007-01-27 18:52     ` Jakub Narebski
                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 92+ messages in thread
From: Simon 'corecode' Schubert @ 2007-01-27 18:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn O. Pearce, Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 2071 bytes --]

Linus Torvalds wrote:
> On Sat, 27 Jan 2007, Shawn O. Pearce wrote:
>> _THIS_ is worth doing.  I've been having a lot of discussion on
>> #git with Simon 'corecode' Schubert and Chris Lee about how poorly
>> git-blame performs compared to its counterpart in Subversion.
> I think we're *much* better off trying to get people off the "git-blame" 
> mentality entirely.
> 
> Don't screw up git in trying to make "git blame" performance better.

Okay, let's try to assume for now that nobody said "git blame".  Instead let's say:

git rev-list and git log (with or without -p) perform poorly when invoked with a pathspec.

>    Which implies that with a slow CPU, and a really humongous archive, it 
>    will take much more. Is five seconds slow enough that people think it 
>    is slow? Yes. Is 30 seconds approaching painful? Yes. But you should 
>    try to aim for really just telling people that it's not a common 
>    operation.

I agreee with those numbers.  However, on a converted KDE repo, they are *completely* different:

git log kdelibs/README takes 1:18.  One minute, eighteen seconds.
git rev-list and git blame take roughly the same time.

This particular file has 64 revisions.  However there are ~ 375000 revisions in the converted repo.

My and also Shawn's point was not about the speed of git blame itself.  It is about pathspec/rev operations.  The operation time does not scale with the number of changes to the file/object/call-it-whatever, but with the number of total commits in the branch.

That's what we were getting at.  Not the superiority of git blame (no irony) and thus reduced speed, but the algorithmic deficiency of any operation on a pathspec/object, which can be easily fixed.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 17:22   ` Linus Torvalds
  2007-01-27 17:56     ` Linus Torvalds
  2007-01-27 18:40     ` Simon 'corecode' Schubert
@ 2007-01-27 18:52     ` Jakub Narebski
  2007-01-27 20:16     ` Jeff King
  2007-01-28  7:40     ` Shawn O. Pearce
  4 siblings, 0 replies; 92+ messages in thread
From: Jakub Narebski @ 2007-01-27 18:52 UTC (permalink / raw)
  To: git

Linus Torvalds wrote:

> - teach people about alternatives.

Actually, "git log -p -S'<string>'" is usually much better alternative
to blame / annotate to find who introduced given change and what for;
moreover you can find the <string> which vanished (find also deleted
contents).
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 18:40     ` Simon 'corecode' Schubert
@ 2007-01-27 19:02       ` Johannes Schindelin
  2007-01-27 19:12         ` Simon 'corecode' Schubert
  2007-01-27 19:15       ` Linus Torvalds
  1 sibling, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2007-01-27 19:02 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: git

Hi,

On Sat, 27 Jan 2007, Simon 'corecode' Schubert wrote:

> Okay, let's try to assume for now that nobody said "git blame".  
> Instead let's say:
> 
> git rev-list and git log (with or without -p) perform poorly when 
> invoked with a pathspec.

So what? You _will_ be interested in the _newest_ changes _99%_ of the 
time. And for these you don't need to wait 1:18, but 0:00.01 or so.

> This particular file has 64 revisions.  However there are ~ 375000 
> revisions in the converted repo.

"file version" trap! "file version" trap! "file version" trap!

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:02       ` Johannes Schindelin
@ 2007-01-27 19:12         ` Simon 'corecode' Schubert
  2007-01-27 19:19           ` Johannes Schindelin
  2007-01-27 19:59           ` Jakub Narebski
  0 siblings, 2 replies; 92+ messages in thread
From: Simon 'corecode' Schubert @ 2007-01-27 19:12 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 969 bytes --]

Johannes Schindelin wrote:
>> Okay, let's try to assume for now that nobody said "git blame".  
>> Instead let's say:
>>
>> git rev-list and git log (with or without -p) perform poorly when 
>> invoked with a pathspec.
> 
> So what? You _will_ be interested in the _newest_ changes _99%_ of the 
> time. And for these you don't need to wait 1:18, but 0:00.01 or so.

not if you are interested which commit introduced/changed a particular line.

>> This particular file has 64 revisions.  However there are ~ 375000 
>> revisions in the converted repo.
> 
> "file version" trap! "file version" trap! "file version" trap!

call it path and retry.

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 17:12         ` Simon 'corecode' Schubert
@ 2007-01-27 19:13           ` Johannes Schindelin
  2007-01-27 19:55             ` Simon 'corecode' Schubert
  2007-01-27 19:41           ` Nicolas Pitre
  1 sibling, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2007-01-27 19:13 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: git

Hi,

On Sat, 27 Jan 2007, Simon 'corecode' Schubert wrote:

> Johannes Schindelin wrote:
> 
> > Many people don't use or even need blame. And what you want to 
> > introduce would affect them, too.
> 
> Many people do not use colored diffs.  Introducing colored diff support 
> affects them, too.  In which way?  Additional command line switches, for 
> example.  I don't think that's a big deal, and neither is a reverse map 
> to create object-level DAGs.

Do colored diffs need additional data?

No.

There's the principal difference between rev-pathspec-speed-up and 
colored-diffs.

> Please don't take the mentioning of hg as an attack on git.  You don't 
> have to shoot back.

I did not take it as an attack on git! If it sounded like that, sorry.

(Side note: If there is _any_ feature in other versioning systems I would 
like to have in git, I'll try to implement it. If there is a VCS that is 
better than git, and which is free, I'll use it. Sorry, but that's the 
way it is.)

I only tried to make it clear why we do not do things as Mercurial does it 
(in this particular case at least), and why I think that git's way is 
better.

> > Rather, calculate the information you need from the existing data, and 
> > if you can reuse it, store it locally. _That_ is flexibility.
> 
> Of course this is flexibility.  But this also means that every consumer 
> has to do this for every repo.  Wouldn't it be nice to have it done one 
> time and then stored in a pack?

So you want to store it in a pack, fetchable?

> > It also gives me a warm fuzzy feeling that no bogus "auxillary 
> > information" can be introduced by fetching from somewhere else. (It 
> > does not matter if intended or unintended.)
> 
> I agree on that.

So you agree we should _not_ store it in a pack, fetchable?

> > And if something is wrong with that "auxillary information", it can be 
> > regenerated correctly, without touching the real data -- the commit 
> > ancestry.
> 
> Yes, it always can be regenerated.  I never said it should be made part 
> of the core structure.

So, if you _do_ have it in a pack, fetchable, what happens if you 
regenerated it locally, fixing a flaw, but then fetch it from somewhere 
else, where the flaw possibly still exists, what do you do?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 18:40     ` Simon 'corecode' Schubert
  2007-01-27 19:02       ` Johannes Schindelin
@ 2007-01-27 19:15       ` Linus Torvalds
  2007-01-27 19:25         ` Linus Torvalds
  2007-01-27 19:36         ` Chris Lee
  1 sibling, 2 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-27 19:15 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: Shawn O. Pearce, Junio C Hamano, git



On Sat, 27 Jan 2007, Simon 'corecode' Schubert wrote:
> 
> git rev-list and git log (with or without -p) perform poorly when invoked with
> a pathspec.

Really? I would say exactly the opposite. They _smoke_ when invoked with a 
pathspec.

Show me *one* other SCM that even comes close..

And please, realize that git does arbitrary combinations of directories, 
and not just single files. AND THAT IS IMPORTANT!

Any SCM that can't do

	git log drivers/scsi/ include/scsi/

and have it be a sane log of the changes to the _union_ of those two 
directories is strictly inferior to what git can do.

Usually this is something that others CANNOT DO AT ALL.

Even your 1:18 number is a hell of a lot faster than "can't do it", which 
is what you have for everything else I can imagine.

Maybe you just do single files, but my pathspecs tend to be directories or 
multiple files more often than single ones.

How the heck did you intend to cache that?

> I agreee with those numbers.  However, on a converted KDE repo, they are
> *completely* different:
> 
> git log kdelibs/README takes 1:18.  One minute, eighteen seconds.
> git rev-list and git blame take roughly the same time.

Do you have the converted repo somewhere to be cloned for? It's going to 
be a lot more interesting for scalability testing than anything else.

It is possible, for example, that the real issue is that we shouldn't 
compress delta objects in a pack.

> That's what we were getting at.  Not the superiority of git blame (no irony)
> and thus reduced speed, but the algorithmic deficiency of any operation on a
> pathspec/object, which can be easily fixed.

The thing is, one of the reasons the git object database is small is that 
it compresses really well, and I suspect that for the KDE repo, what 
you're seeing is really a combination of:

 - the KDE people were idiots in the first place to make it into one big 
   repo

 - we've consciously made repo size be a major goal, and yes, we spend a 
   lot of CPU as a result, following delta chains etc. The zlib overhead 
   is more visible, because once you've uncompressed the delta the delta 
   itself is really quick to apply, but the whole "trees compress really 
   well" all boils down to the same thing: we create lots of small 
   objects, and we have tons of deltas, and the hierarchical nature of the 
   data structures (ie saving the trees not as one big manifest but as 
   a more complex hierarchial datastructure) is what allows us to do tons 
   of the path-based optimizations.

But they all do end up boiling down to "we use lots of CPU".

And I suspect tweaking the existing stuff is quite reasonable. But we need 
to have a public repo that people who want to tweak can play with (for 
example, the old "linux-history" archive was what made us tweak things 
like gitk, which was horribly horribly bad).

So please point to a kde conversion archive to play with (maybe you have, 
I missed it).

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:12         ` Simon 'corecode' Schubert
@ 2007-01-27 19:19           ` Johannes Schindelin
  2007-01-27 19:59           ` Jakub Narebski
  1 sibling, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2007-01-27 19:19 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: git

Hi,

On Sat, 27 Jan 2007, Simon 'corecode' Schubert wrote:

> Johannes Schindelin wrote:
> > > Okay, let's try to assume for now that nobody said "git blame".  
> > > Instead let's say:
> > > 
> > > git rev-list and git log (with or without -p) perform poorly when 
> > > invoked with a pathspec.
> > 
> > So what? You _will_ be interested in the _newest_ changes _99%_ of the 
> > time. And for these you don't need to wait 1:18, but 0:00.01 or so.
> 
> not if you are interested which commit introduced/changed a particular 
> line.

Wonderful. Say, you want to know who last changed the beginning of the 
function main() in git.c:

	$  git blame -L '/main(/,+20' git.c

What was your point again?

> > > This particular file has 64 revisions.  However there are ~ 375000 
> > > revisions in the converted repo.
> > 
> > "file version" trap! "file version" trap! "file version" trap!
> 
> call it path and retry.

Does not matter. Not one wit. Your reasoning is still harping on "file 
versions".

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:15       ` Linus Torvalds
@ 2007-01-27 19:25         ` Linus Torvalds
  2007-01-27 19:54           ` Jakub Narebski
  2007-01-27 19:36         ` Chris Lee
  1 sibling, 1 reply; 92+ messages in thread
From: Linus Torvalds @ 2007-01-27 19:25 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: Shawn O. Pearce, Junio C Hamano, git



On Sat, 27 Jan 2007, Linus Torvalds wrote:
>
> Quoth Simon 'corecode' Schubert:
> > git log kdelibs/README takes 1:18.  One minute, eighteen seconds.
> > git rev-list and git blame take roughly the same time.

Btw, why do people even think this is "slow"?

Yeah, we should speed it up, just because I think having that large a repo 
will make it more obvious what we can do even better. No question about 
that.

But I actually think that it's perfectly ok for "whole-history" operations 
to be slow. If you want the whole history of a big repository, you 
shouldn't expect it to be totally fast. And you should NOT expect the 
history for "one file" to be any faster than the history for "the whole 
repo".

Because if you do, you have totally missed the whole point of git. It's 
not a "file tracker". It's a *content* tracker. A file doesn't have 
history.

So what you're basically saying is that getting the whole history of KDE 
takes just over a minute. That's pretty damned *fast*. And yes, we can 
make it faster still.

The operations that git has been optimized for is that the size of the 
history shouldn't affect *new* stuff. 

Basically, asking for "git log --since=1.week.ago" should be 
constant-time, regardless of how big the history is (well, it obviously 
depends on how many changes there have been in the last week, but the 
point is that it shouldn't get slower over time). And the git log output 
should "stream", so that you can do 

	git log ..randomfile..

and you'll always get speedy access to the stuff that happened recently, 
and we should *never* have to do the whole history just to get the recent 
changes.

That's why "git blame" is so horrible. It's fundamentally an operation 
that depends on "whole history" and thus cannot scale.

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:15       ` Linus Torvalds
  2007-01-27 19:25         ` Linus Torvalds
@ 2007-01-27 19:36         ` Chris Lee
  2007-01-28 18:10           ` Theodore Tso
                             ` (2 more replies)
  1 sibling, 3 replies; 92+ messages in thread
From: Chris Lee @ 2007-01-27 19:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Simon 'corecode' Schubert, Shawn O. Pearce, Junio C Hamano, git

On 1/27/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Do you have the converted repo somewhere to be cloned for? It's going to
> be a lot more interesting for scalability testing than anything else.

I don't have access to any servers that I could drop a 3GB packfile
onto and expect them to serve it. And I don't have a connection at
home that I could use to upload the 3GB pack from quickly - it would
take days, at least. If anybody wants to hook me up with a hosting
provider or a machine that just the git devs can access, I'd be
willing to tie up my upstream bandwidth for a few days so you all can
have access to it.

> The thing is, one of the reasons the git object database is small is that
> it compresses really well, and I suspect that for the KDE repo, what
> you're seeing is really a combination of:
>
>  - the KDE people were idiots in the first place to make it into one big
>    repo

No argument from me about this one. The only defense I can really
think of is that, in KDE, we *have* moved entire applications and
libraries around between modules, and it is really nice to be able to
have the full history for them.

> So please point to a kde conversion archive to play with (maybe you have,
> I missed it).

I can provide you with instructions on how to reconstruct one, but
you'll have to rsync over about 38GB of KDE's Subversion archive to do
it. (Not fun.) If someone else wants to give me a dumping ground where
I can upload my 3GB converted repo, I'd be happy to start pushing it.

Also, please note, the 3GB packed repo is only about 2/3 of the full
KDE repo - I cut off the import at revision 409202, because that was
when the KDE svn admins decided to move a bunch of modules from
/trunk/ to /trunk/KDE/ and it screws up everything. So a *full* KDE
history import would definitely be more than 4GB, packed.

-clee

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 17:12         ` Simon 'corecode' Schubert
  2007-01-27 19:13           ` Johannes Schindelin
@ 2007-01-27 19:41           ` Nicolas Pitre
  1 sibling, 0 replies; 92+ messages in thread
From: Nicolas Pitre @ 2007-01-27 19:41 UTC (permalink / raw)
  To: Simon 'corecode' Schubert
  Cc: Johannes Schindelin, Shawn O. Pearce, Junio C Hamano, git

On Sat, 27 Jan 2007, Simon 'corecode' Schubert wrote:

> Maybe I was not clear enough.  I do not propose to change the file format, but
> to extend the information stored.  In which way whatsoever.  However I think
> that keeping this information along with trees in pack files seems very
> sensible.  Or along pack files, whatever.

Along pack files please.

> > Rather, calculate the information you need from the existing data, and if
> > you can reuse it, store it locally. _That_ is flexibility.
> 
> Of course this is flexibility.  But this also means that every consumer has to
> do this for every repo.  Wouldn't it be nice to have it done one time and then
> stored in a pack?

NO!  That would mean that this extra information is now tied to the pack 
format and this is not a good thing to depend on.

Every consumer is already recomputing the pack index locally for every 
repo.  This has the advantage that we can change the pack index format 
as we so choose without having to bother with backward compatibility in 
the pack transfer protocol.

> > And if something is wrong with that "auxillary information", it can be
> > regenerated correctly, without touching the real data -- the commit
> > ancestry.
> 
> Yes, it always can be regenerated.  I never said it should be made part of the
> core structure.

But the pack format is pretty much part of the core structure.  If 
things can be deduced from the pack without adding to it then they 
should.  This way you have the freedom to experiment with any ancillary 
format you wish.


Nicolas

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:25         ` Linus Torvalds
@ 2007-01-27 19:54           ` Jakub Narebski
  2007-01-27 20:13             ` Linus Torvalds
  0 siblings, 1 reply; 92+ messages in thread
From: Jakub Narebski @ 2007-01-27 19:54 UTC (permalink / raw)
  To: git

Linus Torvalds wrote:

> On Sat, 27 Jan 2007, Linus Torvalds wrote:

>> Quoth Simon 'corecode' Schubert:

>>> git log kdelibs/README takes 1:18.  One minute, eighteen seconds.
>>> git rev-list and git blame take roughly the same time.
> 
> Btw, why do people even think this is "slow"?
> 
> Yeah, we should speed it up, just because I think having that large a repo 
> will make it more obvious what we can do even better. No question about 
> that.
[...]
> Basically, asking for "git log --since=1.week.ago" should be 
> constant-time, regardless of how big the history is (well, it obviously 
> depends on how many changes there have been in the last week, but the 
> point is that it shouldn't get slower over time).
[...]
> That's why "git blame" is so horrible. It's fundamentally an operation 
> that depends on "whole history" and thus cannot scale.

By the way, in git-blame you can also give the cutoff like in git-log;
the lines which come from outside given revision range either get blamed
on boundary, or are shown "unblamed".

I wonder if any other SCM's blame/annotate has that...
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:13           ` Johannes Schindelin
@ 2007-01-27 19:55             ` Simon 'corecode' Schubert
  0 siblings, 0 replies; 92+ messages in thread
From: Simon 'corecode' Schubert @ 2007-01-27 19:55 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]

Johannes Schindelin wrote:
> So you want to store it in a pack, fetchable?

Or wherever.  Main point was "reusable", but actually that depends on how long it takes to build the cache (okay, i'll call it cache).

>>> It also gives me a warm fuzzy feeling that no bogus "auxillary 
>>> information" can be introduced by fetching from somewhere else. (It 
>>> does not matter if intended or unintended.)
>> I agree on that.
> 
> So you agree we should _not_ store it in a pack, fetchable?

I agree that it had advantages if you can opt out.

> So, if you _do_ have it in a pack, fetchable, what happens if you 
> regenerated it locally, fixing a flaw, but then fetch it from somewhere 
> else, where the flaw possibly still exists, what do you do?

the same what happens if you repack a pack locally.  the pack won't be re-fetched, thus your data won't be overwritten.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:12         ` Simon 'corecode' Schubert
  2007-01-27 19:19           ` Johannes Schindelin
@ 2007-01-27 19:59           ` Jakub Narebski
  1 sibling, 0 replies; 92+ messages in thread
From: Jakub Narebski @ 2007-01-27 19:59 UTC (permalink / raw)
  To: git

Simon 'corecode' Schubert wrote:
> Johannes Schindelin wrote:

>>> This particular file has 64 revisions.  However there are ~ 375000 
>>> revisions in the converted repo.
>> 
>> "file version" trap! "file version" trap! "file version" trap!
> 
> call it path and retry.

By the way, if you don't mind be wrong in rare situation (file
resurrecting), "git log -p --remove-empty -- <filename>" should
speed up things for new files at least.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:54           ` Jakub Narebski
@ 2007-01-27 20:13             ` Linus Torvalds
  0 siblings, 0 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-27 20:13 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git



On Sat, 27 Jan 2007, Jakub Narebski wrote:
> 
> By the way, in git-blame you can also give the cutoff like in git-log;
> the lines which come from outside given revision range either get blamed
> on boundary, or are shown "unblamed".

Well, that's not actually all that useful. It's ok when nothing else 
works, but it would be nicer if it just acted well "by default".

Because You generally don't know a priori where your point of interest 
lies.

This is why "git log -p" (or "git whatchanged" before it) is so nice. A 
streaming format means that you get the stuff you likely care about soon, 
but if you aren't quite sure about where it was, it will come _eventually_ 
as you page down. And you can decide at any point in the middle that "ok, 
the thing I was looking for is obviously ancient", which may end up 
changing your whole outlook on a problem.

Which is why I think "incremental" things are so important.

In Sydney at Linux.conf.au I talked a bit to Paul Mackerras about gitk, 
and gitk is _fairly_ good at doing things incrementally (and apparently it 
is internally better at it than I have realized), but by default it still 
passes "--topo-order" to git-rev-list.

Which turns git-rev-list totally non-incremental, and makes gitk horrible 
to start up with default arguments (ie none) on a huge repository. If it 
takes 1 minute to walk the whole history, then gitk will take a minute 
before it shows the first commit.

Paulus was saying that it should be easily fixable, and that gitk 
*already* internally has a reorder buffer for commits out of topological 
order (for the "--date-order" thing, aka "gitk -d"), so gitk too should be 
able to stream perfectly well.

And once you can stream, who cares how big the history is? The part that 
is old will take a long time, but people won't even see it, because 
they'll be busy looking at the new parts that they saw immediately.

So this is why I tend to think that doing

	time <fundamental git operation>

is actually not all that interesting. It's a *lot* more interesting in 
many cases to do


	time <fundamental git operation> | head

because that gives a much more accurate view of what the user experience 
is like.

To get back to the patch I sent out to "git blame", just to illustrate 
this issue:

	[torvalds@woody linux]$ time git blame --incremental -C block/ll_rw_blk.c > /dev/null
	real    0m8.540s
	user    0m8.109s
	sys     0m0.432s

vs

	[torvalds@woody linux]$ time git blame --incremental -C block/ll_rw_blk.c | head > /dev/null
	real    0m0.238s
	user    0m0.240s
	sys     0m0.004s

and 8.5 seconds is a _loong_ time even for a human, but 0.24 seconds is 
"instant". THAT is the difference between "streaming" and "non-streaming".

For a similar example, and seeing why "topo-order" is problematic, just 
try this:

	[torvalds@woody linux]$ time git rev-list --all | head > /dev/null
	real    0m0.007s
	user    0m0.000s
	sys     0m0.012s

vs

	[torvalds@woody linux]$ time git rev-list --topo-order --all | head > /dev/null
	real    0m1.058s
	user    0m1.028s
	sys     0m0.036s

and note how they both just time the first few lines: one takes basically 
no time at all (it's fast *and* streaming) and the other one takes over a 
second (it gets the whole kernel history and then sorts it - so it can't 
stream. A second is still fast for "whole history", but the lack of 
streaming means that it's two orders of magnitude slower IN PRACTICE).

So it's really the *second* case we want to avoid. We want to avoid 
teaching people bad manners, and here "bad manners" is not "having large 
repositories with lots of history", but simply means "do operations that 
fundamentally depend on all of history".

This is why I would much prefer the "--incremental" blame. Suddenly, that 
turns "git blame" from a non-streaming (and thus fundamentally broken) 
operation into something that streams and can thus have a nice user 
experience.

			Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 17:22   ` Linus Torvalds
                       ` (2 preceding siblings ...)
  2007-01-27 18:52     ` Jakub Narebski
@ 2007-01-27 20:16     ` Jeff King
  2007-01-27 22:39       ` Linus Torvalds
  2007-01-28  7:40     ` Shawn O. Pearce
  4 siblings, 1 reply; 92+ messages in thread
From: Jeff King @ 2007-01-27 20:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On Sat, Jan 27, 2007 at 09:22:50AM -0800, Linus Torvalds wrote:

> Here's a patch. Use "git blame --incremental filename" to get the blame 
> output in a nicely parseable format that you can now write a simple 
> graphical viewer for. 

And here's a (very hackish) incremental viewer using perl/gtk (you will
need the Gtk2 perl module installed). It doesn't take any options, and it
just shows the blame output (no lookup of committer/date).

It needs much work (and cleanup) to be useful, but I think it proves
your point: in the time it takes me to actually start looking through
the output, the blame has finished without me noticing!

-Peff

-- >8 --
#!/usr/bin/perl

use Gtk2 -init;
use Gtk2::SimpleList;

my $fn = shift or die "require filename to blame";

my $window = Gtk2::Window->new('toplevel');
$window->signal_connect(destroy => sub { Gtk2->main_quit });
my $scrolled_window = Gtk2::ScrolledWindow->new;
$window->add($scrolled_window);
my $fileview = Gtk2::SimpleList->new(
    'Commit' => 'text',
    'FileLine' => 'text',
    'Data' => 'text'
);
$scrolled_window->add($fileview);
$fileview->get_column(0)->set_spacing(0);
$fileview->set_size_request(1024, 768);

open(my $fh, '<', $fn)
  or die "unable to open $fn: $!";
while(<$fh>) {
  chomp;
  $fileview->{data}->[$.] = ['HEAD', "$fn:$.", $_];
}

open(my $blame, '-|', qw(git blame --incremental), $fn)
  or die "unable to open git blame: $!";
Glib::IO->add_watch(fileno($blame), 'in', \&read_blame_line);

$window->show_all;
Gtk2->main;
exit 0;

my $buf;
sub read_blame_line {
  my $r = sysread($blame, $buf, 1024, length($buf));
  return 0 unless $r;
  while($buf =~ s/([^\n]*)\n//) {
    my $line = $1;
    $line =~ /^(\d+) (\d+) ([0-9a-f]+):(.*):(\d+)$/
      or die "bad blame output: $line";
    for(my $i = 0; $i < $2; $i++) {
      @{$fileview->{data}->[$1+$i]}[0,1] =
        (substr($3, 0, 8), $4 . ':' . ($5+$i+1));
    }
  }
  return 1;
}

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 17:56     ` Linus Torvalds
@ 2007-01-27 22:00       ` Junio C Hamano
  2007-01-27 22:54         ` Linus Torvalds
  0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2007-01-27 22:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn O. Pearce, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> This should be an example of how important interfaces can be to what feels 
> "slow".

I learn new thing (or relearn things I knew but did not apply in
my argument myself) every day from you, and this is a prime
example ;-).  I agree that latency in giving feedback helps the
feel of speed, and the feel is the most important thing in UI.

> Btw, Junio - even if nobody writes a graphical front-end for that 
> "--incremental" flag, this should be in 1.5.0. If only so that people 
> could write those front-ends and not have to patch git to get the blame 
> information out of it.
>
> The exact format for the output of this thing is obviously very debatable: 
> it might well be worthwhile to use a more verbose thing that also gives 
> the commit name and date etc information (since git-blame knows it), so 
> that any graphical front-end doesn't need to look up every commit as it 
> comes out of the incremntal blame engine. I just wrote it as a quick 
> "proof of concept" thing, and the output is obviously pretty minimalistic 
> right now.

I would think we probably should reuse the --porcelain output,
perhaps enhancing it even more.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 20:16     ` Jeff King
@ 2007-01-27 22:39       ` Linus Torvalds
  2007-01-27 23:52         ` Jeff King
  0 siblings, 1 reply; 92+ messages in thread
From: Linus Torvalds @ 2007-01-27 22:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1360 bytes --]



On Sat, 27 Jan 2007, Jeff King wrote:
> 
> And here's a (very hackish) incremental viewer using perl/gtk (you will
> need the Gtk2 perl module installed). It doesn't take any options, and it
> just shows the blame output (no lookup of committer/date).

Heh. Ok, this is absolutely the ugliest thing I have ever seen, but after 
doing a simple "yum install perl-Gtk2" it clearly does work ;)

(For some reason I have never fathomed, gtk apps seem to always like 
adding about a mile of empty space around lines. I want my text nice and 
tight - gtk menus and text always look like it's 1½ line spacing to me. 
Which may be ok when you're writing a paper and want space to add your 
scribbles and underlining etc, but not when you're looking at the screen, 
and it just means that there's *less* space for commentary).

> It needs much work (and cleanup) to be useful, but I think it proves
> your point: in the time it takes me to actually start looking through
> the output, the blame has finished without me noticing!

Yeah. It needs the logic to coalesce consecutive file-name/line-nr 
entries, but even after you add a "-C" to the arguments (which really 
makes 'git-blame' quite a bit more expensive), it doesn't feel "slow". 

Just ugly ;)

		Linus "shouldn't throw stones when anything
			I would have done would have been
			uglier still" Torvalds
			

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 22:00       ` Junio C Hamano
@ 2007-01-27 22:54         ` Linus Torvalds
  2007-01-28  9:27           ` Junio C Hamano
  0 siblings, 1 reply; 92+ messages in thread
From: Linus Torvalds @ 2007-01-27 22:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git



On Sat, 27 Jan 2007, Junio C Hamano wrote:
> 
> I would think we probably should reuse the --porcelain output,
> perhaps enhancing it even more.

I looked at using "emit_porcelain()" directly, but that format doesn't 
seem to actually be usable for incremental blame.

For example: the porcelain depends on things like the MORE_THAN_ONE_PATH 
flag having been computed, which simply isn't known incrementally.

Also, for the incremental blame, it makes no sense to actually print out 
the actual blame buffer: anybody who uses the incremental blame thing 
really needs to get the original buffer separately set up anyway.

So one one hand, I agree: the output really should probably share a lot of 
the ideas with --porcelain. At the same time, the porcelain output as it 
is now is actually very non-sensible for the incremental case.

(The "METAINFO_SHOWN" kind of logic works fine for --incremental, though. 
It's only the MORE_THAN_ONE_PATH things that don't really make sense until 
the end, since they are part of the discovery logic rather than part of 
the actual print-out logic. I guess it _works_, but still).

I think the people who will care are the people who actually write some 
nice gui around it..

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 22:39       ` Linus Torvalds
@ 2007-01-27 23:52         ` Jeff King
  2007-01-28  2:39           ` Theodore Tso
  0 siblings, 1 reply; 92+ messages in thread
From: Jeff King @ 2007-01-27 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On Sat, Jan 27, 2007 at 02:39:30PM -0800, Linus Torvalds wrote:

> (For some reason I have never fathomed, gtk apps seem to always like 
> adding about a mile of empty space around lines. I want my text nice and 
> tight - gtk menus and text always look like it's 1½ line spacing to me. 
> Which may be ok when you're writing a paper and want space to add your 
> scribbles and underlining etc, but not when you're looking at the screen, 
> and it just means that there's *less* space for commentary).

Yes, I spent quite a bit of time trying to correct this, but it seems to
be an artifact of the GtkTreeView widget (which maybe I am abusing, but
it seems like the right thing conceptually). As the vertical spacing is
a "style" attribute, I'm not as a developer allowed to change it. You
can try this, which helps a bit (the default is '2'); put it before any
widgets are created:

Gtk2::Rc->parse_string(<<'EOS');
style "treeview_style"
{
  GtkTreeView::vertical-separator = 0
}
class "GtkTreeView" style "treeview_style"
EOS

But it's still pretty ugly. I'm not inclined to hack on it much more --
IMHO, a nice curses interface like tig would be much more sensible.

> Yeah. It needs the logic to coalesce consecutive file-name/line-nr 
> entries, but even after you add a "-C" to the arguments (which really 
> makes 'git-blame' quite a bit more expensive), it doesn't feel "slow". 

Yes, it would ideally color the blobs (try adding
$fileview->set_rules_hint(1) to get alternating line colors!). But I
don't think I can do anything that magical with their widget, which
means I get to write my own widget. Bleh.

> Just ugly ;)

:)

Here's my "final" version that looks up the committer name. It also
takes blame output on the command line:
  git-blame --incremental -C file | perl foo.pl file
I won't be working on it anymore, but if somebody wants to see an
example of some really ugly perl/gtk code, here it is.

-Peff

-- >8 --
#!/usr/bin/perl

use Gtk2 -init;
use Gtk2::SimpleList;

my $fn = shift or die "require filename to blame";

Gtk2::Rc->parse_string(<<'EOS');
style "treeview_style"
{
  GtkTreeView::vertical-separator = 0
}
class "GtkTreeView" style "treeview_style"
EOS

my $window = Gtk2::Window->new('toplevel');
$window->signal_connect(destroy => sub { Gtk2->main_quit });
my $scrolled_window = Gtk2::ScrolledWindow->new;
$window->add($scrolled_window);
my $fileview = Gtk2::SimpleList->new(
    'Commit' => 'text',
    'CommitInfo' => 'text',
    'FileLine' => 'text',
    'Data' => 'text'
);
$scrolled_window->add($fileview);
$fileview->get_column(0)->set_spacing(0);
$fileview->set_size_request(1024, 768);
$fileview->set_rules_hint(1);

open(my $fh, '<', $fn)
  or die "unable to open $fn: $!";
while(<$fh>) {
  chomp;
  $fileview->{data}->[$.] = ['HEAD', '?', "$fn:$.", $_];
}

Glib::IO->add_watch(fileno(STDIN), 'in', \&read_blame_line);

$window->show_all;
Gtk2->main;
exit 0;

my $buf;
sub read_blame_line {
  my $r = sysread(STDIN, $buf, 1024, length($buf));
  return 0 unless $r;
  while($buf =~ s/([^\n]*)\n//) {
    my $line = $1;
    $line =~ /^(\d+) (\d+) ([0-9a-f]+):(.*):(\d+)$/
      or die "bad blame output: $line";
    my $info = commitinfo($3);
    for(my $i = 0; $i < $2; $i++) {
      @{$fileview->{data}->[$1+$i]}[0,1] =
        (substr($3, 0, 8), $info, $4 . ':' . ($5+$i+1));
    }
  }
  return 1;
}

sub commitinfo {
  my $hash = shift;
  open(my $fh, '-|', qw(git rev-list -1 --pretty=raw), $hash)
    or die "unable to open git-rev-list: $!";
  while(<$fh>) {
    chomp;
    next unless /^author (.*) <.*> (\d+) ([+-]\d+)/;
    return $1 . ' ' . format_time($2, $3);
  }
}

sub format_time {
  my $time = shift;
  my $tz = shift;

  my $minutes = $tz < 0 ? 0-$tz : $tz;
  $minutes = ($minutes / 100)*60 + ($minutes % 100);
  $minutes = $tz < 0 ? 0-$minutes : $minutes;
  $time += $minutes * 60;
  my @t = gmtime($time);
  return sprintf('%04d-%02d-%02d %02d:%02d:%02d %s', @t[5,4,3,2,1,0], $tz);
}

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 23:52         ` Jeff King
@ 2007-01-28  2:39           ` Theodore Tso
  2007-01-28  3:17             ` Randal L. Schwartz
  2007-01-28 13:15             ` Jeff King
  0 siblings, 2 replies; 92+ messages in thread
From: Theodore Tso @ 2007-01-28  2:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, git

On Sat, Jan 27, 2007 at 06:52:38PM -0500, Jeff King wrote:
> But it's still pretty ugly. I'm not inclined to hack on it much more --
> IMHO, a nice curses interface like tig would be much more sensible.

For emacs users, it would even be better to tie it into emacs.  That
way you're already at the line number looking at the source code when
you start wondering, "who the f*ck created this mess?".  The way I'd
design it would to have emacs split the window vertically, and place
the blame information in a buffer whose scrolling was synchronized
with the main window.  

One of the things that I noticed myself doing (and I'm guessing many
other people would do it as well), is that you have a tendency to wait
until the attribution information has started filling in before you
start scrolling to the part of the code you were interested in --- and
since the beginning of the fle often hasn't changed much until the
earliest beginnings of the project, that can be quite a while.  And of
course, scrolling to the right part of the file is a pain.  So
building it into the editor is not only convenient, but it avoids the
psychological effects that could make it seem slow because how long it
takes to fill the attribution for these first bits:

/*
 * Copyright (C) 1991, 1992 Linus Torvalds
 * Copyright (C) 1994,      Karl Keyte: Added support for disk statistics
 * Elevator latency, (C) 2000  Andrea Arcangeli <andrea@suse.de> SuSE
 * Queue request tables / lock, selectable elevator, Jens Axboe <axboe@suse.de>
 * kernel-doc documentation started by NeilBrown <neilb@cse.unsw.edu.au> -  July2000
 * bio rewrite, highmem i/o, etc, Jens Axboe <axboe@suse.de> - may 2001
 */


Anyway, this should be relatively easily for emacs, and for eclipse
(although I don't think anyone is using eclipse to do Kernel hacking,
is there?).  I have no idea how you would hack this into Vim, other
than passing the line number into the GUI so it can open right into
the function that the developer was looking at.

						- Ted

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28  2:39           ` Theodore Tso
@ 2007-01-28  3:17             ` Randal L. Schwartz
  2007-01-28 13:15             ` Jeff King
  1 sibling, 0 replies; 92+ messages in thread
From: Randal L. Schwartz @ 2007-01-28  3:17 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jeff King, Linus Torvalds, git

>>>>> "Theodore" == Theodore Tso <tytso@mit.edu> writes:

Theodore> For emacs users, it would even be better to tie it into emacs.  That
Theodore> way you're already at the line number looking at the source code
Theodore> when you start wondering, "who the f*ck created this mess?".  The
Theodore> way I'd design it would to have emacs split the window vertically,
Theodore> and place the blame information in a buffer whose scrolling was
Theodore> synchronized with the main window.

vc-annotate can do the window already, although not the rest.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 17:22   ` Linus Torvalds
                       ` (3 preceding siblings ...)
  2007-01-27 20:16     ` Jeff King
@ 2007-01-28  7:40     ` Shawn O. Pearce
  4 siblings, 0 replies; 92+ messages in thread
From: Shawn O. Pearce @ 2007-01-28  7:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Here's a patch. Use "git blame --incremental filename" to get the blame 
> output in a nicely parseable format that you can now write a simple 
> graphical viewer for. 

Heh.  Nice timing.  I was starting to think about adding blame output
to git-gui.  Having an incremental backend would certainly be nice.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 22:54         ` Linus Torvalds
@ 2007-01-28  9:27           ` Junio C Hamano
  2007-01-28  9:44             ` [PATCH] git-blame --porcelain: quote filename in c-style when needed Junio C Hamano
                               ` (3 more replies)
  0 siblings, 4 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-28  9:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn O. Pearce, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, 27 Jan 2007, Junio C Hamano wrote:
>> 
>> I would think we probably should reuse the --porcelain output,
>> perhaps enhancing it even more.
>
> I looked at using "emit_porcelain()" directly, but that format doesn't 
> seem to actually be usable for incremental blame.

I agree the code itself wouldn't be for the reasons you stated.

> Also, for the incremental blame, it makes no sense to actually print out 
> the actual blame buffer: anybody who uses the incremental blame thing 
> really needs to get the original buffer separately set up anyway.

Yes and no -- it might be interesting to start from a blank
canvas, and insert the lines as they are received at appropriate
places (recorded as ent->lno), although in general I agree the
GUI would have the way and the need to grab the blob contents
without us giving it in the --incremental output.

I think it is sensible to do the attached on top of your patch.

-- >8 --
[PATCH] Update blame --incremental output format.

It makes the output show the origin information in the same
format as the porcelain format.  The first line has commit
object name, the line number of the first line in the group in
the original file, the line number of that file in the final
image, and number of lines in the group.  Then subsequent lines
show the metainformation for the commit when the commit is shown
for the first time, except the filename information is always
shown (we cannot even make it conditional to -C option as blame
always follows the renaming of the file wholesale).

Two things I updated are (1) line numbers start at 1, not 0, to
make it consistent with other formats, (2) filename is C-quoted
if needed.

The latter should be done to fix the original porcelain output;
it was an oversight.

 builtin-blame.c |   67 +++++++++++++++++++++++++++++++++++++------------------
 1 files changed, 45 insertions(+), 22 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 7d97ae9..967e30d 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -13,6 +13,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "revision.h"
+#include "quote.h"
 #include "xdiff-interface.h"
 
 static char blame_usage[] =
@@ -1071,18 +1072,56 @@ static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt)
 		origin_decref(parent_origin[i]);
 }
 
+struct commit_info
+{
+	char *author;
+	char *author_mail;
+	unsigned long author_time;
+	char *author_tz;
+
+	/* filled only when asked for details */
+	char *committer;
+	char *committer_mail;
+	unsigned long committer_time;
+	char *committer_tz;
+
+	char *summary;
+};
+
+static void get_commit_info(struct commit *commit,
+			    struct commit_info *ret,
+			    int detailed);
+
 static void found_guilty_entry(struct blame_entry *ent)
 {
 	if (ent->guilty)
 		return;
 	ent->guilty = 1;
 	if (incremental) {
-		struct origin *origin = ent->suspect;
-		printf("%d %d %s:%s:%d\n",
-			ent->lno, ent->num_lines,
-			sha1_to_hex(origin->commit->object.sha1),
-			origin->path,
-			ent->s_lno);
+		struct origin *suspect = ent->suspect;
+
+		printf("%s %d %d %d\n",
+		       sha1_to_hex(suspect->commit->object.sha1),
+		       ent->s_lno + 1, ent->lno + 1, ent->num_lines);
+		if (!(suspect->commit->object.flags & METAINFO_SHOWN)) {
+			struct commit_info ci;
+			suspect->commit->object.flags |= METAINFO_SHOWN;
+			get_commit_info(suspect->commit, &ci, 1);
+			printf("author %s\n", ci.author);
+			printf("author-mail %s\n", ci.author_mail);
+			printf("author-time %lu\n", ci.author_time);
+			printf("author-tz %s\n", ci.author_tz);
+			printf("committer %s\n", ci.committer);
+			printf("committer-mail %s\n", ci.committer_mail);
+			printf("committer-time %lu\n", ci.committer_time);
+			printf("committer-tz %s\n", ci.committer_tz);
+			printf("summary %s\n", ci.summary);
+			if (suspect->commit->object.flags & UNINTERESTING)
+				printf("boundary\n");
+		}
+		printf("filename ");
+		write_name_quoted(NULL, 0, suspect->path, 1, stdout);
+		putchar('\n');
 	}
 }
 
@@ -1152,22 +1191,6 @@ static const char *format_time(unsigned long time, const char *tz_str,
 	return time_buf;
 }
 
-struct commit_info
-{
-	char *author;
-	char *author_mail;
-	unsigned long author_time;
-	char *author_tz;
-
-	/* filled only when asked for details */
-	char *committer;
-	char *committer_mail;
-	unsigned long committer_time;
-	char *committer_tz;
-
-	char *summary;
-};
-
 static void get_ac_line(const char *inbuf, const char *what,
 			int bufsz, char *person, char **mail,
 			unsigned long *time, char **tz)

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH] git-blame --porcelain: quote filename in c-style when needed.
  2007-01-28  9:27           ` Junio C Hamano
@ 2007-01-28  9:44             ` Junio C Hamano
  2007-01-28 14:25             ` [PATCH] git-blame --incremental: don't use pager René Scharfe
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-28  9:44 UTC (permalink / raw)
  To: git

Otherwise a pathname that has funny characters such as LF would
screw up the parsing programs of the output.

Strictly speaking, this is not backward compatible, but the
current output for pathnames that have embedded LF and such
cannot be sanely parsed anyway, and pathnames that only use
characters from the portable pathname character set won't be
affected.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 builtin-blame.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 54ab675..7a58ee3 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1182,6 +1182,13 @@ static void get_commit_info(struct commit *commit,
 	summary_buf[len] = 0;
 }
 
+static void write_filename_info(const char *path)
+{
+	printf("filename ");
+	write_name_quoted(NULL, 0, path, 1, stdout);
+	putchar('\n');
+}
+
 static void found_guilty_entry(struct blame_entry *ent)
 {
 	if (ent->guilty)
@@ -1209,9 +1216,7 @@ static void found_guilty_entry(struct blame_entry *ent)
 			if (suspect->commit->object.flags & UNINTERESTING)
 				printf("boundary\n");
 		}
-		printf("filename ");
-		write_name_quoted(NULL, 0, suspect->path, 1, stdout);
-		putchar('\n');
+		write_filename_info(suspect->path);
 	}
 }
 
@@ -1315,13 +1320,13 @@ static void emit_porcelain(struct scoreboard *sb, struct blame_entry *ent)
 		printf("committer-mail %s\n", ci.committer_mail);
 		printf("committer-time %lu\n", ci.committer_time);
 		printf("committer-tz %s\n", ci.committer_tz);
-		printf("filename %s\n", suspect->path);
+		write_filename_info(suspect->path);
 		printf("summary %s\n", ci.summary);
 		if (suspect->commit->object.flags & UNINTERESTING)
 			printf("boundary\n");
 	}
 	else if (suspect->commit->object.flags & MORE_THAN_ONE_PATH)
-		printf("filename %s\n", suspect->path);
+		write_filename_info(suspect->path);
 
 	cp = nth_line(sb, ent->lno);
 	for (cnt = 0; cnt < ent->num_lines; cnt++) {
-- 
1.5.0.rc2.g1650-dirty

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28  2:39           ` Theodore Tso
  2007-01-28  3:17             ` Randal L. Schwartz
@ 2007-01-28 13:15             ` Jeff King
  1 sibling, 0 replies; 92+ messages in thread
From: Jeff King @ 2007-01-28 13:15 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git

On Sat, Jan 27, 2007 at 09:39:58PM -0500, Theodore Tso wrote:

> For emacs users, it would even be better to tie it into emacs.  That
> way you're already at the line number looking at the source code when

I agree that connecting it with the editor might be sensible; I'm not
likely to work on an emacs version, though. :)

One of my other long-standing annoyances with using "raw" git-blame is
that it shows me a bunch of commits, but then I have to open a new
window and cut and paste the hash to actually _see_ the commit. That's
why I think something like tig makes sense, where you can jump between
different views very easily. An editor extension should be able to do
the same thing.

> course, scrolling to the right part of the file is a pain.  So
> building it into the editor is not only convenient, but it avoids the
> psychological effects that could make it seem slow because how long it
> takes to fill the attribution for these first bits:

Agreed, I noticed that as well (especially because the enormous font and
spacing choices of GTK made sure you could only see the first couple of
lines :) ).

> is there?).  I have no idea how you would hack this into Vim, other
> than passing the line number into the GUI so it can open right into
> the function that the developer was looking at.

I'll look into what's out there for vim; there's quite a bit of
extensibility if you buy into things like the perl support.

-Peff

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH] git-blame --incremental: don't use pager
  2007-01-28  9:27           ` Junio C Hamano
  2007-01-28  9:44             ` [PATCH] git-blame --porcelain: quote filename in c-style when needed Junio C Hamano
@ 2007-01-28 14:25             ` René Scharfe
  2007-01-28 19:09               ` Junio C Hamano
  2007-01-28 19:08             ` More precise tag following Linus Torvalds
  2007-01-29  6:18             ` Shawn O. Pearce
  3 siblings, 1 reply; 92+ messages in thread
From: René Scharfe @ 2007-01-28 14:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Shawn O. Pearce, git

Starting a pager defeats the purpose of the incremental output
mode.  This changes git-blame to only paginate if --incremental
was not given.

git -p blame --incremental still starts the pager, though.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---

 builtin-blame.c |    3 +++
 git.c           |    2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 7a58ee3..02bda5e 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1780,6 +1780,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 			argv[unk++] = arg;
 	}
 
+	if (!incremental)
+		setup_pager();
+
 	if (!blame_move_score)
 		blame_move_score = BLAME_DEFAULT_MOVE_SCORE;
 	if (!blame_copy_score)
diff --git a/git.c b/git.c
index 530e99f..e9febc3 100644
--- a/git.c
+++ b/git.c
@@ -217,7 +217,7 @@ static void handle_internal_command(int argc, const char **argv, char **envp)
 		{ "annotate", cmd_annotate, USE_PAGER },
 		{ "apply", cmd_apply },
 		{ "archive", cmd_archive },
-		{ "blame", cmd_blame, RUN_SETUP | USE_PAGER },
+		{ "blame", cmd_blame, RUN_SETUP },
 		{ "branch", cmd_branch, RUN_SETUP },
 		{ "cat-file", cmd_cat_file, RUN_SETUP },
 		{ "checkout-index", cmd_checkout_index, RUN_SETUP },

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:36         ` Chris Lee
@ 2007-01-28 18:10           ` Theodore Tso
  2007-01-28 18:27             ` Linus Torvalds
  2007-01-28 22:26           ` David Lang
  2007-01-29 23:00           ` Eric Wong
  2 siblings, 1 reply; 92+ messages in thread
From: Theodore Tso @ 2007-01-28 18:10 UTC (permalink / raw)
  To: Chris Lee
  Cc: Linus Torvalds, Simon 'corecode' Schubert,
	Shawn O. Pearce, Junio C Hamano, git

On Sat, Jan 27, 2007 at 11:36:50AM -0800, Chris Lee wrote:
> I don't have access to any servers that I could drop a 3GB packfile
> onto and expect them to serve it. And I don't have a connection at
> home that I could use to upload the 3GB pack from quickly - it would
> take days, at least. If anybody wants to hook me up with a hosting
> provider or a machine that just the git devs can access, I'd be
> willing to tie up my upstream bandwidth for a few days so you all can
> have access to it.

Hmm, maybe the right answer is to send a DVD out to someone who is
willing make copies through a distribution tree to those that want it;
my guess it will probably be a relatively small set of folks.  

						- Ted

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28 18:10           ` Theodore Tso
@ 2007-01-28 18:27             ` Linus Torvalds
  0 siblings, 0 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-28 18:27 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Chris Lee, Simon 'corecode' Schubert, Shawn O. Pearce,
	Junio C Hamano, git



On Sun, 28 Jan 2007, Theodore Tso wrote:
> 
> Hmm, maybe the right answer is to send a DVD out to someone who is
> willing make copies through a distribution tree to those that want it;
> my guess it will probably be a relatively small set of folks.  

Heh. Sneakernet. "The throughput of a truck-full of tapes..."

But it probably would work in this case. Chris - where in the world are 
you? Maybe somebody with bandwidth can indeed just pick up a DVD, and then 
make it available to the rest of us through the net (possibly even though 
a non-public URL - I suspect that Ted is right, and there's only a few 
people who would actually want to download 3GB to play with).

I was *not* planning on downloading a 48GB KDE archive to then run it 
through some strange contortions, but I'd love to just download 3GB 
overnight..

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28  9:27           ` Junio C Hamano
  2007-01-28  9:44             ` [PATCH] git-blame --porcelain: quote filename in c-style when needed Junio C Hamano
  2007-01-28 14:25             ` [PATCH] git-blame --incremental: don't use pager René Scharfe
@ 2007-01-28 19:08             ` Linus Torvalds
  2007-01-28 19:18               ` Junio C Hamano
  2007-01-29  6:18             ` Shawn O. Pearce
  3 siblings, 1 reply; 92+ messages in thread
From: Linus Torvalds @ 2007-01-28 19:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git



On Sun, 28 Jan 2007, Junio C Hamano wrote:
> 
> I think it is sensible to do the attached on top of your patch.

Ack.

I see you committed this, which is nice, but now Shawn's butt-ugly thing 
doesn't work any more, and my mad perl skillz are sadly lacking.

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH] git-blame --incremental: don't use pager
  2007-01-28 14:25             ` [PATCH] git-blame --incremental: don't use pager René Scharfe
@ 2007-01-28 19:09               ` Junio C Hamano
  2007-01-28 19:14                 ` Junio C Hamano
  0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2007-01-28 19:09 UTC (permalink / raw)
  To: René Scharfe; +Cc: Linus Torvalds, Shawn O. Pearce, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Starting a pager defeats the purpose of the incremental output
> mode.  This changes git-blame to only paginate if --incremental
> was not given.

I should have done this myself when I applied Linus's patch.
Thanks for catching.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH] git-blame --incremental: don't use pager
  2007-01-28 19:09               ` Junio C Hamano
@ 2007-01-28 19:14                 ` Junio C Hamano
  2007-01-29  0:32                   ` René Scharfe
  0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2007-01-28 19:14 UTC (permalink / raw)
  To: René Scharfe; +Cc: Linus Torvalds, Shawn O. Pearce, git

Junio C Hamano <junkio@cox.net> writes:

> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
>
>> Starting a pager defeats the purpose of the incremental output
>> mode.  This changes git-blame to only paginate if --incremental
>> was not given.
>
> I should have done this myself when I applied Linus's patch.
> Thanks for catching.

Although I'd apply it anyway, strictly speaking, I think this
patch should not matter because any real Porcelain would be
using this as an upstream of a pipe to its drawing engine.

Well, unless that Porcelain drives --incremental through a pair
of ptys, but I do not think it is likely ;-).

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28 19:08             ` More precise tag following Linus Torvalds
@ 2007-01-28 19:18               ` Junio C Hamano
  2007-01-28 19:57                 ` Linus Torvalds
  2007-01-28 19:58                 ` Junio C Hamano
  0 siblings, 2 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-28 19:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 28 Jan 2007, Junio C Hamano wrote:
>> 
>> I think it is sensible to do the attached on top of your patch.
>
> Ack.
>
> I see you committed this, which is nice, but now Shawn's butt-ugly thing 
> doesn't work any more, and my mad perl skillz are sadly lacking.

Do you mean the perl-Gtk one by Jeff King?

I was hoping to take a look at Shawn's git-gui and also perhaps
looking into adding blame --incremental support to gitk myself
when I have time, but unfortunately my day-job deadline is
spilling into this weekend.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28 19:18               ` Junio C Hamano
@ 2007-01-28 19:57                 ` Linus Torvalds
  2007-01-28 20:01                   ` Junio C Hamano
                                     ` (3 more replies)
  2007-01-28 19:58                 ` Junio C Hamano
  1 sibling, 4 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-28 19:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sun, 28 Jan 2007, Junio C Hamano wrote:
> 
> Do you mean the perl-Gtk one by Jeff King?

Sorry, yeah, I'm just confused.

Where are my meds again?

> I was hoping to take a look at Shawn's git-gui and also perhaps
> looking into adding blame --incremental support to gitk myself
> when I have time, but unfortunately my day-job deadline is
> spilling into this weekend.

I think the nice thing about the new "git-blame --incremental" is that it 
allows people who really don't know (or care) anything at all about git 
internals to do the viewer. So you shouldn't need to care.

So I don't think you should do it, we should encourage others (who may not 
be comfy with writing hard-core C that touches subtle internal git issues) 
to just do it.

One thing I looked at, which *should* be easy to do inside "git-blame", is 
to make the case where you do *not* give a head to start with, default to 
"current working tree" instead of HEAD.

For example, say that I have changes in my working tree, and I do

	git blame-viewer <filename-that-is-dirty>

I think it would be nice if the *dirty* lines would actually get blamed to 
a fake commit (SHA-1 "00000000..") that is the "current working tree. 
Right now, it always starts from HEAD:filename, which may be how CVS/SVN 
annotate and friends work, but I actually think we could do better.

If you really want the annotation for the _committed_ state, you can 
always just say so explicitly:

	git blame-viewer HEAD <filename-that-may-be-dirty-but-who-cares>

No?

But for the actual viewer parts, which don't need internal git knowledge, 
let's just document the blame format, so that others can do it:

The new format is fairly easy to parse: each blame entry is always

 - starts with a line of

	<40-byte hex sha1> <sourceline> <resultline> <num_lines>

 - the first time that commit shows up in the stream, it has various
   other information about it printed out with a one-word tag at the 
   beginning of each line about that "extended commit info" (author, 
   email, committer, dates, summary etc)

 - each entry is _always_ finished by a

	"filename" <whitespace-quoted-filename-goes-here>

and thus it's really quite easy to parse for some line- and word-oriented 
parser (which should be quite natural for most scripting languages).

NOTE! For people who do parsing: to make it more robust, just ignore any 
lines in between the first and last one ("<sha1>" and "filename" lines) 
where you don't recognize the tag-words (or care about that particular 
one) at the beginning of the "extended information" lines. That way, if 
there is ever added information (like the commit encoding or extended 
commit commentary), a blame viewer won't ever care.

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28 19:18               ` Junio C Hamano
  2007-01-28 19:57                 ` Linus Torvalds
@ 2007-01-28 19:58                 ` Junio C Hamano
  1 sibling, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-28 19:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> On Sun, 28 Jan 2007, Junio C Hamano wrote:
>>> 
>>> I think it is sensible to do the attached on top of your patch.
>>
>> Ack.
>>
>> I see you committed this, which is nice, but now Shawn's butt-ugly thing 
>> doesn't work any more, and my mad perl skillz are sadly lacking.
>
> Do you mean the perl-Gtk one by Jeff King?

This on top of the "final" by Jeff should minimally restore it,
in addition to fixing a few problems.

 * filename/linenumber is updated from the blame.
 * gmtime output is relative to year 1900.

I've annotated revision.c with -C and it was fun to watch;-).

---
diff --git a/jk.perl b/jk.perl
index 6a4ac9f..c4d9d57 100644
--- a/jk.perl
+++ b/jk.perl
@@ -41,32 +41,63 @@ $window->show_all;
 Gtk2->main;
 exit 0;
 
+my %commitinfo = ();
+
+sub flush_blame_line {
+	my ($attr) = @_;
+
+	return unless defined $attr;
+
+	my ($commit, $s_lno, $lno, $cnt) =
+	    @{$attr}{qw(COMMIT S_LNO LNO CNT)};
+
+	my ($filename, $author, $author_time, $author_tz) =
+	    @{$commitinfo{$commit}}{qw(FILENAME AUTHOR AUTHOR-TIME AUTHOR-TZ)};
+	my $info = $author . ' ' . format_time($author_time, $author_tz);
+
+	for(my $i = 0; $i < $cnt; $i++) {
+		@{$fileview->{data}->[$lno+$i-1]}[0,1,2] =
+		    (substr($commit, 0, 8), $info,
+		     $filename . ':' . ($s_lno+$i));
+	}
+}
+
 my $buf;
+my $current;
 sub read_blame_line {
-  my $r = sysread(STDIN, $buf, 1024, length($buf));
-  return 0 unless $r;
-  while($buf =~ s/([^\n]*)\n//) {
-    my $line = $1;
-    $line =~ /^(\d+) (\d+) ([0-9a-f]+):(.*):(\d+)$/
-      or die "bad blame output: $line";
-    my $info = commitinfo($3);
-    for(my $i = 0; $i < $2; $i++) {
-      @{$fileview->{data}->[$1+$i]}[0,1] =
-        (substr($3, 0, 8), $info, $4 . ':' . ($5+$i+1));
-    }
-  }
-  return 1;
-}
 
-sub commitinfo {
-  my $hash = shift;
-  open(my $fh, '-|', qw(git rev-list -1 --pretty=raw), $hash)
-    or die "unable to open git-rev-list: $!";
-  while(<$fh>) {
-    chomp;
-    next unless /^author (.*) <.*> (\d+) ([+-]\d+)/;
-    return $1 . ' ' . format_time($2, $3);
-  }
+	my $r = sysread(STDIN, $buf, 1024, length($buf));
+	die "I/O error" unless defined $r;
+
+	if ($r == 0) {
+		flush_blame_line($current);
+		$current = undef;
+		return 0;
+	}
+
+	while ($buf =~ s/([^\n]*)\n//) {
+		my $line = $1;
+
+		if (($commit, $s_lno, $lno, $cnt) =
+		    ($line =~ /^([0-9a-f]{40}) (\d+) (\d+) (\d+)$/)) {
+			flush_blame_line($current);
+			$current = +{
+				COMMIT => $1,
+				S_LNO => $2,
+				LNO => $3,
+				CNT => $4,
+			};
+			next;
+		}
+
+		# extended attribute values
+		if ($line =~ /^(author|author-mail|author-time|author-tz|committer|committer-mail|committer-time|committer-tz|summary|filename) (.*)$/) {
+			my $commit = $current->{COMMIT};
+			$commitinfo{$commit}{uc($1)} = $2;
+			next;
+		}
+	}
+	return 1;
 }
 
 sub format_time {
@@ -78,5 +109,6 @@ sub format_time {
   $minutes = $tz < 0 ? 0-$minutes : $minutes;
   $time += $minutes * 60;
   my @t = gmtime($time);
-  return sprintf('%04d-%02d-%02d %02d:%02d:%02d %s', @t[5,4,3,2,1,0], $tz);
+  return sprintf('%04d-%02d-%02d %02d:%02d:%02d %s',
+		 $t[5] + 1900, @t[4,3,2,1,0], $tz);
 }

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28 19:57                 ` Linus Torvalds
@ 2007-01-28 20:01                   ` Junio C Hamano
  2007-01-28 20:20                   ` [PATCH] document 'blame --incremental' Junio C Hamano
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-28 20:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But for the actual viewer parts, which don't need internal git knowledge, 
> let's just document the blame format, so that others can do it:
>
> The new format is fairly easy to parse: each blame entry is always
>
>  - starts with a line of
>
> 	<40-byte hex sha1> <sourceline> <resultline> <num_lines>
>
>  - the first time that commit shows up in the stream, it has various
>    other information about it printed out with a one-word tag at the 
>    beginning of each line about that "extended commit info" (author, 
>    email, committer, dates, summary etc)
>
>  - each entry is _always_ finished by a
>
> 	"filename" <whitespace-quoted-filename-goes-here>
>
> and thus it's really quite easy to parse for some line- and word-oriented 
> parser (which should be quite natural for most scripting languages).
>
> NOTE! For people who do parsing: to make it more robust, just ignore any 
> lines in between the first and last one ("<sha1>" and "filename" lines) 
> where you don't recognize the tag-words (or care about that particular 
> one) at the beginning of the "extended information" lines. That way, if 
> there is ever added information (like the commit encoding or extended 
> commit commentary), a blame viewer won't ever care.

Thanks for these notes, which I should have written.  I would
also caution them to ignore if there is anything they do not
understand between "filename" and <sha1>.

A sample code to parse it in Perl was just posted by me ;-).

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH] document 'blame --incremental'
  2007-01-28 19:57                 ` Linus Torvalds
  2007-01-28 20:01                   ` Junio C Hamano
@ 2007-01-28 20:20                   ` Junio C Hamano
  2007-01-28 21:06                   ` More precise tag following Junio C Hamano
  2007-01-30  9:22                   ` Junio C Hamano
  3 siblings, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-28 20:20 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds

diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index 5dd8e36..a4e4bee 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -8,7 +8,7 @@ git-blame - Show what revision and author last modified each line of a file
 SYNOPSIS
 --------
 [verse]
-'git-blame' [-c] [-l] [-t] [-f] [-n] [-p] [-L n,m] [-S <revs-file>]
+'git-blame' [-c] [-l] [-t] [-f] [-n] [-p] [--incremental] [-L n,m] [-S <revs-file>]
             [-M] [-C] [-C] [--since=<date>] [<rev>] [--] <file>
 
 DESCRIPTION
@@ -63,6 +63,10 @@ OPTIONS
 -p, --porcelain::
 	Show in a format designed for machine consumption.
 
+--incremental::
+	Show the result incrementally in a format designed for
+	machine consumption.
+
 -M::
 	Detect moving lines in the file as well.  When a commit
 	moves a block of lines in a file (e.g. the original file
@@ -158,6 +162,46 @@ parents, using `commit{caret}!` notation:
 	git blame -C -C -f $commit^! -- foo
 
 
+INCREMENTAL OUTPUT
+------------------
+
+When called with `--incremental` option, the command outputs the
+result as it is built.  The output generally will talk about
+lines touched by more recent commits first and is meant to be
+used by interactive viewers.
+
+The output format is similar to the Porcelain format, but it
+does not contain the actual lines from the file that is being
+annotated.  
+
+. Each blame entry always starts with a line of:
+
+	<40-byte hex sha1> <sourceline> <resultline> <num_lines>
++
+Line numbers count from 1.
+
+. The first time that commit shows up in the stream, it has various
+  other information about it printed out with a one-word tag at the 
+  beginning of each line about that "extended commit info" (author, 
+  email, committer, dates, summary etc).
+
+. Unlike Porcelain format, the filename information is always
+  given and terminates the entry:
+
+	"filename" <whitespace-quoted-filename-goes-here>
++
+and thus it's really quite easy to parse for some line- and word-oriented
+parser (which should be quite natural for most scripting languages).
++
+[NOTE]
+For people who do parsing: to make it more robust, just ignore any 
+lines in between the first and last one ("<sha1>" and "filename" lines) 
+where you don't recognize the tag-words (or care about that particular 
+one) at the beginning of the "extended information" lines. That way, if 
+there is ever added information (like the commit encoding or extended 
+commit commentary), a blame viewer won't ever care.
+
+
 SEE ALSO
 --------
 gitlink:git-annotate[1]

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28 19:57                 ` Linus Torvalds
  2007-01-28 20:01                   ` Junio C Hamano
  2007-01-28 20:20                   ` [PATCH] document 'blame --incremental' Junio C Hamano
@ 2007-01-28 21:06                   ` Junio C Hamano
  2007-01-28 23:01                     ` Jeff King
  2007-01-30  9:22                   ` Junio C Hamano
  3 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2007-01-28 21:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Jeff King

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 28 Jan 2007, Junio C Hamano wrote:
> ...
>> I was hoping to take a look at Shawn's git-gui and also perhaps
>> looking into adding blame --incremental support to gitk myself
>> when I have time, but unfortunately my day-job deadline is
>> spilling into this weekend.
>
> I think the nice thing about the new "git-blame --incremental" is that it 
> allows people who really don't know (or care) anything at all about git 
> internals to do the viewer. So you shouldn't need to care.
>
> So I don't think you should do it, we should encourage others (who may not 
> be comfy with writing hard-core C that touches subtle internal git issues) 
> to just do it.

Good points.

I won't, although I've added fixed-up version of Jeff's as an
example under contrib/ -- I hope Jeff does not mind.

> ...
> I think it would be nice if the *dirty* lines would actually get blamed to 
> a fake commit (SHA-1 "00000000..") that is the "current working tree. 
> ...
> No?

Yeah.  That sounds sensible.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:36         ` Chris Lee
  2007-01-28 18:10           ` Theodore Tso
@ 2007-01-28 22:26           ` David Lang
  2007-01-29 17:34             ` Nicolas Pitre
  2007-01-29 23:00           ` Eric Wong
  2 siblings, 1 reply; 92+ messages in thread
From: David Lang @ 2007-01-28 22:26 UTC (permalink / raw)
  To: Chris Lee
  Cc: Linus Torvalds, Simon 'corecode' Schubert,
	Shawn O. Pearce, Junio C Hamano, git

if nobody else steps forward I can arrange something like this on my home server 
(only 768K updtream bandwidth, but it's better then nothing)

Daivd Lang

On Sat, 27 Jan 2007, Chris Lee wrote:

> Date: Sat, 27 Jan 2007 11:36:50 -0800
> From: Chris Lee <clee@kde.org>
> To: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Simon 'corecode' Schubert <corecode@fs.ei.tum.de>,
>     Shawn O. Pearce <spearce@spearce.org>, Junio C Hamano <junkio@cox.net>,
>     git@vger.kernel.org
> Subject: Re: More precise tag following
> 
> On 1/27/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> Do you have the converted repo somewhere to be cloned for? It's going to
>> be a lot more interesting for scalability testing than anything else.
>
> I don't have access to any servers that I could drop a 3GB packfile
> onto and expect them to serve it. And I don't have a connection at
> home that I could use to upload the 3GB pack from quickly - it would
> take days, at least. If anybody wants to hook me up with a hosting
> provider or a machine that just the git devs can access, I'd be
> willing to tie up my upstream bandwidth for a few days so you all can
> have access to it.
>
>> The thing is, one of the reasons the git object database is small is that
>> it compresses really well, and I suspect that for the KDE repo, what
>> you're seeing is really a combination of:
>>
>>  - the KDE people were idiots in the first place to make it into one big
>>    repo
>
> No argument from me about this one. The only defense I can really
> think of is that, in KDE, we *have* moved entire applications and
> libraries around between modules, and it is really nice to be able to
> have the full history for them.
>
>> So please point to a kde conversion archive to play with (maybe you have,
>> I missed it).
>
> I can provide you with instructions on how to reconstruct one, but
> you'll have to rsync over about 38GB of KDE's Subversion archive to do
> it. (Not fun.) If someone else wants to give me a dumping ground where
> I can upload my 3GB converted repo, I'd be happy to start pushing it.
>
> Also, please note, the 3GB packed repo is only about 2/3 of the full
> KDE repo - I cut off the import at revision 409202, because that was
> when the KDE svn admins decided to move a bunch of modules from
> /trunk/ to /trunk/KDE/ and it screws up everything. So a *full* KDE
> history import would definitely be more than 4GB, packed.
>
> -clee
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28 21:06                   ` More precise tag following Junio C Hamano
@ 2007-01-28 23:01                     ` Jeff King
  0 siblings, 0 replies; 92+ messages in thread
From: Jeff King @ 2007-01-28 23:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, Jan 28, 2007 at 01:06:00PM -0800, Junio C Hamano wrote:

> I won't, although I've added fixed-up version of Jeff's as an
> example under contrib/ -- I hope Jeff does not mind.

Not at all; thanks for updating it.

-Peff

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH] git-blame --incremental: don't use pager
  2007-01-28 19:14                 ` Junio C Hamano
@ 2007-01-29  0:32                   ` René Scharfe
  2007-01-29  2:35                     ` [PATCH] git blame --progress Junio C Hamano
  0 siblings, 1 reply; 92+ messages in thread
From: René Scharfe @ 2007-01-29  0:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Shawn O. Pearce, git

Junio C Hamano schrieb:
> Although I'd apply it anyway, strictly speaking, I think this
> patch should not matter because any real Porcelain would be
> using this as an upstream of a pipe to its drawing engine.
> 
> Well, unless that Porcelain drives --incremental through a pair
> of ptys, but I do not think it is likely ;-).

Ha!, didn't think of that.  I still like it more without a pager
even if run on a terminal, because then you can *see* that it's
really incremental (without needing to unset PAGER).  I'm a
non-believer. ;-)

René

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH] git blame --progress
  2007-01-29  0:32                   ` René Scharfe
@ 2007-01-29  2:35                     ` Junio C Hamano
  2007-01-29  7:00                       ` Simon 'corecode' Schubert
                                         ` (3 more replies)
  0 siblings, 4 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-29  2:35 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

[PATCH] git blame --progress

With  --progress option, the command shows a fairly useless but
amusing eye-candy while making the user wait.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

 > Junio C Hamano schrieb:
 >> Although I'd apply it anyway, strictly speaking, I think this
 >> patch should not matter because any real Porcelain would be
 >> using this as an upstream of a pipe to its drawing engine.
 >> 
 >> Well, unless that Porcelain drives --incremental through a pair
 >> of ptys, but I do not think it is likely ;-).
 >
 > Ha!, didn't think of that.  I still like it more without a pager
 > even if run on a terminal, because then you can *see* that it's
 > really incremental (without needing to unset PAGER).  I'm a
 > non-believer. ;-)

 builtin-blame.c |   87 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 02bda5e..cd54acf 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -17,7 +17,7 @@
 #include "xdiff-interface.h"
 
 static char blame_usage[] =
-"git-blame [-c] [-l] [-t] [-f] [-n] [-p] [-L n,m] [-S <revs-file>] [-M] [-C] [-C] [commit] [--] file\n"
+"git-blame [-c] [-l] [-t] [-f] [-n] [-p] [--progress] [-L n,m] [-S <revs-file>] [-M] [-C] [-C] [commit] [--] file\n"
 "  -c, --compatibility Use the same output mode as git-annotate (Default: off)\n"
 "  -b                  Show blank SHA-1 for boundary commits (Default: off)\n"
 "  -l, --long          Show long commit SHA1 (Default: off)\n"
@@ -29,6 +29,7 @@ static char blame_usage[] =
 "  -L n,m              Process only line range n,m, counting from 1\n"
 "  -M, -C              Find line movements within and across files\n"
 "  --incremental       Show blame entries as we find them, incrementally\n"
+"  --progress          Show fairly useless progress display\n"
 "  -S revs-file        Use revisions from revs-file instead of calling git-rev-list\n";
 
 static int longest_file;
@@ -39,6 +40,7 @@ static int max_score_digits;
 static int show_root;
 static int blank_boundary;
 static int incremental;
+static int eye_candy;
 
 #ifndef DEBUG
 #define DEBUG 0
@@ -1189,7 +1191,80 @@ static void write_filename_info(const char *path)
 	putchar('\n');
 }
 
-static void found_guilty_entry(struct blame_entry *ent)
+#define NUM_EC_SPOT 500
+#define NUM_EC_SPOT_PER_GROUP 10
+#define NUM_EC_SPOT_PER_ROW 50
+
+static int eye_candy_spots(struct scoreboard *sb)
+{
+	int num_lines = sb->num_lines;
+	if (NUM_EC_SPOT < num_lines)
+		return NUM_EC_SPOT;
+	return num_lines;
+}
+
+static void initialize_eye_candy(struct scoreboard *sb)
+{
+	int cnt = eye_candy_spots(sb);
+	int i, j;
+
+	fprintf(stderr, "\033[2JAssigning blame for %s\n", sb->path);
+	for (i = j = 0; i < cnt; i++) {
+		fputc('.', stderr);
+		j++;
+		if (NUM_EC_SPOT_PER_ROW <= j) {
+			j = 0;
+			fputc('\n', stderr);
+		}
+		else if ((j % NUM_EC_SPOT_PER_GROUP) == 0)
+			fputc(' ', stderr);
+	}
+	if (j)
+		fputc('\n', stderr);
+}
+
+static int eye_candy_spot(struct scoreboard *sb, int lno)
+{
+	int cnt = eye_candy_spots(sb);
+	return lno * cnt / sb->num_lines;
+}
+
+static void update_eye_candy(struct scoreboard *sb, struct blame_entry *ent)
+{
+	int cnt = eye_candy_spots(sb);
+	int spot_lo, spot_hi, spot;
+	struct blame_entry *lo, *hi;
+
+	for (lo = ent; lo->prev && lo->prev->guilty; lo = lo->prev)
+		;
+	spot_lo = eye_candy_spot(sb, lo->lno);
+	for (hi = ent; hi->next && hi->next->guilty; hi = hi->next)
+		;
+	spot_hi = eye_candy_spot(sb, hi->lno + hi->num_lines - 1);
+
+	for (spot = spot_lo; spot <= spot_hi; spot++) {
+		int spot_x, spot_y;
+
+		spot_x = spot % NUM_EC_SPOT_PER_ROW;
+		spot_x = spot_x + spot_x / NUM_EC_SPOT_PER_GROUP;
+
+		spot_y = spot / NUM_EC_SPOT_PER_ROW;
+		spot_y = (cnt / NUM_EC_SPOT_PER_ROW) - spot_y;
+		if (cnt < NUM_EC_SPOT && (cnt % NUM_EC_SPOT_PER_ROW))
+			spot_y++;
+
+		if (spot_y)
+			fprintf(stderr, "\033[%dA", spot_y);
+		if (spot_x)
+			fprintf(stderr, "\033[%dC", spot_x);
+		fputc('*', stderr);
+		fprintf(stderr, "\033[%dD", spot_x + 1);
+		if (spot_y)
+			fprintf(stderr, "\033[%dB", spot_y);
+	}
+}
+
+static void found_guilty_entry(struct scoreboard *sb, struct blame_entry *ent)
 {
 	if (ent->guilty)
 		return;
@@ -1218,6 +1293,8 @@ static void found_guilty_entry(struct blame_entry *ent)
 		}
 		write_filename_info(suspect->path);
 	}
+	else if (eye_candy)
+		update_eye_candy(sb, ent);
 }
 
 static void assign_blame(struct scoreboard *sb, struct rev_info *revs, int opt)
@@ -1253,7 +1330,7 @@ static void assign_blame(struct scoreboard *sb, struct rev_info *revs, int opt)
 		/* Take responsibility for the remaining entries */
 		for (ent = sb->ent; ent; ent = ent->next)
 			if (!cmp_suspect(ent->suspect, suspect))
-				found_guilty_entry(ent);
+				found_guilty_entry(sb, ent);
 		origin_decref(suspect);
 
 		if (DEBUG) /* sanity */
@@ -1768,6 +1845,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 		else if (!strcmp("-n", arg) ||
 			 !strcmp("--show-number", arg))
 			output_option |= OUTPUT_SHOW_NUMBER;
+		else if (!strcmp("--progress", arg))
+			eye_candy = 1;
 		else if (!strcmp("-p", arg) ||
 			 !strcmp("--porcelain", arg))
 			output_option |= OUTPUT_PORCELAIN;
@@ -1951,6 +2030,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 		die("reading graft file %s failed: %s",
 		    revs_file, strerror(errno));
 
+	if (eye_candy)
+		initialize_eye_candy(&sb);
 	assign_blame(&sb, &revs, opt);
 
 	if (incremental)

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28  9:27           ` Junio C Hamano
                               ` (2 preceding siblings ...)
  2007-01-28 19:08             ` More precise tag following Linus Torvalds
@ 2007-01-29  6:18             ` Shawn O. Pearce
  2007-01-29 10:17               ` Junio C Hamano
  2007-01-29 16:24               ` Linus Torvalds
  3 siblings, 2 replies; 92+ messages in thread
From: Shawn O. Pearce @ 2007-01-29  6:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano <junkio@cox.net> wrote:
> Yes and no -- it might be interesting to start from a blank
> canvas, and insert the lines as they are received at appropriate
> places (recorded as ent->lno), although in general I agree the
> GUI would have the way and the need to grab the blob contents
> without us giving it in the --incremental output.

I just implemented the blame --incremental thing in git-gui.
I'm grabbing the file data ahead of time with git-cat-file, throwing
up the UI, then streaming in the incremental blame data as it comes.
Its *VERY* fast.  Nice job to both of you (Linus and Junio).

If you are curious its been pushed to repo.or.cz:

  git://repo.or.cz/git-gui.git

  Repository->Browse Current Branch
  Double click on the file you want to see.

I'm going to work up screenshots a bit later.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH] git blame --progress
  2007-01-29  2:35                     ` [PATCH] git blame --progress Junio C Hamano
@ 2007-01-29  7:00                       ` Simon 'corecode' Schubert
  2007-01-29 16:54                       ` Alex Riesen
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 92+ messages in thread
From: Simon 'corecode' Schubert @ 2007-01-29  7:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]

Junio C Hamano wrote:
> +	fprintf(stderr, "\033[2JAssigning blame for %s\n", sb->path);

are you sure that you want to hard code the escape sequence?  I guess the correct way would be to query terminfo.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29  6:18             ` Shawn O. Pearce
@ 2007-01-29 10:17               ` Junio C Hamano
  2007-01-29 10:31                 ` Shawn O. Pearce
  2007-01-29 16:24               ` Linus Torvalds
  1 sibling, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2007-01-29 10:17 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Linus Torvalds, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> If you are curious its been pushed to repo.or.cz:
>
>   git://repo.or.cz/git-gui.git
>
>   Repository->Browse Current Branch
>   Double click on the file you want to see.

Cool.

I think it is not a big deal for git-gui which is for active
developers and not primarily for archaeologists, but it does not
seem to like to be invoked inside a bare repository.

Also it becomes very tempting to somehow have this "file
browser" selection UI as "tree browser" that can wander around
to view an arbitrary tree in the commit history.  The boundary
between git-gui and gitk would start to blurrrrrr.....

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 10:17               ` Junio C Hamano
@ 2007-01-29 10:31                 ` Shawn O. Pearce
  0 siblings, 0 replies; 92+ messages in thread
From: Shawn O. Pearce @ 2007-01-29 10:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano <junkio@cox.net> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > If you are curious its been pushed to repo.or.cz:
> >
> >   git://repo.or.cz/git-gui.git
> 
> I think it is not a big deal for git-gui which is for active
> developers and not primarily for archaeologists, but it does not
> seem to like to be invoked inside a bare repository.

Yes.  git-gui does some bad things.  On Mac OS X and Windows
it lets you setup a "shortcut".  This bastard thing is a batch
file/script which basically sets the GIT_DIR environment variable,
then starts git-gui.  So it assumes that the GIT_DIR its getting
is somehow connected to a working directory of sorts.

I actually have plans to cleanup some of git-gui's internals so that
its easier to specify what it can do, and cannot do, during startup,
then configure the UI from that.  For example, one should not be
allowed to commit in a bare repository, or merge, but fetch and
push are still OK.  So is browsing, creating and deleting branches.
Or editing options (.git/config).

I think the cleanup is easier than it sounds; a lot of the UI is
already parameterized based on [appname], which is 'git-gui' or
'git-citool', depending on the name it was invoked as.  This just
needs to carry through a little bit more.
 
> Also it becomes very tempting to somehow have this "file
> browser" selection UI as "tree browser" that can wander around
> to view an arbitrary tree in the commit history.  The boundary
> between git-gui and gitk would start to blurrrrrr.....

Indeed.  The main entrypoint is "new_browser $committish".  I don't
care what $committish is, just so long as git-blame would understand
it.  It could actually be a treeish, but blame would obviously
choke when you open a file and we won't get annotation data.

I just need to hook up some smarter UI to let you select the
committish in question.  Then comes things like wanting to extract
any given file to the local filesystem (e.g. "git show b:file >c"),
etc.

As for the line blurring between git-gui and gitk, yea, its heading
there.  Originally I set out to say "git-gui is for making changes
and transport, gitk is for history browsing".  With this addition of
a tree browser and incremental blame viewer, I'm finding it hard to
not add some sort of commit viewer when you double click a commit
in the blame output.

I *really* do not want to redo what gitk does.  Paul, et.al. have
done an excellent job with gitk[*1*].  Its currently 6,324 lines.
git-gui is another 5,654 lines.  I don't think we want them redoing
each other's work.  It would be better if Paul and I could find a
way to meld the two into a single process.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29  6:18             ` Shawn O. Pearce
  2007-01-29 10:17               ` Junio C Hamano
@ 2007-01-29 16:24               ` Linus Torvalds
  2007-01-29 18:07                 ` Simon 'corecode' Schubert
                                   ` (2 more replies)
  1 sibling, 3 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-29 16:24 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git



On Mon, 29 Jan 2007, Shawn O. Pearce wrote:
> 
> I just implemented the blame --incremental thing in git-gui.

That's a real technicolor interface ;)

It's prettier, but it highlights an issue I had with the perl-gtk blame 
viewer too (but there it was overshadowed by all the other aesthetic 
issues)..

One thing I never really enjoyed about the normal "git blame" either, and 
that the git-gui interface makes even worse, is that it uses a *lot* of 
real-estate for the blaming. I've got a big screen, and usually run with 
100+ character wide terminals, but for git blame, I think the canvas is 
120+ characters, and despite that over half of it is just the blame 
output.

That's actually distracting for several reasons:

 - it may be interesting when the primary interest is the shiny new output 
   from "git blame --incremental", but at least the way I have ever used 
   annotations, I'm not actually *interested* in the annotations until I 
   find the code I'm looking for.

   In other words, the actual file data is really the *primary* thing. 
   It's the stuff you need to look at first, and it's the thing that ends 
   up making all the rest relevant. The current "git blame" and "git-gui" 
   interfaces just seem to give too much importance to the annotation data 
   itself.

   Now, in a plain-text pager thign (aka the traditional "git blame"), you 
   don't have much choice. The blame data needs to be there, and you can't 
   hide it, because if you do, there's no way to get at it. But things are 
   different with an interactive graphical environment (or a textual one, 
   for that matter: using some curses interface wouldn't change this 
   argument).

   You _could_ just make the primary thing be the actual file data, and 
   the blame be "incidental". Which it really is.

 - As Ted already pointed out, you actually want to search for the point 
   you're interested in, but when you start out and see the top of the 
   file that generally gets annotated last, a natural reaction with the 
   current interface is to wait for the annotations to happen rather than 
   actually start looking at the code.

   Which is silly. You end up waiting for somethign that you don't even 
   really care about..

   Again, I think the basic issue is the same: by making the annotation 
   data *so* prominent, the lack of it just forces you mentally to think 
   that something important is missing.

 - Finally, the purely practical issue of "on a small screen, this would 
   be almost impossible to use".

   Optimally, you should be able to see the whole (or at least the bulk) 
   of the actual file content even if you only had 80-character lines in 
   the blame viewer. And I just tested: if I make the blame viewer 80 
   characters wide when I look at a random kernel file annotation, I don't 
   even see the "current line number", much less the actual file data. And 
   remember: the file data was supposed to be the *primary* thing.

   If I make it 110 characters wide, I can see ~20 characters of the file 
   data, which means that I can't actually make sense out of anything that 
   is indented by more than two indents, and I usually can't even see the 
   full function names - much less arguments - in declarations..

Anyway, all of these issues makes me suspect that the proper blame 
interface is to basically *hide* the blame almost entirely, in order to 
make the important parts much more visible, and in order to encourage 
people to start looking for the piece of code that they are actually 
interested in.

Then, some *small* part of the annotation window (preferably on the 
right-hand side) should have some very basic blame info - possibly even 
just a "grouping hint" to see where the blame boundaries are. And only 
when you mouse over it or something, do you get the full data.

I dunno. I'm horrible at actually doing GUI's, so you should take anything 
I say with a grain of salt. At the same time, I do know what *I* consider 
to be important (which tends to be unusual in a user), and I'd like to 
think that I have a clue about how people work. And I've always hated 
"annotate" in CVS, but git made it even worse by making the annotation 
data much bigger.

(Yes, from a technical standpoint making the annotation data bigger is a 
good thign: git simply has more useful information than CVS does. But the 
lack of information in CVS actually makes the "stupid interface" better, 
if only because you don't waste as much space on it).

But I'm not going to be able to actually do what I describe above. I can 
only hope to inspire somebody else..

			Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH] git blame --progress
  2007-01-29  2:35                     ` [PATCH] git blame --progress Junio C Hamano
  2007-01-29  7:00                       ` Simon 'corecode' Schubert
@ 2007-01-29 16:54                       ` Alex Riesen
  2007-01-29 18:12                       ` Matthias Lederhofer
  2007-01-29 19:59                       ` René Scharfe
  3 siblings, 0 replies; 92+ messages in thread
From: Alex Riesen @ 2007-01-29 16:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, git

On 1/29/07, Junio C Hamano <junkio@cox.net> wrote:
> [PATCH] git blame --progress
>
> With  --progress option, the command shows a fairly useless but
> amusing eye-candy while making the user wait.
>

It is not only amusing - it also gives the user a visual
information (not precise, but interesting) about something happening,
how fast is it happening and how long to wait.
I like it, even though I seldom use git-blame myself, it's the kind of
nice thing you have a fond memories afterwards.
Like the ascii graphics :)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28 22:26           ` David Lang
@ 2007-01-29 17:34             ` Nicolas Pitre
  2007-01-29 17:42               ` Linus Torvalds
  0 siblings, 1 reply; 92+ messages in thread
From: Nicolas Pitre @ 2007-01-29 17:34 UTC (permalink / raw)
  To: David Lang
  Cc: Chris Lee, Linus Torvalds, Simon 'corecode' Schubert,
	Shawn O. Pearce, Junio C Hamano, git

On Sun, 28 Jan 2007, David Lang wrote:

> On Sat, 27 Jan 2007, Chris Lee wrote:
> 
> > From: Chris Lee <clee@kde.org>
> > On 1/27/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > Do you have the converted repo somewhere to be cloned for? It's going to
> > > be a lot more interesting for scalability testing than anything else.
> >
> > I don't have access to any servers that I could drop a 3GB packfile
> > onto and expect them to serve it. And I don't have a connection at
> > home that I could use to upload the 3GB pack from quickly - it would
> > take days, at least. If anybody wants to hook me up with a hosting
> > provider or a machine that just the git devs can access, I'd be
> > willing to tie up my upstream bandwidth for a few days so you all can
> > have access to it.

> if nobody else steps forward I can arrange something like this on my home
> server (only 768K updtream bandwidth, but it's better then nothing)

Hey guys,

We might be a couple people interested in this pack (I do as pack 
performance is one of my main interest in git).  Linus is interested, 
and I'd guess Junio too, maybe a few others.

Chris: why don't you just set up a Bittorrent feed for it?  When we'll 
all start fetching it then the bandwidth will increasingly be shared 
amongst all interested people.


Nicolas

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 17:34             ` Nicolas Pitre
@ 2007-01-29 17:42               ` Linus Torvalds
  2007-01-29 17:58                 ` Nicolas Pitre
  0 siblings, 1 reply; 92+ messages in thread
From: Linus Torvalds @ 2007-01-29 17:42 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: David Lang, Chris Lee, Simon 'corecode' Schubert,
	Shawn O. Pearce, Junio C Hamano, git



On Mon, 29 Jan 2007, Nicolas Pitre wrote:
> 
> Chris: why don't you just set up a Bittorrent feed for it?  When we'll 
> all start fetching it then the bandwidth will increasingly be shared 
> amongst all interested people.

Well, it doesn't really help Chris. All the data will end up starting from 
him anyway. 

The problem isn't the bandwidth for lots of people to download it, but the 
bandwidth for a *single* upload ;)

Once it's uploaded anywhere, we've got people willing to mirror it 
infinitely ..

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 17:42               ` Linus Torvalds
@ 2007-01-29 17:58                 ` Nicolas Pitre
  2007-01-29 19:16                   ` Chris Lee
  0 siblings, 1 reply; 92+ messages in thread
From: Nicolas Pitre @ 2007-01-29 17:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Lang, Chris Lee, Simon 'corecode' Schubert,
	Shawn O. Pearce, Junio C Hamano, git

On Mon, 29 Jan 2007, Linus Torvalds wrote:

> 
> 
> On Mon, 29 Jan 2007, Nicolas Pitre wrote:
> > 
> > Chris: why don't you just set up a Bittorrent feed for it?  When we'll 
> > all start fetching it then the bandwidth will increasingly be shared 
> > amongst all interested people.
> 
> Well, it doesn't really help Chris. All the data will end up starting from 
> him anyway. 

Sure, but I was under the impression this wasn't the problem.  Given 
what Chris said:

|I don't have access to any servers that I could drop a 3GB packfile 
|onto and expect them to serve it. [...] If anybody wants to hook me up
|with a hosting provider or a machine that just the git devs can access, 
|I'd be willing to tie up my upstream bandwidth for a few days so you 
|all can have access to it.

And then David said:

|if nobody else steps forward I can arrange something like this on my 
|home server (only 768K updtream bandwidth, but it's better then 
|nothing)

So it looks like the server was the main issue here.

> Once it's uploaded anywhere, we've got people willing to mirror it 
> infinitely ..

Has this been set up with Chris already?


Nicolas

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 16:24               ` Linus Torvalds
@ 2007-01-29 18:07                 ` Simon 'corecode' Schubert
  2007-01-29 19:29                 ` Theodore Tso
  2007-01-31  8:39                 ` David Kågedal
  2 siblings, 0 replies; 92+ messages in thread
From: Simon 'corecode' Schubert @ 2007-01-29 18:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1658 bytes --]

Linus Torvalds wrote:
> (Yes, from a technical standpoint making the annotation data bigger is a 
> good thign: git simply has more useful information than CVS does. But the 
> lack of information in CVS actually makes the "stupid interface" better, 
> if only because you don't waste as much space on it).

I absolutely agree.  My primary workflow around cvs annotate is about this:  I read code and would like to know why this one snippet was introduced in the first place (or changed).  So I go cvs annotate in my browser, and in parallel I display the cvs log, to actually see the commit message.  Then I retrieve the diff output to see what happened, maybe starting the cycle again with an older version.

What I want to illustrate is:  No matter how much information you show in one line, you won't be able to fit all possible information.  So my dream interface is a display which shows which runs of lines were changed together, and in which order the runs were changed.  A temporary numbering of commits might help here (in CVS it is clear).  Now when I identify a run, i'd like to have an easy way to retrieve the git log -p output.  An additional blame should work on the parent of the commit associated with the current line (so that I can see how the line looked before this commit, and when this was changed).

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH] git blame --progress
  2007-01-29  2:35                     ` [PATCH] git blame --progress Junio C Hamano
  2007-01-29  7:00                       ` Simon 'corecode' Schubert
  2007-01-29 16:54                       ` Alex Riesen
@ 2007-01-29 18:12                       ` Matthias Lederhofer
  2007-01-29 19:06                         ` Junio C Hamano
  2007-01-29 19:59                       ` René Scharfe
  3 siblings, 1 reply; 92+ messages in thread
From: Matthias Lederhofer @ 2007-01-29 18:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> With  --progress option, the command shows a fairly useless but
> amusing eye-candy while making the user wait.
Looks nice :)  It makes it much more comfortable to wait for the real
output.  Perhaps there should be a config option to enable the
eye-candy when git-blame is run on a terminal (e.g. (configoption &&
stdout is a tty && stderr is a tty) || --progress)?  Typing --progress
is annoying.

I also noticed that git-blame with --progress is slowed down a bit by
the terminal here; the output on stderr was ~400kb.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH] git blame --progress
  2007-01-29 18:12                       ` Matthias Lederhofer
@ 2007-01-29 19:06                         ` Junio C Hamano
  0 siblings, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-29 19:06 UTC (permalink / raw)
  To: Matthias Lederhofer; +Cc: git, Alex Riesen

Matthias Lederhofer <matled@gmx.net> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>> With  --progress option, the command shows a fairly useless but
>> amusing eye-candy while making the user wait.
>
> Looks nice :)  It makes it much more comfortable to wait for the real
> output.  Perhaps there should be a config option to enable the
> eye-candy when git-blame is run on a terminal (e.g. (configoption &&
> stdout is a tty && stderr is a tty) || --progress)?  Typing --progress
> is annoying.
>
> I also noticed that git-blame with --progress is slowed down a bit by
> the terminal here; the output on stderr was ~400kb.

Easy, that was not a serious patch meant for inclusion.

If somebody wants to polish it up and re-submit that is fine by
me.  It might be interesting to add colors, make it less
dependent on ANSI terminal control, and handle half-dots more
intelligently.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 17:58                 ` Nicolas Pitre
@ 2007-01-29 19:16                   ` Chris Lee
  0 siblings, 0 replies; 92+ messages in thread
From: Chris Lee @ 2007-01-29 19:16 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Linus Torvalds, David Lang, Simon 'corecode' Schubert,
	Shawn O. Pearce, Junio C Hamano, git

On 1/29/07, Nicolas Pitre <nico@cam.org> wrote:
> On Mon, 29 Jan 2007, Linus Torvalds wrote:
> > Well, it doesn't really help Chris. All the data will end up starting from
> > him anyway.
>
> Sure, but I was under the impression this wasn't the problem.  Given
> what Chris said:
>
> |I don't have access to any servers that I could drop a 3GB packfile
> |onto and expect them to serve it. [...] If anybody wants to hook me up
> |with a hosting provider or a machine that just the git devs can access,
> |I'd be willing to tie up my upstream bandwidth for a few days so you
> |all can have access to it.
>
> And then David said:
>
> |if nobody else steps forward I can arrange something like this on my
> |home server (only 768K updtream bandwidth, but it's better then
> |nothing)
>
> So it looks like the server was the main issue here.
>
> > Once it's uploaded anywhere, we've got people willing to mirror it
> > infinitely ..
>
> Has this been set up with Chris already?

I'm going to burn a DVD with the repo on it later today and mail it to
hpa since he's only like two towns over.

So, hopefully, it'll be up somewhere useful by the end of the week :)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 16:24               ` Linus Torvalds
  2007-01-29 18:07                 ` Simon 'corecode' Schubert
@ 2007-01-29 19:29                 ` Theodore Tso
  2007-01-29 19:45                   ` Linus Torvalds
                                     ` (2 more replies)
  2007-01-31  8:39                 ` David Kågedal
  2 siblings, 3 replies; 92+ messages in thread
From: Theodore Tso @ 2007-01-29 19:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn O. Pearce, Junio C Hamano, git

On Mon, Jan 29, 2007 at 08:24:52AM -0800, Linus Torvalds wrote:
> Anyway, all of these issues makes me suspect that the proper blame 
> interface is to basically *hide* the blame almost entirely, in order to 
> make the important parts much more visible, and in order to encourage 
> people to start looking for the piece of code that they are actually 
> interested in.

One approach which might work is where you hover your mouse over a
line, and it pops up a tiny window with the blame information if the
mouse remains stationary for more than a second or two.

Another thing which would be really useful is where the lines that
have been changed in the last n commits (where n is probably between
3-5) are highlighted using different colors.  That way you can see
what was changed recently, which is often what you are most interested
in.  (As in, what changed recently that might have caused this file to
get all screwed up?)

						- Ted

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 19:29                 ` Theodore Tso
@ 2007-01-29 19:45                   ` Linus Torvalds
  2007-01-29 20:25                   ` Jakub Narebski
  2007-02-09  7:41                   ` Shawn O. Pearce
  2 siblings, 0 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-29 19:45 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Shawn O. Pearce, Junio C Hamano, git



On Mon, 29 Jan 2007, Theodore Tso wrote:
> 
> One approach which might work is where you hover your mouse over a
> line, and it pops up a tiny window with the blame information if the
> mouse remains stationary for more than a second or two.

Yes. I think that's the kind of interface that most people really want.

Of course, almost always, you'd actually want it in your editor of choice, 
and that's not going to happen. But if it looks basically just like a 
normal editor (perhaps with line numbers - that is fairly traditional in 
annotations), that's the basic most spartan interface I can think of. With 
"hover" just causing the extended information for the few lines around the 
mouse to show up (and a "select <n> lines with mouse" for more than just a 
few lines).

> Another thing which would be really useful is where the lines that
> have been changed in the last n commits (where n is probably between
> 3-5) are highlighted using different colors.

That would be very natural for the way "git blame --incremental" works, so 
yes, I agree. Not only does it highlight the likely interesting places, 
but it's very much the natural flow for the whole tool.

Ok. Now we just need some sucker^H^H^H^H^H^Henterprising person to 
actually do this ;)

			Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH] git blame --progress
  2007-01-29  2:35                     ` [PATCH] git blame --progress Junio C Hamano
                                         ` (2 preceding siblings ...)
  2007-01-29 18:12                       ` Matthias Lederhofer
@ 2007-01-29 19:59                       ` René Scharfe
  2007-01-29 20:24                         ` Linus Torvalds
  2007-01-30  1:53                         ` Junio C Hamano
  3 siblings, 2 replies; 92+ messages in thread
From: René Scharfe @ 2007-01-29 19:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano schrieb:
> [PATCH] git blame --progress
> 
> With  --progress option, the command shows a fairly useless but 
> amusing eye-candy while making the user wait.

Nicely done, I like it.  Well, then again, I used to watch the progress
of filesystem defragmentors as a kid.  Ahem. :-P

The problem here is, of course, that we don't know how beforehand much
work needs to be done.  The indicator could be full of stars long before
the start of history is reached.

This could be helped somewhat by having three states instead of two:
unblamed (.), blamed (o) and just-now-blamed (*).  Each time new stars
are written you'd demote the other stars in the field to o's.  This way
you'll at least see something moving until the end, no matter how often
blame is pushed further down for already blamed lines.

This increases terminal bandwidth usage and on-screen activity, but not
necessarily the usefulness of this thing. :)

René


diff --git a/builtin-blame.c b/builtin-blame.c
index cd54acf..9bed52f 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1229,18 +1229,9 @@ static int eye_candy_spot(struct scoreboard *sb, int lno)
 	return lno * cnt / sb->num_lines;
 }
 
-static void update_eye_candy(struct scoreboard *sb, struct blame_entry *ent)
+static void update_eye_candy_spots(int cnt, int spot_lo, int spot_hi, char c)
 {
-	int cnt = eye_candy_spots(sb);
-	int spot_lo, spot_hi, spot;
-	struct blame_entry *lo, *hi;
-
-	for (lo = ent; lo->prev && lo->prev->guilty; lo = lo->prev)
-		;
-	spot_lo = eye_candy_spot(sb, lo->lno);
-	for (hi = ent; hi->next && hi->next->guilty; hi = hi->next)
-		;
-	spot_hi = eye_candy_spot(sb, hi->lno + hi->num_lines - 1);
+	int spot;
 
 	for (spot = spot_lo; spot <= spot_hi; spot++) {
 		int spot_x, spot_y;
@@ -1257,13 +1248,35 @@ static void update_eye_candy(struct scoreboard *sb, struct blame_entry *ent)
 			fprintf(stderr, "\033[%dA", spot_y);
 		if (spot_x)
 			fprintf(stderr, "\033[%dC", spot_x);
-		fputc('*', stderr);
+		fputc(c, stderr);
 		fprintf(stderr, "\033[%dD", spot_x + 1);
 		if (spot_y)
 			fprintf(stderr, "\033[%dB", spot_y);
 	}
 }
 
+static void update_eye_candy(struct scoreboard *sb, struct blame_entry *ent)
+{
+	int cnt = eye_candy_spots(sb);
+	int spot_lo, spot_hi;
+	struct blame_entry *lo, *hi;
+	static int prev_cnt, prev_spot_lo, prev_spot_hi;
+
+	for (lo = ent; lo->prev && lo->prev->guilty; lo = lo->prev)
+		;
+	spot_lo = eye_candy_spot(sb, lo->lno);
+	for (hi = ent; hi->next && hi->next->guilty; hi = hi->next)
+		;
+	spot_hi = eye_candy_spot(sb, hi->lno + hi->num_lines - 1);
+
+	update_eye_candy_spots(prev_cnt, prev_spot_lo, prev_spot_hi, 'o');
+	update_eye_candy_spots(cnt, spot_lo, spot_hi, '*');
+
+	prev_cnt = cnt;
+	prev_spot_lo = spot_lo;
+	prev_spot_hi = spot_hi;
+}
+
 static void found_guilty_entry(struct scoreboard *sb, struct blame_entry *ent)
 {
 	if (ent->guilty)

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH] git blame --progress
  2007-01-29 19:59                       ` René Scharfe
@ 2007-01-29 20:24                         ` Linus Torvalds
  2007-01-30  1:53                         ` Junio C Hamano
  1 sibling, 0 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-29 20:24 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 590 bytes --]



On Mon, 29 Jan 2007, René Scharfe wrote:
> 
> The problem here is, of course, that we don't know how beforehand much
> work needs to be done.  The indicator could be full of stars long before
> the start of history is reached.

Well, we do have an approximation for it: we know how many lines the file 
has, and we do know (although we don't actually track) how many lines 
we've blamed so far.

So it would be fairly easy to give at least a *rough* indication of 
"percent blamed" - although it doesn't necessarily say anything about how 
expensive that last 1% is going to be..

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 19:29                 ` Theodore Tso
  2007-01-29 19:45                   ` Linus Torvalds
@ 2007-01-29 20:25                   ` Jakub Narebski
  2007-01-29 20:47                     ` Shawn O. Pearce
  2007-02-09  7:41                   ` Shawn O. Pearce
  2 siblings, 1 reply; 92+ messages in thread
From: Jakub Narebski @ 2007-01-29 20:25 UTC (permalink / raw)
  To: git

Theodore Tso wrote:

> On Mon, Jan 29, 2007 at 08:24:52AM -0800, Linus Torvalds wrote:
>> Anyway, all of these issues makes me suspect that the proper blame 
>> interface is to basically *hide* the blame almost entirely, in order to 
>> make the important parts much more visible, and in order to encourage 
>> people to start looking for the piece of code that they are actually 
>> interested in.
> 
> One approach which might work is where you hover your mouse over a
> line, and it pops up a tiny window with the blame information if the
> mouse remains stationary for more than a second or two.
> 
> Another thing which would be really useful is where the lines that
> have been changed in the last n commits (where n is probably between
> 3-5) are highlighted using different colors.  That way you can see
> what was changed recently, which is often what you are most interested
> in.  (As in, what changed recently that might have caused this file to
> get all screwed up?)

It would be also nice to have window split into two, and for example
have at bottom details of the commit which changed current line, like
author, description, date, how many commits ago, branch name (e.g. taken
from commit message if it was merged), perhaps also patch...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 20:25                   ` Jakub Narebski
@ 2007-01-29 20:47                     ` Shawn O. Pearce
  2007-01-29 21:02                       ` Jakub Narebski
  0 siblings, 1 reply; 92+ messages in thread
From: Shawn O. Pearce @ 2007-01-29 20:47 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> wrote:
> Theodore Tso wrote:
> > On Mon, Jan 29, 2007 at 08:24:52AM -0800, Linus Torvalds wrote:
> >> Anyway, all of these issues makes me suspect that the proper blame 
> >> interface is to basically *hide* the blame almost entirely, in order to 
> >> make the important parts much more visible, and in order to encourage 
> >> people to start looking for the piece of code that they are actually 
> >> interested in.
> > 
> > One approach which might work is where you hover your mouse over a
> > line, and it pops up a tiny window with the blame information if the
> > mouse remains stationary for more than a second or two.
> > 
> > Another thing which would be really useful is where the lines that
> > have been changed in the last n commits (where n is probably between
> > 3-5) are highlighted using different colors.  That way you can see
> > what was changed recently, which is often what you are most interested
> > in.  (As in, what changed recently that might have caused this file to
> > get all screwed up?)
> 
> It would be also nice to have window split into two, and for example
> have at bottom details of the commit which changed current line, like
> author, description, date, how many commits ago, branch name (e.g. taken
> from commit message if it was merged), perhaps also patch...

These are some really good ideas.  I knew that if I made pretty
technicolor crap available, people would tell me what they really
needed.  :-)

I'll like steal them (er, uhm, implement them) in git-gui in the next
day or so.  The current blame UI was sort of a prototype.  Once I
tossed the original filename and original line number into that
thing it started to become pretty obvious its just too cluttered.
But at that point I wanted pretty colors, and uh, it was late... :-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 20:47                     ` Shawn O. Pearce
@ 2007-01-29 21:02                       ` Jakub Narebski
  0 siblings, 0 replies; 92+ messages in thread
From: Jakub Narebski @ 2007-01-29 21:02 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

Shawn O. Pearce wrote:

> I'll like steal them (er, uhm, implement them) in git-gui in the next
> day or so.  The current blame UI was sort of a prototype.  Once I
> tossed the original filename and original line number into that
> thing it started to become pretty obvious its just too cluttered.
> But at that point I wanted pretty colors, and uh, it was late... :-)

You can borrow some of the ideas from gitweb new blame output 
(git_blame2) by Luben and Junio, too...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-27 19:36         ` Chris Lee
  2007-01-28 18:10           ` Theodore Tso
  2007-01-28 22:26           ` David Lang
@ 2007-01-29 23:00           ` Eric Wong
  2007-01-30  0:42             ` Eric Wong
  2 siblings, 1 reply; 92+ messages in thread
From: Eric Wong @ 2007-01-29 23:00 UTC (permalink / raw)
  To: Chris Lee; +Cc: git

Chris Lee <clee@kde.org> wrote:
> Also, please note, the 3GB packed repo is only about 2/3 of the full
> KDE repo - I cut off the import at revision 409202, because that was
> when the KDE svn admins decided to move a bunch of modules from
> /trunk/ to /trunk/KDE/ and it screws up everything. So a *full* KDE
> history import would definitely be more than 4GB, packed.

Hmm.. This movement from /trunk to /trunk/KDE could be a good case
for the (still-improving) --follow-parent feature in git-svn.

I've resigned to the fact that git-svn (and git) is not ideal for
tracking the entire KDE repository.   I don't think most (sane)
developers check out the entire repository root when working on KDE,
and cloning a 3GB pack just isn't realistic these days.

But if splitting the monster repository into separate repositories is an
option for KDE, and I can make --follow-parent do what it's supposed to
do very well: then it could be a nice way to make things more manageable
for developers that will eventually switch to.

For tracking the trunk of kde-common, I'm running with my
work-in-progress version of git-svn:

git-svn init -i kde-common svn://anonsvn.kde.org/home/kde/trunk/KDE/kde-common
git-svn fetch -i kde-common --follow-parent

It seems to be following kde-common into pre-409202 revisions (down to
r11472) pretty well.  I'll upload the result to git.bogomips.org when
I'm done.

You can get my latest git-svn here: git://git.bogomips.org/git-svn
Note: I've not used this to dcommit for serious work, so I can't tell
if it's really useful.  But fetching seems fine.

The current --follow-parent implementation does not handle some cases
very well.  I would like to get it to correctly track parents analogous
to the way gitk was merged into git, so that subprojects merged into
a bigger project are currently ignored.

Junio: there are some huge changes; so please don't merge it into master
yet, I don't feel it's ready yet, but it should be sometime in the next
week.  I would like to see this in 1.5.0, but only if it's not horribly
broken (especially w.r.t committing).

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 23:00           ` Eric Wong
@ 2007-01-30  0:42             ` Eric Wong
  2007-01-30  0:48               ` Eric Wong
  2007-01-30  8:51               ` Eric Wong
  0 siblings, 2 replies; 92+ messages in thread
From: Eric Wong @ 2007-01-30  0:42 UTC (permalink / raw)
  To: Chris Lee; +Cc: git

Eric Wong <normalperson@yhbt.net> wrote:
> It seems to be following kde-common into pre-409202 revisions (down to
> r11472) pretty well.  I'll upload the result to git.bogomips.org when
> I'm done.

This will have to wait: I'm getting "Malformed network data" errors with
both do_update and do_switch (not yet working on any release version of
SVN incl. 1.4.3).  I'll have to examine this a bit more when I get the
time.

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-30  0:42             ` Eric Wong
@ 2007-01-30  0:48               ` Eric Wong
  2007-01-30  8:51               ` Eric Wong
  1 sibling, 0 replies; 92+ messages in thread
From: Eric Wong @ 2007-01-30  0:48 UTC (permalink / raw)
  To: Chris Lee; +Cc: git

Eric Wong <normalperson@yhbt.net> wrote:
> Eric Wong <normalperson@yhbt.net> wrote:
> > It seems to be following kde-common into pre-409202 revisions (down to
> > r11472) pretty well.  I'll upload the result to git.bogomips.org when
> > I'm done.
> 
> This will have to wait: I'm getting "Malformed network data" errors with
> both do_update and do_switch (not yet working on any release version of
> SVN incl. 1.4.3).  I'll have to examine this a bit more when I get the
> time.

Also, you should *not* be able to reproduce this error if you try
tracking kde-common on a local file:// repository.

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH] git blame --progress
  2007-01-29 19:59                       ` René Scharfe
  2007-01-29 20:24                         ` Linus Torvalds
@ 2007-01-30  1:53                         ` Junio C Hamano
  1 sibling, 0 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-30  1:53 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Junio C Hamano schrieb:
>> [PATCH] git blame --progress
>> 
>> With  --progress option, the command shows a fairly useless but 
>> amusing eye-candy while making the user wait.
>
> Nicely done, I like it.  Well, then again, I used to watch the progress
> of filesystem defragmentors as a kid.  Ahem. :-P
>
> The problem here is, of course, that we don't know how beforehand much
> work needs to be done.  The indicator could be full of stars long before
> the start of history is reached.
>
> This could be helped somewhat by having three states instead of two:
> unblamed (.), blamed (o) and just-now-blamed (*).  Each time new stars
> are written you'd demote the other stars in the field to o's.  This way
> you'll at least see something moving until the end, no matter how often
> blame is pushed further down for already blamed lines.
>
> This increases terminal bandwidth usage and on-screen activity, but not
> necessarily the usefulness of this thing. :)

I do not know if this is working as you intended.

If somebody wants to really do this, the first clean-up to be
done is to remove the two loops that goes back and forward to
find the continguous guilty range.  That was done only because I
was lazy and did not want to count the boundary to deal with a
half-dot problem (a displayed column on the screen can represent
N lines -- what happens when the blame entry whose origin is now
known covers only partially?  You need to either draw it as a
half-done, or you make sure to paint it only when all of the N
lines are blamed).  The output from my original will repaint the
whole thing at the end of the blame because at that point
spot_lo and spot_hi would cover the entire range -- which shows
how lazy I am ;-).

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-30  0:42             ` Eric Wong
  2007-01-30  0:48               ` Eric Wong
@ 2007-01-30  8:51               ` Eric Wong
  1 sibling, 0 replies; 92+ messages in thread
From: Eric Wong @ 2007-01-30  8:51 UTC (permalink / raw)
  To: Chris Lee; +Cc: git

Eric Wong <normalperson@yhbt.net> wrote:
> Eric Wong <normalperson@yhbt.net> wrote:
> > It seems to be following kde-common into pre-409202 revisions (down to
> > r11472) pretty well.  I'll upload the result to git.bogomips.org when
> > I'm done.

Ok, kde-common is available here: http://git.bogomips.org/kde-common.git

> This will have to wait: I'm getting "Malformed network data" errors with
> both do_update and do_switch (not yet working on any release version of
> SVN incl. 1.4.3).  I'll have to examine this a bit more when I get the
> time.

Actually, I think I'm going crazy, do_update works fine, do_switch
(which no released version of SVN supports, yet) did not because of
reparenting.  Nevertheless, everything appears to work with my latest
git-svn (git://git.bogomips.org/git-svn.git)

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-28 19:57                 ` Linus Torvalds
                                     ` (2 preceding siblings ...)
  2007-01-28 21:06                   ` More precise tag following Junio C Hamano
@ 2007-01-30  9:22                   ` Junio C Hamano
  2007-01-30 15:31                     ` Shawn O. Pearce
  2007-01-30 17:02                     ` Linus Torvalds
  3 siblings, 2 replies; 92+ messages in thread
From: Junio C Hamano @ 2007-01-30  9:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> One thing I looked at, which *should* be easy to do inside "git-blame", is 
> to make the case where you do *not* give a head to start with, default to 
> "current working tree" instead of HEAD.

This is still very rough; the existing diff frontends are mess
and making diff-cache and diff-tree behave more or less
interchangeably is quite a pain.  I am not proud of the new
do_diff_cache() interface I had to add, which is probably
totally useless for anybody other than the three calling sites
this patch has.

I tested only the most trivial case that exercises the
do_diff_cache() cal in find_origin() before I got too tired, and
I am retiring to bed now.

-- >8 --
[PATCH] git-blame: no rev means start from the working tree file.

Warning: this changes the semantics.

This is a WIP to make "git blame" without any positive rev to
start digging from the working tree copy, which is made into a
fake commit whose sole parent is the HEAD.

It might make sense to give "git-blame --cached" to start
digging from the index as well, which should be trivial.

The calls to do_diff_cache() in find_copy_in_parent() and
find_rename() need to be vetted, as I haven't checked them yet.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 builtin-blame.c |  119 ++++++++++++++++++++++++++++++++++++++++++++----------
 cache.h         |    1 +
 diff-lib.c      |   22 ++++++++++-
 diff.h          |    1 +
 ident.c         |    8 ++--
 5 files changed, 124 insertions(+), 27 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 3033e9b..a8668c0 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -333,9 +333,13 @@ static struct origin *find_origin(struct scoreboard *sb,
 	diff_tree_setup_paths(paths, &diff_opts);
 	if (diff_setup_done(&diff_opts) < 0)
 		die("diff-setup");
-	diff_tree_sha1(parent->tree->object.sha1,
-		       origin->commit->tree->object.sha1,
-		       "", &diff_opts);
+
+	if (is_null_sha1(origin->commit->object.sha1))
+		do_diff_cache(parent->tree->object.sha1, &diff_opts, 0);
+	else
+		diff_tree_sha1(parent->tree->object.sha1,
+			       origin->commit->tree->object.sha1,
+			       "", &diff_opts);
 	diffcore_std(&diff_opts);
 
 	/* It is either one entry that says "modified", or "created",
@@ -402,9 +406,13 @@ static struct origin *find_rename(struct scoreboard *sb,
 	diff_tree_setup_paths(paths, &diff_opts);
 	if (diff_setup_done(&diff_opts) < 0)
 		die("diff-setup");
-	diff_tree_sha1(parent->tree->object.sha1,
-		       origin->commit->tree->object.sha1,
-		       "", &diff_opts);
+
+	if (is_null_sha1(origin->commit->object.sha1))
+		do_diff_cache(parent->tree->object.sha1, &diff_opts, 0);
+	else
+		diff_tree_sha1(parent->tree->object.sha1,
+			       origin->commit->tree->object.sha1,
+			       "", &diff_opts);
 	diffcore_std(&diff_opts);
 
 	for (i = 0; i < diff_queued_diff.nr; i++) {
@@ -1047,9 +1055,12 @@ static int find_copy_in_parent(struct scoreboard *sb,
 	    (!porigin || strcmp(target->path, porigin->path)))
 		diff_opts.find_copies_harder = 1;
 
-	diff_tree_sha1(parent->tree->object.sha1,
-		       target->commit->tree->object.sha1,
-		       "", &diff_opts);
+	if (is_null_sha1(target->commit->object.sha1))
+		do_diff_cache(parent->tree->object.sha1, &diff_opts, 0);
+	else
+		diff_tree_sha1(parent->tree->object.sha1,
+			       target->commit->tree->object.sha1,
+			       "", &diff_opts);
 
 	if (!diff_opts.find_copies_harder)
 		diffcore_std(&diff_opts);
@@ -1910,6 +1921,64 @@ static int git_blame_config(const char *var, const char *value)
 	return git_default_config(var, value);
 }
 
+static struct commit *fake_working_tree_commit(const char *path)
+{
+	struct stat st;
+	struct commit *commit;
+	struct origin *origin;
+	unsigned char head_sha1[20];
+	char *buf;
+	const char *ident;
+	int fd;
+
+	if (lstat(path, &st) < 0)
+		die("Cannot lstat %s", path);
+	if (get_sha1("HEAD", head_sha1))
+		die("No such ref: HEAD");
+
+	commit = xcalloc(1, sizeof(*commit));
+	commit->parents = xcalloc(1, sizeof(*commit->parents));
+	commit->parents->item = lookup_commit_reference(head_sha1);
+	commit->object.parsed = 1;
+	commit->date = st.st_mtime;
+	commit->object.type = OBJ_COMMIT;
+
+	origin = make_origin(commit, path);
+	origin->file.ptr = buf = xmalloc(st.st_size+1);
+	origin->file.size = st.st_size;
+	buf[st.st_size] = 0;
+
+	switch (st.st_mode & S_IFMT) {
+	case S_IFREG:
+		fd = open(path, O_RDONLY);
+		if (fd < 0)
+			die("cannot open %s", path);
+		if (read_in_full(fd, buf, st.st_size) != st.st_size)
+			die("cannot read %s", path);
+		break;
+	case S_IFLNK:
+		if (readlink(path, buf, st.st_size+1) != st.st_size)
+			die("cannot readlink %s", path);
+		break;
+	default:
+		die("unsupported file type %s", path);
+	}
+	hash_sha1_file(buf, st.st_size, blob_type, origin->blob_sha1);
+	commit->util = origin;
+
+	commit->buffer = xmalloc(400);
+	ident = fmt_ident("Not Committed Yet", "not.committed.yet", NULL, 0);
+	sprintf(commit->buffer,
+		"tree 0000000000000000000000000000000000000000\n"
+		"parent %s\n"
+		"author %s\n"
+		"committer %s\n\n"
+		"Version of %s from the working tree",
+		sha1_to_hex(head_sha1),
+		ident, ident, path);
+	return commit;
+}
+
 int cmd_blame(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
@@ -2087,7 +2156,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	argv[unk] = NULL;
 
 	init_revisions(&revs, NULL);
-	setup_revisions(unk, argv, &revs, "HEAD");
+	setup_revisions(unk, argv, &revs, NULL);
 	memset(&sb, 0, sizeof(sb));
 
 	/*
@@ -2114,15 +2183,10 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	if (!sb.final) {
 		/*
 		 * "--not A B -- path" without anything positive;
-		 * default to HEAD.
+		 * do not default to HEAD, but use the cache.
 		 */
-		unsigned char head_sha1[20];
-
-		final_commit_name = "HEAD";
-		if (get_sha1(final_commit_name, head_sha1))
-			die("No such ref: HEAD");
-		sb.final = lookup_commit_reference(head_sha1);
-		add_pending_object(&revs, &(sb.final->object), "HEAD");
+		sb.final = fake_working_tree_commit(path);
+		add_pending_object(&revs, &(sb.final->object), ":");
 	}
 
 	/*
@@ -2132,11 +2196,22 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	 */
 	prepare_revision_walk(&revs);
 
-	o = get_origin(&sb, sb.final, path);
-	if (fill_blob_sha1(o))
-		die("no such path %s in %s", path, final_commit_name);
+	if (is_null_sha1(sb.final->object.sha1)) {
+		char *buf;
+		o = sb.final->util;
+		buf = xmalloc(o->file.size + 1);
+		memcpy(buf, o->file.ptr, o->file.size + 1);
+		sb.final_buf = buf;
+		sb.final_buf_size = o->file.size;
+	}
+	else {
+		o = get_origin(&sb, sb.final, path);
+		if (fill_blob_sha1(o))
+			die("no such path %s in %s", path, final_commit_name);
 
-	sb.final_buf = read_sha1_file(o->blob_sha1, type, &sb.final_buf_size);
+		sb.final_buf = read_sha1_file(o->blob_sha1, type,
+					      &sb.final_buf_size);
+	}
 	num_read_blob++;
 	lno = prepare_lines(&sb);
 
diff --git a/cache.h b/cache.h
index 9873ee9..dcceea4 100644
--- a/cache.h
+++ b/cache.h
@@ -321,6 +321,7 @@ unsigned long approxidate(const char *);
 
 extern const char *git_author_info(int);
 extern const char *git_committer_info(int);
+extern const char *fmt_ident(const char *name, const char *email, const char *date_str, int);
 
 struct checkout {
 	const char *base_dir;
diff --git a/diff-lib.c b/diff-lib.c
index 2c9be60..b93f7a3 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -271,7 +271,7 @@ static int diff_cache(struct rev_info *revs,
 				break;
 			}
 			/* Show difference between old and new */
-			show_modified(revs,ac[1], ce, 1,
+			show_modified(revs, ac[1], ce, 1,
 				      cached, match_missing);
 			break;
 		case 1:
@@ -372,3 +372,23 @@ int run_diff_index(struct rev_info *revs, int cached)
 	diff_flush(&revs->diffopt);
 	return ret;
 }
+
+int do_diff_cache(const unsigned char *tree_sha1, struct diff_options *opt, int cached)
+{
+	struct tree *tree;
+	struct rev_info revs;
+
+	init_revisions(&revs, NULL);
+	revs.prune_data = opt->paths;
+	discard_cache();
+	if (read_cache() < 0)
+		die("cannot read index");
+	mark_merge_entries();
+	tree = parse_tree_indirect(tree_sha1);
+	if (!tree)
+		die("bad tree object %s", sha1_to_hex(tree_sha1));
+	if (read_tree(tree, 1, opt->paths))
+		return error("unable to read tree %s", sha1_to_hex(tree_sha1));
+	return diff_cache(&revs, active_cache, active_nr, revs.prune_data,
+			  cached, 0);
+}
diff --git a/diff.h b/diff.h
index 7a347cf..dd180b8 100644
--- a/diff.h
+++ b/diff.h
@@ -222,6 +222,7 @@ extern int run_diff_files(struct rev_info *revs, int silent_on_removed);
 
 extern int run_diff_index(struct rev_info *revs, int cached);
 
+extern int do_diff_cache(const unsigned char *, struct diff_options *, int);
 extern int diff_flush_patch_id(struct diff_options *, unsigned char *);
 
 #endif /* DIFF_H */
diff --git a/ident.c b/ident.c
index a6fc7b5..bb03bdd 100644
--- a/ident.c
+++ b/ident.c
@@ -185,8 +185,8 @@ static const char *env_hint =
 "Add --global to set your account\'s default\n"
 "\n";
 
-static const char *get_ident(const char *name, const char *email,
-			     const char *date_str, int error_on_no_name)
+const char *fmt_ident(const char *name, const char *email,
+		      const char *date_str, int error_on_no_name)
 {
 	static char buffer[1000];
 	char date[50];
@@ -233,7 +233,7 @@ static const char *get_ident(const char *name, const char *email,
 
 const char *git_author_info(int error_on_no_name)
 {
-	return get_ident(getenv("GIT_AUTHOR_NAME"),
+	return fmt_ident(getenv("GIT_AUTHOR_NAME"),
 			 getenv("GIT_AUTHOR_EMAIL"),
 			 getenv("GIT_AUTHOR_DATE"),
 			 error_on_no_name);
@@ -241,7 +241,7 @@ const char *git_author_info(int error_on_no_name)
 
 const char *git_committer_info(int error_on_no_name)
 {
-	return get_ident(getenv("GIT_COMMITTER_NAME"),
+	return fmt_ident(getenv("GIT_COMMITTER_NAME"),
 			 getenv("GIT_COMMITTER_EMAIL"),
 			 getenv("GIT_COMMITTER_DATE"),
 			 error_on_no_name);
-- 
1.5.0.rc2.77.g1732a

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-30  9:22                   ` Junio C Hamano
@ 2007-01-30 15:31                     ` Shawn O. Pearce
  2007-01-30 17:02                     ` Linus Torvalds
  1 sibling, 0 replies; 92+ messages in thread
From: Shawn O. Pearce @ 2007-01-30 15:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano <junkio@cox.net> wrote:
> This is a WIP to make "git blame" without any positive rev to
> start digging from the working tree copy, which is made into a
> fake commit whose sole parent is the HEAD.

I hate to be a stick in the mud, but including MERGE_HEAD as parents
of the virtual commit would also be nice.  Then you can get a blame
on conflicted files while in the middle of a merge and are working
on sorting the mess out.  :-)

Yea, MERGE_HEAD is really strictly for git-merge and git-commit,
but here its got some use too.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-30  9:22                   ` Junio C Hamano
  2007-01-30 15:31                     ` Shawn O. Pearce
@ 2007-01-30 17:02                     ` Linus Torvalds
  1 sibling, 0 replies; 92+ messages in thread
From: Linus Torvalds @ 2007-01-30 17:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Tue, 30 Jan 2007, Junio C Hamano wrote:
> 
> This is still very rough; the existing diff frontends are mess
> and making diff-cache and diff-tree behave more or less
> interchangeably is quite a pain.

Ahh. If it's that painful, then perhaps it's not worth it. It was just a 
"wouldn't it be nice" - I don't think anybody will really *depend* on this 
kind of politeness..

		Linus

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 16:24               ` Linus Torvalds
  2007-01-29 18:07                 ` Simon 'corecode' Schubert
  2007-01-29 19:29                 ` Theodore Tso
@ 2007-01-31  8:39                 ` David Kågedal
  2007-01-31 10:59                   ` David Kågedal
  2007-01-31 16:12                   ` Peter Eriksen
  2 siblings, 2 replies; 92+ messages in thread
From: David Kågedal @ 2007-01-31  8:39 UTC (permalink / raw)
  To: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

>    Now, in a plain-text pager thign (aka the traditional "git blame"), you 
>    don't have much choice. The blame data needs to be there, and you can't 
>    hide it, because if you do, there's no way to get at it. But things are 
>    different with an interactive graphical environment (or a textual one, 
>    for that matter: using some curses interface wouldn't change this 
>    argument).
>
>    You _could_ just make the primary thing be the actual file data, and 
>    the blame be "incidental". Which it really is.

Here is an emacs implementation of incremental git-blame.  When you
turn it on while viewing a file, the editor buffer will be updated by
setting the background of individual lines to a color that reflects
which commit it comes from.  And when you move around the buffer, a
one-line summary will be shown in the echo area.

To fix:  The colors are horrible, and work best if you normally
use a dark background.  You can see the list of colors near the top if
you want to change them.

If the file is changed from what is in HEAD, things will obviously be
screwed up.

Usage instructions:  Open a file and type M-x git-blame-mode

;;; git-blame.el

(defvar git-blame-colors
  '("black" "midnight blue" "medium blue" "steel blue"
    "gray2" "gray4" "gray6" "gray8" "gray10" "gray12" "gray14"
    "gray16" "gray18" "gray20" "gray22" "gray24" "gray26" "gray28" "gray30"
    "gray32" "gray34" "gray36" "gray38" "gray40" "gray42" "gray44" "gray46"
    "gray48" "gray56" "gray64" "gray72" "gray80" "gray88" "gray96"))
(defvar git-blame-ancient-color "dark green")

(defvar git-blame-overlays nil)
(defvar git-blame-cache nil)

(defvar git-blame-mode nil)
(make-variable-buffer-local 'git-blame-mode)
(push (list 'git-blame-mode " blame") minor-mode-alist)

(defun git-blame-mode (&optional arg)
  (interactive "P")
  (if arg
      (setq git-blame-mode (eq arg 1))
    (setq git-blame-mode (not git-blame-mode)))
  (make-local-variable 'git-blame-overlays)
  (make-local-variable 'git-blame-colors)
  (make-local-variable 'git-blame-cache)
  (setq git-blame-colors (default-value 'git-blame-colors))
  (if git-blame-mode
      (git-blame-run)
    (git-blame-cleanup)))

(defun git-blame-run ()
  (interactive)
  (let* ((display-buf (current-buffer))
         (blame-buf (get-buffer-create
                     (concat " git blame for " (buffer-name))))
         (proc (start-process "git-blame" blame-buf
                             "git" "blame" "--incremental"
                             (file-name-nondirectory buffer-file-name))))
    (mapcar 'delete-overlay git-blame-overlays)
    (setq git-blame-overlays nil)
    (setq git-blame-cache (make-hash-table))
    (with-current-buffer blame-buf
      (erase-buffer)
      (make-local-variable 'git-blame-file)
      (make-local-variable 'git-blame-current)
      (setq git-blame-file display-buf)
      (setq git-blame-current nil))
    (set-process-filter proc 'git-blame-filter)
    (set-process-sentinel proc 'git-blame-sentinel)))

(defun git-blame-cleanup ()
  "Remove all blame properties"
    (mapcar 'delete-overlay git-blame-overlays)
    (setq git-blame-overlays nil)
    (let ((modified (buffer-modified-p)))
      (remove-text-properties (point-min) (point-max) '(point-entered nil))
      (set-buffer-modified-p modified)))
    

(defun git-blame-sentinel (proc status)
  ;;(kill-buffer (process-buffer proc))
  )

(defvar in-blame-filter nil)

(defun git-blame-filter (proc str)
  (save-excursion
    (set-buffer (process-buffer proc))
    (goto-char (process-mark proc))
    (insert-before-markers str)
    (goto-char 0)
    (unless in-blame-filter
      (let ((more t)
            (in-blame-filter t))
        (while more
          (setq more (git-blame-parse)))))))

(defun git-blame-parse ()
  (cond ((looking-at "\\([0-9a-f]\\{40\\}\\) \\([0-9]+\\) \\([0-9]+\\) \\([0-9]+\\)\n")
         (let ((hash (match-string 1))
               (src-line (string-to-number (match-string 2)))
               (res-line (string-to-number (match-string 3)))
               (num-lines (string-to-number (match-string 4))))
           (setq git-blame-current
                 (git-blame-new-commit
                  hash src-line res-line num-lines)))
         (delete-region (point) (match-end 0))
         t)
        ((looking-at "filename \\(.+\\)\n")
         (let ((filename (match-string 1)))
           (git-blame-add-info "filename" filename))
         (delete-region (point) (match-end 0))
         t)
        ((looking-at "\\([a-z-]+\\) \\(.+\\)\n")
         (let ((key (match-string 1))
               (value (match-string 2)))
           (git-blame-add-info key value))
         (delete-region (point) (match-end 0))
         t)
        (t
         nil)))


(defun git-blame-new-commit (hash src-line res-line num-lines)
  (save-excursion
    (set-buffer git-blame-file)
    (let ((info (gethash hash git-blame-cache)))
      (when (not info)
        (let ((color (pop git-blame-colors)))
          (unless color
            (setq color git-blame-ancient-color))
          (setq info (list hash src-line res-line num-lines
                           (cons 'color color))))
        (puthash hash info git-blame-cache))
      (goto-line res-line)
      (while (> num-lines 0)
        (if (get-text-property (point) 'git-blame)
            (forward-line)
          (let* ((start (point))
                 (end (progn (forward-line 1) (point)))
                 (ovl (make-overlay start end)))
            (push ovl git-blame-overlays)
            (overlay-put ovl 'git-blame info)
            (overlay-put ovl 'help-echo hash)
            (overlay-put ovl 'face (list :background
                                         (cdr (assq 'color (cddddr info)))))
            ;;(overlay-put ovl 'point-entered
            ;;             `(lambda (x y) (git-blame-identify ,hash)))
            (let ((modified (buffer-modified-p)))
              (put-text-property (if (= start 1) start (1- start)) (1- end)
                                 'point-entered
                                 `(lambda (x y) (git-blame-identify ,hash)))
              (set-buffer-modified-p modified))))
        (setq num-lines (1- num-lines))))))

(defun git-blame-add-info (key value)
  (nconc git-blame-current (list (cons (intern key) value))))

(defun git-blame-current-commit ()
  (let ((info (get-char-property (point) 'git-blame)))
    (if info
        (car info)
      (error "No commit info"))))

(defun git-blame-identify (&optional hash)
  (interactive)
  (shell-command
   (format "git log -1 --pretty=oneline %s" (or hash
                                                (git-blame-current-commit)))))


-- 
David Kågedal

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-31  8:39                 ` David Kågedal
@ 2007-01-31 10:59                   ` David Kågedal
  2007-01-31 16:12                   ` Peter Eriksen
  1 sibling, 0 replies; 92+ messages in thread
From: David Kågedal @ 2007-01-31 10:59 UTC (permalink / raw)
  To: git

David Kågedal <davidk@lysator.liu.se> writes:

> Here is an emacs implementation of incremental git-blame.  When you
> turn it on while viewing a file, the editor buffer will be updated by
> setting the background of individual lines to a color that reflects
> which commit it comes from.  And when you move around the buffer, a
> one-line summary will be shown in the echo area.

I noticed that the output of "git blame --incremental" sometimes
has a line only containing the word "boundary".  This is not described
in the documentation.  The usage string for "git blame" mentions a -b
option, but that doesn't seem to change the output in this case.

Anyway, the version I posted was buggy.  This one seems to work
better:

;;; git-blame.el

(defvar git-blame-colors
  '("midnight blue" "medium blue" "steel blue"
    "gray2" "gray4" "gray6" "gray8" "gray10" "gray12" "gray14"
    "gray16" "gray18" "gray20" "gray22" "gray24" "gray26" "gray28" "gray30"
    "gray32" "gray34" "gray36" "gray38" "gray40" "gray42" "gray44" "gray46"
    "gray48" "gray56" "gray64" "gray72" "gray80" "gray88" "gray96"))
(defvar git-blame-ancient-color "dark green")

(defvar git-blame-overlays nil)
(defvar git-blame-cache nil)

(defvar git-blame-mode nil)
(make-variable-buffer-local 'git-blame-mode)
(push (list 'git-blame-mode " blame") minor-mode-alist)

(defun git-blame-mode (&optional arg)
  (interactive "P")
  (if arg
      (setq git-blame-mode (eq arg 1))
    (setq git-blame-mode (not git-blame-mode)))
  (make-local-variable 'git-blame-overlays)
  (make-local-variable 'git-blame-colors)
  (make-local-variable 'git-blame-cache)
  (setq git-blame-colors (default-value 'git-blame-colors))
  (if git-blame-mode
      (git-blame-run)
    (git-blame-cleanup)))

(defun git-blame-run ()
  (interactive)
  (let* ((display-buf (current-buffer))
         (blame-buf (get-buffer-create
                     (concat " git blame for " (buffer-name))))
         (proc (start-process "git-blame" blame-buf
                             "git" "blame" "-b" "--incremental"
                             (file-name-nondirectory buffer-file-name))))
    (mapcar 'delete-overlay git-blame-overlays)
    (setq git-blame-overlays nil)
    (setq git-blame-cache (make-hash-table :test 'equal))
    (with-current-buffer blame-buf
      (erase-buffer)
      (make-local-variable 'git-blame-file)
      (make-local-variable 'git-blame-current)
      (setq git-blame-file display-buf)
      (setq git-blame-current nil))
    (set-process-filter proc 'git-blame-filter)
    (set-process-sentinel proc 'git-blame-sentinel)))

(defun git-blame-cleanup ()
  "Remove all blame properties"
    (mapcar 'delete-overlay git-blame-overlays)
    (setq git-blame-overlays nil)
    (let ((modified (buffer-modified-p)))
      (remove-text-properties (point-min) (point-max) '(point-entered nil))
      (set-buffer-modified-p modified)))
    

(defun git-blame-sentinel (proc status)
  ;;(kill-buffer (process-buffer proc))
  (message "git blame finished"))

(defvar in-blame-filter nil)

(defun git-blame-filter (proc str)
  (save-excursion
    (set-buffer (process-buffer proc))
    (goto-char (process-mark proc))
    (insert-before-markers str)
    (goto-char 0)
    (unless in-blame-filter
      (let ((more t)
            (in-blame-filter t))
        (while more
          (setq more (git-blame-parse)))))))

(defun git-blame-parse ()
  (cond ((looking-at "\\([0-9a-f]\\{40\\}\\) \\([0-9]+\\) \\([0-9]+\\) \\([0-9]+\\)\n")
         (let ((hash (match-string 1))
               (src-line (string-to-number (match-string 2)))
               (res-line (string-to-number (match-string 3)))
               (num-lines (string-to-number (match-string 4))))
           (setq git-blame-current
                 (git-blame-new-commit
                  hash src-line res-line num-lines)))
         (delete-region (point) (match-end 0))
         t)
        ((looking-at "filename \\(.+\\)\n")
         (let ((filename (match-string 1)))
           (git-blame-add-info "filename" filename))
         (delete-region (point) (match-end 0))
         t)
        ((looking-at "\\([a-z-]+\\) \\(.+\\)\n")
         (let ((key (match-string 1))
               (value (match-string 2)))
           (git-blame-add-info key value))
         (delete-region (point) (match-end 0))
         t)
        ((looking-at "boundary\n")
         (setq git-blame-current nil)
         (delete-region (point) (match-end 0))
         t)
        (t
         nil)))


(defun git-blame-new-commit (hash src-line res-line num-lines)
  (save-excursion
    (set-buffer git-blame-file)
    (let ((info (gethash hash git-blame-cache)))
      (when (not info)
        (let ((color (pop git-blame-colors)))
          (unless color
            (setq color git-blame-ancient-color))
          (setq info (list hash src-line res-line num-lines
                           (cons 'color color))))
        (puthash hash info git-blame-cache))
      (goto-line res-line)
      (while (> num-lines 0)
        (if (get-text-property (point) 'git-blame)
            (forward-line)
          (let* ((start (point))
                 (end (progn (forward-line 1) (point)))
                 (ovl (make-overlay start end)))
            (push ovl git-blame-overlays)
            (overlay-put ovl 'git-blame info)
            (overlay-put ovl 'help-echo hash)
            (overlay-put ovl 'face (list :background
                                         (cdr (assq 'color (cddddr info)))))
            ;;(overlay-put ovl 'point-entered
            ;;             `(lambda (x y) (git-blame-identify ,hash)))
            (let ((modified (buffer-modified-p)))
              (put-text-property (if (= start 1) start (1- start)) (1- end)
                                 'point-entered
                                 `(lambda (x y) (git-blame-identify ,hash)))
              (set-buffer-modified-p modified))))
        (setq num-lines (1- num-lines))))))

(defun git-blame-add-info (key value)
  (if git-blame-current
      (nconc git-blame-current (list (cons (intern key) value)))))

(defun git-blame-current-commit ()
  (let ((info (get-char-property (point) 'git-blame)))
    (if info
        (car info)
      (error "No commit info"))))

(defun git-blame-identify (&optional hash)
  (interactive)
  (shell-command
   (format "git log -1 --pretty=oneline %s" (or hash
                                                (git-blame-current-commit)))))

-- 
David Kågedal

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-31  8:39                 ` David Kågedal
  2007-01-31 10:59                   ` David Kågedal
@ 2007-01-31 16:12                   ` Peter Eriksen
  2007-01-31 17:04                     ` David Kågedal
  1 sibling, 1 reply; 92+ messages in thread
From: Peter Eriksen @ 2007-01-31 16:12 UTC (permalink / raw)
  To: git

David Kågedal <davidk@lysator.liu.se> writes:

> Usage instructions:  Open a file and type M-x git-blame-mode
> 
> ;;; git-blame.el

I saved the elisp code in a file .emacs.d/git-blame.el, and loaded it
with M-x load-file.  Then I visited git/cache.h, and typed M-x
git-blame-mode, but the background colours did not change.  What did I
forget to do?

Regards,

Peter

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-31 16:12                   ` Peter Eriksen
@ 2007-01-31 17:04                     ` David Kågedal
  2007-01-31 17:12                       ` Peter Eriksen
  0 siblings, 1 reply; 92+ messages in thread
From: David Kågedal @ 2007-01-31 17:04 UTC (permalink / raw)
  To: git

Peter Eriksen <s022018@student.dtu.dk> writes:

> David Kågedal <davidk@lysator.liu.se> writes:
>
>> Usage instructions:  Open a file and type M-x git-blame-mode
>> 
>> ;;; git-blame.el
>
> I saved the elisp code in a file .emacs.d/git-blame.el, and loaded it
> with M-x load-file.  Then I visited git/cache.h, and typed M-x
> git-blame-mode, but the background colours did not change.  What did I
> forget to do?

Probably you forgot to use the latest version :-)

See my mail with the subject line "git-blame.el".

-- 
David Kågedal

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-31 17:04                     ` David Kågedal
@ 2007-01-31 17:12                       ` Peter Eriksen
  2007-01-31 17:35                         ` Jakub Narebski
  2007-01-31 20:59                         ` David Kågedal
  0 siblings, 2 replies; 92+ messages in thread
From: Peter Eriksen @ 2007-01-31 17:12 UTC (permalink / raw)
  To: git

David Kågedal <davidk@lysator.liu.se> writes:

> Peter Eriksen <s022018@student.dtu.dk> writes:
> 
> > David Kågedal <davidk@lysator.liu.se> writes:
> >
> >> Usage instructions:  Open a file and type M-x git-blame-mode
> >> 
> >> ;;; git-blame.el
> >
> > I saved the elisp code in a file .emacs.d/git-blame.el, and loaded it
> > with M-x load-file.  Then I visited git/cache.h, and typed M-x
> > git-blame-mode, but the background colours did not change.  What did I
> > forget to do?
> 
> Probably you forgot to use the latest version :-)
> 
> See my mail with the subject line "git-blame.el".

I saw that mail just after I responded.  The newest version does not
work either, that is, it does not work in the same way, as the old
version.  Closing Emacs I can see, that Emacs did fork of "git blame"
processes.  So it is just the colours, I cannot see.

Peter

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-31 17:12                       ` Peter Eriksen
@ 2007-01-31 17:35                         ` Jakub Narebski
  2007-01-31 20:59                         ` David Kågedal
  1 sibling, 0 replies; 92+ messages in thread
From: Jakub Narebski @ 2007-01-31 17:35 UTC (permalink / raw)
  To: git

Peter Eriksen wrote:
> David K?gedal <davidk@lysator.liu.se> writes:
>> Peter Eriksen <s022018@student.dtu.dk> writes:
>> 
>>> David K?gedal <davidk@lysator.liu.se> writes:
>>>
>>>> Usage instructions:  Open a file and type M-x git-blame-mode
>>>> 
>>>> ;;; git-blame.el
>>>
>>> I saved the elisp code in a file .emacs.d/git-blame.el, and loaded it
>>> with M-x load-file.  Then I visited git/cache.h, and typed M-x
>>> git-blame-mode, but the background colours did not change.  What did I
>>> forget to do?

I always used M-x load-library, not M-x load-file...
 
>> Probably you forgot to use the latest version :-)
>> 
>> See my mail with the subject line "git-blame.el".
> 
> I saw that mail just after I responded.  The newest version does not
> work either, that is, it does not work in the same way, as the old
> version.  Closing Emacs I can see, that Emacs did fork of "git blame"
> processes.  So it is just the colours, I cannot see.

Do you use new enough version of git, one which has git-blame --incremental?
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-31 17:12                       ` Peter Eriksen
  2007-01-31 17:35                         ` Jakub Narebski
@ 2007-01-31 20:59                         ` David Kågedal
  1 sibling, 0 replies; 92+ messages in thread
From: David Kågedal @ 2007-01-31 20:59 UTC (permalink / raw)
  To: git

Peter Eriksen <s022018@student.dtu.dk> writes:

> David Kågedal <davidk@lysator.liu.se> writes:
>
>> Peter Eriksen <s022018@student.dtu.dk> writes:
>> 
>> > David Kågedal <davidk@lysator.liu.se> writes:
>> >
>> >> Usage instructions:  Open a file and type M-x git-blame-mode
>> >> 
>> >> ;;; git-blame.el
>> >
>> > I saved the elisp code in a file .emacs.d/git-blame.el, and loaded it
>> > with M-x load-file.  Then I visited git/cache.h, and typed M-x
>> > git-blame-mode, but the background colours did not change.  What did I
>> > forget to do?
>> 
>> Probably you forgot to use the latest version :-)
>> 
>> See my mail with the subject line "git-blame.el".
>
> I saw that mail just after I responded.  The newest version does not
> work either, that is, it does not work in the same way, as the old
> version.  Closing Emacs I can see, that Emacs did fork of "git blame"
> processes.  So it is just the colours, I cannot see.

I think it requires GNU Emacs 21.  If you'are using Emacs 20, try
changing this:

            (overlay-put ovl 'face (list :background
                                         (cdr (assq 'color (cddddr info)))))

to

            (overlay-put ovl 'face (cons 'background-color
                                         (cdr (assq 'color (cddddr info)))))


-- 
David Kågedal

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: More precise tag following
  2007-01-29 19:29                 ` Theodore Tso
  2007-01-29 19:45                   ` Linus Torvalds
  2007-01-29 20:25                   ` Jakub Narebski
@ 2007-02-09  7:41                   ` Shawn O. Pearce
  2 siblings, 0 replies; 92+ messages in thread
From: Shawn O. Pearce @ 2007-02-09  7:41 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Linus Torvalds, Junio C Hamano, git

Theodore Tso <tytso@mit.edu> wrote:
> On Mon, Jan 29, 2007 at 08:24:52AM -0800, Linus Torvalds wrote:
> > Anyway, all of these issues makes me suspect that the proper blame 
> > interface is to basically *hide* the blame almost entirely, in order to 
> > make the important parts much more visible, and in order to encourage 
> > people to start looking for the piece of code that they are actually 
> > interested in.
> 
> One approach which might work is where you hover your mouse over a
> line, and it pops up a tiny window with the blame information if the
> mouse remains stationary for more than a second or two.
> 
> Another thing which would be really useful is where the lines that
> have been changed in the last n commits (where n is probably between
> 3-5) are highlighted using different colors.  That way you can see
> what was changed recently, which is often what you are most interested
> in.  (As in, what changed recently that might have caused this file to
> get all screwed up?)

In case you are interested, I've tweaked the display of annotation
data in git-gui.  The latest version lets you run blame right from
the command line:

	git-gui blame master revision.c

Clicking on a line colors that line and all lines which are blamed
on the same commit as yellow; the commit before it (ancestor)
is colored blue and the commit after it (descendant) is colored red.

The bottom part of the window is used to show the commit SHA-1,
author, committer and complete log message.  I have not yet added
navigating to the prior commit, or support for the new --contents
flag.

Because the data is now in the bottom pane (and not in columns),
the display is about 90 characters wide, instead of 180 or whatever
insane value it was.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 92+ messages in thread

end of thread, other threads:[~2007-02-09  7:41 UTC | newest]

Thread overview: 92+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-26 11:07 More precise tag following Junio C Hamano
2007-01-26 11:53 ` Junio C Hamano
2007-01-27  8:01 ` Shawn O. Pearce
2007-01-27  8:41   ` Junio C Hamano
2007-01-27 13:33     ` Jeff King
2007-01-27 17:47     ` Nicolas Pitre
2007-01-27  9:04   ` Simon 'corecode' Schubert
2007-01-27 12:58   ` Johannes Schindelin
2007-01-27 13:50     ` Simon 'corecode' Schubert
2007-01-27 16:30       ` Jakub Narebski
2007-01-27 17:36         ` Linus Torvalds
2007-01-27 16:46       ` Johannes Schindelin
2007-01-27 17:12         ` Simon 'corecode' Schubert
2007-01-27 19:13           ` Johannes Schindelin
2007-01-27 19:55             ` Simon 'corecode' Schubert
2007-01-27 19:41           ` Nicolas Pitre
2007-01-27 17:22   ` Linus Torvalds
2007-01-27 17:56     ` Linus Torvalds
2007-01-27 22:00       ` Junio C Hamano
2007-01-27 22:54         ` Linus Torvalds
2007-01-28  9:27           ` Junio C Hamano
2007-01-28  9:44             ` [PATCH] git-blame --porcelain: quote filename in c-style when needed Junio C Hamano
2007-01-28 14:25             ` [PATCH] git-blame --incremental: don't use pager René Scharfe
2007-01-28 19:09               ` Junio C Hamano
2007-01-28 19:14                 ` Junio C Hamano
2007-01-29  0:32                   ` René Scharfe
2007-01-29  2:35                     ` [PATCH] git blame --progress Junio C Hamano
2007-01-29  7:00                       ` Simon 'corecode' Schubert
2007-01-29 16:54                       ` Alex Riesen
2007-01-29 18:12                       ` Matthias Lederhofer
2007-01-29 19:06                         ` Junio C Hamano
2007-01-29 19:59                       ` René Scharfe
2007-01-29 20:24                         ` Linus Torvalds
2007-01-30  1:53                         ` Junio C Hamano
2007-01-28 19:08             ` More precise tag following Linus Torvalds
2007-01-28 19:18               ` Junio C Hamano
2007-01-28 19:57                 ` Linus Torvalds
2007-01-28 20:01                   ` Junio C Hamano
2007-01-28 20:20                   ` [PATCH] document 'blame --incremental' Junio C Hamano
2007-01-28 21:06                   ` More precise tag following Junio C Hamano
2007-01-28 23:01                     ` Jeff King
2007-01-30  9:22                   ` Junio C Hamano
2007-01-30 15:31                     ` Shawn O. Pearce
2007-01-30 17:02                     ` Linus Torvalds
2007-01-28 19:58                 ` Junio C Hamano
2007-01-29  6:18             ` Shawn O. Pearce
2007-01-29 10:17               ` Junio C Hamano
2007-01-29 10:31                 ` Shawn O. Pearce
2007-01-29 16:24               ` Linus Torvalds
2007-01-29 18:07                 ` Simon 'corecode' Schubert
2007-01-29 19:29                 ` Theodore Tso
2007-01-29 19:45                   ` Linus Torvalds
2007-01-29 20:25                   ` Jakub Narebski
2007-01-29 20:47                     ` Shawn O. Pearce
2007-01-29 21:02                       ` Jakub Narebski
2007-02-09  7:41                   ` Shawn O. Pearce
2007-01-31  8:39                 ` David Kågedal
2007-01-31 10:59                   ` David Kågedal
2007-01-31 16:12                   ` Peter Eriksen
2007-01-31 17:04                     ` David Kågedal
2007-01-31 17:12                       ` Peter Eriksen
2007-01-31 17:35                         ` Jakub Narebski
2007-01-31 20:59                         ` David Kågedal
2007-01-27 18:40     ` Simon 'corecode' Schubert
2007-01-27 19:02       ` Johannes Schindelin
2007-01-27 19:12         ` Simon 'corecode' Schubert
2007-01-27 19:19           ` Johannes Schindelin
2007-01-27 19:59           ` Jakub Narebski
2007-01-27 19:15       ` Linus Torvalds
2007-01-27 19:25         ` Linus Torvalds
2007-01-27 19:54           ` Jakub Narebski
2007-01-27 20:13             ` Linus Torvalds
2007-01-27 19:36         ` Chris Lee
2007-01-28 18:10           ` Theodore Tso
2007-01-28 18:27             ` Linus Torvalds
2007-01-28 22:26           ` David Lang
2007-01-29 17:34             ` Nicolas Pitre
2007-01-29 17:42               ` Linus Torvalds
2007-01-29 17:58                 ` Nicolas Pitre
2007-01-29 19:16                   ` Chris Lee
2007-01-29 23:00           ` Eric Wong
2007-01-30  0:42             ` Eric Wong
2007-01-30  0:48               ` Eric Wong
2007-01-30  8:51               ` Eric Wong
2007-01-27 18:52     ` Jakub Narebski
2007-01-27 20:16     ` Jeff King
2007-01-27 22:39       ` Linus Torvalds
2007-01-27 23:52         ` Jeff King
2007-01-28  2:39           ` Theodore Tso
2007-01-28  3:17             ` Randal L. Schwartz
2007-01-28 13:15             ` Jeff King
2007-01-28  7:40     ` Shawn O. Pearce

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.