All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6] Add new git-related helper to contrib
@ 2013-05-19 15:53 Felipe Contreras
  2013-05-20  1:47 ` Eric Sunshine
  2013-05-22 19:23 ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Felipe Contreras @ 2013-05-19 15:53 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ramkumar Ramachandra, Duy Nguyen, Felipe Contreras

This script find people that might be interested in a patch, by going
back through the history for each single hunk modified, and finding
people that reviewed, acknowledge, signed, or authored the code the
patch is modifying.

It does this by running 'git blame' incrementally on each hunk, and then
parsing the commit message. After gathering all the relevant people, it
groups them to show what exactly was their role when the participated in
the development of the relevant commit, and on how many relevant commits
they participated. They are only displayed if they pass a minimum
threshold of participation.

For example:

  % git related 0001-remote-hg-trivial-cleanups.patch
  Felipe Contreras <felipe.contreras@gmail.com>
  Jeff King <peff@peff.net>
  Max Horn <max@quendi.de>
  Junio C Hamano <gitster@pobox.com>

Thus it can be used for 'git send-email' as a cc-cmd.

There might be some other related functions to this script, not just to
be used as a cc-cmd.

Comments-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
---

Sames as v5, with a few tiny modifications. I'm tired of sending the whole
series multiple times over the years, only to get stuck at the first patch, so
I'll send only the first one.

 contrib/related/git-related | 124 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)
 create mode 100755 contrib/related/git-related

diff --git a/contrib/related/git-related b/contrib/related/git-related
new file mode 100755
index 0000000..b96dcdd
--- /dev/null
+++ b/contrib/related/git-related
@@ -0,0 +1,124 @@
+#!/usr/bin/env ruby
+
+# This script finds people that might be interested in a patch
+# usage: git related <file>
+
+$since = '5-years-ago'
+$min_percent = 10
+
+def fmt_person(name, email)
+  '%s <%s>' % [name, email]
+end
+
+class Commit
+
+  attr_reader :persons
+
+  def initialize(id)
+    @id = id
+    @persons = []
+  end
+
+  def parse(data)
+    msg = nil
+    data.each_line do |line|
+      if not msg
+        case line
+        when /^author ([^<>]+) <(\S+)> (.+)$/
+          @persons << fmt_person($1, $2)
+        when /^$/
+          msg = true
+        end
+      else
+        if line =~ /^(Signed-off|Reviewed|Acked)-by: ([^<>]+) <(\S+?)>$/
+          @persons << fmt_person($2, $3)
+        end
+      end
+    end
+    @persons.uniq!
+  end
+
+end
+
+class Commits
+
+  def initialize
+    @items = {}
+  end
+
+  def size
+    @items.size
+  end
+
+  def each(&block)
+    @items.each(&block)
+  end
+
+  def import
+    return if @items.empty?
+    File.popen(%w[git cat-file --batch], 'r+') do |p|
+      p.write(@items.keys.join("\n"))
+      p.close_write
+      p.each do |line|
+        if line =~ /^(\h{40}) commit (\d+)/
+          id, len = $1, $2
+          data = p.read($2.to_i)
+          @items[id].parse(data)
+        end
+      end
+    end
+  end
+
+  def get_blame(source, start, len, from)
+    return if len == 0
+    len ||= 1
+    File.popen(['git', 'blame', '--incremental', '-CCC',
+               '-L', '%u,+%u' % [start, len],
+               '--since', $since, from + '^',
+               '--', source]) do |p|
+      p.each do |line|
+        if line =~ /^(\h{40})/
+          id = $&
+          @items[id] = Commit.new(id)
+        end
+      end
+    end
+  end
+
+  def from_patch(file)
+    from = source = nil
+    File.open(file) do |f|
+      f.each do |line|
+        case line
+        when /^From (\h+) (.+)$/
+          from = $1
+        when /^---\s+(\S+)/
+          source = $1 != '/dev/null' ? $1[2..-1] : nil
+        when /^@@ -(\d+)(?:,(\d+))?/
+          get_blame(source, $1, $2, from)
+        end
+      end
+    end
+  end
+
+end
+
+exit 1 if ARGV.size != 1
+
+commits = Commits.new
+commits.from_patch(ARGV[0])
+commits.import
+
+count_per_person = Hash.new(0)
+
+commits.each do |id, commit|
+  commit.persons.each do |person|
+    count_per_person[person] += 1
+  end
+end
+
+count_per_person.each do |person, count|
+  percent = count.to_f * 100 / commits.size
+  next if percent < $min_percent
+  puts person
+end
-- 
1.8.3.rc3.286.g3d43083

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-19 15:53 [PATCH v6] Add new git-related helper to contrib Felipe Contreras
@ 2013-05-20  1:47 ` Eric Sunshine
  2013-05-22 19:23 ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Sunshine @ 2013-05-20  1:47 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Git List, Junio C Hamano, Ramkumar Ramachandra, Duy Nguyen

On Sun, May 19, 2013 at 11:53 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> This script find people that might be interested in a patch, by going

s/find/finds/

> back through the history for each single hunk modified, and finding
> people that reviewed, acknowledge, signed, or authored the code the
> patch is modifying.
>
> It does this by running 'git blame' incrementally on each hunk, and then
> parsing the commit message. After gathering all the relevant people, it
> groups them to show what exactly was their role when the participated in
> the development of the relevant commit, and on how many relevant commits
> they participated. They are only displayed if they pass a minimum
> threshold of participation.
>
> For example:
>
>   % git related 0001-remote-hg-trivial-cleanups.patch
>   Felipe Contreras <felipe.contreras@gmail.com>
>   Jeff King <peff@peff.net>
>   Max Horn <max@quendi.de>
>   Junio C Hamano <gitster@pobox.com>
>
> Thus it can be used for 'git send-email' as a cc-cmd.
>
> There might be some other related functions to this script, not just to
> be used as a cc-cmd.
>
> Comments-by: Ramkumar Ramachandra <artagnon@gmail.com>
> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-19 15:53 [PATCH v6] Add new git-related helper to contrib Felipe Contreras
  2013-05-20  1:47 ` Eric Sunshine
@ 2013-05-22 19:23 ` Junio C Hamano
  2013-05-22 19:54   ` Junio C Hamano
  2013-05-22 22:23   ` Felipe Contreras
  1 sibling, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2013-05-22 19:23 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Felipe Contreras <felipe.contreras@gmail.com> writes:

> diff --git a/contrib/related/git-related b/contrib/related/git-related
> new file mode 100755
> index 0000000..b96dcdd
> --- /dev/null
> +++ b/contrib/related/git-related
> @@ -0,0 +1,124 @@
> +#!/usr/bin/env ruby
> +
> +# This script finds people that might be interested in a patch
> +# usage: git related <file>
> +
> +$since = '5-years-ago'
> +$min_percent = 10
> +
> +def fmt_person(name, email)
> +  '%s <%s>' % [name, email]
> +end

Micronit.  I suspect you do not need this helper, unless later
patches start using it.

> +  def import
> +    return if @items.empty?
> +    File.popen(%w[git cat-file --batch], 'r+') do |p|
> +      p.write(@items.keys.join("\n"))
> +      p.close_write
> +      p.each do |line|
> +        if line =~ /^(\h{40}) commit (\d+)/
> +          id, len = $1, $2
> +          data = p.read($2.to_i)
> +          @items[id].parse(data)
> +        end
> +      end
> +    end
> +  end
> +
> +  def get_blame(source, start, len, from)
> +    return if len == 0
> +    len ||= 1
> +    File.popen(['git', 'blame', '--incremental', '-CCC',

I am torn on the hardcoded use of "-CCC" here.

Depending on the nature of the change in question, it may match well
or worse to what you are trying to find out.  When you are trying to
say "What were you smoking when you implemented this broken logic?",
using -C may be good, but when your question is "Even though all the
callers of this function live in that other file, somebody moved
this function that used to be file static in that file to here and
made it public. Why?", you do not want to use -C.

I am reasonably sure that in the finished code later in the series
it will become configurable, but a fallback default is better to be
not so expensive one.

> +               '-L', '%u,+%u' % [start, len],
> +               '--since', $since, from + '^',

Is "from" unconditionally set?

Perhaps that nil + '^' magically disappear and this code is relying
on that, but it smells like a too much magic to me.

> +               '--', source]) do |p|
> +      p.each do |line|
> +        if line =~ /^(\h{40})/
> +          id = $&
> +          @items[id] = Commit.new(id)
> +        end
> +      end
> +    end
> +  end
> +
> +  def from_patch(file)
> +    from = source = nil
> +    File.open(file) do |f|
> +      f.each do |line|
> +        case line
> +        when /^From (\h+) (.+)$/
> +          from = $1
> +        when /^---\s+(\S+)/
> +          source = $1 != '/dev/null' ? $1[2..-1] : nil
> +        when /^@@ -(\d+)(?:,(\d+))?/
> +          get_blame(source, $1, $2, from)
> +        end

Makes sense to start from the preimage so that you can find out who
wrote the original block of lines your patch is removing.

But then if source is /dev/null, wouldn't you be able to stop
without running blame at all?  You know the patch is creating a new
file at that point and there is nobody to point a finger at.

> +      end
> +    end
> +  end
> +
> +end
> +
> +exit 1 if ARGV.size != 1
> +
> +commits = Commits.new
> +commits.from_patch(ARGV[0])
> +commits.import
> +
> +count_per_person = Hash.new(0)
> +
> +commits.each do |id, commit|
> +  commit.persons.each do |person|
> +    count_per_person[person] += 1
> +  end
> +end
> +
> +count_per_person.each do |person, count|
> +  percent = count.to_f * 100 / commits.size
> +  next if percent < $min_percent
> +  puts person
> +end

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 19:23 ` Junio C Hamano
@ 2013-05-22 19:54   ` Junio C Hamano
  2013-05-22 22:23   ` Felipe Contreras
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2013-05-22 19:54 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Junio C Hamano <gitster@pobox.com> writes:

> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>> diff --git a/contrib/related/git-related b/contrib/related/git-related
>> new file mode 100755
>> index 0000000..b96dcdd
>> --- /dev/null
>> +++ b/contrib/related/git-related
>> @@ -0,0 +1,124 @@
>> +#!/usr/bin/env ruby
>> +
>> +# This script finds people that might be interested in a patch
>> +# usage: git related <file>
>> +
>> +$since = '5-years-ago'
>> +$min_percent = 10
>> +
>> +def fmt_person(name, email)
>> +  '%s <%s>' % [name, email]
>> +end
>
> Micronit.  I suspect you do not need this helper, unless later
> patches start using it.

Not that matters terribly, but "later patches start using it in a
different way" may be needed as a clarification.

The current two callers capture both "%s <%s>" as two separate
groups but they do not need to; they can do "(%s <%s>)" and pass $1
to this function, I think.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 19:23 ` Junio C Hamano
  2013-05-22 19:54   ` Junio C Hamano
@ 2013-05-22 22:23   ` Felipe Contreras
  2013-05-22 22:38     ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-22 22:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Wed, May 22, 2013 at 2:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>> diff --git a/contrib/related/git-related b/contrib/related/git-related
>> new file mode 100755
>> index 0000000..b96dcdd
>> --- /dev/null
>> +++ b/contrib/related/git-related
>> @@ -0,0 +1,124 @@
>> +#!/usr/bin/env ruby
>> +
>> +# This script finds people that might be interested in a patch
>> +# usage: git related <file>
>> +
>> +$since = '5-years-ago'
>> +$min_percent = 10
>> +
>> +def fmt_person(name, email)
>> +  '%s <%s>' % [name, email]
>> +end
>
> Micronit.  I suspect you do not need this helper, unless later
> patches start using it.

It's not *needed*, but it makes if fulfills the role of a function: to
avoid typing that code in multiple places.

>> +  def import
>> +    return if @items.empty?
>> +    File.popen(%w[git cat-file --batch], 'r+') do |p|
>> +      p.write(@items.keys.join("\n"))
>> +      p.close_write
>> +      p.each do |line|
>> +        if line =~ /^(\h{40}) commit (\d+)/
>> +          id, len = $1, $2
>> +          data = p.read($2.to_i)
>> +          @items[id].parse(data)
>> +        end
>> +      end
>> +    end
>> +  end
>> +
>> +  def get_blame(source, start, len, from)
>> +    return if len == 0
>> +    len ||= 1
>> +    File.popen(['git', 'blame', '--incremental', '-CCC',
>
> I am torn on the hardcoded use of "-CCC" here.
>
> Depending on the nature of the change in question, it may match well
> or worse to what you are trying to find out.  When you are trying to
> say "What were you smoking when you implemented this broken logic?",
> using -C may be good, but when your question is "Even though all the
> callers of this function live in that other file, somebody moved
> this function that used to be file static in that file to here and
> made it public. Why?", you do not want to use -C.
>
> I am reasonably sure that in the finished code later in the series
> it will become configurable, but a fallback default is better to be
> not so expensive one.

The script's purpose is to find related commits, -CCC does that, does it not?

>> +               '-L', '%u,+%u' % [start, len],
>> +               '--since', $since, from + '^',
>
> Is "from" unconditionally set?
>
> Perhaps that nil + '^' magically disappear and this code is relying
> on that, but it smells like a too much magic to me.

I personally don't care. You decide what's the behavior when no 'From
' line is available in the patch. I don't see the point in worrying
about non-git patches.

>> +               '--', source]) do |p|
>> +      p.each do |line|
>> +        if line =~ /^(\h{40})/
>> +          id = $&
>> +          @items[id] = Commit.new(id)
>> +        end
>> +      end
>> +    end
>> +  end
>> +
>> +  def from_patch(file)
>> +    from = source = nil
>> +    File.open(file) do |f|
>> +      f.each do |line|
>> +        case line
>> +        when /^From (\h+) (.+)$/
>> +          from = $1
>> +        when /^---\s+(\S+)/
>> +          source = $1 != '/dev/null' ? $1[2..-1] : nil
>> +        when /^@@ -(\d+)(?:,(\d+))?/
>> +          get_blame(source, $1, $2, from)
>> +        end
>
> Makes sense to start from the preimage so that you can find out who
> wrote the original block of lines your patch is removing.
>
> But then if source is /dev/null, wouldn't you be able to stop
> without running blame at all?  You know the patch is creating a new
> file at that point and there is nobody to point a finger at.

A patch can touch multiple files.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 22:23   ` Felipe Contreras
@ 2013-05-22 22:38     ` Junio C Hamano
  2013-05-22 22:43       ` Felipe Contreras
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2013-05-22 22:38 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Felipe Contreras <felipe.contreras@gmail.com> writes:

>> Depending on the nature of the change in question, it may match well
>> or worse to what you are trying to find out.  When you are trying to
>> say "What were you smoking when you implemented this broken logic?",
>> using -C may be good, but when your question is "Even though all the
>> callers of this function live in that other file, somebody moved
>> this function that used to be file static in that file to here and
>> made it public. Why?", you do not want to use -C.
>>
>> I am reasonably sure that in the finished code later in the series
>> it will become configurable, but a fallback default is better to be
>> not so expensive one.
>
> The script's purpose is to find related commits, -CCC does that, does it not?

As I already said in the above, the answer is no, when you are
trying to find who moved the code from the original place.

>> Makes sense to start from the preimage so that you can find out who
>> wrote the original block of lines your patch is removing.
>>
>> But then if source is /dev/null, wouldn't you be able to stop
>> without running blame at all?  You know the patch is creating a new
>> file at that point and there is nobody to point a finger at.
>
> A patch can touch multiple files.

So?

What line range would you be feeding "blame" with, for such a file
creation event?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 22:38     ` Junio C Hamano
@ 2013-05-22 22:43       ` Felipe Contreras
  2013-05-22 22:53         ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-22 22:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>>> Depending on the nature of the change in question, it may match well
>>> or worse to what you are trying to find out.  When you are trying to
>>> say "What were you smoking when you implemented this broken logic?",
>>> using -C may be good, but when your question is "Even though all the
>>> callers of this function live in that other file, somebody moved
>>> this function that used to be file static in that file to here and
>>> made it public. Why?", you do not want to use -C.
>>>
>>> I am reasonably sure that in the finished code later in the series
>>> it will become configurable, but a fallback default is better to be
>>> not so expensive one.
>>
>> The script's purpose is to find related commits, -CCC does that, does it not?
>
> As I already said in the above, the answer is no, when you are
> trying to find who moved the code from the original place.

But I'm not trying to find who moved the code, I'm trying to find
related commits; hence the name 'git related'.

The person who moved the code will be on the list regardless, so 'git
related' does find the person who moved the code indirectly; by
finding the persons that touched the code.

>>> Makes sense to start from the preimage so that you can find out who
>>> wrote the original block of lines your patch is removing.
>>>
>>> But then if source is /dev/null, wouldn't you be able to stop
>>> without running blame at all?  You know the patch is creating a new
>>> file at that point and there is nobody to point a finger at.
>>
>> A patch can touch multiple files.
>
> So?
>
> What line range would you be feeding "blame" with, for such a file
> creation event?

I thought it was skipping git blame in such cases, perhaps it got
dropped in one of the other countless versions of this script.

Good catch.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 22:43       ` Felipe Contreras
@ 2013-05-22 22:53         ` Junio C Hamano
  2013-05-22 22:58           ` Junio C Hamano
  2013-05-22 23:19           ` Felipe Contreras
  0 siblings, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2013-05-22 22:53 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Felipe Contreras <felipe.contreras@gmail.com> writes:

> On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano <gitster@pobox.com> wrote:
> ...
>> As I already said in the above, the answer is no, when you are
>> trying to find who moved the code from the original place.
>
> But I'm not trying to find who moved the code, I'm trying to find
> related commits; hence the name 'git related'.
>
> The person who moved the code will be on the list regardless,

That is exactly the point I have been trying to raise.  Does the
person appear in the list when you run blame with -CCC?  You ask for
the body of the function, and the -C mode of blame sees through the
block-of-line movement across file boundaries, and goes straight to
the last commit that touched the body of the function in its original
file, no?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 22:53         ` Junio C Hamano
@ 2013-05-22 22:58           ` Junio C Hamano
  2013-05-22 23:42             ` Junio C Hamano
  2013-05-22 23:19           ` Felipe Contreras
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2013-05-22 22:58 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Junio C Hamano <gitster@pobox.com> writes:

> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>> On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> ...
>>> As I already said in the above, the answer is no, when you are
>>> trying to find who moved the code from the original place.
>>
>> But I'm not trying to find who moved the code, I'm trying to find
>> related commits; hence the name 'git related'.
>>
>> The person who moved the code will be on the list regardless,
>
> That is exactly the point I have been trying to raise.  Does the
> person appear in the list when you run blame with -CCC?  You ask for

s/person/commit/;

> the body of the function, and the -C mode of blame sees through the
> block-of-line movement across file boundaries, and goes straight to
> the last commit that touched the body of the function in its original
> file, no?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 22:53         ` Junio C Hamano
  2013-05-22 22:58           ` Junio C Hamano
@ 2013-05-22 23:19           ` Felipe Contreras
  1 sibling, 0 replies; 27+ messages in thread
From: Felipe Contreras @ 2013-05-22 23:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Wed, May 22, 2013 at 5:53 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>> On Wed, May 22, 2013 at 5:38 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> ...
>>> As I already said in the above, the answer is no, when you are
>>> trying to find who moved the code from the original place.
>>
>> But I'm not trying to find who moved the code, I'm trying to find
>> related commits; hence the name 'git related'.
>>
>> The person who moved the code will be on the list regardless,
>
> That is exactly the point I have been trying to raise.  Does the
> person appear in the list when you run blame with -CCC?  You ask for
> the body of the function, and the -C mode of blame sees through the
> block-of-line movement across file boundaries, and goes straight to
> the last commit that touched the body of the function in its original
> file, no?

I'm not familiar how the different -C options work, but I'm testing
right now and I see the commit that moved a file with both -C and
-CCC, but strangely enough, not if it's the previous commit (with
both).

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 22:58           ` Junio C Hamano
@ 2013-05-22 23:42             ` Junio C Hamano
  2013-05-22 23:57               ` Felipe Contreras
  2013-05-23  3:23               ` Felipe Contreras
  0 siblings, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2013-05-22 23:42 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Junio C Hamano <gitster@pobox.com> writes:

>>> The person who moved the code will be on the list regardless,
>>
>> That is exactly the point I have been trying to raise.  Does the
>> person appear in the list when you run blame with -CCC?  You ask for
>
> s/person/commit/;
>
>> the body of the function, and the -C mode of blame sees through the
>> block-of-line movement across file boundaries, and goes straight to
>> the last commit that touched the body of the function in its original
>> file, no?

-- >8 --
cd /var/tmp/
git init blame
cd blame

cp /src/git/COPYING COPYING
git add COPYING
git commit -m initial

sed -ne 120,140p COPYING >EXTRACTING
git add EXTRACTING
git commit -m second

git blame -C -C -C EXTRACTING
-- 8< --

will show that all lines from EXTRACTING came from the original
revision of COPYING, and we will miss the line-move event.

"blame -C" with a single -C does not look at the files that did not
change in the commit that added these lines to EXTRACTING
(i.e. COPYING), so the digging stops there.

After this, if you do this:

-- >8 --
echo >>COPYING
git commit --amend -a --no-edit

git blame -C EXTRACTING
-- 8< --

then the commit that did add these lines to EXTRACTING touched
COPYING, and the origin of these lines are traced to it (this is
designed to be useful in a typical "refactor by moving code";
"cut and paste without changing the original" people need heavier
copy detection with more -C).

IIRC, git-gui runs two blames, one without any -C and one with (I do
not offhand recall how many -C it uses) to show both.

HTH.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 23:42             ` Junio C Hamano
@ 2013-05-22 23:57               ` Felipe Contreras
  2013-05-23  0:08                 ` Junio C Hamano
  2013-05-23  3:23               ` Felipe Contreras
  1 sibling, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-22 23:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Wed, May 22, 2013 at 6:42 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>>>> The person who moved the code will be on the list regardless,
>>>
>>> That is exactly the point I have been trying to raise.  Does the
>>> person appear in the list when you run blame with -CCC?  You ask for
>>
>> s/person/commit/;
>>
>>> the body of the function, and the -C mode of blame sees through the
>>> block-of-line movement across file boundaries, and goes straight to
>>> the last commit that touched the body of the function in its original
>>> file, no?
>
> -- >8 --
> cd /var/tmp/
> git init blame
> cd blame
>
> cp /src/git/COPYING COPYING
> git add COPYING
> git commit -m initial
>
> sed -ne 120,140p COPYING >EXTRACTING
> git add EXTRACTING
> git commit -m second
>
> git blame -C -C -C EXTRACTING
> -- 8< --

I just realized that -CCC does not do the same thing as -C -C -C.

> then the commit that did add these lines to EXTRACTING touched
> COPYING, and the origin of these lines are traced to it (this is
> designed to be useful in a typical "refactor by moving code";
> "cut and paste without changing the original" people need heavier
> copy detection with more -C).
>
> IIRC, git-gui runs two blames, one without any -C and one with (I do
> not offhand recall how many -C it uses) to show both.

'git blame' is a very expensive operation, perhaps we should add
another option so users don't need to run two blames to find this.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 23:57               ` Felipe Contreras
@ 2013-05-23  0:08                 ` Junio C Hamano
  2013-05-23  4:07                   ` Felipe Contreras
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2013-05-23  0:08 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Felipe Contreras <felipe.contreras@gmail.com> writes:

>> IIRC, git-gui runs two blames, one without any -C and one with (I do
>> not offhand recall how many -C it uses) to show both.
>
> 'git blame' is a very expensive operation, perhaps we should add
> another option so users don't need to run two blames to find this.

Yes, you could add a "struct origin *suspect_in_the_same_file" next
to the current "struct origin *suspect" to the blame_entry in order
to represent the origin you would get if you were to stop at a "move
across files" event, and keep digging, when you are operating under
"-C -C" or higher mode.

That would make the result just as expensive as a single run of "-C
-C" (or higher mode) [*1*] without having to run an extra "git
blame" without any -C (even though that mode is much cheaper).

Representing the result for a terminal with reasonable width might
become unwieldy, but for --porcelain output format that should not
be a problem.


[Footnote]

*1* It would make it slightly more expensive than the current code,
    as it is very likely that you have to split a single blame_entry
    into many separate pieces.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-22 23:42             ` Junio C Hamano
  2013-05-22 23:57               ` Felipe Contreras
@ 2013-05-23  3:23               ` Felipe Contreras
  2013-05-23  3:50                 ` Felipe Contreras
  1 sibling, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-23  3:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

This doesn't make any sense:

---
cd /tmp
rm -rf blame

git init blame
cd blame

cp ~/dev/git/COPYING COPYING
git add COPYING
git commit -m initial

sed -ne 120,140p COPYING >EXTRACTING
git add EXTRACTING
git commit -m second

git log --oneline
git blame -C EXTRACTING

echo >>COPYING
git commit --amend -a --no-edit

git log --oneline
git blame -C EXTRACTING
---

Why would the first instance find the blame in commit #2, and the
second one at commit #1? If anything, the second instance should have
more trouble finding the original commit.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23  3:23               ` Felipe Contreras
@ 2013-05-23  3:50                 ` Felipe Contreras
  2013-05-23 17:05                   ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-23  3:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Wed, May 22, 2013 at 10:23 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> This doesn't make any sense:

Ah, never mind, it's COPYING the one being modified, not EXTRACTING.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23  0:08                 ` Junio C Hamano
@ 2013-05-23  4:07                   ` Felipe Contreras
  2013-05-23  5:22                     ` Felipe Contreras
  0 siblings, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-23  4:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>>> IIRC, git-gui runs two blames, one without any -C and one with (I do
>>> not offhand recall how many -C it uses) to show both.
>>
>> 'git blame' is a very expensive operation, perhaps we should add
>> another option so users don't need to run two blames to find this.
>
> Yes, you could add a "struct origin *suspect_in_the_same_file" next
> to the current "struct origin *suspect" to the blame_entry in order
> to represent the origin you would get if you were to stop at a "move
> across files" event, and keep digging, when you are operating under
> "-C -C" or higher mode.

No, this is what I meant:

--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1509,16 +1509,6 @@ static void found_guilty_entry(struct blame_entry *ent)
        if (ent->guilty)
                return;
        ent->guilty = 1;
-       if (incremental) {
-               struct origin *suspect = ent->suspect;
-
-               printf("%s %d %d %d\n",
-                      sha1_to_hex(suspect->commit->object.sha1),
-                      ent->s_lno + 1, ent->lno + 1, ent->num_lines);
-               emit_one_suspect_detail(suspect, 0);
-               write_filename_info(suspect->path);
-               maybe_flush_or_die(stdout, "stdout");
-       }
 }

 /*
@@ -1536,12 +1526,24 @@ static void assign_blame(struct scoreboard *sb, int opt)
                struct origin *suspect = NULL;

                /* find one suspect to break down */
-               for (ent = sb->ent; !suspect && ent; ent = ent->next)
-                       if (!ent->guilty)
+               for (ent = sb->ent; ent; ent = ent->next)
+                       if (!ent->guilty) {
                                suspect = ent->suspect;
+                               break;
+                       }
+
                if (!suspect)
                        return; /* all done */

+               if (incremental &&
!is_null_sha1(suspect->commit->object.sha1)) {
+                       printf("%s %d %d %d\n",
+
sha1_to_hex(suspect->commit->object.sha1),
+                                       ent->s_lno + 1, ent->lno + 1,
ent->num_lines);
+                       emit_one_suspect_detail(suspect, 0);
+                       write_filename_info(suspect->path);
+                       maybe_flush_or_die(stdout, "stdout");
+               }
+
                /*
                 * We will use this suspect later in the loop,
                 * so hold onto it in the meantime.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23  4:07                   ` Felipe Contreras
@ 2013-05-23  5:22                     ` Felipe Contreras
  2013-05-23 16:54                       ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-23  5:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Wed, May 22, 2013 at 11:07 PM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Felipe Contreras <felipe.contreras@gmail.com> writes:
>>
>>>> IIRC, git-gui runs two blames, one without any -C and one with (I do
>>>> not offhand recall how many -C it uses) to show both.
>>>
>>> 'git blame' is a very expensive operation, perhaps we should add
>>> another option so users don't need to run two blames to find this.
>>
>> Yes, you could add a "struct origin *suspect_in_the_same_file" next
>> to the current "struct origin *suspect" to the blame_entry in order
>> to represent the origin you would get if you were to stop at a "move
>> across files" event, and keep digging, when you are operating under
>> "-C -C" or higher mode.
>
> No, this is what I meant:

But that would not print the right line numbers, so perhaps:

--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1500,25 +1500,16 @@ static int emit_one_suspect_detail(struct
origin *suspect, int repeat)
 	return 1;
 }

-/*
- * The blame_entry is found to be guilty for the range.  Mark it
- * as such, and show it in incremental output.
- */
-static void found_guilty_entry(struct blame_entry *ent)
+static void print_guilty_entry(struct blame_entry *ent)
 {
-	if (ent->guilty)
-		return;
-	ent->guilty = 1;
-	if (incremental) {
-		struct origin *suspect = ent->suspect;
-
-		printf("%s %d %d %d\n",
-		       sha1_to_hex(suspect->commit->object.sha1),
-		       ent->s_lno + 1, ent->lno + 1, ent->num_lines);
-		emit_one_suspect_detail(suspect, 0);
-		write_filename_info(suspect->path);
-		maybe_flush_or_die(stdout, "stdout");
-	}
+	struct origin *suspect = ent->suspect;
+
+	printf("%s %d %d %d\n",
+			sha1_to_hex(suspect->commit->object.sha1),
+			ent->s_lno + 1, ent->lno + 1, ent->num_lines);
+	emit_one_suspect_detail(suspect, 0);
+	write_filename_info(suspect->path);
+	maybe_flush_or_die(stdout, "stdout");
 }

 /*
@@ -1531,14 +1522,19 @@ static void assign_blame(struct scoreboard *sb, int opt)
 	struct rev_info *revs = sb->revs;

 	while (1) {
-		struct blame_entry *ent;
+		struct blame_entry *ent, tmp_ent;
 		struct commit *commit;
 		struct origin *suspect = NULL;
+		int found_guilty = 0;

 		/* find one suspect to break down */
-		for (ent = sb->ent; !suspect && ent; ent = ent->next)
-			if (!ent->guilty)
+		for (ent = sb->ent; ent; ent = ent->next)
+			if (!ent->guilty) {
+				tmp_ent = *ent;
 				suspect = ent->suspect;
+				break;
+			}
+
 		if (!suspect)
 			return; /* all done */

@@ -1564,9 +1560,20 @@ static void assign_blame(struct scoreboard *sb, int opt)
 			commit->object.flags |= UNINTERESTING;

 		/* Take responsibility for the remaining entries */
-		for (ent = sb->ent; ent; ent = ent->next)
-			if (same_suspect(ent->suspect, suspect))
-				found_guilty_entry(ent);
+		for (ent = sb->ent; ent; ent = ent->next) {
+			if (same_suspect(ent->suspect, suspect)) {
+				if (ent->guilty)
+					continue;
+				found_guilty = ent->guilty = 1;
+				if (incremental)
+					print_guilty_entry(ent);
+			}
+		}
+
+		if (incremental && !found_guilty &&
+				!is_null_sha1(suspect->commit->object.sha1))
+			print_guilty_entry(&tmp_ent);
+
 		origin_decref(suspect);

 		if (DEBUG) /* sanity */

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23  5:22                     ` Felipe Contreras
@ 2013-05-23 16:54                       ` Junio C Hamano
  2013-05-23 18:34                         ` Junio C Hamano
  2013-05-23 21:33                         ` Felipe Contreras
  0 siblings, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2013-05-23 16:54 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Felipe Contreras <felipe.contreras@gmail.com> writes:

> On Wed, May 22, 2013 at 11:07 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Felipe Contreras <felipe.contreras@gmail.com> writes:
>>>
>>>>> IIRC, git-gui runs two blames, one without any -C and one with (I do
>>>>> not offhand recall how many -C it uses) to show both.
>>>>
>>>> 'git blame' is a very expensive operation, perhaps we should add
>>>> another option so users don't need to run two blames to find this.
>>>
>>> Yes, you could add a "struct origin *suspect_in_the_same_file" next
>>> to the current "struct origin *suspect" to the blame_entry in order
>>> to represent the origin you would get if you were to stop at a "move
>>> across files" event, and keep digging, when you are operating under
>>> "-C -C" or higher mode.
>>
>> No, this is what I meant:
>
> But that would not print the right line numbers, so perhaps:

Users of full-output may want to be able to see the same thing.

I have a suspicion that you misread what assign_blame() does.  The
first inner loop to find one "suspect" is really what it says.  It
loops over blame entries in the scoreboard but it is not about
finding one "entry", but is to find one "suspect".  The "ent" found
in the loop that you store in tmp_ent is no more special than any
other "ent" that shares the same "suspect" as its origin.

Imagine that your scoreboard originally has three blocks of text
(i.e. blame_entry) A, B, and C, and the current suspect for A and C
are the same, while we already know what the final guilty party for
B is (which may be some descendant of the "suspect").

Once we find one "suspect" in the first inner loop, the call to
pass_blame() does everything it can do to exonerate that "suspect".

It runs a single diff between the suspect and each of its parents to
find lines in both A and C that can be blamed to one of these
parents, and blame entries A and C are split into pieces, some of
which still have the same suspect as their origin, which are where
their lines originate from, while others are attributable to these
parents or their ancestors.

If you keep the original entry for A to do something special, like
printing what the original range of A was before it was split by
pass_blame(), wouldn't you need to do the same for C?  We will not
be visiting C later in the outer while(1) loop, as a single call to
pass_blame() for "suspect" we did when we found it in A has already
taken care of it.

I am not sure what you are trying to achieve with that found-guilty
logic, even if the "save away in tmp_ent" logic is changed to keep
all the blame entries that have "suspect" we are looking at as their
origin.  When the current suspect is found to have passed all lines
intact from its parents, we will see found-guilty left to be false.

How would it make the original blame_entry (perhaps halfway) guilty
in that situation?

> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -1500,25 +1500,16 @@ static int emit_one_suspect_detail(struct
> origin *suspect, int repeat)
>  	return 1;
>  }
>
> -/*
> - * The blame_entry is found to be guilty for the range.  Mark it
> - * as such, and show it in incremental output.
> - */
> -static void found_guilty_entry(struct blame_entry *ent)
> +static void print_guilty_entry(struct blame_entry *ent)
>  {
> -	if (ent->guilty)
> -		return;
> -	ent->guilty = 1;
> -	if (incremental) {
> -		struct origin *suspect = ent->suspect;
> -
> -		printf("%s %d %d %d\n",
> -		       sha1_to_hex(suspect->commit->object.sha1),
> -		       ent->s_lno + 1, ent->lno + 1, ent->num_lines);
> -		emit_one_suspect_detail(suspect, 0);
> -		write_filename_info(suspect->path);
> -		maybe_flush_or_die(stdout, "stdout");
> -	}
> +	struct origin *suspect = ent->suspect;
> +
> +	printf("%s %d %d %d\n",
> +			sha1_to_hex(suspect->commit->object.sha1),
> +			ent->s_lno + 1, ent->lno + 1, ent->num_lines);
> +	emit_one_suspect_detail(suspect, 0);
> +	write_filename_info(suspect->path);
> +	maybe_flush_or_die(stdout, "stdout");
>  }
>
>  /*
> @@ -1531,14 +1522,19 @@ static void assign_blame(struct scoreboard *sb, int opt)
>  	struct rev_info *revs = sb->revs;
>
>  	while (1) {
> -		struct blame_entry *ent;
> +		struct blame_entry *ent, tmp_ent;
>  		struct commit *commit;
>  		struct origin *suspect = NULL;
> +		int found_guilty = 0;
>
>  		/* find one suspect to break down */
> -		for (ent = sb->ent; !suspect && ent; ent = ent->next)
> -			if (!ent->guilty)
> +		for (ent = sb->ent; ent; ent = ent->next)
> +			if (!ent->guilty) {
> +				tmp_ent = *ent;
>  				suspect = ent->suspect;
> +				break;
> +			}
> +
>  		if (!suspect)
>  			return; /* all done */
>
> @@ -1564,9 +1560,20 @@ static void assign_blame(struct scoreboard *sb, int opt)
>  			commit->object.flags |= UNINTERESTING;
>
>  		/* Take responsibility for the remaining entries */
> -		for (ent = sb->ent; ent; ent = ent->next)
> -			if (same_suspect(ent->suspect, suspect))
> -				found_guilty_entry(ent);
> +		for (ent = sb->ent; ent; ent = ent->next) {
> +			if (same_suspect(ent->suspect, suspect)) {
> +				if (ent->guilty)
> +					continue;
> +				found_guilty = ent->guilty = 1;
> +				if (incremental)
> +					print_guilty_entry(ent);
> +			}
> +		}
> +
> +		if (incremental && !found_guilty &&
> +				!is_null_sha1(suspect->commit->object.sha1))
> +			print_guilty_entry(&tmp_ent);
> +
>  		origin_decref(suspect);
>
>  		if (DEBUG) /* sanity */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23  3:50                 ` Felipe Contreras
@ 2013-05-23 17:05                   ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2013-05-23 17:05 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Felipe Contreras <felipe.contreras@gmail.com> writes:

> On Wed, May 22, 2013 at 10:23 PM, Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>> This doesn't make any sense:
>
> Ah, never mind, it's COPYING the one being modified, not EXTRACTING.

Yes.

The different levels of -C happens to correspond to the software
engineering practice from best to sloppy:

 - When you refactor to have a block of lines in a file , the
   original of that block would have come from another file (or the
   same file) you touched to remove duplication, so a single -C
   looks for an origin only from modified files (best).

 - When you start a new file, you may have to start from some
   boilerplate material that already exists in another file that is
   not (and does not have to be) changed in the commit you add that
   new file, so double -C -C looks for an origin of lines from other
   files, even unmodified ones, when looking at the lines in a new
   file (unfortunate but acceptable).

 - When you copy and paste without making any effort to refactor,
   you modify a file by adding new lines that match identically from
   another existing file, the latter of which does not change in the
   commit you do that copy and paste.  To find this, you need to
   look for an origin for any file from all the other files, and
   triple -C -C -C lets you do so (sloppy).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23 16:54                       ` Junio C Hamano
@ 2013-05-23 18:34                         ` Junio C Hamano
  2013-05-23 21:33                         ` Felipe Contreras
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2013-05-23 18:34 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Junio C Hamano <gitster@pobox.com> writes:

> Users of full-output may want to be able to see the same thing.

... but I agree we may be able to cheat if we only want to support
interactive, and we do want to find a way to cheat when we are
running interactive without having to store much more information in
each blame_entry.

> Imagine that your scoreboard originally has three blocks of text
> ...
> in that situation?

(I snipped everything I said in the previous reply only for
brevity but they are still relevant).

I _think_ one way to "cheat" without storing too much information in
each blame_entry is to first make a copy of all the original blame
entries that share the "suspect", see if any line in the line range
represented by each of the blame entries ended up being blamed to an
origin with a path different from that of the "suspect".  And print
those original blame entries that fits the criterion as additional
"these are not the final blame as you are digging with the option to
detect copy across files, but at this commit this line appeared anew
as far as the origin->path is concerned" output.

So I think you were going in the right direction (in other words,
the patch inserted new code to the right places), even though you
may have got the details in what the new code should do wrong.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23 16:54                       ` Junio C Hamano
  2013-05-23 18:34                         ` Junio C Hamano
@ 2013-05-23 21:33                         ` Felipe Contreras
  2013-05-23 21:52                           ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-23 21:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Thu, May 23, 2013 at 11:54 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>> On Wed, May 22, 2013 at 11:07 PM, Felipe Contreras
>> <felipe.contreras@gmail.com> wrote:
>>> On Wed, May 22, 2013 at 7:08 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>> Felipe Contreras <felipe.contreras@gmail.com> writes:
>>>>
>>>>>> IIRC, git-gui runs two blames, one without any -C and one with (I do
>>>>>> not offhand recall how many -C it uses) to show both.
>>>>>
>>>>> 'git blame' is a very expensive operation, perhaps we should add
>>>>> another option so users don't need to run two blames to find this.
>>>>
>>>> Yes, you could add a "struct origin *suspect_in_the_same_file" next
>>>> to the current "struct origin *suspect" to the blame_entry in order
>>>> to represent the origin you would get if you were to stop at a "move
>>>> across files" event, and keep digging, when you are operating under
>>>> "-C -C" or higher mode.
>>>
>>> No, this is what I meant:
>>
>> But that would not print the right line numbers, so perhaps:
>
> Users of full-output may want to be able to see the same thing.
>
> I have a suspicion that you misread what assign_blame() does.  The
> first inner loop to find one "suspect" is really what it says.  It
> loops over blame entries in the scoreboard but it is not about
> finding one "entry", but is to find one "suspect".  The "ent" found
> in the loop that you store in tmp_ent is no more special than any
> other "ent" that shares the same "suspect" as its origin.

It is stored because it's the only structure that has the line
numbers. If the blame is passed to another 'ent', the original line
numbers are gone. We could manually copy the line numbers to three
variables, or we could copy the whole thing to one variable. The rest
of the 'ent' doesn't matter.

> Imagine that your scoreboard originally has three blocks of text
> (i.e. blame_entry) A, B, and C, and the current suspect for A and C
> are the same, while we already know what the final guilty party for
> B is (which may be some descendant of the "suspect").

I don't see how that's possible, but whatever.

> Once we find one "suspect" in the first inner loop, the call to
> pass_blame() does everything it can do to exonerate that "suspect".
>
> It runs a single diff between the suspect and each of its parents to
> find lines in both A and C that can be blamed to one of these
> parents, and blame entries A and C are split into pieces, some of
> which still have the same suspect as their origin, which are where
> their lines originate from, while others are attributable to these
> parents or their ancestors.
>
> If you keep the original entry for A to do something special, like
> printing what the original range of A was before it was split by
> pass_blame(), wouldn't you need to do the same for C?

tmp_ent is only used when there's no entry taken the guilt away from
A. If the blame entry of A is split, and the result of the split has a
different filename, found_guilty() would not be called, and thus we
won't show the blame entry, and we want to show it.

And yes, we would need to do the same for C, and we would once the
turn of C comes along. If it's possible that A descendant from A takes
the blame for C, then sure, we would need to do ensure C is printed
before it's gone, but pass_blame(A) would not destroy C.

> We will not
> be visiting C later in the outer while(1) loop, as a single call to
> pass_blame() for "suspect" we did when we found it in A has already
> taken care of it.

It took care of A's suspect, but not C. Isn't it possible that C is
split further and creates another blame entry? Either way, in
--incremental we want to print C as well, whether it's the guilty one
or not.

> I am not sure what you are trying to achieve with that found-guilty
> logic, even if the "save away in tmp_ent" logic is changed to keep
> all the blame entries that have "suspect" we are looking at as their
> origin.

> When the current suspect is found to have passed all lines
> intact from its parents, we will see found-guilty left to be false.

What? When a 'blame entry' passes the whole blame to a parent,
found_guilty is false, so we print the entry, which is exactly what we
want to do.

> How would it make the original blame_entry (perhaps halfway) guilty
> in that situation?

I thought 'guilty' meant that it was guilty of adding those lines to
the 'final' text, so it ends up in the non-incremental blame. In that
sense those blame entries are not guilty.

But they are still blame entries, and we want to print all blame
entries in incremental, guilty or not.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23 21:33                         ` Felipe Contreras
@ 2013-05-23 21:52                           ` Junio C Hamano
  2013-05-23 21:58                             ` Felipe Contreras
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2013-05-23 21:52 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Felipe Contreras <felipe.contreras@gmail.com> writes:

>> Imagine that your scoreboard originally has three blocks of text
>> (i.e. blame_entry) A, B, and C, and the current suspect for A and C
>> are the same, while we already know what the final guilty party for
>> B is (which may be some descendant of the "suspect").
>
> I don't see how that's possible, but whatever.

The tree in your latest commit HEAD has a file with block of text A
followed by block of text B followed by block of text C.  The latest
commit was the one that added B (perhaps new lines were inserted, or
perhaps existing contiguous block of text was replaced, there is no
difference between these two cases).  You start digging the history
of this file from HEAD.

Your scoreboard begins with a single blame-entry that covers all
three segments, with its suspect set to HEAD.  Then pass_blame()
discovers that top and bottom segments are older than this latest
commit, and splits the originally-one blame entry into three blame
entries.  The first and the third blame entries cover the block A
and the block C respectively, and their suspect fields are both set
to HEAD^.  The second blame entry covers the block B and its suspect
does not change (it still is HEAD).  Then it returns to the caller.

The caller of pass_blame() looks at the scoreboard and finds these
three blame entries.  The second one's supect is still the original
suspect the caller asked pass_blame() to exonerate blames for, and
the suspect failed to do so for block B.  The final blame for the
middle block is now known to be HEAD.

After all of the above, the next iteration of while(1) loop begins.
That is how you arrive to the "whatever" situation.  You have three
blame entries, A, B and C, and suspect of A and C are the same,
while B has a different suspect.

Then in that "next iteration", we pick blame entry for A and decide
to see if HEAD^, which is the suspect for A, can be exonerated of
blames for _any_ (not just A) blame entry it currently is suspected
for.  We call pass_blame() and it will find and process both A and
C with a single "git diff-tree", attempting to pass blame to HEAD^^
and its ancestors.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23 21:52                           ` Junio C Hamano
@ 2013-05-23 21:58                             ` Felipe Contreras
  2013-05-23 22:44                               ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-23 21:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Thu, May 23, 2013 at 4:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>>> Imagine that your scoreboard originally has three blocks of text
>>> (i.e. blame_entry) A, B, and C, and the current suspect for A and C
>>> are the same, while we already know what the final guilty party for
>>> B is (which may be some descendant of the "suspect").
>>
>> I don't see how that's possible, but whatever.
>
> The tree in your latest commit HEAD has a file with block of text A
> followed by block of text B followed by block of text C.  The latest
> commit was the one that added B (perhaps new lines were inserted, or
> perhaps existing contiguous block of text was replaced, there is no
> difference between these two cases).  You start digging the history
> of this file from HEAD.
>
> Your scoreboard begins with a single blame-entry that covers all
> three segments, with its suspect set to HEAD.  Then pass_blame()
> discovers that top and bottom segments are older than this latest
> commit, and splits the originally-one blame entry into three blame
> entries.  The first and the third blame entries cover the block A
> and the block C respectively, and their suspect fields are both set
> to HEAD^.  The second blame entry covers the block B and its suspect
> does not change (it still is HEAD).  Then it returns to the caller.
>
> The caller of pass_blame() looks at the scoreboard and finds these
> three blame entries.  The second one's supect is still the original
> suspect the caller asked pass_blame() to exonerate blames for, and
> the suspect failed to do so for block B.  The final blame for the
> middle block is now known to be HEAD.
>
> After all of the above, the next iteration of while(1) loop begins.
> That is how you arrive to the "whatever" situation.  You have three
> blame entries, A, B and C, and suspect of A and C are the same,
> while B has a different suspect.
>
> Then in that "next iteration", we pick blame entry for A and decide
> to see if HEAD^, which is the suspect for A, can be exonerated of
> blames for _any_ (not just A) blame entry it currently is suspected
> for.  We call pass_blame() and it will find and process both A and
> C with a single "git diff-tree", attempting to pass blame to HEAD^^
> and its ancestors.

All right, my code still works in that situation.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23 21:58                             ` Felipe Contreras
@ 2013-05-23 22:44                               ` Junio C Hamano
  2013-05-23 22:59                                 ` Felipe Contreras
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2013-05-23 22:44 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Felipe Contreras <felipe.contreras@gmail.com> writes:

>> Then in that "next iteration", we pick blame entry for A and decide
>> to see if HEAD^, which is the suspect for A, can be exonerated of
>> blames for _any_ (not just A) blame entry it currently is suspected
>> for.  We call pass_blame() and it will find and process both A and
>> C with a single "git diff-tree", attempting to pass blame to HEAD^^
>> and its ancestors.
>
> All right, my code still works in that situation.

When HEAD^ was processed, pass_blame() finds that the first line in
A is attributable to HEAD^ (our current suspect) while the rest were
copied from a different file in HEAD^^ .  All lines in C are found
to be copy from a different file in HEAD^^.

Then your scoreboard would have:

  1. a blame entry that failed to pass blame to parents of HEAD^ (the
     first line in A), which still has suspect == HEAD^

  2. a blame entry that covers the remainder of A, blaming HEAD^^.

  3. a blame entry that covers all of B, whose final guilty party is
     known already.

  4. a blame entry that covers all of C, blaming a different file in
     HEAD^^.

Your "Take responsibility" loop goes over these blame entries, sets
found_guilty to true because of the first blame entry (the first
line of A), and calls print_guilty_entry() for blame entry 1,
showing that HEAD^ is guilty for the first line of A.

After the loop, your "if we did not find anybody guilty" logic does
not kick in, and the original line range for block A you saved in
tmp_ent is not used.

You lost the fact that the second and remainder of A were in this
file at the commit HEAD^ but not in HEAD^^ (i.e. these lines were
moved by HEAD^ from elsewhere).

The fact that HEAD^ touched _something_ is not lost, so if _all_ you
care about is "list all the commits, but I do not care about what
line range was modified how", you can claim it "working", but that
is a very limited definition of "working" and is not very reusable
or extensible in order to help those like gitk that currently have
to run two separate blames.  They need an output that lets them tell
between

 - this is the earliest we saw these lines in the path (it may have
   been copied from another path, but this entry in the incremental
   output stream is not about that final blame); and

 - this is the final blame where these lines came from, possibly
   from different path"

and coalesce both kind of origin..

Also the fact that the entire C was copied from elsewhere by HEAD^
is lost but that is the same issue.  The next round will not find
any blame entry that is !ent->guity because the call to pass_blame()
for HEAD^ already handled the final blame not just for blame entries
1 and 2, but also for 4 during this round.

If the change in HEAD^ in the above example were to copy the whole A
and C from a different file, then your !found_guilty logic would
kick in and say all lines of A where copied from elsewhere in HEAD^,
but again we would not learn the same information for C.

I do not think "I care only about a single bit per commit, i.e. if
the commit touched the path; screw everybody else who wants to
actually know where each line came from" can be called "working".

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23 22:44                               ` Junio C Hamano
@ 2013-05-23 22:59                                 ` Felipe Contreras
  2013-05-23 23:47                                   ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Felipe Contreras @ 2013-05-23 22:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Thu, May 23, 2013 at 5:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>>> Then in that "next iteration", we pick blame entry for A and decide
>>> to see if HEAD^, which is the suspect for A, can be exonerated of
>>> blames for _any_ (not just A) blame entry it currently is suspected
>>> for.  We call pass_blame() and it will find and process both A and
>>> C with a single "git diff-tree", attempting to pass blame to HEAD^^
>>> and its ancestors.
>>
>> All right, my code still works in that situation.
>
> When HEAD^ was processed, pass_blame() finds that the first line in
> A is attributable to HEAD^ (our current suspect) while the rest were
> copied from a different file in HEAD^^ .  All lines in C are found
> to be copy from a different file in HEAD^^.
>
> Then your scoreboard would have:
>
>   1. a blame entry that failed to pass blame to parents of HEAD^ (the
>      first line in A), which still has suspect == HEAD^
>
>   2. a blame entry that covers the remainder of A, blaming HEAD^^.
>
>   3. a blame entry that covers all of B, whose final guilty party is
>      known already.
>
>   4. a blame entry that covers all of C, blaming a different file in
>      HEAD^^.
>
> Your "Take responsibility" loop goes over these blame entries, sets
> found_guilty to true because of the first blame entry (the first
> line of A), and calls print_guilty_entry() for blame entry 1,
> showing that HEAD^ is guilty for the first line of A.
>
> After the loop, your "if we did not find anybody guilty" logic does
> not kick in, and the original line range for block A you saved in
> tmp_ent is not used.
>
> You lost the fact that the second and remainder of A were in this
> file at the commit HEAD^ but not in HEAD^^ (i.e. these lines were
> moved by HEAD^ from elsewhere).

> The fact that HEAD^ touched _something_ is not lost, so if _all_ you
> care about is "list all the commits, but I do not care about what
> line range was modified how", you can claim it "working",

The line range printed is correct.

> but that
> is a very limited definition of "working" and is not very reusable
> or extensible in order to help those like gitk that currently have
> to run two separate blames.  They need an output that lets them tell
> between
>
>  - this is the earliest we saw these lines in the path (it may have
>    been copied from another path, but this entry in the incremental
>    output stream is not about that final blame); and
>
>  - this is the final blame where these lines came from, possibly
>    from different path"
>
> and coalesce both kind of origin..

They can already do that by looking at the lines; if they get replaced
by another entry; they are not guilty.

Or we can add a 'guilty' line to the incremental output, that's trivial.

> Also the fact that the entire C was copied from elsewhere by HEAD^
> is lost but that is the same issue.  The next round will not find
> any blame entry that is !ent->guity because the call to pass_blame()
> for HEAD^ already handled the final blame not just for blame entries
> 1 and 2, but also for 4 during this round.

No, that's not true. If it's a different file the blame entry is not
found guilty.

> If the change in HEAD^ in the above example were to copy the whole A
> and C from a different file, then your !found_guilty logic would
> kick in and say all lines of A where copied from elsewhere in HEAD^,
> but again we would not learn the same information for C.

We would, when it's the turn for C, which is not guilty at this point.

> I do not think "I care only about a single bit per commit, i.e. if
> the commit touched the path; screw everybody else who wants to
> actually know where each line came from" can be called "working".

They can know where each line came from; the lines of each entry are
printed properly; that's the whole point of 'tmp_ent'.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23 22:59                                 ` Felipe Contreras
@ 2013-05-23 23:47                                   ` Junio C Hamano
  2013-05-24  0:22                                     ` Felipe Contreras
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2013-05-23 23:47 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

Felipe Contreras <felipe.contreras@gmail.com> writes:

>> If the change in HEAD^ in the above example were to copy the whole A
>> and C from a different file, then your !found_guilty logic would
>> kick in and say all lines of A where copied from elsewhere in HEAD^,
>> but again we would not learn the same information for C.
>
> We would, when it's the turn for C, which is not guilty at this point.

In _this_ round of the while(1) loop, pass_blame_to_parent() gets
the scoreboard and two origins (HEAD^ that we are looking at and
HEAD^^ that is its parent); it does not even know what blame entry
this request came from.

It runs a single diff using diff_hunks(), and asks blame_chunk() to
split all the blame entries in the scoreboard that suspect it is
looking at may be guilty for.  Blame entry for A and C are both
processed exactly the same way when HEAD^ is given to pass_blame()
for the first time, which is when assign_blame() decided to call it
with HEAD^ because it happened to have seen A before seeing C.  At
that point, both A and C are processed, and the post-processing loop
"Take responsibility for the remaining" will clean up remnants from
both A and C.  After this round ends, the suspect for A and C are
both set to HEAD^^.

In the next round of the while(1) loop, C already forgot that its
line movement happened in HEAD^.  Its suspect is now HEAD^^.  When
"it's the turn for C" [*1*], you can say "These lines originate in
that different path in HEAD^^", but it is too late to say "But the
first time they appeared in the original file was HEAD^" (which is
when they were moved from the different path in HEAD^^), isn't it?


[Footnote]

*1* It does not make much sense to say "turn for C", by the way.
    pass_blame() work per suspect (not per blame entry), so what you
    are talking about really is "when it is turn for HEAD^^", at
    which point we will discover that these lines in C originate in
    that different path in the commit.  And in the scenario we are
    discussing, we will decide to check HEAD^^ because the blame ent
    for A is also the same origin as C's suspect in HEAD^^, so
    technically it is still "turn for A" ;-).

    The sample would be like

	(cat A; cat C) >file0; git commit file0	 ; HEAD^^
	(cat A; cat C) >file1; git commit file1	 ; HEAD^
        (cat A; cat B; cat C) >file1; git commit file1 ; HEAD
        git blame -C -C -C file1 HEAD


        

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6] Add new git-related helper to contrib
  2013-05-23 23:47                                   ` Junio C Hamano
@ 2013-05-24  0:22                                     ` Felipe Contreras
  0 siblings, 0 replies; 27+ messages in thread
From: Felipe Contreras @ 2013-05-24  0:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Duy Nguyen

On Thu, May 23, 2013 at 6:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>>> If the change in HEAD^ in the above example were to copy the whole A
>>> and C from a different file, then your !found_guilty logic would
>>> kick in and say all lines of A where copied from elsewhere in HEAD^,
>>> but again we would not learn the same information for C.
>>
>> We would, when it's the turn for C, which is not guilty at this point.
>
> In _this_ round of the while(1) loop, pass_blame_to_parent() gets
> the scoreboard and two origins (HEAD^ that we are looking at and
> HEAD^^ that is its parent); it does not even know what blame entry
> this request came from.
>
> It runs a single diff using diff_hunks(), and asks blame_chunk() to
> split all the blame entries in the scoreboard that suspect it is
> looking at may be guilty for.  Blame entry for A and C are both
> processed exactly the same way when HEAD^ is given to pass_blame()
> for the first time, which is when assign_blame() decided to call it
> with HEAD^ because it happened to have seen A before seeing C.  At
> that point, both A and C are processed, and the post-processing loop
> "Take responsibility for the remaining" will clean up remnants from
> both A and C.  After this round ends, the suspect for A and C are
> both set to HEAD^^.
>
> In the next round of the while(1) loop, C already forgot that its
> line movement happened in HEAD^.  Its suspect is now HEAD^^.  When
> "it's the turn for C" [*1*], you can say "These lines originate in
> that different path in HEAD^^", but it is too late to say "But the
> first time they appeared in the original file was HEAD^" (which is
> when they were moved from the different path in HEAD^^), isn't it?

If that's the case then we'll need another list of blame entries where
each discarded blame entry goes. Given the luck of my previous obvious
patches, I'm not interested in implementing this non-obvious one in
the least.

The point is 'git related' should do -C -C -C, if 'git blame' doesn't
throw the right output, that's a bug in 'git blame' not 'git related.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2013-05-24  0:23 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-19 15:53 [PATCH v6] Add new git-related helper to contrib Felipe Contreras
2013-05-20  1:47 ` Eric Sunshine
2013-05-22 19:23 ` Junio C Hamano
2013-05-22 19:54   ` Junio C Hamano
2013-05-22 22:23   ` Felipe Contreras
2013-05-22 22:38     ` Junio C Hamano
2013-05-22 22:43       ` Felipe Contreras
2013-05-22 22:53         ` Junio C Hamano
2013-05-22 22:58           ` Junio C Hamano
2013-05-22 23:42             ` Junio C Hamano
2013-05-22 23:57               ` Felipe Contreras
2013-05-23  0:08                 ` Junio C Hamano
2013-05-23  4:07                   ` Felipe Contreras
2013-05-23  5:22                     ` Felipe Contreras
2013-05-23 16:54                       ` Junio C Hamano
2013-05-23 18:34                         ` Junio C Hamano
2013-05-23 21:33                         ` Felipe Contreras
2013-05-23 21:52                           ` Junio C Hamano
2013-05-23 21:58                             ` Felipe Contreras
2013-05-23 22:44                               ` Junio C Hamano
2013-05-23 22:59                                 ` Felipe Contreras
2013-05-23 23:47                                   ` Junio C Hamano
2013-05-24  0:22                                     ` Felipe Contreras
2013-05-23  3:23               ` Felipe Contreras
2013-05-23  3:50                 ` Felipe Contreras
2013-05-23 17:05                   ` Junio C Hamano
2013-05-22 23:19           ` Felipe Contreras

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.