From: Jonathan Nieder <jrnieder@gmail.com>
To: Bo Yang <struggleyb.nku@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	gitzilla@gmail.com, Alex Riesen <raa.lkml@gmail.com>,
	git@vger.kernel.org
Subject: Re: GSoC draft proposal: Line-level history browser
Date: Tue, 23 Mar 2010 13:57:45 -0500
Message-ID: <20100323185745.GA1382@progeny.tock>
In-Reply-To: <41f08ee11003222301y569a5972q3c67d10c77abe27a@mail.gmail.com>

Hi,

[reordering quoted text for convenience]

Bo Yang wrote:

> I can't fully understand your strategy above. I think we can
> categorize the code changes into two cases:

Thanks!  What you said is much more coherent than the vague things I
wrote.

> 2. The diff looks like:
[...]
> This means the code here is added from scratch. Here, I think we have
> three options.
> 1. Find out whether the new code was moved here from another place.
> 2. Find out whether the new code was copied from another place.
> 3. We have found the end of the history, so stop here.

If the code is copied verbatim from elsewhere, this is something ‘git
blame’ is already very good at.  See [1].

Fuzzy matching is a big pain.  ‘git blame’ knows how to ignore
whitespace.  Dscho suggested counting common words.  Maybe there are
some other ways.  I think there is a real danger of getting lost in this
problem and wasting a lot of time, so although it is very interesting, I
would consider any progress in this area a bonus rather than a goal.
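
To make the "counting common words" idea a little more concrete, here is
a rough sketch in Python.  It is not git code, and the names
(similarity, best_match) and the 0.4 threshold are made up for
illustration.  It ignores whitespace and punctuation and scores two
lines by the fraction of distinct words they share.

```python
import re

def words(line):
    """Return the set of word-like tokens in a line.

    Whitespace and punctuation are ignored, so re-indented or
    re-spaced lines produce the same set.
    """
    return set(re.findall(r"\w+", line))

def similarity(a, b):
    """Jaccard similarity of the word sets of two lines (0.0 to 1.0)."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0  # two blank lines count as identical
    return len(wa & wb) / len(wa | wb)

def best_match(line, candidates, threshold=0.4):
    """Return the index of the most similar candidate line.

    Returns None when no candidate reaches the (arbitrary) threshold.
    """
    best, best_score = None, 0.0
    for i, cand in enumerate(candidates):
        score = similarity(line, cand)
        if score > best_score:
            best, best_score = i, score
    return best if best_score >= threshold else None
```

A real implementation would need something much smarter than this
(weighting rare words, for example), which is part of why I would treat
it as a bonus.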

> 1. The diff looks like:
> 
> @@ -1008,29 +1000,29 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
>                 add_signoff = xmemdupz(committer, endpos - committer + 1);
>         }
> 
> -       for (i = 0; i < extra_hdr_nr; i++) {
> -               strbuf_addstr(&buf, extra_hdr[i]);
> +       for (i = 0; i < extra_hdr.nr; i++) {
> +               strbuf_addstr(&buf, extra_hdr.items[i].string);
>                 strbuf_addch(&buf, '\n');
>         }
> 
> 
> i.e., there is both a deletion and an addition in the change, which
> means some lines of the code were modified. So we would trace the two
> 'minus' lines to find another diff, and then continue tracing
> recursively from that diff.

If you can make a heuristic along these lines work well, I think it
would be great.  I imagine it might work very well for commits that made
nice, small changes (like many of those in git.git).  Jakub pointed out
some of the difficulties, but I like to hope your idea of “when in doubt,
include more lines” may still work well in many cases in git.git.
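
For what it is worth, here is how I picture one step of that tracing,
as a rough Python sketch rather than anything resembling git's
internals.  The function name, the shortened example hunk and its line
numbers are all invented for illustration.  Given one unified-diff hunk
and a range of lines in the new file, it returns the old-file lines to
keep tracing: context lines map one to one, and if the range touches any
'plus' line of a change block, all 'minus' lines of that block are
included (the “when in doubt, include more lines” rule).

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")

def map_range_to_parent(hunk_lines, start, end):
    """Map new-file lines start..end (inclusive) to old-file line numbers.

    hunk_lines is a list: the '@@' header followed by the hunk body.
    """
    m = HUNK_RE.match(hunk_lines[0])
    if m is None:
        raise ValueError("not a hunk header: %r" % hunk_lines[0])
    old_no, new_no = int(m.group(1)), int(m.group(2))
    traced = set()
    removed = []      # old line numbers of '-' lines in the current block
    touched = False   # does the tracked range hit a '+' line in the block?
    # The trailing None acts as a sentinel that flushes the last block.
    for line in hunk_lines[1:] + [None]:
        if line is not None and line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        if line is not None and line.startswith("-"):
            removed.append(old_no)
            old_no += 1
        elif line is not None and line.startswith("+"):
            touched = touched or start <= new_no <= end
            new_no += 1
        else:
            # A context line (or the sentinel) ends the change block.
            if touched:
                traced.update(removed)
            removed, touched = [], False
            if line is not None:
                if start <= new_no <= end:
                    traced.add(old_no)
                old_no += 1
                new_no += 1
    return sorted(traced)

# Shortened, hypothetical hunk modeled on the quoted diff.
EXAMPLE_HUNK = [
    "@@ -10,5 +8,5 @@",
    "         }",
    " ",
    "-        for (i = 0; i < extra_hdr_nr; i++) {",
    "-                strbuf_addstr(&buf, extra_hdr[i]);",
    "+        for (i = 0; i < extra_hdr.nr; i++) {",
    "+                strbuf_addstr(&buf, extra_hdr.items[i].string);",
    "                 strbuf_addch(&buf, '\\n');",
]
```

Calling map_range_to_parent(EXAMPLE_HUNK, 11, 11) asks about the new
strbuf_addstr() line only, and the sketch hands back both 'minus' lines
because it cannot tell which one the new line came from.  A block with
only 'plus' lines yields nothing, which corresponds to your "added from
scratch" case.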

Good luck, and thank you for taking my crazy ideas seriously. :)

Regards,
Jonathan

[1] See v1.4.4-rc1~2 (Merge branch 'jc/pickaxe', 2006-11-07) and the
commits preceding it.  About that series, Junio wrote:

	Actually the plan is to make it do _true_ pickaxe,
	although it will most likely end up either in dustbin or
	replace blame.

It replaced blame.

I am not actually sure, but I assume “true pickaxe” refers to the
goals described in <http://gitster.livejournal.com/35628.html>
and the linked-to message.
