From: Jakub Narebski <jnareb@gmail.com>
To: Bo Yang <struggleyb.nku@gmail.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	gitzilla@gmail.com, Alex Riesen <raa.lkml@gmail.com>,
	git@vger.kernel.org
Subject: Re: GSoC draft proposal: Line-level history browser
Date: Tue, 23 Mar 2010 03:08:28 -0700 (PDT)
Message-ID: <m38w9jjqqd.fsf@localhost.localdomain>
In-Reply-To: <41f08ee11003222301y569a5972q3c67d10c77abe27a@mail.gmail.com>


Bo Yang <struggleyb.nku@gmail.com> writes:
> Jonathan Nieder <jrnieder@gmail.com> writes:

> > Hmm, I can imagine some (mutually inconsistent) heuristics:
> >
> >  - Suppose in the blamed commit a single isolated line changed.  Then
> >   it is clear where to look next.
> >
> >  - If the mystery code is at the beginning of the file (resp.
> >   beginning of a diff -C0 hunk), maybe it was based on the line at the
> >   same position within the previous commit.
> >
> >  - Take the line with the lowest Levenshtein distance from the mystery
> >   code.
> >
> >  - Expect certain common patterns of change: substituted words,
> >   whitespace changes, added arguments for a function, things like that.
> >
> > That said, I still don't have a clear picture of a basic strategy.
> 
> I can't fully understand your strategy above. I think we can
> categorize code changes into two cases:
>
> 1. The diff looks like this:
> 
> @@ -1008,29 +1000,29 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
>                 add_signoff = xmemdupz(committer, endpos - committer + 1);
>         }
> 
> -       for (i = 0; i < extra_hdr_nr; i++) {
> -               strbuf_addstr(&buf, extra_hdr[i]);
> +       for (i = 0; i < extra_hdr.nr; i++) {
> +               strbuf_addstr(&buf, extra_hdr.items[i].string);
>                 strbuf_addch(&buf, '\n');
>         }

Errr... how does the first line in the preimage differ from the first
line in the postimage?  They look as if they are the same:

  -       for (i = 0; i < extra_hdr_nr; i++) {
  +       for (i = 0; i < extra_hdr.nr; i++) {

> 
> i.e. there are both deletions and additions in a change. This means we
> modified some lines of the code. So what we do is trace the two
> 'minus' lines until we find another diff, and then start tracing from
> that diff recursively.
>
> Yes, the newly added code may also have been moved or copied from
> another place. But I think here we should focus on the lines before
> this changeset.
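
To make the "trace the 'minus' lines" step concrete, here is a minimal
Python sketch of how the preimage line numbers of the removed lines could
be read out of one unified-diff hunk. It is only an illustration under
stated assumptions: removed_line_numbers is a hypothetical helper, not
anything in git, and the hunk below is the abbreviated one quoted above
(so it has fewer lines than its header claims).

```python
import re

# Matches "@@ -old_start[,old_count] +new_start[,new_count] @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def removed_line_numbers(hunk):
    """Return the preimage line numbers of the '-' lines in one hunk."""
    match = HUNK_HEADER.match(hunk[0])
    if match is None:
        raise ValueError("first line is not a hunk header")
    old_lineno = int(match.group(1))
    removed = []
    for line in hunk[1:]:
        if line.startswith("-"):
            removed.append(old_lineno)  # present only in the preimage
            old_lineno += 1
        elif line.startswith("+") or line.startswith("\\"):
            continue  # postimage-only line, or "\ No newline" marker
        else:
            old_lineno += 1  # context line, present in both images
    return removed


# Abbreviated hunk from the example above (hypothetical test input).
hunk = [
    "@@ -1008,29 +1000,29 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)",
    "                add_signoff = xmemdupz(committer, endpos - committer + 1);",
    "        }",
    "",
    "-       for (i = 0; i < extra_hdr_nr; i++) {",
    "-               strbuf_addstr(&buf, extra_hdr[i]);",
    "+       for (i = 0; i < extra_hdr.nr; i++) {",
    "+               strbuf_addstr(&buf, extra_hdr.items[i].string);",
    "                strbuf_addch(&buf, '\\n');",
    "        }",
]
lines_to_trace = removed_line_numbers(hunk)
```

The returned numbers are the lines one would look up in the parent commit
before repeating the same step on the next diff that touches them.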

The problem is when you are asking about tracking a subset of the lines
that appear in the postimage of a patch.  For example, if we ask for
the history of the

                  strbuf_addstr(&buf, extra_hdr.items[i].string);

line, should we also track the history of the

          for (i = 0; i < extra_hdr.nr; i++) {

line, which appears in the relevant diff chunk?  If not, how should we
detect which line in the preimage (if any) corresponds to a given line
in the postimage?
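
One of the heuristics Jonathan listed, taking the preimage line with the
lowest Levenshtein distance, can be sketched in a few lines of Python.
This is an assumption-laden illustration rather than a proposal for the
real implementation: levenshtein and closest_preimage_line are
hypothetical helpers, and comparing whitespace-stripped lines is my own
choice.

```python
def levenshtein(a, b):
    """Edit distance counting single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def closest_preimage_line(target, preimage):
    """Return (index, distance) of the preimage line nearest to target."""
    distances = [levenshtein(target.strip(), line.strip()) for line in preimage]
    best = min(range(len(preimage)), key=distances.__getitem__)
    return best, distances[best]


# The two 'minus' lines of the first example hunk.
preimage = [
    "        for (i = 0; i < extra_hdr_nr; i++) {",
    "                strbuf_addstr(&buf, extra_hdr[i]);",
]
target = "                strbuf_addstr(&buf, extra_hdr.items[i].string);"
index, distance = closest_preimage_line(target, preimage)
```

For this hunk the strbuf_addstr line in the preimage wins, so only that
line would be followed further back. The heuristic says nothing about
what to do when two preimage lines are equally close, or when none is
close enough to count as a match.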

> 2. The diff looks like:
> 
> @@ -879,9 +885,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>         opt.regflags = REG_NEWLINE;
>         opt.max_depth = -1;
> 
> +       strcpy(opt.color_context, "");
>         strcpy(opt.color_filename, "");
> +       strcpy(opt.color_function, "");
>         strcpy(opt.color_lineno, "");
>         strcpy(opt.color_match, GIT_COLOR_BOLD_RED);
> 
> This means the code here was added from scratch. Here, I think we have
> three options:
> 1. Find out whether the new code was moved here from another place.
> 2. Find out whether the new code was copied from another place.
> 3. We have found the end of the history, so stop here.
> 
> The problem remains of how to find the copied/moved code. The newly
> added code may have been copied/moved from multiple places with small
> changes.

I guess that you could take a look at how git-blame handles
this... but I think you would get something like a generalization of
an ordinary patch, where the preimage of a chunk can come from a
different place / different file.
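
As a rough illustration of a chunk whose preimage comes from a different
place, here is a Python sketch that looks for an added block verbatim
(ignoring surrounding whitespace) in the parent's versions of other
files. It is not how git-blame does it; find_block_origins and the file
name "old.c" are made up for the example, and an exact match will miss
code that was copied "with little changes".

```python
def find_block_origins(added, preimages):
    """Return (filename, 1-based start line) for each verbatim occurrence.

    added     -- the '+' lines of a hunk, without the leading '+'
    preimages -- mapping of file name to that file's lines in the parent
    """
    needle = [line.strip() for line in added]
    if not needle:
        return []
    hits = []
    for name, lines in preimages.items():
        stripped = [line.strip() for line in lines]
        for start in range(len(stripped) - len(needle) + 1):
            if stripped[start:start + len(needle)] == needle:
                hits.append((name, start + 1))
    return hits


# Hypothetical parent-commit contents, for illustration only.
preimages = {
    "old.c": [
        "        opt.max_depth = -1;",
        '        strcpy(opt.color_function, "");',
        "        return 0;",
    ],
}
added = ['       strcpy(opt.color_function, "");']
origins = find_block_origins(added, preimages)
```

Each hit is a candidate preimage for the added chunk. If there are no
hits, we are in the "added from scratch" case and the history ends
there.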


P.S. I like it that you provide real-life examples.  They really help
     with understanding what you are talking about.
-- 
Jakub Narebski
Poland
ShadeHawk on #git
