GSoC draft proposal: Line-level history browser

From: Bo Yang @ 2010-03-20  9:18 UTC
  To: git

Hi,

I am very interested in the 'Line-level history browser' project.
After a few days of consideration, I have written a draft of my
proposal, and I think it would be helpful to send it to the list
before submitting it. Could you please give me some advice?

-----------------------------------------------
Draft proposal: Line-level History Browser

=====Purpose of this project=====
"git blame" can tell us who is responsible for a line of code, but it
cannot show in detail how those lines of code have evolved into their
current form.
This project will add a new utility to git called 'git line-log'. It
can trace the history of any line range of a given file at any
revision. For example, a user can run 'git line-log builtin/diff.c
6..8' to get the change history of lines 6 through 8 of
builtin/diff.c. For each history entry, it will show the commit and
the diff block that contains the changes to the lines of interest.
This utility will trace the whole modification history of the lines of
interest and stop when it finds the root of the lines, that is, the
point where all of the code was added from scratch. Users can also
specify how deep the utility should trace. The tool will treat a code
move like a modification, so it will follow code moved within one
file.
Note that the history may not always be a single thread of commits.
If more than one commit produced the specified line range, the thread
of history splits. In that case the utility will stop, show all of
those commits with their code changes, and let the user select which
one to trace next.

=====Work and technical issues=====
==Command options==
This new tool is for exploring the history of changes to a given line
range in one file.

git line-log [options] <file> <line range>

Options:
1. Since the tool outputs commit descriptions, an option will control
whether to show the whole commit message or just a short title.
2. An option to choose between displaying only the diff blocks for the
lines of interest [default] and displaying the whole diff with the
area of interest highlighted in color.
3. The maximum depth to trace into the commit history.
4. The revision of <file>. This is very useful when the current line
range of interest was produced by more than one commit. The user can
use this option to specify the file revision and trace down from that
revision and line range.

<line range>
Its format should be <start pos>..<end pos> or just a <line number>.
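
As an illustration of the <line range> format above, here is a minimal
sketch of a parser in C. The names struct line_range and
parse_line_range are hypothetical and not part of git. The sketch also
assumes that line numbers are 1-based and that the range is inclusive,
which the proposal does not state explicitly.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: an inclusive, 1-based line range. */
struct line_range {
	long start;
	long end;
};

/*
 * Parse one positive decimal number and advance *s past it.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_line_number(const char **s, long *out)
{
	char *end;
	long v;

	if (**s < '0' || **s > '9')
		return -1;	/* reject signs, spaces and empty input */
	errno = 0;
	v = strtol(*s, &end, 10);
	if (errno || v < 1)
		return -1;	/* overflow, or line number below 1 */
	*out = v;
	*s = end;
	return 0;
}

/*
 * Parse "<start pos>..<end pos>" or a single "<line number>" into *r.
 * Returns 0 on success, -1 on malformed input or if start > end.
 */
int parse_line_range(const char *arg, struct line_range *r)
{
	const char *p = arg;

	if (parse_line_number(&p, &r->start))
		return -1;
	if (*p == '\0') {
		r->end = r->start;	/* a single line */
		return 0;
	}
	if (strncmp(p, "..", 2))
		return -1;
	p += 2;
	if (parse_line_number(&p, &r->end) || *p != '\0')
		return -1;
	return r->start <= r->end ? 0 : -1;
}
```

With this sketch, "6..8" would give start 6 and end 8, and a bare "42"
would give a range covering only line 42.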

==Design and implementation==
Git stores whole blobs rather than code deltas, so we should traverse
the commit history and directly access the tree/blob objects to
compute the code delta and search for the diff that contains the
lines of interest. Since git uses libxdiff to generate its diffs, we
should iterate through all of xdiff's diff blocks and find what the
code looked like before the commit. This gives us a new line range,
which is the original code before this commit. We then start another
search from that commit with the new line range. Recursively, we can
find the whole modification history. We stop when we find that the
current line range of interest was added from scratch and was not
moved from another place in the file. We may also stop the traversal
when we reach the maximum search depth. Also, if the thread of change
history splits into two or more commits, we stop and show the user all
the related commits and the corresponding line ranges.
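
The step of turning a line range after a commit into the line range
before it can be sketched as follows. This is only an illustration
under stated assumptions, not a design decision: struct hunk and
map_range_to_parent are hypothetical names, the hunks are assumed to
be sorted, non-overlapping and without context lines, and for a hunk
with a zero count the start field is assumed to hold the position at
which the lines were inserted or deleted. Code moves are not handled
here.

```c
#include <stddef.h>

/*
 * Hypothetical hunk record: old lines [old_start, old_start + old_count)
 * were replaced by new lines [new_start, new_start + new_count).
 */
struct hunk {
	long old_start, old_count;
	long new_start, new_count;
};

/*
 * Map the inclusive new-side range [start, end] to the old side.
 * Lines inside a changed hunk are widened to the hunk's whole old side.
 * Returns 1 and fills *old_start / *old_end if any old lines correspond,
 * or 0 if the whole range was added from scratch by this diff.
 */
int map_range_to_parent(const struct hunk *hunks, size_t nr,
			long start, long end,
			long *old_start, long *old_end)
{
	long shift = 0;	/* old line = new line + shift, outside hunks */
	long os = 0, oe = 0;
	int have_start = 0, have_end = 0;
	size_t i;

	for (i = 0; i < nr && !have_end; i++) {
		const struct hunk *h = &hunks[i];
		long new_last = h->new_start + h->new_count - 1;
		long old_last = h->old_start + h->old_count - 1;

		if (!have_start && start <= new_last) {
			/* before the hunk: shift; inside it: hunk's old start */
			os = start < h->new_start ? start + shift : h->old_start;
			have_start = 1;
		}
		if (end <= new_last) {
			/* before the hunk: shift; inside it: hunk's old end */
			oe = end < h->new_start ? end + shift : old_last;
			have_end = 1;
		}
		shift = old_last - new_last;	/* offset after this hunk */
	}
	if (!have_start)
		os = start + shift;
	if (!have_end)
		oe = end + shift;

	if (os > oe)
		return 0;	/* empty old range: the lines are new */
	*old_start = os;
	*old_end = oe;
	return 1;
}
```

Repeating this step commit by commit gives the recursive search
described above; a return value of 0 corresponds to the "added from
scratch" stopping condition.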

On the implementation side, this tool depends heavily on libxdiff,
because we will search xdiff's output for the lines of interest to
find the right diff hunk to display and trace down. So how we search
xdiff's diff blocks is very important. After reading some of the
libxdiff documentation and code, I found that libxdiff outputs all the
diff blocks as strings into a memory file. Parsing the diff block
strings to find the changed lines would be very inefficient. So I
suggest changing xdiff's xdl_diff function to let it store some
metadata for each diff hunk. I think this will help the performance of
this tool considerably.
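
To make the idea of per-hunk metadata more concrete, here is a rough
sketch of what such a record and its consumer could look like. All of
the names (struct hunk_meta, struct hunk_list, record_hunk,
hunk_touches_range) are hypothetical; nothing here is an existing
xdiff interface, and how xdl_diff would actually hand the data to the
caller is still open. The sketch assumes 1-based line numbers and
inclusive ranges.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-hunk metadata: line positions only, no diff text. */
struct hunk_meta {
	long old_start, old_count;
	long new_start, new_count;
};

/* Hypothetical growable list that collects the metadata of one diff. */
struct hunk_list {
	struct hunk_meta *items;
	size_t nr, alloc;
};

/*
 * Hypothetical callback that the diff engine would call once per hunk.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
int record_hunk(void *priv, long old_start, long old_count,
		long new_start, long new_count)
{
	struct hunk_list *list = priv;

	if (list->nr == list->alloc) {
		size_t alloc = list->alloc ? 2 * list->alloc : 16;
		struct hunk_meta *items =
			realloc(list->items, alloc * sizeof(*items));
		if (!items)
			return -1;
		list->items = items;
		list->alloc = alloc;
	}
	list->items[list->nr].old_start = old_start;
	list->items[list->nr].old_count = old_count;
	list->items[list->nr].new_start = new_start;
	list->items[list->nr].new_count = new_count;
	list->nr++;
	return 0;
}

/*
 * Does this hunk change anything inside the new-side range [start, end]?
 * A pure deletion (new_count == 0) counts only if the deleted lines sat
 * strictly between two lines of the range.
 */
int hunk_touches_range(const struct hunk_meta *h, long start, long end)
{
	if (h->new_count == 0)
		return start < h->new_start && h->new_start <= end;
	return h->new_start <= end &&
	       start <= h->new_start + h->new_count - 1;
}
```

With records like these, finding the hunks to display is a comparison
of a few integers per hunk, and no diff text has to be parsed.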

Generally,
1. xdiff/xdiffi.c will be changed so that xdl_diff stores the desired
metadata and passes it to the caller.
2. builtin/line-log.c will be added to implement most of the new
features; the most important function there will probably be
cmd_linelog.
3. git.c will be changed to add this new utility to the front end.
4. The documentation will be updated to introduce this new tool.

=====About me=====
I am Bo Yang, a Chinese graduate student majoring in Computer Science
at NanKai University. I started using open source software five years
ago and began contributing code to the open source community three
years ago. I have contributed to Mozilla/Mingw/Netsurf.
Technically, I am experienced in C and Bash shell scripting. I took
part in last year's GSoC with the Netsurf project, where I completed
most of a DOM library in C.
I began using Git for source code revision control about two years
ago. I use Git to track my Mozilla trunk source code, because updating
Mozilla code by CVS at my school is very slow. So I wrote a script
that automatically updates the trunk with CVS on the server at
midnight, when the network is fast, and I then use Git to maintain the
code. On my PC I use Git to clone/update the source code from my local
server, which is very fast. I also use Git to track my changes to the
code and some bug fixes. I think it is an excellent tool for managing
branches and history.
Git is my favorite daily tool for revision control. I have much
experience with it, have read "Git Internals", and have some basic
knowledge of Git's code base. I think the line-level history browser
project suits me well, and that I can make a good start in the Git
community with it.
-----------------------------------------------

Any feedback will be very much appreciated. Thanks a lot!

Regards!
Bo
-- 
My blog: http://blog.morebits.org
