* GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-20 9:18 UTC
To: git

Hi,

I am very interested in the project 'Line-level history browser'. After some days of consideration, I have made up a draft of my proposal, and I think it is helpful to send it to the list before submitting it. Could you please give me some advice?

-----------------------------------------------
Draft proposal: Line-level History Browser

=====Purpose of this project=====

"git blame" can tell us who is responsible for a line of code, but it cannot help if we want to see in detail how the lines of code have evolved into what they are now.

This project will add a new utility for git called 'git line-log'. It can trace the history of any line range of a certain file at any revision. For simplicity, users can run a command like 'git line-log builtin/diff.c 6..8' and get the change history of the code between line 6 and line 8 of the diff.c file. For each history entry, it will provide the commit and the diff block which contains the changes to the lines the user is interested in.

This utility will trace the whole modification history of the lines of interest and stop only when it finds the root of the lines, that is, the point where all the code is added from scratch. Users can also specify how deeply they want the utility to trace. The tool will treat a code move just like a modification, so it will follow code moves inside one file.

Note that the history may not always be a single thread of commits. If more than one commit produced the specified line range, the thread of history will split. In that case the utility will stop and present all the commits with their code changes, and let the user select which one to trace next.
=====Work and technical issues=====

==Command options==

This new tool should be used for exploring the history of changes to a certain line range of code in one file.

    git line-log [options] <file> <line range>

Options:
1. Since the tool outputs commit descriptions, there will be an option to control whether we show the whole commit message or just a short title.
2. An option to choose whether we display only the diff block containing the lines of interest [default], or the whole diff with the area of interest highlighted in color.
3. The maximum depth we trace into the commit history.
4. The revision of the <file>. This is very useful when the current line range of interest was produced by more than one commit. The user can use this option to specify the file revision and trace down from that revision and line range.

<line range>
Its format should be <start pos>..<end pos> or just a <line number>.

==Design and implementation==

Git stores whole blobs instead of code deltas, so we should traverse the commit history and directly access the tree/blob objects to compute the code delta and search for the diff which contains the lines of interest. Since git uses libxdiff to produce its diffs, we should iterate through all of xdiff's diff blocks and find what the code looked like before the commit. Here we will find a new line range, which is the original code before this commit. Then we start another search from the current commit and the new line range. Recursively, we can find the whole modification history.

We will stop when we find that the current line range of interest was added from scratch and was not moved from another place in the file. We may also stop the traversal when we reach the maximum search depth. Also, if the thread of change history splits into two or more commits, we stop and present all the related commits and the corresponding line ranges to the user.

As for the implementation, this tool heavily depends on libxdiff.
This is because we will search xdiff's output for the lines of interest, to find the right diff hunk to display and trace down. So how we search xdiff's diff blocks is very important. After reading some libxdiff documentation and code, I find that libxdiff outputs all the diff blocks as strings into a memory file. Parsing the diff block strings to find the changed lines would be very inefficient. So I suggest changing xdiff's xdl_diff function to let it store some metadata for each diff hunk. I think this will be very helpful for the performance of this tool.

Generally,
1. xdiff/xdiffi.c will be changed to make xdl_diff store the desired metadata and pass it to the caller.
2. builtin/line-log.c will be added to implement most of the new features; the most important function here may be cmd_linelog.
3. git.c will be changed to add this new utility to the front end.
4. The documentation will be updated to introduce this new tool.

=====About me=====

I am Bo Yang, a Chinese graduate student majoring in Computer Science at NanKai University. I started using open source software five years ago and began to contribute code to the open source community three years ago. I have contributed to Mozilla/Mingw/Netsurf. Technically, I am experienced in C and Bash shell. I took part in last year's GSoC with the Netsurf project, where I completed most of a DOM library in C.

I began to use git for source code revision control about two years ago. I use Git to track my Mozilla trunk source code, because updating the Mozilla code by CVS at my school is very slow. So I wrote a script that automatically updates the trunk with CVS at midnight, when the network is fast, on a server, and then uses Git to maintain the code. Then I use Git on my PC to clone/update the source code from my local server, and that is very fast. I use Git to track my changes to the code and some bug fixes. It is an excellent tool for branches and history, I think. Git is my beloved daily tool for revision control.
I have much experience with it, have read "Git Internals", and have also gained some basic knowledge of Git's code base. I think the line-level history explorer is really suitable for me and that I can make a good start in the Git community with this project.
-----------------------------------------------

Any feedback from you will be appreciated very much, thanks a lot!

Regards!
Bo
--
My blog: http://blog.morebits.org
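The backward-tracing step described under "Design and implementation" can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python and uses the standard `difflib` module in place of libxdiff, and the function name `map_range_to_parent` is hypothetical, not part of Git. Given the parent and child versions of a file and a line range in the child, it returns the corresponding range in the parent, or `None` when the lines were added from scratch.

```python
import difflib


def map_range_to_parent(old_lines, new_lines, start, end):
    """Map the 1-based inclusive range [start, end] of new_lines to old_lines.

    Returns (old_range, changed). old_range is a 1-based inclusive
    (start, end) tuple, or None if every line in the range was added by
    this change. changed is True if the change touched the range.
    Illustrative sketch only: difflib stands in for libxdiff here.
    """
    lo, hi = start - 1, end  # 0-based, half-open range in new_lines
    old_lo = old_hi = None
    changed = False
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged block: map the overlapping part by its offset.
            a, b = max(j1, lo), min(j2, hi)
            if a >= b:
                continue
            span = (i1 + a - j1, i1 + b - j1)
        elif tag == "delete":
            # Pure deletion (j1 == j2): relevant only strictly inside the range.
            if not lo < j1 < hi:
                continue
            span = (i1, i2)
            changed = True
        else:
            # "replace" or "insert" overlapping the range of interest.
            if j2 <= lo or j1 >= hi:
                continue
            span = (i1, i2)
            changed = True
        if span[0] == span[1]:
            continue  # an insertion has no counterpart in the old file
        old_lo = span[0] if old_lo is None else min(old_lo, span[0])
        old_hi = span[1] if old_hi is None else max(old_hi, span[1])
    if old_lo is None:
        return None, changed
    return (old_lo + 1, old_hi), changed
```

Calling such a function repeatedly while walking from a commit to its parent would give the recursive trace the proposal describes; a `None` range marks the point where the walk stops. Code moves and history splits at merges are not handled in this sketch.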
* Re: GSoC draft proposal: Line-level history browser
From: Johannes Schindelin @ 2010-03-20 11:30 UTC
To: Bo Yang; +Cc: git

Hi,

On Sat, 20 Mar 2010, Bo Yang wrote:

> I am very interested in the project 'Line-level history browser', after
> some days consideration, now I made up a draft of my proposal, I think
> it is helpful to send it to the list before submitting it. Could you
> please give me some advice?

I like it very much already! You obviously put in a substantial amount of time to learn intricate details about the way Git operates, and what is already available.

And you also provided a patch (unrelated to line-level history browser), so you proved that you actually cloned Git, and that you can actually patch it and use Git itself to send a patch to this list.

Very good. Just a few constructive criticisms (inlined):

> This project will add a new utility for git called 'git line-log'. It
> can trace the history of any line range of certain file at any revision.

I think that that might be good for starters, but one could imagine that an integration into "git log" might be even better, so that gitk can use this without any further changes.

> For simplicity, users can run the command like: ' git line-log
> builtin/diff.c 6..8 ', he will get the change history of code between
> line 6 and line 8 of the diff.c file.

It would be good if the code looked harder after failing with the simple strategy, such as looking for code removed in other files, fuzzy matching (optional), and looking for code duplication (i.e. literal copying, or slightly modified copying).

The fuzzy matching might be necessary to catch things like a Java class moving from one file into another (and changing its name): the first line changes, but not completely.
> After reading some libxdiff document and code, I find that libxdiff
> output all the diff blocks as string into a memory file.

Almost. Just have a look at the word-level diff (--color-words):

http://repo.or.cz/w/git/dscho.git/blob/bc1ed6aafd9ee4937559535c66c8bddf1864bec6:/diff.c#l382

You will see that there is a function fn_out_diff_words_aux(), which is passed to xdi_diff_outf(). That latter function calls xdiff such that the former function receives a complete line at a time. And this is what I would suggest doing in the line-level log, too.

Ciao,
Dscho
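The per-line callback idea suggested above can be sketched as follows. This is a rough illustration in Python rather than Git's C, and it does not mirror the real callback interface of xdi_diff_outf(): the class name `HunkCollector` and its `consume` method are made up for the example. The consumer is fed one diff line at a time and records only the line ranges announced in unified-diff hunk headers.

```python
import re

# Unified-diff hunk header: "@@ -old_start[,old_len] +new_start[,new_len] @@"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class HunkCollector:
    """Hypothetical line-at-a-time consumer that remembers hunk ranges."""

    def __init__(self):
        self.hunks = []  # list of (old_start, old_len, new_start, new_len)

    def consume(self, line):
        """Handle one line of diff output; ignore everything but hunk headers."""
        m = HUNK_HEADER.match(line)
        if m is None:
            return
        old_start, old_len, new_start, new_len = m.groups()
        self.hunks.append((
            int(old_start),
            int(old_len) if old_len is not None else 1,  # omitted length means 1
            int(new_start),
            int(new_len) if new_len is not None else 1,
        ))
```

Because only the header lines are inspected, the cost of this approach is one cheap check per output line, which is the trade-off discussed in the replies that follow.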
* Re: GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-20 13:10 UTC
To: git

Hi Johannes,

Thank you very much for your advice!

> I like it very much already! You obviously put in a substantial amount of
> time to learn intricate details about the way Git operates, and what is
> already available.
>
> And you also provided a patch (unrelated to line-level history browser),
> so you proved that you actually cloned Git, and that you can actually
> patch it and use Git itself to send a patch to this list.

I am very happy you like it.

> I think that that might be good for starters, but one could imagine that
> an integration into "git log" might be even better, so that gitk can use
> this without any further changes.

So I think adding some new options to 'git log' is preferred.

> It would be good if the code looked harder after failing with the simple
> strategy, such as looking for code removed in other files, fuzzy matching
> (optional), and looking for code duplication (i.e. literal copying, or
> slightly modified copying).
>
> The fuzzy matching might be necessary to catch things like a Java class
> moving from one file into another (and changing its name): the first line
> changes, but not completely.

That's really a good idea.
So, when the program reaches the end of the history thread for a line range, it should not stop immediately. It should then make a harder code search and try to find whether the newly added lines of code were moved there or copied there from another place. And this kind of search should use fuzzy matching instead of exact string matching.

But notice that detecting code movement within one commit is much more efficient than detecting code copies.
So I think we should add an option to control whether we detect this kind of code copy. By default, we detect code moves but not code copies. What do you think about this?

> Just have a look at the word-level diff (--color-words):
>
> http://repo.or.cz/w/git/dscho.git/blob/bc1ed6aafd9ee4937559535c66c8bddf1864bec6:/diff.c#l382
>
> You will see that there is a function fn_out_diff_words_aux(), which is
> passed to xdi_diff_outf(). That latter function calls xdiff such that the
> former function receives a complete line at a time. And this is what I
> would suggest doing in the line-level log, too.

I have looked over the function fn_out_diff_words_aux; this function parses each line of an in-memory diff. We can use it to detect the diff hunk headers and find the line changes. If you think the performance is acceptable, I think using this callback mechanism is all right.

Regards!
Bo
--
My blog: http://blog.morebits.org
* Re: GSoC draft proposal: Line-level history browser
From: Junio C Hamano @ 2010-03-20 13:30 UTC
To: Bo Yang; +Cc: git

Bo Yang <struggleyb.nku@gmail.com> writes:

> But notice that, detect code movement in one commit is much efficient
> than detecting code copy. So, I think we should add an option to
> control whether we detect such kind of code copy.

If you are hooking into "git log", it already has "-M / -C / -C -C" as a notion to express "different levels of digging" to find code movement and copies, and so does "git blame". You probably will save a lot of time if you studied the current blame implementation thoroughly before designing or coding.

Two things that you need to think about carefully are why "blame" stops at the commits it shows, and if you could "peel" these lines in its output to peek what are behind the lines, what you would see. This is not a rocket science topic, but it is not entirely trivial.
* Re: GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-21 6:03 UTC
To: Junio C Hamano; +Cc: git

Hi Junio,

Thank you very much for your advice.

> If you are hooking into "git log", it already has "-M / -C / -C -C" as a
> notion to express "different levels of digging" to find code movement and
> copies, and so does "git blame". You probably will save a lot of time if
> you studied the current blame implementation thoroughly before designing
> or coding.

Yes, both blame and log have such '-M/-C/-C -C' options. But their meanings are not quite the same:

For 'git log': -M is used to detect file renames, -C is used to trace code copies. Both options accept no argument.
For 'git blame': -M is used to trace code moves, -C is used to trace code copies. And both options accept a <num> which specifies the lower bound of the 'same code characters'.

And I think the line-level history tool acts more like 'git blame'. So the '-C' option for 'git log' is exactly what we need but '-M' is not. So I think maybe we should add another '-m' option to 'git log' for line-level code movement detection.

I have had a rough look over blame.c; it is really very helpful, and I find I can borrow some code from 'git blame' to make the line-level history browser. Thanks a lot!

> Two things that you need to think about carefully is why "blame" stops at
> the commits it shows, and if you could "peel" these lines in its output to
> peek what are behind the lines, what you would see. This is not a rocket
> science topic, but it is not entirely trivial.

I think blame's purpose is to find who is responsible for which line of code. So it stops after it finds the origin of the code.
The line-level history browser will continue further back into history from what blame found: it will find what the lines looked like before this commit, go backward through history from those original lines to get an older state, and go on again. Simply put, it is something like running 'git blame' recursively. :)

Thanks again for your advice, I have learned a lot from your feedback, thanks!

Regards!
Bo
* Re: GSoC draft proposal: Line-level history browser
From: Johannes Schindelin @ 2010-03-20 13:36 UTC
To: Bo Yang; +Cc: git

Hi,

[please do not cull the Cc: list]

On Sat, 20 Mar 2010, Bo Yang wrote:

> I (Johannes) wrote:
>
> > I think that that might be good for starters, but one could imagine
> > that an integration into "git log" might be even better, so that gitk
> > can use this without any further changes.
>
> So, I think add some new options to 'git log' is preferred.

Yes, I think that this should be the target for the user interface. However, the logic should be different enough to merit a completely new file for the code (think "git add --interactive").

> > It would be good if the code looked harder after failing with the
> > simple strategy, such as looking for code removed in other files,
> > fuzzy matching (optional), and looking for code duplication (i.e.
> > literal copying, or slightly modified copying).
> >
> > The fuzzy matching might be necessary to catch things like a Java
> > class moving from one file into another (and changing its name): the
> > first line changes, but not completely.
>
> That's really a good idea.
> So, when the program reach the end of the history thread of some
> changes of line range, it should not stop immediately. It then should
> make a harder code search and try to find whether the new add lines of
> code is moved to there or just copied from other place to there. And
> these kind of search should use fuzzy matching instead of exact string
> matching.
>
> But notice that, detect code movement in one commit is much efficient
> than detecting code copy. So, I think we should add an option to
> control whether we detect such kind of code copy. By default, we
> detect code move but not code copy. How do you think about this?
Yes, it is much more difficult, and it is more expensive. So: there are several steps in the project (you could also call them "milestones"), and fuzzy matching end lines would come later than simple code movement. And still later than code movement between files.

> > Just have a look at the word-level diff (--color-words):
> >
> > http://repo.or.cz/w/git/dscho.git/blob/bc1ed6aafd9ee4937559535c66c8bddf1864bec6:/diff.c#l382
> >
> > You will see that there is a function fn_out_diff_words_aux(), which
> > is passed to xdi_diff_outf(). That latter function calls xdiff such
> > that the former function receives a complete line at a time. And this
> > is what I would suggest doing in the line-level log, too.
>
> I have look over the function fn_out_diff_words_aux, this function parse
> each line of a memory diff. We can use it to detect the diff hunk head
> and find the line change. If you think the performance is acceptable, I
> think using this callback mechanism is all right.

Yes, I think that the performance is alright there, it works well enough for --color-words.

Thanks,
Dscho
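The fuzzy matching discussed in this exchange (a first line that "changes, but not completely") could look roughly like the sketch below. It is an assumption-laden illustration, not anything proposed in the thread: it is written in Python, uses `difflib.SequenceMatcher.ratio()` as the similarity measure, and both the helper name `is_fuzzy_match` and the 0.6 threshold are arbitrary choices made for the example.

```python
import difflib


def is_fuzzy_match(line_a, line_b, threshold=0.6):
    """Return True if two lines are similar enough to count as 'the same' line.

    Hypothetical helper: the similarity measure and the threshold are
    illustrative assumptions, not what Git uses.
    """
    ratio = difflib.SequenceMatcher(None, line_a, line_b, autojunk=False).ratio()
    return ratio >= threshold
```

With this measure, a renamed class header such as `public class Foo {` versus `public class Bar {` still counts as a match, while an unrelated line such as `return 0;` does not. Comparing every candidate line this way is what makes the fuzzy step more expensive than exact move detection.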
* Re: GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-21 6:05 UTC
To: Johannes Schindelin; +Cc: git

>> So, I think add some new options to 'git log' is preferred.
>
> Yes, I think that this should be the target for the user interface.
> However, the logic should be different enough to merit a completely new
> file for the code (think "git add --interactive").

So a new file, builtin/line-level.c, will be added.

> Yes, it is much more difficult, and it is more expensive. So: there are
> several steps in the project (you could also call them "milestones"), and
> fuzzy matching end lines would come later than simple code movement. And
> still later than code movement between files.

OK, I will add some milestones to the next version of my proposal, thanks.

Regards!
Bo
* Re: GSoC draft proposal: Line-level history browser
From: Alex Riesen @ 2010-03-20 20:35 UTC
To: Bo Yang; +Cc: git

On Sat, Mar 20, 2010 at 10:18, Bo Yang <struggleyb.nku@gmail.com> wrote:
> <line range>
> Its format should be <start pos>..<end pos> or just a <line number>.

You might want to reconsider the line range syntax. Exactly the same syntax is already used to specify a commit range, so reusing it may lead to confusion.
* Re: GSoC draft proposal: Line-level history browser
From: Junio C Hamano @ 2010-03-20 20:57 UTC
To: Bo Yang; +Cc: Alex Riesen, git

Alex Riesen <raa.lkml@gmail.com> writes:

> On Sat, Mar 20, 2010 at 10:18, Bo Yang <struggleyb.nku@gmail.com> wrote:
>> <line range>
>> Its format should be <start pos>..<end pos> or just a <line number>.
>
> You might want to reconsider the line range syntax. Exactly the same syntax
> is already used to specify a commit range, so reusing it may lead to confusion.

I would actually recommend you take a look at -L option from blame. What I use most often and find very handy myself is this pattern:

    blame -L '/^void some_function()/,/^}/' -- path

as I do not have to count the line numbers.

There also was a discussion on allowing more than one -L to blame, which I think is applicable to this feature. Check the list archive for the past few months.
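To make the `<start>,<end>` form of `-L` concrete, here is a simplified sketch of how such an argument could be resolved against a file's lines. It is written in Python for illustration and is not Git's parser: the function names are hypothetical, only plain line numbers and `/regex/` terms are handled (no offset forms), Python's `re` syntax stands in for POSIX regular expressions, and the choice to search for the end pattern starting at the start line is an assumption of this sketch.

```python
import re


def _take_term(spec):
    """Split one term (a number or /regex/) off the front of spec."""
    if spec.startswith("/"):
        i = 1
        while i < len(spec) and spec[i] != "/":
            i += 2 if spec[i] == "\\" else 1  # skip escaped characters
        if i >= len(spec):
            raise ValueError("unterminated regex in line range")
        return spec[: i + 1], spec[i + 1:]
    m = re.match(r"\d+", spec)
    if m is None:
        raise ValueError("expected a line number or /regex/")
    return m.group(0), spec[m.end():]


def _resolve(term, lines, begin):
    """Turn a term into a 1-based line number; regexes search from `begin`."""
    if term.startswith("/"):
        pattern = re.compile(term[1:-1])
        for number in range(begin, len(lines) + 1):
            if pattern.search(lines[number - 1]):
                return number
        raise ValueError("pattern not found: " + term)
    return int(term)


def parse_line_range(spec, lines):
    """Resolve 'N', 'N,M' or '/re1/,/re2/' to a 1-based inclusive (start, end)."""
    start_term, rest = _take_term(spec)
    start = _resolve(start_term, lines, 1)
    if not rest:
        return start, start
    if not rest.startswith(","):
        raise ValueError("expected ',' between start and end")
    end_term, rest = _take_term(rest[1:])
    if rest:
        raise ValueError("unexpected trailing text in line range")
    return start, _resolve(end_term, lines, start)
```

The appeal of the regex form is visible here: the caller names a function by its opening and closing lines and never has to count line numbers.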
* Re: GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-21 6:10 UTC
To: Junio C Hamano; +Cc: Alex Riesen, git

> I would actually recommend you take a look at -L option from blame. What
> I use most often and find very handy myself is this pattern:
>
> blame -L '/^void some_function()/,/^}/' -- path
>
> as I do not have to count the line numbers.

I have looked at that option and I find it very convenient; the line-level browser will adopt that line syntax, too.

> There also was a discussion on allowing more than one -L to blame, which I
> think is applicable to this feature. Check the list archive for the past
> few months.

I think it is reasonable for 'git blame' to allow more than one -L, to let users see more than one block of code. But for a tool which is used to explore history, I think the user mostly focuses on one thread of history. If the history splits at some point, we should ask the user to choose one thread to go on with. So I think the line-level browser need not support such a thing. :)

Regards!
Bo
* Re: GSoC draft proposal: Line-level history browser
From: A Large Angry SCM @ 2010-03-20 21:58 UTC
To: Alex Riesen; +Cc: Bo Yang, git

Alex Riesen wrote:
> On Sat, Mar 20, 2010 at 10:18, Bo Yang <struggleyb.nku@gmail.com> wrote:
>> <line range>
>> Its format should be <start pos>..<end pos> or just a <line number>.
>
> You might want to reconsider the line range syntax. Exactly the same syntax
> is already used to specify a commit range, so reusing it may lead to confusion.

I, actually, think the proposed line range syntax works because it uses the same _range_ notation. The issue is how to differentiate the _line_ range(s) from the _commit_ range(s); and, yes, I would like multiple ranges of each type as well as multiple files.
* Re: GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-21 6:16 UTC
To: gitzilla; +Cc: Alex Riesen, git

On Sun, Mar 21, 2010 at 5:58 AM, A Large Angry SCM <gitzilla@gmail.com> wrote:
> Alex Riesen wrote:
>>
>> On Sat, Mar 20, 2010 at 10:18, Bo Yang <struggleyb.nku@gmail.com> wrote:
>>>
>>> <line range>
>>> Its format should be <start pos>..<end pos> or just a <line number>.
>>
>> You might want to reconsider the line range syntax. Exactly the same
>> syntax
>> is already used to specify a commit range, so reusing it may lead to
>> confusion.
>
> I, actually, think the proposed line range syntax works because it uses the
> same _range_ notation. The issue is how to differentiate the _line_ range(s)
> from the _commit_ range(s); and, yes, I would like multiple ranges of each
> type as well as multiple files.

As I said in my previous post, I think we should adopt the 'git blame' way: use '-L <start pos>,<end pos>' to specify the line range. It supports both line numbers and POSIX regexes.

As for multiple ranges, I don't think it is very useful to support them in a history browser. After all, our users can only focus on one thread of history at a time. I would be very glad to hear what your use case for multiple ranges is.

Thanks for your valuable advice!

Regards!
Bo
* Re: GSoC draft proposal: Line-level history browser
From: A Large Angry SCM @ 2010-03-21 13:19 UTC
To: Bo Yang; +Cc: Alex Riesen, git

Bo Yang wrote:
[...]
> For multiple ranges stuff, I don't think it is very useful to support
> it for a history browser. Anyway, our users can only focus on one line
> of thread history. I am very willing to listen what is your use case
> for a multiple ranges?

More than one line range can be related and of interest to a forensics/archeology task.

In a simple multi range case, you'd have 2 line ranges in the same file that you want to see the history and graph of. Such as 2 related macro definitions in a header file.

In a complex multi range case, you'd have many line ranges spread over multiple blobs and some of the blobs have disjoint commit graphs.

The complex multi range case may be too much for a GSOC project, and the simple multi range case may be also. However, the command syntax should be general enough to handle them without being too ugly so that the implementation could be improved and expanded later.
* Re: GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-22 3:48 UTC
To: gitzilla; +Cc: Alex Riesen, git

On Sun, Mar 21, 2010 at 9:19 PM, A Large Angry SCM <gitzilla@gmail.com> wrote:
> Bo Yang wrote:
> [...]
>>
>> For multiple ranges stuff, I don't think it is very useful to support
>> it for a history browser. Anyway, our users can only focus on one line
>> of thread history. I am very willing to listen what is your use case
>> for a multiple ranges?
>
> More than one line range can be related and of interest to a
> forensics/archeology task.
>
> In a simple multi range case, you'd have 2 line ranges in the same file that
> you want to see the history and graph of. Such as 2 related macro
> definitions in a header file.
>
> In a complex multi range case, you'd have many line ranges spread over
> multiple blobs and some of the blobs have disjoint commit graphs.
>
> The complex multi range case may be too much for a GSOC project, and the
> simple multi range case may be also. However, the command syntax should be
> general enough to handle them without being too ugly so that the
> implementation could be improved and expanded later.

Yeah, what do you think about using the following syntax:

    <file1>@<rev1>:<start pos>,<end pos> <file2>@<rev2>:<start pos>,<end pos>

Thanks!

Regards!
Bo
* Re: GSoC draft proposal: Line-level history browser
From: Junio C Hamano @ 2010-03-22 4:24 UTC
To: Bo Yang; +Cc: gitzilla, Alex Riesen, git

Bo Yang <struggleyb.nku@gmail.com> writes:

> Yeah, how do you think use the following syntax:
>
> <file1>@<rev1>:<start pos>,<end pos> <file2>@<rev2>:<start pos>,<end pos>

Horrible. That is not how we name things.

What's wrong with bog standard:

    $ git log -L 10,20 master -- Documentation/git.txt

which is exactly how "blame" does it?
* Re: GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-22 4:34 UTC
To: Junio C Hamano; +Cc: gitzilla, Alex Riesen, git

> Horrible. That is not how we name things.
>
> What's wrong with bog standard:
>
> $ git log -L 10,20 master -- Documentation/git.txt
>
> which is exactly how "blame" does it?

The 'blame' way is very good if we only support one line range. But if we want to support multiple line ranges, I don't think it is suitable for that case. How can I specify multiple ranges which refer to multiple files at multiple revisions with multiple line ranges using the above syntax?

Apart from that, I still can't convince myself that we need multiple-range support. How would we display such a result to our users?

Regards!
Bo
* Re: GSoC draft proposal: Line-level history browser
From: Junio C Hamano @ 2010-03-22 5:32 UTC
To: Bo Yang; +Cc: gitzilla, Alex Riesen, git

Bo Yang <struggleyb.nku@gmail.com> writes:

> The 'blame' way is very good if we only support one line range. But if
> we want to support multiple line ranges, I don't think it is suitable
> for that case. Anyway, how can I specify multi-ranges which refers to
> multiple files at multiple revision and multiple line ranges using
> above syntax?

I would sort of see you may want to be able to say "explain lines 10 thru 15 of config.h and lines 100-115 of hello.c that appear in v1.2.0", but I think it is a total nonsense to ask for "ll 10-15 of config.h in v1.2.0 and ll 110-115 of hello.c in v1.0.0". After all they never existed in the same revision (otherwise you would have said "ll 7-13 of config.h and ll 110-115 of hello.c that appear in v1.0.0"). So I would reject the SVN-like "rev@" in the first place.

While I don't seriously buy "multiple files" either, if that is really needed, I could be persuaded with "log -- path1:10-15 path2:1-7", or "log -L path1:10-15 -Lpath2:1-7 -- path1 path2" or something similarly ugly like these, but that is not how we generally name things, and it probably shouldn't be a new option to "log" anymore.

On the other hand, multiple ranges in a single file is something that may be quite reasonable, e.g.

    $ git log -L10-15 -L200-210 -- Makefile
    $ git log -L'*/^#ifdef WINDOWS/,/^#endif \/\* WINDOWS \/\*/' -- config.h

As I already said, I wouldn't be so worried about multiple-range feature, but I would be worried about the usefulness of this feature, even for the case to track a single range of a single file, starting from one given revision.
When you want to know where the first few lines of Makefile came from, and if blame says the first line came from 2731d048, that really means that between the revision you started digging from and the found revision, there is no commit that touched that particular line, but equally importantly, that before that found revision, there wasn't a corresponding line in that file---blame stopped exactly because there is nobody before that found revision that the line can be blamed on.

So implementing "git log -L1,10 -- Makefile" might be just the matter of doing something like:

 1. Run "git blame -L1,10 -- Makefile";
 2. Note the commits that appear in the output;
 3. Topologically sort these commits;
 4. Run "git show <the result of that toposort>"

which is not very satisfying. And "git log -L1 -- Makefile" naturally degenerates into:

 1. Run "git blame -L1,1 -- Makefile";
 2. Note the commits that appear in the output;
 3. Run "git show <that commit>"

which is not just unsatisfying, but is almost boring.

I dunno.
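Step 2 of the recipe above ("note the commits that appear in the output") can be sketched as a small parser. This is an illustration only, written in Python; the function name `commits_from_blame` is made up, and it assumes the input is the text produced by `git blame --porcelain` in a repository using 40-character SHA-1 object names. Running blame, sorting the commits topologically, and calling `git show` are deliberately left out.

```python
import re

# In porcelain output each blamed group starts with a header line:
# "<40-hex commit> <original line> <final line> [<lines in group>]".
# File content lines start with a TAB, so they never match this pattern.
PORCELAIN_HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+(?: \d+)?$")


def commits_from_blame(porcelain_text):
    """Return the distinct commits named in porcelain blame output.

    Hypothetical helper; commits are listed in order of first appearance,
    which is not a topological order.
    """
    seen = []
    for line in porcelain_text.splitlines():
        m = PORCELAIN_HEADER.match(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen
```

As the message points out, this only yields the commits where blame stopped; going further back from each of them is the part a line-level history browser would have to add.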
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 5:32 ` Junio C Hamano @ 2010-03-22 7:31 ` Bo Yang 2010-03-22 7:41 ` Junio C Hamano 2010-03-22 10:39 ` Alex Riesen 1 sibling, 1 reply; 54+ messages in thread From: Bo Yang @ 2010-03-22 7:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: gitzilla, Alex Riesen, git On Mon, Mar 22, 2010 at 1:32 PM, Junio C Hamano <gitster@pobox.com> wrote: > Bo Yang <struggleyb.nku@gmail.com> writes: > >> The 'blame' way is very good if we only support one line range. But if >> we want to support multiple line ranges, I don't think it is suitable >> for that case. Anyway, how can I specify multi-ranges which refers to >> multiple files at multiple revision and multiple line ranges using >> above syntax? > > I would sort of see you may want to be able to say "explain lines 10 thru > 15 of config.h and lines 100-115 of hello.c that appear in v1.2.0", but I > think it is a total nonsense to ask for "ll 10-15 of config.h in v1.2.0 > and ll 110-115 of hello.c in v1.0.0". After all they never existed in the > same revision (otherwise you would have said "ll 7-13 of config.h and ll > 110-115 of hello.c that appear in v1.0.0"). So I would reject the > SVN-like "rev@" in the first place. > > While I don't seriously buy "multiple files" either, if that is really > needed, I could be pursuaded with "log -- path1:10-15 path2:1-7", or > "log -L path1:10-15 -Lpath2:1-7 -- path1 path2" or something similarly > ugly like these, but that is not how we generally name things, and it > probably shouldn't be a new option to "log" anymore. > > On the other hand, multiple ranges in a single file is something that > may be quite reasonable, e.g. > > $ git log -L10-15 -L200-210 -- Makefile > $ git log -L'*/^#ifdef WINDOWS/,/^#endif \/\* WINDOWS \/\*/' -- config.h Yeah, maybe one file multiple ranges is most rationale. 
> As I already said, I wouldn't be so worried about multiple-range feature, > but I would be worried about the usefulness of this feature, even for the > case to track a single range of a single file, starting from one given > revision. I am sorry, but I did not catch up you here. You worried about the usefulness of the multi-range feature or the line level history browser? I think tracking a single range of a single file, starting from one given revision is useful when the line of history split on some point. This can let users focus on a single line of history using this feature. >When you want to know where the first few lines of Makefile > came from, and if blame says the first line came from 2731d048, that > really means that between the revision you started digging from and the > found revision, there is no commit that touched that particular line, but > equally importantly, that before that found revision, there wasn't a > corresponding line in that file---blame stopped exactly because there is > nobody before that found revision that the line can be blamed on. > > So implementing "git log -L1,10 -- Makefile" might be just the matter of > doing something like: > > 1. Run "git blame -L1,10 -- Makefile"; > 2. Note the commits that appear in the output; > 3. Topologically sort these commits; > 4. Run "git show <the result of that toposort>" > > which is not very satisfying. Yes, this is not satisfying. But as I understand, the line level history browser will do more than just this. It will not stop on 'step 4', it can follow the change history recursively and deeply, to find more. I think this is useful when we focus just one a range of code and want to know how it become into such a now condition. Anyway, it is not a bad thing too add a new convenient feature to a daily tool. :) Regards! Bo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 7:31 ` Bo Yang @ 2010-03-22 7:41 ` Junio C Hamano 2010-03-22 7:52 ` Bo Yang 2010-03-22 8:10 ` Jonathan Nieder 0 siblings, 2 replies; 54+ messages in thread From: Junio C Hamano @ 2010-03-22 7:41 UTC (permalink / raw) To: Bo Yang; +Cc: gitzilla, Alex Riesen, git Bo Yang <struggleyb.nku@gmail.com> writes: >> When you want to know where the first few lines of Makefile >> came from, and if blame says the first line came from 2731d048, that >> really means that between the revision you started digging from and the >> found revision, there is no commit that touched that particular line, but >> equally importantly, that before that found revision, there wasn't a >> corresponding line in that file---blame stopped exactly because there is >> nobody before that found revision that the line can be blamed on. > ... > Yes, this is not satisfying. But as I understand, the line level > history browser will do more than just this. It will not stop on 'step > 4', it can follow the change history recursively and deeply, to find > more. I am actually questioning the existence of "recursively and deeply to find more"; the reason blame stopped at a particular commit is exactly because there is no more---otherwise it wouldn't have stopped there but kept digging deeper. That is what I meant in the message you are responding to, quoted at the top of this message. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 7:41 ` Junio C Hamano @ 2010-03-22 7:52 ` Bo Yang 2010-03-22 8:10 ` Jonathan Nieder 1 sibling, 0 replies; 54+ messages in thread From: Bo Yang @ 2010-03-22 7:52 UTC (permalink / raw) To: Junio C Hamano; +Cc: gitzilla, Alex Riesen, git On Mon, Mar 22, 2010 at 3:41 PM, Junio C Hamano <gitster@pobox.com> wrote: > Bo Yang <struggleyb.nku@gmail.com> writes: > >>> When you want to know where the first few lines of Makefile >>> came from, and if blame says the first line came from 2731d048, that >>> really means that between the revision you started digging from and the >>> found revision, there is no commit that touched that particular line, but >>> equally importantly, that before that found revision, there wasn't a >>> corresponding line in that file---blame stopped exactly because there is >>> nobody before that found revision that the line can be blamed on. >> ... >> Yes, this is not satisfying. But as I understand, the line level >> history browser will do more than just this. It will not stop on 'step >> 4', it can follow the change history recursively and deeply, to find >> more. > > I am actually questioning the existence of "recursively and deeply to find > more"; the reason blame stopped at a particular commit is exactly because > there is no more---otherwise it wouldn't have stopped there but kept > digging deeper. I think an example may explain me well. commit 1 of the file: line 1 rev 1 line 2 rev 1 commit 2 of the file: line 1 rev 2 line 2 rev 2 commit 3 of the file: line 1 rev 3 line 2 rev 3 If we run, git blame file, it will show two lines are blamed on commit 3. Line level utility will also show rev2 and rev1 to users as the format like what git log provide. I think git blame focus on who produce the current code range. And the line level browser will provide more than that, it also answer, how the lines evolved into current condition. I hope I explain everything clearly. :) Regards! 
Bo
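The three-commit example above can be reproduced with a small sketch. It assumes the history is a simple linear list of file versions (oldest first) and uses Python's difflib in place of libxdiff. The names map_range_to_parent and line_history are hypothetical. Ranges are 0-based and half-open.

```python
from difflib import SequenceMatcher


def map_range_to_parent(parent, child, start, end):
    """Map child lines [start, end) to the parent version.

    Returns (parent_start, parent_end, changed). When a change overlaps the
    range, the whole preimage of that change is followed.
    """
    lo, hi, changed = None, None, False
    matcher = SequenceMatcher(None, parent, child, autojunk=False)
    for tag, p1, p2, c1, c2 in matcher.get_opcodes():
        if c1 >= end or c2 <= start:
            continue  # this block lies outside the tracked child range
        if tag == "equal":
            block_lo = p1 + max(start, c1) - c1
            block_hi = p1 + min(end, c2) - c1
        else:
            changed = True
            block_lo, block_hi = p1, p2
        lo = block_lo if lo is None else min(lo, block_lo)
        hi = block_hi if hi is None else max(hi, block_hi)
    if lo is None:
        return start, start, False
    return lo, hi, changed


def line_history(versions, start, end):
    """versions: list of (name, lines), oldest first. Track the range in the
    newest version and return the names of all revisions that touched it."""
    touched = []
    for idx in range(len(versions) - 1, 0, -1):
        name, child = versions[idx]
        _, parent = versions[idx - 1]
        start, end, changed = map_range_to_parent(parent, child, start, end)
        if changed:
            touched.append(name)
        if start == end:
            return touched  # the tracked lines were added from scratch here
    touched.append(versions[0][0])  # the root revision introduced the rest
    return touched
```

With the three revisions from the example, blame would name only commit 3, while line_history reports commit 3, commit 2 and commit 1, in that order.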
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 7:41 ` Junio C Hamano 2010-03-22 7:52 ` Bo Yang @ 2010-03-22 8:10 ` Jonathan Nieder 2010-03-23 6:01 ` Bo Yang 1 sibling, 1 reply; 54+ messages in thread From: Jonathan Nieder @ 2010-03-22 8:10 UTC (permalink / raw) To: Junio C Hamano; +Cc: Bo Yang, gitzilla, Alex Riesen, git Junio C Hamano wrote: > I am actually questioning the existence of "recursively and deeply to find > more"; the reason blame stopped at a particular commit is exactly because > there is no more Hmm, I can imagine some (mutually inconsistent) heuristics: - Suppose in the blamed commit a single isolated line changed. Then it is clear where to look next. - If the mystery code is at the beginning of the file (resp. beginning of a diff -C0 hunk), maybe it was based on the line at the same position within the previous commit. - Take the line with the lowest Levenshtein distance from the mystery code. - Expect certain common patterns of change: substituted words, whitespace changes, added arguments for a function, things like that. That said, I still don’t have a clear picture of a basic strategy. Interested, Jonathan ^ permalink raw reply [flat|nested] 54+ messages in thread
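As an illustration of the Levenshtein heuristic listed above, here is a minimal sketch. The helper names are hypothetical, and the comparison is done on whole lines character by character. It is one possible reading of the heuristic, not a proposal for what git should do.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def closest_line(target, candidates):
    """Return (index, distance) of the candidate line nearest to target.

    Assumes candidates is non-empty.
    """
    distances = [levenshtein(target, c) for c in candidates]
    best = min(range(len(candidates)), key=distances.__getitem__)
    return best, distances[best]
```

Given the mystery line and the lines removed by the blamed commit, closest_line picks the removed line to keep following. A real tool would probably also want a cut-off distance above which the line is treated as new.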
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 8:10 ` Jonathan Nieder @ 2010-03-23 6:01 ` Bo Yang 2010-03-23 10:08 ` Jakub Narebski 2010-03-23 18:57 ` Jonathan Nieder 0 siblings, 2 replies; 54+ messages in thread From: Bo Yang @ 2010-03-23 6:01 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Junio C Hamano, gitzilla, Alex Riesen, git Hi, > Hmm, I can imagine some (mutually inconsistent) heuristics: > > - Suppose in the blamed commit a single isolated line changed. Then > it is clear where to look next. > > - If the mystery code is at the beginning of the file (resp. > beginning of a diff -C0 hunk), maybe it was based on the line at the > same position within the previous commit. > > - Take the line with the lowest Levenshtein distance from the mystery > code. > > - Expect certain common patterns of change: substituted words, > whitespace changes, added arguments for a function, things like that. > > That said, I still don’t have a clear picture of a basic strategy. I can't understand fully about your above strategy. I think we can category the code change into two cases: 1. The diff looks like: @@ -1008,29 +1000,29 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) add_signoff = xmemdupz(committer, endpos - committer + 1); } - for (i = 0; i < extra_hdr_nr; i++) { - strbuf_addstr(&buf, extra_hdr[i]); + for (i = 0; i < extra_hdr.nr; i++) { + strbuf_addstr(&buf, extra_hdr.items[i].string); strbuf_addch(&buf, '\n'); } ie: there is both deletion and addition in a change. And this means we modify some lines of the code. So, what we do will be tracing the two 'minus' lines and then find another diff. Start trace from that diff recursively. Yes, the new added code may also be moved or copied from other place. But, I think here, we should focus on the lines before this changeset. 2. 
The diff looks like: @@ -879,9 +885,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix) opt.regflags = REG_NEWLINE; opt.max_depth = -1; + strcpy(opt.color_context, ""); strcpy(opt.color_filename, ""); + strcpy(opt.color_function, ""); strcpy(opt.color_lineno, ""); strcpy(opt.color_match, GIT_COLOR_BOLD_RED); This means, the code here is added from scratch. Here, I think we have three options. 1. Find if the new code is moved here from other place. 2. Find if the new code is copied from other place. 3. We find the end of the history, so stop here. The problems remain how do we find the copied/moved code. The new added code may be copied/moved from multiple place with little changes. I hope I understand the requirement of the line-level browser, could you please point it out if I have made some mistake? Regards! Bo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 6:01 ` Bo Yang @ 2010-03-23 10:08 ` Jakub Narebski 2010-03-23 10:38 ` Bo Yang 2010-03-23 18:57 ` Jonathan Nieder 1 sibling, 1 reply; 54+ messages in thread From: Jakub Narebski @ 2010-03-23 10:08 UTC (permalink / raw) To: Bo Yang; +Cc: Jonathan Nieder, Junio C Hamano, gitzilla, Alex Riesen, git Bo Yang <struggleyb.nku@gmail.com> writes: > Jonathan Nieder <jrnieder@gmail.com> writes: > > Hmm, I can imagine some (mutually inconsistent) heuristics: > > > > - Suppose in the blamed commit a single isolated line changed. Then > > it is clear where to look next. > > > > - If the mystery code is at the beginning of the file (resp. > > beginning of a diff -C0 hunk), maybe it was based on the line at the > > same position within the previous commit. > > > > - Take the line with the lowest Levenshtein distance from the mystery > > code. > > > > - Expect certain common patterns of change: substituted words, > > whitespace changes, added arguments for a function, things like that. > > > > That said, I still don’t have a clear picture of a basic strategy. > > I can't understand fully about your above strategy. I think we can > category the code change into two cases: > > 1. The diff looks like this: > > @@ -1008,29 +1000,29 @@ int cmd_format_patch(int argc, const char > **argv, const char *prefix) > add_signoff = xmemdupz(committer, endpos - committer + 1); > } > > - for (i = 0; i < extra_hdr_nr; i++) { > - strbuf_addstr(&buf, extra_hdr[i]); > + for (i = 0; i < extra_hdr.nr; i++) { > + strbuf_addstr(&buf, extra_hdr.items[i].string); > strbuf_addch(&buf, '\n'); > } Errr... how does the first line in the preimage differ from the first line in the postimage? They look as if they are the same: - for (i = 0; i < extra_hdr_nr; i++) { + for (i = 0; i < extra_hdr.nr; i++) { > > i.e.
there is both deletion and addition in a change. And this means we > modify some lines of the code. So, what we do will be tracing the two > 'minus' lines and then find another diff. Start trace from that diff > recursively. > > Yes, the new added code may also be moved or copied from other place. > But, I think here, we should focus on the lines before this changeset. The problem is when you are asking about tracking a subset of lines that appear in postimage of a patch. For example if we ask for history of strbuf_addstr(&buf, extra_hdr.items[i].string); line, should we track history of for (i = 0; i < extra_hdr.nr; i++) { line which appears in relevant diff chunk? If not, how we should detect which line in preimage (if any) corresponds to given line in postimage? > 2. The diff looks like: > > @@ -879,9 +885,12 @@ int cmd_grep(int argc, const char **argv, const > char *prefix) > opt.regflags = REG_NEWLINE; > opt.max_depth = -1; > > + strcpy(opt.color_context, ""); > strcpy(opt.color_filename, ""); > + strcpy(opt.color_function, ""); > strcpy(opt.color_lineno, ""); > strcpy(opt.color_match, GIT_COLOR_BOLD_RED); > > This means, the code here is added from scratch. Here, I think we have > three options. > 1. Find if the new code is moved here from other place. > 2. Find if the new code is copied from other place. > 3. We find the end of the history, so stop here. > > The problems remain how do we find the copied/moved code. The new > added code may be copied/moved from multiple place with little > changes. I guess that you could take a look at how git-blame does handle this... but I think you would get something like generalization of ordinary patch, where preimage of chunk can come from different place / different file. P.S. I like it that you provide real-life examples. They really help with understanding what are you talking about. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 10:08 ` Jakub Narebski @ 2010-03-23 10:38 ` Bo Yang 2010-03-23 11:22 ` Jakub Narebski 2010-03-23 12:02 ` Peter Kjellerstedt 0 siblings, 2 replies; 54+ messages in thread From: Bo Yang @ 2010-03-23 10:38 UTC (permalink / raw) To: Jakub Narebski Cc: Jonathan Nieder, Junio C Hamano, gitzilla, Alex Riesen, git Hi, > > Errr... how the first line in preimage differs from first line in > postimage? The look as if they are the same: > > - for (i = 0; i < extra_hdr_nr; i++) { > + for (i = 0; i < extra_hdr.nr; i++) { > Maybe some space... :) > > The problem is when you are asking about tracking a subset of lines > that appear in postimage of a patch. For example if we ask for > history of > > strbuf_addstr(&buf, extra_hdr.items[i].string); > > line, should we track history of > > for (i = 0; i < extra_hdr.nr; i++) { > > line which appears in relevant diff chunk? If not, how we should > detect which line in preimage (if any) corresponds to given line in > postimage? If I understand correctly, that is as following. @@ -1008,29 +1000,29 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) add_signoff = xmemdupz(committer, endpos - committer + 1); } - for (i = 0; i < extra_hdr_nr; i++) { - strbuf_addstr(&buf, extra_hdr[i]); + for (i = 0; i < extra_hdr.nr; i++) { + strbuf_addstr(&buf, extra_hdr.items[i].string); strbuf_addch(&buf, '\n'); } Here, the user only ask for tracking the strbuf_addstr line. And we find the above diff hunk. I think we can then find what the line would be in the preimage using @@ -1008,29 +1000,29 @@. The strbuf_addstr is located at 1000(the postimage start line number) +3(the context number) +1(the number of lines '+' before this line) in the postimage, and we can calculate its line number in the preimage by the same way 1008 +3 +1(the number of lines with '-' before this line). How do you think about this method? Regards! 
Bo
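The line-number arithmetic described above (hunk start, plus context lines, plus the '+' or '-' lines seen so far) can be sketched as below. The function name is hypothetical, and the sketch only reads the two start numbers from the "@@" header and ignores the line counts.

```python
import re

# "@@ -<pre start>[,<count>] +<post start>[,<count>] @@ optional text"
_HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def number_hunk_lines(hunk_text):
    """Return (pre_no, post_no, line) for every body line of one unified-diff
    hunk. pre_no is None for added lines, post_no is None for removed lines."""
    lines = hunk_text.splitlines()
    m = _HUNK.match(lines[0])
    if m is None:
        raise ValueError("not a unified-diff hunk header")
    pre_no, post_no = int(m.group(1)), int(m.group(2))
    rows = []
    for line in lines[1:]:
        if line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        if line.startswith("-"):
            rows.append((pre_no, None, line))
            pre_no += 1
        elif line.startswith("+"):
            rows.append((None, post_no, line))
            post_no += 1
        else:  # context line, present on both sides
            rows.append((pre_no, post_no, line))
            pre_no += 1
            post_no += 1
    return rows
```

For the cmd_format_patch hunk quoted above, this numbering puts the new strbuf_addstr line at 1004 in the postimage and the old one at 1012 in the preimage, matching 1000+3+1 and 1008+3+1. The numbering alone does not say which removed line corresponds to which added line. It happens to work here because the change block has the same size on both sides.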
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 10:38 ` Bo Yang @ 2010-03-23 11:22 ` Jakub Narebski 2010-03-23 12:23 ` Bo Yang 2010-03-23 12:02 ` Peter Kjellerstedt 1 sibling, 1 reply; 54+ messages in thread From: Jakub Narebski @ 2010-03-23 11:22 UTC (permalink / raw) To: Bo Yang; +Cc: Jonathan Nieder, Junio C Hamano, gitzilla, Alex Riesen, git On Tue, 23 Mar 2010, Bo Yang wrote: Please do not forget to include attribution line, like the one I have added below: > Jakub Narebski wrote: > > The problem is when you are asking about tracking a subset of lines > > that appear in postimage of a patch. For example if we ask for > > history of > > > > strbuf_addstr(&buf, extra_hdr.items[i].string); > > > > line, should we track history of > > > > for (i = 0; i < extra_hdr.nr; i++) { > > > > line which appears in relevant diff chunk? If not, how we should > > detect which line in preimage (if any) corresponds to given line in > > postimage? > > If I understand correctly, that is as following. > > @@ -1008,29 +1000,29 @@ int cmd_format_patch(int argc, const char > **argv, const char *prefix) > add_signoff = xmemdupz(committer, endpos - committer + 1); > } > > - for (i = 0; i < extra_hdr_nr; i++) { > - strbuf_addstr(&buf, extra_hdr[i]); > + for (i = 0; i < extra_hdr.nr; i++) { > + strbuf_addstr(&buf, extra_hdr.items[i].string); > strbuf_addch(&buf, '\n'); > } > > Here, the user only ask for tracking the strbuf_addstr line. And we > find the above diff hunk. I think we can then find what the line would > be in the preimage using @@ -1008,29 +1000,29 @@. The strbuf_addstr > is located at > 1000(the postimage start line number) > +3(the context number) > +1(the number of lines '+' before this line) in the postimage, > and we can calculate its line number in the preimage by the same way > 1008 > +3 > +1(the number of lines with '-' before this line). > > How do you think about this method? 
This would work with the simplest case, but not in more complicated cases, like for example preimage and postimage with different size. Take for example the following chunk (fragment): diff --git a/run-command.c b/run-command.c index 2feb493..3206d61 100644 --- a/run-command.c +++ b/run-command.c @@ -67,19 +67,21 @@ static int child_notifier = -1; static void notify_parent(void) { - write(child_notifier, "", 1); + ssize_t unused; + unused = write(child_notifier, "", 1); } static NORETURN void die_child(const char *err, va_list params) If you follow ssize_t line, it is created. If you follow line with write, which is 2nd line in postimage, its previous version is 1st line in preimage. Another example would be reordering of lines, or reordering with some change. -- Jakub Narebski Poland ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 11:22 ` Jakub Narebski @ 2010-03-23 12:23 ` Bo Yang 2010-03-23 13:49 ` Jakub Narebski 0 siblings, 1 reply; 54+ messages in thread From: Bo Yang @ 2010-03-23 12:23 UTC (permalink / raw) To: Jakub Narebski Cc: Jonathan Nieder, Junio C Hamano, gitzilla, Alex Riesen, git Hi, On Tue, Mar 23, 2010 at 7:22 PM, Jakub Narebski <jnareb@gmail.com> wrote: > > This would work with the simplest case, but not in more complicated > cases, like for example preimage and postimage with different size. > > Take for example the following chunk (fragment): > > diff --git a/run-command.c b/run-command.c > index 2feb493..3206d61 100644 > --- a/run-command.c > +++ b/run-command.c > @@ -67,19 +67,21 @@ static int child_notifier = -1; > > static void notify_parent(void) > { > - write(child_notifier, "", 1); > + ssize_t unused; > + unused = write(child_notifier, "", 1); > } > > static NORETURN void die_child(const char *err, va_list params) > > If you follow ssize_t line, it is created. If you follow line with > write, which is 2nd line in postimage, its previous version is 1st > line in preimage. > > > Another example would be reordering of lines, or reordering with > some change. Ah, yes, you are right. And now I really see the difference between our understandings of the line-level browser. :) When users want to browse the history of some line or line range, you want to display only the related lines to them, but I want to display the minimum diff hunk to them. :) And I think displaying the minimum diff hunk is sensible and feasible. Could you please tell me what you think about this? Regards! Bo ^ permalink raw reply [flat|nested] 54+ messages in thread
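One way to read the "minimum diff hunk" idea above is sketched below: instead of pairing individual lines, follow the whole preimage of the change block that contains the tracked line. The function name is hypothetical, and the line numbers used for the run-command.c fragment are a reconstruction, since the quoted fragment does not show them.

```python
import re

_HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def preimage_range(hunk_text, post_line):
    """Return the inclusive (first, last) preimage line range to keep following
    for one postimage line, or None if the line was purely added."""
    lines = hunk_text.splitlines()
    m = _HUNK.match(lines[0])
    if m is None:
        raise ValueError("not a unified-diff hunk header")
    pre_no, post_no = int(m.group(1)), int(m.group(2))
    block_pre = []     # preimage lines removed by the current change block
    block_hit = False  # does the current change block contain post_line?
    for line in lines[1:]:
        if line.startswith("-"):
            block_pre.append(pre_no)
            pre_no += 1
        elif line.startswith("+"):
            if post_no == post_line:
                block_hit = True
            post_no += 1
        else:
            # A context line closes any open change block.
            if block_hit:
                return (block_pre[0], block_pre[-1]) if block_pre else None
            block_pre = []
            if post_no == post_line:
                return (pre_no, pre_no)  # unchanged line, maps one to one
            pre_no += 1
            post_no += 1
    if block_hit:
        return (block_pre[0], block_pre[-1]) if block_pre else None
    raise ValueError("post_line is not covered by this hunk")
```

On the notify_parent() fragment, with the blank context line assumed to be line 67, both added lines (70 and 71 in the postimage) map to preimage line 70, the old write() call. The simple counting rule would instead give 67 + 3 + 1 = 71 for the second added line, which is the closing brace. The price is that the new "ssize_t unused;" line is also traced back to the old write() call, even though it was really created in this commit.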
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 12:23 ` Bo Yang @ 2010-03-23 13:49 ` Jakub Narebski 2010-03-23 15:23 ` Bo Yang 0 siblings, 1 reply; 54+ messages in thread From: Jakub Narebski @ 2010-03-23 13:49 UTC (permalink / raw) To: Bo Yang; +Cc: Jonathan Nieder, Junio C Hamano, gitzilla, Alex Riesen, git On Tue, Mar 23, 2010, Bo Yang wrote: > On Tue, Mar 23, 2010 at 7:22 PM, Jakub Narebski <jnareb@gmail.com> wrote: > > > > This would work with the simplest case, but not in more complicated > > cases, like for example preimage and postimage with different size. > > > > Take for example the following chunk (fragment): > > > > diff --git a/run-command.c b/run-command.c > > index 2feb493..3206d61 100644 > > --- a/run-command.c > > +++ b/run-command.c > > @@ -67,19 +67,21 @@ static int child_notifier = -1; > > > > static void notify_parent(void) > > { > > - write(child_notifier, "", 1); > > + ssize_t unused; > > + unused = write(child_notifier, "", 1); > > } > > > > static NORETURN void die_child(const char *err, va_list params) > > > > If you follow ssize_t line, it is created. If you follow line with > > write, which is 2nd line in postimage, its previous version is 1st > > line in preimage. > > > > > > Another example would be reordering of lines, or reordering with > > some change. > > Ah, yes, you are right. > > And now, I really get the difference between the understanding about > line level browser of us. :) When users want to browsing the history > of some line or line range, you want to display only the related lines > to them, but I want to display the minim diff hunk to them. :) > And I think displaying the minimum diff hunk is sensible and feasible. > Could you please tell me what do you think about this? The problem is not what (part of) diff you would display. The problem is with following the history (with history simplification). *After* displaying diff / chunk / chunk fragment, do we further follow history of the whole preimage? 
Or do we follow history of line pre-change starting from blamed commit? If we *don't* follow the history, how line-level browser is different from (wrapped) git-blame? Try to come with the result of line-level history for some line in git sources "by hand": this would help in discussion about what line-level history browser should do, and perhaps even be first test of it (see e.g. tests for git-blame). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 13:49 ` Jakub Narebski @ 2010-03-23 15:23 ` Bo Yang 2010-03-23 19:57 ` Jonathan Nieder 0 siblings, 1 reply; 54+ messages in thread From: Bo Yang @ 2010-03-23 15:23 UTC (permalink / raw) To: Jakub Narebski Cc: Jonathan Nieder, Junio C Hamano, gitzilla, Alex Riesen, git Hi, On Tue, Mar 23, 2010 at 9:49 PM, Jakub Narebski <jnareb@gmail.com> wrote: > Try to come with the result of line-level history for some line in > git sources "by hand": this would help in discussion about what > line-level history browser should do, and perhaps even be first test > of it (see e.g. tests for git-blame). Thanks for your advice of coming with a real example, Jakub! And I can give a not too trivial one, :) If you look at the pretty.c line 1032 line, you will find a line like: format_commit_message(commit, user_format, sb, context); Take for example, we will trace the history of this line. We will find that the first time this line appears: @@ -900,18 +900,18 @@ char *reencode_commit_message(const struct commit *commit, const char **encoding ...skipped... if (fmt == CMIT_FMT_USERFORMAT) { - format_commit_message(commit, user_format, sb, dmode); + format_commit_message(commit, user_format, sb, context); return; } And we should trace the preimage, something like: if (fmt == CMIT_FMT_USERFORMAT) { format_commit_message(commit, user_format, sb, dmode); We will find these below: @@ -770,7 +775,7 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct com const char *encoding; if (fmt == CMIT_FMT_USERFORMAT) { - format_commit_message(commit, user_format, sb); + format_commit_message(commit, user_format, sb, dmode); return; } Again: + + if (fmt == CMIT_FMT_USERFORMAT) { + format_commit_message(commit, user_format, sb); + return; + } + Here, we find that the line is added from scratch and line level history browser will do a code movement and copy matching try to find whether this line if moved from other files. And it is. 
In commit 93fc05eb9(Split off the pretty print stuff into its own file), some code is moved from commit.c to pretty.c and this line if from commit.c . Ok, now, we will trace into commit.c for this line. Again: char *reencoded; const char *encoding; - char *buf; - if (fmt == CMIT_FMT_USERFORMAT) - return format_commit_message(commit, user_format, buf_p, space_p); + if (fmt == CMIT_FMT_USERFORMAT) { + format_commit_message(commit, user_format, sb); + return; + } encoding = (git_log_output_encoding ? git_log_output_encoding Now, we will trace the commit which produce the above preimage of the diff hunk. And because there are four lines of the preimage in our tracing window. We should follow any commit which intersect with these four lines. Fortunately, there is only one commit. @@ -1165,7 +1166,7 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt, char *buf; if (fmt == CMIT_FMT_USERFORMAT) - return format_commit_message(commit, msg, buf_p, space_p); + return format_commit_message(commit, user_format, buf_p, space_p); encoding = (git_log_output_encoding ? git_log_output_encoding Again, we find: if (fmt == CMIT_FMT_USERFORMAT) - return format_commit_message(commit, msg, buf, space); + return format_commit_message(commit, msg, buf_p, space_p); encoding = (git_log_output_encoding Again: char *encoding; + if (fmt == CMIT_FMT_USERFORMAT) + return format_commit_message(commit, msg, buf, space); + encoding = (git_log_output_encoding ? git_log_output_encoding And here, finally, we reach a place where the code is added from scratch and not copied/moved from other place. Line level history browser will just display all the related diff to users and trace the code modification/move/copy. It traces the preimage of the minimum related diff hunk carefully, if there is any case that there are more than one commit intersect with the preimage, we will stop and ask the users to select which way to go on tracing. I hope this can help us to discuss the problem, thanks! Regards! 
Bo
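The step in the walk-through where the line turns out to have been moved from commit.c to pretty.c could be approximated as follows. This is a deliberately naive sketch with a hypothetical function name: it only looks for an exact, whitespace-insensitive match of the added block among the lines that the same commit removed from other files. The move and copy detection behind "git blame -M" and "-C" is more elaborate than this.

```python
def find_move_source(added, removed_by_path):
    """Look for the added block among lines removed elsewhere in the commit.

    added: list of added lines (without the leading '+').
    removed_by_path: dict mapping a path to the lines removed from it.
    Returns (path, index of the first matching removed line) or None.
    """
    needle = [line.strip() for line in added]
    if not needle:
        return None
    for path, removed in removed_by_path.items():
        hay = [line.strip() for line in removed]
        for i in range(len(hay) - len(needle) + 1):
            if hay[i:i + len(needle)] == needle:
                return path, i
    return None
```

If a source is found, tracing continues in that file at the matching lines. Otherwise the block is treated as added from scratch and the trace ends there.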
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 15:23 ` Bo Yang @ 2010-03-23 19:57 ` Jonathan Nieder 2010-03-23 21:51 ` A Large Angry SCM 2010-03-24 2:30 ` Bo Yang 0 siblings, 2 replies; 54+ messages in thread From: Jonathan Nieder @ 2010-03-23 19:57 UTC (permalink / raw) To: Bo Yang; +Cc: Jakub Narebski, Junio C Hamano, gitzilla, Alex Riesen, git Bo Yang wrote: > It traces the preimage of the minimum related diff hunk carefully, if > there is any case that there are more than one commit intersect with > the preimage, we will stop and ask the users to select which way to go > on tracing. That might be necessary, but I will admit that I suspect it to be harder to make useful. One of the very nice things about ‘git log’ is that it is easy to browse through history in a nonlinear way in a pager (by using a pager’s search functionality). The “backend” ‘git rev-list’ is easy to write scripts with, also because of its simple input and output. If your program requires input from the user, how will it paginate its output? Most pagers expect the standard input to be available for input from the user. One approach (I will not say it is a good one) to the problem of ambiguous origins for a line is to blame _both_ parents. That is, start following both lines of history in your revision walking. Perhaps higher-level tools like ‘git log --graph’ and gitk could visually represent the branched history you are showing. Another approach is to just choose one parent automatically: for example, prefer the first parent, or assign some score representing the relatedness of each parent and choose the most related one. Jonathan ^ permalink raw reply [flat|nested] 54+ messages in thread
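The "choose one parent automatically" approach mentioned above might look like the following sketch. The scoring function is an assumption made for illustration (the fraction of tracked lines that appear verbatim in the parent's version of the file), and both function names are hypothetical. Ties are resolved in favour of the earlier parent, so the first parent wins when all else is equal.

```python
def relatedness(tracked_lines, parent_lines):
    """Fraction of tracked lines found verbatim in the parent's file.

    Assumes tracked_lines is non-empty.
    """
    present = set(parent_lines)
    return sum(line in present for line in tracked_lines) / len(tracked_lines)


def choose_parent(tracked_lines, parents):
    """parents: list of file contents (lists of lines), in parent order.

    Returns (index of the chosen parent, list of scores).
    """
    scores = [relatedness(tracked_lines, p) for p in parents]
    # Highest score wins; on a tie, the smaller index (earlier parent) wins.
    best = max(range(len(parents)), key=lambda i: (scores[i], -i))
    return best, scores
```

Because the choice is made without asking the user, the output can still be sent straight to a pager, which addresses the concern about interactive input raised above.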
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 19:57 ` Jonathan Nieder @ 2010-03-23 21:51 ` A Large Angry SCM 2010-03-24 2:30 ` Bo Yang 1 sibling, 0 replies; 54+ messages in thread From: A Large Angry SCM @ 2010-03-23 21:51 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Bo Yang, Jakub Narebski, Junio C Hamano, Alex Riesen, git Jonathan Nieder wrote: > Bo Yang wrote: > >> It traces the preimage of the minimum related diff hunk carefully, if >> there is any case that there are more than one commit intersect with >> the preimage, we will stop and ask the users to select which way to go >> on tracing. > > That might be necessary, but I will admit that I suspect it to be > harder to make useful. One of the very nice things about ‘git log’ is > that it is easy to browse through history in a nonlinear way in a > pager (by using a pager’s search functionality). The “backend” ‘git > rev-list’ is easy to write scripts with, also because of its simple > input and output. > > If your program requires input from the user, how will it paginate its > output? Most pagers expect the standard input to be available for > input from the user. > > One approach (I will not say it is a good one) to the problem of > ambiguous origins for a line is to blame _both_ parents. That is, > start following both lines of history in your revision walking. > Perhaps higher-level tools like ‘git log --graph’ and gitk could > visually represent the branched history you are showing. > > Another approach is to just choose one parent automatically: for > example, prefer the first parent, or assign some score representing > the relatedness of each parent and choose the most related one. What I would like to see (and may be too much for a GSOC project) is the result to be a simplified commit graph with additional annotations of the line range mappings that could be fed into something like a modified gitk to view the _history_ of the lines of interest. 
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 19:57 ` Jonathan Nieder 2010-03-23 21:51 ` A Large Angry SCM @ 2010-03-24 2:30 ` Bo Yang 1 sibling, 0 replies; 54+ messages in thread From: Bo Yang @ 2010-03-24 2:30 UTC (permalink / raw) To: Jonathan Nieder Cc: Jakub Narebski, Junio C Hamano, gitzilla, Alex Riesen, git Hi, On Wed, Mar 24, 2010 at 3:57 AM, Jonathan Nieder <jrnieder@gmail.com> wrote: > Bo Yang wrote: > >> It traces the preimage of the minimum related diff hunk carefully, if >> there is any case that there are more than one commit intersect with >> the preimage, we will stop and ask the users to select which way to go >> on tracing. > > That might be necessary, but I will admit that I suspect it to be > harder to make useful. One of the very nice things about ‘git log’ is > that it is easy to browse through history in a nonlinear way in a > pager (by using a pager’s search functionality). The “backend” ‘git > rev-list’ is easy to write scripts with, also because of its simple > input and output. > > If your program requires input from the user, how will it paginate its > output? Most pagers expect the standard input to be available for > input from the user. > > One approach (I will not say it is a good one) to the problem of > ambiguous origins for a line is to blame _both_ parents. That is, > start following both lines of history in your revision walking. > Perhaps higher-level tools like ‘git log --graph’ and gitk could > visually represent the branched history you are showing. > > Another approach is to just choose one parent automatically: for > example, prefer the first parent, or assign some score representing > the relatedness of each parent and choose the most related one. Both the approach is very precious for me. I think maybe I will propose the first one in my real proposal to Git, thanks a lot! You really help my too much! Thanks! Regards! Bo ^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: GSoC draft proposal: Line-level history browser 2010-03-23 10:38 ` Bo Yang 2010-03-23 11:22 ` Jakub Narebski @ 2010-03-23 12:02 ` Peter Kjellerstedt 1 sibling, 0 replies; 54+ messages in thread From: Peter Kjellerstedt @ 2010-03-23 12:02 UTC (permalink / raw) To: Bo Yang, Jakub Narebski Cc: Jonathan Nieder, Junio C Hamano, gitzilla, Alex Riesen, git > -----Original Message----- > From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On > Behalf Of Bo Yang > Sent: den 23 mars 2010 11:39 > To: Jakub Narebski > Cc: Jonathan Nieder; Junio C Hamano; gitzilla@gmail.com; Alex Riesen; > git@vger.kernel.org > Subject: Re: GSoC draft proposal: Line-level history browser > > Hi, > > > > > Errr... how the first line in preimage differs from first line in > > postimage? The look as if they are the same: > > > > - for (i = 0; i < extra_hdr_nr; i++) { > > + for (i = 0; i < extra_hdr.nr; i++) { > > > > Maybe some space... :) Look more closely. Hint: a _ is not the same as a . ;) //Peter ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser
  2010-03-23 6:01 ` Bo Yang
  2010-03-23 10:08 ` Jakub Narebski
@ 2010-03-23 18:57 ` Jonathan Nieder
  2010-03-24 2:39 ` Bo Yang
  1 sibling, 1 reply; 54+ messages in thread
From: Jonathan Nieder @ 2010-03-23 18:57 UTC (permalink / raw)
  To: Bo Yang; +Cc: Junio C Hamano, gitzilla, Alex Riesen, git

Hi,

[reordering quoted text for convenience]

Bo Yang wrote:

> I can't understand fully about your above strategy. I think we can
> category the code change into two cases:

Thanks! What you said is much more coherent than the vague things I wrote.

> 2. The diff looks like:
[...]
> This means, the code here is added from scratch. Here, I think we have
> three options.
> 1. Find if the new code is moved here from other place.
> 2. Find if the new code is copied from other place.
> 3. We find the end of the history, so stop here.

If the code is copied verbatim from elsewhere, this is something ‘git blame’ is already very good at. See [1].

Fuzzy matching is a big pain. ‘git blame’ knows how to ignore whitespace. Dscho suggested counting common words. Maybe there are some other ways. I think there is a real danger of getting lost in this problem and wasting a lot of time, so although it is very interesting, I would consider any progress in this area a bonus rather than a goal.

> 1. The diff looks like:
>
> @@ -1008,29 +1000,29 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
>  add_signoff = xmemdupz(committer, endpos - committer + 1);
>  }
>
> - for (i = 0; i < extra_hdr_nr; i++) {
> - strbuf_addstr(&buf, extra_hdr[i]);
> + for (i = 0; i < extra_hdr.nr; i++) {
> + strbuf_addstr(&buf, extra_hdr.items[i].string);
>  strbuf_addch(&buf, '\n');
>  }
>
>
> ie: there is both deletion and addition in a change. And this means we
> modify some lines of the code. So, what we do will be tracing the two
> 'minus' lines and then find another diff. Start trace from that diff
> recursively.

If you can make a heuristic along these lines work well, I think it would be great. I imagine it might work very well for commits that made nice, small changes (like many of those in git.git). Jakub pointed out some of the difficulties, and I like to hope your idea of “when in doubt, include more lines” may work well in many cases in git.git still.

Good luck, and thank you for taking my crazy ideas seriously. :)

Regards,
Jonathan

[1] See v1.4.4-rc1~2 (Merge branch 'jc/pickaxe', 2006-11-07) and the commits preceding it. About that series, Junio wrote:

	Actually the plan is to make it do _true_ pickaxe,
	although it will most likely end up either in dustbin or
	replace blame.

It replaced blame.

I am not actually sure, but I assume “true pickaxe” refers to the goals described in <http://gitster.livejournal.com/35628.html> and the linked-to message.

^ permalink raw reply	[flat|nested] 54+ messages in thread
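The "trace the minus lines" step and the "when in doubt, include more lines" rule discussed in this message can be sketched roughly as follows. This is only an illustration in Python, not code from git or libxdiff: the `Change` tuple, the `map_to_preimage` function, and the convention used for empty regions are all assumptions made for the example. Given the changed regions between a parent and a commit, it maps a line range in the commit back to the parent, widening the range to the whole old side of any change it lands in, and reports whether the commit touched the range at all.

```python
from typing import List, NamedTuple, Optional, Tuple


class Change(NamedTuple):
    """One changed region between a parent (old) and a commit (new).

    Hypothetical convention: lines are 1-based; for an empty side
    (pure addition or pure deletion) the start is the position where
    the region would sit in that file.
    """
    old_start: int
    old_len: int
    new_start: int
    new_len: int


def _old_line(changes: List[Change], n: int, want_end: bool) -> int:
    """Map new-file line n to an old-file line, widening inside a change."""
    delta = 0
    for c in changes:
        if c.new_start <= n < c.new_start + c.new_len:
            # n was rewritten here: take the whole old side of the change.
            return c.old_start + c.old_len - 1 if want_end else c.old_start
        if c.new_start + c.new_len <= n:
            # Change lies entirely before n: it shifts n by its size delta.
            delta += c.old_len - c.new_len
    return n + delta


def _touches(c: Change, start: int, end: int) -> bool:
    """True if the change modifies something inside new lines start..end."""
    if c.new_len == 0:
        # Pure deletion: it counts only if it sits between two kept lines.
        return start < c.new_start <= end
    return c.new_start <= end and c.new_start + c.new_len - 1 >= start


def map_to_preimage(
    changes: List[Change], start: int, end: int
) -> Tuple[Optional[Tuple[int, int]], bool]:
    """Return (old range or None, touched) for new lines start..end."""
    touched = any(_touches(c, start, end) for c in changes)
    lo = _old_line(changes, start, want_end=False)
    hi = _old_line(changes, end, want_end=True)
    if lo > hi:
        # Empty pre-image: the lines were added from scratch in this commit.
        return None, touched
    return (lo, hi), touched
```

For a change that replaces two old lines with three new ones, `map_to_preimage([Change(10, 2, 10, 3)], 11, 11)` yields `((10, 11), True)`: the single line of interest is widened to both replaced lines. A range lying entirely inside a pure addition yields `None`, which is the "end of the history" case from the quoted list, before any move or copy detection is attempted.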
* Re: GSoC draft proposal: Line-level history browser 2010-03-23 18:57 ` Jonathan Nieder @ 2010-03-24 2:39 ` Bo Yang 2010-03-24 4:02 ` Jonathan Nieder 0 siblings, 1 reply; 54+ messages in thread From: Bo Yang @ 2010-03-24 2:39 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Junio C Hamano, gitzilla, Alex Riesen, git HI, On Wed, Mar 24, 2010 at 2:57 AM, Jonathan Nieder <jrnieder@gmail.com> wrote: > > If you can make a heuristic along these lines this work well, I think it > would be great. I imagine it might work very well for commits that made > nice, small changes (like many of those in git.git). Jakub pointed out > some of the difficulties, and I like to hope your idea of “when in doubt, > include more lines” may work well in many cases in git.git still. > > Good luck, and thank you for taking my crazy ideas seriously. :) > > Regards, > Jonathan > > [1] See v1.4.4-rc1~2 (Merge branch 'jc/pickaxe', 2006-11-07) and the > commits preceding it. About that series, Junio wrote: > > Actually the plan is to make it do _true_ pickaxe, > although it will most likely end up either in dustbin or > replace blame. > > It replaced blame. > > I am not actually sure, but I assume “true pickaxe” refers to the > goals described in <http://gitster.livejournal.com/35628.html> > and the linked-to message. I have looked over the article and the message from Linus, it really help me very much. The message and article pointed out most of the things a line level tool should do, and I am happy to find that it is similar with my proposal. :) Thanks again for your precious advice and I think I can come up a better proposal, now. Thanks! Regards! Bo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-24 2:39 ` Bo Yang @ 2010-03-24 4:02 ` Jonathan Nieder 0 siblings, 0 replies; 54+ messages in thread From: Jonathan Nieder @ 2010-03-24 4:02 UTC (permalink / raw) To: Bo Yang; +Cc: Junio C Hamano, gitzilla, Alex Riesen, git, Linus Torvalds Bo Yang wrote: > On Wed, Mar 24, 2010 at 2:57 AM, Jonathan Nieder <jrnieder@gmail.com> wrote: >> I am not actually sure, but I assume “true pickaxe” refers to the >> goals described in <http://gitster.livejournal.com/35628.html> >> and the linked-to message. > > I have looked over the article and the message from Linus, it really > help me very much. Okay, so now I looked over that thread again. I found this [1]: <http://minnie.tuhs.org/Programs/Ctcompare/index.html> It’s for fuzzy matching of a certain kind. The latest version is under the GPLv3, unfortunately for us. I would still like to reiterate my warning to not get sidetracked on this, but maybe it would be pleasant reading. Enjoy, Jonathan [1] Thanks, Linus. http://thread.gmane.org/gmane.comp.version-control.git/27/focus=225 ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser
  2010-03-22 5:32 ` Junio C Hamano
  2010-03-22 7:31 ` Bo Yang
@ 2010-03-22 10:39 ` Alex Riesen
  2010-03-22 15:05 ` Johannes Schindelin
  1 sibling, 1 reply; 54+ messages in thread
From: Alex Riesen @ 2010-03-22 10:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Bo Yang, gitzilla, git

On Mon, Mar 22, 2010 at 06:32, Junio C Hamano <gitster@pobox.com> wrote:
> While I don't seriously buy "multiple files" either, if that is really

yeah, _really_

> needed, I could be persuaded with "log -- path1:10-15 path2:1-7", or
> "log -L path1:10-15 -Lpath2:1-7 -- path1 path2" or something similarly
> ugly like these, but that is not how we generally name things, and it
> probably shouldn't be a new option to "log" anymore.

But then, how about putting the "path" last in the argument, so that the unambiguously defined part of the format comes first? Less need for quoting of ":" (or "@") in pathnames.

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 10:39 ` Alex Riesen @ 2010-03-22 15:05 ` Johannes Schindelin 0 siblings, 0 replies; 54+ messages in thread From: Johannes Schindelin @ 2010-03-22 15:05 UTC (permalink / raw) To: Alex Riesen; +Cc: Junio C Hamano, Bo Yang, gitzilla, git Hi, On Mon, 22 Mar 2010, Alex Riesen wrote: > On Mon, Mar 22, 2010 at 06:32, Junio C Hamano <gitster@pobox.com> wrote: > > While I don't seriously buy "multiple files" either, if that is really > > yeah, _really_ Yes. Besides, it is an easy fall-out of the common "a Java class was split into two" case, where you follow line ranges in different files (at least at some stage) _anyway_. Ciao, Dscho ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser
  2010-03-21 13:19 ` A Large Angry SCM
  2010-03-22 3:48 ` Bo Yang
@ 2010-03-22 3:52 ` Bo Yang
  2010-03-22 15:48 ` Jakub Narebski
  ` (2 more replies)
  1 sibling, 3 replies; 54+ messages in thread
From: Bo Yang @ 2010-03-22 3:52 UTC (permalink / raw)
  To: Johannes.Schindelin, gitster, gitzilla, Alex Riesen; +Cc: git

Hi all,

Thanks a lot for your precious advice. Based on it, I have prepared a new version of my proposal. It gives the detailed options which I want to add to 'git log' and a new syntax for supporting multiple line ranges in any file at any revision. This version also provides milestones and a timeline for the project. Thanks again for your advice; I would appreciate your feedback on this version very much.

-----------------------------------------------------------------------
Draft proposal(v2): Line-level History Browser

=====Purpose of this project=====
"git blame" can tell us who is responsible for a line of code, but it can't help if we want to get the detail of how the lines of code have evolved into what they are now.

This project will add a new feature for 'git log' to display line level history. It can trace the history of any line range of a certain file at any revision. For simplicity, users can run a command like ' git log -L builtin/diff.c:6,8 ' and get the change history of the code between line 6 and line 8 of the diff.c file. And for each history entry, it will provide the commits and the diff block which contains changes of the users' interested lines.

This utility will trace all the modification history of the interested lines and stop when it finds the root of the lines, which is a point where all the new code is added from scratch. Also, the user can specify how deeply he wants this utility to trace. And this tool will treat code move just like modification too, so it will follow the code move inside one commit.

Note that the history may not always be a single thread of commits. If there is more than one commit which produces the specified line range, the thread of history will split. This utility will then stop and provide all commits with their code changes to the user, and let the user select which one to trace next.

=====Work and technical issues=====
==Command options==
This new feature should be used for exploring the history of changes for a certain line range of code in one file.

git log [-m<num>] [-I] [-d depth] [--fuzzy] -L file1@rev1:<start pos>,<end pos> file2@rev2:<start pos>,<end pos>

Options:

1. -m<num>, option to control whether we should follow code movement. If one -m is given, we follow code movement inside a file; when more than one '-m' is given, we follow the movement between files in one commit. The <num> is used to specify the lower bound for the number of lines of moved code. If it is not given, we set it to 1.
2. -I, option to control whether we should display only the 'user interested lines' diff block [default] or display the whole diff with the interested area colorfully displayed.
3. -d, option to control the max depth we trace into the commit history.
4. --fuzzy, option to control whether fuzzy code copy matching is used.
5. '-L' to control whether we run a simple log or we want a line level log.
6. Files and lines. I propose to use such a syntax to specify the files at revision and line range: <file>@<revision>:<start pos>,<end pos>. This looks a little complex, but I think it is necessary because we will finally support multiple files at any version and any line range. The revision can be any revision format of Git and the <pos> can be a number or a POSIX regex, just like what 'git blame' does.
7. And we will support code copy detection, too. The option which controls whether we trace code copy already exists in the current 'git log': it is the option '-C'. Similarly, one '-C' is used to trace code copy of newly added code inside one commit. Two '-C' will trace any code copy inside the commit tree.

==Design and implementation==
Git stores all the blobs instead of code deltas, so we should traverse the commit history and directly access the tree/blob objects to compute the code delta and search for the diff which contains the interested lines.

Since git uses libxdiff to format its diff file, we should iterate through all of xdiff's diff blocks and find what the code looks like before the commit. This will be done using the callback mechanism. Here, we will find a new line range which is the origin code before this commit. And then we start another search from the current commit and the new line range.

Recursively, we can find all the modification history. We will stop when we find that the current interested line range is added from scratch and is not moved from another place in the file. Here, if the user wants to trace code copy, more work will be done to find the possible code copy. We may also stop the traversal when we reach the max search depth.

Also, if the thread of change history splits into two or more commits, we stop and provide the user with all the related commits and the corresponding line ranges.

Generally,
1. New callback for xdi_diff to parse the diff hunk and store line level history info.
2. builtin/line-log.c will be added to complete most of the new features.
3. builtin/log.c will be changed to add this new utility to the front end.
4. Documents will be updated to introduce this new tool.

=====Milestones and Timeline=====
This summer, we will add support for the line level history browser for only one file. Multiple range support is currently not in this project.

The milestones of the project are:
1. Simple modification change history.
2. Code movement inside one file detect.
3. Code movement inside one commit but not a file.
4. Code copy of modified file in one commit.
5. Code copy of any place in one commit tree.
6. Fuzzy matching support.

And the timeline will be:

April 26 - May 23: Catch up with the Git code base and study the implementation of blame.c and log.c thoroughly.

May 24 - June 21: Complete a version which supports code modification trace but without code movement and code copy support.

June 22 - June 29: Complete a version which supports code movement inside one file.

June 30 - July 7: Complete a version which supports code movement between files inside one commit.

July 8 - July 15: Complete a version which supports code copy of a modified file in one commit.

July 16 - July 23: Complete a version which supports code copy of any file in one commit tree.

July 24 - August 7: Complete fuzzy matching of code movement and copy detect.

=====About me=====
I am Bo Yang, a Chinese graduate student majoring in Computer Science at NanKai University. I first touched open source software 5 years ago and began to contribute code to the open source community three years ago. I have contributed to Mozilla/Mingw/Netsurf. Technically, I am experienced in C/Bash Shell. I attended last year's GSoC with the Netsurf project. In that project, I completed most of a DOM library in C.

I began to use git for source code revision about two years ago. I use Git to track my Mozilla trunk source code, because updating Mozilla code by CVS in my school is very slow. So, I wrote a script to automatically update the trunk with CVS at midnight, when the network is fast, on the server, and then use Git to maintain the code. Then I use Git on my PC to clone/update the source code from my local server, and that is very fast. I use Git to track my changes to the code and some bug fixes. It is an excellent tool for branch/history, I think.

Git is my lovely daily tool for revision control. I have much experience with it, have read "Git Internals", and also have some basic knowledge about Git's code base. And I think the line-level history explorer is really suitable for me and I can make a good start with this project in the Git community.

---------------------------------------------------

Thank you very much!

Regards!
Bo

^ permalink raw reply	[flat|nested] 54+ messages in thread
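The tracing loop described under "Design and implementation" in the proposal above might look roughly like the sketch below. It is a hedged illustration in Python, not git code: the `trace` function, the `(name, to_preimage)` history entries, and the choice to count every visited commit toward the depth limit are assumptions for the example, and only a linear history is handled (the "history splits" case is left out). Each `to_preimage` callable stands in for the xdiff-callback step: it takes the current line range and returns the range in the parent (or `None` if the lines were added from scratch) plus a flag saying whether the commit touched the range.

```python
from typing import Callable, List, Optional, Tuple

Range = Tuple[int, int]
# Hypothetical per-commit mapper: range -> (pre-image range or None, touched)
Mapper = Callable[[Range], Tuple[Optional[Range], bool]]


def trace(
    history: List[Tuple[str, Mapper]],
    rng: Range,
    max_depth: Optional[int] = None,
) -> List[Tuple[str, Range]]:
    """Walk commits newest-first, collecting those that touch the range."""
    entries: List[Tuple[str, Range]] = []
    for depth, (name, to_preimage) in enumerate(history):
        if max_depth is not None and depth >= max_depth:
            break  # reached the user-specified search depth
        pre, touched = to_preimage(rng)
        if touched:
            # Report the commit together with the range as seen in it.
            entries.append((name, rng))
        if pre is None:
            break  # lines were added from scratch: root of the history
        rng = pre  # continue the search in the parent with the new range
    return entries


# Toy linear history, newest first (all names and ranges are made up).
toy_history: List[Tuple[str, Mapper]] = [
    ("c3", lambda r: (r, False)),                     # unrelated commit
    ("c2", lambda r: ((r[0] - 2, r[1] - 2), True)),   # modified the lines
    ("c1", lambda r: (None, True)),                   # introduced the lines
]
```

With this toy history, `trace(toy_history, (6, 8))` reports `c2` with range `(6, 8)` and `c1` with range `(4, 6)`, then stops because `c1` added the lines from scratch.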
* Re: GSoC draft proposal: Line-level history browser
  2010-03-22 3:52 ` Bo Yang
@ 2010-03-22 15:48 ` Jakub Narebski
  2010-03-22 18:21 ` Johannes Schindelin
  2010-03-22 19:24 ` Johannes Schindelin
  [not found] ` <201003282120.40536.trast@student.ethz.ch>
  2 siblings, 1 reply; 54+ messages in thread
From: Jakub Narebski @ 2010-03-22 15:48 UTC (permalink / raw)
  To: Bo Yang; +Cc: Johannes.Schindelin, gitster, gitzilla, Alex Riesen, git

Bo Yang <struggleyb.nku@gmail.com> writes:

> This project will add a new feature for 'git log' to display line
> level history. It can trace the history of any line range of certain
> file at any revision. For simplity, users can run the command like: '
> git log -L builtin/diff.c:6,8 ', he will get the change history of
> code between line 6 and line 8 of the diff.c file.

I think that, at least at first, line-level log should follow the git-blame, i.e.

  git log -L <begin>,<end> <revs> -- <file>

If we want (in the future) to follow history of some lines from one file, and other lines from other file together, we do not need to use

  -L <file>:<begin>,<end>

syntax. If parseopt allows, we can use position of parameters, i.e.

  <file1> -L <m>,<n> <file2> -L <k>,<j>

> And for each history entry, it will provide the commits, the diff
> block which contains changes of users' interested lines.

The most important *new* algorithm you need to implement is, after finding (blame-like) the commit that created given version of given line, what was previous version of given line and which line that was.

You can probably find some heuristic in existing merge tools, like emerge from GNU Emacs, or graphical diff tools.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 15:48 ` Jakub Narebski @ 2010-03-22 18:21 ` Johannes Schindelin 2010-03-22 18:38 ` Sverre Rabbelier 0 siblings, 1 reply; 54+ messages in thread From: Johannes Schindelin @ 2010-03-22 18:21 UTC (permalink / raw) To: Jakub Narebski; +Cc: Bo Yang, gitster, gitzilla, Alex Riesen, git Hi, On Mon, 22 Mar 2010, Jakub Narebski wrote: > Bo Yang <struggleyb.nku@gmail.com> writes: > > > This project will add a new feature for 'git log' to display line > > level history. It can trace the history of any line range of certain > > file at any revision. For simplity, users can run the command like: ' > > git log -L builtin/diff.c:6,8 ', he will get the change history of > > code between line 6 and line 8 of the diff.c file. > > I think that, at least at first, line-level log should follow the > git-blame, i.e. > > git log -L <begin>,<end> <revs> -- <file> > > If we want (in the future) to follow history of some lines from one > file, and other lines from other file together, we do not need to use > > -L <file>:<begin>,<end> > > syntax. If parseopt allows, we can use posotion of parameters, i.e. > > <file1> -L <m>,<n> <file2> -L <k>,<j> Oh, is it bikeshedding time already? /me might have missed the start signal. > > And for each history entry, it will provide the commits, the diff > > block which contains changes of users' interested lines. > > The most important *new* algorithm you need to implement is, after > finding (blame-like) the commit that created given version of given > line, what was previous version of given line and which line that was. > > You can probably find some heuristic in existing merge tools, like > emerge from GNU Emacs, or graphical diff tools. I do not think that these tools can help, as they never look further than identical lines (and they mustn't, either). More importantly, the first step really is about driving the libxdiff in such a way that you can recognize the exact same lines. 
(One point to note for the technical details: the algorithm has to expect opposite code moves, i.e. it must cope well when the diff shows the code in question removed in one hunk and added in another.) We also should not get ahead of ourselves, but allow the student to get a full understanding of the requirements, from which he can then make a project plan (with milestones, Christian, no problem). BTW by "requirements" I do not mean something as technical as the syntax, but rather a definition what people should be able to expect to do with this at the end of the summer. As to fuzzy matching of lines that could not be attributed otherwise, I think that that will require a lot of playing around with different ideas. A simple Levenshtein-Damerau is highly unlikely to be enough. Ciao, Dscho ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 18:21 ` Johannes Schindelin @ 2010-03-22 18:38 ` Sverre Rabbelier 2010-03-22 19:26 ` Johannes Schindelin 0 siblings, 1 reply; 54+ messages in thread From: Sverre Rabbelier @ 2010-03-22 18:38 UTC (permalink / raw) To: Johannes Schindelin Cc: Jakub Narebski, Bo Yang, gitster, gitzilla, Alex Riesen, git Heya, On Mon, Mar 22, 2010 at 19:21, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > As to fuzzy matching of lines that could not be attributed otherwise, I > think that that will require a lot of playing around with different ideas. > A simple Levenshtein-Damerau is highly unlikely to be enough. I'd recommend making this either the last milestone, or not a milestone at all. As I noticed with git-stats such metrics might not exist at all (or at least be too hard to find/implement), and it's quite a bummer to not be able to implement your primary milestone ;). -- Cheers, Sverre Rabbelier ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 18:38 ` Sverre Rabbelier @ 2010-03-22 19:26 ` Johannes Schindelin 2010-03-22 20:21 ` Sverre Rabbelier 0 siblings, 1 reply; 54+ messages in thread From: Johannes Schindelin @ 2010-03-22 19:26 UTC (permalink / raw) To: Sverre Rabbelier Cc: Jakub Narebski, Bo Yang, gitster, gitzilla, Alex Riesen, git Hi, On Mon, 22 Mar 2010, Sverre Rabbelier wrote: > On Mon, Mar 22, 2010 at 19:21, Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > As to fuzzy matching of lines that could not be attributed otherwise, > > I think that that will require a lot of playing around with different > > ideas. A simple Levenshtein-Damerau is highly unlikely to be enough. > > I'd recommend making this either the last milestone, or not a milestone > at all. As I noticed with git-stats such metrics might not exist at all > (or at least be too hard to find/implement), and it's quite a bummer to > not be able to implement your primary milestone ;). Indeed. TBH, I wanted to ask you to assist in that part of the project. You probably can give a good overview over what does not work, and why. Ciao, Dscho ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser
  2010-03-22 19:26 ` Johannes Schindelin
@ 2010-03-22 20:21 ` Sverre Rabbelier
  0 siblings, 0 replies; 54+ messages in thread
From: Sverre Rabbelier @ 2010-03-22 20:21 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jakub Narebski, Bo Yang, gitster, gitzilla, Alex Riesen, git

Heya,

On Mon, Mar 22, 2010 at 20:26, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Indeed. TBH, I wanted to ask you to assist in that part of the project.
> You probably can give a good overview over what does not work, and why.

Back then I think we even talked about teaching git log to find code moves? I have some silly code online on repo.or.cz even. Maybe. Anyway, my main problem there was finding a heuristic that would give a sensible answer both in small _and_ large moves. It might be worth investigating two or more metrics instead: one that works for (very) small chunks of code, and thus requires an almost exact match; then perhaps a somewhat linear function (the longer the block moved, the more 'fuzz' you allow); and maybe after some size, say practically full-file moves, use an algorithm similar to what rename detection does.

</braindump>

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 54+ messages in thread
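The size-dependent idea in this message (near-exact match for tiny chunks, more allowed 'fuzz' as the moved block grows, a fixed threshold for very large blocks) could be sketched as below. This is a toy illustration only: `required_similarity`, `looks_like_move`, and the cut-off values 3, 50 and 0.5 are invented for the example, and Python's `difflib.SequenceMatcher` is used as a stand-in similarity measure; it is not what git's rename detection computes.

```python
import difflib
from typing import List


def required_similarity(num_lines: int, small: int = 3, large: int = 50,
                        floor: float = 0.5) -> float:
    """Similarity a candidate must reach, as a function of block size.

    All thresholds are made-up illustration values, not tuned numbers.
    """
    if num_lines <= small:
        return 1.0  # tiny chunks: demand an exact match
    if num_lines >= large:
        return floor  # large blocks: fixed threshold
    # In between: allow linearly more fuzz as the block grows.
    frac = (num_lines - small) / (large - small)
    return 1.0 - frac * (1.0 - floor)


def looks_like_move(removed: List[str], added: List[str]) -> bool:
    """Decide whether `added` plausibly is `removed` moved elsewhere."""
    size = max(len(removed), len(added))
    if size == 0:
        return False
    # Line-based similarity in [0, 1]; lines are compared for equality.
    ratio = difflib.SequenceMatcher(None, removed, added).ratio()
    return ratio >= required_similarity(size)
```

Under these assumptions a one-line block only counts as moved if it is identical, while a 40-line block with a handful of edited lines still does. Whether any single curve like this gives sensible answers for both small and large moves is exactly the open question raised in the message.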
* Re: GSoC draft proposal: Line-level history browser
  2010-03-22 3:52 ` Bo Yang
  2010-03-22 15:48 ` Jakub Narebski
@ 2010-03-22 19:24 ` Johannes Schindelin
  2010-03-23 6:08 ` Bo Yang
  2010-03-23 6:27 ` Bo Yang
  [not found] ` <201003282120.40536.trast@student.ethz.ch>
  2 siblings, 2 replies; 54+ messages in thread
From: Johannes Schindelin @ 2010-03-22 19:24 UTC (permalink / raw)
  To: Bo Yang; +Cc: gitster, gitzilla, Alex Riesen, git

Hi,

On Mon, 22 Mar 2010, Bo Yang wrote:

> Draft proposal(v2): Line-level History Browser
>
> =====Purpose of this project=====
> "git blame" can tell us who is responsible for a line of code, but it
> can't help if we want to get the detail of how the lines of code have
> evolved as what it is now.
> This project will add a new feature for 'git log' to display line
> level history. It can trace the history of any line range of certain
> file at any revision. For simplity, users can run the command like: '
> git log -L builtin/diff.c:6,8 ', he will get the change history of
> code between line 6 and line 8 of the diff.c file. And for each
> history entry, it will provide the commits, the diff block which
> contains changes of users' interested lines.

I would not be too specific here about the exact syntax. I would rather have an example where this might be useful.

In git.git, for example, you could point to pretty_print_commit() which was split out from commit.c into pretty.c in 93fc05e(Split off the pretty print stuff into its own file), and mention that it is hard to verify without much hassle that the code split was really only a code split, rather than a split with an evil change.

Or you could point to 691f1a2(replace direct calls to unlink(2) with unlink_or_warn), where code was refactored, into a new function (unfortunately in two commits, so it might be a case not covered by your project) and it might be somebody's task to find out the original author for that function.

Basically, I would like to have a structure in the proposal like this: what? why? how? when?

> This utility will trace all the modification history of interested
> lines and stop until it finds the root of the lines, which is a point
> where all the new code is added from scratch. Also, the users can
> specify how deeply he wants this utility to trace. And this tool will
> treat code move just like modification too, so it will follow the code
> move inside one commit.
>
> Note that, the history may not always be a single thread of commits.
> If there are more than one commit which produce the specified line
> range, the thread of history will split.

Do not forget the case where there are more than one source of a code move. Think "refactoring".

> =====Work and technical issues=====
> ==Command options==
> This new feature should be used for exploring the history of changes
> for certain line range of code in one file.
>
> git log [-m<num>] [-I] [-d depth] [--fuzzy] -L file1@rev1:<start
> pos>,<end pos> file2@rev2:<start pos>,<end pos>

I would like this not to be specified too much here. For example, we do not know yet, whether the matching will be fuzzy, or whether we find something cleverer than that.

So, I suggest to list not the command line options, but what you intend to support. I.e.:

> Options:
>
> 1. -m<num>, option to control whether we should follow code movement.
> If one -m is given, we follow code movement inside file, when more
> than one '-m' is given, we follow the movement between files in one
> commit. The <num> is used to specify the lower bound for the number
> of lines of moved code. If it is not given, we set it as 1.

Here you do not need to say that it is -m<num>, but that you want to support following code movements both inside and between files, but only optionally, for performance reasons (or some such).

In any case, this would probably just reuse the -M option.

> 2. -I, option to control whether we should display only the 'user
> interested lines' diff block [default] or display the whole diff with
> the interested area colorfully displayed.

It would be more in line with the diff options to use -U, but you do not have to state that. Just talk about a configurable amount of context.

> 3. -d, option to control the max depth we trace into the commit history.

Again, there are better options for "git log" already, but you do not need to be too explicit on the syntax side. Just say that you want to be able to use as many of "git log"s options as make sense in the context of line-level history.

> 4. --fuzzy, option to control whether fuzzy code copy mathing is used.

See above.

> 5. '-L' to control whether we run a simple log or we want a line level
> log.

See above.

> 6. Files and lines. I propose to use such a syntax to specify the files
> at revision and line range, <file>@<revision>:<start pos>,<end pos>.
> This looks a little complex, but I think it is neccessary because we
> will support multiple file at any version and any line range finally.
> The revision can be any revision format of Git and the <pos> can be a
> number, or a posix regex, just like what 'git blame' do.

See above.

> 7. And we will support code copy detect, too. The option which control
> whether we trace code copy does exist in current 'git log', which is
> the option '-C'. Similiarly, one '-C' is used to trace code copy of
> new added code inside one commit. Two '-C' will trace any code copy
> inside commit tree.

Again, do not be too specific about details that have to be fleshed out while working on the project. For example, we do not know yet whether it would make more sense to look for code movements automatically when we detected a deletion, and maybe fall back automatically to detecting code copies when we found an inter-file move.

> ==Design and implementation==
> Git store all the blobs instead of code delta, so we should traverse
> the commit history and directly access the tree/blob objects to
> compute the code delta and search for the diff which contains the
> interested lines.

s/ed/ing/

> Since git use libxdiff to format its diff file, we should iterate
> through all xdiff's diff blocks and find what the code looks like before
> the commit. This will be done using the callback mechanism. Here, we
> will find a new line range which is the origin code before this commit.
> And then start another search from the current commit and the new line
> range.
>
> Recursively, we can find all the modification history. We will stop when
> we find that the current interested line range is added from scratch and
> is not moved from other place of the file. Here, if the user want to
> trace code copy, more work will be done to find the possible code copy.
> We may also stop the traverse when we reach the max search depth.
>
> Also, if the thread of change history split into two or more commits, we
> stop and provide the users all the related commits and corresponding
> line range.

Good.

> Generally,
> 1. New callback for xdi_diff to parse the diff hunk and store line
> level history info.
> 2. builtin/line-log.c will be added to complete most of the new features.
> 3. builtin/log.c will be changed to add this new utility to the front end.
> 4. Documents will be updated to introduce this new tool.

Good.

> =====Milestones and Timeline=====
> In this summer, we will add support of line level history browser for
> only one file. The multiple ranges support is currently not in this
> project.
>
> The milestones of the project are:
> 1. Simple modification change history.

IMHO this should be split into

1a) have an initial version which does nothing else than parse git-log options and a single additional -L, requiring exactly one file to be specified

1b) implement the xdiff callback and identify the commits touching the line range (this is not completely trivial due to merges)

> 2. Code movement inside one file detect.

Again, this has to be split a little bit. Code can split, and it can also unite. So, a single line range can easily become multiple ones.

> 3. Code movement inside one commit but not a file.

s/but not a file/between files/

> 4. Code copy of modified file in one commit.

You mean code copy from somewhere in the same file?

> 5. Code copy of any place in one commit tree.
> 6. Fuzzy matching support.

For fuzzy matching support, I would add some ideas, such as trying to match alpha-numeric characters, or matching longest words or some such. Also mention the possibility that this might be infeasible. In any case, give an example what case this is trying to help with.

> And the timeline will be:
> April 26 - May 23: Catch up with Git code base and study the
> implementation of blame.c and log.c thouroughly.

Hmm. Maybe it would be better to be more precise. Like: 1st week: follow the bird's eye view on Git's source code. 2nd week, analyze the rev-list machinery (probably first looking at the code of merge-base, for easier understanding), 3rd week, have a look at builtin/log.c, 4th week, understand blame.c

> May 24 - June 21 : Complete a version which supports code
> modifcation trace but without code movement and code copy support.
>
> June 22 - June 29: Complete a version which supports code movement
> inside one file.
>
> June 30 - July 7: Complete a version which supports code movement
> between files inside one commit.
>
> July 8 - July 15: Complete a version which supports code copy of
> modified file in one commit.
>
> July 16 - July 23: Complete a version which supports code copy of
> any file in one commit tree.
> > July 24 - August 7: Complete fuzzy matching of code movement and copy detect. This should probably be adjusted a bit to follow my suggestions above. Ciao, Dscho ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 19:24 ` Johannes Schindelin @ 2010-03-23 6:08 ` Bo Yang 2010-03-23 6:27 ` Bo Yang 1 sibling, 0 replies; 54+ messages in thread From: Bo Yang @ 2010-03-23 6:08 UTC (permalink / raw) To: Johannes Schindelin; +Cc: gitster, gitzilla, Alex Riesen, git Hi, >> Note that, the history may not always be a single thread of commits. >> If there are more than one commit which produce the specified line >> range, the thread of history will split. > > Do not forget the case where there are more than one source of a code > move. Think "refactoring". Yeah, I really ignore such a condition. Thanks a lot! And any new added code can be moved/copied from multiple source. This will really be a new problem for the fuzzy matching. >> =====Work and technical issues===== >> ==Command options== >> This new feature should be used for exploring the history of changes >> for certain line range of code in one file. >> >> git log [-m<num>] [-I] [-d depth] [--fuzzy] -L file1@rev1:<start >> pos>,<end pos> file2@rev2:<start pos>,<end pos> > > I would like this not to be specified too much here. For example, we do > not know yet, whether the matching will be fuzzy, or whether we find > something cleverer than that. Ok, I will focus on express what I will support instead of command line options. > >> =====Milestones and Timeline===== >> In this summer, we will add support of line level history browser for >> only one file. The multiple ranges support is currently not in this >> project. >> >> The milestones of the project are: >> 1. Simple modification change history. 
> > IMHO this should be split into > > 1a) have an initial version which does nothing else than parse > git-log options and a single additional -L, requiring exactly > one file to be specified > > 1b) implement the xdiff callback and identify the commits touching > the line range (this is not completely trivial due to merges) > I will make the milestones and timeline more specific, thanks! Regards! Bo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-22 19:24 ` Johannes Schindelin 2010-03-23 6:08 ` Bo Yang @ 2010-03-23 6:27 ` Bo Yang 1 sibling, 0 replies; 54+ messages in thread From: Bo Yang @ 2010-03-23 6:27 UTC (permalink / raw) To: Johannes Schindelin; +Cc: gitster, gitzilla, Alex Riesen, git Hi, >> 4. Code copy of modified file in one commit. > > You mean code copy from somewhere in the same file? Sorry, no. I mean lines copied from other files that were modified in the same commit, which is what 'blame' does with a single '-C' option. > >> 5. Code copy of any place in one commit tree. >> 6. Fuzzy matching support. > > For fuzzy matching support, I would add some ideas, such as trying to > match alpha-numeric characters, or matching longest words or some such. > Also mention the possibility that this might be infeasible. In any case, > give an example what case this is trying to help with. > I think fuzzy matching is useful for tracking multi-line copies/moves even when the source has changed a little. For example, a C function is moved from file1 to file2 and gets renamed. In this case, most of the original code of the function body remains unchanged; only the function name differs. So simply comparing the newly added lines with the original code line by line, and permitting some percentage of mismatches, will help to find this kind of movement. Regards! Bo ^ permalink raw reply [flat|nested] 54+ messages in thread
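The line-by-line comparison with a tolerated mismatch percentage, as described in the message above, might look roughly like the sketch below. It is only an illustration and not part of the proposal: the function names, the whitespace-insensitive comparison, and the default threshold of 0.9 are all assumptions, and it only handles candidate ranges of equal length.

```python
def fuzzy_match_ratio(old_lines, new_lines):
    """Return the fraction of line positions that are identical.

    Hypothetical helper: lines are compared position by position,
    ignoring leading/trailing whitespace. Ranges of different length
    are treated as non-matching (ratio 0.0) to keep the sketch simple.
    """
    if not old_lines or len(old_lines) != len(new_lines):
        return 0.0
    same = sum(1 for a, b in zip(old_lines, new_lines)
               if a.strip() == b.strip())
    return same / len(old_lines)


def is_fuzzy_move(old_lines, new_lines, threshold=0.9):
    """Treat the two ranges as a move/copy if enough lines agree.

    The threshold is an assumed tunable, not a value fixed by the thread.
    """
    return fuzzy_match_ratio(old_lines, new_lines) >= threshold


# A function moved to another file and renamed: only the first line differs.
old = ["int foo(void)", "{", "    int x = 1;", "    return x;", "}"]
new = ["int bar(void)", "{", "    int x = 1;", "    return x;", "}"]

print(fuzzy_match_ratio(old, new))               # 4 of 5 lines agree
print(is_fuzzy_move(old, new, threshold=0.75))   # accepted with a looser threshold
print(is_fuzzy_move(old, new))                   # rejected at the default 0.9
```

For a range this short, one changed line already drops the ratio below 90%, which suggests any real threshold would have to take the range length into account.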
[parent not found: <201003282120.40536.trast@student.ethz.ch>]
* Re: GSoC draft proposal: Line-level history browser [not found] ` <201003282120.40536.trast@student.ethz.ch> @ 2010-03-29 4:14 ` Bo Yang 2010-03-29 18:42 ` Thomas Rast 0 siblings, 1 reply; 54+ messages in thread From: Bo Yang @ 2010-03-29 4:14 UTC (permalink / raw) To: Thomas Rast; +Cc: Johannes Schindelin, git Hi Thomas, On Mon, Mar 29, 2010 at 3:20 AM, Thomas Rast <trast@student.ethz.ch> wrote: > Hi Bo > > I have one specific question about the draft project description: > > You wrote: >> And the timeline will be: >> April 26 - May 23: Catch up with Git code base and study the >> implementation of blame.c and log.c thouroughly. >> >> May 24 - June 21 : Complete a version which supports code >> modifcation trace but without code movement and code copy support. >> >> June 22 - June 29: Complete a version which supports code movement >> inside one file. >> >> June 30 - July 7: Complete a version which supports code movement >> between files inside one commit. >> >> July 8 - July 15: Complete a version which supports code copy of >> modified file in one commit. >> >> July 16 - July 23: Complete a version which supports code copy of >> any file in one commit tree. >> >> July 24 - August 7: Complete fuzzy matching of code movement and copy detect. > > Where are you taking those numbers from? > > (I'm fine if the answer is "I'm making them up from whole cloth" but I > want to know anyway :-P) You mean the dates? They are made up according on 'GSoC's timeline' and my estimation about the workload of each milestone. And this is the draft proposal, after a long thread of discussion, the timeline and milestone change much. The fuzzy matching milestone will become a bonus milestone instead of a primary GSoC milestone. And I think it may help that I provide a newest version of it, I paste it in the end of the email. And I will appreciate any feedback from you. Especially about the implementation section :) Regards! 
Bo

-------------------------------------------------------------------------
Draft proposal(v3): Line-level History Browser

=====Purpose of this project=====
"git blame" can tell us who is responsible for a line of code, but it can't help if we want the details of how the lines of code have evolved into what they are now. For example, in Git, commit 93fc05e (Split off the pretty print stuff into its own file) split out pretty_print_commit() from commit.c into pretty.c, and it is hard to verify without much hassle that the code split was really only a code split, rather than a split with an evil change.

This project will add a new feature for 'git log' to display line level history. It can trace the history of any line range of a certain file at any revision. For each history entry, it will provide the commit and the diff block which contains the changes to the lines the user is interested in. This utility will trace all the modification history of the lines of interest and stop when it finds the root of the lines, which is a point where all the new code is added from scratch. Also, users can specify how deeply they want this utility to trace. This tool will also follow code movement and copy inside one commit.

Note that the history may not always be a single thread of commits. If more than one commit produces the specified line range, or there is more than one source of a code move/copy, the thread of history will split. In that case this utility may stop and present all the commits with their code changes to the user, letting the user select which one to trace next. Or it may display the split history the way 'git log --graph' does; we will provide options to control this.
=====Work and technical issues===== ==Scenario== For how we use the line level browser and how the utility should act to us, here is an scenario: http://article.gmane.org/gmane.comp.version-control.git/143024/match=line+level+history+browser It contains code movement between files but not code copy and fuzzy matching. ==Features== This new feature should be used for exploring the history of changes for certain line range of code in one file. Following features will be supported: 1. Follow history of code modification of any single line range starting from any revision. The above scenario provide a good example for what this function used for and how it acts with users. 2. Follow code movement inside one file. And follow code movement between files optionally for performance reason. With code movement detect, we can find code refactoring easily just like what the above scenario do. 3. Provide a configurable context to users, display only the 'user interested lines' diff block or display the whole diff with the interested area colorfully displayed. 4. Detect code copy optionally. This may help us to understand why some code is here and help on code refactoring. For example, we can always make some 'usually copied code' a function. 5. Simply fuzzy matching for code move/copy. Provide an option to control whether we start a fuzzy matching for performance reason. This can help us to find whether some code is really literally moved to here or with some evil changes. And this may also help in some situation like if we move some Java class to another file with only its class name changed. Anyway, fuzzy matching can help much on code detection. And there can be many fuzzy detect strategies, but we will only try to support the simplest one in this summer for time reason. Maybe a strategy like: 90% of the lines between two ranges of code are identical or 90% of words are identical. This will be discussed again before coding I think. 6. 
Provide a configurable way for how to display the history. A 'git log --graph' way or stop to ask users when we meet history splitting. 7. Reuse 'git log' existing options as many as possible. ==Design and implementation== Git store all the blobs instead of code delta, so we should traverse the commit history and directly access the tree/blob objects to compute the code delta and search for the diff which contains the interesting lines. Since git use libxdiff to format its diff file, we should iterate through all xdiff's diff blocks and find what the code looks like before the commit. This will be done using the callback mechanism. Here, we will find a new line range which is the origin code before this commit. And then start another search from the current commit and the new line range. Recursively, we can find all the modification history. We will stop when we find that the current interested line range is added from scratch and is not moved from other place of the file. Here, if the user want to trace code copy, more work will be done to find the possible code copy. We may also stop the traverse when we reach the max search depth. Also, if the thread of change history split into two or more commits, we stop and provide the users all the related commits and corresponding line range. Generally, 1. New callback for xdi_diff to parse the diff hunk and store line level history info. 2. builtin/line-log.c will be added to complete most of the new features. 3. builtin/log.c will be changed to add this new utility to the front end. 4. Documents will be updated to introduce this new tool. =====Milestones and Timeline===== In this summer, we will add support of line level history browser for only one file. The multiple ranges support is currently not in this project. The milestones of the project are: 1. Simple modification change history. 
   1a) Have an initial version which does nothing else than parse git-log options and a single additional -L, requiring exactly one file to be specified.
   1b) Implement the xdiff callback and identify the commits touching the line range.
   1c) Implement a workable line level log browser.
2. Code movement inside one file.
   2a) Support literal movement of a whole section of code.
   2b) Support code movement with splitting.
   2c) Support code movement with code uniting.
3. Code movement inside one commit between files.
4. Code lines copied from other files that were modified in the same commit.
   4a) Support literal copy of a whole section of code.
   4b) Support code copy split and unite.
5. Code copy of any place in one commit tree.
6. Fuzzy matching support. Note that there is no exact strategy for fuzzy matching yet, and I would like this milestone to be a bonus one instead of a primary milestone for GSoC. We will support it well if time allows.

And the timeline will be:

April 26 - May 23:
  1st week, follow the bird's eye view on Git's source code.
  2nd week, have a look at the code of merge-base, analyze the rev-list machinery.
  3rd week, have a look at builtin/log.c.
  4th week, understand blame.c.

May 24 - June 13: Complete a version which supports code modification trace but without code movement and code copy support. In detail:
  1st week, milestones 1a, 1b
  2nd-3rd week, milestone 1c

June 14 - July 11: Complete a version which supports code movement.
  1st week, milestone 2a
  2nd week, milestone 2b
  3rd week, milestone 2c
  4th week, milestone 3

July 12 - August 1: Complete a version which supports code copy.
  1st week, milestone 4a
  2nd week, milestone 4b
  3rd week, milestone 5

August 2 - August 14: Complete fuzzy matching of code movement and copy detection.

There is nearly one milestone per week, so every week I will post a status update to the list to let the community know the project progress. Patches will be sent per completed feature, not per milestone.
=====About me===== I am Bo Yang, a Chinese graduate student majoring in Computer Science of NanKai University. I have touched some open source software since 5 years ago and began to contribute code to open source community from three years ago. I have contributed to Mozilla/Mingw/Netsurf. Technically, I am experienced in C/Bash Shell. I have attended last year's GSoC with Netsurf project. In that project, I have completed most of a DOM library in C. I begin to use git for source code revision from about two years ago. I use Git for track my Mozilla trunk source code. Because updating Mozilla code by CVS in my school is very slow. So, I write one script to automatically updating the trunk with CVS at mid-night, when the network flow is fast, on the server, and then use Git to maintain the code. Then I use Git in my PC to clone/update the source code from my local server and that is very fast. I use Git to track my changes to the code and some bug fixes. It is an excellent tool for branch/history, I think. Git is my lovely daily tool for revision control. I have much experience with it and have read "Git Internals" and also get some basic knowledge about Git's code base. And I think the line-level history explorer is really suitable for me and I can make a good start with this project in Git community. ^ permalink raw reply [flat|nested] 54+ messages in thread
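The core step in the "Design and implementation" section of the v3 proposal above (take the line range of interest in a commit, walk the diff hunks against the parent, and derive the corresponding range before the commit) can be sketched as follows. This is a rough illustration under stated assumptions, not the proposed implementation: it uses Python's difflib in place of libxdiff, the function name `map_range_to_parent` is hypothetical, and it ignores merges, code movement and code copy entirely.

```python
from difflib import SequenceMatcher


def map_range_to_parent(parent_lines, child_lines, start, end):
    """Map child lines start..end (1-based, inclusive) back to the parent.

    Returns (parent_start, parent_end, touched). `touched` is True when
    the commit changed something inside the range. When no parent line
    corresponds to the range (it was added from scratch), the result is
    (None, None, touched).
    """
    s, e = start - 1, end              # 0-based, half-open range in the child
    lo = hi = None
    touched = False
    matcher = SequenceMatcher(None, parent_lines, child_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Skip hunks that do not overlap the range; for pure deletions
        # (j1 == j2) this keeps only deletions strictly inside the range.
        if not (j1 < e and j2 > s):
            continue
        if tag == "equal":
            # Unchanged block: shift the overlapping part by the offset.
            p1 = i1 + (max(s, j1) - j1)
            p2 = i1 + (min(e, j2) - j1)
        else:
            # Changed block: the whole pre-image of the hunk is of interest.
            touched = True
            p1, p2 = i1, i2
        if p1 < p2:
            lo = p1 if lo is None else min(lo, p1)
            hi = p2 if hi is None else max(hi, p2)
    if lo is None:
        return None, None, touched
    return lo + 1, hi, touched


parent = ["a", "b", "c", "d"]
child = ["a", "x", "b", "c2", "d"]

print(map_range_to_parent(parent, child, 3, 4))  # "b", "c2" trace back to "b", "c"
print(map_range_to_parent(parent, child, 2, 2))  # "x" was added from scratch
print(map_range_to_parent(parent, child, 1, 1))  # "a" is untouched by this commit
```

Repeating this step commit by commit, and stopping when the range comes back empty, gives the recursive search the proposal describes; a commit is worth showing only when `touched` is true.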
* Re: GSoC draft proposal: Line-level history browser 2010-03-29 4:14 ` Bo Yang @ 2010-03-29 18:42 ` Thomas Rast 2010-03-30 2:52 ` Bo Yang 0 siblings, 1 reply; 54+ messages in thread From: Thomas Rast @ 2010-03-29 18:42 UTC (permalink / raw) To: Bo Yang; +Cc: Johannes Schindelin, git, Jens Lehmann Bo Yang wrote: > Draft proposal(v3): Line-level History Browser > > =====Purpose of this project===== > "git blame" can tell us who is responsible for a line of code, but it > can't help if we want to get the detail of how the lines of code have > evolved as what it is now. For example, in Git, commit 93fc05e(Split > off the pretty print stuff into its own file) split out > pretty_print_commit() from commit.c into pretty.c, and it is hard to > verify without much hassle that the code split was really only a code > split, rather than a split with an evil change. Is this really the right use-case? AFAICT the answer to the implied question is given by simply running 'git blame -M 93fc05e:pretty.c'. (Coming up with a better example should be easy; the way I currently think of the feature means that it will mostly replace git-blame for me...) > Note that, the history may not always be a single thread of commits. > If there are more than one commits which produce the specified line > range, or there are more than one source of code move/copy, the thread > of history will split. And this utility may stop and provide all > commits with its code changes to the user, let the user to select > which one to trace next. Or, it may also use 'git log --graph' way to > display the splitted history, we will provide options to control this. I would, by far, prefer the latter. So far 'git log' has always been noninteractive, and there's no really good way to make it interactive because it also goes through the pager. (In the case of blame this is solved in 'git gui blame', which might also be a reasonable approach.) 
OTOH, if you can really fake a history walk, then just about any log-oriented tool should be able to work with it. You'd get graphical output for free with gitk and git log --graph. I haven't really thought through the ramifications, though. > =====Work and technical issues===== > ==Scenario== > For how we use the line level browser and how the utility should act > to us, here is an scenario: > http://article.gmane.org/gmane.comp.version-control.git/143024/match=line+level+history+browser > It contains code movement between files but not code copy and fuzzy matching. I would prefer if you could inline a short example, perhaps starting at your second diff snippet. Examples are good ;-) Even if not, please drop the /match= parameter since it is very distracting. > 5. Simply fuzzy matching for code move/copy. Provide an option to > control whether we start a fuzzy matching for performance reason. This > can help us to find whether some code is really literally moved to > here or with some evil changes. And this may also help in some > situation like if we move some Java class to another file with only > its class name changed. Anyway, fuzzy matching can help much on code > detection. And there can be many fuzzy detect strategies, but we will > only try to support the simplest one in this summer for time reason. > Maybe a strategy like: 90% of the lines between two ranges of code are > identical or 90% of words are identical. This will be discussed again > before coding I think. > > 6. Provide a configurable way for how to display the history. A 'git > log --graph' way or stop to ask users when we meet history splitting. See above. > 7. Reuse 'git log' existing options as many as possible. One thing that IMO is missing from this list, is a plumbing mode that just feeds the raw data to a (presumed) frontend. It could be as simple as supporting git log -L ... --pretty=raw --raw or similar, if this provides sufficient information. Compare 'git blame --porcelain'. 
> ==Design and implementation== > Git store all the blobs instead of code delta, so we should traverse > the commit history and directly access the tree/blob objects to > compute the code delta and search for the diff which contains the > interesting lines. Since git use libxdiff to format its diff file, we > should iterate through all xdiff's diff blocks and find what the code > looks like before the commit. This will be done using the callback > mechanism. Here, we will find a new line range which is the origin > code before this commit. And then start another search from the > current commit and the new line range. Recursively, we can find all > the modification history. We will stop when we find that the current > interested line range is added from scratch and is not moved from > other place of the file. Here, if the user want to trace code copy, > more work will be done to find the possible code copy. We may also > stop the traverse when we reach the max search depth. Also, if the > thread of change history split into two or more commits, we stop and > provide the users all the related commits and corresponding line > range. > > Generally, > 1. New callback for xdi_diff to parse the diff hunk and store line > level history info. > 2. builtin/line-log.c will be added to complete most of the new features. > 3. builtin/log.c will be changed to add this new utility to the front end. > 4. Documents will be updated to introduce this new tool. This section is too handwavy for my taste. I think in most cases you say "we can" when you really mean "git-blame already does it, so we can just use a similar algorithm". Which is fine, but I'd rather see it spelled out so as to see what is not already covered by blame's code. > =====Milestones and Timeline===== > In this summer, we will add support of line level history browser for > only one file. The multiple ranges support is currently not in this > project. 
I agree with what Dscho pointed out earlier in the thread: multiple ranges will be an easy exercise once you can follow a "blame split" where half the lines blame to some file and half the lines blame to another. Other than that I think the milestones look sensible. As a theory guy, I'm not a huge believer in timelines, so lets hope someone else comments on it. > And there is one milestone for each week nearly, so every week, I will > post a stutas update to the list to let the community know the project > progress. And, patches will be sent for feature completion but not > milestone. Push the code somewhere public as you go, even between feature completions. Post RFCs once you have workable features so people can comment. Generally try to be visible. Bonus points if you can think of something visible to do during the period where you look at code, > April 26 - May 23: > 1st week, follow the bird's eye view on Git's source code. > 2nd week, have a look at the code of merge-base, analyze the rev-listmachinery > 3rd week, have a look at builtin/log.c, > 4th week, understand blame.c whether it be documenting your learnings in some way, improving docs as you go, or documenting the APIs you find. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-29 18:42 ` Thomas Rast @ 2010-03-30 2:52 ` Bo Yang 2010-03-30 9:07 ` Michael J Gruber 2010-03-30 9:10 ` Jakub Narebski 0 siblings, 2 replies; 54+ messages in thread From: Bo Yang @ 2010-03-30 2:52 UTC (permalink / raw) To: Thomas Rast; +Cc: Johannes Schindelin, git, Jens Lehmann Hi Thomas, On Tue, Mar 30, 2010 at 2:42 AM, Thomas Rast <trast@student.ethz.ch> wrote: > > Is this really the right use-case? AFAICT the answer to the implied > question is given by simply running 'git blame -M 93fc05e:pretty.c'. > > (Coming up with a better example should be easy; the way I currently > think of the feature means that it will mostly replace git-blame for > me...) I will cite the same example below in the scenario. :) > I would, by far, prefer the latter. So far 'git log' has always been > noninteractive, and there's no really good way to make it interactive > because it also goes through the pager. (In the case of blame this is > solved in 'git gui blame', which might also be a reasonable approach.) > > OTOH, if you can really fake a history walk, then just about any > log-oriented tool should be able to work with it. You'd get graphical > output for free with gitk and git log --graph. I haven't really > thought through the ramifications, though. Ok, so let us try to abandon the interactive way totally. >> =====Work and technical issues===== >> ==Scenario== >> For how we use the line level browser and how the utility should act >> to us, here is an scenario: >> http://article.gmane.org/gmane.comp.version-control.git/143024/match=line+level+history+browser >> It contains code movement between files but not code copy and fuzzy matching. > > I would prefer if you could inline a short example, perhaps starting > at your second diff snippet. Examples are good ;-) > > Even if not, please drop the /match= parameter since it is very > distracting. I put the example at the end of the proposal as a reference. > >> 7. 
Reuse 'git log' existing options as many as possible. > > One thing that IMO is missing from this list, is a plumbing mode that > just feeds the raw data to a (presumed) frontend. It could be as > simple as supporting > > git log -L ... --pretty=raw --raw > > or similar, if this provides sufficient information. Compare 'git > blame --porcelain'. Very good feedback, I will add this, thanks a lot! > > This section is too handwavy for my taste. I think in most cases you > say "we can" when you really mean "git-blame already does it, so we > can just use a similar algorithm". Which is fine, but I'd rather see > it spelled out so as to see what is not already covered by blame's code. Changed in next version to make this clear. But only add some words to state that 'blame does similar' :) > > Push the code somewhere public as you go, even between feature > completions. Post RFCs once you have workable features so people can > comment. Generally try to be visible. > > Bonus points if you can think of something visible to do during the > period where you look at code, Yeah, really is a good point. And I have tried to play around on github.com and try to set up a http://github.com/byang/my_git for this purpose. :) >> April 26 - May 23: >> 1st week, follow the bird's eye view on Git's source code. >> 2nd week, have a look at the code of merge-base, analyze the rev-listmachinery >> 3rd week, have a look at builtin/log.c, >> 4th week, understand blame.c > > whether it be documenting your learnings in some way, improving docs > as you go, or documenting the APIs you find. Thanks a lot for this good advice, I will do so. With these feedback, I think I can make up a complete version of the proposal and submit it to Google. Thanks! Regards! Bo ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser 2010-03-30 2:52 ` Bo Yang @ 2010-03-30 9:07 ` Michael J Gruber 2010-03-30 9:38 ` Michael J Gruber 2010-03-30 11:10 ` Bo Yang 2010-03-30 9:10 ` Jakub Narebski 1 sibling, 2 replies; 54+ messages in thread From: Michael J Gruber @ 2010-03-30 9:07 UTC (permalink / raw) To: Bo Yang; +Cc: Thomas Rast, Johannes Schindelin, git, Jens Lehmann Bo Yang venit, vidit, dixit 30.03.2010 04:52: > Hi Thomas, > > On Tue, Mar 30, 2010 at 2:42 AM, Thomas Rast <trast@student.ethz.ch> wrote: >> >> Is this really the right use-case? AFAICT the answer to the implied >> question is given by simply running 'git blame -M 93fc05e:pretty.c'. >> >> (Coming up with a better example should be easy; the way I currently >> think of the feature means that it will mostly replace git-blame for >> me...) > > I will cite the same example below in the scenario. :) > >> I would, by far, prefer the latter. So far 'git log' has always been >> noninteractive, and there's no really good way to make it interactive >> because it also goes through the pager. (In the case of blame this is >> solved in 'git gui blame', which might also be a reasonable approach.) >> >> OTOH, if you can really fake a history walk, then just about any >> log-oriented tool should be able to work with it. You'd get graphical >> output for free with gitk and git log --graph. I haven't really >> thought through the ramifications, though. > > Ok, so let us try to abandon the interactive way totally. > >>> =====Work and technical issues===== >>> ==Scenario== >>> For how we use the line level browser and how the utility should act >>> to us, here is an scenario: >>> http://article.gmane.org/gmane.comp.version-control.git/143024/match=line+level+history+browser >>> It contains code movement between files but not code copy and fuzzy matching. >> >> I would prefer if you could inline a short example, perhaps starting >> at your second diff snippet. 
Examples are good ;-) >> >> Even if not, please drop the /match= parameter since it is very >> distracting. > > I put the example at the end of the proposal as a reference. > >> >>> 7. Reuse 'git log' existing options as many as possible. >> >> One thing that IMO is missing from this list, is a plumbing mode that >> just feeds the raw data to a (presumed) frontend. It could be as >> simple as supporting >> >> git log -L ... --pretty=raw --raw >> >> or similar, if this provides sufficient information. Compare 'git >> blame --porcelain'. > > Very good feedback, I will add this, thanks a lot! > >> >> This section is too handwavy for my taste. I think in most cases you >> say "we can" when you really mean "git-blame already does it, so we >> can just use a similar algorithm". Which is fine, but I'd rather see >> it spelled out so as to see what is not already covered by blame's code. > > Changed in next version to make this clear. But only add some words to > state that 'blame does similar' :) > >> >> Push the code somewhere public as you go, even between feature >> completions. Post RFCs once you have workable features so people can >> comment. Generally try to be visible. >> >> Bonus points if you can think of something visible to do during the >> period where you look at code, > > Yeah, really is a good point. And I have tried to play around on > github.com and try to set up a http://github.com/byang/my_git for this > purpose. :) You may want to create your repo as a fork of gitster/git instead. That's easier on github, they have a hard time anyways these days ;) Seriously, it helps making use of their network feature etc. I don't have anything to add to your proposal (I like it), but I'll be at NKU next week (Conference @ Chern Institute) so drop me a PM if you wish. Cheers, Michael ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: GSoC draft proposal: Line-level history browser
From: Michael J Gruber @ 2010-03-30 9:38 UTC
Cc: Bo Yang, Thomas Rast, Johannes Schindelin, git, Jens Lehmann

Michael J Gruber venit, vidit, dixit 30.03.2010 11:07:
> You may want to create your repo as a fork of gitster/git instead.

Actually, make this git/git, the other one isn't being updated... Sorry!

Michael
* Re: GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-30 11:10 UTC
To: Michael J Gruber; +Cc: Thomas Rast, Johannes Schindelin, git, Jens Lehmann

Hi Michael,

On Tue, Mar 30, 2010 at 5:07 PM, Michael J Gruber <git@drmicha.warpmail.net> wrote:
>
> You may want to create your repo as a fork of gitster/git instead.
> That's easier on github, they have a hard time anyways these days ;)
> Seriously, it helps making use of their network feature etc.

Yeah, forked git/git. :)

> I don't have anything to add to your proposal (I like it), but I'll be
> at NKU next week (Conference @ Chern Institute) so drop me a PM if you wish.

That is really a big coincidence. :) I would be glad to meet you at NKU,
and I think I can be your guide around NKU and some beautiful spots in
Tianjin if you have spare time. :)

Anyway, let us talk about this in personal email off the list. :-)

Regards!
Bo
* Re: GSoC draft proposal: Line-level history browser
From: Jakub Narebski @ 2010-03-30 9:10 UTC
To: Bo Yang; +Cc: Thomas Rast, Johannes Schindelin, git, Jens Lehmann

Bo Yang <struggleyb.nku@gmail.com> writes:
> On Tue, Mar 30, 2010 at 2:42 AM, Thomas Rast <trast@student.ethz.ch> wrote:
> >
> > Is this really the right use-case? AFAICT the answer to the implied
> > question is given by simply running 'git blame -M 93fc05e:pretty.c'.
> >
> > (Coming up with a better example should be easy; the way I currently
> > think of the feature means that it will mostly replace git-blame for
> > me...)
>
> I will cite the same example below in the scenario. :)

By the way, it would be good to find an example with an "evil merge",
which means that the change to the given line(s) is in the merge commit
itself. Correctly simplifying history in such a case might be
non-trivial.

Another example that it would be good to have is a "history split"
example, which means the case where some lines were consolidated
(e.g. after refactoring), and some of the lines in the "preimage" come
from different lines of history.

This would help with writing tests for this feature (compare the tests
for blame), although in my opinion they are not necessary for the
proposal itself.

I hope that all these cases would fall out naturally from the
implementation.

[...]
> > Push the code somewhere public as you go, even between feature
> > completions. Post RFCs once you have workable features so people can
> > comment. Generally try to be visible.
> >
> > Bonus points if you can think of something visible to do during the
> > period where you look at code,
>
> Yeah, really is a good point. And I have tried to play around on
> github.com and try to set up a http://github.com/byang/my_git for this
> purpose. :)

my_git is not very descriptive... well, unless you would do your work
on a GSoC2010/line-level-history-browser branch, or something like that.

It might be a good idea to have repo.or.cz as an additional repository,
as a fork of the git.git repo, and with SoC / GSoC labels. See
http://repo.or.cz/w/git.git/forks?t=soc

--
Jakub Narebski
Poland
ShadeHawk on #git
* Re: GSoC draft proposal: Line-level history browser
From: Bo Yang @ 2010-03-30 11:15 UTC
To: Jakub Narebski; +Cc: Thomas Rast, Johannes Schindelin, git, Jens Lehmann

Hi Jakub,

On Tue, Mar 30, 2010 at 5:10 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> By the way, it would be good to find an example with "evil merge",
> which means that the change to given line(s) is in the merge commit
> itself. Correctly simplifying history in such case might be
> non-trivial.

It is a little time consuming to find such a change in the history. I
think we can construct a few by hand at the start of the project and
put them into the test cases. :)

> Another example that it would be good to have is "history split"
> example, which means the case where some lines were consolidated
> (e.g. after refactoring), and some of lines in "preimage" come
> from different lines of history.
>
> This would help with writing tests for this feature (compare tests
> for blame), although they are not in my opinion necessary for the
> proposal itself.
>
> I hope that all this cases would fall naturally from the
> implementation.
> [...]
>> > Push the code somewhere public as you go, even between feature
>> > completions. Post RFCs once you have workable features so people can
>> > comment. Generally try to be visible.
>> >
>> > Bonus points if you can think of something visible to do during the
>> > period where you look at code,
>>
>> Yeah, really is a good point. And I have tried to play around on
>> github.com and try to set up a http://github.com/byang/my_git for this
>> purpose. :)
>
> my_git is not very descriptive... well, unless you would do your work
> on GSoC2010/line-level-history-browser branch, or something like that.
>
> It might be good idea to have repo.or.cz as an additional repository,
> as a fork of git.git repo, and with SoC / GSoC labels. See
> http://repo.or.cz/w/git.git/forks?t=soc

Ah, I have created a repo at http://github.com/byang/gsoc-line-browser
and a mirror at http://repo.or.cz/w/gsoc-line-browser.git. I think this
is enough. :-)

Thanks!
Bo
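A minimal sketch of how an "evil merge" test case of the kind discussed above might be constructed by hand. It uses only standard git commands; the file names, the branch name 'side', the commit messages and the identity set through 'git config' are made up for illustration and do not come from the thread or from git's own test suite.

```shell
#!/bin/sh
# Build a throwaway repository whose merge commit itself changes a line
# (an "evil merge"): neither parent touches line 2 of file.txt.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "tester@example.com"   # illustrative identity
git config user.name "Tester"

# Initial commit with the lines we want to trace later.
printf 'line 1\nline 2\nline 3\n' > file.txt
git add file.txt
git commit -q -m "initial"
base=$(git rev-parse --abbrev-ref HEAD)      # name of the default branch

# Side branch: an unrelated change, file.txt is left alone.
git checkout -q -b side
printf 'side\n' > side.txt
git add side.txt
git commit -q -m "side: add unrelated file"

# Default branch: another unrelated change, file.txt is left alone.
git checkout -q "$base"
printf 'main\n' > main.txt
git add main.txt
git commit -q -m "main: add unrelated file"

# Merge without committing, then edit line 2 before concluding the
# merge, so the change exists only in the merge commit.
git merge -q --no-ff --no-commit side
printf 'line 1\nline 2 (changed in merge)\nline 3\n' > file.txt
git add file.txt
git commit -q -m "evil merge: change line 2"

echo "Repository with an evil merge created in: $repo"
```

In the resulting repository, 'git blame file.txt' should attribute line 2 to the merge commit, which makes it a convenient fixture for checking how a line-level history walk behaves at merges. A "history split" fixture could presumably be built the same way, by changing different lines of file.txt on each branch before merging.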