* GSoC intro
@ 2012-03-19 14:42 Florian Achleitner
  2012-03-19 21:31 ` Andrew Sayers
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-03-19 14:42 UTC (permalink / raw)
  To: Git Mailing List

Hi fellow git developers!

I'm interested in applying for GSoC 2012 with the idea "Remote helper
for Subversion".
I've been using git for years and converted my svn repos to git years ago,
but I'm not yet familiar with the prior work on this topic. Is there a branch
in git's git?
Does a "full-featured bi-directional git-remote-svn" mean that it should work
like any remote git repository that you can push to and fetch from?

Below I briefly introduce myself, for those who are interested.

About me
My name is Florian Achleitner (IRC: FlyingFlo). I'm 
from Austria and I study Telematics (a blend of computer science and
electrical engineering) at the Graz University of Technology. I'm currently in
the first year of the master's program. Before starting my studies I worked for
four years as a developer of embedded systems in industry.

My programming experience has grown steadily since I started writing programs
on TI calculators in school, probably 15 years ago.
I'm an open-source enthusiast and have used Linux exclusively for years.
I currently work as a teaching assistant for an exercise about programming
operating systems. In this course we also teach the students to use git.

About me and GSoC
In summer 2010 I participated in GSoC for hugin, writing a Makefile-creation
library in C++, which is used to drive the panorama creation [1]. It was a
great experience and a cool, successful summer job! (And it was merged into
hugin's master branch :-) )

Why git?
- I use git daily. It's always good to work on things you use and a chance to 
contribute something.
- I like C
- I used svn. Nowadays I only use it if I have to ;)
- The community interaction aspect of open source development is very 
interesting.. as the ideas page says ".. and get it merged into upstream Git."


[1] http://hugin.hg.sourceforge.net/hgweb/hugin/hugin/branches branch: 
gsoc2010_makefilelib (unfortunately the web frontend doesn't display a specific
branch)

Regards,
Flo
-- 
Florian Achleitner, BSc

"In a world without walls and fences, 
who needs windows and gates?"  ;-)


* Re: GSoC intro
  2012-03-19 14:42 GSoC intro Florian Achleitner
@ 2012-03-19 21:31 ` Andrew Sayers
  2012-03-20 12:25 ` Florian Achleitner
  2012-03-20 13:19 ` David Barr
  2 siblings, 0 replies; 46+ messages in thread
From: Andrew Sayers @ 2012-03-19 21:31 UTC (permalink / raw)
  To: Florian Achleitner; +Cc: Git Mailing List

Hi Florian,

I've become interested in Git<->SVN issues lately, so I'll tell you what
I know.  Hopefully more knowledgeable people will correct me if I'm wrong.

The main thrust of SVN development is done in the "svn-fe" project.  You
can see the work so far in the "vcs-svn"[1] and "contrib/svn-fe"[2]
subdirectories of the main git repo.  My experience as a user has been
that it does a great job of the things it does, but so far it only does
a subset of the things I want.  For example, it can't write to SVN and I
think I'm right in saying it can't yet update from SVN after the initial
download.  David Barr is the main contact for svn-fe - he's an
experienced mentor and will be able to tell you all about the juicy
low-hanging fruit.

One limitation of svn-fe is that it downloads the whole SVN repository
into a single git branch, without separating out trunk, branches and
tags.  I've been working on this problem over the past few months, and
have split it into three parts (a language to describe which directories
are branches etc., export from SVN to that language, and import from
that language to git).  I'd be very flattered if you wanted to work on
this, but I couldn't honestly recommend it over svn-fe.  The language
itself is a one man job that doesn't have much creative work left; SVN
export is all about exposing yourself to weird little abuses of version
control that don't teach you much beyond bad habits; and while git
import would be a fun little project, I don't know enough about git's C
implementation to provide any useful mentoring.

Good luck with the summer, and as an svn-fe user I hope you're very
productive :)

	- Andrew

[1] https://github.com/git/git/tree/master/vcs-svn
[2] https://github.com/git/git/tree/master/contrib/svn-fe


* Re: GSoC intro
  2012-03-19 14:42 GSoC intro Florian Achleitner
  2012-03-19 21:31 ` Andrew Sayers
@ 2012-03-20 12:25 ` Florian Achleitner
  2012-03-20 13:19 ` David Barr
  2 siblings, 0 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-03-20 12:25 UTC (permalink / raw)
  To: Git Mailing List

On Monday 19 March 2012 21:31:34 you wrote:
> [...]
> Good luck with the summer, and as an svn-fe user I hope you're very
> productive
> 
>         - Andrew
> 
> [1] https://github.com/git/git/tree/master/vcs-svn
> [2] https://github.com/git/git/tree/master/contrib/svn-fe

Thanks for the starting points Andrew!

-- Florian


* Re: GSoC intro
  2012-03-19 14:42 GSoC intro Florian Achleitner
  2012-03-19 21:31 ` Andrew Sayers
  2012-03-20 12:25 ` Florian Achleitner
@ 2012-03-20 13:19 ` David Barr
  2012-03-21 21:16   ` Florian Achleitner
  2 siblings, 1 reply; 46+ messages in thread
From: David Barr @ 2012-03-20 13:19 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Git Mailing List, Andrew Sayers, Jonathan Nieder,
	Sverre Rabbelier, Ramkumar Ramachandra, Dmitry Ivankov

Hi Florian,

> I'm curious about applying for GSoC 2012 considering the idea "Remote helper
> for Subversion".
> I'm using git since years and have converted my svn repos to git years ago,
> but I'm not yet familiar with the pre-work on this topic. Is there a branch in
> git's git?

Much of the progress so far has been merged into master.
Still outstanding are some of Dmitry's patches:
remote-svn-alpha_v2 [1]
svn-fe-options_v7 [2]

> Does a "full-featured bi-directional git-remote-svn" mean, that it should work
> like any remote git repository where you can push to and fetch from?

Yes, that's the plan. To be fair, it is a stretch goal. Two GSoC
students have brought us as far as a read-only remote helper. So I
think there's at least two summers' worth of work remaining.

> Below I briefly introduce myself, for those who are interested.
>
> About me
> My name is Florian Achleitner (IRC: FlyingFlo). I'm
> from Austria and I study Telematics (a blend of computer science and
> electric engineering) at the Graz University of Technology. I'm currently in
> the first year of the master program. Before starting my studies I worked for
> four years as a developer of embedded systems in industry.
>
> My programming experience grew since I started writing programs on TI
> calculators in school probably 15 years ago.
> I'm open-source enthusiast, exclusively using Linux since years.
> I currently work as teaching assistant for an exercise about programming
> operating systems. In this course we also teach the students to use git.

Thanks for the introduction. When I first got involved with this
sub-project, I gave a quick self introduction [3]. As a potential
mentor, it would be prudent to let you know what my commitments are.
My day job is primarily to contribute to chromium.org and webkit.org.
I also have a 20% commitment to git-core and related projects.

> About me and GSoC
> In summer 2010 I participated in GSoC for hugin writing a Makefile-creation
> library in C++, which is used to drive the panorama creation [1]. It was a
> great experience and a cool, successful summer job! ( and it was merged in
> hugin's master branch :-) )

> [1] http://hugin.hg.sourceforge.net/hgweb/hugin/hugin/branches branch:
> gsoc2010_makefilelib (unfortunately the web frontend doesn't display a specific
> branch)

A track record is a plus.

> Why git?
> - I use git daily. It's always good to work on things you use and a chance to
> contribute something.

I'm sure this is the reason most git contributors are here.

> - I like C
> - I used svn. Nowadays I only use it if i have to ;)

You're in good company.

> - The community interaction aspect of open source development is very
> interesting.. as the ideas page says ".. and get it merged into upstream Git."

The git contributors are mostly a pleasure to work with. The volume
and quality of feedback to contribution, especially from newcomers,
sets it apart from the other communities I participate in.

Some extra reading:

To catch up on the current state of the art with respect to
translating Subversion history read:
Another bite of the reposturgeon, Eric S. Raymond [4].
Unfortunately, he hasn't published the code quite yet.
However, he did what we have been lax to do and contacted the
Subversion developers to assist updating protocol documentation [5].
I think the corner cases for the Subversion delta format are still
undocumented [6].

[1] https://github.com/divanorama/git/tree/remote-svn-alpha_v2
[2] https://github.com/divanorama/git/tree/svn-fe-options_v7
[3] http://thread.gmane.org/gmane.comp.version-control.git/143187/focus=143201
[4] http://esr.ibiblio.org/?p=4071
[5] http://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
[6] http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff

> Regards,
> Flo

--
David Barr


* Re: GSoC intro
  2012-03-20 13:19 ` David Barr
@ 2012-03-21 21:16   ` Florian Achleitner
  2012-03-26 11:06     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 46+ messages in thread
From: Florian Achleitner @ 2012-03-21 21:16 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Andrew Sayers, Jonathan Nieder,
	Sverre Rabbelier, Ramkumar Ramachandra, Dmitry Ivankov

Hi!

After the exam today, I started to dig into the topic a little.
So I accumulated some questions ..

On Wednesday 21 March 2012 00:19:41 David Barr wrote:
> Much of the progress so far has been merged into master.
> Still outstanding are some of Dmitry's patches:
> remote-svn-alpha_v2 [1]
> svn-fe-options_v7 [2]

I tried to find svn-related parts in git's sources. I found:
 - the huge ./git-svn.perl, which seems to be the git-svn implementation.
 - ./contrib/svn-fe/ and ./vcs-svn/,
those you pointed me at.
Did I miss something?
Is there any separate source documentation? The source files I looked at
contain only very few comments, and nothing about the big picture.
I ran make doc, but it seems to produce mostly user documentation.

> Yes, that's the plan. To be fair, it is a stretch goal. Two GSoC
> students have brought us as far as a read-only remote helper. So I
> think there's at least two summers' worth of work remaining.

What is the remote helper? How can I use/try it?
> [1] https://github.com/divanorama/git/tree/remote-svn-alpha_v2
Is it in here? Should my project continue on this work?
Until now, I've never used any remote that was not git.

> > About me and GSoC
> > In summer 2010 I participated in GSoC for hugin writing a
> > Makefile-creation library in C++, which is used to drive the panorama
> > creation [1]. It was a great experience and a cool, successful summer
> > job! ( and it was merged in hugin's master branch :-) )
> > 
> > [1] http://hugin.hg.sourceforge.net/hgweb/hugin/hugin/branches branch:
> > gsoc2010_makefilelib (unfortunately the web frontend doesn't display a
> > specific branch)
> 
> A track record is a plus.

If you like, I could provide more references, e.g. a university course project 
in C using git.

> Some extra reading:
> [...]
Haven't yet read it.

Hm, and there are still some general questions:
What about git-svn? What's wrong with it? (I haven't used it.) I saw the huge
perl script, this looks a little extreme ;). But it provides bi-directional
access?!

svn-fe reads a dump of the svn repo. How can this approach ever be
bidirectional? I probably have to do the extra reading first ..

-- 
Florian 


* Re: GSoC intro
  2012-03-21 21:16   ` Florian Achleitner
@ 2012-03-26 11:06     ` Ramkumar Ramachandra
  2012-03-27 13:53       ` Florian Achleitner
  2012-03-28  8:09       ` GSoC intro Miles Bader
  0 siblings, 2 replies; 46+ messages in thread
From: Ramkumar Ramachandra @ 2012-03-26 11:06 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: David Barr, Git Mailing List, Andrew Sayers, Jonathan Nieder,
	Sverre Rabbelier, Dmitry Ivankov

Hi Florian,

Florian Achleitner wrote:
> On Wednesday 21 March 2012 00:19:41 David Barr wrote:
>> Much of the progress so far has been merged into master.
>> Still outstanding are some of Dmitry's patches:
>> remote-svn-alpha_v2 [1]
>> svn-fe-options_v7 [2]
>
> I tried to find svn-related parts in gits sources. I found:
>  - the huge ./git-svn.perl, which seems to be the git-svn implementation.
>  - ./contrib/svn-fe/ and ./vcs-svn/,
> those you pointed me at.
> Did I miss something?
> Is there any separate source documentation? The source files I looked at
> contain only very few comments. And nothing about the big picture.

A lot of big-picture discussions can be found in mailing list
archives.  Let us know what you're looking for exactly.

>> Yes, that's the plan. To be fair, it is a stretch goal. Two GSoC
>> students have brought us as far as a read-only remote helper. So I
>> think there's at least two summers' worth of work remaining.
>
> What is the remote helper? How can I use/try it?

The remote helper is an external program that git invokes to handle
specific protocols.  See ./git_remote_helpers for example.

>> [1] https://github.com/divanorama/git/tree/remote-svn-alpha_v2
> Is it in here? Should my project continue on this work?
> Until now, I've never used any remote that was not git.

You might also decide to build a brand new remote helper.

>> A track record is a plus.
>
> If you like, I could provide more references, e.g. a university course project
> in C using git.
>
>> Some extra reading:
>> [...]
> Haven't yet read it.
>
> Hm, and there are still some general questions:
> What about git-svn? Whats wrong with it? (I haven't used it) I saw the huge
> perl script, this looks a little extreme ;). But it provides bi-directional
> access?!

The main problem with git-svn.perl is that it's hard to maintain or
extend.  See also: David Barr's LCA talk [1].

> svn-fe reads a dump of the svn repo. How can this approach ever be
> bidirectional? Probably I've to do the extra reading first ..

It can't.  You'll have to write something to handle the Git -> SVN
conversion.  See also: one of my earlier attempts in this regard [2].

[1]: http://www.youtube.com/watch?v=0hVuv-wv4Dw
[2]: https://github.com/artagnon/git/tree/svn-fi

    Ram


* Re: GSoC intro
  2012-03-26 11:06     ` Ramkumar Ramachandra
@ 2012-03-27 13:53       ` Florian Achleitner
  2012-04-02  8:30         ` GSOC Proposal draft: git-remote-svn Florian Achleitner
  2012-03-28  8:09       ` GSoC intro Miles Bader
  1 sibling, 1 reply; 46+ messages in thread
From: Florian Achleitner @ 2012-03-27 13:53 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: David Barr, Git Mailing List, Andrew Sayers, Jonathan Nieder,
	Sverre Rabbelier, Dmitry Ivankov

Hi Ramkumar, and everybody!

Thanks for your tips!
I'm currently a little quiet on this topic because I'm working hard on a
university project that we need to kick off this week, because the Easter
holidays start in a few days. I'll then switch back to git immediately; it's
proposal time!

On Monday 26 March 2012 16:36:46 Ramkumar Ramachandra wrote:

> 
> A lot of big-picture discussions can be found in mailing list
> archives.  Let us know what you're looking for exactly.
>  
> The main problem with git-svn.perl is that it's hard to maintain or
> extend.  See also: David Barr's LCA talk [1].

The talk from David Barr is exactly what I was looking for (about the big
picture and so on). It gives a good introduction to what you guys already know
very well, of course: what exists, which are the well-known problems..
Thx for the link!
 
>     Ram

Florian 


* Re: GSoC intro
  2012-03-26 11:06     ` Ramkumar Ramachandra
  2012-03-27 13:53       ` Florian Achleitner
@ 2012-03-28  8:09       ` Miles Bader
  2012-03-28  9:30         ` Dmitry Ivankov
  1 sibling, 1 reply; 46+ messages in thread
From: Miles Bader @ 2012-03-28  8:09 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Florian Achleitner, David Barr, Git Mailing List, Andrew Sayers,
	Jonathan Nieder, Sverre Rabbelier, Dmitry Ivankov

Ramkumar Ramachandra <artagnon@gmail.com> writes:
>> Hm, and there are still some general questions:
>> What about git-svn? Whats wrong with it? (I haven't used it) I saw the huge
>> perl script, this looks a little extreme ;). But it provides bi-directional
>> access?!
>
> The main problem with git-svn.perl is that it's hard to maintain or
> extend.  See also: David Barr's LCA talk [1].

git-svn's also pretty annoying to use (e.g.  the way dcommit rebases
anything you push to svn, which makes juggling local git branches
problematic; ugh)... :/

-miles

-- 
.Numeric stability is probably not all that important when you're guessing.


* Re: GSoC intro
  2012-03-28  8:09       ` GSoC intro Miles Bader
@ 2012-03-28  9:30         ` Dmitry Ivankov
  0 siblings, 0 replies; 46+ messages in thread
From: Dmitry Ivankov @ 2012-03-28  9:30 UTC (permalink / raw)
  To: git

Miles Bader <miles <at> gnu.org> writes:

> git-svn's also pretty annoying to use (e.g.  the way dcommit rebases
> anything you push to svn, which makes juggling local git branches
> problematic; ugh)... :/

There is git svn dcommit --no-rebase to disable the rebase step.
"upstream svn" branch history will still diverge from your local branch though.

> 
> -miles
> 


* GSOC Proposal draft: git-remote-svn
  2012-03-27 13:53       ` Florian Achleitner
@ 2012-04-02  8:30         ` Florian Achleitner
  2012-04-02 11:00           ` Ramkumar Ramachandra
                             ` (3 more replies)
  0 siblings, 4 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-04-02  8:30 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: David Barr, Git Mailing List, Andrew Sayers, Jonathan Nieder,
	Sverre Rabbelier, Dmitry Ivankov

Hi everybody!

Here is my draft of the proposal for the GSoC project. RFC!
Please comment and tell me what you think and if I understood it all right!

I spent a lot of lines writing about the current situation. This is
mostly because, as a newbie, I spent a lot of time examining what we already
have and finally wrote it down.

The draft is inlined below. I hope it's not too long to read. I will put it on
a github wiki later, once I figure out how this works ;)

Florian


==Remote helper for Subversion==

==Introduction==
{ for non-insiders }
Git [1] is a powerful distributed version control system (DVCS). "Distributed"
means that everybody works on a full-featured repository. To collaborate with
other users' repositories, git can fetch from and push to remote repositories
using several transports (http://, ssh://, git://, ...). Git has a very
powerful and useful concept of branches. They are lightweight pointers to
commits (heads).

Subversion (svn) [2] was created as a successor to CVS. Both follow a strict
client-server design, where the repository lives exclusively on the central
server and every client only checks out a copy of a single revision at a time.
SVN doesn't truly have a concept of branches. SVN branches are copies of a
directory (and so are tags).

==What we want (the general goal)==
short: 
git clone svn://<url>
git push
git fetch

A full-featured bi-directional remote helper for svn that allows git to use an
svn repository as a remote, mostly like a remote git repo.
Remote helpers are separate programs invoked by git to communicate with
foreign repositories. Git drives them by exchanging a command and data stream
via stdin and stdout.
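
As a rough illustration of that command/response exchange, here is a minimal
Python sketch (not the proposed C helper). It assumes the helper only answers
the `capabilities` and `list` commands; the `serve` function, its arguments and
the ref list are made up for the example, and a value of "?" simply marks a ref
whose object name the helper does not know yet.

```python
import io


def serve(commands, out, refs):
    """Answer a tiny subset of remote-helper commands.

    commands: iterable of text lines, as git would write them to the
              helper's stdin.
    out:      text stream standing in for the helper's stdout.
    refs:     hypothetical list of ref names offered by the foreign repo.
    """
    for line in commands:
        cmd = line.rstrip("\n")
        if cmd == "capabilities":
            # One capability per line, terminated by a blank line.
            out.write("import\n\n")
        elif cmd == "list":
            # One "<value> <name>" line per ref, terminated by a blank line.
            for ref in refs:
                out.write("? %s\n" % ref)
            out.write("\n")
        elif cmd == "":
            # A blank line is treated here as the end of the session.
            break
        else:
            raise ValueError("unsupported command: %r" % cmd)


if __name__ == "__main__":
    reply = io.StringIO()
    serve(["capabilities\n", "list\n", "\n"], reply, ["refs/heads/master"])
    print(reply.getvalue(), end="")
```

A real helper would read from sys.stdin, flush after every reply, and answer
`import <ref>` with a fast-import stream.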

The remote helper interface [3] supports commands that deliver a
git-fast-import stream from the remote repo.

git-fast-import [4] is a text format for serializing a git repository. It is
used by the tools git-fast-import and git-fast-export.

The remote helper has to convert the foreign protocol and data (svn) to the 
git-fast-import format.
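
To make the target format more concrete, the following Python sketch writes one
commit in fast-import syntax with its files given inline. It is only an
illustration under simplifying assumptions: `data_block` and `commit_record`
are hypothetical names, every file gets mode 100644, the timezone is fixed to
+0000, and paths that would need quoting are not handled.

```python
def data_block(payload):
    """Format a 'data' command: a byte count, the raw bytes, then a newline."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def commit_record(ref, mark, committer, when, message, files):
    """Build one fast-import commit that sets the given files inline.

    ref:       target ref, e.g. "refs/heads/master"
    mark:      integer mark so later commands can refer to this commit
    committer: "Name <email>" string
    when:      commit time in seconds since the epoch
    files:     dict mapping path -> file content (bytes)
    """
    out = [
        b"commit " + ref.encode() + b"\n",
        b"mark :%d\n" % mark,
        ("committer %s %d +0000\n" % (committer, when)).encode(),
        data_block(message.encode()),
    ]
    for path, content in sorted(files.items()):
        out.append(b"M 100644 inline " + path.encode() + b"\n")
        out.append(data_block(content))
    out.append(b"\n")  # blank line closing the commit
    return b"".join(out)


if __name__ == "__main__":
    import sys
    sys.stdout.buffer.write(commit_record(
        "refs/heads/master", 1, "A U Thor <author@example.com>",
        1333353600, "Import r1\n", {"README": b"hello\n"}))
```

Piping the output of this script into `git fast-import` in an empty repository
should create a single commit on master.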

==What are the challenges? ==
To summarize: git and svn track the state of the working tree differently in
several respects. This makes a direct mapping impossible.
There are lots of discussions about these issues on the git mailing list [5].

Some aspects: (I'm sure this is incomplete)
- svn commit and file metadata, including its symlink and permission
representation, have to be mapped to git.

- svn history can only be extracted from the server (we have svnrdump for 
that)

- svn commits are only possible after updating the working copy first, i.e. 
fetching and merging new revisions on the server. This is like implicitly 
rebasing your local work on the remote head before pushing to an svn 
repository.
In git there is of course no such restriction.

- and the most challenging: mapping subversion branches to git branches. 
In svn a branch is created by copying a directory with 'svn copy'. svn doesn't 
have a concept of branches by itself. 

Branches and tags exist only due to the convention of having branches/, trunk/,
and tags/ directories in a repository. But this is not mandatory, and
therefore there are many different layouts. It follows that in svn it is also
possible to commit across branches. This means that a single commit can change
files on more than one branch (accidentally or deliberately).
To convert svn branches to git we have to detect branch semantics by examining
the svn tree's structure and its metadata (it has a 'copyfrom' property).
Previous efforts show that this will not be possible fully automatically,
without configuration and interaction with the user.
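
For the simplest case, a repository that follows the conventional layout
exactly, the path-to-branch mapping and the detection of cross-branch commits
could look like the Python sketch below. This is an assumption-laden
illustration, not a proposed design: `split_branch` and `refs_touched` are
hypothetical names, trunk is arbitrarily mapped to refs/heads/master, and any
other layout would need the configuration discussed above.

```python
def split_branch(path):
    """Map an svn path to (git ref, path inside that branch).

    Returns None for paths outside trunk/, branches/<name>/ and tags/<name>/.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "refs/heads/master", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        prefix = "refs/heads/" if parts[0] == "branches" else "refs/tags/"
        return prefix + parts[1], "/".join(parts[2:])
    return None


def refs_touched(changed_paths):
    """Return the sorted git refs affected by one svn revision."""
    refs = set()
    for path in changed_paths:
        hit = split_branch(path)
        if hit is not None:
            refs.add(hit[0])
    return sorted(refs)


if __name__ == "__main__":
    # A single svn commit that changes files on two branches at once.
    print(refs_touched(["trunk/Makefile", "branches/1.7.x/Makefile"]))
```

A revision for which `refs_touched` returns more than one ref is a commit
across branches and would have to be split into one git commit per ref.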

This brings us to:
==What we have: (existing work)==
Andrew Sayers is currently developing a language to describe svn to git branch 
mapping [6]. I plan to use the language as a configuration for the remote 
helper that specifies unclear aspects.

"esr" developed a tool to manipulate and export subversion repositories [7] 
that should be able to detect branches, but its sources are not available
yet.

In git's tree there is git-svn, a huge Perl script used to convert svn to git. 
It detects branches, but with problems. It also supports some kind of pushing 
commits to svn using a separate command. Its problem: it is hard to maintain,
and bugs are hard to locate and fix.

There are several other one-way conversion tools, e.g. svn-fast-export, 
svn2git.py.

In git's source tree we have vcs-svn/, a set of functions to convert svn
dumps to git-fast-import streams. Those are used by svn-fe to one-way import 
svn history to git. svn-fe doesn't do branch mapping yet.

We have Ramkumar Ramachandra's svnrdump [8], which now lives in the svn source
tree. It can create dump files [9] from remote svn servers and load dump files
into a remote svn server.
In practice it provides read-write access to svn using a text format.
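
To give an idea of that text format: each node record in a dump starts with a
block of "Key: value" headers, and a copy (the basis of svn branches) shows up
as Node-copyfrom-path and Node-copyfrom-rev. The Python sketch below reads such
a block. It assumes the header block has already been cut out of the stream;
`parse_headers` and `copy_source` are hypothetical names, and properties and
file content are ignored.

```python
SAMPLE = """\
Node-path: branches/1.7.x
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 1000
Node-copyfrom-path: trunk
"""


def parse_headers(block):
    """Parse 'Key: value' lines up to the first blank line into a dict."""
    headers = {}
    for line in block.splitlines():
        if not line:
            break
        key, _, value = line.partition(": ")
        headers[key] = value
    return headers


def copy_source(headers):
    """Return (source path, source revision) if the node is a copy, else None."""
    if "Node-copyfrom-path" in headers:
        return headers["Node-copyfrom-path"], int(headers["Node-copyfrom-rev"])
    return None


if __name__ == "__main__":
    print(copy_source(parse_headers(SAMPLE)))
```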

There is a prototype remote helper from Dmitry Ivankov [10]: a bash script
providing one-way fetching from svn via svnrdump and svn-fe.

{ did I miss something important? }

==Project outline==
Please look at the drawing on:
http://filestore.mg34.vc-graz.ac.at/flo/drawing.svg

1. Write a new bi-directional remote helper in C. 
  - It uses vcs-svn utilities to convert svn dumps to git-fast-import and 
vice-versa.
  - It calls svnrdump as a backend to communicate with svn.
  - It reads a configuration file containing branch mappings according to [6]. 
These mappings have to be pre-generated using tools developed along with the
language. The remote helper has no way of asking the user what to do. It will 
fail if a mapping is unclear.
  - Because generating the branch mapping configuration already requires that 
you have a dump of the svn repo, the helper should probably be able to read 
from a file in place of svnrdump too.
  - Using the config the helper translates svn branches/tags to git 
branches/tags and converts other metadata as applicable. It probably has to 
store some information about the mapping in a file in .git to allow a 
reconstruction on subsequent invocations. I think this is especially important 
when pushing to branches (does it already exist in svn, and where? is it new?).
  - It communicates with git via the fast-import format. The remote helper
interface (will have)|has commands for that.

2. Extend the remote helper interface as necessary to read and write
fast-import streams to remote helpers

3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only 
convert svn to git. To push to svn we also need conversion and mapping from 
git to svn. The actual mapping code for branches should also be placed here 
{??} and called by the remote helper.
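
The state file mentioned in step 1 could be as simple as the Python sketch
below (shown in Python only for brevity; the helper itself is meant to be C).
Everything here is an assumption made for illustration: the JSON layout, the
keys `svn_path` and `last_rev`, the file name svn-map.json, and the function
names are not part of any existing tool.

```python
import json
import os


def load_state(path):
    """Read the mapping state, or start empty on the first invocation."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"branches": {}}


def save_state(path, state):
    """Write the state to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f, indent=2, sort_keys=True)
    os.replace(tmp, path)


def record_import(state, ref, svn_path, rev):
    """Remember which svn directory a git ref maps to and how far it is imported."""
    state["branches"][ref] = {"svn_path": svn_path, "last_rev": rev}
    return state


if __name__ == "__main__":
    state = record_import(load_state("svn-map.json"),
                          "refs/heads/master", "trunk", 42)
    save_state("svn-map.json", state)
```

On a later fetch or push the helper would load this file first to decide
whether a branch already exists in svn and from which revision to continue.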

{ Hmm.. so it looks like that's a lot? What do you think? }

Timeline
{ Still to come !}

About me
{ I sent an introduction to the list already, so I'll not copy it here. But it 
will be in the application on GSOC site.}

[1] http://git-scm.com/
[2] http://subversion.tigris.org/
[3] git sources git/Documentation/git-remote-helpers.html
[4] git sources git/Documentation/git-fast-import.html
[5] http://thread.gmane.org/gmane.comp.version-control.git/192106
[6] https://github.com/andrew-sayers/SVN-Branching-Language
[7] http://esr.ibiblio.org/?p=4071
[8] http://svnbook.red-bean.com/en/1.7/svn.ref.svnrdump.html
[9] svn sources subversion/notes/dump-load-format.txt
[10] https://github.com/divanorama/git/blob/remote-svn-alpha/contrib/svn-fe/git-remote-svn-alpha


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02  8:30         ` GSOC Proposal draft: git-remote-svn Florian Achleitner
@ 2012-04-02 11:00           ` Ramkumar Ramachandra
  2012-04-02 20:57           ` Jonathan Nieder
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 46+ messages in thread
From: Ramkumar Ramachandra @ 2012-04-02 11:00 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: David Barr, Git Mailing List, Andrew Sayers, Jonathan Nieder,
	Sverre Rabbelier, Dmitry Ivankov

Hi Florian,

Florian Achleitner wrote:
> - svn commits are only possible after updating the working copy first, i.e.
> fetching and merging new revisions on the server. This is like implicitly
> rebasing your local work on the remote head before pushing to an svn
> repository.

This shouldn't worry you, because we don't have a Git -> SVN converter
yet.  However, I have written a prototype svn-fi.  Unfortunately, due
to the way marks work in fast-import, svn-fi is far from complete.

See: https://github.com/artagnon/git/tree/svn-fi

> Branches exist due to the convention of having branches/, trunk/, and tags/
> directories in a repository, so do tags. But this is not mandatory and
> therefore there are many different layouts. It follows that in svn it is also
> possible to commit across branches. This means that a single commit can change
> files on more than one branch (accidentally or deliberately).
> To convert svn branches to git we have to detect branch semantics by examining
> the svn tree's structure and its metadata (it has a 'copyfrom' property).
> Previous efforts show that this will not be possible fully automatically
> without configuration and interaction with the user.

See also: http://article.gmane.org/gmane.comp.version-control.git/150007

> "esr" developed a tool to manipulate and export subversion repositories [7]
> that should be able to detect branches, but its sources are not available
> yet.

Sources are available at git://gitorious.org/reposurgeon/reposurgeon.git
Do let us know how SBL compares to reposurgeon.  Personally, I like
the idea of a standard "language" to express the mapping.

> In git's source tree we have a vcs-svn/, a set of functions to convert svn
> dumps to git-fast-import streams. Those are used by svn-fe to one-way import
> svn history to git. svn-fe doesn't do branch mapping yet.

Are you planning to extend svn-fe to do the mapping, write it as a
separate program, or write it into the remote helper? I personally
don't mind if the mapping is done in Perl (like in git-svn or SBL) as
opposed to C; mapping is just parse-intensive.

> 1. Write a new bi-directional remote helper in C.
> [...]
>  - It reads a configuration file containing branch mappings according to [6].
> These mappings have to be pre-generated using tools developed along with the
> language. The remote helper has no way of asking the user what to do. It will
> fail if a mapping is unclear.

Right.

>  - Because generating the branch mapping configuration already requires that
> you have a dump of the svn repo, the helper should probably be able to read
> from a file in place of svnrdump too.

You can clone the SVN dumpstream from svnrdump using tee (or similar),
sending one copy to svn-fe and another to the SBL configuration
generator.
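
The tee idea can be sketched in a few lines of Python. This is only an
illustration: the `tee` function is a made-up name, and wiring `src` to
svnrdump's stdout and the sinks to the stdin pipes of svn-fe and the SBL
configuration generator is an assumption left out of the example.

```python
import io


def tee(src, sinks, chunk_size=65536):
    """Copy everything from src to every sink, one chunk at a time."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        for sink in sinks:
            sink.write(chunk)


if __name__ == "__main__":
    dump = io.BytesIO(b"SVN-fs-dump-format-version: 2\n")
    to_svn_fe, to_mapper = io.BytesIO(), io.BytesIO()
    tee(dump, [to_svn_fe, to_mapper])
    print(to_svn_fe.getvalue() == to_mapper.getvalue())
```

Because the stream is copied chunk by chunk, the dump never has to be stored
on disk or held in memory as a whole.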

>  - Using the config the helper translates svn branches/tags to git
> branches/tags and converts other metadata as applicable. It probably has to
> store some information about the mapping in a file in .git to allow a
> reconstruction on subsequent invocations. I think this is especially important
> when pushing to branches (does it already exist in svn, and where? is it new).

How will the actual mapping be done?  Using filter-branch's
subdirectory filter, or something else?

> 3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only
> convert svn to git. To push to svn we also need conversion and mapping from
> git to svn. The actual mapping code for branches should also be placed here
> {??} and called by the remote helper.

I think this bit sounds overly ambitious.  I think if you can build a
seamless one-way SVN -> Git bridge in one summer, it'll be quite an
achievement in itself.  Finishing and getting svn-fi merged should be
last priority; I'll try to work on it myself in summer.

    Ram


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02  8:30         ` GSOC Proposal draft: git-remote-svn Florian Achleitner
  2012-04-02 11:00           ` Ramkumar Ramachandra
@ 2012-04-02 20:57           ` Jonathan Nieder
  2012-04-02 23:04             ` Jonathan Nieder
                               ` (2 more replies)
  2012-04-02 22:17           ` Andrew Sayers
  2012-04-05 13:36           ` Florian Achleitner
  3 siblings, 3 replies; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-02 20:57 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Ramkumar Ramachandra, David Barr, Git Mailing List,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Hi Florian,

Florian Achleitner wrote:

> Here is my draft of the proposal for the GSoC project. RFC!
> Please comment and tell me what you think and if I understood it all right!

I like the rough idea.  I also agree with Ram that the scope seems too
wide for one summer and think it would be useful to narrow the scope a
little.

Some tasks I can think of:

 - getting Dmitry's importer into contrib/ and making sure it works
   reliably.  This might require some fixes to svnrdump, svn-fe,
   and the transport-helper.  Some known problems that I suspect may
   be still unresolved:

   - files marked with both svn:special (symlink) and svn:executable

   - dealing with after-the-fact edits to the svn repository.  For
     example, revprops including svn:log can be and often are changed
     after the fact.

   - what happens when the connection to the Subversion server is
     interrupted?  The Subversion dump format does not have an
     "end of commit" marker so currently we can get confused and
     seem to succeed.

   - svn-fe does not correctly handle revs that change a text file to
     a symlink or vice versa without changing its text.

 - UI for importing only some revisions (e.g., "all revisions after
   r1000").  Dmitry has a patch for the svn-fe plumbing to handle
   this but I don't think the corresponding change for the remote
   helper has been written.

   - this would probably also require changes to svnrdump.  What
     happens when r2000 involves copying a file from a version before
     r1000?  If imports do not start at r0, normal dumps of r1000:
     are not self-contained.

 - UI for storing the mapping between Subversion revision numbers and
   git commit names in the git object db somewhere.  Currently we
   store it in a marks file.  There is a script floating around to
   convert that marks file into a set of commit notes and Dmitry also
   has a patch for svn-fe to make it write commit notes directly.
   What happens when the notes and marks file go out of sync --- which
   is authoritative?

   This also implies that repeated fetches would not have to start
   importing again at r1.

 - Storing empty directories and path-specific properties like
   svn:ignore that we don't currently handle.

 - Splitting history into branches.

   Somehow svn-fe has to communicate "svn cp" source and target
   information to the branch mapper so we can trace history to before
   the birth of the paths we are following.  That is, the full history
   of branches/1.7.x/ includes the early history of trunk/ if the
   1.7.x branch was originally created as a copy of the trunk.

   This might be able to use a mechanism similar to storage of
   empty directories and path properties.

 - UI for importing only a subset of paths (e.g., "just the trunk").

   - this would probably also require changes to svnrdump.  What
     happens when r2000 involves copying a file from a branch we
     have chosen not to import?

 - Mapping authorship information from Subversion (which usually
   amounts to a remote username) to something more idiomatic in git
   (usually a human's name and email address) in a way that makes
   round trips possible.

 - Sharing an imported repository with other users of the remote
   helper.

   - this might involve changes to the remote helper machinery to
     allow new clones to use some fetch/push ref specification
     different from refs/heads/*:refs/remotes/origin/*, or it might
     involve some change to core git to automatically push notes
     corresponding to some refs in some situations.

 - Importing <rev, path> pairs that have multiple parents.  In the
   subversion model, path nodes have only one (copyfrom) parent,
   but repositories can use the svn:mergeinfo property to indicate
   that changes made in certain revs to another patch have been
   incorporated.  Under what circumstances is that enough
   justification to add a second parent on the git side?

   - Because svn:mergeinfo is a normal path property, the branch
     mapper could have enough information to take care of this with
     the help of the previously mentioned facility for storing path
     properties.

All of the above is just for reasonable fetch support.
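
To make the "Splitting history into branches" item a little more concrete, here is a minimal sketch of the follow-parent idea in Python. The `births` table and the `follow_parent` name are hypothetical, not anything svn-fe provides today; the sketch assumes each branch path is created exactly once and ignores paths that are deleted and later recreated.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: branch path -> (revision in which the path was born,
# copyfrom (path, rev) if it was created by "svn cp", else None).
Births = Dict[str, Tuple[int, Optional[Tuple[str, int]]]]


def follow_parent(path: str, rev: int, births: Births) -> List[Tuple[str, int, int]]:
    """Trace a path backwards through its copy sources.

    Returns (path, first_rev, last_rev) segments, newest first, that together
    make up the full history of `path` as of `rev`.
    """
    segments = []
    while True:
        birth_rev, copyfrom = births[path]
        segments.append((path, birth_rev, rev))
        if copyfrom is None:
            return segments
        # Continue in the copy source, at the revision that was copied.
        path, rev = copyfrom


births = {
    "trunk": (1, None),
    "branches/1.7.x": (40, ("trunk", 39)),
}
for segment in follow_parent("branches/1.7.x", 100, births):
    print(segment)
```

With that table, the history of branches/1.7.x at r100 consists of the branch itself from r40 on, preceded by trunk from r1 to r39.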

For push support, one early problem to solve would be that pushing
a commit so that the git commit id from re-importing it is the same
requires permission to set the svn:date property.  Is our target
audience one that already has that permission?  Is that permission
something reasonable for a committer to ask for from the repository
admin in order to use the remote helper?

Because of the above:

> 1. Write a new bi-directional remote helper in C. 

The word "new" makes me worried that you'd be throwing away whatever
work already exists. :)

[...]
> { Hmm.. so it looks like thats a lot? what do you think? }

I agree --- what you've described is more than one summer's worth
of work.  Are there any aspects you're particularly interested in
focusing on?  For example,

 (1) If we focus on repositories without any branching structure at
     all and where the user has full ability to write whatever she
     pleases to the repository, I think developing a bidirectional
     remote helper is feasible during the summer.  Round-trip
     support (i.e., commit ids staying the same with a push followed
     by a fetch) is feasible with such a quick plan if we're willing
     to store some git-specific junk in the repo.

 (2) Regarding a tool that sits between svn-fe and the remote helper
     and implements the "follow parent" rule for tracing the full
     history of a single (linear) branch: I think developing that
     _and_ getting it merged could fit in the summer.

 (3) Regarding storing and sharing Subversion's path-specific
     and revision-specific properties: I think implementing a
     mechanism for that and getting it merged could fit in one
     summer.

 (4) Regarding getting git weirdness like distinct author and
     committer names, lack of rename information cooked at commit
     time, and timezones in author and committer dates handled during
     pushes to Subversion in a non-invasive way that is user-friendly
     for the pusher and likely to be acceptable on the receiving side
     for normal projects: that could certainly fill a summer.

 (5) Subversion weirdness like revs that change the entire repository
     at once in a many-branch repo, non-standard file modes, and
     noticing and acting appropriately for svn:log messages that were
     changed after the fact could fill another summer.

So ideally I would like 5 students working on the remote helper
project. ;-)

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02  8:30         ` GSOC Proposal draft: git-remote-svn Florian Achleitner
  2012-04-02 11:00           ` Ramkumar Ramachandra
  2012-04-02 20:57           ` Jonathan Nieder
@ 2012-04-02 22:17           ` Andrew Sayers
  2012-04-02 22:29             ` Jonathan Nieder
  2012-04-05 13:36           ` Florian Achleitner
  3 siblings, 1 reply; 46+ messages in thread
From: Andrew Sayers @ 2012-04-02 22:17 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Ramkumar Ramachandra, David Barr, Git Mailing List,
	Jonathan Nieder, Sverre Rabbelier, Dmitry Ivankov

Hey Florian,

Comments below.  The nitpickier ones aren't so much there to help the
proposal as for general information.

On 02/04/12 09:30, Florian Achleitner wrote:
<snip>
> 
> Subversion (svn) [2] was created as a successor of CVS, both follow a strict 
> client-server design, where the repository exclusively lives on the central 
> server and every client only checks out a copy of a single revision at a time. 
> SVN doesn't truly have a concept of branches. SVN branches are a copy of a 
> directory (so are tags).

Just a little nitpick - SVN was primarily inspired by CVS, but there's
no formal connection between the projects - both are developed by
different development teams even to this day.

<snip>
> git-fast-import [4] is a format to serialize a git repository into a text 
> format. It is used by the tools git-fast-import and git-fast-export.
> 
> The remote helper has to convert the foreign protocol and data (svn) to the 
> git-fast-import format.

As discussed on IRC, I'd like to see some discussion of solutions that
use plumbing directly (e.g. git-commit-tree) if you choose to focus on
branch import.

<snip>
> Branches exist due to the convention of having branches/, trunk/, and tags/ 
> directories in a repository, so do tags. But this is not mandatory and 
> therefore there are many different layouts. It follows that in svn it is also 
> possible to commit across branches. This means that a single commit can change 
> files on more than one branch (accidentally or deliberately).

This is basically accurate, but a contrived example might help explain
why fully automatic branch export is impossible in the general case:

Imagine a repository that consists of a single revision with a single
file, "scratchpad/libfoo/foo.c" - how would we decide which directory is
the branch?  Has the author even decided yet?  For example, he might
be learning version control and not understand what branches are.

Having said that, automatic branch export might be possible in some
important special cases (like repositories that use the standard
layout).  I haven't really looked into this yet.

<snip>
>   - Because generating the branch mapping configuration already requires that 
> you have a dump of the svn repo, the helper should probably be able to read 
> from a file in place of svnrdump too.

It might help if I explain how the SVN branch exporter will work:

First, it will read an SVN dump and create a file containing JSON blobs
summarising each revision - e.g. it specifies which files were changed,
but not the contents of the changes.  As Ram mentioned, downloading the
dump and tee'ing it to both this process and svn-fe makes a lot of sense.

Next, it will read the JSON file and detect trunks.  This turns out to
be extremely fast now it's been freed from the SVN dump format.

Next, the user will have the opportunity to review the detected trunks.
 For example, if somebody put a "README.txt" in the root directory, the
previous step will need to be rerun with that file ignored.

Next, the main branch detection stage will be run using the JSON file
and the previous branch information.

Next, the user has another chance to make changes.  Some users will blow
straight past this stage, but sufficiently fussy users with sufficiently
large repositories could spend several days looping through this and the
previous stage until their branches and merges are just right.

The SBL file is finally complete whenever the user decides - you'll need
to tell them how to restart the import process, in case they restarted
their computer while they were refining the file.
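
The first step above (turning a dump into per-revision summaries) could look roughly like the following Python sketch. It is not the actual exporter; the function names and the JSON field names are made up for illustration. It assumes a well-formed dump stream and relies on the Content-length header to skip property and text payloads, so file contents are never mistaken for headers.

```python
import io
import json


def read_headers(stream):
    """Read one block of "Key: value" headers; return None at end of input."""
    headers = {}
    while True:
        line = stream.readline()
        if not line:
            return headers or None
        line = line.rstrip(b"\n")
        if not line:
            if headers:
                return headers
            continue  # blank separator lines between records
        key, _, value = line.partition(b": ")
        headers[key.decode()] = value.decode()


def summarise(stream):
    """Return a list of {"rev": N, "nodes": [...]} dicts, without file contents."""
    revisions = []
    while True:
        headers = read_headers(stream)
        if headers is None:
            return revisions
        # Skip the payload (properties and text) of this record.
        stream.read(int(headers.get("Content-length", 0)))
        if "Revision-number" in headers:
            revisions.append({"rev": int(headers["Revision-number"]), "nodes": []})
        elif "Node-path" in headers and revisions:
            revisions[-1]["nodes"].append({
                "path": headers["Node-path"],
                "kind": headers.get("Node-kind"),
                "action": headers.get("Node-action"),
                "copyfrom_path": headers.get("Node-copyfrom-path"),
                "copyfrom_rev": headers.get("Node-copyfrom-rev"),
            })


def summarise_bytes(dump):
    """Convenience wrapper: one JSON blob per revision, one per line."""
    return "\n".join(json.dumps(rev) for rev in summarise(io.BytesIO(dump)))
```

Because only headers are kept, the output stays small even for large repositories, which fits the observation that later stages become fast once they no longer read the dump format.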

<snip>
> 3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only 
> convert svn to git. To push to svn we also need conversion and mapping from 
> git to svn. The actual mapping code for branches should also be placed here 
> {??} and called by the remote helper.

I agree with Jonathan and Ram that we're not ready for this yet.  Even
mapping git branches back to a branchless representation won't be
practical until branch import is fairly mature.

	- Andrew

[1]https://github.com/andrew-sayers/Proof-of-concept-History-Converter/blob/master/git-branch-import.pl
[2]git sources git/Documentation/git-commit-tree.txt

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02 22:17           ` Andrew Sayers
@ 2012-04-02 22:29             ` Jonathan Nieder
  2012-04-02 23:20               ` Andrew Sayers
  0 siblings, 1 reply; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-02 22:29 UTC (permalink / raw)
  To: Andrew Sayers
  Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr,
	Git Mailing List, Sverre Rabbelier, Dmitry Ivankov

Hi Andrew,

Andrew Sayers wrote:
> On 02/04/12 09:30, Florian Achleitner wrote:

>> The remote helper has to convert the foreign protocol and data (svn) to the 
>> git-fast-import format.
>
> As discussed on IRC, I'd like to see some discussion of solutions that
> use plumbing directly (e.g. git-commit-tree) if you choose to focus on
> branch import.

Do you mean that fast-import is not a plumbing command?

From the IRC log[1]:

> andrew_sayers	From my reading of the protocol, you'd have to pass
>              	all the files in for each branch.
> andrew_sayers	For each commit.

I'm a little confused by this.  Do you mean that a fast-import stream
is not allowed to use multiple branches, or that when a fast-import
stream represents a commit that changes one file, it needs to list
all files rather than the one that changed?  Neither is true.

The fast-import tool started as a tool to write objects to pack
directly, or in other words to save time by avoiding the step of
writing loose objects.  That is still one of its main benefits.
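
As an illustration of both points, here is a small Python sketch that builds a fast-import stream by hand. The helper names, the committer identity, the timestamp, and the file names are all invented for the example. The stream writes to two branches, and the second and third commits list only the one file that changed.

```python
import sys


def data(payload: bytes) -> bytes:
    """Format a fast-import "data" command with an exact byte count."""
    return b"data %d\n%s\n" % (len(payload), payload)


def commit(ref, mark, message, files, parent=None) -> bytes:
    """Build one commit command; `files` maps path -> new content (bytes)."""
    out = b"commit %s\nmark :%d\n" % (ref.encode(), mark)
    out += b"committer Example Importer <importer@example.invalid> 1333400000 +0000\n"
    out += data(message.encode())
    if parent is not None:
        out += b"from :%d\n" % parent
    for path, content in files.items():
        out += b"M 100644 inline %s\n" % path.encode() + data(content)
    return out + b"\n"


def example_stream() -> bytes:
    return b"".join([
        # r1: two files on trunk
        commit("refs/heads/trunk", 1, "r1: initial import",
               {"foo.c": b"int foo;\n", "bar.c": b"int bar;\n"}),
        # r2: only the changed file is listed; bar.c is carried over
        commit("refs/heads/trunk", 2, "r2: change foo.c",
               {"foo.c": b"long foo;\n"}, parent=1),
        # r3: a second branch in the same stream, forked from r1
        commit("refs/heads/branch", 3, "r3: change bar.c on a branch",
               {"bar.c": b"long bar;\n"}, parent=1),
    ])


if __name__ == "__main__":
    sys.stdout.buffer.write(example_stream())
```

Piping the output into "git fast-import" in an empty repository should produce both refs, though this sketch has only been reasoned through, not run against a real import.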

[...]
>> 3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only 
>> convert svn to git. To push to svn we also need conversion and mapping from 
>> git to svn. The actual mapping code for branches should also be placed here 
>> {??} and called by the remote helper.
>
> I agree with Jonathan and Ram that we're not ready for this yet.

Just to be clear, I never said such a thing. :)

Thanks for some useful clarifications.
Jonathan

[1] http://colabti.org/irclogger/irclogger_log/git-devel?date=2012-04-02#l153

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02 20:57           ` Jonathan Nieder
@ 2012-04-02 23:04             ` Jonathan Nieder
  2012-04-03  7:49             ` Florian Achleitner
  2012-04-05 16:18             ` Tomas Carnecky
  2 siblings, 0 replies; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-02 23:04 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Ramkumar Ramachandra, David Barr, Git Mailing List,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Jonathan Nieder wrote:

>  - Importing <rev, path> pairs that have multiple parents.  In the
>    subversion model, path nodes have only one (copyfrom) parent,
>    but repositories can use the svn:mergeinfo property to indicate
>    that changes made in certain revs to another patch have been

The above should say "... changes made in certain revs to another
_path_ ...".

>    incorporated.  Under what circumstances is that enough
>    justification to add a second parent on the git side?

One subtlety here is that sometimes people merge almost everything
from some branch but leave a few revisions out.  Imagine the following
history:

 o --- B1 --- B2 --- B3 ---- B4 -- F' ---- B6 --- B7 --- B8 [branch]
  \                                 \                     \
   \                                 \                     \
    T1 --- F ------------------------ M1 ------------------ M2 [trunk]

The bugfix F was applied to the trunk first and then applied to the
branch as rev F'.  Then the maintainer merged the remaining changes
B1, B2, B3, B4 from the branch to trunk.  In git this operation would
be carried out by running "git merge branch".  Finally some more
changes were made on the branch and the maintainer merged those to
trunk, too.

In subversion, this could be done like so:

 1. Make commit T1 on trunk.
 2. Make commit F on trunk.
 3. Make commits B1, B2, B3, B4 on branch.
 4. Make commit F' on branch, either using "svn merge" or by hand.
 5. Merge changes B1, B2, B3, B4 from branch to trunk using
    "svn merge -r o:B4 <url for branch>" and commit.
 6. Make commits B6, B7, B8 on branch.
 7. Merge changes B6, B7, B8 from branch to trunk using
    "svn merge -r F':B8 <url for branch>" and commit.

The resulting svn:mergeinfo property on trunk in revision M1 would
look like this:

	/branches/branch:B1-B4

To a naive importer, this looks like a merge of B4.  The svn:mergeinfo
property on trunk in revision M2 would look like this:

	/branches/branch:B1-B4,B6-B8

which looks like a bunch of cherry-picks rather than a merge, since
it looks like this almost-merge leaves out F'.

If the maintainer used "svn merge --reintegrate" instead, the
svn:mergeinfo properties are a little simpler, so maybe I am worrying
for no good reason.  Anyway, hopefully that makes the setup a little
clearer.
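
For what it's worth, the check a less naive importer might make can be sketched in Python as follows. The revision numbers are hypothetical (B1..B4 as r3..r6, F' as r7, B6..B8 as r9..r11), and the function names are invented; the sketch only handles the plain "path:ranges" form of svn:mergeinfo plus the "*" suffix for non-inheritable ranges.

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source path: set of merged revs}."""
    result = {}
    for line in value.splitlines():
        if not line.strip():
            continue
        path, _, ranges = line.rpartition(":")
        revs = set()
        for item in ranges.split(","):
            item = item.strip().rstrip("*")  # "*" marks a non-inheritable range
            first, _, last = item.partition("-")
            revs.update(range(int(first), int(last or first) + 1))
        result[path] = revs
    return result


def unmerged(merged, branch_revs):
    """Branch revs up to the newest merged one that the mergeinfo leaves out."""
    tip = max(merged)
    return sorted(r for r in branch_revs if r <= tip and r not in merged)


# Revisions that touched the branch, using the hypothetical numbering above.
branch_revs = [3, 4, 5, 6, 7, 9, 10, 11]

m1 = parse_mergeinfo("/branches/branch:3-6")["/branches/branch"]
m2 = parse_mergeinfo("/branches/branch:3-6,9-11")["/branches/branch"]
print(unmerged(m1, branch_revs))  # nothing left out at M1
print(unmerged(m2, branch_revs))  # F' is left out at M2
```

An empty result suggests a real merge of the branch tip; a non-empty one is where the importer has to decide whether a left-out rev such as F' was already applied by other means.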

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02 22:29             ` Jonathan Nieder
@ 2012-04-02 23:20               ` Andrew Sayers
  2012-04-03  0:09                 ` Jonathan Nieder
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Sayers @ 2012-04-02 23:20 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr,
	Git Mailing List, Sverre Rabbelier, Dmitry Ivankov

On 02/04/12 23:29, Jonathan Nieder wrote:
> Hi Andrew,
> 
> Andrew Sayers wrote:
>> On 02/04/12 09:30, Florian Achleitner wrote:
> 
>>> The remote helper has to convert the foreign protocol and data (svn) to the 
>>> git-fast-import format.
>>
>> As discussed on IRC, I'd like to see some discussion of solutions that
>> use plumbing directly (e.g. git-commit-tree) if you choose to focus on
>> branch import.
> 
> Do you mean that fast-import is not a plumbing command?

Sorry, that wasn't clear.  I meant commands that just expose a single
primitive bit of functionality (like git-commit-tree) instead of those
that present an abstract interface to the whole git machinery (like
git-fast-import).  I'm not sure what the right word for that would be?

I agree it's possible to use fast-import for this problem, but it seems
like it's redundant after svn-fe has already loaded everything into git.
 For example, if svn-fe loaded three revisions into the master branch,
you could create a trunk branch by doing something like:

COMMIT=$( git show -s --pretty=%b master^^ | \
          git commit-tree master^^:trunk )
COMMIT=$( git show -s --pretty=%b master^ | \
          git commit-tree master^:trunk -p $COMMIT )
COMMIT=$( git show -s --pretty=%b master | \
          git commit-tree master:trunk -p $COMMIT )
git update-ref refs/heads/trunk "$COMMIT"

The point I was making in IRC was that (so far as I understand)
fast-import doesn't let you pass trees around in this way, but instead
requires you to transmit the contents of all the changed files.

The code above could of course be achieved more easily with
git-filter-branch, or achieved more efficiently with a custom bit of C.
 I suggested discussing the problem in terms of single-purpose commands
because it strikes me as about the right level to expose the
architectural questions without getting bogged down in detail.

	- Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02 23:20               ` Andrew Sayers
@ 2012-04-03  0:09                 ` Jonathan Nieder
  2012-04-03 21:53                   ` Andrew Sayers
  0 siblings, 1 reply; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-03  0:09 UTC (permalink / raw)
  To: Andrew Sayers
  Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr,
	Git Mailing List, Sverre Rabbelier, Dmitry Ivankov

Andrew Sayers wrote:

> Sorry, that wasn't clear.  I meant commands that just expose a single
> primitive bit of functionality (like git-commit-tree) instead of those
> that present an abstract interface to the whole git machinery (like
> git-fast-import).

Ok.  I think you are misunderstanding the purpose of fast-import[1] but
it doesn't take away from what you're saying.

> I agree it's possible to use fast-import for this problem, but it seems
> like it's redundant after svn-fe has already loaded everything into git.

Right, I missed your point here before.  The fundamental question is
not about what commands to use but about the order of operations.

1. In one scheme, first you import the whole tree without splitting it
   into branches, with a tool like svn-fe.  Afterwards, you
   postprocess the resulting repository with tools like "git
   filter-branch --subdirectory-filter".  The result of the import can
   depend on all revisions --- you can say, in rev 1, "I'm not sure
   whether this new directory is a branch; let me see how it develops
   by rev 1000 to decide how to process it".

2. In another scheme, you only import the subset of the repository
   you are interested in.  This is what git-svn does, for example.
   This requires the branch discovery to happen at the same time as
   the import, because otherwise there is no way to tell what subset
   of the repository you are actually interested in.

3. Lastly, in yet another scheme, you import the whole tree and it is
   split into branches on the fly.  The advantages relative to (1) are:

   - impatient people can peek at the partial result of the import as
     it happens

   - the result of importing rev n is guaranteed to depend only on
     revs <= n, so different people importing at different times will
     get the same commits (assuming nobody is rewriting early history
     behind the scenes) and it is obvious how to support incremental
     imports to expand a repository with all revs <= n to a
     repository with all revs <= 2n

   However, if splitting branches can only happen during the initial
   import, that makes it harder to tweak the configuration and try
   again to see what changes.

The relevant technical difference is that in the naive implementation
of scheme (2) you can make use of arbitrary information available over
svn protocol, in naive scheme (3) you can only use information that
makes it into the fast-import stream, and in naive scheme (1) you can
only use information that makes it into the actual git repository.  So
to use scheme (1) you need to make sure svn-fe stores all interesting
data in a visible way, including copyfrom info (which is not a bad
idea anyway).
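
A minimal Python sketch of the on-the-fly splitting in scheme (3), assuming the standard trunk/branches/tags layout and nothing else. The function names are invented. The point is that the split for rev n is a pure function of the paths changed in rev n, so it never depends on later revisions.

```python
def split_path(path):
    """Map a repository path to (branch, path within branch).

    Returns (None, path) for anything outside the standard layout.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "trunk", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        return parts[0] + "/" + parts[1], "/".join(parts[2:])
    return None, path


def split_revision(changed_paths):
    """Group one revision's changed paths by the branch they belong to."""
    by_branch = {}
    for path in changed_paths:
        branch, relative = split_path(path)
        by_branch.setdefault(branch, []).append(relative)
    return by_branch


print(split_revision(["trunk/a.c", "branches/1.7.x/a.c", "README.txt"]))
```

A revision that touches several branches at once simply yields several keys, each of which would become a commit on its own git branch; paths under the None key are the ones a real tool would have to ask the user about.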

[...]
> The point I was making in IRC was that (so far as I understand)
> fast-import doesn't let you pass trees around in this way, but instead
> requires you to transmit the contents of all the changed files.

fast-import's "ls" command allows exactly what you are talking about,
and svn-fe uses it to copy subtrees from earlier revs into later ones
when it receives an "svn cp" command.

See [2] for some work that preexists that.

Did I understand correctly?
Jonathan

[1] By acting as a single process that takes a stream of commands it
really is able to do something that no other plumbing command can do.
[2] http://thread.gmane.org/gmane.comp.version-control.git/158375

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02 20:57           ` Jonathan Nieder
  2012-04-02 23:04             ` Jonathan Nieder
@ 2012-04-03  7:49             ` Florian Achleitner
  2012-04-03 18:48               ` Jonathan Nieder
  2012-04-05 16:18             ` Tomas Carnecky
  2 siblings, 1 reply; 46+ messages in thread
From: Florian Achleitner @ 2012-04-03  7:49 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, David Barr, Git Mailing List,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Hi!

I'm curiously watching the discussion I kicked off with my proposal. Before 
refining the proposal I think I will let the discussion continue at the moment.
But just to clarify some things:
You know I'm rather new to this topic. I've used svn and git, I know what git 
plumbing is about, but I haven't used plumbing commands to write something 
into git yet. So I can't tell from experience if it would be good or not, 
compared to fast-import.
So please explain what's the advantage/disadvantage of which design decision.
That makes it easier to get the point.

I'm also not yet familiar with svn's internals and what properties they use 
for what. 
So there are several questions I simply don't have an answer for.
I know that you have discussed several issues in a huge lot of mails on this 
list. I'm watching and learning currently.

Jonathan wrote about a script "floating around". What's that?
Is it somewhere in a tree in some repo, is it a patch somewhere in a mail on 
the list, is it in git.git in some branch? 
How does one catch floating scripts?

And two clarifications about what I meant in the proposal:

On Monday 02 April 2012 16:30:14 Ramkumar Ramachandra wrote:
> Are you planning to extend svn-fe to do the mapping, write it as a
> separate program, or write it into the remote helper? I personally
> don't mind if the mapping is done in Perl (like in git-svn or SBL) as
> opposed to C; mapping is just parse-intensive.

I personally don't like Perl. :p (I would use python if I need a scripting 
language).
As far as I've seen, svn-fe is a 5-liner calling functions in vcs-svn/. So I 
thought there is no point of piping something through svn-fe in the 
remote-helper. I thought I would use those functions like svn-fe does.
I thought about vcs-svn/ being a library for svn interaction that the 
remote-helper, and svn-fe, and svn-fi (?) are using.

On Monday 02 April 2012 15:57:00 Jonathan Nieder wrote:
> Florian Achleitner wrote:
> Because of the above:
> > 1. Write a new bi-directional remote helper in C.
> 
> The word "new" makes me worried that you'd be throwing away whatever
> work already exists. :)

Probably I missed something. 
But all I've seen that is directly a remote-helper is a bash script which 
basically calls a pipeline from svnrdump | svn-fe | fast-import [1]. 
I'm not planning to write a longer program in bash. (I personally use bash 
only for things that fit on one terminal height).

Bash and Perl are not my favourites ;)

[1] https://github.com/divanorama/git/blob/remote-svn-alpha_v2/contrib/svn-fe/git-remote-svn-alpha

Cheers,
Florian

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-03  7:49             ` Florian Achleitner
@ 2012-04-03 18:48               ` Jonathan Nieder
  0 siblings, 0 replies; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-03 18:48 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Ramkumar Ramachandra, David Barr, Git Mailing List,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Hi,

Florian Achleitner wrote:

> You know I'm rather new to this topic. I've used svn and git, I know what git 
> plumbing is about, but I haven't used plumbing commands to write something 
> into git yet. So I can't tell from experience if it would be good or not, 
> compared to fast-import.

Yes, no problem.  I think the question of using fast-import or other
commands is not a fundamental one.

> So please explain what's the advantage/disadvantage of which design decision.
> That makes it easier to get the point.

The main advantages of using fast-import are:

 - it's faster (assuming it works correctly) :)
 - there are backends for version control systems other than git
 - remote helpers can declare the export/import capabilities to support other
   version control systems, instead of declaring fetch/push and supporting
   only git

However, whatever tools you use, the immediate idea is to transfer
data between a Subversion repository and a Git repository, and the
problems to be solved are the same.

[...]
> I'm also not yet familiar with svn's internals and what properties they use
> for what.
> So there are several questions I simply don't have an answer for.
> I know that you have discussed several issues in a huge lot of mails on this
> list. I'm watching and learning currently.

The svnbook at http://svnbook.red-bean.com/, the Subversion lists at
<http://subversion.apache.org/mailing-lists.html>, and the #svn-dev
IRC channel on freenode
<http://colabti.org/irclogger/irclogger_logs/svn-dev> are the best
resources I know for questions in that vein.

I also learned a lot from looking at the dump format that "svnadmin
dump" spits out, since it matches Subversion concepts pretty well.  It
is documented at

  https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt

Some basic design questions are covered in the thread starting at

  http://thread.gmane.org/gmane.comp.version-control.git/159054

> Jonathan wrote about a script "floating around". What's that?

I think you mean the marks-to-notes converter.  One version is at

  http://thread.gmane.org/gmane.comp.version-control.git/163395/focus=168514
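
To give a rough idea of what such a converter starts from, here is a Python sketch that reads a fast-import marks file, which stores one ":markid SHA-1" pair per line. It assumes the importer uses the Subversion revision number as the mark number; the function names and the sample object names are made up, and this is not the script from the thread above.

```python
def parse_marks(text):
    """Parse marks-file contents into {revision number: commit name}."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        mark, sha1 = line.split()
        mapping[int(mark.lstrip(":"))] = sha1
    return mapping


def next_revision(mapping):
    """First revision an incremental fetch would still need to import."""
    return max(mapping) + 1 if mapping else 1


marks = parse_marks(
    ":1 1111111111111111111111111111111111111111\n"
    ":2 2222222222222222222222222222222222222222\n"
)
print(next_revision(marks))
```

Writing each pair out as a commit note would then put the same mapping into the object db, which is where the question of which copy is authoritative comes in.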

[...]
> On Monday 02 April 2012 16:30:14 Ramkumar Ramachandra wrote:

>> Are you planning to extend svn-fe to do the mapping, write it as a
>> separate program, or write it into the remote helper? I personally
>> don't mind if the mapping is done in Perl (like in git-svn or SBL) as
>> opposed to C; mapping is just parse-intensive.
>
> I personally don't like Perl. :p (I would use python if I need a scripting 
> language).
> As far as I've seen, svn-fe is a 5-liner calling functions in vcs-svn/. So I 
> thought there is no point of piping something through svn-fe in the 
> remote-helper. I thought I would use those functions like svn-fe does.
> I thought about vcs-svn/ being a library for svn interaction that the 
> remote-helper, and svn-fe, and svn-fi (?) are using.

Yes, I think when Ram added vcs-svn/ to the main git repository, the
intent was to make it a library that some git-remote-svn.c could use
directly.

[...]
> On Monday 02 April 2012 15:57:00 Jonathan Nieder wrote:

>> The word "new" makes me worried that you'd be throwing away whatever
>> work already exists. :)
>
> Probably I missed something. 
> But all I've seen that is directly a remote-helper is a bash script which 
> basically calls a pipeline from svnrdump | svn-fe | fast-import [2]. 
> I'm not planning to write a longer program in bash. (I personally use bash 
> only for things that fit on one terminal height).
>
> Bash and Perl are not my favourites ;)

I think that's fine.  It's a prototype, and it has -alpha in its name
to make sure people understand there are no compatibility guarantees
which avoids constraining us.  What I was more worried about is
throwing away discoveries made in the previous design and starting
over.

Jonathan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-03  0:09                 ` Jonathan Nieder
@ 2012-04-03 21:53                   ` Andrew Sayers
  2012-04-03 22:21                     ` Jonathan Nieder
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Sayers @ 2012-04-03 21:53 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr,
	Git Mailing List, Sverre Rabbelier, Dmitry Ivankov

On 03/04/12 01:09, Jonathan Nieder wrote:
> Andrew Sayers wrote:
> 
>> Sorry, that wasn't clear.  I meant commands that just expose a single
>> primitive bit of functionality (like git-commit-tree) instead of those
>> that present an abstract interface to the whole git machinery (like
>> git-fast-import).
> 
> Ok.  I think you are misunderstanding the purpose of fast-import[1] but
> it doesn't take away from what you're saying.

I had certainly missed the "ls" command - having seen that, I agree
fast-import is the best solution to this problem.  I'm still a bit
concerned about fast-import as a learning tool, although this is a bit
of a meta-conversation as far as GSoC is concerned.

Personally, I like to learn things by understanding the basic building
blocks, then seeing how to construct things from them.  I found git easy
to learn because I could start with the basic data structures and
algorithms, then layer an approximation of a patches-and-tarballs
workflow on top of it.  I would expect a discussion of the problem in
terms of primitive commands like git-commit-tree to help that learning
style, although I am committing a logical fallacy by assuming that
everyone thinks like me until proven otherwise :)

I think a lot of learners want to play a bit, make some informative
mistakes, then flesh out their understanding with something a bit more
technical.  People that want to "look under the hood" are well-served by
git, because they can use the ordinary interface
(status/commit/branch/etc.) then use the source when they're ready.  It
seems like people that want to "peek behind the curtain" at a
communication stream would be well-served by fast-import if only there
was a curtain for them to peek behind.  I'd be interested to know what
git learners think, but I'd feel more comfortable pointing students at
fast-import if there was a FUSE module, or a shell, or some other
interface on top of it whose failure mode was a puzzling mess instead of
a safely inert repository.

Incidentally Florian, some of the above probably spoke to you, other
bits probably less so.  It took me several years after leaving
university to see my own learning style, so if you find it hard to learn
git one way, try some different approaches before assuming it's a
personal problem :)

>> I agree it's possible to use fast-import for this problem, but it seems
>> like it's redundant after svn-fe has already loaded everything into git.
> 
> Right, I missed your point here before.  The fundamental question is
> not about what commands to use but about the order of operations.
> 
> 1. In one scheme, first you import the whole tree without splitting it
>    into branches, with a tool like svn-fe.  Afterwards, you
>    postprocess the resulting repository with tools like "git
>    filter-branch --subdirectory-filter".  The result of the import can
>    depend on all revisions --- you can say, in rev 1, "I'm not sure
>    whether this new directory is a branch; let me see how it develops
>    by rev 1000 to decide how to process it".
> 
> 2. In another scheme, you only import the subset of the repository
>    you are interested in.  This is what git-svn does, for example.
>    This requires the branch discovery to happen at the same time as
>    the import, because otherwise there is no way to tell what subset
>    of the repository you are actually interested in.
> 
> 3. Lastly, in yet another scheme, you import the whole tree and it is
>    split into branches on the fly.  The advantages relative to (1) are:
> 
>    - impatient people can peek at the partial result of the import as
>      it happens
> 
>    - the result of importing rev n is guaranteed to depend only on
>      revs <= n, so different people importing at different times will
>      get the same commits (assuming nobody is rewriting early history
>      behind the scenes) and it is obvious how to support incremental
>      imports to expand a repository with all revs <= n to a
>      repository with all revs <= 2n
> 
>    However, if splitting branches only can happen during the initial
>    import, that makes it harder to tweak the configuration and try
>    again to see what changes.
> 

That's a good way of putting the question, but for SVN it's useful to
distinguish between trunk and non-trunk branches.  I previously[1]
suggested this algorithm for deciding if a directory is a branch:

A directory is a branch if...
1. it is not a subdirectory of an existing branch; and
2. either:
2a. it is in a list of branches specified by the user, or
2b. it is copied from a (subdirectory of a) branch
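
A minimal sketch of the rule above, in Python. The names is_branch,
is_under, user_branches and copied_from are illustrative assumptions and
are not taken from any existing tool; paths are plain strings without
leading or trailing slashes.

```python
def is_under(path, ancestor):
    """True if path equals ancestor or lies somewhere beneath it."""
    return path == ancestor or path.startswith(ancestor + "/")


def is_branch(path, branches, user_branches, copied_from=None):
    """Apply rules 1, 2a and 2b to a newly created directory.

    branches      -- directories already known to be branches
    user_branches -- directories the user listed explicitly
    copied_from   -- source path if the directory was created by a copy
    """
    # Rule 1: a subdirectory of an existing branch is never a branch.
    if any(path != b and is_under(path, b) for b in branches):
        return False
    # Rule 2a: the user listed it.
    if path in user_branches:
        return True
    # Rule 2b: copied from a branch, or from a subdirectory of one.
    return copied_from is not None and any(
        is_under(copied_from, b) for b in branches
    )
```

With branches = {"trunk"}, a copy of trunk to branches/topic is accepted
by rule 2b, while trunk/lib is rejected by rule 1 even if the user lists it.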

This is a pretty solid heuristic for detecting branches copied from an
existing branch even in scheme (2) or (3), but does absolutely nothing
for trunk detection.  Although trunk detection is trivial in the sane
case (the "trunk" directory is the one and only trunk, end of story),
here's a contrived example for why it's hard in the general case:

Our SVN newbie created "scratchpad/libfoo/foo.c" in revision 1.  He
spends the next 1,000 revisions working in scratchpad/libfoo, creating
the fooiest foo that ever did foo.  After that, he creates
"scratchpad/libbar/bar.c" and continues for another thousand revisions.
This cycle repeats until he's finally ready to tie all his libraries
together.  It's only now that he finally decides whether to create
"scratchpad/main.c" (if he thinks "scratchpad" is the trunk), or
"trunk/main.c" (if he thinks all the subdirectories of scratchpad were
trunks) or "scratchpad/main/main.c" (if he wants to give me an aneurysm
worrying how to cope when he does `svn cp scratchpad/main scratchpad`).

I paused after writing the paragraph above, because the last part got me
thinking.  Copying a subdirectory to its parent directory isn't actually
possible in SVN, but the concept of "branch absorption" is an
interesting one.  In theory, we could say that "scratchpad/libfoo" and
"scratchpad/libbar" were trunk branches at first, but were deleted when
the "scratchpad" branch was created.  I'll have to check whether this
leads to undesirable results in the real world, but this might make it
possible to do on-the-fly trunk detection as described in scheme (3).
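
One possible reading of "branch absorption", sketched in Python. The
function name absorb is hypothetical, and the question of what event
makes the parent directory a branch in the first place is left out; the
sketch only shows the bookkeeping once that decision has been made.

```python
def absorb(branches, new_branch):
    """Return (remaining, absorbed) after new_branch is created.

    Every existing branch that lives beneath new_branch is treated as
    deleted and folded into it.
    """
    prefix = new_branch + "/"
    absorbed = sorted(b for b in branches if b.startswith(prefix))
    remaining = (set(branches) - set(absorbed)) | {new_branch}
    return remaining, absorbed
```

For the scratchpad example, absorbing into "scratchpad" would retire
"scratchpad/libfoo" and "scratchpad/libbar" and leave any unrelated
branch untouched.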

> The relevant technical difference is that in the naive implementation
> of scheme (2) you can make use of arbitrary information available over
> svn protocol, in naive scheme (3) you can only use information that
> makes it into the fast-import stream, and in naive scheme (1) you can
> only use information that makes it into the actual git repository.  So
> to use scheme (1) you need to make sure svn-fe stores all interesting
> data in a visible way, including copyfrom info (which is not a bad
> idea anyway).

The approach I'm looking at is to extract information from an SVN dump
at an early stage, then use the extracted information when the user
tidies up the SBL file.  This was originally a simple optimisation
(reading a small gzipped JSON file is much faster than reading an SVN
dump that's 99% file bodies you don't care about) but it wouldn't be too
hard to teach svn-fe how to produce the file if you were so inclined.

	- Andrew

[1] http://article.gmane.org/gmane.comp.version-control.git/192286


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-03 21:53                   ` Andrew Sayers
@ 2012-04-03 22:21                     ` Jonathan Nieder
  0 siblings, 0 replies; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-03 22:21 UTC (permalink / raw)
  To: Andrew Sayers
  Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr,
	Git Mailing List, Sverre Rabbelier, Dmitry Ivankov

Andrew Sayers wrote:

> This is a pretty solid heuristic for detecting branches copied from an
> existing branch even in scheme (2) or (3), but does absolutely nothing
> for trunk detection.  Although trunk detection is trivial in the sane
> case (the "trunk" directory is the one and only trunk, end of story),
> here's a contrived example for why it's hard in the general case:

For the remote helper in its default configuration, I think it's ok to
assume the standard layout (trunk/, branches/*, tags/*).

Thanks for some useful examples.

Sincerely,
Jonathan


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02  8:30         ` GSOC Proposal draft: git-remote-svn Florian Achleitner
                             ` (2 preceding siblings ...)
  2012-04-02 22:17           ` Andrew Sayers
@ 2012-04-05 13:36           ` Florian Achleitner
  2012-04-05 15:47             ` Dmitry Ivankov
                               ` (2 more replies)
  3 siblings, 3 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-04-05 13:36 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Ramkumar Ramachandra, David Barr, Andrew Sayers, Jonathan Nieder,
	Sverre Rabbelier, Dmitry Ivankov

Hi everybody!

Thanks for your inputs. I've now submitted a slightly updated version of my 
proposal to google. Additionally it's on github [1].

Summary of diffs:
I'll concentrate on the fetching from svn, writing a remote helper without 
branch detection (like svn-fe) first, and then creating the branch mapper.

[1] https://github.com/flyingflo/git/wiki/

-- Florian

On Monday 02 April 2012 10:30:58 Florian Achleitner wrote:

> 
> ==Remote helper for Subversion==
> 


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-05 13:36           ` Florian Achleitner
@ 2012-04-05 15:47             ` Dmitry Ivankov
  2012-04-09 18:59             ` Stephen Bash
  2012-04-10 17:17             ` Jonathan Nieder
  2 siblings, 0 replies; 46+ messages in thread
From: Dmitry Ivankov @ 2012-04-05 15:47 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Git Mailing List, Ramkumar Ramachandra, David Barr,
	Andrew Sayers, Jonathan Nieder, Sverre Rabbelier

Hi!

On Thu, Apr 5, 2012 at 7:36 PM, Florian Achleitner
<florian.achleitner2.6.31@gmail.com> wrote:
> Hi everybody!
>
> Thanks for your inputs. I've now submitted a slightly updated version of my
> proposal to google. Additionally it's on github [1].
>
> Summary of diffs:
> I'll concentrate on the fetching from svn, writing a remote helper without
> branch detection (like svn-fe) first, and then creating the branch mapper.
I think the general goal should include the possibility of cloning such
an "svn://" clone (not necessarily with "clone" itself; a special
easy-to-use command/script is fine too) so that the new clone is able to
fetch and push too. This is a new feature compared to git-svn.perl and
allows sharing the svn->git conversion result. Not completely trivial,
but cool to have. At least I recommend keeping it in mind during the
design phase(s).

Though it is not a must as there are many many other cool things to
implement in git-svn area :)

>
> [1] https://github.com/flyingflo/git/wiki/
>
> -- Florian
>
> On Monday 02 April 2012 10:30:58 Florian Achleitner wrote:
>
>>
>> ==Remote helper for Subversion==
>>
>
>


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-02 20:57           ` Jonathan Nieder
  2012-04-02 23:04             ` Jonathan Nieder
  2012-04-03  7:49             ` Florian Achleitner
@ 2012-04-05 16:18             ` Tomas Carnecky
  2 siblings, 0 replies; 46+ messages in thread
From: Tomas Carnecky @ 2012-04-05 16:18 UTC (permalink / raw)
  To: Jonathan Nieder, Florian Achleitner
  Cc: Ramkumar Ramachandra, David Barr, Git Mailing List,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

On Mon, 02 Apr 2012 15:57:00 -0500, Jonathan Nieder <jrnieder@gmail.com> wrote:
>  - UI for storing the mapping between Subversion revision numbers and
>    git commit names in the git object db somewhere.  Currently we

I wrote a proof-of-concept importer which stored this mapping in notes. Worked
fairly well. Maybe I can dig up the code again.

tom


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-05 13:36           ` Florian Achleitner
  2012-04-05 15:47             ` Dmitry Ivankov
@ 2012-04-09 18:59             ` Stephen Bash
  2012-04-10 17:17             ` Jonathan Nieder
  2 siblings, 0 replies; 46+ messages in thread
From: Stephen Bash @ 2012-04-09 18:59 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Ramkumar Ramachandra, David Barr, Andrew Sayers, Jonathan Nieder,
	Sverre Rabbelier, Dmitry Ivankov, Git Mailing List

----- Original Message -----
> From: "Florian Achleitner" <florian.achleitner2.6.31@gmail.com>
> Sent: Thursday, April 5, 2012 9:36:40 AM
> Subject: Re: GSOC Proposal draft: git-remote-svn
> 
> Thanks for your inputs. I've now submitted a slightly updated version
> of my proposal to google. Additionally it's on github [1].
> 
> Summary of diffs:
> I'll concentrate on the fetching from svn, writing a remote helper
> without branch detection (like svn-fe) first, and then creating the
> branch mapper.
> 
> [1] https://github.com/flyingflo/git/wiki/

Florian - I just skimmed the github page since I've been away for a week.  Not to toot my own horn too much, but there's a lot of good discussion about svn-isms in my thread from 2010 (starts at [1], but most of the good stuff is the discussion that follows).  I didn't see it in the references, and it probably doesn't need to be there, but if you haven't seen it yet, take a look at it (and cringe at my horrible abuse of git in my early days... ugh!).

[1] http://article.gmane.org/gmane.comp.version-control.git/159054

Thanks,
Stephen


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-05 13:36           ` Florian Achleitner
  2012-04-05 15:47             ` Dmitry Ivankov
  2012-04-09 18:59             ` Stephen Bash
@ 2012-04-10 17:17             ` Jonathan Nieder
  2012-04-10 22:30               ` Andrew Sayers
                                 ` (4 more replies)
  2 siblings, 5 replies; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-10 17:17 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Git Mailing List, Ramkumar Ramachandra, David Barr,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Hi,

Florian Achleitner wrote:

> Thanks for your inputs. I've now submitted a slightly updated version of my 
> proposal to google. Additionally it's on github [1].
>
> Summary of diffs:
> I'll concentrate on the fetching from svn, writing a remote helper without 
> branch detection (like svn-fe) first, and then creating the branch mapper.

Thanks for the update.

If I understand correctly, the remote helper from the first half would
do essentially the same thing as Dmitry's remote-svn-alpha script.
Since in shell script form it is very simple, I don't think it should
take more than a couple of days to write such a thing in C.

> Timeline
>
> GSoC timeline and summer holidays
> Summer holidays in Austria start on the 9th of July. So until the mid-term
> evaluations my git project will have to co-exist with my regular
> university work and projects. But holidays extend until the beginning
> of October, so there’s some time left to catch up after the official
> end of GSoC.

Another possibility that some people in similar situations have
followed is to start early.  That works a little better since it means
that by the time midterm evaluations come around we can have a
reasonable idea of whether a change in strategy is needed for the
project to be finished on time.

> I plan to split the project in two parts:
>
> Writing the remote helper using existing functions in vcs-svn to
> import svn history without detecting branches, like svn-fe does.
> Milestone: 9th of July, GSoC mid-term
>
> Writing a branch mapper for the remote helper that reads the config
> language (SBL) and imports branches trying to deal as well as possible
> with all the little pitfalls that will occur. Milestone: 20th of
> August, GSoC end

Could you flesh out this timeline more?  Ideally it would be nice to
have a definite plan here, even to the point of listing what patches
would need to be written, so during the summer all that would need to
happen is to execute and deal with bugs as they come.

Given the goal described here of an import with support for
automatically detecting branches, here are some rough steps I imagine
would be involved:

 . baseline: remote helper in C

 . option to import starting with a particular numbered revision.
   This would be good practice for seeing how options passed to
   "git clone -c" can be read from the config file.

 . option or URL schema to import a single project from a large
   Subversion repository that houses several projects.  This would
   already be useful in practice since importing the entire Apache
   Software Foundation repository takes a while which is a waste
   when one only wants the history of the Subversion project.

   How should the importer handle Subversion copy commands that
   refer to other projects in this case?

 . automatically detecting trunk when importing a project with the
   standard layout.  The trunk usually is not branched from elsewhere
   so this does not require copyfrom info.  Some design questions
   come up here: should the remote helper import the entire project
   tree, too?  (I think "yes", since copy commands that copy from
   other branches are very common and that would ensure the relevant
   info is available to git.)  What should the mapping of git commit
   names to Subversion revision numbers that is stored in notes say
   in this case?

 . detecting trunk and branches and exposing them as different remote
   branches.  This is a small step that just involves understanding
   how remote helpers expose branches.

 . storing path properties and copyfrom information in the commits
   produced by the vcs-svn/ library.  How should these be stored?
   For example, there could be a parallel directory structure
   in the tree:

	foo/
		bar.c
	baz/
		qux.c
	.properties/
		foo.properties
		foo/
			bar.c.properties
		baz/
			qux.c.properties

   with properties for <path> stored at .properties/<path>.properties.
   This strawman scheme doesn't work if the repository being imported
   has any paths ending with ".properties", though.  Ideas?

 . tracing history past branch creation events, using the now-saved
   copyfrom information.

 . tracing second-parent history using svn:mergeinfo properties.
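
To make the weakness of the .properties/ strawman in the list above
concrete, here is a small Python sketch. The helper names
properties_path and find_collisions are made up for illustration, and the
only rule assumed is the one stated: properties for <path> live at
.properties/<path>.properties.

```python
import posixpath

PROPS_ROOT = ".properties"


def properties_path(path):
    """Map a repository path to the file holding its properties."""
    return posixpath.join(PROPS_ROOT, path.strip("/") + ".properties")


def find_collisions(paths):
    """Return (a, b) pairs where the property file of a would also have
    to be a directory in order to hold the property file of b."""
    prop_files = {properties_path(p): p for p in paths}
    collisions = []
    for prop, owner in prop_files.items():
        parent = posixpath.dirname(prop)
        # Walk up towards .properties/ looking for a name already in use.
        while parent and parent != PROPS_ROOT:
            if parent in prop_files:
                collisions.append((prop_files[parent], owner))
            parent = posixpath.dirname(parent)
    return sorted(collisions)
```

The layout shown in the list has no collisions, but adding a directory
named "foo.properties" next to "foo" produces one: both want the name
.properties/foo.properties, once as a file and once as a directory.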

In other words, in the above list the strategy is:

 1. First convert the remote helper to C so it doesn't have to be
    translated again later.

 2. Teach the remote helper to import a single project from a
    repository that houses multiple projects (i.e., path limiting).

 3. Teach the remote helper to split an imported project that uses
    the standard layout into branches (an application of the code
    from (2)).  This complicates the scheme for mapping between
    Subversion revision numbers and git commit ids.

 4. Teach the SVN dumpfile to fast-import stream converter not to
    lose the information that is needed in order to get parenthood
    information.

 5. Use the information from step (4) to get parenthood right for a
    project split into branches.

 6. Getting the second parent right (i.e., merges).  I mentioned
    this for fun but I don't expect there to be time for it.

Does that seem right, or does it need tweaks?  How long would each
step take?  Can the steps be subdivided into smaller steps?

Another question is: what is the design for this?  With the existing
remote-svn-alpha script, there are a few different components with
well defined interfaces:

	commands like "git fetch"
	  |
	  | (1)
	  |
	transport-helper --- (2) --- git fast-import
	  |                               |
	  | (2, 3)                        |
	  |                               |
	remote-svn-alpha                  | (3)
	  |             ''..              |
	  | (2)             ''(2)..       |
	  |                        ''..   |
	svnrdump --------- (3) -------- svn-fe

 (1) communicates using function calls and shared data
 (2) launches
 (3) communicates over pipe

Once remote-svn-alpha is rewritten in C, the same structure is still
present, though it might be less obvious because some of the (2)
and (3) can change into (1).
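
For orientation, the pipe between transport-helper and the remote helper
carries a line-based protocol documented in gitremote-helpers(1): git
writes a command, the helper answers with lines terminated by a blank
line. The Python sketch below covers only the "capabilities" and "list"
commands; the function name respond and the ref list are illustrative
assumptions, and generating the fast-import stream for "import" is left
out entirely.

```python
def respond(command, refs):
    """Return the reply lines for one remote-helper command.

    refs is the list of ref names the helper wants to advertise.
    """
    if command == "capabilities":
        # Advertise that refs are fetched via a fast-import stream.
        return ["import", ""]
    if command == "list":
        # "?" means the helper does not know the object name yet.
        return ["? " + ref for ref in refs] + [""]
    raise ValueError("unsupported command: " + command)
```

A helper built this way would print each reply line to stdout and flush
after the terminating blank line.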

Where does the functionality you are adding fit into this picture?
Are there any new components being added, and if so what do they take
as input and output?

Hope that helps,
Jonathan

> [1] https://github.com/flyingflo/git/wiki/


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-10 17:17             ` Jonathan Nieder
@ 2012-04-10 22:30               ` Andrew Sayers
  2012-04-10 23:46                 ` Jonathan Nieder
  2012-04-11 19:09                 ` Florian Achleitner
  2012-04-11 15:51               ` Jakub Narebski
                                 ` (3 subsequent siblings)
  4 siblings, 2 replies; 46+ messages in thread
From: Andrew Sayers @ 2012-04-10 22:30 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Sverre Rabbelier, Dmitry Ivankov

On 10/04/12 18:17, Jonathan Nieder wrote:
<snip>
> Given the goal described here of an import with support for
> automatically detecting branches, here are some rough steps I imagine
> would be involved:

Just to be clear, my understanding is that this project will take SBL
created by another program (that I'm writing) and create branches as
specified.  This frees Florian from having to deal with the maze of edge
cases involved in that part of the problem.

> 
>  . baseline: remote helper in C
> 
>  . option to import starting with a particular numbered revision.
>    This would be good practice for seeing how options passed to
>    "git clone -c" can be read from the config file.
> 
>  . option or URL schema to import a single project from a large
>    Subversion repository that houses several projects.  This would
>    already be useful in practice since importing the entire Apache
>    Software Foundation repository takes a while which is a waste
>    when one only wants the history of the Subversion project.
> 
>    How should the importer handle Subversion copy commands that
>    refer to other projects in this case?

This is a good point.  I've just tried svnadmin and svnrdump, and it turns out
svnadmin doesn't allow you to dump a subtree while svnrdump strips out
the offending copy commands, so either way there's nothing to be done.

>  . automatically detecting trunk when importing a project with the
>    standard layout.  The trunk usually is not branched from elsewhere
>    so this does not require copyfrom info.  Some design questions
>    come up here: should the remote helper import the entire project
>    tree, too?  (I think "yes", since copy commands that copy from
>    other branches are very common and that would ensure the relevant
>    info is available to git.)  What should the mapping of git commit
>    names to Subversion revision numbers that is stored in notes say
>    in this case?
> 
>  . detecting trunk and branches and exposing them as different remote
>    branches.  This is a small step that just involves understanding
>    how remote helpers expose branches.

After last week's discussion about branch absorption, I tried writing
another algorithm over the weekend.  I plan to test it during the week,
but online detection of branches and trunks looks fairly practical in
most real world cases (even those that are sanity-challenged).

>  . storing path properties and copyfrom information in the commits
>    produced by the vcs-svn/ library.  How should these be stored?
>    For example, there could be a parallel directory structure
>    in the tree:

Yes, this is an important problem.  It became apparent over the weekend
that my code was I/O bound, so I started caching the metadata I need
(without e.g. file contents) in a gzipped file containing a list of JSON
blobs (one blob per revision).  That immediately caused the script to
jump from about a hundred revisions/second to a few thousand(!), and
each further size optimisation caused it to jump by another few thousand
per second.
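
A rough sketch of such a cache in Python, assuming one compact JSON
object per line inside a gzip stream. The function names and the fields
of each revision record are hypothetical; the actual script's format is
not shown in this thread.

```python
import gzip
import json


def write_cache(path, revisions):
    """Write one compact JSON object per line, gzip-compressed."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for rev in revisions:
            out.write(json.dumps(rev, separators=(",", ":")) + "\n")


def read_cache(path):
    """Yield revisions one at a time without loading the whole file."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            yield json.loads(line)
```

Keeping one record per line means a later pass can stream through the
revisions in order and stop early if it only needs a prefix.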

This sort of speed is useful for the initial SVN->git conversion,
because it means even people with very large repositories can have a
quick edit/compile/test loop when they're looking for mis-detected branches.

Having said all that, a git directory is easier to examine and update
than a gzipped file.  I have no idea what the performance would be like,
but even if a directory was slower we could use gzipped JSON as a cache
layer during the initial import, then throw it away and read straight
from a git directory on update.

>  . tracing history past branch creation events, using the now-saved
>    copyfrom information.

I'm not sure if I understand correctly, but I think you're referring to
this edge case:

mkdir tronk brunches
svn add tronk brunches
svn ci -m "Initial commit, with typos to evade stdlayout detection"

mkdir tronk/libfoo
touch tronk/libfoo/main.c
svn add tronk/libfoo
svn ci -m "Created libfoo - no way to know this isn't a branch"

svn up # so the 'svn cp' works correctly below

svn cp tronk brunches/copy_of_tronk
touch brunches/copy_of_tronk/main.c
svn add brunches/copy_of_tronk/main.c
svn ci -m "Marking the copy as a branch, but what about the original?"


I'm not actually sure what the right behaviour is here.  You could argue
that once we know "copy_of_tronk" is a branch, it follows that "tronk"
itself is a branch.  On the other hand, these directories have diverged,
and who's to say it wasn't because of a disagreement about which
directory was the branch?

Branch absorption makes this problem less important - the "tronk/libfoo"
branch will be deleted and merged into the new "tronk" branch the moment
someone creates "tronk/main.c", which tends to happen pretty quickly in
the real world.

I'm open to suggestions, but my instinct right now is to say that
communicating branchiness back through a copyfrom should at least
require confirmation by the user.

>  . tracing second-parent history using svn:mergeinfo properties.

My old POC code did this, and I plan to include it in the work I'm doing
now.  I expect this to be the hardest single part of the project to
solve in the general case, because of SVN's troubled approach to merge
handling.

<snip>
> Another question is: what is the design for this?

Here's my part of the equation:

Right now I have a script that first takes an SVN dump and produces
gzipped JSON as output, then takes the gzipped JSON as input and
produces an SBL file as output.  The first round will generally only
need to be run once (and is comparable to svn-fe in speed), whereas the
second round might need to be run an arbitrary number of times (but is
very fast).

Incidentally, the initial cache generation is the only part that's still
tied to the SVN dump format, and I doubt it would be that hard for
someone to rewrite it inside svn-fe or to make it read from git metadata
in future.

I'm currently focussing on bringing all the modules up to release
quality, so that I can have something for Florian to play with in the
near future.  This should have an interface that is mature but flexible,
so I can change the interface to make his life easier but won't need to
change the interface because I missed something.  After that, I'll
concentrate on improving the quality of the SBL output.

	- Andrew


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-10 22:30               ` Andrew Sayers
@ 2012-04-10 23:46                 ` Jonathan Nieder
  2012-04-11 19:09                 ` Florian Achleitner
  1 sibling, 0 replies; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-10 23:46 UTC (permalink / raw)
  To: Andrew Sayers
  Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Sverre Rabbelier, Dmitry Ivankov

Andrew Sayers wrote:

> Just to be clear, my understanding is that this project will take SBL
> created by another program (that I'm writing) and create branches as
> specified.

If that seems like the right thing to do for the people involved
(Florian and mentor, list consensus) and if that's easy.  I'm happy as
long as the default configuration works well with sane repositories.

[...]
> On 10/04/12 18:17, Jonathan Nieder wrote:

>>    How should the importer handle Subversion copy commands that
>>    refer to other projects in this case?
>
> This is a good point.  I've just tried svnadmin and svnrdump, and it turns out
> svnadmin doesn't allow you to dump a subtree while svnrdump strips out
> the offending copy commands, so either way there's nothing to be done.

From a quick test, it looks like svnrdump converts a directory copy
into the addition of its contents.  Good.

svndumpfilter produces

	svndumpfilter: E200003: Invalid copy source path '/branches/foo/subdir'

and exits with status 1 so it seems like we're ok.

[...]
>>  . tracing history past branch creation events, using the now-saved
>>    copyfrom information.
>
> I'm not sure if I understand correctly, but I think you're referring to
> this edge case:

Nope, I'm talking about the most typical and boring case there is:

	svn cp <repo>/trunk <repo>/branches/topic

When cloning <repo>, it seems reasonable to expect that the ancestry
of the trunk and branch would not be shown as disjoint linear histories,
but that the revision in which the branch was introduced would be
shown as a child of the previous revision of the trunk, like so:


	              o --- o --- o [topic]
	             /
	o --- o --- o --- o --- o --- o [trunk]

This requires paying attention to copyfrom information.
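
As a sketch of what "paying attention to copyfrom information" could
mean in code, the Python function below picks the parent commit for the
first commit on a copied branch. The name find_parent and the shape of
the history table are assumptions made for illustration; nothing here
reflects an existing vcs-svn interface.

```python
import bisect


def find_parent(history, copyfrom_path, copyfrom_rev):
    """Return the commit a copied branch should use as its parent.

    history maps a branch path to a list of (svn_rev, commit_id) pairs
    sorted by svn_rev.  The parent is the newest commit on the source
    branch whose revision is <= copyfrom_rev, or None if there is none.
    """
    commits = history.get(copyfrom_path, [])
    revs = [rev for rev, _ in commits]
    i = bisect.bisect_right(revs, copyfrom_rev)
    return commits[i - 1][1] if i else None
```

The lookup uses "<=" because the source branch need not have changed in
the exact revision named by copyfrom.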

[...]
> Right now I have a script that first takes an SVN dump and produces
> gzipped JSON as output, then takes the gzipped JSON as input and
> produces an SBL file as output.  The first round will generally only
> need to be run once (and is comparable to svn-fe in speed), whereas the
> second round might need to be run an arbitrary number of times (but is
> very fast).

For what it's worth, for importing from repositories that use a
nonstandard layout I do think this "start with a quick pass to figure
the layout out" approach is a sane one.

[...]
> I'm currently focussing on bringing all the modules up to release
> quality, so that I can have something for Florian to play with in the
> near future.  This should have an interface that is mature but flexible,
> so I can change the interface to make his life easier but won't need to
> change the interface because I missed something.  After that, I'll
> concentrate on improving the quality of the SBL output.

Neat.

Thanks for some useful clarifications.
Jonathan


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-10 17:17             ` Jonathan Nieder
  2012-04-10 22:30               ` Andrew Sayers
@ 2012-04-11 15:51               ` Jakub Narebski
  2012-04-11 15:56                 ` Jonathan Nieder
  2012-04-11 19:20               ` Florian Achleitner
                                 ` (2 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Jakub Narebski @ 2012-04-11 15:51 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Jonathan Nieder <jrnieder@gmail.com> writes:

>  2. Teach the remote helper to import a single project from a
>     repository that houses multiple projects (i.e., path limiting).
> 
>  3. Teach the remote helper to split an imported project that uses
>     the standard layout into branches (an application of the code
>     from (2)).  This complicates the scheme for mapping between
>     Subversion revision numbers and git commit ids.

Can't we use either the peg rev notation of externals, or the notation
that Subversion itself uses for svn:mergeinfo?

-- 
Jakub Narebski


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-11 15:51               ` Jakub Narebski
@ 2012-04-11 15:56                 ` Jonathan Nieder
  0 siblings, 0 replies; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-11 15:56 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Jakub Narebski wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>>  2. Teach the remote helper to import a single project from a
>>     repository that houses multiple projects (i.e., path limiting).
>>
>>  3. Teach the remote helper to split an imported project that uses
>>     the standard layout into branches (an application of the code
>>     from (2)).  This complicates the scheme for mapping between
>>     Subversion revision numbers and git commit ids.
>
> Can't we use either the peg rev notation of externals, or the notation
> that Subversion itself uses for svn:mergeinfo?

Maybe. ;-)  Could you give an example?  Where would the text in this
notation be stored in the git repository?  How are lookups performed?


* Re: GSOC Proposal draft: git-remote-svn
  2012-04-10 22:30               ` Andrew Sayers
  2012-04-10 23:46                 ` Jonathan Nieder
@ 2012-04-11 19:09                 ` Florian Achleitner
  2012-04-14 22:57                   ` Andrew Sayers
  1 sibling, 1 reply; 46+ messages in thread
From: Florian Achleitner @ 2012-04-11 19:09 UTC (permalink / raw)
  To: Andrew Sayers
  Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Sverre Rabbelier, Dmitry Ivankov

On Tuesday 10 April 2012 23:30:21 you wrote:
> On 10/04/12 18:17, Jonathan Nieder wrote:
> <snip>
> 
> > Given the goal described here of an import with support for
> > automatically detecting branches, here are some rough steps I imagine
> 
> > would be involved:
> Just to be clear, my understanding is that this project will take SBL
> created by another program (that I'm writing) and create branches as
> specified.  This frees Florian from having to deal with the maze of edge
> cases involved in that part of the problem.

Furthermore the remote-helper has no way of asking the user something, right?
So it can only fail if something is ambiguous in the svn repository layout. So 
I thought the SBL is exactly to describe these cases, and that's what I need.

> [..]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-10 17:17             ` Jonathan Nieder
  2012-04-10 22:30               ` Andrew Sayers
  2012-04-11 15:51               ` Jakub Narebski
@ 2012-04-11 19:20               ` Florian Achleitner
  2012-04-11 19:44                 ` Dmitry Ivankov
  2012-04-11 19:53                 ` Jonathan Nieder
  2012-04-12 15:28               ` Florian Achleitner
  2012-04-18 20:16               ` Florian Achleitner
  4 siblings, 2 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-04-11 19:20 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, Ramkumar Ramachandra, David Barr,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:
> Hi,
> 
> Florian Achleitner wrote:
> > Thanks for your inputs. I've now submitted a slightly updated version of
> > my proposal to google. Additionally it's on github [1].
> > 
> > Summary of diffs:
> > I'll concentrate on the fetching from svn, writing a remote helper
> > without branch detection (like svn-fe) first, and then creating the
> > branch mapper.
> Thanks for the update.
> 
> If I understand correctly, the remote helper from the first half would
> do essentially the same thing as Dmitry's remote-svn-alpha script.
> Since in shell script form it is very simple, I don't think it should
> take more than a couple of days to write such a thing in C.

If the remote-svn-alpha script is really all that needs to be done, you're 
right. It just pipes through svn-fe. I thought svn-fe could only import an svn 
repo initially, and there would be some difference between importing the whole 
history and fetching new revisions later, (?).

> Via
> > Timeline
> > 
> > GSoC timeline and summer holidays
> > Summer holidays in Austria start on the 9th of July. So until the mid-term
> > evaluations my git project will have to co-exist with my regular
> > university work and projects. But holidays extend until the beginning
> > of October, so there’s some time left to catch up after the official
> > end of GSoC.
> 
> Another possibility that some people in similar situations have
> followed is to start early.  That works a little better since it means
> that by the time midterm evaluations come around we can have a
> reasonable idea of whether a change in strategy is needed for the
> project to be finished on time.
> 
> > I plan to split the project in two parts:
> > 
> > Writing the remote helper using existing functions in vcs-svn to
> > import svn history without detecting branches, like svn-fe does.
> > Milestone: 9th of July, GSoC mid-term
> > 
> > Writing a branch mapper for the remote helper that reads the config
> > language (SBL) and imports branches trying to deal as good as possible
> > with all the little pitfalls that will occur. Milestone: 20th of
> > August, GSoC end
> 
> Could you flesh out this timeline more?  Ideally it would be nice to
> have a definite plan here, even to the point of listing what patches
> would need to be written, so during the summer all that would need to
> happen is to execute and deal with bugs as they come.

Listing patches and planning all details in the submitted proposal would 
require me to know what I do and how I will do it all before last Friday! As 
I'm not yet an expert on this topic, I don't know how I could have known all 
details a-priori.
Of course the project's documentation will evolve outside the GSoC project 
proposal, which cannot be changed anymore.

> 
> Given the goal described here of an import with support for
> automatically detecting branches, here are some rough steps I imagine
> would be involved:
> 
>  . baseline: remote helper in C
> 
>  . option to import starting with a particular numbered revision.
>    This would be good practice for seeing how options passed to
>    "git clone -c" can be read from the config file.
> 
>  . option or URL schema to import a single project from a large
>    Subversion repository that houses several projects.  This would
>    already be useful in practice since importing the entire Apache
>    Software Foundation repository takes a while which is a waste
>    when one only wants the history of the Subversion project.
> 
>    How should the importer handle Subversion copy commands that
>    refer to other projects in this case?
> 
>  . automatically detecting trunk when importing a project with the
>    standard layout.  The trunk usually is not branched from elsewhere
>    so this does not require copyfrom info.  Some design questions
>    come up here: should the remote helper import the entire project
>    tree, too?  (I think "yes", since copy commands that copy from
>    other branches are very common and that would ensure the relevant
>    info is available to git.)  What should the mapping of git commit
>    names to Subversion revision numbers that is stored in notes say
>    in this case?
> 
>  . detecting trunk and branches and exposing them as different remote
>    branches.  This is a small step that just involves understanding
>    how remote helpers expose branches.
> 
>  . storing path properties and copyfrom information in the commits
>    produced by the vcs-svn/ library.  How should these be stored?
>    For example, there could be a parallel directory structure
>    in the tree:
> 
> 	foo/
> 		bar.c
> 	baz/
> 		qux.c
> 	.properties/
> 		foo.properties
> 		foo/
> 			bar.c.properties
> 		baz/
> 			qux.c.properties
> 
>    with properties for <path> stored at .properties/<path>.properties.
>    This strawman scheme doesn't work if the repository being imported
>    has any paths ending with ".properties", though.  Ideas?
> 
>  . tracing history past branch creation events, using the now-saved
>    copyfrom information.
> 
>  . tracing second-parent history using svn:mergeinfo properties.
> 
> In other words, in the above list the strategy is:
> 
>  1. First convert the remote helper to C so it doesn't have to be
>     translated again later.
> 
>  2. Teach the remote helper to import a single project from a
>     repository that houses multiple projects (i.e., path limiting).
> 
>  3. Teach the remote helper to split an imported project that uses
>     the standard layout into branches (an application of the code
>     from (2)).  This complicates the scheme for mapping between
>     Subversion revision numbers and git commit ids.
> 
>  4. Teach the SVN dumpfile to fast-import stream converter not to
>     lose the information that is needed in order to get parenthood
>     information.
> 
>  5. Use the information from step (4) to get parenthood right for a
>     project split into branches.
> 
>  6. Getting the second parent right (i.e., merges).  I mentioned
>     this for fun but I don't expect there to be time for it.
> 
> Does that seem right, or does it need tweaks?  How long would each
> step take?  Can the steps be subdivided into smaller steps?
> 
> Another question is: what is the design for this?  With the existing
> remote-svn-alpha script, there are a few different components with
> well defined interfaces:
> 
> 	commands like "git fetch"
> 
> 	  | (1)
> 
> 	transport-helper --- (2) --- git fast-import
> 
> 	  | (2, 3)                        |
> 
> 	remote-svn-alpha                  | (3)
> 
> 	  |             ''..              |
> 	  | 
> 	  | (2)             ''(2)..       |
> 	  | 
> 	  |                        ''..   |
> 
> 	svnrdump --------- (3) -------- svn-fe
> 
>  (1) communicates using function calls and shared data
>  (2) launches
>  (3) communicates over pipe
> 
> Once remote-svn-alpha is rewritten in C, the same structure is still
> present, though it might be less obvious because some of the (2)
> and (3) can change into (1).
> 
> Where does the functionality you are adding fit into this picture?
> Are there any new components being added, and if so what do they take
> as input and output?

I planned to implement a remote-helper using the existing interface 
specification to communicate over pipes with git's transport-helper. 
Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions 
directly from the remote-helper and place new functions in this directory (?).
To communicate with svn, the remote-helper launches svnrdump as a subprocess.
Additionally the remote-helper will read a configuration file containing 
additional information about branch-mapping, this should be closely related to 
Andrew's SBL.
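
To make the pipe protocol part concrete, here is a minimal sketch of the
command loop such a helper would run, written in Python only for brevity
(the real helper would be C). It follows my reading of the
gitremote-helpers documentation: "capabilities" and "list" are answered
with a block ending in a blank line, and a batch of "import" lines is
answered with one fast-import stream ending in "done". The function name
serve(), the default ref and the placeholder stream are assumptions made
for illustration, nothing that exists in git.

```python
import io

def serve(commands, out, refs=("refs/heads/master",)):
    """Answer remote-helper commands read from `commands`, writing to `out`."""
    pending = []  # refs named by "import" lines in the current batch
    for line in commands:
        cmd = line.rstrip("\n")
        if cmd == "capabilities":
            # Advertise only the "import" capability, then a blank line.
            out.write("import\n\n")
        elif cmd == "list":
            # "?" means the helper does not know the ref's value up front.
            for ref in refs:
                out.write("? %s\n" % ref)
            out.write("\n")
        elif cmd.startswith("import "):
            pending.append(cmd[len("import "):])
        elif cmd == "":
            if not pending:
                break  # blank line outside a batch: git is finished with us
            # Placeholder: a real helper would emit commits converted from
            # the svn dump here instead of a comment line.
            for ref in pending:
                out.write("# placeholder: commits for %s\n" % ref)
            out.write("done\n")
            pending = []

if __name__ == "__main__":
    reply = io.StringIO()
    serve(io.StringIO("capabilities\nlist\n\n"), reply)
    print(reply.getvalue(), end="")
```

Keeping the loop free of any svn knowledge means the vcs-svn calls only
have to replace the placeholder branch.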

> 
> Hope that helps,
> Jonathan
> 
> > [1] https://github.com/flyingflo/git/wiki/

Florian

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-11 19:20               ` Florian Achleitner
@ 2012-04-11 19:44                 ` Dmitry Ivankov
  2012-04-11 19:53                 ` Jonathan Nieder
  1 sibling, 0 replies; 46+ messages in thread
From: Dmitry Ivankov @ 2012-04-11 19:44 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Andrew Sayers, Sverre Rabbelier

On Thu, Apr 12, 2012 at 1:20 AM, Florian Achleitner
<florian.achleitner2.6.31@gmail.com> wrote:
> On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:
>> Hi,
>>
>> Florian Achleitner wrote:
>> > Thanks for your inputs. I've now submitted a slightly updated version of
>> > my proposal to google. Additionally it's on github [1].
>> >
>> > Summary of diffs:
>> > I'll concentrate on the fetching from svn, writing a remote helper
>> > without branch detection (like svn-fe) first, and then creating the
>> > branch mapper.
>> Thanks for the update.
>>
>> If I understand correctly, the remote helper from the first half would
>> do essentially the same thing as Dmitry's remote-svn-alpha script.
>> Since in shell script form it is very simple, I don't think it should
>> take more than a couple of days to write such a thing in C.
>
> If the remote-svn-alpha script is really all that needs to be done, you're
> right. It just pipes through svn-fe. I thought svn-fe could only import an svn
> repo initially, and there would be some difference between importing the whole
> history and fetching new revisions later, (?).
I've already forgotten the exact details, but svnrdump --incremental
from r0 to rX and then from rX+1 to Y is the same (modulo small dump
header) as from r0 to rY. And svn-fe is able to continue like this too
(maybe some bits of this are not merged, sadly I've forgotten this
too).

A side note is that svnrdump can't do the same trick for rZ, Z>1
(that's shallow clone) starting point as --incremental may produce
delta references to rX, X<Z.
So svnrdump rZ..rY is ok, but it's impossible to continue this with
svnrdump --incremental rY+1..rX. Though it probably is not too hard to
fix from inside svnrdump (disable deltas against a given "old"-threshold
revs) or if the helper becomes very smart about partial history import
it may be done from outside svnrdump, obviously via calling a new
svnrdump request for the needed data and somehow glueing it together.
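
The r0..rX then rX+1..rY continuation can be sketched like this (Python
for brevity). The helper names next_range() and svnrdump_args() are made
up for illustration, and using -1 to mean "nothing imported yet" is my
own convention; only the "svnrdump dump -r A:B --incremental" invocation
is taken from svnrdump itself.

```python
def next_range(last_imported, head):
    """Return (first, last) revisions still to fetch, or None if up to date.

    last_imported is -1 when nothing has been imported yet, so the first
    fetch starts at r0.
    """
    if last_imported >= head:
        return None
    return (last_imported + 1, head)

def svnrdump_args(url, rev_range):
    """Build the argument vector for one incremental dump request."""
    first, last = rev_range
    return ["svnrdump", "dump", url,
            "-r", "%d:%d" % (first, last), "--incremental"]

if __name__ == "__main__":
    # First fetch covers r0..r10, the follow-up fetch covers r11..r25.
    print(svnrdump_args("file:///tmp/repo", next_range(-1, 10)))
    print(svnrdump_args("file:///tmp/repo", next_range(10, 25)))
```

This only covers the case where history starts at r0; the shallow
starting point rZ described above would need the extra handling for
delta references.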

>
>> Via
>> > Timeline
>> >
>> > GSoC timeline and summer holidays
>> > Summer holidays in Austria start on the 9th of July. So until the mid-term
>> > evaluations my git project will have to co-exist with my regular
>> > university work and projects. But holidays extend until the beginning
>> > of October, so there’s some time left to catch up after the official
>> > end of GSoC.
>>
>> Another possibility that some people in similar situations have
>> followed is to start early.  That works a little better since it means
>> that by the time midterm evaluations come around we can have a
>> reasonable idea of whether a change in strategy is needed for the
>> project to be finished on time.
>>
>> > I plan to split the project in two parts:
>> >
>> > Writing the remote helper using existing functions in vcs-svn to
>> > import svn history without detecting branches, like svn-fe does.
>> > Milestone: 9th of July, GSoC mid-term
>> >
>> > Writing a branch mapper for the remote helper that reads the config
>> > language (SBL) and imports branches trying to deal as good as possible
>> > with all the little pitfalls that will occur. Milestone: 20th of
>> > August, GSoC end
>>
>> Could you flesh out this timeline more?  Ideally it would be nice to
>> have a definite plan here, even to the point of listing what patches
>> would need to be written, so during the summer all that would need to
>> happen is to execute and deal with bugs as they come.
>
> Listing patches and planning all details in the submitted proposal would
> require me to know what I do and how I will do it all before last Friday! As
> I'm not yet an expert on this topic, I don't know how I could have known all
> details a-priori.
> Of course the project's documentation will evolve outside the GSoC project
> proposal, which cannot be changed anymore.
>
>>
>> Given the goal described here of an import with support for
>> automatically detecting branches, here are some rough steps I imagine
>> would be involved:
>>
>>  . baseline: remote helper in C
>>
>>  . option to import starting with a particular numbered revision.
>>    This would be good practice for seeing how options passed to
>>    "git clone -c" can be read from the config file.
>>
>>  . option or URL schema to import a single project from a large
>>    Subversion repository that houses several projects.  This would
>>    already be useful in practice since importing the entire Apache
>>    Software Foundation repository takes a while which is a waste
>>    when one only wants the history of the Subversion project.
>>
>>    How should the importer handle Subversion copy commands that
>>    refer to other projects in this case?
>>
>>  . automatically detecting trunk when importing a project with the
>>    standard layout.  The trunk usually is not branched from elsewhere
>>    so this does not require copyfrom info.  Some design questions
>>    come up here: should the remote helper import the entire project
>>    tree, too?  (I think "yes", since copy commands that copy from
>>    other branches are very common and that would ensure the relevant
>>    info is available to git.)  What should the mapping of git commit
>>    names to Subversion revision numbers that is stored in notes say
>>    in this case?
>>
>>  . detecting trunk and branches and exposing them as different remote
>>    branches.  This is a small step that just involves understanding
>>    how remote helpers expose branches.
>>
>>  . storing path properties and copyfrom information in the commits
>>    produced by the vcs-svn/ library.  How should these be stored?
>>    For example, there could be a parallel directory structure
>>    in the tree:
>>
>>       foo/
>>               bar.c
>>       baz/
>>               qux.c
>>       .properties/
>>               foo.properties
>>               foo/
>>                       bar.c.properties
>>               baz/
>>                       qux.c.properties
>>
>>    with properties for <path> stored at .properties/<path>.properties.
>>    This strawman scheme doesn't work if the repository being imported
>>    has any paths ending with ".properties", though.  Ideas?
>>
>>  . tracing history past branch creation events, using the now-saved
>>    copyfrom information.
>>
>>  . tracing second-parent history using svn:mergeinfo properties.
>>
>> In other words, in the above list the strategy is:
>>
>>  1. First convert the remote helper to C so it doesn't have to be
>>     translated again later.
>>
>>  2. Teach the remote helper to import a single project from a
>>     repository that houses multiple projects (i.e., path limiting).
>>
>>  3. Teach the remote helper to split an imported project that uses
>>     the standard layout into branches (an application of the code
>>     from (2)).  This complicates the scheme for mapping between
>>     Subversion revision numbers and git commit ids.
>>
>>  4. Teach the SVN dumpfile to fast-import stream converter not to
>>     lose the information that is needed in order to get parenthood
>>     information.
>>
>>  5. Use the information from step (4) to get parenthood right for a
>>     project split into branches.
>>
>>  6. Getting the second parent right (i.e., merges).  I mentioned
>>     this for fun but I don't expect there to be time for it.
>>
>> Does that seem right, or does it need tweaks?  How long would each
>> step take?  Can the steps be subdivided into smaller steps?
>>
>> Another question is: what is the design for this?  With the existing
>> remote-svn-alpha script, there are a few different components with
>> well defined interfaces:
>>
>>       commands like "git fetch"
>>
>>         | (1)
>>
>>       transport-helper --- (2) --- git fast-import
>>
>>         | (2, 3)                        |
>>
>>       remote-svn-alpha                  | (3)
>>
>>         |             ''..              |
>>         |
>>         | (2)             ''(2)..       |
>>         |
>>         |                        ''..   |
>>
>>       svnrdump --------- (3) -------- svn-fe
>>
>>  (1) communicates using function calls and shared data
>>  (2) launches
>>  (3) communicates over pipe
>>
>> Once remote-svn-alpha is rewritten in C, the same structure is still
>> present, though it might be less obvious because some of the (2)
>> and (3) can change into (1).
>>
>> Where does the functionality you are adding fit into this picture?
>> Are there any new components being added, and if so what do they take
>> as input and output?
>
> I planned to implement a remote-helper using the existing interface
> specification to communicate over pipes with git's transport-helper.
> Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions
> directly from the remote-helper and place new functions in this directory (?).
> To communicate with svn, the remote-helper launches svnrdump as a subprocess.
> Additionally the remote-helper will read a configuration file containing
> additional information about branch-mapping, this should be closely related to
> Andrew's SBL.
>
>>
>> Hope that helps,
>> Jonathan
>>
>> > [1] https://github.com/flyingflo/git/wiki/
>
> Florian

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-11 19:20               ` Florian Achleitner
  2012-04-11 19:44                 ` Dmitry Ivankov
@ 2012-04-11 19:53                 ` Jonathan Nieder
  2012-04-11 22:43                   ` Andrew Sayers
  2012-04-12  9:02                   ` Thomas Rast
  1 sibling, 2 replies; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-11 19:53 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Git Mailing List, Ramkumar Ramachandra, David Barr,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Florian Achleitner wrote:

> If the remote-svn-alpha script is really all that needs to be done, you're 
> right. It just pipes through svn-fe. I thought svn-fe could only import an svn 
> repo initially, and there would be some difference between importing the whole 
> history and fetching new revisions later, (?).

Yes, Dmitry's script (not the first version, but a later one) supports
incremental imports without trouble if I remember correctly.

[...]
> Listing patches and planning all details in the submitted proposal would 
> require me to know what I do and how I will do it all before last Friday! As 
> I'm not yet an expert on this topic, I don't know how I could have known all 
> details a-priori.

Oh, I didn't mean you would need to do that alone. :)  Dmitry, David,
Ram, Sverre, and I should be able to answer any questions you have
about how git, vcs-svn, svnrdump, and the transport-helper currently
work in the importer.

I've marked the proposal editable to allow details to be filled in.

[...]
> I planned to implement a remote-helper using the existing interface 
> specification to communicate over pipes with git's transport-helper. 
> Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions 
> directly from the remote-helper and place new functions in this directory (?).

Ah, this is a good place to start.  In my diagram I lumped everything
under vcs-svn/ together as svn-fe for convenience, but in fact the
vcs-svn lib is made up of multiple components:

	caller
	 .
	 :
	 |
	public interface (svndump_init, svndump_read, etc)
	 |
	 |
	 |
	dump file parser (svndump_read body)
	 |
	 |
	 |
	fast-export interface (fast_export_*, repo_*) --------- svndiff0 parser
	 |
	 :
	 .
	git fast-import

Each component has a narrow interface.  For each action in the dump,
svndump_read() calls some appropriate function from the fast-export
interface to bring about the corresponding change on the git side.
Details of svndump syntax and the state needed to parse it are
isolated in svndump.c and details of fast-import syntax are in
fast-export.c and repo_tree.c.

(The structure used to be more complicated when the repo_* functions
had to keep track of the repository state instead of relying on
fast-import for that.)
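
As a toy illustration of that split (Python for brevity, not the actual
vcs-svn code): the parser below knows only dumpfile header syntax and
reports what it sees through a narrow exporter interface, which is the
one place that would know fast-import syntax. The names
read_dump_headers(), RecordingExporter, begin_revision() and node() are
invented for this sketch, and the parser ignores properties and file
contents entirely, so it is not safe on real dumps.

```python
class RecordingExporter:
    """Stand-in for the fast-export side: it just records the calls."""
    def __init__(self):
        self.events = []

    def begin_revision(self, number):
        self.events.append(("revision", number))

    def node(self, action, path):
        self.events.append((action, path))

def read_dump_headers(lines, exporter):
    """Dispatch Revision-number / Node-path / Node-action headers."""
    path = None
    for line in lines:
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # blank line or something that is not a header
        if key == "Revision-number":
            exporter.begin_revision(int(value))
        elif key == "Node-path":
            path = value
        elif key == "Node-action" and path is not None:
            exporter.node(value, path)
            path = None

if __name__ == "__main__":
    exporter = RecordingExporter()
    read_dump_headers(
        ["Revision-number: 1", "",
         "Node-path: trunk/foo.c", "Node-kind: file", "Node-action: add"],
        exporter)
    print(exporter.events)
```

A branch mapper could plausibly sit behind the same interface, rewriting
paths before they reach the exporter, but that is a guess at one answer
to the questions below rather than a settled design.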

Where would the branch mapping go?  What kind of state needs to be
maintained as it occurs?  What steps would I follow to imitate the
code and work out a branch mapping on paper?  How do I invoke the code
if I want to try it out (i.e., what functions form the public
interface needed to support branch mapping)?

I don't expect you to have answers to all these questions already; I
understand that getting used to what's already there and trying out
ideas will take time.  However, I do think we have a much better
chance of this going well if there are answers to these questions by
the time the coding period starts.

[...]
> Additionally the remote-helper will read a configuration file containing 
> additional information about branch-mapping, this should be closely related to 
> Andrew's SBL.

That sounds reasonable to me.  I am somewhat unconvinced (but
convinceable) about the need to use a configuration scheme that
handles all the edge cases right away.  Shouldn't it be enough to tell
the importer the following?

 - the path to the repository (from which it can deduce $SVNROOT
   and the path within there to the subproject of interest)

 - a single bit of information on top of that: "this repository uses
   the standard layout"

Once that works, the tools could easily be tweaked to respect a
configuration file that describes more complex situations, and as a
bonus the SBL tools for making sense of those situations would have
time to become more mature in the meantime.
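
For the "standard layout" bit, the whole mapping could be as small as
the following sketch (Python for brevity). The function name
split_standard_layout() and the choice to call trunk
"refs/heads/master" are assumptions for illustration; paths outside
trunk/, branches/ and tags/ are simply reported as unmapped.

```python
def split_standard_layout(path):
    """Map a path inside the project to (git ref, path within that ref).

    Returns None for anything that is not under trunk/, branches/<name>/
    or tags/<name>/.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return ("refs/heads/master", "/".join(parts[1:]))
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        prefix = "refs/heads/" if parts[0] == "branches" else "refs/tags/"
        return (prefix + parts[1], "/".join(parts[2:]))
    return None

if __name__ == "__main__":
    for p in ("trunk/src/main.c", "branches/1.0/README", "tags/v1", "doc/x"):
        print(p, "->", split_standard_layout(p))
```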

Thanks for some useful clarifications.
Jonathan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-11 19:53                 ` Jonathan Nieder
@ 2012-04-11 22:43                   ` Andrew Sayers
  2012-04-12  9:02                   ` Thomas Rast
  1 sibling, 0 replies; 46+ messages in thread
From: Andrew Sayers @ 2012-04-11 22:43 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Sverre Rabbelier, Dmitry Ivankov

On 11/04/12 20:53, Jonathan Nieder wrote:
> [...]
>> Additionally the remote-helper will read a configuration file containing 
>> additional information about branch-mapping, this should be closely related to 
>> Andrew's SBL.
> 
> That sounds reasonable to me.  I am somewhat unconvinced (but
> convinceable) about the need to use a configuration scheme that
> handles all the edge cases right away.  Shouldn't it be enough to tell
> the importer the following?
> 
>  - the path to the repository (from which it can deduce $SVNROOT
>    and the path within there to the subproject of interest)
> 
>  - a single bit of information on top of that: "this repository uses
>    the standard layout"
> 
> Once that works, the tools could easily be tweaked to respect a
> configuration file that describes more complex situations, and as a
> bonus the SBL tools for making sense of those situations would have
> time to become more mature in the meantime.

SBL itself is just a plain text description of which directories are
branches etc. - there are a handful of tricky bits on Florian's side of
the fence, but it shouldn't be that hard to add everything necessary to
parse any arbitrary SBL file.  For example, if he gets an SBL action
that looks like this:

In r105, create branch "/foo" as "foo-bar" from "/bar/baz" r25

... then the logic that produced that line doesn't really matter, so
long as he can convert it to a series of fast-import commands.
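
A rough sketch of that conversion for the one action shown above (Python
for brevity). The regular expression is derived from that single example
line, so it is a guess at the syntax rather than an SBL grammar, and the
output assumes the importer has a fast-import mark :25 naming the commit
that holds "/bar/baz" as of r25. The function name
branch_action_to_fast_import() is hypothetical.

```python
import re

# Pattern derived from the single example line; not a full SBL grammar.
ACTION = re.compile(
    r'In r(\d+), create branch "([^"]+)" as "([^"]+)" from "([^"]+)" r(\d+)')

def branch_action_to_fast_import(line):
    """Turn one 'create branch' action into a fast-import 'reset' command."""
    m = ACTION.fullmatch(line.strip())
    if m is None:
        raise ValueError("unrecognised action: %r" % line)
    _rev, _path, name, _src_path, src_rev = m.groups()
    # Assumption: mark :N names the commit for the source path at rN.
    return "reset refs/heads/%s\nfrom :%d\n\n" % (name, int(src_rev))

if __name__ == "__main__":
    print(branch_action_to_fast_import(
        'In r105, create branch "/foo" as "foo-bar" from "/bar/baz" r25'),
        end="")
```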

I started work on exporting branches from SVN a few months ago, and
happened to be polishing off SBL when GSoC got going, so my work ties
nicely into Florian's.  I've been keen to talk about edge cases lately
because that's the point I'm at in my work - to make a long story short,
I know how to do the easy cases now, and need to veer off into some
weird edge cases for a month or two, before swinging back by the
standard layout and optimising for that.  If Florian needs something
that generates SBL before I'm ready, I'd be happy to cobble a basic
"standard layout only" script from the modules I've got.

	- Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-11 19:53                 ` Jonathan Nieder
  2012-04-11 22:43                   ` Andrew Sayers
@ 2012-04-12  9:02                   ` Thomas Rast
  1 sibling, 0 replies; 46+ messages in thread
From: Thomas Rast @ 2012-04-12  9:02 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Florian Achleitner, Git Mailing List, Jeff King

[clean up Cc; +Peff]

Jonathan Nieder <jrnieder@gmail.com> writes:

> I've marked the proposal editable to allow details to be filled in.

That went wrong, or somebody toggled it back.  Since nobody objected
here, I'm assuming it was fine, and set it to editable again.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-10 17:17             ` Jonathan Nieder
                                 ` (2 preceding siblings ...)
  2012-04-11 19:20               ` Florian Achleitner
@ 2012-04-12 15:28               ` Florian Achleitner
  2012-04-12 22:30                 ` Andrew Sayers
  2012-04-13 19:19                 ` Jonathan Nieder
  2012-04-18 20:16               ` Florian Achleitner
  4 siblings, 2 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-04-12 15:28 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, Ramkumar Ramachandra, David Barr,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Hi!

Let's discuss the details as suggested by Jonathan! I will collect them in the 
wiki, leading to a more elaborated project plan at the end.
It's rather hard to keep an overview over all the issues and pitfalls that may 
exist, and over all the existing discussions, and whether there was a 
solution or the issue is still unsolved.
So I want to create some collection of information with your support.

On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:
> Given the goal described here of an import with support for
> automatically detecting branches, here are some rough steps I imagine
> would be involved:
> 
>  . baseline: remote helper in C
> 
>  . option to import starting with a particular numbered revision.
>    This would be good practice for seeing how options passed to
>    "git clone -c" can be read from the config file.

Really -c? My installed git doesn't have that switch. Should it pass arguments 
to the remote-helper?

> 
>  . option or URL schema to import a single project from a large
>    Subversion repository that houses several projects.  This would
>    already be useful in practice since importing the entire Apache
>    Software Foundation repository takes a while which is a waste
>    when one only wants the history of the Subversion project.
> 
>    How should the importer handle Subversion copy commands that
>    refer to other projects in this case?

Jonathan tried that, it's handled by svnrdump nicely.

> 
>  . automatically detecting trunk when importing a project with the
>    standard layout.  The trunk usually is not branched from elsewhere
>    so this does not require copyfrom info.  Some design questions
>    come up here: should the remote helper import the entire project
>    tree, too?  (I think "yes", since copy commands that copy from
>    other branches are very common and that would ensure the relevant
>    info is available to git.)  What should the mapping of git commit
>    names to Subversion revision numbers that is stored in notes say
>    in this case?

What does it mean, "import the entire project tree"? Importing other 
directories than "trunk"?
About the mapping of git commits to svn refs .. I've seen the thread about the 
marks-to-notes converter.
But can somebody please explain what it's for? There is this mark file 
mentioned in the git-fast-import help page ..

Do we create two commits from one revision if it's some special case, like 
modifying two branches at once?

> 
>  . detecting trunk and branches and exposing them as different remote
>    branches.  This is a small step that just involves understanding
>    how remote helpers expose branches.
> 
>  . storing path properties and copyfrom information in the commits
>    produced by the vcs-svn/ library.  How should these be stored?
>    For example, there could be a parallel directory structure
>    in the tree:
> 
>         foo/
>                 bar.c
>         baz/
>                 qux.c
>         .properties/
>                 foo.properties
>                 foo/
>                         bar.c.properties
>                 baz/
>                         qux.c.properties
> 
>    with properties for <path> stored at .properties/<path>.properties.
>    This strawman scheme doesn't work if the repository being imported
>    has any paths ending with ".properties", though.  Ideas?
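
To see where the strawman breaks, here is a small sketch of the path
mapping (Python for brevity). It shows one way a clash can arise as I
read the scheme: a directory named "foo.properties" next to a path "foo"
needs .properties/foo.properties to be both a file and a directory. The
function names properties_path() and find_conflicts() are made up for
illustration.

```python
def properties_path(path):
    """Sidecar location for <path> under the strawman scheme."""
    return ".properties/" + path.strip("/") + ".properties"

def find_conflicts(paths):
    """Return sidecar paths that would have to be a file and a directory."""
    sidecars = {properties_path(p) for p in paths}
    return sorted(s for s in sidecars
                  if any(other.startswith(s + "/") for other in sidecars))

if __name__ == "__main__":
    print(find_conflicts(["foo", "foo/bar.c", "baz/qux.c"]))
    print(find_conflicts(["foo", "foo.properties/x"]))
```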

This includes collecting which metadata we actually need to store? We could 
probably collect a list of important svn properties.

Is there a general policy how to store additional metadata for git's helpers? 
I guess it would live somewhere in the .git dir. (.git/info/ ?)
Dmitry mentioned the case where a git repository that fetched from svn is 
cloned, and the cloned repo should be able to fetch from svn too. Is there an 
existing concept about metadata in this case?

I'm not sure if storing this in a separate directory tree makes sense, mostly 
looking at performance. All these files will only contain some bytes, I guess.
Andrew, why did you choose JSON?

> 
>  . tracing history past branch creation events, using the now-saved
>    copyfrom information.
> 
>  . tracing second-parent history using svn:mergeinfo properties.

This is about detection when to create a git merge-commit, right?

> 
> In other words, in the above list the strategy is:

.. still to come..

Florian

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-12 15:28               ` Florian Achleitner
@ 2012-04-12 22:30                 ` Andrew Sayers
  2012-04-14 20:09                   ` Florian Achleitner
  2012-04-13 19:19                 ` Jonathan Nieder
  1 sibling, 1 reply; 46+ messages in thread
From: Andrew Sayers @ 2012-04-12 22:30 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Sverre Rabbelier, Dmitry Ivankov

[-- Attachment #1: Type: text/plain, Size: 3801 bytes --]

On 12/04/12 16:28, Florian Achleitner wrote:
> 
> I'm not sure if storing this in a separate directory tree makes sense, mostly 
> looking at performance. All these files will only contain some bytes, I guess.
> Andrew, why did you choose JSON?
> 

JSON has become my default storage format in recent years, so it seemed
like the natural thing to use for a format I wanted to chuck in and get
on with my work :)

JSON is my default format because it's reasonably space-efficient,
human-readable, widely supported and can represent everything I care
about except recursive data structures (which I didn't need for this
job).  You can do cleverer things if you don't mind being
language-specific (e.g. Perl's "Storable" module supports recursive data
structures but can't be used with other languages) or if you don't mind
needing special tools (e.g. git's index is highly efficient but can't be
debugged with `less`).  I've found you won't go far wrong if you start
with JSON and pick something else when the requirements become more obvious.

I gzipped the file because JSON isn't *that* space-efficient, and
because very large repositories are likely to produce enough JSON that
people will notice.  I found that gzipping the file significantly
reduced its size without having too much effect on run time.

I've attached a sample file representing the first few commits from the
GNU R repository.  The problem I referred to obliquely before isn't with
JSON, but with gzip - how would you add more revisions to the end of the
file without gunzipping it, adding one line, then gzipping it again?
One very nice feature of a directory structure is that you could store
it in git and get all that stuff for free.
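
On the append question: the gzip format allows several compressed
members to be concatenated in one file, so a new record can be appended
as its own member without rewriting what is already there.  A minimal
sketch in Python, assuming (the attached file does not confirm this) one
JSON object per line; the file name and record fields are made up for
illustration:

```python
import gzip
import json
import os
import tempfile

def append_record(path, record):
    """Append one JSON line as a new gzip member; existing data is untouched."""
    with gzip.open(path, "at", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def read_records(path):
    """Read every JSON line; the gzip module decodes all members in sequence."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical file name and record fields, for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), "repo.json.gz")
append_record(demo_path, {"rev": 1, "action": "create branch", "path": "trunk"})
append_record(demo_path, {"rev": 2, "action": "create branch", "path": "branches/r-0"})
records = read_records(demo_path)
```

Each append pays for a fresh gzip header and cannot reuse the
compression context of earlier records, so many tiny members compress
worse than one long stream.  Recompressing the whole file now and then
would be one way to offset that.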

To be clear, I'm not pushing any particular solution to this problem,
just offering some anecdotal evidence.  I'm pretty sure that SVN branch
export is an I/O bound problem - David Barr has said much the same about
svn-fe, but I was surprised to see it was still the bottleneck with a
problem that stripped out almost all the data from the dump and pushed
it through not-particularly-optimised Perl.  Having said that, the
initial import problem (potentially hundreds of thousands of revisions
needing manual attention) doesn't necessarily want the same solution as
update (tens of revisions that can almost always be read automatically).

>>  . tracing history past branch creation events, using the now-saved
>>    copyfrom information.
>>
>>  . tracing second-parent history using svn:mergeinfo properties.
> 
> This is about detection when to create a git merge-commit, right?

Yes - SVN has always stored metadata about where a directory was copied
from (unlike git, which prefers to detect it automatically), and since
version 1.5, SVN has added "svn:mergeinfo" metadata to files and
directories specifying which revisions of which other files or
directories have been cherry-picked in to them.

If you know a directory is a branch, "copyfrom" metadata is a very
useful signal for detecting branches created from it.  Unfortunately,
"svn:mergeinfo" is not as useful - aside from anything else, older
repositories often exhibit a period where there's no metadata at all,
then a gradual migration through SVN's early experiments with merge
tracking (like svnmerge.py), before everyone gradually standardises on
svn:mergeinfo and leaves the other tools behind.  Oh, and the interface
doesn't tell you about unmerged revisions, so if anybody ever forgets to
merge a revision then you'll probably never notice.
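
To make the property concrete: as far as I know, an svn:mergeinfo value
is one line per merge source, written as "/path:revision-list", where
the list holds single revisions and ranges, and a trailing "*" marks a
non-inheritable range.  A rough Python sketch of reading it, under that
assumption; the sample value is invented:

```python
def parse_mergeinfo(value):
    """Map each merge source path to the sorted list of merged revisions."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon: the revision list never contains one.
        source, _, revlist = line.rpartition(":")
        revs = set()
        for item in revlist.split(","):
            item = item.strip().rstrip("*")  # '*' marks a non-inheritable range
            if not item:
                continue
            first, _, last = item.partition("-")
            revs.update(range(int(first), int(last or first) + 1))
        merged[source] = sorted(revs)
    return merged

# Invented sample value, for illustration only.
example = "/trunk:3-5,8\n/branches/feature:12*"
merged = parse_mergeinfo(example)
```

This only reads what was recorded.  It says nothing about revisions that
were never merged, which is the gap described above.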

I'm planning to tackle this stuff in the work I'm doing, but I expect
people will be reporting edge cases until the day the last SVN
repository shuts down.  You shouldn't need to worry about it much on the
git side of SBL, which is probably best for your sanity ;)

	- Andrew

[-- Attachment #2: repo.json.gz --]
[-- Type: application/x-gzip, Size: 466 bytes --]

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-12 15:28               ` Florian Achleitner
  2012-04-12 22:30                 ` Andrew Sayers
@ 2012-04-13 19:19                 ` Jonathan Nieder
  2012-04-14 20:15                   ` Florian Achleitner
  1 sibling, 1 reply; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-13 19:19 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Git Mailing List, Ramkumar Ramachandra, David Barr,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov, Tomas Carnecky

Hi again,

Florian Achleitner wrote:

> So I want to create some collection of information with your support.

Sounds like a plan.  Thanks, Florian.

[...]
> Really -c? My installed git doesn't have that switch. Should it pass arguments 
> to the remote-helper?

What git version do you use?  "man git clone" tells me that -c is an
abbreviation for --config and "grep -e --config Documentation/RelNotes/*"
tells me it was introduced in v1.7.7.

[...]
> On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:

>>    How should the importer handle Subversion copy commands that
>>    refer to other projects in this case?
>
> Jonathan tried that, it's handled by svnrdump nicely.

Yes, except that it does not follow the history of the copy source.
So if your project was renamed, then "svnrdump <new SVN URL for the
project>" will dump a fictional history in which the first rev under
the new name created the project out of thin air.

That is not ideal, but it seems tolerable in the short term.

>>                                             Some design questions
>>    come up here: should the remote helper import the entire project
>>    tree, too?  (I think "yes", since copy commands that copy from
>>    other branches are very common and that would ensure the relevant
>>    info is available to git.)  What should the mapping of git commit
>>    names to Subversion revision numbers that is stored in notes say
>>    in this case?
>
> What does it mean, "import the entire project tree"? Importing other 
> directories than "trunk"?

Yes.  For an import that is going to be dumping the subdirectories of
tags/ and branches/ anyway, it seems sensible to ask svnrdump to dump
the entire {trunk,tags,branches} hierarchy and sort it out on the git
side.  The question is then: for each rev, in addition to making
commits for each branch that changed, should we keep a commit
representing the state of the combined whole-project tree for internal
use?  A person trying to check out this commit would get to see the
enormous

	trunk/
	tags/
	branches/

directories.  My rough answer was "yes, it's convenient to keep that
information around, especially given that with git's repository model
it doesn't waste a lot of space and makes debugging easier".

> About the mapping of git commits to svn refs .. I've seen the thread about the 
> marks-to-notes converter.
> But can somebody please explain what it's for? There is this mark file 
> mentioned in the git-fast-import help page ..

There are two operations that need to be very fast:

 1. Given a Subversion revision number, what is the corresponding git
    commit?  svn-fe uses this to get the preimage data when executing
    an "svn copy" operation that refers to an old rev.  For example:

	svn copy some/path@a-long-time-ago another/path

    Code tracking branches would use this same map to find the
    appropriate parent commit for a new branch.  For example:

	svn copy trunk@a-long-time-ago branches/new-branch

    becomes:

	parent f572d396fae9206628714fb2ce00f72e94f2258f

 2. Given a git commit, what is the corresponding Subversion revision
    number?  For example, "git fetch" needs this information in order
    to get a first unfetched revision number when updating an existing
    clone of a Subversion repository.

"git notes" is a mechanism for efficiently storing a mapping from git
commit names to arbitrary data.  For example, it can be used to cache
the compiled form of some slow-to-compile source code, or it can be
used to store reminders to a human that has reviewed these commits and
wanted to scribble a little in the margin.  A patch (in Dmitry's tree,
not in git.git yet) teaches svn-fe to use the notes facility to store
the mapping from git commit names to Subversion revision numbers,
addressing problem (2) above.  Tomas's human-friendly importer used
the same trick.

As you noticed, "git fast-import" has a facility that fits well for
mapping in the other direction: a marks file can store an arbitrary
mapping from numbers to objects (usually objects that were part of the
import).  svn-fe writes a mark for each Subversion revision it imports
to address problem (1) above.
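
For illustration, a marks file is plain text with one ":<number> <object
name>" pair per line, so the lookup for problem (1) can be sketched in a
few lines of Python.  This assumes the mark number equals the Subversion
revision number, which the paragraph above suggests but does not spell
out; the second object name is a made-up placeholder:

```python
def parse_marks(text):
    """Parse fast-import marks lines of the form ':<number> <object name>'."""
    marks = {}
    for line in text.splitlines():
        if not line.startswith(":"):
            continue
        number, _, name = line[1:].partition(" ")
        marks[int(number)] = name.strip()
    return marks

# The second object name is a placeholder, not a real commit.
sample = (
    ":1 f572d396fae9206628714fb2ce00f72e94f2258f\n"
    ":2 0000000000000000000000000000000000000002\n"
)
rev_to_commit = parse_marks(sample)
```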

Because "git notes" are stored in the git object db as native objects,
they can be shared using the usual "git fetch" / "git push" commands
as long as you specify the appropriate source and destination refs on
the command line or in git's configuration file.  Commands like "git
rebase" that modify history also have some support for carrying notes
along.  By contrast, a marks file is just a flat text file and there
is no standard facility for updating it when commit names change or
sharing it using ordinary git transport.

The marks-to-notes converter I wrote was a toy to show how the notes
and marks can easily be kept in sync.  If I remember correctly the
last time this was discussed there was some feeling that when the two
tables fall out of synch the notes should be considered authoritative
and marks can be recomputed from them.
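
A sketch of what "recomputed from them" could look like, in Python.  It
assumes the notes have already been read into a table from commit name
to revision number (how the revision is encoded inside a note is not
specified here) and that each revision has exactly one commit.  The
commit names are placeholders:

```python
def marks_from_notes(notes):
    """Rebuild marks-file text from a {commit name: revision number} table."""
    by_rev = {rev: commit for commit, rev in notes.items()}
    return "".join(f":{rev} {by_rev[rev]}\n" for rev in sorted(by_rev))

def first_unfetched(notes):
    """Revision number a later fetch would start from."""
    return max(notes.values(), default=0) + 1

# Placeholder commit names, for illustration only.
notes = {"b" * 40: 2, "a" * 40: 1}
marks_text = marks_from_notes(notes)
next_rev = first_unfetched(notes)
```

Once a revision can map to several commits, the inverted table would
need to hold a list per revision instead of a single name.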

> Do we create two commits from one revision if it's some special case, like 
> modifying two branches at once?

remote-svn-alpha and svn-fe do not currently split by branch at all so
the problem doesn't come up.

Yes, I think the only sane way to represent a Subversion revision that
modifies multiple branches is with a git commit on each branch.
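
A minimal Python sketch of that split, assuming the standard
trunk/branches/tags layout and nothing cleverer; the helper names and
sample paths are hypothetical:

```python
from collections import defaultdict

def branch_of(path):
    """Return (branch, path inside branch) for a standard-layout path, or None."""
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "trunk", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        return "/".join(parts[:2]), "/".join(parts[2:])
    return None

def split_by_branch(changed_paths):
    """Group the paths changed in one revision by the branch they belong to."""
    groups = defaultdict(list)
    for path in changed_paths:
        located = branch_of(path)
        if located is not None:
            branch, inner = located
            groups[branch].append(inner)
    return dict(groups)

# One revision touching two branches yields two groups, hence two commits.
changed = ["trunk/Makefile", "branches/stable/Makefile", "branches/stable/README"]
per_branch = split_by_branch(changed)
```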

[...]
>>    For example, there could be a parallel directory structure
>>    in the tree:
>>
>>         foo/
>>                 bar.c
>>         baz/
>>                 qux.c
>>         .properties/
>>                 foo.properties
>>                 foo/
>>                         bar.c.properties
>>                 baz/
>>                         qux.c.properties
>>
>>    with properties for <path> stored at .properties/<path>.properties.
>>    This strawman scheme doesn't work if the repository being imported
>>    has any paths ending with ".properties", though.  Ideas?
>
> This includes collecting which metadata we actually need to store? We could 
> probably collect a list of important svn properties.

I imagined the importer just collecting all path properties, like "git
svn" does in its .git/svn/refs/remotes/git-svn/unhandled.log.  They're
easy to iterate through on the svn side.
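
To show where the strawman layout quoted above breaks, here is one
concrete reading of the collision as a Python sketch: the property file
for "foo" and the property directory for a directory literally named
"foo.properties" want the same name.  The function names and sample
paths are made up for illustration:

```python
def props_path(path):
    """Location of the property file for `path` under the strawman scheme."""
    return ".properties/" + path + ".properties"

def find_conflicts(paths):
    """Return names that would have to be both a file and a directory."""
    files = {props_path(p) for p in paths}
    conflicts = set()
    for name in files:
        # Walk up through every directory that must exist to hold `name`.
        parent = name.rpartition("/")[0]
        while parent:
            if parent in files:
                conflicts.add(parent)
            parent = parent.rpartition("/")[0]
    return sorted(conflicts)

clean = find_conflicts(["foo", "foo/bar.c", "baz/qux.c"])
clash = find_conflicts(["foo", "foo/bar.c", "foo.properties", "foo.properties/x"])
```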

> Is there a general policy how to store additional metadata for git's helpers? 
> I guess it would live somewhere in the .git dir. (.git/info/ ?)

One simple design would be to keep properties in the "entire project"
commit objects for internal use, since that's easy to share.

I think David had a few other ideas. ;-)

[...]
>>  . tracing second-parent history using svn:mergeinfo properties.
>
> This is about detection when to create a git merge-commit, right?

Yep.  A goal would be for a person to be able to push a git
merge to an svn repository, fetch from another machine, and get the
same commit back.

>> In other words, in the above list the strategy is:
>
> .. still to come..

Thanks for your thoughtfulness.

Jonathan

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-12 22:30                 ` Andrew Sayers
@ 2012-04-14 20:09                   ` Florian Achleitner
  2012-04-14 21:35                     ` Andrew Sayers
  0 siblings, 1 reply; 46+ messages in thread
From: Florian Achleitner @ 2012-04-14 20:09 UTC (permalink / raw)
  To: Andrew Sayers
  Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Sverre Rabbelier, Dmitry Ivankov

Hi!

Thanks for your explanations.

On Thursday 12 April 2012 23:30:29 Andrew Sayers wrote:
> On 12/04/12 16:28, Florian Achleitner wrote:
> > I'm not sure if storing this in a separate directory tree makes sense,
> > mostly looking at performance. All these files will only contain some
> > bytes, I guess. Andrew, why did you choose JSON?
> 
> JSON has become my default storage format in recent years, so it seemed
> like the natural thing to use for a format I wanted to chuck in and get
> on with my work :)
> 
> JSON is my default format because it's reasonably space-efficient,
> human-readable, widely supported and can represent everything I care
> about except recursive data structures (which I didn't need for this
> job).  You can do cleverer things if you don't mind being
> language-specific (e.g. Perl's "Storable" module supports recursive data
> structures but can't be used with other languages) or if you don't mind
> needing special tools (e.g. git's index is highly efficient but can't be
> debugged with `less`).  I've found you won't go far wrong if you start
> with JSON and pick something else when the requirements become more obvious.
> 
> I gzipped the file because JSON isn't *that* space-efficient, and
> because very large repositories are likely to produce enough JSON that
> people will notice.  I found that gzipping the file significantly
> reduced its size without having too much effect on run time.
> 
> I've attached a sample file representing the first few commits from the
> GNU R repository.  The problem I referred to obliquely before isn't with
> JSON, but with gzip - how would you add more revisions to the end of the
> file without gunzipping it, adding one line, then gzipping it again?
> One very nice feature of a directory structure is that you could store
> it in git and get all that stuff for free.
> 
> To be clear, I'm not pushing any particular solution to this problem,
> just offering some anecdotal evidence.  I'm pretty sure that SVN branch
> export is an I/O bound problem - David Barr has said much the same about
> svn-fe, but I was surprised to see it was still the bottleneck with a
> problem that stripped out almost all the data from the dump and pushed
> it through not-particularly-optimised Perl.  Having said that, the
> initial import problem (potentially hundreds of thousands of revisions
> needing manual attention) doesn't necessarily want the same solution as
> update (tens of revisions that can almost always be read automatically).

JSON seems to be a good initial choice..

> 
> >>  . tracing history past branch creation events, using the now-saved
> >>    copyfrom information.
> >>
> >>  . tracing second-parent history using svn:mergeinfo properties.
> > 
> > This is about detection when to create a git merge-commit, right?
> 
> Yes - SVN has always stored metadata about where a directory was copied
> from (unlike git, which prefers to detect it automatically), and since
> version 1.5, SVN has added "svn:mergeinfo" metadata to files and
> directories specifying which revisions of which other files or
> directories have been cherry-picked in to them.
> 
> If you know a directory is a branch, "copyfrom" metadata is a very
> useful signal for detecting branches created from it.  Unfortunately,
> "svn:mergeinfo" is not as useful - aside from anything else, older
> repositories often exhibit a period where there's no metadata at all,
> then a gradual migration through SVN's early experiments with merge
> tracking (like svnmerge.py), before everyone gradually standardises on
> svn:mergeinfo and leaves the other tools behind.  Oh, and the interface
> doesn't tell you about unmerged revisions, so if anybody ever forgets to
> merge a revision then you'll probably never notice.

This doesn't look very straightforward. The svn docs say there is a 
command that lists which changesets are eligible to merge:
http://svnbook.red-bean.com/en/1.7/svn.branchmerge.basicmerging.html#svn.branchmerge.basicmerging.mergeinfo

But I don't know if that helps.
>
> I'm planning to tackle this stuff in the work I'm doing, but I expect
> people will be reporting edge cases until the day the last SVN
> repository shuts down.  You shouldn't need to worry about it much on the
> git side of SBL, which is probably best for your sanity ;)

:)

> 
> 	- Andrew

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-13 19:19                 ` Jonathan Nieder
@ 2012-04-14 20:15                   ` Florian Achleitner
  0 siblings, 0 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-04-14 20:15 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, Ramkumar Ramachandra, David Barr,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov, Tomas Carnecky

Hi!

Thanks for your help.
I updated the wiki page.

On Friday 13 April 2012 14:19:08 Jonathan Nieder wrote:

> > Really -c? My installed git doesn't have that switch. Should it pass
> > arguments to the remote-helper?
> 
> What git version do you use?  "man git clone" tells me that -c is an
> abbreviation for --config and "grep -e --config Documentation/RelNotes/*"
> tells me it was introduced in v1.7.7.

Sorry, that was clumsy. I should build and search the docs of the current 
version, not the one my distro ships!

> > 
> > What does it mean, "import the entire project tree"? Importing other
> > directories than "trunk"?
> 
> Yes.  For an import that is going to be dumping the subdirectories of
> tags/ and branches/ anyway, it seems sensible to ask svnrdump to dump
> the entire {trunk,tags,branches} hierarchy and sort it out on the git
> side.  The question is then: for each rev, in addition to making
> commits for each branch that changed, should we keep a commit
> representing the state of the combined whole-project tree for internal
> use?  A person trying to check out this commit would get to see the
> enormous
> 
> 	trunk/
> 	tags/
> 	branches/
> 
> directories.  My rough answer was "yes, it's convenient to keep that
> information around, especially given that with git's repository model
> it doesn't waste a lot of space and makes debugging easier".

Sounds reasonable.

> 
> > About the mapping of git commits to svn refs .. I've seen the thread
> > about the marks-to-notes converter.
> > But can somebody please explain what it's for? There is this mark file
> > mentioned in the git-fast-import help page ..
> 
> There are two operations that need to be very fast:
> 
>  1. Given a Subversion revision number, what is the corresponding git
>     commit?  svn-fe uses this to get the preimage data when executing
>     an "svn copy" operation that refers to an old rev.  For example:
> 
> 	svn copy some/path@a-long-time-ago another/path
> 
>     Code tracking branches would use this same map to find the
>     appropriate parent commit for a new branch.  For example:
> 
> 	svn copy trunk@a-long-time-ago branches/new-branch
> 
>     becomes:
> 
> 	parent f572d396fae9206628714fb2ce00f72e94f2258f
> 
>  2. Given a git commit, what is the corresponding Subversion revision
>     number?  For example, "git fetch" needs this information in order
>     to get a first unfetched revision number when updating an existing
>     clone of a Subversion repository.
> 
> "git notes" is a mechanism for efficiently storing a mapping from git
> commit names to arbitrary data.  For example, it can be used to cache
> the compiled form of some slow-to-compile source code, or it can be
> used to store reminders to a human that has reviewed these commits and
> wanted to scribble a little in the margin.  A patch (in Dmitry's tree,
> not in git.git yet) teaches svn-fe to use the notes facility to store
> the mapping from git commit names to Subversion revision numbers,
> addressing problem (2) above.  Tomas's human-friendly importer used
> the same trick.
> 
> As you noticed, "git fast-import" has a facility that fits well for
> mapping in the other direction: a marks file can store an arbitrary
> mapping from numbers to objects (usually objects that were part of the
> import).  svn-fe writes a mark for each Subversion revision it imports
> to address problem (1) above.
> 
> Because "git notes" are stored in the git object db as native objects,
> they can be shared using the usual "git fetch" / "git push" commands
> as long as you specify the appropriate source and destination refs on
> the command line or in git's configuration file.  Commands like "git
> rebase" that modify history also have some support for carrying notes
> along.  By contrast, a marks file is just a flat text file and there
> is no standard facility for updating it when commit names change or
> sharing it using ordinary git transport.
> 
> The marks-to-notes converter I wrote was a toy to show how the notes
> and marks can easily be kept in sync.  If I remember correctly the
> last time this was discussed there was some feeling that when the two
> tables fall out of synch the notes should be considered authoritative
> and marks can be recomputed from them.

Oh, that's interesting, I hadn't heard of git notes yet. (I should have grepped 
the Documentation ..). 
Because one revision may be transformed into two commits, 
the bi-directional mapping has to support 1-to-n or probably n-to-n mappings, 
I think. But this should be possible with these mechanisms.

> 
> > Do we create two commits from one revision if it's some special case,
> > like modifying two branches at once?
> 
> remote-svn-alpha and svn-fe do not currently split by branch at all so
> the problem doesn't come up.
> 
> Yes, I think the only sane way to represent a Subversion revision that
> modifies multiple branches is with a git commit on each branch.
> 
> [...]
> 
> >>    For example, there could be a parallel directory structure
> >>    in the tree:
> >>
> >>         foo/
> >>                 bar.c
> >>         baz/
> >>                 qux.c
> >>         .properties/
> >>                 foo.properties
> >>                 foo/
> >>                         bar.c.properties
> >>                 baz/
> >>                         qux.c.properties
> >>
> >>    with properties for <path> stored at .properties/<path>.properties.
> >>    This strawman scheme doesn't work if the repository being imported
> >>    has any paths ending with ".properties", though.  Ideas?
> > 
> > This includes collecting which metadata we actually need to store? We
> > could probably collect a list of important svn properties.
> 
> I imagined the importer just collecting all path properties, like "git
> svn" does in its .git/svn/refs/remotes/git-svn/unhandled.log.  They're
> easy to iterate through on the svn side.

Ok, and it will be useful for pushing to svn in the future.

> 
> > Is there a general policy how to store additional metadata for git's
> > helpers? I guess it would live somewhere in the .git dir. (.git/info/
> > ?)
> 
> One simple design would be to keep properties in the "entire project"
> commit objects for internal use, since that's easy to share.
> 
> I think David had a few other ideas. ;-)

Commit objects that are actually not commits but store metadata?


> 
> [...]
> 
> >>  . tracing second-parent history using svn:mergeinfo properties.
> > 
> > This is about detection when to create a git merge-commit, right?
> 
> Yep.  A goal would be for a person to be able to push a git
> merge to an svn repository, fetch from another machine, and get the
> same commit back.
> 
> >> In other words, in the above list the strategy is:
> > .. still to come..
> 
> Thanks for your thoughtfulness.
> 
> Jonathan

Florian

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-14 20:09                   ` Florian Achleitner
@ 2012-04-14 21:35                     ` Andrew Sayers
  2012-04-15  3:13                       ` Stephen Bash
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Sayers @ 2012-04-14 21:35 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Sverre Rabbelier, Dmitry Ivankov

On a slightly different topic, here's the only branching edge case I
know of that will affect you.  I agree with Jonathan that you should
focus on the standard layout for now, but I think it's worth having the
trickier cases in your head when you're planning things out.

Imagine a team does this:


# Slight misunderstanding of the standard layout at first:

mkdir trunk/project1 trunk/project2
svn add trunk
svn ci -m "Initial revision" # r1

# Time passes, commits are made, people get smarter.
# In revision 1000, the team decides to put the structure right:

svn rm trunk
svn ci -m "Removed incorrect directory name" # r1000

mkdir trunk
touch trunk/MOVED_TO_PROJECT_TRUNK
svn ci -m "Added signpost file for future reference" # r1001

mkdir project1 project2
svn cp -r 999 trunk/project1 project1/trunk
svn cp -r 999 trunk/project2 project2/trunk
svn ci -m "Recreated projects with correct directory names" # r1002


This would be represented in SBL something like:

In r1, create branch "trunk/project1"
In r1, create branch "trunk/project2"

# We would prefer just to deactivate these...
In r1000, deactivate "trunk/project1"
In r1000, deactivate "trunk/project2"

# ... but we have to delete them,
# because git doesn't support recursive branch names:
In r1001, delete branch "trunk/project1"
In r1001, delete branch "trunk/project2"
In r1001, create branch "trunk"

# We deleted the branches, so how do we get the commit to fork from?
In r1002, create branch "project1/trunk" from "trunk/project1" r999
In r1002, create branch "project2/trunk" from "trunk/project2" r999


If you look in your ".git/refs/heads/" directory, you'll see git
branches are stored as files on disk.  So if you have a branch
"trunk/project1", you can't create a branch called "trunk" unless you
delete the directory called "trunk" first.  This unfortunate limitation
of an otherwise neat solution means you can't reliably use git branches
when retrieving older revisions.
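
The clash can be checked mechanically before a branch is created.  A
small Python sketch, assuming refs are compared purely by name and
stored as loose files as described above; the function name is
hypothetical:

```python
def ref_conflicts(existing, new_ref):
    """Return existing refs that cannot coexist with new_ref as loose files."""
    clashes = []
    for ref in existing:
        # One name being a directory prefix of the other is the problem case.
        if ref.startswith(new_ref + "/") or new_ref.startswith(ref + "/"):
            clashes.append(ref)
    return clashes

existing = ["refs/heads/trunk/project1", "refs/heads/trunk/project2"]
blocked = ref_conflicts(existing, "refs/heads/trunk")
allowed = ref_conflicts(existing, "refs/heads/project1/trunk")
```

In the r1001 example, creating "trunk" is blocked by both old branches,
while "project1/trunk" in r1002 is not blocked by anything.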

Other people will be able to tell you if there's any interest in
removing this limitation, but even if there is, users will occasionally
change their mind after asking for a branch to be deleted, and be
surprised if SVN lets them but git doesn't.

One solution you could look at would be storing dead branches in a JSON
file somewhere.  If you go down that route, remember that `git gc` will
try to garbage collect the commits once the branches have been dead for
long enough.

	- Andrew

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-11 19:09                 ` Florian Achleitner
@ 2012-04-14 22:57                   ` Andrew Sayers
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Sayers @ 2012-04-14 22:57 UTC (permalink / raw)
  To: Florian Achleitner
  Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Sverre Rabbelier, Dmitry Ivankov

On 11/04/12 20:09, Florian Achleitner wrote:
> Furthermore the remote-helper has no way of asking the user something, right?
> So it can only fail if something is ambiguous in the svn repository layout. So 
> I thought the SBL is exactly to describe these cases, and that's what I need.

Sorry, I missed this when it was first posted.

I'm not sure whether the remote helper is allowed to ask the user
things, but there can be times when that would be helpful.  The one that
jumps to mind is tag handling.

SVN considers tags and branches to be functionally identical, whereas
git likes to create "annotated tags" (commits with a special tag message
on top of the normal commit message) that can't be changed once they've
been created.  So if e.g. a tag is created then later committed to
again, what do you do?  Do you refuse to make annotated tags in case you
need to change them later?  Do you ignore later commits so that
annotated tags work nicely?

SBL can't provide much help here, as a tag could be created in one
update, then committed to again in another update.  Last time this was
discussed[1], the consensus seemed to be that any clever solution
would drive straight past "it just works" into "why did it do that?"
territory, so the only sensible solution would be to ask what to do.

As I say, I don't really know anything about remote helpers, but I'd be
very surprised if you weren't allowed to at least fail with a message
like "Please set svn.tagStrategy, see `man git-config` for details".
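
To sketch what I mean in Python: the config key is the one from the
message above, but the strategy values, action names and function are
all invented for illustration, and nothing like this exists in git
today.

```python
class TagStrategyError(Exception):
    """Raised when the helper cannot decide and must ask the user to configure it."""

def action_for_tag_commit(strategy, tag_already_created):
    """Decide what to do with an SVN commit that lands inside a tag directory."""
    if not tag_already_created:
        return "create-tag"
    if strategy == "ignore-later-commits":
        return "skip"          # keep the annotated tag as first created
    if strategy == "treat-as-branch":
        return "update-ref"    # give up on annotated tags for this one
    raise TagStrategyError(
        "Please set svn.tagStrategy, see `man git-config` for details"
    )

first = action_for_tag_commit(None, tag_already_created=False)
later = action_for_tag_commit("ignore-later-commits", tag_already_created=True)
```

Failing only when a tag is actually committed to a second time means
most users would never have to set anything.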

	- Andrew

[1]http://thread.gmane.org/gmane.comp.version-control.git/192106/focus=192286

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-14 21:35                     ` Andrew Sayers
@ 2012-04-15  3:13                       ` Stephen Bash
  0 siblings, 0 replies; 46+ messages in thread
From: Stephen Bash @ 2012-04-15  3:13 UTC (permalink / raw)
  To: Andrew Sayers
  Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra,
	David Barr, Sverre Rabbelier, Dmitry Ivankov, Florian Achleitner

----- Original Message -----
> From: "Andrew Sayers" <andrew-git@pileofstuff.org>
> Sent: Saturday, April 14, 2012 5:35:59 PM
> Subject: Re: GSOC Proposal draft: git-remote-svn
>
> ... snip ...
> 
> One solution you could look at would be storing dead branches in a
> JSON file somewhere.  If you go down that route, remember that `git
> gc` will try to garbage collect the commits once the branches have
> been dead for long enough.

I don't remember if this has already been discussed, but as I see it there are basically three approaches to closed/deleted SVN branches in the Git world:

  1) Just delete the branch, allow git gc to later cleanup the objects
  2) Just leave them be for the user to deal with at a later date
  3) Move them to another namespace

I think (3) is the only semi-tricky one.  If you read the git-gc manpage, it turns out gc will consider any object reachable from any ref under refs/ as safe.  When cloning/pushing/pulling/etc. git only looks at refs/heads and refs/tags (unless told otherwise).  So for our conversion I created refs/hidden/heads and refs/hidden/tags (other choices could be refs/svn or refs/junk, but you get the idea).  Just as a fun stat, the hidden namespace in our central repo has 280 refs in it vs 502 in the visible/normal namespace (surprisingly the hidden ones are almost perfectly split with 138 dead Subversion branches and 142 SVN tags that were later retagged/committed to).
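
For what it's worth, the renaming part of (3) is tiny. A rough Python sketch, assuming the refs/hidden namespace described above; the function name is hypothetical, and it only computes the new name rather than touching the repository:

```python
def hide_ref(ref, namespace="refs/hidden"):
    """Map refs/heads/X or refs/tags/X to the same name under the hidden namespace."""
    for kind in ("heads", "tags"):
        prefix = f"refs/{kind}/"
        if ref.startswith(prefix):
            return f"{namespace}/{kind}/{ref[len(prefix):]}"
    raise ValueError(f"not a branch or tag ref: {ref}")

hidden_branch = hide_ref("refs/heads/trunk/project1")
hidden_tag = hide_ref("refs/tags/v1.0")
```

The actual move could then be done with `git update-ref` to create the hidden ref and `git update-ref -d` to drop the visible one. Because the hidden ref still lives under refs/, gc keeps the commits.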

Thanks,
Stephen

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-10 17:17             ` Jonathan Nieder
                                 ` (3 preceding siblings ...)
  2012-04-12 15:28               ` Florian Achleitner
@ 2012-04-18 20:16               ` Florian Achleitner
  2012-04-19 12:26                 ` Florian Achleitner
  4 siblings, 1 reply; 46+ messages in thread
From: Florian Achleitner @ 2012-04-18 20:16 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, Ramkumar Ramachandra, David Barr,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:
> In other words, in the above list the strategy is:
> 
>  1. First convert the remote helper to C so it doesn't have to be
>     translated again later.
Store rev <--> commit mappings using marks and notes.
Store svn metadata.
> 
>  2. Teach the remote helper to import a single project from a
>     repository that houses multiple projects (i.e., path limiting).

I plan to have this done by the mid-term. My summer holidays start at
that point.

> 
>  3. Teach the remote helper to split an imported project that uses
>     the standard layout into branches (an application of the code
>     from (2)).  This complicates the scheme for mapping between
>     Subversion revision numbers and git commit ids.

Read ambiguous branches/tags from SBL.
> 
>  4. Teach the SVN dumpfile to fast-import stream converter not to
>     lose the information that is needed in order to get parenthood
>     information.

This means actually saving svn:copyfrom properties. (right?)

> 
>  5. Use the information from step (4) to get parenthood right for a
>     project split into branches.

.. and using svn:copyfrom properties. (right?)

> 
>  6. Getting the second parent right (i.e., merges).  I mentioned
>     this for fun but I don't expect there to be time for it.

I think this needs a little more discussion; let's do that when the time comes.
mergeinfo stores a list of revs merged for a file. This looks like a list of 
git cherry-picks to me ..
> 
> Does that seem right, or does it need tweaks?  How long would each
> step take?  Can the steps be subdivided into smaller steps?

What do you think?
I will finally add this strategy to the proposal.
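
To make the mergeinfo remark a bit more concrete, here is a small Python sketch of how an svn:mergeinfo value could be expanded into per-source revision lists. It assumes the usual property layout (one `/source/path:revisions` entry per line, where revisions are comma-separated single numbers or inclusive `N-M` ranges, optionally suffixed with `*` for non-inheritable ranges). The function name is hypothetical and this is not code from any existing importer.

```python
# Sketch: expand an svn:mergeinfo property value into explicit revisions.
# parse_mergeinfo is a hypothetical helper, not part of git or Subversion.

def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source_path: [sorted revisions]}."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon, since the path itself may contain one.
        path, _, ranges = line.rpartition(":")
        revs = merged.setdefault(path, set())
        for item in ranges.split(","):
            # A trailing '*' marks a non-inheritable range; ignored here.
            item = item.strip().rstrip("*")
            if not item:
                continue
            first, sep, last = item.partition("-")
            start = int(first)
            end = int(last) if sep else start
            revs.update(range(start, end + 1))  # ranges are inclusive
    return {path: sorted(revs) for path, revs in merged.items()}


if __name__ == "__main__":
    sample = "/trunk:3-5,8\n/branches/foo:10*"
    for path, revs in parse_mergeinfo(sample).items():
        print(path, revs)
```

Each revision in the result is one change merged from that source path, which is why the list resembles a series of cherry-picks more than a single merge parent.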

-- Florian

* Re: GSOC Proposal draft: git-remote-svn
  2012-04-18 20:16               ` Florian Achleitner
@ 2012-04-19 12:26                 ` Florian Achleitner
  0 siblings, 0 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-04-19 12:26 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, Ramkumar Ramachandra, David Barr,
	Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

I have now updated the proposal in the github wiki [1] and on melange.
Most important change: Added a more detailed timeline.

[1]  https://github.com/flyingflo/git/wiki

-- Florian 

Thread overview: 46+ messages
2012-03-19 14:42 GSoC intro Florian Achleitner
2012-03-19 21:31 ` Andrew Sayers
2012-03-20 12:25 ` Florian Achleitner
2012-03-20 13:19 ` David Barr
2012-03-21 21:16   ` Florian Achleitner
2012-03-26 11:06     ` Ramkumar Ramachandra
2012-03-27 13:53       ` Florian Achleitner
2012-04-02  8:30         ` GSOC Proposal draft: git-remote-svn Florian Achleitner
2012-04-02 11:00           ` Ramkumar Ramachandra
2012-04-02 20:57           ` Jonathan Nieder
2012-04-02 23:04             ` Jonathan Nieder
2012-04-03  7:49             ` Florian Achleitner
2012-04-03 18:48               ` Jonathan Nieder
2012-04-05 16:18             ` Tomas Carnecky
2012-04-02 22:17           ` Andrew Sayers
2012-04-02 22:29             ` Jonathan Nieder
2012-04-02 23:20               ` Andrew Sayers
2012-04-03  0:09                 ` Jonathan Nieder
2012-04-03 21:53                   ` Andrew Sayers
2012-04-03 22:21                     ` Jonathan Nieder
2012-04-05 13:36           ` Florian Achleitner
2012-04-05 15:47             ` Dmitry Ivankov
2012-04-09 18:59             ` Stephen Bash
2012-04-10 17:17             ` Jonathan Nieder
2012-04-10 22:30               ` Andrew Sayers
2012-04-10 23:46                 ` Jonathan Nieder
2012-04-11 19:09                 ` Florian Achleitner
2012-04-14 22:57                   ` Andrew Sayers
2012-04-11 15:51               ` Jakub Narebski
2012-04-11 15:56                 ` Jonathan Nieder
2012-04-11 19:20               ` Florian Achleitner
2012-04-11 19:44                 ` Dmitry Ivankov
2012-04-11 19:53                 ` Jonathan Nieder
2012-04-11 22:43                   ` Andrew Sayers
2012-04-12  9:02                   ` Thomas Rast
2012-04-12 15:28               ` Florian Achleitner
2012-04-12 22:30                 ` Andrew Sayers
2012-04-14 20:09                   ` Florian Achleitner
2012-04-14 21:35                     ` Andrew Sayers
2012-04-15  3:13                       ` Stephen Bash
2012-04-13 19:19                 ` Jonathan Nieder
2012-04-14 20:15                   ` Florian Achleitner
2012-04-18 20:16               ` Florian Achleitner
2012-04-19 12:26                 ` Florian Achleitner
2012-03-28  8:09       ` GSoC intro Miles Bader
2012-03-28  9:30         ` Dmitry Ivankov
