* GSoC intro
From: Florian Achleitner @ 2012-03-19 14:42 UTC
To: Git Mailing List
(3 replies; 46+ messages in thread)

Hi fellow git developers!

I'm curious about applying for GSoC 2012 considering the idea "Remote helper
for Subversion".
I've been using git for years and converted my svn repos to git years ago,
but I'm not yet familiar with the pre-work on this topic. Is there a branch in
git's git?

Does a "full-featured bi-directional git-remote-svn" mean that it should work
like any remote git repository where you can push to and fetch from?

Below I briefly introduce myself, for those who are interested.

About me
My name is Florian Achleitner (IRC: FlyingFlo). I'm from Austria and I study
Telematics (a blend of computer science and electrical engineering) at the
Graz University of Technology. I'm currently in the first year of the master's
program. Before starting my studies I worked for four years as a developer of
embedded systems in industry.

My programming experience has grown since I started writing programs on TI
calculators in school probably 15 years ago.
I'm an open-source enthusiast and have used Linux exclusively for years.
I currently work as a teaching assistant for an exercise about programming
operating systems. In this course we also teach the students to use git.

About me and GSoC
In summer 2010 I participated in GSoC for hugin, writing a Makefile-creation
library in C++, which is used to drive the panorama creation [1]. It was a
great experience and a cool, successful summer job! ( and it was merged in
hugin's master branch :-) )

Why git?
- I use git daily. It's always good to work on things you use and a chance to
  contribute something.
- I like C
- I used svn. Nowadays I only use it if I have to ;)
- The community interaction aspect of open source development is very
  interesting.. as the ideas page says ".. and get it merged into upstream Git."

[1] http://hugin.hg.sourceforge.net/hgweb/hugin/hugin/branches branch:
gsoc2010_makefilelib (unfortunately the web frontend doesn't display a specific
branch)

Regards,
Flo

--
Florian Achleitner, BSc
"In a world without walls and fences, who needs windows and gates?" ;-)
* Re: GSoC intro
From: Andrew Sayers @ 2012-03-19 21:31 UTC
To: Florian Achleitner
Cc: Git Mailing List

Hi Florian,

I've become interested in Git<->SVN issues lately, so I'll tell you what
I know. Hopefully more knowledgeable people will correct me if I'm wrong.

The main thrust of SVN development is done in the "svn-fe" project. You
can see the work so far in the "vcs-svn"[1] and "contrib/svn-fe"[2]
subdirectories of the main git repo. My experience as a user has been
that it does a great job of the things it does, but so far it only does
a subset of the things I want. For example, it can't write to SVN and I
think I'm right in saying it can't yet update from SVN after the initial
download. David Barr is the main contact for svn-fe - he's an
experienced mentor and will be able to tell you all about the juicy
low-hanging fruit.

One limitation of svn-fe is that it downloads the whole SVN repository
into a single git branch, without separating out trunk, branches and
tags. I've been working on this problem over the past few months, and
have split it into three parts (a language to describe which directories
are branches etc., export from SVN to that language, and import from
that language to git).

I'd be very flattered if you wanted to work on this, but I couldn't
honestly recommend it over svn-fe. The language itself is a one man job
that doesn't have much creative work left; SVN export is all about
exposing yourself to weird little abuses of version control that don't
teach you much beyond bad habits; and while git import would be a fun
little project, I don't know enough about git's C implementation to
provide any useful mentoring.

Good luck with the summer, and as an svn-fe user I hope you're very
productive :)

	- Andrew

[1] https://github.com/git/git/tree/master/vcs-svn
[2] https://github.com/git/git/tree/master/contrib/svn-fe
* Re: GSoC intro
From: Florian Achleitner @ 2012-03-20 12:25 UTC
To: Git Mailing List

On Monday 19 March 2012 21:31:34 you wrote:
> [...]
> Good luck with the summer, and as an svn-fe user I hope you're very
> productive
>
> 	- Andrew
>
> [1] https://github.com/git/git/tree/master/vcs-svn
> [2] https://github.com/git/git/tree/master/contrib/svn-fe

Thanks for the starting points Andrew!

-- Florian
* Re: GSoC intro
From: David Barr @ 2012-03-20 13:19 UTC
To: Florian Achleitner
Cc: Git Mailing List, Andrew Sayers, Jonathan Nieder, Sverre Rabbelier,
    Ramkumar Ramachandra, Dmitry Ivankov

Hi Florian,

> I'm curious about applying for GSoC 2012 considering the idea "Remote helper
> for Subversion".
> I've been using git for years and converted my svn repos to git years ago,
> but I'm not yet familiar with the pre-work on this topic. Is there a branch in
> git's git?

Much of the progress so far has been merged into master.
Still outstanding are some of Dmitry's patches:
  remote-svn-alpha_v2 [1]
  svn-fe-options_v7 [2]

> Does a "full-featured bi-directional git-remote-svn" mean that it should work
> like any remote git repository where you can push to and fetch from?

Yes, that's the plan. To be fair, it is a stretch goal. Two GSoC
students have brought us as far as a read-only remote helper. So I
think there's at least two summers' worth of work remaining.

> Below I briefly introduce myself, for those who are interested.
>
> About me
> My name is Florian Achleitner (IRC: FlyingFlo). I'm
> from Austria and I study Telematics (a blend of computer science and
> electrical engineering) at the Graz University of Technology. I'm currently in
> the first year of the master's program. Before starting my studies I worked for
> four years as a developer of embedded systems in industry.
>
> My programming experience has grown since I started writing programs on TI
> calculators in school probably 15 years ago.
> I'm an open-source enthusiast and have used Linux exclusively for years.
> I currently work as a teaching assistant for an exercise about programming
> operating systems. In this course we also teach the students to use git.

Thanks for the introduction. When I first got involved with this
sub-project, I gave a quick self introduction [3].

As a potential mentor, it would be prudent to let you know what my
commitments are. My day job is primarily to contribute to chromium.org
and webkit.org. I also have a 20% commitment to git-core and related
projects.

> About me and GSoC
> In summer 2010 I participated in GSoC for hugin, writing a Makefile-creation
> library in C++, which is used to drive the panorama creation [1]. It was a
> great experience and a cool, successful summer job! ( and it was merged in
> hugin's master branch :-) )
> [1] http://hugin.hg.sourceforge.net/hgweb/hugin/hugin/branches branch:
> gsoc2010_makefilelib (unfortunately the web frontend doesn't display a specific
> branch)

A track record is a plus.

> Why git?
> - I use git daily. It's always good to work on things you use and a chance to
> contribute something.

I'm sure this is the reason most git contributors are here.

> - I like C
> - I used svn. Nowadays I only use it if I have to ;)

You're in good company.

> - The community interaction aspect of open source development is very
> interesting.. as the ideas page says ".. and get it merged into upstream Git."

The git contributors are mostly a pleasure to work with. The volume
and quality of feedback to contribution, especially from newcomers,
sets it apart from the other communities I participate in.

Some extra reading:

To catch up on the current state of the art with respect to
translating Subversion history read:
  Another bite of the reposturgeon, Eric S. Raymond [4].
Unfortunately, he hasn't published the code quite yet.
However, he did what we have been lax to do and contacted the
Subversion developers to assist updating protocol documentation [5].
I think the corner cases for the Subversion delta format are still
undocumented [6].

[1] https://github.com/divanorama/git/tree/remote-svn-alpha_v2
[2] https://github.com/divanorama/git/tree/svn-fe-options_v7
[3] http://thread.gmane.org/gmane.comp.version-control.git/143187/focus=143201
[4] http://esr.ibiblio.org/?p=4071
[5] http://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
[6] http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff

> Regards,
> Flo

--
David Barr
* Re: GSoC intro
From: Florian Achleitner @ 2012-03-21 21:16 UTC
To: David Barr
Cc: Git Mailing List, Andrew Sayers, Jonathan Nieder, Sverre Rabbelier,
    Ramkumar Ramachandra, Dmitry Ivankov

Hi!

After the exam today, I started to dig into the topic a little. So I
accumulated some questions ..

On Wednesday 21 March 2012 00:19:41 David Barr wrote:
> Much of the progress so far has been merged into master.
> Still outstanding are some of Dmitry's patches:
>   remote-svn-alpha_v2 [1]
>   svn-fe-options_v7 [2]

I tried to find svn-related parts in git's sources. I found:
- the huge ./git-svn.perl, which seems to be the git-svn implementation.
- ./contrib/svn-fe/ and ./vcs-svn/, those you pointed me at.
Did I miss something?
Is there any separate source documentation? The source files I looked at
contain only very few comments. And nothing about the big picture.
I built make doc, but it seems it's mostly user documentation.

> Yes, that's the plan. To be fair, it is a stretch goal. Two GSoC
> students have brought us as far as a read-only remote helper. So I
> think there's at least two summers' worth of work remaining.

What is the remote helper? How can I use/try it?

> [1] https://github.com/divanorama/git/tree/remote-svn-alpha_v2

Is it in here? Should my project continue on this work?
Until now, I've never used any remote that was not git.

> > About me and GSoC
> > In summer 2010 I participated in GSoC for hugin, writing a
> > Makefile-creation library in C++, which is used to drive the panorama
> > creation [1]. It was a great experience and a cool, successful summer
> > job! ( and it was merged in hugin's master branch :-) )
> >
> > [1] http://hugin.hg.sourceforge.net/hgweb/hugin/hugin/branches branch:
> > gsoc2010_makefilelib (unfortunately the web frontend doesn't display a
> > specific branch)
>
> A track record is a plus.

If you like, I could provide more references, e.g. a university course project
in C using git.

> Some extra reading:
> [...]

Haven't yet read it.

Hm, and there are still some general questions:
What about git-svn? What's wrong with it? (I haven't used it) I saw the huge
perl script, this looks a little extreme ;). But it provides bi-directional
access?!

svn-fe reads a dump of the svn repo. How can this approach ever be
bidirectional? Probably I've to do the extra reading first ..

-- Florian
* Re: GSoC intro
From: Ramkumar Ramachandra @ 2012-03-26 11:06 UTC
To: Florian Achleitner
Cc: David Barr, Git Mailing List, Andrew Sayers, Jonathan Nieder,
    Sverre Rabbelier, Dmitry Ivankov

Hi Florian,

Florian Achleitner wrote:
> On Wednesday 21 March 2012 00:19:41 David Barr wrote:
>> Much of the progress so far has been merged into master.
>> Still outstanding are some of Dmitry's patches:
>>   remote-svn-alpha_v2 [1]
>>   svn-fe-options_v7 [2]
>
> I tried to find svn-related parts in git's sources. I found:
> - the huge ./git-svn.perl, which seems to be the git-svn implementation.
> - ./contrib/svn-fe/ and ./vcs-svn/,
> those you pointed me at.
> Did I miss something?
> Is there any separate source documentation? The source files I looked at
> contain only very few comments. And nothing about the big picture.

A lot of big-picture discussions can be found in mailing list
archives. Let us know what you're looking for exactly.

>> Yes, that's the plan. To be fair, it is a stretch goal. Two GSoC
>> students have brought us as far as a read-only remote helper. So I
>> think there's at least two summers' worth of work remaining.
>
> What is the remote helper? How can I use/try it?

The remote helper is an external program that git invokes to handle
specific protocols. See ./git_remote_helpers for example.

>> [1] https://github.com/divanorama/git/tree/remote-svn-alpha_v2
> Is it in here? Should my project continue on this work?
> Until now, I've never used any remote that was not git.

You might also decide to build a brand new remote helper.

>> A track record is a plus.
>
> If you like, I could provide more references, e.g. a university course project
> in C using git.
>
>> Some extra reading:
>> [...]
> Haven't yet read it.
>
> Hm, and there are still some general questions:
> What about git-svn? What's wrong with it? (I haven't used it) I saw the huge
> perl script, this looks a little extreme ;). But it provides bi-directional
> access?!

The main problem with git-svn.perl is that it's hard to maintain or
extend. See also: David Barr's LCA talk [1].

> svn-fe reads a dump of the svn repo. How can this approach ever be
> bidirectional? Probably I've to do the extra reading first ..

It can't. You'll have to write something to handle the Git -> SVN
conversion. See also: one of my earlier attempts in this regard [2].

[1]: http://www.youtube.com/watch?v=0hVuv-wv4Dw
[2]: https://github.com/artagnon/git/tree/svn-fi

    Ram
* Re: GSoC intro
From: Florian Achleitner @ 2012-03-27 13:53 UTC
To: Ramkumar Ramachandra
Cc: David Barr, Git Mailing List, Andrew Sayers, Jonathan Nieder,
    Sverre Rabbelier, Dmitry Ivankov

Hi Ramkumar, and everybody!

Thanks for your tips!
I'm currently a little quiet on that topic because I'm working hard on a
university project that we need to kick off this week, because Easter holidays
start in a few days. I'll then switch back to git immediately, it's proposal
time!

On Monday 26 March 2012 16:36:46 Ramkumar Ramachandra wrote:
>
> A lot of big-picture discussions can be found in mailing list
> archives. Let us know what you're looking for exactly.
>
> The main problem with git-svn.perl is that it's hard to maintain or
> extend. See also: David Barr's LCA talk [1].

The talk from David Barr is exactly what I was searching for (about the big
picture and so ..). It gives a good introduction to what you guys already know
very well, of course. Like what exists, which are the well-known problems..
Thx for the link!

>     Ram

Florian
* GSOC Proposal draft: git-remote-svn
From: Florian Achleitner @ 2012-04-02 8:30 UTC
To: Ramkumar Ramachandra
Cc: David Barr, Git Mailing List, Andrew Sayers, Jonathan Nieder,
    Sverre Rabbelier, Dmitry Ivankov

Hi everybody!

Here is my draft of the proposal for the GSoC project. RFC!
Please comment and tell me what you think and if I understood it all right!

I spent a lot of lines writing about the current situation. This is mostly
because, as a newbie, I spent a lot of time examining what we already have and
finally wrote it down.

The draft is inlined below. I hope it's not too long to read. I will put it on
a github wiki later, once I figure out how this works ;)

Florian

==Remote helper for Subversion==

==Introduction==
{ for non-insiders }
Git [1] is a powerful distributed version control system (DVCS). "Distributed"
means that everybody works on a full featured repository. To collaborate with
other users' repositories, git can fetch and pull from remote repositories
using several transports (http://, ssh://, git://, ...). Git has a very
powerful and useful concept of branches. They are lightweight pointers to
commits (heads).

Subversion (svn) [2] was created as a successor of CVS; both follow a strict
client-server design, where the repository exclusively lives on the central
server and every client only checks out a copy of a single revision at a time.
SVN doesn't truly have a concept of branches. SVN branches are a copy of a
directory (so are tags).

==What we want (the general goal)==
short:
  git clone svn://<url>
  git push
  git fetch

A full-featured bi-directional remote helper for svn that allows git to use a
svn repository as a remote, mostly like a remote git repo.
Remote helpers are separate programs invoked by git to communicate with
foreign repositories. They are used by transceiving a command and data stream
via stdin and stdout.
The remote helper interface [3] supports commands that deliver a
git-fast-import stream from the remote repo. git-fast-import [4] is a format
to serialize a git repository into a text format. It is used by the tools
git-fast-import and git-fast-export.
The remote helper has to convert the foreign protocol and data (svn) to the
git-fast-import format.

==What are the challenges?==
To summarize: The way git tracks the state of the working tree and svn's way
are different in several aspects. This makes a direct mapping impossible.
There are lots of discussions about these issues on the git mailing list [5].

Some aspects: (I'm sure this is incomplete)
- svn commit and file metadata, its symlink and permission representation have
  to be mapped to git.
- svn history can only be extracted from the server (we have svnrdump for that)
- svn commits are only possible after updating the working copy first, i.e.
  fetching and merging new revisions on the server. This is like implicitly
  rebasing your local work on the remote head before pushing to an svn
  repository. In git there is of course no such restriction.
- and the most challenging: mapping subversion branches to git branches. In
  svn a branch is created by copying a directory with 'svn copy'. svn doesn't
  have a concept of branches by itself.

Branches exist due to the convention of having branches/, trunk/, and tags/
directories in a repository, so do tags. But this is not mandatory and
therefore there are many different layouts. It follows that in svn it is also
possible to commit across branches. This means that a single commit can change
files on more than one branch (accidentally or deliberately).
To convert svn branches to git we have to detect branch semantics by examining
the svn tree's structure and its metadata (it has a 'copyfrom' property).
Previous efforts show that this will not be possible fully automatically
without configuration and interaction with the user.
This brings us to:

==What we have: (existing work)==
Andrew Sayers is currently developing a language to describe svn to git branch
mapping [6]. I plan to use the language as a configuration for the remote
helper that specifies unclear aspects.

"esr" developed a tool to manipulate and export subversion repositories [7]
that should be able to detect branches, but its sources are not available
yet.

In git's tree there is git-svn, a huge Perl script used to convert svn to git.
It detects branches, but with problems. It also supports some kind of pushing
commits to svn using a separate command. Its problem: it's unmaintainable,
bugs are hard to locate and to fix.

There are several other one-way conversion tools, e.g. svn-fast-export,
svn2git.py.

In git's source tree we have vcs-svn/, a set of functions to convert svn
dumps to git-fast-import streams. Those are used by svn-fe to one-way import
svn history to git. svn-fe doesn't do branch mapping yet.

We have Ramkumar Ramachandra's svnrdump [8] which now lives in the svn source
tree. It can create dump files [9] from remote svn servers and load dump files
up to an svn server. It practically provides read-write access to svn using a
text format.

There is a prototype remote helper from Dmitry Ivankov [10]. A bash script
providing one way fetching from svn via svnrdump and svn-fe.

{ did I miss something important? }

==Project outline==
Please look at the drawing on:
http://filestore.mg34.vc-graz.ac.at/flo/drawing.svg

1. Write a new bi-directional remote helper in C.
- It uses vcs-svn utilities to convert svn dumps to git-fast-import and
  vice-versa.
- It calls svnrdump as a backend to communicate with svn.
- It reads a configuration file containing branch mappings according to [6].
  These mappings have to be pre-generated using tools developed along with the
  language. The remote helper has no way of asking the user what to do. It
  will fail if a mapping is unclear.
- Because generating the branch mapping configuration already requires that
  you have a dump of the svn repo, the helper should probably be able to read
  from a file in place of svnrdump too.
- Using the config the helper translates svn branches/tags to git
  branches/tags and converts other metadata as applicable. It probably has to
  store some information about the mapping in a file in .git to allow a
  reconstruction on subsequent invocations. I think this is especially
  important when pushing to branches (does it already exist in svn, and where?
  is it new).
- It communicates with git via the fast-import format. The remote helper
  interface (will have)|has commands for that.

2. Extend the remote helper interface as necessary to read and write
fast-import streams to remote helpers.

3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only
convert svn to git. To push to svn we also need conversion and mapping from
git to svn. The actual mapping code for branches should also be placed here
{??} and called by the remote helper.

{ Hmm.. so it looks like that's a lot? what do you think? }

Timeline
{ Still to come !}

About me
{ I sent an introduction to the list already, so I'll not copy it here. But it
will be in the application on GSOC site.}

[1] http://git-scm.com/
[2] http://subversion.tigris.org/
[3] git sources git/Documentation/git-remote-helpers.html
[4] git sources git/Documentation/git-fast-import.html
[5] http://thread.gmane.org/gmane.comp.version-control.git/192106
[6] https://github.com/andrew-sayers/SVN-Branching-Language
[7] http://esr.ibiblio.org/?p=4071
[8] http://svnbook.red-bean.com/en/1.7/svn.ref.svnrdump.html
[9] svn sources subversion/notes/dump-load-format.txt
[10] https://github.com/divanorama/git/blob/remote-svn-alpha/contrib/svn-fe/git-remote-svn-alpha
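To make the "command and data stream via stdin and stdout" idea from the proposal concrete, here is a minimal Python sketch of the conversation between git and an import-capable helper. It is only an illustration under stated assumptions: the function name `respond`, the advertised ref list, and the placeholder comment line are hypothetical, no svn access happens, and the command set is reduced to `capabilities`, `list` and `import` as I understand them from git's remote-helper documentation.

```python
def respond(commands, refs=("refs/heads/master",)):
    """Yield the helper's replies to remote-helper commands.

    `commands` is any iterable of text lines (for example sys.stdin).
    `refs` is a hypothetical, fixed list of refs the helper offers.
    """
    pending = []  # refs named by a batch of consecutive "import" lines
    for raw in commands:
        line = raw.rstrip("\n")
        if line == "capabilities":
            # One capability per line; a blank line ends the list.
            yield "import\n\n"
        elif line == "list":
            # "?" says the helper cannot tell the ref's value up front.
            yield "".join(f"? {ref}\n" for ref in refs) + "\n"
        elif line.startswith("import "):
            pending.append(line[len("import "):])
        elif line == "":
            if not pending:
                return  # blank line outside a batch: treated as end of input
            for ref in pending:
                # Placeholder only: a real helper would emit the converted
                # svn history as fast-import commands at this point.
                yield f"# fast-import stream for {ref} would go here\n"
            yield "done\n"  # ends the fast-import stream for this batch
            pending.clear()
        else:
            raise ValueError(f"unsupported command: {line!r}")
```

A real helper would loop over `respond(sys.stdin)`, write each chunk to stdout and flush after every reply, since git waits for each answer before sending the next command.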
* Re: GSOC Proposal draft: git-remote-svn
From: Ramkumar Ramachandra @ 2012-04-02 11:00 UTC
To: Florian Achleitner
Cc: David Barr, Git Mailing List, Andrew Sayers, Jonathan Nieder,
    Sverre Rabbelier, Dmitry Ivankov

Hi Florian,

Florian Achleitner wrote:
> - svn commits are only possible after updating the working copy first, i.e.
> fetching and merging new revisions on the server. This is like implicitly
> rebasing your local work on the remote head before pushing to an svn
> repository.

This shouldn't worry you, because we don't have a Git -> SVN converter
yet. However, I have written a prototype svn-fi. Unfortunately, due
to the way marks work in fast-import, svn-fi is far from complete.
See: https://github.com/artagnon/git/tree/svn-fi

> Branches exist due to the convention of having branches/, trunk/, and tags/
> directories in a repository, so do tags. But this is not mandatory and
> therefore there are many different layouts. It follows that in svn it is also
> possible to commit across branches. This means that a single commit can change
> files on more than one branch (accidentally or deliberately).
> To convert svn branches to git we have to detect branch semantics by examining
> the svn tree's structure and its metadata (it has a 'copyfrom' property).
> Previous efforts show that this will not be possible fully automatically
> without configuration and interaction with the user.

See also: http://article.gmane.org/gmane.comp.version-control.git/150007

> "esr" developed a tool to manipulate and export subversion repositories [7]
> that should be able to detect branches, but its sources are not available
> yet.

Sources are available at git://gitorious.org/reposurgeon/reposurgeon.git
Do let us know how SBL compares to reposurgeon. Personally, I like
the idea of a standard "language" to express the mapping.

> In git's source tree we have vcs-svn/, a set of functions to convert svn
> dumps to git-fast-import streams. Those are used by svn-fe to one-way import
> svn history to git. svn-fe doesn't do branch mapping yet.

Are you planning to extend svn-fe to do the mapping, write it as a
separate program, or write it into the remote helper? I personally
don't mind if the mapping is done in Perl (like in git-svn or SBL) as
opposed to C; mapping is just parse-intensive.

> 1. Write a new bi-directional remote helper in C.
> [...]
> - It reads a configuration file containing branch mappings according to [6].
> These mappings have to be pre-generated using tools developed along with the
> language. The remote helper has no way of asking the user what to do. It will
> fail if a mapping is unclear.

Right.

> - Because generating the branch mapping configuration already requires that
> you have a dump of the svn repo, the helper should probably be able to read
> from a file in place of svnrdump too.

You can clone the SVN dumpstream from svnrdump using tee (or similar),
sending one copy to svn-fe and another to the SBL configuration
generator.

> - Using the config the helper translates svn branches/tags to git
> branches/tags and converts other metadata as applicable. It probably has to
> store some information about the mapping in a file in .git to allow a
> reconstruction on subsequent invocations. I think this is especially important
> when pushing to branches (does it already exist in svn, and where? is it new).

How will the actual mapping be done? Using filter-branch's
subdirectory filter, or something else?

> 3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only
> convert svn to git. To push to svn we also need conversion and mapping from
> git to svn. The actual mapping code for branches should also be placed here
> {??} and called by the remote helper.

I think this bit sounds overly ambitious. I think if you can build a
seamless one-way SVN -> Git bridge in one summer, it'll be quite an
achievement in itself. Finishing and getting svn-fi merged should be
last priority; I'll try to work on it myself in summer.

    Ram
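The branch-mapping question discussed above can be sketched for the simplest case, the conventional trunk/, branches/, tags/ layout. The Python below is an illustration only, not how svn-fe, SBL or git-svn do it: the function names are hypothetical, the choice of `refs/heads/master` for trunk is an assumption, and copyfrom metadata is ignored entirely. It splits each changed path into a git ref and a path inside that branch, and groups one revision's changes by ref so that a cross-branch commit shows up as more than one group.

```python
def split_branch(path):
    """Map an svn path to (git ref, path inside the branch).

    Assumes the standard trunk/branches/tags layout; returns None for
    paths outside it, which a real tool would need configuration for.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "refs/heads/master", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        prefix = "refs/heads/" if parts[0] == "branches" else "refs/tags/"
        return prefix + parts[1], "/".join(parts[2:])
    return None


def group_by_branch(changed_paths):
    """Group one revision's changed paths by the git ref they belong to.

    More than one key in the result means the svn revision touched
    several branches at once (a cross-branch commit).
    """
    groups = {}
    for path in changed_paths:
        mapped = split_branch(path)
        if mapped is None:
            # Mirrors the proposal: fail rather than guess a mapping.
            raise ValueError(f"no branch mapping for {path!r}")
        ref, inner = mapped
        groups.setdefault(ref, []).append(inner)
    return groups
```

For example, `group_by_branch(["trunk/a.c", "branches/1.7.x/a.c"])` yields two refs, so such a revision would have to become two git commits or be rejected; which of the two is right is exactly the kind of decision the thread leaves to configuration.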
* Re: GSOC Proposal draft: git-remote-svn 2012-04-02 8:30 ` GSOC Proposal draft: git-remote-svn Florian Achleitner 2012-04-02 11:00 ` Ramkumar Ramachandra @ 2012-04-02 20:57 ` Jonathan Nieder 2012-04-02 23:04 ` Jonathan Nieder ` (2 more replies) 2012-04-02 22:17 ` Andrew Sayers 2012-04-05 13:36 ` Florian Achleitner 3 siblings, 3 replies; 46+ messages in thread From: Jonathan Nieder @ 2012-04-02 20:57 UTC (permalink / raw) To: Florian Achleitner Cc: Ramkumar Ramachandra, David Barr, Git Mailing List, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov Hi Florian, Florian Achleitner wrote: > Here is my draft of the proposal for the GSoC project. RFC! > Please comment and tell me what you think and if I understood it all right! I like the rough idea. I also agree with Ram that the scope seems too wide for one summer and think it would be useful to narrow the scope a little. Some tasks I can think of: - getting Dmitry's importer into contrib/ and making sure it works reliably. This might require some fixes to svnrdump, svn-fe, and the transport-helper. Some known problems that I suspect may be still unresolved: - files marked with both svn:special (symlink) and svn:executable - dealing with after-the-fact edits to the svn repository. For example, revprops including svn:log can be and often are changed after the fact. - what happens when the connection to the Subversion server is interrupted? The Subversion dump format does not have an "end of commit" marker so currently we can get confused and seem to succeed. - svn-fe does not correctly handle revs that change a text file to a symlink or vice versa without changing its text. - UI for importing only some revisions (e.g., "all revisions after r1000"). Dmitry has a patch for the svn-fe plumbing to handle this but I don't think the corresponding change for the remote helper has been written. - this would probably also require changes to svnrdump. What happens when r2000 involves copying a file from a version before r1000? 
If imports do not start at r0, normal dumps of r1000: are not self-contained. - UI for storing the mapping between Subversion revision numbers and git commit names in the git object db somewhere. Currently we store it in a marks file. There is a script floating around to convert that marks file into a set of commit notes and Dmitry also has a patch for svn-fe to make it write commit notes directly. What happens when the notes and marks file go out of sync --- which is authoritative? This also implies that repeated fetches would not have to start importing again at r1. - Storing empty directories and path-specific properties like svn:ignore that we don't currently handle. - Splitting history into branches. Somehow svn-fe has to communicate "svn cp" source and target information to the branch mapper so we can trace history to before the birth of the paths we are following. That is, the full history of branches/1.7.x/ includes the early history of trunk/ if the 1.7.x branch was originally created as a copy of the trunk. This might be able to use mechanism similar to storage of empty directories and path properties. - UI for importing only a subset of paths (e.g., "just the trunk"). - this would probably also require changes to svnrdump. What happens when r2000 involves copying a file from a branch we have chosen not to import? - Mapping authorship information from Subversion (which usually amounts to a remote username) to something more idiomatic in git (usually a human's name and email address) in a way that makes round trips possible. - Sharing an imported repository with other users of the remote helper. - this might involve changes to the remote helper machinery to allow new clones to use some fetch/push ref specification different from refs/heads/*:refs/remotes/origin/*, or it might involve some change to core git to automatically push notes corresponding to some refs in some situations. - Importing <rev, path> pairs that have multiple parents. 
In the subversion model, path nodes have only one (copyfrom) parent, but repositories can use the svn:mergeinfo property to indicate that changes made in certain revs to another patch have been incorporated. Under what circumstances is that enough justification to add a second parent on the git side? - Because svn:mergeinfo is a normal path property, the branch mapper could have enough information to take care of this with the help of the previously mentioned facility for storing path properties. All of the above is just for reasonable fetch support. For push support, one early problem to solve would be that pushing a commit so that the git commit id from re-importing it is the same requires permission to set the svn:date property. Is our target audience one that already has that permission? Is that permission something reasonable for a committer to ask for from the repository admin in order to use the remote helper? Because of the above: > 1. Write a new bi-directional remote helper in C. The word "new" makes me worried that you'd be throwing away whatever work already exists. :) [...] > { Hmm.. so it looks like thats a lot? what do you think? } I agree --- what you've described is more than one summer's worth of work. Are there any aspects you're particularly interested in focusing on? For example, (1) If we focus on repositories without any branching structure at all and where the user has full ability to write whatever she pleases to the repository, I think developing a bidirectional remote helper is feasible during the summer. Round-trip support (i.e., commit ids staying the same with a push followed by a fetch) is feasible with such a quick plan if we're willing to store some git-specific junk in the repo. (2) Regarding a tool that sits between svn-fe and the remote helper and implements the "follow parent" rule for tracing the full history of a single (linear) branch: I think developing that _and_ getting it merged could fit in the summer. 
(3) Regarding storing and sharing Subversion's path-specific and revision-specific properties: I think implementing a mechanism for that and getting it merged could fit in one summer. (4) Regarding getting git weirdness like distinct author and committer names, lack of rename information cooked at commit time, and timezones in author and committer dates handled during pushes to Subversion in a non-invasive way that is user-friendly for the pusher and likely to be acceptable on the receiving side for normal projects: that could certainly fill a summer. (5) Subversion weirdness like revs that change the entire repository at once in a many-branch repo, non-standard file modes, and noticing and acting appropriately for svn:log messages that were changed after the fact could fill another summer. So ideally I would like 5 students working on the remote helper project. ;-) Hope that helps, Jonathan ^ permalink raw reply [flat|nested] 46+ messages in thread
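One of the tasks listed above is mapping Subversion usernames to git-style identities in a way that survives a round trip. A minimal sketch of that idea follows, assuming a user-supplied table plus a `user <user@uuid>` fallback; the table contents, the function names and the fallback convention are all illustrative assumptions, not something the thread specifies.

```python
import re

# Hypothetical mapping table; a real helper would read this from configuration.
AUTHORS = {"alice": ("Alice Example", "alice@example.org")}


def svn_to_git(username, repo_uuid, authors=AUTHORS):
    """Return a git ident string 'Name <email>' for a Subversion username."""
    if username in authors:
        name, email = authors[username]
        return f"{name} <{email}>"
    # Fallback (an assumption): encode the username so it can be recovered later.
    return f"{username} <{username}@{repo_uuid}>"


def git_to_svn(ident, repo_uuid, authors=AUTHORS):
    """Invert svn_to_git; return None if the ident cannot be mapped back."""
    for username, (name, email) in authors.items():
        if ident == f"{name} <{email}>":
            return username
    match = re.fullmatch(r"(\S+) <(\S+)@(\S+)>", ident)
    if match and match.group(1) == match.group(2) and match.group(3) == repo_uuid:
        return match.group(1)
    return None
```

The point of the fallback is only that every imported author has exactly one inverse; idents created on the git side that match neither rule come back as `None` and would need a policy of their own.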
* Re: GSOC Proposal draft: git-remote-svn 2012-04-02 20:57 ` Jonathan Nieder @ 2012-04-02 23:04 ` Jonathan Nieder 2012-04-03 7:49 ` Florian Achleitner 2012-04-05 16:18 ` Tomas Carnecky 2 siblings, 0 replies; 46+ messages in thread From: Jonathan Nieder @ 2012-04-02 23:04 UTC (permalink / raw) To: Florian Achleitner Cc: Ramkumar Ramachandra, David Barr, Git Mailing List, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov

Jonathan Nieder wrote:

> - Importing <rev, path> pairs that have multiple parents. In the
>   subversion model, path nodes have only one (copyfrom) parent,
>   but repositories can use the svn:mergeinfo property to indicate
>   that changes made in certain revs to another patch have been

The above should say "... changes made in certain revs to another
_path_ ...".

>   incorporated. Under what circumstances is that enough
>   justification to add a second parent on the git side?

One subtlety here is that sometimes people merge almost everything
from some branch but leave a few revisions out. Imagine the following
history:

 o --- B1 --- B2 --- B3 ---- B4 -- F' ---- B6 --- B7 --- B8 [branch]
  \                             \                          \
   \                               \                        \
    T1 --- F ------------------------ M1 ------------------ M2 [trunk]

The bugfix F was applied to the trunk first and then applied to the
branch as rev F'. Then the maintainer merged the remaining changes B1,
B2, B3, B4 from the branch to trunk. In git this operation would be
carried out by running "git merge branch". Finally some more changes
were made on the branch and the maintainer merged those to trunk, too.

In subversion, this could be done like so:

 1. Make commit T1 on trunk.
 2. Make commit F on trunk.
 3. Make commits B1, B2, B3, B4 on branch.
 4. Make commit F' on branch, either using "svn merge" or by hand.
 5. Merge changes B1, B2, B3, B4 from branch to trunk using
    "svn merge -r o:B4 <url for branch>" and commit.
 6. Make commits B6, B7, B8 on branch.
 7. Merge changes B6, B7, B8 from branch to trunk using
    "svn merge -r F':B8 <url for branch>" and commit.

The resulting svn:mergeinfo property on trunk in revision M1 would look
like this:

	/branches/branch:B1-B4

To a naive importer, this looks like a merge of B4. The svn:mergeinfo
property on trunk in revision M2 would look like this:

	/branches/branch:B1-B4,B6-B8

which looks like a bunch of cherry-picks rather than a merge, since it
looks like this almost-merge leaves out F'.

If the maintainer used "svn merge --reintegrate" instead, the
svn:mergeinfo properties are a little simpler, so maybe I am worrying
for no good reason. Anyway, hopefully that makes the setup a little
clearer.

^ permalink raw reply [flat|nested] 46+ messages in thread
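The "naive importer" reading of svn:mergeinfo described above can be sketched as follows. This is only an illustration under stated assumptions: revisions are plain integers (the B1, F', M1 labels of the example are mapped to numbers in the usage note), `branch_revs` is a hypothetical list of revisions that touched the source branch, and the merge-versus-cherry-pick rule is one possible heuristic rather than an agreed design.

```python
def parse_mergeinfo(text):
    """Parse svn:mergeinfo text into {source_path: [(first_rev, last_rev), ...]}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, revlist = line.rpartition(":")
        ranges = []
        for item in revlist.split(","):
            item = item.rstrip("*")  # a trailing '*' marks a non-inheritable range
            first, _, last = item.partition("-")
            ranges.append((int(first), int(last or first)))
        result[path] = ranges
    return result


def classify(ranges, branch_revs):
    """Return 'merge' if no branch revision up to the newest merged one is skipped."""
    merged = {rev for first, last in ranges for rev in range(first, last + 1)}
    newest = max(merged)
    skipped = [rev for rev in branch_revs if rev <= newest and rev not in merged]
    return "cherry-picks" if skipped else "merge"
```

With B1..B4 numbered r4..r7, F' as r8 and B6..B8 as r10..r12, the mergeinfo at M1 (`/branches/branch:4-7`) classifies as a merge, while the one at M2 (`/branches/branch:4-7,10-12`) classifies as cherry-picks because r8 is skipped, which is exactly the ambiguity the message points out.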
* Re: GSOC Proposal draft: git-remote-svn 2012-04-02 20:57 ` Jonathan Nieder 2012-04-02 23:04 ` Jonathan Nieder @ 2012-04-03 7:49 ` Florian Achleitner 2012-04-03 18:48 ` Jonathan Nieder 2012-04-05 16:18 ` Tomas Carnecky 2 siblings, 1 reply; 46+ messages in thread From: Florian Achleitner @ 2012-04-03 7:49 UTC (permalink / raw) To: Jonathan Nieder Cc: Ramkumar Ramachandra, David Barr, Git Mailing List, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov Hi! I'm curiously watching the discussion I kicked off with my proposal. Before refining the proposal I think I will let the discussion continue for the moment. But just to clarify some things: You know I'm rather new to this topic. I've used svn and git, I know what git plumbing is about, but I haven't used plumbing commands to write something into git yet. So I can't tell from experience if it would be good or not, compared to fast-import. So please explain what's the advantage/disadvantage of which design decision. That makes it easier to get the point. I'm also not yet familiar with svn's internals and what properties they use for what. So there are several questions I simply don't have an answer for. I know that you have discussed several issues in a huge lot of mails on this list. I'm watching and learning currently. Jonathan wrote about a script "floating around". What's that? Is it somewhere in a tree in some repo, is it a patch somewhere in a mail on the list, is it in git.git in some branch? How does one catch floating scripts? And two clarifications about what I meant in the proposal: On Monday 02 April 2012 16:30:14 Ramkumar Ramachandra wrote: > Are you planning to extend svn-fe to do the mapping, write it as a > separate program, or write it into the remote helper? I personally > don't mind if the mapping is done in Perl (like in git-svn or SBL) as > opposed to C; mapping is just parse-intensive. I personally don't like Perl. :p (I would use Python if I needed a scripting language). 
As far as I've seen, svn-fe is a 5-liner calling functions in vcs-svn/. So I thought there is no point in piping something through svn-fe in the remote-helper. I thought I would use those functions like svn-fe does. I thought about vcs-svn/ being a library for svn interaction that the remote-helper, and svn-fe, and svn-fi (?) are using. On Monday 02 April 2012 15:57:00 Jonathan Nieder wrote: > Florian Achleitner wrote: > Because of the above: > > 1. Write a new bi-directional remote helper in C. > > The word "new" makes me worried that you'd be throwing away whatever > work already exists. :) Probably I missed something. But all I've seen that is directly a remote-helper is a bash script which basically calls a pipeline from svnrdump | svn-fe | fast-import [1]. I'm not planning to write a longer program in bash. (I personally use bash only for things that fit on one terminal height). Bash and Perl are not my favourites ;) [1] https://github.com/divanorama/git/blob/remote-svn-alpha_v2/contrib/svn-fe/git-remote-svn-alpha Cheers, Florian ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-03 7:49 ` Florian Achleitner @ 2012-04-03 18:48 ` Jonathan Nieder 0 siblings, 0 replies; 46+ messages in thread From: Jonathan Nieder @ 2012-04-03 18:48 UTC (permalink / raw) To: Florian Achleitner Cc: Ramkumar Ramachandra, David Barr, Git Mailing List, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov Hi, Florian Achleitner wrote: > You know I'm rather new to this topic. I've used svn and git, I know what git > plumbing is about, but I haven't used plumbing commands to write something > into git yet. So I can't tell from experience if it would be good or not, > compared to fast-import. Yes, no problem. I think the question of using fast-import or other commands is not a fundamental one. > So please explain what's the advantage/disadvantage of which design decision. > That makes it easier to get the point. The main advantages of using fast-import are: - it's faster (assuming it works correctly) :) - there are backends for version control systems other than git - remote helpers can declare the export/import capabilities to support other version control systems, instead of declaring fetch/push and supporting only git However, whatever tools you use, the immediate idea is to transfer data between a Subversion repository and a Git repository, and the problems to be solved are the same. [...] > I'm also not yet familiar with svn's internals and what properties they use > for what. > So there are several questions I simply don't have an answer for. > I know that you have discussed several issues in a huge lot of mails on this > list. I'm watching and learning currently. The svnbook at http://svnbook.red-bean.com/, the Subversion lists at <http://subversion.apache.org/mailing-lists.html>, and the #svn-dev IRC channel on freenode <http://colabti.org/irclogger/irclogger_logs/svn-dev> are the best resources I know for questions in that vein. 
I also learned a lot from looking at the dump format that "svnadmin dump" spits out, since it matches Subversion concepts pretty well. It is documented at https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt Some basic design questions are covered in the thread starting at http://thread.gmane.org/gmane.comp.version-control.git/159054 > Jonathan wrote about a script "floating around". What's that? I think you mean the marks-to-notes converter. One version is at http://thread.gmane.org/gmane.comp.version-control.git/163395/focus=168514 [...] > On Monday 02 April 2012 16:30:14 Ramkumar Ramachandra wrote: >> Are you planning to extend svn-fe to do the mapping, write it as a >> separate program, or write it into the remote helper? I personally >> don't mind if the mapping is done in Perl (like in git-svn or SBL) as >> opposed to C; mapping is just parse-intensive. > > I personally don't like Perl. :p (I would use python if i need a scripting > language). > As far as I've seen, svn-fe is a 5-liner calling functions in vcs-svn/. So I > thought there is no point of piping something through svn-fe in the remote- > helper. I thought I would use those functions like svn-fe does. > I thought about vcs-svn/ being a library for svn interaction that the remote- > helper, and svn-fe, and svn-fi (?) are using. Yes, I think when Ram added vcs-svn/ to the main git repository, the intent was to make it a library that some git-remote-svn.c could use directly. [...] > On Monday 02 April 2012 15:57:00 Jonathan Nieder wrote: >> The word "new" makes me worried that you'd be throwing away whatever >> work already exists. :) > > Probably I missed something. > But all I've seen that is directly a remote-helper is a bash script which > basically calls a pipeline from svnrdump | svn-fe | fast-import [2]. > I'm not planning to write a longer program in bash. (I personally use bash > only for things that fit on one terminal height). 
> > Bash and Perl are not my favourites ;) I think that's fine. It's a prototype, and it has -alpha in its name to make sure people understand there are no compatibility guarantees which avoids constraining us. What I was more worried about is throwing away discoveries made in the previous design and starting over. Jonathan ^ permalink raw reply [flat|nested] 46+ messages in thread
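The point above about remote helpers declaring the import/export capabilities can be illustrated with a minimal sketch of the command loop's reply logic. It assumes the line-based remote-helper protocol in which git sends `capabilities` and `list` and the helper answers with lines terminated by a blank line; the refspec and the ref names are hypothetical, and the `import` command itself (which would emit a fast-import stream) is deliberately left out.

```python
def respond(command, refs):
    """Return the helper's reply to one protocol command sent by git."""
    if command == "capabilities":
        # Declare fast-import based fetching; the refspec is a made-up private namespace.
        return "import\nrefspec refs/heads/*:refs/svn/origin/*\n\n"
    if command == "list":
        # '?' means the helper cannot name the object id before importing.
        return "".join(f"? {ref}\n" for ref in refs) + "\n"
    raise ValueError(f"unsupported command: {command!r}")
```

A real helper would read commands from standard input in a loop and, on `import <ref>`, write the stream that a pipeline such as `svnrdump | svn-fe` produces.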
* Re: GSOC Proposal draft: git-remote-svn 2012-04-02 20:57 ` Jonathan Nieder 2012-04-02 23:04 ` Jonathan Nieder 2012-04-03 7:49 ` Florian Achleitner @ 2012-04-05 16:18 ` Tomas Carnecky 2 siblings, 0 replies; 46+ messages in thread From: Tomas Carnecky @ 2012-04-05 16:18 UTC (permalink / raw) To: Jonathan Nieder, Florian Achleitner Cc: Ramkumar Ramachandra, David Barr, Git Mailing List, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov On Mon, 02 Apr 2012 15:57:00 -0500, Jonathan Nieder <jrnieder@gmail.com> wrote: > - UI for storing the mapping between Subversion revision numbers and > git commit names in the git object db somewhere. Currently we I wrote a proof-of-concept importer which stored this mapping in notes. Worked fairly well. Maybe I can dig up the code again. tom ^ permalink raw reply [flat|nested] 46+ messages in thread
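The marks-file and notes approaches mentioned above both boil down to a revision-to-commit table. A minimal sketch follows, assuming the marks file uses fast-import's `:<mark> <sha1>` line format and that each mark number equals the Subversion revision number (an assumption about how the importer assigns marks); the note text `r<N>` is likewise only an illustrative choice.

```python
def read_marks(lines):
    """Map revision number -> commit id from marks-file lines (':<mark> <sha1>')."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        mark, sha1 = line.split()
        mapping[int(mark.lstrip(":"))] = sha1
    return mapping


def notes_for(mapping):
    """Return (commit id, note text) pairs, one per imported revision, in rev order."""
    return [(mapping[rev], f"r{rev}") for rev in sorted(mapping)]
```

Each pair could then be written with a notes command or emitted directly in a fast-import stream; which of the two stores is authoritative when they disagree remains the open question raised earlier in the thread.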
* Re: GSOC Proposal draft: git-remote-svn 2012-04-02 8:30 ` GSOC Proposal draft: git-remote-svn Florian Achleitner 2012-04-02 11:00 ` Ramkumar Ramachandra 2012-04-02 20:57 ` Jonathan Nieder @ 2012-04-02 22:17 ` Andrew Sayers 2012-04-02 22:29 ` Jonathan Nieder 2012-04-05 13:36 ` Florian Achleitner 3 siblings, 1 reply; 46+ messages in thread From: Andrew Sayers @ 2012-04-02 22:17 UTC (permalink / raw) To: Florian Achleitner Cc: Ramkumar Ramachandra, David Barr, Git Mailing List, Jonathan Nieder, Sverre Rabbelier, Dmitry Ivankov Hey Florian, Comments below. The nitpickier ones aren't so much there to help the proposal as for general information. On 02/04/12 09:30, Florian Achleitner wrote: <snip> > > Subversion (svn) [2] was created as a successor of CVS, both follow a strict > client-server design, where the repository exclusively lives on the central > server and every client only checks out a copy of a single revision at a time. > SVN doesn't truly have a concept of branches. SVN branches are a copy of a > directory (so are tags). Just a little nitpick - SVN was primarily inspired by CVS, but there's no formal connection between the projects - both are developed by different development teams even to this day. <snip> > git-fast-import [4] is a format to serialize a git repository into a text > format. It is used by the tools git-fast-import and git-fast-export. > > The remote helper has to convert the foreign protocol and data (svn) to the > git-fast-import format. As discussed on IRC, I'd like to see some discussion of solutions that use plumbing directly (e.g. git-commit-tree) if you choose to focus on branch import. <snip> > Branches exist due to the convention of having branches/, trunk/, and tags/ > directories in a repository, so do tags. But this is not mandatory and > therefore there are many different layouts. It follows that in svn it is also > possible to commit across branches. 
This means that a single commit can change > files on more than one branch (accidentally or deliberately). This is basically accurate, but a contrived example might help explain why fully automatic branch export is impossible in the general case: Imagine a repository that consists of a single revision with a single file, "scratchpad/libfoo/foo.c" - how would we decide which directory is the branch? Has the author even decided yet? For example, he might be learning version control and not understand what branches are. Having said that, automatic branch export might be possible in some important special cases (like repositories that use the standard layout). I haven't really looked into this yet. <snip> > - Because generating the branch mapping configuration already requires that > you have a dump of the svn repo, the helper should probably be able to read > from a file in place of svnrdump too. It might help if I explain how the SVN branch exporter will work: First, it will read an SVN dump and create a file containing JSON blobs summarising each revision - e.g. it specifies which files were changed, but not the contents of the changes. As Ram mentioned, downloading the dump and tee'ing it to both this process and svn-fe makes a lot of sense. Next, it will read the JSON file and detect trunks. This turns out to be extremely fast now it's been freed from the SVN dump format. Next, the user will have the opportunity to review the detected trunks. For example, if somebody put a "README.txt" in the root directory, the previous step will need to be rerun with that file ignored. Next, the main branch detection stage will be run using the JSON file and the previous branch information. Next, the user has another chance to make changes. Some users will blow straight past this stage, but sufficiently fussy users with sufficiently large repositories could spend several days looping through this and the previous stage until their branches and merges are just right. 
The SBL file is finally complete whenever the user decides - you'll need to tell them how to restart the import process, in case they restarted their computer while they were refining the file. <snip> > 3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only > convert svn to git. To push to svn we also need conversion and mapping from > git to svn. The actual mapping code for branches should also be placed here > {??} and called by the remote helper. I agree with Jonathan and Ram that we're not ready for this yet. Even mapping git branches back to a branchless representation won't be practical until branch import is fairly mature. - Andrew [1]https://github.com/andrew-sayers/Proof-of-concept-History-Converter/blob/master/git-branch-import.pl [2]git sources git/Documentation/git-commit-tree.txt ^ permalink raw reply [flat|nested] 46+ messages in thread
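The first stage described above, reducing an SVN dump to per-revision summaries without file contents, can be sketched as below. This is not the exporter's actual code (which the message does not show); it is a minimal illustration that assumes the dump consists of header blocks terminated by a blank line, each optionally followed by `Content-length` bytes of body, and the output field names are invented for the example.

```python
import json


def read_headers(stream):
    """Read one 'Key: value' header block; return None at end of input."""
    headers = {}
    while True:
        line = stream.readline()
        if not line:
            return headers or None
        line = line.rstrip(b"\n")
        if not line:
            if headers:
                return headers
            continue  # blank separator lines between records
        key, _, value = line.partition(b": ")
        headers[key.decode()] = value.decode()


def summarise(stream):
    """Return a list of {'rev': N, 'nodes': [...]} dicts, skipping all bodies."""
    revisions = []
    while True:
        headers = read_headers(stream)
        if headers is None:
            return revisions
        stream.read(int(headers.get("Content-length", 0)))  # skip props and text
        if "Revision-number" in headers:
            revisions.append({"rev": int(headers["Revision-number"]), "nodes": []})
        elif "Node-path" in headers:
            node = {"path": headers["Node-path"], "action": headers.get("Node-action")}
            if "Node-copyfrom-path" in headers:
                node["copyfrom_path"] = headers["Node-copyfrom-path"]
                node["copyfrom_rev"] = int(headers["Node-copyfrom-rev"])
            revisions[-1]["nodes"].append(node)


def to_json_lines(revisions):
    """Serialise one JSON object per revision, one per line."""
    return "\n".join(json.dumps(rev, sort_keys=True) for rev in revisions)
```

Run against `sys.stdin.buffer`, this could sit on one side of a `tee` while svn-fe consumes the same dump on the other, as suggested in the message.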
* Re: GSOC Proposal draft: git-remote-svn 2012-04-02 22:17 ` Andrew Sayers @ 2012-04-02 22:29 ` Jonathan Nieder 2012-04-02 23:20 ` Andrew Sayers 0 siblings, 1 reply; 46+ messages in thread From: Jonathan Nieder @ 2012-04-02 22:29 UTC (permalink / raw) To: Andrew Sayers Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr, Git Mailing List, Sverre Rabbelier, Dmitry Ivankov Hi Andrew, Andrew Sayers wrote: > On 02/04/12 09:30, Florian Achleitner wrote: >> The remote helper has to convert the foreign protocol and data (svn) to the >> git-fast-import format. > > As discussed on IRC, I'd like to see some discussion of solutions that > use plumbing directly (e.g. git-commit-tree) if you choose to focus on > branch import. Do you mean that fast-import is not a plumbing command? From the IRC log[1]: > andrew_sayers From my reading of the protocol, you'd have to pass > all the files in for each branch. > andrew_sayers For each commit. I'm a little confused by this. Do you mean that a fast-import stream is not allowed to use multiple branches, or that when a fast-import stream represents a commit that changes one file, it needs to list all files rather than the one that changed? Neither is true. The fast-import tool started as a tool to write objects to pack directly, or in other words to save time by avoiding the step of writing loose objects. That is still one of its main benefits. [...] >> 3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only >> convert svn to git. To push to svn we also need conversion and mapping from >> git to svn. The actual mapping code for branches should also be placed here >> {??} and called by the remote helper. > > I agree with Jonathan and Ram that we're not ready for this yet. Just to be clear, I never said such a thing. :) Thanks for some useful clarifications. Jonathan [1] http://colabti.org/irclogger/irclogger_log/git-devel?date=2012-04-02#l153 ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-02 22:29 ` Jonathan Nieder @ 2012-04-02 23:20 ` Andrew Sayers 2012-04-03 0:09 ` Jonathan Nieder 0 siblings, 1 reply; 46+ messages in thread From: Andrew Sayers @ 2012-04-02 23:20 UTC (permalink / raw) To: Jonathan Nieder Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr, Git Mailing List, Sverre Rabbelier, Dmitry Ivankov

On 02/04/12 23:29, Jonathan Nieder wrote:
> Hi Andrew,
>
> Andrew Sayers wrote:
>> On 02/04/12 09:30, Florian Achleitner wrote:
>
>>> The remote helper has to convert the foreign protocol and data (svn) to the
>>> git-fast-import format.
>>
>> As discussed on IRC, I'd like to see some discussion of solutions that
>> use plumbing directly (e.g. git-commit-tree) if you choose to focus on
>> branch import.
>
> Do you mean that fast-import is not a plumbing command?

Sorry, that wasn't clear. I meant commands that just expose a single
primitive bit of functionality (like git-commit-tree) instead of those
that present an abstract interface to the whole git machinery (like
git-fast-import). I'm not sure what the right word for that would be?

I agree it's possible to use fast-import for this problem, but it seems
like it's redundant after svn-fe has already loaded everything into git.
For example, if svn-fe loaded three revisions into the master branch,
you could create a trunk branch by doing something like:

    # %B passes the whole commit message (subject and body) along;
    # %b would drop the subject line.
    COMMIT=$( git show -s --pretty=%B master^^ | \
              git commit-tree master^^:trunk )
    COMMIT=$( git show -s --pretty=%B master^ | \
              git commit-tree master^:trunk -p $COMMIT )
    COMMIT=$( git show -s --pretty=%B master | \
              git commit-tree master:trunk -p $COMMIT )
    git update-ref refs/heads/foo $COMMIT

The point I was making in IRC was that (so far as I understand)
fast-import doesn't let you pass trees around in this way, but instead
requires you to transmit the contents of all the changed files.

The code above could of course be achieved more easily with
git-filter-branch, or achieved more efficiently with a custom bit of C.
I suggested discussing the problem in terms of single-purpose commands
because it strikes me as about the right level to expose the
architectural questions without getting bogged down in detail.

- Andrew

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-02 23:20 ` Andrew Sayers @ 2012-04-03 0:09 ` Jonathan Nieder 2012-04-03 21:53 ` Andrew Sayers 0 siblings, 1 reply; 46+ messages in thread From: Jonathan Nieder @ 2012-04-03 0:09 UTC (permalink / raw) To: Andrew Sayers Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr, Git Mailing List, Sverre Rabbelier, Dmitry Ivankov Andrew Sayers wrote: > Sorry, that wasn't clear. I meant commands that just expose a single > primitive bit of functionality (like git-commit-tree) instead of those > that present an abstract interface to the whole git machinery (like > git-fast-import). Ok. I think you are misunderstanding the purpose of fast-import[1] but it doesn't take away from what you're saying. > I agree it's possible to use fast-import for this problem, but it seems > like it's redundant after svn-fe has already loaded everything into git. Right, I missed your point here before. The fundamental question is not about what commands to use but about the order of operations. 1. In one scheme, first you import the whole tree without splitting it into branches, with a tool like svn-fe. Afterwards, you postprocess the resulting repository with tools like "git filter-branch --subdirectory-filter". The result of the import can depend on all revisions --- you can say, in rev 1, "I'm not sure whether this new directory is a branch; let me see how it develops by rev 1000 to decide how to process it". 2. In another scheme, you only import the subset of the repository you are interested in. This is what git-svn does, for example. This requires the branch discovery to happen at the same time as the import, because otherwise there is no way to tell what subset of the repository you are actually interested in. 3. Lastly, in yet another scheme, you import the whole tree and it is split into branches on the fly. 
The advantages relative to (1) are: - impatient people can peek at the partial result of the import as it happens - the result of importing rev n is guaranteed to depend only on revs <= n, so different people importing at different times will get the same commits (assuming nobody is rewriting early history behind the scenes) and it is obvious how to support incremental imports to expand a repository with all revs <= n to a repository with all revs <= 2n. However, if splitting branches can only happen during the initial import, that makes it harder to tweak the configuration and try again to see what changes. The relevant technical difference is that in the naive implementation of scheme (2) you can make use of arbitrary information available over svn protocol, in naive scheme (3) you can only use information that makes it into the fast-import stream, and in naive scheme (1) you can only use information that makes it into the actual git repository. So to use scheme (1) you need to make sure svn-fe stores all interesting data in a visible way, including copyfrom info (which is not a bad idea anyway). [...] > The point I was making in IRC was that (so far as I understand) > fast-import doesn't let you pass trees around in this way, but instead > requires you to transmit the contents of all the changed files. fast-import's "ls" command allows exactly what you are talking about, and svn-fe uses it to copy subtrees from earlier revs into later ones when it receives an "svn cp" command. See [2] for some work that preexists that. Did I understand correctly? Jonathan [1] By acting as a single process that takes a stream of commands it really is able to do something that no other plumbing command can do. [2] http://thread.gmane.org/gmane.comp.version-control.git/158375 ^ permalink raw reply [flat|nested] 46+ messages in thread
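The use of fast-import's "ls" command to copy a subtree, as mentioned above, can be sketched from the frontend's side. The sketch assumes the reply format `<mode> <type> <dataref><TAB><path>` (or `missing <path>`) and that a tree can be placed with a filemodify line of mode 040000; the function names are invented, path quoting is ignored, and this is not how svn-fe's C code is structured.

```python
def ls_command(mark, path):
    """Build an 'ls' request for path inside the commit identified by a mark."""
    return f"ls :{mark} {path}\n"


def parse_ls_reply(line):
    """Return (mode, type, dataref, path), or None for a 'missing <path>' reply."""
    line = line.rstrip("\n")
    if line.startswith("missing "):
        return None
    meta, path = line.split("\t", 1)
    mode, kind, dataref = meta.split(" ")
    return mode, kind, dataref, path


def copy_command(ls_reply, dest):
    """Build the filemodify line that places the listed entry at dest."""
    entry = parse_ls_reply(ls_reply)
    if entry is None:
        raise KeyError("source path does not exist in that revision")
    mode, _kind, dataref, _path = entry
    return f"M {mode} {dataref} {dest}\n"
```

For an "svn cp trunk branches/1.7.x" from an earlier revision, the frontend would send `ls_command(rev, "trunk")`, read one reply line, and emit `copy_command(reply, "branches/1.7.x")` inside the new commit, so no file contents need to be retransmitted.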
* Re: GSOC Proposal draft: git-remote-svn 2012-04-03 0:09 ` Jonathan Nieder @ 2012-04-03 21:53 ` Andrew Sayers 2012-04-03 22:21 ` Jonathan Nieder 0 siblings, 1 reply; 46+ messages in thread From: Andrew Sayers @ 2012-04-03 21:53 UTC (permalink / raw) To: Jonathan Nieder Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr, Git Mailing List, Sverre Rabbelier, Dmitry Ivankov On 03/04/12 01:09, Jonathan Nieder wrote: > Andrew Sayers wrote: > >> Sorry, that wasn't clear. I meant commands that just expose a single >> primitive bit of functionality (like git-commit-tree) instead of those >> that present an abstract interface to the whole git machinery (like >> git-fast-import). > > Ok. I think you are misunderstanding the purpose of fast-import[1] but > it doesn't take away from what you're saying. I had certainly missed the "ls" command - having seen that, I agree fast-import is the best solution to this problem. I'm still a bit concerned about fast-import as a learning tool, although this is a bit of a meta-conversation as far as GSoC is concerned. Personally, I like to learn things by understanding the basic building blocks, then seeing how to construct things from them. I found git easy to learn because I could start with the basic data structures and algorithms, then layer an approximation of a patches-and-tarballs workflow on top of it. I would expect a discussion of the problem in terms of primitive commands like git-commit-tree to help that learning style, although I am committing a logical fallacy by assuming that everyone thinks like me until proven otherwise :) I think a lot of learners want to play a bit, make some informative mistakes, then flesh out their understanding with something a bit more technical. People that want to "look under the hood" are well-served by git, because they can use the ordinary interface (status/commit/branch/etc.) then use the source when they're ready. 
It seems like people that want to "peek behind the curtain" at a communication stream would be well-served by fast-import if only there was a curtain for them to peek behind. I'd be interested to know what git learners think, but I'd feel more comfortable pointing students at fast-import if there was a FUSE module, or a shell, or some other interface on top of it whose failure mode was a puzzling mess instead of a safely inert repository. Incidentally Florian, some of the above probably spoke to you, other bits probably less so. It took me several years after leaving university to see my own learning style, so if you find it hard to learn git one way, try some different approaches before assuming it's a personal problem :) >> I agree it's possible to use fast-import for this problem, but it seems >> like it's redundant after svn-fe has already loaded everything into git. > > Right, I missed your point here before. The fundamental question is > not about what commands to use but about the order of operations. > > 1. In one scheme, first you import the whole tree without splitting it > into branches, with a tool like svn-fe. Afterwards, you > postprocess the resulting repository with tools like "git > filter-branch --subdirectory-filter". The result of the import can > depend on all revisions --- you can say, in rev 1, "I'm not sure > whether this new directory is a branch; let me see how it develops > by rev 1000 to decide how to process it". > > 2. In another scheme, you only import the subset of the repository > you are interested in. This is what git-svn does, for example. > This requires the branch discovery to happen at the same time as > the import, because otherwise there is no way to tell what subset > of the repository you are actually interested in. > > 3. Lastly, in yet another scheme, you import the whole tree and it is > split into branches on the fly. 
The advantages relative to (1) are: > > - impatient people can peek at the partial result of the import as > it happens > > - the result of importing rev n is guaranteed to depend only on > revs <= n, so different people importing at different times will > get the same commits (assuming nobody is rewriting early history > behind the scenes) and it is obvious how to support incremental > importants to expand a repository with all revs <= n to a > repository with all revs <= 2n > > However, if splitting branches only can happen during the initial > import, that makes it harder to tweak the configuration and try > again to see what changes. > That's a good way of putting the question, but for SVN it's useful to distinguish between trunk and non-trunk branches. I previously[1] suggested this algorithm for deciding if a directory is a branch: A directory is a branch if... 1. it is not a subdirectory of an existing branch; and 2. either: 2a. it is in a list of branches specified by the user, or 2b. it is copied from a (subdirectory of a) branch This is a pretty solid heuristic for detecting branches copied from an existing branch even in scheme (2) or (3), but does absolutely nothing for trunk detection. Although trunk detection is trivial in the sane case (the "trunk" directory is the one and only trunk, end of story), here's a contrived example for why it's hard in the general case: Our SVN newbie created "scratchpad/libfoo/foo.c" in revision 1. He spends the next 1,000 revisions working in scratchpad/libfoo, creating the fooiest foo that ever did foo. After that, he creates "scratchpad/libbar/bar.c" and continues for another thousand revisions. This cycle repeats until he's finally ready to tie all his libraries together. 
It's only now that he finally decides whether to create "scratchpad/main.c" (if he thinks "scratchpad" is the trunk), or "trunk/main.c" (if he thinks all the subdirectories of scratchpad were trunks) or "scratchpad/main/main.c" (if he wants to give me an aneurysm worrying how to cope when he does `svn cp scratchpad/main scratchpad`). I paused after writing the paragraph above, because the last part got me thinking. Copying a subdirectory to its parent directory isn't actually possible in SVN, but the concept of "branch absorption" is an interesting one. In theory, we could say that "scratchpad/libfoo" and "scratchpad/libbar" were trunk branches at first, but were deleted when the "scratchpad" branch was created. I'll have to check whether this leads to undesirable results in the real world, but this might make it possible to do on-the-fly trunk detection as described in scheme (3). > The relevant technical difference is that in the naive implementation > of scheme (2) you can make use of arbitrary information available over > svn protocol, in naive scheme (3) you can only use information that > makes it into the fast-import stream, and in naive scheme (1) you can > only use information that makes it into the actual git repository. So > to use scheme (1) you need to make sure svn-fe stores all interesting > data in a visible way, including copyfrom info (which is not a bad > idea anyway). The approach I'm looking at is to extract information from an SVN dump at an early stage, then use the extracted information when the user tidies up the SBL file. This was originally a simple optimisation (reading a small gzipped JSON file is much faster than reading an SVN dump that's 99% file bodies you don't care about) but it wouldn't be too hard to teach svn-fe how to produce the file if you were so inclined. - Andrew [1] http://article.gmane.org/gmane.comp.version-control.git/192286 ^ permalink raw reply [flat|nested] 46+ messages in thread
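As a rough illustration of the branch rule quoted in this message (rules 1, 2a and 2b), here is a minimal Python sketch. It is not code from the thread: the function names, the use of plain path strings without leading or trailing slashes, and the treatment of "already a branch" are all assumptions made for the example. As the message says, a rule like this does nothing for trunk detection.

```python
from typing import Optional, Set


def is_under(path: str, ancestor: str) -> bool:
    """True if path equals ancestor or lies somewhere beneath it."""
    return path == ancestor or path.startswith(ancestor + "/")


def is_branch(path: str,
              branches: Set[str],
              user_branches: Set[str],
              copyfrom: Optional[str] = None) -> bool:
    """Apply the heuristic to a newly created directory.

    branches      -- directories already known to be branches
    user_branches -- directories the user explicitly listed (rule 2a)
    copyfrom      -- source path if the directory was created by a copy
    """
    # Rule 1: a subdirectory of an existing branch is never a branch.
    if any(path != b and is_under(path, b) for b in branches):
        return False
    # Rule 2a: the user said so.
    if path in user_branches:
        return True
    # Rule 2b: copied from a branch, or from a subdirectory of one.
    if copyfrom is not None and any(is_under(copyfrom, b) for b in branches):
        return True
    return False
```

With "trunk" as the only known branch, a copy of trunk into branches/topic is accepted, while scratchpad/libfoo from the contrived example is rejected unless the user lists it.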
* Re: GSOC Proposal draft: git-remote-svn 2012-04-03 21:53 ` Andrew Sayers @ 2012-04-03 22:21 ` Jonathan Nieder 0 siblings, 0 replies; 46+ messages in thread From: Jonathan Nieder @ 2012-04-03 22:21 UTC (permalink / raw) To: Andrew Sayers Cc: Florian Achleitner, Ramkumar Ramachandra, David Barr, Git Mailing List, Sverre Rabbelier, Dmitry Ivankov Andrew Sayers wrote: > This is a pretty solid heuristic for detecting branches copied from an > existing branch even in scheme (2) or (3), but does absolutely nothing > for trunk detection. Although trunk detection is trivial in the sane > case (the "trunk" directory is the one and only trunk, end of story), > here's a contrived example for why it's hard in the general case: For the remote helper in its default configuration, I think it's ok to assume the standard layout (trunk/, branches/*, tags/*). Thanks for some useful examples. Sincerely, Jonathan ^ permalink raw reply [flat|nested] 46+ messages in thread
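To make the standard-layout assumption (trunk/, branches/*, tags/*) concrete, here is a small Python sketch that splits a repository path into a branch-like prefix and the path inside it. The function name and the return convention are invented for illustration; nothing here is taken from svn-fe or the remote helper.

```python
from typing import Optional, Tuple


def split_standard_layout(path: str) -> Optional[Tuple[str, str]]:
    """Split an SVN path into (ref, path inside ref) under the standard layout.

    Returns None for paths outside trunk/, branches/* and tags/*.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "trunk", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        # branches/<name>/... and tags/<name>/... each name one ref.
        return parts[0] + "/" + parts[1], "/".join(parts[2:])
    return None
```

A path such as scratchpad/libfoo/foo.c falls outside the layout and returns None, which is the case the default configuration would simply not try to handle.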
* Re: GSOC Proposal draft: git-remote-svn 2012-04-02 8:30 ` GSOC Proposal draft: git-remote-svn Florian Achleitner ` (2 preceding siblings ...) 2012-04-02 22:17 ` Andrew Sayers @ 2012-04-05 13:36 ` Florian Achleitner 2012-04-05 15:47 ` Dmitry Ivankov ` (2 more replies) 3 siblings, 3 replies; 46+ messages in thread From: Florian Achleitner @ 2012-04-05 13:36 UTC (permalink / raw) To: Git Mailing List Cc: Ramkumar Ramachandra, David Barr, Andrew Sayers, Jonathan Nieder, Sverre Rabbelier, Dmitry Ivankov Hi everybody! Thanks for your inputs. I've now submitted a slightly updated version of my proposal to google. Additionally it's on github [1]. Summary of diffs: I'll concentrate on the fetching from svn, writing a remote helper without branch detection (like svn-fe) first, and then creating the branch mapper. [1] https://github.com/flyingflo/git/wiki/ -- Florian On Monday 02 April 2012 10:30:58 Florian Achleitner wrote: > > ==Remote helper for Subversion== > ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-05 13:36 ` Florian Achleitner @ 2012-04-05 15:47 ` Dmitry Ivankov 2012-04-09 18:59 ` Stephen Bash 2012-04-10 17:17 ` Jonathan Nieder 2 siblings, 0 replies; 46+ messages in thread From: Dmitry Ivankov @ 2012-04-05 15:47 UTC (permalink / raw) To: Florian Achleitner Cc: Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers, Jonathan Nieder, Sverre Rabbelier Hi! On Thu, Apr 5, 2012 at 7:36 PM, Florian Achleitner <florian.achleitner2.6.31@gmail.com> wrote: > Hi everybody! > > Thanks for your inputs. I've now submitted a slightly updated version of my > proposal to google. Additionally it's on github [1]. > > Summary of diffs: > I'll concentrate on the fetching from svn, writing a remote helper without > branch detection (like svn-fe) first, and then creating the branch mapper. I think the general goal should include the possibility of cloning an "svn:// clone" (not necessarily with exactly "clone"; a special easy-to-use command or script would be fine too) so that the new clone is able to fetch and push too. This is a new feature compared to git-svn.perl and allows sharing the svn->git conversion result. Not completely trivial, but cool to have. At least I recommend keeping it in mind during the design phase(s). Though it is not a must, as there are many, many other cool things to implement in the git-svn area :) > > [1] https://github.com/flyingflo/git/wiki/ > > -- Florian > > On Monday 02 April 2012 10:30:58 Florian Achleitner wrote: > >> >> ==Remote helper for Subversion== >> > > ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-05 13:36 ` Florian Achleitner 2012-04-05 15:47 ` Dmitry Ivankov @ 2012-04-09 18:59 ` Stephen Bash 2012-04-10 17:17 ` Jonathan Nieder 2 siblings, 0 replies; 46+ messages in thread From: Stephen Bash @ 2012-04-09 18:59 UTC (permalink / raw) To: Florian Achleitner Cc: Ramkumar Ramachandra, David Barr, Andrew Sayers, Jonathan Nieder, Sverre Rabbelier, Dmitry Ivankov, Git Mailing List ----- Original Message ----- > From: "Florian Achleitner" <florian.achleitner2.6.31@gmail.com> > Sent: Thursday, April 5, 2012 9:36:40 AM > Subject: Re: GSOC Proposal draft: git-remote-svn > > Thanks for your inputs. I've now submitted a slightly updated version > of my proposal to google. Additionally it's on github [1]. > > Summary of diffs: > I'll concentrate on the fetching from svn, writing a remote helper > without branch detection (like svn-fe) first, and then creating the > branch mapper. > > [1] https://github.com/flyingflo/git/wiki/ Florian - I just skimmed the github page since I've been away for a week. Not to toot my own horn too much, but there's a lot of good discussion about svn-isms in my thread from 2010 (starts at [1], but most of the good stuff is the discussion that follows). I didn't see it in the references, and it probably doesn't need to be there, but if you haven't seen it yet, take a look at it (and cringe at my horrible abuse of git in my early days... ugh!). [1] http://article.gmane.org/gmane.comp.version-control.git/159054 Thanks, Stephen ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-05 13:36 ` Florian Achleitner 2012-04-05 15:47 ` Dmitry Ivankov 2012-04-09 18:59 ` Stephen Bash @ 2012-04-10 17:17 ` Jonathan Nieder 2012-04-10 22:30 ` Andrew Sayers ` (4 more replies) 2 siblings, 5 replies; 46+ messages in thread From: Jonathan Nieder @ 2012-04-10 17:17 UTC (permalink / raw) To: Florian Achleitner Cc: Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov Hi, Florian Achleitner wrote: > Thanks for your inputs. I've now submitted a slightly updated version of my > proposal to google. Additionally it's on github [1]. > > Summary of diffs: > I'll concentrate on the fetching from svn, writing a remote helper without > branch detection (like svn-fe) first, and then creating the branch mapper. Thanks for the update. If I understand correctly, the remote helper from the first half would do essentially the same thing as Dmitry's remote-svn-alpha script. Since in shell script form it is very simple, I don't think it should take more than a couple of days to write such a thing in C. > Timeline > > GSoC timeline and summer holidays > Summer holidays in Austria start on the 9th of July. So until the mid-term > evaluations my git project will have to co-exist with my regular > university work and projects. But holidays extend until the beginning > of October, so there’s some time left to catch up after the official > end of GSoC. Another possibility that some people in similar situations have followed is to start early. That works a little better since it means that by the time midterm evaluations come around we can have a reasonable idea of whether a change in strategy is needed for the project to finish on time. > I plan to split the project in two parts: > > Writing the remote helper using existing functions in vcs-svn to > import svn history without detecting branches, like svn-fe does. 
> Milestone: 9th of July, GSoC mid-term > > Writing a branch mapper for the remote helper that reads the config > language (SBL) and imports branches trying to deal as good as possible > with all the little pitfalls that will occur. Milestone: 20th of > August, GSoC end Could you flesh out this timeline more? Ideally it would be nice to have a definite plan here, even to the point of listing what patches would need to be written, so during the summer all that would need to happen is to execute and deal with bugs as they come. Given the goal described here of an import with support for automatically detecting branches, here are some rough steps I imagine would be involved: . baseline: remote helper in C . option to import starting with a particular numbered revision. This would be good practice for seeing how options passed to "git clone -c" can be read from the config file. . option or URL schema to import a single project from a large Subversion repository that houses several projects. This would already be useful in practice since importing the entire Apache Software Foundation repository takes a while which is a waste when one only wants the history of the Subversion project. How should the importer handle Subversion copy commands that refer to other projects in this case? . automatically detecting trunk when importing a project with the standard layout. The trunk usually is not branched from elsewhere so this does not require copyfrom info. Some design questions come up here: should the remote helper import the entire project tree, too? (I think "yes", since copy commands that copy from other branches are very common and that would ensure the relevant info is available to git.) What should the mapping of git commit names to Subversion revision numbers that is stored in notes say in this case? . detecting trunk and branches and exposing them as different remote branches. This is a small step that just involves understanding how remote helpers expose branches. . 
storing path properties and copyfrom information in the commits produced by the vcs-svn/ library. How should these be stored? For example, there could be a parallel directory structure in the tree:

	foo/
		bar.c
	baz/
		qux.c
	.properties/
		foo.properties
		foo/
			bar.c.properties
		baz/
			qux.c.properties

with properties for <path> stored at .properties/<path>.properties. This strawman scheme doesn't work if the repository being imported has any paths ending with ".properties", though. Ideas?

 . tracing history past branch creation events, using the now-saved copyfrom information.

 . tracing second-parent history using svn:mergeinfo properties.

In other words, in the above list the strategy is:

 1. First convert the remote helper to C so it doesn't have to be translated again later.

 2. Teach the remote helper to import a single project from a repository that houses multiple projects (i.e., path limiting).

 3. Teach the remote helper to split an imported project that uses the standard layout into branches (an application of the code from (2)). This complicates the scheme for mapping between Subversion revision numbers and git commit ids.

 4. Teach the SVN dumpfile to fast-import stream converter not to lose the information that is needed in order to get parenthood information.

 5. Use the information from step (4) to get parenthood right for a project split into branches.

 6. Getting the second parent right (i.e., merges). I mentioned this for fun but I don't expect there to be time for it.

Does that seem right, or does it need tweaks? How long would each step take? Can the steps be subdivided into smaller steps?

Another question is: what is the design for this? With the existing remote-svn-alpha script, there are a few different components with well defined interfaces:

  commands like "git fetch"
     |
     | (1)
     |
  transport-helper --- (2) --- git fast-import
     |                              |
     | (2, 3)                       |
     |                              |
  remote-svn-alpha                  | (3)
     |         ''..                 |
     | (2)         ''(2)..          |
     |                    ''..      |
  svnrdump --------- (3) -------- svn-fe

  (1) communicates using function calls and shared data
  (2) launches
  (3) communicates over pipe

Once remote-svn-alpha is rewritten in C, the same structure is still present, though it might be less obvious because some of the (2) and (3) can change into (1). Where does the functionality you are adding fit into this picture? Are there any new components being added, and if so what do they take as input and output? Hope that helps, Jonathan > [1] https://github.com/flyingflo/git/wiki/ ^ permalink raw reply [flat|nested] 46+ messages in thread
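The strawman ".properties" scheme from this message, and one way it can break, can be sketched in a few lines of Python. This is only an illustration: the helper names are invented, and the conflict check covers just one failure mode (a stored properties file that would also have to be a directory), not every problem the scheme might have.

```python
import posixpath
from typing import Iterable, List, Tuple

PROPS_DIR = ".properties"
SUFFIX = ".properties"


def properties_path(path: str) -> str:
    """Where the strawman scheme would store the properties of <path>."""
    return posixpath.join(PROPS_DIR, path.strip("/") + SUFFIX)


def find_conflicts(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where the properties file of a would also have
    to be a directory in order to hold the properties file of b."""
    stored = {properties_path(p): p for p in paths}
    conflicts = []
    for location, owner in stored.items():
        prefix = location + "/"
        for other_location, other_owner in stored.items():
            if other_location.startswith(prefix):
                conflicts.append((owner, other_owner))
    return sorted(conflicts)
```

For the tree shown in the message there is no conflict, but a repository containing both "foo" and a directory named "foo.properties" would need .properties/foo.properties to be a file and a directory at once.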
* Re: GSOC Proposal draft: git-remote-svn 2012-04-10 17:17 ` Jonathan Nieder @ 2012-04-10 22:30 ` Andrew Sayers 2012-04-10 23:46 ` Jonathan Nieder 2012-04-11 19:09 ` Florian Achleitner 2012-04-11 15:51 ` Jakub Narebski ` (3 subsequent siblings) 4 siblings, 2 replies; 46+ messages in thread From: Andrew Sayers @ 2012-04-10 22:30 UTC (permalink / raw) To: Jonathan Nieder Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra, David Barr, Sverre Rabbelier, Dmitry Ivankov On 10/04/12 18:17, Jonathan Nieder wrote: <snip> > Given the goal described here of an import with support for > automatically detecting branches, here are some rough steps I imagine > would be involved: Just to be clear, my understanding is that this project will take SBL created by another program (that I'm writing) and create branches as specified. This frees Florian from having to deal with the maze of edge cases involved in that part of the problem. > > . baseline: remote helper in C > > . option to import starting with a particular numbered revision. > This would be good practice for seeing how options passed to > "git clone -c" can be read from the config file. > > . option or URL schema to import a single project from a large > Subversion repository that houses several projects. This would > already be useful in practice since importing the entire Apache > Software Foundation repository takes a while which is a waste > when one only wants the history of the Subversion project. > > How should the importer handle Subversion copy commands that > refer to other projects in this case? This is a good point. I've just tried svnadmin and svnrdump, and it turns out svnadmin doesn't allow you to dump a subtree while svnrdump strips out the offending copy commands, so either way there's nothing to be done. > . automatically detecting trunk when importing a project with the > standard layout. The trunk usually is not branched from elsewhere > so this does not require copyfrom info. 
Some design questions > come up here: should the remote helper import the entire project > tree, too? (I think "yes", since copy commands that copy from > other branches are very common and that would ensure the relevant > info is available to git.) What should the mapping of git commit > names to Subversion revision numbers that is stored in notes say > in this case? > > . detecting trunk and branches and exposing them as different remote > branches. This is a small step that just involves understanding > how remote helpers expose branches. After last week's discussion about branch absorption, I tried writing another algorithm over the weekend. I plan to test it during the week, but online detection of branches and trunks looks fairly practical in most real world cases (even those that are sanity-challenged). > . storing path properties and copyfrom information in the commits > produced by the vcs-svn/ library. How should these be stored? > For example, there could be a parallel directory structure > in the tree: Yes, this is an important problem. It became apparent over the weekend that my code was I/O bound, so I started caching the metadata I need (without e.g. file contents) in a gzipped file containing a list of JSON blobs (one blob per revision). That immediately caused the script to jump from about a hundred revisions/second to a few thousand(!), and each further size optimisation caused it to jump by another few thousand per second. This sort of speed is useful for the initial SVN->git conversion, because it means even people with very large repositories can have a quick edit/compile/test loop when they're looking for mis-detected branches. Having said all that, a git directory is easier to examine and update than a gzipped file. 
I have no idea what the performance would be like, but even if a directory was slower we could use gzipped JSON as a cache layer during the initial import, then throw it away and read straight from a git directory on update. > . 
tracing history past branch creation events, using the now-saved > copyfrom information. I'm not sure if I understand correctly, but I think you're referring to this edge case:

    mkdir tronk brunches
    svn add tronk brunches
    svn ci -m "Initial commit, with typos to evade stdlayout detection"
    mkdir tronk/libfoo
    touch tronk/libfoo/main.c
    svn add tronk/libfoo
    svn ci -m "Created libfoo - no way to know this isn't a branch"
    svn up # so the 'svn cp' works correctly below
    svn cp tronk brunches/copy_of_tronk
    touch brunches/copy_of_tronk/main.c
    svn add brunches/copy_of_tronk/main.c
    svn ci -m "Marking the copy as a branch, but what about the original?"

I'm not actually sure what the right behaviour is here. You could argue that once we know "copy_of_tronk" is a branch, it follows that "tronk" itself is a branch. On the other hand, these directories have diverged, and who's to say it wasn't because of a disagreement about which directory was the branch? Branch absorption makes this problem less important - the "tronk/libfoo" branch will be deleted and merged into the new "tronk" branch the moment someone creates "tronk/main.c", which tends to happen pretty quickly in the real world. I'm open to suggestions, but my instinct right now is to say that communicating branchiness back through a copyfrom should at least require confirmation by the user. > . tracing second-parent history using svn:mergeinfo properties. My old POC code did this, and I plan to include it in the work I'm doing now. I expect this to be the hardest single part of the project to solve in the general case, because of SVN's troubled approach to merge handling. <snip> > Another question is: what is the design for this? Here's my part of the equation: Right now I have a script that first takes an SVN dump and produces gzipped JSON as output, then takes the gzipped JSON as input and produces an SBL file as output. 
The first round will generally only need to be run once (and is comparable to svn-fe in speed), whereas the second round might need to be run an arbitrary number of times (but is very fast). Incidentally, the initial cache generation is the only part that's still tied to the SVN dump format, and I doubt it would be that hard for someone to rewrite it inside svn-fe or to make it read from git metadata in future. I'm currently focussing on bringing all the modules up to release quality, so that I can have something for Florian to play with in the near future. This should have an interface that is mature but flexible, so I can change the interface to make his life easier but won't need to change the interface because I missed something. After that, I'll concentrate on improving the quality of the SBL output. - Andrew ^ permalink raw reply [flat|nested] 46+ messages in thread
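The metadata cache described in this message (a gzipped file holding one JSON blob per revision) could look roughly like the following Python sketch. The record fields in the usage note are made up; the thread does not specify the actual format, so treat this as one plausible shape rather than Andrew's implementation.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator


def write_cache(path: str, revisions: Iterable[Dict]) -> None:
    """Write one compact JSON object per line into a gzip file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for record in revisions:
            out.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_cache(path: str) -> Iterator[Dict]:
    """Stream the records back without loading the whole file at once."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            yield json.loads(line)
```

A caller might store records such as {"rev": 2, "copies": [["trunk", 1, "branches/topic"]]} and rerun branch detection over read_cache() as often as needed, which is the cheap second round the message describes.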
* Re: GSOC Proposal draft: git-remote-svn 2012-04-10 22:30 ` Andrew Sayers @ 2012-04-10 23:46 ` Jonathan Nieder 2012-04-11 19:09 ` Florian Achleitner 1 sibling, 0 replies; 46+ messages in thread From: Jonathan Nieder @ 2012-04-10 23:46 UTC (permalink / raw) To: Andrew Sayers Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra, David Barr, Sverre Rabbelier, Dmitry Ivankov Andrew Sayers wrote: > Just to be clear, my understanding is that this project will take SBL > created by another program (that I'm writing) and create branches as > specified. If that seems like the right thing to do for the people involved (Florian and mentor, list consensus) and if that's easy. I'm happy as long as the default configuration works well with sane repositories. [...] > On 10/04/12 18:17, Jonathan Nieder wrote: >> How should the importer handle Subversion copy commands that >> refer to other projects in this case? > > This is a good point. I've just svnadmin and svnrdump, and it turns out > svnadmin doesn't allow you to dump a subtree while svnrdump strips out > the offending copy commands, so either way there's nothing to be done. From a quick test, it looks like svnrdump converts a directory copy into the addition of its contents. Good. svndumpfilter produces svndumpfilter: E200003: Invalid copy source path '/branches/foo/subdir' and exits with status 1 so it seems like we're ok. [...] >> . tracing history past branch creation events, using the now-saved >> copyfrom information. 
> > I'm not sure if I understand correctly, but I think you're referring to > this edge case: Nope, I'm talking about the most typical and boring case there is: svn cp <repo>/trunk <repo>/branches/topic When cloning <repo>, it seems reasonable to expect that the ancestry of the trunk and branch would not be shown as disjoint linear histories, but that the revision in which the branch was introduced would be shown as a child of the previous revision of the trunk, like so: o --- o --- o [topic] / o --- o --- o --- o --- o --- o [trunk] This requires paying attention to copyfrom information. [...] > Right now I have a script that first takes an SVN dump and produces > gzipped JSON as output, then takes the gzipped JSON as input and > produces an SBL file as output. The first round will generally only > need to be run once (and is comparable to svn-fe in speed), whereas the > second round might need to be run an arbitrary number of times (but is > very fast). For what it's worth, for importing from repositories that use a nonstandard layout I do think this "start with a quick pass to figure the layout out" approach is a sane one. [...] > I'm currently focussing on bringing all the modules up to release > quality, so that I can have something for Florian to play with in the > near future. This should have an interface that is mature but flexible, > so I can change the interface to make his life easier but won't need to > change the interface because I missed something. After that, I'll > concentrate on improving the quality of the SBL output. Neat. Thanks for some useful clarifications. Jonathan ^ permalink raw reply [flat|nested] 46+ messages in thread
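The "typical and boring" case in this message, where the first commit on a copied branch becomes a child of the trunk revision it was copied from, can be sketched as a parent-lookup rule. The class below is a toy model in Python with invented names: it identifies commits by (branch, revision) pairs and assumes revisions are recorded in increasing order, which is not necessarily how the real importer would track them.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Commit = Tuple[str, int]  # (branch path, SVN revision)


class History:
    """Tracks which revisions touched each branch and picks first parents."""

    def __init__(self) -> None:
        self.revs: Dict[str, List[int]] = {}

    def commit(self, branch: str, rev: int,
               copyfrom: Optional[Commit] = None) -> Optional[Commit]:
        """Record rev on branch and return its parent commit, if any."""
        own = self.revs.get(branch)
        if own:
            # Ordinary commit: parent is the previous commit on the branch.
            parent = (branch, own[-1])
        elif copyfrom is not None:
            # Branch creation: parent is the latest commit on the source
            # branch at or before the copyfrom revision.
            src_branch, src_rev = copyfrom
            src = self.revs.get(src_branch, [])
            i = bisect.bisect_right(src, src_rev)
            parent = (src_branch, src[i - 1]) if i else None
        else:
            parent = None  # a root commit, such as the first trunk revision
        self.revs.setdefault(branch, []).append(rev)
        return parent
```

Without the copyfrom argument the topic branch would start from a root commit, which is the disjoint-histories outcome the message warns about.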
* Re: GSOC Proposal draft: git-remote-svn 2012-04-10 22:30 ` Andrew Sayers 2012-04-10 23:46 ` Jonathan Nieder @ 2012-04-11 19:09 ` Florian Achleitner 2012-04-14 22:57 ` Andrew Sayers 1 sibling, 1 reply; 46+ messages in thread From: Florian Achleitner @ 2012-04-11 19:09 UTC (permalink / raw) To: Andrew Sayers Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra, David Barr, Sverre Rabbelier, Dmitry Ivankov On Tuesday 10 April 2012 23:30:21 you wrote: > On 10/04/12 18:17, Jonathan Nieder wrote: > <snip> > > > Given the goal described here of an import with support for > > automatically detecting branches, here are some rough steps I imagine > > > would be involved: > Just to be clear, my understanding is that this project will take SBL > created by another program (that I'm writing) and create branches as > specified. This frees Florian from having to deal with the maze of edge > cases involved in that part of the problem. Furthermore the remote-helper has no way of asking the user something, right? So it can only fail if something is ambiguous in the svn repository layout. So I thought the SBL is exactly to describe these cases, and that's what I need. > [..] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-11 19:09 ` Florian Achleitner @ 2012-04-14 22:57 ` Andrew Sayers 0 siblings, 0 replies; 46+ messages in thread From: Andrew Sayers @ 2012-04-14 22:57 UTC (permalink / raw) To: Florian Achleitner Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra, David Barr, Sverre Rabbelier, Dmitry Ivankov On 11/04/12 20:09, Florian Achleitner wrote: > Furthermore the remote-helper has no way of asking the user something, right? > So it can only fail if something is ambiguous in the svn repository layout. So > I thought the SBL is exactly to describe these cases, and that's what I need. Sorry, I missed this when it was first posted. I'm not sure whether the remote helper is allowed to ask the user things, but there can be times when that would be helpful. The one that jumps to mind is tag handling. SVN considers tags and branches to be functionally identical, whereas git likes to create "annotated tags" (commits with a special tag message on top of the normal commit message) that can't be changed once they've been created. So if e.g. a tag is created then later committed to again, what do you do? Do you refuse to make annotated tags in case you need to change them later? Do you ignore later commits so that annotated tags work nicely? SBL can't provide much help here, as a tag could be created in one update, then committed to again in another update. Last time this was discussed[1], the consensus seemed to be that any clever solution would drive straight past "it just works" into "why did it do that?" territory, so the only sensible solution would be to ask what to do. As I say, I don't really know anything about remote helpers, but I'd be very surprised if you weren't allowed to at least fail with a message like "Please set svn.tagStrategy, see `man git-config` for details". - Andrew [1]http://thread.gmane.org/gmane.comp.version-control.git/192106/focus=192286 ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-10 17:17 ` Jonathan Nieder 2012-04-10 22:30 ` Andrew Sayers @ 2012-04-11 15:51 ` Jakub Narebski 2012-04-11 15:56 ` Jonathan Nieder 2012-04-11 19:20 ` Florian Achleitner ` (2 subsequent siblings) 4 siblings, 1 reply; 46+ messages in thread From: Jakub Narebski @ 2012-04-11 15:51 UTC (permalink / raw) To: Jonathan Nieder Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov Jonathan Nieder <jrnieder@gmail.com> writes: > 2. Teach the remote helper to import a single project from a > repository that houses multiple projects (i.e., path limiting). > > 3. Teach the remote helper to split an imported project that uses > the standard layout into branches (an application of the code > from (2)). This complicates the scheme for mapping between > Subversion revision numbers and git commit ids. Can't we use either the peg rev notation of externals, or the notation that Subversion itself uses for svn:mergeinfo? -- Jakub Narebski ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-11 15:51 ` Jakub Narebski @ 2012-04-11 15:56 ` Jonathan Nieder 0 siblings, 0 replies; 46+ messages in thread From: Jonathan Nieder @ 2012-04-11 15:56 UTC (permalink / raw) To: Jakub Narebski Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov Jakub Narebski wrote: > Jonathan Nieder <jrnieder@gmail.com> writes: >> 2. Teach the remote helper to import a single project from a >> repository that houses multiple projects (i.e., path limiting). >> >> 3. Teach the remote helper to split an imported project that uses >> the standard layout into branches (an application of the code >> from (2)). This complicates the scheme for mapping between >> Subversion revision numbers and git commit ids. > > Can't we use the either peg rev notation of externals, or the notation > that Subversion itself uses for svn:mergeinfo? Maybe. ;-) Could you give an example? Where would the text in this notation be stored in the git repository? How are lookups performed? ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-10 17:17 ` Jonathan Nieder 2012-04-10 22:30 ` Andrew Sayers 2012-04-11 15:51 ` Jakub Narebski @ 2012-04-11 19:20 ` Florian Achleitner 2012-04-11 19:44 ` Dmitry Ivankov 2012-04-11 19:53 ` Jonathan Nieder 2012-04-12 15:28 ` Florian Achleitner 2012-04-18 20:16 ` Florian Achleitner 4 siblings, 2 replies; 46+ messages in thread From: Florian Achleitner @ 2012-04-11 19:20 UTC (permalink / raw) To: Jonathan Nieder Cc: Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote: > Hi, > > Florian Achleitner wrote: > > Thanks for your inputs. I've now submitted a slightly updated version of > > my proposal to google. Additionally it's on github [1]. > > > > Summary of diffs: > > I'll concentrate on the fetching from svn, writing a remote helper > > without branch detection (like svn-fe) first, and then creating the > > branch mapper. > Thanks for the update. > > If I understand correctly, the remote helper from the first half would > do essentially the same thing as Dmitry's remote-svn-alpha script. > Since in shell script form it is very simple, I don't think it should > take more than a couple of days to write such a thing in C. If the remote-svn-alpha script is really all that needs to be done, you're right. It just pipes through svn-fe. I thought svn-fe could only import an svn repo initially, and there would be some difference between importing the whole history and fetching new revisions later (?). > Via > > Timeline > > > > GSoC timeline and summer holidays > > Summer holidays in Austria start on the 9th of July. So until the mid-term > > evaluations my git project will have to co-exist with my regular > > university work and projects. But holidays extend until the beginning > > of October, so there’s some time left to catch up after the official > > end of GSoC. 
> > Another possibility that some people in similar situations have > followed is to start early. That works a little better since it means > that by the time midterm evaluations come around we can have a > reasonable idea of whether a change in strategy is needed for the > project to finish on time. > > > I plan to split the project in two parts: > > > > Writing the remote helper using existing functions in vcs-svn to > > import svn history without detecting branches, like svn-fe does. > > Milestone: 9th of July, GSoC mid-term > > > > Writing a branch mapper for the remote helper that reads the config > > language (SBL) and imports branches trying to deal as good as possible > > with all the little pitfalls that will occur. Milestone: 20th of > > August, GSoC end > > Could you flesh out this timeline more? Ideally it would be nice to > have a definite plan here, even to the point of listing what patches > would need to be written, so during the summer all that would need to > happen is to execute and deal with bugs as they come. Listing patches and planning all details in the submitted proposal would require me to know what I do and how I will do it all before last Friday! As I'm not yet an expert on this topic, I don't know how I could have known all details a priori. Of course the project's documentation will evolve outside the GSoC project proposal, which cannot be changed anymore. > > Given the goal described here of an import with support for > automatically detecting branches, here are some rough steps I imagine > would be involved: > > . baseline: remote helper in C > > . option to import starting with a particular numbered revision. > This would be good practice for seeing how options passed to > "git clone -c" can be read from the config file. > > . option or URL schema to import a single project from a large > Subversion repository that houses several projects. 
This would > already be useful in practice since importing the entire Apache > Software Foundation repository takes a while which is a waste > when one only wants the history of the Subversion project. > > How should the importer handle Subversion copy commands that > refer to other projects in this case? > > . automatically detecting trunk when importing a project with the > standard layout. The trunk usually is not branched from elsewhere > so this does not require copyfrom info. Some design questions > come up here: should the remote helper import the entire project > tree, too? (I think "yes", since copy commands that copy from > other branches are very common and that would ensure the relevant > info is available to git.) What should the mapping of git commit > names to Subversion revision numbers that is stored in notes say > in this case? > > . detecting trunk and branches and exposing them as different remote > branches. This is a small step that just involves understanding > how remote helpers expose branches. > > . storing path properties and copyfrom information in the commits > produced by the vcs-svn/ library. How should these be stored? > For example, there could be a parallel directory structure > in the tree: > > foo/ > bar.c > baz/ > qux.c > .properties/ > foo.properties > foo/ > bar.c.properties > baz/ > qux.c.properties > > with properites for <path> stored at .properties/<path>.properties. > This strawman scheme doesn't work if the repository being imported > has any paths ending with ".properties", though. Ideas? > > . tracing history past branch creation events, using the now-saved > copyfrom information. > > . tracing second-parent history using svn:mergeinfo properties. > > In other words, in the above list the strategy is: > > 1. First convert the remote helper to C so it doesn't have to be > translated again later. > > 2. 
Teach the remote helper to import a single project from a > repository that houses multiple projects (i.e., path limiting). > > 3. Teach the remote helper to split an imported project that uses > the standard layout into branches (an application of the code > from (2)). This complicates the scheme for mapping between > Subversion revision numbers and git commit ids. > > 4. Teach the SVN dumpfile to fast-import stream converter not to > lose the information that is needed in order to get parenthood > information. > > 5. Use the information from step (4) to get parenthood right for a > project split into branches. > > 6. Getting the second parent right (i.e., merges). I mentioned > this for fun but I don't expect there to be time for it. > > Does that seem right, or does it need tweaks? How long would each > step take? Can the steps be subdivided into smaller steps? > > Another question is: what is the design for this? With the existing > remote-svn-alpha script, there are a few different components with > well defined interfaces: > > commands like "git fetch" > > | (1) > > transport-helper --- (2) --- git fast-import > > | (2, 3) | > > remote-svn-alpha | (3) > > | ''.. | > | > | (2) ''(2).. | > | > | ''.. | > > svnrdump --------- (3) -------- svn-fe > > (1) communicates using function calls and shared data > (2) launches > (3) communicates over pipe > > Once remote-svn-alpha is rewritten in C, the same structure is still > present, though it might be less obvious because some of the (2) > and (3) can change into (1). > > Where does the functionality you are adding fit into this picture? > Are there any new components being added, and if so what do they take > as input and output? I planned to implement a remote-helper using the existing interface specification to communicate over pipes with git's transport-helper. 
Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions directly from the remote-helper and place new functions in this directory (?). To communicate with svn, the remote-helper launches svnrdump as a subprocess. Additionally the remote-helper will read a configuration file containing additional information about branch-mapping, this should be closely related to Andrew's SBL. > > Hope that helps, > Jonathan > > > [1] https://github.com/flyingflo/git/wiki/ Florian ^ permalink raw reply [flat|nested] 46+ messages in thread
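As a rough illustration of the pipe interface described in the message above, here is a minimal Python sketch (not the planned C implementation) of how a helper might answer the `capabilities` and `list` commands that git's transport-helper writes to the helper's stdin. The ref name `refs/heads/master` and the function names `respond` and `serve` are assumptions made for this example; a real helper would also have to handle `import` and emit a fast-import stream.

```python
def respond(command):
    """Return the reply lines for one remote-helper command.

    Replies to `capabilities` and `list` are terminated by a blank line.
    """
    if command == "capabilities":
        # Advertise only `import`: the helper produces a fast-import stream.
        return ["import", ""]
    if command == "list":
        # "?" means the helper cannot give the ref's value up front.
        # The ref name is a placeholder for this sketch.
        return ["? refs/heads/master", ""]
    raise ValueError("unsupported command: %r" % command)


def serve(lines):
    """Answer a sequence of command lines; a blank line ends the session."""
    out = []
    for line in lines:
        command = line.rstrip("\n")
        if command == "":
            break
        out.extend(respond(command))
    return out
```

Keeping the command handling in a pure function like this makes it easy to test without spawning git; the real helper would read from stdin and write each reply line to stdout.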
* Re: GSOC Proposal draft: git-remote-svn 2012-04-11 19:20 ` Florian Achleitner @ 2012-04-11 19:44 ` Dmitry Ivankov 2012-04-11 19:53 ` Jonathan Nieder 1 sibling, 0 replies; 46+ messages in thread From: Dmitry Ivankov @ 2012-04-11 19:44 UTC (permalink / raw) To: Florian Achleitner Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers, Sverre Rabbelier On Thu, Apr 12, 2012 at 1:20 AM, Florian Achleitner <florian.achleitner2.6.31@gmail.com> wrote: > On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote: >> Hi, >> >> Florian Achleitner wrote: >> > Thanks for your inputs. I've now submitted a slightly updated version of >> > my proposal to google. Additionally it's on github [1]. >> > >> > Summary of diffs: >> > I'll concentrate on the fetching from svn, writing a remote helper >> > without branch detection (like svn-fe) first, and then creating the >> > branch mapper. >> Thanks for the update. >> >> If I understand correctly, the remote helper from the first half would >> do essentially the same thing as Dmitry's remote-svn-alpha script. >> Since in shell script form it is very simple, I don't think it should >> take more than a couple of days to write such a thing in C. > > If the remote-svn-alpha script is really all that needs to be done, you're > right. It just pipes through svn-fe. I thought svn-fe could only import an svn > repo initially, and there would be some difference between importing the whole > history and fetching new revisions later, (?). I've already forgotten the exact details, but svnrdump --incremental from r0 to rX and then from rX+1 to Y is the same (modulo small dump header) as from r0 to rY. And svn-fe is able to continue like this too (maybe some bits of this are not merged, sadly I've forgotten this too). A side note is that svnrdump can't do the same trick for rZ, Z>1 (that's shallow clone) starting point as --incremental may produce delta references to rX, X<Z. 
So svnrdump rZ..rY is OK, but it's impossible to continue it with svnrdump --incremental rY+1..rX. It is probably not too hard to fix from inside svnrdump (disable deltas against revisions older than a given threshold). Alternatively, if the helper becomes very smart about partial history import, it could be done from outside svnrdump, by issuing a new svnrdump request for the needed data and somehow gluing the results together.

>
>> Via
>> > Timeline
>> >
>> > GSoC timeline and summer holidays
>> > Summer holidays in Austria at 9th of July. So until the mid-term
>> > evaluations my git project will have co-exist with my regular
>> > university work and projects. But holidays extend until the beginning
>> > of October, so there’s some time left to catch up after the official
>> > end of GSoC.
>>
>> Another possibility that some people in similar situations have
>> followed is to start early. That works a little better since it means
>> that by the time midterm evaluations come around we can have a
>> reasonable idea of whether a change in strategy is needed for the
>> project to finished on time.
>>
>> > I plan to split the project in two parts:
>> >
>> > Writing the remote helper using existing functions in vcs-svn to
>> > import svn history without detecting branches, like svn-fe does.
>> > Milestone: 9th of July, GSoC mid-term
>> >
>> > Writing a branch mapper for the remote helper that reads the config
>> > language (SBL) and imports branches trying to deal as good as possible
>> > with all the little pitfalls that will occur. Milestone: 20th of
>> > August, GSoC end
>>
>> Could you flesh out this timeline more? Ideally it would be nice to
>> have a definite plan here, even to the point of listing what patches
>> would need to be written, so during the summer all that would need to
>> happen is to execute and deal with bugs as they come.
> > Listing patches and planing all details in the submitted proposal would > require me to know what I do and how I will do it all before last Friday! As > I'm not yet an expert on this topic, I don't know how I could have known all > details a-priori. > Of course the project's documentation will evolve outside the GSoC project > proposal, which cannot be changed anymore. > >> >> Given the goal described here of an import with support for >> automatically detecting branches, here are some rough steps I imagine >> would be involved: >> >> . baseline: remote helper in C >> >> . option to import starting with a particular numbered revision. >> This would be good practice for seeing how options passed to >> "git clone -c" can be read from the config file. >> >> . option or URL schema to import a single project from a large >> Subversion repository that houses several projects. This would >> already be useful in practice since importing the entire Apache >> Software Foundation repository takes a while which is a waste >> when one only wants the history of the Subversion project. >> >> How should the importer handle Subversion copy commands that >> refer to other projects in this case? >> >> . automatically detecting trunk when importing a project with the >> standard layout. The trunk usually is not branched from elsewhere >> so this does not require copyfrom info. Some design questions >> come up here: should the remote helper import the entire project >> tree, too? (I think "yes", since copy commands that copy from >> other branches are very common and that would ensure the relevant >> info is available to git.) What should the mapping of git commit >> names to Subversion revision numbers that is stored in notes say >> in this case? >> >> . detecting trunk and branches and exposing them as different remote >> branches. This is a small step that just involves understanding >> how remote helpers expose branches. >> >> . 
storing path properties and copyfrom information in the commits >> produced by the vcs-svn/ library. How should these be stored? >> For example, there could be a parallel directory structure >> in the tree: >> >> foo/ >> bar.c >> baz/ >> qux.c >> .properties/ >> foo.properties >> foo/ >> bar.c.properties >> baz/ >> qux.c.properties >> >> with properites for <path> stored at .properties/<path>.properties. >> This strawman scheme doesn't work if the repository being imported >> has any paths ending with ".properties", though. Ideas? >> >> . tracing history past branch creation events, using the now-saved >> copyfrom information. >> >> . tracing second-parent history using svn:mergeinfo properties. >> >> In other words, in the above list the strategy is: >> >> 1. First convert the remote helper to C so it doesn't have to be >> translated again later. >> >> 2. Teach the remote helper to import a single project from a >> repository that houses multiple projects (i.e., path limiting). >> >> 3. Teach the remote helper to split an imported project that uses >> the standard layout into branches (an application of the code >> from (2)). This complicates the scheme for mapping between >> Subversion revision numbers and git commit ids. >> >> 4. Teach the SVN dumpfile to fast-import stream converter not to >> lose the information that is needed in order to get parenthood >> information. >> >> 5. Use the information from step (4) to get parenthood right for a >> project split into branches. >> >> 6. Getting the second parent right (i.e., merges). I mentioned >> this for fun but I don't expect there to be time for it. >> >> Does that seem right, or does it need tweaks? How long would each >> step take? Can the steps be subdivided into smaller steps? >> >> Another question is: what is the design for this? 
With the existing >> remote-svn-alpha script, there are a few different components with >> well defined interfaces: >> >> commands like "git fetch" >> >> | (1) >> >> transport-helper --- (2) --- git fast-import >> >> | (2, 3) | >> >> remote-svn-alpha | (3) >> >> | ''.. | >> | >> | (2) ''(2).. | >> | >> | ''.. | >> >> svnrdump --------- (3) -------- svn-fe >> >> (1) communicates using function calls and shared data >> (2) launches >> (3) communicates over pipe >> >> Once remote-svn-alpha is rewritten in C, the same structure is still >> present, though it might be less obvious because some of the (2) >> and (3) can change into (1). >> >> Where does the functionality you are adding fit into this picture? >> Are there any new components being added, and if so what do they take >> as input and output? > > I planned to implement a remote-helper using the existing interface > specification to communicate over pipes with git's transport-helper. > Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions > directly from the remote-helper and place new functions in this directory (?). > To communicate with svn, the remote-helper launches svnrdump as a subprocess. > Additionally the remote-helper will read a configuration file containing > additional information about branch-mapping, this should be closely related to > Andrew's SBL. > >> >> Hope that helps, >> Jonathan >> >> > [1] https://github.com/flyingflo/git/wiki/ > > Florian ^ permalink raw reply [flat|nested] 46+ messages in thread
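Dmitry's point about continuing an import (dump r0..rX, then rX+1..rY with `--incremental`) can be sketched as a small function that builds the next `svnrdump dump` command line. This is only an illustration in Python: the function name `svnrdump_command` and the URL in the test are made up, and the sketch covers only the full-history case, not the shallow-clone case that Dmitry says does not work.

```python
def svnrdump_command(url, last_imported, head):
    """Build the argv for dumping the revisions after `last_imported`.

    `last_imported` is None for the initial import, which starts at r0.
    Returns None when there is nothing new to fetch.
    """
    start = 0 if last_imported is None else last_imported + 1
    if start > head:
        return None
    # --incremental makes the first dumped revision a delta against its
    # predecessor, so consecutive dumps can be fed to the importer in turn.
    return ["svnrdump", "dump", "--incremental",
            "-r", "%d:%d" % (start, head), url]
```

The helper would have to remember `last_imported` between runs, for example by recording the highest imported revision somewhere in the repository.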
* Re: GSOC Proposal draft: git-remote-svn 2012-04-11 19:20 ` Florian Achleitner 2012-04-11 19:44 ` Dmitry Ivankov @ 2012-04-11 19:53 ` Jonathan Nieder 2012-04-11 22:43 ` Andrew Sayers 2012-04-12 9:02 ` Thomas Rast 1 sibling, 2 replies; 46+ messages in thread From: Jonathan Nieder @ 2012-04-11 19:53 UTC (permalink / raw) To: Florian Achleitner Cc: Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov Florian Achleitner wrote: > If the remote-svn-alpha script is really all that needs to be done, you're > right. It just pipes through svn-fe. I thought svn-fe could only import an svn > repo initially, and there would be some difference between importing the whole > history and fetching new revisions later, (?). Yes, Dmitry's script (not the first version, but a later one) supports incremental imports without trouble if I remember correctly. [...] > Listing patches and planing all details in the submitted proposal would > require me to know what I do and how I will do it all before last Friday! As > I'm not yet an expert on this topic, I don't know how I could have known all > details a-priori. Oh, I didn't mean you would need to do that alone. :) Dmitry, David, Ram, Sverre, and I should be able to answer any questions you have about how git, vcs-svn, svnrdump, and the transport-helper currently work in the importer. I've marked the proposal editable to allow details to be filled in. [...] > I planned to implement a remote-helper using the existing interface > specification to communicate over pipes with git's transport-helper. > Instead of invoking svn-fe as a subprocess, I want to call vcs-svn/ functions > directly from the remote-helper and place new functions in this directory (?). Ah, this is a good place to start. In my diagram I lumped everything under vcs-svn/ together as svn-fe for convenience, but in fact the vcs-svn lib is made up of multiple components: caller . 
: | public interface (svndump_init, svndump_read, etc) | | | dump file parser (svndump_read body) | | | fast-export interface (fast_export_*, repo_*) --------- svndiff0 parser | : . git fast-import Each component has a narrow interface. For each action in the dump, svndump_read() calls some appropriate function from the fast-export interface to bring about the corresponding change on the git side. Details of svndump syntax and the state needed to parse it are isolated in svndump.c and details of fast-import syntax are in fast-export.c and repo_tree.c. (The structure used to be more complicated when the repo_* functions had to keep track of the repository state instead of relying on fast-import for that.) Where would the branch mapping go? What kind of state needs to be maintained as it occurs? What steps would I follow to imitate the code and work out a branch mapping on paper? How do I invoke the code if I want to try it out (i.e., what functions form the public interface needed to support branch mapping)? I don't expect you to have answers to all these questions already; I understand that getting used to what's already there and trying out ideas will take time. However, I do think we have a much better chance of this going well if there are answers to these questions by the time the coding period starts. [...] > Additionally the remote-helper will read a configuration file containing > additional information about branch-mapping, this should be closely related to > Andrew's SBL. That sounds reasonable to me. I am somewhat unconvinced (but convinceable) about the need to use a configuration scheme that handles all the edge cases right away. Shouldn't it be enough to tell the importer the following? 
- the path to the repository (from which it can deduce $SVNROOT and the path within there to the subproject of interest) - a single bit of information on top of that: "this repository uses the standard layout" Once that works, the tools could easily be tweaked to respect a configuration file that describes more complex situations, and as a bonus the SBL tools for making sense of those situations would have time to become more mature in the meantime. Thanks for some useful clarifications. Jonathan ^ permalink raw reply [flat|nested] 46+ messages in thread
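The two pieces of information Jonathan lists (the subproject path and the "standard layout" bit) are enough to sort repository paths into branches. A minimal Python sketch of that classification follows; the function name `classify` and the returned branch labels are assumptions for illustration, not part of vcs-svn.

```python
def classify(path, project=""):
    """Split a repository path into (branch, path inside the branch).

    `project` is the subproject directory within $SVNROOT ("" for the
    repository root). Returns None for paths outside the project or
    outside trunk/, branches/<name>/ and tags/<name>/.
    """
    parts = [p for p in path.split("/") if p]
    prefix = [p for p in project.split("/") if p]
    if parts[:len(prefix)] != prefix:
        return None
    parts = parts[len(prefix):]
    if parts[:1] == ["trunk"]:
        return "trunk", "/".join(parts[1:])
    if len(parts) >= 2 and parts[0] in ("branches", "tags"):
        return "/".join(parts[:2]), "/".join(parts[2:])
    return None
```

Anything that returns None here (other projects, or files placed directly under branches/) is exactly the kind of case a later configuration file would have to describe.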
* Re: GSOC Proposal draft: git-remote-svn 2012-04-11 19:53 ` Jonathan Nieder @ 2012-04-11 22:43 ` Andrew Sayers 2012-04-12 9:02 ` Thomas Rast 1 sibling, 0 replies; 46+ messages in thread From: Andrew Sayers @ 2012-04-11 22:43 UTC (permalink / raw) To: Jonathan Nieder Cc: Florian Achleitner, Git Mailing List, Ramkumar Ramachandra, David Barr, Sverre Rabbelier, Dmitry Ivankov On 11/04/12 20:53, Jonathan Nieder wrote: > [...] >> Additionally the remote-helper will read a configuration file containing >> additional information about branch-mapping, this should be closely related to >> Andrew's SBL. > > That sounds reasonable to me. I am somewhat unconvinced (but > convinceable) about the need to use a configuration scheme that > handles all the edge cases right away. Shouldn't it be enough to tell > the importer the following? > > - the path to the repository (from which it can deduce $SVNROOT > and the path within there to the subproject of interest) > > - a single bit of information on top of that: "this repository uses > the standard layout" > > Once that works, the tools could easily be tweaked to respect a > configuration file that describes more complex situations, and as a > bonus the SBL tools for making sense of those situations would have > time to become more mature in the meantime. SBL itself is just a plain text description of which directories are branches etc. - there are a handful of tricky bits on Florian's side of the fence, but it shouldn't be that hard to add everything necessary to parse any arbitrary SBL file. For example, if he gets an SBL action that looks like this: In r105, create branch "/foo" as "foo-bar" from "/bar/baz" r25 ... then the logic that produced that line doesn't really matter, so long as he can convert it to a series of fast-import commands. I started work on exporting branches from SVN a few months ago, and happened to be polishing off SBL when GSoC got going, so my work ties nicely into Florian's. 
I've been keen to talk about edge cases lately because that's the point I'm at in my work - to make a long story short, I know how to do the easy cases now, and need to veer off into some weird edge cases for a month or two, before swinging back by the standard layout and optimising for that. If Florian needs something that generates SBL before I'm ready, I'd be happy to cobble a basic "standard layout only" script from the modules I've got. - Andrew ^ permalink raw reply [flat|nested] 46+ messages in thread
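To make the "convert it to a series of fast-import commands" step concrete, here is a hedged Python sketch that parses the one SBL action shown above and emits a fast-import `reset`/`from` pair. The grammar is inferred only from that single example line, and the `mark_for` callback (which maps an SVN path and revision to a fast-import mark) is a hypothetical piece of bookkeeping the importer would have to provide.

```python
import re

# Matches: In r105, create branch "/foo" as "foo-bar" from "/bar/baz" r25
# The "as" and "from" parts are treated as optional.
ACTION = re.compile(
    r'^In r(?P<rev>\d+), create branch "(?P<path>[^"]+)"'
    r'(?: as "(?P<name>[^"]+)")?'
    r'(?: from "(?P<src>[^"]+)" r(?P<src_rev>\d+))?$'
)


def parse_create(line):
    """Parse a "create branch" action; return None if the line differs."""
    m = ACTION.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    d["rev"] = int(d["rev"])
    if d["src_rev"] is not None:
        d["src_rev"] = int(d["src_rev"])
    if d["name"] is None:
        # Without "as", fall back to the SVN path as the branch name.
        d["name"] = d["path"].strip("/")
    return d


def to_fast_import(action, mark_for):
    """Emit fast-import commands that create the branch ref."""
    lines = ["reset refs/heads/%s" % action["name"]]
    if action["src"] is not None:
        lines.append("from :%d" % mark_for(action["src"], action["src_rev"]))
    lines.append("")
    return "\n".join(lines)
```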
* Re: GSOC Proposal draft: git-remote-svn 2012-04-11 19:53 ` Jonathan Nieder 2012-04-11 22:43 ` Andrew Sayers @ 2012-04-12 9:02 ` Thomas Rast 1 sibling, 0 replies; 46+ messages in thread From: Thomas Rast @ 2012-04-12 9:02 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Florian Achleitner, Git Mailing List, Jeff King [clean up Cc; +Peff] Jonathan Nieder <jrnieder@gmail.com> writes: > I've marked the proposal editable to allow details to be filled in. That went wrong, or somebody toggled it back. Since nobody objected here, I'm assuming it was fine, and set it to editable again. -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: GSOC Proposal draft: git-remote-svn 2012-04-10 17:17 ` Jonathan Nieder ` (2 preceding siblings ...) 2012-04-11 19:20 ` Florian Achleitner @ 2012-04-12 15:28 ` Florian Achleitner 2012-04-12 22:30 ` Andrew Sayers 2012-04-13 19:19 ` Jonathan Nieder 2012-04-18 20:16 ` Florian Achleitner 4 siblings, 2 replies; 46+ messages in thread From: Florian Achleitner @ 2012-04-12 15:28 UTC (permalink / raw) To: Jonathan Nieder Cc: Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers, Sverre Rabbelier, Dmitry Ivankov Hi! Let's discuss the details as suggested by Jonathan! I will collect them in the wiki, leading to a more elaborated project plan at the end. It's rather hard to keep an overview over all the issues and pitfalls that may exist, and over all the existing discussions, and whether there was an solution or the issue is still unsolved. So I want to create some collection of information with your support. On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote: > Given the goal described here of an import with support for > automatically detecting branches, here are some rough steps I imagine > would be involved: > > . baseline: remote helper in C > > . option to import starting with a particular numbered revision. > This would be good practice for seeing how options passed to > "git clone -c" can be read from the config file. Really -c? My installed git doesn't have that switch. Should it pass arguments to the remote-helper? > > . option or URL schema to import a single project from a large > Subversion repository that houses several projects. This would > already be useful in practice since importing the entire Apache > Software Foundation repository takes a while which is a waste > when one only wants the history of the Subversion project. > > How should the importer handle Subversion copy commands that > refer to other projects in this case? Jonathan tried that, it's handled by svnrdump nicely. > > . 
automatically detecting trunk when importing a project with the > standard layout. The trunk usually is not branched from elsewhere > so this does not require copyfrom info. Some design questions > come up here: should the remote helper import the entire project > tree, too? (I think "yes", since copy commands that copy from > other branches are very common and that would ensure the relevant > info is available to git.) What should the mapping of git commit > names to Subversion revision numbers that is stored in notes say > in this case? What does it mean, "import the entire project tree"? Importing other directories than "trunk"? About the mapping of git commits to svn refs .. I've seen the thread about the marks-to-notes converter. But can somebody please explain what it's for? There is this mark file mentioned in the git-fast-import help page .. Do we create two commits from one revision if it's some special case, like modifying two branches at once? > > . detecting trunk and branches and exposing them as different remote > branches. This is a small step that just involves understanding > how remote helpers expose branches. > > . storing path properties and copyfrom information in the commits > produced by the vcs-svn/ library. How should these be stored? > For example, there could be a parallel directory structure > in the tree: > > foo/ > bar.c > baz/ > qux.c > .properties/ > foo.properties > foo/ > bar.c.properties > baz/ > qux.c.properties > > with properites for <path> stored at .properties/<path>.properties. > This strawman scheme doesn't work if the repository being imported > has any paths ending with ".properties", though. Ideas? This includes collecting which metadata we actually need to store? We could probably collect a list of important svn properties. Is there a general policy how to store additional metadata for git's helpers? I guess it would live somewhere in the .git dir. (.git/info/ ?) 
Dmitry mentioned the case where a git repository that fetched from svn is cloned, and the cloned repo should be able to fetch from svn too. Is there an existing concept about metadata in this case?

I'm not sure if storing this in a separate directory tree makes sense, mostly looking at performance. All these files will only contain a few bytes, I guess. Andrew, why did you choose JSON?

>
> . tracing history past branch creation events, using the now-saved
> copyfrom information.
>
> . tracing second-parent history using svn:mergeinfo properties.

This is about detecting when to create a git merge commit, right?

>
> In other words, in the above list the strategy is:

.. still to come..

Florian

^ permalink raw reply [flat|nested] 46+ messages in thread
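The strawman quoted above stores properties for <path> at .properties/<path>.properties and is said to break for paths ending in ".properties". One way this can go wrong (an interpretation, not something spelled out in the thread) is that a properties file and a directory end up needing the same name. The Python sketch below detects that clash; `props_path` and `collisions` are hypothetical helper names.

```python
import posixpath


def props_path(path):
    """Location of the properties file for `path` in the strawman scheme."""
    return posixpath.join(".properties", path + ".properties")


def collisions(paths):
    """Return pairs (a, b) where a's properties file would also have to
    be a directory in order to hold b's properties file."""
    files = {props_path(p): p for p in paths}
    found = []
    for p in paths:
        parent = posixpath.dirname(props_path(p))
        while parent and parent != ".properties":
            if parent in files:
                found.append((files[parent], p))
            parent = posixpath.dirname(parent)
    return found
```

For example, a file `foo` and a directory `foo.properties/` containing `bar.c` both need `.properties/foo.properties`, once as a file and once as a directory.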
* Re: GSOC Proposal draft: git-remote-svn 2012-04-12 15:28 ` Florian Achleitner @ 2012-04-12 22:30 ` Andrew Sayers 2012-04-14 20:09 ` Florian Achleitner 2012-04-13 19:19 ` Jonathan Nieder 1 sibling, 1 reply; 46+ messages in thread From: Andrew Sayers @ 2012-04-12 22:30 UTC (permalink / raw) To: Florian Achleitner Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra, David Barr, Sverre Rabbelier, Dmitry Ivankov [-- Attachment #1: Type: text/plain, Size: 3801 bytes --] On 12/04/12 16:28, Florian Achleitner wrote: > > I'm not sure if storing this in a seperate directory tree makes sense, mostly > looking at performance. All these files will only contain some bytes, I guess. > Andrew, why did you choose JSON? > JSON has become my default storage format in recent years, so it seemed like the natural thing to use for a format I wanted to chuck in and get on with my work :) JSON is my default format because it's reasonably space-efficient, human-readable, widely supported and can represent everything I care about except recursive data structures (which I didn't need for this job). You can do cleverer things if you don't mind being language-specific (e.g. Perl's "Storable" module supports recursive data structures but can't be used with other languages) or if you don't mind needing special tools (e.g. git's index is highly efficient but can't be debugged with `less`). I've found you won't go far wrong if you start with JSON and pick something else when the requirements become more obvious. I gzipped the file because JSON isn't *that* space-efficient, and because very large repositories are likely to produce enough JSON that people will notice. I found that gzipping the file significantly reduced its size without having too much effect on run time. I've attached a sample file representing the first few commits from the GNU R repository. 
The problem I referred to obliquely before isn't with JSON, but with gzip - how would you add more revisions to the end of the file without gunzipping it, adding one line, then gzipping it again? One very nice feature of a directory structure is that you could store it in git and get all that stuff for free.

To be clear, I'm not pushing any particular solution to this problem, just offering some anecdotal evidence. I'm pretty sure that SVN branch export is an I/O bound problem - David Barr has said much the same about svn-fe, but I was surprised to see it was still the bottleneck with a problem that stripped out almost all the data from the dump and pushed it through not-particularly-optimised Perl. Having said that, the initial import problem (potentially hundreds of thousands of revisions needing manual attention) doesn't necessarily want the same solution as update (tens of revisions that can almost always be read automatically).

>> . tracing history past branch creation events, using the now-saved
>> copyfrom information.
>>
>> . tracing second-parent history using svn:mergeinfo properties.
>
> This is about detection when to create a git merge-commit, right?

Yes - SVN has always stored metadata about where a directory was copied from (unlike git, which prefers to detect it automatically), and since version 1.5, SVN has added "svn:mergeinfo" metadata to files and directories specifying which revisions of which other files or directories have been cherry-picked into them.

If you know a directory is a branch, "copyfrom" metadata is a very useful signal for detecting branches created from it. Unfortunately, "svn:mergeinfo" is not as useful - aside from anything else, older repositories often exhibit a period where there's no metadata at all, then a gradual migration through SVN's early experiments with merge tracking (like svnmerge.py), before everyone gradually standardises on svn:mergeinfo and leaves the other tools behind.
Oh, and the interface doesn't tell you about unmerged revisions, so if anybody ever forgets to merge a revision then you'll probably never notice. I'm planning to tackle this stuff in the work I'm doing, but I expect people will be reporting edge cases until the day the last SVN repository shuts down. You shouldn't need to worry about it much on the git side of SBL, which is probably best for your sanity ;) - Andrew [-- Attachment #2: repo.json.gz --] [-- Type: application/x-gzip, Size: 466 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
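On the question of appending to a gzipped file: the gzip format allows several compressed members to be concatenated, so one possible approach (an alternative offered here, not what Andrew's tool does) is to write each new batch of revisions as an extra member. The Python sketch below assumes one JSON object per line; the record layout is invented for the example.

```python
import gzip
import json


def append_revision(path, record):
    """Append one JSON record as a new gzip member at the end of `path`."""
    with gzip.open(path, "at", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def read_revisions(path):
    """Read every record back; the members are read as one stream."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

The trade-off is that very small members compress poorly, so this suits the "tens of revisions per update" case better than appending one revision at a time to a large initial import.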
* Re: GSOC Proposal draft: git-remote-svn
  2012-04-12 22:30 ` Andrew Sayers
@ 2012-04-14 20:09 ` Florian Achleitner
  2012-04-14 21:35 ` Andrew Sayers
  0 siblings, 1 reply; 46+ messages in thread
From: Florian Achleitner @ 2012-04-14 20:09 UTC
To: Andrew Sayers
Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra, David Barr,
    Sverre Rabbelier, Dmitry Ivankov

Hi!

Thanks for your explanations.

On Thursday 12 April 2012 23:30:29 Andrew Sayers wrote:
> On 12/04/12 16:28, Florian Achleitner wrote:
> > I'm not sure if storing this in a separate directory tree makes sense,
> > mostly looking at performance. All these files will only contain some
> > bytes, I guess. Andrew, why did you choose JSON?
>
> JSON has become my default storage format in recent years, so it seemed
> like the natural thing to use for a format I wanted to chuck in and get
> on with my work :)
>
> JSON is my default format because it's reasonably space-efficient,
> human-readable, widely supported and can represent everything I care
> about except recursive data structures (which I didn't need for this
> job).  You can do cleverer things if you don't mind being
> language-specific (e.g. Perl's "Storable" module supports recursive data
> structures but can't be used with other languages) or if you don't mind
> needing special tools (e.g. git's index is highly efficient but can't be
> debugged with `less`).  I've found you won't go far wrong if you start
> with JSON and pick something else when the requirements become more
> obvious.
>
> I gzipped the file because JSON isn't *that* space-efficient, and
> because very large repositories are likely to produce enough JSON that
> people will notice.  I found that gzipping the file significantly
> reduced its size without having too much effect on run time.
>
> I've attached a sample file representing the first few commits from the
> GNU R repository.  The problem I referred to obliquely before isn't with
> JSON, but with gzip - how would you add more revisions to the end of the
> file without gunzipping it, adding one line, then gzipping it again?
> One very nice feature of a directory structure is that you could store
> it in git and get all that stuff for free.
>
> To be clear, I'm not pushing any particular solution to this problem,
> just offering some anecdotal evidence.  I'm pretty sure that SVN branch
> export is an I/O bound problem - David Barr has said much the same about
> svn-fe, but I was surprised to see it was still the bottleneck with a
> problem that stripped out almost all the data from the dump and pushed
> it through not-particularly-optimised Perl.  Having said that, the
> initial import problem (potentially hundreds of thousands of revisions
> needing manual attention) doesn't necessarily want the same solution as
> update (tens of revisions that can almost always be read automatically).

JSON seems to be a good initial choice..

> >> . tracing history past branch creation events, using the now-saved
> >>   copyfrom information.
> >>
> >> . tracing second-parent history using svn:mergeinfo properties.
> >
> > This is about detection when to create a git merge-commit, right?
>
> Yes - SVN has always stored metadata about where a directory was copied
> from (unlike git, which prefers to detect it automatically), and since
> version 1.5, SVN has added "svn:mergeinfo" metadata to files and
> directories specifying which revisions of which other files or
> directories have been cherry-picked in to them.
>
> If you know a directory is a branch, "copyfrom" metadata is a very
> useful signal for detecting branches created from it.  Unfortunately,
> "svn:mergeinfo" is not as useful - aside from anything else, older
> repositories often exhibit a period where there's no metadata at all,
> then a gradual migration through SVN's early experiments with merge
> tracking (like svnmerge.py), before everyone gradually standardises on
> svn:mergeinfo and leaves the other tools behind.  Oh, and the interface
> doesn't tell you about unmerged revisions, so if anybody ever forgets to
> merge a revision then you'll probably never notice.

This doesn't look very straightforward. In the svn docs they say there is
a command that outputs which changesets are eligible to merge.
http://svnbook.red-bean.com/en/1.7/svn.branchmerge.basicmerging.html#svn.branchmerge.basicmerging.mergeinfo
But I don't know if that helps.

> I'm planning to tackle this stuff in the work I'm doing, but I expect
> people will be reporting edge cases until the day the last SVN
> repository shuts down.  You shouldn't need to worry about it much on the
> git side of SBL, which is probably best for your sanity ;)

:)

>
> - Andrew
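On the gzip question quoted above ("how would you add more revisions to
the end of the file without gunzipping it"): the gzip format allows
several compressed members to be concatenated in one file, and readers
treat them as a single stream. The following is only a minimal sketch of
that idea, not what Andrew's tool does. It assumes the metadata is stored
as one JSON object per line; the file name and the record fields ("rev",
"action", "branch") are invented for illustration.

```python
import gzip
import json
import os
import tempfile


def append_revision(path, record):
    """Append one JSON record as a new gzip member.

    Opening in "ab" mode adds a fresh compressed member at the end of the
    file, so the data already on disk is neither read nor rewritten.
    """
    with gzip.open(path, "ab") as fh:
        fh.write((json.dumps(record) + "\n").encode("utf-8"))


def read_revisions(path):
    """Read all records back; the reader crosses member boundaries itself."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def demo():
    # Hypothetical file name and record layout, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "branches.json.gz")
    append_revision(path, {"rev": 1, "action": "create", "branch": "trunk"})
    append_revision(path, {"rev": 2, "action": "create", "branch": "branches/foo"})
    return read_revisions(path)
```

The trade-off is that every appended member carries its own header and
compresses on its own, so many tiny appends compress worse than one large
file; an occasional full rewrite would probably still be wanted.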
* Re: GSOC Proposal draft: git-remote-svn
  2012-04-14 20:09 ` Florian Achleitner
@ 2012-04-14 21:35 ` Andrew Sayers
  2012-04-15  3:13 ` Stephen Bash
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Sayers @ 2012-04-14 21:35 UTC
To: Florian Achleitner
Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra, David Barr,
    Sverre Rabbelier, Dmitry Ivankov

On a slightly different topic, here's the only branching edge case I
know of that will affect you.  I agree with Jonathan that you should
focus on the standard layout for now, but I think it's worth having the
trickier cases in your head when you're planning things out.

Imagine a team does this:

	# Slight misunderstanding of the standard layout at first:
	mkdir -p trunk/project1 trunk/project2
	svn add trunk
	svn ci -m "Initial revision" # r1

	# Time passes, commits are made, people get smarter.

	# In revision 1000, the team decides to put the structure right:
	svn rm trunk
	svn ci -m "Removed incorrect directory name" # r1000
	mkdir trunk
	touch trunk/MOVED_TO_PROJECT_TRUNK
	svn add trunk
	svn ci -m "Added signpost file for future reference" # r1001
	svn mkdir project1 project2
	# Copy from the repository as it was in r999, since the old
	# paths no longer exist in the working copy:
	svn cp ^/trunk/project1@999 project1/trunk
	svn cp ^/trunk/project2@999 project2/trunk
	svn ci -m "Recreated projects with correct directory names" # r1002

This would be represented in SBL something like:

	In r1, create branch "trunk/project1"
	In r1, create branch "trunk/project2"
	# We would prefer just to deactivate these...
	In r1000, deactivate "trunk/project1"
	In r1000, deactivate "trunk/project2"
	# ... but we have to delete them,
	# because git doesn't support recursive branch names:
	In r1001, delete branch "trunk/project1"
	In r1001, delete branch "trunk/project2"
	In r1001, create branch "trunk"
	# We deleted the branches, so how do we get the commit to fork from?
	In r1002, create branch "project1/trunk" from "trunk/project1" r999
	In r1002, create branch "project2/trunk" from "trunk/project2" r999

If you look in your ".git/refs/heads/" directory, you'll see git
branches are stored as files on disk.  So if you have a branch
"trunk/project1", you can't create a branch called "trunk" unless you
delete the directory called "trunk" first.  This unfortunate limitation
of an otherwise neat solution means you can't reliably use git branches
when retrieving older revisions.

Other people will be able to tell you if there's any interest in
removing this limitation, but even if there is, users will occasionally
change their mind after asking for a branch to be deleted, and be
surprised if SVN lets them but git doesn't.

One solution you could look at would be storing dead branches in a JSON
file somewhere.  If you go down that route, remember that `git gc` will
try to garbage collect the commits once the branches have been dead for
long enough.

	- Andrew
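The branch-name limitation described above can be reproduced without git
at all, because it comes down to a path being unable to be a file and a
directory at once. The sketch below only imitates a one-file-per-ref
layout in a temporary directory; the helper name create_loose_ref and
the placeholder commit ids are made up, and real git performs its own
checks rather than relying on the filesystem error shown here.

```python
import os
import tempfile


def create_loose_ref(refs_dir, name, commit_id):
    """Store a ref as one file per name, mimicking a refs/heads/ layout."""
    path = os.path.join(refs_dir, *name.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(commit_id + "\n")


def demo():
    refs_dir = os.path.join(tempfile.mkdtemp(), "refs", "heads")
    # "trunk/project1" forces "trunk" to be a directory ...
    create_loose_ref(refs_dir, "trunk/project1", "1" * 40)
    try:
        # ... so "trunk" cannot also be written as a file.
        create_loose_ref(refs_dir, "trunk", "2" * 40)
    except OSError:
        return "conflict"
    return "created"
```

Calling demo() reports the conflict for the second ref, which is the
same situation as r1001 in the SBL listing: "trunk" can only be created
once "trunk/project1" and "trunk/project2" are gone.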
* Re: GSOC Proposal draft: git-remote-svn
  2012-04-14 21:35 ` Andrew Sayers
@ 2012-04-15  3:13 ` Stephen Bash
  0 siblings, 0 replies; 46+ messages in thread
From: Stephen Bash @ 2012-04-15 3:13 UTC
To: Andrew Sayers
Cc: Jonathan Nieder, Git Mailing List, Ramkumar Ramachandra, David Barr,
    Sverre Rabbelier, Dmitry Ivankov, Florian Achleitner

----- Original Message -----
> From: "Andrew Sayers" <andrew-git@pileofstuff.org>
> Sent: Saturday, April 14, 2012 5:35:59 PM
> Subject: Re: GSOC Proposal draft: git-remote-svn
>
> ... snip ...
>
> One solution you could look at would be storing dead branches in a
> JSON file somewhere. If you go down that route, remember that `git
> gc` will try to garbage collect the commits once the branches have
> been dead for long enough.

I don't remember if this has already been discussed, but as I see it
there are basically three approaches to closed/deleted SVN branches in
the Git world:

  1) Just delete the branch, allow git gc to later clean up the objects
  2) Just leave them be for the user to deal with at a later date
  3) Move them to another namespace

I think (3) is the only semi-tricky one.  If you read the git-gc
manpage, it turns out gc will consider any object reachable from any ref
under refs/ as safe.  When cloning/pushing/pulling/etc. git only looks
at refs/heads and refs/tags (unless told otherwise).  So for our
conversion I created refs/hidden/heads and refs/hidden/tags (other
choices could be refs/svn or refs/junk, but you get the idea).

Just as a fun stat, the hidden namespace in our central repo has 280
refs in it vs 502 in the visible/normal namespace (surprisingly the
hidden ones are almost perfectly split with 138 dead Subversion branches
and 142 SVN tags that were later retagged/committed to).

Thanks,
Stephen
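Approach (3) above can be sketched as a small name-mapping step. This is
a hedged illustration, not Stephen's actual conversion script: the
function names are invented, and the code only builds the argument lists
for two `git update-ref` invocations (create the hidden ref pointing at
the old ref's commit, then delete the old ref) without running them.

```python
def hidden_ref_name(ref):
    """Map refs/heads/X or refs/tags/X to the refs/hidden/ namespace."""
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            return "refs/hidden/" + ref[len("refs/"):]
    raise ValueError("not a branch or tag ref: %r" % ref)


def retire_commands(ref):
    """Return the git commands (as argv lists) that would hide a dead ref."""
    hidden = hidden_ref_name(ref)
    return [
        # Point the hidden ref at whatever the old ref currently names.
        ["git", "update-ref", hidden, ref],
        # Then remove the visible ref.
        ["git", "update-ref", "-d", ref],
    ]
```

For example, retire_commands("refs/heads/old-feature") yields commands
that leave the commits reachable from refs/hidden/heads/old-feature, so
they stay safe from gc while no longer showing up as a normal branch.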
* Re: GSOC Proposal draft: git-remote-svn
  2012-04-12 15:28 ` Florian Achleitner
  2012-04-12 22:30 ` Andrew Sayers
@ 2012-04-13 19:19 ` Jonathan Nieder
  2012-04-14 20:15 ` Florian Achleitner
  1 sibling, 1 reply; 46+ messages in thread
From: Jonathan Nieder @ 2012-04-13 19:19 UTC
To: Florian Achleitner
Cc: Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers,
    Sverre Rabbelier, Dmitry Ivankov, Tomas Carnecky

Hi again,

Florian Achleitner wrote:

> So I want to create some collection of information with your support.

Sounds like a plan.  Thanks, Florian.

[...]
> Really -c? My installed git doesn't have that switch. Should it pass arguments
> to the remote-helper?

What git version do you use?  "man git clone" tells me that -c is an
abbreviation for --config and "grep -e --config Documentation/RelNotes/*"
tells me it was introduced in v1.7.7.

[...]
> On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:

>> How should the importer handle Subversion copy commands that
>> refer to other projects in this case?
>
> Jonathan tried that, it's handled by svnrdump nicely.

Yes, except that it does not follow the history of the copy source.
So if your project was renamed, then "svnrdump <new SVN URL for the
project>" will dump a fictional history in which the first rev under
the new name created the project out of thin air.

That is not ideal, but it seems tolerable in the short term.

>> Some design questions
>> come up here: should the remote helper import the entire project
>> tree, too?  (I think "yes", since copy commands that copy from
>> other branches are very common and that would ensure the relevant
>> info is available to git.)  What should the mapping of git commit
>> names to Subversion revision numbers that is stored in notes say
>> in this case?
>
> What does it mean, "import the entire project tree"? Importing other
> directories than "trunk"?

Yes.  For an import that is going to be dumping the subdirectories of
tags/ and branches/ anyway, it seems sensible to ask svnrdump to dump
the entire {trunk,tags,branches} hierarchy and sort it out on the git
side.  The question is then: for each rev, in addition to making
commits for each branch that changed, should we keep a commit
representing the state of the combined whole-project tree for internal
use?  A person trying to check out this commit would get to see the
enormous

	trunk/
	tags/
	branches/

directories.  My rough answer was "yes, it's convenient to keep that
information around, especially given that with git's repository model
it doesn't waste a lot of space and makes debugging easier".

> About the mapping of git commits to svn refs .. I've seen the thread about the
> marks-to-notes converter.
> But can somebody please explain what it's for? There is this mark file
> mentioned in the git-fast-import help page ..

There are two operations that need to be very fast:

 1. Given a Subversion revision number, what is the corresponding git
    commit?  svn-fe uses this to get the preimage data when executing
    an "svn copy" operation that refers to an old rev.  For example:

	svn copy some/path@a-long-time-ago another/path

    Code tracking branches would use this same map to find the
    appropriate parent commit for a new branch.  For example:

	svn copy trunk@a-long-time-ago branches/new-branch

    becomes:

	parent f572d396fae9206628714fb2ce00f72e94f2258f

 2. Given a git commit, what is the corresponding Subversion revision
    number?  For example, "git fetch" needs this information in order
    to get a first unfetched revision number when updating an existing
    clone of a Subversion repository.

"git notes" is a mechanism for efficiently storing a mapping from git
commit names to arbitrary data.  For example, it can be used to cache
the compiled form of some slow-to-compile source code, or it can be
used to store reminders to a human that has reviewed these commits and
wanted to scribble a little in the margin.  A patch (in Dmitry's tree,
not in git.git yet) teaches svn-fe to use the notes facility to store
the mapping from git commit names to Subversion revision numbers,
addressing problem (2) above.  Tomas's human-friendly importer used
the same trick.

As you noticed, "git fast-import" has a facility that fits well for
mapping in the other direction: a marks file can store an arbitrary
mapping from numbers to objects (usually objects that were part of the
import).  svn-fe writes a mark for each Subversion revision it imports
to address problem (1) above.

Because "git notes" are stored in the git object db as native objects,
they can be shared using the usual "git fetch" / "git push" commands
as long as you specify the appropriate source and destination refs on
the command line or in git's configuration file.  Commands like "git
rebase" that modify history also have some support for carrying notes
along.  By contrast, a marks file is just a flat text file and there
is no standard facility for updating it when commit names change or
sharing it using ordinary git transport.

The marks-to-notes converter I wrote was a toy to show how the notes
and marks can easily be kept in sync.  If I remember correctly the
last time this was discussed there was some feeling that when the two
tables fall out of sync the notes should be considered authoritative
and marks can be recomputed from them.

> Do we create two commits from one revision if it's some special case, like
> modifying two branches at once?

remote-svn-alpha and svn-fe do not currently split by branch at all so
the problem doesn't come up.

Yes, I think the only sane way to represent a Subversion revision that
modifies multiple branches is with a git commit on each branch.

[...]
>> For example, there could be a parallel directory structure
>> in the tree:
>>
>>	foo/
>>		bar.c
>>	baz/
>>		qux.c
>>	.properties/
>>		foo.properties
>>		foo/
>>			bar.c.properties
>>		baz/
>>			qux.c.properties
>>
>> with properties for <path> stored at .properties/<path>.properties.
>> This strawman scheme doesn't work if the repository being imported
>> has any paths ending with ".properties", though.  Ideas?
>
> This includes collecting which metadata we actually need to store? We could
> probably collect a list of important svn properties.

I imagined the importer just collecting all path properties, like "git
svn" does in its .git/svn/refs/remotes/git-svn/unhandled.log.  They're
easy to iterate through on the svn side.

> Is there a general policy how to store additional metadata for git's helpers?
> I guess it would live somewhere in the .git dir. (.git/info/ ?)

One simple design would be to keep properties in the "entire project"
commit objects for internal use, since that's easy to share.

I think David had a few other ideas. ;-)

[...]
>> . tracing second-parent history using svn:mergeinfo properties.
>
> This is about detection when to create a git merge-commit, right?

Yep.  A goal would be for a person to be able to push a git merge to
an svn repository, fetch from another machine, and get the same commit
back.

>> In other words, in the above list the strategy is:
>
> .. still to come..

Thanks for your thoughtfulness.

Jonathan
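The strawman ".properties/" layout quoted above, and one way it can
clash with real paths, can be sketched as follows. The message does not
spell out the failure, so the collision shown here (a directory named
"foo.properties" sitting next to a path "foo") is an assumed example of
it, and both function names are invented for illustration.

```python
def properties_path(path):
    """Where the strawman scheme would store the properties of <path>."""
    return ".properties/" + path.rstrip("/") + ".properties"


def find_collisions(paths):
    """Find pairs of paths whose property files cannot coexist.

    A clash occurs when one property file's location would also have to
    be a directory holding another path's property file.
    """
    mapped = {properties_path(p): p for p in paths}
    collisions = []
    for target, source in mapped.items():
        prefix = target + "/"
        for other_target, other_source in mapped.items():
            if other_target.startswith(prefix):
                collisions.append((source, other_source))
    return sorted(collisions)
```

With the tree from the message, find_collisions(["foo", "foo/bar.c",
"baz/qux.c"]) finds nothing, while adding a path such as
"foo.properties/bar.c" makes ".properties/foo.properties" need to be
both a file and a directory.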
* Re: GSOC Proposal draft: git-remote-svn
  2012-04-13 19:19 ` Jonathan Nieder
@ 2012-04-14 20:15 ` Florian Achleitner
  0 siblings, 0 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-04-14 20:15 UTC
To: Jonathan Nieder
Cc: Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers,
    Sverre Rabbelier, Dmitry Ivankov, Tomas Carnecky

Hi!

Thanks for your help. I updated the wiki page.

On Friday 13 April 2012 14:19:08 Jonathan Nieder wrote:
> > Really -c? My installed git doesn't have that switch. Should it pass
> > arguments to the remote-helper?
>
> What git version do you use?  "man git clone" tells me that -c is an
> abbreviation for --config and "grep -e --config Documentation/RelNotes/*"
> tells me it was introduced in v1.7.7.

Sorry, that was clumsy, I should build and search the docs of the current
version, not the one my distro ships!

> > What does it mean, "import the entire project tree"? Importing other
> > directories than "trunk"?
>
> Yes.  For an import that is going to be dumping the subdirectories of
> tags/ and branches/ anyway, it seems sensible to ask svnrdump to dump
> the entire {trunk,tags,branches} hierarchy and sort it out on the git
> side.  The question is then: for each rev, in addition to making
> commits for each branch that changed, should we keep a commit
> representing the state of the combined whole-project tree for internal
> use?  A person trying to check out this commit would get to see the
> enormous
>
>	trunk/
>	tags/
>	branches/
>
> directories.  My rough answer was "yes, it's convenient to keep that
> information around, especially given that with git's repository model
> it doesn't waste a lot of space and makes debugging easier".

Sounds reasonable.

> > About the mapping of git commits to svn refs .. I've seen the thread
> > about the marks-to-notes converter.
> > But can somebody please explain what it's for? There is this mark file
> > mentioned in the git-fast-import help page ..
>
> There are two operations that need to be very fast:
>
>  1. Given a Subversion revision number, what is the corresponding git
>     commit?  svn-fe uses this to get the preimage data when executing
>     an "svn copy" operation that refers to an old rev.  For example:
>
>	svn copy some/path@a-long-time-ago another/path
>
>     Code tracking branches would use this same map to find the
>     appropriate parent commit for a new branch.  For example:
>
>	svn copy trunk@a-long-time-ago branches/new-branch
>
>     becomes:
>
>	parent f572d396fae9206628714fb2ce00f72e94f2258f
>
>  2. Given a git commit, what is the corresponding Subversion revision
>     number?  For example, "git fetch" needs this information in order
>     to get a first unfetched revision number when updating an existing
>     clone of a Subversion repository.
>
> "git notes" is a mechanism for efficiently storing a mapping from git
> commit names to arbitrary data.  For example, it can be used to cache
> the compiled form of some slow-to-compile source code, or it can be
> used to store reminders to a human that has reviewed these commits and
> wanted to scribble a little in the margin.  A patch (in Dmitry's tree,
> not in git.git yet) teaches svn-fe to use the notes facility to store
> the mapping from git commit names to Subversion revision numbers,
> addressing problem (2) above.  Tomas's human-friendly importer used
> the same trick.
>
> As you noticed, "git fast-import" has a facility that fits well for
> mapping in the other direction: a marks file can store an arbitrary
> mapping from numbers to objects (usually objects that were part of the
> import).  svn-fe writes a mark for each Subversion revision it imports
> to address problem (1) above.
>
> Because "git notes" are stored in the git object db as native objects,
> they can be shared using the usual "git fetch" / "git push" commands
> as long as you specify the appropriate source and destination refs on
> the command line or in git's configuration file.  Commands like "git
> rebase" that modify history also have some support for carrying notes
> along.  By contrast, a marks file is just a flat text file and there
> is no standard facility for updating it when commit names change or
> sharing it using ordinary git transport.
>
> The marks-to-notes converter I wrote was a toy to show how the notes
> and marks can easily be kept in sync.  If I remember correctly the
> last time this was discussed there was some feeling that when the two
> tables fall out of sync the notes should be considered authoritative
> and marks can be recomputed from them.

Oh, that's interesting, I hadn't heard of git notes yet. (I should have
grepped the Documentation ..).
Because of the possibility that one revision is transformed into two
commits, the bi-directional mapping has to support 1-to-n or probably
n-to-n mappings, I think. But this should be possible with these
mechanisms.

> > Do we create two commits from one revision if it's some special case,
> > like modifying two branches at once?
>
> remote-svn-alpha and svn-fe do not currently split by branch at all so
> the problem doesn't come up.
>
> Yes, I think the only sane way to represent a Subversion revision that
> modifies multiple branches is with a git commit on each branch.
>
> [...]
>
> >> For example, there could be a parallel directory structure
> >> in the tree:
> >>
> >>	foo/
> >>		bar.c
> >>	baz/
> >>		qux.c
> >>	.properties/
> >>		foo.properties
> >>		foo/
> >>			bar.c.properties
> >>		baz/
> >>			qux.c.properties
> >>
> >> with properties for <path> stored at .properties/<path>.properties.
> >> This strawman scheme doesn't work if the repository being imported
> >> has any paths ending with ".properties", though.  Ideas?
> >
> > This includes collecting which metadata we actually need to store? We
> > could probably collect a list of important svn properties.
>
> I imagined the importer just collecting all path properties, like "git
> svn" does in its .git/svn/refs/remotes/git-svn/unhandled.log.  They're
> easy to iterate through on the svn side.

Ok, and it will be useful for pushing to svn in the future.

> > Is there a general policy how to store additional metadata for git's
> > helpers? I guess it would live somewhere in the .git dir. (.git/info/
> > ?)
>
> One simple design would be to keep properties in the "entire project"
> commit objects for internal use, since that's easy to share.
>
> I think David had a few other ideas. ;-)

Commit objects that are actually not commits but store metadata?

> [...]
>
> >> . tracing second-parent history using svn:mergeinfo properties.
> >
> > This is about detection when to create a git merge-commit, right?
>
> Yep.  A goal would be for a person to be able to push a git merge to
> an svn repository, fetch from another machine, and get the same commit
> back.
>
> >> In other words, in the above list the strategy is:
> > .. still to come..
>
> Thanks for your thoughtfulness.
>
> Jonathan

Florian
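The 1-to-n mapping raised above (one Subversion revision turning into a
commit on each branch it touched) can be sketched as a pair of lookup
tables. This is only an illustration under stated assumptions: the class
RevisionMap and its method names are invented, commit ids are
placeholder strings, and parse_marks assumes the plain ":<number> <id>"
line format that a fast-import marks file uses, where each mark names
exactly one object.

```python
from collections import defaultdict


def parse_marks(text):
    """Parse marks-file text (":<number> <object id>" per line) into a dict."""
    marks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark, object_id = line.split()
        marks[int(mark.lstrip(":"))] = object_id
    return marks


class RevisionMap:
    """Bidirectional map: one svn revision to many commits, one commit to one rev."""

    def __init__(self):
        self._commits_by_rev = defaultdict(list)
        self._rev_by_commit = {}

    def add(self, rev, commit):
        self._commits_by_rev[rev].append(commit)
        self._rev_by_commit[commit] = rev

    def commits_for(self, rev):
        # Operation (1): revision number -> commit(s).
        return list(self._commits_by_rev.get(rev, []))

    def rev_for(self, commit):
        # Operation (2): commit -> revision number.
        return self._rev_by_commit[commit]
```

Since a single mark cannot name two commits, a revision that touches two
branches would need either one mark per (revision, branch) pair or a
side table like the one above; an n-to-n mapping would additionally need
a list or set on the commit side.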
* Re: GSOC Proposal draft: git-remote-svn
  2012-04-10 17:17 ` Jonathan Nieder
  ` (3 preceding siblings ...)
  2012-04-12 15:28 ` Florian Achleitner
@ 2012-04-18 20:16 ` Florian Achleitner
  2012-04-19 12:26 ` Florian Achleitner
  4 siblings, 1 reply; 46+ messages in thread
From: Florian Achleitner @ 2012-04-18 20:16 UTC
To: Jonathan Nieder
Cc: Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers,
    Sverre Rabbelier, Dmitry Ivankov

On Tuesday 10 April 2012 12:17:07 Jonathan Nieder wrote:
> In other words, in the above list the strategy is:
>
>  1. First convert the remote helper to C so it doesn't have to be
>     translated again later.

Store rev <--> commit mappings using marks and notes.
Store svn metadata.

>  2. Teach the remote helper to import a single project from a
>     repository that houses multiple projects (i.e., path limiting).

I would plan to have this done by the mid-term. From that point my
summer holidays start ..

>  3. Teach the remote helper to split an imported project that uses
>     the standard layout into branches (an application of the code
>     from (2)).  This complicates the scheme for mapping between
>     Subversion revision numbers and git commit ids.

Read ambiguous branches/tags from SBL.

>  4. Teach the SVN dumpfile to fast-import stream converter not to
>     lose the information that is needed in order to get parenthood
>     information.

This means actually saving svn:copyfrom properties. (right?)

>  5. Use the information from step (4) to get parenthood right for a
>     project split into branches.

.. and using svn:copyfrom properties. (right?)

>  6. Getting the second parent right (i.e., merges).  I mentioned
>     this for fun but I don't expect there to be time for it.

I think this needs a little more discussion, let's do this when the time
comes.
mergeinfo stores a list of revs merged for a file. This looks like a
list of git cherry-picks to me ..

> Does that seem right, or does it need tweaks?  How long would each
> step take?  Can the steps be subdivided into smaller steps?

What do you think?
I will finally add this strategy to the proposal.

--
Florian
* Re: GSOC Proposal draft: git-remote-svn
  2012-04-18 20:16 ` Florian Achleitner
@ 2012-04-19 12:26 ` Florian Achleitner
  0 siblings, 0 replies; 46+ messages in thread
From: Florian Achleitner @ 2012-04-19 12:26 UTC
To: Jonathan Nieder
Cc: Git Mailing List, Ramkumar Ramachandra, David Barr, Andrew Sayers,
    Sverre Rabbelier, Dmitry Ivankov

I have now updated the proposal in the github wiki [1] and on melange.
Most important change: Added a more detailed timeline.

[1] https://github.com/flyingflo/git/wiki

--
Florian
* Re: GSoC intro
  2012-03-26 11:06 ` Ramkumar Ramachandra
  2012-03-27 13:53 ` Florian Achleitner
@ 2012-03-28  8:09 ` Miles Bader
  2012-03-28  9:30 ` Dmitry Ivankov
  1 sibling, 1 reply; 46+ messages in thread
From: Miles Bader @ 2012-03-28 8:09 UTC
To: Ramkumar Ramachandra
Cc: Florian Achleitner, David Barr, Git Mailing List, Andrew Sayers,
    Jonathan Nieder, Sverre Rabbelier, Dmitry Ivankov

Ramkumar Ramachandra <artagnon@gmail.com> writes:
>> Hm, and there are still some general questions:
>> What about git-svn? What's wrong with it? (I haven't used it) I saw the huge
>> perl script, this looks a little extreme ;). But it provides bi-directional
>> access?!
>
> The main problem with git-svn.perl is that it's hard to maintain or
> extend.  See also: David Barr's LCA talk [1].

git-svn's also pretty annoying to use (e.g. the way dcommit rebases
anything you push to svn, which makes juggling local git branches
problematic; ugh)... :/

-miles

--
.Numeric stability is probably not all that important when you're guessing.
* Re: GSoC intro
  2012-03-28  8:09 ` GSoC intro Miles Bader
@ 2012-03-28  9:30 ` Dmitry Ivankov
  0 siblings, 0 replies; 46+ messages in thread
From: Dmitry Ivankov @ 2012-03-28 9:30 UTC
To: git

Miles Bader <miles <at> gnu.org> writes:

> git-svn's also pretty annoying to use (e.g. the way dcommit rebases
> anything you push to svn, which makes juggling local git branches
> problematic; ugh)... :/

There is git svn dcommit --no-rebase to disable the rebase step.
"upstream svn" branch history will still diverge from your local branch
though.

>
> -miles
end of thread, other threads:[~2012-04-19 12:26 UTC | newest]

Thread overview: 46+ messages
2012-03-19 14:42 GSoC intro Florian Achleitner
2012-03-19 21:31 ` Andrew Sayers
2012-03-20 12:25 ` Florian Achleitner
2012-03-20 13:19 ` David Barr
2012-03-21 21:16 ` Florian Achleitner
2012-03-26 11:06 ` Ramkumar Ramachandra
2012-03-27 13:53 ` Florian Achleitner
2012-04-02  8:30 ` GSOC Proposal draft: git-remote-svn Florian Achleitner
2012-04-02 11:00 ` Ramkumar Ramachandra
2012-04-02 20:57 ` Jonathan Nieder
2012-04-02 23:04 ` Jonathan Nieder
2012-04-03  7:49 ` Florian Achleitner
2012-04-03 18:48 ` Jonathan Nieder
2012-04-05 16:18 ` Tomas Carnecky
2012-04-02 22:17 ` Andrew Sayers
2012-04-02 22:29 ` Jonathan Nieder
2012-04-02 23:20 ` Andrew Sayers
2012-04-03  0:09 ` Jonathan Nieder
2012-04-03 21:53 ` Andrew Sayers
2012-04-03 22:21 ` Jonathan Nieder
2012-04-05 13:36 ` Florian Achleitner
2012-04-05 15:47 ` Dmitry Ivankov
2012-04-09 18:59 ` Stephen Bash
2012-04-10 17:17 ` Jonathan Nieder
2012-04-10 22:30 ` Andrew Sayers
2012-04-10 23:46 ` Jonathan Nieder
2012-04-11 19:09 ` Florian Achleitner
2012-04-14 22:57 ` Andrew Sayers
2012-04-11 15:51 ` Jakub Narebski
2012-04-11 15:56 ` Jonathan Nieder
2012-04-11 19:20 ` Florian Achleitner
2012-04-11 19:44 ` Dmitry Ivankov
2012-04-11 19:53 ` Jonathan Nieder
2012-04-11 22:43 ` Andrew Sayers
2012-04-12  9:02 ` Thomas Rast
2012-04-12 15:28 ` Florian Achleitner
2012-04-12 22:30 ` Andrew Sayers
2012-04-14 20:09 ` Florian Achleitner
2012-04-14 21:35 ` Andrew Sayers
2012-04-15  3:13 ` Stephen Bash
2012-04-13 19:19 ` Jonathan Nieder
2012-04-14 20:15 ` Florian Achleitner
2012-04-18 20:16 ` Florian Achleitner
2012-04-19 12:26 ` Florian Achleitner
2012-03-28  8:09 ` GSoC intro Miles Bader
2012-03-28  9:30 ` Dmitry Ivankov