* darcs2git.py - convert darcs repository using gfi
From: Han-Wen Nienhuys @ 2007-02-11 23:56 UTC
To: git
[-- Attachment #1: Type: text/plain, Size: 1081 bytes --]
The attached Python script is an attempt at providing a sane
conversion from Darcs to Git. It tries to map Darcs conflict
resolutions onto Git branch merges.
Regarding GFI, it's a breeze to work with; my compliments to its
author. My only gripe is the need to specify a branch for each commit.
Darcs uses changeset-based storage. It doesn't really have branches,
but it does record divergent changes and merges of resulting
conflicts. Hence, it's not clear which refs/heads/BRANCH should be
used when creating a commit object.
I found it easiest to write each commit to a
refs/heads/darcstmpCOUNT
branch, use the reset command to specify at the end which commits are
tops of branches, and delete the temporary branches.
So, my feature request: please make the "commit" command always accept
a "from" command, and make the "refs" argument optional. This will
clean up my converter, and separate out two logical functions of the
gfi "commit" command: creating a commit object, and advancing the head
ref.
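The workaround described above can be sketched as a small stream builder. This is only an illustration in Python 3, not the attached script: the helper names `temp_commit` and `finalize_heads` and the sample committer string are made up for the example. The `commit`, `mark`, `committer`, `data`, `from` and `reset` commands are the git-fast-import commands the script itself emits.

```python
def temp_commit(number, message, committer, parent=None):
    """Return fast-import text for one commit on its own temporary branch.

    Marks are numbered number + 1, as in the attached script, because
    mark :0 cannot be used in a fast-import stream.
    """
    body = message.encode("utf-8")
    lines = [
        "commit refs/heads/darcstmp%d" % number,
        "mark :%d" % (number + 1),
        "committer %s" % committer,
        "data %d" % len(body),      # byte count of the message
        message,
    ]
    if parent is not None:
        lines.append("from :%d" % (parent + 1))
    return "\n".join(lines) + "\n\n"


def finalize_heads(tips):
    """Point the real branches at their tips once all commits exist.

    tips maps a final branch name to the patch number of its tip
    (hypothetical input shape, chosen for this sketch).
    """
    out = []
    for branch, number in sorted(tips.items()):
        out.append("reset refs/heads/%s\nfrom :%d\n\n" % (branch, number + 1))
    return "".join(out)
```

After the stream has been fed to fast-import, the `darcstmpN` refs still have to be deleted separately, which is the cleanup step the feature request would make unnecessary.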
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
[-- Attachment #2: darcs2git.py --]
[-- Type: text/x-python, Size: 12160 bytes --]
import os
import sys
import time
import xml.dom.minidom
import re
import gdbm as dbmodule
import gzip
import optparse
################################################################
# globals
silent=False
mail_to_name_dict = {}
pending_patches = {}
used_tags = {}
################################################################
# utils
class PullConflict (Exception):
    pass

class CommandFailed (Exception):
    pass

def progress (s):
    sys.stderr.write (s + '\n')
def get_cli_options ():
    p = optparse.OptionParser ()
    p.usage='''darcs2git [OPTIONS] DARCS-REPO'''
    p.description='''Convert darcs repo to git.
This tool is a one shot conversion utility for Darcs repositories. It
requires Git version that has git-fast-import. It does not support
incremental updating.
This tool will import the patches in chronological order, and only creates
merges when a resolved conflict is detected.
TODO:
- correct time zone handling
-
'''

    def update_map (option, opt, value, parser):
        for l in open (value).readlines ():
            (mail, name) = tuple (l.strip ().split ('='))
            mail_to_name_dict[mail] = name

    p.add_option ('-a', '--authors', action='callback',
                  callback=update_map,
                  type='string',
                  nargs=1,
                  help='read a text file, containing EMAIL=NAME lines')
    p.add_option ('-d', '--destination', action='store',
                  type='string',
                  default='',
                  dest='target_git_repo',
                  help='where to put the resulting Git repo.')
    p.add_option ('--verbose', action='store_true',
                  dest='verbose',
                  default=False,
                  help='show commands as they are invoked')
    options, args = p.parse_args ()
    if not args:
        p.print_help ()
        sys.exit (2)
    global silent
    silent = not options.verbose
    if not options.target_git_repo:
        p = args[0]
        p = os.path.abspath (p)
        options.target_git_repo = os.path.basename (p).replace ('.darcs', '')
        options.target_git_repo += '.git'
    return (options, args)
def read_pipe (cmd, ignore_errors=False):
    if not silent:
        progress ('pipe %s' % cmd)
    pipe = os.popen (cmd)
    val = pipe.read ()
    if pipe.close () and not ignore_errors:
        raise CommandFailed ("Pipe failed: %s" % cmd)
    return val

def system (c, ignore_error=0):
    if not silent:
        progress ( c)
    if os.system (c) and not ignore_error:
        raise CommandFailed ("Command failed: %s" % c)

def darcs_date_to_git (x):
    t = time.strptime (x, '%Y%m%d%H%M%S')
    return '%d' % int (time.mktime (t))

def darcs_timezone (x) :
    time.strptime (x, '%a %b %d %H:%M:%S %Z %Y')
    # todo
    return "+0100"
################################################################
# darcs
class DarcsConversionRepo:
    def __init__ (self, dir, patches):
        self.dir = dir
        self.patches = patches

    def clean (self):
        system ('rm -rf %s' % self.dir)

    def pull (self, patch):
        id = patch.attributes['hash']
        source_repo = patch.dir
        dir = self.dir
        system ('cd %(dir)s && darcs pull --quiet --all --match "hash %(id)s" %(source_repo)s ' % locals ())

    def go_from_to (self, from_patch, to_patch):
        """Move the repo to FROM_PATCH, then go to TO_PATCH. Raise
        PullConflict if conflict is detected
        This uses the fishy technique of writing the inventory and
        constructing the pristine tree with 'darcs repair'
        It might be quicker and/or more correct to wind/rewind the
        repo with pull and unpull. """
        dir = os.path.abspath (self.dir)
        system ('rm -rf %(dir)s && mkdir %(dir)s && darcs init --repo %(dir)s'
                % locals ())
        source = to_patch.dir
        if from_patch:
            iv = open (dir + '/_darcs/inventory', 'w')
            for p in self.patches[:from_patch.number+1]:
                os.link (p.filename (), dir + '/_darcs/patches/' + os.path.basename (p.filename ()))
                iv.write (p.header ())
            iv.close ()
            progress ('Go to patch %d' % from_patch.number)
            system ('cd %(dir)s && darcs repair --quiet' % locals ())
            system ('rsync -a %(dir)s/_darcs/pristine/ %(dir)s/' % locals ())
        try:
            self.pull (to_patch)
            success = 'No conflicts to resolve' in read_pipe ('cd %(dir)s && echo y|darcs resolve' % locals ())
        except CommandFailed:
            raise PullConflict ()
        if not success:
            raise PullConflict ()

    def has_patch (self, p):
        id = p.attributes['hash']
        f = self.dir + '/_darcs/patches/' + id
        return os.path.exists (f)

    def pristine_tree (self):
        return self.dir + '/_darcs/pristine'
class DarcsPatch:
    def __init__ (self, xml, dir):
        self.xml = xml
        self.dir = dir
        self.number = -1
        self.attributes = {}
        for (nm, value) in xml.attributes.items():
            self.attributes[nm] = value
        # fixme: ugh attributes vs. methods.
        self.extract_author ()
        self.extract_message ()
        self.extract_time ()

    def filename (self):
        return self.dir + '/_darcs/patches/' + self.attributes['hash']

    def contents (self):
        f = gzip.open (self.filename ())
        return f.read ()

    def header (self):
        lines = self.contents ().split ('\n')
        name = lines[0]
        committer = lines[1] + '\n'
        committer = re.sub ('] {\n$', ']\n', committer)
        committer = re.sub ('] *\n$', ']\n', committer)
        comment = ''
        if not committer.endswith (']\n'):
            for l in lines[2:]:
                if l[0] == ']':
                    comment += ']\n'
                    break
                comment += l + '\n'
        header = name + '\n' + committer
        if comment:
            header += comment
        return header

    def extract_author (self):
        mail = self.attributes['author']
        name = ''
        m = re.search ("^(.*) <(.*)>$", mail)
        if m:
            name = m.group (1)
            mail = m.group (2)
        else:
            try:
                name = mail_to_name_dict[mail]
            except KeyError:
                name = mail.split ('@')[0]
        self.author_name = name
        self.author_mail = mail

    def extract_time (self):
        self.date = darcs_date_to_git (self.attributes['date']) + ' ' + darcs_timezone (self.attributes['local_date'])

    def name (self):
        patch_name = '(no comment)'
        try:
            name_elt = self.xml.getElementsByTagName ('name')[0]
            patch_name = name_elt.childNodes[0].data
        except IndexError:
            pass
        return patch_name

    def extract_message (self):
        patch_name = self.name ()
        comment_elts = self.xml.getElementsByTagName ('comment')
        comment = ''
        if comment_elts:
            comment = comment_elts[0].childNodes[0].data
        if self.attributes['inverted'] == 'True':
            patch_name = 'UNDO: ' + patch_name
        self.message = '%s\n\n%s' % (patch_name, comment)

    def tag_name (self):
        patch_name = self.name ()
        if patch_name.startswith ("TAG "):
            tag = patch_name[4:]
            tag = re.sub (r'\s', '_', tag).strip ()
            tag = re.sub (r':', '_', tag).strip ()
            return tag
        return ''
def get_darcs_patches (darcs_repo):
    progress ('reading patches.')
    xml_string = read_pipe ('darcs changes --xml --reverse --repo ' + darcs_repo)
    dom = xml.dom.minidom.parseString(xml_string)
    xmls = dom.documentElement.getElementsByTagName('patch')
    patches = [DarcsPatch (x, darcs_repo) for x in xmls]
    n = 0
    for p in patches:
        p.number = n
        n += 1
    return patches
################################################################
# GIT export
def export_tree (tree, gfi):
    tree = os.path.normpath (tree)
    gfi.write ('deleteall\n')
    for (root, dirs, files) in os.walk (tree):
        for f in files:
            rf = os.path.normpath (os.path.join (root, f))
            s = open (rf).read ()
            rf = rf.replace (tree + '/', '')
            gfi.write ('M 644 inline %s\n' % rf)
            gfi.write ('data %d\n%s\n' % (len (s), s))
    gfi.write ('\n')
def export_commit (repo, patch, last_patch, gfi):
    gfi.write ('commit refs/heads/darcstmp%d\n' % patch.number)
    gfi.write ('mark :%d\n' % (patch.number + 1))
    gfi.write ('committer %s <%s> %s\n' % (patch.author_name,
                                           patch.author_mail,
                                           patch.date))
    gfi.write ('data %d\n%s\n' % (len (patch.message), patch.message))
    if last_patch:
        gfi.write ('from :%d\n' % (last_patch.number + 1))
        if pending_patches.has_key (last_patch.number):
            del pending_patches[last_patch.number]
        for (n, p) in pending_patches.items ():
            if repo.has_patch (p):
                gfi.write ('merge :%d\n' % (n + 1))
                del pending_patches[n]
    pending_patches[patch.number] = patch
    export_tree (repo.pristine_tree (), gfi)
def export_pending (gfi):
    if len (pending_patches.items ()) == 1:
        gfi.write ('reset refs/heads/master\n')
        gfi.write ('from :%d\n\n' % (pending_patches.values()[0].number+1))
        return
    for (n, p) in pending_patches.items ():
        gfi.write ('reset refs/heads/master%d\n' % n)
        gfi.write ('from :%d\n\n' % (n+1))
    patches = pending_patches.values()
    patch = patches[0]
    gfi.write ('commit refs/heads/master\n')
    gfi.write ('committer %s <%s> %s\n' % (patch.author_name,
                                           patch.author_mail,
                                           patch.date))
    msg = 'tie together'
    gfi.write ('data %d\n%s\n' % (len(msg), msg))
    gfi.write ('from :%d\n' % (patch.number + 1))
    for p in patches[1:]:
        gfi.write ('merge :%d\n' % (p.number + 1))
    gfi.write ('\n')
def export_tag (patch, gfi):
    gfi.write ('tag %s\n' % patch.tag_name ())
    gfi.write ('from :%d\n' % (patch.number + 1))
    gfi.write ('tagger %s <%s> %s\n' % (patch.author_name,
                                        patch.author_mail,
                                        patch.date))
    gfi.write ('data %d\n%s\n' % (len (patch.message),
                                  patch.message))
################################################################
# main.
def main ():
    (options, args) = get_cli_options ()
    darcs_repo = os.path.abspath (args[0])
    git_repo = os.path.abspath (options.target_git_repo)
    system ('mkdir %(git_repo)s && cd %(git_repo)s && git --bare init' % locals ())
    os.environ['GIT_DIR'] = git_repo
    gfi = os.popen ('git-fast-import', 'w') #
    patches = get_darcs_patches (darcs_repo)
    conv_repo = DarcsConversionRepo ("darcs2git.tmpdarcs", patches)
    for p in patches:
        parent = p.number - 1
        last = None
        while 1:
            if parent >= 0:
                last = patches[parent]
            try:
                conv_repo.go_from_to (last, p)
                break
            except PullConflict:
                ## simplistic, may not be enough.
                progress ('conflict, going one back')
                parent -= 1
                if parent < 0:
                    raise Exception('urg')
        progress ('Export %d -> %d (total %d)' % (parent,
                                                  p.number, len (patches)))
        export_commit (conv_repo, p, last, gfi)
        if p.tag_name ():
            export_tag (p, gfi)
    export_pending (gfi)
    gfi.close ()
    system ('rm %(git_repo)s/refs/heads/darcstmp*' % locals ())
    conv_repo.clean ()

main ()
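A note on the `data` lines written by the export functions above: fast-import's `data <count>` form expects an exact byte count, whereas `len()` on decoded text counts characters, and the two differ as soon as a message or file contains non-ASCII text. A minimal Python 3 sketch of a framing helper follows. The name `data_frame` and the UTF-8 default are assumptions for illustration and are not part of the attached script, which targets Python 2.

```python
def data_frame(text, encoding="utf-8"):
    """Frame text as a fast-import 'data' command with an exact byte count."""
    payload = text.encode(encoding)        # encode first, then measure
    return b"data %d\n" % len(payload) + payload + b"\n"
```

The result is bytes, so a stream built this way would be written to a pipe opened in binary mode.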
* Re: darcs2git.py - convert darcs repository using gfi
From: Shawn O. Pearce @ 2007-02-12 1:14 UTC
To: Han-Wen Nienhuys; +Cc: git
Han-Wen Nienhuys <hanwen@xs4all.nl> wrote:
> The python script attached is a try at providing a sane
> conversion from Darcs to GIT. It tries to map darcs conflict
> resolutions onto git branch merges.
Impressive.
> Regarding GFI, it's a breeze to work with; my compliments to its
> author.
Hey, thanks! ;-)
> My only gripe is the need to specify a branch for each commit.
> Darcs uses changeset-based storage. It doesn't really have branches,
> but it does record divergent changes and merges of resulting
> conflicts. Hence, it's not clear which refs/heads/BRANCH should be
> used when creating a commit object.
Just make something up. Or don't use refs/heads; instead use your
own directory, e.g. refs/patches/. Then you can delete the entire
directory when you are done importing.
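A rough Python 3 sketch of that suggestion, for illustration only: the helper names `patch_ref`, `list_refs` and `cleanup_commands` are invented here, and the `refs/patches` prefix is just the example name from the paragraph above. It uses `git for-each-ref` and `git update-ref -d` rather than removing files, so that it would also cover refs that have been packed.

```python
import subprocess


def patch_ref(number, prefix="refs/patches"):
    """Name of the private ref that holds imported patch NUMBER."""
    return "%s/%d" % (prefix, number)


def list_refs(prefix="refs/patches"):
    """Ask git for every ref under PREFIX (needs GIT_DIR or a repo cwd)."""
    result = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname)", prefix],
        check=True, capture_output=True, text=True)
    return result.stdout.split()


def cleanup_commands(refs):
    """One 'git update-ref -d' argv per temporary ref, without running it."""
    return [["git", "update-ref", "-d", ref] for ref in refs]
```

A converter would emit `commit refs/patches/N` lines during the import, then run each command returned by `cleanup_commands(list_refs())` once the real branches are in place.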
> So, my feature request: please make the "commit" command always accept
> a "from" command
This restriction was a safety valve. fast-import itself would be
OK if I permitted a from all of the time. A bug in cvs2svn caused
multiple froms to be emitted for the same branch, and that wasn't
correct, so fast-import crashed on it rather than silently accepting
the data corruption.
It's actually one of those things that is nice to remove, as it's 3
lines of code that just need to be deleted. ;-)
> and make the "refs" argument optional.
This is harder than it sounds. fast-import internally is built
around the assumption of a branch, which has a name, and which lives
in the branch LRU. With the "from" command restriction lifted
you can just import every single commit onto the same hardcoded
branch name (e.g. DARCS_HEAD) then delete it when you are done
(e.g. rm .git/DARCS_HEAD). That's basically the same thing as an
optional ref argument.
--
Shawn.
* Re: darcs2git.py - convert darcs repository using gfi
From: Han-Wen Nienhuys @ 2007-02-13 22:42 UTC
To: git; +Cc: clee
Shawn O. Pearce wrote:
>> The python script attached is a try at providing a sane
>> conversion from Darcs to GIT. It tries to map darcs conflict
>> resolutions onto git branch merges.
>
> Impressive.
If anyone's interested, I've hacked at it some more,
and the result is at
http://repo.or.cz/w/darcs2git.git
I've further improved the script so that it generates nicer
bifurcations, does the conversion as incrementally as
possible, and can be considered fast (as far as any darcs project
can be called fast).
I've successfully converted a project containing
3000 patches with it.
> This is harder than it sounds. fast-import internally is built
> around the assumption of a branch, which has a name, and which lives
> in the branch LRU. With the "from" command restriction lifted
> you can just import every single commit onto the same hardcoded
> branch name (e.g. DARCS_HEAD) then delete it when you are done
> (e.g. rm .git/DARCS_HEAD). That's basically the same thing as an
Will this be in git v1.5?
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
* git-fast-export ?
From: Han-Wen Nienhuys @ 2007-02-18 12:45 UTC
To: Shawn O. Pearce; +Cc: git
Shawn O. Pearce wrote:
>> Regarding GFI, it's a breeze to work with; my compliments to its
>> author.
>
> Hey, thanks! ;-)
BTW, I just had a brainwave.
Wouldn't it make sense to build a git-fast-export, which
can be used to interrogate a Git repository? A Git tool
could run a single git command,
git-fast-export
and communicate with that process in a similar fashion, e.g.
object-type SP 0ac9f9cb54e7cf400453b85a3ae5f63813f5fdae LF
=>
show-raw SP 0ac9f9cb54e7cf400453b85a3ae5f63813f5fdae LF
=>
data SP 235 LF
tree 76c9d63f83530851d911f6ead36e3899929e0cda
parent e3559e3c52cf006a6b3b03ec083ed658ba1941ee
author Han-Wen Nienhuys <hanwen@lilypond.org> 1171799942 +0100
committer Han-Wen Nienhuys <hanwen@lilypond.org> 1171799942 +0100
ignorance. LF
This would make interfacing with Git from scripts more ergonomic,
less dependent on changes in the UI of porcelains, and in some cases
more efficient.
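To make the proposed exchange concrete, here is a minimal Python 3 sketch of the client side reading one length-prefixed reply. Everything in it is hypothetical: `show-raw` is only the command name used in the example above, `read_data_reply` is an invented helper, and no existing git command is assumed to speak this protocol.

```python
import io


def read_data_reply(stream):
    """Read one 'data SP <size> LF <payload>' reply from a binary stream."""
    header = stream.readline()
    if not header.startswith(b"data ") or not header.endswith(b"\n"):
        raise ValueError("unexpected reply header: %r" % header)
    size = int(header[len(b"data "):].strip().decode("ascii"))
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("truncated reply: wanted %d bytes" % size)
    return payload


# Usage with an in-memory stand-in for the process's stdout.
reply = io.BytesIO(b"data 11\ntree 76c9d6\n")
print(read_data_reply(reply))
```

Because the size is announced up front, the client never has to scan the payload for a terminator, which is the same framing fast-import uses on its input side.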
--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
* Re: git-fast-export ?
From: Shawn O. Pearce @ 2007-02-19 8:25 UTC
To: Han-Wen Nienhuys; +Cc: git
Han-Wen Nienhuys <hanwen@xs4all.nl> wrote:
> Wouldn't it make sense to build a git-fast-export, which
> can be used to interrogate a git-repository: a GIT tool
> could run a single git command,
>
> This would make interfacing with Git from scripts more ergonomic,
> less dependent on changes in the UI of porcelains, and in some cases
> more efficient.
Maybe.
But without knowing what the UI program wants, it's hard to say what
should be implemented there. I'm not going to create something on
a hunch that it will be useful someday - that's just not a practical
use of my time.
Worse, most scripting level languages have a hard time working
with a bidirectional pipe to a process. What you want here is
stdin and stdout pipes, so you can send a command and then receive
the response. This can be a challenge in something like Tcl,
maybe not fully portable in Perl, etc.
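For reference, the send-a-command, read-a-response pattern described here looks roughly like this in Python 3. It is a sketch under stated assumptions: the child is a tiny stand-in that upper-cases each line, not a git process, and the names `start_helper`, `ask` and `stop_helper` are invented for the example.

```python
import subprocess
import sys

# Stand-in child: answers every input line with its upper-cased form.
CHILD_SOURCE = """
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(line.upper())
    sys.stdout.flush()
"""


def start_helper():
    """Start the child with both stdin and stdout connected to pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        text=True, bufsize=1)


def ask(proc, command):
    """Send one line and block until one line of response arrives."""
    proc.stdin.write(command + "\n")
    proc.stdin.flush()              # without this the child may never see it
    return proc.stdout.readline().rstrip("\n")


def stop_helper(proc):
    """Close the child's stdin so it exits, then collect its status."""
    proc.stdin.close()
    status = proc.wait()
    proc.stdout.close()
    return status
```

Both sides must flush after every message and agree on where a response ends; if either side buffers, the two processes deadlock waiting on each other, which is the portability hazard raised above.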
Even worse, some parts of Git are not reentrant. They are currently
built to run once and have the UNIX process terminate quickly
afterwards. Keeping it running to answer more queries from the
UI may cause the Git process to leak memory over a longer term,
cause it to crash after a couple of successive repack/prune/gc, etc.
There are a number of interesting operations within Git that a UI
would want to query, but that may not be a good idea to expose from
within a long-running UNIX process, for those reasons. fast-import
doesn't do these, so it's reasonable to keep up for extended periods,
but even fast-import assumes it will terminate at some point as it
hangs onto its object table for the entire lifespan of the process.
--
Shawn.