All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFCv4 0/5] CVS remote helper
@ 2009-08-17  3:35 Johan Herland
  2009-08-17  3:35 ` [RFCv4 1/5] Basic build infrastructure for Python scripts Johan Herland
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Johan Herland @ 2009-08-17  3:35 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, Johan Herland, barkalow, davvid, gitster

Hi,

Another iteration of the patch series implementing a CVS remote helper.
Changes from the previous iteration:

- Python coding style fixes suggested by PEP8, PyLint, David Aguilar, etc.

- Documentation/git-remote-cvs.txt compile fix

- git_remote_cvs/Makefile fixes from David Aguilar (his v2 series)

- rebased onto current 'pu' (minus jh/vcs-cvs)

- split up patch 2/4 into two parts to fit on the Git mailing list


Have fun! :)

...Johan


Johan Herland (5):
  Basic build infrastructure for Python scripts
  1/2: Add Python support library for CVS remote helper
  2/2: Add Python support library for CVS remote helper
  git-remote-cvs: Remote helper program for CVS repositories
  Add simple selftests of git-remote-cvs functionality

 Documentation/git-remote-cvs.txt   |   85 +++
 Makefile                           |   46 ++
 configure.ac                       |    3 +
 git-remote-cvs.py                  |  683 ++++++++++++++++++++++++
 git_remote_cvs/.gitignore          |    2 +
 git_remote_cvs/Makefile            |   35 ++
 git_remote_cvs/__init__.py         |   27 +
 git_remote_cvs/changeset.py        |  126 +++++
 git_remote_cvs/commit_states.py    |   62 +++
 git_remote_cvs/cvs.py              |  998 ++++++++++++++++++++++++++++++++++++
 git_remote_cvs/cvs_revision_map.py |  418 +++++++++++++++
 git_remote_cvs/cvs_symbol_cache.py |  313 +++++++++++
 git_remote_cvs/git.py              |  680 ++++++++++++++++++++++++
 git_remote_cvs/setup.py            |   17 +
 git_remote_cvs/util.py             |  186 +++++++
 t/t9800-remote-cvs-basic.sh        |  524 +++++++++++++++++++
 t/t9801-remote-cvs-fetch.sh        |  291 +++++++++++
 t/test-lib.sh                      |    1 +
 18 files changed, 4497 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-remote-cvs.txt
 create mode 100755 git-remote-cvs.py
 create mode 100644 git_remote_cvs/.gitignore
 create mode 100644 git_remote_cvs/Makefile
 create mode 100644 git_remote_cvs/__init__.py
 create mode 100644 git_remote_cvs/changeset.py
 create mode 100644 git_remote_cvs/commit_states.py
 create mode 100644 git_remote_cvs/cvs.py
 create mode 100644 git_remote_cvs/cvs_revision_map.py
 create mode 100644 git_remote_cvs/cvs_symbol_cache.py
 create mode 100644 git_remote_cvs/git.py
 create mode 100644 git_remote_cvs/setup.py
 create mode 100644 git_remote_cvs/util.py
 create mode 100755 t/t9800-remote-cvs-basic.sh
 create mode 100755 t/t9801-remote-cvs-fetch.sh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFCv4 1/5] Basic build infrastructure for Python scripts
  2009-08-17  3:35 [RFCv4 0/5] CVS remote helper Johan Herland
@ 2009-08-17  3:35 ` Johan Herland
  2009-08-17  3:35 ` [RFCv4 2/5] 1/2: Add Python support library for CVS remote helper Johan Herland
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Johan Herland @ 2009-08-17  3:35 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, Johan Herland, barkalow, davvid, gitster

This patch adds basic boilerplate support (based on corresponding Perl
sections) for enabling the building and installation Python scripts.

There are currently no Python scripts being built, and when Python
scripts are added in future patches, their building and installation
can be disabled by defining NO_PYTHON.

Signed-off-by: Johan Herland <johan@herland.net>
---
 Makefile      |   13 +++++++++++++
 configure.ac  |    3 +++
 t/test-lib.sh |    1 +
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index 189aee5..969cef5 100644
--- a/Makefile
+++ b/Makefile
@@ -166,6 +166,8 @@ all::
 #
 # Define NO_PERL if you do not want Perl scripts or libraries at all.
 #
+# Define NO_PYTHON if you do not want Python scripts or libraries at all.
+#
 # Define NO_TCLTK if you do not want Tcl/Tk GUI.
 #
 # The TCL_PATH variable governs the location of the Tcl interpreter
@@ -311,6 +313,7 @@ LIB_H =
 LIB_OBJS =
 PROGRAMS =
 SCRIPT_PERL =
+SCRIPT_PYTHON =
 SCRIPT_SH =
 TEST_PROGRAMS =
 
@@ -349,6 +352,7 @@ SCRIPT_PERL += git-svn.perl
 
 SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) \
 	  $(patsubst %.perl,%,$(SCRIPT_PERL)) \
+	  $(patsubst %.py,%,$(SCRIPT_PYTHON)) \
 	  git-instaweb
 
 # Empty...
@@ -404,8 +408,12 @@ endif
 ifndef PERL_PATH
 	PERL_PATH = /usr/bin/perl
 endif
+ifndef PYTHON_PATH
+	PYTHON_PATH = /usr/bin/python
+endif
 
 export PERL_PATH
+export PYTHON_PATH
 
 LIB_FILE=libgit.a
 XDIFF_LIB=xdiff/lib.a
@@ -1259,6 +1267,10 @@ ifeq ($(PERL_PATH),)
 NO_PERL=NoThanks
 endif
 
+ifeq ($(PYTHON_PATH),)
+NO_PYTHON=NoThanks
+endif
+
 QUIET_SUBDIR0  = +$(MAKE) -C # space to separate -C and subdir
 QUIET_SUBDIR1  =
 
@@ -1306,6 +1318,7 @@ prefix_SQ = $(subst ','\'',$(prefix))
 
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
 PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
+PYTHON_PATH_SQ = $(subst ','\'',$(PYTHON_PATH))
 TCLTK_PATH_SQ = $(subst ','\'',$(TCLTK_PATH))
 
 LIBS = $(GITLIBS) $(EXTLIBS)
diff --git a/configure.ac b/configure.ac
index 3f1922d..3749e5c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -241,6 +241,9 @@ GIT_ARG_SET_PATH(shell)
 # Define PERL_PATH to provide path to Perl.
 GIT_ARG_SET_PATH(perl)
 #
+# Define PYTHON_PATH to provide path to Python.
+GIT_ARG_SET_PATH(python)
+#
 # Define ZLIB_PATH to provide path to zlib.
 GIT_ARG_SET_PATH(zlib)
 #
diff --git a/t/test-lib.sh b/t/test-lib.sh
index a5b8d03..01ea386 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -714,6 +714,7 @@ case $(uname -s) in
 esac
 
 test -z "$NO_PERL" && test_set_prereq PERL
+test -z "$NO_PYTHON" && test_set_prereq PYTHON
 
 # test whether the filesystem supports symbolic links
 ln -s x y 2>/dev/null && test -h y 2>/dev/null && test_set_prereq SYMLINKS
-- 
1.6.4.262.gca66a.dirty

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFCv4 2/5] 1/2: Add Python support library for CVS remote helper
  2009-08-17  3:35 [RFCv4 0/5] CVS remote helper Johan Herland
  2009-08-17  3:35 ` [RFCv4 1/5] Basic build infrastructure for Python scripts Johan Herland
@ 2009-08-17  3:35 ` Johan Herland
  2009-08-17  3:35 ` [RFCv4 3/5] 2/2: " Johan Herland
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Johan Herland @ 2009-08-17  3:35 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, Johan Herland, barkalow, davvid, gitster

This patch introduces parts of a Python package called "git_remote_cvs"
containing the building blocks of the CVS remote helper.
The CVS remote helper itself is NOT part of this patch.

This patch has been improved by the following contributions:
- David Aguilar: Lots of Python coding style fixes

Cc: David Aguilar <davvid@gmail.com>
Signed-off-by: Johan Herland <johan@herland.net>
---

This patch and the next patch belong together in the same commit, but
the 100K limit (?) on the git mailing list forces me to split this
patch in two.

...Johan

 git_remote_cvs/changeset.py        |  126 +++++
 git_remote_cvs/cvs.py              |  998 ++++++++++++++++++++++++++++++++++++
 git_remote_cvs/cvs_symbol_cache.py |  313 +++++++++++
 git_remote_cvs/util.py             |  186 +++++++
 4 files changed, 1623 insertions(+), 0 deletions(-)
 create mode 100644 git_remote_cvs/changeset.py
 create mode 100644 git_remote_cvs/cvs.py
 create mode 100644 git_remote_cvs/cvs_symbol_cache.py
 create mode 100644 git_remote_cvs/util.py

diff --git a/git_remote_cvs/changeset.py b/git_remote_cvs/changeset.py
new file mode 100644
index 0000000..9eea9d2
--- /dev/null
+++ b/git_remote_cvs/changeset.py
@@ -0,0 +1,126 @@
+#!/usr/bin/env python
+
+"""Code for collecting individual CVS revisions into "changesets"
+
+A changeset is a collection of CVSRev objects that belong together in
+the same "commit".  This is a somewhat artificial construct on top of
+CVS, which only stores changes at the per-file level.  Normally, CVS
+users create several CVS revisions simultaneously by applying the
+"cvs commit" command to several files with related changes.  This
+module tries to reconstruct this notion of related revisions.
+
+"""
+
+from git_remote_cvs.util import debug, error, die
+
+
+class Changeset(object):
+
+    """Encapsulate a single changeset/commit."""
+
+    __slots__ = ('revs', 'date', 'author', 'message')
+
+    # The maximum time between the changeset's date, and the date of a
+    # rev to included in that changeset.
+    MaxSecondsBetweenRevs = 8 * 60 * 60  # 8 hours
+
+    @classmethod
+    def from_rev (cls, rev):
+        """Return a Changeset based on the given CVSRev object."""
+        c = cls(rev.date, rev.author, rev.message)
+        result = c.add(rev)
+        assert result
+        return c
+
+    def __init__ (self, date, author, message):
+        """Create a new Changeset with the given metadata."""
+        self.revs = {}  # dict: path -> CVSRev object
+        self.date = date  # CVSDate object
+        self.author = author
+        self.message = message  # Lines of commit message
+
+    def __str__ (self):
+        """Stringify this Changeset object."""
+        msg = self.message[0]  # First line only
+        # Limit message to 25 chars
+        if len(msg) > 25:
+            msg = msg[:22] + "..."
+        return ("<Changeset @(%s) by %s (%s) updating %i files>" %
+                (self.date, self.author, msg, len(self.revs)))
+
+    def __iter__ (self):
+        """Return iterator traversing the CVSRevs in this Changeset."""
+        return self.revs.itervalues()
+
+    def __getitem__ (self, path):
+        """Look up a specific CVSRev in this Changeset."""
+        return self.revs[path]
+
+    def within_time_window (self, rev):
+        """Return True iff the rev is within the time window of self."""
+        return abs(rev.date.diff(self.date)) <= self.MaxSecondsBetweenRevs
+
+    def add (self, rev):
+        """Add the given CVSRev to this Changeset.
+
+        The addition will only succeed if the following holds:
+          - rev.author == self.author
+          - rev.message == self.message
+          - rev.path is not in self.revs
+          - rev.date is within MaxSecondsBetweenRevs of self.date
+        If the addition succeeds, True is returned; otherwise False.
+
+        """
+        if rev.author != self.author or \
+           rev.message != self.message or \
+           rev.path in self.revs or \
+           not self.within_time_window(rev):
+            return False
+
+        self.revs[rev.path] = rev
+        return True
+
+
+def build_changesets_from_revs (cvs_revs):
+    """Organize CVSRev objects into a chronological list of Changesets."""
+    # Construct chronological list of CVSRev objects
+    chron_revs = []
+    for path, d in cvs_revs.iteritems():
+        i = 0  # Current index into chronRevs
+        for revnum, cvsrev in sorted(d.iteritems()):
+            assert path == cvsrev.path
+            assert revnum == cvsrev.num
+            while i < len(chron_revs) and cvsrev.date > chron_revs[i].date:
+                i += 1
+            # Insert cvsRev at position i in chronRevs
+            chron_revs.insert(i, cvsrev)
+            i += 1
+
+    changesets = []  # Chronological list of Changeset objects
+    while len(chron_revs):
+        # There are still more revs to be added to Changesets
+        # Create Changeset based on the first rev in chronRevs
+        changeset = Changeset.from_rev(chron_revs.pop(0))
+        # Keep adding revs chronologically until MaxSecondsBetweenRevs
+        rejects = []  # Revs that cannot be added to this changeset
+        while len(chron_revs):
+            rev = chron_revs.pop(0)
+            reject = False
+            # First, if we have one of rev's parents in rejects, we
+            # must also reject rev
+            for r in rejects:
+                if r.path == rev.path:
+                    reject = True
+                    break
+            # Next, add rev to changeset, reject if add fails
+            if not reject:
+                reject = not changeset.add(rev)
+            if reject:
+                rejects.append(rev)
+                # stop trying when rev is too far in the future
+                if not changeset.within_time_window(rev):
+                    break
+        chron_revs = rejects + chron_revs  # Reconstruct remaining revs
+        changesets.append(changeset)
+
+    return changesets
diff --git a/git_remote_cvs/cvs.py b/git_remote_cvs/cvs.py
new file mode 100644
index 0000000..f870ae0
--- /dev/null
+++ b/git_remote_cvs/cvs.py
@@ -0,0 +1,998 @@
+#!/usr/bin/env python
+
+"""Functionality for interacting with CVS repositories.
+
+This module provides classes for interrogating a CVS repository via a
+CVS working directory (aka. checkout), or via direct queries using the
+"cvs rlog" command.
+
+Also, classes for encapsulating fundamental CVS concepts (like CVS
+revision/branch numbers) are provided.
+"""
+
+import sys
+import os
+import shutil
+import time
+from calendar import timegm
+import unittest
+
+from git_remote_cvs.util import (debug, error, die, ProgressIndicator,
+                                 start_command, run_command,
+                                 file_reader_method, file_writer_method)
+
+
+class CVSNum(object):
+
+    """Encapsulate a single CVS revision/branch number.
+
+    Provides functionality for common operations on CVS numbers.
+
+    A CVS number consists of a list of components separated by periods
+    ('.'), where each component is a decimal number.  Inspecting the
+    components from left to right, the odd-numbered (1st, 3rd, 5th,
+    etc.) components represent branches in the CVS history tree, while
+    the even-numbered (2nd, 4th, 6th, etc.) components represent
+    revisions on the branch specified in the previous position.
+    Thus "1.2" denotes the second revision on the first branch
+    (aka. trunk), while "1.2.4.6" denotes the sixth revision of the
+    fourth branch started from revision "1.2".
+
+    Therefore, in general, a CVS number with an even number of
+    components denotes a revision (we call this a "revision number"),
+    while an odd number of components denotes a branch (called a
+    "branch number").
+
+    There are a few complicating peculiarities: If there is an even
+    number of components, and the second-last component is 0, the
+    number is not a revision number, but is rather equivalent to the
+    branch number we get by removing the 0-component.  I.e. "1.2.0.4"
+    is equivalent to "1.2.4".
+
+    A branch number (except the trunk: "1") always has a "branch point"
+    revision, i.e. the revision from which the branch was started.
+    This revision is found by removing the last component of the branch
+    number.  For example the branch point of "1.2.4" is "1.2".
+
+    Conversely, all revision numbers belong to a corresponding branch,
+    whose branch number is found by removing the last component.
+    Examples: The "1.2.4.6" revision belong to the "1.2.4" branch,
+    while the "1.2" revision belongs to the "1" branch (the "trunk").
+
+    From this we can programatically determine the ancestry of any
+    revision number, by decrementing the last revision component until
+    it equals 1, and then trim off the last two components to get to
+    the branch point, and repeat the process from there until we reach
+    the initial revision (typically "1.1").  For example, recursively
+    enumerating the parent revisions of "1.2.4.6" yields the following
+    revisions:
+    "1.2.4.5", "1.2.4.4", "1.2.4.3", "1.2.4.2", "1.2.4.1", "1.2", "1.1"
+
+    """
+
+    __slots__ = ('c',)
+
+    @staticmethod
+    def decompose (cvsnum):
+        """Split the given CVS number into a list of int components.
+
+        Branch numbers are normalized to the odd-numbered components
+        form (i.e. removing the second last '0' component)
+
+        Examples:
+          '1.2.4.8' -> [1, 2, 4, 8]
+          '1.2.3'   -> [1, 2, 3]
+          '1.2.0.5' -> [1, 2, 5]
+
+        """
+        if cvsnum:
+            r = map(int, cvsnum.split('.'))
+        else:
+            r = []
+        if len(r) >= 2 and r[-2] == 0:
+            del r[-2]
+        if r[-1] == 0:
+            raise ValueError(cvsnum)
+        return tuple(r)
+
+    @staticmethod
+    def compose (c):
+        """Join the given list of integer components into a CVS number.
+
+        E.g.: (1, 2, 4, 8) -> '1.2.4.8'
+
+        """
+        if c[-1] == 0:
+            raise ValueError(str(c))
+        return ".".join(map(str, c))
+
+    @classmethod
+    def from_components (cls, args):
+        """Create a CVSNum from the given list of numerical components."""
+        return cls(cls.compose(args))
+
+    @classmethod
+    def disjoint (cls, a, b):
+        """Return True iff the CVS numbers are historically disjoint.
+
+        Two CVS numbers are disjoint if they do not share the same
+        historical line back to the initial revision.  In other words:
+        the two numbers are disjoint if the history (i.e. set of parent
+        revisions all the way back to the intial (1.1) revision) of
+        neither number is a superset of the other's history.
+        See test_disjoint() for practical examples:
+
+        """
+        if a.is_branch():
+            a = cls.from_components(a.c + (1,))
+        if b.is_branch():
+            b = cls.from_components(b.c + (1,))
+        if len(a.c) > len(b.c):
+            a, b = b, a  # a is now shortest
+        pairs = zip(a.c, b.c)
+        for pa, pb in pairs[:-1]:
+            if pa != pb:
+                return True
+        if len(a) == len(b):
+            return False
+        common_len = len(a)
+        if a.c[common_len - 1] <= b.c[common_len - 1]:
+            return False
+        return True
+
+
+    def __init__ (self, cvsnum):
+        """Create a CVSNum object from the given CVS number string."""
+        self.c = self.decompose(str(cvsnum))
+
+    def __repr__ (self):
+        """Return a string representation of this object."""
+        return self.compose(self.c)
+
+    def __str__ (self):
+        """Return a string representation of this object."""
+        return repr(self)
+
+    def __hash__ (self):
+        """Create a hash value for this CVS number."""
+        return hash(repr(self))
+
+    def __len__ (self):
+        """Return number of components in this CVS number."""
+        return len(self.c)
+
+    def __cmp__ (self, other):
+        """Comparison method for CVS numbers."""
+        try:
+            return cmp(self.c, other.c)
+        except AttributeError:
+            return 1
+
+    def __getitem__ (self, key):
+        """Return the Xth component of this CVS number."""
+        return self.c[key]
+
+    def is_rev (self):
+        """Return True iff this number is a CVS revision number."""
+        return len(self.c) % 2 == 0 and len(self.c) >= 2 and self.c[-2] != 0
+
+    def is_branch (self):
+        """Return True iff this number is a CVS branch number."""
+        return len(self.c) % 2 != 0 or (len(self.c) >= 2 and self.c[-2] == 0)
+
+    def components (self):
+        """Return a list of integer components in this CVS number."""
+        return list(self.c)
+
+    def branch (self):
+        """Return the branch on which the given number lives.
+
+        Revisions: chop the last component to find the branch, e.g.:
+            1.2.4.6 -> 1.2.4
+            1.1 -> 1
+        Branches: unchanged
+
+        """
+        if self.is_rev():
+            return self.from_components(self.c[:-1])
+        return self
+
+    def parent (self):
+        """Return the parent/previous revision number to this number.
+
+        For revisions, this is the previous revision, e.g.:
+            1.2.4.6 -> 1.2.4.5
+            1.2.4.1 -> 1.2
+            1.1 -> None
+            2.1 -> None
+        For branches, this is the branch point, e.g.:
+            1.2.4 -> 1.2
+            1 -> None
+            2 -> None
+
+        """
+        if len(self.c) < 2:
+            return None
+        elif len(self.c) % 2:  # Branch number
+            return self.from_components(self.c[:-1])
+        else:  # Revision number
+            assert self.c[-1] > 0
+            result = self.components()
+            result[-1] -= 1  # Decrement final component
+            if result[-1] == 0:  # We're at the start of the branch
+                del result[-2:]  # Make into branch point
+                if not result:
+                    return None
+            return self.from_components(result)
+
+    def follows (self, other):
+        """Return True iff self historically follows the given rev.
+
+        This iterates through the parents of self, and returns True iff
+        any of them equals the given rev.  Otherwise, it returns False.
+
+        """
+        assert other.is_rev()
+        cur = self
+        while cur:
+            if cur == other:
+                return True
+            cur = cur.parent()
+        return False
+
+    def on_branch (self, branch):
+        """Return True iff this rev is on the given branch.
+
+        The revs considered to be "on" a branch X also includes the
+        branch point of branch X.
+
+        """
+        return branch == self.branch() or branch.parent() == self
+
+
+class TestCVSNum(unittest.TestCase):
+
+    """CVSNum selftests."""
+
+    def test_basic (self):
+        """CVSNum basic selftests."""
+        self.assertEqual(CVSNum("1.2.4"), CVSNum("1.2.0.4"))
+        self.assert_(CVSNum("1.2.4").is_branch())
+        self.assert_(CVSNum("1.2").is_rev())
+        self.assert_(CVSNum("1").is_branch())
+        self.assert_(CVSNum("1.2.4.6").is_rev())
+        self.assertEqual(CVSNum("1.2.4.6").components(), [1, 2, 4, 6])
+        self.assertEqual(CVSNum.from_components([1, 2, 4, 6]),
+                         CVSNum("1.2.4.6"))
+        self.assertEqual(str(CVSNum.from_components([1, 2, 4, 6])), "1.2.4.6")
+        self.assertEqual(len(CVSNum("1.2.4.6")), 4)
+        self.assertEqual(CVSNum("1.2.4.6").branch(), CVSNum("1.2.4"))
+        self.assertEqual(CVSNum("1.2.4").branch(), CVSNum("1.2.4"))
+        self.assertEqual(CVSNum("1.1").branch(), CVSNum("1"))
+        self.assertEqual(CVSNum("1").branch(), CVSNum("1"))
+        self.assertEqual(CVSNum("1.2.4.6").parent(), CVSNum("1.2.4.5"))
+        self.assertEqual(CVSNum("1.2.4.1").parent(), CVSNum("1.2"))
+        self.assertEqual(CVSNum("1.2").parent(), CVSNum("1.1"))
+        self.assert_(CVSNum("1.1").parent() is None)
+        self.assert_(CVSNum("2.1").parent() is None)
+        self.assertEqual(CVSNum("1.2.4").parent(), CVSNum("1.2"))
+        self.assert_(CVSNum("1").parent() is None)
+        self.assert_(CVSNum("2").parent() is None)
+
+    def test_follows (self):
+        """CVSNum.follows() selftests."""
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.1")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.1")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.2")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.3")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.4")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.5")))
+        self.assert_(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.6")))
+        self.assertFalse(CVSNum("1.2.4.6").follows(CVSNum("1.2.4.7")))
+        self.assertFalse(CVSNum("1.2.4.6").follows(CVSNum("1.3")))
+        self.assertFalse(CVSNum("1.1").follows(CVSNum("1.2.4.6")))
+
+    def test_disjoint (self):
+        """CVSNum.disjoint() selftests."""
+        tests = [
+            ("1.2", "1.1", False),
+            ("1.2", "1.2", False),
+            ("1.2", "1.3", False),
+            ("1.2", "1.1.2", True),
+            ("1.2", "1.1.2.3", True),
+            ("1.2.4", "1.1", False),
+            ("1.2.4", "1.2", False),
+            ("1.2.4", "1.3", True),
+            ("1.2.4", "1.2.2", True),
+            ("1.2.4", "1.2.4", False),
+            ("1.2.4", "1.2.6", True),
+            ("1.2.4", "1.2.2.4", True),
+            ("1.2.4", "1.2.4.4", False),
+            ("1.2.4", "1.2.6.4", True),
+            ("1.2.4.6", "1.1", False),
+            ("1.2.4.6", "1.2", False),
+            ("1.2.4.6", "1.3", True),
+            ("1.2.4.6", "1.2.2", True),
+            ("1.2.4.6", "1.2.2.1", True),
+            ("1.2.4.6", "1.2.4", False),
+            ("1.2.4.6", "1.2.4.5", False),
+            ("1.2.4.6", "1.2.4.6", False),
+            ("1.2.4.6", "1.2.4.7", False),
+            ("1.2.4.6.8.10", "1.2.4.5", False),
+            ("1.2.4.6.8.10", "1.2.4.6", False),
+            ("1.2.4.6.8.10", "1.2.4.7", True),
+        ]
+        for a, b, result in tests:
+            self.assertEqual(CVSNum.disjoint(CVSNum(a), CVSNum(b)), result)
+            self.assertEqual(CVSNum.disjoint(CVSNum(b), CVSNum(a)), result)
+
+
+class CVSState(object):
+
+    """Encapsulate a historical state in CVS (a set of paths and nums).
+
+    This class is a container of CVS pathnames and associated CVSNum
+    objects.
+
+    No communication with a CVS working directory or repository is done
+    in this class, hence only basic sanity checks are performed:
+      - A path may only appear once in a CVSState.
+      - When adding a path:num pair, path may not already exist in self
+      - When replacing a path:num pair, path must already exist in self
+      - When removing a path:num pair, both path and num must be given
+
+    IMPORTANT: Objects of this class are hash()able (to support being
+    used as keys in a dict), but they are also mutable.  It is
+    therefore up to the caller to make sure that the object is not
+    changed after being stored in a data structure indexed by its hash
+    value.
+
+    """
+
+    __slots__ = ('revs', '_hash')
+
+    def __init__ (self):
+        """Create a new, empty CVSState."""
+        self.revs = {}  # dict: path -> CVSNum object
+        self._hash = None
+
+    def __iter__ (self):
+        """Return iterator traversing the (path, CVSNum)s in this CVSState."""
+        return self.revs.iteritems()
+
+    def __cmp__ (self, other):
+        """Comparison method for CVSState objects."""
+        return cmp(self.revs, other.revs)
+
+    def __str__ (self):
+        """Stringify this CVSState by listing the contained revisions."""
+        return "".join(["%s:%s\n" % (p, n) for p, n in sorted(self)])
+
+    def __hash__ (self):
+        """Create a hash value for this CVSState."""
+        if self._hash is None:
+            self._hash = hash(str(self))
+        return self._hash
+
+    def __getitem__ (self, path):
+        """Return the CVSNum associated with the given path in self."""
+        return self.revs[path]
+
+    def get (self, path, default = None):
+        """Return the CVSNum associated with the given path in self."""
+        return self.revs.get(path, default)
+
+    def paths (self):
+        """Return the path names contained within this CVSState."""
+        return self.revs.iterkeys()
+
+    def add (self, path, revnum):
+        """Add the given path:revnum to this CVSState."""
+        assert path not in self.revs
+        self._hash = None
+        self.revs[path] = revnum
+
+    def replace (self, path, revnum):
+        """Replace the revnum associated with the given path."""
+        assert path in self.revs
+        self._hash = None
+        self.revs[path] = revnum
+
+    def remove (self, path, revnum):
+        """Remove the given path:revnum association from this CVSState."""
+        assert path in self.revs and self.revs[path] == revnum
+        self._hash = None
+        del self.revs[path]
+
+    def copy (self):
+        """Create and return a copy of this object."""
+        ret = CVSState()
+        ret.revs = self.revs.copy()
+        ret._hash = self._hash
+        return ret
+
+    def load_data (self, note_data):
+        """Load note data as formatted by self.__str__()."""
+        for line in note_data.split("\n"):
+            line = line.strip()
+            if not line:
+                continue
+            path, num = line.rsplit(':', 1)
+            self.add(path, CVSNum(num))
+        self._hash = hash(note_data)
+
+    def print_members (self, f = sys.stdout, prefix = ""):
+        """Write the members of this CVSState to the given file object."""
+        for path, num in sorted(self):
+            print >> f, "%s%s:%s" % (prefix, path, num)
+
+    @file_reader_method(missing_ok = True)
+    def load (self, f):
+        """Load CVS state from the given file name/object."""
+        if f:
+            self.load_data(f.read())
+
+    @file_writer_method
+    def save (self, f):
+        """Save CVS state to the given file name/object."""
+        assert f
+        print >> f, str(self),
+
+
+class CVSDate(object):
+
+    """Encapsulate a timestamp, as reported by CVS.
+
+    The internal representation of a timestamp is two integers, the
+    first representing the timestamp as #seconds since epoch (UTC),
+    and the second representing the timezone as #minutes offset from
+    UTC.
+
+    Example: "2007-09-05 17:26:28 -0200" is converted to
+             (1189013188, -120)
+
+    """
+
+    __slots__ = ('ts', 'tz')
+
+    def __init__ (self, date_str = None, in_utc = False):
+        """Convert CVS date string into a CVSDate object.
+
+        A CVS timestamp string has one of the following forms:
+          - "YYYY-MM-DD hh:mm:ss SZZZZ"
+          - "YYYY/MM/DD hh:mm:ss" (with timezone assumed to be UTC)
+        The in_utc parameter determines whether the timestamp part of
+        the given string (the "YYYY-MM-DD hh:mm:ss" part) is given in
+        local time or UTC (normally CVS dates are given in local time.
+        If given in local time, the timezone offset is subtracted from
+        the timestamp in order to make the time in UTC format.
+
+        """
+        if date_str is None:
+            self.ts, self.tz = 0, 0
+            return
+        if date_str == "now":
+            self.ts, self.tz = time.time(), 0
+            return
+        date_str = date_str.strip()
+        # Set up self.ts and self.tz
+        if date_str.count(" ") == 2:
+            # Assume format "YYYY-MM-DD hh:mm:ss SZZZZ"
+            t, z = date_str.rsplit(" ", 1)
+            # Convert timestamp to #secs since epoch (UTC)
+            self.ts = timegm(time.strptime(t, "%Y-%m-%d %H:%M:%S"))
+            # Convert timezone into #mins offset from UTC
+            self.tz = int(z[1:3]) * 60 + int(z[3:5])
+            # Incorporate timezone sign
+            if z[0] == '-':
+                self.tz *= -1
+        else:
+            assert date_str.count(" ") == 1
+            # Assume format "YYYY/MM/DD hh:mm:ss"
+            self.ts = timegm(time.strptime(date_str, "%Y/%m/%d %H:%M:%S"))
+            self.tz = 0
+        # Adjust timestamp if not already in UTC
+        if not in_utc:
+            self.ts -= self.tz * 60
+
+    def tz_str (self):
+        """Return timezone part of self in string format."""
+        sign = '+'
+        if self.tz < 0:
+            sign = '-'
+        hours, minutes = divmod(abs(self.tz), 60)
+        return "%s%02d%02d" % (sign, hours, minutes)
+
+    def __str__ (self):
+        """Reconstruct date string from members."""
+        s = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(self.ts))
+        return "%s %s" % (s, self.tz_str())
+
+    def __repr__ (self):
+        """Create a string representation of self."""
+        return "CVSDate('%s')" % (str(self))
+
+    def __hash__ (self):
+        """Create a hash value from self."""
+        return hash((self.ts, self.tz))
+
+    def __nonzero__ (self):
+        """Provide interpretation of self in a boolean context."""
+        return bool(self.ts or self.tz)
+
+    def __cmp__ (self, other):
+        """Comparison method for CVSDate objects."""
+        return cmp(self.ts, other.ts) or cmp(self.tz, other.tz)
+
+    def __eq__ (self, other):
+        """Return True iff self and other is considered equal."""
+        return self.ts == other.ts and self.tz == other.tz
+
+    def diff (self, other):
+        """Return difference between self and other in #seconds.
+
+        Invariant: self == other.add(self.diff(other))
+
+        """
+        return self.ts - other.ts
+
+
+class TestCVSDate(unittest.TestCase):
+
+    """CVSDate selftests."""
+
+    def test_basic (self):
+        """CVSDate basic selftests."""
+        a = CVSDate("2009-05-10 14:34:56 +0200")
+        b = CVSDate("2009/05/10 12:34:56")
+        self.assert_(a)
+        self.assert_(b)
+        self.assertEqual(str(a), "2009-05-10 12:34:56 +0200", str(a))
+        self.assertEqual(str(b), "2009-05-10 12:34:56 +0000", str(b))
+        self.assertNotEqual(a, b)
+        self.assertEqual(a.diff(b), 0)
+        c = CVSDate("2009-05-10 16:34:56 +0200")
+        self.assert_(c)
+        self.assertEqual(str(c), "2009-05-10 14:34:56 +0200", str(c))
+        self.assertNotEqual(c, a)
+        self.assertEqual(c.diff(a), 2 * 60 * 60)
+        self.assertEqual(a.diff(c), -2 * 60 * 60)
+
+
+class CVSRev(object):
+
+    """Encapsulate metadata on a CVS revision."""
+
+    __slots__ = ('path', 'num', 'date', 'author', 'deleted', 'message')
+
+    def __init__ (self, path, num):
+        """Create a CVSRev object for the given path:num revision."""
+        self.path = path
+        self.num = num
+        self.date = None  # CVSDate object
+        self.author = ""
+        self.deleted = None  # True or False
+        self.message = []  # Lines of commit message
+
+    def __str__ (self):
+        """Return a string listing the metadata in this CVS revision."""
+        return ("<%s:%s on %s by %s%s>" %
+                (self.path, self.num, self.date, self.author,
+                 self.deleted and ", deleted" or ""))
+
+    def __cmp__ (self, other):
+        """Comparison method for CVSRev objects."""
+        return cmp(self.path, other.path) or cmp(self.num, other.num)
+
+
+class CVSWorkDir(object):
+
+    """Encapsulate a CVS working directory.
+
+    This class auto-creates a CVS workdir/checkout in the directory
+    given to the constructor, and provides various methods for
+    interacting with this workdir.
+
+    """
+
+    def __init__ (self, workdir, cvs_repo):
+        """Create a new CVSWorkDir.
+
+        The cvs_repo argument must be a (cvs_root, cvs_module) tuple
+
+        """
+        self.d = workdir
+        self.cvs_root, self.cvs_module = cvs_repo
+        parent_dir = os.path.dirname(self.d)
+        if not os.path.isdir(parent_dir):
+            os.makedirs(parent_dir)
+        self._valid = None
+
+    def makepath(self, *args):
+        """Create path relative to working directory."""
+        return os.path.join(self.d, *args)
+
+    def valid (self):
+        """Return True iff this workdir is present and valid."""
+        if self._valid is not None:
+            return self._valid
+        try:
+            f = open(self.makepath("CVS", "Root"), 'r')
+            assert f.read().strip() == self.cvs_root
+            f.close()
+            f = open(self.makepath("CVS", "Repository"), 'r')
+            assert f.read().strip() == self.cvs_module
+            f.close()
+            self._valid = True
+        except (IOError, AssertionError):
+            self._valid = False
+        return self._valid
+
+    def remove (self):
+        """Remove this checkout."""
+        shutil.rmtree(self.d, True)
+        assert not os.path.exists(self.d)
+        self._valid = False
+
+    def checkout (self, revision = "HEAD"):
+        """Create a checkout of the given revision."""
+        self.remove()
+        parent_dir, co_dir = os.path.split(self.d)
+        args = ["cvs", "-f", "-Q", "-d", self.cvs_root, "checkout"]
+        if str(revision) != "HEAD":
+            args.extend(["-r", str(revision)])
+        args.extend(["-d", co_dir, self.cvs_module])
+        exit_code, output, errors = run_command(args, cwd = parent_dir)
+        if exit_code:
+            die("Failed to checkout CVS working directory")
+        assert not errors
+        assert not output, "output = '%s'" % (output)
+        self._valid = None
+        assert self.valid()
+
+    def update (self, revision = "HEAD", paths = None):
+        """Update the given paths to the given revision."""
+        if not self.valid():
+            self.checkout()
+        args = ["cvs", "-f", "-Q", "update", "-kk"]
+        if str(revision) == "HEAD":
+            args.append("-A")
+        else:
+            args.extend(["-r", str(revision)])
+        if paths is not None:
+            args.extend(paths)
+        exit_code, output, errors = run_command(args, cwd = self.d)
+        if exit_code:
+            die("Failed to checkout CVS working directory")
+        assert not errors
+        assert not output, "output = '%s'" % (output)
+
+    def get_revision_data (self, path, revision):
+        """Return the contents of the given CVS path:revision."""
+        if not self.valid():
+            self.checkout()
+        args = ["cvs", "-f", "-Q", "update", "-p", "-kk"]
+        if str(revision) == "HEAD":
+            args.append("-A")
+        else:
+            args.extend(["-r", str(revision)])
+        args.append(path)
+        exit_code, output, errors = run_command(args, cwd = self.d)
+        if exit_code:
+            die("Failed to checkout CVS working directory")
+        assert not errors
+        return output
+
+    def get_modeinfo (self, paths = None):
+        """Return mode information for the given paths.
+
+        Returns a dict of path -> mode number mappings.  If paths are
+        not specified, mode information for all files in the current
+        checkout will be returned.  No checkout/update will be done.
+
+        """
+        result = {}
+        if paths is not None:
+            for path in paths:
+                fullpath = os.path.join(self.d, path)
+                mode = 644
+                if os.access(fullpath, os.X_OK):
+                    mode = 755
+                assert path not in result
+                result[path] = mode
+        else:  # Return mode information for all paths
+            for dirpath, dirnames, filenames in os.walk(self.d):
+                # Don't descend into CVS subdirs
+                try:
+                    dirnames.remove('CVS')
+                except ValueError:
+                    pass
+                assert dirpath.startswith(self.d)
+                directory = dirpath[len(self.d):].lstrip("/")
+                for fname in filenames:
+                    path = os.path.join(directory, fname)
+                    fullpath = os.path.join(dirpath, fname)
+                    mode = 644
+                    if os.access(fullpath, os.X_OK):
+                        mode = 755
+                    assert path not in result
+                    result[path] = mode
+        return result
+
+    @classmethod
+    def parse_entries (cls, entries, prefix, directory = ""):
+        """Recursively parse CVS/Entries files.
+
+        Return a dict of CVS paths found by parsing the CVS/Entries
+        files rooted at the given directory.
+
+        See http://ximbiot.com/cvs/manual/feature/cvs_2.html#SEC19 for
+        information on the format of the CVS/Entries file.
+
+        """
+        fname = os.path.join(prefix, directory, "CVS", "Entries")
+        subdirs = []
+        f = open(fname, 'r')
+        for line in f:
+            line = line.strip()
+            if line == "D":
+                continue  # There are no subdirectories
+            t, path, revnum, date, options, tag = line.split("/")
+            if t == "D":
+                subdirs.append(path)
+                continue
+            assert line.startswith("/")
+            path = os.path.join(directory, path)
+            revnum = CVSNum(revnum)
+            assert path not in entries
+            entries[path] = (revnum, date, options, tag)
+        f.close()
+        for d in subdirs:
+            d = os.path.join(directory, d)
+            cls.parse_entries(entries, prefix, d)
+
+    def get_state (self):
+        """Return CVSState reflecting current state of this checkout.
+
+        Note that the resulting CVSState will never contain any
+        deleted/dead files.  Other CVSStates to be compared to the one
+        returned from here should remove deleted/dead entries first.
+
+        """
+        assert self.valid()
+        entries = {}
+        result = CVSState()
+        self.parse_entries(entries, self.d)
+        for path, info in entries.iteritems():
+            result.add(path, info[0])
+        return result
+
+
+class CVSLogParser(object):
+
+    """Encapsulate the execution of a "cvs rlog" command."""
+
+    def __init__ (self, cvs_repo):
+        """Create a new CVSLogParser.
+
+        The cvs_repo argument must be a (cvs_root, cvs_module) tuple
+
+        """
+        self.cvs_root, self.cvs_module = cvs_repo
+
+    def cleanup_path (self, cvs_path):
+        """Utility method for parsing CVS paths from CVS log."""
+        cvsprefix = "/".join((self.cvs_root[self.cvs_root.index("/"):],
+                              self.cvs_module))
+        assert cvs_path.startswith(cvsprefix)
+        assert cvs_path.endswith(",v")
+        # Drop cvsprefix and ,v-extension
+        cvs_path = cvs_path[len(cvsprefix):-2]
+        # Split the remaining path into components
+        path_comps = filter(None, cvs_path.strip().split('/'))
+        # Remove 'Attic' from CVS paths
+        if len(path_comps) >= 2 and path_comps[-2] == "Attic":
+            del path_comps[-2]
+        # Reconstruct resulting "cleaned" path
+        return "/".join(path_comps)
+
+    def __call__ (self, line):
+        """Parse the given line from the CVS log.
+
+        Must be reimplemented by subclass
+
+        """
+        pass
+
+    def finish (self):
+        """This method is invoked after the last line has been parsed.
+
+        May be reimplemented by subclass
+
+        """
+        pass
+
+    def run (self, paths = None, no_symbols = False, revisions = None):
+        """Execute "cvs rlog" with the given arguments.
+
+        self.__call__() is invoked once for each line in the CVS log.
+        self.finish() is invoked exactly once after the CVS log.
+
+        """
+        args = ["cvs", "-f", "-q", "-d", self.cvs_root, "rlog"]
+        if no_symbols:
+            args.append("-N")
+        if revisions:
+            args.append("-r%s" % (revisions))
+        if paths is not None:
+            for p in paths:
+                args.append("%s/%s" % (self.cvs_module, p))
+        else:
+            args.append(self.cvs_module)
+        proc = start_command(args)
+        proc.stdin.close()
+        while True:
+            for line in proc.stdout:
+                self(line.rstrip())  # Call self's line parser
+            if proc.poll() is not None:
+                break
+        assert proc.stdout.read() == ""
+        self.finish()  # Notify subclass that parsing is finished
+        exit_code = proc.returncode
+        if exit_code:
+            error("'%s' returned exit code %i, and errors:\n---\n%s---",
+                  " ".join(args), exit_code, proc.stderr.read())
+        return exit_code
+
+
+class CVSRevLister(CVSLogParser):
+
+    """Extract CVSRev objects (with revision metadata) from a CVS log."""
+
+    def __init__ (self, cvs_repo, show_progress = False):
+        """Create a new CVSRevLister.
+
+        The cvs_repo argument must be a (cvs_root, cvs_module) tuple
+        show_progress determines whether progress indication is shown.
+
+        """
+        super(CVSRevLister, self).__init__(cvs_repo)
+        self.cur_file = None  # Current CVS file being processed
+        self.cur_file_numrevs = 0  # #revs in current CVS file
+        self.cur_rev = None  # Current CVSRev under construction
+        self.progress = None
+        if show_progress:
+            self.progress = ProgressIndicator("\t", sys.stderr)
+        # Store found revs in a two-level dict structure:
+        # filename -> revnum -> CVSRev
+        self.revs = {}
+        # Possible states:
+        # - BeforeRevs  - waiting for "total revisions:"
+        # - BetweenRevs - waiting for "----------------------------"
+        # - ReadingRev  - reading CVS revision details
+        self.state = 'BeforeRevs'
+
+    def __call__ (self, line):
+        """Line parser; this method is invoked for each line in the log."""
+        if self.state == 'BeforeRevs':
+            if line.startswith("RCS file: "):
+                self.cur_file = self.cleanup_path(line[10:])
+                assert self.cur_file not in self.revs
+                self.revs[self.cur_file] = {}
+            elif line.startswith("total revisions: "):
+                assert self.cur_file
+                totalrevs_unused, selectedrevs = line.split(";")
+                self.cur_file_numrevs = int(selectedrevs.split(":")[1].strip())
+                self.state = 'BetweenRevs'
+        elif self.state == 'BetweenRevs':
+            if (line == "----------------------------" or
+                line == "======================================"
+                        "======================================="):
+                if self.cur_rev:
+                    # Finished current revision
+                    f = self.revs[self.cur_file]
+                    assert self.cur_rev.num not in f
+                    f[self.cur_rev.num] = self.cur_rev
+                    self.cur_rev = None
+                    if self.progress:
+                        self.progress()
+                if line == "----------------------------":
+                    self.state = 'ReadingRev'
+                else:
+                    # Finalize current CVS file
+                    assert len(self.revs[self.cur_file]) == \
+                           self.cur_file_numrevs
+                    self.cur_file = None
+                    self.state = 'BeforeRevs'
+            elif self.cur_rev:
+                # Currently in the middle of a revision.
+                if line.startswith("branches:  %s" % (self.cur_rev.num)) and \
+                   line.endswith(";"):
+                    return  # Skip 'branches:' lines
+                # This line is part of the commit message.
+                self.cur_rev.message.append(line)
+        elif self.state == 'ReadingRev':
+            if line.startswith("revision "):
+                self.cur_rev = CVSRev(self.cur_file, CVSNum(line.split()[1]))
+            else:
+                date, author, state, dummy = line.split(";", 3)
+                assert date.startswith("date: ")
+                self.cur_rev.date = CVSDate(date[6:])
+                assert author.strip().startswith("author: ")
+                self.cur_rev.author = author.strip()[8:]
+                assert state.strip().startswith("state: ")
+                state = state.strip()[7:]
+                self.cur_rev.deleted = state == "dead"
+                self.state = 'BetweenRevs'
+
+    def finish (self):
+        """This method is invoked after the last line has been parsed."""
+        assert self.state == 'BeforeRevs'
+        if self.progress:
+            self.progress.finish("Parsed %i revs in %i files" %
+                                 (self.progress.n, len(self.revs)))
+
+
+def fetch_revs (path, from_rev, to_rev, symbol, cvs_repo):
+    """Fetch CVSRevs for each rev in <path:from_rev, path:symbol].
+
+    Return a dict of CVSRev objects (revnum -> CVSRev), where each
+    CVSRev encapsulates a CVS revision in the range from
+    path:from_rev to path:symbol (inclusive).  If symbol currently
+    refers to from_rev (i.e. nothing has happened since the last
+    import), the returned dict will have exactly one entry (from_rev).
+    If there is no valid revision range between from_rev and symbol,
+    the returned dict will be empty.  Situations in which an empty dict
+    is returned, include:
+    - symbol is no longer defined on this path
+    - symbol refers to a revision that is disjoint from from_rev
+
+    from_rev may be None, meaning that all revisions from the initial
+    version of path up to the revision currently referenced by symbol
+    should be fetched.
+
+    If the revision currently referenced by symbol is disjoint from
+    from_rev, the returned dict will be empty.
+
+    Note that there is lots of unexpected behaviour in the handling of
+    the 'cvs rlog -r' parameter: Say you have a branch, called
+    'my_branch', that points to branch number 1.1.2 of a file.  Say
+    there are 3 revisions on this branch: 1.1.2.1 -> 1.1.2.3 (in
+    additions to the branch point 1.1).  Now, observe the following
+    'cvs rlog' executions:
+    - cvs rlog -r0:my_branch ... returns 1.1, 1.1.2.1, 1.1.2.2, 1.1.2.3
+    - cvs rlog -r1.1:my_branch ... returns the same revs
+    - cvs rlog -rmy_branch ... returns 1.1.2.1, 1.1.2.2, 1.1.2.3
+    - cvs rlog -rmy_branch: ... returns the same revs
+    - cvs rlog -r:my_branch ... returns the same revs
+    - cvs rlog -r::my_branch ... returns the same revs
+    - cvs rlog -r1.1.2.1: ... returns the same revs
+    Here is where it gets really weird:
+    - cvs rlog -r1.1.2.1:my_branch ... returns 1.1.2.1 only
+    - cvs rlog -r1.1.2.2:my_branch ... returns 1.1.2.1, 1.1.2.2
+    - cvs rlog -r1.1.2.3:my_branch ... returns 1.1.2.1, 1.1.2.2, 1.1.2.3
+
+    In other words the 'cvs rlog -rfrom_rev:symbol' scheme that we
+    normally use will not work in the case where from_rev is _on_ the
+    branch pointed at by the symbol.
+
+    Therefore, we need an extra parameter, to_rev, which we can use to:
+    1. Detect when this situation is present.
+    2. Work around by using 'cvs rlog -rfrom_ref:to_rev' instead.
+
+    """
+    if from_rev is None:  # Initial import
+        from_rev = "0"  # "cvs rlog -r0:X" fetches from initial revision
+    elif to_rev and to_rev.branch() == from_rev.branch():
+        symbol = to_rev  # Use to_rev instead of given symbol
+    # Run 'cvs rlog' on range [from_rev, symbol] and parse CVSRev objects
+    parser = CVSRevLister(cvs_repo)
+    parser.run((path,), True, "%s:%s" % (from_rev, symbol))
+    assert len(parser.revs) == 1
+    assert path in parser.revs
+    return parser.revs[path]
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/git_remote_cvs/cvs_symbol_cache.py b/git_remote_cvs/cvs_symbol_cache.py
new file mode 100644
index 0000000..cc8d88b
--- /dev/null
+++ b/git_remote_cvs/cvs_symbol_cache.py
@@ -0,0 +1,313 @@
+#!/usr/bin/env python
+
+"""Implementation of a local CVS symbol cache.
+
+A CVS symbol cache stores a list of CVS symbols and the CVS state
+associated with each of those CVS symbols at some point in time.
+
+Keeping a local cache of CVS symbols is often needed because the
+design of CVS makes it potentially very expensive to query the CVS
+server directly for CVS symbols and associated states.
+
+In these cases, a local CVS symbol cache can provide equivalent
+(although possibly out-of-date) information immediatele.
+
+Synchronization with the current state on the CVS server can be
+done on a symbol-by-symbol basis (by checking out a given symbol
+and extracting the CVS state from the CVS work tree), or by
+synchronizing _all_ CVS symbols in one operation (by executing
+'cvs rlog' and parsing CVS states from its output).
+
+"""
+
+import sys
+import os
+
+from git_remote_cvs.util import debug, error, die, ProgressIndicator
+from git_remote_cvs.cvs import CVSNum, CVSState, CVSLogParser
+
+
+class CVSSymbolStateLister(CVSLogParser):
+
+    """Extract current CVSStates for all CVS symbols from a CVS log."""
+
+    def __init__ (self, cvs_repo, show_progress = False):
+        """Create a new CVSSymbolStateLister.
+
+        The cvs_repo argument must be a (cvs_root, cvs_module) tuple
+        show_progress determines whether a progress indicator should
+        be displayed.
+
+        """
+        super(CVSSymbolStateLister, self).__init__(cvs_repo)
+        self.symbols = {}  # CVS symbol name -> CVSState object
+        self.cur_file = None  # current CVS file being processed
+        self.cur_file_numrevs = 0  # #revs in current CVS file
+        self.cur_revnum = None  # current revision number
+        self.rev2syms = {}  # CVSNum -> [CVS symbol names]
+        self.cur_revs = {}  # CVSNum -> True/False (deleted)
+        self.head_num = None  # CVSNum of the HEAD rev or branch
+
+        # Possible states:
+        # - BeforeSymbols - waiting for "symbolic names:"
+        # - WithinSymbols - reading CVS symbol names
+        # - BeforeRevs  - waiting for "total revisions:"
+        # - BetweenRevs - waiting for "----------------------------"
+        # - ReadingRev  - reading CVS revision details
+        self.state = 'BeforeSymbols'
+
+        self.progress = None
+        if show_progress:
+            self.progress = ProgressIndicator("\t", sys.stderr)
+
+    def finalize_symbol_states (self):
+        """Adjust CVSStates in self.symbols based on revision data.
+
+        Based on the information found in self.rev2syms and
+        self.cur_revs, remove deleted revisions and turn branch numbers
+        into corresponding revisions in the CVSStates found in
+        self.symbols.
+
+        """
+        # Create a mapping from branch numbers to the last existing
+        # revision number on those branches
+        branch2lastrev = {}  # branch number -> revision number
+        for revnum in self.cur_revs.iterkeys():
+            branchnum = revnum.branch()
+            if (branchnum not in branch2lastrev) or \
+               (revnum > branch2lastrev[branchnum]):
+                branch2lastrev[branchnum] = revnum
+
+        for cvsnum, symbols in self.rev2syms.iteritems():
+            if cvsnum.is_branch():
+                # Turn into corresponding revision number
+                revnum = branch2lastrev.get(cvsnum, cvsnum.parent())
+                for s in symbols:
+                    state = self.symbols[s]
+                    assert state[self.cur_file] == cvsnum
+                    state.replace(self.cur_file, revnum)
+                cvsnum = revnum
+            assert cvsnum.is_rev()
+            assert cvsnum in self.cur_revs
+            if self.cur_revs[cvsnum]:  # cvsnum is a deleted rev
+                # Remove from CVSStates
+                for s in symbols:
+                    state = self.symbols[s]
+                    state.remove(self.cur_file, cvsnum)
+
+        self.rev2syms = {}
+        self.cur_revs = {}
+        self.cur_file = None
+
+    def __call__ (self, line):
+        """Line parser; this method is invoked for each line in the log."""
+        if self.state == 'BeforeSymbols':
+            if line.startswith("RCS file: "):
+                self.cur_file = self.cleanup_path(line[10:])
+                if self.progress:
+                    self.progress("%5i symbols found - Parsing CVS file #%i: "
+                                  "%s " % (len(self.symbols), self.progress.n,
+                                           self.cur_file,))
+            if line.startswith("head: "):
+                self.head_num = CVSNum(line[6:])
+            if line.startswith("branch: "):
+                self.head_num = CVSNum(line[8:])
+            elif line == "symbolic names:":
+                assert self.head_num
+                s = self.symbols.setdefault("HEAD", CVSState())
+                s.add(self.cur_file, self.head_num)
+                r = self.rev2syms.setdefault(self.head_num, [])
+                r.append("HEAD")
+                self.head_num = None
+                self.state = 'WithinSymbols'
+        elif self.state == 'WithinSymbols':
+            if line.startswith("\t"):
+                symbol, cvsnum = line.split(":", 1)
+                symbol = symbol.strip()
+                cvsnum = CVSNum(cvsnum)
+                s = self.symbols.setdefault(symbol, CVSState())
+                s.add(self.cur_file, cvsnum)
+                r = self.rev2syms.setdefault(cvsnum, [])
+                r.append(symbol)
+            else:
+                self.state = 'BeforeRevs'
+        elif self.state == 'BeforeRevs':
+            if line.startswith("total revisions: "):
+                assert self.cur_file
+                totalrevs_unused, selectedrevs = line.split(";")
+                self.cur_file_numrevs = int(selectedrevs.split(":")[1].strip())
+                self.state = 'BetweenRevs'
+        elif self.state == 'BetweenRevs':
+            if (line == "----------------------------" or
+                line == "======================================"
+                        "======================================="):
+                if self.cur_revnum:
+                    assert self.cur_revnum in self.cur_revs
+                    self.cur_revnum = None
+                if line == "----------------------------":
+                    self.state = 'ReadingRev'
+                else:
+                    # Finalize current CVS file
+                    assert len(self.cur_revs) == self.cur_file_numrevs
+                    self.finalize_symbol_states()
+                    self.state = 'BeforeSymbols'
+        elif self.state == 'ReadingRev':
+            if line.startswith("revision "):
+                self.cur_revnum = CVSNum(line.split()[1])
+            else:
+                date, author, state, dummy = line.split(";", 3)
+                assert date.startswith("date: ")
+                assert author.strip().startswith("author: ")
+                assert state.strip().startswith("state: ")
+                state = state.strip()[7:]
+                assert self.cur_revnum not in self.cur_revs
+                deleted = state == "dead"
+                self.cur_revs[self.cur_revnum] = deleted
+                self.state = 'BetweenRevs'
+
+    def finish (self):
+        """This method is invoked after the last line has been parsed."""
+        assert self.state == 'BeforeSymbols'
+        if self.progress:
+            self.progress.finish("Parsed %i symbols in %i files" %
+                                 (len(self.symbols), self.progress.n))
+
+
+class CVSSymbolCache(object):
+
+    """Local cache of the current CVSState of CVS symbols.
+
+    Simulates a dictionary of CVS symbol -> CVSState mappings.
+
+    """
+
+    def __init__ (self, symbols_dir):
+        """Create a new CVS symbol cache, located in the given directory."""
+        self.symbols_dir = symbols_dir
+        if not os.path.isdir(self.symbols_dir):
+            os.makedirs(self.symbols_dir)
+
+    def __len__ (self):
+        """Return the number of CVS symbols stored in this cache."""
+        return len(os.listdir(self.symbols_dir))
+
+    def __iter__ (self):
+        """Return an iterator traversing symbol names stored in this cache."""
+        for filename in os.listdir(self.symbols_dir):
+            yield filename
+
+    def __contains__ (self, symbol):
+        """Return True if the given symbol is present in this cache."""
+        return os.access(os.path.join(self.symbols_dir, symbol),
+                         os.F_OK | os.R_OK)
+
+    def __getitem__ (self, symbol):
+        """Return the cached CVSState of the given CVS symbol."""
+        try:
+            f = open(os.path.join(self.symbols_dir, symbol), 'r')
+        except IOError:
+            raise KeyError("'%s'" % (symbol))
+        ret = CVSState()
+        ret.load(f)
+        f.close()
+        return ret
+
+    def __setitem__ (self, symbol, cvs_state):
+        """Store the given CVS symbol and CVSState into the cache."""
+        cvs_state.save(os.path.join(self.symbols_dir, symbol))
+
+    def __delitem__ (self, symbol):
+        """Remove the the given CVS symbol from the cache."""
+        os.remove(os.path.join(self.symbols_dir, symbol))
+
+    def get (self, symbol, default = None):
+        """Return the cached CVSState of the given CVS symbol."""
+        try:
+            return self[symbol]
+        except KeyError:
+            return default
+
+    def items (self):
+        """Return list of (CVS symbol, CVSState) tuples saved in this cache."""
+        for filename in self:
+            yield (filename, self[filename])
+
+    def clear (self):
+        """Remove all entries from this CVS symbol cache."""
+        for filename in os.listdir(self.symbols_dir):
+            os.remove(os.path.join(self.symbols_dir, filename))
+
+    def sync_symbol (self, symbol, cvs, progress):
+        """Synchronize the given CVS symbol with the CVS server.
+
+        The given CVS workdir is used for the synchronization.
+        The retrieved CVSState is also returned
+
+        """
+        progress("Retrieving state of CVS symbol '%s'..." % (symbol))
+        cvs.update(symbol)
+        state = cvs.get_state()
+
+        progress("Saving state of '%s' to symbol cache..." % (symbol))
+        self[symbol] = state
+
+    def sync_all_symbols (self, cvs_repo, progress, symbol_filter = None):
+        """Synchronize this entire CVS symbol cache with the CVS server.
+
+        This may be very expensive if the CVS repository is large, or
+        has many symbols.  After this method returns, the symbol cache
+        will be in sync with the current state on the server.
+
+        This method returns a dict with the keys 'unchanged',
+        'changed', 'added', and 'deleted', where each map to a list of
+        CVS symbols.  Each CVS symbol appears in exactly one of these
+        lists.
+
+        If symbol_filter is given, it specifies functions that takes
+        one parameter - a CVS symbol name - and returns True if that
+        symbol should be synchronized, and False if that symbol should
+        be skipped.  Otherwise all CVS symbols are synchronized.
+
+        """
+        if symbol_filter is None:
+            symbol_filter = lambda symbol: True
+
+        # Run cvs rlog to fetch current CVSState for all CVS symbols
+        progress("Retrieving current state of all CVS symbols from CVS "
+                 "server...", lf = True)
+        parser = CVSSymbolStateLister(cvs_repo, True)
+        retcode = parser.run()
+        if retcode:
+            raise EnvironmentError(retcode, "cvs rlog exit code %i" % retcode)
+
+        # Update symbol cache with new states from the CVS server
+        progress("Updating symbol cache with current CVS state...")
+        results = {}
+        result_keys = ("unchanged", "changed", "added", "deleted")
+        for k in result_keys:
+            results[k] = []
+        # Classify existing symbols as unchanged, changed, or deleted
+        for symbol in filter(symbol_filter, self):
+            if symbol not in parser.symbols:  # Deleted
+                results["deleted"].append(symbol)
+                del self[symbol]
+            elif self[symbol] != parser.symbols[symbol]:  # Changed
+                results["changed"].append(symbol)
+                self[symbol] = parser.symbols[symbol]
+            else:  # Unchanged
+                results["unchanged"].append(symbol)
+            progress()
+        # Add symbols that are not in self
+        for symbol, state in parser.symbols.iteritems():
+            if not symbol_filter(symbol):
+                debug("Skipping CVS symbol '%s'...", symbol)
+            elif symbol in self:
+                assert state == self[symbol]
+            else:  # Added
+                results["added"].append(symbol)
+                self[symbol] = state
+            progress()
+        progress("Synchronized local symbol cache (%s)" %
+                 (", ".join(["%i %s" % (len(results[k]), k)
+                             for k in result_keys])), True)
+        return results
diff --git a/git_remote_cvs/util.py b/git_remote_cvs/util.py
new file mode 100644
index 0000000..7d6adb4
--- /dev/null
+++ b/git_remote_cvs/util.py
@@ -0,0 +1,186 @@
+#!/usr/bin/env python
+
+"""Misc. useful functionality used by the rest of this package.
+
+This module provides common functionality used by the other modules in
+this package.
+
+"""
+
+import sys
+import os
+import subprocess
+
+
+# Whether or not to show debug messages
+DEBUG = False
+
+def debug (msg, *args):
+    """Print a debug message to stderr when DEBUG is enabled."""
+    if DEBUG:
+        print >> sys.stderr, msg % args
+
+def error (msg, *args):
+    """Print an error message to stderr."""
+    print >> sys.stderr, "ERROR:", msg % args
+
+def die (msg, *args):
+    """Print as error message to stderr and exit the program."""
+    error(msg, *args)
+    sys.exit(1)
+
+
+class ProgressIndicator(object):
+
+    """Simple progress indicator.
+
+    Displayed as a spinning character by default, but can be customized
+    by passing custom messages that overrides the spinning character.
+
+    """
+
+    States = ("|", "/", "-", "\\")
+
+    def __init__ (self, prefix = "", f = sys.stdout):
+        """Create a new ProgressIndicator, bound to the given file object."""
+        self.n = 0  # Simple progress counter
+        self.f = f  # Progress is written to this file object
+        self.prev_len = 0  # Length of previous msg (to be overwritten)
+        self.prefix = prefix  # Prefix prepended to each progress message
+        self.prefix_lens = [] # Stack of prefix string lengths
+
+    def pushprefix (self, prefix):
+        """Append the given prefix onto the prefix stack."""
+        self.prefix_lens.append(len(self.prefix))
+        self.prefix += prefix
+
+    def popprefix (self):
+        """Remove the last prefix from the prefix stack."""
+        prev_len = self.prefix_lens.pop()
+        self.prefix = self.prefix[:prev_len]
+
+    def __call__ (self, msg = None, lf = False):
+        """Indicate progress, possibly with a custom message."""
+        if msg is None:
+            msg = self.States[self.n % len(self.States)]
+        msg = self.prefix + msg
+        print >> self.f, "\r%-*s" % (self.prev_len, msg),
+        self.prev_len = len(msg.expandtabs())
+        if lf:
+            print >> self.f
+            self.prev_len = 0
+        self.n += 1
+
+    def finish (self, msg = "done", noprefix = False):
+        """Finalize progress indication with the given message."""
+        if noprefix:
+            self.prefix = ""
+        self(msg, True)
+
+
+def start_command (args, cwd = None, shell = False, add_env = None,
+                   stdin = subprocess.PIPE, stdout = subprocess.PIPE,
+                   stderr = subprocess.PIPE):
+    """Start the given command, and return a subprocess object.
+
+    This provides a simpler interface to the subprocess module.
+
+    """
+    env = None
+    if add_env is not None:
+        env = os.environ.copy()
+        env.update(add_env)
+    return subprocess.Popen(args, bufsize = 1, stdin = stdin, stdout = stdout,
+                            stderr = stderr, cwd = cwd, shell = shell,
+                            env = env, universal_newlines = True)
+
+
+def run_command (args, cwd = None, shell = False, add_env = None,
+                 flag_error = True):
+    """Run the given command to completion, and return its results.
+
+    This provides a simpler interface to the subprocess module.
+
+    The results are formatted as a 3-tuple: (exit_code, output, errors)
+
+    If flag_error is enabled, Error messages will be produced if the
+    subprocess terminated with a non-zero exit code and/or stderr
+    output.
+
+    The other arguments are passed on to start_command().
+
+    """
+    process = start_command(args, cwd, shell, add_env)
+    (output, errors) = process.communicate()
+    exit_code = process.returncode
+    if flag_error and errors:
+        error("'%s' returned errors:\n---\n%s---", " ".join(args), errors)
+    if flag_error and exit_code:
+        error("'%s' returned exit code %i", " ".join(args), exit_code)
+    return (exit_code, output, errors)
+
+
+def file_reader_method (missing_ok = False):
+    """Decorator for simplifying reading of files.
+
+    If missing_ok is True, a failure to open a file for reading will
+    not raise the usual IOError, but instead the wrapped method will be
+    called with f == None.  The method must in this case properly
+    handle f == None.
+
+    """
+    def _wrap (method):
+        """Teach given method to handle both filenames and file objects.
+
+        The given method must take a file object as its second argument
+        (the first argument being 'self', of course).  This decorator
+        will take a filename given as the second argument and promote
+        it to a file object.
+
+        """
+        def _wrapped_method (self, filename, *args, **kwargs):
+            if isinstance(filename, file):
+                f = filename
+            else:
+                try:
+                    f = open(filename, 'r')
+                except IOError:
+                    if missing_ok:
+                        f = None
+                    else:
+                        raise
+            try:
+                return method(self, f, *args, **kwargs)
+            finally:
+                if not isinstance(filename, file) and f:
+                    f.close()
+        return _wrapped_method
+    return _wrap
+
+
+def file_writer_method (method):
+    """Decorator for simplifying writing of files.
+
+    Enables the given method to handle both filenames and file objects.
+
+    The given method must take a file object as its second argument
+    (the first argument being 'self', of course).  This decorator will
+    take a filename given as the second argument and promote it to a
+    file object.
+
+    """
+    def _new_method (self, filename, *args, **kwargs):
+        if isinstance(filename, file):
+            f = filename
+        else:
+            # Make sure the containing directory exists
+            parent_dir = os.path.dirname(filename)
+            if not os.path.isdir(parent_dir):
+                os.makedirs(parent_dir)
+            f = open(filename, 'w')
+        try:
+            return method(self, f, *args, **kwargs)
+        finally:
+            if not isinstance(filename, file):
+                f.close()
+    return _new_method
-- 
1.6.4.262.gca66a.dirty

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFCv4 3/5] 2/2: Add Python support library for CVS remote helper
  2009-08-17  3:35 [RFCv4 0/5] CVS remote helper Johan Herland
  2009-08-17  3:35 ` [RFCv4 1/5] Basic build infrastructure for Python scripts Johan Herland
  2009-08-17  3:35 ` [RFCv4 2/5] 1/2: Add Python support library for CVS remote helper Johan Herland
@ 2009-08-17  3:35 ` Johan Herland
  2009-10-05 13:31   ` Sverre Rabbelier
  2009-08-17  3:36 ` [RFCv4 4/5] git-remote-cvs: Remote helper program for CVS repositories Johan Herland
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Johan Herland @ 2009-08-17  3:35 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, Johan Herland, barkalow, davvid, gitster

This patch introduces the rest of a Python package called "git_remote_cvs"
containing the building blocks of the CVS remote helper. The CVS remote
helper itself is NOT part of this patch.

The patch includes the necessary Makefile additions to build and install
the git_remote_cvs Python package along with the rest of Git.

This patch has been improved by the following contributions:
- David Aguilar: Lots of Python coding style fixes
- David Aguilar: DESTDIR support in Makefile

Cc: David Aguilar <davvid@gmail.com>
Signed-off-by: Johan Herland <johan@herland.net>
---

This patch and the previous patch belong together in the same commit,
but the 100K limit (?) on the git mailing list forces me to split this
patch in two.

...Johan

 Makefile                           |    9 +
 git_remote_cvs/.gitignore          |    2 +
 git_remote_cvs/Makefile            |   35 ++
 git_remote_cvs/__init__.py         |   27 ++
 git_remote_cvs/commit_states.py    |   62 ++++
 git_remote_cvs/cvs_revision_map.py |  418 ++++++++++++++++++++++
 git_remote_cvs/git.py              |  680 ++++++++++++++++++++++++++++++++++++
 git_remote_cvs/setup.py            |   17 +
 8 files changed, 1250 insertions(+), 0 deletions(-)
 create mode 100644 git_remote_cvs/.gitignore
 create mode 100644 git_remote_cvs/Makefile
 create mode 100644 git_remote_cvs/__init__.py
 create mode 100644 git_remote_cvs/commit_states.py
 create mode 100644 git_remote_cvs/cvs_revision_map.py
 create mode 100644 git_remote_cvs/git.py
 create mode 100644 git_remote_cvs/setup.py

diff --git a/Makefile b/Makefile
index 969cef5..bb5cea2 100644
--- a/Makefile
+++ b/Makefile
@@ -1350,6 +1350,9 @@ endif
 ifndef NO_PERL
 	$(QUIET_SUBDIR0)perl $(QUIET_SUBDIR1) PERL_PATH='$(PERL_PATH_SQ)' prefix='$(prefix_SQ)' all
 endif
+ifndef NO_PYTHON
+	$(QUIET_SUBDIR0)git_remote_cvs $(QUIET_SUBDIR1) PYTHON_PATH='$(PYTHON_PATH_SQ)' prefix='$(prefix_SQ)' all
+endif
 	$(QUIET_SUBDIR0)templates $(QUIET_SUBDIR1)
 
 please_set_SHELL_PATH_to_a_more_modern_shell:
@@ -1701,6 +1704,9 @@ install: all
 ifndef NO_PERL
 	$(MAKE) -C perl prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' install
 endif
+ifndef NO_PYTHON
+	$(MAKE) -C git_remote_cvs prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' install
+endif
 ifndef NO_TCLTK
 	$(MAKE) -C gitk-git install
 	$(MAKE) -C git-gui gitexecdir='$(gitexec_instdir_SQ)' install
@@ -1821,6 +1827,9 @@ ifndef NO_PERL
 	$(RM) gitweb/gitweb.cgi
 	$(MAKE) -C perl clean
 endif
+ifndef NO_PYTHON
+	$(MAKE) -C git_remote_cvs clean
+endif
 	$(MAKE) -C templates/ clean
 	$(MAKE) -C t/ clean
 ifndef NO_TCLTK
diff --git a/git_remote_cvs/.gitignore b/git_remote_cvs/.gitignore
new file mode 100644
index 0000000..2247d5f
--- /dev/null
+++ b/git_remote_cvs/.gitignore
@@ -0,0 +1,2 @@
+/build
+/dist
diff --git a/git_remote_cvs/Makefile b/git_remote_cvs/Makefile
new file mode 100644
index 0000000..061c247
--- /dev/null
+++ b/git_remote_cvs/Makefile
@@ -0,0 +1,35 @@
+#
+# Makefile for the git_remote_cvs python support modules
+#
+pysetupfile:=setup.py
+
+# Shell quote (do not use $(call) to accommodate ancient setups);
+DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
+
+ifndef PYTHON_PATH
+	PYTHON_PATH = /usr/bin/python
+endif
+ifndef prefix
+	prefix = $(HOME)
+endif
+ifndef V
+	QUIET = @
+	QUIETSETUP = --quiet
+endif
+
+PYLIBDIR=$(shell $(PYTHON_PATH) -c \
+	 "import sys; \
+	 print 'lib/python%i.%i/site-packages' % sys.version_info[:2]")
+
+all: $(pysetupfile)
+	$(QUIET)$(PYTHON_PATH) $(pysetupfile) $(QUIETSETUP) build
+
+install: $(pysetupfile)
+	$(PYTHON_PATH) $(pysetupfile) install --prefix $(DESTDIR_SQ)$(prefix)
+
+instlibdir: $(pysetupfile)
+	@echo "$(prefix)/$(PYLIBDIR)"
+
+clean:
+	$(QUIET)$(PYTHON_PATH) $(pysetupfile) $(QUIETSETUP) clean -a
+	$(RM) *.pyo *.pyc
diff --git a/git_remote_cvs/__init__.py b/git_remote_cvs/__init__.py
new file mode 100644
index 0000000..4956d2d
--- /dev/null
+++ b/git_remote_cvs/__init__.py
@@ -0,0 +1,27 @@
+#!/usr/bin/env python
+
+"""Support library package for git-remote-cvs.
+
+git-remote-cvs is a Git remote helper command that interfaces with a
+CVS repository to provide automatic import of CVS history into a Git
+repository.
+
+This package provides the support library needed by git-remote-cvs.
+The following modules are included:
+
+- cvs - Interaction with CVS repositories
+
+- cvs_symbol_cache - Local CVS symbol cache
+
+- changeset - Collect individual CVS revisions into commits
+
+- git - Interaction with Git repositories
+
+- commit_states - Map Git commits to CVS states
+
+- cvs_revision_map - Map CVS revisions to various metainformation
+
+- util - General utility functionality use by the other modules in
+         this package, and also used directly by git-remote-cvs.
+
+"""
diff --git a/git_remote_cvs/commit_states.py b/git_remote_cvs/commit_states.py
new file mode 100644
index 0000000..2e0af6c
--- /dev/null
+++ b/git_remote_cvs/commit_states.py
@@ -0,0 +1,62 @@
+#!/usr/bin/env python
+
+"""Code for relating Git commits to corresponding CVSState objects."""
+
+from git_remote_cvs.util import debug, error, die
+from git_remote_cvs.cvs import CVSState
+
+
+class CommitStates(object):
+
+    """Provide a mapping from Git commits to CVSState objects.
+
+    Behaves like a dictionary of Git commit names -> CVSState mappings.
+
+    Every Git commit converted from CVS has a corresponding CVSState,
+    which describes exactly which CVS revisions are present in a
+    checkout of that commit.
+
+    This class provides the interface to map from a Git commit to its
+    corresponding CVSState.  The mapping uses GitNotes as a persistent
+    storage backend.
+
+    """
+
+    def __init__ (self, notes):
+        """Create a new Git commit -> CVSState map."""
+        self.notes = notes
+        self._cache = {}  # commitname -> CVSState
+
+    def _load (self, commit):
+        """Retrieve the CVSState associated with the given Git commit."""
+        if commit is None:
+            return None
+        if commit in self._cache:
+            return self._cache[commit]
+        notedata = self.notes.get(commit)
+        if notedata is None:  # Given commit has no associated note
+            return None
+        state = CVSState()
+        state.load_data(notedata)
+        self._cache[commit] = state
+        return state
+
+    def add (self, commit, state, gfi):
+        """Add the given Git commit -> CVSState mapping."""
+        assert commit not in self._cache
+        self._cache[commit] = state
+        self.notes.import_note(commit, str(state), gfi)
+
+    def __getitem__ (self, commit):
+        """Return the CVSState associated with the given commit."""
+        state = self._load(commit)
+        if state is None:
+            raise KeyError("Unknown commit '%s'" % (commit))
+        return state
+
+    def get (self, commit, default = None):
+        """Return the CVSState associated with the given commit."""
+        state = self._load(commit)
+        if state is None:
+            return default
+        return state
diff --git a/git_remote_cvs/cvs_revision_map.py b/git_remote_cvs/cvs_revision_map.py
new file mode 100644
index 0000000..0e65ba6
--- /dev/null
+++ b/git_remote_cvs/cvs_revision_map.py
@@ -0,0 +1,418 @@
+#!/usr/bin/env python
+
+"""Functionality for mapping CVS revisions to associated metainfo.
+
+This modules provides the following main classes:
+
+CVSRevisionMap - provides a mapping from CVS revisions to the
+                 the following associated metainfo:
+                 - Mode/permission of the associated CVS path
+                 - The Git blob holding the revision data
+                 - The Git commits that correspond to the CVS
+                   states in which this CVS revision is present
+
+CVSStateMap - provides a mapping from CVS states to corresponding
+              Git commits (i.e. Git commits, whose tree state is
+              identical to a given CVS state)
+"""
+
+import os
+
+from git_remote_cvs.util import debug, error, die, file_reader_method
+from git_remote_cvs.cvs import CVSNum, CVSDate
+from git_remote_cvs.git import GitFICommit
+
+
+class _CVSPathInfo(object):
+
+    """Information on a single CVS path."""
+
+    __slots__ = ('revs', 'mode')
+
+    def __init__ (self, mode = None):
+        self.revs = {}
+        self.mode = mode
+
+
+class _CVSRevInfo(object):
+
+    """Information on a single CVS revision."""
+
+    __slots__ = ('blob', 'commits')
+
+    def __init__ (self, blob):
+        self.blob = blob
+        self.commits = []
+
+
+class CVSRevisionMap(object):
+
+    """Encapsulate the mapping of CVS revisions to associated metainfo.
+
+    This container maps CVS revisions (a combination of a CVS path and
+    a CVS revision number) into Git blob names, Git commit names, and
+    CVS path information.  Git object (blob/commit) names are either
+    40-char SHA1 strings, or "git fast-import" mark numbers if the form
+    ":<num>".
+
+    The data structure is organized as follows:
+    - A CVSRevisionMap instance knows about a set of CVS paths.
+      For each CVS path, the following is known:
+      - The mode (permission bits) of that CVS path (644 or 755)
+      - The CVS revision numbers that exist for that CVS path.
+        For each revision number, the following is known:
+        - Exactly 1 blob name; the Git blob containing the contents of
+          the revision (the contents of the CVS path at that CVS
+          revision).
+        - One or more commit names; the Git commits which encapsulate a
+          CVS state in which this CVS revision
+
+    To store this data structure persistently, this class uses a Git
+    ref that points to a tree structure containing the above
+    information.  When changes to the structure are made, this class
+    will produce git-fast-import commands to update that tree structure
+    accordingly.
+
+    IMPORTANT: No changes to the CVSRevisionMap will be stored unless
+    self.commit_map() is called with a valid GitFastImport instance.
+
+    NOTES: Mark numbers are only transient references bound to the
+    current "git fast-import" process (assumed to be running alongside
+    this process).  Therefore, when the "git fast-import" process ends,
+    it must write out a mark number -> SHA1 mapping (see the
+    "--export-marks" argument to "git fast-import").  Subsequently,
+    this mapping must be parsed, and the mark numbers in this
+    CVSRevisionMap must be resolved into persistent SHA1 names.
+    In order to quickly find all the unresolved mark number entries in
+    the data structure, and index mapping mark numbers to revisions is
+    kept in a separate file in the tree structure.
+    See the loadMarksFile() method for more details.
+
+    """
+
+    MarkNumIndexFile = "CVS/marks"  # Invalid CVS path name
+
+    def __init__ (self, git_ref, obj_fetcher):
+        """Create a new CVS revision map bound to the given Git ref."""
+        self.git_ref = git_ref
+        self.obj_fetcher = obj_fetcher
+
+        # The following data structure is a cache of the persistent
+        # data structure found at self.git_ref.
+        # It is structured as follows:
+        # - self.d is a mapping from CVS paths to _CVSPathInfo objects.
+        #   _CVSPathInfo object have two fields: revs, mode:
+        #   - mode is either 644 (non-executable) or 755 (executable).
+        #   - revs is a dict, mapping CVS revision numbers (CVSNum
+        #     instances) to _CVSRevInfo objects.  _CVSRevInfo objects
+        #     have two fields: blob, commits:
+        #     - blob is the name of the Git blob object holding the
+        #       contents of that revision.
+        #     - commits is a collection of zero or more Git commit
+        #       names where the commit contains this revision.
+        self.d = {}  # dict: path -> _CVSPathInfo
+        self.mods = set()  # (path, revnum) pairs for all modified revs
+        self.marknum_idx = {}  # dict: mark num -> [(path, revnum), ...]
+        self._load_marknum_idx()
+
+    def __del__ (self):
+        """Verify that self.commit_map() was called before destruction."""
+        if self.mods:
+            error("Missing call to self.commit_map().")
+            error("%i revision changes are lost!", len(self.mods))
+
+    def __nonzero__ (self):
+        """Return True iff any information is currently stored here."""
+        return bool(self.d)
+
+    # Private methods:
+
+    def _add_to_marknum_idx(self, marknum, path, revnum):
+        """Add the given marknum -> (path, revnum) association."""
+        entry = self.marknum_idx.setdefault(marknum, [])
+        entry.append((path, revnum))
+
+    def _load_marknum_idx(self):
+        """Load contents of MarkNumIndexFile into self.marknum_idx."""
+        blobspec = "%s:%s" % (self.git_ref, self.MarkNumIndexFile)
+        try:
+            f = self.obj_fetcher.open_obj(blobspec)
+        except KeyError:
+            return  # Missing object; nothing to do
+
+        for line in f:
+            # Format of line is "<marknum> <path>:<revnum>"
+            mark, rest = line.strip().split(' ', 1)
+            path, rev = rest.rsplit(':', 1)
+            self._add_to_marknum_idx(int(mark), path, CVSNum(rev))
+        f.close()
+
+    def _save_marknum_idx(self):
+        """Prepare data for storage into MarkNumIndexFile.
+
+        The returned string contains the current contents of
+        self.marknum_idx, formatted to be stored verbatim in
+        self.MarkNumIndexFile.
+
+        """
+        lines = []
+        for marknum, revs in sorted(self.marknum_idx.iteritems()):
+            for path, revnum in revs:
+                # Format of line is "<marknum> <path>:<revnum>"
+                line = "%i %s:%s\n" % (marknum, path, revnum)
+                lines.append(line)
+        return "".join(lines)
+
+    def _save_rev(self, path, revnum):
+        """Return blob data for storing the given revision persistently.
+
+        Generate the blob contents that will reconstitute the same
+        revision entry when read back in with _fetch_path().
+
+        """
+        lines = []
+        rev_info = self.d[path].revs[revnum]
+        lines.append("blob %s\n" % (rev_info.blob))
+        for commit in rev_info.commits:
+            lines.append("commit %s\n" % (commit))
+        return "".join(lines)
+
+    @staticmethod
+    def _valid_objname (objname):
+        """Return the argument as a SHA1 (string) or mark num (int)."""
+        # Blob names are either 40-char SHA1 strings, or mark numbers
+        if isinstance(objname, int) or len(objname) != 40:  # Mark number
+            return int(objname)
+        return objname
+
+    def _load (self, path, mode, data):
+        """GitObjectFetcher.walk_tree() callback."""
+        assert mode in (644, 755)
+        cvs_path, revnum = os.path.split(path)
+        revnum = CVSNum(revnum)
+        if cvs_path in self.d:
+            assert mode == self.d[cvs_path].mode
+        else:
+            self.d[cvs_path] = _CVSPathInfo(mode)
+        assert revnum not in self.d[cvs_path].revs
+        rev_info = None
+        for line in data.split("\n"):
+            if not line:
+                continue
+            t, objname = line.split()
+            objname = self._valid_objname(objname)
+            if t == "blob":
+                assert rev_info is None
+                rev_info = _CVSRevInfo(objname)
+            elif t == "commit":
+                assert rev_info is not None
+                rev_info.commits.append(objname)
+            else:
+                assert False, "Unknown type '%s'" % (t)
+        assert rev_info.commits  # Each rev is in at least one commit
+        self.d[cvs_path].revs[revnum] = rev_info
+
+    def _fetch_path (self, path):
+        """If the given path exists, create a path entry in self.d."""
+        tree_spec = "%s:%s" % (self.git_ref, path)
+        self.obj_fetcher.walk_tree(tree_spec, self._load, path)
+        # TODO: Don't load entire tree at once?
+
+    # Public methods:
+
+    def has_path (self, path):
+        """Return True iff the given path is present."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return path in self.d
+
+    def has_rev (self, path, revnum):
+        """Return True iff the given path:revnum is present."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return path in self.d and revnum in self.d[path].revs
+
+    def get_mode (self, path):
+        """Return mode bits for the given path."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return self.d[path].mode
+
+    def get_blob (self, path, revnum):
+        """Return the blob name for the given revision."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return self.d[path].revs[revnum].blob
+
+    def get_commits (self, path, revnum):
+        """Return the commit names containing the given revision."""
+        if path not in self.d:
+            self._fetch_path(path)
+        return self.d[path].revs[revnum].commits
+
+    def has_unresolved_marks (self):
+        """Return True iff there are mark numbers in the data structure."""
+        return self.marknum_idx
+
+    # Public non-const methods
+
+    def add_path (self, path, mode):
+        """Add the given path and associated mode bits to this map."""
+        if path not in self.d:
+            self._fetch_path(path)
+        if path in self.d:
+            if self.d[path].mode:
+                assert mode == self.d[path].mode, \
+                    "The mode of %s has changed from %s " \
+                    "to %s since the last import.  This " \
+                    "is not supported." % (
+                        path, self.d[path].mode, mode)
+            else:
+                self.d[path].mode = mode
+        else:
+            self.d[path] = _CVSPathInfo(mode)
+        # Do not add to self.mods yet, as we expect revisions to be
+        # added before commit_map() is called
+
+    def add_blob (self, path, revnum, blobname):
+        """Add the given path:revnum -> blobname association."""
+        assert blobname
+        if path not in self.d:
+            self._fetch_path(path)
+        blobname = self._valid_objname(blobname)
+        if isinstance(blobname, int):  # Mark number
+            self._add_to_marknum_idx(blobname, path, revnum)
+        entry = self.d.setdefault(path, _CVSPathInfo())
+        assert revnum not in entry.revs
+        entry.revs[revnum] = _CVSRevInfo(blobname)
+        self.mods.add((path, revnum))
+
+    def add_commit (self, path, revnum, commitname):
+        """Add the given path:revnum -> commitname association."""
+        if path not in self.d:
+            self._fetch_path(path)
+        commitname = self._valid_objname(commitname)
+        if isinstance(commitname, int):  # Mark number
+            self._add_to_marknum_idx(commitname, path, revnum)
+        assert revnum in self.d[path].revs
+        assert commitname not in self.d[path].revs[revnum].commits
+        self.d[path].revs[revnum].commits.append(commitname)
+        self.mods.add((path, revnum))
+
+    @file_reader_method(missing_ok = True)
+    def load_marks_file (self, f):
+        """Load mark -> SHA1 mappings from git-fast-import marks file.
+
+        Replace all mark numbers with proper SHA1 names in this data
+        structure (using self.marknum_idx to find them quickly).
+
+        """
+        if not f:
+            return 0
+        marks = {}
+        last_mark = 0
+        for line in f:
+            (mark, sha1) = line.strip().split()
+            assert mark.startswith(":")
+            mark = int(mark[1:])
+            assert mark not in marks
+            marks[mark] = sha1
+            if mark > last_mark:
+                last_mark = mark
+        for marknum, revs in self.marknum_idx.iteritems():
+            sha1 = marks[marknum]
+            for path, revnum in revs:
+                if path not in self.d:
+                    self._fetch_path(path)
+                rev_info = self.d[path].revs[revnum]
+                if rev_info.blob == marknum:  # Replace blobname
+                    rev_info.blob = sha1
+                else:  # Replace commitname
+                    assert marknum in rev_info.commits
+                    i = rev_info.commits.index(marknum)
+                    rev_info.commits[i] = sha1
+                    assert marknum not in rev_info.commits
+                self.mods.add((path, revnum))
+        self.marknum_idx = {}  # Resolved all transient mark numbers
+        return last_mark
+
+    def sync_modeinfo_from_cvs (self, cvs):
+        """Update with mode information from current CVS checkout.
+
+        This method will retrieve mode information on all paths in the
+        current CVS checkout, and update this data structure
+        correspondingly.  In the case where mode information is already
+        present for a given CVS path, this method will verify that such
+        information is correct.
+
+        """
+        for path, mode in cvs.get_modeinfo().iteritems():
+            self.add_path(path, mode)
+
+    def commit_map (self, gfi, author, message):
+        """Produce git-fast-import commands for storing changes.
+
+        The given GitFastImport object is used to produce a commit
+        making the changes done to this data structure persistent.
+
+        """
+        now = CVSDate("now")
+        commitdata = GitFICommit(
+            author[0], author[1], now.ts, now.tz_str(), message)
+
+        # Add updated MarkNumIndexFile to commit
+        mark = gfi.blob(self._save_marknum_idx())
+        commitdata.modify(644, mark, self.MarkNumIndexFile)
+
+        for path, revnum in self.mods:
+            mark = gfi.blob(self._save_rev(path, revnum))
+            mode = self.d[path].mode
+            assert mode in (644, 755)
+            commitdata.modify(mode, mark, "%s/%s" % (path, revnum))
+
+        gfi.commit(self.git_ref, commitdata)
+        self.mods = set()
+
+
+class CVSStateMap(object):
+
+    """Map CVSState object to the commit names which produces that state."""
+
+    def __init__ (self, cvs_rev_map):
+        """Create a CVSStateMap object, bound to the given CVS revision map."""
+        self.cvs_rev_map = cvs_rev_map
+
+    def get_commits (self, state):
+        """Map the given CVSState to commits that contain this state.
+
+        Return all commits where the given state is a subset of the
+        state produced by that commit.
+
+        Returns a set of commit names.  The set may be empty.
+
+        """
+        candidate_commits = None
+        for path, revnum in state:
+            commits = self.cvs_rev_map.get_commits(path, revnum)
+            if candidate_commits is None:
+                candidate_commits = set(commits)
+            else:
+                candidate_commits.intersection_update(commits)
+        return candidate_commits
+
+    def get_exact_commit (self, state, commit_map):
+        """Map the given CVSState to the commit with this exact state.
+
+        The given commit_map must be a CommitStates object.
+
+        Return the only commit (if any) that produces the exact given
+        CVSState.
+
+        Returns a commit name, or None if no matching commit is found.
+
+        """
+        commits = self.get_commits(state)
+        for c in commits:
+            if state == commit_map.get(c):
+                return c
+        return None
diff --git a/git_remote_cvs/git.py b/git_remote_cvs/git.py
new file mode 100644
index 0000000..cf037e3
--- /dev/null
+++ b/git_remote_cvs/git.py
@@ -0,0 +1,680 @@
+#!/usr/bin/env python
+
+"""Functionality for interacting with Git repositories.
+
+This module provides classes for interfacing with a Git repository.
+
+"""
+
+import os
+import re
+from binascii import hexlify
+from cStringIO import StringIO
+import unittest
+
+from git_remote_cvs.util import debug, error, die, start_command, run_command
+from git_remote_cvs.cvs import CVSDate
+
+
+def get_git_dir ():
+    """Return the path to the GIT_DIR for this repo."""
+    args = ("git", "rev-parse", "--git-dir")
+    exit_code, output, errors = run_command(args)
+    if exit_code:
+        die("Failed to retrieve git dir")
+    assert not errors
+    return output.strip()
+
+
+def parse_git_config ():
+    """Return a dict containing the parsed version of 'git config -l'."""
+    exit_code, output, errors = run_command(("git", "config", "-z", "-l"))
+    if exit_code:
+        die("Failed to retrieve git configuration")
+    assert not errors
+    return dict([e.split('\n', 1) for e in output.split("\0") if e])
+
+
+def git_config_bool (value):
+    """Convert the given git config string value to True or False.
+
+    Raise ValueError if the given string was not recognized as a
+    boolean value.
+
+    """
+    norm_value = str(value).strip().lower()
+    if norm_value in ("true", "1", "yes", "on", ""):
+        return True
+    if norm_value in ("false", "0", "no", "off", "none"):
+        return False
+    raise ValueError("Failed to parse '%s' into a boolean value" % (value))
+
+
+def valid_git_ref (ref_name):
+    """Return True iff the given ref name is a valid git ref name."""
+    # The following is a reimplementation of the git check-ref-format
+    # command.  The rules were derived from the git check-ref-format(1)
+    # manual page.  This code should be replaced by a call to
+    # check_ref_format() in the git library, when such is available.
+    if ref_name.endswith('/') or \
+       ref_name.startswith('.') or \
+       ref_name.count('/.') or \
+       ref_name.count('..') or \
+       ref_name.endswith('.lock'):
+        return False
+    for c in ref_name:
+        if ord(c) < 0x20 or ord(c) == 0x7f or c in " ~^:?*[":
+            return False
+    return True
+
+
+class GitObjectFetcher(object):
+
+    """Provide parsed access to 'git cat-file --batch'.
+
+    This provides a read-only interface to the Git object database.
+
+    """
+
+    def __init__ (self):
+        """Initiate a 'git cat-file --batch' session."""
+        self.queue = []  # List of object names to be submitted
+        self.in_transit = None  # Object name currently in transit
+
+        # 'git cat-file --batch' produces binary output which is likely
+        # to be corrupted by the default "rU"-mode pipe opened by
+        # start_command.  (Mode == "rU" does universal new-line
+        # conversion, which mangles carriage returns.) Therefore, we
+        # open an explicitly binary-safe pipe for transferring the
+        # output from 'git cat-file --batch'.
+        pipe_r_fd, pipe_w_fd = os.pipe()
+        pipe_r = os.fdopen(pipe_r_fd, "rb")
+        pipe_w = os.fdopen(pipe_w_fd, "wb")
+        self.proc = start_command(("git", "cat-file", "--batch"),
+                                  stdout = pipe_w)
+        self.f = pipe_r
+
+    def __del__ (self):
+        """Verify completed communication with 'git cat-file --batch'."""
+        assert not self.queue
+        assert self.in_transit is None
+        self.proc.stdin.close()
+        assert self.proc.wait() == 0  # Zero exit code
+        assert self.f.read() == ""  # No remaining output
+
+    def _submit_next_object (self):
+        """Submit queue items to the 'git cat-file --batch' process.
+
+        If there are items in the queue, and there is currently no item
+        currently in 'transit', then pop the first item off the queue,
+        and submit it.
+
+        """
+        if self.queue and self.in_transit is None:
+            self.in_transit = self.queue.pop(0)
+            print >> self.proc.stdin, self.in_transit[0]
+
+    def push (self, obj, callback):
+        """Push the given object name onto the queue.
+
+        The given callback function will at some point in the future
+        be called exactly once with the following arguments:
+        - self - this GitObjectFetcher instance
+        - obj  - the object name provided to push()
+        - sha1 - the SHA1 of the object, if 'None' obj is missing
+        - t    - the type of the object (tag/commit/tree/blob)
+        - size - the size of the object in bytes
+        - data - the object contents
+
+        """
+        self.queue.append((obj, callback))
+        self._submit_next_object()  # (Re)start queue processing
+
+    def process_next_entry (self):
+        """Read the next entry off the queue and invoke callback."""
+        obj, cb = self.in_transit
+        self.in_transit = None
+        header = self.f.readline()
+        if header == "%s missing\n" % (obj):
+            cb(self, obj, None, None, None, None)
+            return
+        sha1, t, size = header.split(" ")
+        assert len(sha1) == 40
+        assert t in ("tag", "commit", "tree", "blob")
+        assert size.endswith("\n")
+        size = int(size.strip())
+        data = self.f.read(size)
+        assert self.f.read(1) == "\n"
+        cb(self, obj, sha1, t, size, data)
+        self._submit_next_object()
+
+    def process (self):
+        """Process the current queue until empty."""
+        while self.in_transit is not None:
+            self.process_next_entry()
+
+    # High-level convenience methods:
+
+    def get_sha1 (self, objspec):
+        """Return the SHA1 of the object specified by 'objspec'.
+
+        Return None if 'objspec' does not specify an existing object.
+
+        """
+        class _ObjHandler(object):
+            """Helper class for getting the returned SHA1."""
+            def __init__ (self, parser):
+                self.parser = parser
+                self.sha1 = None
+
+            def __call__ (self, parser, obj, sha1, t, size, data):
+                # FIXME: Many unused arguments. Could this be cheaper?
+                assert parser == self.parser
+                self.sha1 = sha1
+
+        handler = _ObjHandler(self)
+        self.push(objspec, handler)
+        self.process()
+        return handler.sha1
+
+    def open_obj (self, objspec):
+        """Return a file object wrapping the contents of a named object.
+
+        The caller is responsible for calling .close() on the returned
+        file object.
+
+        Raise KeyError if 'objspec' does not exist in the repo.
+
+        """
+        class _ObjHandler(object):
+            """Helper class for parsing the returned git object."""
+            def __init__ (self, parser):
+                """Set up helper."""
+                self.parser = parser
+                self.contents = StringIO()
+                self.err = None
+
+            def __call__ (self, parser, obj, sha1, t, size, data):
+                """Git object callback (see GitObjectFetcher documentation)."""
+                assert parser == self.parser
+                if not sha1:  # Missing object
+                    self.err = "Missing object '%s'" % obj
+                else:
+                    assert size == len(data)
+                    self.contents.write(data)
+
+        handler = _ObjHandler(self)
+        self.push(objspec, handler)
+        self.process()
+        if handler.err:
+            raise KeyError(handler.err)
+        handler.contents.seek(0)
+        return handler.contents
+
+    def walk_tree (self, tree_objspec, callback, prefix = ""):
+        """Recursively walk the given Git tree object.
+
+        Recursively walk all subtrees of the given tree object, and
+        invoke the given callback passing three arguments:
+        (path, mode, data) with the path, permission bits, and contents
+        of all the blobs found in the entire tree structure.
+
+        """
+        class _ObjHandler(object):
+            """Helper class for walking a git tree structure."""
+            def __init__ (self, parser, cb, path, mode = None):
+                """Set up helper."""
+                self.parser = parser
+                self.cb = cb
+                self.path = path
+                self.mode = mode
+                self.err = None
+
+            def parse_tree (self, treedata):
+                """Parse tree object data, yield tree entries.
+
+                Each tree entry is a 3-tuple (mode, sha1, path)
+
+                self.path is prepended to all paths yielded
+                from this method.
+
+                """
+                while treedata:
+                    mode = int(treedata[:6], 10)
+                    # Turn 100xxx into xxx
+                    if mode > 100000:
+                        mode -= 100000
+                    assert treedata[6] == " "
+                    i = treedata.find("\0", 7)
+                    assert i > 0
+                    path = treedata[7:i]
+                    sha1 = hexlify(treedata[i + 1: i + 21])
+                    yield (mode, sha1, self.path + path)
+                    treedata = treedata[i + 21:]
+
+            def __call__ (self, parser, obj, sha1, t, size, data):
+                """Git object callback (see GitObjectFetcher documentation)."""
+                assert parser == self.parser
+                if not sha1:  # Missing object
+                    self.err = "Missing object '%s'" % (obj)
+                    return
+                assert size == len(data)
+                if t == "tree":
+                    if self.path:
+                        self.path += "/"
+                    # Recurse into all blobs and subtrees
+                    for m, s, p in self.parse_tree(data):
+                        parser.push(s,
+                                    self.__class__(self.parser, self.cb, p, m))
+                elif t == "blob":
+                    self.cb(self.path, self.mode, data)
+                else:
+                    raise ValueError("Unknown object type '%s'" % (t))
+
+        self.push(tree_objspec, _ObjHandler(self, callback, prefix))
+        self.process()
+
+
+class GitRefMap(object):
+
+    """Map Git ref names to the Git object names they currently point to.
+
+    Behaves like a dictionary of Git ref names -> Git object names.
+
+    """
+
+    def __init__ (self, obj_fetcher):
+        """Create a new Git ref -> object map."""
+        self.obj_fetcher = obj_fetcher
+        self._cache = {}  # dict: refname -> objname
+
+    def _load (self, ref):
+        """Retrieve the object currently bound to the given ref.
+
+        The name of the object pointed to by the given ref is stored
+        into this mapping, and also returned.
+
+        """
+        if ref not in self._cache:
+            self._cache[ref] = self.obj_fetcher.get_sha1(ref)
+        return self._cache[ref]
+
+    def __contains__ (self, refname):
+        """Return True if the given refname is present in this cache."""
+        return bool(self._load(refname))
+
+    def __getitem__ (self, refname):
+        """Return the git object name pointed to by the given refname."""
+        commit = self._load(refname)
+        if commit is None:
+            raise KeyError("Unknown ref '%s'" % (refname))
+        return commit
+
+    def get (self, refname, default = None):
+        """Return the git object name pointed to by the given refname."""
+        commit = self._load(refname)
+        if commit is None:
+            return default
+        return commit
+
+
+class GitFICommit(object):
+
+    """Encapsulate the data in a Git fast-import commit command."""
+
+    SHA1RE = re.compile(r'^[0-9a-f]{40}$')
+
+    @classmethod
+    def parse_mode (cls, mode):
+        """Verify the given git file mode, and return it as a string."""
+        assert mode in (644, 755, 100644, 100755, 120000)
+        return "%i" % (mode)
+
+    @classmethod
+    def parse_objname (cls, objname):
+        """Return the given object name (or mark number) as a string."""
+        if isinstance(objname, int):  # Object name is a mark number
+            assert objname > 0
+            return ":%i" % (objname)
+
+        # No existence check is done, only checks for valid format
+        assert cls.SHA1RE.match(objname)  # Object name is valid SHA1
+        return objname
+
+    @classmethod
+    def quote_path (cls, path):
+        """Return a quoted version of the given path."""
+        path = path.replace("\\", "\\\\")
+        path = path.replace("\n", "\\n")
+        path = path.replace('"', '\\"')
+        return '"%s"' % (path)
+
+    @classmethod
+    def parse_path (cls, path):
+        """Verify that the given path is valid, and quote it, if needed."""
+        assert not isinstance(path, int)  # Cannot be a mark number
+
+        # These checks verify the rules on the fast-import man page
+        assert not path.count("//")
+        assert not path.endswith("/")
+        assert not path.startswith("/")
+        assert not path.count("/./")
+        assert not path.count("/../")
+        assert not path.endswith("/.")
+        assert not path.endswith("/..")
+        assert not path.startswith("./")
+        assert not path.startswith("../")
+
+        if path.count('"') + path.count('\n') + path.count('\\'):
+            return cls.quote_path(path)
+        return path
+
+    def __init__ (self, name, email, timestamp, timezone, message):
+        """Create a new Git fast-import commit, with the given metadata."""
+        self.name = name
+        self.email = email
+        self.timestamp = timestamp
+        self.timezone = timezone
+        self.message = message
+        self.pathops = []  # List of path operations in this commit
+
+    def modify (self, mode, blobname, path):
+        """Add a file modification to this Git fast-import commit."""
+        self.pathops.append(("M",
+                             self.parse_mode(mode),
+                             self.parse_objname(blobname),
+                             self.parse_path(path)))
+
+    def delete (self, path):
+        """Add a file deletion to this Git fast-import commit."""
+        self.pathops.append(("D", self.parse_path(path)))
+
+    def copy (self, path, newpath):
+        """Add a file copy to this Git fast-import commit."""
+        self.pathops.append(("C",
+                             self.parse_path(path),
+                             self.parse_path(newpath)))
+
+    def rename (self, path, newpath):
+        """Add a file rename to this Git fast-import commit."""
+        self.pathops.append(("R",
+                             self.parse_path(path),
+                             self.parse_path(newpath)))
+
+    def note (self, blobname, commit):
+        """Add a note object to this Git fast-import commit."""
+        self.pathops.append(("N",
+                             self.parse_objname(blobname),
+                             self.parse_objname(commit)))
+
+    def deleteall (self):
+        """Delete all files in this Git fast-import commit."""
+        self.pathops.append("deleteall")
+
+
+class TestGitFICommit(unittest.TestCase):
+
+    """GitFICommit selftests."""
+
+    def test_basic (self):
+        """GitFICommit basic selftests."""
+
+        def expect_fail (method, data):
+            """Verify that the method(data) raises an AssertionError."""
+            try:
+                method(data)
+            except AssertionError:
+                return
+            raise AssertionError("Failed test for invalid data '%s(%s)'" %
+                                 (method.__name__, repr(data)))
+
+    def test_parse_mode (self):
+        """GitFICommit.parse_mode() selftests."""
+        self.assertEqual(GitFICommit.parse_mode(644), "644")
+        self.assertEqual(GitFICommit.parse_mode(755), "755")
+        self.assertEqual(GitFICommit.parse_mode(100644), "100644")
+        self.assertEqual(GitFICommit.parse_mode(100755), "100755")
+        self.assertEqual(GitFICommit.parse_mode(120000), "120000")
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, 0)
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, 123)
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, 600)
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, "644")
+        self.assertRaises(AssertionError, GitFICommit.parse_mode, "abc")
+
+    def test_parse_objname (self):
+        """GitFICommit.parse_objname() selftests."""
+        self.assertEqual(GitFICommit.parse_objname(1), ":1")
+        self.assertRaises(AssertionError, GitFICommit.parse_objname, 0)
+        self.assertRaises(AssertionError, GitFICommit.parse_objname, -1)
+        self.assertEqual(GitFICommit.parse_objname("0123456789" * 4),
+                         "0123456789" * 4)
+        self.assertEqual(GitFICommit.parse_objname("2468abcdef" * 4),
+                         "2468abcdef" * 4)
+        self.assertRaises(AssertionError, GitFICommit.parse_objname,
+                          "abcdefghij" * 4)
+
+    def test_parse_path (self):
+        """GitFICommit.parse_path() selftests."""
+        self.assertEqual(GitFICommit.parse_path("foo/bar"), "foo/bar")
+        self.assertEqual(GitFICommit.parse_path("path/with\n and \" in it"),
+                         '"path/with\\n and \\" in it"')
+        self.assertRaises(AssertionError, GitFICommit.parse_path, 1)
+        self.assertRaises(AssertionError, GitFICommit.parse_path, 0)
+        self.assertRaises(AssertionError, GitFICommit.parse_path, -1)
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo//bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/bar/")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "/foo/bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/./bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/../bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/bar/.")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "foo/bar/..")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "./foo/bar")
+        self.assertRaises(AssertionError, GitFICommit.parse_path, "../foo/bar")
+
+
+class GitFastImport(object):
+
+    """Encapsulate communication with git fast-import."""
+
+    def __init__ (self, f, obj_fetcher, last_mark = 0):
+        """Set up self to communicate with a fast-import process through f."""
+        self.f = f  # File object where fast-import stream is written
+        self.obj_fetcher = obj_fetcher  # GitObjectFetcher instance
+        self.next_mark = last_mark + 1  # Next mark number
+        self.refs = set()  # Keep track of the refnames we've seen
+
+    def comment (self, s):
+        """Write the given comment in the fast-import stream."""
+        assert "\n" not in s, "Malformed comment: '%s'" % (s)
+        self.f.write("# %s\n" % (s))
+
+    def commit (self, ref, commitdata):
+        """Make a commit on the given ref, with the given GitFICommit.
+
+        Return the mark number identifying this commit.
+
+        """
+        self.f.write("""\
+commit %(ref)s
+mark :%(mark)i
+committer %(name)s <%(email)s> %(timestamp)i %(timezone)s
+data %(msgLength)i
+%(msg)s
+""" % {
+    'ref': ref,
+    'mark': self.next_mark,
+    'name': commitdata.name,
+    'email': commitdata.email,
+    'timestamp': commitdata.timestamp,
+    'timezone': commitdata.timezone,
+    'msgLength': len(commitdata.message),
+    'msg': commitdata.message,
+})
+
+        if ref not in self.refs:
+            self.refs.add(ref)
+            parent = ref + "^0"
+            if self.obj_fetcher.get_sha1(parent):
+                self.f.write("from %s\n" % (parent))
+
+        for op in commitdata.pathops:
+            self.f.write(" ".join(op))
+            self.f.write("\n")
+        self.f.write("\n")
+        retval = self.next_mark
+        self.next_mark += 1
+        return retval
+
+    def blob (self, data):
+        """Import the given blob.
+
+        Return the mark number identifying this blob.
+
+        """
+        self.f.write("blob\nmark :%i\ndata %i\n%s\n" %
+                     (self.next_mark, len(data), data))
+        retval = self.next_mark
+        self.next_mark += 1
+        return retval
+
+    def reset (self, ref, objname):
+        """Reset the given ref to point at the given Git object."""
+        self.f.write("reset %s\nfrom %s\n\n" %
+                     (ref, GitFICommit.parse_objname(objname)))
+        if ref not in self.refs:
+            self.refs.add(ref)
+
+
+class GitNotes(object):
+
+    """Encapsulate access to Git notes.
+
+    Simulates a dictionary of object name (SHA1) -> Git note mappings.
+
+    """
+
+    def __init__ (self, notes_ref, obj_fetcher):
+        """Create a new Git notes interface, bound to the given notes ref."""
+        self.notes_ref = notes_ref
+        self.obj_fetcher = obj_fetcher  # Used to get objects from repo
+        self.imports = []  # list: (objname, note data blob name) tuples
+
+    def __del__ (self):
+        """Verify that self.commit_notes() was called before destruction."""
+        if self.imports:
+            error("Missing call to self.commit_notes().")
+            error("%i notes are not committed!", len(self.imports))
+
+    def _load (self, objname):
+        """Return the note data associated with the given git object.
+
+        The note data is returned in string form. If no note is found
+        for the given object, None is returned.
+
+        """
+        try:
+            f = self.obj_fetcher.open_obj("%s:%s" % (self.notes_ref, objname))
+            ret = f.read()
+            f.close()
+        except KeyError:
+            ret = None
+        return ret
+
+    def __getitem__ (self, objname):
+        """Return the note contents associated with the given object.
+
+        Raise KeyError if given object has no associated note.
+
+        """
+        blobdata = self._load(objname)
+        if blobdata is None:
+            raise KeyError("Object '%s' has no note" % (objname))
+        return blobdata
+
+    def get (self, objname, default = None):
+        """Return the note contents associated with the given object.
+
+        Return given default if given object has no associated note.
+
+        """
+        blobdata = self._load(objname)
+        if blobdata is None:
+            return default
+        return blobdata
+
+    def import_note (self, objname, data, gfi):
+        """Tell git fast-import to store data as a note for objname.
+
+        This method uses the given GitFastImport object to create a
+        blob containing the given note data.  Also an entry mapping the
+        given object name to the created blob is stored until
+        commit_notes() is called.
+
+        Note that this method only works if it is later followed by a
+        call to self.commit_notes() (which produces the note commit
+        that refers to the blob produced here).
+
+        """
+        if not data.endswith("\n"):
+            data += "\n"
+        gfi.comment("Importing note for object %s" % (objname))
+        mark = gfi.blob(data)
+        self.imports.append((objname, mark))
+
+    def commit_notes (self, gfi, author, message):
+        """Produce a git fast-import note commit for the imported notes.
+
+        This method uses the given GitFastImport object to create a
+        commit on the notes ref, introducing the notes previously
+        submitted to import_note().
+
+        """
+        if not self.imports:
+            return
+        now = CVSDate("now")
+        commitdata = GitFICommit(author[0], author[1],
+                                 now.ts, now.tz_str(), message)
+        for objname, blobname in self.imports:
+            assert isinstance(objname, int) and objname > 0
+            assert isinstance(blobname, int) and blobname > 0
+            commitdata.note(blobname, objname)
+        gfi.commit(self.notes_ref, commitdata)
+        self.imports = []
+
+
+class GitCachedNotes(GitNotes):
+
+    """Encapsulate access to Git notes (cached version).
+
+    Only use this class if no caching is done at a higher level.
+
+    Simulates a dictionary of object name (SHA1) -> Git note mappings.
+
+    """
+
+    def __init__ (self, notes_ref, obj_fetcher):
+        """Set up a caching wrapper around GitNotes."""
+        GitNotes.__init__(self, notes_ref, obj_fetcher)
+        self._cache = {}  # Cache: object name -> note data
+
+    def __del__ (self):
+        """Verify that GitNotes' destructor is called."""
+        GitNotes.__del__(self)
+
+    def _load (self, objname):
+        """Extend GitNotes._load() with a local objname -> note cache."""
+        if objname not in self._cache:
+            self._cache[objname] = GitNotes._load(self, objname)
+        return self._cache[objname]
+
+    def import_note (self, objname, data, gfi):
+        """Extend GitNotes.import_note() with a local objname -> note cache."""
+        if not data.endswith("\n"):
+            data += "\n"
+        assert objname not in self._cache
+        self._cache[objname] = data
+        GitNotes.import_note(self, objname, data, gfi)
+
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/git_remote_cvs/setup.py b/git_remote_cvs/setup.py
new file mode 100644
index 0000000..21f521a
--- /dev/null
+++ b/git_remote_cvs/setup.py
@@ -0,0 +1,17 @@
+#!/usr/bin/env python
+
+"""Distutils build/install script for the git_remote_cvs package."""
+
+from distutils.core import setup
+
+setup(
+    name = 'git_remote_cvs',
+    version = '0.1.0',
+    description = 'Git remote helper program for CVS repositories',
+    license = 'GPLv2',
+    author = 'The Git Community',
+    author_email = 'git@vger.kernel.org',
+    url = 'http://www.git-scm.com/',
+    package_dir = {'git_remote_cvs': ''},
+    packages = ['git_remote_cvs'],
+)
-- 
1.6.4.262.gca66a.dirty

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFCv4 4/5] git-remote-cvs: Remote helper program for CVS repositories
  2009-08-17  3:35 [RFCv4 0/5] CVS remote helper Johan Herland
                   ` (2 preceding siblings ...)
  2009-08-17  3:35 ` [RFCv4 3/5] 2/2: " Johan Herland
@ 2009-08-17  3:36 ` Johan Herland
  2009-08-17  3:36 ` [RFCv4 5/5] Add simple selftests of git-remote-cvs functionality Johan Herland
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Johan Herland @ 2009-08-17  3:36 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, Johan Herland, barkalow, davvid, gitster

Implements the import of objects from a local or remote CVS repository.

This helper program uses the "git_remote_cvs" Python package introduced
earlier, and provides a working draft implementation of the remote helper
API, as described in Documentation/git-remote-helpers.txt. Further details
about this specific helper are described in the new
Documentation/git-remote-cvs.txt.

This patch has been improved by the following contributions:
- Daniel Barkalow: Updates reflecting changes in remote helper API
- David Aguilar: Lots of Python coding style fixes

Cc: Daniel Barkalow <barkalow@iabervon.org>
Cc: David Aguilar <davvid@gmail.com>
Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-remote-cvs.txt |   85 +++++
 Makefile                         |   24 ++
 git-remote-cvs.py                |  683 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 792 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-remote-cvs.txt
 create mode 100755 git-remote-cvs.py

diff --git a/Documentation/git-remote-cvs.txt b/Documentation/git-remote-cvs.txt
new file mode 100644
index 0000000..309ee7f
--- /dev/null
+++ b/Documentation/git-remote-cvs.txt
@@ -0,0 +1,85 @@
+git-remote-cvs(1)
+=================
+
+NAME
+----
+git-remote-cvs - Helper program for interoperation with CVS repositories
+
+SYNOPSIS
+--------
+'git remote-cvs' <remote>
+
+DESCRIPTION
+-----------
+
+Please see the linkgit:git-remote-helpers[1] documentation for general
+information about remote helper programs.
+
+CONFIGURATION
+-------------
+
+remote.*.cvsRoot::
+	The URL of the CVS repository (as found in a `CVSROOT` variable, or
+	in a `CVS/Root` file).
+	Example: "`:pserver:user@server/var/cvs/cvsroot`".
+
+remote.*.cvsModule::
+	The path of the CVS module (as found in a `CVS/Repository` file)
+	within the CVS repository specified in `remote.*.cvsRoot`.
+	Example: "`foo/bar`"
+
+remote.*.cachedSymbolsOnly::
+	When 'true', a cache of CVS symbols is used instead of querying the
+	CVS server for all existing symbols (potentially expensive). In this
+	mode, git-remote-cvs will not discover new CVS symbols unless you add
+	them explicitly with the "`addsymbol <symbol>`" command (on
+	git-remote-cvs's stdin), or request an explicit symbol cache update
+	from the CVS server with the "`syncsymbols`" command (on
+	git-remote-cvs's stdin). When 'false' (the default), the CVS server
+	will be queried whenever a list of CVS symbols is required.
+
+remote.*.usernameMap::
+	The path (absolute, or relative to the repository (NOT the worktree))
+	to the file that contains the mapping from CVS usernames to the
+	corresponding full names and email addresses, as used by Git in the
+	Author and Committer fields of commit objects. When this config
+	variable is set, CVS usernames will be resolved against this file.
+	If no match is found in the file, or if this config variable is unset,
+	or if the variable points to a non-existing file, the original CVS
+	username will be used as the Author/Committer name, and the
+	corresponding email address will be set to "`<username>@example.com`".
++
+The format of the usernameMap file is one entry per line, where each line is
+of the form "`username: Full Name <email@address>`".
+Example: `johndoe: John Doe <johndoe@example.com>`
+Blank lines and lines starting with '#' are ignored.
+
+COMMANDS
+--------
+
+In addition to the commands that constitute the git-remote-helpers API, the
+following extra commands are supported for managing the local symbol cache when
+the `remote.*.cachedSymbolsOnly` config variable is true. The following
+commands can be given on the standard input of git-remote-cvs:
+
+'addsymbol'::
+	Takes one CVS symbol name as argument. The given CVS symbol is
+	fetched from the CVS server and stored into the local CVS symbol
+	cache. If `remote.*.cachedSymbolsOnly` is enabled, this can be used
+	to introduce a new CVS symbol to the CVS helper application.
+
+'syncsymbols'::
+	All CVS symbols that are available from the given remote are
+	fetched from the CVS server and stored into the local CVS symbol
+	cache. This is equivalent to disabling `remote.{asterisk}.cachedSymbolsOnly`,
+	running the "list" command, and then finally re-enabling the
+	`remote.*.cachedSymbolsOnly` config variable. I.e. this command can
+	be used to manually synchronize the CVS symbols available to the
+	CVS helper application.
+
+'verify'::
+	Takes one CVS symbol name as argument. Verifies that the CVS symbol
+	has been successfully imported be checking out the CVS symbol from
+	the CVS server, and comparing the CVS working tree against the Git
+	tree object identified by `refs/cvs/<remote>/<symbol>`. This can be
+	used to verify the correctness of a preceding 'import' command.
diff --git a/Makefile b/Makefile
index bb5cea2..b2af678 100644
--- a/Makefile
+++ b/Makefile
@@ -350,6 +350,8 @@ SCRIPT_PERL += git-relink.perl
 SCRIPT_PERL += git-send-email.perl
 SCRIPT_PERL += git-svn.perl
 
+SCRIPT_PYTHON += git-remote-cvs.py
+
 SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) \
 	  $(patsubst %.perl,%,$(SCRIPT_PERL)) \
 	  $(patsubst %.py,%,$(SCRIPT_PYTHON)) \
@@ -1474,6 +1476,28 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)) git-instaweb: % : unimplemented.sh
 	mv $@+ $@
 endif # NO_PERL
 
+ifndef NO_PYTHON
+$(patsubst %.py,%,$(SCRIPT_PYTHON)): % : %.py
+	$(QUIET_GEN)$(RM) $@ $@+ && \
+	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s --no-print-directory instlibdir` && \
+	sed -e '1{' \
+	    -e '	s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
+	    -e '}' \
+	    -e 's|^import sys.*|&; sys.path.insert(0, "@@INSTLIBDIR@@")|' \
+	    -e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
+	    $@.py >$@+ && \
+	chmod +x $@+ && \
+	mv $@+ $@
+else # NO_PYTHON
+$(patsubst %.py,%,$(SCRIPT_PYTHON)): % : unimplemented.sh
+	$(QUIET_GEN)$(RM) $@ $@+ && \
+	sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
+	    -e 's|@@REASON@@|NO_PYTHON=$(NO_PYTHON)|g' \
+	    unimplemented.sh >$@+ && \
+	chmod +x $@+ && \
+	mv $@+ $@
+endif # NO_PYTHON
+
 configure: configure.ac
 	$(QUIET_GEN)$(RM) $@ $<+ && \
 	sed -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
diff --git a/git-remote-cvs.py b/git-remote-cvs.py
new file mode 100755
index 0000000..94b61e7
--- /dev/null
+++ b/git-remote-cvs.py
@@ -0,0 +1,683 @@
+#!/usr/bin/env python
+
+"""Usage: git-remote-cvs <remote> [<url>]
+
+Git remote helper command for interacting with CVS repositories
+
+See git-remote-helpers documentation for details on external interface,
+usage, etc.  See git-remote-cvs documentation for specific
+configuration details of this remote helper.
+
+"""
+
+# PRINCIPLES:
+# -----------
+# - Importing same symbol twice (with no CVS changes in between) should
+#   yield the exact same Git state (and the second import should have
+#   no commits).
+# - Importing several CVS symbols pointing to the same state should
+#   yield corresponding refs pointing to the _same_ commit in Git.
+# - Importing a CVS symbol which has received only "regular commits"
+#   since last import should yield a fast-forward straight line of
+#   commits.
+
+# TODO / KNOWN PROBLEMS:
+# ----------------------
+# - Remove cachedSymbolsOnly config variable for now?
+# - Author map handling; map CVS usernames to Git full name + email
+# - Handle files that have been created AND deleted since last import
+# - How to handle CVS tags vs. branches.  Turn CVS tags into Git tags?
+# - Better CVS branch handling: When a branch as a super/subset of
+#   files/revs compared to another branch, find a way to base one branch
+#   on the other instead of creating parallel lines of development with
+#   roughly the same commits.
+# - Profiling, optimizations...
+
+import sys
+import os
+
+from git_remote_cvs.util import (debug, error, die, ProgressIndicator,
+                                 run_command)
+from git_remote_cvs.cvs import CVSState, CVSWorkDir, fetch_revs
+from git_remote_cvs.git import (get_git_dir, parse_git_config, git_config_bool,
+                                valid_git_ref, GitObjectFetcher, GitRefMap,
+                                GitFICommit, GitFastImport, GitNotes)
+from git_remote_cvs.cvs_symbol_cache import CVSSymbolCache
+from git_remote_cvs.commit_states import CommitStates
+from git_remote_cvs.cvs_revision_map import CVSRevisionMap, CVSStateMap
+from git_remote_cvs.changeset import build_changesets_from_revs
+
+
+class Config(object):
+
+    """Encapsulation of configuration variables."""
+
+    # Author name/email tuple for commits created by this tool
+    Author = ("git remote-cvs", "invalid@example.com")
+    # Git remote name
+    Remote = None
+    # CVS symbols are imported into this refs namespace/directory
+    RefSpace = None
+    # Git notes ref, the refname pointing to our git notes
+    NotesRef = None
+    # CVS repository identifier, a 2-tuple (cvs_root, cvs_module),
+    # where cvs_root is the CVS server/repository URL (as found in
+    # $CVSROOT, or in a CVS/Root file), and cvs_module is the path
+    # to a CVS module relative to the CVS repository (as found in a
+    # CVS/Repository file)
+    CVSRepo = (None, None)
+    # Path to the git-remote-cvs cache/work directory
+    # (normally "info/cvs/$remote" within $GIT_DIR)
+    WorkDir = None
+    # If False, the list of CVS symbols will always be retrieved from
+    # the CVS server using 'cvs rlog'.  If True, only the cached
+    # symbols within the "symbols" subdirectory of WorkDir are
+    # consulted.
+    CachedSymbolsOnly = False
+
+    @classmethod
+    def init (cls, remote):
+        """Fetch and setup configuration for the given remote."""
+        git_config = parse_git_config()
+        assert git_config["remote.%s.vcs" % (remote)] == "cvs"
+
+        cls.Author = (git_config["user.name"], git_config["user.email"])
+        cls.Remote = remote
+        cls.RefSpace = "refs/cvs/%s/" % (remote)
+        cls.NotesRef = "refs/notes/cvs/%s" % (remote)
+        cls.CVSRepo = (git_config["remote.%s.cvsroot" % (remote)],
+                       git_config["remote.%s.cvsmodule" % (remote)])
+        cls.WorkDir = os.path.join(get_git_dir(), "info/cvs", remote)
+        cls.CachedSymbolsOnly = git_config_bool(git_config.get(
+            "remote.%s.cachedsymbolsonly" % (remote), "false"))
+
+
+def work_path (*args):
+    """Return the given path appended to git-remote-cvs's cache/work dir."""
+    return os.path.join(Config.WorkDir, *args)
+
+
+def cvs_to_refname (cvsname):
+    """Return the git ref name for the given CVS symbolic name."""
+    if cvsname.startswith(Config.RefSpace):  # Already converted
+        return cvsname
+    return Config.RefSpace + cvsname
+
+
+def ref_to_cvsname (refname):
+    """Return the CVS symbolic name for the given git ref name."""
+    if refname.startswith(Config.RefSpace):
+        return refname[len(Config.RefSpace):]
+    return refname
+
+
+def valid_cvs_symbol (symbol):
+    """Return True iff the given CVS symbol can be imported into Git."""
+    return valid_git_ref(cvs_to_refname(symbol))
+
+
+def die_usage (msg, *args):
+    """Abort program with a helpful usage message."""
+    # Use this file's docstring as a usage string
+    print >> sys.stderr, __doc__
+    die(msg, *args)
+
+
+def import_cvs_revs (symbol, prev_state, cur_state, progress):
+    """Import the CVS revisions needed to satisfy the given CVS symbol.
+
+    This method will determine the CVS revisions involved in moving
+    from the given prev_state to the given cur_state.  This includes
+    looking at revision metadata in CVS, and importing needed blobs
+    from CVS.
+
+    The revision metadata is returned as a 2-level dict of CVSRev
+    objects: mapping path -> revnum -> CVSRev object.
+
+    """
+    # Calculate the revisions involved in moving from prev_state to
+    # cur_state, and fetch CVSRev objects for these revisions.
+    progress.pushprefix("Importing CVS revisions: ")
+    paths = set(prev_state.paths()).union(cur_state.paths())
+    num_fetched_revs = 0  # Number of CVSRev objects involved
+    num_imported_blobs = 0  # Number of blobs actually imported
+    cvs_revs = {}  # path -> revnum -> CVSRev
+    for i, path in enumerate(sorted(paths)):
+        progress.pushprefix("(%i/%i) %s: " % (i + 1, len(paths), path))
+        progress("")
+        prev_rev = prev_state.get(path)
+        cur_rev = cur_state.get(path)
+        if prev_rev and cur_rev and prev_rev == cur_rev:
+            # No changes since last import
+            progress.popprefix()
+            continue
+
+        # Fetch CVSRev objects for range [path:prev_rev, path:symbol]
+        path_revs = fetch_revs(path, prev_rev, cur_rev, symbol, Config.CVSRepo)
+        if not path_revs:
+            # Failed to find revs between prev_rev and symbol
+            if cur_rev:
+                assert not cur_rev.follows(prev_rev)
+                # The CVS symbol has been moved/reset since the
+                # last import in such a way that we cannot
+                # deduce the history between the last import
+                # and the current import.
+                # FIXME: Can we can work around this?
+                die("CVS symbol %s has been moved/reset from %s:%s to %s:%s "
+                    "since the last import.  This is not supported",
+                    symbol, path, prev_rev, path, cur_rev)
+            else:
+                # CVS symbol has been removed from this path.
+                # We cannot conclusively determine the history
+                # of this path following prev_rev.
+                # FIXME: Can we can work around this?
+                die("CVS symbol %s has been removed from %s since the last "
+                    "import.  This is not supported",
+                    symbol, path)
+
+        # OK.  We've got the revs in range [prev_rev, symbol]
+        # Verify/determine cur_rev
+        real_cur_rev = max(path_revs.keys())
+        if cur_rev:
+            assert cur_rev == real_cur_rev
+        else: cur_rev = real_cur_rev
+        # No need to re-import prev_rev if already imported
+        if prev_rev:
+            assert cur_rev.follows(prev_rev)
+            assert prev_rev in path_revs
+            del path_revs[prev_rev]
+        assert path_revs  # There should be more revs than just prev_rev
+
+        # Sanity checks:
+        # All revs from prev_rev to cur_rev are about to be imported
+        check_rev = cur_rev
+        while check_rev and check_rev != prev_rev:
+            assert check_rev in path_revs
+            check_rev = check_rev.parent()
+        # All previous revs have already been imported
+        check_rev = prev_rev
+        while check_rev:
+            assert Globals.CVSRevisionMap.has_rev(path, check_rev)
+            check_rev = check_rev.parent()
+
+        # Import CVS revisions as Git blobs
+        j = 0
+        for num, rev in sorted(path_revs.iteritems(), reverse = True):
+            j += 1
+            progress("(%i/%i) %s" % (j, len(path_revs), num))
+            assert num == rev.num
+            # Skip if already imported
+            if Globals.CVSRevisionMap.has_rev(rev.path, rev.num):
+                continue
+            # ...or if rev is a deletion
+            elif rev.deleted:
+                continue
+            # Import blob for reals
+            data = Globals.CVSWorkDir.get_revision_data(rev.path, rev.num)
+            Globals.GitFastImport.comment("Importing CVS revision %s:%s" %
+                                          (rev.path, rev.num))
+            mark = Globals.GitFastImport.blob(data)
+            Globals.CVSRevisionMap.add_blob(rev.path, rev.num, mark)
+            num_imported_blobs += 1
+
+        # Add path_revs to the overall structure of revs to be imported
+        assert path not in cvs_revs
+        cvs_revs[path] = path_revs
+        num_fetched_revs += len(path_revs)
+        progress.popprefix()
+    progress.popprefix()
+    progress("Imported %i blobs (reused %i existing blobs)" %
+             (num_imported_blobs, num_fetched_revs - num_imported_blobs), True)
+
+    return cvs_revs
+
+
+def advance_state (state, changeset):
+    """Advance the given state by applying the given changeset."""
+    # Verify that the given changeset "fits" on top of the given state
+    for rev in changeset:
+        prev_num = rev.num.parent()
+        state_num = state.get(rev.path)
+        if prev_num is None and state_num is None:
+            # rev is the first revision of this path being added
+            state.add(rev.path, rev.num)
+        elif prev_num and state_num and prev_num == state_num:
+            if rev.deleted:  # rev deletes path from state
+                state.remove(rev.path, prev_num)
+            else:  # rev follows state's revision of this path
+                state.replace(rev.path, rev.num)
+        else:
+            error("Cannot apply changeset with %s:%s on top of CVS state "
+                  "with %s:%s.",
+                  rev.path, changeset[rev.path].num,
+                  rev.path, state.get(rev.path))
+            error("    changeset: %s", changeset)
+            error("    CVS state: \n---\n%s---", state)
+            die("Failed to apply changeset.  Aborting.")
+
+
+def revert_state (state, changeset):
+    """Revert the given state to before the given changeset is applied.
+
+    This is the reverse of the above advance_state() function.
+
+    """
+    for rev in changeset:
+        prev_num = rev.num.parent()
+        state_num = state.get(rev.path)
+        if state_num is None:  # Revert deletion of file
+            assert rev.deleted
+            state.add(rev.path, prev_num)
+        else:
+            assert state_num == rev.num
+            if prev_num is None:  # Revert addition of file
+                state.remove(rev.path, rev.num)
+            else:  # Regular revert to previous version
+                state.replace(rev.path, prev_num)
+
+
+def import_changesets (ref, changesets, from_state, to_state, progress):
+    """Apply the given list of Changeset objects to the given ref.
+
+    Also verify that the changesets bring us from the given from_state
+    to the given to_state.
+
+    """
+    state = from_state
+    for i, c in enumerate(changesets):
+        advance_state(state, c)
+        progress("(%i/%i) Committing %s" % (i + 1, len(changesets), c))
+        # Make a git commit from changeset c
+        commitdata = GitFICommit(
+            c.author, c.author + "@example.com",  # TODO: author_map handling
+            c.date.ts, c.date.tz_str(),
+            "".join(["%s\n" % (line) for line in c.message]))
+
+        for rev in c:
+            p, n = rev.path, rev.num
+            if rev.deleted:
+                commitdata.delete(p)
+                continue
+            blobname = Globals.CVSRevisionMap.get_blob(p, n)
+            mode = Globals.CVSRevisionMap.get_mode(p)
+            if mode is None:  # Must retrieve mode from CVS checkout
+                debug("Retrieving mode info for '%s'" % (p))
+                Globals.CVSWorkDir.update(n, [p])
+                mode = Globals.CVSWorkDir.get_modeinfo([p])[p]
+                Globals.CVSRevisionMap.add_path(p, mode)
+            commitdata.modify(mode, blobname, p)
+
+        commitname = Globals.GitFastImport.commit(ref, commitdata)
+        Globals.CommitStates.add(commitname, state, Globals.GitFastImport)
+        for path, revnum in state:
+            Globals.CVSRevisionMap.add_commit(path, revnum, commitname)
+        assert commitname in Globals.CVSStateMap.get_commits(state)
+    assert state == to_state
+    return len(changesets)
+
+
+def import_cvs_symbol (cvs_symbol, progress):
+    """Import the given CVS symbol from CVS to Git.
+
+    Return False if nothing was imported, True otherwise.
+
+    """
+    git_ref = cvs_to_refname(cvs_symbol)
+    progress.pushprefix("%s: " % (cvs_symbol))
+
+    # Verify that we are asked to import valid git ref names
+    if not valid_git_ref(git_ref):
+        progress("Invalid git ref '%s'.  Skipping." % (git_ref), True)
+        progress.popprefix()
+        return False
+
+    # Retrieve previously imported CVS state
+    progress("Loading previously imported state of %s..." % (git_ref))
+    prev_commit = Globals.GitRefMap.get(git_ref)
+    prev_state = Globals.CommitStates.get(prev_commit, CVSState())
+
+    # Retrieve current CVS state of symbol
+    # Also: At some point we will need mode information for all CVS
+    # paths (stored in CVSRevisionMap).  This information can be added
+    # for each path on demand (using CVSWorkDir.get_modeinfo()), but
+    # doing so may be an expensive process.  It is much cheaper to load
+    # mode information for as many paths as possible in a _single_
+    # operation.  We do this below, by calling
+    # CVSRevisionMap.sync_modeinfo_from_cvs() in appropriate places.
+    if Config.CachedSymbolsOnly:
+        progress("Synchronizing local CVS symbol cache for symbol...")
+        # The symbol cache is likely not up-to-date.  Synchronize the
+        # given CVS symbol explicitly, to make sure we get the version
+        # current with the CVS server.
+        Globals.CVSSymbolCache.sync_symbol(cvs_symbol, Globals.CVSWorkDir,
+                                           progress)
+        # The above method updates the CVS workdir to the current CVS
+        # version.  Hence, now is a convenient time to preload mode
+        # info from the currently checked-out CVS files.  There may be
+        # more files for which we'll need mode information, but we'll
+        # deal with those when needed.
+        progress("Updating path mode info from current CVS checkout.")
+        Globals.CVSRevisionMap.sync_modeinfo_from_cvs(Globals.CVSWorkDir)
+    elif not Globals.CVSRevisionMap:  # No info for any paths, yet
+        # Pure optimization: We didn't get to preload all the mode info
+        # above.  Normally, the only alternative is load mode info for
+        # each path on-demand.  However, if our CVSRevisionMap is
+        # currently empty, that's probably going to be very expensive.
+        # Therefore, in this case, do an explicit CVS update here, and
+        # preload mode info for all paths.
+        progress("Updating CVS checkout to sync path mode info.")
+        Globals.CVSWorkDir.update(cvs_symbol)
+        Globals.CVSRevisionMap.sync_modeinfo_from_cvs(Globals.CVSWorkDir)
+
+    progress("Loading current CVS state...")
+    try:
+        cur_state = Globals.CVSSymbolCache[cvs_symbol]
+    except KeyError:
+        progress("Couldn't find symbol '%s'.  Skipping." % (cvs_symbol), True)
+        progress.popprefix()
+        return False
+
+    # Optimization: Check if the previous import of this symbol is
+    # still up-to-date.  If so, there's nothing more to be done.
+    progress("Checking if we're already up-to-date...")
+    if cur_state == prev_state:
+        progress("Already up-to-date.  Skipping.", True)
+        progress.popprefix()
+        return False
+
+    progress("Fetching CVS revisions...")
+    cvs_revs = import_cvs_revs(cvs_symbol, prev_state, cur_state, progress)
+    progress("Organizing revisions into chronological list of changesets...")
+    changesets = build_changesets_from_revs(cvs_revs)
+
+    # When importing a new branch, try to optimize branch start point,
+    # instead of importing entire branch from scratch
+    if prev_commit is None:
+        progress("Finding startpoint for new symbol...")
+        i = len(changesets)
+        state = cur_state.copy()
+        for c in reversed(changesets):
+            commit = Globals.CVSStateMap.get_exact_commit(state,
+                                                          Globals.CommitStates)
+            if commit is not None:
+                # We have found a commit that exactly matches the state
+                # after commit #i (changesets[i - 1])
+                Globals.GitFastImport.reset(git_ref, commit)
+                changesets = changesets[i:]
+                break
+            revert_state(state, c)
+            i -= 1
+    num_changesets = len(changesets)
+    num_applied = 0
+
+    # Apply changesets, bringing git_ref from prev_state to cur_state
+    if num_changesets:
+        progress("Importing changesets...")
+        num_applied = import_changesets(git_ref, changesets, prev_state,
+                                        cur_state, progress)
+    progress("Imported %i changesets (reused %i existing changesets)" %
+             (num_applied, num_changesets - num_applied), True)
+    progress.popprefix()
+    return True
+
+
+def do_import (*args):
+    """Do the 'import' command; import refs from a remote."""
+    if not args:
+        die_usage("'import' takes at least one parameter: ref...")
+
+    progress = ProgressIndicator("    ", sys.stderr)
+    cvs_symbols = map(ref_to_cvsname, args)
+    empty_import = True
+    for symbol in cvs_symbols:
+        if import_cvs_symbol(symbol, progress):
+            empty_import = False
+    if empty_import:
+        progress.finish("Everything up-to-date", True)
+        return 0
+    progress.finish("Finished importing %i CVS symbols to Git" %
+                    (len(cvs_symbols)), True)
+    return 0
+
+
+def do_list (*args):
+    """Do the 'list' command; list refs available from a CVS remote."""
+    if args:
+        die_usage("'list' takes no parameters")
+
+    progress = ProgressIndicator("    ", sys.stderr)
+    if Config.CachedSymbolsOnly:
+        progress("Listing symbols in local symbol cache...", True)
+        for symbol in sorted(Globals.CVSSymbolCache):
+            print cvs_to_refname(symbol)
+        progress.finish()
+        print  # Terminate output with a blank line
+        return 0
+
+    # Synchronize local symbol cache with CVS server
+    progress("Synchronizing local symbol cache with CVS server...")
+    Globals.CVSSymbolCache.sync_all_symbols(Config.CVSRepo, progress,
+                                            valid_cvs_symbol)
+    # Load current states of Git refs
+    progress("Loading current state of Git refs...")
+    changed, unchanged = 0, 0
+    for cvs_symbol, cvs_state in sorted(Globals.CVSSymbolCache.items()):
+        git_ref = cvs_to_refname(cvs_symbol)
+        progress("\tChecking if Git ref is up-to-date: %s" % (git_ref))
+        git_commit = Globals.GitRefMap.get(git_ref)
+        git_state = Globals.CommitStates.get(git_commit)
+        attrs = ""
+        if git_state and git_state == cvs_state:
+            attrs = " unchanged"
+            unchanged += 1
+        else:
+            git_commit = "?"
+            changed += 1
+        print "%s %s%s" % (git_commit, git_ref, attrs)
+    progress.finish("Found %i CVS symbols (%i changed, %i unchanged)" %
+                    (changed + unchanged, changed, unchanged))
+    print  # Terminate with a blank line
+    return 0
+
+
+def do_capabilities (*args):
+    """Do the 'capabilities' command; report supported features."""
+    if args:
+        die_usage("'capabilities' takes no parameters")
+    print "import"
+    print "marks %s" % (work_path("marks"))
+#   print "export"
+    print  # Terminate with a blank line
+    return 0
+
+
+def do_addsymbol (*args):
+    """Do the 'addsymbol' command; add given CVS symbol to local cache."""
+    if len(args) != 1:
+        die_usage("'addsymbol' takes one parameter: symbol")
+    symbol = args[0]
+    progress = ProgressIndicator("    ", sys.stderr)
+    if valid_cvs_symbol(symbol):
+        Globals.CVSSymbolCache.sync_symbol(symbol, Globals.CVSWorkDir,
+                                           progress)
+        progress.finish("Added '%s' to CVS symbol cache" % (symbol), True)
+    else:
+        error("Skipping CVS symbol '%s'; it is not a valid git ref", symbol)
+    print  # Terminate with a blank line
+    return 0
+
+
+def do_syncsymbols (*args):
+    """Do the 'syncsymbols' command; sync all symbols with CVS server."""
+    if args:
+        die_usage("'syncsymbols' takes no parameters")
+    progress = ProgressIndicator("    ", sys.stderr)
+    Globals.CVSSymbolCache.sync_all_symbols(Config.CVSRepo, progress,
+                                            valid_cvs_symbol)
+    progress.finish()
+    print  # Terminate with a blank line
+    return 0
+
+
+def do_verify (*args):
+    """Do the 'verify' command; Compare CVS checkout and Git tree."""
+    if len(args) != 1:
+        die_usage("'verify' takes one parameter: symbol")
+    symbol = args[0]
+    gitref = cvs_to_refname(symbol)
+    assert valid_git_ref(gitref)
+
+    progress = ProgressIndicator("    ", sys.stderr)
+    progress("Checking out '%s' from CVS..." % (symbol))
+    Globals.CVSWorkDir.update(symbol)
+
+    add_env = {"GIT_INDEX_FILE": os.path.abspath(work_path("temp_index"))}
+    progress("Creating Git index from tree object @ '%s'..." % (gitref))
+    cmd = ("git", "read-tree", gitref)
+    assert run_command(cmd, add_env = add_env)[0] == 0
+
+    progress("Comparing CVS checkout to Git index...", True)
+    cmd = ("git", "--work-tree=%s" % (os.path.abspath(work_path("cvs"))),
+           "ls-files", "--exclude=CVS", "--deleted", "--modified", "--others",
+           "-t")
+    exit_code, output, errors = run_command(cmd, add_env = add_env)
+    assert exit_code == 0 and not errors
+
+    if output:
+        progress.finish("Failed verification of '%s'" % (symbol), True)
+        error("The '%s' command returned:\n---\n%s---", " ".join(cmd), output)
+    else:
+        progress.finish("Successfully verified '%s'" % (symbol), True)
+    print  # Terminate with a blank line
+    return exit_code
+
+
+def not_implemented (*args):
+    """Abort, while informing user that this command is not yet implemented."""
+    die_usage("Command not implemented")
+
+
+COMMANDS = {
+    "capabilities": do_capabilities,
+    "list": do_list,
+    # Special handling of 'import' in main()
+    # "import": do_import,
+    "export": not_implemented,
+    # Custom commands
+    "addsymbol": do_addsymbol,
+    "syncsymbols": do_syncsymbols,
+    "verify": do_verify,
+}
+
+
+class Globals(object):
+
+    """Global variables are placed here at the start of main()."""
+
+    pass
+
+
+def main (*args):
+    """Main program logic; execution starts here."""
+    debug("Invoked '%s'", " ".join(args))
+
+    # Initialization of subsystems
+    assert len(args) >= 2
+    # Read config for the given remote
+    Config.init(args[1])
+    # Local CVS symbol cache (CVS symbol -> CVS state mapping)
+    Globals.CVSSymbolCache = CVSSymbolCache(work_path("symbols"))
+    # Local CVS checkout
+    Globals.CVSWorkDir = CVSWorkDir(work_path("cvs"), Config.CVSRepo)
+    # Interface to 'git cat-file --batch'
+    Globals.GitObjectFetcher = GitObjectFetcher()
+    # Interface to Git object notes
+    Globals.GitNotes = GitNotes(Config.NotesRef, Globals.GitObjectFetcher)
+    # Mapping from Git commit objects to CVS states
+    Globals.CommitStates = CommitStates(Globals.GitNotes)
+    # Mapping from Git ref names to Git object names
+    Globals.GitRefMap = GitRefMap(Globals.GitObjectFetcher)
+    # Mapping from CVS revision to Git blob and commit objects
+    Globals.CVSRevisionMap = CVSRevisionMap(cvs_to_refname("_metadata"),
+                                            Globals.GitObjectFetcher)
+    last_mark = 0
+    if Globals.CVSRevisionMap.has_unresolved_marks():
+        # Update with marks from last import
+        last_mark = Globals.CVSRevisionMap.load_marks_file(work_path("marks"))
+    else:
+        # Truncate marks file.  We cannot automatically do this after
+        # .load_marks_file() above, since we cannot yet guarantee that
+        # we will be able to save the revision map persistently.  (That
+        # can only happen if we are given one or more import commands
+        # below.) We can only truncate this file when we know there are
+        # no unresolved marks in the revision map.
+        open(work_path("marks"), "w").close()
+    # Mapping from CVS states to commit objects that contain said state
+    Globals.CVSStateMap = CVSStateMap(Globals.CVSRevisionMap)
+
+    # Main program loop
+    import_refs = []  # Accumulate import commands here
+    # Cannot use "for line in sys.stdin" for buffering (?) reasons
+    line = sys.stdin.readline()
+    while (line):
+        cmdline = line.strip().split()
+        if not cmdline:
+            break  # Blank line means we're about to quit
+        debug("Got command '%s'", " ".join(cmdline))
+        cmd = cmdline.pop(0)
+        if cmd == "import":
+            import_refs.extend(cmdline)
+        else:
+            if cmd not in COMMANDS:
+                die_usage("Unknown command '%s'", cmd)
+            if COMMANDS[cmd](*cmdline):
+                die("Command '%s' failed", line.strip())
+            sys.stdout.flush()
+        line = sys.stdin.readline()
+
+    # Trigger import processing after last import command
+    ret = 0
+    if import_refs:
+        # Init producer of output in the git-fast-import format
+        Globals.GitFastImport = GitFastImport(
+            sys.stdout, Globals.GitObjectFetcher, last_mark)
+
+        # Perform import of given refs
+        ret = do_import(*import_refs)
+
+        # Notes on persistent storage of subsystems' data structures:
+        #
+        # Because the "import" command has been called, we here _know_
+        # that there is a fast-import process running in parallel.
+        # (This is NOT the case when there are no "import" commands).
+        # We can therefore now (and only now) safely commit the extra
+        # information that we store in the Git repo.
+        # In other words, the data structures that we commit to
+        # persistent storage with the following calls will NOT be
+        # committed if there are no "import" commands.  The data
+        # structures must handle this in one of two ways:
+        # - In the no-"import" scenario, there is simply nothing to
+        #   commit, so it can safely be skipped.
+        # - Any information that should have been committed in the
+        #   no-"import" scenario can be reconstructed repeatedly in
+        #   subsequent executions of this program, until the next
+        #   invocation of an "import" command provides an opportunity
+        #   to commit the data structure to persistent storage.
+
+        # Write out commit notes (mapping git commits to CVSStates)
+        # The following call would be a no-op in the no-"import" case
+        Globals.GitNotes.commit_notes(Globals.GitFastImport, Config.Author,
+            'Annotate commits imported by "git remote-cvs"\n')
+
+        # Save CVS revision metadata
+        # This data structure can handle the no-"import" case as long
+        # as the marks file from the last fast-import run is still
+        # present upon the next execution of this program.
+        Globals.CVSRevisionMap.commit_map(Globals.GitFastImport, Config.Author,
+            'Updated metadata used by "git remote-cvs"\n')
+
+    return ret
+
+
+if __name__ == '__main__':
+    sys.exit(main(*sys.argv))
-- 
1.6.4.262.gca66a.dirty

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFCv4 5/5] Add simple selftests of git-remote-cvs functionality
  2009-08-17  3:35 [RFCv4 0/5] CVS remote helper Johan Herland
                   ` (3 preceding siblings ...)
  2009-08-17  3:36 ` [RFCv4 4/5] git-remote-cvs: Remote helper program for CVS repositories Johan Herland
@ 2009-08-17  3:36 ` Johan Herland
  2009-08-17 16:27 ` [RFCv4 6/5] Fix the Makefile-generated path to the git_remote_cvs package in git-remote-cvs Johan Herland
  2009-08-18  0:41 ` [RFCv4 7/5] More fixes to the git-remote-cvs installation procedure Johan Herland
  6 siblings, 0 replies; 12+ messages in thread
From: Johan Herland @ 2009-08-17  3:36 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, Johan Herland, barkalow, davvid, gitster

Add two new selftests:

- t9800-remote-cvs-basic: Test the git-remote-cvs implementation of the
  remote helper API, by verifying the expected output of the git-remote-cvs
  program when invoking remote helper API commands while doing some simple
  CVS operations.

- t9801-remote-cvs-fetch: A more high-level test of the fetch/import-side
  of the git-remote-cvs implementation, by verifying the expected repository
  state after doing "git fetch" from a CVS remote where a variety of CVS
  operations are being performed.

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t9800-remote-cvs-basic.sh |  524 +++++++++++++++++++++++++++++++++++++++++++
 t/t9801-remote-cvs-fetch.sh |  291 ++++++++++++++++++++++++
 2 files changed, 815 insertions(+), 0 deletions(-)
 create mode 100755 t/t9800-remote-cvs-basic.sh
 create mode 100755 t/t9801-remote-cvs-fetch.sh

diff --git a/t/t9800-remote-cvs-basic.sh b/t/t9800-remote-cvs-basic.sh
new file mode 100755
index 0000000..6ddec17
--- /dev/null
+++ b/t/t9800-remote-cvs-basic.sh
@@ -0,0 +1,524 @@
+#!/bin/sh
+
+test_description='git remote-cvs basic tests'
+. ./test-lib.sh
+
+if ! test_have_prereq PYTHON; then
+	say 'skipping CVS foreign-vcs helper tests, python not available'
+	test_done
+fi
+
+CVS_EXEC=cvs
+CVS_OPTS="-f -q"
+CVS="$CVS_EXEC $CVS_OPTS"
+
+CVSROOT=$(pwd)/cvsroot
+export CVSROOT
+unset CVS_SERVER
+
+CVSMODULE=cvsmodule
+GITREMOTE=cvsremote
+
+if ! type $CVS_EXEC >/dev/null 2>&1
+then
+	say 'skipping remote-cvs tests, $CVS_EXEC not found'
+	test_done
+fi
+
+test_expect_success 'setup cvsroot' '$CVS init'
+
+test_expect_success '#1: setup a cvs module' '
+
+	mkdir "$CVSROOT/$CVSMODULE" &&
+	$CVS co -d module-cvs $CVSMODULE &&
+	(
+		cd module-cvs &&
+		cat <<EOF >o_fortuna &&
+O Fortuna
+velut luna
+statu variabilis,
+
+semper crescis
+aut decrescis;
+vita detestabilis
+
+nunc obdurat
+et tunc curat
+ludo mentis aciem,
+
+egestatem,
+potestatem
+dissolvit ut glaciem.
+EOF
+		$CVS add o_fortuna &&
+		cat <<EOF >message &&
+add "O Fortuna" lyrics
+
+These public domain lyrics make an excellent sample text.
+EOF
+		$CVS commit -f -F message o_fortuna
+	)
+'
+
+test_expect_success 'set up CVS repo as a foreign remote' '
+
+	git config "user.name" "Test User"
+	git config "user.email" "test@example.com"
+	git config "remote.$GITREMOTE.vcs" cvs
+	git config "remote.$GITREMOTE.cvsRoot" "$CVSROOT"
+	git config "remote.$GITREMOTE.cvsModule" "$CVSMODULE"
+	git config "remote.$GITREMOTE.fetch" \
+		"+refs/cvs/$GITREMOTE/*:refs/remotes/$GITREMOTE/*"
+
+'
+
+test_expect_success '#1: git-remote-cvs "capabilities" command' '
+
+	echo "capabilities" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+import
+marks .git/info/cvs/$GITREMOTE/marks
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#1: git-remote-cvs "list" command' '
+
+	echo "list" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+? refs/cvs/$GITREMOTE/HEAD
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#1: git-remote-cvs "import" command' '
+
+	echo "import refs/cvs/$GITREMOTE/HEAD" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+# Importing CVS revision o_fortuna:1.1
+blob
+mark :1
+data 180
+O Fortuna
+velut luna
+statu variabilis,
+
+semper crescis
+aut decrescis;
+vita detestabilis
+
+nunc obdurat
+et tunc curat
+ludo mentis aciem,
+
+egestatem,
+potestatem
+dissolvit ut glaciem.
+
+commit refs/cvs/$GITREMOTE/HEAD
+mark :2
+data 82
+add "O Fortuna" lyrics
+
+These public domain lyrics make an excellent sample text.
+
+M 644 :1 o_fortuna
+
+# Importing note for object 2
+blob
+mark :3
+data 14
+o_fortuna:1.1
+
+commit refs/notes/cvs/$GITREMOTE
+mark :4
+data 46
+Annotate commits imported by "git remote-cvs"
+
+N :3 :2
+
+blob
+mark :5
+data 32
+1 o_fortuna:1.1
+2 o_fortuna:1.1
+
+blob
+mark :6
+data 16
+blob 1
+commit 2
+
+commit refs/cvs/$GITREMOTE/_metadata
+mark :7
+data 42
+Updated metadata used by "git remote-cvs"
+
+M 644 :5 CVS/marks
+M 644 :6 o_fortuna/1.1
+
+EOF
+	grep -v "^committer " actual > actual.filtered &&
+	test_cmp expect actual.filtered
+
+'
+
+test_expect_success '#1: Passing git-remote-cvs output to git-fast-import' '
+
+	git fast-import --quiet \
+		--export-marks=".git/info/cvs/$GITREMOTE/marks" \
+		< actual &&
+	git gc
+
+'
+
+test_expect_success '#1: Verifying correctness of import' '
+
+	echo "verify HEAD" | git remote-cvs "$GITREMOTE"
+
+'
+
+test_expect_success '#2: update cvs module' '
+
+	(
+		cd module-cvs &&
+		cat <<EOF >o_fortuna &&
+O Fortune,
+like the moon
+you are changeable,
+
+ever waxing
+and waning;
+hateful life
+
+first oppresses
+and then soothes
+as fancy takes it;
+
+poverty
+and power
+it melts them like ice.
+EOF
+		cat <<EOF >message &&
+translate to English
+
+My Latin is terrible.
+EOF
+		$CVS commit -f -F message o_fortuna
+	)
+'
+
+test_expect_success '#2: git-remote-cvs "capabilities" command' '
+
+	echo "capabilities" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+import
+marks .git/info/cvs/$GITREMOTE/marks
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#2: git-remote-cvs "list" command' '
+
+	echo "list" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+? refs/cvs/$GITREMOTE/HEAD
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#2: git-remote-cvs "import" command' '
+
+	echo "import refs/cvs/$GITREMOTE/HEAD" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+# Importing CVS revision o_fortuna:1.2
+blob
+mark :8
+data 179
+O Fortune,
+like the moon
+you are changeable,
+
+ever waxing
+and waning;
+hateful life
+
+first oppresses
+and then soothes
+as fancy takes it;
+
+poverty
+and power
+it melts them like ice.
+
+commit refs/cvs/$GITREMOTE/HEAD
+mark :9
+data 44
+translate to English
+
+My Latin is terrible.
+
+from refs/cvs/$GITREMOTE/HEAD^0
+M 644 :8 o_fortuna
+
+# Importing note for object 9
+blob
+mark :10
+data 14
+o_fortuna:1.2
+
+commit refs/notes/cvs/$GITREMOTE
+mark :11
+data 46
+Annotate commits imported by "git remote-cvs"
+
+from refs/notes/cvs/$GITREMOTE^0
+N :10 :9
+
+blob
+mark :12
+data 32
+8 o_fortuna:1.2
+9 o_fortuna:1.2
+
+blob
+mark :13
+data 94
+
+blob
+mark :14
+data 16
+blob 8
+commit 9
+
+commit refs/cvs/$GITREMOTE/_metadata
+mark :15
+data 42
+Updated metadata used by "git remote-cvs"
+
+from refs/cvs/$GITREMOTE/_metadata^0
+M 644 :12 CVS/marks
+M 644 :13 o_fortuna/1.1
+M 644 :14 o_fortuna/1.2
+
+EOF
+	grep -v -e "^committer " -e "\b[0-9a-f]\{40\}\b" actual > actual.filtered &&
+	test_cmp expect actual.filtered
+
+'
+
+test_expect_success '#2: Passing git-remote-cvs output to git-fast-import' '
+
+	git fast-import --quiet \
+		--import-marks=".git/info/cvs/$GITREMOTE/marks" \
+		--export-marks=".git/info/cvs/$GITREMOTE/marks" \
+		< actual &&
+	git gc
+
+'
+
+test_expect_success '#2: Verifying correctness of import' '
+
+	echo "verify HEAD" | git remote-cvs "$GITREMOTE"
+
+'
+
+test_expect_success '#3: update cvs module' '
+
+	(
+		cd module-cvs &&
+		echo 1 >tick &&
+		$CVS add tick &&
+		$CVS commit -f -m 1 tick
+	)
+
+'
+
+test_expect_success '#3: git-remote-cvs "capabilities" command' '
+
+	echo "capabilities" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+import
+marks .git/info/cvs/$GITREMOTE/marks
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#3: git-remote-cvs "list" command' '
+
+	echo "list" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+? refs/cvs/$GITREMOTE/HEAD
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#3: git-remote-cvs "import" command' '
+
+	echo "import refs/cvs/$GITREMOTE/HEAD" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+# Importing CVS revision tick:1.1
+blob
+mark :16
+data 2
+1
+
+commit refs/cvs/$GITREMOTE/HEAD
+mark :17
+data 2
+1
+
+from refs/cvs/$GITREMOTE/HEAD^0
+M 644 :16 tick
+
+# Importing note for object 17
+blob
+mark :18
+data 23
+o_fortuna:1.2
+tick:1.1
+
+commit refs/notes/cvs/$GITREMOTE
+mark :19
+data 46
+Annotate commits imported by "git remote-cvs"
+
+from refs/notes/cvs/$GITREMOTE^0
+N :18 :17
+
+blob
+mark :20
+data 41
+16 tick:1.1
+17 tick:1.1
+17 o_fortuna:1.2
+
+blob
+mark :21
+data 104
+commit 17
+
+blob
+mark :22
+data 18
+blob 16
+commit 17
+
+commit refs/cvs/$GITREMOTE/_metadata
+mark :23
+data 42
+Updated metadata used by "git remote-cvs"
+
+from refs/cvs/$GITREMOTE/_metadata^0
+M 644 :20 CVS/marks
+M 644 :21 o_fortuna/1.2
+M 644 :22 tick/1.1
+
+EOF
+	grep -v -e "^committer " -e "\b[0-9a-f]\{40\}\b" actual > actual.filtered &&
+	test_cmp expect actual.filtered
+
+'
+
+test_expect_success '#3: Passing git-remote-cvs output to git-fast-import' '
+
+	git fast-import --quiet \
+		--import-marks=".git/info/cvs/$GITREMOTE/marks" \
+		--export-marks=".git/info/cvs/$GITREMOTE/marks" \
+		< actual &&
+	git gc
+
+'
+
+test_expect_success '#3: Verifying correctness of import' '
+
+	echo "verify HEAD" | git remote-cvs "$GITREMOTE"
+
+'
+
+test_expect_success '#4: git-remote-cvs "capabilities" command' '
+
+	echo "capabilities" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+import
+marks .git/info/cvs/$GITREMOTE/marks
+
+EOF
+	test_cmp expect actual
+
+'
+
+commit=$(git rev-parse "refs/cvs/$GITREMOTE/HEAD")
+
+test_expect_success '#4: git-remote-cvs "list" command' '
+
+	echo "list" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+$commit refs/cvs/$GITREMOTE/HEAD unchanged
+
+EOF
+	test_cmp expect actual
+
+'
+
+test_expect_success '#4: git-remote-cvs "import" command' '
+
+	echo "import refs/cvs/$GITREMOTE/HEAD" | git remote-cvs "$GITREMOTE" > actual &&
+	cat <<EOF >expect &&
+blob
+mark :24
+data 0
+
+blob
+mark :25
+data 142
+
+blob
+mark :26
+data 94
+
+commit refs/cvs/$GITREMOTE/_metadata
+mark :27
+data 42
+Updated metadata used by "git remote-cvs"
+
+from refs/cvs/$GITREMOTE/_metadata^0
+M 644 :24 CVS/marks
+M 644 :25 o_fortuna/1.2
+M 644 :26 tick/1.1
+
+EOF
+	grep -v -e "^committer " -e "\b[0-9a-f]\{40\}\b" actual > actual.filtered &&
+	test_cmp expect actual.filtered
+
+'
+
+test_expect_success '#4: Passing git-remote-cvs output to git-fast-import' '
+
+	git fast-import --quiet \
+		--import-marks=".git/info/cvs/$GITREMOTE/marks" \
+		--export-marks=".git/info/cvs/$GITREMOTE/marks" \
+		< actual &&
+	git gc
+
+'
+
+test_expect_success '#4: Verifying correctness of import' '
+
+	echo "verify HEAD" | git remote-cvs "$GITREMOTE"
+
+'
+
+test_done
diff --git a/t/t9801-remote-cvs-fetch.sh b/t/t9801-remote-cvs-fetch.sh
new file mode 100755
index 0000000..93d44a7
--- /dev/null
+++ b/t/t9801-remote-cvs-fetch.sh
@@ -0,0 +1,291 @@
+#!/bin/sh
+
+test_description='git remote-cvs basic tests'
+. ./test-lib.sh
+
+if ! test_have_prereq PYTHON; then
+	say 'skipping CVS foreign-vcs helper tests, python not available'
+	test_done
+fi
+
+CVS_EXEC=cvs
+CVS_OPTS="-f -q"
+CVS="$CVS_EXEC $CVS_OPTS"
+
+CVSROOT=$(pwd)/cvsroot
+export CVSROOT
+unset CVS_SERVER
+
+CVSMODULE=cvsmodule
+GITREMOTE=cvsremote
+
+if ! type $CVS_EXEC >/dev/null 2>&1
+then
+	say 'skipping remote-cvs tests, $CVS_EXEC not found'
+	test_done
+fi
+
+verify () {
+	git log --reverse --format="--- %T%n%s%n%n%b" "$GITREMOTE/$1" >actual &&
+	test_cmp "expect.$1" actual &&
+	echo "verify $1" | git remote-cvs "$GITREMOTE"
+}
+
+test_expect_success 'setup CVS repo' '$CVS init'
+
+test_expect_success 'create CVS module with initial commit' '
+
+	mkdir "$CVSROOT/$CVSMODULE" &&
+	$CVS co -d module-cvs $CVSMODULE &&
+	(
+		cd module-cvs &&
+		cat <<EOF >o_fortuna &&
+O Fortuna
+velut luna
+statu variabilis,
+
+semper crescis
+aut decrescis;
+vita detestabilis
+
+nunc obdurat
+et tunc curat
+ludo mentis aciem,
+
+egestatem,
+potestatem
+dissolvit ut glaciem.
+EOF
+		$CVS add o_fortuna &&
+		cat <<EOF >message &&
+add "O Fortuna" lyrics
+
+These public domain lyrics make an excellent sample text.
+EOF
+		$CVS commit -f -F message o_fortuna
+	)
+'
+
+test_expect_success 'set up CVS repo/module as a foreign remote' '
+
+	git config "user.name" "Test User"
+	git config "user.email" "test@example.com"
+	git config "remote.$GITREMOTE.vcs" cvs
+	git config "remote.$GITREMOTE.cvsRoot" "$CVSROOT"
+	git config "remote.$GITREMOTE.cvsModule" "$CVSMODULE"
+	git config "remote.$GITREMOTE.fetch" \
+		"+refs/cvs/$GITREMOTE/*:refs/remotes/$GITREMOTE/*"
+
+'
+
+test_expect_success 'initial fetch from CVS remote' '
+
+	cat <<EOF >expect.HEAD &&
+--- 0e06d780dedab23e683c686fb041daa9a84c936c
+add "O Fortuna" lyrics
+
+These public domain lyrics make an excellent sample text.
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'CVS commit' '
+
+	(
+		cd module-cvs &&
+		cat <<EOF >o_fortuna &&
+O Fortune,
+like the moon
+you are changeable,
+
+ever waxing
+and waning;
+hateful life
+
+first oppresses
+and then soothes
+as fancy takes it;
+
+poverty
+and power
+it melts them like ice.
+EOF
+		cat <<EOF >message &&
+translate to English
+
+My Latin is terrible.
+EOF
+		$CVS commit -f -F message o_fortuna
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- daa87269a5e00388135ad9542dc16ab6754466e5
+translate to English
+
+My Latin is terrible.
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'CVS commit with new file' '
+
+	(
+		cd module-cvs &&
+		echo 1 >tick &&
+		$CVS add tick &&
+		$CVS commit -f -m 1 tick
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- 486935b4fccecea9b64cbed3a797ebbcbe2b7461
+1
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'fetch without CVS changes' '
+
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'add 2 CVS commits' '
+
+	(
+		cd module-cvs &&
+		echo 2 >tick &&
+		$CVS commit -f -m 2 tick &&
+		echo 3 >tick &&
+		$CVS commit -f -m 3 tick
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- 83437ab3e57bf0a42915de5310e3419792b5a36f
+2
+
+
+--- 60fc50406a82dc6bd32dc6e5f7bd23e4c3cdf7ef
+3
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'CVS commit with removed file' '
+
+	(
+		cd module-cvs &&
+		$CVS remove -f tick &&
+		$CVS commit -f -m "remove file" tick
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- daa87269a5e00388135ad9542dc16ab6754466e5
+remove file
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'CVS commit with several new files' '
+
+	(
+		cd module-cvs &&
+		echo spam >spam &&
+		echo sausage >sausage &&
+		echo eggs >eggs &&
+		$CVS add spam sausage eggs &&
+		$CVS commit -f -m "spam, sausage, and eggs" spam sausage eggs
+	) &&
+	cat <<EOF >>expect.HEAD &&
+--- 3190dfce44a6d5e9916b4870dbf8f37d1ca4ddaf
+spam, sausage, and eggs
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD
+
+'
+
+test_expect_success 'new CVS branch' '
+
+	(
+		cd module-cvs &&
+		$CVS tag -b foo
+	) &&
+	cp expect.HEAD expect.foo &&
+	git fetch "$GITREMOTE" &&
+	verify HEAD &&
+	verify foo
+
+'
+
+test_expect_success 'CVS commit on branch' '
+
+	(
+		cd module-cvs &&
+		$CVS up -r foo &&
+		echo "spam spam spam" >spam &&
+		$CVS commit -f -m "commit on branch foo" spam
+	) &&
+	cat <<EOF >>expect.foo &&
+--- 1aba123e5c83898ce3a8b976cc6064d60246aef4
+commit on branch foo
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD &&
+	verify foo
+
+'
+
+test_expect_success 'create CVS tag' '
+
+	(
+		cd module-cvs &&
+		$CVS tag bar
+	) &&
+	cp expect.foo expect.bar &&
+	git fetch "$GITREMOTE" &&
+	verify HEAD &&
+	verify foo &&
+	verify bar
+
+'
+
+test_expect_success 'another CVS commit on branch' '
+
+	(
+		cd module-cvs &&
+		echo "spam spam spam spam spam spam" >> spam &&
+		$CVS commit -f -m "another commit on branch foo" spam
+	) &&
+	cat <<EOF >>expect.foo &&
+--- 15a2635e76e8e5a5a8746021643de317452f2340
+another commit on branch foo
+
+
+EOF
+	git fetch "$GITREMOTE" &&
+	verify HEAD &&
+	verify foo &&
+	verify bar
+
+'
+
+test_done
-- 
1.6.4.262.gca66a.dirty

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFCv4 6/5] Fix the Makefile-generated path to the git_remote_cvs package in git-remote-cvs
  2009-08-17  3:35 [RFCv4 0/5] CVS remote helper Johan Herland
                   ` (4 preceding siblings ...)
  2009-08-17  3:36 ` [RFCv4 5/5] Add simple selftests of git-remote-cvs functionality Johan Herland
@ 2009-08-17 16:27 ` Johan Herland
  2009-08-18  0:41 ` [RFCv4 7/5] More fixes to the git-remote-cvs installation procedure Johan Herland
  6 siblings, 0 replies; 12+ messages in thread
From: Johan Herland @ 2009-08-17 16:27 UTC (permalink / raw)
  To: git; +Cc: Johannes.Schindelin, barkalow, davvid, gitster

Junio discovered that the installed git-remote-cvs executable is unable
to find the installed git_remote_cvs Python package. This is due to the
'instlibdir' target of git_remote_cvs/Makefile being invoked with
insufficient arguments (prefix and DESTDIR). This patch fixes that.

Signed-off-by: Johan Herland <johan@herland.net>
---

Sorry for the inconvenience.

...Johan


 Makefile                |    2 +-
 git_remote_cvs/Makefile |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index b2af678..b9a7f25 100644
--- a/Makefile
+++ b/Makefile
@@ -1479,7 +1479,7 @@ endif # NO_PERL
 ifndef NO_PYTHON
 $(patsubst %.py,%,$(SCRIPT_PYTHON)): % : %.py
 	$(QUIET_GEN)$(RM) $@ $@+ && \
-	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s --no-print-directory instlibdir` && \
+	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s --no-print-directory prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' instlibdir` && \
 	sed -e '1{' \
 	    -e '	s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
 	    -e '}' \
diff --git a/git_remote_cvs/Makefile b/git_remote_cvs/Makefile
index 061c247..0d9eb31 100644
--- a/git_remote_cvs/Makefile
+++ b/git_remote_cvs/Makefile
@@ -28,7 +28,7 @@ install: $(pysetupfile)
 	$(PYTHON_PATH) $(pysetupfile) install --prefix $(DESTDIR_SQ)$(prefix)
 
 instlibdir: $(pysetupfile)
-	@echo "$(prefix)/$(PYLIBDIR)"
+	@echo "$(DESTDIR_SQ)$(prefix)/$(PYLIBDIR)"
 
 clean:
 	$(QUIET)$(PYTHON_PATH) $(pysetupfile) $(QUIETSETUP) clean -a
-- 
1.6.4.313.g38b9.dirty


-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFCv4 7/5] More fixes to the git-remote-cvs installation procedure
  2009-08-17  3:35 [RFCv4 0/5] CVS remote helper Johan Herland
                   ` (5 preceding siblings ...)
  2009-08-17 16:27 ` [RFCv4 6/5] Fix the Makefile-generated path to the git_remote_cvs package in git-remote-cvs Johan Herland
@ 2009-08-18  0:41 ` Johan Herland
  6 siblings, 0 replies; 12+ messages in thread
From: Johan Herland @ 2009-08-18  0:41 UTC (permalink / raw)
  To: gitster; +Cc: git, Johannes.Schindelin, barkalow, davvid

- Makefile: Make sure git-remote-cvs is rebuilt when 'prefix' changes
  (by adding a dependency on GIT-CFLAGS). This prevents a regular 'make'
  followed by a 'make prefix=/somewhere/else install' from installing a
  non-working git-remote-cvs.

- Makefile: When mangling git-remote-cvs to add the git_remote_cvs install
  location to the Python search path, _replace_ the initial 'current dir'
  entry in sys.path (instead of merely prepending the install location).
  Hence, if the git_remote_cvs package is not installed at the correct
  location (and also not present in any of Python's default package dirs),
  then git-remote-cvs will fail loudly instead of silently falling back on
  the git_remote_cvs subdir in git.git.

- Allow for the git_remote_cvs install path to be overridden by the
  $GITPYTHONLIB environment variable.

- t/test-lib.sh: Set $GITPYTHONLIB (unless $GIT_TEST_INSTALLED is enabled)
  so that we test the currently built version of git_remote_cvs  (the one
  that is built in git_remote_cvs/build/lib) instead of a previously
  installed version.

- Another minor check and a line length fix.

Found-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
---

On Monday 17 August 2009, Junio C Hamano wrote:
> With your Makefile patch the test still failed.  It turns out that I had
> a broken version from the previous round installed in my PATH, and the
> test failed until I removed the faulty ones manually from the installed
> location.  This is not good.
>
> While running tests, we really should import from the build directory,
> preferrably ignoring the installed directory but at least giving
> precedence to the build directory over the installed directory, to avoid
> problems like this.

Agreed. This patch puts the build directory first in Python's search path
when running the testsuite (unless $GIT_TEST_INSTALLED is enabled). The
install directory is ignored (unless the install dir happens to be in
Python's default search path, in which case the build dir will still have
precedence).

> In my case, it was a "bad installed version masking the version we are
> testing", but a more problematic would be the other way around.  If you
> have a good version installed, and if somebody breaks it in the updated
> source, "make test" won't catch the breakage and then you "make install"
> a broken version without noticing.

Yes. This is also taken care of in this patch ('make test' sets
$GITPYTHONLIB, which forces git-remote-cvs to look for its package
in the build dir instead of the install dir).

This patch (and the previous 6/5) will be folded into the next iteration
of the jh/cvs-helper patch series.


...Johan

 Makefile      |   11 +++++++++--
 t/test-lib.sh |    9 +++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index b9a7f25..1d7bf80 100644
--- a/Makefile
+++ b/Makefile
@@ -1477,13 +1477,20 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)) git-instaweb: % : unimplemented.sh
 endif # NO_PERL
 
 ifndef NO_PYTHON
+$(patsubst %.py,%,$(SCRIPT_PYTHON)): GIT-CFLAGS
 $(patsubst %.py,%,$(SCRIPT_PYTHON)): % : %.py
 	$(QUIET_GEN)$(RM) $@ $@+ && \
-	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s --no-print-directory prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' instlibdir` && \
+	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C git_remote_cvs -s \
+		--no-print-directory prefix='$(prefix_SQ)' DESTDIR='$(DESTDIR_SQ)' \
+		instlibdir` && \
 	sed -e '1{' \
 	    -e '	s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
 	    -e '}' \
-	    -e 's|^import sys.*|&; sys.path.insert(0, "@@INSTLIBDIR@@")|' \
+	    -e 's|^import sys.*|&; \\\
+	           import os; \\\
+	           sys.path[0] = os.environ.has_key("GITPYTHONLIB") and \\\
+	                         os.environ["GITPYTHONLIB"] or \\\
+	                         "@@INSTLIBDIR@@"|' \
 	    -e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
 	    $@.py >$@+ && \
 	chmod +x $@+ && \
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 01ea386..a7fbfef 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -638,6 +638,15 @@ test -d ../templates/blt || {
 	error "You haven't built things yet, have you?"
 }
 
+if test -z "$GIT_TEST_INSTALLED"
+then
+	GITPYTHONLIB="$(pwd)/../git_remote_cvs/build/lib"
+	export GITPYTHONLIB
+	test -d ../git_remote_cvs/build || {
+		error "You haven't built git_remote_cvs yet, have you?"
+	}
+fi
+
 if ! test -x ../test-chmtime; then
 	echo >&2 'You need to build test-chmtime:'
 	echo >&2 'Run "make test-chmtime" in the source (toplevel) directory'
-- 
1.6.4.304.g1365c.dirty

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFCv4 3/5] 2/2: Add Python support library for CVS remote helper
  2009-08-17  3:35 ` [RFCv4 3/5] 2/2: " Johan Herland
@ 2009-10-05 13:31   ` Sverre Rabbelier
  2009-10-05 14:41     ` Johan Herland
  0 siblings, 1 reply; 12+ messages in thread
From: Sverre Rabbelier @ 2009-10-05 13:31 UTC (permalink / raw)
  To: Johan Herland
  Cc: Git List, Johannes Schindelin, Daniel Barkalow, David Aguilar,
	Junio C Hamano

Heya,

On Mon, Aug 17, 2009 at 05:35, Johan Herland <johan@herland.net> wrote:
> This patch introduces the rest of a Python package called "git_remote_cvs"
> containing the building blocks of the CVS remote helper. The CVS remote
> helper itself is NOT part of this patch.

Might I suggest that we pool our efforts and instead create a structure like:

git_remote_helpers/
git_remote_helpers/__init__.py
git_remote_helpers/Makefile
git_remote_helpers/setup.py
git_remote_helpers/util.py
git_remote_helpers/git
git_remote_helpers/git/__init__.py
git_remote_helpers/git/
git_remote_helpers/cvs
git_remote_helpers/cvs/__init__.py
git_remote_helpers/cvs/changeset.py
git_remote_helpers/cvs/commit_states.py
git_remote_helpers/cvs/cvs.py
git_remote_helpers/cvs/revision_map.py
git_remote_helpers/cvs/symbol_cache.py
git_remote_helpers/hg
git_remote_helpers/hg/__init__.py
git_remote_helpers/hg/hg.py
git_remote_helpers/hg/export.py

I'm willing to spend some time to do the needed refactoring, but IIUC
Daniel said that you need to reroll the cvs series anyway?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFCv4 3/5] 2/2: Add Python support library for CVS remote helper
  2009-10-05 13:31   ` Sverre Rabbelier
@ 2009-10-05 14:41     ` Johan Herland
  2009-10-05 14:50       ` Sverre Rabbelier
  0 siblings, 1 reply; 12+ messages in thread
From: Johan Herland @ 2009-10-05 14:41 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Git List, Johannes Schindelin, Daniel Barkalow, David Aguilar,
	Junio C Hamano

On Monday 05 October 2009, Sverre Rabbelier wrote:
> On Mon, Aug 17, 2009, Johan Herland <johan@herland.net> wrote:
> > This patch introduces the rest of a Python package called
> > "git_remote_cvs" containing the building blocks of the CVS remote
> > helper. The CVS remote helper itself is NOT part of this patch.
>
> Might I suggest that we pool our efforts and instead create a
> structure like:
>
> git_remote_helpers/
> git_remote_helpers/__init__.py
> git_remote_helpers/Makefile
> git_remote_helpers/setup.py
> git_remote_helpers/util.py
> git_remote_helpers/git
> git_remote_helpers/git/__init__.py
> git_remote_helpers/git/
> git_remote_helpers/cvs
> git_remote_helpers/cvs/__init__.py
> git_remote_helpers/cvs/changeset.py
> git_remote_helpers/cvs/commit_states.py
> git_remote_helpers/cvs/cvs.py
> git_remote_helpers/cvs/revision_map.py
> git_remote_helpers/cvs/symbol_cache.py
> git_remote_helpers/hg
> git_remote_helpers/hg/__init__.py
> git_remote_helpers/hg/hg.py
> git_remote_helpers/hg/export.py

I like this. One couple of questions (for now):

- Are you planning to share directory structure only, or some of the 
Python code as well? From the above structure it seems like you want to 
make use of e.g. util.py and git.py. I'd be delighted if the code is 
reusable by other remote helpers.

- Do you plan to put the remote helpers into this structure as well, or 
keep them separate? (currently the cvs remote helper lives separately 
in git-remote-cvs.py in the project root directory)


> I'm willing to spend some time to do the needed refactoring, but IIUC
> Daniel said that you need to reroll the cvs series anyway?

Yeah, a re-roll was planned once the sr/gfi-options work was finalized. 
(right now I'm mostly focused on the jh/notes work). Feel free to post 
the hg-specific and/or common parts as a separate patch, and I can 
rebase the CVS series on top of that before I submit the next 
iteration.

Nice to see people starting to pick up the foreign-vcs work. :)


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFCv4 3/5] 2/2: Add Python support library for CVS remote helper
  2009-10-05 14:41     ` Johan Herland
@ 2009-10-05 14:50       ` Sverre Rabbelier
  2009-10-05 15:35         ` Johan Herland
  0 siblings, 1 reply; 12+ messages in thread
From: Sverre Rabbelier @ 2009-10-05 14:50 UTC (permalink / raw)
  To: Johan Herland
  Cc: Git List, Johannes Schindelin, Daniel Barkalow, David Aguilar,
	Junio C Hamano

Heya,

On Mon, Oct 5, 2009 at 16:41, Johan Herland <johan@herland.net> wrote:
> - Are you planning to share directory structure only, or some of the
> Python code as well? From the above structure it seems like you want to
> make use of e.g. util.py and git.py. I'd be delighted if the code is
> reusable by other remote helpers.

Yes, the main reason I suggested this was because I want to reuse
util.py and git.py :). Of course, it'd probably be cleaner to move the
little CVS-specific code that is in git.py atm out of it.

> - Do you plan to put the remote helpers into this structure as well, or
> keep them separate? (currently the cvs remote helper lives separately
> in git-remote-cvs.py in the project root directory)

I think it's good where it is right now, and I've also placed
git-remote-hg.py in the root directory.

>> I'm willing to spend some time to do the needed refactoring, but IIUC
>> Daniel said that you need to reroll the cvs series anyway?
>
> Yeah, a re-roll was planned once the sr/gfi-options work was finalized.
> (right now I'm mostly focused on the jh/notes work). Feel free to post
> the hg-specific and/or common parts as a separate patch, and I can
> rebase the CVS series on top of that before I submit the next
> iteration.

Yeah, I'm waiting on Shawn to decide what should be done with the
whole options thing before I resubmit sr/gfi-options.

> Nice to see people starting to pick up the foreign-vcs work. :)

Now if only we could bribe Eric to write a svn helper :).

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFCv4 3/5] 2/2: Add Python support library for CVS remote helper
  2009-10-05 14:50       ` Sverre Rabbelier
@ 2009-10-05 15:35         ` Johan Herland
  0 siblings, 0 replies; 12+ messages in thread
From: Johan Herland @ 2009-10-05 15:35 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Git List, Johannes Schindelin, Daniel Barkalow, David Aguilar,
	Junio C Hamano

On Monday 05 October 2009, Sverre Rabbelier wrote:
> On Mon, Oct 5, 2009 at 16:41, Johan Herland <johan@herland.net> wrote:
> > - Are you planning to share directory structure only, or some of
> > the Python code as well? From the above structure it seems like you
> > want to make use of e.g. util.py and git.py. I'd be delighted if
> > the code is reusable by other remote helpers.
>
> Yes, the main reason I suggested this was because I want to reuse
> util.py and git.py :). Of course, it'd probably be cleaner to move
> the little CVS-specific code that is in git.py atm out of it.

Sure. Refactor away. :)

> > - Do you plan to put the remote helpers into this structure as
> > well, or keep them separate? (currently the cvs remote helper lives
> > separately in git-remote-cvs.py in the project root directory)
>
> I think it's good where it is right now, and I've also placed
> git-remote-hg.py in the root directory.

Ok.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-10-05 15:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-17  3:35 [RFCv4 0/5] CVS remote helper Johan Herland
2009-08-17  3:35 ` [RFCv4 1/5] Basic build infrastructure for Python scripts Johan Herland
2009-08-17  3:35 ` [RFCv4 2/5] 1/2: Add Python support library for CVS remote helper Johan Herland
2009-08-17  3:35 ` [RFCv4 3/5] 2/2: " Johan Herland
2009-10-05 13:31   ` Sverre Rabbelier
2009-10-05 14:41     ` Johan Herland
2009-10-05 14:50       ` Sverre Rabbelier
2009-10-05 15:35         ` Johan Herland
2009-08-17  3:36 ` [RFCv4 4/5] git-remote-cvs: Remote helper program for CVS repositories Johan Herland
2009-08-17  3:36 ` [RFCv4 5/5] Add simple selftests of git-remote-cvs functionality Johan Herland
2009-08-17 16:27 ` [RFCv4 6/5] Fix the Makefile-generated path to the git_remote_cvs package in git-remote-cvs Johan Herland
2009-08-18  0:41 ` [RFCv4 7/5] More fixes to the git-remote-cvs installation procedure Johan Herland

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.