All of lore.kernel.org
 help / color / mirror / Atom feed
* [StGit RFC] A more structured way of calling git
@ 2007-10-26 19:24 Karl Hasselström
  2007-11-03 10:56 ` Catalin Marinas
  2007-11-15 18:28 ` Catalin Marinas
  0 siblings, 2 replies; 8+ messages in thread
From: Karl Hasselström @ 2007-10-26 19:24 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: David Kågedal, Git Mailing List, Yann Dirson

I wanted to build an StGit command that coalesced adjacent patches to
a single patch. Because the end result tree would still be the same,
this should be doable without ever involving HEAD, the index, or the
worktree. StGit's existing infrastructure for manipulating patches
didn't lend itself to doing this kind of thing, though: it's not
modular enough. So I started to design a replacement low-level
interface to git, and things got slightly out of hand ... and I ended
up with a much bigger refactoring than I'd planned.

It's all outlined below, and all the code I currently have is
attached. Unless there's opposition, my plan is to convert one command
at a time to use the new infrastructure -- this can be done since the
on-disk format is unaffected.

Comments?


Python wrapping of git objects (gitlib.py)
----------------------------------------------------------------------

To make it easier to work with git objects, I've built a more
high-level interface than just calling git commands directly. Some of
it is trivial:

  * Blobs and tags aren't covered (yet), since StGit never uses them.
    I think. If they are needed in the future, they can easily be
    wrapped.

  * Trees are represented by the Python class Tree. They are
    immutable, and have only one property: the sha1 of the tree. More
    could be added if we need to look inside trees.

The interesting case is for commit objects:

  * Commits are represented by the Python class Commit. They are
    immutable, and have two properties: sha1 and commit data.

  * Commit data is represented by the Python class CommitData. These
    objects are also immutable, and have properties for author,
    committer, commit message, tree, and list of parent commits. They
    also have setter functions for these properties, which (since
    CommitData objects are immutable) return a modified copy of the
    original object.

  * Author and committer are represented by a Person class, also
    immutable with setter functions.

The user may create new CommitData objects, but never creates Tree or
Commit objects herself. Instead, she asks her Repository object to
create them for her:

  * Repository.get_tree and Repository.get_commit take sha1
    parameters, and returns the corresponding objects. They must
    already exist.

  * Repository.commit takes a CommitData parameter, and returns a
    corresponding Commit object representing the new commit.
    Internally, it runs git-commit-tree.

This has the nice property that Tree and Commit objects always
represent objects that git knows about. It also makes it trivial to
create new commits that have arbitrary existing commits as parents and
an existing tree. For example, my coalesce-adjacent-patches command
could be built on top of this.

The Repository object also has methods for reading and writing (and
caching) refs, but it lacks any method for creating new trees. This is
the job of the Index object. It has read_tree and write_tree methods
for getting Trees in to and out of the current index state. It also
has a merge method that does a 3-way merge between arbitrary Trees,
without ever touching the worktree (if the merge cannot be resolved
automatically, it simply fails).

The user is free to create as many Repository and Index objects as she
wants; their constructors take a git repository path and an index file
path as argumentes, respectively. This means that it's very easy to
work with a temporary index, which is neat in combination with the
Index.merge method: it lets you merge three trees to create a fourth
without ever touching the worktree or the default index.

For operations that involve a worktree as well, we have the
IndexAndWorktree class. It has methods for e.g. checkout and
update-index; this is also where a full, possibly conflicting merge
will go when I get around to implementing it.


Low-level StGit on top of the git wrappers (stacklib.py)
----------------------------------------------------------------------

That was all about git. We need an StGit layer on top of it.

There's a Stack object that represents a branch. It has two important
properties:

  * A PatchOrder object. This keeps track of the list of applied and
    unapplied patches, by name.

  * A store of Patch objects. This can look up Patch objects by name,
    and create new patches.

Patch objects represent patches, and are very simple. Basically the
only thing you can do with them is get their commit object, set their
commit object, and delete them. Author, commit message, top and
bottom, and all those things aren't a property of the patch; they are
properties of its commit.

(In the future, Patch objects should write stuff to the patch log.
They could also during a gradual transition to this new infrastructure
write out the per-patch metadata files that StGit currently uses.)

Importantly, unlike the current StGit stack class, there are no
high-level stack operations à la push and pop here. This is all
low-level manipulation of patch refs and the applied/unapplied files.
But in combination with the stuff in gitlib.py, lots of higher-level
StGit operations can be built on top of this.


Transactions (translib.py)
----------------------------------------------------------------------

I started to implement a few StGit commands on top of gitlib.py and
stacklib.py, and then realized something very appealing:

  Just about every StGit command can be accomplished by first creating
  a bunch of new commit objects with gitlib.py, then trying to check
  out the new HEAD, and then rewriting refs with stacklib.py. Only the
  first and second steps can possibly fail, and if they do, they do so
  without leaving any user-visible junk behind. This can be used to
  make all commands either succeed completely, or do nothing at all.

As an example (which I've not yet implemented), consider how push
would work:

  1. Create the new commit objects that the patches to be pushed will
     use. For each patch:

       a. Check if it's a fast forward. If so, just reuse the old
          commit object.

       b. Try the in-index merge with a temp index. If it succeeds,
          create a new commit object with that tree.

       c. Otherwise, stop trying to push any more patches.

  2. Check out the new HEAD tree. This may fail if the worktree and/or
     index contain conflicting changes. If so, we just abort the whole
     operation and tell the user which files she needs to clean up.

       a. If we had a patch that we couldn't push in (1.c), and then
          do a full 3-way merge with its original tree. This may fail
          if the worktree and/or index is dirty; if so, we don't try
          to push that patch.

       b. If the merge succeeds but with conflicts, create a new
          commit for it with the same tree as its parent (i.e. an
          empty commit) and leave the conflicts for the user to
          resolve.

       c. Otherwise, the merge autoresolved. Go back to (1) and try to
          push the remaining patches too. But remember that if we
          later need to abort the push due to dirty worktree/index, we
          have already pushed a few of the patches.

  3. Use stacklib.py to rewrite the branch ref and the patch refs.

This will end up pushing some subset of the requested patches. The
only way we'll ever get a result that isn't all-or-nothing is if a
merge conflicts. Note also how (except for the irritating (2.c)) we
never touch the index and worktree until we're already done, which
should make things both robust and fast.

(Step (2.c) is irritating, in that we actually have to check out a new
tree in order to use merge-recursive, and merge-recursive might
autoresolve a merge that the in-index merge failed to resolve, so that
we have checked out an intermediate tree even though there didn't end
up being any conflict for the user to resolve.)

The code in translib.py is a simple class that can hold a record of
everything that needs to be done in step (3), and then does it when
and if we get there.

The killer feature of transactions (apart from their use as a utility
when writing commands) is that we could build transaction logging.
Since every StGit command performs exatly one transaction, if we
simply logged the before and after values of the patch refs, branch
ref, and patch appliedness, we could build a generic StGit undo/redo
command.


Example commands (utillib.py)
----------------------------------------------------------------------

This file has sample implementations of some StGit commands: clean,
pop, push, and refresh. They don't have any bells and whistles, and
the push is fundamentally limited in that it doesn't handle conflicts
-- it'll complain and do nothing.

These were mostly done to excercise the new infrastructure and make
sure that I hadn't forgotten anything. The plan is not to replace the
existing commands, just make them use the new infrastructure.




diff --git a/stgit/gitlib.py b/stgit/gitlib.py
new file mode 100644
index 0000000..46911d5
--- /dev/null
+++ b/stgit/gitlib.py
@@ -0,0 +1,360 @@
+import os, os.path, re
+from exception import *
+import run
+
+class DetachedHeadException(StgException):
+    pass
+
+class Repr(object):
+    def __repr__(self):
+        return str(self)
+
+class NoValue(object):
+    pass
+
+def make_defaults(defaults):
+    def d(val, attr):
+        if val != NoValue:
+            return val
+        elif defaults != NoValue:
+            return getattr(defaults, attr)
+        else:
+            return None
+    return d
+
+class Person(Repr):
+    """Immutable."""
+    def __init__(self, name = NoValue, email = NoValue,
+                 date = NoValue, defaults = NoValue):
+        d = make_defaults(defaults)
+        self.__name = d(name, 'name')
+        self.__email = d(email, 'email')
+        self.__date = d(date, 'date')
+    name = property(lambda self: self.__name)
+    email = property(lambda self: self.__email)
+    date = property(lambda self: self.__date)
+    def set_name(self, name):
+        return type(self)(name = name, defaults = self)
+    def set_email(self, email):
+        return type(self)(email = email, defaults = self)
+    def set_date(self, date):
+        return type(self)(date = date, defaults = self)
+    def __str__(self):
+        return '%s <%s> %s' % (self.name, self.email, self.date)
+    @classmethod
+    def parse(cls, s):
+        m = re.match(r'^([^<]*)<([^>]*)>\s+(\d+\s+[+-]\d{4})$', s)
+        assert m
+        name = m.group(1).strip()
+        email = m.group(2)
+        date = m.group(3)
+        return cls(name, email, date)
+
+class Tree(Repr):
+    """Immutable."""
+    def __init__(self, sha1):
+        self.__sha1 = sha1
+    sha1 = property(lambda self: self.__sha1)
+    def __str__(self):
+        return 'Tree<%s>' % self.sha1
+
+class Commitdata(Repr):
+    """Immutable."""
+    def __init__(self, tree = NoValue, parents = NoValue, author = NoValue,
+                 committer = NoValue, message = NoValue, defaults = NoValue):
+        d = make_defaults(defaults)
+        self.__tree = d(tree, 'tree')
+        self.__parents = d(parents, 'parents')
+        self.__author = d(author, 'author')
+        self.__committer = d(committer, 'committer')
+        self.__message = d(message, 'message')
+    tree = property(lambda self: self.__tree)
+    parents = property(lambda self: self.__parents)
+    @property
+    def parent(self):
+        assert len(self.__parents) == 1
+        return self.__parents[0]
+    author = property(lambda self: self.__author)
+    committer = property(lambda self: self.__committer)
+    message = property(lambda self: self.__message)
+    def set_tree(self, tree):
+        return type(self)(tree = tree, defaults = self)
+    def set_parents(self, parents):
+        return type(self)(parents = parents, defaults = self)
+    def add_parent(self, parent):
+        return type(self)(parents = list(self.parents or []) + [parent],
+                          defaults = self)
+    def set_parent(self, parent):
+        return self.set_parents([parent])
+    def set_author(self, author):
+        return type(self)(author = author, defaults = self)
+    def set_committer(self, committer):
+        return type(self)(committer = committer, defaults = self)
+    def set_message(self, message):
+        return type(self)(message = message, defaults = self)
+    def __str__(self):
+        if self.tree == None:
+            tree = None
+        else:
+            tree = self.tree.sha1
+        if self.parents == None:
+            parents = None
+        else:
+            parents = [p.sha1 for p in self.parents]
+        return ('Commitdata<tree: %s, parents: %s, author: %s,'
+                ' committer: %s, message: "%s">'
+                ) % (tree, parents, self.author, self.committer, self.message)
+    @classmethod
+    def parse(cls, repository, s):
+        cd = cls()
+        lines = list(s.splitlines(True))
+        for i in xrange(len(lines)):
+            line = lines[i].strip()
+            if not line:
+                return cd.set_message(''.join(lines[i+1:]))
+            key, value = line.split(None, 1)
+            if key == 'tree':
+                cd = cd.set_tree(repository.get_tree(value))
+            elif key == 'parent':
+                cd = cd.add_parent(repository.get_commit(value))
+            elif key == 'author':
+                cd = cd.set_author(Person.parse(value))
+            elif key == 'committer':
+                cd = cd.set_committer(Person.parse(value))
+            else:
+                assert False
+        assert False
+
+class Commit(Repr):
+    """Immutable."""
+    def __init__(self, repository, sha1):
+        self.__sha1 = sha1
+        self.__repository = repository
+        self.__data = None
+    sha1 = property(lambda self: self.__sha1)
+    @property
+    def data(self):
+        if self.__data == None:
+            self.__data = Commitdata.parse(
+                self.__repository,
+                self.__repository.cat_object(self.sha1))
+        return self.__data
+    def __str__(self):
+        return 'Commit<sha1: %s, data: %s>' % (self.sha1, self.__data)
+
+class Refs(object):
+    def __init__(self, repository):
+        self.__repository = repository
+        self.__refs = None
+    def __cache_refs(self):
+        self.__refs = {}
+        for line in self.__repository.run(['git-show-ref']).output_lines():
+            m = re.match(r'^([0-9a-f]{40})\s+(\S+)$', line)
+            sha1, ref = m.groups()
+            self.__refs[ref] = sha1
+    def get(self, ref):
+        if self.__refs == None:
+            self.__cache_refs()
+        return self.__repository.get_commit(self.__refs[ref])
+    def set(self, ref, commit, msg):
+        if self.__refs == None:
+            self.__cache_refs()
+        old_sha1 = self.__refs.get(ref, '0'*40)
+        new_sha1 = commit.sha1
+        if old_sha1 != new_sha1:
+            self.__repository.run(['git-update-ref', '-m', msg,
+                                   ref, new_sha1, old_sha1]).no_output()
+            self.__refs[ref] = new_sha1
+    def delete(self, ref):
+        if self.__refs == None:
+            self.__cache_refs()
+        self.__repository.run(['git-update-ref',
+                               '-d', ref, self.__refs[ref]]).no_output()
+        del self.__refs[ref]
+
+class ObjectCache(object):
+    def __init__(self, create):
+        self.__objects = {}
+        self.__create = create
+    def __getitem__(self, name):
+        if not name in self.__objects:
+            self.__objects[name] = self.__create(name)
+        return self.__objects[name]
+    def __contains__(self, name):
+        return name in self.__objects
+    def __setitem__(self, name, val):
+        assert not name in self.__objects
+        self.__objects[name] = val
+
+def add_dict(d1, d2):
+    d = dict(d1)
+    d.update(d2)
+    return d
+
+class RunWithEnv(object):
+    def run(self, args, env = {}):
+        return run.Run(*args).env(add_dict(self.env, env))
+
+class Repository(RunWithEnv):
+    def __init__(self, directory):
+        self.__git_dir = directory
+        self.__refs = Refs(self)
+        self.__trees = ObjectCache(lambda sha1: Tree(sha1))
+        self.__commits = ObjectCache(lambda sha1: Commit(self, sha1))
+    env = property(lambda self: { 'GIT_DIR': self.__git_dir })
+    @classmethod
+    def default(cls):
+        """Return the default repository."""
+        return cls(run.Run('git-rev-parse', '--git-dir').output_one_line())
+    def default_index(self):
+        return Index(self, (os.environ.get('GIT_INDEX_FILE', None)
+                            or os.path.join(self.__git_dir, 'index')))
+    def temp_index(self):
+        return Index(self, self.__git_dir)
+    def default_worktree(self):
+        path = os.environ.get('GIT_WORK_TREE', None)
+        if not path:
+            o = run.Run('git-rev-parse', '--show-cdup').output_lines()
+            o = o or ['.']
+            assert len(o) == 1
+            path = o[0]
+        return Worktree(path)
+    def default_iw(self):
+        return IndexAndWorktree(self.default_index(), self.default_worktree())
+    directory = property(lambda self: self.__git_dir)
+    refs = property(lambda self: self.__refs)
+    def cat_object(self, sha1):
+        return self.run(['git-cat-file', '-p', sha1]).raw_output()
+    def get_tree(self, sha1):
+        return self.__trees[sha1]
+    def get_commit(self, sha1):
+        return self.__commits[sha1]
+    def commit(self, commitdata):
+        c = ['git-commit-tree', commitdata.tree.sha1]
+        for p in commitdata.parents:
+            c.append('-p')
+            c.append(p.sha1)
+        env = {}
+        for p, v1 in ((commitdata.author, 'AUTHOR'),
+                       (commitdata.committer, 'COMMITTER')):
+            if p != None:
+                for attr, v2 in (('name', 'NAME'), ('email', 'EMAIL'),
+                                 ('date', 'DATE')):
+                    if getattr(p, attr) != None:
+                        env['GIT_%s_%s' % (v1, v2)] = getattr(p, attr)
+        sha1 = self.run(c, env = env).raw_input(commitdata.message
+                                                ).output_one_line()
+        return self.get_commit(sha1)
+    @property
+    def head(self):
+        try:
+            return self.run(['git-symbolic-ref', '-q', 'HEAD']
+                            ).output_one_line()
+        except run.RunException:
+            raise DetachedHeadException()
+    def set_head(self, ref, msg):
+        self.run(['git-symbolic-ref', '-m', msg, 'HEAD', ref]).no_output()
+    @property
+    def head_commit(self):
+        return self.get_commit(self.run(['git-rev-parse', 'HEAD']
+                                        ).output_one_line())
+
+class MergeException(StgException):
+    pass
+
+class Index(RunWithEnv):
+    def __init__(self, repository, filename):
+        self.__repository = repository
+        if os.path.isdir(filename):
+            # Create a temp index in the given directory.
+            self.__filename = os.path.join(
+                filename, 'index.temp-%d-%x' % (os.getpid(), id(self)))
+            self.delete()
+        else:
+            self.__filename = filename
+    env = property(lambda self: add_dict(self.__repository.env,
+                                         { 'GIT_INDEX_FILE': self.__filename }))
+    def read_tree(self, tree):
+        self.run(['git-read-tree', tree.sha1]).no_output()
+    def write_tree(self):
+        return self.__repository.get_tree(
+            self.run(['git-write-tree']).output_one_line())
+    def is_clean(self):
+        try:
+            self.run(['git-update-index', '--refresh']).discard_output()
+        except run.RunException:
+            return False
+        else:
+            return True
+    def merge(self, base, ours, theirs):
+        """In-index merge, no worktree involved."""
+        self.run(['git-read-tree', '-m', '-i', '--aggressive',
+                  base.sha1, ours.sha1, theirs.sha1]).no_output()
+        try:
+            self.run(['git-merge-index', 'git-merge-one-file', '-a']
+                     ).no_output()
+        except run.RunException:
+            raise MergeException('In-index merge failed due to conflicts')
+    def delete(self):
+        if os.path.isfile(self.__filename):
+            os.remove(self.__filename)
+
+class Worktree(object):
+    def __init__(self, directory):
+        self.__directory = directory
+    env = property(lambda self: { 'GIT_WORK_TREE': self.__directory })
+
+class CheckoutException(StgException):
+    pass
+
+class IndexAndWorktree(RunWithEnv):
+    def __init__(self, index, worktree):
+        self.__index = index
+        self.__worktree = worktree
+    index = property(lambda self: self.__index)
+    env = property(lambda self: add_dict(self.__index.env, self.__worktree.env))
+    def checkout(self, old_commit, new_commit):
+        # TODO: Optionally do a 3-way instead of doing nothing when we
+        # have a problem. Or maybe we should stash changes in a patch?
+        try:
+            self.run(['git-read-tree', '-u', '-m',
+                      '--exclude-per-directory=.gitignore',
+                      old_commit.sha1, new_commit.sha1]
+                     ).discard_output()
+        except run.RunException:
+            raise CheckoutException('Index/workdir dirty')
+    def changed_files(self):
+        return self.run(['git-diff-files', '--name-only']).output_lines()
+    def update_index(self, files):
+        self.run(['git-update-index', '--remove', '-z', '--stdin']
+                 ).input_nulterm(files).discard_output()
+
+if __name__ == '__main__':
+    testdir = '/tmp/stgtest'
+    os.system('rm -rf %s' % testdir)
+    os.makedirs(testdir)
+    os.chdir(testdir)
+    for c in ['git init',
+              'echo foo >> foo',
+              'git add foo',
+              'git commit -m foo',
+              'echo bar >> foo',
+              'git commit -a -m foo']:
+        os.system(c)
+    r = Repository(os.path.join(testdir, '.git'))
+    head = r.head
+    c = r.refs.get(head)
+    print 'HEAD is', head, 'which is', c
+    c.data
+    print 'Expanded:', c
+    maja = Person(name = 'Maja', email = 'maja@example.com')
+    nisse = Person(name = 'Nisse', email = 'nisse@example.com')
+    c2 = r.commit(c.data.set_parents([c]).set_author(maja))
+    c3 = r.commit(c.data.set_parents([c]).set_author(nisse))
+    c4 = r.commit(c.data.set_parents([c2, c3]))
+    r.refs.set(head, c4, 'made a cool merge')
+    c5 = r.commit(c.data.set_parents([c4]).set_tree(
+        c.data.parents[0].data.tree))
+    head = 'refs/heads/foobar'
+    r.refs.set(head, c5, 'committed a revert')
+    r.set_head(head, 'switched to other branch')
diff --git a/stgit/run.py b/stgit/run.py
index 924e59a..43c3a23 100644
--- a/stgit/run.py
+++ b/stgit/run.py
@@ -105,7 +105,7 @@ class Run:
     def input_lines(self, lines):
         self.__indata = ''.join(['%s\n' % line for line in lines])
         return self
-    def input_nulterm(self, items):
+    def input_nulterm(self, lines):
         self.__indata = ''.join('%s\0' % line for line in lines)
         return self
     def no_output(self):
diff --git a/stgit/stacklib.py b/stgit/stacklib.py
new file mode 100644
index 0000000..06ba007
--- /dev/null
+++ b/stgit/stacklib.py
@@ -0,0 +1,120 @@
+import os.path
+import gitlib, utils
+
+class Patch(object):
+    def __init__(self, stack, name):
+        self.__stack = stack
+        self.__name = name
+    name = property(lambda self: self.__name)
+    def __ref(self):
+        return 'refs/patches/%s/%s' % (self.__stack.name, self.__name)
+    @property
+    def commit(self):
+        return self.__stack.repository.refs.get(self.__ref())
+    def set_commit(self, commit, msg):
+        self.__stack.repository.refs.set(self.__ref(), commit, msg)
+    def delete(self):
+        self.__stack.repository.refs.delete(self.__ref())
+    def is_applied(self):
+        return self.name in self.__stack.patchorder.applied
+    def is_empty(self):
+        c = self.commit
+        return c.data.tree == c.data.parent.data.tree
+
+class PatchOrder(object):
+    """Keeps track of patch order, and which patches are applied.
+    Works with patch names, not actual patches."""
+    __list_order = [ 'applied', 'unapplied' ]
+    def __init__(self, stack):
+        self.__stack = stack
+        self.__lists = {}
+    def __read_file(self, fn):
+        return tuple(utils.read_strings(
+            os.path.join(self.__stack.directory, fn)))
+    def __write_file(self, fn, val):
+        utils.write_strings(os.path.join(self.__stack.directory, fn), val)
+    def __get_list(self, name):
+        if not name in self.__lists:
+            self.__lists[name] = self.__read_file(name)
+        return self.__lists[name]
+    def __set_list(self, name, val):
+        val = tuple(val)
+        if val != self.__lists.get(name, None):
+            self.__lists[name] = val
+            self.__write_file(name, val)
+    applied = property(lambda self: self.__get_list('applied'),
+                       lambda self, val: self.__set_list('applied', val))
+    unapplied = property(lambda self: self.__get_list('unapplied'),
+                         lambda self, val: self.__set_list('unapplied', val))
+
+class Patches(object):
+    """Creates Patch objects."""
+    def __init__(self, stack):
+        self.__stack = stack
+        def create_patch(name):
+            p = Patch(self.__stack, name)
+            p.commit # raise exception if the patch doesn't exist
+            return p
+        self.__patches = gitlib.ObjectCache(create_patch) # name -> Patch
+    def exists(self, name):
+        return name in self.__patches
+    def get(self, name):
+        return self.__patches[name]
+    def new(self, name, commit, msg):
+        assert not name in self.__patches
+        p = Patch(self.__stack, name)
+        p.set_commit(commit, msg)
+        self.__patches[name] = p
+        return p
+
+class Stack(object):
+    def __init__(self, repository, name):
+        self.__repository = repository
+        self.__name = name
+        self.__patchorder = PatchOrder(self)
+        self.__patches = Patches(self)
+    name = property(lambda self: self.__name)
+    repository = property(lambda self: self.__repository)
+    patchorder = property(lambda self: self.__patchorder)
+    patches = property(lambda self: self.__patches)
+    @property
+    def directory(self):
+        return os.path.join(self.__repository.directory, 'patches', self.__name)
+    def __ref(self):
+        return 'refs/heads/%s' % self.__name
+    @property
+    def head(self):
+        return self.__repository.refs.get(self.__ref())
+    def set_head(self, commit, msg):
+        self.__repository.refs.set(self.__ref(), commit, msg)
+    @property
+    def base(self):
+        if self.patchorder.applied:
+            return self.patches.get(self.patchorder.applied[0]
+                                    ).commit.data.parent
+        else:
+            return self.head
+
+def strip_leading(prefix, s):
+    assert s.startswith(prefix)
+    return s[len(prefix):]
+
+class Repository(gitlib.Repository):
+    def __init__(self, *args, **kwargs):
+        gitlib.Repository.__init__(self, *args, **kwargs)
+        self.__stacks = {} # name -> Stack
+    @property
+    def current_branch(self):
+        return strip_leading('refs/heads/', self.head)
+    @property
+    def current_stack(self):
+        return self.get_stack(self.current_branch)
+    def get_stack(self, name):
+        if not name in self.__stacks:
+            if name == None:
+                s = None # detached HEAD
+            else:
+                # TODO: verify that the branch exists
+                s = Stack(self, name)
+            self.__stacks[name] = s
+        return self.__stacks[name]
diff --git a/stgit/translib.py b/stgit/translib.py
new file mode 100644
index 0000000..deb3420
--- /dev/null
+++ b/stgit/translib.py
@@ -0,0 +1,70 @@
+import gitlib
+from exception import *
+from out import *
+
+class TransactionException(StgException):
+    pass
+
+class StackTransaction(object):
+    def __init__(self, stack, msg):
+        self.__stack = stack
+        self.__msg = msg
+        self.__patches = {}
+        self.__applied = list(self.__stack.patchorder.applied)
+        self.__unapplied = list(self.__stack.patchorder.unapplied)
+    def __set_patches(self, val):
+        self.__patches = dict(val)
+    patches = property(lambda self: self.__patches, __set_patches)
+    def __set_applied(self, val):
+        self.__applied = list(val)
+    applied = property(lambda self: self.__applied, __set_applied)
+    def __set_unapplied(self, val):
+        self.__unapplied = list(val)
+    unapplied = property(lambda self: self.__unapplied, __set_unapplied)
+    def __check_consistency(self):
+        remaining = set(self.__applied + self.__unapplied)
+        for pn, commit in self.__patches.iteritems():
+            if commit == None:
+                assert self.__stack.patches.exists(pn)
+            else:
+                assert pn in remaining
+    def run(self, iw = None):
+        self.__check_consistency()
+
+        # Get new head commit.
+        if self.__applied:
+            top_patch = self.__applied[-1]
+            try:
+                new_head = self.__patches[top_patch]
+            except KeyError:
+                new_head = self.__stack.patches.get(top_patch).commit
+        else:
+            new_head = self.__stack.base
+
+        # Set branch head.
+        if new_head == self.__stack.head:
+            out.info('Head remains at %s' % new_head.sha1[:8])
+        elif new_head.data.tree == self.__stack.head.data.tree:
+            out.info('Head %s -> %s (same tree)' % (self.__stack.head.sha1[:8],
+                                                    new_head.sha1[:8]))
+        elif iw != None:
+            try:
+                iw.checkout(self.__stack.head, new_head)
+            except gitlib.CheckoutException, e:
+                raise TransactionException(e)
+            out.info('Head %s -> %s' % (self.__stack.head.sha1[:8],
+                                        new_head.sha1[:8]))
+        self.__stack.set_head(new_head, self.__msg)
+
+        # Write patches.
+        for pn, commit in self.__patches.iteritems():
+            if self.__stack.patches.exists(pn):
+                p = self.__stack.patches.get(pn)
+                if commit == None:
+                    p.delete()
+                else:
+                    p.set_commit(commit, self.__msg)
+            else:
+                self.__stack.patches.new(pn, commit, self.__msg)
+        self.__stack.patchorder.applied = self.__applied
+        self.__stack.patchorder.unapplied = self.__unapplied
diff --git a/stgit/utillib.py b/stgit/utillib.py
new file mode 100644
index 0000000..d09ecc4
--- /dev/null
+++ b/stgit/utillib.py
@@ -0,0 +1,139 @@
+import gitlib, translib
+from out import *
+
+def head_top_equal(repository):
+    head = repository.head_commit
+    try:
+        s = repository.current_stack
+    except gitlib.DetachedHeadException:
+        out.error('Not on any branch (detached HEAD)')
+        return False
+    applied = s.patchorder.applied
+    if not applied:
+        return True
+    top = s.patches.get(applied[-1])
+    if top.commit == head:
+        return True
+    out.error('The top patch (%s, %s)' % (top.name, top.commit.sha1),
+              'and HEAD (%s) are not the same.' % head.sha1)
+    return False
+
+def simple_merge(repository, base, ours, theirs):
+    """Given three trees, tries to do an in-index merge in a temporary
+    index with a temporary index. Returns the result tree, or None if
+    the merge failed (due to conflicts)."""
+    assert isinstance(base, gitlib.Tree)
+    assert isinstance(ours, gitlib.Tree)
+    assert isinstance(theirs, gitlib.Tree)
+    if base == ours:
+        # Fast forward: theirs is a descendant of ours.
+        return theirs
+    if base == theirs:
+        # Fast forward: ours is a descendant of theirs.
+        return ours
+    index = repository.temp_index()
+    try:
+        try:
+            index.merge(base, ours, theirs)
+        except gitlib.MergeException:
+            return None
+        return index.write_tree()
+    finally:
+        index.delete()
+
+def clean(stack):
+    t = translib.StackTransaction(stack, 'stg clean')
+    t.unapplied = []
+    for pn in stack.patchorder.unapplied:
+        p = stack.patches.get(pn)
+        if p.is_empty():
+            t.patches[pn] = None
+        else:
+            t.unapplied.append[pn]
+    t.applied = []
+    parent = stack.base
+    for pn in stack.patchorder.applied:
+        p = stack.patches.get(pn)
+        if p.is_empty():
+            t.patches[pn] = None
+            out.info('Deleting %s' % pn)
+        else:
+            if parent != p.commit.data.parent:
+                parent = t.patches[pn] = stack.repository.commit(
+                    p.commit.data.set_parent(parent))
+            else:
+                parent = p.commit
+            t.applied.append(pn)
+            out.info('Keeping %s' % pn)
+    t.run()
+
+def pop(stack, iw = None):
+    t = translib.StackTransaction(stack, 'stg pop')
+    pn = t.applied.pop()
+    t.unapplied.insert(0, pn)
+    t.run(iw)
+
+def push(stack, pn, iw = None):
+    t = translib.StackTransaction(stack, 'stg push')
+    t.unapplied.remove(pn)
+    t.applied.append(pn)
+    p = stack.patches.get(pn)
+    if stack.head != p.commit.data.parent:
+        tree = simple_merge(stack.repository, p.commit.data.parent.data.tree,
+                            stack.head.data.tree, p.commit.data.tree)
+        assert tree
+        t.patches[pn] = stack.repository.commit(
+            p.commit.data.set_parent(stack.head).set_tree(tree))
+    t.run(iw)
+
+def refresh(stack, iw):
+    iw.update_index(iw.changed_files())
+    tree = iw.index.write_tree()
+    t = translib.StackTransaction(stack, 'stg refresh')
+    p = stack.patches.get(t.applied[-1])
+    t.patches[p.name] = stack.repository.commit(
+        p.commit.data.set_tree(tree))
+    t.run()
+
+if __name__ == '__main__':
+    import os
+    import stacklib
+    testdir = '/tmp/stgtest'
+    os.system('rm -rf %s' % testdir)
+    os.makedirs(testdir)
+    os.chdir(testdir)
+    for c in ['git init',
+              'echo foo >> foo',
+              'git add foo',
+              'git commit -m foo',
+              'stg init']:
+        os.system(c)
+    for i in range(3):
+        for c in ['stg new p%d -m p%d' % (i, i),
+                  'echo %s >> foo%d' % (str(i)*4, i),
+                  'git add foo%d' % i,
+                  'stg refresh',
+                  'stg new q%d -m q%d' % (i, i)]:
+            os.system(c)
+    r = stacklib.Repository.default()
+    print 'Current branch:', r.current_branch
+    s = r.current_stack
+    print 'Applied:', s.patchorder.applied
+    print 'Unapplied:', s.patchorder.unapplied
+    os.system('git checkout HEAD^')
+    head_top_equal(r)
+    os.system('git checkout master')
+    head_top_equal(r)
+    clean(s)
+    iw = r.default_iw()
+    pop(s, iw)
+    pop(s, iw)
+    os.system('stg series')
+    os.system('stg status')
+    push(s, 'p2', iw)
+    os.system('stg series')
+    os.system('stg status')
+    os.system('echo 333 >> foo0')
+    refresh(s, iw)
+    os.system('stg series')
+    os.system('stg status')

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [StGit RFC] A more structured way of calling git
  2007-10-26 19:24 [StGit RFC] A more structured way of calling git Karl Hasselström
@ 2007-11-03 10:56 ` Catalin Marinas
  2007-11-03 14:28   ` Yann Dirson
  2007-11-04 18:34   ` Karl Hasselström
  2007-11-15 18:28 ` Catalin Marinas
  1 sibling, 2 replies; 8+ messages in thread
From: Catalin Marinas @ 2007-11-03 10:56 UTC (permalink / raw)
  To: Karl Hasselström; +Cc: David Kågedal, Git Mailing List, Yann Dirson

On 26/10/2007, Karl Hasselström <kha@treskal.com> wrote:
> I wanted to build an StGit command that coalesced adjacent patches to
> a single patch. Because the end result tree would still be the same,
> this should be doable without ever involving HEAD, the index, or the
> worktree.

Wouldn't HEAD need to be modified since the commit log changes
slightly, even though the tree is the same. Or am I misunderstanding
this?

> StGit's existing infrastructure for manipulating patches
> didn't lend itself to doing this kind of thing, though: it's not
> modular enough. So I started to design a replacement low-level
> interface to git, and things got slightly out of hand ... and I ended
> up with a much bigger refactoring than I'd planned.

Thanks for this. I'll need a bit of time to read it all and give
feedback. In general, I welcome this refactoring.

I'll go through the whole e-mail in the next days and get back to you.

-- 
Catalin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [StGit RFC] A more structured way of calling git
  2007-11-03 10:56 ` Catalin Marinas
@ 2007-11-03 14:28   ` Yann Dirson
  2007-11-04 18:40     ` Karl Hasselström
  2007-11-04 18:34   ` Karl Hasselström
  1 sibling, 1 reply; 8+ messages in thread
From: Yann Dirson @ 2007-11-03 14:28 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Karl Hasselström, David Kågedal, Git Mailing List

On Sat, Nov 03, 2007 at 10:56:36AM +0000, Catalin Marinas wrote:
> On 26/10/2007, Karl Hasselström <kha@treskal.com> wrote:
> > I wanted to build an StGit command that coalesced adjacent patches to
> > a single patch. Because the end result tree would still be the same,
> > this should be doable without ever involving HEAD, the index, or the
> > worktree.
> 
> Wouldn't HEAD need to be modified since the commit log changes
> slightly, even though the tree is the same. Or am I misunderstanding
> this?

This reminds me of someone suggesting that some patches could be
represented by more than one commit.  But I'm not sure such a beast
would be useful - I fear that would make StGIT much more complicated,
but would it really make things better ?

Best regards,
-- 
Yann

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [StGit RFC] A more structured way of calling git
  2007-11-03 10:56 ` Catalin Marinas
  2007-11-03 14:28   ` Yann Dirson
@ 2007-11-04 18:34   ` Karl Hasselström
  1 sibling, 0 replies; 8+ messages in thread
From: Karl Hasselström @ 2007-11-04 18:34 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: David Kågedal, Git Mailing List, Yann Dirson

On 2007-11-03 10:56:36 +0000, Catalin Marinas wrote:

> On 26/10/2007, Karl Hasselström <kha@treskal.com> wrote:
>
> > I wanted to build an StGit command that coalesced adjacent patches
> > to a single patch. Because the end result tree would still be the
> > same, this should be doable without ever involving HEAD, the
> > index, or the worktree.
>
> Wouldn't HEAD need to be modified since the commit log changes
> slightly, even though the tree is the same. Or am I misunderstanding
> this?

I'm refering to the HEAD tree. The HEAD commit will of course change.

> > StGit's existing infrastructure for manipulating patches didn't
> > lend itself to doing this kind of thing, though: it's not modular
> > enough. So I started to design a replacement low-level interface
> > to git, and things got slightly out of hand ... and I ended up
> > with a much bigger refactoring than I'd planned.
>
> Thanks for this. I'll need a bit of time to read it all and give
> feedback. In general, I welcome this refactoring.
>
> I'll go through the whole e-mail in the next days and get back to
> you.

Thanks, I appreciate it.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [StGit RFC] A more structured way of calling git
  2007-11-03 14:28   ` Yann Dirson
@ 2007-11-04 18:40     ` Karl Hasselström
  0 siblings, 0 replies; 8+ messages in thread
From: Karl Hasselström @ 2007-11-04 18:40 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Catalin Marinas, David Kågedal, Git Mailing List

On 2007-11-03 15:28:51 +0100, Yann Dirson wrote:

> This reminds me of someone suggesting that some patches could be
> represented by more than one commit.

You might be remebering me pointing out that the old infrastructure
supported (or at least not directly disallowed) this.

> But I'm not sure such a beast would be useful - I fear that would
> make StGIT much more complicated, but would it really make things
> better?

Yes, it makes everything much more complicated, and no, it doesn't buy
us anything new. After all, once we know the parent and the tree we
want a patch to have, we can just manufacture a commit that has that
tree and that parent.

My proposed new infrastructure cannot represent such patches, very
much by design.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [StGit RFC] A more structured way of calling git
  2007-10-26 19:24 [StGit RFC] A more structured way of calling git Karl Hasselström
  2007-11-03 10:56 ` Catalin Marinas
@ 2007-11-15 18:28 ` Catalin Marinas
  2007-11-16  7:42   ` Karl Hasselström
  1 sibling, 1 reply; 8+ messages in thread
From: Catalin Marinas @ 2007-11-15 18:28 UTC (permalink / raw)
  To: Karl Hasselström; +Cc: David Kågedal, Git Mailing List, Yann Dirson

On 26/10/2007, Karl Hasselström <kha@treskal.com> wrote:
> I wanted to build an StGit command that coalesced adjacent patches to
> a single patch. Because the end result tree would still be the same,
> this should be doable without ever involving HEAD, the index, or the
> worktree. StGit's existing infrastructure for manipulating patches
> didn't lend itself to doing this kind of thing, though: it's not
> modular enough. So I started to design a replacement low-level
> interface to git, and things got slightly out of hand ... and I ended
> up with a much bigger refactoring than I'd planned.

I eventually managed to have a (not so in-depth) look at this and I am
OK with it (but merging after 0.14). As you said the current structure
and the new one will work in parallel for some time. It would even be
possible to implement unit testing.

Some random comments below:

Commitdata maybe should be written as CommitData (as in the e-mail text).

The Repository.head property is not cached and it is probably OK to
avoid some problems I had. We could run profiling afterwards to see
how often it is called.

> Transactions (translib.py)
> ----------------------------------------------------------------------
>
> I started to implement a few StGit commands on top of gitlib.py and
> stacklib.py, and then realized something very appealing:
>
>   Just about every StGit command can be accomplished by first creating
>   a bunch of new commit objects with gitlib.py, then trying to check
>   out the new HEAD, and then rewriting refs with stacklib.py. Only the
>   first and second steps can possibly fail, and if they do, they do so
>   without leaving any user-visible junk behind. This can be used to
>   make all commands either succeed completely, or do nothing at all.

As you said, that's a bit difficult for 'push' as it is made of a
series of separate individual patch pushes. Can we not use a temporary
index by setting GIT_INDEX_FILE during the whole transaction and
freely update the working copy. Only if the transaction fails, we
clean the working copy and check out the default index? This might
slow down some operations though.

In the future, it would be nice to record the stack state before
transactions in a git object (via pickle) together with HEAD id and
provide unlimited undo.

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [StGit RFC] A more structured way of calling git
  2007-11-15 18:28 ` Catalin Marinas
@ 2007-11-16  7:42   ` Karl Hasselström
  2007-11-16  8:36     ` Karl Hasselström
  0 siblings, 1 reply; 8+ messages in thread
From: Karl Hasselström @ 2007-11-16  7:42 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: David Kågedal, Git Mailing List, Yann Dirson

On 2007-11-15 18:28:16 +0000, Catalin Marinas wrote:

> I eventually managed to have a (not so in-depth) look at this

Thanks!

> and I am OK with it (but merging after 0.14).

Yes, that seems like a good idea.

> As you said the current structure and the new one will work in
> parallel for some time.

Yes. This is a very nice property that makes the whole enterprise
feasible.

> It would even be possible to implement unit testing.

Yes. Unit testing would be easier with the new code, since it's more
modular.

> Commitdata maybe should be written as CommitData (as in the e-mail
> text).

Probably. I've alredy made that spelling error myself at least once
when writing code.

> The Repository.head property is not cached and it is probably OK to
> avoid some problems I had. We could run profiling afterwards to see
> how often it is called.

Yes. This applies to a few other things as well -- I try to be careful
to write things so that they could be cached transparently if it turns
out we need it. But I also try to not optimize prematurely.

In some cases the caching is more than just a speed optimization.
Commits and trees can be compared with regular Python object identity,
for example, because I never create duplicates.

> > I started to implement a few StGit commands on top of gitlib.py
> > and stacklib.py, and then realized something very appealing:
> >
> >   Just about every StGit command can be accomplished by first
> >   creating a bunch of new commit objects with gitlib.py, then
> >   trying to check out the new HEAD, and then rewriting refs with
> >   stacklib.py. Only the first and second steps can possibly fail,
> >   and if they do, they do so without leaving any user-visible junk
> >   behind. This can be used to make all commands either succeed
> >   completely, or do nothing at all.
>
> As you said, that's a bit difficult for 'push' as it is made of a
> series of separate individual patch pushes. Can we not use a
> temporary index by setting GIT_INDEX_FILE during the whole
> transaction and freely update the working copy. Only if the
> transaction fails, we clean the working copy and check out the
> default index? This might slow down some operations though.

The sample code I posted does exactly this index trick. So yes, I
think it's a splendid idea. :-)

Basically, what it does (or eventually should do) is repeatedly use
the temp index to create the new commits without ever touching the
default index or the worktree. Only if a merge conflicts would we
check out one of the parents and spill the conflicts to the worktree.
(And once we're finished pushing we'd want to check out the new stack
top, obviously.)

The only problem is that while git-read-tree can do a three-way merge
in a temp index without touching the worktree, git-merge-recursive
assumes that the worktree reflects one of the trees to be merged and
can be written to. Which is _totally OK_ if there really is going to
be a conflict. But merge-recursive does fancier conflict resolution
than read-tree (I think -- I still have to look up exactly what, but I
don't think read-tree does rename detection for example), so it might
succeed even if read-tree failed.

> In the future, it would be nice to record the stack state before
> transactions in a git object (via pickle) together with HEAD id and
> provide unlimited undo.

Yes, this was exactly what I was refering to when I talked about
transaction logging. The only difference is that I'd prefer a simple
text format instead of a pickle, but we can have a fight about that
detail later. :-)

The complete state that would have to be saved would be

  * (patchname, commit sha1) pairs for all patches

  * (branchname, branch head sha1) pair

  * the ordered lists of applied, unapplied, and eventually also
    hidden patches

This is somewhat redundant (applied/unapplied could be inferred from
the commit DAG, etc.) but it would allow us to represent relevant
illegal states as well, which is needed in order to be able to undo
repairs of inconsistent stacks.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [StGit RFC] A more structured way of calling git
  2007-11-16  7:42   ` Karl Hasselström
@ 2007-11-16  8:36     ` Karl Hasselström
  0 siblings, 0 replies; 8+ messages in thread
From: Karl Hasselström @ 2007-11-16  8:36 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: David Kågedal, Git Mailing List, Yann Dirson

On 2007-11-16 08:42:55 +0100, Karl Hasselström wrote:

> The complete state that would have to be saved would be
>
>   * (patchname, commit sha1) pairs for all patches
>
>   * (branchname, branch head sha1) pair
>
>   * the ordered lists of applied, unapplied, and eventually also
>     hidden patches

Oh, and

  * .git/config (one blob)

  * the worktree (one tree object?)

  * the index (one tree object per stage?)

Otherwise, we won't be able to un-clobber them. But a first
implementation could certainly skip this part.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-11-16  8:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-26 19:24 [StGit RFC] A more structured way of calling git Karl Hasselström
2007-11-03 10:56 ` Catalin Marinas
2007-11-03 14:28   ` Yann Dirson
2007-11-04 18:40     ` Karl Hasselström
2007-11-04 18:34   ` Karl Hasselström
2007-11-15 18:28 ` Catalin Marinas
2007-11-16  7:42   ` Karl Hasselström
2007-11-16  8:36     ` Karl Hasselström

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.