* Simplifying work across multiple projects (while tracking relationships among commit histories)
@ 2010-05-31  6:41 Yang Zhang
  2010-05-31  8:56 ` Jonathan Nieder
  0 siblings, 1 reply; 2+ messages in thread
From: Yang Zhang @ 2010-05-31  6:41 UTC
  To: git

After looking at some of the tools/techniques out there for working
with multiple git projects (submodules, subtree merge, braid, repo),
it seems that none are really well-suited for our use case. We're
developing a large system consisting of several components (libraries,
servers, applications, etc.). None of these components will ever exist
or be released as a stand-alone product. We're in "rapid development"
mode, so we're not even close to dealing with e.g. manually
maintaining information on versions/dependencies, and we just want
very tight integration among all the components -- yet the components
do deserve their own disentangled histories and (eventually)
independent branches/tags/versions/etc.

If we were using svn, all the code would live in a single repository,
and that would be all there was to think about this. However, it seems
that our use case (surprisingly) doesn't have a lot of good support in
the DVCS world.

For now, we'll probably just have some simple scripts that basically
do 'for i in $projects' loops for pulls, pushes, commits, etc.
However, this loses a lot of information that should be tracked about
the version/dependency information among the projects -- information
that at the same time we're not interested in manually tracking. We're
currently thinking of having a simple system that is initially set up
with a dependency graph among projects, e.g.:

  a: no dependencies
  b: depends on a

and whenever a commit is made to a project with dependencies (b), the
commit (perhaps in the commit message) contains a reference to the
particular versions of the project(s) it depends on (a) that were
checked out.

The tool could simplify the use of such a scheme, e.g.:

- automatically augmenting commit messages with this information
- on commits/pushes, first commit/push the project's dependencies
- checking out consistent versions of all the projects (or subgraphs thereof)
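
A minimal sketch of the first item, assuming all projects are checked
out as sibling directories under one workspace root. The function
names and the "Depends-on:" line format are made up for illustration;
git itself attaches no meaning to them.

```shell
#!/bin/sh
# Sketch only: the sibling-directory layout and the "Depends-on:" lines
# are assumptions for illustration, not an existing git convention.

# Print one "Depends-on: <project> <commit>" line per dependency.
# Usage: dep_trailers <workspace-root> <dependency>...
dep_trailers() {
	root=$1
	shift
	for dep in "$@"; do
		# Record whatever commit is currently checked out in the dependency.
		rev=$(git -C "$root/$dep" rev-parse HEAD) || return 1
		printf 'Depends-on: %s %s\n' "$dep" "$rev"
	done
}

# Commit the staged changes of a project, appending the dependency
# lines as a final paragraph of the commit message.
# Usage: commit_with_deps <workspace-root> <project> <message> <dependency>...
commit_with_deps() {
	root=$1
	project=$2
	message=$3
	shift 3
	trailers=$(dep_trailers "$root" "$@") || return 1
	git -C "$root/$project" commit -m "$message" -m "$trailers"
}
```

For the example graph, 'commit_with_deps ~/work b "Fix parser" a'
would record the commit of a that b was developed against. Keeping the
lines in the message makes them visible in 'git log', though a tracked
file might be easier for other tools to parse.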

Does this make sense to others? Are we overlooking a better/existing
approach? Would it be worth building this? Suggestions on design
improvements to such a tool over what was described (e.g. better
approach than augmenting commit messages)?
--
Yang Zhang
http://yz.mit.edu/

* Re: Simplifying work across multiple projects (while tracking relationships among commit histories)
  2010-05-31  6:41 Simplifying work across multiple projects (while tracking relationships among commit histories) Yang Zhang
@ 2010-05-31  8:56 ` Jonathan Nieder
  0 siblings, 0 replies; 2+ messages in thread
From: Jonathan Nieder @ 2010-05-31  8:56 UTC
  To: Yang Zhang; +Cc: git, Jens Lehmann, Johan Herland

Hi Yang,

Yang Zhang wrote:

> We're
> developing a large system consisting of several components (libraries,
> servers, applications, etc.).
[...]
> For now, we'll probably just have some simple scripts that basically
> do 'for i in $projects' loops for pulls, pushes, commits, etc.
> However, this loses a lot of information that should be tracked about
> the version/dependency information among the projects -- information
> that at the same time we're not interested in manually tracking. We're
> currently thinking of having a simple system that is initially set up
> with a dependency graph among projects, e.g.:
> 
>   a: no dependencies
>   b: depends on a
> 
> and whenever a commit is made to a project with dependencies (b), the
> commit (perhaps in the commit message) contains a reference to the
> particular versions of the project(s) it depends on (a) that were
> checked out.

It sounds to me like submodules would be a better approach.  Because
it fits my great love of complaining (and I would like to hear what
solutions you come up with, if any), let me try to go through the
problems you would run into.

I am not a submodule developer or heavy user, so please check anything
I say before relying on it.

1. Suppose you are working on the program frobber and you notice it
   contains a usable sub-component veryfastregexp.  So you make a new
   repository for it, make sure it builds on its own, and publish.
   As always when starting a new project, there is a question of how
   much early history to preserve.  Probably best to start with a
   single commit, and provide a separate branch with
   ‘filter-branch --subdirectory-filter’ output if you are feeling
   generous.

   In frobber, you remove the copy of veryfastregexp, add it back
   with ‘git submodule add git://someserver/path/to/veryfastregexp’,
   commit, and publish.

    - New clones must use ‘git clone --recursive’.  How do you
      advertise this?

    - Existing clones must use ‘git submodule update --init’ after
      they pull.  In fact, it seems to me it’s not a bad idea to
      always use ‘git submodule update --init --recursive’ after each
      pull.  How do you advertise this?

    - Incoming patches that touch both veryfastregexp and frobber
      have to be split into separate patches for the two projects.
      How?

    - Pull requests are even worse (or just as bad, depending on
      how you solved the previous problem).
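
   For existing clones, one partial answer is a small wrapper that
   people use instead of a plain pull. This is only a sketch: the
   name pull_with_submodules is hypothetical, and it still has to be
   advertised somehow, since neither shell functions nor hooks travel
   with a clone.

```shell
#!/bin/sh
# Pull the superproject, then bring every submodule (at any depth) to
# the commit the freshly pulled superproject records for it.
# Any arguments are passed through to "git pull".
pull_with_submodules() {
	git pull "$@" &&
	git submodule update --init --recursive
}
```

   Run from the toplevel of frobber, ‘pull_with_submodules origin
   master’ would also clone veryfastregexp the first time it shows up.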

2. Some people like to use the latest stable version of all components
   they use, while other people like to avoid change wherever
   possible.  I’ll consider the latter sort of person in a moment.

   The developers of frobber want to use the latest version on the
   master branch for all components.  So they try the following:

	git submodule foreach '
		git checkout master &&
		git pull &&
		git submodule update --init --recursive
	'

   This checks out a branch tracking the upstream master branch
   for each submodule.

   Next they run ‘git add -u’ to mention all the updated submodule
   versions, test to make sure everything’s okay, and commit.

    - This does not bring sub-submodules to the latest version at the
      same time.  If the frobber developers wanted to do that, they
      might try

	git submodule foreach --recursive '
		git checkout master &&
		git pull &&
		git submodule update --init --recursive
	'

      Then they run ‘git add -u’, test, and run ‘git commit’.  But
      the editor informs them that submodules have unstaged changes.
      What happened, what are the consequences for others using this
      project, and can this be avoided?

      I’ll return to this in a moment.
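
      One likely explanation, offered as a guess rather than a
      verified answer: the recursive loop visits a submodule before
      the submodules nested inside it, so the nested ones end up newer
      than what their parent records, and the parent's work tree is
      left modified. A quick way to see which submodules are affected
      is sketched below; the function name is hypothetical.

```shell
#!/bin/sh
# Print the directory of every submodule (at any depth) whose work tree
# differs from its index. "git diff --quiet" exits non-zero when there
# are such differences, including a nested submodule that has moved to
# a commit its parent does not record.
dirty_submodules() {
	git submodule foreach --recursive --quiet 'git diff --quiet || pwd'
}
```

      Each directory it prints would need its own ‘git add -u’ and
      commit before the toplevel commit describes a consistent tree.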

3. Some people never upgrade until forced to.

   The veryfastregexp library is a resounding success, picked up
   by other people in the company, and rapidly developed.  After a
   particularly painful upgrade, the developers of frobber take a turn
   for the conservative.  From now on, their policy is “necessary
   fixes only”.  So they would like to maintain their own branch and
   cherry-pick from master.

   They put in a request for privileges to commit to their own branch,
   and wait.  In the meantime, an important fix comes up.  So they
   do the only thing they can do: publish a fork of veryfastregexp,
   update .gitmodules to point to it, run ‘git add -u’ to register
   the version they are using, test, commit, and push.

   New clones made with ‘git clone --recursive’ will use the
   project-specific version of veryfastregexp.

    - Users with existing clones must update the URL with
      ‘git remote set-url origin <new url>’ from the submodule
      or ‘git submodule sync’; otherwise, the next time they run
      ‘git submodule update’ there will be an ‘unable to checkout’
      error.  How do you advertise this?

    - Suppose a frobber developer tries the following from the
      frobber repository:

	cd veryfastregexp && git cherry-pick important-fix

      runs ‘git add -u’ from the toplevel, tests, commits, and
      pushes the result.  Of course, an important step is missing: he
      forgot to push to the veryfastregexp-frobber repository!  
      Anyone who tries to pull this change and run ‘git submodule
      update’ will find the commit object missing and be unable to
      check out the new revision.

      An update hook could have prevented this, since from the
      server side it is obvious which objects a new clone will have
      access to.  Where can one find such a hook?
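
      Short of a server-side hook, a check run on the client before
      pushing can catch the common case. A rough sketch, assuming the
      remote-tracking branches in each submodule are up to date (run
      ‘git fetch’ there first); the function name is hypothetical.

```shell
#!/bin/sh
# Report submodules whose recorded commit is not contained in any
# remote-tracking branch of that submodule, i.e. probably not pushed.
# Inside "git submodule foreach", $name is the submodule name and $sha1
# is the commit recorded for it in the superproject.
unpushed_submodules() {
	git submodule foreach --quiet '
		git branch -r --contains "$sha1" 2>/dev/null | grep -q . ||
		echo "$name: $sha1 is not on any remote-tracking branch"
	'
}
```

      Run from the toplevel after ‘git add -u’ and before ‘git push’;
      no output means every recorded commit is on some remote-tracking
      branch. It does not check that the branch lives in the
      repository named in .gitmodules.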

   The frobber developers’ request for a branch in the veryfastregexp
   repository is granted.  So they switch .gitmodules back again
   and keep the submodule pointed to the for-frobber branch, updating
   as needed.

    - Now the old recipe

	git submodule foreach --recursive '
		git checkout master &&
		git pull &&
		git submodule update --init --recursive
	'

      does not work for them anymore, since this would switch
      veryfastregexp from the for-frobber branch back to master.
      What should they do to adjust?  How can they make it easy for
      new people on their team to get started, too?
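
      One possible adjustment is to record the branch each submodule
      should follow in .gitmodules, so that it is shared with everyone
      who clones, and have the recipe read it. This is a sketch under
      that assumption: the key submodule.<name>.branch is chosen here
      for illustration, and the function name is hypothetical.

```shell
#!/bin/sh
# For each submodule, check out the branch named by the (assumed)
# submodule.<name>.branch key in the superproject's .gitmodules,
# falling back to master when the key is unset, then pull.
# Inside "git submodule foreach", $toplevel is the superproject's
# top-level directory and $name is the submodule name.
update_tracked_branches() {
	git submodule foreach '
		branch=$(git config -f "$toplevel/.gitmodules" "submodule.$name.branch" || echo master)
		git checkout "$branch" &&
		git pull &&
		git submodule update --init --recursive
	'
}
```

      The frobber developers would set the key once with ‘git config
      -f .gitmodules submodule.veryfastregexp.branch for-frobber’ and
      commit .gitmodules; new team members then only need to run the
      function.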

4. The developers who want all components to be aggressively updated
   (see #2 above) need to do something similar.  They first switch all
   components to point to repositories with branches they own, and
   then run aggressively-update, where aggressively-update is a
   script something like

	#!/bin/sh
	git reset --keep upstream/master &&
	git submodule foreach aggressively-update &&
	git add -u &&
	make test &&
	git commit -v &&
	git push -f origin master

    - Can this be modified to pick up new submodules?

Thoughts welcome.
Jonathan
