* Notes on Using Git with Subprojects
@ 2006-09-26 17:40 A Large Angry SCM
  2006-09-26 20:25 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-26 17:40 UTC (permalink / raw)
  To: git

							20060926.1715

Notes on Using Git with Subprojects
===================================
Copyright (C) 2006 by Raymond S Brand


Git does not have native support for subprojects and this is a good
thing because managing subprojects is better handled by the project
build machinery. Managing subprojects with the project build machinery
is more flexible than the native support from an SCM and allows the use
of different SCMs by different subprojects. However, there is a lot of
interest in using Git for the SCM of a project with subprojects.

Git, unfortunately, does not make it easy. What is wanted is to put all
of the subprojects in one repository and be able to checkout the various
parts from a local copy of the repository. The problem is, with Git, a
repository can have at most one working directory associated with it at
a time. This is because Git stores a lot of information about the
contents of the working directory in the repository. In fact, the usual
situation is that the repository, itself, is in the working directory.

This note describes a method to use Git with subprojects; other methods
are also possible.


Definitions
-----------
Parent Project:
	A project that logically contains one or more subprojects.

Project:
	A set of files that is (relatively) self contained with respect
	to changes and treated as a unit by the SCM.

Root Project:
	A project that is not a subproject.

Subproject:
	A project logically contained in another project.


Setup
-----
All subprojects are contained in a single repository and referred to by
separate branches in $GIT_DIR/refs/heads/. Developers create a local
copy of the project repository, hereafter referred to as the "local
master repository" or just "local master". All fetches, pulls and pushes
that a developer exchanges with the "local master" of another developer
should go to and from the developer's own "local master".


Root Project Checkout
---------------------
The root project is the project that all of the subprojects are a part
of. It is a parent project of one or more subprojects; each of which can
also be parent projects to other subprojects.

To check out the root project, choose a name for the project working
directory (the working directory must not already exist) and perform the
equivalent of the following commands.

	git-clone -s -n $LOCAL_MASTER $ROOT_DIR \
		&& cd $ROOT_DIR \
		&& git-checkout -b $ROOT_BRANCH--local $ROOT_BRANCH

Where:
	$LOCAL_MASTER is the path to the "local master"
	$ROOT_DIR is the name of the working directory to use
	$ROOT_BRANCH is the branch name of the root project
	$ROOT_BRANCH--local is a branch for local changes

This will leave the current working directory of the shell in the
project working directory.

This also creates a branch with the suffix of "--local" to hold all of
the local working directory commits and modifications. The $ROOT_BRANCH
is used as a tracking branch so that upstream changes can be fetched
into the working repository without affecting the checked out files.

Once the root project is checked out, the subprojects are checked out.


Subproject Checkout
-------------------
Each project that is a parent project needs to checkout all of the
subprojects of the project. Each subproject is checked out with the
equivalent of the following bash commands:

	git-clone -s -n $LOCAL_MASTER $SUBPROJECT_DIR \
		&& ( cd $SUBPROJECT_DIR \
			&& git-checkout -b $SUBPROJECT_BRANCH--local \
				$SUBPROJECT_BRANCH
		)

Where:
	$LOCAL_MASTER is the path to the "local master"
	$SUBPROJECT_DIR is the directory name of the subproject
	$SUBPROJECT_BRANCH is the branch name of the subproject
	$SUBPROJECT_BRANCH--local is a branch for local changes
	
If a subproject has subprojects, then the checkouts need to be done
recursively. With suitable project/subproject/branch naming conventions
this can easily be automated.


Project Development
-------------------
Changes to a project are performed in the working directory of the
project and are recorded in the repository in the working directory on
the $PROJECT_BRANCH--local branch.


Receiving Project Upstream Changes
----------------------------------
Upstream project changes are first fetched into the project tracking
branch of the local master repository and are then fetched into the
project tracking branch of the working directory repositories. To merge
upstream changes into the working directory, a pull from the project
tracking branch of the working directory repository is executed.

	# Fetch project branch from upstream to local master
	(cd $LOCAL_MASTER && git-fetch $UPSTREAM $PROJECT_BRANCH)

	# Fetch project branch from local master to working repo
	git-fetch $LOCAL_MASTER $PROJECT_BRANCH

	# Merge upstream changes in to working directory
	git-pull --no-commit . $PROJECT_BRANCH

Where:
	$LOCAL_MASTER is the path to the "local master"
	$UPSTREAM is the Git URL of the upstream repository
	$PROJECT_BRANCH is the branch name of the (sub)project
	

Sending Project Changes Upstream
--------------------------------
To send project changes upstream from a working directory repository,
the changes are first pushed to a branch in the local master repository,
$PROJECT_BRANCH--$IDENT. The changes can then be pushed upstream from, or
pulled by others from, the local master repository.

	# Push project changes to local master
	git-push $LOCAL_MASTER \
		$PROJECT_BRANCH--local:$PROJECT_BRANCH--$IDENT

	# Push project changes from local master to upstream
	(cd $LOCAL_MASTER && git-push $UPSTREAM \
		$PROJECT_BRANCH--$IDENT:$PROJECT_BRANCH--$NICK--$IDENT)

Where:
	$LOCAL_MASTER is the path to the "local master"
	$UPSTREAM is the Git URL of the upstream repository
	$PROJECT_BRANCH is the branch name of the (sub)project
	$PROJECT_BRANCH--local is a branch for local changes
	$IDENT is a label unique for this set of working directories
	$NICK is a (branch name safe) identifier of the developer
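
The branch naming convention used in the two pushes can be captured in a
pair of tiny helpers. This is a sketch only: the function names are
invented here for illustration, and the strings they print are the
source:destination arguments shown in the commands above.

```shell
#!/bin/sh
# Hypothetical helpers that build the refspecs used above.

# Working repository -> local master:
#   $PROJECT_BRANCH--local:$PROJECT_BRANCH--$IDENT
refspec_to_local_master() {
	project_branch=$1
	ident=$2
	printf '%s\n' "$project_branch--local:$project_branch--$ident"
}

# Local master -> upstream:
#   $PROJECT_BRANCH--$IDENT:$PROJECT_BRANCH--$NICK--$IDENT
refspec_to_upstream() {
	project_branch=$1
	nick=$2
	ident=$3
	printf '%s\n' "$project_branch--$ident:$project_branch--$nick--$ident"
}
```

The output of each helper would be passed as the last argument of the
corresponding git-push command.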
	

Automation
----------
Adding the following example code to the makefiles of the root project
and subprojects with the appropriate information can automate most of
the operations needed to support subprojects with Git. The ProjectSetup
target needs the REPOSITORY and IDENT make variables to be set but the
other targets will use the values saved by the ProjectSetup target.

 ---->8----  ---->8----  ---->8----  ---->8----  ---->8----  ---->8----
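# Add this to the makefiles
# PROJECT and each entry of SUBPROJECTS are relative to $GIT_DIR/refs/heads/.
# The comments are kept on their own lines because make would otherwise
# include the spaces before a trailing "#" in the variable value.
PROJECT := branch/name
SUBPROJECTS := a/b c/d
include Git_machinery.make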

# Rest of the Makefile
 ---->8----  ---->8----  ---->8----  ---->8----  ---->8----  ---->8----
# Git_machinery.make

include GIT-SUBPROJECT-VARS.make

REFS := /refs/heads/$(REFPREFIX)

GIT-SUBPROJECT-VARS.make:
	rm -rf GIT-SUBPROJECT-VARS.make
	echo "REPOSITORY := $(REPOSITORY)" > GIT-SUBPROJECT-VARS.make
	echo "REFPREFIX := $(REFPREFIX)" >> GIT-SUBPROJECT-VARS.make
	echo "IDENT := $(IDENT)" >> GIT-SUBPROJECT-VARS.make

ProjectSetup:: GIT-SUBPROJECT-VARS.make
	for SUBPROJECT in $(SUBPROJECTS) ; \
	do \
	    git-clone -s -n $(REPOSITORY) $$(basename $$SUBPROJECT) \
	    && ( cd $$(basename $$SUBPROJECT) \
		&& git-checkout -b \
		    $(REFS)$$SUBPROJECT--local $(REFS)$$SUBPROJECT \
		&& $(MAKE) ProjectSetup \
	    ) ; \
	done

FetchMaster::
	git-fetch $(REPOSITORY) $(REFS)$(PROJECT)
	for SUBPROJECT in $(SUBPROJECTS) ; \
	do \
	    $(MAKE) -C $$(basename $$SUBPROJECT) FetchMaster ; \
	done

PullMaster::
	git-fetch $(REPOSITORY) $(REFS)$(PROJECT)
	git-pull --no-commit .  $(REFS)$(PROJECT)
	for SUBPROJECT in $(SUBPROJECTS) ; \
	do \
	    $(MAKE) -C $$(basename $$SUBPROJECT) PullMaster ; \
	done

PushMaster::
	git-push $(REPOSITORY) \
	    $(REFS)$(PROJECT)--local:$(REFS)$(PROJECT)--$(IDENT)
	for SUBPROJECT in $(SUBPROJECTS) ; \
	do \
	    $(MAKE) -C $$(basename $$SUBPROJECT) PushMaster ; \
	done

 ---->8----  ---->8----  ---->8----  ---->8----  ---->8----  ---->8----

The example Makefile code has some limitations: There is no error
checking. It does not handle subprojects rooted in a non top level
directory of the parent project. There are other Git commands that can
be usefully applied recursively to all subprojects.
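
One way to address the first and last of these limitations is a small
helper that runs an arbitrary command in every subproject directory and
stops at the first failure. The following is only a sketch: the function
name for_each_subproject and the convention of passing the subproject
directories as arguments are assumptions for illustration, not part of
the makefile code above.

```shell
#!/bin/sh
# Hypothetical helper: run a command in each subproject directory and
# stop at the first failure instead of silently continuing.
# Usage: for_each_subproject "<command>" dir1 dir2 ...
for_each_subproject() {
	cmd=$1
	shift
	for dir in "$@"
	do
		if [ ! -d "$dir" ]
		then
			echo "missing subproject directory: $dir" >&2
			return 1
		fi
		# Run in a subshell so the caller's working directory is unchanged.
		( cd "$dir" && sh -c "$cmd" ) || {
			echo "command failed in $dir: $cmd" >&2
			return 1
		}
	done
}
```

For example, for_each_subproject 'make PushMaster' b d would run the
PushMaster target in the directories b and d and report the first
directory in which it fails.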


* Re: Notes on Using Git with Subprojects
  2006-09-26 17:40 Notes on Using Git with Subprojects A Large Angry SCM
@ 2006-09-26 20:25 ` Johannes Schindelin
  2006-09-26 22:01   ` A Large Angry SCM
  2006-09-26 21:23 ` Daniel Barkalow
  2006-10-01  5:19 ` A Large Angry SCM
  2 siblings, 1 reply; 39+ messages in thread
From: Johannes Schindelin @ 2006-09-26 20:25 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: git

Hi,

On Tue, 26 Sep 2006, A Large Angry SCM wrote:

> 							20060926.1715

You forgot the time zone ;-)

> Notes on Using Git with Subprojects
> ===================================
> Copyright (C) 2006 by Raymond S Brand
> 
> 
> Git does not have native support for subprojects and this is a good
> thing because managing subprojects is better handled by the project
> build machinery. Managing subprojects with the project build machinery
> is more flexible than the native support from an SCM and allows the use
> of different SCMs by different subprojects. However, there is a lot of
> interest in using Git for the SCM of a project with subprojects.
> 
> Git, unfortunately, does not make it easy.

After skimming your text, I imagine that it should be possible (read: 
easy) to write a really simple script which does what you describe, 
storing the relevant information about root or sub projects in the config.

However, you left out one of the most important aspects of subprojects: 
the ability to manage the state of the root project: you can add, update 
and remove subprojects.

A while ago, Junio started playing with a new object type for subprojects 
so that you could have tree objects containing subprojects in addition 
to tree objects and blobs.

Of course, the difficult thing about this is to teach all tools to behave 
sensibly with the new object type.

Now, your approach of having multiple clones (sharing the object pool) is 
simpler than Junio's approach: no need to introduce a new object type, 
or adapt existing tools.

Taking this a step further, how about managing the root project in this 
manner:

A root project is a branch containing just one special file, 
"root-project". This file consists of lines like these:

-- snip --
f80a17bf3da1e24ac904f9078f68c3bf935ff250 next
03adf42c988195b50e1a1935ba5fcbc39b2b029b todo
-- snap --

The meaning: subdirectory "next" contains subproject "next" which is also 
tracked in the branch "next" of the root project. Likewise for "todo". The 
root project could even contain some administrative files like a Makefile, 
a license, README, etc.

You could even handle the update of root-project with each commit in a 
subproject by a hook in that subproject's .git/hooks/post-commit, so that 
you'd only need a script "git-checkout-root-project.sh" to initialize them 
all, and probably a script "git-update-root-project.sh".
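
The "root-project" file described above could be consumed by a script
along the following lines. This is only a sketch: the function name
list_root_project is made up, the file format is assumed to be exactly
"<40-hex SHA1> <name>" per line as in the example, and the function
merely prints what a checkout script would act on rather than running
any Git commands.

```shell
#!/bin/sh
# Hypothetical reader for the "root-project" file sketched above.
# Reads lines of the form "<sha1> <name>" on standard input, prints
# "<name> <sha1>" for each entry, and fails on any malformed line.
list_root_project() {
	while read -r sha name rest
	do
		# Skip empty lines.
		[ -z "$sha" ] && continue
		# Require a name, and nothing after it.
		if [ -z "$name" ] || [ -n "$rest" ]
		then
			echo "malformed line: $sha $name $rest" >&2
			return 1
		fi
		# Reject anything that is not made of lowercase hex digits.
		case $sha in
		*[!0-9a-f]*)
			echo "bad object name: $sha" >&2
			return 1
			;;
		esac
		# A SHA1 object name is 40 hex digits long.
		if [ "${#sha}" -ne 40 ]
		then
			echo "bad object name: $sha" >&2
			return 1
		fi
		printf '%s %s\n' "$name" "$sha"
	done
}
```

It would be invoked as list_root_project < root-project, with each
output line naming a subdirectory and the commit to check out there.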

Thoughts?

Ciao,
Dscho

P.S.: Is it just me, or do other people also find it confusing that 
--shared means different things for git-init-db and git-clone? (I know I 
am the sinner...)


* Re: Notes on Using Git with Subprojects
  2006-09-26 17:40 Notes on Using Git with Subprojects A Large Angry SCM
  2006-09-26 20:25 ` Johannes Schindelin
@ 2006-09-26 21:23 ` Daniel Barkalow
  2006-09-26 21:30   ` Shawn Pearce
  2006-09-26 22:07   ` A Large Angry SCM
  2006-10-01  5:19 ` A Large Angry SCM
  2 siblings, 2 replies; 39+ messages in thread
From: Daniel Barkalow @ 2006-09-26 21:23 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: git

On Tue, 26 Sep 2006, A Large Angry SCM wrote:

> Git, unfortunately, does not make it easy. What is wanted is to put all
> of the subprojects in one repository and be able to checkout the various
> parts from a local copy of the repository. The problem is, with Git, a
> repository can have at most one working directory associated with it at
> a time. This is because Git stores a lot of information about the
> contents of the working directory in the repository. In fact, the usual
> situation is that the repository, itself, is in the working directory.

There are a bunch of use cases which people see as subprojects, with 
slightly different desires. For example, I personally don't think there's 
any point to subprojects if a commit of the parent project doesn't specify 
the embedded commits of each subproject (so, for example, you can use 
bisect on the parent project to figure out which act of updating a 
subproject broke the resulting system). AFAICT, your design doesn't handle 
that, but uses the most recently fetched versions of all subprojects, with 
the revision control of the parent only handling revisions in the 
arrangement and membership of subprojects in the parent.

	-Daniel
*This .sig left intentionally blank*


* Re: Notes on Using Git with Subprojects
  2006-09-26 21:23 ` Daniel Barkalow
@ 2006-09-26 21:30   ` Shawn Pearce
  2006-09-26 22:33     ` A Large Angry SCM
  2006-09-26 22:07   ` A Large Angry SCM
  1 sibling, 1 reply; 39+ messages in thread
From: Shawn Pearce @ 2006-09-26 21:30 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: A Large Angry SCM, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Tue, 26 Sep 2006, A Large Angry SCM wrote:
> 
> > Git, unfortunately, does not make it easy. What is wanted is to put all
> > of the subprojects in one repository and be able to checkout the various
> > parts from a local copy of the repository. The problem is, with Git, a
> > repository can have at most one working directory associated with it at
> > a time. This is because Git stores a lot of information about the
> > contents of the working directory in the repository. In fact, the usual
> > situation is that the repository, itself, is in the working directory.
> 
> There are a bunch of use cases which people see as subprojects, with 
> slightly different desires. For example, I personally don't think there's 
> any point to subprojects if a commit of the parent project doesn't specify 
> the embedded commits of each subproject (so, for example, you can use 
> bisect on the parent project to figure out which act of updating a 
> subproject broke the resulting system). AFAICT, your design doesn't handle 
> that, but uses the most recently fetched versions of all subprojects, with 
> the revision control of the parent only handling revisions in the 
> arrangement and membership of subprojects in the parent.

I agree entirely.

I have about 30 "subprojects" tacked into one large Git repository
for this exact reason.  In at least 5 of these cases they shouldn't
be sharing a Git repository as by all rights they are different
projects.

What I'm doing is sort of like tacking both the Linux kernel and
glibc into the same Git repository because you might need to change
and bisect over updates to the system call layer.  Insane, yes.
Probably shouldn't be done; but right now that interface layer
between several subprojects is still in flux and it makes it rather
easy to keep everything in sync.

It's annoying to perform commits to the "root project" every time the
subproject changes.  And it brings some complexity when you want to
talk about merging that root project.  But if it's automated as part
of "git commit" and "git merge" (either directly in those tools or
by hooks users can install) then it's probably a non-issue.

-- 
Shawn.


* Re: Notes on Using Git with Subprojects
  2006-09-26 20:25 ` Johannes Schindelin
@ 2006-09-26 22:01   ` A Large Angry SCM
  2006-09-26 22:13     ` Johannes Schindelin
  0 siblings, 1 reply; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-26 22:01 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 26 Sep 2006, A Large Angry SCM wrote:
> 
>> 							20060926.1715
> 
> You forgot the time zone ;-)

UTC of course. ;-)

[...]
> 
> After skimming your text, I imagine that it should be possible (read: 
> easy) to write a really simple script which does what you describe, 
> storing the relevant information about root or sub projects in the config.
> 
> However, you left out one of the most important aspects of subprojects: 
> the ability to manage the state of the root project: you can add, update 
> and remove subprojects.

Since each project can have subprojects, the root project is special 
_only_ in that the initial checkout is _not_ handled by the build 
machinery (make in my example). This means that adding and removing 
subprojects should be no different for the root project. A subproject 
should be able to stand on its own (with its subprojects); otherwise, 
it's not a project but instead a directory tree versioned separately.

> A while ago, Junio started playing with a new object type for subprojects 
> so that you could have tree objects containing subprojects in addition 
> to tree objects and blobs.

Which I thought was awful. Subprojects are really much better managed by 
the build machinery; it's more flexible and doesn't require all the 
separate projects to use the same VCS.

> Of course, the difficult thing about this is to teach all tools to behave 
> sensibly with the new object type.

And teach all the tools to gracefully handle subprojects using a 
different VCS ...

> Now, your approach of having multiple clones (sharing the object pool) is 
> simpler than Junio's approach: no need to introduce a new object type, 
> or adapt existing tools.
> 
> Taking this a step further, how about managing the root project in this 
> manner:
> 
> A root project is a branch containing just one special file, 
> "root-project". This file consists of lines like these:
> 
> -- snip --
> f80a17bf3da1e24ac904f9078f68c3bf935ff250 next
> 03adf42c988195b50e1a1935ba5fcbc39b2b029b todo
> -- snap --
> 
> The meaning: subdirectory "next" contains subproject "next" which is also 
> tracked in the branch "next" of the root project. Likewise for "todo". The 
> root project could even contain some administrative files like a Makefile, 
> a license, README, etc.

How the state (subproject list, branch names, etc.) is recorded in a 
parent project is only important to the parent project. The parent 
project must also know how to interact with each of its subprojects.

For instance, if you were building some kind of internet appliance, it 
could have a very large number of subprojects (kernel, various servers 
and daemons, etc) with no common build tool or target set used by all of 
them. What each parent project does in this situation is to "interface" 
to the build machinery of each subproject so that when you command "make 
appliance image" at the top level project, all the subprojects _and_ the 
top level project do what is needed to create the appliance image. The 
appliance can depend on projects using many VCSes: Git for the kernel, 
SVN for the web server, CVS for the SMTP daemon, Monotone for some other 
part, etc.

> You could even handle the update of root-project with each commit in a 
> subproject by a hook in that subproject's .git/hooks/post-commit, so that 
> you'd only need a script "git-checkout-root-project.sh" to initialize them 
> all, and probably a script "git-update-root-project.sh".

For _full_ subproject support, handling a tree of projects is required. 
So treating the root project differently than any other parent project 
(except for the initial checkout of the root project) means that you 
can't work on a subproject independently of its parent projects.


* Re: Notes on Using Git with Subprojects
  2006-09-26 21:23 ` Daniel Barkalow
  2006-09-26 21:30   ` Shawn Pearce
@ 2006-09-26 22:07   ` A Large Angry SCM
  1 sibling, 0 replies; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-26 22:07 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow wrote:
> There are a bunch of use cases which people see as subprojects, with 
> slightly different desires. For example, I personally don't think there's 
> any point to subprojects if a commit of the parent project doesn't specify 
> the embedded commits of each subproject (so, for example, you can use 
> bisect on the parent project to figure out which act of updating a 
> subproject broke the resulting system). AFAICT, your design doesn't handle 
> that, but uses the most recently fetched versions of all subprojects, with 
> the revision control of the parent only handling revisions in the 
> arrangement and membership of subprojects in the parent.

That isn't that much different than what I outlined. Instead of 
recording the branch name and directory in the parent project, you could 
record the commit SHA1 ID for each subproject and directory. The 
machinery changes but the idea is the same.
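
Recording commit IDs instead of branch names might look like the
following sketch. The function name record_subproject_versions and the
output format ("<sha1> <directory>" per line) are assumptions for
illustration; git rev-parse HEAD is used to read the commit currently
checked out in each subproject's working directory.

```shell
#!/bin/sh
# Hypothetical helper: print "<commit-id> <directory>" for each subproject
# directory given as an argument, so that the parent project can version
# the resulting list as an ordinary file.
record_subproject_versions() {
	for dir in "$@"
	do
		# Ask the subproject's own repository which commit is checked out.
		sha=$( cd "$dir" && git rev-parse HEAD ) || {
			echo "cannot determine version of $dir" >&2
			return 1
		}
		printf '%s %s\n' "$sha" "$dir"
	done
}
```

Committing the output in the parent project would tie each parent commit
to specific subproject commits, which is what bisecting across
subproject updates requires.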


* Re: Notes on Using Git with Subprojects
  2006-09-26 22:01   ` A Large Angry SCM
@ 2006-09-26 22:13     ` Johannes Schindelin
  2006-09-26 22:45       ` A Large Angry SCM
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Schindelin @ 2006-09-26 22:13 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: git

Hi,

On Tue, 26 Sep 2006, A Large Angry SCM wrote:

> How the state (subproject list, branch names, etc.) is recorded in a 
> parent project is only important to the parent project. The parent 
> project must also know how to interact with each of its 
> subprojects.

Granted, if you mix VCSes, this is most pragmatic.

But it is also wrong: The whole point in bundling the subprojects together 
is (IMHO) to get the benefits of a VCS for the root project, i.e. for the 
combined states of the subprojects. After all, you want to say "I know 
that this collection of projects at these states compiled and worked 
fine."

And if you let a build system handle the stitching of the subprojects, you 
completely lose these benefits.

Ciao,
Dscho


* Re: Notes on Using Git with Subprojects
  2006-09-26 21:30   ` Shawn Pearce
@ 2006-09-26 22:33     ` A Large Angry SCM
  2006-09-27  8:06       ` Martin Waitz
  0 siblings, 1 reply; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-26 22:33 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Daniel Barkalow, git

Shawn Pearce wrote:
> Daniel Barkalow <barkalow@iabervon.org> wrote:
[...]
> I agree entirely.
> 
> I have about 30 "subprojects" tacked into one large Git repository
> for this exact reason.  In at least 5 of these cases they shouldn't
> be sharing a Git repository as by all rights they are different
> projects.
> 
> What I'm doing is sort of like tacking both the Linux kernel and
> glibc into the same Git repository because you might need to change
> and bisect over updates to the system call layer.  Insane, yes.
> Probably shouldn't be done; but right now that interface layer
> between several subprojects is still in flux and it makes it rather
> easy to keep everything in sync.
> 
> It's annoying to perform commits to the "root project" every time the
> subproject changes.  And it brings some complexity when you want to
> talk about merging that root project.  But if it's automated as part
> of "git commit" and "git merge" (either directly in those tools or
> by hooks users can install) then it's probably a non-issue.

So, for each subproject of a parent project, you want to record branch, 
version (commit ID), and directory location. Not quite as easy to do in 
a makefile but do-able.

An operation that _needs_ to change more than one project's versioned 
state at a time should be rare. If you have to do it often, then instead of 
subprojects you probably have a partitioning of one project. A 
subproject should be independent of its parent projects. A merge of a 
parent project should not affect a subproject other than to pick a 
particular subproject version.

Your example of the kernel and glibc is an example of sibling projects. 
Each one is independent and (some) versions of each project should work 
(better or worse) with the other. The root project here shouldn't really 
do more than specify which version of the kernel and glibc to use.


* Re: Notes on Using Git with Subprojects
  2006-09-26 22:13     ` Johannes Schindelin
@ 2006-09-26 22:45       ` A Large Angry SCM
  0 siblings, 0 replies; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-26 22:45 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 26 Sep 2006, A Large Angry SCM wrote:
> 
>> How the state (subproject list, branch names, etc.) is recorded in a 
>> parent project is only important to the parent project. The parent 
>> project must also know how to interact with each of its 
>> subprojects.
> 
> Granted, if you mix VCSes, this is most pragmatic.
> 
> But it is also wrong: The whole point in bundling the subprojects together 
> is (IMHO) to get the benefits of a VCS for the root project, i.e. for the 
> combined states of the subprojects. After all, you want to say "I know 
> that this collection of projects at these states compiled and worked 
> fine."
> 
> And if you let a build system handle the stitching of the subprojects, you 
> completely lose these benefits.

Bundling and subproject support are two different things. Bundling is 
for convenience. Subprojects are usually the result of a dependency on a 
project managed or controlled by some other entity or on some part of 
the larger project with radically different development requirements.

Recording which version of a subproject to use is important and my note 
failed to discuss it. That I'll remedy over the next several days.


* Re: Notes on Using Git with Subprojects
  2006-09-26 22:33     ` A Large Angry SCM
@ 2006-09-27  8:06       ` Martin Waitz
  2006-09-27  9:55         ` Johannes Schindelin
  2006-09-27 16:58         ` A Large Angry SCM
  0 siblings, 2 replies; 39+ messages in thread
From: Martin Waitz @ 2006-09-27  8:06 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Shawn Pearce, Daniel Barkalow, git


hoi :)

On Tue, Sep 26, 2006 at 03:33:49PM -0700, A Large Angry SCM wrote:
> So, for each subproject of a parent project, you want to record branch, 
> version (commit ID), and directory location. Not quite as easy to do in 
> a makefile but do-able.

I've been playing with this kind of subprojects a little bit.

My current approach is like this:

 * create a .gitmodules file which lists all the directories
   which contain a submodule.
 * the .git/refs/heads directory of the submodule gets stored in
   .gitmodule/<modulename> inside the parent project
 * both things above should be tracked in the parent project.
   This way you always store the current state of each submodule
   in each commit of the parent project.  And you don't have to
   create a new parent commit for each change.  You can commit
   to the parent project when you think that all your modules are
   in a good state.
 * When checking out a project, all submodules listed in .gitmodules
   get checked out, too.
 * If there is a merge conflict in the module list or its refs/heads,
   this is handled specially, e.g. by triggering a new merge inside
   the submodule.
 * The object directory is shared between the parent and all modules.
   To make fsck-objects happy, the parent gets a refs/module link
   pointing to .gitmodule/ and all submodules get a refs/parent
   link pointing to the refs directory of the parent.
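
The second and third points of the list above could be prototyped
roughly as follows. This is a sketch under assumptions, not the
git-init-module.sh script linked below: the function name
snapshot_module_heads is invented, .gitmodules is assumed to hold one
directory name per line, refs are assumed to be loose files under
.git/refs/heads, and the heads are copied rather than linked.

```shell
#!/bin/sh
# Hypothetical prototype: copy each submodule's branch heads into
# .gitmodule/<modulename> in the parent so they can be committed there.
# Run from the top of the parent project's working directory.
snapshot_module_heads() {
	while read -r module
	do
		# Skip blank lines in the module list.
		[ -z "$module" ] && continue
		heads=$module/.git/refs/heads
		if [ ! -d "$heads" ]
		then
			echo "no branch heads found for $module" >&2
			return 1
		fi
		# Replace any previous snapshot of this module.
		rm -rf ".gitmodule/$module"
		mkdir -p ".gitmodule/$module" || return 1
		# The trailing "/." copies the directory's contents, not the directory.
		cp -R "$heads/." ".gitmodule/$module/" || return 1
	done < .gitmodules
}
```

After running it, committing .gitmodules and .gitmodule/ in the parent
would record the state of every listed submodule in that parent commit.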

The concept is similar to the gitlink objects which have been floating
around, but it is easier to prototype as no new git object type has to
be created.  If it works well we can later move the information stored
in .gitmodule* into an object type of its own.

By storing the complete refs/heads directory for each submodule instead
of only one head, it is possible to track multiple branches of a
subproject.  I don't know yet how this works out in practice, but I
think that it can be nice to be able to atomically commit to several
branches of one submodule (perhaps one branch per customer, per
hardware platform, whatever).

So, what have I done up to now? Not much. I created a little
script to set up a submodule as described above:
http://git.admingilde.org/?p=tali/git.git;a=blob;f=git-init-module.sh;h=0108873fd3aa8a42035039b19e8555513c075fca;hb=module

Next steps would be to modify clone and checkout to actually be able
to work in such a setup.  If this works then merging of subprojects
has to be done (the most complex part I guess).

-- 
Martin Waitz



* Re: Notes on Using Git with Subprojects
  2006-09-27  8:06       ` Martin Waitz
@ 2006-09-27  9:55         ` Johannes Schindelin
  2006-09-27 11:38           ` Martin Waitz
  2006-09-27 17:13           ` A Large Angry SCM
  2006-09-27 16:58         ` A Large Angry SCM
  1 sibling, 2 replies; 39+ messages in thread
From: Johannes Schindelin @ 2006-09-27  9:55 UTC (permalink / raw)
  To: Martin Waitz; +Cc: A Large Angry SCM, Shawn Pearce, Daniel Barkalow, git

Hi,

On Wed, 27 Sep 2006, Martin Waitz wrote:

> On Tue, Sep 26, 2006 at 03:33:49PM -0700, A Large Angry SCM wrote:
> > So, for each subproject of a parent project, you want to record branch, 
> > version (commit ID), and directory location. Not quite as easy to do in 
> > a makefile but do-able.
> 
> I've been playing with this kind of subprojects a little bit.
> 
> My current approach is like this:
> 
>  * create a .gitmodules file which lists all the directories
>    which contain a submodule.
>  * the .git/refs/heads directory of the submodule gets stored in
>    .gitmodule/<modulename> inside the parent project

Taking this a step further, you could make subproject/.git/refs/heads a 
symbolic link to .git/refs/heads/subproject, with the benefit that fsck 
Just Works.

Nevertheless, you have to take care of the fact that you need to commit 
the state of the root project just after committing to any subproject.

Ciao,
Dscho


* Re: Notes on Using Git with Subprojects
  2006-09-27  9:55         ` Johannes Schindelin
@ 2006-09-27 11:38           ` Martin Waitz
  2006-09-27 12:01             ` Johannes Schindelin
  2006-09-27 17:13           ` A Large Angry SCM
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Waitz @ 2006-09-27 11:38 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: A Large Angry SCM, Shawn Pearce, Daniel Barkalow, git


hoi :)

On Wed, Sep 27, 2006 at 11:55:22AM +0200, Johannes Schindelin wrote:
> > My current approach is like this:
> > 
> >  * create a .gitmodules file which lists all the directories
> >    which contain a submodule.
> >  * the .git/refs/heads directory of the submodule gets stored in
> >    .gitmodule/<modulename> inside the parent project
> 
> Taking this a step further, you could make subproject/.git/refs/heads a 
> symbolic link to .git/refs/heads/subproject, with the benefit that fsck 
> Just Works.

in fact it is done this way (more or less).

> Nevertheless, you have to take care of the fact that you need to commit 
> the state of the root project just after committing to any subproject.

why?

You can accumulate as many changes in different subprojects until you
get to a state that is worth committing in the parent project.
All these changes are then seen as one atomic change to the whole
project.
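
A toy sketch of that workflow, assuming the parent simply records each
subproject's head whenever the parent itself is committed. The `Parent`
class, the subproject names and the commit IDs are made up for
illustration; none of this is git code.

```python
# Toy model: subprojects advance independently; the parent records
# their heads only when the parent itself is committed.

class Parent:
    def __init__(self):
        self.sub_heads = {}   # current head of each subproject
        self.history = []     # parent commits: snapshots of sub_heads

    def commit_sub(self, name, commit_id):
        # A subproject commit only moves that subproject's head.
        self.sub_heads[name] = commit_id

    def commit_parent(self):
        # One parent commit captures all accumulated subproject changes.
        self.history.append(dict(self.sub_heads))


p = Parent()
p.commit_sub("libfoo", "a1")
p.commit_sub("libfoo", "a2")
p.commit_sub("libbar", "b1")
p.commit_parent()
print(p.history)  # [{'libfoo': 'a2', 'libbar': 'b1'}]
```

Three subproject commits end up as a single entry in the parent's
history, which is the "one atomic change" meant above.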

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 11:38           ` Martin Waitz
@ 2006-09-27 12:01             ` Johannes Schindelin
  2006-09-27 12:44               ` Sven Verdoolaege
  2006-09-27 12:46               ` Martin Waitz
  0 siblings, 2 replies; 39+ messages in thread
From: Johannes Schindelin @ 2006-09-27 12:01 UTC (permalink / raw)
  To: Martin Waitz; +Cc: A Large Angry SCM, Shawn Pearce, Daniel Barkalow, git

Hi.

On Wed, 27 Sep 2006, Martin Waitz wrote:

> On Wed, Sep 27, 2006 at 11:55:22AM +0200, Johannes Schindelin wrote:
> > > My current approach is like this:
> > > 
> > >  * create a .gitmodules file which lists all the directories
> > >    which contain a submodule.
> > >  * the .git/refs/heads directory of the submodule gets stored in
> > >    .gitmodule/<modulename> inside the parent project
> > 
> > Taking this a step further, you could make subproject/.git/refs/heads a 
> > symbolic link to .git/refs/heads/subproject, with the benefit that fsck 
> > Just Works.
> 
> in fact it is done this way (more or less).

With the difference that, if you store the refs outside of 
<root>/.git/refs, you have to take extra care that prune does not delete 
the corresponding objects.
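
A minimal sketch of why that matters, using a toy object graph rather
than real git internals. It assumes prune keeps only objects reachable
from the refs it can see; the function name `reachable`, the ref names
and the object IDs are hypothetical.

```python
# Toy model of pruning: an object survives only if it is reachable
# from a ref that prune knows about.

def reachable(refs, parents):
    """Return every object reachable from the given ref targets."""
    seen = set()
    stack = list(refs.values())
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(parents.get(obj, []))
    return seen


# Hypothetical object graph: object -> objects it points to.
parents = {"root2": ["root1"], "sub2": ["sub1"]}
all_objects = {"root1", "root2", "sub1", "sub2"}

root_refs = {"refs/heads/master": "root2"}
module_refs = {"refs/heads/subproject/master": "sub2"}

# Subproject refs stored outside <root>/.git/refs are invisible here.
pruned_without_link = all_objects - reachable(root_refs, parents)
# Subproject refs visible under <root>/.git/refs keep their objects alive.
pruned_with_link = all_objects - reachable({**root_refs, **module_refs}, parents)

print(sorted(pruned_without_link))  # ['sub1', 'sub2']
print(sorted(pruned_with_link))     # []
```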

> > Nevertheless, you have to take care of the fact that you need to commit 
> > the state of the root project just after committing to any subproject.
> 
> why?
> 
> You can accumulate as many changes in different subprojects until you
> get to a state that is worth committing in the parent project.
> All these changes are then seen as one atomic change to the whole
> project.

AFAICT this is not the idea of subprojects-in-git. If you have to track 
the subprojects in the root project manually anyway, you don't need _any_ 
additional tool (you _can_ track files in a subdirectory containing a .git 
subdirectory).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 12:01             ` Johannes Schindelin
@ 2006-09-27 12:44               ` Sven Verdoolaege
  2006-09-27 21:05                 ` Junio C Hamano
  2006-09-27 12:46               ` Martin Waitz
  1 sibling, 1 reply; 39+ messages in thread
From: Sven Verdoolaege @ 2006-09-27 12:44 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Martin Waitz, A Large Angry SCM, Shawn Pearce, Daniel Barkalow, git

On Wed, Sep 27, 2006 at 02:01:11PM +0200, Johannes Schindelin wrote:
> On Wed, 27 Sep 2006, Martin Waitz wrote:
> > On Wed, Sep 27, 2006 at 11:55:22AM +0200, Johannes Schindelin wrote:
> > > Nevertheless, you have to take care of the fact that you need to commit 
> > > the state of the root project just after committing to any subproject.

So what happens if you pull some changes into a subproject?
Are you going to create a commit in the root project for each
intermediate commit that you pulled into your subproject?
If not, then why should you do so if you happen to make these changes
in your local repo?

> AFAICT this is not the idea of subprojects-in-git.

As already pointed out by Daniel, there is no such thing as
"the idea of subprojects-in-git".  There are many ideas of
subprojects-in-git.

I, for one, would want to commit the changed state of a subproject
to the superproject explicitly.

> If you have to track 
> the subprojects in the root project manually anyway, you don't need _any_ 
> additional tool (you _can_ track files in a subdirectory containing a .git 
> subdirectory).

If I switch to a different branch or bisect in the superproject,
then the states of the subprojects should be changed accordingly.

skimo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 12:01             ` Johannes Schindelin
  2006-09-27 12:44               ` Sven Verdoolaege
@ 2006-09-27 12:46               ` Martin Waitz
  2006-09-27 13:13                 ` Johannes Schindelin
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Waitz @ 2006-09-27 12:46 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: A Large Angry SCM, Shawn Pearce, Daniel Barkalow, git

hoi :)

On Wed, Sep 27, 2006 at 02:01:11PM +0200, Johannes Schindelin wrote:
> On Wed, 27 Sep 2006, Martin Waitz wrote:
> > On Wed, Sep 27, 2006 at 11:55:22AM +0200, Johannes Schindelin wrote:
> > > > My current approach is like this:
> > > > 
> > > >  * create a .gitmodules file which lists all the directories
> > > >    which contain a submodule.
> > > >  * the .git/refs/heads directory of the submodule gets stored in
> > > >    .gitmodule/<modulename> inside the parent project
> > > 
> > > Taking this a step further, you could make subproject/.git/refs/heads a 
> > > symbolic link to .git/refs/heads/subproject, with the benefit that fsck 
> > > Just Works.
> > 
> > in fact it is done this way (more or less).
> 
> With the difference, that if you store the refs outside of 
> <root>/.git/refs, you have to take extra care that prune does not delete 
> the corresponding objects.

that's why there is .git/refs/module/modulename -> .gitmodule/modulename.

> > You can accumulate as many changes in different subprojects until you
> > get to a state that is worth committing in the parent project.
> > All these changes are then seen as one atomic change to the whole
> > project.
> 
> AFAICT this is not the idea of subprojects-in-git. If you have to track 
> the subprojects in the root project manually anyway, you don't need _any_ 
> additional tool (you _can_ track files in a subdirectory containing a .git 
> subdirectory).

But then you lose the fine-grained commits of your subprojects.
You only store the tree of the subproject when committing to the parent,
not the entire history.

I think having the "commit subproject changes to parent" step as a
manual action makes sense in the same way as you have to trigger a
commit to a repository by hand, too. You are not storing every little
change to your filesystem in the database.

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 12:46               ` Martin Waitz
@ 2006-09-27 13:13                 ` Johannes Schindelin
  0 siblings, 0 replies; 39+ messages in thread
From: Johannes Schindelin @ 2006-09-27 13:13 UTC (permalink / raw)
  To: Martin Waitz; +Cc: A Large Angry SCM, Shawn Pearce, Daniel Barkalow, git

Hi,

On Wed, 27 Sep 2006, Martin Waitz wrote:

> On Wed, Sep 27, 2006 at 02:01:11PM +0200, Johannes Schindelin wrote:
>
> > AFAICT this is not the idea of subprojects-in-git. If you have to track 
> > the subprojects in the root project manually anyway, you don't need _any_ 
> > additional tool (you _can_ track files in a subdirectory containing a .git 
> > subdirectory).
> 
> But then you lose the fine-grained commits of your subprojects. You 
> only store the tree of the subproject when committing to the parent, not 
> the entire history.

The more I get told about subprojects (I don't have a use for them 
myself), the more I think you are right: subprojects should not be 
integrated deeply into git.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27  8:06       ` Martin Waitz
  2006-09-27  9:55         ` Johannes Schindelin
@ 2006-09-27 16:58         ` A Large Angry SCM
  2006-09-27 17:33           ` Jeff King
  2006-09-28  7:37           ` Martin Waitz
  1 sibling, 2 replies; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-27 16:58 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Shawn Pearce, Daniel Barkalow, git

Martin Waitz wrote:
[...]
> My current approach is like this:
> 
>  * create a .gitmodules file which lists all the directories
>    which contain a submodule.
>  * the .git/refs/heads directory of the submodule gets stored in
>    .gitmodule/<modulename> inside the parent project
>  * both things above should be tracked in the parent project.
>    This way you always store the current state of each submodule
>    in each commit of the parent project.  And you don't have to
>    create a new parent commit for each change.  You can commit
>    to the parent project when you think that all your modules are
>    in a good state.

This means that modules are not separate, stand alone projects but, 
rather, just a sub part of your bigger project. Very useful and 
applicable in some situations but other situations want/need separate, 
stand alone subprojects.

>  * When checking out a project, all submodules listed in .gitmodules
>    get checked out, too.
>  * If there is a merge conflict in the module list or its refs/heads,
>    this is handled specially, e.g. by triggering a new merge inside
>    the submodule.
>  * The object directory is shared between the parent and all modules.
>    To make fsck-objects happy, the parent gets a refs/module link
>    pointing to .gitmodule/ and all submodules get a refs/parent
>    link pointing to the refs directory of the parent.
> 
[...]
> By storing the complete refs/heads directory for each submodule instead
> of only one head, it is possible to track multiple branches of a
> subproject.  I don't know yet how this works out in practice, but I
> think that it can be nice to be able to atomically commit to several
> branches of one submodule (perhaps one branch per customer, per
> hardware platform, whatever).

It's not immediately clear to me whether tracking several long-term, 
(globally) visible branches in a checked-out submodule is generally 
useful or only useful in special situations. I need to think about this...

[...]

You solved a problem similar to the one I'm working on, and one that is 
applicable to a number of projects: namely, projects where all the parts 
are under the control of the same entity. For projects that use CVS 
modules and are looking to escape CVS, this looks like a Git solution.

The problem I'm working on is how to deal with the sub parts of a larger 
project when those sub parts are controlled by different entities. Silly 
example: the kernel is controlled by Linus; glibc is controlled by the 
GNU folks, gcc is controlled by some other GNU folks, the web server is 
controlled by the Apache Foundation, the X server is controlled by Xorg, 
etc.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27  9:55         ` Johannes Schindelin
  2006-09-27 11:38           ` Martin Waitz
@ 2006-09-27 17:13           ` A Large Angry SCM
  2006-09-27 23:14             ` Johannes Schindelin
  1 sibling, 1 reply; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-27 17:13 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Martin Waitz, Shawn Pearce, Daniel Barkalow, git

Johannes Schindelin wrote:
> Hi,
> 
> On Wed, 27 Sep 2006, Martin Waitz wrote:
> 
>> On Tue, Sep 26, 2006 at 03:33:49PM -0700, A Large Angry SCM wrote:
>>> So, for each subproject of a parent project, you want to record branch, 
>>> version (commit ID), and directory location. Not quite as easy to do in 
>>> a makefile but do-able.
>> I've been playing with this kind of subprojects a little bit.
>>
>> My current approach is like this:
>>
>>  * create a .gitmodules file which lists all the directories
>>    which contain a submodule.
>>  * the .git/refs/heads directory of the submodule gets stored in
>>    .gitmodule/<modulename> inside the parent project
> 
> Taking this a step further, you could make subproject/.git/refs/heads a 
> symbolic link to .git/refs/heads/subproject, with the benefit that fsck 
> Just Works.

Wouldn't an fsck in the parent complain about missing objects?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 16:58         ` A Large Angry SCM
@ 2006-09-27 17:33           ` Jeff King
  2006-09-28  3:47             ` A Large Angry SCM
  2006-09-28  7:37           ` Martin Waitz
  1 sibling, 1 reply; 39+ messages in thread
From: Jeff King @ 2006-09-27 17:33 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Martin Waitz, Shawn Pearce, Daniel Barkalow, git

On Wed, Sep 27, 2006 at 09:58:43AM -0700, A Large Angry SCM wrote:

> This means that modules are not separate, stand alone projects but, 
> rather, just a sub part of your bigger project. Very useful and 
> applicable in some situations but other situations want/need separate, 
> stand alone subprojects.

One thing that I believe some people have requested for subprojects is
to avoid downloading files/history for subprojects you're not interested
in.  I think this could be facilitated in this scheme by only cloning the
heads of the subprojects you're interested in (there would need to be
special machinery to handle this at the root level if we want to allow
making root commits without necessarily having all of the subprojects).

A first step to this would be an argument to git-clone to allow cloning
only a subset of refs.

> The problem I'm working on is how to deal with the sub parts of a larger 
> project when those sub parts are controlled by different entities. Silly 
> example: the kernel is controlled by Linus; glibc is controlled by the 
> GNU folks, gcc is controlled by some other GNU folks, the web server is 
> controlled by the Apache Foundation, the X server is controlled by Xorg, 
> etc.

The nice thing about this approach is that I believe the two systems
need only differ at clone time. Instead of creating a remotes file with
all of the subproject branches, you would just create multiple remotes
files. The root fetching porcelain needs to be smart enough to fetch all
remotes.
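
A rough sketch of the difference, assuming the remotes-file layout of
a "URL:" line followed by "Pull:" lines. The URLs, remote names and
branch names are made up, and the helper `remotes_file` is hypothetical;
it only renders text and does not touch a repository.

```python
# Sketch: text of $GIT_DIR/remotes/<name> files for the two layouts.
# URLs, remote names, and branch names are hypothetical.

def remotes_file(url, branches):
    """Render one remotes file: a URL line plus one Pull line per branch."""
    lines = ["URL: %s" % url]
    for src, dst in branches:
        lines.append("Pull: refs/heads/%s:refs/heads/%s" % (src, dst))
    return "\n".join(lines) + "\n"


# Same entity controls everything: one remote, one branch per subproject.
single = {
    "origin": remotes_file("git://example.org/all.git",
                           [("kernel", "kernel"), ("libc", "libc")]),
}

# Different entities: one remotes file per subproject.
multiple = {
    "kernel": remotes_file("git://example.org/kernel.git",
                           [("master", "kernel")]),
    "libc": remotes_file("git://example.com/libc.git",
                         [("master", "libc")]),
}

for name in sorted(multiple):
    print(".git/remotes/%s:\n%s" % (name, multiple[name]))
```

Either way the local repository ends up with one head per subproject;
only the number of remotes the fetching porcelain has to walk changes.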

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 12:44               ` Sven Verdoolaege
@ 2006-09-27 21:05                 ` Junio C Hamano
  2006-09-28 15:02                   ` Michael S. Tsirkin
  2006-09-28 20:16                   ` Jeff King
  0 siblings, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2006-09-27 21:05 UTC (permalink / raw)
  To: skimo
  Cc: Martin Waitz, A Large Angry SCM, Shawn Pearce, Daniel Barkalow,
	git, Josh Triplett, Jamey Sharp

Sven Verdoolaege <skimo@kotnet.org> writes:

> On Wed, Sep 27, 2006 at 02:01:11PM +0200, Johannes Schindelin wrote:
>
>> AFAICT this is not the idea of subprojects-in-git.
>
> As already pointed out by Daniel, there is no such thing as
> "the idea of subprojects-in-git".  There are many ideas of
> subprojects-in-git.
>
> I, for one, would want to commit the changed state of a subproject
> to the superproject explicitly.

I think this is a very good summary of the point in this entire
thread.  Different workflows call for different granularity, and
if something deserves to be called "subproject", not just "a
subdirectory of a single project", it is not unreasonable to
think that it would want to track its own state at different
pace from the other parts of the project, and at finer grain
than the project taken as the whole.

Not allowing subprojects to do so means every little change
anywhere in the project tree results in a tree-wide new commit
object; in that case, the whole thing is a single large project
from core-git's point of view.

Avoiding checking out parts of the project tree that you do not
care about while you work on such a single large project is
another interesting and useful area to think about, but I would
say at that point it is not about subproject at all -- it is
about working in a sparsely populated working tree of a single
project.

XCB team recently told us that they started from such a single
large project and now they are splitting that into separate
pieces.  Their experience would be valuable to hear when
discussing the pros and cons of these two approaches.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 17:13           ` A Large Angry SCM
@ 2006-09-27 23:14             ` Johannes Schindelin
  2006-09-27 23:36               ` Shawn Pearce
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Schindelin @ 2006-09-27 23:14 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Martin Waitz, Shawn Pearce, Daniel Barkalow, git

Hi,

On Wed, 27 Sep 2006, A Large Angry SCM wrote:

> Johannes Schindelin wrote:
> > Hi,
> > 
> > On Wed, 27 Sep 2006, Martin Waitz wrote:
> > 
> > > On Tue, Sep 26, 2006 at 03:33:49PM -0700, A Large Angry SCM wrote:
> > > > So, for each subproject of a parent project, you want to record branch,
> > > > version (commit ID), and directory location. Not quite as easy to do in
> > > > a makefile but do-able.
> > > I've been playing with this kind of subprojects a little bit.
> > > 
> > > My current approach is like this:
> > > 
> > >  * create a .gitmodules file which lists all the directories
> > >    which contain a submodule.
> > >  * the .git/refs/heads directory of the submodule gets stored in
> > >    .gitmodule/<modulename> inside the parent project
> > 
> > Taking this a step further, you could make subproject/.git/refs/heads a
> > symbolic link to .git/refs/heads/subproject, with the benefit that fsck Just
> > Works.
> 
> Wouldn't an fsck in the parent complain about missing objects?

... not if my original idea (which I might have forgotten to mention ;-) 
was implemented: symlinking subproject/.git/objects to .git/objects.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 23:14             ` Johannes Schindelin
@ 2006-09-27 23:36               ` Shawn Pearce
  2006-09-27 23:55                 ` Rogan Dawes
  2006-09-28  4:48                 ` A Large Angry SCM
  0 siblings, 2 replies; 39+ messages in thread
From: Shawn Pearce @ 2006-09-27 23:36 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: A Large Angry SCM, Martin Waitz, Daniel Barkalow, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Wed, 27 Sep 2006, A Large Angry SCM wrote:
> > Wouldn't an fsck in the parent complain about missing objects?
> 
> ... not if my original idea (which I might have forgotten to mention ;-) 
> was implemented: symlinking subproject/.git/objects to .git/objects.

Right.  Which is one of the truly wonderful things about symlinks
in .git/refs and symlinking .git/objects. :-)


I don't know about anyone else but this thread has certainly helped
me rationalize a few thoughts about "subproject" support.

The major things I've taken away from it are:

 - Subprojects of any reasonable SCM should be supported.

   Although this is Git we sometimes want to play nice with other
   people working in the same pond.  We have a historical track
   record of doing this when it makes sense (git-svn, git-cvsserver,
   etc.) but clearly doing it for every SCM out there is not
   possible.

   But that said having "out of the box" support for Git subprojects
   within a larger Git project should Just Work.  It doesn't
   really yet.

 - Higher level projects should drive subprojects.

   Higher level projects tend to be composed of specific revisions or
   specific generations of subprojects.

   Part of the content of the higher level project is just what
   those subproject specifications are and how those subprojects
   should appear in a working directory.

 - Git Porcelain should help the user.

   Git operations should translate down through lower level projects
   when possible, and lower level project changes should push up
   when possible.

   E.g. git-fetch in a higher level repository should percolate
   down into the lower level repositories automatically.  Ditto with
   git-checkout and probably git-push.  git-commit in a lower level
   repository probably should update the specification file(s)
   in the higher level repository but not commit the higher level
   repository.

 - The subproject SCM interface needs to be modular.

   Users need to include many different subprojects and not all of
   them use Git.  Ideally Git would be able to at least be easily
   taught by the user how to invoke a particular subproject's SCM
   for the purpose of an initial checkout, if not for additional
   operations such as pull, push, commit and tag.
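
To make the last point concrete, here is a minimal sketch of a per-SCM
dispatch table for the initial checkout.  The spec layout (`scm`, `url`,
`path`, `module`), the table name `CHECKOUT` and the URLs are
assumptions for illustration, not an existing Git interface.  The
commands are only built as argument lists, never executed.

```python
# Sketch of a per-SCM dispatch table for initial checkouts.
# The spec layout and the URLs are hypothetical; the commands are
# built as argument lists and not executed here.

CHECKOUT = {
    "git": lambda s: ["git", "clone", s["url"], s["path"]],
    "svn": lambda s: ["svn", "checkout", s["url"], s["path"]],
    "cvs": lambda s: ["cvs", "-d", s["url"], "checkout",
                      "-d", s["path"], s["module"]],
}


def checkout_command(spec):
    """Return the argv that would check out one subproject."""
    try:
        build = CHECKOUT[spec["scm"]]
    except KeyError:
        raise ValueError("no handler for SCM %r" % spec["scm"])
    return build(spec)


specs = [
    {"scm": "git", "url": "git://example.org/kernel.git", "path": "kernel"},
    {"scm": "svn", "url": "http://example.org/svn/httpd/trunk", "path": "httpd"},
]
for spec in specs:
    print(" ".join(checkout_command(spec)))
```

Teaching the tool about another SCM would then amount to adding one
entry to the table.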


I used the term "generation of subprojects" as not everyone wants
to bind their root project to a specific revision of a subproject.
Indeed that may not be entirely practical.  Instead just a particular
lineage of development (e.g. "Version 1.0" vs. "Version 1.2")
may be all that is needed.

For example including a CVS hosted subproject into a larger Git
project means you can't use a specific SHA1 to reference a single
version of that CVS subproject.  Yet you can use a CVS branch
or label name.  But both of those are moving targets in a CVS
repository.  But in a sane CVS project a label or a branch will
be relatively stable over time, meaning that it's good enough given
that it's all we've got (without importing everything into Git anyway).

Likewise a Git project including a Git subproject should be able
to reference a named tag of the subproject.  If the subproject
changes its tag and the fetcher agrees to the change (with a
--force) then it's OK for the subproject to follow that tag change.
Likewise it should be acceptable for a subproject to reference a
specific branch head.  Although this is a moving target that may be
acceptable while the higher level project is under rapid development.

However an annotated tag probably should not be able to be created
on the higher level project unless all lower-level subprojects
are referenced by tags (or the equivalent) in their SCM.  Which
implies using a "stable tag" in CVS, a "/tags/foo@rev" in SVN,
or an annotated tag in Git and updating the specification file(s)
to reflect that.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 23:36               ` Shawn Pearce
@ 2006-09-27 23:55                 ` Rogan Dawes
  2006-09-28  0:36                   ` Shawn Pearce
  2006-09-28  5:02                   ` A Large Angry SCM
  2006-09-28  4:48                 ` A Large Angry SCM
  1 sibling, 2 replies; 39+ messages in thread
From: Rogan Dawes @ 2006-09-27 23:55 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: A Large Angry SCM, Martin Waitz, Daniel Barkalow, git

Shawn Pearce wrote:

> 
>  - Higher level projects should drive subprojects.
> 
>    Higher level projects tend to be composed of specific revisions or
>    specific generations of subprojects.
> 
>    Part of the content of the higher level project is just what
>    those subproject specifications are and how those subprojects
>    should appear in a working directory.
> 

> 
> I used the term "generation of subprojects" as not everyone wants
> to bind their root project to a specific revision of a subproject.
> Indeed that may not be entirely practical.  Instead just a particular
> lineage of development (e.g. "Version 1.0" vs. "Version 1.2")
> may be all that is needed.
> 

Does it not make sense that a commit of the higher level project should 
include the contents of its subprojects at that particular moment in time?

e.g. using the previous example of a kernel, apache, glibc, etc

You may track the subprojects using whatever scm applies to THAT 
subproject. But when you want to record the state of the entire project, 
you want to include the state of the subprojects. So, your super-project 
commit would actually recurse down into the working directories of the 
subprojects and record the state/contents of each file that makes up 
each of the subprojects.

So, if someone is tracking the overall project, and they do a pull of 
v1.1 (tag), they will see exactly what v1.1 looked like in your repo.

What this makes me think is that it might be useful to have a mechanism 
for recalculating the tree-ish of a subdirectory and finding an 
associated commit, for the case where a subproject is also managed by git.

i.e. given a super-project in this state, and knowing that this 
subproject is managed by git, which revision of the subproject are we 
talking about, and can we find a commit that matches this tree-ish? 
(assuming we have the history of the subproject available, of course)
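
A small sketch of that lookup, assuming the subproject's history is
available as lines of "commit tree" pairs, e.g. the output of
`git log --format='%H %T'` run in the subproject.  The function name
`commits_for_tree` and the short IDs are made up for illustration.

```python
# Sketch: find subproject commits whose tree matches a given tree ID.
# The input is assumed to be the output of
#     git log --format='%H %T'
# run in the subproject (%H = commit hash, %T = tree hash).
# The short IDs below are made up.

def commits_for_tree(log_output, tree_id):
    """Return all commits whose top-level tree equals tree_id."""
    matches = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        commit, tree = line.split()
        if tree == tree_id:
            matches.append(commit)
    return matches


sample = """\
c3 t2
c2 t2
c1 t1
"""
print(commits_for_tree(sample, "t2"))  # ['c3', 'c2']
```

The tree ID to search for could come from something like
`git rev-parse HEAD:<subdir>` in the super-project.  Note that the
answer is a list: several commits can share one tree (a change followed
by its revert, for instance), so the tree alone does not always identify
a single revision.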

Regards,

Rogan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 23:55                 ` Rogan Dawes
@ 2006-09-28  0:36                   ` Shawn Pearce
  2006-09-28  5:02                   ` A Large Angry SCM
  1 sibling, 0 replies; 39+ messages in thread
From: Shawn Pearce @ 2006-09-28  0:36 UTC (permalink / raw)
  To: Rogan Dawes; +Cc: A Large Angry SCM, Martin Waitz, Daniel Barkalow, git

Rogan Dawes <lists@dawes.za.net> wrote:
> Shawn Pearce wrote:
> 
> >
> > - Higher level projects should drive subprojects.
> >
> >   Higher level projects tend to be composed of specific revisions or
> >   specific generations of subprojects.
> >
> >   Part of the content of the higher level project is just what
> >   those subproject specifications are and how those subprojects
> >   should appear in a working directory.
> >
> 
> >
> >I used the term "generation of subprojects" as not everyone wants
> >to bind their root project to a specific revision of a subproject.
> >Indeed that may not be entirely practical.  Instead just a particular
> >lineage of development (e.g. "Version 1.0" vs. "Version 1.2")
> >may be all that is needed.
> >
> 
> Does it not make sense that a commit of the higher level project should 
> include the contents of its subprojects at that particular moment in time?

If the subproject is a Git repository we can simply record a commit
SHA1 or a tag SHA1 and assuming we can locate the objects for that
repository in the future we can fully recover that subproject.
No subtree tracking necessary.

If the subproject is an SVN repository we can simply record
the branch/tag path and the specific revision number and fully
recover that subproject, assuming we have access to the central
SVN repository.  If we don't or we are concerned about that then we
would want to use git-svn/git-svnimport to mirror the SVN repository
and treat it as a proper Git subproject.

If the subproject is a Monotone or Darcs project I believe we could
also store some small token for a unique directory state of that
subproject and recover that from Monotone or Darcs without importing
the entire subproject into Git.  If you don't trust Monotone or Darcs
then maybe you should be importing/mirroring that project into Git.
At which point it's a proper Git subproject.

If the subproject is CVS then you either need to trust the CVS
tag or import the tree into Git.  If you are going to import the
tree into Git then you might as well just treat the subproject as
a proper Git subproject and use some sort of CVS import or mirror
to keep that in sync.

Etc...

My basic point here is that a subproject already has its own SCM
and if it's not Git you probably either are going to convert it to
Git via a Git bridge (git-svn anyone?) or you are going to keep it
in its "native" SCM as you probably don't want to incur the disk
costs of storing it twice (once in its native SCM and once in Git).

So subtree tracking is probably unnecessary.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-27 17:33           ` Jeff King
@ 2006-09-28  3:47             ` A Large Angry SCM
  2006-09-28  3:52               ` Jeff King
  2006-09-28  3:52               ` Shawn Pearce
  0 siblings, 2 replies; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-28  3:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin Waitz, Shawn Pearce, Daniel Barkalow, git

Jeff King wrote:
[...]
> One thing that I believe some people have requested for subprojects is
> to avoid downloading files/history for subprojects you're not interested
> in.  I think this could be facilitated in this scheme by only cloning the
> heads of the subprojects you're interested in (there would need to be
> special machinery to handle this at the root level if we want to allow
> making root commits without necessarily having all of the subprojects).

In what I'm suggesting, commits are local to a project's working 
directory repository and are pushed somewhere else to be recorded long 
term. Since projects are stand alone, possibly with dependencies, 
working on a (sub)project without having other associated (sub)projects 
is accomplished by checking it out.

> A first step to this would be an argument to git-clone to allow cloning
> only a subset of refs.

Something like this?

	git-init-db
	git-fetch <repository> <refspecs>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-28  3:47             ` A Large Angry SCM
@ 2006-09-28  3:52               ` Jeff King
  2006-09-28  3:58                 ` Shawn Pearce
  2006-09-28  3:52               ` Shawn Pearce
  1 sibling, 1 reply; 39+ messages in thread
From: Jeff King @ 2006-09-28  3:52 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Martin Waitz, Shawn Pearce, Daniel Barkalow, git

On Wed, Sep 27, 2006 at 08:47:34PM -0700, A Large Angry SCM wrote:

> >A first step to this would be an argument to git-clone to allow cloning
> >only a subset of refs.
> Something like this?
> 
> 	git-init-db
> 	git-fetch <repository> <refspecs>

Exactly, but I was suggesting something more user-friendly (e.g., it's
nice to use git-clone because it creates the remotes file). I was going
to hack up a quick change to git-clone, but I think some thought needs
to be given to semantics, especially with respect to tags (should it
imply no tags? Only tags which point to refs we're already fetching?).

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Notes on Using Git with Subprojects
  2006-09-28  3:47             ` A Large Angry SCM
  2006-09-28  3:52               ` Jeff King
@ 2006-09-28  3:52               ` Shawn Pearce
  2006-09-28 15:39                 ` Johannes Schindelin
  1 sibling, 1 reply; 39+ messages in thread
From: Shawn Pearce @ 2006-09-28  3:52 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Jeff King, Martin Waitz, Daniel Barkalow, git

A Large Angry SCM <gitzilla@gmail.com> wrote:
> Jeff King wrote:
> [...]
> >One thing that I believe some people have requested for subprojects is
> >to avoid downloading files/history for subprojects you're not interested
> >in.  I think this could be facilitated in this scheme by only cloning the
> >heads of the subprojects you're interested in (there would need to be
> >special machinery to handle this at the root level if we want to allow
> >making root commits without necessarily having all of the subprojects).
> 
> In what I'm suggesting, commits are local to a project's working 
> directory repository and are pushed somewhere else to be recorded long 
> term. Since projects are stand alone, possibly with dependencies, 
> working on a (sub)project without having other associated (sub)projects 
> is accomplished by checking it out.
> 
> >A first step to this would be an argument to git-clone to allow cloning
> >only a subset of refs.
> 
> Something like this?
> 
> 	git-init-db
> 	git-fetch <repository> <refspecs>

More like:

 	git-init-db
 	git-fetch --keep <repository> <refspecs>

but yes.  :-)

-- 
Shawn.


* Re: Notes on Using Git with Subprojects
  2006-09-28  3:52               ` Jeff King
@ 2006-09-28  3:58                 ` Shawn Pearce
  2006-09-28  4:00                   ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Shawn Pearce @ 2006-09-28  3:58 UTC (permalink / raw)
  To: Jeff King; +Cc: A Large Angry SCM, Martin Waitz, Daniel Barkalow, git

Jeff King <peff@peff.net> wrote:
> On Wed, Sep 27, 2006 at 08:47:34PM -0700, A Large Angry SCM wrote:
> 
> > >A first step to this would be an argument to git-clone to allow cloning
> > >only a subset of refs.
> > Something like this?
> > 
> > 	git-init-db
> > 	git-fetch <repository> <refspecs>
> 
> Exactly, but I was suggesting something more user-friendly (e.g., it's
> nice to use git-clone because it creates the remotes file). I was going
> to hack up a quick change to git-clone, but I think some thought needs
> to be given to semantics, especially with respect to tags (should it
> imply no tags? Only tags which point to refs we're already fetching?).

If you are fetching a set of commits from a repository you probably
should be fetching any tags that point at the commits you've fetched.
They tend to be few compared to the commits, they tend to be small,
and they tend to be important milestones in the tracked project.

I think that's why the native Git protocol sends tags for any
commits that were also sent.  :)

-- 
Shawn.


* Re: Notes on Using Git with Subprojects
  2006-09-28  3:58                 ` Shawn Pearce
@ 2006-09-28  4:00                   ` Jeff King
  2006-09-28  4:09                     ` Shawn Pearce
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2006-09-28  4:00 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: A Large Angry SCM, Martin Waitz, Daniel Barkalow, git

On Wed, Sep 27, 2006 at 11:58:55PM -0400, Shawn Pearce wrote:

> If you are fetching a set of commits from a repository you probably
> should be fetching any tags that point at the commits you've fetched.
> They tend to be few compared to the commits, they tend to be small,
> and they tend to be important milestones in the tracked project.
> 
> I think that's why the native Git protocol sends tags for any
> commits that were also sent.  :)

Oh, that's clever. :)

Do we do the right thing for non-git transports?

-Peff


* Re: Notes on Using Git with Subprojects
  2006-09-28  4:00                   ` Jeff King
@ 2006-09-28  4:09                     ` Shawn Pearce
  0 siblings, 0 replies; 39+ messages in thread
From: Shawn Pearce @ 2006-09-28  4:09 UTC (permalink / raw)
  To: Jeff King; +Cc: A Large Angry SCM, Martin Waitz, Daniel Barkalow, git

Jeff King <peff@peff.net> wrote:
> On Wed, Sep 27, 2006 at 11:58:55PM -0400, Shawn Pearce wrote:
> 
> > If you are fetching a set of commits from a repository you probably
> > should be fetching any tags that point at the commits you've fetched.
> > They tend to be few compared to the commits, they tend to be small,
> > and they tend to be important milestones in the tracked project.
> > 
> > I think that's why the native Git protocol sends tags for any
> > commits that were also sent.  :)
> 
> Oh, that's clever. :)
> 
> Do we do the right thing for non-git transports?

Yes, I think we do.

Only it's not quite as clever, as the HTTP/FTP commit walker first
needs to get a list of available refs (which includes tag and
tag^{}) and compares each obtained commit to the ^{} entries.
If there's a match it gets the tag.
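
The comparison described above can be sketched in shell. This is only
an illustration, not the commit walker's actual code: tags_for_commits
is a hypothetical helper, and the ref listing is assumed to be in the
"<sha> <refname>" form that git ls-remote prints, peeled "^{}" entries
included.

```shell
# tags_for_commits REFS_FILE COMMIT...
# REFS_FILE holds lines of "<sha> <refname>", including the peeled
# "refs/tags/<name>^{}" entries. Prints each tag whose peeled entry
# names one of the given commits.
tags_for_commits() {
	refs_file=$1
	shift
	for commit in "$@"; do
		# Appending "" forces a string comparison, so an all-digit
		# object name is never compared as a number.
		awk -v want="$commit" '
			$1 "" == want "" &&
			index($2, "refs/tags/") == 1 &&
			substr($2, length($2) - 2) == "^{}" {
				# Strip the trailing "^{}" to get the tag ref itself.
				print substr($2, 1, length($2) - 3)
			}' "$refs_file"
	done
}
```

A listing saved with "git ls-remote <repository> > refs.txt" could then
be queried with "tags_for_commits refs.txt <commit>...".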

And rsync being as dumb as it is should be fetching everything. :)

-- 
Shawn.


* Re: Notes on Using Git with Subprojects
  2006-09-27 23:36               ` Shawn Pearce
  2006-09-27 23:55                 ` Rogan Dawes
@ 2006-09-28  4:48                 ` A Large Angry SCM
  1 sibling, 0 replies; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-28  4:48 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Johannes Schindelin, Martin Waitz, Daniel Barkalow, git

Shawn Pearce wrote:
[...]
>  - Git Porcelain should help the user.
> 
>    Git operations should translate down through lower level projects
>    when possible, and lower level project changes should push up
>    when possible.
> 
>    E.g. git-fetch in a higher level repository should percolate
>    down into the lower level repositories automatically.  Ditto with
>    git-checkout and probably git-push.  git-commit in a lower level
>    repository probably should update the specification file(s)
>    in the higher level repository but not commit the higher level
>    repository.

I think recursing through all subprojects for most Git commands is 
actually the exception. Plus, porcelains aren't going to help much past 
the first subproject that isn't Git managed.

[...]

> However an annotated tag probably should not be able to be created
> on the higher level project unless all lower-level subprojects
> are referenced by tags (or the equivalent) in their SCM.  Which
> implies using a "stable tag" in CVS, a "/tags/foo@rev" in SVN,
> or an annotated tag in Git and updating the specification file(s)
> to reflect that.

I fail to see a reason for this restriction. Each project should be 
managed separately. Also, how do you enforce the restriction on other VCSs?


* Re: Notes on Using Git with Subprojects
  2006-09-27 23:55                 ` Rogan Dawes
  2006-09-28  0:36                   ` Shawn Pearce
@ 2006-09-28  5:02                   ` A Large Angry SCM
  1 sibling, 0 replies; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-28  5:02 UTC (permalink / raw)
  To: Rogan Dawes; +Cc: Shawn Pearce, Martin Waitz, Daniel Barkalow, git

Rogan Dawes wrote:
[...]
> Does it not make sense that a commit of the higher level project should 
> include the contents of its subprojects at that particular moment in time?
> 
> e.g. using the previous example of a kernel, apache, glibc, etc
> 
> You may track the subprojects using whatever scm applies to THAT 
> subproject. But when you want to record the state of the entire project, 
> you want to include the state of the subprojects. So, your super-project 
> commit would actually recurse down into the working directories of the 
> subprojects and record the state/contents of each file that makes up 
> each of the subprojects.
> 
> So, if someone is tracking the overall project, and they do a pull of 
> v1.1 (tag), they will see exactly what v1.1 looked like in your repo.
> 
> What this makes me think is that it might be useful to have a mechanism 
> for recalculating the tree-ish of a subdirectory and finding an 
> associated commit, for the case where a subproject is also managed by git.
> 
> i.e. given a super-project in this state, and knowing that this 
> subproject is managed by git, which revision of the subproject are we 
> talking about, and can we find a commit that matches this tree-ish? 
> (assuming we have the history of the subproject available, of course)
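
The lookup asked about above (which subproject commit has this tree?)
could be sketched as follows. commit_for_tree is a hypothetical helper,
not a Git command, and it assumes the subproject history is supplied as
"<commit-sha> <tree-sha>" pairs on standard input.

```shell
# commit_for_tree TREE
# Reads "<commit-sha> <tree-sha>" pairs on stdin, newest first, and
# prints the first commit whose top-level tree equals TREE.
# Returns non-zero when no commit matches.
commit_for_tree() {
	awk -v tree="$1" '
		# Appending "" forces a string comparison of the object names.
		$2 "" == tree "" { print $1; found = 1; exit }
		END { exit !found }'
}
```

Inside the subproject's repository the pairs can be produced with
"git log --format='%H %T'", and the tree recorded by the super-project
for a subdirectory with "git rev-parse <commit>:<subdirectory>".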

Some development environments will require that all the (used) code is 
imported into the local VCS of choice. But not all environments. For 
some development environments, recording the version of the subproject 
is sufficient. Assuming it's possible at some future time to get the 
state associated with the version.

Also keep in mind, to effectively participate in a project, you will 
likely need to use the VCS of the project. So importing everything into 
another VCS (Git) will just cause _you_ more work.


* Re: Notes on Using Git with Subprojects
  2006-09-27 16:58         ` A Large Angry SCM
  2006-09-27 17:33           ` Jeff King
@ 2006-09-28  7:37           ` Martin Waitz
  2006-09-28 20:30             ` A Large Angry SCM
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Waitz @ 2006-09-28  7:37 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Shawn Pearce, Daniel Barkalow, git


hoi :)

On Wed, Sep 27, 2006 at 09:58:43AM -0700, A Large Angry SCM wrote:
> This means that modules are not separate, stand alone projects but, 
> rather, just a sub part of your bigger project. Very useful and 
> applicable in some situations but other situations want/need separate, 
> stand alone subprojects.

you can do everything with the submodule which would be possible with
a normal GIT repository.  And you can always clone it into a directory
which is not controlled by a parent project.

I really think that this is a very important property of a submodule.

> >By storing the complete refs/heads directory for each submodule instead
> >of only one head, it is possible to track multiple branches of a
> >subproject.  I don't know yet how this works out in practice but I
> >think that it can be nice to be able to atomically commit to several
> >branches of one submodule (perhaps one branch per customer, per
> >hardware platform, whatever).
> 
> It's not immediately clear to me if tracking several long term 
> (globally) visible branches in a checkout sub module is generally useful 
> or only useful in special situations. I need to think about this...

One use-case which may be important here:

The submodule has two different branches which got forked and are not
intended to be merged again.  At some point in time the parent project
wants to switch from one branch of the submodule to another branch.
If a user still has modifications in the old branch and wants to
update the parent project then it is important to know if the local
modifications and those coming from the parent have to be merged or
should stay in different branches.
If the parent is switching branches there should only be some warning
if the user still has modifications in the old branch, giving him the
chance to port the modifications to the other branch.

-- 
Martin Waitz



* Re: Notes on Using Git with Subprojects
  2006-09-27 21:05                 ` Junio C Hamano
@ 2006-09-28 15:02                   ` Michael S. Tsirkin
  2006-09-28 20:16                   ` Jeff King
  1 sibling, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2006-09-28 15:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: skimo, Martin Waitz, A Large Angry SCM, Shawn Pearce,
	Daniel Barkalow, git, Josh Triplett, Jamey Sharp

Quoting r. Junio C Hamano <junkio@cox.net>:
> Avoiding checking out parts of the project tree that you do not
> care about while you work on such a single large project is
> another interesting and useful area to think about, but I would
> say at that point it is not about subproject at all -- it is
> about working in a sparsely populated working tree of a single
> project.

I agree completely - at least as far as I'm concerned, working in
a sparsely populated working tree is what it's all about.
For example, sometimes I am just editing documentation and
it would be nice 

It's easy to check out just a subdirectory the first time:
>git checkout master `git-ls-tree -r --name-only master subdirectory`
>echo ref: refs/heads/master > .git/HEAD
but when you try a pull/rebase git will check out all of the tree.

Is there some way to avoid this?

-- 
MST


* Re: Notes on Using Git with Subprojects
  2006-09-28  3:52               ` Shawn Pearce
@ 2006-09-28 15:39                 ` Johannes Schindelin
  0 siblings, 0 replies; 39+ messages in thread
From: Johannes Schindelin @ 2006-09-28 15:39 UTC (permalink / raw)
  To: Shawn Pearce
  Cc: A Large Angry SCM, Jeff King, Martin Waitz, Daniel Barkalow, git

Hi,

On Wed, 27 Sep 2006, Shawn Pearce wrote:

> A Large Angry SCM <gitzilla@gmail.com> wrote:
> > Jeff King wrote:
> > [...]
> > >One thing that I believe some people have requested for subprojects is
> > >to avoid downloading files/history for subprojects you're not interested
> > >in.  I think this could be facilitated in this scheme by only cloning the
> > >heads of the subprojects you're interested in (there would need to be
> > >special machinery to handle this at the root level if we want to allow
> > >making root commits without necessarily having all of the subprojects).
> > 
> > In what I'm suggesting, commits are local to a project's working 
> > directory repository and are pushed somewhere else to be recorded long 
> > term. Since projects are stand alone, possibly with dependencies, 
> > working on a (sub)project without having other associated (sub)projects 
> > is accomplished by checking it out.
> > 
> > >A first step to this would be an argument to git-clone to allow cloning
> > >only a subset of refs.
> > 
> > Something like this?
> > 
> > 	git-init-db
> > 	git-fetch <repository> <refspecs>
> 
> More like:
> 
>  	git-init-db
>  	git-fetch --keep <repository> <refspecs>
> 
> but yes.  :-)

You are missing the remotes/ information:

git-repo-config remote.origin.url <repository>
for spec in <refspecs>; do
	git-repo-config remote.origin.fetch $spec ^$
done
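
Put together with the init and fetch steps quoted above, the whole
sequence might look like the sketch below. It uses the spelling of
current Git ("git init", "git config --add") rather than the 2006
command names; "--add" has the same effect as passing "^$" as the value
regex. partial_clone and run are hypothetical helpers, and setting
DRY_RUN makes them print the commands instead of executing them.

```shell
# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
	if [ -n "${DRY_RUN:-}" ]; then
		printf '%s\n' "$*"
	else
		"$@"
	fi
}

# partial_clone REPOSITORY DIR REFSPEC...
# Creates DIR, records REPOSITORY as "origin" with only the given
# fetch refspecs, then fetches them.
partial_clone() {
	repository=$1
	dir=$2
	shift 2
	run git init "$dir" || return 1
	run git -C "$dir" config remote.origin.url "$repository" || return 1
	for spec in "$@"; do
		# --add appends a value instead of replacing the previous one.
		run git -C "$dir" config --add remote.origin.fetch "$spec" || return 1
	done
	run git -C "$dir" fetch origin
}
```

Because only the listed refspecs are recorded, a later plain
"git fetch origin" in that directory keeps fetching the same subset.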

Ciao,
Dscho


* Re: Notes on Using Git with Subprojects
  2006-09-27 21:05                 ` Junio C Hamano
  2006-09-28 15:02                   ` Michael S. Tsirkin
@ 2006-09-28 20:16                   ` Jeff King
  1 sibling, 0 replies; 39+ messages in thread
From: Jeff King @ 2006-09-28 20:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: skimo, Martin Waitz, A Large Angry SCM, Shawn Pearce,
	Daniel Barkalow, git, Josh Triplett, Jamey Sharp

On Wed, Sep 27, 2006 at 02:05:28PM -0700, Junio C Hamano wrote:

> Avoiding checking out parts of the project tree that you do not
> care about while you work on such a single large project is
> another interesting and useful area to think about, but I would
> say at that point it is not about subproject at all -- it is
> about working in a sparsely populated working tree of a single
> project.

Keep in mind that it might not be an attempt to avoid checking out part
of the tree, but rather importing part of the tree (the subproject) into
your repository at all (to save space, download time, etc). So unless
you're also proposing sparse repos, I think this still might be a
subproject issue.

-Peff


* Re: Notes on Using Git with Subprojects
  2006-09-28  7:37           ` Martin Waitz
@ 2006-09-28 20:30             ` A Large Angry SCM
  2006-09-29  7:04               ` Martin Waitz
  0 siblings, 1 reply; 39+ messages in thread
From: A Large Angry SCM @ 2006-09-28 20:30 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Shawn Pearce, Daniel Barkalow, git

Martin Waitz wrote:
> On Wed, Sep 27, 2006 at 09:58:43AM -0700, A Large Angry SCM wrote:
>> This means that modules are not separate, stand alone projects but, 
>> rather, just a sub part of your bigger project. Very useful and 
>> applicable in some situations but other situations want/need separate, 
>> stand alone subprojects.
> 
> you can do everything with the submodule which would be possible with
> a normal GIT repository.  And you can always clone it into a directory
> which is not controlled by a parent project.
> 
> I really think that this is a very important property of a submodule.

I must be missing something.

I just read your original message in the (sub)thread again and you said:

	 * the .git/refs/heads directory of the submodule gets stored in
	   .gitmodule/<modulename> inside the parent project

If the submodule refs in the parent are a _copy_, then work performed in 
the submodule outside of the parent will be lost when the parent is in 
control of the submodule again.

If the submodule refs in the parent are the actual submodule refs then 
the submodule is not independent of the parent.

If the submodule refs in the parent are a symlink to the refs in the 
submodule, then the parent has no control over which version of the 
submodule it gets on the next checkout since the submodule can update 
the ref.

[...]
> One use-case which may be important here:
> 
> The submodule has two different branches which got forked and are not
> intended to be merged again.  At some point in time the parent project
> wants to switch from one branch of the submodule to another branch.
> If a user still has modifications in the old branch and wants to
> update the parent project then it is important to know if the local
> modifications and those coming from the parent have to be merged or
> should stay in different branches.
> If the parent is switching branches there should only be some warning
> if the user still has modifications in the old branch, giving him the
> chance to port the modifications to the other branch.

Again, this is leading me to believe that the submodule is not 
independent of the parent.


* Re: Notes on Using Git with Subprojects
  2006-09-28 20:30             ` A Large Angry SCM
@ 2006-09-29  7:04               ` Martin Waitz
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Waitz @ 2006-09-29  7:04 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Shawn Pearce, Daniel Barkalow, git


hoi :)

On Thu, Sep 28, 2006 at 01:30:39PM -0700, A Large Angry SCM wrote:
> If the submodule refs in the parent are the actual submodule refs then 
> the submodule is not independent of the parent.

ok, it's not independent in the sense that you can move the directory
away and expect the submodule to work even when the parent does not
exist any more.

But you can do normal GIT work as before.
You can create new branches (they will be stored in the parent
but you have to "git add .gitmodule/..." explicitly in order
to track the branch in the parent), fetch/pull from other sites,
create commits, etc.
So in your normal workflow you do not have to do anything in the
parent while you work in the submodule.  That's what I've called
"independent".

I hope the intention is clear now, perhaps I've been sloppy
with words.

-- 
Martin Waitz



* Re: Notes on Using Git with Subprojects
  2006-09-26 17:40 Notes on Using Git with Subprojects A Large Angry SCM
  2006-09-26 20:25 ` Johannes Schindelin
  2006-09-26 21:23 ` Daniel Barkalow
@ 2006-10-01  5:19 ` A Large Angry SCM
  2 siblings, 0 replies; 39+ messages in thread
From: A Large Angry SCM @ 2006-10-01  5:19 UTC (permalink / raw)
  To: git

This updates the note based on the discussion the original note 
generated. The most significant difference is the ability to specify 
a particular version of a subproject.


							20061001.0400

Notes on Using Git with Subprojects
===================================
Copyright (C) 2006 by Raymond S Brand


Git does not have native support for subprojects and this is a good
thing because managing subprojects is better handled by the project
build machinery. Managing subprojects with the project build machinery
is more flexible than the native support from an SCM and allows the use
of different SCMs by different subprojects. However, there is a lot of
interest in using Git for the SCM of a project with subprojects.

Git, unfortunately, does not make it easy. What is wanted is to put all
of the subprojects in one repository and be able to checkout the various
parts from a local copy of the repository. The problem is, with Git, a
repository can have at most one working directory associated with it at
a time. This is because Git stores a lot of information about the
contents of the working directory in the repository. In fact, the usual
situation is that the repository, itself, is in the working directory.

An important criterion for supporting subprojects is that a subproject
may be developed and controlled by an entity or organization different
than that of the parent project of the subproject. As a consequence of
this, the build machinery needed to manage a project controlled by
another entity as a subproject must be in the parent project. 

This note describes a method to use Git with subprojects; other methods
are also possible.


Definitions
-----------
Parent Project:
	A project that logically contains one or more subprojects.

Project:
	A set of files that is (relatively) self contained with respect
	to changes and treated as a unit by the SCM.

Root Project:
	A project that is not a subproject.

Subproject:
	A project logically contained in another project.

Super project:
	A project composed of subprojects.


Setup
-----
All subprojects are contained in a single repository and referred to by
separate heads or tags. Developers create a local copy of the project
repository; hereafter, referred to as the "local master repository" or
just "local master". Any fetch, pull or push that a developer exchanges
with the "local master" of another developer should go to or come from
the developer's own "local master".


Root Project Checkout
---------------------
The root project is the project that all of the subprojects are a part
of. It is a parent project of one or more subprojects; each of which may
also be parent projects to other subprojects.

To check out the root project, choose a name for the project working
directory (the working directory must not already exist) and perform the
equivalent of the following commands.

	git-clone -s -n $LOCAL_MASTER $ROOT_DIR \
		&& cd $ROOT_DIR \
		&& git-checkout -b $ROOT_BRANCH--local $ROOT_VERSION

Where:
	$LOCAL_MASTER is the path to the "local master"
	$ROOT_DIR is the name of the working directory to use
	$ROOT_VERSION is the ref name of the root project state
	$ROOT_BRANCH--local is a branch for local changes

This will leave the current working directory of the shell in the
project working directory.

This also creates a branch with the suffix of "--local" to hold all of
the local working directory commits and modifications. The $ROOT_BRANCH
is used as a tracking branch so that upstream changes can be fetched
into the working repository without affecting the checked out files.
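
As a rough sketch, the checkout can be wrapped in a shell function.
checkout_project and run are hypothetical helpers, not part of Git, and
the sketch uses the current "git clone" / "git checkout" spelling.
Note the assumption about names: with current Git the branches of the
"local master" appear in the clone as origin/<name>, so the version
argument would typically be such a name or a tag.

```shell
# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
	if [ -n "${DRY_RUN:-}" ]; then
		printf '%s\n' "$*"
	else
		"$@"
	fi
}

# checkout_project LOCAL_MASTER DIR BRANCH VERSION
# Clones LOCAL_MASTER into DIR without checking anything out, then
# creates and checks out BRANCH--local starting at VERSION.
checkout_project() {
	local_master=$1
	dir=$2
	branch=$3
	version=$4
	# The working directory must not already exist.
	if [ -e "$dir" ]; then
		echo "checkout_project: $dir already exists" >&2
		return 1
	fi
	# -s shares objects with the local master; -n skips the checkout.
	run git clone -s -n "$local_master" "$dir" || return 1
	run git -C "$dir" checkout -b "${branch}--local" "$version"
}
```

Unlike the one-liner above, the function does not change the current
directory of the calling shell.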

Once the root project is checked out, the subprojects are checked out.


Subproject Checkout
-------------------
Each project that is a parent project needs to checkout all of the
subprojects of the project. Each subproject is checked out with the
equivalent of the following bash commands:

	git-clone -s -n $LOCAL_MASTER $SUBPROJECT_DIR \
		&& ( cd $SUBPROJECT_DIR \
			&& git-checkout -b $SUBPROJECT_BRANCH--local \
				$SUBPROJECT_VERSION
		)

Where:
	$LOCAL_MASTER is the path to the "local master"
	$SUBPROJECT_DIR is the directory name of the subproject
	$SUBPROJECT_VERSION is the ref name of the subproject state
	$SUBPROJECT_BRANCH--local is a branch for local changes
	

Project Development
-------------------
Changes to a project are performed in the working directory of the
project and are recorded in the repository in the working directory on
the $PROJECT_BRANCH--local branch.


Receiving Project Upstream Changes
----------------------------------
Upstream project changes are first fetched into the project tracking
branch of the local master repository and are then fetched into the
project tracking branch of the working directory repositories. To merge
upstream changes into the working directory, a pull from the project
tracking branch of the working directory repository is executed.

	# Fetch project branch from upstream to local master
	(cd $LOCAL_MASTER && git-fetch $UPSTREAM $PROJECT_BRANCH)

	# Fetch project branch from local master to working repo
	git-fetch $LOCAL_MASTER $PROJECT_BRANCH

	# Merge upstream changes in to working directory
	git-pull --no-commit . $PROJECT_BRANCH

Where:
	$LOCAL_MASTER is the path to the "local master"
	$UPSTREAM is the Git URL of the upstream repository
	$PROJECT_BRANCH is the branch name of the (sub)project
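
The three steps can be combined into one function, sketched here under
some assumptions. receive_upstream and run are hypothetical helpers. The
refspec is spelled "branch:branch" so that the tracking branch itself is
updated; with current Git, a bare branch name fetched from a path or URL
is only recorded in FETCH_HEAD. The sketch also assumes the tracking
branch is not the checked-out branch in either repository, and it uses
"git merge --no-commit", which is what the pull from "." amounts to.

```shell
# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
	if [ -n "${DRY_RUN:-}" ]; then
		printf '%s\n' "$*"
	else
		"$@"
	fi
}

# receive_upstream LOCAL_MASTER UPSTREAM PROJECT_BRANCH
# Run from inside the project working directory.
receive_upstream() {
	local_master=$1
	upstream=$2
	branch=$3
	# 1. upstream -> tracking branch in the local master
	run git -C "$local_master" fetch "$upstream" "${branch}:${branch}" || return 1
	# 2. local master -> tracking branch in the working repository
	run git fetch "$local_master" "${branch}:${branch}" || return 1
	# 3. merge the tracking branch into the checked-out --local branch,
	#    stopping before a merge commit is created
	run git merge --no-commit "$branch"
}
```

Each step stops the sequence on failure, so a failed fetch never leads
to a merge of stale data.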


Sending Project Changes Upstream
--------------------------------
To send project changes upstream from a working directory repository,
the changes are first pushed to a branch in the local master repository,
$PROJECT_BRANCH--$IDENT. The changes can then be pushed or pulled from the
local master repository.

	# Push project changes to local master
	git-push $LOCAL_MASTER \
		$PROJECT_BRANCH--local:$PROJECT_BRANCH--$IDENT

	# Push project changes from local master to upstream
	(cd $LOCAL_MASTER && git-push $UPSTREAM \
		$PROJECT_BRANCH--$IDENT:$PROJECT_BRANCH--$NICK--$IDENT)

Where:
	$LOCAL_MASTER is the path to the "local master"
	$UPSTREAM is the Git URL of the upstream repository
	$PROJECT_BRANCH is the tracking branch of the (sub)project
	$PROJECT_BRANCH--local is a branch for local changes
	$IDENT is a label unique for this set of working directories
	$NICK is a (branch name safe) identifier of the developer
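
The branch naming used by the two pushes can be made explicit with a
pair of small functions. local_refspec and upstream_refspec are
hypothetical names; they only build the refspec strings shown above and
run no Git command themselves.

```shell
# local_refspec PROJECT_BRANCH IDENT
# Refspec for pushing the --local branch of a working repository to
# the local master.
local_refspec() {
	printf '%s--local:%s--%s\n' "$1" "$1" "$2"
}

# upstream_refspec PROJECT_BRANCH IDENT NICK
# Refspec for pushing that branch from the local master to upstream.
upstream_refspec() {
	printf '%s--%s:%s--%s--%s\n' "$1" "$2" "$1" "$3" "$2"
}
```

For example, with a project branch s1, an IDENT of desk and a NICK of
rsb, the first push sends s1--local to s1--desk and the second sends
s1--desk to s1--rsb--desk.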


Automation
----------
The following proof of concept code can be added to parent project
makefiles to automate most of the operations needed to support
subprojects with Git.

 ---->8----  ---->8----  ---->8----  ---->8----  ---->8----  ---->8----
# Add this to the makefiles

SUBPROJECTLIST := GIT:s1_ref:s1_dev:s1 \
		  GIT:s2_ref:s2_dev:s2 \
		  GIT:s3_ref:s3_dev:s3

include Git_machinery.make

# Rest of the Makefile
 ---->8----  ---->8----  ---->8----  ---->8----  ---->8----  ---->8----
# Git_machinery.make

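# The recipes below use bash arrays, so they need bash rather than /bin/sh.
SHELL := /bin/bash

GIT-CLONE := git-clone
GIT-CHECKOUT := git-checkout
GIT-FETCH := git-fetch
GIT-PULL := git-pull
GIT-PUSH := git-push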

include GIT-SUBPROJECT-VARS.mak

GIT-SUBPROJECT-VARS.mak: Makefile
	@rm -rf GIT-SUBPROJECT-VARS.mak
	@echo "REPOSITORY := $(REPOSITORY)" > GIT-SUBPROJECT-VARS.mak
	@echo "IDENT := $(IDENT)" >> GIT-SUBPROJECT-VARS.mak

# SUBPROJECTLIST has the following format:
#	Each "word" in the list starts with a scheme identifier followed
#	by a ':'. The remainder of the "word" is the scheme specific
#	subproject details.
#   Scheme: GIT
#	The scheme specific subproject details are 3 fields separated by
#	':'s.
#	Field 1: Version reference - reference to checkout and fetch
#		from.
#	Field 2: Development branch ref name.
#	Field 3: Sub directory for the subproject.

GIT--subproject--setup::
	for SUBPROJECT in $(filter GIT:%,$(SUBPROJECTLIST)) ; do \
	  PARAM=($$(echo $$SUBPROJECT | tr ':' ' ')) ; \
	  $(GIT-CLONE) -s -n $(REPOSITORY) $${PARAM[3]} \
	  && ( cd $${PARAM[3]} \
	    && $(GIT-CHECKOUT) -b $${PARAM[2]}--local $${PARAM[1]} ; \
	  ) ; \
	done

GIT--subproject--fetch::
	@for SUBPROJECT in $(filter GIT:%,$(SUBPROJECTLIST)) ; do \
	  PARAM=($$(echo $$SUBPROJECT | tr ':' ' ')) ; \
	  ( cd $${PARAM[3]} \
	    && $(GIT-FETCH) $(REPOSITORY) $${PARAM[1]} ; \
	  ) ; \
	done

GIT--subproject--pull::
	@for SUBPROJECT in $(filter GIT:%,$(SUBPROJECTLIST)) ; do \
	  PARAM=($$(echo $$SUBPROJECT | tr ':' ' ')) ; \
	  ( cd $${PARAM[3]} \
	    && $(GIT-FETCH) $(REPOSITORY) $${PARAM[1]} \
	    && $(GIT-PULL) --no-commit . $${PARAM[1]} ; \
	  ) ; \
	done

GIT--subproject--push::
	@for SUBPROJECT in $(filter GIT:%,$(SUBPROJECTLIST)) ; do \
	  PARAM=($$(echo $$SUBPROJECT | tr ':' ' ')) ; \
	  ( cd $${PARAM[3]} \
	    && $(GIT-PUSH) $(REPOSITORY) \
	      $${PARAM[2]}--local:$${PARAM[2]}--$(IDENT) ; \
	  ) ; \
	done

 ---->8----  ---->8----  ---->8----  ---->8----  ---->8----  ---->8----

The example Makefile code has a number of limitations: There is no error
handling. It only handles Git managed subprojects. There are other Git
commands that can be usefully applied to subprojects.
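
One way to approach the missing error handling is to validate each
SUBPROJECTLIST word before acting on it. The following is a sketch
only: parse_subproject is a hypothetical helper, and it splits the word
with plain POSIX shell instead of the bash arrays used in the recipes.

```shell
# parse_subproject WORD
# Splits one SUBPROJECTLIST word of the form
#   GIT:<version>:<dev-branch>:<dir>
# and prints "<version> <dev-branch> <dir>". Fails on any other shape.
parse_subproject() {
	case $1 in
		GIT:*:*:*) ;;
		*) echo "unsupported subproject entry: $1" >&2; return 1 ;;
	esac
	# Split on ':' into the positional parameters.
	old_ifs=$IFS
	IFS=:
	set -- $1
	IFS=$old_ifs
	# Exactly four fields are expected, none of them empty.
	[ "$#" -eq 4 ] && [ -n "$2" ] && [ -n "$3" ] && [ -n "$4" ] || return 1
	printf '%s %s %s\n' "$2" "$3" "$4"
}
```

A recipe could call this for every word and stop at the first entry
that fails, rather than running Git with empty arguments.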


