* .gitlink for Summer of Code
@ 2007-03-25 12:30 Eric Lesh
  2007-03-25 15:20 ` Matthieu Moy
  2007-03-25 20:46 ` Shawn O. Pearce
  0 siblings, 2 replies; 61+ messages in thread
From: Eric Lesh @ 2007-03-25 12:30 UTC (permalink / raw)
  To: git

I would like to tackle .gitlink for Summer of Code.  The deadline is
about a day away, but this is a chance to make sure you still think
.gitlink is a good idea, and to see if there are any big problems with
the idea in general (before a more detailed proposal is actually submitted).

.gitlink is for a lightweight checkout of a branch into a separate
directory on the local filesystem.  A .gitlink'ed checkout has its own
index+HEAD, but otherwise refers back to the main repository for
objects, refs, etc.

Junio has said (http://www.spinics.net/lists/git/msg24964.html) he
works in a similar way with many work trees which are symlinked to his
main repo:
: gitster git.wk0; ls -l .git/
total 120
drwxrwsr-x  3 junio src  4096 Mar  5 16:22 ./
drwxrwsr-x 15 junio src 16384 Mar  5 16:23 ../
-rw-rw-r--  1 junio src    41 Mar  5 16:22 HEAD
lrwxrwxrwx  1 junio src    27 Mar  3 22:53 config -> /src/git/.git/config
lrwxrwxrwx  1 junio src    26 Mar  3 22:53 hooks -> /src/git/.git/hooks/
-rw-rw-r--  1 junio src 82455 Mar  5 16:22 index
lrwxrwxrwx  1 junio src    25 Mar  3 22:53 info -> /src/git/.git/info/
drwxrwsr-x  3 junio src  4096 Mar  3 22:59 logs/
lrwxrwxrwx  1 junio src    28 Mar  3 22:53 objects -> /src/git/.git/objects/
lrwxrwxrwx  1 junio src    32 Mar  3 22:53 packed-refs -> /src/git/.git/packed-refs
lrwxrwxrwx  1 junio src    25 Mar  3 22:53 refs -> /src/git/.git/refs/
lrwxrwxrwx  1 junio src    28 Mar  3 22:53 remotes -> /src/git/.git/remotes/
lrwxrwxrwx  1 junio src    29 Mar  3 22:53 rr-cache -> /src/git/.git/rr-cache/
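
A layout like the one listed above could be built by hand roughly as follows. This is only a sketch: make_linked_gitdir is a hypothetical helper name, and the split between shared (symlinked) and private (HEAD, index, logs) entries is read off the listing rather than taken from any existing tool.

```shell
# make_linked_gitdir MAIN_GITDIR WORKTREE
# Hypothetical helper: creates WORKTREE/.git with symlinks back to
# MAIN_GITDIR for everything except the per-checkout state.
make_linked_gitdir() {
	main_git=$1	# e.g. /src/git/.git
	worktree=$2	# e.g. /src/git.wk0
	# logs/ is a real directory in the listing, so create it here.
	mkdir -p "$worktree/.git/logs" || return 1
	# Shared pieces: symlinked back to the main repository.
	for name in config hooks info objects packed-refs refs remotes rr-cache
	do
		ln -s "$main_git/$name" "$worktree/.git/$name" || return 1
	done
	# Private pieces: each work tree keeps its own HEAD; the index is
	# written later by the first checkout, so it is not created here.
	cp "$main_git/HEAD" "$worktree/.git/HEAD"
}
```

For example, make_linked_gitdir /src/git/.git /src/git.wk0 would give a .git directory whose objects, refs and config all point at the main repository.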

A .gitlink could simplify this, ridding the directory of symlinks and
instead having a .gitlink file that specifies a $GIT_DIR to which
everything refers.
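
How a tool might locate the $GIT_DIR through such a file can be sketched as below. The on-disk format of .gitlink is an assumption here (a single line holding the path), and find_gitlink_dir is a hypothetical name; nothing in git implements this.

```shell
# find_gitlink_dir START_DIR
# Hypothetical: walk upwards from START_DIR (must be an absolute path)
# until a .gitlink file is found, then print its first line, which this
# sketch assumes to be the path of the $GIT_DIR to use.
find_gitlink_dir() {
	dir=$1
	# Refuse relative paths so the upward walk is guaranteed to end at /.
	case $dir in
	/*) ;;
	*) return 1 ;;
	esac
	while :
	do
		if [ -f "$dir/.gitlink" ]; then
			target=
			# read may fail on a missing final newline; the value is still set.
			IFS= read -r target <"$dir/.gitlink" || :
			printf '%s\n' "$target"
			return 0
		fi
		[ "$dir" = / ] && return 1
		dir=$(dirname "$dir")
	done
}
```

A wrapper could then run something like GIT_DIR=$(find_gitlink_dir "$PWD") before invoking other tools, which is what lets the checkout be moved without breaking the link (as long as the stored path is absolute).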

My implementation will be based on Josef Weidendorfer's "[RFC]
Lightweight checkouts via ".gitlink""
(http://thread.gmane.org/gmane.comp.version-control.git/33755), which
didn't receive much objection (or comment at all).  Hopefully this
doesn't mean you simply aren't interested.

Goals:
  o Lightweight checkouts are essentially branches that reside outside of the
    main repository on the local filesystem.
  o They act as normal git checkouts (i.e. git tools know about .gitlink)
  o They can be moved around and maintain their link to the base repo
  o They can exist within other checkouts (for eventual submodule support)
  o git-clone --light-weight option to set this up
  o porcelains (i.e. cogito) don't barf

Is this something that you would like to see?  Any other comments?

Thanks,

Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-25 12:30 .gitlink for Summer of Code Eric Lesh
@ 2007-03-25 15:20 ` Matthieu Moy
  2007-03-25 20:39   ` Shawn O. Pearce
                     ` (2 more replies)
  2007-03-25 20:46 ` Shawn O. Pearce
  1 sibling, 3 replies; 61+ messages in thread
From: Matthieu Moy @ 2007-03-25 15:20 UTC (permalink / raw)
  To: git

Eric Lesh <eclesh@ucla.edu> writes:

> .gitlink is for a lightweight checkout of a branch into a separate
> directory on the local filesystem.

I think it's a pity to restrict yourself to _local_ filesystem. There
are tons of cases where you have a fast, non-NFS, access to a machine
and would like to host your repository there.

That said, I suppose removing this restriction moves the solution from
the category "quick and efficient hack" to something much harder.

> A .gitlink'ed checkout has its own index+HEAD, but otherwise refers
> back to the main repository for objects, refs, etc.

Stupid question: why .gitlink, and not .git/link or so? This file is
not versioned, I don't think it should be in the working tree.

-- 
Matthieu


* Re: .gitlink for Summer of Code
  2007-03-25 15:20 ` Matthieu Moy
@ 2007-03-25 20:39   ` Shawn O. Pearce
  2007-03-25 20:54     ` Johannes Schindelin
  2007-03-25 20:55     ` Junio C Hamano
  2007-03-26 17:16   ` Eric Lesh
  2007-03-26 17:31   ` Jakub Narebski
  2 siblings, 2 replies; 61+ messages in thread
From: Shawn O. Pearce @ 2007-03-25 20:39 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git, Eric Lesh

Matthieu Moy <Matthieu.Moy@imag.fr> wrote:
> Eric Lesh <eclesh@ucla.edu> writes:
> 
> > .gitlink is for a lightweight checkout of a branch into a separate
> > directory on the local filesystem.
> 
> I think it's a pity to restrict yourself to _local_ filesystem. There
> are tons of cases where you have a fast, non-NFS, access to a machine
> and would like to host your repository there.
> 
> That said, I suppose removing this restriction moves the solution from
> the category "quick and efficient hack" to something much harder.

Yes.  But there's another project on the ideas list that addresses
that (``Lazy Clone'').  It is quite a bit more difficult than the
.gitlink idea as the implementation requires a network protocol
client implemented somewhere near the read_sha1_file interface.

Junio and I had talked about this (I think it was on the list,
but maybe it was on #git) recently and considered maybe trying to
do it as a wrapper *above* read_sha1_file/has_sha1_file, and adjust
the clients that call them to invoke the wrapper instead.

Obviously nothing will beat having the file on a local disk; but if
you have a fast LAN it may be hard to beat an anonymous TCP socket
dedicated to serving Git objects.  Such a socket may even beat out
NFS...  ;-)
 
> > A .gitlink'ed checkout has its own index+HEAD, but otherwise refers
> > back to the main repository for objects, refs, etc.
> 
> Stupid question: why .gitlink, and not .git/link or so? This file is
> not versioned, I don't think it should be in the working tree.

I've thought the same thing.

Actually, I'd almost say put it into .git/config, e.g.:

	mkdir .git
	cat >.git/config <<EOF
	[core]
		repositoryversion = 0
		filemode = true
		link = /path/to/source
	EOF

as then the index and HEAD file can both be stored in .git, just
like with the non-gitlink case.

-- 
Shawn.


* Re: .gitlink for Summer of Code
  2007-03-25 12:30 .gitlink for Summer of Code Eric Lesh
  2007-03-25 15:20 ` Matthieu Moy
@ 2007-03-25 20:46 ` Shawn O. Pearce
  1 sibling, 0 replies; 61+ messages in thread
From: Shawn O. Pearce @ 2007-03-25 20:46 UTC (permalink / raw)
  To: Eric Lesh; +Cc: git

Eric Lesh <eclesh@ucla.edu> wrote:
> I would like to tackle .gitlink for Summer of Code.  The deadline is
> about a day away, but this is a chance to make sure you still think
> .gitlink is a good idea, and to see if there are any big problems with
> the idea in general (before a more detailed proposal is actually submitted).
...
> Is this something that you would like to see?  Any other comments?

Aside from the other fork of this thread discussing the ``Lazy
Clone'' idea, I don't really have any other comments.

I don't use the same setup as Junio, but I have the same problem.
My day job setup has at least 5 copies of the same repository.
I'd like an easy way to have them share the same odb, refs,
and config.

If you choose to submit an application please see our template
(http://git.or.cz/gitwiki/SoC2007Template) and try to talk a bit
about your background too.

-- 
Shawn.


* Re: .gitlink for Summer of Code
  2007-03-25 20:39   ` Shawn O. Pearce
@ 2007-03-25 20:54     ` Johannes Schindelin
  2007-03-25 21:03       ` Shawn O. Pearce
  2007-03-25 20:55     ` Junio C Hamano
  1 sibling, 1 reply; 61+ messages in thread
From: Johannes Schindelin @ 2007-03-25 20:54 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Matthieu Moy, git, Eric Lesh

Hi,

On Sun, 25 Mar 2007, Shawn O. Pearce wrote:

> Matthieu Moy <Matthieu.Moy@imag.fr> wrote:
> > Eric Lesh <eclesh@ucla.edu> writes:
> > 
> > > .gitlink is for a lightweight checkout of a branch into a separate
> > > directory on the local filesystem.
> > 
> > I think it's a pity to restrict yourself to _local_ filesystem. There
> > are tons of cases where you have a fast, non-NFS, access to a machine
> > and would like to host your repository there.
> > 
> > That said, I suppose removing this restriction moves the solution from
> > the category "quick and efficient hack" to something much harder.
> 
> Yes.  But there's another project on the ideas list that addresses
> that (``Lazy Clone'').  It is quite a bit more difficult than the
> .gitlink idea as the implementation requires a network protocol
> client implemented somewhere near the read_sha1_file interface.

Not only that. You'd have to change the way read_sha1_file() is called to 
allow fetching more than one object at a time. Otherwise this will be so 
slow as to be unusable.

That's basically the reason why I changed my mind, and preferred shallow 
clones over lazy clones.

Ciao,
Dscho


* Re: .gitlink for Summer of Code
  2007-03-25 20:39   ` Shawn O. Pearce
  2007-03-25 20:54     ` Johannes Schindelin
@ 2007-03-25 20:55     ` Junio C Hamano
  2007-03-25 21:05       ` Shawn O. Pearce
  2007-03-27  3:40       ` Petr Baudis
  1 sibling, 2 replies; 61+ messages in thread
From: Junio C Hamano @ 2007-03-25 20:55 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Matthieu Moy, git, Eric Lesh

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Actually, I'd almost say put it into .git/config, e.g.:
>
> 	mkdir .git
> 	cat >.git/config <<EOF
> 	[core]
> 		repositoryversion = 0
> 		filemode = true
> 		link = /path/to/source
> 	EOF
>
> as then the index and HEAD file can both be stored in .git, just
> like with the non-gitlink case.

This is not usable at least for me as it does not allow sharing
the .git/config file across checkouts.


* Re: .gitlink for Summer of Code
  2007-03-25 20:54     ` Johannes Schindelin
@ 2007-03-25 21:03       ` Shawn O. Pearce
  0 siblings, 0 replies; 61+ messages in thread
From: Shawn O. Pearce @ 2007-03-25 21:03 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Matthieu Moy, git, Eric Lesh

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Sun, 25 Mar 2007, Shawn O. Pearce wrote:
> > Yes.  But there's another project on the ideas list that addresses
> > that (``Lazy Clone'').  It is quite a bit more difficult than the
> > .gitlink idea as the implementation requires a network protocol
> > client implemented somewhere near the read_sha1_file interface.
> 
> Not only that. You'd have to change the way read_sha1_file() is called to 
> allow fetching more than one object at a time. Otherwise this will be so 
> slow as to be unusable.

Yes, and no. ;-)

Let's say we put a repository on an NFS share, and clone it using
--shared.  So it's now an alternate ODB.  And read_sha1_file()
is now doing synchronous reads, unless the client has something
cached.  Which we could just as easily cache ourselves in the
loose object directory.

We could make it faster by batching up requests.  But batching
requests may be difficult in some contexts, as we don't know what
else we need until we get back that commit or tree we are trying
to read.  ;-)
 
-- 
Shawn.


* Re: .gitlink for Summer of Code
  2007-03-25 20:55     ` Junio C Hamano
@ 2007-03-25 21:05       ` Shawn O. Pearce
  2007-03-27  3:40       ` Petr Baudis
  1 sibling, 0 replies; 61+ messages in thread
From: Shawn O. Pearce @ 2007-03-25 21:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthieu Moy, git, Eric Lesh

Junio C Hamano <junkio@cox.net> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > Actually, I'd almost say put it into .git/config, e.g.:
> >
> > 	mkdir .git
> > 	cat >.git/config <<EOF
> > 	[core]
> > 		repositoryversion = 0
> > 		filemode = true
> > 		link = /path/to/source
> > 	EOF
> >
> > as then the index and HEAD file can both be stored in .git, just
> > like with the non-gitlink case.
> 
> This is not usable at least for me as it does not allow sharing
> the .git/config file across checkouts.

Me either.

What I thought of after writing that was that core.link should
also imply reading ${core.link}/config before .git/config, so that
the repository config can override the master repository, but the
master repository provides the bulk of the configuration.
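
The lookup order described above could be sketched like this. core.link is only a proposal in this thread, read_core_key and lookup_linked are hypothetical helpers, and the awk parser is deliberately naive (real git config syntax, with quoting, subsections and case-insensitive keys, is richer than this).

```shell
# read_core_key FILE KEY
# Print the value of KEY from the [core] section of FILE; fail if absent.
read_core_key() {
	awk -v key="$2" '
		/^[ \t]*\[/ { in_core = ($0 ~ /^[ \t]*\[core\][ \t]*$/); next }
		in_core {
			line = $0
			sub(/^[ \t]+/, "", line)
			eq = index(line, "=")
			if (eq == 0) next
			name = substr(line, 1, eq - 1)
			sub(/[ \t]+$/, "", name)
			if (name != key) next
			value = substr(line, eq + 1)
			sub(/^[ \t]+/, "", value)
			found = 1
		}
		END { if (found) print value; else exit 1 }
	' "$1"
}

# lookup_linked GITDIR KEY
# The checkout's own config wins; otherwise fall back to the config of
# the repository named by the proposed core.link setting.
lookup_linked() {
	gitdir=$1
	key=$2
	if read_core_key "$gitdir/config" "$key"; then
		return 0
	fi
	link=$(read_core_key "$gitdir/config" link) || return 1
	read_core_key "$link/config" "$key"
}
```

Checking the local file first and falling back to ${core.link}/config gives the same result as reading the master config first and letting the local one override it, for single-valued keys.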

-- 
Shawn.


* Re: .gitlink for Summer of Code
  2007-03-25 15:20 ` Matthieu Moy
  2007-03-25 20:39   ` Shawn O. Pearce
@ 2007-03-26 17:16   ` Eric Lesh
  2007-03-26 17:22     ` Matthieu Moy
  2007-03-26 17:31   ` Jakub Narebski
  2 siblings, 1 reply; 61+ messages in thread
From: Eric Lesh @ 2007-03-26 17:16 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

On Sun, 2007-03-25 at 17:20 +0200, Matthieu Moy wrote:
> I think it's a pity to restrict yourself to _local_ filesystem. There
> are tons of cases where you have a fast, non-NFS, access to a machine
> and would like to host your repository there.
> 
> That said, I suppose removing this restriction moves the solution from
> the category "quick and efficient hack" to something much harder.

For now, this is really meant for quick and easy access to multiple
branches of your own repo at the same time.


> 
> Stupid question: why .gitlink, and not .git/link or so? This file is
> not versioned, I don't think it should be in the working tree.
> 

There is no .git dir for these.  The .gitlink refers back to the main
repository's .git/external/$submodule, which is a full .git dir with
index+HEAD and symlinks back to the .git of the main repo for everything
else.

-Eric


* Re: .gitlink for Summer of Code
  2007-03-26 17:16   ` Eric Lesh
@ 2007-03-26 17:22     ` Matthieu Moy
  2007-03-26 17:38       ` Eric Lesh
  0 siblings, 1 reply; 61+ messages in thread
From: Matthieu Moy @ 2007-03-26 17:22 UTC (permalink / raw)
  To: git

Eric Lesh <eclesh@ucla.edu> writes:

> There is no .git dir for these.  The .gitlink refers back to the main
> repository's .git/external/$submodule, which is a full .git dir with
> index+HEAD and symlinks back to the .git of the main repo for everything
> else.

I don't see any contradiction.

Light checkouts would have an almost empty .git (it still needs an
index, and its own head anyway) and use the .git of the main repo for
everything else.

-- 
Matthieu


* Re: .gitlink for Summer of Code
  2007-03-25 15:20 ` Matthieu Moy
  2007-03-25 20:39   ` Shawn O. Pearce
  2007-03-26 17:16   ` Eric Lesh
@ 2007-03-26 17:31   ` Jakub Narebski
  2007-03-26 18:21     ` Matthieu Moy
  2 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-03-26 17:31 UTC (permalink / raw)
  To: git

Matthieu Moy wrote:

> Stupid question: why .gitlink, and not .git/link or so? This file is
> not versioned, I don't think it should be in the working tree.

It would be special-cased: just as the .git directory is not versioned,
.gitlink would not be versioned either (unlike .gitignore).

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: .gitlink for Summer of Code
  2007-03-26 17:22     ` Matthieu Moy
@ 2007-03-26 17:38       ` Eric Lesh
  2007-03-26 18:35         ` Martin Waitz
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Lesh @ 2007-03-26 17:38 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

On Mon, 2007-03-26 at 19:22 +0200, Matthieu Moy wrote:
> 
> I don't see any contradiction.
> 
> Light checkouts would have an almost empty .git (it still needs an
> index, and its own head anyway) and use the .git of the main repo for
> everything else.
> 

Josef Weidendorfer tried to implement this before, and he concluded that
having a _text file_ .git, instead of a directory, would be a good way
to distinguish .gitlinked checkouts from normal checkouts.

As for .git/link, that would seem to work better.  As long as there
is a sanity check to make sure that you don't manage to mix things up,
it might be fine.

-Eric


* Re: .gitlink for Summer of Code
  2007-03-26 17:31   ` Jakub Narebski
@ 2007-03-26 18:21     ` Matthieu Moy
  2007-03-27  0:48       ` Jakub Narebski
  0 siblings, 1 reply; 61+ messages in thread
From: Matthieu Moy @ 2007-03-26 18:21 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebski

Jakub Narebski <jnareb@gmail.com> writes:

> Matthieu Moy wrote:
>
>> Stupid question: why .gitlink, and not .git/link or so? This file is
>> not versioned, I don't think it should be in the working tree.
>
> It would be special-cased: just as the .git directory is not versioned,
> .gitlink would not be versioned either (unlike .gitignore).

That's how I understand it, but why 2 special cases (.git and
.gitlink) when you need one?

-- 
Matthieu


* Re: .gitlink for Summer of Code
  2007-03-26 17:38       ` Eric Lesh
@ 2007-03-26 18:35         ` Martin Waitz
  2007-03-26 19:33           ` Josef Weidendorfer
  0 siblings, 1 reply; 61+ messages in thread
From: Martin Waitz @ 2007-03-26 18:35 UTC (permalink / raw)
  To: Eric Lesh; +Cc: Matthieu Moy, git


hoi :)

On Mon, Mar 26, 2007 at 10:38:08AM -0700, Eric Lesh wrote:
> > Light checkouts would have an almost empty .git (it still needs an
> > index, and its own head anyway) and use the .git of the main repo for
> > everything else.
> > 
> 
> Josef Weidendorfer tried to implement this before, and he concluded that
> having a _text file_ .git, instead of a directory, would be a good way
> to distinguish .gitlinked checkouts from normal checkouts.

but that does not allow for per-checkout HEAD and index.
I don't see any reason for providing any sort of "gitlink" which
also uses HEAD and index from the linked location -- then you could
use a simple symlink, too.  So having an almost empty .git directory
and reusing parts from another .git directory makes a lot of sense to
me.

-- 
Martin Waitz


* Re: .gitlink for Summer of Code
  2007-03-26 18:35         ` Martin Waitz
@ 2007-03-26 19:33           ` Josef Weidendorfer
  2007-03-26 19:49             ` Matthieu Moy
  2007-03-26 22:03             ` Martin Waitz
  0 siblings, 2 replies; 61+ messages in thread
From: Josef Weidendorfer @ 2007-03-26 19:33 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Eric Lesh, Matthieu Moy, git

On Monday 26 March 2007, Martin Waitz wrote:
> hoi :)
> 
> On Mon, Mar 26, 2007 at 10:38:08AM -0700, Eric Lesh wrote:
> > > Light checkouts would have an almost empty .git (it still needs an
> > > index, and its own head anyway) and use the .git of the main repo for
> > > everything else.
> > > 
> > 
> > Josef Weidendorfer tried to implement this before, and he concluded that
> > having a _text file_ .git, instead of a directory, would be a good way
> > to distinguish .gitlinked checkouts from normal checkouts.
> 
> but that does not allow for per-checkout HEAD and index.
> I don't see any reason for providing any sort of "gitlink" which
> also uses HEAD and index from the linked location -- then you could
> use a simple symlink, too.

The idea was to make this a possible building block for submodules.
A simple symlink does not work there when you want the checkout to
work even after moving the whole checkout directory around (e.g. to move the
submodule around inside of the superproject).

> So having an almost empty .git directory 
> and reusing parts from another .git directory makes a lot of sense to
> me.

This would work. However, you can not clone from an almost empty .git
directory with current git.

The original proposal was to have a standard .git directory for every
light-weight checkout inside of the base .git directory, e.g.
in <base>/.git/ext/<name>.git where <name> is some identifier for the
lightweight checkout, either provided in the .gitlink file or
automatically determined.

Hmm... the "almost empty .git directory" has its merits.
You can override config options, and of course, the "base" for the
lightweight checkout still can be a full .git dir, as would be needed
for submodule support. In fact, you have more freedom to choose the
path to the base gitdir.
I like it ;-)

So this changes the .gitlink proposal to:
* smartly reset GIT_DIR when a core.link option is set in .git/config
  (and set GIT_WORK_DIR accordingly)
* fake a core.name if not set (this comes from the original proposal
  to get an automatic identifier of a submodule checkout by its
  relative path to the supermodule)
* allow git-checkout to create a fresh light-weight checkout
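
What such a fresh light-weight checkout might contain on disk can be sketched as follows. init_light_checkout is a hypothetical helper, core.link is the option proposed in this thread (git does not read it), and the base path is assumed to name the base repository's git dir.

```shell
# init_light_checkout BASE_GITDIR DEST BRANCH
# Hypothetical: create the "almost empty" .git of a light-weight checkout.
init_light_checkout() {
	base_gitdir=$1
	dest=$2
	branch=$3
	mkdir -p "$dest/.git" || return 1
	# Private state: the checkout's own HEAD. The index would be written
	# by the first real checkout, so it is not created here.
	printf 'ref: refs/heads/%s\n' "$branch" >"$dest/.git/HEAD" || return 1
	# Everything else (objects, refs, ...) would be found through the
	# proposed core.link setting instead of being copied or symlinked.
	printf '[core]\n\tlink = %s\n' "$base_gitdir" >"$dest/.git/config"
}
```

Because the base repository is never written to, this also works when the base is read-only, and deleting the checkout is a plain rm -rf of DEST.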

Josef


* Re: .gitlink for Summer of Code
  2007-03-26 19:33           ` Josef Weidendorfer
@ 2007-03-26 19:49             ` Matthieu Moy
  2007-03-26 23:14               ` Josef Weidendorfer
  2007-03-26 22:03             ` Martin Waitz
  1 sibling, 1 reply; 61+ messages in thread
From: Matthieu Moy @ 2007-03-26 19:49 UTC (permalink / raw)
  To: git

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> writes:

> This would work. However, you can not clone from an almost empty .git
> directory with current git.

Of course not. If you say "have .git/ <=> is a repository", then your
lightweight checkout should not have a .git/. But why should anything
that has a .git/ directory be a repository?

> The original proposal was to have a standard .git directory for every
> light-weight checkout inside of the base .git directory, e.g.
> in <base>/.git/ext/<name>.git where <name> is some identifier for the
> lightweight checkout, either provided in the .gitlink file or
> automatically determined.

That seems really weird. That implies for example:

* Deleting a checkout means deleting both your local tree _and_ a part
  of the .git/ directory of the repository. Have you ever imagined
  having to do more than "rm -fr working-tree" even with an inferior
  VCS such as CVS?

* It makes it impossible to have a checkout of a read-only location.
  For example, if one of my colleagues has a repository in
  /home/otheruser/repo/, if I want to get a working tree of it, I need
  to get a complete clone of his repo to be able to do it. Assuming
  someone ever implements a "lightweight checkout of a remote
  location" (bzr has this, for example: you can run
  "bzr checkout --lightweight http://whatever.com/"), this would mean
  creating a directory on the server for each of the potential
  clients, not to mention the impossibility of doing it over http.

* You have to manage a name for each lightweight checkout. What would
  be such name? User-provided? uuidgen-like?


I find the way bzr deals with this pretty elegant: a repository with a
working tree is just a working tree and a repository located in the
same directory. The repository stores its files (content of each
revisions in history, ...) in .bzr/repository/, and the working tree
stores them (the index, pending merges, ... and pointer to the
corresponding branch) in .bzr/checkout/.

-- 
Matthieu


* Re: .gitlink for Summer of Code
  2007-03-26 19:33           ` Josef Weidendorfer
  2007-03-26 19:49             ` Matthieu Moy
@ 2007-03-26 22:03             ` Martin Waitz
  2007-03-26 22:51               ` Junio C Hamano
  2007-03-26 23:00               ` Josef Weidendorfer
  1 sibling, 2 replies; 61+ messages in thread
From: Martin Waitz @ 2007-03-26 22:03 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Eric Lesh, Matthieu Moy, git


On Mon, Mar 26, 2007 at 09:33:44PM +0200, Josef Weidendorfer wrote:
> The idea was to make this a possible building block for submodules.
> A simple symlink does not work there when you want the checkout to
> work even after moving the whole checkout directory around (e.g. to move the
> submodule around inside of the superproject).

Well the submodule use case is a bit different than the lightweight
checkout.
When you store the submodule object database inside the supermodule then
you only need to store the position of the submodule relative to its
supermodule.  As you wrote this is necessary in order to find the part
of the object database which belongs to this one submodule.
Finding the supermodule repository is obviously not difficult, only
finding the right part of it.
But for lightweight checkouts you need something which is closer to a
symlink.

> > So having an almost empty .git directory 
> > and reusing parts from another .git directory makes a lot of sense to
> > me.
> 
> This would work. However, you can not clone from an almost empty .git
> directory with current git.

You can't clone from a .gitlink with current git, either ;-).
But if you e.g. set git_dir according to your link then everything
should work quite easily.

> The original proposal was to have a standard .git directory for every
> light-weight checkout inside of the base .git directory, e.g.
> in <base>/.git/ext/<name>.git where <name> is some identifier for the
> lightweight checkout, either provided in the .gitlink file or
> automatically determined.

What would you store in these per-checkout directories?
The index and HEAD?  Anything more?
For submodules I currently use <parent>/.git/objects/module/<submodule>/
to store the objects belonging to the submodule.
Perhaps it makes sense to extend this to a full .git directory per
submodule, I'm not yet decided on that.
For submodules the object store has to be different, but for normal
lightweight checkout this should of course be shared.

-- 
Martin Waitz


* Re: .gitlink for Summer of Code
  2007-03-26 22:03             ` Martin Waitz
@ 2007-03-26 22:51               ` Junio C Hamano
  2007-03-26 23:16                 ` Submodule object store Martin Waitz
  2007-03-26 23:17                 ` .gitlink for Summer of Code Josef Weidendorfer
  2007-03-26 23:00               ` Josef Weidendorfer
  1 sibling, 2 replies; 61+ messages in thread
From: Junio C Hamano @ 2007-03-26 22:51 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

Martin Waitz <tali@admingilde.org> writes:

> For submodules I currently use <parent>/.git/objects/module/<submodule>/
> to store the objects belonging to the submodule.

I was not following the gitlink discussion closely, but what is
the motivation behind this separation of the object store?


* Re: .gitlink for Summer of Code
  2007-03-26 22:03             ` Martin Waitz
  2007-03-26 22:51               ` Junio C Hamano
@ 2007-03-26 23:00               ` Josef Weidendorfer
  2007-03-26 23:27                 ` Martin Waitz
  1 sibling, 1 reply; 61+ messages in thread
From: Josef Weidendorfer @ 2007-03-26 23:00 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Eric Lesh, Matthieu Moy, git

On Tuesday 27 March 2007, Martin Waitz wrote:
> On Mon, Mar 26, 2007 at 09:33:44PM +0200, Josef Weidendorfer wrote:
> > The idea was to make this a possible building block for submodules.
> > A simple symlink does not work there when you want the checkout to
> > work even after moving the whole checkout directory around (e.g. to move the
> > submodule around inside of the superproject).
> 
> Well the submodule use case is a bit different than the lightweight
> checkout.
> When you store the submodule object database inside the supermodule then
> you only need to store the position of the submodule relative to its
> supermodule.  As you wrote this is necessary in order to find the part
> of the object database which belongs to this one submodule.

Where do you store this in your module3 branch?

> Finding the supermodule repository is obviously not difficult, only
> finding the right part of it.
> But for lightweight checkouts you need something which is closer to a
> symlink.

Yes, of course.

> > > So having an almost empty .git directory 
> > > and reusing parts from another .git directory makes a lot of sense to
> > > me.
> > 
> > This would work. However, you can not clone from an almost empty .git
> > directory with current git.
> 
> You can't clone from a .gitlink with current git, either ;-).
> But if you e.g. set git_dir according to your link then everything
> should work quite easily.

Yes.

> > The original proposal was to have a standard .git directory for every
> > light-weight checkout inside of the base .git directory, e.g.
> > in <base>/.git/ext/<name>.git where <name> is some identifier for the
> > lightweight checkout, either provided in the .gitlink file or
> > automatically determined.
> 
> What would you store in these per-checkout directories?
> The index and HEAD?  Anything more?

To make it easy to implement, I thought about a standard .git layout,
with most directories being symlinks.

> For submodules I currently use <parent>/.git/objects/module/<submodule>/
> to store the objects belonging to the submodule.
> Perhaps it makes sense to extend this to a full .git directory per
> submodule, I'm not yet decided on that.

IMHO this would be a nice property. As the submodule could exist independently
with its own remote heads/tags, you probably would want to at least track these,
even if it is a submodule in your superproject.
And then it makes sense to move it directly to .git/module/...

There also was a use case where one library project is used in >10
superprojects. It would be nice to be able to make the submodule git dir
be outside of the supermodule's git dir. However, this can also be done
with symlinks without any special support (aside from sharing the
head namespace).

Josef

> For submodules the object store has to be different, but for normal
> lightweight checkout this should of course be shared.


* Re: .gitlink for Summer of Code
  2007-03-26 19:49             ` Matthieu Moy
@ 2007-03-26 23:14               ` Josef Weidendorfer
  2007-03-27 16:59                 ` Matthieu Moy
  0 siblings, 1 reply; 61+ messages in thread
From: Josef Weidendorfer @ 2007-03-26 23:14 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

On Monday 26 March 2007, Matthieu Moy wrote:
> Josef Weidendorfer <Josef.Weidendorfer@gmx.de> writes:
> > The original proposal was to have a standard .git directory for every
> > light-weight checkout inside of the base .git directory, e.g.
> > in <base>/.git/ext/<name>.git where <name> is some identifier for the
> > lightweight checkout, either provided in the .gitlink file or
> > automatically determined.
> 
> That seems really weird. That implies for example:
> [... some good reasons to not do it this way ...]

Ok, you are right.
It is better to not touch the original repository
for lightweight checkouts.

> * You have to manage a name for each lightweight checkout. What would
>   be such name? User-provided? uuidgen-like?

Such a name is interesting as identifier for submodules.
It would be the relative path of the submodule from the supermodule base;
or user supplied.

Lightweight checkouts and submodules have different requirements;
yet lightweight checkouts should be flexible enough to be
used for submodule checkouts.

Josef

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Submodule object store
  2007-03-26 22:51               ` Junio C Hamano
@ 2007-03-26 23:16                 ` Martin Waitz
  2007-03-26 23:28                   ` Junio C Hamano
  2007-03-26 23:17                 ` .gitlink for Summer of Code Josef Weidendorfer
  1 sibling, 1 reply; 61+ messages in thread
From: Martin Waitz @ 2007-03-26 23:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 1768 bytes --]

hoi :)

On Mon, Mar 26, 2007 at 03:51:21PM -0700, Junio C Hamano wrote:
> I was not following the gitlink discussion closely, but what is
> the motivation behind this separation of the object store?

Mostly scalability.
Some operations need to traverse all objects and this may be prohibitive
in large repositories.  So the traversal has to be split somehow to not
require all objects to be loaded into RAM at the same time.
The most natural separation is at the submodule level.
The idea is to collect all the submodule references in the supermodule
and then traverse the submodule independently with the saved references.

To make purge and fsck work we have the hard requirement that it
must be possible to list all objects which belong to one submodule.
As modules (rightfully!) don't have any project ID, we need some other
means to map submodule tree entries which are stored in the supermodule to
the corresponding object store.
The simplest way is to use the location of the submodule within the
parent to find the submodule object store.  prune and fsck can then use
a path-limited commit traversal in the parent to get all relevant
submodule references.
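
As a rough illustration of the path-limited traversal described above, here
is a toy model in Python. It is only a sketch under stated assumptions: the
`history` list, the `("commit", sha)` entry shape and the function name
`submodule_refs` are all hypothetical and do not correspond to git's real
data structures or API.

```python
from typing import Dict, List, Set, Tuple

# Toy model (assumption): every supermodule commit is flattened to a mapping
# from path to (kind, id). A submodule shows up as ("commit", <sha>), a
# regular file as ("blob", <sha>).
Tree = Dict[str, Tuple[str, str]]


def submodule_refs(history: List[Tree], path: str) -> Set[str]:
    """Collect every submodule commit recorded at `path` in the history."""
    refs: Set[str] = set()
    for tree in history:
        entry = tree.get(path)
        # Only entries that point at a commit are submodule references.
        if entry is not None and entry[0] == "commit":
            refs.add(entry[1])
    return refs


# Hypothetical three-commit history of a supermodule with one submodule.
history = [
    {"Makefile": ("blob", "b1"), "lib": ("commit", "c1")},
    {"Makefile": ("blob", "b2"), "lib": ("commit", "c1")},
    {"Makefile": ("blob", "b2"), "lib": ("commit", "c2")},
]

lib_roots = submodule_refs(history, "lib")
```

The resulting set is what a per-submodule prune or fsck would then use as
its starting points inside that submodule's object store.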


And moving the submodule object store into the .git directory of the
supermodule has several reasons: in bare repositories it has to be
that way anyway, and I don't want to lose the submodule's history if
the user decides to remove the submodule from his working directory.

Not the entire submodule object store has to be moved this way, only
that part that is referenced by the supermodule.  So it can make sense
to have a full .git repository in the submodule, together with another
object store (handled like alternates) in the supermodule.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-26 22:51               ` Junio C Hamano
  2007-03-26 23:16                 ` Submodule object store Martin Waitz
@ 2007-03-26 23:17                 ` Josef Weidendorfer
       [not found]                   ` <Pine.LNX.4.64.0703270952020.6730@woody.linux-foundation.org>
                                     ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Josef Weidendorfer @ 2007-03-26 23:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Waitz, Eric Lesh, Matthieu Moy, git

On Tuesday 27 March 2007, Junio C Hamano wrote:
> Martin Waitz <tali@admingilde.org> writes:
> 
> > For submodules I currently use <parent>/.git/objects/module/<submodule>/
> > to store the objects belonging to the submodule.
> 
> I was not following the gitlink discussion closely, but what is
> the motivation behind this separation of the object store?

The separation issue is about scalability of submodules, and not
directly about gitlink.

Josef

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-26 23:36                     ` Martin Waitz
@ 2007-03-26 23:20                       ` David Lang
  2007-03-26 23:55                         ` Martin Waitz
  2007-03-27 11:25                       ` Uwe Kleine-König
  1 sibling, 1 reply; 61+ messages in thread
From: David Lang @ 2007-03-26 23:20 UTC (permalink / raw)
  To: Martin Waitz
  Cc: Junio C Hamano, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

On Tue, 27 Mar 2007, Martin Waitz wrote:

> On Mon, Mar 26, 2007 at 04:28:28PM -0700, Junio C Hamano wrote:
>> Martin Waitz <tali@admingilde.org> writes:
>>
>>> To make purge and fsck work we have the hard requirement that it
>>> must be possible to list all objects which belong to one submodule.
>>
>> I understand you would want to separate the ref namespace, but I
>> still do not see why you would want to have a separate object
>> store, laid out in a funny way.  Unless you are thinking about
>> using rsync to transfer object store, that is.
>
> I want to be able to list all objects which are not reachable in the
> object store, without traversing all submodules at the same time.
> The only way I can think of to achieve this is to have one separate
> object store per submodule and then do the traversal per submodule.

Why do you want to optimize for the relatively rare fsck function rather than the
more common read functions (which would benefit from sharing objects that are
identical between projects)?

David Lang

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-26 23:17                 ` .gitlink for Summer of Code Josef Weidendorfer
       [not found]                   ` <Pine.LNX.4.64.0703270952020.6730@woody.linux-foundation.org>
@ 2007-03-26 23:24                   ` Junio C Hamano
  2007-03-27 17:04                   ` Linus Torvalds
  2 siblings, 0 replies; 61+ messages in thread
From: Junio C Hamano @ 2007-03-26 23:24 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Martin Waitz, Eric Lesh, Matthieu Moy, git

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> writes:

> On Tuesday 27 March 2007, Junio C Hamano wrote:
>> Martin Waitz <tali@admingilde.org> writes:
>> 
>> > For submodules I currently use <parent>/.git/objects/module/<submodule>/
>> > to store the objects belonging to the submodule.
>> 
>> I was not following the gitlink discussion closely, but what is
>> the motivation behind this separation of the object store?
>
> The separation issue is about scalability of submodules, and not
> directly about gitlink.

Unless you are thinking about rsync of object store, it is not
clear what "scalability of submodules" has to do with having
separate object database.

The issue I recall from earlier discussion on scalability of
submodules was about the direct placement of commit objects in
supermodule trees (which would force fetching in supermodule to
drag in all submodules even when the "integration" or "build
infrastructure" person does not want to have submodules pulled).
But I think that is an issue of the definition of connectivity.
It should be orthogonal to the issue of how object store is laid
out, shouldn't it?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-26 23:00               ` Josef Weidendorfer
@ 2007-03-26 23:27                 ` Martin Waitz
  0 siblings, 0 replies; 61+ messages in thread
From: Martin Waitz @ 2007-03-26 23:27 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 1610 bytes --]

hoi :)

On Tue, Mar 27, 2007 at 01:00:12AM +0200, Josef Weidendorfer wrote:
> On Tuesday 27 March 2007, Martin Waitz wrote:
> > For submodules I currently use <parent>/.git/objects/module/<submodule>/
> > to store the objects belonging to the submodule.
> > Perhaps it makes sense to extend this to a full .git directory per
> > submodule, I'm not yet decided on that.
> 
> IMHO this would be a nice property. As the submodule could exist
> independently with its own remote heads/tags, you probably would want
> to at least track these, even if it is a submodule in your
> superproject.  And then it makes sense to move it directly to
> .git/module/...

I am not sure that all the other submodule heads and tags really belong
in the superproject.  Perhaps they should simply be handled in some
other way -- after all the submodule is a normal git repository and can
handle heads and tags on its own quite well.
But I haven't thought tag-handling for submodules through yet.

> There also was a use case where one library project is used in >10
> superprojects. It would be nice to be able to put the submodule's git dir
> outside of the supermodule's git dir. However, this can also be done
> with symlinks without any special support (aside from sharing the
> head namespace).

Of course you can always have a normal library.git repository with all
the tags and stuff somewhere and just fetch from there if you need
some central location for it.  You could even add alternates entries,
pointing there, to the library's object store inside the supermodule.
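
For reference, git's alternates mechanism reads extra object directories
from the file objects/info/alternates, one path per line. A minimal Python
sketch of adding such an entry follows; the helper name `add_alternate` and
the directory names are assumptions made up for the illustration, and the
sketch only writes the file, it does not run git.

```python
import os
import tempfile


def add_alternate(objects_dir: str, alternate_objects_dir: str) -> str:
    """Append an object directory to <objects_dir>/info/alternates."""
    info_dir = os.path.join(objects_dir, "info")
    os.makedirs(info_dir, exist_ok=True)
    alternates = os.path.join(info_dir, "alternates")

    # Keep entries that are already there and avoid duplicates.
    existing = []
    if os.path.exists(alternates):
        with open(alternates) as f:
            existing = [line.rstrip("\n") for line in f if line.strip()]
    if alternate_objects_dir not in existing:
        existing.append(alternate_objects_dir)

    with open(alternates, "w") as f:
        f.write("".join(path + "\n" for path in existing))
    return alternates


# Hypothetical layout: a central library.git and a per-submodule store.
base = tempfile.mkdtemp()
central = os.path.join(base, "library.git", "objects")
module_store = os.path.join(base, "super", ".git", "objects", "module", "lib")
os.makedirs(central)

alternates_file = add_alternate(module_store, central)
add_alternate(module_store, central)  # second call leaves the file unchanged
```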

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-26 23:16                 ` Submodule object store Martin Waitz
@ 2007-03-26 23:28                   ` Junio C Hamano
  2007-03-26 23:36                     ` Martin Waitz
  0 siblings, 1 reply; 61+ messages in thread
From: Junio C Hamano @ 2007-03-26 23:28 UTC (permalink / raw)
  To: Martin Waitz; +Cc: Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

Martin Waitz <tali@admingilde.org> writes:

> To make purge and fsck work we have the hard requirement that it
> must be possible to list all objects which belong to one submodule.

I understand you would want to separate the ref namespace, but I
still do not see why you would want to have a separate object
store, laid out in a funny way.  Unless you are thinking about
using rsync to transfer object store, that is.

> And moving the submodule object store into the .git directory of the
> supermodule has several reasons:...

Oh, I am not disputing that.  If you want to use a single object
store for both levels, I think that is a sensible thing to do.
I just do not see the point of segregating submodule objects and
supermodule objects in the single object store you attach to the
supermodule.  After all, any traversal starts from refs, so by
segregating the ref namespace you would limit the extent of the
traversal, no?
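
To make that last point concrete, here is a small Python sketch of a walk
over one shared object store that starts only from the refs under a given
prefix. It is a toy model, not git code: the `store` dict, the ref names
(including the `refs/module/lib/` layout) and the function names are
hypothetical.

```python
from typing import Dict, Iterable, List, Set


def reachable(store: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Return all object ids reachable from `roots` in `store`."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(store[oid])  # follow links to referenced objects
    return seen


def walk_namespace(store: Dict[str, List[str]],
                   refs: Dict[str, str], prefix: str) -> Set[str]:
    """Walk only from the refs whose name starts with `prefix`."""
    roots = [oid for name, oid in refs.items() if name.startswith(prefix)]
    return reachable(store, roots)


# One shared store; blob_a is used by both the supermodule and the submodule.
store = {
    "super1": ["tree_s"], "tree_s": ["blob_a"],
    "sub1": ["tree_m"], "tree_m": ["blob_a", "blob_b"],
    "blob_a": [], "blob_b": [],
}
refs = {
    "refs/heads/master": "super1",
    "refs/module/lib/heads/master": "sub1",  # hypothetical namespace
}

super_objects = walk_namespace(store, refs, "refs/heads/")
lib_objects = walk_namespace(store, refs, "refs/module/lib/")
```

The extent of each walk is decided by the choice of starting refs alone; the
objects themselves all live in the same dict.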

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-26 23:28                   ` Junio C Hamano
@ 2007-03-26 23:36                     ` Martin Waitz
  2007-03-26 23:20                       ` David Lang
  2007-03-27 11:25                       ` Uwe Kleine-König
  0 siblings, 2 replies; 61+ messages in thread
From: Martin Waitz @ 2007-03-26 23:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 809 bytes --]

On Mon, Mar 26, 2007 at 04:28:28PM -0700, Junio C Hamano wrote:
> Martin Waitz <tali@admingilde.org> writes:
> 
> > To make purge and fsck work we have the hard requirement that it
> > must be possible to list all objects which belong to one submodule.
> 
> I understand you would want to separate the ref namespace, but I
> still do not see why you would want to have a separate object
> store, laid out in a funny way.  Unless you are thinking about
> using rsync to transfer object store, that is.

I want to be able to list all objects which are not reachable in the
object store, without traversing all submodules at the same time.
The only way I can think of to achieve this is to have one separate
object store per submodule and then do the traversal per submodule.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-26 23:55                         ` Martin Waitz
@ 2007-03-26 23:40                           ` David Lang
  2007-03-27 15:25                             ` Martin Waitz
  2007-03-27  0:29                           ` Junio C Hamano
  1 sibling, 1 reply; 61+ messages in thread
From: David Lang @ 2007-03-26 23:40 UTC (permalink / raw)
  To: Martin Waitz
  Cc: Junio C Hamano, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

On Tue, 27 Mar 2007, Martin Waitz wrote:

> hoi :)
>
> On Mon, Mar 26, 2007 at 03:20:34PM -0800, David Lang wrote:
>>> I want to be able to list all objects which are not reachable in the
>>> object store, without traversing all submodules at the same time.
>>> The only way I can think of to achieve this is to have one separate
>>> object store per submodule and then do the traversal per submodule.
>>
>> Why do you want to optimize for the relatively rare fsck function rather
>> than the more common read functions (which would benefit from sharing
>> objects that are identical between projects)?
>
> Because I don't know how to make it _possible_ for large repositories
> otherwise.  Consider a Linux-distribution which handles each package
> as one submodule.
>
> I don't think that it's too much balanced towards fsck.
> The separated object store also helps reduce the memory requirement for
> large pushes/pulls.
> Sharing objects can be achieved by alternates if you want.

Alternates require explicitly setting up the sharing.

Using the same object store makes this work automatically (think of all the
copies of COPYING that would end up being the same, as a trivial example).

> If someone comes up with a nice way to handle everything in one big
> object store I would happily use that! :-)

What exactly are the problems with one big object store?

Ones that I can think of:

1. When you are doing an fsck you need to walk all the trees and find out the
list of objects that you know about.

   Stored as a tree of binary values, you can hold a LOT in memory before
running into swap.

   If it's sufficiently larger than available RAM, then fsck could be given
an option to use trees on disk.

2. When creating a pack you will eventually run into pack-size limits with too
many objects.

   Teach the pack creators to make packs that are subsets rather than everything
(I believe that most of the smarts are there, it just needs the upper control
logic to tell the existing things what to include).

3. When doing a pull it takes longer to figure out what to pull to get a
duplicate of _everything_.

   Add a way to do a 'pull projectlist' that would look at what objects are
needed by the project(s) requested and only try to pack up those objects.

What else is there that I'm not thinking of? So far these look like long-term
problems as opposed to short-term problems, and all of them have fairly simple
fixes that can be implemented as they become an issue.

David Lang

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-26 23:20                       ` David Lang
@ 2007-03-26 23:55                         ` Martin Waitz
  2007-03-26 23:40                           ` David Lang
  2007-03-27  0:29                           ` Junio C Hamano
  0 siblings, 2 replies; 61+ messages in thread
From: Martin Waitz @ 2007-03-26 23:55 UTC (permalink / raw)
  To: David Lang
  Cc: Junio C Hamano, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]

hoi :)

On Mon, Mar 26, 2007 at 03:20:34PM -0800, David Lang wrote:
> >I want to be able to list all objects which are not reachable in the
> >object store, without traversing all submodules at the same time.
> >The only way I can think of to achieve this is to have one separate
> >object store per submodule and then do the traversal per submodule.
> 
> > Why do you want to optimize for the relatively rare fsck function rather
> > than the more common read functions (which would benefit from sharing
> > objects that are identical between projects)?

Because I don't know how to make it _possible_ for large repositories
otherwise.  Consider a Linux-distribution which handles each package
as one submodule.

I don't think that it's too much balanced towards fsck.
The separated object store also helps reduce the memory requirement for
large pushes/pulls.
Sharing objects can be achieved by alternates if you want.
If someone comes up with a nice way to handle everything in one big
object store I would happily use that! :-)

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-26 23:55                         ` Martin Waitz
  2007-03-26 23:40                           ` David Lang
@ 2007-03-27  0:29                           ` Junio C Hamano
  2007-03-27 14:28                             ` Martin Waitz
  1 sibling, 1 reply; 61+ messages in thread
From: Junio C Hamano @ 2007-03-27  0:29 UTC (permalink / raw)
  To: Martin Waitz; +Cc: David Lang, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

Martin Waitz <tali@admingilde.org> writes:

> The separated object store also helps reduce the memory requirement for
> large pushes/pulls.

That's a total bull.  The size of push/pull only depends on how
you separate set of refs (which affects the traversal hence
affects the set of objects to be exchanged).

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-26 18:21     ` Matthieu Moy
@ 2007-03-27  0:48       ` Jakub Narebski
  0 siblings, 0 replies; 61+ messages in thread
From: Jakub Narebski @ 2007-03-27  0:48 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

Matthieu Moy wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> Matthieu Moy wrote:
>>
>>> Stupid question: why .gitlink, and not .git/link or so? This file is
>>> not versioned, I don't think it should be in the working tree.
>>
>> It would be special-cased, as is .git directory not versioned, the
>> same way .gitlink would be not versioned (not like .gitignore).
> 
> That's how I understand it, but why 2 special cases (.git and
> .gitlink) when you need one?

Well, in the first proposal we hadn't thought about the idea of a
lightweight checkout having its own .git/ directory (with .git/link, or
core.link in .git/config) holding only part of the usual contents, perhaps
overriding the main files.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-25 20:55     ` Junio C Hamano
  2007-03-25 21:05       ` Shawn O. Pearce
@ 2007-03-27  3:40       ` Petr Baudis
  1 sibling, 0 replies; 61+ messages in thread
From: Petr Baudis @ 2007-03-27  3:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Matthieu Moy, git, Eric Lesh

On Sun, Mar 25, 2007 at 10:55:31PM CEST, Junio C Hamano wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > Actually, I'd almost say put it into .git/config, e.g.:
> >
> > 	mkdir .git
> > 	cat >.git/config <<EOF
> > 	[core]
> > 		repositoryversion = 0
> > 		filemode = true
> > 		link = /path/to/source
> > 	EOF
> >
> > as then the index and HEAD file can both be stored in .git, just
> > like with the non-gitlink case.
> 
> This is not usable at least for me as it does not allow sharing
> the .git/config file across checkouts.

Can't you take the linked .git/config and override it with stuff from the local
.git/config in that case? Don't replace; superimpose.

Take the somewhat contrived example of having a checkout on a FAT partition
linking to a repository on a sane filesystem (are you permanently short
on disk space on your /home partition too, except for about two months
right after you double your disk capacity? :). You might want to disable
core.fileMode there. Maybe this will never happen in the real world and
we might not care. Maybe not...
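
The overlay idea can be sketched in a few lines of Python, assuming both
config files have already been parsed into nested dicts. The function name
`effective_config` and the sample values are hypothetical; this is not how
git reads its configuration, only an illustration of "local settings win,
everything else is inherited".

```python
from typing import Dict

Config = Dict[str, Dict[str, str]]


def effective_config(linked: Config, local: Config) -> Config:
    """Overlay `local` on top of `linked` without modifying either one."""
    merged: Config = {section: dict(values) for section, values in linked.items()}
    for section, values in local.items():
        # Keys set locally replace the linked value; other keys are kept.
        merged.setdefault(section, {}).update(values)
    return merged


# Main repository on a sane filesystem, lightweight checkout on FAT.
linked = {
    "core": {"filemode": "true", "bare": "false"},
    "user": {"name": "A U Thor"},
}
local = {"core": {"filemode": "false"}}

config = effective_config(linked, local)
```

Here only core.filemode differs for the checkout, while the rest of the
shared configuration stays visible.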

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-26 23:36                     ` Martin Waitz
  2007-03-26 23:20                       ` David Lang
@ 2007-03-27 11:25                       ` Uwe Kleine-König
  2007-03-27 11:50                         ` Uwe Kleine-König
  2007-03-27 15:46                         ` Martin Waitz
  1 sibling, 2 replies; 61+ messages in thread
From: Uwe Kleine-König @ 2007-03-27 11:25 UTC (permalink / raw)
  To: Martin Waitz
  Cc: Junio C Hamano, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

Hello,

Martin Waitz wrote:
> On Mon, Mar 26, 2007 at 04:28:28PM -0700, Junio C Hamano wrote:
> > Martin Waitz <tali@admingilde.org> writes:
> > 
> > > To make purge and fsck work we have the hard requirement that it
> > > must be possible to list all objects which belong to one submodule.
> > 
> > I understand you would want to separate the ref namespace, but I
> > still do not see why you would want to have a separate object
> > store, laid out in a funny way.  Unless you are thinking about
> > using rsync to transfer object store, that is.
> 
> I want to be able to list all objects which are not reachable in the
> object store, without traversing all submodules at the same time.
> The only way I can think of to achieve this is to have one separate
> object store per submodule and then do the traversal per submodule.
I might have misunderstood something, but to list objects that are
not reachable you need to traverse all trees anyhow, don't you?

Then how big is the difference between a directory and a submodule?
I'd expect it's not so big if the submodules included in different
revisions of the supermodule share most of their history.  Of course you
need to exploit that.  Thinking again, might that be the problem?

Best regards
Uwe

-- 
Uwe Kleine-König

5 out of 4 people have trouble with fractions.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 11:25                       ` Uwe Kleine-König
@ 2007-03-27 11:50                         ` Uwe Kleine-König
  2007-03-27 15:53                           ` Martin Waitz
  2007-03-27 15:46                         ` Martin Waitz
  1 sibling, 1 reply; 61+ messages in thread
From: Uwe Kleine-König @ 2007-03-27 11:50 UTC (permalink / raw)
  To: Martin Waitz, Junio C Hamano, Josef Weidendorfer, Eric Lesh,
	Matthieu Moy, git

Hallo again,

Uwe Kleine-König wrote:
> > I want to be able to list all objects which are not reachable in the
> > object store, without traversing all submodules at the same time.
> > The only way I can think of to achieve this is to have one separate
> > object store per submodule and then do the traversal per submodule.
> I might have misunderstood something, but to list objects that are
> not reachable you need to traverse all trees anyhow, don't you?
> 
> Then how big is the difference between a directory and a submodule?
> I'd expect it's not so big if the submodules included in different
> revisions of the supermodule share most of their history.  Of course you
> need to exploit that.  Thinking again, might that be the problem?
I didn't look at the code, but another issue might be:

If you separate the odbs e.g. by the pathname of the subproject, what
happens if I choose to move the Linux kernel in my embedded Linux
project from /linux to /kernel/linux?

Or maybe worse:  If I currently track the kernel in a tree (because of
git lacking submodule support) and switch to a submodule, then
linux/Makefile has to exist in both the supermodule's and the
submodule's odb.

Best regards
Uwe

-- 
Uwe Kleine-König

http://www.google.com/search?q=30+hours+and+4+days+in+seconds

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27  0:29                           ` Junio C Hamano
@ 2007-03-27 14:28                             ` Martin Waitz
  0 siblings, 0 replies; 61+ messages in thread
From: Martin Waitz @ 2007-03-27 14:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: David Lang, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 453 bytes --]

hoi :)

On Mon, Mar 26, 2007 at 05:29:05PM -0700, Junio C Hamano wrote:
> Martin Waitz <tali@admingilde.org> writes:
> > The separated object store also helps reduce the memory requirement for
> > large pushes/pulls.
> 
> That's a total bull.  The size of push/pull only depends on how
> you separate set of refs (which affects the traversal hence
> affects the set of objects to be exchanged).

of course you are right.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-26 23:40                           ` David Lang
@ 2007-03-27 15:25                             ` Martin Waitz
  2007-03-27 16:53                               ` David Lang
  0 siblings, 1 reply; 61+ messages in thread
From: Martin Waitz @ 2007-03-27 15:25 UTC (permalink / raw)
  To: David Lang
  Cc: Junio C Hamano, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 2872 bytes --]

hoi :)

It's really funny that when I proposed one big object database everybody
wanted it separated and now that I propose a separate database everybody
wants it as one combined database.
I read this as a sign that people really try to think critically about
the design, which is a good thing and will hopefully lead to a good
and stable submodule implementation.

On Mon, Mar 26, 2007 at 03:40:15PM -0800, David Lang wrote:
> Using the same object store makes this work automatically (think of all the
> copies of COPYING that would end up being the same, as a trivial example).

Yes, but I guess not much more than COPYING, INSTALL, some trivial
Makefiles and empty files will be shared between subprojects.
Except when you have the same subproject in your tree multiple times,
of course.

Yet this sharing is exactly why I started to do it that way, until Linus
stopped me.

> >If someone comes up with a nice way to handle everything in one big
> >object store I would happily use that! :-)
> 
> what exactly are the problems with one big object store?

I think we really have to discuss this separation on several layers:
traversal, pack-files, and object database.

For the traversal the point of separating it into a per-module traversal
is that only one module has to be loaded into RAM at a time.
This affects all operations which do a (potentially) recursive traversal:
push, pull, fsck, prune, repack.
However, a separated traversal will no longer be guaranteed to list
an object only once, so this has to be handled in some way.
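
One conceivable way to handle the duplicates is sketched below in Python:
walk each module on its own and filter the output through a set of already
emitted ids. This is only an assumption about how it could be done, not
what any git implementation does; `traverse_modules`, the `store` dict and
the module names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def traverse_modules(store: Dict[str, List[str]],
                     module_roots: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Walk each module separately, but report every object id only once."""
    emitted: Set[str] = set()  # ids already listed by an earlier module
    result: List[Tuple[str, str]] = []
    for module, roots in module_roots.items():
        # Per-module walk: only this module's graph is expanded here.
        seen: Set[str] = set()
        stack = list(roots)
        while stack:
            oid = stack.pop()
            if oid in seen:
                continue
            seen.add(oid)
            stack.extend(store[oid])
        # Filter against what earlier modules already reported.
        for oid in sorted(seen):
            if oid not in emitted:
                emitted.add(oid)
                result.append((module, oid))
    return result


# Two modules that both contain the same COPYING blob.
store = {
    "m1_commit": ["copying", "m1_file"], "m1_file": [],
    "m2_commit": ["copying", "m2_file"], "m2_file": [],
    "copying": [],
}
module_roots = {"lib1": ["m1_commit"], "lib2": ["m2_commit"]}

listing = traverse_modules(store, module_roots)
```

The `emitted` set still grows with the total number of objects, so in this
form it only saves holding the object graphs of all modules at once, not
the ids themselves.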

Pack files should have better access patterns if they are per-module.
Most of the time you are only interested in one individual module and
locality is important here.

Separating the entire object database is a way to improve unreachability
analysis, as it now can be done per module.
The other two separations are easier to implement with a separated
object database, but that's not too strong an argument.


So if we can come up with a nice way to do unreachability analysis we
can indeed go on with the shared object database and tackle the
remaining scalability issues as they arise.  Those could then be added
later without changing the on-disk format.

> ones that I can think of:
> 
> 1. When you are doing an fsck you need to walk all the trees and find out
> the list of objects that you know about.
> 
>   Stored as a tree of binary values, you can hold a LOT in memory before
>   running into swap.

Could you explain the algorithm you are thinking about in more detail?

>   If it's sufficiently larger than available RAM, then fsck could be given
>   an option to use trees on disk.

This could simplify some things.
There could be an on-disk index of all known objects, so that the sha1
sums do not have to be loaded into RAM all at once.
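
A very small Python sketch of such an on-disk index: the 20-byte SHA-1 sums
are written sorted into a flat file and looked up by binary search with
seek(), so only one record is in memory at a time. The file layout and the
names `write_index` and `contains` are assumptions for illustration; this
is not git's pack index format.

```python
import hashlib
import os
import tempfile

RECORD = 20  # size of a binary SHA-1 sum in bytes


def write_index(path: str, ids) -> None:
    """Write the given 20-byte ids to `path`, sorted and de-duplicated."""
    with open(path, "wb") as f:
        for oid in sorted(set(ids)):
            f.write(oid)


def contains(path: str, oid: bytes) -> bool:
    """Binary-search the sorted index file for `oid`."""
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // RECORD
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD)
            current = f.read(RECORD)
            if current == oid:
                return True
            if current < oid:
                lo = mid + 1
            else:
                hi = mid
    return False


# Build an index of 100 made-up object ids in a temporary file.
known = [hashlib.sha1(str(i).encode()).digest() for i in range(100)]
index_path = os.path.join(tempfile.mkdtemp(), "objects.idx")
write_index(index_path, known)
```

Marking an object as "seen" during a traversal would need a writable
variant (for example one flag byte per record), which is left out here.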

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 11:25                       ` Uwe Kleine-König
  2007-03-27 11:50                         ` Uwe Kleine-König
@ 2007-03-27 15:46                         ` Martin Waitz
  1 sibling, 0 replies; 61+ messages in thread
From: Martin Waitz @ 2007-03-27 15:46 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Junio C Hamano, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 1042 bytes --]

hoi :)

On Tue, Mar 27, 2007 at 01:25:49PM +0200, Uwe Kleine-König wrote:
> Martin Waitz wrote:
> > I want to be able to list all objects which are not reachable in the
> > object store, without traversing all submodules at the same time.
> > The only way I can think of to achieve this is to have one separate
> > object store per submodule and then do the traversal per submodule.
> I might have misunderstood something, but to list objects that are
> not reachable you need to traverse all trees anyhow, don't you?
> 
> Then how big is the difference between a directory and a submodule?
> I'd expect it's not so big if the submodules included in different
> revisions of the supermodule share most of their history.  Of course you
> need to exploit that.  Thinking again, might that be the problem?

There is no big difference, only that the submodule may be huge by
itself.

And yes, every object has to be traversed anyhow, but not necessarily by
the same process and at the same time.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 11:50                         ` Uwe Kleine-König
@ 2007-03-27 15:53                           ` Martin Waitz
  2007-03-27 16:56                             ` Josef Weidendorfer
  2007-03-27 17:22                             ` Uwe Kleine-König
  0 siblings, 2 replies; 61+ messages in thread
From: Martin Waitz @ 2007-03-27 15:53 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Junio C Hamano, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 993 bytes --]

hoi :)

On Tue, Mar 27, 2007 at 01:50:29PM +0200, Uwe Kleine-König wrote:
> If you separate the odbs e.g. by the pathname of the subproject, what
> happens if I choose to move the Linux kernel in my embedded Linux
> project from /linux to /kernel/linux?

Then a new separate object database would have to be created.
This is the part I really don't like about separate object databases,
but perhaps some persistent alternates information could help here.

For any other way to separate the odb (project id, whatever), we
can't get a list of references into it by a path-limited traversal
in the parent. Thus separate odbs which are not bound to a special
location have some serious downsides.

> Or maybe worse:  If I currently track the kernel in a tree (because of
> git lacking submodule support) and switch to a submodule, then
> linux/Makefile has to exist in both the supermodule's and the
> submodule's odb.

Sorry, I don't understand you here.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 16:56                             ` Josef Weidendorfer
@ 2007-03-27 16:44                               ` Martin Waitz
  0 siblings, 0 replies; 61+ messages in thread
From: Martin Waitz @ 2007-03-27 16:44 UTC (permalink / raw)
  To: Josef Weidendorfer
  Cc: Uwe Kleine-König, Junio C Hamano, Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 847 bytes --]

hoi :)

On Tue, Mar 27, 2007 at 06:56:09PM +0200, Josef Weidendorfer wrote:
> On Tuesday 27 March 2007, Martin Waitz wrote:
> > For any other way to separate the odb (project id, whatever), we
> > can't get a list of references into it by a path-limited traversal
> > in the parent. Thus separate odbs which are not bound to a special
> > location have some serious downsides.
> 
> For path-limited traversal, you still need to know all the paths
> with super/subproject boundaries somewhere in the history.
> Do you store this information somewhere?
> If so, how is this different from directly storing the boundaries
> (aside from size)?

You only need the path-limited traversal when you want to look at one
individual submodule.  And then you start the operation in this
submodule, so you know the path.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 15:25                             ` Martin Waitz
@ 2007-03-27 16:53                               ` David Lang
  0 siblings, 0 replies; 61+ messages in thread
From: David Lang @ 2007-03-27 16:53 UTC (permalink / raw)
  To: Martin Waitz
  Cc: Junio C Hamano, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

On Tue, 27 Mar 2007, Martin Waitz wrote:

> On Mon, Mar 26, 2007 at 03:40:15PM -0800, David Lang wrote:
>> using the same object store makes this work automatically (think of all the
>> copies of COPYING that would end up being the same as a trivial example)
>
> Yes, but I guess not much more than COPYING, INSTALL, some trivial
> Makefiles and empty files will be shared between subprojects.
> Except when you have the same subproject in your tree multiple times,
> of course.

although, if you end up packing multiple projects together you may end up
finding more things that diff well against each other (although it will slow
down the packing with more objects).

> Yet this sharing is exactly why I started to do it that way, until Linus
> stopped me.

I missed that one.

>>> If someone comes up with a nice way to handle everything in one big
>>> object store I would happily use that! :-)
>>
>> what exactly are the problems with one big object store?
>
> I think we really have to discuss this separation on several layers:
> traversal, pack-files, and object database.
>
> For the traversal the point of separating it into a per-module traversal
> is that only one module has to be loaded into RAM at a time.
> This affects all operations which do a (potentially) recursive traversal:
> push, pull, fsck, prune, repack.
> However a separated traversal will no longer be guaranteed to only list
> an object once, so this has to be handled in some way.

an object can already appear more than once in pack files.

> Pack files should have better access patterns if they are per-module.
> Most of the time you are only interested in one individual module and
> locality is important here.
>
> Separating the entire object database is a way to improve unreachability
> analysis, as it now can be done per module.
> The other two separations are easier to implement with a separated
> object database, but that's not too strong an argument.

if modules are really as separate as you make them out to be then what you want
isn't multiple modules inside one overall project (top level .git); you want
multiple projects and a way to link them together.

>
> So if we can come up with a nice way to do unreachability analysis we
> can indeed go on with the shared object database and tackle the
> remaining scalability issues as they arise.  Those could then be added
> later without changing the on-disk format.
>
>> ones that I can think of:
>>
>> 1. when you are doing a fsck you need to walk all the trees and find out
>> the list of objects that you know about.
>>
>>   done as a tree of binary values you can hold a LOT in memory before
>>   running into swap.
>
> Could you explain the algorithm you are thinking about in more detail?

as I understand it the need is to efficiently create a list of all the objects
that are reachable (so that we can then go through the objects and remove them
if they aren't on the list).

you need these sorted to make it easy to find if something is in the list, and
with millions of entries you don't want it to be a flat list (inserting new values
becomes very inefficient) so the classic answer is to do a tree structure. you
can either do a tree with the object ID's in all the nodes, or you can do one
where only the leaf nodes hold the object ID's and the other nodes just hold
pointers (which would then allow you to spill the leaf nodes to disk more
efficiently as they wouldn't need to be accessed when inserting unless the node
itself needed to be changed). looking them up is being done more or less in alpha
order for loose objects (and could be made to be so for objects in packs) so any
file I/O for lookups would be close to sequential.

this sort of memory usage wouldn't be acceptable for something that happens
frequently, but a fsck/prune is relatively infrequent and can be run off-hours.
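
A minimal sketch of the mark-then-sweep idea described above, written in Python for brevity (git itself is C). It is an illustration only: it uses Python's built-in set where the text suggests a tree structure, and the object graph, the IDs and the function names are all made up for the example.

```python
def mark_reachable(graph, roots):
    """Return the set of object IDs reachable from the given roots.

    graph is a hypothetical model: it maps an object ID to the IDs
    that object refers to (commit -> tree, tree -> blobs, ...).
    """
    reachable = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in reachable:
            continue  # already visited through another path
        reachable.add(oid)
        stack.extend(graph.get(oid, ()))
    return reachable


def sweep(all_objects, reachable):
    """List what a prune would remove, visiting objects in sorted order."""
    return [oid for oid in sorted(all_objects) if oid not in reachable]


graph = {
    "c1": ["t1"],
    "t1": ["b1", "b2"],
    "b1": [],
    "b2": [],
    "b3": [],  # nothing refers to b3
}
unreachable = sweep(graph.keys(), mark_reachable(graph, ["c1"]))
```

The sweep walks the stored objects in sorted order, which matches the point above that lookups against the reachable list would then be close to sequential.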

>>   if it's enough larger than available ram then an option for fsck to use
>>   trees on disk is an option.
>
> This could simplify some things.
> There could be an on-disk index of all known objects, so that the sha1
> sums do not have to be loaded into RAM all at once.

you wouldn't want to trust this for a fsck/prune

David Lang

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 15:53                           ` Martin Waitz
@ 2007-03-27 16:56                             ` Josef Weidendorfer
  2007-03-27 16:44                               ` Martin Waitz
  2007-03-27 17:22                             ` Uwe Kleine-König
  1 sibling, 1 reply; 61+ messages in thread
From: Josef Weidendorfer @ 2007-03-27 16:56 UTC (permalink / raw)
  To: Martin Waitz
  Cc: Uwe Kleine-König, Junio C Hamano, Eric Lesh, Matthieu Moy, git

On Tuesday 27 March 2007, Martin Waitz wrote:
> For any other way to separate the odb (project id, whatever), we
> can't get a list of references into it by a path-limited traversal
> in the parent. Thus separate odbs which are not bound to a special
> location have some serious downsides.

For path-limited traversal, you still need to know all the paths
with super/subproject boundaries somewhere in the history.
Do you store this information somewhere?
If so, how is this different from directly storing the boundaries
(aside from size)?

Josef

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-26 23:14               ` Josef Weidendorfer
@ 2007-03-27 16:59                 ` Matthieu Moy
  0 siblings, 0 replies; 61+ messages in thread
From: Matthieu Moy @ 2007-03-27 16:59 UTC (permalink / raw)
  To: git

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> writes:

>> * You have to manage a name for each lightweight checkout. What would
>>   be such name? User-provided? uuidgen-like?
>
> Such a name is interesting as identifier for submodules.
> It would be the relative path of the submodule from the supermodule base;
> or user supplied.
>
> Lightweight checkouts and submodules have different requirements;
> yet, the lightweight checkouts should be so flexible to be
> able to be used for submodules checkouts.

Sure, but AFAIU, submodules would use light checkouts, but light
checkouts are interesting by themselves, so the naming thing should be
in submodules support.

-- 
Matthieu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-27 17:04                   ` Linus Torvalds
@ 2007-03-27 17:00                     ` David Lang
  2007-03-27 18:15                       ` Linus Torvalds
  2007-03-27 17:35                     ` Martin Waitz
  2007-03-27 18:09                     ` Daniel Barkalow
  2 siblings, 1 reply; 61+ messages in thread
From: David Lang @ 2007-03-27 17:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josef Weidendorfer, Junio C Hamano, Martin Waitz, Eric Lesh,
	Matthieu Moy, git

On Tue, 27 Mar 2007, Linus Torvalds wrote:

> - walking the *global* object list is simply not possible. You need to
>   fsck every single subtree individually, and fsck the superproject on
>   its own, *without* recursing into the subprojects. And you need to be
>   able to clone the superproject and only one or two subprojects, and
>   never see it as one "atomic" big repository.

you can do a fsck to make sure that all needed objects are available on each 
project individually, but a prune/gc _does_ need to go through the global object 
list to find out what objects aren't needed anymore (otherwise, how do you know 
if the object isn't needed by another thing sharing the same object store?)
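
A small sketch of that concern, in Python and with made-up object and project names: with a shared store, an object is only safe to prune if no project sharing the store reaches it, so the per-project reachable sets have to be combined first.

```python
def prunable(store, reachable_per_project):
    """Objects in a shared store that no project reaches.

    store is the set of all object IDs in the shared object store;
    reachable_per_project maps a (hypothetical) project name to the
    set of object IDs reachable from that project's refs.
    """
    needed = set().union(*reachable_per_project.values())
    return sorted(set(store) - needed)


store = {"a", "b", "c", "d"}
reachable = {"super": {"a"}, "sub1": {"b", "c"}}
```

Looking at only one project gives the wrong answer: `prunable(store, {"super": {"a"}})` would also list the objects that `sub1` still needs.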

David Lang

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-26 23:17                 ` .gitlink for Summer of Code Josef Weidendorfer
       [not found]                   ` <Pine.LNX.4.64.0703270952020. 6730@woody.linux-foundation.org>
  2007-03-26 23:24                   ` Junio C Hamano
@ 2007-03-27 17:04                   ` Linus Torvalds
  2007-03-27 17:00                     ` David Lang
                                       ` (2 more replies)
  2 siblings, 3 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-03-27 17:04 UTC (permalink / raw)
  To: Josef Weidendorfer
  Cc: Junio C Hamano, Martin Waitz, Eric Lesh, Matthieu Moy, git



On Tue, 27 Mar 2007, Josef Weidendorfer wrote:

> On Tuesday 27 March 2007, Junio C Hamano wrote:
> > Martin Waitz <tali@admingilde.org> writes:
> > 
> > > For submodules I currently use <parent>/.git/objects/module/<submodule>/
> > > to store the objects belonging to the submodule.
> > 
> > I was not following the gitlink discussion closely, but what is
> > the motivation behind this separation of the object store?
> 
> The separation issue is about scalability of submodules, and not
> directly about gitlink.

NOTE! It's fine to share the *object*store* for a supermodule setup.

The scalability concerns are not about the number of objects, but about 
the operations that work on them, and specifically *traverse* the objects.

So while it's fine to share the same GIT_OBJECT_DIRECTORY for all the
submodules, it's *not* ok if "git clone" on a supermodule will consider 
things to be one single repository, and clone it as one huge thing, 
generating (and having to look up!) a ten-million object pack for a 
hundred smaller projects. THAT won't scale.

Basically, a "git-rev-list --objects HEAD" in the super-module should only 
list the objects in the supermodule itself, not in all the submodules. And 
that implies that cloning a supermodule is not about cloning a single big 
repository: it would be a matter of:

 - first cloning the supermodule itself (which is often fairly
   small: just a top-level directory, with some top-level Makefiles and a 
   number of directories that are submodules)

 - then parsing some supermodule data structure, and cloning each 
   submodule individually.
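
The traversal rule above can be sketched with a toy model, in Python. Everything here is an assumption made for illustration: the in-memory tree layout, the IDs, and the use of the kind "commit" to mark a submodule entry. The walk lists the supermodule's own objects and only records the submodule links, so each submodule can be cloned (or not) on its own.

```python
# Hypothetical model: each tree maps an entry name to (kind, object ID).
# An entry of kind "commit" is a submodule link and is never followed.
TREES = {
    "super-root": {
        "Makefile": ("blob", "mk1"),
        "linux": ("commit", "linux-head"),
        "tools": ("tree", "tools-tree"),
    },
    "tools-tree": {"build.sh": ("blob", "sh1")},
}


def list_objects(tree_id, trees):
    """Return (objects of this project, submodule links found on the way)."""
    objects = [tree_id]
    links = []
    for name, (kind, oid) in sorted(trees[tree_id].items()):
        if kind == "tree":
            sub_objects, sub_links = list_objects(oid, trees)
            objects += sub_objects
            links += [(name + "/" + path, commit) for path, commit in sub_links]
        elif kind == "blob":
            objects.append(oid)
        else:
            # Submodule boundary: remember it, but do not walk into it.
            links.append((name, oid))
    return objects, links


objects, links = list_objects("super-root", TREES)
```

The `links` list plays the role of the "supermodule data structure" that a clone would parse in its second step.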

Similarly for "fetch" (and merging too, of course - it ends up having to 
merge each sub-project separately). 

Think of it this way: if you think people find it a bit annoying that you 
currently have to get all the history when you do clone (and why people 
have worked on "shallow clones" in git), imagine just *how* frustrating it 
is if you have to get all five-hundred subprojects when you only want to 
work on one small one!

Think of something like a huge *BSD "world" tree, where the supermodule 
contains *everything*. Do you really _really_ expect that every single 
developer wants to clone it all? I have no idea how much that is, but I 
can well imagine that it's several thousand subprojects, some of which are 
quite big in their own right. 

Also, imagine the server side.. Anybody who thinks that the server wants 
to (or is even *able* to) do things like a fsck on the totality, or keep 
every single object in memory, is in for a nasty surprise..

So I think that:

 - sharing object directories should not be a requirement, but it should 
   certainly be *possible*. Quite often you might want to do it, although 
   for really big superprojects it might well make sense to have 
   individual object stores too.

 - walking the *global* object list is simply not possible. You need to 
   fsck every single subtree individually, and fsck the superproject on 
   its own, *without* recursing into the subprojects. And you need to be 
   able to clone the superproject and only one or two subprojects, and 
   never see it as one "atomic" big repository.

I really think people should think about the *BSD kind of "world" setup. 
You absolutely do _not_ want supermodules to be indivisible "everything or 
nothing" kind of things. You want submodules to be very much separate
repositories, although you *can* of course share the object store if you
want to (the same way git can do it between any number of totally 
unrelated repositories!)

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 15:53                           ` Martin Waitz
  2007-03-27 16:56                             ` Josef Weidendorfer
@ 2007-03-27 17:22                             ` Uwe Kleine-König
  2007-03-27 18:41                               ` Linus Torvalds
  1 sibling, 1 reply; 61+ messages in thread
From: Uwe Kleine-König @ 2007-03-27 17:22 UTC (permalink / raw)
  To: Martin Waitz
  Cc: Junio C Hamano, Josef Weidendorfer, Eric Lesh, Matthieu Moy, git

Hallali,

Martin Waitz wrote:
> On Tue, Mar 27, 2007 at 01:50:29PM +0200, Uwe Kleine-König wrote:
> > If you separate the odbs e.g by the pathname of the subproject, what
> > happens if I choose to move the linux kernel in my embedded Linux
> > project from /linux to /kernel/linux?
> 
> Then a new separate object database would have to be created.
> This is the part I really don't like about separate object databases,
> but perhaps some persistent alternates information could help here.
> 
> For any other way to separate the odb (project id, whatever), we
> can't get a list of references into it by a path-limited traversal
> in the parent. Thus separate odbs which are not bound to a special
> location have some serious downsides.
(CVS comes to mind ...)
This currently convinces me that a separate odb is wrong.

> > Or maybe worse:  If I currently track the Kernel in a tree (because of
> > git lacking submodule support) and switch to submodule.  Then
> > linux/Makefile has to exist in both the supermodule's and the
> > submodule's odb.
> 
> Sorry, I don't understand you here.
Assume I have

	embeddedproject$ git ls-tree HEAD | grep linux
	040000 tree 0123456789abcdef... linux-2.6

and then I commit on top of that, s.t. I get:

	embeddedproject$ git ls-tree HEAD | grep linux
	040000 commit 0123456789abcde0... linux-2.6

(or how ever you save submodules).  Then you might have to duplicate the
objects of linux-2.6, because they are part of both histories.

Best regards
Uwe

-- 
Uwe Kleine-König

primes where sieve (p:xs) = [ x | x<-xs, x `rem` p /= 0 ]; \
primes = map head (iterate sieve [2..])

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-27 17:04                   ` Linus Torvalds
  2007-03-27 17:00                     ` David Lang
@ 2007-03-27 17:35                     ` Martin Waitz
  2007-03-27 18:09                     ` Daniel Barkalow
  2 siblings, 0 replies; 61+ messages in thread
From: Martin Waitz @ 2007-03-27 17:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josef Weidendorfer, Junio C Hamano, Eric Lesh, Matthieu Moy, git

[-- Attachment #1: Type: text/plain, Size: 1248 bytes --]

hoi :)

On Tue, Mar 27, 2007 at 10:04:53AM -0700, Linus Torvalds wrote:
>  - walking the *global* object list is simply not possible. You need to 
>    fsck every single subtree individually, and fsck the superproject on 
>    its own, *without* recursing into the subprojects. And you need to be 
>    able to clone the superproject and only one or two subprojects, and 
>    never see it as one "atomic" big repository.

and just skip the unreachability check of fsck?
With this limitation a shared object store would be possible.

> I really think people should think about the *BSD kind of "world" setup. 
> You absolutely do _not_ want supermodules to be indivisible "everything or 
> nothing" kind of things. You want submodules to be very much separate
> repositories, although you *can* of course share the object store if you
> want to (the same way git can do it between any number of totally 
> unrelated repositories!)

You already convinced me that the "world" use-case is worthwhile.
As to *can* be shared: I'd really like to have some default location for
all objects so that it can be found automatically when you later decide
to checkout a new submodule which has not yet been fetched.

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-27 17:04                   ` Linus Torvalds
  2007-03-27 17:00                     ` David Lang
  2007-03-27 17:35                     ` Martin Waitz
@ 2007-03-27 18:09                     ` Daniel Barkalow
  2007-03-27 18:19                       ` Linus Torvalds
  2007-03-27 18:36                       ` Steven Grimm
  2 siblings, 2 replies; 61+ messages in thread
From: Daniel Barkalow @ 2007-03-27 18:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josef Weidendorfer, Junio C Hamano, Martin Waitz, Eric Lesh,
	Matthieu Moy, git

On Tue, 27 Mar 2007, Linus Torvalds wrote:

> Think of it this way: if you think people find it a bit annoying that you 
> currently have to get all the history when you do clone (and why people 
> have worked on "shallow clones" in git), imagine just *how* frustrating it 
> is if you have to get all five-hundred subprojects when you only want to 
> work on one small one!

Is it fair to say that subproject support means that there's a use case 
where everybody will need shallow clones? And that it points out natural 
triggers for shallowness?

I don't see that the "shallow clone" mechanism is special for subprojects 
(and I don't think that a solution that depends on subprojects being what 
causes it is a good idea), but clearly it makes sense to support: (1) no 
clone of submodules, (2) shallow clone of submodules, and (3) full clone 
of submodules.

Somebody working on gcc for *BSD would presumably want to get all of gcc 
and a shallow clone of the other 1000 submodules, right? Or they'd just 
clone the submodule and ignore the superproject. At least, they'd need 
shallow clones of a bunch of the submodules, because it's not interesting 
to have the superproject otherwise.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-27 17:00                     ` David Lang
@ 2007-03-27 18:15                       ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-03-27 18:15 UTC (permalink / raw)
  To: David Lang
  Cc: Josef Weidendorfer, Junio C Hamano, Martin Waitz, Eric Lesh,
	Matthieu Moy, git



On Tue, 27 Mar 2007, David Lang wrote:

> On Tue, 27 Mar 2007, Linus Torvalds wrote:
> 
> > - walking the *global* object list is simply not possible. You need to
> >   fsck every single subtree individually, and fsck the superproject on
> >   its own, *without* recursing into the subprojects. And you need to be
> >   able to clone the superproject and only one or two subprojects, and
> >   never see it as one "atomic" big repository.
> 
> you can do a fsck to make sure that all needed objects are available on each
> project individually, but a prune/gc _does_ need to go through the global

No it doesn't.

If you do per-project object stores, there's no need. There simply isn't 
anything to prune "globally". Everything is local.

Now, if you share the object directory, you can't prune. But that's true 
even without any "subproject" support, and has nothing to do with sub- or 
super-projects.
	
		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-27 18:09                     ` Daniel Barkalow
@ 2007-03-27 18:19                       ` Linus Torvalds
  2007-03-27 20:54                         ` Daniel Barkalow
  2007-03-27 18:36                       ` Steven Grimm
  1 sibling, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-03-27 18:19 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Josef Weidendorfer, Junio C Hamano, Martin Waitz, Eric Lesh,
	Matthieu Moy, git



On Tue, 27 Mar 2007, Daniel Barkalow wrote:
> 
> Is it fair to say that subproject support means that there's a use case 
> where everybody will need shallow clones? And that it points out natural 
> triggers for shallowness?

No.

I personally don't believe in shallow clones. And I *certainly* don't 
believe that it has anything to do with subprojects. So people may want 
shallow clones, but it's at least independent of the issue of submodules.

With subprojects, it's not that you don't want the history. It's just that 
you don't want the history for *all* projects. Most people care about a 
very small subset.

(The exception, of course, is when the superproject simply isn't that big, 
and only has a couple of subprojects. In git, for example, the xdiff stuff 
could be a subproject if you wanted to do it that way. But then, the 
subproject isn't a size issue, it's purely an organizational thing, and 
there is no argument for/against shallowness there either).

			Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: .gitlink for Summer of Code
  2007-03-27 18:09                     ` Daniel Barkalow
  2007-03-27 18:19                       ` Linus Torvalds
@ 2007-03-27 18:36                       ` Steven Grimm
  2007-03-27 20:02                         ` Daniel Barkalow
  1 sibling, 1 reply; 61+ messages in thread
From: Steven Grimm @ 2007-03-27 18:36 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Linus Torvalds, Josef Weidendorfer, Junio C Hamano, Martin Waitz,
	Eric Lesh, Matthieu Moy, git

Daniel Barkalow wrote:
> Somebody working on gcc for *BSD would presumably want to get all of gcc 
> and a shallow clone of the other 1000 submodules, right? Or they'd just 
> clone the submodule and ignore the superproject. At least, they'd need 
> shallow clones of a bunch of the submodules, because it's not interesting 
> to have the superproject otherwise.
>   

The obvious use case for "I want the superproject and just one 
submodule" is when the superproject has build tools, header files, or 
other pieces of data that are shared by some/all of the submodules. 
Maybe not the case in BSD per se, but having a top-level file full of 
settings, paths to tools, etc. that gets included by the individual 
Makefiles in subdirectories isn't all that uncommon in complex 
multi-part projects.

-Steve

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 17:22                             ` Uwe Kleine-König
@ 2007-03-27 18:41                               ` Linus Torvalds
  2007-03-27 19:42                                 ` Uwe Kleine-König
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-03-27 18:41 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Martin Waitz, Junio C Hamano, Josef Weidendorfer, Eric Lesh,
	Matthieu Moy, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3016 bytes --]



On Tue, 27 Mar 2007, Uwe Kleine-König wrote:
> 
> 	embeddedproject$ git ls-tree HEAD | grep linux
> 	040000 commit 0123456789abcde0... linux-2.6
> 
> (or how ever you save submodules).  Then you might have to duplicate the
> objects of linux-2.6, because they are part of both histories.

No they are not. Unless you do it wrong.

The *only* object that is part of the superproject would be the tree that 
*contains* that entry itself.

We should *never* automatically follow such an entry down, *exactly* 
because that doesn't scale. So to actually follow that entry for something 
like a recursive diff, you'd literally "cd into linux, and start 'git diff'
from commit 0123456.."

In other words, the subproject would be its own project, and the 
superproject never sees it as "part of itself". I really think, for 
example, that the "git diff" family of programs (diff-index, diff-tree, 
diff-files) and things like "git ls-tree" should literally:

 - have a mode where they don't even recurse into subprojects, and I 
   personally think that it could/should be the default!

 - when they recurse, they should literally (at least to begin with) do 
   that kind of "fork() ; if (child) { chdir(subproject); execve(myself) }" 

The latter is really to make sure that *even*by*mistake* we don't screw 
things up and tie the sub/superproject together too tightly. 
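
A rough Python equivalent of that fork/chdir/exec pattern, as a sketch only. The helper name is made up; `subprocess.run` with `cwd=` starts a child process in the given directory, so the recursion shares no in-memory state with the superproject's process.

```python
import subprocess
import sys


def run_in_subproject(subproject_dir, argv):
    """Run argv as a separate process whose working directory is the subproject.

    The child is started in subproject_dir (the chdir step) and replaced
    by the given command (the exec step); only its output comes back.
    """
    result = subprocess.run(
        argv,
        cwd=subproject_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With git installed, a call such as `run_in_subproject("linux", ["git", "diff"])` would run the diff entirely inside the subproject's own repository.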

I'm serious. I really think that the first version (which ends up being 
the one that sets semantics) should be very careful here, so that 
subprojects never get mixed up with the superproject.

And I'm also serious about the "don't recurse into subproject by default 
at all". If I'm at the superproject, and I maintain the superproject, I 
think the state of the subprojects themselves are a totally separate 
issue. It's quite a valid thing to do to maintain the build 
infrastructure, and if I'm the maintainer of that, and I do "git diff", I 
sure as hell don't want to wait for git to do "git diff" on the 
subprojects when there are 5000 of them!

Sure, "git diff" is fast (on the kernel, it takes me 0.069s on a clean 
tree), but 

 - multiply that 0.069s by 5000 and it's not so fast any more

 - when you have a thousand subprojects, it's quite possible (even likely) 
   that all your directories won't fit in the cache any more, and suddenly 
   even a single "git diff" takes several seconds.

Really! Try this on the Linux tree (that "drop_caches" thing needs root 
privileges):

	echo 3 > /proc/sys/vm/drop_caches
	git diff

and see it take something like 5 seconds. Now, imagine that you have a 
hundred subprojects, and they're big enough that the caches are *never* 
warm.
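
The arithmetic behind those two figures, spelled out in Python. The per-run timings are the ones quoted above; treating the cost as a simple product is a simplification.

```python
WARM_DIFF_SECONDS = 0.069  # warm-cache "git diff" on the kernel, from the text
COLD_DIFF_SECONDS = 5.0    # rough cold-cache figure, from the text


def total_seconds(per_project, projects):
    """Time for one diff per project, run one after another."""
    return per_project * projects


warm_total = total_seconds(WARM_DIFF_SECONDS, 5000)  # about 345 s, nearly 6 minutes
cold_total = total_seconds(COLD_DIFF_SECONDS, 100)   # 500 s, over 8 minutes
```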

People sometimes don't seem to understand what "scalability" really means. 
Scalability means that something that is so fast that you don't even 
*think* about it will become a major bottleneck when you do it a thousand 
times, and the working set has grown so big that it totally blows out 
several levels of caches (both CPU caches and disk caches)

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 18:41                               ` Linus Torvalds
@ 2007-03-27 19:42                                 ` Uwe Kleine-König
  2007-03-27 19:53                                   ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Uwe Kleine-König @ 2007-03-27 19:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Waitz, Junio C Hamano, Josef Weidendorfer, Eric Lesh,
	Matthieu Moy, git

Hello Linus,

Linus Torvalds wrote:
> On Tue, 27 Mar 2007, Uwe Kleine-König wrote:
> > 
> > 	embeddedproject$ git ls-tree HEAD | grep linux
> > 	040000 commit 0123456789abcde0... linux-2.6
> > 
> > (or how ever you save submodules).  Then you might have to duplicate the
> > objects of linux-2.6, because they are part of both histories.
> 
> No they are not. Unless you do it wrong.
> 
> The *only* object that is part of the superproject would be the tree that 
> *contains* that entry itself.
Yes, I got that.  I think my concern is still valid, so probably I was
just unable to phrase it explicitly.  So I retry:

In the state above (i.e. linux-2.6 being a commit) the
superproject's odb doesn't necessarily need the object
0123456789abcde0, right.  But the commit before that had linux-2.6 being
a tree.  And in that state linux-2.6/Makefile has to be in the
superproject's odb.  So if you choose to save the objects of submodules
in a different odb, linux-2.6/Makefile has to be in both of them.
 
I agree with the things you said afterwards, but they don't match the
issue I wanted to point out.

Best regards
Uwe

-- 
Uwe Kleine-König

http://www.google.com/search?q=2004+in+roman+numerals

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 19:42                                 ` Uwe Kleine-König
@ 2007-03-27 19:53                                   ` Linus Torvalds
  2007-03-27 19:59                                     ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-03-27 19:53 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Martin Waitz, Junio C Hamano, Josef Weidendorfer, Eric Lesh,
	Matthieu Moy, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1223 bytes --]



On Tue, 27 Mar 2007, Uwe Kleine-König wrote:
> 
> In the state above (i.e. linux-2.6 being a commit) the
> superproject's odb doesn't necessarily need the object
> 0123456789abcde0, right.  But the commit before that had linux-2.6 being
> a tree.

Well, you're saying that somebody split an existing non-supermodule 
project?

If so, the supermodule really *does* have the old tree as its state, and 
sure, there will be duplication, but it's duplication that existed in the 
actual projects themselves, not something that the superproject 
introduced.

In other words, I don't think that's an argument for or against sharing 
the object database. You should *always* be able to share the object 
database by setting GIT_OBJECT_DIRECTORY if you want (or by using alternates).
But that's independent of whether you are a sub/supermodule..

After all, if you generate two totally *separate* projects (no subproject 
at all) and they just shared some state on their own (say, git and xdiff 
both as totally independent git repositories), they have objects that can 
be in common. Do you want to use alternates or share an object database? 
Maybe, or maybe not. It depends on the user, not on whether it's a 
subproject.

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Submodule object store
  2007-03-27 19:53                                   ` Linus Torvalds
@ 2007-03-27 19:59                                     ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-03-27 19:59 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Martin Waitz, Junio C Hamano, Josef Weidendorfer, Eric Lesh,
	Matthieu Moy, git



On Tue, 27 Mar 2007, Linus Torvalds wrote:
> 
> In other words, I don't think that's an argument for or against sharing 
> the object database. You should *always* be able to share the object 
> database by setting GIT_OBJECT_DIRECTORY if you want (or by using alternates).
> But that's independent of whether you are a sub/supermodule..

In fact, I suspect that you might well have a situation where there are 
more objects to be shared "across" superproject boundaries than within 
them.

For example, say that I'm a mirror site, and I mirror two different 
distributions, both of which use superprojects (but *different* 
superprojects!) to track their distro stuff.

Obviously, the top-level setup is likely totally different, and they 
probably differ a bit in which subprojects they have too, but in many 
cases, those two *different* superprojects will have subprojects that 
could often share 99% of all their objects not within the superproject, 
but individually *across* superprojects.

So you would not want to have a object store that is tied to the 
superproject, but you might well want to have each superproject share the 
object store for the subprojects that they have in common. The "kernel" 
subproject in the "ubuntu" superproject might want to share the object 
store for the "linux-2.6" subproject in the Fedora 7 superproject.

(Similarly, there might be sharing with totally *individual* projects, ie 
you might want to make both just have an alternate that points 
directly to the "official" tree that is in neither of the two 
superprojects and that I maintain separately).
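
A minimal sketch of the alternates setup described above, in Python. It
relies only on the fact that git reads extra object directories, one path
per line, from objects/info/alternates inside a repository. The directory
names in the usage part and the helper name add_alternate are hypothetical,
chosen for illustration.

```python
import os
import tempfile


def add_alternate(git_dir, shared_objects_dir):
    """Append shared_objects_dir to <git_dir>/objects/info/alternates.

    Git treats each line of that file as an additional object directory
    to search, so several repositories can borrow from one store.
    """
    info_dir = os.path.join(git_dir, "objects", "info")
    os.makedirs(info_dir, exist_ok=True)
    alternates = os.path.join(info_dir, "alternates")

    # Read the entries already present so the same path is not added twice.
    existing = []
    if os.path.exists(alternates):
        with open(alternates) as f:
            existing = [line.strip() for line in f if line.strip()]

    path = os.path.abspath(shared_objects_dir)
    if path not in existing:
        existing.append(path)
        with open(alternates, "w") as f:
            f.write("\n".join(existing) + "\n")
    return alternates


if __name__ == "__main__":
    # Hypothetical layout: two superprojects whose kernel subprojects both
    # borrow objects from one separately maintained tree.
    root = tempfile.mkdtemp()
    official = os.path.join(root, "linux-2.6.git", "objects")
    os.makedirs(official)
    for sub in ("ubuntu/kernel/.git", "fedora7/linux-2.6/.git"):
        print(add_alternate(os.path.join(root, sub), official))
```

Nothing here depends on either repository being a subproject, which is the
point: the sharing is configured per repository, across superprojects.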

		Linus


* Re: .gitlink for Summer of Code
  2007-03-27 18:36                       ` Steven Grimm
@ 2007-03-27 20:02                         ` Daniel Barkalow
  2007-03-27 21:27                           ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Daniel Barkalow @ 2007-03-27 20:02 UTC (permalink / raw)
  To: Steven Grimm
  Cc: Linus Torvalds, Josef Weidendorfer, Junio C Hamano, Martin Waitz,
	Eric Lesh, Matthieu Moy, git

On Tue, 27 Mar 2007, Steven Grimm wrote:

> The obvious use case for "I want the superproject and just one submodule" is
> when the superproject has build tools, header files, or other pieces of data
> that are shared by some/all of the submodules. Maybe not the case in BSD per
> se, but having a top-level file full of settings, paths to tools, etc. that
> gets included by the individual Makefiles in subdirectories isn't all that
> uncommon in complex multi-part projects.

This is actually the case I'm personally interested in. But in that case, 
you want to reverse the superproject/subproject organization, because that 
way each project part can use the desired version of the common stuff, and 
people can modify the common stuff without then testing the whole 
universe.

I.e., at some point, you'll want to change the behavior of the build 
system in such a way that all of the per-part configuration information 
sets need to be updated to work with it. If the build system is in the 
superproject, you need to do everything at once. If the build system is in 
a subsubproject, you can make the change without affecting anything, and, 
as subprojects pull the build system change, they update the subsubproject 
entry and the configuration files as a single subproject commit. For 
sanity, you want to reach the point where all of the projects are using 
the same subsubproject version, but that doesn't have to happen overnight, 
and you don't have a single commit which touches every subproject.
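
A toy model of the two layouts, as a sketch only. Each part records which
build-system version it uses ("build") and which version its configuration
files were written for ("config"); the part names, version labels, and
function names are all hypothetical.

```python
def bump_in_superproject(subprojects, version):
    """Build system lives in the superproject: every part sees the new
    version at once, whether or not its configuration was updated."""
    return {name: {"build": version, "config": part["config"]}
            for name, part in subprojects.items()}


def bump_in_subproject(subprojects, name, version):
    """Build system is a sub-subproject: one part updates its pinned
    version and its configuration together, as a single commit."""
    updated = dict(subprojects)
    updated[name] = {"build": version, "config": version}
    return updated


def broken(subprojects):
    """Names of parts whose configuration does not match their build system."""
    return sorted(name for name, part in subprojects.items()
                  if part["build"] != part["config"])


if __name__ == "__main__":
    state = {
        "app": {"build": "v1", "config": "v1"},
        "lib": {"build": "v1", "config": "v1"},
    }
    # Everything breaks together until every part has been fixed.
    print(broken(bump_in_superproject(state, "v2")))
    # Parts move one at a time and nothing is broken in between.
    print(broken(bump_in_subproject(state, "app", "v2")))
```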

	-Daniel
*This .sig left intentionally blank*


* Re: .gitlink for Summer of Code
  2007-03-27 21:11                           ` Linus Torvalds
@ 2007-03-27 20:54                             ` David Lang
  2007-03-27 23:31                               ` Jakub Narebski
  0 siblings, 1 reply; 61+ messages in thread
From: David Lang @ 2007-03-27 20:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Barkalow, Josef Weidendorfer, Junio C Hamano,
	Martin Waitz, Eric Lesh, Matthieu Moy, git

On Tue, 27 Mar 2007, Linus Torvalds wrote:

> On Tue, 27 Mar 2007, Daniel Barkalow wrote:
>>
>> Are you talking about submodule history, or submodule state? If they care
>> about any state but not the corresponding history, they need to do a
>> shallow clone of the subproject, right?
>
> I don't see what the confusion is about.
>
> Why would you want a shallow clone, and what does that have to do with
> submodules?
>
> I'm saying that the *normal* case is that of the thousands of submodules,
> you generally care about one or two (the ones you work on).
>
> Those modules you want full history for. The supermodule you want because
> it contains the build infrastructure. You'd generally want full history
> for that too.

if you are working on the submodule then you are correct.

however if you are working on the supermodule it's a different story.

if I'm working on the 'ubuntu superproject' it would be nice to be able to find 
what is different between the 'Jan 2007' and 'April 2007' versions. one could 
have the 2.6.19 kernel and the other would have 2.6.20. I don't care about all 
the individual changes between these two states of the kernel, but I need to be 
able to compile either one as part of my testing. If I bisect in the 
superproject to the commit that updated the kernel, then I would consider 
getting the 'kernel subproject' history to be able to bisect the bug further (or 
I may just report it to the kernel maintainers for them to check).
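
A sketch of that workflow in Python, assuming a linear superproject history
in which each commit records the version pinned for every subproject. The
commit ids, the "libc" entries, and the test that blames 2.6.20 are made up
for illustration; this is not how git bisect is implemented.

```python
def first_bad(commits, is_bad):
    """Binary search a linear history, assuming commits[0] is good and
    commits[-1] is bad. Returns the adjacent (last good, first bad) pair."""
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[lo], commits[hi]


def changed_pins(good, bad):
    """Subprojects whose pinned version differs between two superproject commits."""
    names = set(good["pins"]) | set(bad["pins"])
    return {name: (good["pins"].get(name), bad["pins"].get(name))
            for name in sorted(names)
            if good["pins"].get(name) != bad["pins"].get(name)}


# Hypothetical superproject history, oldest first.
HISTORY = [
    {"id": "s1", "pins": {"kernel": "2.6.19", "libc": "a"}},
    {"id": "s2", "pins": {"kernel": "2.6.19", "libc": "b"}},
    {"id": "s3", "pins": {"kernel": "2.6.20", "libc": "b"}},
    {"id": "s4", "pins": {"kernel": "2.6.20", "libc": "c"}},
]

if __name__ == "__main__":
    good, bad = first_bad(HISTORY, lambda c: c["pins"]["kernel"] == "2.6.20")
    print(good["id"], bad["id"], changed_pins(good, bad))
```

Only the snapshots pinned by superproject commits are built during the
search; subproject history between the two pinned versions is needed only
if the bug is then bisected further inside that subproject.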

> There's absolutely zero reason to think shallow clones have *anything* to
> do with this. It's a totally separate dimension. Sure, you could use
> shallow clones *too*, but it has nothing to do with subprojects.

they are separate, but if you need to compile the superproject you either need 
to get the full history of every subproject, or you need shallow clones (or some 
third approach).

David Lang


* Re: .gitlink for Summer of Code
  2007-03-27 18:19                       ` Linus Torvalds
@ 2007-03-27 20:54                         ` Daniel Barkalow
  2007-03-27 21:11                           ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Daniel Barkalow @ 2007-03-27 20:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josef Weidendorfer, Junio C Hamano, Martin Waitz, Eric Lesh,
	Matthieu Moy, git

On Tue, 27 Mar 2007, Linus Torvalds wrote:

> On Tue, 27 Mar 2007, Daniel Barkalow wrote:
> > 
> > Is it fair to say that subproject support means that there's a use case 
> > where everybody will need shallow clones? And that it points out natural 
> > triggers for shallowness?
> 
> No.
> 
> I personally don't believe in shallow clones. And I *certainly* don't 
> believe that it has anything to do with subprojects. So people may want 
> shallow clones, but it's at least independent of the issue of submodules.
> 
> With subprojects, it's not that you don't want the history. It's just that 
> you don't want the history for *all* projects. Most people care about a 
> very small subset.

Are you talking about submodule history, or submodule state? If they care 
about any state but not the corresponding history, they need to do a 
shallow clone of the subproject, right?

Or are you assuming that people only want to have every subproject either 
there with full history or entirely absent?

I think that one common thing would be to care about the sequence of linux 
kernel snapshots selected by openembedded in their commits, without caring 
about the linux kernel history in between those snapshots. And they 
probably even want to bisect the superproject (still without getting into 
kernel versions that were never in superproject commits), so they can 
track down what caused their PDA to stop booting, where it's not clear 
which program is even responsible. Maybe once the bug is down to a single 
superproject commit, they'd want the history for the responsible 
subproject.

In any case, I think that the superproject object database needs to keep 
track of where references came from (e.g., if you pull somebody's 
superproject commit, and the point of that commit is to use a 
custom-modified subproject commit, that subproject commit must come from 
the same person, and you need to be able to fetch it correctly after the 
fact if you don't get it immediately, even if you've personally forgotten 
the URL).

> (The exception, of course, is when the superproject simply isn't that big, 
> and only has a couple of subprojects. In git, for example, the xdiff stuff 
> could be a subproject if you wanted to do it that way. But then, the 
> subproject isn't a size issue, it's purely an organizational thing, and 
> there is no argument for/against shallowness there either).

Of course. And to make this use case also viable, it's probably necessary 
to be able to tell git to fetch these subprojects automatically, because 
you'll be sad if you leave for a long plane trip with the latest git but 
not the xdiff it uses. (Clearly an application for .gitattributes, but 
that'd be extra fun to implement.)

	-Daniel
*This .sig left intentionally blank*


* Re: .gitlink for Summer of Code
  2007-03-27 20:54                         ` Daniel Barkalow
@ 2007-03-27 21:11                           ` Linus Torvalds
  2007-03-27 20:54                             ` David Lang
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-03-27 21:11 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Josef Weidendorfer, Junio C Hamano, Martin Waitz, Eric Lesh,
	Matthieu Moy, git



On Tue, 27 Mar 2007, Daniel Barkalow wrote:
> 
> Are you talking about submodule history, or submodule state? If they care 
> about any state but not the corresponding history, they need to do a 
> shallow clone of the subproject, right?

I don't see what the confusion is about.

Why would you want a shallow clone, and what does that have to do with 
submodules?

I'm saying that the *normal* case is that of the thousands of submodules, 
you generally care about one or two (the ones you work on).

Those modules you want full history for. The supermodule you want because 
it contains the build infrastructure. You'd generally want full history 
for that too.

There's absolutely zero reason to think shallow clones have *anything* to 
do with this. It's a totally separate dimension. Sure, you could use 
shallow clones *too*, but it has nothing to do with subprojects.

		Linus


* Re: .gitlink for Summer of Code
  2007-03-27 20:02                         ` Daniel Barkalow
@ 2007-03-27 21:27                           ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-03-27 21:27 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Steven Grimm, Josef Weidendorfer, Junio C Hamano, Martin Waitz,
	Eric Lesh, Matthieu Moy, git



On Tue, 27 Mar 2007, Daniel Barkalow wrote:
> 
> This is actually the case I'm personally interested in. But in that case, 
> you want to reverse the superproject/subproject organization, because that 
> way each project part can use the desired version of the common stuff, and 
> people can modify the common stuff without then testing the whole 
> universe.

The build infrastructure is only a small part of the superproject thing.

A much more interesting thing in many ways is the "how do the pieces fit 
together" question, ie the "which library version X do I need for program 
version Y?"

And that needs to be at the superproject level, obviously. The person who 
works on the application will want to fetch the library too, but he likely 
isn't interested in all the *other* libraries that don't affect his app, 
and he likely isn't interested in things like standard libraries (which 
may be in the superproject too, but since their versioning doesn't affect 
any normal subproject, you'd not expect application developers to have all 
of libc checked out and built, would you?).

So yes, you could have several levels: the top level for "versioning", the 
middle level for "applications and libraries" and some third level for 
"build infrastructure that can be shared". However, I've never actually 
seen any project work that way. People *always* seem to put the build 
infrastructure at either the top level, or as one of the subprojects that 
is required for all the other subprojects.

Of course, it's possible that the reason people do that is that things 
like CVS are really really bad at the versioning stuff, and since they 
aren't distributed, you cannot put the shared build infrastructure in 
multiple projects at the same time anyway.

So with a distributed environment like git, doing the shared build 
infrastructure as a separate sub-sub-project would work in ways it does 
*not* work in a centralized model, but I think we also want to just 
support the way people are used to working, and that definitely involves 
having the build infrastructure at or under the top level..

		Linus


* Re: .gitlink for Summer of Code
  2007-03-27 23:31                               ` Jakub Narebski
@ 2007-03-27 23:20                                 ` David Lang
  0 siblings, 0 replies; 61+ messages in thread
From: David Lang @ 2007-03-27 23:20 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Wed, 28 Mar 2007, Jakub Narebski wrote:

>> if I'm working on the 'ubuntu superproject' it would be nice to be able to find
>> what is different between the 'Jan 2007' and 'April 2007' versions. one could
>> have the 2.6.19 kernel and the other would have 2.6.20. I don't care about all
>> the individual changes between these two states of the kernel, but I need to be
>> able to compile either one as part of my testing. If I bisect in the
>> superproject to the commit that updated the kernel, then I would consider
>> getting the 'kernel subproject' history to be able to bisect the bug further (or
>> I may just report it to the kernel maintainers for them to check).
>
> I'd rather call this idea _sparse_ clone (not shallow), as you have only
> some points in the history, but they don't need to be top 'n' ones.

OK, I can see the difference in the definition of the two; the ideal would 
probably be to have sparse and shallow clones be different instances of the same 
mechanism.

  sparse being specific points in the history, shallow being a range.

  allow for multiple ranges, and the ability to 'fill in the blanks' later so 
that points can become ranges and ranges can merge.
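
A sketch of that bookkeeping, under a simplifying assumption: history is
treated as a linear sequence of commit positions (real git history is a DAG,
so this is only an illustration). A sparse point is a range of length one,
and a shallow clone is a range that ends at the tip. The function names are
hypothetical.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fill_in(held, start, end):
    """Record a newly fetched stretch of history and re-merge what is held."""
    return merge_ranges(held + [(start, end)])


if __name__ == "__main__":
    # Two sparse points (3 and 7) plus a shallow range ending at the tip (10..12).
    held = [(3, 3), (7, 7), (10, 12)]
    # Fetching the commits between the two points turns them into one range.
    print(fill_in(held, 4, 6))
```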

also having the server say 'it would only be X MB more to pull everything you 
don't have, do you want to do this?' would cause more load on the server for 
each of the partial pulls, but would encourage people to fill out partial 
repositories instead of hitting the servers repeatedly.

David Lang


* Re: .gitlink for Summer of Code
  2007-03-27 20:54                             ` David Lang
@ 2007-03-27 23:31                               ` Jakub Narebski
  2007-03-27 23:20                                 ` David Lang
  0 siblings, 1 reply; 61+ messages in thread
From: Jakub Narebski @ 2007-03-27 23:31 UTC (permalink / raw)
  To: git

David Lang wrote:

> On Tue, 27 Mar 2007, Linus Torvalds wrote:
> 
>> On Tue, 27 Mar 2007, Daniel Barkalow wrote:
>>>
>>> Are you talking about submodule history, or submodule state? If they care
>>> about any state but not the corresponding history, they need to do a
>>> shallow clone of the subproject, right?
>>
>> I don't see what the confusion is about.
>>
>> Why would you want a shallow clone, and what does that have to do with
>> submodules?
>>
>> I'm saying that the *normal* case is that of the thousands of submodules,
>> you generally care about one or two (the ones you work on).
>>
>> Those modules you want full history for. The supermodule you want because
>> it contains the build infrastructure. You'd generally want full history
>> for that too.
> 
> if you are working on the submodule then you are correct.
> 
> however if you are working on the supermodule it's a different story.
> 
> if I'm working on the 'ubuntu superproject' it would be nice to be able to find 
> what is different between the 'Jan 2007' and 'April 2007' versions. one could 
> have the 2.6.19 kernel and the other would have 2.6.20. I don't care about all 
> the individual changes between these two states of the kernel, but I need to be 
> able to compile either one as part of my testing. If I bisect in the 
> superproject to the commit that updated the kernel, then I would consider 
> getting the 'kernel subproject' history to be able to bisect the bug further (or 
> I may just report it to the kernel maintainers for them to check).

I'd rather call this idea _sparse_ clone (not shallow), as you have only
some points in the history, but they don't need to be top 'n' ones.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


end of thread, other threads:[~2007-03-27 23:47 UTC | newest]

Thread overview: 61+ messages
2007-03-25 12:30 .gitlink for Summer of Code Eric Lesh
2007-03-25 15:20 ` Matthieu Moy
2007-03-25 20:39   ` Shawn O. Pearce
2007-03-25 20:54     ` Johannes Schindelin
2007-03-25 21:03       ` Shawn O. Pearce
2007-03-25 20:55     ` Junio C Hamano
2007-03-25 21:05       ` Shawn O. Pearce
2007-03-27  3:40       ` Petr Baudis
2007-03-26 17:16   ` Eric Lesh
2007-03-26 17:22     ` Matthieu Moy
2007-03-26 17:38       ` Eric Lesh
2007-03-26 18:35         ` Martin Waitz
2007-03-26 19:33           ` Josef Weidendorfer
2007-03-26 19:49             ` Matthieu Moy
2007-03-26 23:14               ` Josef Weidendorfer
2007-03-27 16:59                 ` Matthieu Moy
2007-03-26 22:03             ` Martin Waitz
2007-03-26 22:51               ` Junio C Hamano
2007-03-26 23:16                 ` Submodule object store Martin Waitz
2007-03-26 23:28                   ` Junio C Hamano
2007-03-26 23:36                     ` Martin Waitz
2007-03-26 23:20                       ` David Lang
2007-03-26 23:55                         ` Martin Waitz
2007-03-26 23:40                           ` David Lang
2007-03-27 15:25                             ` Martin Waitz
2007-03-27 16:53                               ` David Lang
2007-03-27  0:29                           ` Junio C Hamano
2007-03-27 14:28                             ` Martin Waitz
2007-03-27 11:25                       ` Uwe Kleine-König
2007-03-27 11:50                         ` Uwe Kleine-König
2007-03-27 15:53                           ` Martin Waitz
2007-03-27 16:56                             ` Josef Weidendorfer
2007-03-27 16:44                               ` Martin Waitz
2007-03-27 17:22                             ` Uwe Kleine-König
2007-03-27 18:41                               ` Linus Torvalds
2007-03-27 19:42                                 ` Uwe Kleine-König
2007-03-27 19:53                                   ` Linus Torvalds
2007-03-27 19:59                                     ` Linus Torvalds
2007-03-27 15:46                         ` Martin Waitz
2007-03-26 23:17                 ` .gitlink for Summer of Code Josef Weidendorfer
     [not found]                   ` <Pine.LNX.4.64.0703270952020.6730@woody.linux-foundation.org>
2007-03-26 23:24                   ` Junio C Hamano
2007-03-27 17:04                   ` Linus Torvalds
2007-03-27 17:00                     ` David Lang
2007-03-27 18:15                       ` Linus Torvalds
2007-03-27 17:35                     ` Martin Waitz
2007-03-27 18:09                     ` Daniel Barkalow
2007-03-27 18:19                       ` Linus Torvalds
2007-03-27 20:54                         ` Daniel Barkalow
2007-03-27 21:11                           ` Linus Torvalds
2007-03-27 20:54                             ` David Lang
2007-03-27 23:31                               ` Jakub Narebski
2007-03-27 23:20                                 ` David Lang
2007-03-27 18:36                       ` Steven Grimm
2007-03-27 20:02                         ` Daniel Barkalow
2007-03-27 21:27                           ` Linus Torvalds
2007-03-26 23:00               ` Josef Weidendorfer
2007-03-26 23:27                 ` Martin Waitz
2007-03-26 17:31   ` Jakub Narebski
2007-03-26 18:21     ` Matthieu Moy
2007-03-27  0:48       ` Jakub Narebski
2007-03-25 20:46 ` Shawn O. Pearce
