From: Linus Torvalds <torvalds@linux-foundation.org>
To: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
Cc: Junio C Hamano <junkio@cox.net>,
	Martin Waitz <tali@admingilde.org>, Eric Lesh <eclesh@ucla.edu>,
	Matthieu Moy <Matthieu.Moy@imag.fr>,
	git@vger.kernel.org
Subject: Re: .gitlink for Summer of Code
Date: Tue, 27 Mar 2007 10:04:53 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0703270952020.6730@woody.linux-foundation.org>
In-Reply-To: <200703270117.59205.Josef.Weidendorfer@gmx.de>
On Tue, 27 Mar 2007, Josef Weidendorfer wrote:
> On Tuesday 27 March 2007, Junio C Hamano wrote:
> > Martin Waitz <tali@admingilde.org> writes:
> >
> > > For submodules I currently use <parent>/.git/objects/module/<submodule>/
> > > to store the objects belonging to the submodule.
> >
> > I was not following the gitlink discussion closely, but what is
> > the motivation behind this separation of the object store?
>
> The separation issue is about scalability of submodules, and not
> directly about gitlink.
NOTE! It's fine to share the *object*store* for a supermodule setup.
The scalability concerns are not about the number of objects, but about
the operations that work on them, and specifically *traverse* the objects.
So while it's fine to share the same GIT_OBJECT_DIRECTORY for all the
submodules, it's *not* ok if "git clone" on a supermodule will consider
things to be one single repository, and clone it as one huge thing,
generating (and having to look up!) a ten-million object pack for a
hundred smaller projects. THAT won't scale.
Basically, a "git-rev-list --objects HEAD" in the super-module should only
list the objects in the supermodule itself, not in all the submodules. And
that implies that cloning a supermodule is not about cloning a single big
repository: it would be a matter of:

 - first cloning the supermodule itself (which is often fairly
   small: just a top-level directory, with some top-level Makefiles and a
   number of directories that are submodules)

 - then parsing some supermodule data structure, and cloning each
   submodule individually.

Similarly for "fetch" (and merging too, of course - it ends up having to
merge each sub-project separately).
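
To make the traversal point concrete, here is a toy sketch of an object walk that stops at submodule boundaries. It is not git's code or data model: the `store` dictionary, the `"link"` entry kind and the `list_objects` helper are all assumptions made up for illustration. It only shows that a shared object store and a per-project traversal can coexist.

```python
# Toy model, not git's real data structures: an object store maps an id to
# either None (a blob) or a list of (kind, child_id) entries (a tree).
# The "link" kind is a hypothetical marker for "this entry is a submodule".

def list_objects(store, root):
    """Return the sorted ids reachable from root without crossing links."""
    seen = set()
    stack = [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        for kind, child in store[oid] or []:
            if kind == "link":
                continue  # submodule boundary: left to a separate traversal
            stack.append(child)
    return sorted(seen)

# One shared store holding both the supermodule and one submodule.
store = {
    "super-root": [("blob", "Makefile"), ("link", "libfoo-root")],
    "Makefile": None,
    "libfoo-root": [("blob", "foo.c"), ("blob", "foo.h")],
    "foo.c": None,
    "foo.h": None,
}

super_objects = list_objects(store, "super-root")   # supermodule only
sub_objects = list_objects(store, "libfoo-root")    # submodule only
```

Walking from `super-root` never visits the `foo.*` objects even though they sit in the same store, which is the property the supermodule traversal needs.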
Think of it this way: if you think people find it a bit annoying that you
currently have to get all the history when you do a clone (and why people
have worked on "shallow clones" in git), imagine just *how* frustrating it
is if you have to get all five-hundred subprojects when you only want to
work on one small one!
Think of something like a huge *BSD "world" tree, where the supermodule
contains *everything*. Do you really _really_ expect that every single
developer wants to clone it all? I have no idea how much that is, but I
can well imagine that it's several thousand subprojects, some of which are
quite big in their own right.
Also, imagine the server side.. Anybody who thinks that the server wants
to (or is even *able* to) do things like a fsck on the totality, or keep
every single object in memory, is in for a nasty surprise..
So I think that:

 - sharing object directories should not be a requirement, but it should
   certainly be *possible*. Quite often you might want to do it, although
   for really big superprojects it might well make sense to have
   individual object stores too.

 - walking the *global* object list is simply not possible. You need to
   fsck every single subtree individually, and fsck the superproject on
   its own, *without* recursing into the subprojects. And you need to be
   able to clone the superproject and only one or two subprojects, and
   never see it as one "atomic" big repository.

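
The two requirements above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `clone_plan` and `fsck_units` helpers, the path-to-URL manifest and the example.org URLs are hypothetical and do not describe any existing git interface.

```python
def clone_plan(super_url, submodules, wanted):
    """Return the ordered list of (path, url) clones to perform.

    submodules: dict of path -> url, as read from some supermodule
                manifest (the manifest format is hypothetical here).
    wanted:     the subproject paths the developer actually asked for.
    """
    unknown = sorted(set(wanted) - set(submodules))
    if unknown:
        raise KeyError(f"not a submodule: {unknown}")
    plan = [(".", super_url)]  # the small top-level repository comes first
    # Each requested subproject is its own clone; the rest are never fetched.
    plan += [(path, submodules[path]) for path in sorted(set(wanted))]
    return plan


def fsck_units(plan):
    """List the repositories to check one by one, never as a single graph."""
    return [path for path, _url in plan]


# Hypothetical "world" tree with three subprojects; only one is wanted.
manifest = {
    "bin/ls": "git://example.org/ls.git",
    "lib/libc": "git://example.org/libc.git",
    "sys/kernel": "git://example.org/kernel.git",
}
plan = clone_plan("git://example.org/world.git", manifest, ["lib/libc"])
units = fsck_units(plan)
```

Here the plan contains the superproject plus `lib/libc` only, and each entry in `units` would be verified separately.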
I really think people should think about the *BSD kind of "world" setup.
You absolutely do _not_ want supermodules to be indivisible "everything or
nothing" kind of things. You want submodules to be very much separate
repositories, although you *can* of course share the object store if you
want to (the same way git can do it between any number of totally
unrelated repositories!)
Linus