From: Linus Torvalds <torvalds@linux-foundation.org>
To: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
Cc: Junio C Hamano <junkio@cox.net>,
	Martin Waitz <tali@admingilde.org>, Eric Lesh <eclesh@ucla.edu>,
	Matthieu Moy <Matthieu.Moy@imag.fr>,
	git@vger.kernel.org
Subject: Re: .gitlink for Summer of Code
Date: Tue, 27 Mar 2007 10:04:53 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0703270952020.6730@woody.linux-foundation.org>
In-Reply-To: <200703270117.59205.Josef.Weidendorfer@gmx.de>



On Tue, 27 Mar 2007, Josef Weidendorfer wrote:

> On Tuesday 27 March 2007, Junio C Hamano wrote:
> > Martin Waitz <tali@admingilde.org> writes:
> > 
> > > For submodules I currently use <parent>/.git/objects/module/<submodule>/
> > > to store the objects belonging to the submodule.
> > 
> > I was not following the gitlink discussion closely, but what is
> > the motivation behind this separation of the object store?
> 
> The separation issue is about scalability of submodules, and not
> directly about gitlink.

NOTE! It's fine to share the *object*store* for a supermodule setup.

The scalability concerns are not about the number of objects, but about 
the operations that work on them, and specifically *traverse* the objects.

So while it's fine to share the same GIT_OBJECT_DIR for all the 
submodules, it's *not* ok if "git clone" on a supermodule will consider 
things to be one single repository, and clone it as one huge thing, 
generating (and having to look up!) a ten-million object pack for a 
hundred smaller projects. THAT won't scale.

Basically, a "git-rev-list --objects HEAD" in the super-module should only 
list the objects in the supermodule itself, not in all the submodules. And 
that implies that cloning a supermodule is not about cloning a single big 
repository: it would be a matter of:

 - first cloning the supermodule itself (which is often fairly 
   small: just a top-level directory, with some top-level Makefiles and a 
   number of directories that are submodules)

 - then parsing some supermodule data structure, and cloning each 
   submodule individually.

Similarly for "fetch" (and merging too, of course - it ends up having to 
merge each sub-project separately). 
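
The two-step clone described above can be sketched roughly as follows. This 
is only an illustration, not an existing git feature: the `manifest` mapping 
stands in for the unspecified "supermodule data structure", and the function 
name `plan_clone` and the example URLs are made up. It only builds the plain 
"git clone <url> <dir>" command lines, it does not run them.

```python
from typing import Dict, Iterable, List


def plan_clone(super_url: str, dest: str,
               manifest: Dict[str, str],
               wanted: Iterable[str]) -> List[List[str]]:
    """Return the git commands for a partial supermodule checkout.

    `manifest` (hypothetical) maps a submodule path, relative to `dest`,
    to that submodule's own clone URL.
    """
    # Step 1: the supermodule is cloned by itself, as a small repository.
    commands = [["git", "clone", super_url, dest]]

    # Step 2: each *requested* submodule is cloned as a separate repository.
    # Submodules that were not asked for are never touched.
    for path in wanted:
        if path not in manifest:
            raise KeyError(f"unknown submodule: {path}")
        commands.append(["git", "clone", manifest[path], f"{dest}/{path}"])
    return commands


if __name__ == "__main__":
    manifest = {
        "bin/ls": "git://example.org/ls.git",
        "lib/libc": "git://example.org/libc.git",
    }
    for cmd in plan_clone("git://example.org/world.git", "world",
                          manifest, ["lib/libc"]):
        print(" ".join(cmd))
```

The point of the sketch is that the cost grows with the number of submodules 
you ask for, not with the number that exist in the supermodule.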

Think of it this way: if you think people find it a bit annoying that you 
currently have to get all the history when you do clone (and why people 
have worked on "shallow clones" in git), imagine just *how* frustrating it 
is if you have to get all five-hundred subprojects when you only want to 
work on one small one!

Think of something like a huge *BSD "world" tree, where the supermodule 
contains *everything*. Do you really _really_ expect that every single 
developer wants to clone it all? I have no idea how much that is, but I 
can well imagine that it's several thousand subprojects, some of which are 
quite big in their own right. 

Also, imagine the server side.. Anybody who thinks that the server wants 
to (or is even *able* to) do things like a fsck on the totality, or keep 
every single object in memory, is in for a nasty surprise..

So I think that:

 - sharing object directories should not be a requirement, but it should 
   certainly be *possible*. Quite often you might want to do it, although 
   for really big superprojects it might well make sense to have 
   individual object stores too.

 - walking the *global* object list is simply not possible. You need to 
   fsck every single subtree individually, and fsck the superproject on 
   its own, *without* recursing into the subprojects. And you need to be 
   able to clone the superproject and only one or two subprojects, and 
   never see it as one "atomic" big repository.
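
As a toy model of the "no global walk" rule, the sketch below traverses an 
object graph but stops at submodule boundaries. Everything here is assumed 
for illustration: the graph is a plain dictionary rather than a real git 
object store, and the "link" edge kind is a stand-in for whatever entry the 
supermodule uses to name a submodule commit.

```python
from typing import Dict, List, Set, Tuple

# Toy object graph: object id -> list of (edge kind, target id).
# "link" is a hypothetical edge from a supermodule tree to a submodule commit.
Graph = Dict[str, List[Tuple[str, str]]]


def list_objects(graph: Graph, root: str) -> Set[str]:
    """Collect objects reachable from `root` without crossing "link" edges."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        for kind, target in graph.get(oid, []):
            if kind == "link":
                # Submodule boundary: that history is walked separately,
                # starting from the submodule's own commit.
                continue
            stack.append(target)
    return seen


if __name__ == "__main__":
    graph: Graph = {
        "super-commit": [("tree", "super-tree")],
        "super-tree": [("blob", "Makefile"), ("link", "sub-commit")],
        "sub-commit": [("tree", "sub-tree")],
        "sub-tree": [("blob", "main.c")],
    }
    print(sorted(list_objects(graph, "super-commit")))
    print(sorted(list_objects(graph, "sub-commit")))
```

Walking from the supermodule commit yields only the supermodule's own 
objects; the submodule is checked by a second, independent walk from its own 
commit, which is the per-project fsck and clone behaviour argued for above.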

I really think people should think about the *BSD kind of "world" setup. 
You absolutely do _not_ want supermodules to be indivisible "everything or 
nothing" kind of things. You want submodules to be very much separate 
repositories, although you *can* of course share the object store if you 
want to (the same way git can do it between any number of totally 
unrelated repositories!)

		Linus
