* question (possibly) on git subtree/submodules
@ 2010-07-23 14:00 Maurizio Vitale
  2010-07-23 16:56 ` Chris Packham
  0 siblings, 1 reply; 9+ messages in thread
From: Maurizio Vitale @ 2010-07-23 14:00 UTC (permalink / raw)
  To: git


I'm new to git and have read the recent thread on subtree support.
I'm not sure they (or git submodules) offer what I'm looking for.
Here's the scenario:
       - I have a large monolithic code base, all in my repository (i.e.
         I don't need to link in external repositories, which is what I
         understand submodules offer)
       - I'd like to be able to clone only a small fraction of the
         repository (say an arbitrary directory or even a single file)
         in order to make small changes
       - these directories are not known when the full repository is set
         up.
       - commits to the part I've checked out should show in the history
         of any clone that includes the part, up to the full repository
       - ideally, I should be able to incrementally clone portions (e.g.
         I've checked out path/dir_A and realize I need to modify
         path/dir_B as well).
         these additional clones should be in whatever branch I switched
         to after the initial checkouts.

Assuming the above makes any sense (in general or in git), is there
anything in git that would help me do what I'm looking for?
Thanks,

        Maurizio 

       


* Re: question (possibly) on git subtree/submodules
  2010-07-23 14:00 question (possibly) on git subtree/submodules Maurizio Vitale
@ 2010-07-23 16:56 ` Chris Packham
  2010-07-23 17:18   ` Jonathan Nieder
  2010-07-27 10:56   ` Alex
  0 siblings, 2 replies; 9+ messages in thread
From: Chris Packham @ 2010-07-23 16:56 UTC (permalink / raw)
  To: maurizio.vitale; +Cc: Maurizio Vitale, git

Hi,

On 23/07/10 07:00, Maurizio Vitale wrote:
> 
> I'm new to git and have read the recent thread on subtree support.
> I'm not sure they (or git submodules) offer what I'm looking for.
> Here's the scenario:
>        - I have a large monolithic code base, all in my repository (e.g.
>          I don't need to link in external repositories, which is what I
>          understand submodules offer
>        - I'd like to be able to clone only a small fraction of the
>          repository (say an arbitrary directory or even a single file)
>          in order to make small changes
>        - these directories are not known when the full repository is set
>          up.
>        - commits to the part I've checked out should show in the history
>          of any clone that includes the part, up to the full repository
>        - ideally, I should be able to incrementally clone portions (e.g.
>          I've checked out path/dir_A and realize I need to modify
>          path/dir_B as well).
>          these additional clones should be in whatever branch I switched
>          to after the initial checkouts.
> 
> Assuming the above makes any sense (in general or in git), is there
> anything in git that would help me doing what I'm looking for?
> Thanks,
> 
>         Maurizio 

The short answer is no. Nothing git has currently will let you clone a
subset of files. Shallow clones exist if you want all the code and the
last X changes. The reason for this is that git, like other DVCSes, tracks
_changes_ rather than _files_. This is something that took me a while to
get my head around when I was learning git.

The best advice I've seen is to actually take your repository and use
git filter-branch to create several smaller repositories (or depending
on your desire for retention of history just create new repos). You can
then use submodules or subtrees to stitch these back together into a
super project to which you can add the smaller repositories as needed
(note: I have never used subtrees so I'm not 100% sure if what I'm
saying applies to them).

We use this model with submodules at $dayjob and it works quite well for us.
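
As a rough illustration of the split described above, the following sketch
carves one directory out of a throwaway repository with
git filter-branch --subdirectory-filter. The repository layout (path/dir_A,
path/dir_B), the file names and the demo identity are all made up for the
example. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer
git versions print; older versions are assumed to ignore it.

```shell
#!/bin/sh
set -e

# Throwaway monolithic repository (all names here are hypothetical).
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
mkdir -p path/dir_A path/dir_B
echo a > path/dir_A/a.txt
echo b > path/dir_B/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Work on a clone so the original history stays untouched.
git clone -q "$work/mono" "$work/dir_A"
cd "$work/dir_A"

# Rewrite history so that path/dir_A becomes the root of the repository.
FILTER_BRANCH_SQUELCH_WARNING=1 \
	git filter-branch --subdirectory-filter path/dir_A HEAD >/dev/null 2>&1

# Show what the rewritten HEAD contains.
git ls-tree --name-only HEAD
```

The resulting small repository could then be added to a super project as a
submodule. Whether that trade-off suits a given code base is a separate
question.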


* Re: question (possibly) on git subtree/submodules
  2010-07-23 16:56 ` Chris Packham
@ 2010-07-23 17:18   ` Jonathan Nieder
  2010-07-23 18:35     ` Chris Packham
  2010-07-27 10:56   ` Alex
  1 sibling, 1 reply; 9+ messages in thread
From: Jonathan Nieder @ 2010-07-23 17:18 UTC (permalink / raw)
  To: Chris Packham; +Cc: maurizio.vitale, Maurizio Vitale, git

Chris Packham wrote:

> The short answer is no. Nothing git has currently will let you clone a
> subset of files. Shallow clones exist if you want all the code and the
> last X changes. The reason for this is git, like other DVCSes, tracks
> _changes_ rather than _files_ this is something that took me a while to
> get my head around when I was learning git.

Not quite as cut-and-dried as it may sound, I think.  Internally git
compresses blobs (and other objects) by comparing them to other ones,
but I do not think that is what you are talking about, and I do not
see what that has to do with partial clones.  In fact, the main reason
I can see that partial clones (in the sense of getting all metadata
but not all blobs) are not implemented is that no one has written code
for it yet.

Here is a thread on related work[1].  Maybe someone else can find a
more pertinent link.

> The best advice I've seen is to actually take your repository and use
> git filter-branch to create several smaller repositories

Right, and this is what “git subtree” excels at.  It provides an
alternative interface and implementation for “git filter-branch
--subdirectory-filter”.

Hope that helps,
Jonathan

[1] http://thread.gmane.org/gmane.comp.version-control.git/73117/focus=73935


* Re: question (possibly) on git subtree/submodules
  2010-07-23 17:18   ` Jonathan Nieder
@ 2010-07-23 18:35     ` Chris Packham
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Packham @ 2010-07-23 18:35 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On 23/07/10 10:18, Jonathan Nieder wrote:
> Chris Packham wrote:
> 
>> The short answer is no. Nothing git has currently will let you clone a
>> subset of files. Shallow clones exist if you want all the code and the
>> last X changes. The reason for this is git, like other DVCSes, tracks
>> _changes_ rather than _files_ this is something that took me a while to
>> get my head around when I was learning git.
> 
> Not quite as cut-and-dried as it may sound, I think.  Internally git
> compresses blobs (and other objects) by comparing them to other ones,
> but I do not think that is what you are talking about, and I do not
> see what that has to do with partial clones.  In fact, the main reason
> I can see that partial clones (in the sense of getting all metadata
> but not all blobs) are not implemented is that no one has written code
> for it yet.
> 
> Here is a thread on related work[1].  Maybe someone else can find a
> more pertinent link.
> 

OK, I think I must have read too much into the "tracks changes" part,
thanks for pointing it out.


* Re: question (possibly) on git subtree/submodules
  2010-07-23 16:56 ` Chris Packham
  2010-07-23 17:18   ` Jonathan Nieder
@ 2010-07-27 10:56   ` Alex
  2010-07-27 12:48     ` Jakub Narebski
  1 sibling, 1 reply; 9+ messages in thread
From: Alex @ 2010-07-27 10:56 UTC (permalink / raw)
  To: git

Chris Packham <judge.packham <at> gmail.com> writes:

> The short answer is no. Nothing git has currently will let you clone a
> subset of files. 

Isn't that what 'sparse checkout' does?
(http://www.kernel.org/pub/software/scm/git/docs/git-read-tree.html#_sparse_checkout)

Alex


* Re: question (possibly) on git subtree/submodules
  2010-07-27 10:56   ` Alex
@ 2010-07-27 12:48     ` Jakub Narebski
  2010-07-27 14:24       ` RFC: Sparse checkout improvements (was: Re: question (possibly) on git subtree/submodules) Marc Branchaud
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Narebski @ 2010-07-27 12:48 UTC (permalink / raw)
  To: Alex; +Cc: git

Alex <ajb44.geo@yahoo.com> writes:

> Chris Packham <judge.packham <at> gmail.com> writes:
> 
> > The short answer is no. Nothing git has currently will let you clone a
> > subset of files. 
> 
> Isn't that what 'sparse checkout' does?
> (http://www.kernel.org/pub/software/scm/git/docs/git-read-tree.html#_sparse_checkout)

No, 'sparse checkout' is only about checkout, i.e. the working area.
You still have all objects in repository, only part of tree (part of
project / repository) is not checked out, not present on disk as
files.

-- 
Jakub Narebski
Poland
ShadeHawk on #git
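
The distinction can be checked by hand. The sketch below is only an
illustration under assumed names (dir_A, dir_B, file.txt and the demo
identity are invented): it restricts the working tree to one directory and
then shows that the object for the other directory is still stored in the
repository. It applies the patterns with git read-tree -mu HEAD, which is my
reading of the git-read-tree documentation for an already populated tree.

```shell
#!/bin/sh
set -e

# Throwaway repository with two directories (names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir dir_A dir_B
echo a > dir_A/file.txt
echo b > dir_B/file.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Restrict the working tree to dir_A.
git config core.sparseCheckout true
mkdir -p .git/info
echo "dir_A/" > .git/info/sparse-checkout
git read-tree -mu HEAD

# dir_B is no longer present on disk ...
[ ! -e dir_B/file.txt ] && echo "dir_B is not in the working tree"
# ... but its blob is still in the object database.
git cat-file -e HEAD:dir_B/file.txt && echo "dir_B blob is still in the repository"
```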


* RFC: Sparse checkout improvements (was: Re: question (possibly) on git subtree/submodules)
  2010-07-27 12:48     ` Jakub Narebski
@ 2010-07-27 14:24       ` Marc Branchaud
  2010-07-27 16:55         ` skillzero
  0 siblings, 1 reply; 9+ messages in thread
From: Marc Branchaud @ 2010-07-27 14:24 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Alex, git

On 10-07-27 08:48 AM, Jakub Narebski wrote:
> Alex <ajb44.geo@yahoo.com> writes:
> 
>> Chris Packham <judge.packham <at> gmail.com> writes:
>>
>>> The short answer is no. Nothing git has currently will let you clone a
>>> subset of files. 
>>
>> Isn't that what 'sparse checkout' does?
>> (http://www.kernel.org/pub/software/scm/git/docs/git-read-tree.html#_sparse_checkout)
> 
> No, 'sparse checkout' is only about checkout, i.e. the working area.
> You still have all objects in repository, only part of tree (part of
> project / repository) is not checked out, not present on disk as
> files.

There's no such thing as a "sparse fetch" but you can do something like

	git clone -n git://there/foo.git
	cd foo

then

	git checkout origin/<branch> -- <paths...>

or

	git config core.sparseCheckout true
	[ Add paths to .git/info/sparse-checkout ]
	git checkout <branch>


but it's fairly inconvenient for day-to-day work.  Also, putting a
.git/info/sparse-checkout file in a public repo seems of limited use.

So IMHO the current sparse-checkout feature is pretty bare-bones and could
use some meat.  Here are some thoughts:

* What's missing is a way to define named collections of paths
("sparse-sets?") in .git/info/sparse-checkout, so that you can conveniently
check out a particular subset of the working directory.  It would also be nice
to switch between different sparse-sets.

* It would also be good to have a way for a repo to define a default
sparse-set, so that a clone would only check out that default.

* I also think that core.sparseCheckout should be true by default, and git
should impose no sparseness if .git/info/sparse-checkout is missing or empty.

Comments?

		M.


* Re: RFC: Sparse checkout improvements (was: Re: question (possibly)  on git subtree/submodules)
  2010-07-27 14:24       ` RFC: Sparse checkout improvements (was: Re: question (possibly) on git subtree/submodules) Marc Branchaud
@ 2010-07-27 16:55         ` skillzero
  2010-07-28 13:42           ` RFC: Sparse checkout improvements Marc Branchaud
  0 siblings, 1 reply; 9+ messages in thread
From: skillzero @ 2010-07-27 16:55 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: Jakub Narebski, Alex, git

On Tue, Jul 27, 2010 at 7:24 AM, Marc Branchaud <marcnarc@xiplink.com> wrote:

> * What's missing is a way to define named collections of paths
> ("sparse-sets?") in .git/info/sparse-checkout, so that you can conveniently
> checkout a particular subset of the working directory.  It would also be nice
> to switch between different sparse-sets.

I pasted in a script I wrote to work with the sparse checkout feature.
I'm not a scripting expert so it probably does some things incorrectly.
It lets you create "modules" by adding sections to a .gitmodules file at
the root of the repository (or a file you specify). You can list them,
switch/checkout between them, or reset back to normal:

[module "MyApp1"]
	<path1>
	<path2>

$ git module list
MyApp1

$ git module checkout MyApp1

$ git module reset
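
For readers who only want to see how such a section maps to sparse patterns,
here is a minimal shell sketch. It is not the script posted below:
module_paths is a hypothetical helper name, and the sample modules and paths
are invented. It prints the lines listed under one [module "<name>"] section,
skipping blank lines and lines starting with '#' or ';'.

```shell
#!/bin/sh

# Print the path patterns listed under [module "<name>"] in a modules file.
# module_paths is a hypothetical helper; it is not part of git.
#   $1 = modules file, $2 = module name
module_paths() {
	awk -v want="$2" '
		# Trim leading/trailing whitespace and carriage returns.
		{ sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
		# A section header switches collection on or off.
		/^\[.*\]$/ {
			in_module = ($0 == ("[module \"" want "\"]"))
			next
		}
		# Inside the wanted section, print everything except blanks and comments.
		in_module && $0 != "" && $0 !~ /^[#;]/ { print }
	' "$1"
}

# Example modules file with two made-up modules.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[module "MyApp1"]
	src/app1/
	lib/common/
[module "MyApp2"]
	src/app2/
EOF

# Prints the two patterns of MyApp1, one per line.
module_paths "$tmp" MyApp1
rm -f "$tmp"
```

The output could be redirected into .git/info/sparse-checkout before running
a checkout, which is roughly what the "checkout" command of the script does.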

> * It would also be good to have a way for a repo to define a default
> sparse-set, so that a clone would only checkout that default.

Yes, this would be nice. Ideally what I would like is for there to be
a clone option to specify a "module" (what I've been calling sparse
sets). A first step could just clone the full repository with -n then
do 'git module checkout <module>' (what my other scripts do to prepare
archives). Ideally, it would do some work on the server side to only
send the objects needed for paths specified by the sparse set (but
still allow me to commit and later push changes back).

-- 

git-module script (email may mess up the spacing, causing things to
not line up, but you get the idea)

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(:config gnu_getopt);
use File::Path;

my $gBranch			= "";
my $gHelp			= 0;
my $gModulesFile	= "";
my $gModules		= {};
my $gRecursive		= 0;

# Parse the command line.
if( @ARGV < 1 ) { Usage(); }
GetOptions(
	"b|branch=s"		=> \$gBranch,
	"h|help"			=> \$gHelp,
	"f|modules-file=s"	=> \$gModulesFile,
	"r|recursive!"		=> \$gRecursive,
) or die( "\n" );
if( $gHelp ) { Usage(); }

if( @ARGV < 1 ) { die( "error: no command specified. See 'git module help'.\n" ); }
my $cmd = shift;
if(    $cmd eq "checkout" )	{ cmd_checkout(); }
elsif( $cmd eq "help" )		{ Usage(); }
elsif( $cmd eq "list" )		{ cmd_list(); }
elsif( $cmd eq "reset" )	{ cmd_reset(); }
else { die( "error: unknown command '$cmd'. See 'git module help'.\n" ); }

#
# cmd_checkout
#
sub	cmd_checkout
{
	ReadModulesFile();
	my $moduleName = shift @ARGV;
	if( !$moduleName )				{ die( "error: no module name specified.\n" ); }
	if( !$gModules->{$moduleName} )	{ die( "error: module '$moduleName' not found.\n" ); }
	
	# Enable sparse.
	my $currentCmd = "git config core.sparseCheckout true";
	system( $currentCmd ) == 0 or die( "error: '$currentCmd' $?.\n" );
	
	# Write the sparse patterns.
	my $gitDir = `git rev-parse --git-dir`;
	if( $? != 0 ) { die( "error: can't find git repository $?.\n" ); }
	chomp( $gitDir );	# Strip only the trailing newline.
	my $sparsePath = $gitDir . "/info/sparse-checkout";
	open( FILE, ">", $sparsePath ) or die( "error: can't open '$sparsePath'.\n" );
	foreach( @{$gModules->{$moduleName}} )
	{
		print( FILE "$_\n" );
	}
	close( FILE );
	
	# Checkout using the new sparse patterns.
	$currentCmd = "git checkout $gBranch --";
	system( $currentCmd ) == 0 or die( "error: '$currentCmd' $?.\n" );
}

#
# cmd_list
#
sub	cmd_list
{
	ReadModulesFile();
	my $moduleName = shift @ARGV;
	if( !defined( $moduleName ) || $moduleName eq "" )
	{
		if( $gRecursive )
		{
			foreach my $moduleName ( sort( keys %{$gModules} ) )
			{
				print( "$moduleName\n" );
				foreach( @{$gModules->{$moduleName}} )
				{
					print( "\t$_\n" );
				}
			}
		}
		else
		{
			foreach my $moduleName ( sort( keys %{$gModules} ) )
			{
				print( "$moduleName\n" );
			}
		}
	}
	else
	{
		if( !$gModules->{$moduleName} ) { die( "module '$moduleName' not found.\n" ); }
		foreach( @{$gModules->{$moduleName}} )
		{
			print( "$_\n" );
		}
	}
}

#
# cmd_reset
#
sub	cmd_reset
{
	# Enable sparse.
	my $currentCmd = "git config core.sparseCheckout true";
	system( $currentCmd ) == 0 or die( "error: '$currentCmd' $?.\n" );
	
	# Write a special sparse pattern of "*" to mean everything.
	my $gitDir = `git rev-parse --git-dir`;
	if( $? != 0 ) { die( "error: can't find git repository $?.\n" ); }
	chomp( $gitDir );	# Strip only the trailing newline.
	my $sparsePath = $gitDir . "/info/sparse-checkout";
	open( FILE, ">", $sparsePath ) or die( "error: can't open '$sparsePath'.\n" );
	print( FILE "*\n" );
	close( FILE );
	
	# Checkout to clear the skip-worktree bits and checkout all entries.
	$currentCmd = "git checkout $gBranch --";
	system( $currentCmd ) == 0 or die( "error: '$currentCmd' $?.\n" );
	
	# Disable sparse.
	$currentCmd = "git config core.sparseCheckout false";
	system( $currentCmd ) == 0 or die( "error: '$currentCmd' $?.\n" );
}

#
# ReadModulesFile
#
sub	ReadModulesFile
{
	my @lines = ();
	if( $gModulesFile eq "" )		# No file means read from the .gitmodules file in the repo.
	{
		if( $gBranch ne "" )	{ @lines = `git show $gBranch:.gitmodules`;}
		else					{ @lines = `git show HEAD:.gitmodules`; }
		if( $? != 0 )			{ die( "error: read .gitmodules file failed: $?.\n" ); }
	}
	elsif( $gModulesFile eq "-" )	# - means read from stdin.
	{
		@lines = <STDIN>;
	}
	else
	{
		open( FILE, "<", $gModulesFile ) or die( "error: can't open '$gModulesFile'.\n" );
		@lines = <FILE>;
		close( FILE );
	}
	chomp( @lines );
	
	my $isModule   = 0;
	my $moduleName = "";
	foreach my $line ( @lines )
	{
		$line =~ s/^\s+//; # Strip leading whitespace.
		$line =~ s/\s+$//; # Strip trailing whitespace.
		$line =~ s/\r//g;  # Strip CR's.
		$line =~ s/\n//g;  # Strip LF's.
		if( $line =~ /^\[(.*?)\]/ ) # Check for section header at the start of the line.
		{
			$moduleName = $1;
			if( $moduleName =~ /\s*module\s*\"(.*)\"/ )
			{
				$moduleName = $1;
				$isModule = 1;
			}
			else
			{
				$isModule = 0;
			}
			next;
		}
		next if !$isModule;			# Skip entries that aren't in module sections.
		next if $line =~ /^\s*\;/;	# Skip lines beginning with ';'.
		next if $line =~ /^\s*\#/;	# Skip lines beginning with '#'.
		next if length $line == 0;	# Skip empty lines.
		push( @{$gModules->{$moduleName}}, $line );
	}
}

#
# Usage
#
sub	Usage
{
	print( STDERR "Usage: git-module [options] command [command options]\n" );
	print( STDERR "\n" );
	print( STDERR "Options:\n" );
	print( STDERR "    -b/--branch <name>          Branch to use.\n" );
	print( STDERR "    -f/--modules-file <file>    Custom modules file to use.\n" );
	print( STDERR "\n" );
	print( STDERR "Commands:\n" );
	print( STDERR "    checkout <name>     Check out a module.\n" );
	print( STDERR "    list [-r] [name]    List module(s). -r lists modules and patterns.\n" );
	print( STDERR "    reset               Reset to a non-sparse checkout.\n" );
	print( STDERR "\n" );
	exit( 1 );
}


* Re: RFC: Sparse checkout improvements
  2010-07-27 16:55         ` skillzero
@ 2010-07-28 13:42           ` Marc Branchaud
  0 siblings, 0 replies; 9+ messages in thread
From: Marc Branchaud @ 2010-07-28 13:42 UTC (permalink / raw)
  To: skillzero; +Cc: Jakub Narebski, Alex, git

On 10-07-27 12:55 PM, skillzero@gmail.com wrote:
> On Tue, Jul 27, 2010 at 7:24 AM, Marc Branchaud <marcnarc@xiplink.com> wrote:
> 
>> * What's missing is a way to define named collections of paths
>> ("sparse-sets?") in .git/info/sparse-checkout, so that you can conveniently
>> checkout a particular subset of the working directory.  It would also be nice
>> to switch between different sparse-sets.
> 
> I pasted in a script I wrote to work with the sparse checkout feature.
> I'm not a scripting expert so it probably doesn't things incorrectly.
> It lets you create "modules" by adding sections to .gitmodules file at
> the root of the repository (or a file you specify). You can list them,
> switch/checkout between them, or reset back to normal:

That script looks like a great proof-of-concept.  I haven't tried it out yet,
but it seems to work along the lines of what I was thinking about.

I'd like to see most of this functionality folded into the standard git
commands, and maybe a new git-sparse command for managing sparse sets.

> [module "MyApp1"]
> 	<path1>
> 	<path2>
> 
> $ git module list
> MyApp1
> 
> $ git module checkout MyApp1
> 
> $ git module reset
> 
>> * It would also be good to have a way for a repo to define a default
>> sparse-set, so that a clone would only checkout that default.
> 
> Yes, this would be nice. Ideally what I would like is for there to be
> a clone option to specify a "module" (what I've been calling sparse
> sets). A first step could just clone the full repository with -n then
> do 'git module checkout <module>' (what my other scripts do to prepare
> archives).

I'd really prefer to see it as a configuration option for the remote
repository.  Let the remote tell me what the initial sparse set should be.

> Ideally, it would do some work on the server side to only
> send the objects needed for paths specified by the sparse set (but
> still allow me to commit and later push changes back).

I'm less interested in sparse fetching, so I'll stay out of that side of the
conversation.

		M.


end of thread, other threads:[~2010-07-28 13:42 UTC | newest]

Thread overview: 9+ messages
2010-07-23 14:00 question (possibly) on git subtree/submodules Maurizio Vitale
2010-07-23 16:56 ` Chris Packham
2010-07-23 17:18   ` Jonathan Nieder
2010-07-23 18:35     ` Chris Packham
2010-07-27 10:56   ` Alex
2010-07-27 12:48     ` Jakub Narebski
2010-07-27 14:24       ` RFC: Sparse checkout improvements (was: Re: question (possibly) on git subtree/submodules) Marc Branchaud
2010-07-27 16:55         ` skillzero
2010-07-28 13:42           ` RFC: Sparse checkout improvements Marc Branchaud
