* restriction of pulls
@ 2007-02-09 10:49 Christoph Duelli
2007-02-09 11:19 ` Jakub Narebski
2007-02-09 14:54 ` Johannes Schindelin
0 siblings, 2 replies; 13+ messages in thread
From: Christoph Duelli @ 2007-02-09 10:49 UTC (permalink / raw)
To: git
Is it possible to restrict a checkout, clone or a later pull to some
subdirectory of a repository?
(Background: using Subversion (or CVS), it is possible to do a file- or
directory-restricted update.)
Say, I have a repository containing 2 (mostly) independent projects A and B
(in separate directories):
- R
- A
- B
Is it possible to pull all the changes made to B, but not those made to A?
(Yes, I know that this causes trouble if there are dependencies into A.)
Regards
--
Christoph Duelli
MELOS GmbH
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: restriction of pulls
2007-02-09 10:49 restriction of pulls Christoph Duelli
@ 2007-02-09 11:19 ` Jakub Narebski
2007-02-09 14:54 ` Johannes Schindelin
1 sibling, 0 replies; 13+ messages in thread
From: Jakub Narebski @ 2007-02-09 11:19 UTC (permalink / raw)
To: git
Christoph Duelli wrote:
> Is it possible to restrict a checkout, clone or a later pull to some
> subdirectory of a repository?
> (Background: using subversion (or cvs), it is possible to do a file or
> directory-restricted update.)
>
> Say, I have a repository containing 2 (mostly) independent projects
> A and B (in separate directories):
> - R
> - A
> - B
> Is it possible to pull all the changes made to B, but not those made to A?
> (Yes, I know that this causes trouble if there are dependencies into A.)
No, it is not possible. Moreover, it is not sensible, as it breaks atomicity
of a commit. Well, you can hack, but...
That said, there is experimental submodule (subproject) support
http://git.or.cz/gitwiki/SubprojectSupport
http://git.admingilde.org/tali/git.git/module2
(there was also a proposal for more lightweight submodule support, but I don't
have a link to it); with that, you would set up A and B as submodules
(subprojects).
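Sketched concretely, the submodule layout Jakub suggests would look roughly like this (a throwaway toy setup using the `git submodule` syntax that later became standard; the project names and paths are placeholders, not anything from this thread):

```shell
set -e
export GIT_AUTHOR_NAME=editor GIT_AUTHOR_EMAIL=e@example.com
export GIT_COMMITTER_NAME=editor GIT_COMMITTER_EMAIL=e@example.com
top=$(mktemp -d) && cd "$top"

# Two independent projects, each in its own repository
for p in A B; do
  git init -q "$p"
  ( cd "$p" && echo "$p" > README && git add README && git commit -qm "initial $p" )
done

# The superproject R ties them together as submodules
git init -q R && cd R
git commit -q --allow-empty -m "empty root"
# (protocol.file.allow is only needed on recent git for local-path submodules)
git -c protocol.file.allow=always submodule add "$top/A" A
git -c protocol.file.allow=always submodule add "$top/B" B
git commit -qm "add A and B as submodules"

git submodule status   # lists A and B at their recorded commits
```

With this layout, a later update can be restricted to B alone, since each submodule is fetched independently.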
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: restriction of pulls
2007-02-09 10:49 restriction of pulls Christoph Duelli
2007-02-09 11:19 ` Jakub Narebski
@ 2007-02-09 14:54 ` Johannes Schindelin
2007-02-09 15:32 ` Rogan Dawes
1 sibling, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2007-02-09 14:54 UTC (permalink / raw)
To: Christoph Duelli; +Cc: git
Hi,
On Fri, 9 Feb 2007, Christoph Duelli wrote:
> Is it possible to restrict a checkout, clone or a later pull to some
> subdirectory of a repository?
No. In git, a revision really is a revision, and not a group of file
revisions.
Ciao,
Dscho
* Re: restriction of pulls
2007-02-09 14:54 ` Johannes Schindelin
@ 2007-02-09 15:32 ` Rogan Dawes
2007-02-09 16:19 ` Andy Parkins
2007-02-10 14:50 ` Johannes Schindelin
0 siblings, 2 replies; 13+ messages in thread
From: Rogan Dawes @ 2007-02-09 15:32 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Christoph Duelli, git
Johannes Schindelin wrote:
> Hi,
>
> On Fri, 9 Feb 2007, Christoph Duelli wrote:
>
>> Is it possible to restrict a checkout, clone or a later pull to some
>> subdirectory of a repository?
>
> No. In git, a revision really is a revision, and not a group of file
> revisions.
>
> Ciao,
> Dscho
>
I thought about how this might be implemented, although I'm not entirely
sure how efficient this will be.
One obstacle to implementing partial checkouts is that one does not know
which objects have changed or been deleted. One way of addressing this
is to keep a record of the hashes of all the objects that were NOT
checked out. (If one does not check out part of a directory, simply
store the hash of the top level; the child hashes need not be stored.)
This record would be a kind of "negative index".
When deciding what to check in, or which files are modified, one would
check the "negative index" first to see if an entry exists; only if none
exists would one check the filesystem to see if modification times have
changed. With the "negative index", and the files in the file system,
one would be able to construct new commits, without any problem.
It would also require an updated transfer protocol, which would allow
the client to specify a tag/commit, then walk the tree that it points to
to find the portion that the client is looking for, then pull only those
objects (and possibly their history). This is likely to be VERY
inefficient in terms of round trips, at least initially.
This might be able to benefit from the shallow checkout support that was
recently implemented.
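The "negative index" idea can be sketched today with git plumbing commands. This is only an illustration under hypothetical assumptions (a toy repository where only docs/ is considered checked out), not the proposed implementation:

```shell
set -e
export GIT_AUTHOR_NAME=editor GIT_AUTHOR_EMAIL=e@example.com
export GIT_COMMITTER_NAME=editor GIT_COMMITTER_EMAIL=e@example.com
cd "$(mktemp -d)" && git init -q .

# A repository with two top-level trees: docs/ (checked out) and src/ (not)
mkdir docs src
echo manual > docs/manual.txt
echo code > src/main.c
git add . && git commit -qm initial

# The "negative index": record the hash of every top-level entry that was
# NOT checked out; children of an omitted tree need no entries of their own
git ls-tree HEAD | awk '$4 != "docs"' > negative-index

# A docs-only change can then be committed by combining the new docs/ tree
# with the recorded hashes, without ever reading src/
echo updated > docs/manual.txt
git add docs
docs_tree=$(git write-tree --prefix=docs/)
new_tree=$( { printf '040000 tree %s\tdocs\n' "$docs_tree"
              cat negative-index; } | git mktree )
new_commit=$(git commit-tree "$new_tree" -p HEAD -m "docs-only commit")
git ls-tree "$new_tree"   # docs changed, src identical to before
```

The resulting commit reuses the stored src/ tree hash verbatim, which is exactly the "construct new commits without any problem" step described above.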
Comments?
Rogan
* Re: restriction of pulls
2007-02-09 15:32 ` Rogan Dawes
@ 2007-02-09 16:19 ` Andy Parkins
2007-02-09 16:36 ` Rogan Dawes
2007-02-10 14:50 ` Johannes Schindelin
1 sibling, 1 reply; 13+ messages in thread
From: Andy Parkins @ 2007-02-09 16:19 UTC (permalink / raw)
To: git; +Cc: Rogan Dawes, Johannes Schindelin, Christoph Duelli
On Friday 2007 February 09 15:32, Rogan Dawes wrote:
> One obstacle to implementing partial checkouts is that one does not know
> which objects have changed or been deleted. One way of addressing this
Why would you want to do a partial checkout? I used Subversion for a long
time before git; it does do partial checkouts, and it's a nightmare.
Things like this
cd dir1/
edit files
cd ../dir2
edit files
svn commit
* committed revision 100
KABLAM! Disaster. Revision 100 no longer compiles/runs. The changes in dir1
and dir2 were complementary changes (say, renaming a function and then
updating the places that call it).
I didn't even notice how awful it was until I started using git and had a VCS
that did the right thing.
In every way that matters you can do a partial checkout - I can pull any
version of any file out of the repository. However, it should certainly not
be the case that git records that fact.
I think what you're actually after (from your description) is a shallow clone.
I believe that went in a while ago from Dscho.
$ git clone --depth=5 <someurl>
Will fetch only the last 5 revisions from the remote. The other half of that
is a shallow-by-tree clone; that is anathema to git, as there is no such thing
as a partial tree. Submodule support is what you want, but that's still in
development.
The only piece that (I think) is missing to get the functionality you want is
a kind of lazy transfer mode. For something like, say, the kde repository
you can do
svn checkout svn://svn.kde.org/kde/some/deep/path/in/the/project
And just get that directory - i.e. you don't have to pay the cost of
downloading the whole of KDE. Git can't do that; however, I think one day it
will be able to by choosing not to download every object from the remote.
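The depth-limited clone Andy mentions is easy to try out; a runnable sketch with a throwaway local repository standing in for the remote (the `file://` URL is needed because plain local paths bypass the transport):

```shell
set -e
export GIT_AUTHOR_NAME=editor GIT_AUTHOR_EMAIL=e@example.com
export GIT_COMMITTER_NAME=editor GIT_COMMITTER_EMAIL=e@example.com
cd "$(mktemp -d)"

# A repository with ten revisions
git init -q origin
for i in $(seq 10); do
  ( cd origin && echo "$i" > file && git add file && git commit -qm "rev $i" )
done

# --depth=5 transfers only the five most recent commits
git clone -q --depth=5 "file://$PWD/origin" shallow
git -C shallow rev-list --count HEAD   # prints 5
```

The shallow clone holds a truncated history, which is the time-axis analogue of the tree-axis restriction being asked about.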
Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@gmail.com
* Re: restriction of pulls
2007-02-09 16:19 ` Andy Parkins
@ 2007-02-09 16:36 ` Rogan Dawes
2007-02-09 16:45 ` Andy Parkins
0 siblings, 1 reply; 13+ messages in thread
From: Rogan Dawes @ 2007-02-09 16:36 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Johannes Schindelin, Christoph Duelli
Andy Parkins wrote:
> On Friday 2007 February 09 15:32, Rogan Dawes wrote:
>
>> One obstacle to implementing partial checkouts is that one does not know
>> which objects have changed or been deleted. One way of addressing this
>
> Why would you want to do a partial checkout? I used Subversion for a long
> time before git; it does do partial checkouts, and it's a nightmare.
>
> Things like this
>
> cd dir1/
> edit files
> cd ../dir2
> edit files
> svn commit
> * committed revision 100
>
> KABLAM! Disaster. Revision 100 no longer compiles/runs. The changes in dir1
> and dir2 were complementary changes (say, renaming a function and then
> updating the places that call it).
Please note that my suggestion does NOT imply allowing partial checkins.
(Or if it does, that was not my intention.)
What I am trying to support is Jon Smirl's description of how some
Mozilla contributors work, specifically the documentation folks.
They do not have any need to look at the actual code, but simply limit
themselves to the files in the doc/ directory.
Supporting a partial checkout of this doc/ directory would allow them to
get a "check in"-able subdirectory, without having to download the rest
of the source.
What I intended to convey was that when determining which files have
changed, and presenting them to the user to decide whether to commit
them or not, the filesystem-walker would first check the "negative
index" to see if that directory/file had been explicitly excluded from
the checkout. This implies that they did not (and do not intend to)
modify that portion of the tree, which in turn means that the committer
can construct a complete view of the entire tree (now including the
changes made in the partial checkout) by combining the modified files
with the recorded hashes of the unmodified files and trees.
>
> In every way that matters you can do a partial checkout - I can pull any
> version of any file out of the repository. However, it should certainly not
> be the case that git records that fact.
Why not? If you only want to modify that file, does it not make sense
that you can just check out that file, modify it, and check it back in?
Or at least if not check it in, construct a diff for mailing to the
maintainer?
Or even allow the maintainer to pull/merge the changes from the
contributor, even though the contributor doesn't necessarily have all
the blobs required to make up the tree he is committing? They should all
be available from the "alternate" if required.
Rogan
* Re: restriction of pulls
2007-02-09 16:36 ` Rogan Dawes
@ 2007-02-09 16:45 ` Andy Parkins
2007-02-09 17:32 ` Rogan Dawes
0 siblings, 1 reply; 13+ messages in thread
From: Andy Parkins @ 2007-02-09 16:45 UTC (permalink / raw)
To: git; +Cc: Rogan Dawes, Johannes Schindelin, Christoph Duelli
On Friday 2007 February 09 16:36, Rogan Dawes wrote:
> Please note that my suggestion does NOT imply allowing partial checkins
> (or if it does, it was not my intention)
My apologies then; I did misunderstand.
> > In every way that matters you can do a partial checkout - I can pull any
> > version of any file out of the repository. However, it should certainly
> > not be the case that git records that fact.
>
> Why not? If you only want to modify that file, does it not make sense
> that you can just check out that file, modify it, and check it back in?
Sorry - what I meant was that it shouldn't record that you checked out
revision 74 of that file and retain a link from the current version to that
old version.
Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
andyparkins@gmail.com
* Re: restriction of pulls
2007-02-09 16:45 ` Andy Parkins
@ 2007-02-09 17:32 ` Rogan Dawes
2007-02-10 9:59 ` Andy Parkins
0 siblings, 1 reply; 13+ messages in thread
From: Rogan Dawes @ 2007-02-09 17:32 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Johannes Schindelin, Christoph Duelli
Andy Parkins wrote:
> On Friday 2007 February 09 16:36, Rogan Dawes wrote:
>
>> Please note that my suggestion does NOT imply allowing partial checkins
>> (or if it does, it was not my intention)
>
> My apologies then; I did misunderstand.
>
That'll teach me to be more clear ;-)
>
>>> In every way that matters you can do a partial checkout - I can pull any
>>> version of any file out of the repository. However, it should certainly
>>> not be the case that git records that fact.
>> Why not? If you only want to modify that file, does it not make sense
>> that you can just check out that file, modify it, and check it back in?
>
> Sorry - what I meant was that it shouldn't record that you checked out
> revision 74 of that file and retain a link from the current version to that
> old version.
Well, the new commit would have the previous commit as its direct
parent, even though it may not have all the blobs to support it.
This implies that all the git merge semantics should still work,
assuming that the person actually doing the merge has all the necessary
objects to resolve any conflicts. (That does not necessarily mean that
he has ALL of the objects in the tree, just those that are implicated in
any conflicts.)
So, for example, the doc team may have a documentation maintainer who
has the entire doc/ directory, who resolves any submissions from the doc
team, and feeds that up into the master tree. And all of this could be
done by means of pulls by the upstream maintainers.
Rogan
* Re: restriction of pulls
2007-02-09 17:32 ` Rogan Dawes
@ 2007-02-10 9:59 ` Andy Parkins
0 siblings, 0 replies; 13+ messages in thread
From: Andy Parkins @ 2007-02-10 9:59 UTC (permalink / raw)
To: git; +Cc: Rogan Dawes, Johannes Schindelin, Christoph Duelli
On Friday 2007, February 09, Rogan Dawes wrote:
> Well, the new commit would have the previous commit as its direct
> parent, even though it may not have all the blobs to support it.
This I agree with; this seems like the way that a partial checkout would
be supported.
As you say - there would be no need to have the blobs available for
objects you aren't altering. Unfortunately, it seems like it would be
a huge amount of work to actually do.
Andy
--
Dr Andrew Parkins, M Eng (Hons), AMIEE
andyparkins@gmail.com
* Re: restriction of pulls
2007-02-09 15:32 ` Rogan Dawes
2007-02-09 16:19 ` Andy Parkins
@ 2007-02-10 14:50 ` Johannes Schindelin
2007-02-12 13:58 ` Rogan Dawes
1 sibling, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2007-02-10 14:50 UTC (permalink / raw)
To: Rogan Dawes; +Cc: Christoph Duelli, git
Hi,
On Fri, 9 Feb 2007, Rogan Dawes wrote:
> Johannes Schindelin wrote:
> >
> > On Fri, 9 Feb 2007, Christoph Duelli wrote:
> >
> > > Is it possible to restrict a checkout, clone or a later pull to some
> > > subdirectory of a repository?
> >
> > No. In git, a revision really is a revision, and not a group of file
> > revisions.
>
> I thought about how this might be implemented, although I'm not entirely
> sure how efficient this will be.
There are basically three ways I can think of:
- rewrite the commit objects on the fly. You might want to avoid the use
of the pack protocol here (i.e. use HTTP or FTP transport).
- try to teach git a way to ignore certain missing objects and
directories. This might be involved, but you could extend upload-pack
easily with a new extension for that.
(my favourite:)
- use git-split to create a new branch, which only contains doc/. Do work
only on that branch, and merge into mainline from time to time.
If you don't need the history, you don't need to git-split the branch.
You only need to make sure that the newly created branch is _not_ branched
off of mainline, since the next merge would _delete_ all files outside of
doc/ (merge would see that the files exist in mainline, and existed in the
common ancestor, too, so would think that the files were deleted in the
doc branch).
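The deletion hazard Dscho warns about is easy to reproduce in a throwaway repository; here the doc-only branch is deliberately branched off mainline (the exact mistake being warned against, with illustrative file names):

```shell
set -e
export GIT_AUTHOR_NAME=editor GIT_AUTHOR_EMAIL=e@example.com
export GIT_COMMITTER_NAME=editor GIT_COMMITTER_EMAIL=e@example.com
cd "$(mktemp -d)" && git init -q .

# Mainline contains both doc/ and src/
mkdir doc src
echo text > doc/guide.txt
echo code > src/main.c
git add . && git commit -qm mainline

# A doc-only branch that IS branched off mainline -- the mistake
git checkout -qb doc-only
git rm -qr src
git commit -qm "strip everything outside doc/"
echo more > doc/guide.txt
git commit -qam "doc work"

# Meanwhile mainline does unrelated work (not touching src/)
git checkout -q -
echo readme > README
git add README && git commit -qm "mainline work elsewhere"

# The merge sees src/ in the common ancestor, deleted on the doc branch,
# and untouched on mainline -- so it "helpfully" deletes it
git merge -q --no-edit doc-only
ls src/main.c 2>/dev/null || echo "src/ is gone"
```

Had the doc branch been created with unrelated history instead, the merge would have nothing to say about src/ at all.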
Ciao,
Dscho
* Re: restriction of pulls
2007-02-10 14:50 ` Johannes Schindelin
@ 2007-02-12 13:58 ` Rogan Dawes
2007-02-12 14:13 ` Johannes Schindelin
0 siblings, 1 reply; 13+ messages in thread
From: Rogan Dawes @ 2007-02-12 13:58 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Rogan Dawes, Christoph Duelli, git
Johannes Schindelin wrote:
> Hi,
>
> On Fri, 9 Feb 2007, Rogan Dawes wrote:
>
>> Johannes Schindelin wrote:
>>> On Fri, 9 Feb 2007, Christoph Duelli wrote:
>>>
>>>> Is it possible to restrict a checkout, clone or a later pull to some
>>>> subdirectory of a repository?
>>> No. In git, a revision really is a revision, and not a group of file
>>> revisions.
>> I thought about how this might be implemented, although I'm not entirely
>> sure how efficient this will be.
>
> There are basically three ways I can think of:
>
> - rewrite the commit objects on the fly. You might want to avoid the use
> of the pack protocol here (i.e. use HTTP or FTP transport).
>
> - try to teach git a way to ignore certain missing objects and
> directories. This might be involved, but you could extend upload-pack
> easily with a new extension for that.
>
> (my favourite:)
> - use git-split to create a new branch, which only contains doc/. Do work
> only on that branch, and merge into mainline from time to time.
>
> If you don't need the history, you don't need to git-split the branch.
>
> You only need to make sure that the newly created branch is _not_ branched
> off of mainline, since the next merge would _delete_ all files outside of
> doc/ (merge would see that the files exist in mainline, and existed in the
> common ancestor, too, so would think that the files were deleted in the
> doc branch).
>
> Ciao,
> Dscho
>
Your third option sounds quite clever, apart from the problem of
attributing a commit and a commit message to someone, when the actual
commit doesn't match what they actually did :-(
I also wonder what happens when they check out a few more files.
Do we rewrite those commits as well? What happens if the user has made
some commits already? What happens if they have already sent those
upstream? etc.
I think the best solution is ultimately to make git able to cope with
certain missing objects.
I started writing this in response to another message, but it will do
fine here, too:
The description I give here will likely horrify people in terms of
communications inefficiency, but I'm sure that can be improved.
Scenario:
A user sees a documentation bug in a git-managed project, and decides
that she wants to do something about it. Since she is not on the fastest
of connections, she'd like to reduce the checkout to a reasonable
minimum, while still working with the git tools.
Viewing the repo layout using gitweb, she sees that all the
documentation is stored in the docs/ directory from the root.
So, she creates a local repo to work in:
$ git init-db
She configures her local repo to reference the source one:
(Hypothetical syntax)
$ git clone --reference http://example.com/project.git \
http://example.com/project.git
Since the reference and repo are the same (and non-local), git doesn't
actually download anything, other than the current heads (and maybe tags).
She then does a partial checkout of the master branch, but only the
docs/ directory:
$ git checkout -p master docs/
The -p flag indicates that this is a partial checkout of master. Git
records that the current HEAD is "master", checks out the docs/
directory, and removes any other files in the working directory (that it
knew about from the existing index, if any - I'm not suggesting that it
should arbitrarily delete files!)
The checkout process goes as follows: Resolve the <treeish> that HEAD
points to, and retrieve it from the upstream repo if it does not exist
locally. Continue requesting only the necessary tree and blob objects to
satisfy the requested checkout. i.e. From the first tree, identify the
docs/ directory. Then request only that tree object. Continue to
download tree and blob objects until the entire docs/ directory can be
created in the working directory.
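The object walk described here can be simulated with plumbing commands in a toy repository (paths are purely illustrative); the point is that nothing under src/ ever needs to be requested:

```shell
set -e
export GIT_AUTHOR_NAME=editor GIT_AUTHOR_EMAIL=e@example.com
export GIT_COMMITTER_NAME=editor GIT_COMMITTER_EMAIL=e@example.com
cd "$(mktemp -d)" && git init -q .
mkdir -p docs/guide src
echo intro > docs/guide/intro.txt
echo code > src/main.c
git add . && git commit -qm initial

# Objects a partial checkout of docs/ needs: the commit, its root tree,
# the docs tree, and whatever is reachable below it
git rev-parse HEAD 'HEAD^{tree}' HEAD:docs
git ls-tree -r -t HEAD:docs   # every tree and blob below docs/
```

A smart transfer protocol would request exactly the objects this walk visits, one level at a time; hence the round-trip concern raised earlier.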
This will likely require a new index file format that simply stores the
hashes of objects (blobs or trees) that have not been checked out, as
well as the current file's stat information.
Now create a "negative index" (pindex?) that has details about the other
files and directories that were not checked out. Obviously, this does
not need to recurse into directories that were not checked out. Simply
having the hash of the parent directory in the pindex is sufficient
information to reconstruct a new index. (This might require a new index
format that does not include all known files, but simply stores the hash
of the unchecked-out tree or blob.)
Then creating a new commit would require creating the necessary blobs
for changed files, new tree objects for trees that change, and a commit
object.
As far as I can tell, that could then be pushed/pulled/merged using the
existing tools, without any problems.
Rogan
* Re: restriction of pulls
2007-02-12 13:58 ` Rogan Dawes
@ 2007-02-12 14:13 ` Johannes Schindelin
2007-02-12 14:29 ` Rogan Dawes
0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2007-02-12 14:13 UTC (permalink / raw)
To: Rogan Dawes; +Cc: Rogan Dawes, Christoph Duelli, git
Hi,
On Mon, 12 Feb 2007, Rogan Dawes wrote:
> Johannes Schindelin wrote:
> >
> > (my favourite:)
> > - use git-split to create a new branch, which only contains doc/. Do work
> > only on that branch, and merge into mainline from time to time.
>
> Your third option sounds quite clever, apart from the problem of attributing a
> commit and a commit message to someone, when the actual commit doesn't match
> what they actually did :-(
This problem is not related to subprojects at all. If the commit message
does not match the patch, you are always fscked.
> I also wonder what happens when they check out a few more files. Do we
> rewrite those commits as well? What happens if the user has made some commits
> already? What happens if they have already sent those upstream? etc.
I think you misunderstood. My favourite option would make docs a
_separate_ project, with its own history. It just happens to be pulled
from time to time, just like git-gui, gitk and git-fast-import in git.git.
> I think the best solution is ultimately to make git able to cope with
> certain missing objects.
Hmm. I am not convinced. One nice thing about git is its level of
integrity, which means that no random objects are missing.
> I started writing this in response to another message, but it will do fine
> here, too:
>
> The description I give here will likely horrify people in terms of
> communications inefficiency, but I'm sure that can be improved.
>
> [goes on... and describes the lazy clone!]
AFAICT this really is the lazy clone. And it was already determined that
it is all too easy to pull in all commit objects by accident, which boils
down to a substantial chunk of the repository.
But if you want to play with it: by all means, go ahead. It might just be
that you overcome the fundamental difficulties, and we get something nice
out of it.
Ciao,
Dscho
* Re: restriction of pulls
2007-02-12 14:13 ` Johannes Schindelin
@ 2007-02-12 14:29 ` Rogan Dawes
0 siblings, 0 replies; 13+ messages in thread
From: Rogan Dawes @ 2007-02-12 14:29 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Christoph Duelli, git
Johannes Schindelin wrote:
> Hi,
>
> On Mon, 12 Feb 2007, Rogan Dawes wrote:
>
>> Johannes Schindelin wrote:
>>> (my favourite:)
>>> - use git-split to create a new branch, which only contains doc/. Do work
>>> only on that branch, and merge into mainline from time to time.
>> Your third option sounds quite clever, apart from the problem of attributing a
>> commit and a commit message to someone, when the actual commit doesn't match
>> what they actually did :-(
>
> This problem is not related to subprojects at all. If the commit message
> does not match the patch, you are always fscked.
Well, I was thinking about the fact that the files originally checked in
will not match the files "checked in" in the rewritten commit.
>> I also wonder what happens when they check out a few more files. Do we
>> rewrite those commits as well? What happens if the user has made some commits
>> already? What happens if they have already sent those upstream? etc.
>
> I think you misunderstood. My favourite option would make docs a
> _separate_ project, with its own history. It just happens to be pulled
> from time to time, just like git-gui, gitk and git-fast-import in git.git.
I see. However, that does not allow for the random single-file checkout
scenario I sketched out, which may or may not be common/desirable, but
is an extreme case of a partial checkout, without fixed delineation.
>> I think the best solution is ultimately to make git able to cope with
>> certain missing objects.
>
> Hmm. I am not convinced. One nice thing about git is its level of
> integrity, which means that no random objects are missing.
Good point. :-(
>> I started writing this in response to another message, but it will do fine
>> here, too:
>>
>> The description I give here will likely horrify people in terms of
>> communications inefficiency, but I'm sure that can be improved.
>>
>> [goes on... and describes the lazy clone!]
>
> AFAICT this really is the lazy clone. And it was already determined that
> it is all too easy to pull in all commit objects by accident, which boils
> down to a substantial chunk of the repository.
>
Not so much a lazy clone as a partial clone. It is only in the "clone",
"fetch" or "checkout" code paths that new objects will be retrieved from
the source repo. Things like "git log"/"git show" would not do so, and
would be required to handle missing objects gracefully.
> But if you want to play with it: by all means, go ahead. It might just be
> that you overcome the fundamental difficulties, and we get something nice
> out of it.
>
> Ciao,
> Dscho
>
Maybe ;-) We'll see if I get any time for it.
Rogan