git.vger.kernel.org archive mirror
* [QUESTION] Access to a huge GIT repository.
@ 2005-11-16 12:24 Franck
  2005-11-16 16:46 ` Linus Torvalds
  2005-11-16 18:24 ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Franck @ 2005-11-16 12:24 UTC (permalink / raw)
  To: Git Mailing List

Hi,

I'm trying to clone a small part of a big repository. This repository
contains several branches that are of no use to me. It is the
linux-mips repository, and its branches track each kernel minor
version: it contains four branches, linux-2.0, linux-2.2, linux-2.4
and master (linux-2.6).

I'd like to clone this repository without grabbing the linux-2.0,
linux-2.2 and linux-2.4 branches. I tried several things like:

        $ git init-db
        $ git fetch rsync://ftp.linux-mips.org/git/linux.git master

But every attempt downloads every object of every branch. I believe
that's because of (a) the protocol used to access the remote repo and
(b) the fact that the master branch was created from the linux-2.4
branch, so its first commit object has a linux-2.4 commit object as
its parent (let's call it the "father" object). Is that correct?

So I downloaded the whole thing, then tried to remove the "father"
object and run 'git prune' right after. Unfortunately I can't find it
anywhere in the .git directory. I did:

        $ git-verify-pack < .git/objects/pack/*.idx
        $ git-unpack-objects < .git/objects/pack/*.pack

But I can't find the "father" object anywhere in the .git/objects
directory, even though it is still referenced by the
.git/objects/pack/pack-....idx file.

Can anybody give me some advice?

Thanks
--
               Franck

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-16 12:24 [QUESTION] Access to a huge GIT repository Franck
@ 2005-11-16 16:46 ` Linus Torvalds
  2005-11-17 10:36   ` Franck
  2005-11-16 18:24 ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2005-11-16 16:46 UTC (permalink / raw)
  To: Franck; +Cc: Git Mailing List



On Wed, 16 Nov 2005, Franck wrote:
> 
> I'm trying to clone a small part of a big repository. This repository
> contains several branches that are of no use to me. It is the
> linux-mips repository, and its branches track each kernel minor
> version: it contains four branches, linux-2.0, linux-2.2, linux-2.4
> and master (linux-2.6).
> 
> I'd like to clone this repository without grabbing the linux-2.0,
> linux-2.2 and linux-2.4 branches. I tried several things like:
> 
>         $ git init-db
>         $ git fetch rsync://ftp.linux-mips.org/git/linux.git master

First off, "rsync://" will never do what you want. It uses rsync 
(surprise, surprise) to grab the objects, so since it has no clue what the 
objects are, it has no choice but to just grab them all.

> But every attempt downloads every object of every branch. I believe
> that's because of (a) the protocol used to access the remote repo

Yes. Note that ftp.linux-mips.org does run the git daemon, so you can use 
"git://" instead of "rsync://" (that's how I pull from them when I sync).

>								 (b) the
> master branch has been created from linux-2.4 branch so its first
> commit object contains a branch 2.4 commit obj as parent object (let's
> call it the "father" object). Is that correct ?

Not having looked at that particular repo, I don't know. git can do it 
either way - either one long common history, or branches with totally 
unrelated histories. 

The fact that _some_ of the linux-mips repositories are based on mine 
makes me suspect that all their 2.6-based ones are rooted like mine is, 
but that may or may not be true.

Just try your above command line with "git://" instead.

(NOTE! Doing a full clone like the above is pretty expensive with git for 
the server side, so it might take a while before it starts feeding you 
stuff if it is under heavy load)
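
To make the difference concrete, here is a minimal sketch using two
throwaway local repositories. Everything in it (directory names, file
contents, the identity settings) is made up for the demo, and a plain
local path stands in for the "git://" URL on the assumption that both
go through the same object negotiation. The --no-tags flag is only
there to keep the demo deterministic.

```shell
#!/bin/sh
# Sketch only: repository layout and file contents are invented.
set -e
work=$(mktemp -d)
cd "$work"

# A source repository with two branches that share no history.
git init -q src
cd src
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"
echo "2.6" > version
git add version
git commit -q -m "2.6 work"
wanted=$(git rev-parse HEAD)

git checkout -q --orphan linux-2.4
echo "2.4" > version
git add version
git commit -q -m "2.4 work"
unwanted=$(git rev-parse HEAD)
cd ..

# Fetch only master through git's own transport (a local path here,
# standing in for a git:// URL).
git init -q dst
cd dst
git fetch -q --no-tags ../src master

# The fetched commit must be present; check whether the other branch
# came along with it.
git cat-file -e "$wanted"
if git cat-file -e "$unwanted" 2>/dev/null; then
    other_branch="transferred"
else
    other_branch="not transferred"
fi
echo "linux-2.4 objects: $other_branch"
```

With rsync there is nothing comparable to check, since it copies the
object directory wholesale.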

> So I downloaded the whole thing, then tried to remove the "father"
> object and run 'git prune' right after. Unfortunately I can't find it
> anywhere in the .git directory. I did:
> 
>         $ git-verify-pack < .git/objects/pack/*.idx
>         $ git-unpack-objects < .git/objects/pack/*.pack
> 
> But I can't find the "father" object anywhere in the .git/objects
> directory, even though it is still referenced by the
> .git/objects/pack/pack-....idx file.
> 
> Can anybody give me some advice?

If you want to get rid of other branches, do:

	# Remove the branches
	git branch -D linux-2.0
	git branch -D linux-2.2
	git branch -D linux-2.4
	.. whatever other branches you don't want ..

	# Repack the repo
	git repack -a -d

	# Prune it all down
	git prune-packed
	git prune

and you should have a nice single pack that only contains the branch(es) 
that you're interested in.
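
The recipe above can be tried end to end on a throwaway repository.
This is only a sketch: the repository, file contents and identity
settings are invented, and the "git reflog expire" step is an
assumption for current Git versions, where reflogs would otherwise
keep the deleted branch's commits reachable.

```shell
#!/bin/sh
# Sketch only: a toy repository with one wanted and one unwanted branch.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "2.6" > version
git add version
git commit -q -m "2.6 work"
wanted=$(git rev-parse HEAD)

# An unrelated branch (its own root commit), as in the "unrelated
# histories" case.
git checkout -q --orphan linux-2.4
echo "2.4" > version
git add version
git commit -q -m "2.4 work"
unwanted=$(git rev-parse HEAD)
git checkout -q master

# Remove the branch
git branch -D linux-2.4 >/dev/null

# Assumption for current Git: drop reflog entries that still point at
# the deleted branch's commits.
git reflog expire --expire=now --all

# Repack the repo, then prune it all down
git repack -a -d -q
git prune-packed -q
git prune

git cat-file -e "$wanted"
if git cat-file -e "$unwanted" 2>/dev/null; then
    old_branch="still present"
else
    old_branch="pruned"
fi
echo "linux-2.4 commit: $old_branch"
```

Note that this only helps when the unwanted branches do not share
history with the branch being kept.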

		Linus

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-16 12:24 [QUESTION] Access to a huge GIT repository Franck
  2005-11-16 16:46 ` Linus Torvalds
@ 2005-11-16 18:24 ` Junio C Hamano
  2005-11-16 20:01   ` Martin Langhoff
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2005-11-16 18:24 UTC (permalink / raw)
  To: Franck; +Cc: git

Franck <vagabon.xyz@gmail.com> writes:

> I'm trying to clone a small part of a big repository.

What Linus has already said...

This is the second time this week the issue came up, so maybe
"large bundled repository, whose users typically are interested
in only one branch" may not be so uncommon as I first expected.
An optional form of 'git-clone' that clones only from a limited
subset of branches might be useful.

The underlying network transfer program, 'git-clone-pack', can
be told to clone only specified branches.  If somebody is
interested, updating the 'git-clone' wrapper to use it should
not be too hard -- obviously this needs to be done for other
transports as well, though.
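
For readers on a current Git: 'git clone' later grew a
--single-branch option that covers this use case. The sketch below
assumes such a version; the repositories and branch contents are
invented, and --no-local is used so that the local path goes through
the normal transport instead of copying the object store.

```shell
#!/bin/sh
# Sketch only: assumes a Git new enough to have "clone --single-branch".
set -e
work=$(mktemp -d)
cd "$work"

git init -q src
cd src
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"
echo "2.6" > version
git add version
git commit -q -m "2.6 work"
wanted=$(git rev-parse HEAD)

git checkout -q --orphan linux-2.4
echo "2.4" > version
git add version
git commit -q -m "2.4 work"
unwanted=$(git rev-parse HEAD)
cd ..

# Clone master only.
git clone -q --no-local --single-branch --branch master src dst
cd dst

git cat-file -e "$wanted"
if git cat-file -e "$unwanted" 2>/dev/null; then
    other_branch="cloned"
else
    other_branch="not cloned"
fi
echo "linux-2.4 objects: $other_branch"
```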

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-16 18:24 ` Junio C Hamano
@ 2005-11-16 20:01   ` Martin Langhoff
  2005-11-16 20:10     ` Linus Torvalds
  2005-11-16 20:35     ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Martin Langhoff @ 2005-11-16 20:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Franck, git

On 11/17/05, Junio C Hamano <junkio@cox.net> wrote:
> The underlying network transfer program, 'git-clone-pack', can
> be told to clone only specified branches.  If somebody is
> interested, updating the 'git-clone' wrapper to use it should
> not be too hard -- obviously this needs to be done for other
> transports as well, though.

cg-clone already does this. One tricky thing with the selective
cloning is that you want to pull the named head plus all its related
objects, and then pull only the _relevant_ tags. There's been
discussion of pulling tagrefs+related tag objects and then pruning
any tagrefs+tagobjects where the commit is unreachable with the
objects you already have. You surely remember my 'git-rev-parse is
crashing' thread.

If you just pull tagrefs and all the objects needed for them, chances
are you'll get the whole repo anyway ;-)

cheers,


martin

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-16 20:01   ` Martin Langhoff
@ 2005-11-16 20:10     ` Linus Torvalds
  2005-11-16 20:35     ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2005-11-16 20:10 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Junio C Hamano, Franck, git



On Thu, 17 Nov 2005, Martin Langhoff wrote:
>
> cg-clone already does this. One tricky thing with the selective
> cloning is that you want to pull the named head plus all its related
> objects, and then pull only the _relevant_ tags.

Well, if you keep to native git protocols, you can trivially do that by 
just fetching the required heads, and then fetching only the tags for 
which you have the pointed-to object (ie look for the ^{} thing in 
git-ls-remote, and check if you have that object, then get those tags).
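
A rough sketch of that tag selection, under stated assumptions: the
two repositories, the tag names (v2.6-demo, v2.4-demo) and the
identity settings are invented for the demo, and only annotated tags
are handled, since those are the ones that get a ^{} line in the
git-ls-remote output.

```shell
#!/bin/sh
# Sketch only: fetch one head, then only the tags whose peeled object
# is already present locally.
set -e
work=$(mktemp -d)
cd "$work"

git init -q src
cd src
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"
echo "2.6" > version
git add version
git commit -q -m "2.6 work"
git tag -a -m "demo tag on master" v2.6-demo

git checkout -q --orphan linux-2.4
echo "2.4" > version
git add version
git commit -q -m "2.4 work"
git tag -a -m "demo tag on the old branch" v2.4-demo
cd ..

git init -q dst
cd dst
git fetch -q --no-tags ../src master

# Walk the "^{}" lines: each gives the object a tag finally points to.
peel='^{}'
git ls-remote --tags ../src |
while read -r sha ref; do
    case "$ref" in
        *"$peel")
            tag=${ref%"$peel"}
            if git cat-file -e "$sha" 2>/dev/null; then
                git fetch -q --no-tags ../src "$tag:$tag" </dev/null
            fi
            ;;
    esac
done

if git rev-parse -q --verify refs/tags/v2.6-demo >/dev/null; then
    have_wanted=yes
else
    have_wanted=no
fi
if git rev-parse -q --verify refs/tags/v2.4-demo >/dev/null; then
    have_unwanted=yes
else
    have_unwanted=no
fi
echo "v2.6-demo: $have_wanted, v2.4-demo: $have_unwanted"
```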

For rsync, since you get all objects anyway, there's no point to limiting 
the branches. Might as well just delete them and prune them.

		Linus

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-16 20:01   ` Martin Langhoff
  2005-11-16 20:10     ` Linus Torvalds
@ 2005-11-16 20:35     ` Junio C Hamano
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2005-11-16 20:35 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git

Martin Langhoff <martin.langhoff@gmail.com> writes:

> On 11/17/05, Junio C Hamano <junkio@cox.net> wrote:
>> The underlying network transfer program, 'git-clone-pack', can
>> be told to clone only specified branches.  If somebody is
>> interested, updating the 'git-clone' wrapper to use it should
>> not be too hard -- obviously this needs to be done for other
>> transports as well, though.
>
> cg-clone already does this.

Yes, but it can be improved.  It does fetch-pack for git native
transports, which means the receiving end expands the pack.  I
think we would either need to teach cg-clone to tell cg-fetch
that it is cloning and invoke git-clone-pack, or add an optional
"do not expand" option to git-fetch-pack (git-clone-pack always
keeps the downloaded pack unexpanded).  A huge pack transfer
spends a long time on the downloader side unpacking the pack.

> If you just pull tagrefs and all the objects needed for them, chances
> are you'll get the whole repo anyway ;-)

Yes, that's why we changed ls-remote to report the magic ^{}
entries and it is used in cg-fetch::fetch_tags.

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-16 16:46 ` Linus Torvalds
@ 2005-11-17 10:36   ` Franck
  2005-11-17 16:23     ` Linus Torvalds
  0 siblings, 1 reply; 22+ messages in thread
From: Franck @ 2005-11-17 10:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Hi Linus

2005/11/16, Linus Torvalds <torvalds@osdl.org>:
>
>
> First off, "rsync://" will never do what you want. It uses rsync
> (surprise, surprise) to grab the objects, so since it has no clue what the
> objects are, it has no choice but to just grab them all.
>

Yes, that was a bad example. Actually I've tried several protocols,
including the git protocol.

> The fact that _some_ of the linux-mips repositories are based on mine
> makes me suspect that all their 2.6-based ones are rooted like mine is,
> but that may or may not be true.
>

Hmm, I don't think the linux-mips repository is rooted like yours. It
seems to have been imported from the mips CVS repository, which
contains linux-mips history since 1995.

> Just try your above command line with "git://" instead.
>
> (NOTE! Doing a full clone like the above is pretty expensive with git for
> the server side, so it might take a while before it starts feeding you
> stuff if it is under heavy load)
>

Well, I've already tried that, but I gave up after waiting more
than 4 hours! I don't know if the server was under heavy load or if
the git protocol needs a lot of resources, but it seems useless to set
up a git daemon on it...

> If you want to get rid of other branches, do:

Sorry, I forgot to mention that I had already tried what you suggested
at first (except that I did not do a 'git repack -a -d'), but it didn't
work out (and that's the reason why I tried the "kill father object"
thing). Since I had forgotten the repack step, I tried again, and it
took more than 4 hours to complete all the git commands. After that I
ran gitk --all to check that all the old branches' objects had been
removed, but I can still see all of them.

I plan to take a look at the git code to understand how object removal
is done, but in the meantime do you have any idea why this fails?

Thanks
--
               Franck

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-17 10:36   ` Franck
@ 2005-11-17 16:23     ` Linus Torvalds
  2005-11-17 21:47       ` Franck
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2005-11-17 16:23 UTC (permalink / raw)
  To: Franck; +Cc: Git Mailing List



On Thu, 17 Nov 2005, Franck wrote:
> 
> Sorry, I forgot to mention that I had already tried what you suggested
> at first (except that I did not do a 'git repack -a -d'), but it didn't
> work out (and that's the reason why I tried the "kill father object"
> thing).

If that didn't work, then you can't kill the father object: doing so would 
only get rid of one single object, and it would make fsck unhappy that 
your tree is incomplete.

What you probably _can_ do is to find whatever top-most commit you want 
(say, the v2.6.0 commit), and use grafting to make that have no parents. 
Then you can do git-prune to get rid of everything under it.
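
As a sketch of what such a graft entry looks like: each line of
.git/info/grafts names a commit followed by the parents it should
appear to have, so a line with the commit alone makes it parentless.
The toy repository and the "cut-here" tag below are invented and stand
in for the real tree and the v2.6.0 commit. Newer Git versions
deprecate this file in favour of 'git replace --graft', so treat the
format as historical.

```shell
#!/bin/sh
# Sketch only: write a graft entry that cuts history at a chosen commit.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.name "Example User"
git config user.email "user@example.invalid"

# Three commits in a row; the middle one plays the role of v2.6.0.
for n in 1 2 3; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done
git tag cut-here HEAD~1

# A grafts line is "<commit> <parent>..."; listing the commit alone
# declares that it has no parents.
cut=$(git rev-parse "cut-here^{commit}")
mkdir -p .git/info
echo "$cut" > .git/info/grafts
cat .git/info/grafts
```

Pruning with a graft in place really does discard the older history,
so it is worth trying on a copy first.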

> Since I had forgotten the repack step, I tried again, and it took
> more than 4 hours to complete all the git commands.

Sounds like the original archive was fully unpacked. That really does get 
very very slow after a while.

>		 After that I run
> gitk --all to check that all old branch's objects have been removed
> but I can see all of them.

Note that if the branches are still there, git prune won't do anything to 
them (and git repack will pack them all). You need to remove the branches 
_first_, so that git repack and prune sees that their objects aren't 
needed.

But yes, if the tree is actually based on some full CVS history and really 
goes back all the way to 1995, then you can't prune it (since all branches 
are actually part of the history from the very top), and you'd need to 
graft away part of it like above.

		Linus

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-17 16:23     ` Linus Torvalds
@ 2005-11-17 21:47       ` Franck
  2005-11-17 22:44         ` Linus Torvalds
  0 siblings, 1 reply; 22+ messages in thread
From: Franck @ 2005-11-17 21:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

2005/11/17, Linus Torvalds <torvalds@osdl.org>:
>
> What you probably _can_ do is to find whatever top-most commit you want
> (say, the v2.6.0 commit), and use grafting to make that have no parents.
> Then you can do git-prune to get rid of everything under it.
>

OK, that's what I was trying to do by killing the parent object. Now
when looking at the graph using gitk, all the old objects have been
removed. But I'm surprised, because the git repository is the same
size as it was before pruning all the old objects. Can you explain why?

Thanks
--
               Franck

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-17 21:47       ` Franck
@ 2005-11-17 22:44         ` Linus Torvalds
  2005-11-19 12:23           ` Franck
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2005-11-17 22:44 UTC (permalink / raw)
  To: Franck; +Cc: Git Mailing List



On Thu, 17 Nov 2005, Franck wrote:

> 2005/11/17, Linus Torvalds <torvalds@osdl.org>:
> >
> > What you probably _can_ do is to find whatever top-most commit you want
> > (say, the v2.6.0 commit), and use grafting to make that have no parents.
> > Then you can do git-prune to get rid of everything under it.
> >
> 
> OK, that's what I was trying to do by killing the parent object. Now
> when looking at the graph using gitk, all the old objects have been
> removed. But I'm surprised, because the git repository is the same
> size as it was before pruning all the old objects. Can you explain why?

make sure you re-pack if it was packed. "git prune" will not remove packs 
at all, so..
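
A small sketch of that effect on a toy repository (all names and
contents invented; the "git reflog expire" step is an assumption for
current Git versions, where reflogs would otherwise keep the old
commits reachable): once everything is packed, "git prune" alone
leaves the unreachable commit in the pack, and a following
"git repack -a -d" is what drops it.

```shell
#!/bin/sh
# Sketch only: prune does not touch packs, repack -a -d rewrites them.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "2.6" > version
git add version
git commit -q -m "2.6 work"

git checkout -q --orphan linux-2.4
echo "2.4" > version
git add version
git commit -q -m "2.4 work"
unwanted=$(git rev-parse HEAD)
git checkout -q master

# Start from a fully packed repository, like a freshly cloned one.
git repack -a -d -q

# Make the old branch unreachable.
git branch -D linux-2.4 >/dev/null
git reflog expire --expire=now --all

# Prune alone: the object lives in the pack, so it survives.
git prune
if git cat-file -e "$unwanted" 2>/dev/null; then
    after_prune="present"
else
    after_prune="gone"
fi

# Re-pack: the new pack only holds reachable objects.
git repack -a -d -q
if git cat-file -e "$unwanted" 2>/dev/null; then
    after_repack="present"
else
    after_repack="gone"
fi
echo "after prune: $after_prune, after repack: $after_repack"
```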

		Linus

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-17 22:44         ` Linus Torvalds
@ 2005-11-19 12:23           ` Franck
  2005-11-19 12:45             ` Lukas Sandström
  2005-11-19 17:56             ` Linus Torvalds
  0 siblings, 2 replies; 22+ messages in thread
From: Franck @ 2005-11-19 12:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

2005/11/17, Linus Torvalds <torvalds@osdl.org>:
>
> > On Thu, 17 Nov 2005, Franck wrote:
> >
> > OK, that's what I was trying to do by killing the parent object. Now
> > when looking at the graph using gitk, all the old objects have been
> > removed. But I'm surprised, because the git repository is the same
> > size as it was before pruning all the old objects. Can you explain why?
>
> make sure you re-pack if it was packed. "git prune" will not remove packs
> at all, so..
>

I just looked at the git-prune script, and it seems to remove
unreachable objects only in the .git/objects/[0-9a-f][0-9a-f]
directories, not in pack files.

Then, by running git-repack -a -d, I build a new small pack that
contains only the latest objects, but the script then runs
git-pack-redundant, which erases the new small pack since all its
objects are included in the old big one. Is that correct? If so,
could git-pack-redundant report the oldest redundant pack instead?

Thanks
--
               Franck

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-19 12:23           ` Franck
@ 2005-11-19 12:45             ` Lukas Sandström
  2005-11-19 20:42               ` Junio C Hamano
  2005-11-19 17:56             ` Linus Torvalds
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Sandström @ 2005-11-19 12:45 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Franck

Franck wrote:
> 2005/11/17, Linus Torvalds <torvalds@osdl.org>:
> 
>>>On Thu, 17 Nov 2005, Franck wrote:
>>>
>>>OK, that's what I was trying to do by killing the parent object. Now
>>>when looking at the graph using gitk, all the old objects have been
>>>removed. But I'm surprised, because the git repository is the same
>>>size as it was before pruning all the old objects. Can you explain why?
>>
>>make sure you re-pack if it was packed. "git prune" will not remove packs
>>at all, so..
>>
> 
> 
> I just looked at the git-prune script, and it seems to remove
> unreachable objects only in the .git/objects/[0-9a-f][0-9a-f]
> directories, not in pack files.
> 
> Then, by running git-repack -a -d, I build a new small pack that
> contains only the latest objects, but the script then runs
> git-pack-redundant, which erases the new small pack since all its
> objects are included in the old big one. Is that correct? If so,
> could git-pack-redundant report the oldest redundant pack instead?
> 
> Thanks
> --
>                Franck

The reason the old pack is kept instead of the new one is that it
is a proper superset of the new one. 

The "git-repack -a -d" case is fixed in Junio's master, and I have sent out
a patch which lets you pipe git-fsck-objects --full --unreachable to
git-pack-redundant in the other cases, so the problem should hopefully
be resolved shortly.

/Lukas

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-19 12:23           ` Franck
  2005-11-19 12:45             ` Lukas Sandström
@ 2005-11-19 17:56             ` Linus Torvalds
  2005-11-19 19:52               ` Junio C Hamano
  1 sibling, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2005-11-19 17:56 UTC (permalink / raw)
  To: Franck, Junio C Hamano; +Cc: Git Mailing List



On Sat, 19 Nov 2005, Franck wrote:
> 
> Then, by running git-repack -a -d, I build a new small pack that
> contains only the latest objects, but the script then runs
> git-pack-redundant, which erases the new small pack since all its
> objects are included in the old big one. Is that correct?

Gaah.

The git-pack-redundant logic has totally broken "git repack -a -d".

Junio, can we undo that, please? With "-a", redundant pack removal is 
trivial (and we used to do it right, exactly because it is trivial), and 
without "-a", redundant pack removal is pointless (since it will never 
pack objects that are already packed).

So "git repack" should _never_ call git-pack-redundant. It's always either 
wrong or pointless.

git-pack-redundant is great for "git prune", not for repack.

		Linus

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-19 17:56             ` Linus Torvalds
@ 2005-11-19 19:52               ` Junio C Hamano
  2005-11-21 20:11                 ` Franck
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2005-11-19 19:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Franck, Git Mailing List

Linus Torvalds <torvalds@osdl.org> writes:

> So "git repack" should _never_ call git-pack-redundant. It's always either 
> wrong or pointless.

Right-o.

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-19 12:45             ` Lukas Sandström
@ 2005-11-19 20:42               ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2005-11-19 20:42 UTC (permalink / raw)
  To: Lukas Sandström; +Cc: git

Lukas Sandström <lukass@etek.chalmers.se> writes:

> The "git-repack -a -d" case is fixed in Junio's master, and I have sent out
> a patch which lets you pipe git-fsck-objects --full --unreachable to
> git-pack-redundant in the other cases, so the problem should hopefully
> be resolved shortly.

After thinking a bit about it, I tend to agree with what Linus
said, and would vote for the simplicity of the older way.  From
usage point of view, git-repack is primarily about repacking,
and the -d option is a funny exception applicable to -a case,
only because it is so obvious (i.e. after repacking everything
into one, it is obvious any other packs are unneeded).  We might
even want to make the "-a" flag to imply "-a -d" there.

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-19 19:52               ` Junio C Hamano
@ 2005-11-21 20:11                 ` Franck
  2005-11-21 20:45                   ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Franck @ 2005-11-21 20:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

2005/11/19, Junio C Hamano <junkio@cox.net>:
> Linus Torvalds <torvalds@osdl.org> writes:
>
> > So "git repack" should _never_ call git-pack-redundant. It's always either
> > wrong or pointless.
>
> Right-o.
>

OK, it works now. My new git repository is only 60 MB. But :), I would
like to set up my public repository based on this "light" repository.
If someone has the big repository, I would like them to be able to
pull my public repo into theirs. But since I used grafting to "cut"
my light repo, and the .git/info/grafts file is not copied during
push/pull/clone operations, it's not going to work. Is this a scheme
that could work?

Moreover, I'm wondering whether my public repository really needs to
store the big repo's pack files, as described in the git tutorial?

Thanks
--
               Franck

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-21 20:11                 ` Franck
@ 2005-11-21 20:45                   ` Junio C Hamano
  2005-11-22  9:22                     ` Franck
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2005-11-21 20:45 UTC (permalink / raw)
  To: Franck; +Cc: git

Franck <vagabon.xyz@gmail.com> writes:

> ... But since I used grafting to "cut"
> my light repo and .git/info/grafts file is not copied during
> push/pull/clone operations it's not going to work. Is it a scheme that
> could work ?

If you tell your downloaders that your repository is incomplete
and they need to have at least up to such and such commits from
another repository, they should be able to slurp from you.

It might be possible to teach upload-pack (that is run when your
downloaders run git-fetch or git-clone against your repository)
to somehow send a customized error message to the client when it
finds the other end needs certain objects that you yourself do
not even have. In that message you could say something like "due
to space constraints this repository is an incomplete one, and
you can only use it on top of a clone of such and such
repository, found at this URL: ...".

> Moreover, I'm wondering if my public repository really needs to store
> big repo's pack files as it is described in git tutorial ?

Copying is one way, but if the repository you are borrowing from
lives on the same machine, you could use objects/info/alternates
to point at it from your public repository.  In either case, the
point in that section is to reduce the need to transfer and
expand much stuff in your public repository while keeping the
repository complete.  The recommended procedure in the tutorial
always assumes that the public repository is kept fsck-objects
clean and complete.
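
A minimal sketch of the alternates arrangement, with invented
repository names ("big" for the repository being borrowed from,
"public.git" for the public one): the file simply lists the path of
another object directory, one per line.

```shell
#!/bin/sh
# Sketch only: let an empty repository borrow objects from another one
# on the same machine via objects/info/alternates.
set -e
work=$(mktemp -d)
cd "$work"

# The repository being borrowed from.
git init -q big
cd big
git config user.name "Example User"
git config user.email "user@example.invalid"
echo "hello" > file
git add file
git commit -q -m "a commit that only lives in big"
sha=$(git rev-parse HEAD)
cd ..

# The public repository: no objects of its own yet.
git init -q --bare public.git
mkdir -p public.git/objects/info
echo "$work/big/.git/objects" > public.git/objects/info/alternates

# The commit is now visible from public.git without being copied.
if git --git-dir=public.git cat-file -e "$sha" 2>/dev/null; then
    borrowed=yes
else
    borrowed=no
fi
echo "commit visible through alternates: $borrowed"
```

The public repository then depends on the other one staying in place
and not pruning objects it still needs.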

What you are trying to do is to keep your public repository
fsck-objects *un*clean and still let downloaders work with it;
so I suspect following that section of the tutorial procedure
defeats the purpose of your experiments.

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-21 20:45                   ` Junio C Hamano
@ 2005-11-22  9:22                     ` Franck
  2005-11-22  9:50                       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Franck @ 2005-11-22  9:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2005/11/21, Junio C Hamano <junkio@cox.net>:
> Franck <vagabon.xyz@gmail.com> writes:
>
> > ... But since I used grafting to "cut"
> > my light repo and .git/info/grafts file is not copied during
> > push/pull/clone operations it's not going to work. Is it a scheme that
> > could work ?
>
> If you tell your downloaders that your repository is incomplete
> and they need to have at least up to such and such commits from
> another repository, they should be able to slurp from you.
>

What do you mean by "have at least up to such and such commits" ? I
can see only one commit that they need: the one I used to create my
public repository...

> It might be possible to teach upload-pack (that is run when your
> downloaders run git-fetch or git-clone against your repository)
> to somehow send a customized error message to the client when it
> finds the other end needs certain objects that you yourself do
> not even have. In that message you could say something like "due
> to space constraints this repository is an incomplete one, and
> you can only use it on top of a clone of such and such
> repository, found at this URL: ...".
>

That's a good idea. We get the same thing when cloning linux
repository. BTW how is it done in that case ?

> > Moreover, I'm wondering if my public repository really needs to store
> > big repo's pack files as it is described in git tutorial ?
>
> What you are trying to do is to keep your public repository
> fsck-objects *un*clean and still let downloaders work with it;
> so I suspect following that section of the tutorial procedure
> defeats the purpose of your experiments.
>

Absolutely. My question was not accurate, sorry. It should have been:
"can I have a public repository that is fsck-objects unclean, with a
grafts file that gets downloaded when cloning it?"

Thanks
--
               Franck

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-22  9:22                     ` Franck
@ 2005-11-22  9:50                       ` Junio C Hamano
  2005-11-22 10:40                         ` Franck
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2005-11-22  9:50 UTC (permalink / raw)
  To: Franck; +Cc: git

Franck <vagabon.xyz@gmail.com> writes:

> 2005/11/21, Junio C Hamano <junkio@cox.net>:
>> Franck <vagabon.xyz@gmail.com> writes:
>>
>> > ... But since I used grafting to "cut"
>> > my light repo and .git/info/grafts file is not copied during
>> > push/pull/clone operations it's not going to work. Is it a scheme that
>> > could work ?
>>
>> If you tell your downloaders that your repository is incomplete
>> and they need to have at least up to such and such commits from
>> another repository, they should be able to slurp from you.

I was not talking about _your_ case specifically.  If you happen
to have based your partial history on top of a single commit
then the set of "such and such commits" might be only one, but
you could for example clone from Linus tip, merge in a couple of
jgarzik branch heads, put your own commits on top of them and
then cauterize your history, stopping at those foreign commits.
In such a case you obviously need to tell others where you
chopped your history off.

* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-22  9:50                       ` Junio C Hamano
@ 2005-11-22 10:40                         ` Franck
  2005-11-22 17:06                           ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Franck @ 2005-11-22 10:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2005/11/22, Junio C Hamano <junkio@cox.net>:
> I was not talking about _your_ case specifically.  If you happen
> to have based your partial history on top of a single commit
> then the set of "such and such commits" might be only one, but
> you could for example clone from Linus tip, merge in a couple of
> jgarzik branch heads, put your own commits on top of them and
> then cauterize your history, stopping at those foreign commits.
> In such a case you obviously need to tell others where you
> chopped your history off.
>

I built the lite repository and got an error depending on which
original repo I used to push the lite one. Here is the history:

--------------------------------------------------------------
#
# building my pub repo from the "lite" repository
#

$ git checkout -b lite

$ git-show-branch
* [lite] Merge with Linux 2.6.14.
 ! [master] Merge with Linux 2.6.14.
  ! [origin] Merge with db93a82fa9d8b4d6e31c227922eaae829253bb88.
---

# push the initial commit object

$ git push /home/franck/pub/linux/git/linux-lite.git lite
updating 'refs/heads/lite'
  from 0000000000000000000000000000000000000000
  to   8643db584b46a61c968ae230897869f789bae020
Packing 19558 objects
Unpacking 19558 objects
 100% (19558/19558) done
refs/heads/lite: 0000000000000000000000000000000000000000 ->
8643db584b46a61c968ae230897869f789bae020

#
# push first changes from lite repository -> KO
#
$ making some hard work

$ git commit -m "Did some hard work"
$ git push /home/franck/pub/linux/git/linux-lite.git lite
updating 'refs/heads/lite'
  from 8643db584b46a61c968ae230897869f789bae020
  to   730518eea7523afd5b7891bb7849973cab52d963
Packing 0 objects
Unpacking 0 objects

error: unpack should have generated
730518eea7523afd5b7891bb7849973cab52d963, but I can't find it!
$

#
# push first changes from "full" repository -> OK
#

# go to full repository and checkout linux.2.6.14

$ git commit -m "Did some hard work"
$ git push /home/franck/pub/linux/git/linux-lite.git lite
updating 'refs/heads/lite'
  from 8643db584b46a61c968ae230897869f789bae020
  to   067d05600fe7251b8c923fbeb9ba0068ee272110
Packing 108 objects
Unpacking 108 objects
 100% (108/108) done
refs/heads/lite: 8643db584b46a61c968ae230897869f789bae020 ->
067d05600fe7251b8c923fbeb9ba0068ee272110
--------------------------------------------------------------

It seems that the "lite" repository can't be used as a working
repository. And if I use the last method to push some work, I can only
pull those changes from a full repository. From a lite one (without any
changes, of course) I get this error:

$ git pull /home/franck/pub/linux/git/linux-lite.git lite
Packing 108 objects
Unpacking 108 objects
 100% (108/108) done
Merging HEAD with f42aaff3bf8041c2d43f1ff6fdfe5df6e8a5b00b
Merging:
8643db584b46a61c968ae230897869f789bae020 Merge with Linux 2.6.14.
f42aaff3bf8041c2d43f1ff6fdfe5df6e8a5b00b Did some hard work
found 0 common ancestor(s):
Traceback (most recent call last):
  File "/home/fbuihuu/bin/git-merge-recursive", line 870, in ?
    firstBranch, secondBranch, graph)
  File "/home/fbuihuu/bin/git-merge-recursive", line 67, in merge
    mergedCA = ca[0]
IndexError: list index out of range
No merge strategy handled the merge.
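
The "found 0 common ancestor(s)" line suggests that, as seen from this
repository, the two commits share no history, which would leave the list
of common ancestors empty and make `ca[0]` fail. Below is a minimal
sketch of a check that could be run before merging. It assumes a
reasonably recent git (for `git checkout --orphan`); `has_common_ancestor`
is a hypothetical helper name, and the throwaway repository with two
unrelated root commits only illustrates the situation, it is not the
actual linux-mips history.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the two commits share an ancestor.
# git merge-base exits non-zero when it finds no merge base.
has_common_ancestor() {
    git merge-base "$1" "$2" >/dev/null 2>&1
}

# Demo: a scratch repository holding two unrelated root commits.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

git commit -q --allow-empty -m "root A"
a=$(git rev-parse HEAD)

git checkout -q --orphan other              # new branch with no parent
git commit -q --allow-empty -m "root B"
b=$(git rev-parse HEAD)

if has_common_ancestor "$a" "$b"; then
    echo "common ancestor found, a merge can proceed"
else
    echo "no common ancestor, a merge would have no base"
fi
```

In the demo the second branch is taken, since the two roots are unrelated.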

Do you have any clues ?

Thanks
--
               Franck


* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-22 10:40                         ` Franck
@ 2005-11-22 17:06                           ` Junio C Hamano
  2005-11-22 19:10                             ` Franck
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2005-11-22 17:06 UTC (permalink / raw)
  To: Franck; +Cc: git

Franck <vagabon.xyz@gmail.com> writes:

> It seems that the "lite" repository can't be used as a working
> repository.
>...
> Do you have any clues ?

Were you around on this list, around the beginning of this
month?  The thread that starts here may be of interest:

	http://marc.theaimsgroup.com/?l=git&m=113089701819420

I think (although I do not exactly know how the "lite"
repository was constructed) what you are doing is similar to
what I called "shallow clone" there in that thread.  At the end
of the thread I think I listed what you can and cannot do in
such an incomplete repository.


* Re: [QUESTION] Access to a huge GIT repository.
  2005-11-22 17:06                           ` Junio C Hamano
@ 2005-11-22 19:10                             ` Franck
  0 siblings, 0 replies; 22+ messages in thread
From: Franck @ 2005-11-22 19:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2005/11/22, Junio C Hamano <junkio@cox.net>:
> Were you around on this list, around the beginning of this
> month?

Nope, I subscribed to the list a week later.

> The thread that starts here may be of interest:
>
>         http://marc.theaimsgroup.com/?l=git&m=113089701819420
>
> I think (although I do not exactly know how the "lite"
> repository was constructed) what you are doing is similar to
> what I called "shallow clone" there in that thread.

Indeed. What I'm doing to get my "shallow clone" is basically removing
every ref/branch I don't want, then running git-prune and
git-repack -a -d. The result should be the same as with your
"shallow clone" method.
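
The steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: `drop_branches` is a hypothetical
helper name, the scratch repository and its `linux-2.4` branch are made
up for the demo, and the reflog step is needed only with a current git,
where reflog entries also keep old commits alive. Note that in the demo
the two branches are unrelated; objects still reachable from a kept
branch are never pruned.

```shell
#!/bin/sh
# Hypothetical helper: delete the named branches, then discard every
# object that no remaining ref can reach.
drop_branches() {
    for b in "$@"; do
        git branch -q -D "$b" || return 1
    done
    # With a current git, reflog entries would keep the old commits alive.
    git reflog expire --expire=now --all
    git prune              # remove unreachable loose objects
    git repack -a -d -q    # repack what is left, drop redundant packs
}

# Demo: a scratch repository with one wanted and one unwanted branch.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

git commit -q --allow-empty -m "2.6 work"
keep=$(git rev-parse HEAD)
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q --orphan linux-2.4
git commit -q --allow-empty -m "2.4 work"
old=$(git rev-parse HEAD)

git checkout -q "$main_branch"
drop_branches linux-2.4
```

Afterwards `git cat-file -e "$old"` should fail while the same check on
`$keep` succeeds.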

>  At the end
> of the thread I think I listed what you can and cannot do in
> such an incomplete repository.
>

Yes. But the big difference is that, in my case, the shallow copy is
used as a public repository, whereas in your case the shallow copy is
used as a working repository. Anyway, Ralf (the mips arch maintainer)
is going to create a new pruned repository that should resolve my
problem. I'm going to wait for it instead of trying to make my own
"broken" repository.

Thanks
--
               Franck


end of thread, other threads:[~2005-11-22 19:10 UTC | newest]

Thread overview: 22+ messages
2005-11-16 12:24 [QUESTION] Access to a huge GIT repository Franck
2005-11-16 16:46 ` Linus Torvalds
2005-11-17 10:36   ` Franck
2005-11-17 16:23     ` Linus Torvalds
2005-11-17 21:47       ` Franck
2005-11-17 22:44         ` Linus Torvalds
2005-11-19 12:23           ` Franck
2005-11-19 12:45             ` Lukas Sandström
2005-11-19 20:42               ` Junio C Hamano
2005-11-19 17:56             ` Linus Torvalds
2005-11-19 19:52               ` Junio C Hamano
2005-11-21 20:11                 ` Franck
2005-11-21 20:45                   ` Junio C Hamano
2005-11-22  9:22                     ` Franck
2005-11-22  9:50                       ` Junio C Hamano
2005-11-22 10:40                         ` Franck
2005-11-22 17:06                           ` Junio C Hamano
2005-11-22 19:10                             ` Franck
2005-11-16 18:24 ` Junio C Hamano
2005-11-16 20:01   ` Martin Langhoff
2005-11-16 20:10     ` Linus Torvalds
2005-11-16 20:35     ` Junio C Hamano
