All of lore.kernel.org
 help / color / mirror / Atom feed
* Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
@ 2012-08-05  4:56 Junio C Hamano
  2012-08-05  9:38 ` Michael Haggerty
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Junio C Hamano @ 2012-08-05  4:56 UTC (permalink / raw)
  To: git

The "alternates" mechanism lets you keep a single object store (not
necessarily a git repository on its own, but just the objects/ part
of it) on a machine, have multiple repositories on the same machine
share objects from it, to save the network transfer bandwidth when
cloning from remote repositories and the disk space used by the
local repositories.  A repository created by "clone --reference" or
"clone -s" uses this mechanism to borrow objects from the object
store of another repository.  A user also can manually add new
entries to $GIT_DIR/objects/info/alternates to borrow from other
object stores.

The UI for this mechanism however has some room for improvement, and
we may want to start improving it for the next release after the
upcoming Git 1.7.12 (or even Git 2.0 if the change is a large one
that may be backward incompatible but gives us a vast improvement).

Here are some random thoughts as a discussion starter.

 - By design, the borrowed object store MUST not ever lose any
   object from it, as such an object loss can corrupt the borrowing
   repositories.  In theory, it is OK for the object store whose
   objects are borrowed by repositories to acquire new objects, but
   losing existing objects is an absolute no-no.

   But the UI of "clone -s" encourages users to borrow from the
   object store of a repository that the user may actively develop
   in.  It is perfectly normal for users to perform operations that
   make objects that used to be reachable from tips of its branches
   unreachable (e.g. rebase, reset, "branch -d") in a repository
   that is used for active development, but a "gc" after such an
   operation will lose objects that were originally available in the
   repository.  If objects lost that way were still needed by the
   repositories that borrow from it, the borrowing repository gets
   corrupt immediately.

   In practice, this means that users who use "clone -s" to make a
   new repository can *never* prune the original repository without
   risking to corrupt its borrowing repository [*1*].

   Some ideas:

   - Make "clone --reference" without "-s" not to borrow from the
     reference repository.  E.g. if you have a clone of Linus
     repository at /git/linux.git/, cloning a related repository
     using it as --reference:

     $ git clone --reference /git/linux.git git://k.org/linux-next.git

     should still take advantage of /git/linux.git/{refs,objects} to
     reduce the transfer cost of fetching from k.org, but the
     resulting repository should not point /git/linux.git with its
     objects/info/alternates file.

   - Make the distinction between a regular repository and an object
     store that is meant to be used for object sharing stronger.

     Perhaps a configuration item "core.objectstore = readonly" can
     be introduced, and we forbid "clone -s" from pointing at a
     repository without such a configuration.  We also forbid object
     pruning operations such as "gc" and "repack" from being run in
     a repository marked as such.

     It may be necessary to allow some special kind of repacking of
     such a "readonly" object store, in order to reduce the number
     of packfiles (and get rid of loose object files); it needs to
     be implemented carefully not to lose any object, regardless of
     local reachability.

 - When you have a repository and one or more repositories that
   borrow from it, you may want to dissociate the borrowing
   repositories from the borrowed one (e.g. so that you can repack
   or prune the original repository safely, or you may even want to
   remove it).

   I think "git repack -a -d -f" in the borrowing repository happens
   to be the way to do this, but it is not clear to the users why.

   Some ideas:

   - It might not be a bad idea to have a dedicated new command to
     help users manage alternates ("git alternates"?); obviously
     this will be one of its subcommand "git alternates detach" if
     we go that route.

   - Or just an entry in the documentation is sufficient?

 - When you have two or more repositories that do not share objects,
   you may want to rearrange things so that they share their objects
   from a single common object store.

   There is no direct UI to do this, as far as I know.  You can
   obviously create a new bare repository, push there from all
   of these repositories, and then borrow from there, e.g.

	git --bare init shared.git &&
	for r in a.git b.git c.git ...
        do
	    (
		cd "$r" &&
	        git push ../shared.git "refs/*:refs/remotes/$r/*" &&
		echo ../../../shared.git/objects >.git/objects/info/alternates
   	    )
	done

   And then repack shared.git once.

   Some ideas:

   - (obvious: give a canned command to do the above, perhaps then
     set the core.objectstore=readonly in the resuting shared.git)

 - When you have one object store and a repository that does not yet
   borrow from it, you may want to make the repository borrow from
   the object store.  Obviously you can run "echo" like the sample
   script in the previous item above, but it is not obvious how to
   perform the logical next step of shrinking $GIT_DIR/objects of
   the repository that now borrows the objects.

   I think "git repack -a -d" is the way to do this, but if you
   compare this command to "git repack -a -d -f" we saw previously
   in this message, it is not surprising that the users would be
   confused---it is not obvious at all.

   Some ideas:

   - (obvious: give a canned subcommand to do this)


[Footnote]

*1* Making the borrowed object store aware of all the repositories
that borrow from it, so that operations like "gc" and "repack" in
the repository with the borrowed object store can keep objects that
are needed by borrowing repositories, is theoretically possible, but
is not a workable approach in practice, as (1) borrowers may not
have a write access to the shared object store to add such a back
pointer to begin with, (2) "gc"/"repack" in the borrowed object
store and normal operations in the borrowing repositories can easily
race with each other, without any coordination between the users,
and (3) a casual "borrowing" can simply be done with a simple "echo"
as shown in the main text of this message, and there is no way to
ensure a backpointer from the borrowed object store to such a
borrowing repository.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
  2012-08-05  4:56 Bringing a bit more sanity to $GIT_DIR/objects/info/alternates? Junio C Hamano
@ 2012-08-05  9:38 ` Michael Haggerty
  2012-08-05 19:01   ` Junio C Hamano
  2012-08-07  6:16   ` Jeff King
  2012-08-06 21:55 ` Junio C Hamano
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Michael Haggerty @ 2012-08-05  9:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 08/05/2012 06:56 AM, Junio C Hamano wrote:
> The "alternates" mechanism [...]
> The UI for this mechanism however has some room for improvement, and
> we may want to start improving it for the next release after the
> upcoming Git 1.7.12 (or even Git 2.0 if the change is a large one
> that may be backward incompatible but gives us a vast improvement).
 >
> Here are some random thoughts as a discussion starter. [...]
[...]
>     - Make the distinction between a regular repository and an object
>       store that is meant to be used for object sharing stronger.
>
>       Perhaps a configuration item "core.objectstore = readonly" can
>       be introduced, and we forbid "clone -s" from pointing at a
>       repository without such a configuration.  We also forbid object
>       pruning operations such as "gc" and "repack" from being run in
>       a repository marked as such.

Must the repository necessarily be "readonly"?  It seems that it would 
be permissible to push new objects to such a repository; just not to 
delete existing objects.  Thus maybe another term would be better to 
describe such a repository, like "appendonly" or "noprune" or even 
something more abstract like "donor".

I have some other crazy ideas for making the concept even more powerful:

* Support remote alternate repositories.  Local repository obtains 
missing objects from the remote as needed.  This would probably be 
insanely inefficient without also supporting...

* Lazy copying of "borrowed" objects to the local repository.  Any 
object fetched from the alternate object store is copied to the local 
object store.

Together, I think that these two features would give fully-functional 
shallow clones.

Such alternates could even be chained together: for example, keep a 
single local lazy clone of the upstream repository somewhere on your 
site or on your computer, and use that as read-through cache for other 
clones.

* To help manage local disk space, allow intelligent curation of the 
objects kept in the local store when they are also available in the 
alternate.  The criteria for what to keep could be things like 
"revisions with depth <= 20 on branches X, Y/*, and Z"; "objects that 
have been accessed within the last 3 months", "all tag objects 
refs/tags/release-*".  It should be possible to cull objects not meeting 
the criteria with or without actively fetching all objects meeting the 
criteria.  Probably the criteria would be stored in the configuration to 
be reused (and perhaps run as part of "git gc").

This would cure a lot of "storing big, non-deltaable files" pain because 
big blobs could be stored on a central server without multiplying the 
size of every clone.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
  2012-08-05  9:38 ` Michael Haggerty
@ 2012-08-05 19:01   ` Junio C Hamano
  2012-08-07  6:16   ` Jeff King
  1 sibling, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2012-08-05 19:01 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I have some other crazy ideas for making the concept even more powerful:

Sorry, but the "a bit more sanity" topic is not interested in making
the concept powerful at all.

This is about making it usable with ease without the user having to
worry about "oh, I was about to shoot myself in the foot by running
repack; it is good that I remembered objects in this repository are
borrowed by other repositories" and things like that.

For the purpose of "a bit more sanity" topic, adding new things
users have to worry about to the mix, e.g.  "what happens if my
network goes away?  I can afford not to have access to these kinds
of objects for a while, but I must always have access to those
objects, so I can borrow the former but not the latter", is going in
the other way.

The ideas in your messages are *not* useless.  Enhancements along
those lines may be useful, but they do not fit in the same
discussion of making the current mechanism simmpler and easier for
users to use the mechanism in a safe and sane way.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
  2012-08-05  4:56 Bringing a bit more sanity to $GIT_DIR/objects/info/alternates? Junio C Hamano
  2012-08-05  9:38 ` Michael Haggerty
@ 2012-08-06 21:55 ` Junio C Hamano
  2012-08-08  1:42 ` Sascha Cunz
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2012-08-06 21:55 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster@pobox.com> writes:

>  - When you have one object store and a repository that does not yet
>    borrow from it, you may want to make the repository borrow from
>    the object store.  Obviously you can run "echo" like the sample
>    script in the previous item above, but it is not obvious how to
>    perform the logical next step of shrinking $GIT_DIR/objects of
>    the repository that now borrows the objects.
>
>    I think "git repack -a -d" is the way to do this, but if you
>    compare this command to "git repack -a -d -f" we saw previously
>    in this message, it is not surprising that the users would be
>    confused---it is not obvious at all.
>
>    Some ideas:
>
>    - (obvious: give a canned subcommand to do this)

The analysis of this item is wrong, I think.  "git repack -a -d -l"
should be the way to do so.

The message looks wrong when it turns out that there is no need to
have any object in the borrowing repository, though.  We only see
"Nothing new to pack" (which technically is correct), and the
command exits successfully.  You can peek .git/objects/ to find out
that all the objects the borrower used to have its own copy are now
gone (because they are available at the alternate), but the message
gives a false impression that we thought about doing something,
found nothing new to be packed, and gave up without doing anything.

But that is not what is happening.  We traversed the connectivity,
found that all the objects necessary for our history are housed in
our alternates, gave "Nothing new to pack" (because we do not have
to have any object on our own), and then removed all the object
files and packs in our repository.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
  2012-08-05  9:38 ` Michael Haggerty
  2012-08-05 19:01   ` Junio C Hamano
@ 2012-08-07  6:16   ` Jeff King
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff King @ 2012-08-07  6:16 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git

On Sun, Aug 05, 2012 at 11:38:12AM +0200, Michael Haggerty wrote:

> I have some other crazy ideas for making the concept even more powerful:
> 
> * Support remote alternate repositories.  Local repository obtains
> missing objects from the remote as needed.  This would probably be
> insanely inefficient without also supporting...
> 
> * Lazy copying of "borrowed" objects to the local repository.  Any
> object fetched from the alternate object store is copied to the local
> object store.
> 
> Together, I think that these two features would give fully-functional
> shallow clones.

You might be interested in looking at my rough (_very_ rough) experiment
with object db "hooks":

  https://github.com/peff/git/commits/jk/external-odb

The basic idea is to have helper programs that basically have two
commands: give a list of sha1s you can provide, and fetch a specific
object by sha1. That's enough for the low levels of git to fall-back to
a helper on an object lookup failure, and copy the object to a local
cache. Managing the cache could be done externally by helper-specific
code.

Sorry, there's no documentation on the format or behavior, and most of
the changes are in one big patch. If you're interested and find it
unreadable, I can try to clean it up.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
  2012-08-05  4:56 Bringing a bit more sanity to $GIT_DIR/objects/info/alternates? Junio C Hamano
  2012-08-05  9:38 ` Michael Haggerty
  2012-08-06 21:55 ` Junio C Hamano
@ 2012-08-08  1:42 ` Sascha Cunz
  2012-08-11  9:35 ` Hallvard Breien Furuseth
  2012-08-27 22:39 ` Oswald Buddenhagen
  4 siblings, 0 replies; 26+ messages in thread
From: Sascha Cunz @ 2012-08-08  1:42 UTC (permalink / raw)
  To: Junio C Hamano, git

[..]
>  - By design, the borrowed object store MUST not ever lose any
>    object from it, as such an object loss can corrupt the borrowing
>    repositories.  In theory, it is OK for the object store whose
>    objects are borrowed by repositories to acquire new objects, but
>    losing existing objects is an absolute no-no.
[...]
>    In practice, this means that users who use "clone -s" to make a
>    new repository can *never* prune the original repository without
>    risking to corrupt its borrowing repository [*1*].
[...]

Given your example of /git/linux.git being a clone of Linus' repository, 
cloning a related repository using it as --reference:

     $ cd /git
     $ git clone --reference /git/linux.git git://k.org/linux-next.git mine

Wouldn't it be by far a less intrusive alternative to do the following (in the 
clone step above):

- create the file /git/linux.git/objects/borrowing/_git_mine (This is where we
  borrow FROM).
  This file would hold a packed-ref list of HEADs from the /git/mine clone of
  the repository.

  _git_mine here is slash-stripped version of the destination path. Maybe the
  packed-ref format could also be extended by a single line containing a full
  path to the foreign repository.

- On every update-ref to /git/mine, update the 'borrowing' refs in
  /git/linux.git

- On any maintenance on /git/linux.git (gc, prune, repack, etc.) consider refs
  in the packed-refs at objects/borrowing to be valid references.

  If packed-ref format was adopted like stated above, we could stat() here if
  this directory still exists and error out if it doesn't (In this case the
  user should tell us if she moved or removed the clone).

Any alternatives for looking up the packed-refs list for borrowing would also 
be doable; i.E. putting the list of valid borrowing-packed-refs-files into the 
config file (as opposed to lookup $GIT_DIR/objects/borrowing above).
Putting this list into the config file would eliminate need for the packed-ref 
format change and give the user the ability to maintain her clones with well-
known command 'git config'

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
  2012-08-05  4:56 Bringing a bit more sanity to $GIT_DIR/objects/info/alternates? Junio C Hamano
                   ` (2 preceding siblings ...)
  2012-08-08  1:42 ` Sascha Cunz
@ 2012-08-11  9:35 ` Hallvard Breien Furuseth
  2012-08-27 22:39 ` Oswald Buddenhagen
  4 siblings, 0 replies; 26+ messages in thread
From: Hallvard Breien Furuseth @ 2012-08-11  9:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
>    Some ideas:
> 
>    - Make "clone --reference" without "-s" not to borrow from the
>      reference repository.  (...)

Generalize: Introduce volatile alternate object stores.  Commands like
(remote) fetch, repack, gc will copy desired objects they see there.

That allows pruneable alternates if people want them: Make every
borrowing repo also borrow from a companion volatile store.  To prune
some shared objects:  Move them from the alternate to the volatile.
Repack or gc all borrowing repos.  Empty the volatile alternate.
Similar to detach from one alternate repo while keeping others:
gc with the to-be-dropped alternate as a volatile.

Also it gives a simple way to try to repair a repo with missing
objects, if you have some other repositories which might have the
objects: Repack with the other repositories as volatile alternates.

BTW, if a wanted object disappears from the volatile alternate while
fetch is running, fetch should get it from the remote after all.

>    - Make the distinction between a regular repository and an object
>      store that is meant to be used for object sharing stronger.
> 
>      Perhaps a configuration item "core.objectstore = readonly" can
>      be introduced, and we forbid "clone -s" from pointing at a
>      repository without such a configuration.  We also forbid object
>      pruning operations such as "gc" and "repack" from being run in
>      a repository marked as such.

I hope Michael's "append-only"/"donor" is feasible instead.  In which
case safer gc/repack are needed, like you outline:

>      It may be necessary to allow some special kind of repacking of
>      such a "readonly" object store, in order to reduce the number
>      of packfiles (and get rid of loose object files); it needs to
>      be implemented carefully not to lose any object, regardless of
>      local reachability.

And it needs to be default behavior in such stores, so users won't
need don't-shoot-myself-in-foot options.

>    - It might not be a bad idea to have a dedicated new command to
>      help users manage alternates ("git alternates"?); obviously
>      this will be one of its subcommand "git alternates detach" if
>      we go that route.

"git object-store <subcommand>  -- manage alternates & object stores"?

>    - Or just an entry in the documentation is sufficient?

Better doc would be useful anyway, and this command gives a place to
put it:-)  I had no idea alternates were intended to be read-only,
but that does explain some seeming defects I'd wondered about.

>  - When you have two or more repositories that do not share objects,
>    you may want to rearrange things so that they share their objects
>    from a single common object store.
> 
>    There is no direct UI to do this, as far as I know.  You can
>    obviously create a new bare repository, push there from all
>    of these repositories, and then borrow from there, e.g.
>    
> 	git --bare init shared.git &&
> 	for r in a.git b.git c.git ...
>         do
> 	    (
> 		cd "$r" &&
> 	        git push ../shared.git "refs/*:refs/remotes/$r/*" &&
> 		echo ../../../shared.git/objects >.git/objects/info/alternates
>    	    )
> 	done
> 
>    And then repack shared.git once.

...and finally gc the other repositories.

The refs/remotes/$r/ namespace becomes misleading if the user renames
or copies the corresponding Git repository, and then cleverly does
something to the shared repo and the repo (if any) in directory $r.

I suggest refs/remotes/$unique_number/ and note $unique_number
somewhere in the borrowing repo.  If someone insists on being clever,
this may force them to read up on what they're doing first.

Or store no refs, since the shared repo shouldn't lose objects anyway.

If we're sure objects won't be lost: Create a proper remote with the
shared repo.  That way the user can push into it once in a while, and
he can configure just which refs should be shared.

> 
>    Some ideas:
> 
>    - (obvious: give a canned command to do the above, perhaps then
>      set the core.objectstore=readonly in the resuting shared.git)

That's getting closer to 'bzr init-repository': One dir with the
shared repo and all borrowing repositories.  A simple model which Git
can track and the user need not think further about.

This way, git clone/init of a new repo in this dir can learn to notice
and use the shared repo.

We can also have a command (git object-store?) to maintain the
repository collection, since Git knows where to find them all:
Push from all repos into the shared repo, gc all repos, even prune
unused objects from the shared repo - after imlementing sufficient
paranoia.

>  - When you have one object store and a repository that does not yet
>    borrow from it, you may want to make the repository borrow from
>    the object store.  Obviously you can run "echo" like the sample
>    script in the previous item above, but it is not obvious how to
>    perform the logical next step of shrinking $GIT_DIR/objects of
>    the repository that now borrows the objects.
> 
>    I think "git repack -a -d" is the way to do this, but if you
>    compare this command to "git repack -a -d -f" we saw previously
>    in this message, it is not surprising that the users would be
>    confused---it is not obvious at all.

Hopefully users only need to know "git gc".

> [Footnote]
> 
> *1* Making the borrowed object store aware of all the repositories
> that borrow from it, so that operations like "gc" and "repack" in
> the repository with the borrowed object store can keep objects that
> are needed by borrowing repositories, is theoretically possible, but
> is not a workable approach in practice, as (1) borrowers may not
> have a write access to the shared object store to add such a back
> pointer to begin with,

Thus this can only be an optional feature.
Not via direct backrefs though, see above about refs/remotes/$r/.

> (2) "gc"/"repack" in the borrowed object
> store and normal operations in the borrowing repositories can easily
> race with each other, without any coordination between the users,

This sounds like a bug to me, unless you refer to deleting objects
from the shared store.  The doc does not warn that we can't even
maintain shared store while using Git commands in borrowing
repositories.

> and (3) a casual "borrowing" can simply be done with a simple "echo"
> as shown in the main text of this message, and there is no way to
> ensure a backpointer from the borrowed object store to such a
> borrowing repository.

-- 
Hallvard

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?
  2012-08-05  4:56 Bringing a bit more sanity to $GIT_DIR/objects/info/alternates? Junio C Hamano
                   ` (3 preceding siblings ...)
  2012-08-11  9:35 ` Hallvard Breien Furuseth
@ 2012-08-27 22:39 ` Oswald Buddenhagen
  2012-08-28 19:19   ` GC of alternate object store (was: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?) Hallvard Breien Furuseth
  4 siblings, 1 reply; 26+ messages in thread
From: Oswald Buddenhagen @ 2012-08-27 22:39 UTC (permalink / raw)
  To: git

hi,

Junio C Hamano <gitster <at> pobox.com> writes:
> The "alternates" mechanism [...]

sorry for the somewhat late response - i found this thread only now.

at qt-project.org we have a somewhat peculiar setup: we have the qt4 repository,
and a bunch of qt5 repositories which resulted from a split. qt5 is under active
development, but qt4 is still maintained. that means that we need to cherry-pick
between those repositories quite a lot. for an optimal cherry-picking experience
one needs three-way-merging, which means we need shared object stores. which is
where the problems start:

my first approach was just a common objects/ directory with all repositories
symlinking into it. problems:
- the object store can never be garbage-collected. with a lot of heavy rebasing
and temporarily added remotes, it gets messy after a while.
- there is a constant risk of destroying the object store by inadvertently
running git gc - which is particularly likely with git-gui, as it seems to be
retarded enough to ignore the auto-gc setting.

so the second approach is the "bare aggregator repo" which adds all other repos
as remotes, and the other repos link back via alternates. problems:
- to actually share objects, one always needs to push to the aggregator
- tags having a shared namespace doesn't actually work, because the repos have
the same tags on different commits (they are independent repos, after all)
- one still cannot safely garbage-collect the aggregator, as the refs don't
include the stashes and the index, so rebasing may invalidate these more
transient objects.

i would re-propose hallvard's "volatile" alternates (at least i think that's
what he was talking about two weeks ago): they can be used to obtain objects,
but every object which is in any way referenced from the current clone must be
available locally (or from a "regular" alternate). that means that diffing, etc.
would get objects only temporarily, while cherry-picking would actually copy
(some of) the objects. this would make it possible to "cross-link" repositories,
safely and without any "3rd parties".

thoughts?

regards

^ permalink raw reply	[flat|nested] 26+ messages in thread

* GC of alternate object store (was: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?)
  2012-08-27 22:39 ` Oswald Buddenhagen
@ 2012-08-28 19:19   ` Hallvard Breien Furuseth
  2012-08-29  7:42     ` Oswald Buddenhagen
  0 siblings, 1 reply; 26+ messages in thread
From: Hallvard Breien Furuseth @ 2012-08-28 19:19 UTC (permalink / raw)
  To: Oswald Buddenhagen; +Cc: git

Oswald Buddenhagen wrote:
> (...)so the second approach is the "bare aggregator repo" which adds
> all other repos as remotes, and the other repos link back via
> alternates. problems:
> 
> - to actually share objects, one always needs to push to the aggregator

Run a cron job which frequently does that?

> - tags having a shared namespace doesn't actually work, because the
> repos have the same tags on different commits (they are independent
> repos, after all)

Junio's proposal partially fixes that: It pushes refs/* instead of
refs/heads/*, to refs/remotes/<borrowing repo>/.  However...

> - one still cannot safely garbage-collect the aggregator, as the refs
> don't include the stashes and the index, so rebasing may invalidate
> these more transient objects.

Also if you copy a repo (e.g. making a backup) instead of cloning it,
and then start using both, they'll push into the same namespace -
overwriting each other's refs.  Non-fast-forward pushes can thus lose
refs to objects needed by the other repo.

receive.denyNonFastForwards only rejects pushes to refs/heads/ or
something.  (A feature, as I learned when I reported it as bug:-)
IIRC Git has no config option to reject all non-fast-forward pushes.

> i would re-propose hallvard's "volatile" alternates (at least i think that's
> what he was talking about two weeks ago): they can be used to obtain
> objects, but every object which is in any way referenced from the current
> clone must be available locally (or from a "regular" alternate). that means
> that diffing, etc.  would get objects only temporarily, while cherry-picking
> would actually copy (some of) the objects. this would make it possible to
> "cross-link" repositories, safely and without any "3rd parties".

I'm afraid that idea by itself won't work:-(  Either you borrow from a
store or not.  If Git uses an object from the volatile store, it can't
always know if the caller needs the object to be copied.

OTOH volatile stores which you do *not* borrow from would be useful:
Let fetch/repack/gc/whatever copy missing objects from there.


2nd attempt for a way to gc of the alternate repo:  Copy the with
removed objects into each borrowing repo, then gc them.   Like this:

1. gc, but pack all to-be-removed objects into a "removable" pack.

2. Hardlink/copy the removable pack - with a .keep file - into
   borrowing repos when feasible:  I.e. repos you can find and
   have write access to.  Update their .git/objects/info/packs.
   (Is there a Git command for this?)  Repeat until nothing to do,
   in case someone created a new repo during this step.

3. Move the pack from the alternate repo to a backup object store
   which will keep it for a while.

4. Delete the .keep files from step (2).  They were needed in case
   a user gc'ed away an object from the pack and then added an
   identical object - borrowed from the to-be-removed pack.

5. gc/repack the other repos at your leisure.

666. Repos you could not update in step (2), can get temporarily
   broken.  Their owners must link the pack from the backup store by
   hand, or use that store as a volatile store and then gc/repack.

Loose objects are a problem:  If a repo has longer expiry time(s)
than the alternate store, it will get loads of loose objects from all
repos which push into the alternate store.  Worse, gc can *unpack*
those objects, consuming a lot of space.  See threads "git gc == git
garbage-create from removed branch" (3 May) and "Keeping unreachable
objects in a separate pack instead of loose?" (10 Jun).

Presumably the work-arounds are:
- Use long expiry times in the alternate repo.  I don't know which
  expiration config settings are relevant how.
- Add some command which checks and warns if the repo has longer
  expiry time than the repo it borrows from.
Also I hope Git will be changed to instead pack such loose objects
somewhere, as discussed in the above threads.

All in all, this isn't something you'd want to do every day.  But it
looks doable and can be scripted.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: GC of alternate object store (was: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?)
  2012-08-28 19:19   ` GC of alternate object store (was: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?) Hallvard Breien Furuseth
@ 2012-08-29  7:42     ` Oswald Buddenhagen
  2012-08-29 15:52       ` GC of alternate object store Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Oswald Buddenhagen @ 2012-08-29  7:42 UTC (permalink / raw)
  To: Hallvard Breien Furuseth; +Cc: git

On Tue, Aug 28, 2012 at 09:19:53PM +0200, Hallvard Breien Furuseth wrote:
> Oswald Buddenhagen wrote:
> > (...)so the second approach is the "bare aggregator repo" which adds
> > all other repos as remotes, and the other repos link back via
> > alternates. problems:
> > 
> > - to actually share objects, one always needs to push to the aggregator
> 
> Run a cron job which frequently does that?
> 
nope. i also have separate repos which share the same code, so when i
develop it i need to pick between them "live". of course it's unlikely
to get conflicts in this case, so the missing object sharing is not that
bad (the objects are transferred via format-patch, as i'm rewriting
paths anyway), but when it happens it's messy to get out again.

> > - tags having a shared namespace doesn't actually work, because the
> > repos have the same tags on different commits (they are independent
> > repos, after all)
> 
> Junio's proposal partially fixes that: It pushes refs/* instead of
> refs/heads/*, to refs/remotes/<borrowing repo>/.  However...
> 
i did exacty that. the tags are *still* not populated - git just tries
very hard to treat them specially.
and the "stash" file is also ignored, unfortunately.

> > - one still cannot safely garbage-collect the aggregator, as the refs
> > don't include the stashes and the index, so rebasing may invalidate
> > these more transient objects.
> 
> Also if you copy a repo (e.g. making a backup) instead of cloning it,
> and then start using both, they'll push into the same namespace -
> overwriting each other's refs.
>
right. it's a clear user error, though - i wouldn't *expect* it to work.
anyway, i don't have *that* problem, as my aggregator actually pulls,
not the other way round.

anyway, the bottom line is that using alternates as-is for anything but
sharing refs/remotes/origin/* (which i'm assuming to be ff-only) is
a recipe for disaster.

anything which is supposed to be in any way safe must make the "donor"
object store aware of the sharing, which at the very least means setting
the proposed append-only flag _by the borrowing_ object store. which
means that the info/alternates file should be obfuscated, so people
can't edit it manually.

> > i would re-propose hallvard's "volatile" alternates (at least i think that's
> > what he was talking about two weeks ago): they can be used to obtain
> > objects, but every object which is in any way referenced from the current
> > clone must be available locally (or from a "regular" alternate). that means
> > that diffing, etc.  would get objects only temporarily, while cherry-picking
> > would actually copy (some of) the objects. this would make it possible to
> > "cross-link" repositories, safely and without any "3rd parties".
> 
> I'm afraid that idea by itself won't work:-(

> Either you borrow from a store or not.
>
correct. from "regular" alternates you "borrow", in "volatile" ones you
only "peek".
so apparently our definitions are different after all.

> If Git uses an object from the volatile store, it can't always know if
> the caller needs the object to be copied.
> 
it doesn't have to. the distinction comes when creating objects: if an
object is only in a volatile alternate, it does not already exist for the
purpose of object creation and is thus created locally.

regards

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: GC of alternate object store
  2012-08-29  7:42     ` Oswald Buddenhagen
@ 2012-08-29 15:52       ` Junio C Hamano
  2012-08-30  9:53         ` Oswald Buddenhagen
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2012-08-29 15:52 UTC (permalink / raw)
  To: Oswald Buddenhagen; +Cc: Hallvard Breien Furuseth, git

Oswald Buddenhagen <ossi@kde.org> writes:

> On Tue, Aug 28, 2012 at 09:19:53PM +0200, Hallvard Breien Furuseth wrote:
> ...
>> Junio's proposal partially fixes that: It pushes refs/* instead of
>> refs/heads/*, to refs/remotes/<borrowing repo>/.  However...
>> 
> i did exacty that. the tags are *still* not populated - git just tries
> very hard to treat them specially.

Just this part (I won't comment on the other parts in this discussion).

Doesn't

	git push $over_there 'refs/*:refs/remotes/mine/*'

push your tag v1.0 to refs/remotes/mine/v1.0 over there?  The
version of git I ship seems to do this just fine.

> and the "stash" file is also ignored, unfortunately.

There is a work in progress to do this on 'pu'.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: GC of alternate object store
  2012-08-29 15:52       ` GC of alternate object store Junio C Hamano
@ 2012-08-30  9:53         ` Oswald Buddenhagen
  2012-08-30 16:03           ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Oswald Buddenhagen @ 2012-08-30  9:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Aug 29, 2012 at 08:52:27AM -0700, Junio C Hamano wrote:
> (I won't comment on the other parts in this discussion).
> 
which is kinda unfortunate. ;)

> Oswald Buddenhagen <ossi@kde.org> writes:
> > i did exacty that. the tags are *still* not populated - git just tries
> > very hard to treat them specially.
> 
> Doesn't
> 
> 	git push $over_there 'refs/*:refs/remotes/mine/*'
> 
> push your tag v1.0 to refs/remotes/mine/v1.0 over there?  The
> version of git I ship seems to do this just fine.
> 
as i wrote before, i'm pulling, not pushing, so any differences could be
blamed on that (or git version 1.7.12.23.g948900e).
anyway, it seems this new version does fetch the tags under the remotes'
namespaces, after all - but it still imports them into the global tag
namespace as well, which of course makes a mess with the duplicated
tags.

and for many repos i'm getting something like this (this is kinda new):

Fetching qtbase
remote: Counting objects: 62375, done.
remote: Compressing objects: 100% (28049/28049), done.
remote: Total 55704 (delta 45280), reused 36646 (delta 27368)
Receiving objects: 100% (55704/55704), 16.76 MiB | 4.94 MiB/s, done.
Resolving deltas: 100% (45280/45280), completed with 3017 local objects.
fatal: bad object 90f0f499ec5953d60d616a2ff541ecaf8b0c31a2
error: ../qt5/qtbase did not send all necessary objects

both the aggregator and the fetched repos run cleanly through fsck.

regards

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: GC of alternate object store
  2012-08-30  9:53         ` Oswald Buddenhagen
@ 2012-08-30 16:03           ` Junio C Hamano
  2012-08-31 16:26             ` Oswald Buddenhagen
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2012-08-30 16:03 UTC (permalink / raw)
  To: Oswald Buddenhagen; +Cc: git

Oswald Buddenhagen <ossi@kde.org> writes:

>> Doesn't
>> 
>> 	git push $over_there 'refs/*:refs/remotes/mine/*'
>> 
>> push your tag v1.0 to refs/remotes/mine/v1.0 over there?  The
>> version of git I ship seems to do this just fine.
>> 
> as i wrote before, i'm pulling, not pushing,...

You would need to decline the automatic tag following with --no-tags
(which in hindsight is misnamed; it really means "do not auto-follow
tags"), like so:

	cd $over_there &&
        git fetch --no-tags $my_repository 'refs/*:refs/remotes/mine/*'

Otherwise, you will also get tags in refs/tags/.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: GC of alternate object store
  2012-08-30 16:03           ` Junio C Hamano
@ 2012-08-31 16:26             ` Oswald Buddenhagen
  2012-08-31 19:18               ` Dan Johnson
  0 siblings, 1 reply; 26+ messages in thread
From: Oswald Buddenhagen @ 2012-08-31 16:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Aug 30, 2012 at 09:03:34AM -0700, Junio C Hamano wrote:
> Oswald Buddenhagen <ossi@kde.org> writes:
> 
> >> Doesn't
> >> 
> >> 	git push $over_there 'refs/*:refs/remotes/mine/*'
> >> 
> >> push your tag v1.0 to refs/remotes/mine/v1.0 over there?  The
> >> version of git I ship seems to do this just fine.
> >> 
> > as i wrote before, i'm pulling, not pushing,...
> 
> You would need to decline the automatic tag following with --no-tags
> (which in hindsight is misnamed; it really means "do not auto-follow
> tags"), like so:
> 
> 	cd $over_there &&
>         git fetch --no-tags $my_repository 'refs/*:refs/remotes/mine/*'
> 
> Otherwise, you will also get tags in refs/tags/.
> 
git seems to be happily ignoring that flag.
  git fetch --prune --all --no-tags
still re-populates the tags after i delete them manually.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: GC of alternate object store
  2012-08-31 16:26             ` Oswald Buddenhagen
@ 2012-08-31 19:18               ` Dan Johnson
  2012-08-31 19:45                 ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Dan Johnson @ 2012-08-31 19:18 UTC (permalink / raw)
  To: Oswald Buddenhagen; +Cc: Junio C Hamano, git

On Fri, Aug 31, 2012 at 12:26 PM, Oswald Buddenhagen <ossi@kde.org> wrote:
> On Thu, Aug 30, 2012 at 09:03:34AM -0700, Junio C Hamano wrote:
>> Oswald Buddenhagen <ossi@kde.org> writes:
>>
>> >> Doesn't
>> >>
>> >>    git push $over_there 'refs/*:refs/remotes/mine/*'
>> >>
>> >> push your tag v1.0 to refs/remotes/mine/v1.0 over there?  The
>> >> version of git I ship seems to do this just fine.
>> >>
>> > as i wrote before, i'm pulling, not pushing,...
>>
>> You would need to decline the automatic tag following with --no-tags
>> (which in hindsight is misnamed; it really means "do not auto-follow
>> tags"), like so:
>>
>>       cd $over_there &&
>>         git fetch --no-tags $my_repository 'refs/*:refs/remotes/mine/*'
>>
>> Otherwise, you will also get tags in refs/tags/.
>>
> git seems to be happily ignoring that flag.
>   git fetch --prune --all --no-tags
> still re-populates the tags after i delete them manually.

I believe that is bad interaction with "--all" (probably a bug). If I
am remembering correctly, --no-tags is internally a per-remote
setting, so I'm guessing it's not getting set on all remotes here.

I'll look into this more a bit later tonight. Does fetch --no-tags
work when you specify a remote?

-- 
-Dan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: GC of alternate object store
  2012-08-31 19:18               ` Dan Johnson
@ 2012-08-31 19:45                 ` Junio C Hamano
  2012-09-01  4:25                   ` [PATCH] fetch --all: pass --tags/--no-tags through to each remote Dan Johnson
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2012-08-31 19:45 UTC (permalink / raw)
  To: Dan Johnson; +Cc: Oswald Buddenhagen, git

Dan Johnson <computerdruid@gmail.com> writes:

> I believe that is bad interaction with "--all" (probably a bug). If I
> am remembering correctly, --no-tags is internally a per-remote
> setting, so I'm guessing it's not getting set on all remotes here.
>
> I'll look into this more a bit later tonight. Does fetch --no-tags
> work when you specify a remote?

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] fetch --all: pass --tags/--no-tags through to each remote
  2012-08-31 19:45                 ` Junio C Hamano
@ 2012-09-01  4:25                   ` Dan Johnson
  2012-09-01 11:22                     ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Dan Johnson @ 2012-09-01  4:25 UTC (permalink / raw)
  To: git; +Cc: Dan Johnson, Junio C Hamano, Oswald Buddenhagen

Reported-by: Oswald Buddenhagen <ossi@kde.org>
Signed-off-by: Dan Johnson <ComputerDruid@gmail.com>
---

Junio C Hamano <gitster@pobox.com> writes:
>Dan Johnson <computerdruid@gmail.com> writes:
>
>> I believe that is bad interaction with "--all" (probably a bug). If I
>> am remembering correctly, --no-tags is internally a per-remote
>> setting, so I'm guessing it's not getting set on all remotes here.
>>
>> I'll look into this more a bit later tonight. Does fetch --no-tags
>> work when you specify a remote?
>
>Thanks.

And here it is. Apparently we just don't pass those options through. I didn't
look to see if there are any other options we should consider passing through;
it's quite possible there are. I also have not written a test to ensure that
this doesn't break in the future. I will hopefully have time for these things
tomorrow. It's getting too late for me to be able to put sentences together,
so hopefully this mail comes out readable ;)

 builtin/fetch.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index bb9a074..c6bcbdc 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -857,6 +857,10 @@ static void add_options_to_argv(int *argc, const char **argv)
 		argv[(*argc)++] = "--recurse-submodules";
 	else if (recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND)
 		argv[(*argc)++] = "--recurse-submodules=on-demand";
+	if (tags == TAGS_SET)
+		argv[(*argc)++] = "--tags";
+	else if (tags == TAGS_UNSET)
+		argv[(*argc)++] = "--no-tags";
 	if (verbosity >= 2)
 		argv[(*argc)++] = "-v";
 	if (verbosity >= 1)
-- 
1.7.11.1.59.gbc9e7dd.dirty

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] fetch --all: pass --tags/--no-tags through to each remote
  2012-09-01  4:25                   ` [PATCH] fetch --all: pass --tags/--no-tags through to each remote Dan Johnson
@ 2012-09-01 11:22                     ` Jeff King
  2012-09-01 11:25                       ` [PATCH 1/2] argv-array: add pop function Jeff King
                                         ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Jeff King @ 2012-09-01 11:22 UTC (permalink / raw)
  To: Dan Johnson; +Cc: git, Junio C Hamano, Oswald Buddenhagen

On Sat, Sep 01, 2012 at 12:25:33AM -0400, Dan Johnson wrote:

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index bb9a074..c6bcbdc 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -857,6 +857,10 @@ static void add_options_to_argv(int *argc, const char **argv)
>  		argv[(*argc)++] = "--recurse-submodules";
>  	else if (recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND)
>  		argv[(*argc)++] = "--recurse-submodules=on-demand";
> +	if (tags == TAGS_SET)
> +		argv[(*argc)++] = "--tags";
> +	else if (tags == TAGS_UNSET)
> +		argv[(*argc)++] = "--no-tags";
>  	if (verbosity >= 2)
>  		argv[(*argc)++] = "-v";
>  	if (verbosity >= 1)

Hmm. We allocate argv in fetch_multiple like this:

  const char *argv[12] = { "fetch", "--append" };

and then add a bunch of options to it, along with the name of the
remote. By my count, the current code can hit exactly 12 (including the
terminating NULL) if all options are set. Your patch would make it
possible to overflow. Of course, I may be miscounting since it is
extremely error-prone to figure out the right number by tracing each
possible conditional.

Maybe we should switch it to a dynamic argv_array? Like this:

  [1/2]: argv-array: add pop function
  [2/2]: fetch: use argv_array instead of hand-building arrays

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/2] argv-array: add pop function
  2012-09-01 11:22                     ` Jeff King
@ 2012-09-01 11:25                       ` Jeff King
  2012-09-01 11:27                       ` [PATCH 2/2] fetch: use argv_array instead of hand-building arrays Jeff King
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2012-09-01 11:25 UTC (permalink / raw)
  To: Dan Johnson; +Cc: git, Junio C Hamano, Oswald Buddenhagen

Sometimes we build a set of similar command lines, differing
only in the final arguments (e.g., "fetch --multiple"). To
use argv_array for this, you have to either push the same
set of elements repeatedly, or break the abstraction by
manually manipulating the array's internal members.

Instead, let's provide a sanctioned "pop" function to remove
elements from the end.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/api-argv-array.txt | 4 ++++
 argv-array.c                               | 9 +++++++++
 argv-array.h                               | 1 +
 3 files changed, 14 insertions(+)

diff --git a/Documentation/technical/api-argv-array.txt b/Documentation/technical/api-argv-array.txt
index 1b7d8f1..1a79781 100644
--- a/Documentation/technical/api-argv-array.txt
+++ b/Documentation/technical/api-argv-array.txt
@@ -46,6 +46,10 @@ Functions
 	Format a string and push it onto the end of the array. This is a
 	convenience wrapper combining `strbuf_addf` and `argv_array_push`.
 
+`argv_array_pop`::
+	Remove the final element from the array. If there are no
+	elements in the array, do nothing.
+
 `argv_array_clear`::
 	Free all memory associated with the array and return it to the
 	initial, empty state.
diff --git a/argv-array.c b/argv-array.c
index 0b5f889..55e8443 100644
--- a/argv-array.c
+++ b/argv-array.c
@@ -49,6 +49,15 @@ void argv_array_pushl(struct argv_array *array, ...)
 	va_end(ap);
 }
 
+void argv_array_pop(struct argv_array *array)
+{
+	if (!array->argc)
+		return;
+	free((char *)array->argv[array->argc - 1]);
+	array->argv[array->argc - 1] = NULL;
+	array->argc--;
+}
+
 void argv_array_clear(struct argv_array *array)
 {
 	if (array->argv != empty_argv) {
diff --git a/argv-array.h b/argv-array.h
index b93a69c..f4b9866 100644
--- a/argv-array.h
+++ b/argv-array.h
@@ -16,6 +16,7 @@ void argv_array_pushl(struct argv_array *, ...);
 __attribute__((format (printf,2,3)))
 void argv_array_pushf(struct argv_array *, const char *fmt, ...);
 void argv_array_pushl(struct argv_array *, ...);
+void argv_array_pop(struct argv_array *);
 void argv_array_clear(struct argv_array *);
 
 #endif /* ARGV_ARRAY_H */
-- 
1.7.12.rc3.8.g89db099

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] fetch: use argv_array instead of hand-building arrays
  2012-09-01 11:22                     ` Jeff King
  2012-09-01 11:25                       ` [PATCH 1/2] argv-array: add pop function Jeff King
@ 2012-09-01 11:27                       ` Jeff King
  2012-09-01 14:34                         ` Jens Lehmann
  2012-09-01 11:32                       ` [PATCH] fetch --all: pass --tags/--no-tags through to each remote Jeff King
  2012-09-05 21:22                       ` [PATCHv2] fetch --all: pass --tags/--no-tags through to each remote Dan Johnson
  3 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2012-09-01 11:27 UTC (permalink / raw)
  To: Dan Johnson; +Cc: git, Junio C Hamano, Oswald Buddenhagen

Fetch invokes itself recursively when recursing into
submodules or handling "fetch --multiple". In both cases, it
builds the child's command line by pushing options onto a
statically-sized array. In both cases, the array is
currently just big enough to handle the largest possible
case. However, this technique is brittle and error-prone, so
let's replace it with a dynamic argv_array.

Signed-off-by: Jeff King <peff@peff.net>
---
Not very well tested by me, but hopefully it is simple enough that I
managed not to screw it up.

It may be that fetch_populated_submodules would also benefit from
conversion (here I just pass in the argc and argv separately), but I
didn't look.

 builtin/fetch.c | 47 +++++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index bb9a074..b6a8be0 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -14,6 +14,7 @@
 #include "transport.h"
 #include "submodule.h"
 #include "connected.h"
+#include "argv-array.h"
 
 static const char * const builtin_fetch_usage[] = {
 	"git fetch [<options>] [<repository> [<refspec>...]]",
@@ -841,38 +842,35 @@ static int fetch_multiple(struct string_list *list)
 	return 1;
 }
 
-static void add_options_to_argv(int *argc, const char **argv)
+static void add_options_to_argv(struct argv_array *argv)
 {
 	if (dry_run)
-		argv[(*argc)++] = "--dry-run";
+		argv_array_push(argv, "--dry-run");
 	if (prune)
-		argv[(*argc)++] = "--prune";
+		argv_array_push(argv, "--prune");
 	if (update_head_ok)
-		argv[(*argc)++] = "--update-head-ok";
+		argv_array_push(argv, "--update-head-ok");
 	if (force)
-		argv[(*argc)++] = "--force";
+		argv_array_push(argv, "--force");
 	if (keep)
-		argv[(*argc)++] = "--keep";
+		argv_array_push(argv, "--keep");
 	if (recurse_submodules == RECURSE_SUBMODULES_ON)
-		argv[(*argc)++] = "--recurse-submodules";
+		argv_array_push(argv, "--recurse-submodules");
 	else if (recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND)
-		argv[(*argc)++] = "--recurse-submodules=on-demand";
+		argv_array_push(argv, "--recurse-submodules=on-demand");
 	if (verbosity >= 2)
-		argv[(*argc)++] = "-v";
+		argv_array_push(argv, "-v");
 	if (verbosity >= 1)
-		argv[(*argc)++] = "-v";
+		argv_array_push(argv, "-v");
 	else if (verbosity < 0)
-		argv[(*argc)++] = "-q";
+		argv_array_push(argv, "-q");
 
 }
 
 static int fetch_multiple(struct string_list *list)
 {
 	int i, result = 0;
-	const char *argv[12] = { "fetch", "--append" };
-	int argc = 2;
-
-	add_options_to_argv(&argc, argv);
+	struct argv_array argv = ARGV_ARRAY_INIT;
 
 	if (!append && !dry_run) {
 		int errcode = truncate_fetch_head();
@@ -880,18 +878,22 @@ static int fetch_multiple(struct string_list *list)
 			return errcode;
 	}
 
+	argv_array_pushl(&argv, "fetch", "--append", NULL);
+	add_options_to_argv(&argv);
+
 	for (i = 0; i < list->nr; i++) {
 		const char *name = list->items[i].string;
-		argv[argc] = name;
-		argv[argc + 1] = NULL;
+		argv_array_push(&argv, name);
 		if (verbosity >= 0)
 			printf(_("Fetching %s\n"), name);
-		if (run_command_v_opt(argv, RUN_GIT_CMD)) {
+		if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
 			error(_("Could not fetch %s"), name);
 			result = 1;
 		}
+		argv_array_pop(&argv);
 	}
 
+	argv_array_clear(&argv);
 	return result;
 }
 
@@ -1007,13 +1009,14 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	}
 
 	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
-		const char *options[10];
-		int num_options = 0;
-		add_options_to_argv(&num_options, options);
-		result = fetch_populated_submodules(num_options, options,
+		struct argv_array options = ARGV_ARRAY_INIT;
+
+		add_options_to_argv(&options);
+		result = fetch_populated_submodules(options.argc, options.argv,
 						    submodule_prefix,
 						    recurse_submodules,
 						    verbosity < 0);
+		argv_array_clear(&options);
 	}
 
 	/* All names were strdup()ed or strndup()ed */
-- 
1.7.12.rc3.8.g89db099

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] fetch --all: pass --tags/--no-tags through to each remote
  2012-09-01 11:22                     ` Jeff King
  2012-09-01 11:25                       ` [PATCH 1/2] argv-array: add pop function Jeff King
  2012-09-01 11:27                       ` [PATCH 2/2] fetch: use argv_array instead of hand-building arrays Jeff King
@ 2012-09-01 11:32                       ` Jeff King
  2012-09-01 11:34                         ` [PATCH 3/2] argv-array: fix bogus cast when freeing array Jeff King
  2012-09-05 21:22                       ` [PATCHv2] fetch --all: pass --tags/--no-tags through to each remote Dan Johnson
  3 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2012-09-01 11:32 UTC (permalink / raw)
  To: Dan Johnson; +Cc: git, Junio C Hamano, Oswald Buddenhagen

Since the array struct stores a "const char **" argv member
(for compatibility with most of our argv-taking functions),
we have to cast away the const-ness when freeing its
elements.

However, we used the wrong type when doing so.  It doesn't
make a difference since free() take a void pointer anyway,
but it can be slightly confusing to a reader.

Signed-off-by: Jeff King <peff@peff.net>
---
Noticed this while I was adding the other free in argv_array_pop...

 argv-array.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/argv-array.c b/argv-array.c
index 55e8443..256741d 100644
--- a/argv-array.c
+++ b/argv-array.c
@@ -63,7 +63,7 @@ void argv_array_clear(struct argv_array *array)
 	if (array->argv != empty_argv) {
 		int i;
 		for (i = 0; i < array->argc; i++)
-			free((char **)array->argv[i]);
+			free((char *)array->argv[i]);
 		free(array->argv);
 	}
 	argv_array_init(array);
-- 
1.7.12.rc3.8.g89db099

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/2] argv-array: fix bogus cast when freeing array
  2012-09-01 11:32                       ` [PATCH] fetch --all: pass --tags/--no-tags through to each remote Jeff King
@ 2012-09-01 11:34                         ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2012-09-01 11:34 UTC (permalink / raw)
  To: Dan Johnson; +Cc: git, Junio C Hamano, Oswald Buddenhagen

On Sat, Sep 01, 2012 at 07:32:07AM -0400, Jeff King wrote:

> Since the array struct stores a "const char **" argv member
> (for compatibility with most of our argv-taking functions),
> we have to cast away the const-ness when freeing its
> elements.
> 
> However, we used the wrong type when doing so.  It doesn't
> make a difference since free() take a void pointer anyway,
> but it can be slightly confusing to a reader.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Noticed this while I was adding the other free in argv_array_pop...

Argh, managed to botch the subject line. Here it is for real.

-- >8 --
Since the array struct stores a "const char **" argv member
(for compatibility with most of our argv-taking functions),
we have to cast away the const-ness when freeing its
elements.

However, we used the wrong type when doing so.  It doesn't
make a difference since free() take a void pointer anyway,
but it can be slightly confusing to a reader.

Signed-off-by: Jeff King <peff@peff.net>
---
 argv-array.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/argv-array.c b/argv-array.c
index 55e8443..256741d 100644
--- a/argv-array.c
+++ b/argv-array.c
@@ -63,7 +63,7 @@ void argv_array_clear(struct argv_array *array)
 	if (array->argv != empty_argv) {
 		int i;
 		for (i = 0; i < array->argc; i++)
-			free((char **)array->argv[i]);
+			free((char *)array->argv[i]);
 		free(array->argv);
 	}
 	argv_array_init(array);
-- 
1.7.12.rc3.8.g89db099

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] fetch: use argv_array instead of hand-building arrays
  2012-09-01 11:27                       ` [PATCH 2/2] fetch: use argv_array instead of hand-building arrays Jeff King
@ 2012-09-01 14:34                         ` Jens Lehmann
  2012-09-01 15:27                           ` [PATCH] submodule: " Jens Lehmann
  0 siblings, 1 reply; 26+ messages in thread
From: Jens Lehmann @ 2012-09-01 14:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Dan Johnson, git, Junio C Hamano, Oswald Buddenhagen

Am 01.09.2012 13:27, schrieb Jeff King:
> Fetch invokes itself recursively when recursing into
> submodules or handling "fetch --multiple". In both cases, it
> builds the child's command line by pushing options onto a
> statically-sized array. In both cases, the array is
> currently just big enough to handle the largest possible
> case. However, this technique is brittle and error-prone, so
> let's replace it with a dynamic argv_array.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Not very well tested by me, but hopefully it is simple enough that I
> managed not to screw it up.

This is definitely an improvement, and I can't spot any problems
either.

> It may be that fetch_populated_submodules would also benefit from
> conversion (here I just pass in the argc and argv separately), but I
> didn't look.

Yes, it does some similar brittle stuff and should be changed to use
the argv-array too. I'll look into that.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] submodule: use argv_array instead of hand-building arrays
  2012-09-01 14:34                         ` Jens Lehmann
@ 2012-09-01 15:27                           ` Jens Lehmann
  0 siblings, 0 replies; 26+ messages in thread
From: Jens Lehmann @ 2012-09-01 15:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Dan Johnson, git, Junio C Hamano, Oswald Buddenhagen

fetch_populated_submodules() allocates the full argv array it uses to
recurse into the submodules from the number of given options plus the six
argv values it is going to add. It then initializes it with those values
which won't change during the iteration and copies the given options into
it. Inside the loop the two argv values different for each submodule get
replaced with those currently valid.

However, this technique is brittle and error-prone (as the comment to
explain the magic number 6 indicates), so let's replace it with an
argv_array. Instead of replacing the argv values, push them to the
argv_array just before the run_command() call (including the option
separating them) and pop them from the argv_array right after that.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
---


Am 01.09.2012 16:34, schrieb Jens Lehmann:
> Am 01.09.2012 13:27, schrieb Jeff King:
>> It may be that fetch_populated_submodules would also benefit from
>> conversion (here I just pass in the argc and argv separately), but I
>> didn't look.
> 
> Yes, it does some similar brittle stuff and should be changed to use
> the argv-array too. I'll look into that.

Maybe something like this on top of your two patches?

I thought about adding an argv_array_cat() function to replace the
for() loop copying the option values into the argv-array built inside
fetch_populated_submodules(), but I suspect saving one line from the
code is not worth it. Yet I didn't check if others would benefit from
such a function too.


 builtin/fetch.c |  2 +-
 submodule.c     | 31 ++++++++++++++++---------------
 submodule.h     |  3 ++-
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b6a8be0..aaba61e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1012,7 +1012,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		struct argv_array options = ARGV_ARRAY_INIT;

 		add_options_to_argv(&options);
-		result = fetch_populated_submodules(options.argc, options.argv,
+		result = fetch_populated_submodules(&options,
 						    submodule_prefix,
 						    recurse_submodules,
 						    verbosity < 0);
diff --git a/submodule.c b/submodule.c
index 19dc6a6..51d48c2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -588,13 +588,13 @@ static void calculate_changed_submodule_paths(void)
 	initialized_fetch_ref_tips = 0;
 }

-int fetch_populated_submodules(int num_options, const char **options,
+int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
 			       int quiet)
 {
-	int i, result = 0, argc = 0, default_argc;
+	int i, result = 0;
 	struct child_process cp;
-	const char **argv;
+	struct argv_array argv = ARGV_ARRAY_INIT;
 	struct string_list_item *name_for_path;
 	const char *work_tree = get_git_work_tree();
 	if (!work_tree)
@@ -604,17 +604,13 @@ int fetch_populated_submodules(int num_options, const char **options,
 		if (read_cache() < 0)
 			die("index file corrupt");

-	/* 6: "fetch" (options) --recurse-submodules-default default "--submodule-prefix" prefix NULL */
-	argv = xcalloc(num_options + 6, sizeof(const char *));
-	argv[argc++] = "fetch";
-	for (i = 0; i < num_options; i++)
-		argv[argc++] = options[i];
-	argv[argc++] = "--recurse-submodules-default";
-	default_argc = argc++;
-	argv[argc++] = "--submodule-prefix";
+	argv_array_push(&argv, "fetch");
+	for (i = 0; i < options->argc; i++)
+		argv_array_push(&argv, options->argv[i]);
+	argv_array_push(&argv, "--recurse-submodules-default");
+	/* default value, "--submodule-prefix" and its value are added later */

 	memset(&cp, 0, sizeof(cp));
-	cp.argv = argv;
 	cp.env = local_repo_env;
 	cp.git_cmd = 1;
 	cp.no_stdin = 1;
@@ -674,16 +670,21 @@ int fetch_populated_submodules(int num_options, const char **options,
 			if (!quiet)
 				printf("Fetching submodule %s%s\n", prefix, ce->name);
 			cp.dir = submodule_path.buf;
-			argv[default_argc] = default_argv;
-			argv[argc] = submodule_prefix.buf;
+			argv_array_push(&argv, default_argv);
+			argv_array_push(&argv, "--submodule-prefix");
+			argv_array_push(&argv, submodule_prefix.buf);
+			cp.argv = argv.argv;
 			if (run_command(&cp))
 				result = 1;
+			argv_array_pop(&argv);
+			argv_array_pop(&argv);
+			argv_array_pop(&argv);
 		}
 		strbuf_release(&submodule_path);
 		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
 	}
-	free(argv);
+	argv_array_clear(&argv);
 out:
 	string_list_clear(&changed_submodule_paths, 1);
 	return result;
diff --git a/submodule.h b/submodule.h
index e105b0e..594b50d 100644
--- a/submodule.h
+++ b/submodule.h
@@ -2,6 +2,7 @@
 #define SUBMODULE_H

 struct diff_options;
+struct argv_array;

 enum {
 	RECURSE_SUBMODULES_ON_DEMAND = -1,
@@ -23,7 +24,7 @@ void show_submodule_summary(FILE *f, const char *path,
 		const char *del, const char *add, const char *reset);
 void set_config_fetch_recurse_submodules(int value);
 void check_for_new_submodule_commits(unsigned char new_sha1[20]);
-int fetch_populated_submodules(int num_options, const char **options,
+int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
 			       int quiet);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
-- 
1.7.12.149.g47e61ec

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv2] fetch --all: pass --tags/--no-tags through to each remote
  2012-09-01 11:22                     ` Jeff King
                                         ` (2 preceding siblings ...)
  2012-09-01 11:32                       ` [PATCH] fetch --all: pass --tags/--no-tags through to each remote Jeff King
@ 2012-09-05 21:22                       ` Dan Johnson
  2012-09-07 17:07                         ` Junio C Hamano
  3 siblings, 1 reply; 26+ messages in thread
From: Dan Johnson @ 2012-09-05 21:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Oswald Buddenhagen, Dan Johnson

When fetch is invoked with --all, we need to pass the tag-following
preference to each individual fetch; without this, we will always
auto-follow tags, preventing us from fetching the remote tags into a
remote-specific namespace, for example.

Reported-by: Oswald Buddenhagen <ossi@kde.org>
Signed-off-by: Dan Johnson <ComputerDruid@gmail.com>
---
On Sat, Sep 1, 2012 at 7:22 AM, Jeff King <peff@peff.net> wrote:
>Hmm. We allocate argv in fetch_multiple like this:
>
>  const char *argv[12] = { "fetch", "--append" };
>
>and then add a bunch of options to it, along with the name of the
>remote. By my count, the current code can hit exactly 12 (including the
>terminating NULL) if all options are set. Your patch would make it
>possible to overflow. Of course, I may be miscounting since it is
>extremely error-prone to figure out the right number by tracing each
>possible conditional.
>
>Maybe we should switch it to a dynamic argv_array? Like this:
>
>  [1/2]: argv-array: add pop function
>  [2/2]: fetch: use argv_array instead of hand-building arrays

This version is re-rolled to be on top of jk/argv-array, avoiding the issue of
the fixed-size array entirely. If needed, we could of course use the old
version of this patch and bump the number, but I figure this is preferable.

I've also added some test cases to cover this behavior, but I'm not entirely
happy with them. I'm not sure if/how we should be testing the pass-through
behavior of various arguments with fetch --all, but if so, we should probably do
so more thouroughly than I have here, but that just seems like combining
together tests of two unrelated things. It might just make more sense to ignore
it and drop these tests, I don't know.

Sorry this took me a few days to send, I just kept not getting around to it.

 builtin/fetch.c           |  4 ++++
 t/t5514-fetch-multiple.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 6196e91..4494aed 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -858,6 +858,10 @@ static void add_options_to_argv(struct argv_array *argv)
 		argv_array_push(argv, "--recurse-submodules");
 	else if (recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND)
 		argv_array_push(argv, "--recurse-submodules=on-demand");
+	if (tags == TAGS_SET)
+		argv_array_push(argv, "--tags");
+	else if (tags == TAGS_UNSET)
+		argv_array_push(argv, "--no-tags");
 	if (verbosity >= 2)
 		argv_array_push(argv, "-v");
 	if (verbosity >= 1)
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index 227dd56..cbd2460 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -151,4 +151,33 @@ test_expect_success 'git fetch --multiple (ignoring skipFetchAll)' '
 	 test_cmp ../expect output)
 '
 
+cat > expect << EOF
+EOF
+
+test_expect_success 'git fetch --all --no-tags' '
+	(git clone one test5 &&
+	 git clone test5 test6 &&
+	 (cd test5 && git tag test-tag) &&
+	 cd test6 &&
+	 git fetch --all --no-tags &&
+	 git tag >output &&
+	 test_cmp ../expect output)
+'
+
+cat > expect << EOF
+test-tag
+EOF
+
+test_expect_success 'git fetch --all --tags' '
+	(git clone one test7 &&
+	 git clone test7 test8 &&
+	 (cd test7 &&
+      test_commit test-tag &&
+      git reset --hard HEAD^) &&
+	 cd test8 &&
+	 git fetch --all --tags &&
+	 git tag >output &&
+	 test_cmp ../expect output)
+'
+
 test_done
-- 
1.7.11.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCHv2] fetch --all: pass --tags/--no-tags through to each remote
  2012-09-05 21:22                       ` [PATCHv2] fetch --all: pass --tags/--no-tags through to each remote Dan Johnson
@ 2012-09-07 17:07                         ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2012-09-07 17:07 UTC (permalink / raw)
  To: Dan Johnson; +Cc: Jeff King, git, Oswald Buddenhagen

Dan Johnson <computerdruid@gmail.com> writes:

> When fetch is invoked with --all, we need to pass the tag-following
> preference to each individual fetch; without this, we will always
> auto-follow tags, preventing us from fetching the remote tags into a
> remote-specific namespace, for example.
>
> Reported-by: Oswald Buddenhagen <ossi@kde.org>
> Signed-off-by: Dan Johnson <ComputerDruid@gmail.com>
> ---
> On Sat, Sep 1, 2012 at 7:22 AM, Jeff King <peff@peff.net> wrote:
>>Hmm. We allocate argv in fetch_multiple like this:
>>
>>  const char *argv[12] = { "fetch", "--append" };
>>
>>and then add a bunch of options to it, along with the name of the
>>remote. By my count, the current code can hit exactly 12 (including the
>>terminating NULL) if all options are set. Your patch would make it
>>possible to overflow. Of course, I may be miscounting since it is
>>extremely error-prone to figure out the right number by tracing each
>>possible conditional.
>>
>>Maybe we should switch it to a dynamic argv_array? Like this:
>>
>>  [1/2]: argv-array: add pop function
>>  [2/2]: fetch: use argv_array instead of hand-building arrays
>
> This version is re-rolled to be on top of jk/argv-array, avoiding the issue of
> the fixed-size array entirely. If needed, we could of course use the old
> version of this patch and bump the number, but I figure this is preferable.

Thanks.  Queued.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-09-07 17:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-05  4:56 Bringing a bit more sanity to $GIT_DIR/objects/info/alternates? Junio C Hamano
2012-08-05  9:38 ` Michael Haggerty
2012-08-05 19:01   ` Junio C Hamano
2012-08-07  6:16   ` Jeff King
2012-08-06 21:55 ` Junio C Hamano
2012-08-08  1:42 ` Sascha Cunz
2012-08-11  9:35 ` Hallvard Breien Furuseth
2012-08-27 22:39 ` Oswald Buddenhagen
2012-08-28 19:19   ` GC of alternate object store (was: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?) Hallvard Breien Furuseth
2012-08-29  7:42     ` Oswald Buddenhagen
2012-08-29 15:52       ` GC of alternate object store Junio C Hamano
2012-08-30  9:53         ` Oswald Buddenhagen
2012-08-30 16:03           ` Junio C Hamano
2012-08-31 16:26             ` Oswald Buddenhagen
2012-08-31 19:18               ` Dan Johnson
2012-08-31 19:45                 ` Junio C Hamano
2012-09-01  4:25                   ` [PATCH] fetch --all: pass --tags/--no-tags through to each remote Dan Johnson
2012-09-01 11:22                     ` Jeff King
2012-09-01 11:25                       ` [PATCH 1/2] argv-array: add pop function Jeff King
2012-09-01 11:27                       ` [PATCH 2/2] fetch: use argv_array instead of hand-building arrays Jeff King
2012-09-01 14:34                         ` Jens Lehmann
2012-09-01 15:27                           ` [PATCH] submodule: " Jens Lehmann
2012-09-01 11:32                       ` [PATCH] fetch --all: pass --tags/--no-tags through to each remote Jeff King
2012-09-01 11:34                         ` [PATCH 3/2] argv-array: fix bogus cast when freeing array Jeff King
2012-09-05 21:22                       ` [PATCHv2] fetch --all: pass --tags/--no-tags through to each remote Dan Johnson
2012-09-07 17:07                         ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.