* Mirroring for offline use - best practices?
@ 2017-07-12 10:47 Joachim Durchholz
From: Joachim Durchholz @ 2017-07-12 10:47 UTC (permalink / raw)
  To: Git Mailing List

Hi all,

I'm pretty sure this is a FAQ, but articles I found on the Internet were 
either mere "recipes" (i.e. tell you how, but don't explain why), or 
bogged down in so many details that I was never sure how to proceed from 
there.


Basic situation:

There's a master repository (Github or corporate or whatever), and I 
want to set up a local mirror so that I can create clones without having 
to access the original upstream.
I'd like to set the mirror up so that creating a clone from it will 
automatically set up things to "just work": I.e. branches will track the 
mirror, not upstream, possibly other settings that I'm not aware of.

I gather that local clones are fast because hardlinked - is that correct?
Is that correct on Windows? (I can't easily avoid Windows.)


Ramification 1:

I'm not sure how best to prepare patches for push-to-upstream.
Is there value in collecting them locally into a push-to-upstream repo, 
or is it better to just push from each local clone individually?


Ramification 2:

Some of the repos I work with use submodules. Sometimes they use 
submodules that I'm not aware of. Or a submodule was used historically, 
and git bisect breaks/misbehaves because it can't get the submodule in 
offline mode.
Is there a way to get these, without writing a script that recurses 
through all versions of .gitmodules?
I'm seeing the --recurse-submodules option for git fetch, so this might 
(or might not) be the Right Thing.


Any thoughts welcome, thanks!

Regards,
Jo


* Re: Mirroring for offline use - best practices?
@ 2017-07-12 17:40 ` Stefan Beller
From: Stefan Beller @ 2017-07-12 17:40 UTC (permalink / raw)
  To: Joachim Durchholz; +Cc: Git Mailing List

On Wed, Jul 12, 2017 at 3:47 AM, Joachim Durchholz <jo@durchholz.org> wrote:
> Hi all,
>
> I'm pretty sure this is a FAQ, but articles I found on the Internet were
> either mere "recipes" (i.e. tell you how, but don't explain why), or bogged
> down in so many details that I was never sure how to proceed from there.
>
>
> Basic situation:
>
> There's a master repository (Github or corporate or whatever), and I want to
> set up a local mirror so that I can create clones without having to access
> the original upstream.

'git clone --mirror' should accomplish the mirroring part.
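
Roughly (the URL and path are just examples):

    git clone --mirror https://example.com/project.git /srv/mirrors/project.git
    # later, to refresh the mirror from the real upstream:
    git -C /srv/mirrors/project.git remote update --prune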

> I'd like to set the mirror up so that creating a clone from it will
> automatically set up things to "just work": I.e. branches will track the
> mirror, not upstream, possibly other settings that I'm not aware of.

And then 'git clone <local-path-to-mirror>'. This would set up the local
mirror as upstream, such that git-fetch would fetch from the local
mirror. However, git-push would also go to the mirror. I am not sure
whether that is desired, or whether you would rather have a triangular
workflow, i.e. the local clone pushes directly back to the real upstream.
That can be configured with url.<base>.pushInsteadOf, but there
is no way to have that set up by default when cloning from the local
mirror, as the config is not copied over.
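
For example, inside each local clone (URL and path made up):

    git config url."https://github.com/example/project.git".pushInsteadOf \
        /srv/mirrors/project.git

You could also set this once with --global so every clone on the machine
picks it up, but it still has to be configured outside of the mirror itself.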

>
> I gather that local clones are fast because hardlinked - is that correct?

Yes, a local path implies --local in git-clone, which (a) uses hardlinks
and (b) avoids some other protocol overhead.
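
For example (path made up):

    git clone /srv/mirrors/project.git work                  # implies --local, uses hardlinks
    git clone --no-hardlinks /srv/mirrors/project.git work2  # forces real copies of the objects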

> Is that correct on Windows? (I can't easily avoid Windows.)

Let's see if a Windows expert shows up; I cannot tell.

> Ramification 1:
>
> I'm not sure how best to prepare patches for push-to-upstream.
> Is there value in collecting them locally into a push-to-upstream repo, or
> is it better to just push from each local clone individually?

It depends on a lot of things:
* How critical is the latency in the desired workflow?

  Say you have this setup on a cruise ship and only push once when
  you are in a harbor, then (a) you want to make sure you pushed everything
  and (b) you care less about latency. Hence you would prefer to collect
  everything in one repo so nothing gets lost.

  Say you are in a fast-paced environment where you want instant feedback
  on your patches, as they are mostly exploratory designs. Then you want to
  push directly from each local clone to minimize latency, I would
  imagine.

* Is there value in having the work from one local clone available
  in another? In that case you may want to have all your changes
  accumulated in the mirror.

> Ramification 2:
>
> Some of the repos I work with use submodules. Sometimes they use submodules
> that I'm not aware of. Or a submodule was used historically, and git bisect
> breaks/misbehaves because it can't get the submodule in offline mode.

Oh!

> Is there a way to get these, without writing a script that recurses through
> all versions of .gitmodules?

Not that I am aware of. You need to find all the submodules yourself.

When a submodule gets deleted (git rm <submodule> && git commit),
then all entries for that submodule in the .gitmodules file are also removed.
That seems ok, but in an ideal world we may have a tombstone in there
(e.g. the submodule.NAME.path still set) that would help for tasks like finding
all submodules in the future.

> I'm seeing the --recurse-submodules option for git fetch, so this might (or
> might not) be the Right Thing.

That only works for currently initialized (active) submodules. The submodules
of the past and those which you do not have are not fetched.
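
For the submodules that are currently active, something like the following
would do (just a sketch):

    git submodule status                      # a '-' prefix marks uninitialized submodules
    git fetch --recurse-submodules=on-demand

but that still does not help with the historical ones.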

Without the submodule ramifications, I would have advised making
the local mirror a 'bare' repo.

Hope that helps,
Stefan


* Re: Mirroring for offline use - best practices?
@ 2017-07-12 22:14   ` Joachim Durchholz
From: Joachim Durchholz @ 2017-07-12 22:14 UTC (permalink / raw)
  To: Git Mailing List

On 12.07.2017 at 19:40, Stefan Beller wrote:

Thanks for the feedback - it's been very, very useful to me!

 > Yes, a local path implies --local in git-clone, which (a) uses hardlinks
 > and (b) avoids some other protocol overhead.

I guess (a) is the most important one for repositories large enough to 
make this kind of stuff matter.
I had gathered as much, but I wasn't sure - my own repos aren't large
enough to take any meaningful measurements.

 >> Ramification 1:
 >>
 >> I'm not sure how best to prepare patches for push-to-upstream.
 >> Is there value in collecting them locally into a push-to-upstream repo, or
 >> is it better to just push from each local clone individually?
 >
 > It depends on a lot of things:
 > * How critical is the latency in the desired workflow?
 >
 >    Say you have this setup on a cruise ship and only push once when
 >    you are in a harbor, then (a) you want to make sure you pushed everything
 >    and (b) you care less about latency. Hence you would prefer to collect
 >    everything in one repo so nothing gets lost.

Yeah, that's the kind of scenario I'm having.
Less cruise ship (I wish!) and more on-train work during commutes, but 
it's similar.

But I think the "make sure nothing gets lost" aspect is the most 
important one. It's so easy to forget to push some sideline branch, 
particularly if you're in a workflow that uses many branches.

So... an "outbox" repository would be the Right Thing for me.
Not sure how generally applicable this is - what do other people think?
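
Roughly what I have in mind (untested sketch; the paths and URL are made up):

    git init --bare ~/outbox/project.git
    git -C ~/outbox/project.git remote add upstream https://example.com/project.git
    # from each working clone, whenever convenient:
    git push ~/outbox/project.git --all
    # once back online, forward everything in one go:
    git -C ~/outbox/project.git push upstream --all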

 >    Say you are in a fast-paced environment where you want instant feedback
 >    on your patches, as they are mostly exploratory designs. Then you want to
 >    push directly from each local clone to minimize latency, I would
 >    imagine.

That's for online work, I think.

Of course, there's the situation where you're sometimes offline and 
sometimes online.
I'm not sure how to best handle that - have a script that switches 
configurations? Just stick with that outbox workflow because switching 
workflows would invite error?

One thing that's specific to me is that I tend to be active in multiple 
projects, so I might get home and have queued up pushes in multiple repos.
I'm not sure how to best make sure that I don't forget a push.
OTOH maybe I'm overly optimistic about how many repositories I might
be working on in a day, and it's a non-issue.
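
A small check script would probably do, something like (directory layout
made up):

    for d in ~/work/*/.git; do
        repo=${d%/.git}
        if [ -n "$(git -C "$repo" log --branches --not --remotes --oneline)" ]; then
            echo "unpushed commits in $repo"
        fi
    done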

 > * Is there value in having the work from one local clone available
 >    in another? In that case you may want to have all your changes
 >    accumulated in the mirror.

Yeah, definitely.
A repository I work on might be used as a submodule, and I might want to 
check the ramifications.

 > When a submodule gets deleted (git rm <submodule> && git commit),
 > then all entries for that submodule in the .gitmodules file are also removed.
 > That seems ok, but in an ideal world we may have a tombstone in there
 > (e.g. the submodule.NAME.path still set) that would help for tasks like finding
 > all submodules in the future.

I wouldn't want to use tombstone entries actually, because the content 
of .gitconfig and .gitmodules might have been modified for any number of 
reasons.

The incantations that I'm using for my own "gitmirror" script are:
1. Get all commits that touch .gitmodules via
   git rev-list --all --full-history -- .gitmodules
2. Get a list of all module names mentioned in that .gitmodules version via
   git config \
     --blob "${commit}:.gitmodules" \
     --name-only \
     --get-regexp "^submodule\..*\.path$"
3. Given the module name, extract path and url via
   git config \
     --blob "${commit}:.gitmodules" \
     --get "submodule.${module_name}.path"
   and, for the url,
     --get "submodule.${module_name}.url"

It's not the most efficient way conceivable, but it uses git's
configuration parser, and it won't be thrown off by manual edits to the
configuration files.

It's nothing you'd do on the command line, hence the scripting :-)
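
Condensed, it looks roughly like this (a simplified sketch of the idea;
error handling and deduplication omitted):

    #!/bin/sh
    # List every submodule path/url pair that ever appeared in .gitmodules.
    for commit in $(git rev-list --all --full-history -- .gitmodules); do
        git config --blob "${commit}:.gitmodules" --name-only \
            --get-regexp '^submodule\..*\.path$' |
        while read -r key; do
            name=${key#submodule.}; name=${name%.path}
            path=$(git config --blob "${commit}:.gitmodules" \
                --get "submodule.${name}.path")
            url=$(git config --blob "${commit}:.gitmodules" \
                --get "submodule.${name}.url")
            printf '%s %s %s\n' "$commit" "$path" "$url"
        done
    done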

 >> I'm seeing the --recurse-submodules option for git fetch, so this might (or
 >> might not) be the Right Thing.
 >
 > That only works for currently initialized (active) submodules. The submodules
 > of the past and those which you do not have are not fetched.

Aww.
Ah well.

 > Without the submodule ramifications, I would have advised making
 > the local mirror a 'bare' repo.

I'm currently steering towards having a cache of all the repositories I
have ever downloaded, which would live in ~/.cache/gitmirror.

I can turn this script into a public, maintained project if there's 
interest.
Current state is two files, the script itself and a unit test script.
It's still in a state of flux, so sharing would be for code review at
this point; general usefulness would come later.

Regards,
Jo

