* Uninitialized submodules as symlinks
@ 2016-10-07 18:17 David Turner
  2016-10-07 18:56 ` Stefan Beller
  2016-10-13 16:10 ` Heiko Voigt
  0 siblings, 2 replies; 10+ messages in thread
From: David Turner @ 2016-10-07 18:17 UTC (permalink / raw)
  To: git

Presently, uninitialized submodules are materialized in the working tree as empty directories.  We would like to consider having them be symlinks.  Specifically, we'd like them to be symlinks into a FUSE filesystem which retrieves files on demand.

We've actually already got a FUSE filesystem written, but we use a different (semi-manual) means to connect it to the initialized submodules.  We hope to release this FUSE filesystem as free software at some point soon, but we do not yet have a fixed schedule for doing so.  Having to run a command to create the symlink-based "union" filesystem is not optimal (since we have to re-run it every time we initialize or deinitialize a submodule).

But if the uninitialized submodules could be symlinks into the FUSE filesystem, we wouldn't have this problem.  This solution isn't necessarily FUSE-specific -- perhaps someone would want copies of the same submodule in multiple repos, and would want to save disk space by having all copies point to the same place.  So the symlinks would be configured by a per-submodule config variable.

Naturally, this would require some changes to code that examines the working tree -- git status, git diff, etc.  They would have to report "unchanged" for submodules which were still symlinks to the configured location.  I have not yet looked at the implementation details beyond this.
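
A minimal sketch of the "unchanged" test this implies, assuming the configured location is already known. The function name is hypothetical and this is not how git itself examines the working tree; it only illustrates that the comparison can be made on the link text alone, without reading anything behind the link.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when $1 is a symlink whose
# stored target is exactly $2, i.e. the submodule is still "unchanged".
submodule_symlink_unchanged() {
    path=$1
    expected=$2
    # -L is true only for symlinks; a real directory means the
    # submodule was initialized (or replaced) and needs a normal scan.
    [ -L "$path" ] || return 1
    # readlink prints the stored link text without following it,
    # so nothing inside the on-demand filesystem is touched.
    [ "$(readlink "$path")" = "$expected" ]
}
```

Because both checks look only at the link itself, a status-like command built this way would not force the on-demand filesystem to fetch any files.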

Does this idea make any sense?  If I were to implement it (probably in a few months, but no official timeline yet), would patches be considered?


* Re: Uninitialized submodules as symlinks
  2016-10-07 18:17 Uninitialized submodules as symlinks David Turner
@ 2016-10-07 18:56 ` Stefan Beller
  2016-10-07 19:59   ` David Turner
  2016-10-13 16:10 ` Heiko Voigt
  1 sibling, 1 reply; 10+ messages in thread
From: Stefan Beller @ 2016-10-07 18:56 UTC (permalink / raw)
  To: David Turner; +Cc: git

On Fri, Oct 7, 2016 at 11:17 AM, David Turner <David.Turner@twosigma.com> wrote:
> Presently, uninitialized submodules are materialized in the working tree as empty directories.

Right, there has to be something there to hint to the user that creating
a file with that path is probably not what they want.

>  We would like to consider having them be symlinks.  Specifically, we'd like them to be symlinks into a FUSE filesystem which retrieves files on demand.
>
> We've actually already got a FUSE filesystem written, but we use a different (semi-manual) means to connect it to the initialized submodules.

So you currently do a

    git submodule init <pathspec>
    custom-submodule make-symlink <pathspec>

?

> We hope to release this FUSE filesystem as free software at some point soon, but we do not yet have a fixed schedule for doing so.  Having to run a command to create the symlink-based "union" filesystem is not optimal (since we have to re-run it every time we initialize or deinitialize a submodule).
>
> But if the uninitialized submodules could be symlinks into the FUSE filesystem, we wouldn't have this problem.  This solution isn't necessarily FUSE-specific -- perhaps someone would want copies of the same submodule in multiple repos, and would want to save disk space by having all copies point to the same place.  So the symlinks would be configured by a per-submodule config variable.

I'd imagine that you want both a per-submodule config variable as
well as a global variable that is a default for all submodules?

    git config submodule.trySymlinkDefault /mounted/fuse/
    # any (new) submodule tries to be linked to /mounted/fuse/<path>
    git config submodule.<name>.symlinked ~/my/private/symlinked
    # The <name> submodule goes into another path.
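
The lookup order suggested here could be sketched as follows. Both config keys are only proposals from this message and stock git does not act on them; `git config --get` will nevertheless store and return arbitrary well-formed keys, which is all this hypothetical helper relies on.

```shell
#!/bin/sh
# Hypothetical lookup: print the symlink target for the submodule
# named $1 that lives at path $2, or fail if nothing is configured.
submodule_symlink_target() {
    name=$1
    path=$2
    # A per-submodule setting wins outright.
    if target=$(git config --get "submodule.$name.symlinked"); then
        printf '%s\n' "$target"
        return 0
    fi
    # Otherwise fall back to <default>/<path>, if a default is set.
    if base=$(git config --get submodule.trySymlinkDefault); then
        # Strip one trailing slash so the join does not produce "//".
        printf '%s/%s\n' "${base%/}" "$path"
        return 0
    fi
    return 1
}
```

With only the default set to /mounted/fuse/, a submodule at libs/lib would resolve to /mounted/fuse/libs/lib; setting the per-submodule key overrides that.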

Since you propose that the FUSE filesystem fetch files on demand, you
probably want to disable things that scan the whole submodule,
e.g. look at submodule.<name>.ignore to keep status from looking
at all files.

When looking through the options, you could add the value "symlink" to
submodule.<name>.update, which then respects the
submodule.trySymlinkDefault if present, such that

    git clone --recurse-submodules ...

works and sets up the FUSE thing correctly.

How does the FUSE system handle different versions, i.e.
`git submodule update` to checkout another version of the submodule?
(btw, I plan on working on integrating submodules to "git checkout",
so "submodule update" would not need to be run there, but we'd hook
it into checkout instead)

>
> Naturally, this would require some changes to code that examines the working tree -- git status, git diff, etc.  They would have to report "unchanged" for submodules which were still symlinks to the configured location.  I have not yet looked at the implementation details beyond this.
>
> Does this idea make any sense?  If I were to implement it (probably in a few months, but no official timeline yet), would patches be considered?

I am happy to review patches.


* RE: Uninitialized submodules as symlinks
  2016-10-07 18:56 ` Stefan Beller
@ 2016-10-07 19:59   ` David Turner
  2016-10-17  9:45     ` Duy Nguyen
  0 siblings, 1 reply; 10+ messages in thread
From: David Turner @ 2016-10-07 19:59 UTC (permalink / raw)
  To: 'Stefan Beller'; +Cc: git



> -----Original Message-----
> From: Stefan Beller [mailto:sbeller@google.com]
> Sent: Friday, October 07, 2016 2:56 PM
> To: David Turner
> Cc: git@vger.kernel.org
> Subject: Re: Uninitialized submodules as symlinks
> 
> On Fri, Oct 7, 2016 at 11:17 AM, David Turner <David.Turner@twosigma.com>
> wrote:
> > Presently, uninitialized submodules are materialized in the working tree
> > as empty directories.
> 
> Right, there has to be something, to hint at the user that creating a file
> with that path is probably not what they want.
> 
> >  We would like to consider having them be symlinks.  Specifically, we'd
> > like them to be symlinks into a FUSE filesystem which retrieves files on
> > demand.
> >
> > We've actually already got a FUSE filesystem written, but we use a
> > different (semi-manual) means to connect it to the initialized submodules.
> 
> So you currently do a
> 
>     git submodule init <pathspec>
>     custom-submodule make-symlink <pathspec>
> 
> ?

We do something like this:

For each initialized submodule: create a symlink at the corresponding place in .../somedir that points at the submodule's checkout in the git repo.
For each uninitialized submodule: create a symlink at the corresponding place in .../somedir that points into the FUSE filesystem.

So .../somedir has the structure of the git main repo, but is all symlinks -- some into FUSE, some into the git repo.

This means that when we initialize (or deinitialize) a submodule, we need to re-run the linking script.  
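
A rough sketch of such a linking pass, under stated assumptions: the real script has not been published, so the function name, the argument order, and the use of a `.git` entry to detect an initialized submodule are all illustrative.

```shell
#!/bin/sh
# Hypothetical re-linking pass. Arguments: the superproject checkout,
# the FUSE view of the same tree, the symlink-only output directory,
# then one submodule path per remaining argument.
relink_union() {
    repo=$1 fuse=$2 out=$3
    shift 3
    for sub in "$@"; do
        mkdir -p "$(dirname "$out/$sub")"
        # An initialized submodule has a .git entry in its work tree;
        # an uninitialized one is just an empty directory.
        if [ -e "$repo/$sub/.git" ]; then
            target=$repo/$sub
        else
            target=$fuse/$sub
        fi
        # Drop any stale link first; rm without -r will not delete
        # a real directory that happens to sit at this path.
        rm -f "$out/$sub"
        ln -s "$target" "$out/$sub"
    done
}
```

Every link is recomputed on each run, which is why the pass has to be repeated whenever a submodule changes state.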

> > We hope to release this FUSE filesystem as free software at some point
> > soon, but we do not yet have a fixed schedule for doing so.  Having to run
> > a command to create the symlink-based "union" filesystem is not optimal
> > (since we have to re-run it every time we initialize or deinitialize a
> > submodule).
> >
> > But if the uninitialized submodules could be symlinks into the FUSE
> > filesystem, we wouldn't have this problem.  This solution isn't
> > necessarily FUSE-specific -- perhaps someone would want copies of the same
> > submodule in multiple repos, and would want to save disk space by having
> > all copies point to the same place.  So the symlinks would be configured
> > by a per-submodule config variable.
> 
> I'd imagine that you want both a per-submodule config variable as well as
> a global variable that is a default for all submodules?
> 
>     git config submodule.trySymlinkDefault /mounted/fuse/
>     # any (new) submodule tries to be linked to /mounted/fuse/<path>
>     git config submodule.<name>.symlinked ~/my/private/symlinked
>     # The <name> submodule goes into another path.
> 
> As you propose the FUSE filesystem fetches files on demand, you probably
> want to disable things that scan the whole submodule, e.g. look at
> submodule.<name>.ignore to suppress status looking at all files.

I would actually expect that git would detect that the symlink is unmodified from the configured symlink and automatically decide not to look there.
 
> When looking through the options, you could add the value "symlink" to
> submodule.<name>.update, which then respects the
> submodule.trySymlinkDefault if present, such that
> 
>     git clone --recurse-submodules ...
> 
> works and sets up the FUSE thing correctly.
> 
> How does the FUSE system handle different versions, i.e.
> `git submodule update` to checkout another version of the submodule?
> (btw, I plan on working on integrating submodules to "git checkout", so
> "submodule update" would not need to be run there, but we'd hook it into
> checkout instead)

The FUSE filesystem has a (virtual) directory for each commit SHA of the main repo, with each submodule mapped to the then-current version of the submodule's code. Actually, it's a bit more complicated because the uninitialized modules point to already-built binaries -- that is, the symlink is to something like $fuse/$SHA/built/$submodule.

If you check out a new version of the main module, in our current setup, you need to again update all of the submodule symlinks (as described above). 

Under my proposal, I guess this would still need to happen.  A post-checkout hook could handle it either way.  Despite this flaw, switching a submodule between an initialized and deinitialized state would still be more seamless with the symlinks.
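
A post-checkout hook along those lines might look roughly like this. The `<root>/<sha>/built/<submodule>` layout follows the "something like" description above and is an assumption rather than a published interface; the function names and `FUSE_ROOT` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a post-checkout hook.

# Print the FUSE path for one submodule.
# $1 = FUSE mount point, $2 = superproject commit, $3 = submodule path
fuse_target() {
    printf '%s/%s/built/%s\n' "$1" "$2" "$3"
}

# Re-point an existing symlink at a new target.
# $1 = link path, $2 = new target
repoint_symlink() {
    link=$1
    # Leave real directories (initialized submodules) alone.
    [ -L "$link" ] || return 0
    rm -f "$link"
    ln -s "$2" "$link"
}

# Inside .git/hooks/post-checkout, git passes the previous HEAD,
# the new HEAD, and a branch-checkout flag as $1, $2 and $3:
#   repoint_symlink libs/b "$(fuse_target "$FUSE_ROOT" "$2" libs/b)"
```

Skipping anything that is not a symlink keeps the hook from disturbing submodules that have been initialized in the meantime.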

> > Naturally, this would require some changes to code that examines the
> > working tree -- git status, git diff, etc.  They would have to report
> > "unchanged" for submodules which were still symlinks to the configured
> > location.  I have not yet looked at the implementation details beyond
> > this.
> >
> > Does this idea make any sense?  If I were to implement it (probably in a
> > few months, but no official timeline yet), would patches be considered?
> 
> I am happy to review patches.

Thanks.


* Re: Uninitialized submodules as symlinks
  2016-10-07 18:17 Uninitialized submodules as symlinks David Turner
  2016-10-07 18:56 ` Stefan Beller
@ 2016-10-13 16:10 ` Heiko Voigt
  2016-10-13 19:35   ` Kevin Daudt
  2016-10-17 15:11   ` David Turner
  1 sibling, 2 replies; 10+ messages in thread
From: Heiko Voigt @ 2016-10-13 16:10 UTC (permalink / raw)
  To: David Turner; +Cc: git

On Fri, Oct 07, 2016 at 06:17:05PM +0000, David Turner wrote:
> Presently, uninitialized submodules are materialized in the working
> tree as empty directories.  We would like to consider having them be
> symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> filesystem which retrieves files on demand.

How about portability? This feature would only work on Unix-like
operating systems. You have to be careful not to break Windows, since
it does not have symlinks.

Cheers Heiko


* Re: Uninitialized submodules as symlinks
  2016-10-13 16:10 ` Heiko Voigt
@ 2016-10-13 19:35   ` Kevin Daudt
  2016-10-14 16:48     ` Junio C Hamano
  2016-10-17 15:11   ` David Turner
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Daudt @ 2016-10-13 19:35 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: David Turner, git

On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
> On Fri, Oct 07, 2016 at 06:17:05PM +0000, David Turner wrote:
> > Presently, uninitialized submodules are materialized in the working
> > tree as empty directories.  We would like to consider having them be
> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> > filesystem which retrieves files on demand.
> 
> How about portability? This feature would only work on Unix like
> operating systems. You have to be careful to not break Windows since
> they do not have symlinks.
> 

NTFS does have symlinks, but you need admin rights to create them
(unless you change the policy).


* Re: Uninitialized submodules as symlinks
  2016-10-13 19:35   ` Kevin Daudt
@ 2016-10-14 16:48     ` Junio C Hamano
  2016-10-17  9:28       ` Heiko Voigt
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2016-10-14 16:48 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: Heiko Voigt, David Turner, git

Kevin Daudt <me@ikke.info> writes:

> On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
>> On Fri, Oct 07, 2016 at 06:17:05PM +0000, David Turner wrote:
>> > Presently, uninitialized submodules are materialized in the working
>> > tree as empty directories.  We would like to consider having them be
>> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
>> > filesystem which retrieves files on demand.
>> 
>> How about portability? This feature would only work on Unix like
>> operating systems. You have to be careful to not break Windows since
>> they do not have symlinks.
>
> NTFS does have symlinks, but you need admin right to create them though
> (unless you change the policy).

That sounds like saying "It has, but it practically is not usable by
Git as a mechanism to achieve this goal" to me.





* Re: Uninitialized submodules as symlinks
  2016-10-14 16:48     ` Junio C Hamano
@ 2016-10-17  9:28       ` Heiko Voigt
  0 siblings, 0 replies; 10+ messages in thread
From: Heiko Voigt @ 2016-10-17  9:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Daudt, David Turner, git

On Fri, Oct 14, 2016 at 09:48:16AM -0700, Junio C Hamano wrote:
> Kevin Daudt <me@ikke.info> writes:
> 
> > On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
> >> On Fri, Oct 07, 2016 at 06:17:05PM +0000, David Turner wrote:
> >> > Presently, uninitialized submodules are materialized in the working
> >> > tree as empty directories.  We would like to consider having them be
> >> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> >> > filesystem which retrieves files on demand.
> >> 
> >> How about portability? This feature would only work on Unix like
> >> operating systems. You have to be careful to not break Windows since
> >> they do not have symlinks.
> >
> > NTFS does have symlinks, but you need admin right to create them though
> > (unless you change the policy).
> 
> That sounds like saying "It has, but it practically is not usable by
> Git as a mechanism to achieve this goal" to me.

Yes, and that is why Git for Windows does not use them, and why I
simplified to "Windows does not have symlinks". For a normal user there
is no such thing as symlinks on Windows, unfortunately.

Cheers Heiko


* Re: Uninitialized submodules as symlinks
  2016-10-07 19:59   ` David Turner
@ 2016-10-17  9:45     ` Duy Nguyen
  2016-10-17 15:12       ` David Turner
  0 siblings, 1 reply; 10+ messages in thread
From: Duy Nguyen @ 2016-10-17  9:45 UTC (permalink / raw)
  To: David Turner; +Cc: Stefan Beller, git

On Sat, Oct 8, 2016 at 2:59 AM, David Turner <David.Turner@twosigma.com> wrote:
>
>
>> -----Original Message-----
>> From: Stefan Beller [mailto:sbeller@google.com]
>> Sent: Friday, October 07, 2016 2:56 PM
>> To: David Turner
>> Cc: git@vger.kernel.org
>> Subject: Re: Uninitialized submodules as symlinks
>>
>> On Fri, Oct 7, 2016 at 11:17 AM, David Turner <David.Turner@twosigma.com>
>> wrote:
>> > Presently, uninitialized submodules are materialized in the working tree
>> > as empty directories.
>>
>> Right, there has to be something, to hint at the user that creating a file
>> with that path is probably not what they want.
>>
>> >  We would like to consider having them be symlinks.  Specifically, we'd
>> > like them to be symlinks into a FUSE filesystem which retrieves files on
>> > demand.
>> >
>> > We've actually already got a FUSE filesystem written, but we use a
>> > different (semi-manual) means to connect it to the initialized submodules.
>>
>> So you currently do a
>>
>>     git submodule init <pathspec>
>>     custom-submodule make-symlink <pathspec>
>>
>> ?
>
> We do something like
>
> For each initialized submodule: symlink it into the right place in .../somedir
> For each uninitialized submodule: symlink from the FUSE into the right place in .../somedir
>
> So .../somedir has the structure of the git main repo, but is all symlinks -- some into FUSE, some into the git repo.
>
> This means that when we initialize (or deinitialize) a submodule, we need to re-run the linking script.

Do .git files work? If .git files point to somewhere in FUSE, I guess
you still have file retrieval on demand. It depends on which files you
want to retrieve, I guess. If you want worktree files rather than the
object database, then .git files won't work, because the worktree
remains in the same filesystem as the super repo.
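
For reference, a .git file (gitfile) is a plain file whose single line, `gitdir: <path>`, names the real repository directory. A small sketch with hypothetical helper names shows why it relocates only the repository data and not the worktree:

```shell
#!/bin/sh
# Write a gitfile into a work tree.
# $1 = work tree directory, $2 = repository directory to point at
write_gitfile() {
    printf 'gitdir: %s\n' "$2" > "$1/.git"
}

# Print the repository directory that a work tree's gitfile names.
# $1 = work tree directory
read_gitfile() {
    sed -n 's/^gitdir: //p' "$1/.git"
}
```

The work tree passed as the first argument stays wherever it already is; only the path written after `gitdir:` can live on another filesystem.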
-- 
Duy


* RE: Uninitialized submodules as symlinks
  2016-10-13 16:10 ` Heiko Voigt
  2016-10-13 19:35   ` Kevin Daudt
@ 2016-10-17 15:11   ` David Turner
  1 sibling, 0 replies; 10+ messages in thread
From: David Turner @ 2016-10-17 15:11 UTC (permalink / raw)
  To: 'Heiko Voigt'; +Cc: git



> -----Original Message-----
> From: Heiko Voigt [mailto:hvoigt@hvoigt.net]
> Sent: Thursday, October 13, 2016 12:10 PM
> To: David Turner
> Cc: git@vger.kernel.org
> Subject: Re: Uninitialized submodules as symlinks
> 
> On Fri, Oct 07, 2016 at 06:17:05PM +0000, David Turner wrote:
> > Presently, uninitialized submodules are materialized in the working
> > tree as empty directories.  We would like to consider having them be
> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> > filesystem which retrieves files on demand.
> 
> How about portability? This feature would only work on Unix like operating
> systems. You have to be careful to not break Windows since they do not
> have symlinks.

Windows doesn't support FUSE either IIRC.  Since this would be an alternate mode of operation, Windows would still work fine on the old model.
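
One way to picture the "alternate mode" is a probe with a fallback, sketched below with hypothetical function names. This is not git's implementation; git has its own `core.symlinks` setting for platforms where symlinks are unavailable.

```shell
#!/bin/sh
# Probe whether symlinks can be created inside directory $1.
can_symlink() {
    probe=$1/.symlink-probe.$$
    ln -s probe-target "$probe" 2>/dev/null || return 1
    # Some environments emulate ln -s with a copy, so confirm
    # that what was created really is a symlink.
    if [ -L "$probe" ]; then ok=0; else ok=1; fi
    rm -f "$probe"
    return "$ok"
}

# Hypothetical placeholder creation: a symlink where possible,
# otherwise the existing empty-directory behaviour.
# $1 = submodule path, $2 = symlink target
make_placeholder() {
    if can_symlink "$(dirname "$1")"; then
        ln -s "$2" "$1"
    else
        mkdir -p "$1"
    fi
}
```

On a system without usable symlinks the second branch reproduces today's empty directory, so nothing changes there.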


* RE: Uninitialized submodules as symlinks
  2016-10-17  9:45     ` Duy Nguyen
@ 2016-10-17 15:12       ` David Turner
  0 siblings, 0 replies; 10+ messages in thread
From: David Turner @ 2016-10-17 15:12 UTC (permalink / raw)
  To: 'Duy Nguyen'; +Cc: Stefan Beller, git

> -----Original Message-----
> From: Duy Nguyen [mailto:pclouds@gmail.com]
> Sent: Monday, October 17, 2016 5:46 AM
> To: David Turner
> Cc: Stefan Beller; git@vger.kernel.org
> Subject: Re: Uninitialized submodules as symlinks
> 
> On Sat, Oct 8, 2016 at 2:59 AM, David Turner <David.Turner@twosigma.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Stefan Beller [mailto:sbeller@google.com]
> >> Sent: Friday, October 07, 2016 2:56 PM
> >> To: David Turner
> >> Cc: git@vger.kernel.org
> >> Subject: Re: Uninitialized submodules as symlinks
> >>
> >> On Fri, Oct 7, 2016 at 11:17 AM, David Turner
> >> <David.Turner@twosigma.com>
> >> wrote:
> >> > Presently, uninitialized submodules are materialized in the working
> >> > tree as empty directories.
> >>
> >> Right, there has to be something, to hint at the user that creating a
> >> file with that path is probably not what they want.
> >>
> >> >  We would like to consider having them be symlinks.  Specifically,
> >> > we'd like them to be symlinks into a FUSE filesystem which
> >> > retrieves files on demand.
> >> >
> >> > We've actually already got a FUSE filesystem written, but we use a
> >> > different (semi-manual) means to connect it to the initialized
> >> > submodules.
> >>
> >> So you currently do a
> >>
> >>     git submodule init <pathspec>
> >>     custom-submodule make-symlink <pathspec>
> >>
> >> ?
> >
> > > We do something like
> > >
> > > For each initialized submodule: symlink it into the right place in
> > > .../somedir
> > > For each uninitialized submodule: symlink from the FUSE into the
> > > right place in .../somedir
> > >
> > > So .../somedir has the structure of the git main repo, but is all
> > > symlinks -- some into FUSE, some into the git repo.
> > >
> > > This means that when we initialize (or deinitialize) a submodule, we
> > > need to re-run the linking script.
> 
> Do .git files work? If .git files point to somewhere in fuse, I guess you
> still have file retrieval on demand. It depends on what files to retrieve
> I guess. If you want worktree files, not object database then .git files
> won't work because worktree remains in the same filesystem as the super
> repo.

Yes, we want worktree files (or even worktree files + built artifacts).


