git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* gc and repack ignore .git/*HEAD when checking reachability
@ 2016-07-08  2:59 Josh Triplett
  2016-07-08  4:34 ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Josh Triplett @ 2016-07-08  2:59 UTC (permalink / raw)
  To: git

The manpage for git gc says:
> git gc tries very hard to be safe about the garbage it collects. In
> particular, it will keep not only objects referenced by your current
> set of branches and tags, but also objects referenced by the index,
> remote-tracking branches, refs saved by git filter-branch in
> refs/original/, or reflogs (which may reference commits in branches
> that were later amended or rewound).

gc, repack, and anything else that uses the machinery in reachable.c
will also check HEAD, to include objects only reachable from a detached
HEAD (which the manpage should document).

However, unreachable.c does not check any other ref that sits directly
in the .git directory, such as MERGE_HEAD, FETCH_HEAD, or
CHERRY_PICK_HEAD.  To test this, try creating a new empty repository
with "git init repo ; cd repo", then use "git fetch URL" to fetch a
repository into FETCH_HEAD, then run "git repack -a -d -f", and then
"git show FETCH_HEAD".  This similarly affects "git gc", which will
unpack all the objects from the pack and leave them loose.

This could result in data loss, if a user expected that having an object
referenced from those places would protect it from pruning.

I think the right fix for this would involve having
mark_reachable_objects in reachable.c add all refs that match
.git/*HEAD, not just .git/HEAD itself.  (I'd suggest matching .git/*HEAD
rather than hardcoding the list of "special" refs, to provide
compatibility with any other tool or future version of git that
introduces another such ref.)  This seems fairly easily done with a new
variant of do_head_ref that includes all such refs, along with a
one-line change to mark_reachable_objects to use it.

Does this seem like a reasonable approach?

- Josh Triplett

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-08  2:59 gc and repack ignore .git/*HEAD when checking reachability Josh Triplett
@ 2016-07-08  4:34 ` Junio C Hamano
  2016-07-08  6:44   ` Josh Triplett
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2016-07-08  4:34 UTC (permalink / raw)
  To: Josh Triplett; +Cc: git

Josh Triplett <josh@joshtriplett.org> writes:

> This could result in data loss, if a user expected that having an object
> referenced from those places would protect it from pruning.

Yeah, luckily, nobody expects such.  I do not think any of our
document says nothing other than HEAD like CHERRY_PICK_HEAD is
reachability anchoring point; they are designed to be transient.

Because they are designed to be transient, I do not think there is
any downside (other than the initial start-up cost) to including
them in reachability computation.  Because they are meant to be
transient, the objects anchored by them would be reachable from
other anchoring points anyway.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-08  4:34 ` Junio C Hamano
@ 2016-07-08  6:44   ` Josh Triplett
  2016-07-08 17:14     ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Josh Triplett @ 2016-07-08  6:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Jul 07, 2016 at 09:34:02PM -0700, Junio C Hamano wrote:
> Josh Triplett <josh@joshtriplett.org> writes:
> > This could result in data loss, if a user expected that having an object
> > referenced from those places would protect it from pruning.
> 
> Yeah, luckily, nobody expects such.  I do not think any of our
> document says nothing other than HEAD like CHERRY_PICK_HEAD is
> reachability anchoring point; they are designed to be transient.

I can imagine at least one scenario that would result in data loss here:
git pull a URL (not referenced via any ref other than
FETCH_HEAD/MERGE_HEAD), get a merge conflict, get halfway through
resolving it, set that repository aside for a while, do something that
triggers a gc, then attempt to finish and commit.

Unlikely, but not impossible.  Same reason the reachability logic looks
at the index.

(I originally encountered this because I intended to add another
HEAD-like ref in .git, so I started investigating the logic around such
HEADs.)

> Because they are designed to be transient, I do not think there is
> any downside (other than the initial start-up cost) to including
> them in reachability computation.  Because they are meant to be
> transient, the objects anchored by them would be reachable from
> other anchoring points anyway.

That sounds reasonable.  And if they *do* end up taking any time to
traverse, it's because they weren't reachable from other anchoring
points, so taking the extra time to traverse them seems fine.

- Josh Triplett

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-08  6:44   ` Josh Triplett
@ 2016-07-08 17:14     ` Junio C Hamano
  2016-07-08 19:25       ` Theodore Ts'o
                         ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Junio C Hamano @ 2016-07-08 17:14 UTC (permalink / raw)
  To: Josh Triplett; +Cc: git

Josh Triplett <josh@joshtriplett.org> writes:

> That sounds reasonable.  And if they *do* end up taking any time to
> traverse, it's because they weren't reachable from other anchoring
> points, so taking the extra time to traverse them seems fine.

The only thing that is hard is to clearly define _what_ are the new
anchoring points.

It cannot be "anything directly under .git that has all-caps name
that ends with _HEAD".  The ones we write we know are going to be
removed at some point in time (e.g. "git reset", "git bisect reset",
"git merge --abort", etc.).  We do not have any control on random
ones that the users and third-party tools leave behind, holding onto
irrelevant objects forever.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-08 17:14     ` Junio C Hamano
@ 2016-07-08 19:25       ` Theodore Ts'o
  2016-07-08 20:30         ` Junio C Hamano
  2016-07-08 20:29       ` Josh Triplett
  2016-07-09  7:35       ` Johannes Schindelin
  2 siblings, 1 reply; 27+ messages in thread
From: Theodore Ts'o @ 2016-07-08 19:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Triplett, git

On Fri, Jul 08, 2016 at 10:14:33AM -0700, Junio C Hamano wrote:
> 
> It cannot be "anything directly under .git that has all-caps name
> that ends with _HEAD".  The ones we write we know are going to be
> removed at some point in time (e.g. "git reset", "git bisect reset",
> "git merge --abort", etc.).  We do not have any control on random
> ones that the users and third-party tools leave behind, holding onto
> irrelevant objects forever.

What about anything that is all-caps and ends in _HEAD which has a
mod-time within the last N days?  (Where N is 2-7 days.)  If it's
older than that, it's almost certainly stale...

						- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-08 17:14     ` Junio C Hamano
  2016-07-08 19:25       ` Theodore Ts'o
@ 2016-07-08 20:29       ` Josh Triplett
  2016-07-09  7:35       ` Johannes Schindelin
  2 siblings, 0 replies; 27+ messages in thread
From: Josh Triplett @ 2016-07-08 20:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Jul 08, 2016 at 10:14:33AM -0700, Junio C Hamano wrote:
> Josh Triplett <josh@joshtriplett.org> writes:
> 
> > That sounds reasonable.  And if they *do* end up taking any time to
> > traverse, it's because they weren't reachable from other anchoring
> > points, so taking the extra time to traverse them seems fine.
> 
> The only thing that is hard is to clearly define _what_ are the new
> anchoring points.
> 
> It cannot be "anything directly under .git that has all-caps name
> that ends with _HEAD".  The ones we write we know are going to be
> removed at some point in time (e.g. "git reset", "git bisect reset",
> "git merge --abort", etc.).  We do not have any control on random
> ones that the users and third-party tools leave behind, holding onto
> irrelevant objects forever.

"We don't understand it, so it must not be important" does not seem like
a safe approach.  An object matching "[_A-Z]*HEAD" might act more like a
detached HEAD, referencing objects the user wants to hold onto
permanently.  And even FETCH_HEAD might point to objects the user
expected to stick around until they chose to ignore them; what makes
FETCH_HEAD a less legitimate head to work with than refs/heads/scratch?
(Or, for that matter, what about an ugly merge in MERGE_HEAD that the
user put off until after a vacation?)

The only downside of "holding onto irrelevant objects forever" is
storage space, which the user can easily reclaim by removing HEADs they
don't want.  The downside of pruning objects the user wanted is data
loss.  Why not just mark all those objects as roots for reachability
permanently, and let the user remove them if they want to prune those
objects?

- Josh Triplett

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-08 19:25       ` Theodore Ts'o
@ 2016-07-08 20:30         ` Junio C Hamano
  2016-07-08 23:50           ` Theodore Ts'o
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2016-07-08 20:30 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Josh Triplett, Git Mailing List

On Fri, Jul 8, 2016 at 12:25 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Fri, Jul 08, 2016 at 10:14:33AM -0700, Junio C Hamano wrote:
>>
>> It cannot be "anything directly under .git that has all-caps name
>> that ends with _HEAD".  The ones we write we know are going to be
>> removed at some point in time (e.g. "git reset", "git bisect reset",
>> "git merge --abort", etc.).  We do not have any control on random
>> ones that the users and third-party tools leave behind, holding onto
>> irrelevant objects forever.
>
> What about anything that is all-caps and ends in _HEAD which has a
> mod-time within the last N days?  (Where N is 2-7 days.)  If it's
> older than that, it's almost certainly stale...

I can imagine I'd start hacking on a project that I rarely touch, give up
resolving a "git pull" from an unconfigured place (hence, some stuff is
only reachable from FETCH_HEAD), and after 2*N days, come back
to the repository and find that I cannot continue working on it.

Turning the rule to "*_HEAD we know about, and those we don't that
are young" would not change the situation, as I may be depending on
some third-party tool that uses its OWN_HEAD just like we use
FETCH_HEAD in the above example.

So I dunno if that is a good solution. If we are going to declare that
transient stuff will now be kept, i.e. keeping them alive is no longer
end user's responsibility, then probably we should make it end user's
responsibility to clean things up.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-08 20:30         ` Junio C Hamano
@ 2016-07-08 23:50           ` Theodore Ts'o
  2016-07-09  5:23             ` Josh Triplett
  0 siblings, 1 reply; 27+ messages in thread
From: Theodore Ts'o @ 2016-07-08 23:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Triplett, Git Mailing List

On Fri, Jul 08, 2016 at 01:30:06PM -0700, Junio C Hamano wrote:
> 
> I can imagine I'd start hacking on a project that I rarely touch, give up
> resolving a "git pull" from an unconfigured place (hence, some stuff is
> only reachable from FETCH_HEAD), and after 2*N days, come back
> to the repository and find that I cannot continue working on it.

Sure, but that's something that could happen today, and no one has
really complained, have they?

> Turning the rule to "*_HEAD we know about, and those we don't that
> are young" would not change the situation, as I may be depending on
> some third-party tool that uses its OWN_HEAD just like we use
> FETCH_HEAD in the above example.
> 
> So I dunno if that is a good solution. If we are going to declare that
> transient stuff will now be kept, i.e. keeping them alive is no longer
> end user's responsibility, then probably we should make it end user's
> responsibility to clean things up.

Well, the question is what does "transient" stuff really mean?  If we
keep them forever, then are they really any different from stuff under
refs/heads?

Maybe pester the user if there is stale *_HEAD files, but don't
actually get rid of the objects?

					- Ted


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-08 23:50           ` Theodore Ts'o
@ 2016-07-09  5:23             ` Josh Triplett
  0 siblings, 0 replies; 27+ messages in thread
From: Josh Triplett @ 2016-07-09  5:23 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Junio C Hamano, Git Mailing List

On Fri, Jul 08, 2016 at 07:50:53PM -0400, Theodore Ts'o wrote:
> On Fri, Jul 08, 2016 at 01:30:06PM -0700, Junio C Hamano wrote:
> > 
> > I can imagine I'd start hacking on a project that I rarely touch, give up
> > resolving a "git pull" from an unconfigured place (hence, some stuff is
> > only reachable from FETCH_HEAD), and after 2*N days, come back
> > to the repository and find that I cannot continue working on it.
> 
> Sure, but that's something that could happen today, and no one has
> really complained, have they?

Until now. :)

> > Turning the rule to "*_HEAD we know about, and those we don't that
> > are young" would not change the situation, as I may be depending on
> > some third-party tool that uses its OWN_HEAD just like we use
> > FETCH_HEAD in the above example.
> > 
> > So I dunno if that is a good solution. If we are going to declare that
> > transient stuff will now be kept, i.e. keeping them alive is no longer
> > end user's responsibility, then probably we should make it end user's
> > responsibility to clean things up.
> 
> Well, the question is what does "transient" stuff really mean?  If we
> keep them forever, then are they really any different from stuff under
> refs/heads?

No, they're not.  And I don't think they should be; HEAD itself is *not*
transient, as a detached HEAD can reference valuable non-transient
objects.

In an ideal world, HEAD and all other such refs would live somewhere
under refs, to avoid the special case.

> Maybe pester the user if there is stale *_HEAD files, but don't
> actually get rid of the objects?

Why pester at all?  Just leave them, and if the user has large objects
they don't care about and wants to decrease the size of the repository,
they can follow the advice from the git-gc manpage: "If you are
expecting some objects to be collected and they aren’t, check all of
those locations and decide whether it makes sense in your case to remove
those references."  (Where "those locations" will need to expand to
mention "*HEAD".)

(Also, I'd suggest "*HEAD", possibly limited to characters matching
[-_A-Z], rather than "*_HEAD".)

- Josh Triplett

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-08 17:14     ` Junio C Hamano
  2016-07-08 19:25       ` Theodore Ts'o
  2016-07-08 20:29       ` Josh Triplett
@ 2016-07-09  7:35       ` Johannes Schindelin
  2016-07-09 14:09         ` Josh Triplett
  2 siblings, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2016-07-09  7:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Josh Triplett, git

Hi Junio,

On Fri, 8 Jul 2016, Junio C Hamano wrote:

> Josh Triplett <josh@joshtriplett.org> writes:
> 
> > That sounds reasonable.  And if they *do* end up taking any time to
> > traverse, it's because they weren't reachable from other anchoring
> > points, so taking the extra time to traverse them seems fine.
> 
> The only thing that is hard is to clearly define _what_ are the new
> anchoring points.
> 
> It cannot be "anything directly under .git that has all-caps name
> that ends with _HEAD".  The ones we write we know are going to be
> removed at some point in time (e.g. "git reset", "git bisect reset",
> "git merge --abort", etc.).  We do not have any control on random
> ones that the users and third-party tools leave behind, holding onto
> irrelevant objects forever.

Please note that bisect already uses the (transient) refs/bisect/
namespace. So I do not think we need to take specific care of the
BISECT_* files.

If we had thought of it back then, we could have used such a transient
namespace also for FETCH_HEAD, CHERRY_PICK_HEAD and also for detached
HEADs (which we should have called "unnamed branches").

Now, how about special-casing *just* these legacy files in gc: HEAD,
FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should
live in the refs/ namespace, which is already handled.

BTW this issue is getting much more problematic when you have a lot of
worktrees, some of which operate on detached HEADs. Which I do.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-09  7:35       ` Johannes Schindelin
@ 2016-07-09 14:09         ` Josh Triplett
  2016-07-09 16:45           ` Duy Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Josh Triplett @ 2016-07-09 14:09 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

On Sat, Jul 09, 2016 at 09:35:24AM +0200, Johannes Schindelin wrote:
> On Fri, 8 Jul 2016, Junio C Hamano wrote:
> > Josh Triplett <josh@joshtriplett.org> writes:
> > 
> > > That sounds reasonable.  And if they *do* end up taking any time to
> > > traverse, it's because they weren't reachable from other anchoring
> > > points, so taking the extra time to traverse them seems fine.
> > 
> > The only thing that is hard is to clearly define _what_ are the new
> > anchoring points.
> > 
> > It cannot be "anything directly under .git that has all-caps name
> > that ends with _HEAD".  The ones we write we know are going to be
> > removed at some point in time (e.g. "git reset", "git bisect reset",
> > "git merge --abort", etc.).  We do not have any control on random
> > ones that the users and third-party tools leave behind, holding onto
> > irrelevant objects forever.
> 
> Please note that bisect already uses the (transient) refs/bisect/
> namespace. So I do not think we need to take specific care of the
> BISECT_* files.
> 
> If we had thought of it back then, we could have used such a transient
> namespace also for FETCH_HEAD, CHERRY_PICK_HEAD and also for detached
> HEADs (which we should have called "unnamed branches").
> 
> Now, how about special-casing *just* these legacy files in gc: HEAD,
> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should
> live in the refs/ namespace, which is already handled.

That seems workable as well; in that case, we should also document this
(in the git-gc manpage at a minimum), and explicitly suggest creating
refs in refs/ but outside of refs/heads/ and refs/tags/, rather than
directly in .git/.

> BTW this issue is getting much more problematic when you have a lot of
> worktrees, some of which operate on detached HEADs. Which I do.
> 
> Ciao,
> Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-09 14:09         ` Josh Triplett
@ 2016-07-09 16:45           ` Duy Nguyen
  2016-07-10 10:59             ` Johannes Schindelin
  0 siblings, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2016-07-09 16:45 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Johannes Schindelin, Junio C Hamano, Git Mailing List

On Sat, Jul 9, 2016 at 4:09 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> On Sat, Jul 09, 2016 at 09:35:24AM +0200, Johannes Schindelin wrote:
>> On Fri, 8 Jul 2016, Junio C Hamano wrote:
>> > Josh Triplett <josh@joshtriplett.org> writes:
>> >
>> > > That sounds reasonable.  And if they *do* end up taking any time to
>> > > traverse, it's because they weren't reachable from other anchoring
>> > > points, so taking the extra time to traverse them seems fine.
>> >
>> > The only thing that is hard is to clearly define _what_ are the new
>> > anchoring points.
>> >
>> > It cannot be "anything directly under .git that has all-caps name
>> > that ends with _HEAD".  The ones we write we know are going to be
>> > removed at some point in time (e.g. "git reset", "git bisect reset",
>> > "git merge --abort", etc.).  We do not have any control on random
>> > ones that the users and third-party tools leave behind, holding onto
>> > irrelevant objects forever.
>>
>> Please note that bisect already uses the (transient) refs/bisect/
>> namespace. So I do not think we need to take specific care of the
>> BISECT_* files.
>>
>> If we had thought of it back then, we could have used such a transient
>> namespace also for FETCH_HEAD, CHERRY_PICK_HEAD and also for detached
>> HEADs (which we should have called "unnamed branches").
>>
>> Now, how about special-casing *just* these legacy files in gc: HEAD,
>> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should
>> live in the refs/ namespace, which is already handled.
>
> That seems workable as well; in that case, we should also document this
> (in the git-gc manpage at a minimum), and explicitly suggest creating
> refs in refs/ but outside of refs/heads/ and refs/tags/, rather than
> directly in .git/.

Not just outside refs/heads and refs/tags. It has to be in a specified
namespace like refs/worktree/ or something (we are close to be ready
for that). We could update the man page about git-gc shortcomings now,
but I think we should wait until refs/worktree (or something like
that) becomes true before suggesting more.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-09 16:45           ` Duy Nguyen
@ 2016-07-10 10:59             ` Johannes Schindelin
  2016-07-10 11:04               ` Duy Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2016-07-10 10:59 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Josh Triplett, Junio C Hamano, Git Mailing List

Hi Duy,

On Sat, 9 Jul 2016, Duy Nguyen wrote:

> On Sat, Jul 9, 2016 at 4:09 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> > On Sat, Jul 09, 2016 at 09:35:24AM +0200, Johannes Schindelin wrote:
> >> On Fri, 8 Jul 2016, Junio C Hamano wrote:
> >> > Josh Triplett <josh@joshtriplett.org> writes:
> >> >
> >> > > That sounds reasonable.  And if they *do* end up taking any time to
> >> > > traverse, it's because they weren't reachable from other anchoring
> >> > > points, so taking the extra time to traverse them seems fine.
> >> >
> >> > The only thing that is hard is to clearly define _what_ are the new
> >> > anchoring points.
> >> >
> >> > It cannot be "anything directly under .git that has all-caps name
> >> > that ends with _HEAD".  The ones we write we know are going to be
> >> > removed at some point in time (e.g. "git reset", "git bisect reset",
> >> > "git merge --abort", etc.).  We do not have any control on random
> >> > ones that the users and third-party tools leave behind, holding onto
> >> > irrelevant objects forever.
> >>
> >> Please note that bisect already uses the (transient) refs/bisect/
> >> namespace. So I do not think we need to take specific care of the
> >> BISECT_* files.
> >>
> >> If we had thought of it back then, we could have used such a transient
> >> namespace also for FETCH_HEAD, CHERRY_PICK_HEAD and also for detached
> >> HEADs (which we should have called "unnamed branches").
> >>
> >> Now, how about special-casing *just* these legacy files in gc: HEAD,
> >> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should
> >> live in the refs/ namespace, which is already handled.
> >
> > That seems workable as well; in that case, we should also document this
> > (in the git-gc manpage at a minimum), and explicitly suggest creating
> > refs in refs/ but outside of refs/heads/ and refs/tags/, rather than
> > directly in .git/.
> 
> Not just outside refs/heads and refs/tags. It has to be in a specified
> namespace like refs/worktree/ or something (we are close to be ready
> for that). We could update the man page about git-gc shortcomings now,
> but I think we should wait until refs/worktree (or something like
> that) becomes true before suggesting more.

We have a precedent for a ref that is directly underneath refs/:
refs/stash.

IMO that is okay: depending on the use case, we would need multiple refs
(like refs/notes/*) or a single ref (like refs/stash).

The important part is that the new refs start with refs/, and if they are
to be transient, start neither with refs/heads/ nor with refs/tags/.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-10 10:59             ` Johannes Schindelin
@ 2016-07-10 11:04               ` Duy Nguyen
  2016-07-10 14:16                 ` Johannes Schindelin
  0 siblings, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2016-07-10 11:04 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Josh Triplett, Junio C Hamano, Git Mailing List

On Sun, Jul 10, 2016 at 12:59 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> >> Now, how about special-casing *just* these legacy files in gc: HEAD,
>> >> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should
>> >> live in the refs/ namespace, which is already handled.
>> >
>> > That seems workable as well; in that case, we should also document this
>> > (in the git-gc manpage at a minimum), and explicitly suggest creating
>> > refs in refs/ but outside of refs/heads/ and refs/tags/, rather than
>> > directly in .git/.
>>
>> Not just outside refs/heads and refs/tags. It has to be in a specified
>> namespace like refs/worktree/ or something (we are close to be ready
>> for that). We could update the man page about git-gc shortcomings now,
>> but I think we should wait until refs/worktree (or something like
>> that) becomes true before suggesting more.
>
> We have a precedent for a ref that is directly underneath refs/:
> refs/stash.
>
> IMO that is okay: depending on the use case, we would need multiple refs
> (like refs/notes/*) or a single ref (like refs/stash).
>
> The important part is that the new refs start with refs/, and if they are
> to be transient, start neither with refs/heads/ nor with refs/tags/.

No, the point is, refs subsystem needs to know which refs is per-repo,
which is per-worktree. So far the rules are  "everything under refs,
except a few known exceptions, is per-repo" and "everything directly
under $GIT_DIR is per-worktree", which work fine. But if you allow to
move per-worktree to "refs" freely, then the "known exceptions" will
have to be updated every time a new per-worktree ref appears. It'll be
easier to modify the first rule as "everything under refs, except some
legacy exceptions and refs/worktree, is per-repo".
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-10 11:04               ` Duy Nguyen
@ 2016-07-10 14:16                 ` Johannes Schindelin
  2016-07-10 15:01                   ` Duy Nguyen
  2016-07-11 18:52                   ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Johannes Schindelin @ 2016-07-10 14:16 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Josh Triplett, Junio C Hamano, Git Mailing List

Hi Duy,

On Sun, 10 Jul 2016, Duy Nguyen wrote:

> On Sun, Jul 10, 2016 at 12:59 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >> >> Now, how about special-casing *just* these legacy files in gc: HEAD,
> >> >> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should
> >> >> live in the refs/ namespace, which is already handled.
> >> >
> >> > That seems workable as well; in that case, we should also document this
> >> > (in the git-gc manpage at a minimum), and explicitly suggest creating
> >> > refs in refs/ but outside of refs/heads/ and refs/tags/, rather than
> >> > directly in .git/.
> >>
> >> Not just outside refs/heads and refs/tags. It has to be in a specified
> >> namespace like refs/worktree/ or something (we are close to be ready
> >> for that). We could update the man page about git-gc shortcomings now,
> >> but I think we should wait until refs/worktree (or something like
> >> that) becomes true before suggesting more.
> >
> > We have a precedent for a ref that is directly underneath refs/:
> > refs/stash.
> >
> > IMO that is okay: depending on the use case, we would need multiple refs
> > (like refs/notes/*) or a single ref (like refs/stash).
> >
> > The important part is that the new refs start with refs/, and if they are
> > to be transient, start neither with refs/heads/ nor with refs/tags/.
> 
> No, the point is, refs subsystem needs to know which refs is per-repo,
> which is per-worktree. So far the rules are  "everything under refs,
> except a few known exceptions, is per-repo" and "everything directly
> under $GIT_DIR is per-worktree", which work fine. But if you allow to
> move per-worktree to "refs" freely, then the "known exceptions" will
> have to be updated every time a new per-worktree ref appears. It'll be
> easier to modify the first rule as "everything under refs, except some
> legacy exceptions and refs/worktree, is per-repo".

Given the substantial pain and suffering I have due to per-worktree
reflogs (and those are *just* HEAD's reflogs!), it appears to me that
per-worktree refs would be a particularly poor feature.

I agree that HEAD needs to be per-worktree, but already the fact that the
HEADs of the worktrees, along with their reflogs, are *not* visible to
all other worktrees causes substantial trouble.

In my mind, a much more natural design would have been to map
transparently each worktree's HEAD to refs/worktree/<name>/HEAD *and keep
those refs and reflogs visible across all worktrees*.

With that design, I would not have irretrievably lost reflogs to auto-gc
nor dozens of hours trying to figure out how to have *any* auto-gc succeed
again (because I need to reduce the horrible slowness incurred by too many
loose objects). My interactive rebases would also not have started to
print the auto-gc error after every friggin' pick.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-10 14:16                 ` Johannes Schindelin
@ 2016-07-10 15:01                   ` Duy Nguyen
  2016-07-11  6:07                     ` Johannes Schindelin
  2016-07-11 18:52                   ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2016-07-10 15:01 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Josh Triplett, Junio C Hamano, Git Mailing List

On Sun, Jul 10, 2016 at 4:16 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> No, the point is, refs subsystem needs to know which refs is per-repo,
>> which is per-worktree. So far the rules are  "everything under refs,
>> except a few known exceptions, is per-repo" and "everything directly
>> under $GIT_DIR is per-worktree", which work fine. But if you allow to
>> move per-worktree to "refs" freely, then the "known exceptions" will
>> have to be updated every time a new per-worktree ref appears. It'll be
>> easier to modify the first rule as "everything under refs, except some
>> legacy exceptions and refs/worktree, is per-repo".
>
> Given the substantial pain and suffering I have due to per-worktree
> reflogs (and those are *just* HEAD's reflogs!),

I apologize for that. I clearly did not see all the consequences of my
simple-minded approach. Watch out for the shared config issue too if
you start to rely on multiple worktrees. It may take even longer time
to address than this one.

> it appears to me that per-worktree refs would be a particularly poor feature.
>
> I agree that HEAD needs to be per-worktree, but already the fact that the
> HEADs of the worktrees, along with their reflogs, are *not* visible to
> all other worktrees causes substantial trouble.
>
> In my mind, a much more natural design would have been to map
> transparently each worktree's HEAD to refs/worktree/<name>/HEAD *and keep
> those refs and reflogs visible across all worktrees*.

We will be able to see refs from all worktrees if we choose to. There
is no question about that. It's needed for full-repo operations, gc
included. Whether we expose all worktree namespaces via refs/worktree
(as a logical view, not storage one), remains to be seen. Michael's
new ref API may be flexible enough that we could do without. I'm not
sure yet.

> With that design, I would not have irretrievably lost reflogs to auto-gc
> nor dozens of hours trying to figure out how to have *any* auto-gc succeed
> again (because I need to reduce the horrible slowness incurred by too many
> loose objects). My interactive rebases would also not have started to
> print the auto-gc error after every friggin' pick.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-10 15:01                   ` Duy Nguyen
@ 2016-07-11  6:07                     ` Johannes Schindelin
  0 siblings, 0 replies; 27+ messages in thread
From: Johannes Schindelin @ 2016-07-11  6:07 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Josh Triplett, Junio C Hamano, Git Mailing List

Hi Duy,

On Sun, 10 Jul 2016, Duy Nguyen wrote:

> We will be able to see refs from all worktrees if we choose to.

What I tried to say is: even if we make it technically feasible to have
per-worktree refs or reflogs, the downsides are too prohibitive. We should
simply not introduce support for that (and resolve the per-worktree HEAD
issue accordingly).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-10 14:16                 ` Johannes Schindelin
  2016-07-10 15:01                   ` Duy Nguyen
@ 2016-07-11 18:52                   ` Junio C Hamano
  2016-07-12 10:47                     ` Johannes Schindelin
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2016-07-11 18:52 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Duy Nguyen, Josh Triplett, Git Mailing List

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> No, the point is, refs subsystem needs to know which refs is per-repo,
>> which is per-worktree. So far the rules are  "everything under refs,
>> except a few known exceptions, is per-repo" and "everything directly
>> under $GIT_DIR is per-worktree", which work fine. But if you allow to
>> move per-worktree to "refs" freely, then the "known exceptions" will
>> have to be updated every time a new per-worktree ref appears. It'll be
>> easier to modify the first rule as "everything under refs, except some
>> legacy exceptions and refs/worktree, is per-repo".
>
> Given the substantial pain and suffering I have due to per-worktree
> reflogs (and those are *just* HEAD's reflogs!), it appears to me that
> per-worktree refs would be a particularly poor feature.
>
> I agree that HEAD needs to be per-worktree, but already the fact that the
> HEADs of the worktrees, along with their reflogs, are *not* visible to
> all other worktrees causes substantial trouble.

Not so fast; it cuts both ways.

People who want multiple worktrees with branches checked out to work
in would want to do per-worktree things like bisection, which needs
tons more state than we'd be comfortable having directly under
$GIT_DIR (e.g. they may also want "git merge" or "git pull", which
would use MERGE_HEAD and FETCH_HEAD that are per-worktree and not
under refs/; "git bisect" would want to mark number of commits to
denote the perimeter of the area of the history being bisected and
they live refs/bisect/).

And when you are bisecting in the worktree dedicated for a topic,
it is a feature that your other worktrees do not need to see how
much history you narrowed down in that topic.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-11 18:52                   ` Junio C Hamano
@ 2016-07-12 10:47                     ` Johannes Schindelin
  2016-07-12 15:26                       ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2016-07-12 10:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Josh Triplett, Git Mailing List

Hi Junio,

On Mon, 11 Jul 2016, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> No, the point is, refs subsystem needs to know which refs is per-repo,
> >> which is per-worktree. So far the rules are  "everything under refs,
> >> except a few known exceptions, is per-repo" and "everything directly
> >> under $GIT_DIR is per-worktree", which work fine. But if you allow to
> >> move per-worktree to "refs" freely, then the "known exceptions" will
> >> have to be updated every time a new per-worktree ref appears. It'll be
> >> easier to modify the first rule as "everything under refs, except some
> >> legacy exceptions and refs/worktree, is per-repo".
> >
> > Given the substantial pain and suffering I have due to per-worktree
> > reflogs (and those are *just* HEAD's reflogs!), it appears to me that
> > per-worktree refs would be a particularly poor feature.
> >
> > I agree that HEAD needs to be per-worktree, but already the fact that the
> > HEADs of the worktrees, along with their reflogs, are *not* visible to
> > all other worktrees causes substantial trouble.
> 
> Not so fast; it cuts both ways.
> 
> People who want multiple worktrees with branches checked out to work
> in would want to do per-worktree things like bisection, which needs
> tons more state than we'd be comfortable having directly under
> $GIT_DIR (e.g. they may also want "git merge" or "git pull", which
> would use MERGE_HEAD and FETCH_HEAD that are per-worktree and not
> under refs/; "git bisect" would want to mark number of commits to
> denote the perimeter of the area of the history being bisected and
> they live refs/bisect/).

Sure, `git bisect` would need to realize that it is running in a worktree
separate from the original one and use a different ref.

> And when you are bisecting in the worktree dedicated for a topic,
> it is a feature that your other worktrees do not need to see how
> much history you narrowed down in that topic.

If you intentionally hide bisections from other worktrees, you will
invariably end up with the same problems I faced with auto-gc: in a
worktree, it is *much, much easier* to forget a bisect in progress.

In other words, your comments make me even more certain that per-worktree
refs are undesirable.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-12 10:47                     ` Johannes Schindelin
@ 2016-07-12 15:26                       ` Jeff King
  2016-07-12 15:46                         ` Duy Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2016-07-12 15:26 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Duy Nguyen, Josh Triplett, Git Mailing List

On Tue, Jul 12, 2016 at 12:47:06PM +0200, Johannes Schindelin wrote:

> > Not so fast; it cuts both ways.
> > 
> > People who want multiple worktrees with branches checked out to work
> > in would want to do per-worktree things like bisection, which needs
> > tons more state than we'd be comfortable having directly under
> > $GIT_DIR (e.g. they may also want "git merge" or "git pull", which
> > would use MERGE_HEAD and FETCH_HEAD that are per-worktree and not
> > under refs/; "git bisect" would want to mark number of commits to
> > denote the perimeter of the area of the history being bisected and
> > they live refs/bisect/).
> 
> Sure, `git bisect` would need to realize that it is running in a worktree
> separate from the original one and use a different ref.

I am mostly a bystander in all of these worktree discussions, but your
comment here makes a lot of sense to me. We currently have a global ref
namespace, but the current proposals seem to want to slice it on
invisible lines into per-worktree and global bits, where "refs/bisect"
is no longer a global name. But we could also retain a global view, and
just let worktrees carve out their own portion of the namespace
("refs/worktree/foo/bisect", or even organize it by application area,
"refs/bisect/foo/bad", etc).

Besides being conceptually simpler in the code (global reachability Just
Works, because you see all of the refs), it would also let you access
individual refs between worktrees if you wanted. So for example, if you
are bisecting in worktree "foo", you can access its results from another
worktree with "git show bisect/foo/bad".

Likewise for other per-worktree items. If we used refs/MERGE_HEAD and
refs/worktree/foo/MERGE_HEAD, then you could access them independently
by using the fully qualified names.

The only downside I see is that the existing names are sometimes
well-known. I wonder if we could simply add:

  refs/worktree/<your-worktree>/%s

to the dwim ref-lookup when a command is running in a worktree.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-12 15:26                       ` Jeff King
@ 2016-07-12 15:46                         ` Duy Nguyen
  2016-07-12 15:51                           ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2016-07-12 15:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, Junio C Hamano, Josh Triplett, Git Mailing List

On Tue, Jul 12, 2016 at 5:26 PM, Jeff King <peff@peff.net> wrote:
> Likewise for other per-worktree items. If we used refs/MERGE_HEAD and
> refs/worktree/foo/MERGE_HEAD, then you could access them independently
> by using the fully qualified names.

I'm not opposed to letting one worktree see everything, but this move
makes it harder to write new scripts (or new builtin commands, even)
that works with both single and multiple worktrees because you refer
to one ref (in current worktree perspective) differently. If we kill
of the main worktree (i.e. git init always creates a linked worktree)
then it's less of a problem, but still a nuisance to write
refs/worktree/$CURRENT/<something> everywhere.

> The only downside I see is that the existing names are sometimes
> well-known. I wonder if we could simply add:
>
>   refs/worktree/<your-worktree>/%s
>
> to the dwim ref-lookup when a command is running in a worktree.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-12 15:46                         ` Duy Nguyen
@ 2016-07-12 15:51                           ` Jeff King
  2016-07-12 16:13                             ` Duy Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2016-07-12 15:51 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Johannes Schindelin, Junio C Hamano, Josh Triplett, Git Mailing List

On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote:

> I'm not opposed to letting one worktree see everything, but this move
> makes it harder to write new scripts (or new builtin commands, even)
> that works with both single and multiple worktrees because you refer
> to one ref (in current worktree perspective) differently. If we kill
> of the main worktree (i.e. git init always creates a linked worktree)
> then it's less of a problem, but still a nuisance to write
> refs/worktree/$CURRENT/<something> everywhere.

True. I gave a suggestion for the reading side, but the writing side
would still remain tedious.

I wonder if, in a worktree, we could simply convert requests to read or
write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"?
That makes it a read/write-time alias conversion, but the actual storage
is just vanilla (so the ref storage doesn't need to care, and
reachability just works).

The trickiest thing, I think, is FETCH_HEAD, which is not really a
ref (because it may have a bunch of values, and contain extra
information).

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-12 15:51                           ` Jeff King
@ 2016-07-12 16:13                             ` Duy Nguyen
  2016-07-13  8:20                               ` Johannes Schindelin
  0 siblings, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2016-07-12 16:13 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, Junio C Hamano, Josh Triplett, Git Mailing List

On Tue, Jul 12, 2016 at 5:51 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote:
>
>> I'm not opposed to letting one worktree see everything, but this move
>> makes it harder to write new scripts (or new builtin commands, even)
>> that works with both single and multiple worktrees because you refer
>> to one ref (in current worktree perspective) differently. If we kill
>> of the main worktree (i.e. git init always creates a linked worktree)
>> then it's less of a problem, but still a nuisance to write
>> refs/worktree/$CURRENT/<something> everywhere.
>
> True. I gave a suggestion for the reading side, but the writing side
> would still remain tedious.
>
> I wonder if, in a worktree, we could simply convert requests to read or
> write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"?
> That makes it a read/write-time alias conversion, but the actual storage
> is just vanilla (so the ref storage doesn't need to care, and
> reachability just works).

A conversion like that is already happening, but it works at
git_path() level instead and maps anything outside refs/ to
worktrees/$CURRENT. Reorganizing all refs in a single ref storage is
probably possible, but...

> The trickiest thing, I think, is FETCH_HEAD, which is not really a
> ref (because it may have a bunch of values, and contain extra
> information).

Yeah.. I think David and Junio touched this when lmdb backend was
discussed, which resulted in leaving per-worktree refs to filesystem
backend even when shared refs are in lmdb. We probably can still make
it work, I think refs subsystem has special case for FETCH_HEAD
already. But 'refs' stuff is really not my area, I should stop writing
now before making too many wrong statements.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-12 16:13                             ` Duy Nguyen
@ 2016-07-13  8:20                               ` Johannes Schindelin
  2016-07-13 14:54                                 ` Duy Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2016-07-13  8:20 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, Junio C Hamano, Josh Triplett, Git Mailing List

Hi Duy,

On Tue, 12 Jul 2016, Duy Nguyen wrote:

> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King <peff@peff.net> wrote:
> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote:
> >
> >> I'm not opposed to letting one worktree see everything, but this move
> >> makes it harder to write new scripts (or new builtin commands, even)
> >> that works with both single and multiple worktrees because you refer
> >> to one ref (in current worktree perspective) differently. If we kill
> >> of the main worktree (i.e. git init always creates a linked worktree)
> >> then it's less of a problem, but still a nuisance to write
> >> refs/worktree/$CURRENT/<something> everywhere.
> >
> > True. I gave a suggestion for the reading side, but the writing side
> > would still remain tedious.
> >
> > I wonder if, in a worktree, we could simply convert requests to read or
> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"?
> > That makes it a read/write-time alias conversion, but the actual storage
> > is just vanilla (so the ref storage doesn't need to care, and
> > reachability just works).
> 
> A conversion like that is already happening, but it works at
> git_path() level instead and maps anything outside refs/ to
> worktrees/$CURRENT.

Wouldn't you agree that the entire discussion goes into a direction that
reveals that it might simply be a better idea to require commands that want
to have per-worktree refs to do that explicitly?

I mean, it looks to me that the harder we try to avoid that, the more
problems crop up, some of that as serious as my reported data loss.

I do not see any indication that trying even harder to "protect" commands
from knowing that they are running in one of many worktrees is making
things easier. To the contrary, I expect that direction to hold many more
awful surprises for us.

The same holds true for the config, BTW. I really have no love for the
idea to make the config per-worktree. It just holds too many nasty
opportunities for violate the Law of Least Surprises.

Just to name one: imagine you check out a different branch in worktree A,
then switch worktree B to the branch that A had, and all of a sudden you
may end up with a different upstream!

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-13  8:20                               ` Johannes Schindelin
@ 2016-07-13 14:54                                 ` Duy Nguyen
  2016-07-13 18:59                                   ` Johannes Schindelin
  0 siblings, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2016-07-13 14:54 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Junio C Hamano, Josh Triplett, Git Mailing List

On Wed, Jul 13, 2016 at 10:20 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Duy,
>
> On Tue, 12 Jul 2016, Duy Nguyen wrote:
>
>> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King <peff@peff.net> wrote:
>> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote:
>> >
>> >> I'm not opposed to letting one worktree see everything, but this move
>> >> makes it harder to write new scripts (or new builtin commands, even)
>> >> that works with both single and multiple worktrees because you refer
>> >> to one ref (in current worktree perspective) differently. If we kill
>> >> of the main worktree (i.e. git init always creates a linked worktree)
>> >> then it's less of a problem, but still a nuisance to write
>> >> refs/worktree/$CURRENT/<something> everywhere.
>> >
>> > True. I gave a suggestion for the reading side, but the writing side
>> > would still remain tedious.
>> >
>> > I wonder if, in a worktree, we could simply convert requests to read or
>> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"?
>> > That makes it a read/write-time alias conversion, but the actual storage
>> > is just vanilla (so the ref storage doesn't need to care, and
>> > reachability just works).
>>
>> A conversion like that is already happening, but it works at
>> git_path() level instead and maps anything outside refs/ to
>> worktrees/$CURRENT.
>
> Wouldn't you agree that the entire discussion goes into a direction that
> reveals that it might simply be a better idea to require commands that want
> to have per-worktree refs to do that explicitly?

No. To me that's equivalent to let people deal explicitly with
file-based and lmdb refs backends everywhere. Unless the main worktree
concept will die (I doubt it) it may remain the common use case that
people care about and extra worktrees become second citizen that's
rarely tested. I prefer we have a single interface for dealing with
_any_ worktree. If there are fallouts, we deal with them.

> The same holds true for the config, BTW. I really have no love for the
> idea to make the config per-worktree. It just holds too many nasty
> opportunities for violate the Law of Least Surprises.
>
> Just to name one: imagine you check out a different branch in worktree A,
> then switch worktree B to the branch that A had, and all of a sudden you
> may end up with a different upstream!

Everything in moderation. You wouldn't want to enable sparse checkout
on one worktree and it suddenly affects all worktrees because
core.sparsecheckout is shared. And submodules are known not to work
when core.worktree is still shared.

I will not enforce any rule (unless it's very obvious that the other
way is wrong, like core.worktree). I will give you a rifle and you can
either hunt for food or shoot your foot. In other words, you should be
able to share everything if you like it that way while someone else
can share just some config vars, or even nothing in config file.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-13 14:54                                 ` Duy Nguyen
@ 2016-07-13 18:59                                   ` Johannes Schindelin
  2016-07-15 15:46                                     ` Duy Nguyen
  0 siblings, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2016-07-13 18:59 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, Junio C Hamano, Josh Triplett, Git Mailing List

Hi Duy,

On Wed, 13 Jul 2016, Duy Nguyen wrote:

> On Wed, Jul 13, 2016 at 10:20 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Tue, 12 Jul 2016, Duy Nguyen wrote:
> >
> >> On Tue, Jul 12, 2016 at 5:51 PM, Jeff King <peff@peff.net> wrote:
> >> > On Tue, Jul 12, 2016 at 05:46:12PM +0200, Duy Nguyen wrote:
> >> >
> >> >> I'm not opposed to letting one worktree see everything, but this move
> >> >> makes it harder to write new scripts (or new builtin commands, even)
> >> >> that works with both single and multiple worktrees because you refer
> >> >> to one ref (in current worktree perspective) differently. If we kill
> >> >> of the main worktree (i.e. git init always creates a linked worktree)
> >> >> then it's less of a problem, but still a nuisance to write
> >> >> refs/worktree/$CURRENT/<something> everywhere.
> >> >
> >> > True. I gave a suggestion for the reading side, but the writing side
> >> > would still remain tedious.
> >> >
> >> > I wonder if, in a worktree, we could simply convert requests to read or
> >> > write names that do not begin with "refs/" as "refs/worktree/$CURRENT/"?
> >> > That makes it a read/write-time alias conversion, but the actual storage
> >> > is just vanilla (so the ref storage doesn't need to care, and
> >> > reachability just works).
> >>
> >> A conversion like that is already happening, but it works at
> >> git_path() level instead and maps anything outside refs/ to
> >> worktrees/$CURRENT.
> >
> > Wouldn't you agree that the entire discussion goes into a direction that
> > reveals that it might simply be a better idea to require commands that want
> > to have per-worktree refs to do that explicitly?
> 
> No.

Oh? So you do not see that we are already heading into serious trouble
ourselves?

> I prefer we have a single interface for dealing with _any_ worktree.

I agree. So far, I did not see an interface, though, but the idea that we
should somehow fake things so that there does not *have* to be an
interface.

> > The same holds true for the config, BTW. I really have no love for the
> > idea to make the config per-worktree. It just holds too many nasty
> > opportunities for violate the Law of Least Surprises.
> >
> > Just to name one: imagine you check out a different branch in worktree A,
> > then switch worktree B to the branch that A had, and all of a sudden you
> > may end up with a different upstream!
> 
> Everything in moderation. You wouldn't want to enable sparse checkout
> on one worktree and it suddenly affects all worktrees because
> core.sparsecheckout is shared. And submodules are known not to work
> when core.worktree is still shared.

We have precendence for config variables that are global but also allow
per-<something> overrides. Think e.g. the http.* variables.

The point is: this solution still uses *one* config per repo.

> I will not enforce any rule (unless it's very obvious that the other
> way is wrong, like core.worktree). I will give you a rifle and you can
> either hunt for food or shoot your foot. In other words, you should be
> able to share everything if you like it that way while someone else
> can share just some config vars, or even nothing in config file.

You gave me a rifle alright, and I shot into my foot (by losing objects to
auto-gc). I just did not expect it to be a rifle.

To keep with the analogy: let's not build arms, but a kick-ass tool. And I
seriously disagree that per-worktree refs, reflogs or config are part of
said tool.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: gc and repack ignore .git/*HEAD when checking reachability
  2016-07-13 18:59                                   ` Johannes Schindelin
@ 2016-07-15 15:46                                     ` Duy Nguyen
  0 siblings, 0 replies; 27+ messages in thread
From: Duy Nguyen @ 2016-07-15 15:46 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, Junio C Hamano, Josh Triplett, Git Mailing List

On Wed, Jul 13, 2016 at 8:59 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> No.
>
> Oh? So you do not see that we are already heading into serious trouble
> ourselves?

I do see problems, that have solutions. But I have not yet seen that
we are heading to a dead end.

>> I prefer we have a single interface for dealing with _any_ worktree.
>
> I agree. So far, I did not see an interface, though, but the idea that we
> should somehow fake things so that there does not *have* to be an
> interface.

Calling it "fake" is a bit too strong. I'd call it an abstraction. We
have always had the one-repo/one-worktree relationship, now we're
breaking that assumption. Some operations work at repo level and
require a "repo view" while others work at worktree level. We do it in
a way that a program designed with these separate views can still work
correctly in the traditional one-worktree-one-repo configuration,
where both views are the same.

 - For accessing data in $GIT_DIR, you do not peek directly into it
any more. You use "git rev-parse --git-path" to retrieve a path in
$GIT_DIR (instead of doing `git rev-parse --git-dir`/some/path), then
you can do something about it. This is the equivalent of git_path() at
C level.

 - We have a set of rules to define what part of $GIT_DIR is shared
and what is not. When you add new stuff and follow this rule, it will
work in single or multiple worktree config. So far pretty much every
unknown thing is per-worktree. $GIT_DIR/common will be the shared
place for external scripts, soon.

 - It's the same thing for refs: we may reserve a portion of it for
per-worktree, and the rest is shared.

 - We provide means for one worktree to access data of any other
worktree if needed (e.g. $GIT_COMMON_DIR, or the new ref storage API)

 - Because the majority of operations is per-worktree, that has been
the default view so far when working in multiple worktrees. Repo-level
operations such as git-gc, rev-list --all, fsck... need to "switch
view". I missed this and this seemed to cause big problem for you. My
only excuse is, this is an experimental feature.

The idea of single config file with separate "worktree namespace" (eg.
core.worktree vs worktree.<abc>.worktree) was shot down because it
would result in a lot of changes. And it's the same thing why we start
to split views this way (and make default view "per-worktree") instead
of rewriting every piece of code to deal with multiple worktrees
explicitly (by carrying the worktree id everywhere).

There are two loose ends that are on my mind, but I have not clear
ideas yet: the incorporation of ref namespace and odb alternates.
Imagine when you you submodules, all submodule repos (refs and odb)
can be stored in the top-level .git repo, separated by namespaces.
After all a repo is just a container, of a bunch of refs. This make it
much easier to peek into another submodule from a supermodule, and
makes it safer to "rm -r a-submodule" when you get mad at something.

I hope this shows that it is less of a fake thing.
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-07-15 15:47 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-08  2:59 gc and repack ignore .git/*HEAD when checking reachability Josh Triplett
2016-07-08  4:34 ` Junio C Hamano
2016-07-08  6:44   ` Josh Triplett
2016-07-08 17:14     ` Junio C Hamano
2016-07-08 19:25       ` Theodore Ts'o
2016-07-08 20:30         ` Junio C Hamano
2016-07-08 23:50           ` Theodore Ts'o
2016-07-09  5:23             ` Josh Triplett
2016-07-08 20:29       ` Josh Triplett
2016-07-09  7:35       ` Johannes Schindelin
2016-07-09 14:09         ` Josh Triplett
2016-07-09 16:45           ` Duy Nguyen
2016-07-10 10:59             ` Johannes Schindelin
2016-07-10 11:04               ` Duy Nguyen
2016-07-10 14:16                 ` Johannes Schindelin
2016-07-10 15:01                   ` Duy Nguyen
2016-07-11  6:07                     ` Johannes Schindelin
2016-07-11 18:52                   ` Junio C Hamano
2016-07-12 10:47                     ` Johannes Schindelin
2016-07-12 15:26                       ` Jeff King
2016-07-12 15:46                         ` Duy Nguyen
2016-07-12 15:51                           ` Jeff King
2016-07-12 16:13                             ` Duy Nguyen
2016-07-13  8:20                               ` Johannes Schindelin
2016-07-13 14:54                                 ` Duy Nguyen
2016-07-13 18:59                                   ` Johannes Schindelin
2016-07-15 15:46                                     ` Duy Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).