All of lore.kernel.org
 help / color / mirror / Atom feed
* git gc and worktrees
@ 2016-05-31  7:07 Johannes Sixt
  2016-05-31 12:02 ` Duy Nguyen
  2016-06-01 10:45 ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 25+ messages in thread
From: Johannes Sixt @ 2016-05-31  7:07 UTC (permalink / raw)
  To: Git Mailing List

Earlier this year I had a largish merge going on in a separate worktree. 
With a mix of staged resolutions and unmerged paths in the index, I ran 
'git gc' in the main worktree. This removed a lot of objects that were 
recorded in that separate worktree index. (I was able to recover them 
because the content was still on disk.)

Now I wanted to derive a test case that shows that breakage, but I am 
unable to. The objects recorded in a separate worktree index, but not 
committed, yet, are not pruned anymore.

Have there been any fixes in this regard recently? Or does this look 
like something else going on?

-- Hannes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: git gc and worktrees
  2016-05-31  7:07 git gc and worktrees Johannes Sixt
@ 2016-05-31 12:02 ` Duy Nguyen
  2016-05-31 22:14   ` Jeff King
  2016-06-01 10:45 ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 25+ messages in thread
From: Duy Nguyen @ 2016-05-31 12:02 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Git Mailing List, Jeff King

On Tue, May 31, 2016 at 2:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> Earlier this year I had a largish merge going on in a separate worktree.
> With a mix of staged resolutions and unmerged paths in the index, I ran 'git
> gc' in the main worktree. This removed a lot of objects that were recorded
> in that separate worktree index. (I was able to recover them because the
> content was still on disk.)
>
> Now I wanted to derive a test case that shows that breakage, but I am unable
> to. The objects recorded in a separate worktree index, but not committed,
> yet, are not pruned anymore.
>
> Have there been any fixes in this regard recently? Or does this look like
> something else going on?

Not sure. I vaguely recall Jeff touched this pruning issue once,
something about recent objects will not be pruned based on mtime. But
maybe some of those objects in the index are not so young, and because
I think we never check indexes in git-gc/git-prune, they have a chance
to be deleted.

No I'm wrong. mark_reachable_objects() which is used by git-prune,
does add objects from index and HEAD, which only covers those from
_current_ worktree. This should be fixed even if it is not the root
cause of your problem. I'll look into it.
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: git gc and worktrees
  2016-05-31 12:02 ` Duy Nguyen
@ 2016-05-31 22:14   ` Jeff King
  2016-06-01  7:00     ` Johannes Sixt
  2016-06-01  8:57     ` Michael Haggerty
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff King @ 2016-05-31 22:14 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Michael Haggerty, Johannes Sixt, Git Mailing List

On Tue, May 31, 2016 at 07:02:15PM +0700, Duy Nguyen wrote:

> On Tue, May 31, 2016 at 2:07 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> > Earlier this year I had a largish merge going on in a separate worktree.
> > With a mix of staged resolutions and unmerged paths in the index, I ran 'git
> > gc' in the main worktree. This removed a lot of objects that were recorded
> > in that separate worktree index. (I was able to recover them because the
> > content was still on disk.)
> >
> > Now I wanted to derive a test case that shows that breakage, but I am unable
> > to. The objects recorded in a separate worktree index, but not committed,
> > yet, are not pruned anymore.
> >
> > Have there been any fixes in this regard recently? Or does this look like
> > something else going on?
> 
> Not sure. I vaguely recall Jeff touched this pruning issue once,
> something about recent objects will not be pruned based on mtime. But
> maybe some of those objects in the index are not so young, and because
> I think we never check indexes in git-gc/git-prune, they have a chance
> to be deleted.
> 
> No I'm wrong. mark_reachable_objects() which is used by git-prune,
> does add objects from index and HEAD, which only covers those from
> _current_ worktree. This should be fixed even if it is not the root
> cause of your problem. I'll look into it.

Right, we use the index for reachability checks (both in "prune", but
also during a "repack -a", which uses the revision parser's
'--indexed-objects" option). That obviously should handle per-worktree
index files, but I don't know whether it currently does.

Michael (cc'd) noted to me off-list recently that there may be some
special cases there regarding reflogs in other worktrees (i.e., that we
don't always include them for our reachability checks). I don't know the
details, though.

The "recent object" stuff you're thinking of is that we will keep a
non-recent object X if it is referenced (directly or indirectly) by
another object Y that _is_ recent, but not otherwise reachable. That
should work independent of worktrees, because it finds the recent
objects directly in the object database, and then walks their graphs
(but it could indeed make writing tests a little trickier, as it may be
harder to convince some objects to get pruned in the first place).

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: git gc and worktrees
  2016-05-31 22:14   ` Jeff King
@ 2016-06-01  7:00     ` Johannes Sixt
  2016-06-01  8:57     ` Michael Haggerty
  1 sibling, 0 replies; 25+ messages in thread
From: Johannes Sixt @ 2016-06-01  7:00 UTC (permalink / raw)
  To: Jeff King, Duy Nguyen; +Cc: Michael Haggerty, Git Mailing List

Am 01.06.2016 um 00:14 schrieb Jeff King:
> Right, we use the index for reachability checks (both in "prune", but
> also during a "repack -a", which uses the revision parser's
> '--indexed-objects" option). That obviously should handle per-worktree
> index files, but I don't know whether it currently does.

Thanks. Here's a test case to make sure that --indexed-objects looks at
the indexes of separate worktrees. I'm not submitting a proper patch
because I don't feel like being able to fix the problem and I'm not
sure what the expected order of the listed objects is.

diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 3e752ce..dbd3679 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -96,6 +96,34 @@ test_expect_success 'rev-list can show index objects' '
 	test_cmp expect actual
 '
 
+test_expect_success '--indexed-objects in the index of a separate worktrees' '
+	# see also the comments in the previous case
+	cat >expect <<-\EOF &&
+	8e4020bb5a8d8c873b25de15933e75cc0fc275df one
+	d9d3a7417b9605cfd88ee6306b28dadc29e6ab08 only-in-index
+	2043d83f5a374f119437856d2f1773c936938876 only-in-wt-index
+	9200b628cf9dc883a85a7abc8d6e6730baee589c two
+	EOF
+	echo only-in-index >only-in-index &&
+	git add only-in-index &&
+
+	git worktree add -b side wt &&
+	test_when_finished "rm -r wt; git worktree prune" &&
+	(
+		cd wt &&
+		echo only-in-wt-index >only-in-wt-index &&
+		git add only-in-wt-index &&
+
+		# this must also include the objects of the main worktree
+		git rev-list --objects --indexed-objects >../actual
+	) &&
+	test_cmp expect actual &&
+
+	# same result when invoked from the main worktree
+	git rev-list --objects --indexed-objects >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success '--bisect and --first-parent can not be combined' '
 	test_must_fail git rev-list --bisect --first-parent HEAD
 '

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: git gc and worktrees
  2016-05-31 22:14   ` Jeff King
  2016-06-01  7:00     ` Johannes Sixt
@ 2016-06-01  8:57     ` Michael Haggerty
  2016-06-01 15:15       ` Junio C Hamano
  1 sibling, 1 reply; 25+ messages in thread
From: Michael Haggerty @ 2016-06-01  8:57 UTC (permalink / raw)
  To: Jeff King, Duy Nguyen; +Cc: Johannes Sixt, Git Mailing List, David Turner

On 06/01/2016 12:14 AM, Jeff King wrote:
> [...]
> Michael (cc'd) noted to me off-list recently that there may be some
> special cases there regarding reflogs in other worktrees (i.e., that we
> don't always include them for our reachability checks). I don't know the
> details, though.

That's correct. `for_each_reflog()` does its work by walking the
directory tree under `$GIT_DIR/logs`. So when run in the main
repository, it never looks at the per-worktree reflogs. Therefore,
objects that are only reachable from per-worktree reflogs can end up
getting pruned.

This is closely tangled up with my ref-iterators patch series (which is
why I noticed it), especially with

    [PATCH 13/13] for_each_reflog(): reimplement using iterators

[1].

I think reference stores are going to need two distinct types of
reference iteration: one to iterate over the *logical* reference space
of a single repo or worktree, and one to find all *local* references
and/or reachability roots (e.g., when run within a linked repo). The
current approach, where subtrees of the reference namespace are pasted
ad-hoc into each other partly by manhandling the ref_cache and partly
via the hack in `git_path()` is not, I think, a very good long-term design.

I'd prefer the elementary operation on a low-level reference store to be
iterating over all of the references that are stored locally, and to use
`merge_ref_iterator` for pasting together the parts of low-level
reference stores that are needed to form a logical view of the
references in a linked repo.

Michael

[1]
http://thread.gmane.org/gmane.comp.version-control.git/295860/focus=295866

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 0/4] Fix prune/gc problem with multiple worktrees
  2016-05-31  7:07 git gc and worktrees Johannes Sixt
  2016-05-31 12:02 ` Duy Nguyen
@ 2016-06-01 10:45 ` Nguyễn Thái Ngọc Duy
  2016-06-01 10:45   ` [PATCH 1/4] revision.c: move read_cache() out of add_index_objects_to_pending() Nguyễn Thái Ngọc Duy
                     ` (5 more replies)
  1 sibling, 6 replies; 25+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-01 10:45 UTC (permalink / raw)
  To: git
  Cc: Johannes Sixt, Jeff King, mhagger, dturner,
	Nguyễn Thái Ngọc Duy

This series makes sure that objects referenced by all worktrees are
marked reachable so that we don't accidentally delete objects that are
being used. Previously per-worktree references in index, detached HEAD
or per-worktree reflogs come from current worktree only, not all
worktrees.

The series deals with git-prune and git-gc specifically. I left out
"git rev-list". It shares the same problem because it will only
consider current worktree's HEAD, index and per-worktree reflogs. The
problem is I am not sure if we simply just change, say
--indexed-objects, to cover all indexes, or should we only do that
with "--all-worktrees --indexed-objects". I guess this is up for
discussion.

Nguyễn Thái Ngọc Duy (4):
  revision.c: move read_cache() out of add_index_objects_to_pending()
  reachable.c: mark reachable objects in index from all worktrees
  reachable.c: mark reachable detached HEAD from all worktrees
  reachable.c: make reachable reflogs for all per-worktree reflogs

 reachable.c      | 47 +++++++++++++++++++++++++++++++++++++++++------
 revision.c       | 34 +++++++++++++++++++++++++++-------
 revision.h       |  7 ++++++-
 t/t5304-prune.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 114 insertions(+), 14 deletions(-)

-- 
2.8.2.524.g6ff3d78

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] revision.c: move read_cache() out of add_index_objects_to_pending()
  2016-06-01 10:45 ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Nguyễn Thái Ngọc Duy
@ 2016-06-01 10:45   ` Nguyễn Thái Ngọc Duy
  2016-06-01 10:45   ` [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees Nguyễn Thái Ngọc Duy
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-01 10:45 UTC (permalink / raw)
  To: git
  Cc: Johannes Sixt, Jeff King, mhagger, dturner,
	Nguyễn Thái Ngọc Duy

A library function like this should not change global state like
the_index. Granted, read_cache() will not re-read from $GIT_DIR/index
and lose all in-core changes if the index is dirty (i.e. no bad side
effects). But that sort of detail should not be relied on here.

Make the function take the index object from outside instead. The
caller will be responsible for reading index files if necessary.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 reachable.c |  3 ++-
 revision.c  | 15 ++++++++-------
 revision.h  |  4 +++-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/reachable.c b/reachable.c
index d0199ca..15dbe60 100644
--- a/reachable.c
+++ b/reachable.c
@@ -170,7 +170,8 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	revs->tree_objects = 1;
 
 	/* Add all refs from the index file */
-	add_index_objects_to_pending(revs, 0);
+	read_cache();
+	add_index_objects_to_pending(revs, 0, &the_index);
 
 	/* Add all external refs */
 	for_each_ref(add_one_ref, revs);
diff --git a/revision.c b/revision.c
index d30d1c4..bbb6ff1 100644
--- a/revision.c
+++ b/revision.c
@@ -1275,13 +1275,13 @@ static void add_cache_tree(struct cache_tree *it, struct rev_info *revs,
 
 }
 
-void add_index_objects_to_pending(struct rev_info *revs, unsigned flags)
+void add_index_objects_to_pending(struct rev_info *revs, unsigned flags,
+				  const struct index_state *istate)
 {
 	int i;
 
-	read_cache();
-	for (i = 0; i < active_nr; i++) {
-		struct cache_entry *ce = active_cache[i];
+	for (i = 0; i < istate->cache_nr; i++) {
+		const struct cache_entry *ce = istate->cache[i];
 		struct blob *blob;
 
 		if (S_ISGITLINK(ce->ce_mode))
@@ -1294,9 +1294,9 @@ void add_index_objects_to_pending(struct rev_info *revs, unsigned flags)
 					     ce->ce_mode, ce->name);
 	}
 
-	if (active_cache_tree) {
+	if (istate->cache_tree) {
 		struct strbuf path = STRBUF_INIT;
-		add_cache_tree(active_cache_tree, revs, &path);
+		add_cache_tree(istate->cache_tree, revs, &path);
 		strbuf_release(&path);
 	}
 }
@@ -2106,7 +2106,8 @@ static int handle_revision_pseudo_opt(const char *submodule,
 	} else if (!strcmp(arg, "--reflog")) {
 		add_reflogs_to_pending(revs, *flags);
 	} else if (!strcmp(arg, "--indexed-objects")) {
-		add_index_objects_to_pending(revs, *flags);
+		read_cache();
+		add_index_objects_to_pending(revs, *flags, &the_index);
 	} else if (!strcmp(arg, "--not")) {
 		*flags ^= UNINTERESTING | BOTTOM;
 	} else if (!strcmp(arg, "--no-walk")) {
diff --git a/revision.h b/revision.h
index 9fac1a6..d06d098 100644
--- a/revision.h
+++ b/revision.h
@@ -29,6 +29,7 @@ struct rev_info;
 struct log_info;
 struct string_list;
 struct saved_parents;
+struct index_state;
 
 struct rev_cmdline_info {
 	unsigned int nr;
@@ -271,7 +272,8 @@ extern void add_pending_sha1(struct rev_info *revs,
 
 extern void add_head_to_pending(struct rev_info *);
 extern void add_reflogs_to_pending(struct rev_info *, unsigned int flags);
-extern void add_index_objects_to_pending(struct rev_info *, unsigned int flags);
+extern void add_index_objects_to_pending(struct rev_info *, unsigned int flags,
+					 const struct index_state *);
 
 enum commit_action {
 	commit_ignore,
-- 
2.8.2.524.g6ff3d78

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees
  2016-06-01 10:45 ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Nguyễn Thái Ngọc Duy
  2016-06-01 10:45   ` [PATCH 1/4] revision.c: move read_cache() out of add_index_objects_to_pending() Nguyễn Thái Ngọc Duy
@ 2016-06-01 10:45   ` Nguyễn Thái Ngọc Duy
  2016-06-01 18:13     ` Eric Sunshine
  2016-06-01 18:57     ` David Turner
  2016-06-01 10:45   ` [PATCH 3/4] reachable.c: mark reachable detached HEAD " Nguyễn Thái Ngọc Duy
                     ` (3 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-01 10:45 UTC (permalink / raw)
  To: git
  Cc: Johannes Sixt, Jeff King, mhagger, dturner,
	Nguyễn Thái Ngọc Duy

Current mark_reachable_objects() only marks objects from index from
_current_ worktree as reachable instead of all worktrees. Because this
function is used for pruning, there is a chance that objects referenced
by other worktrees may be deleted. Fix that.

Small behavior change in "one worktree" case, the index is read again
from file. In the current implementation, if the_index is already
loaded, the index file will not be read from file again. This adds some
more cost to this operation, hopefully insignificant because
reachability test is usually very expensive already.

Reported-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 reachable.c      | 33 +++++++++++++++++++++++++++++----
 t/t5304-prune.sh |  9 +++++++++
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/reachable.c b/reachable.c
index 15dbe60..8f67242 100644
--- a/reachable.c
+++ b/reachable.c
@@ -9,6 +9,7 @@
 #include "cache-tree.h"
 #include "progress.h"
 #include "list-objects.h"
+#include "worktree.h"
 
 struct connectivity_progress {
 	struct progress *progress;
@@ -155,6 +156,32 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
 				      FOR_EACH_OBJECT_LOCAL_ONLY);
 }
 
+static void add_objects_from_worktree(struct rev_info *revs)
+{
+	struct worktree **worktrees, **p;
+
+	worktrees = get_worktrees();
+	for (p = worktrees; *p; p++) {
+		struct worktree *wt = *p;
+		struct index_state istate;
+
+		memset(&istate, 0, sizeof(istate));
+		if (read_index_from(&istate,
+				    worktree_git_path(wt, "index")) > 0)
+			add_index_objects_to_pending(revs, 0, &istate);
+		discard_index(&istate);
+	}
+	free_worktrees(worktrees);
+
+	/*
+	 * this is in case the index is already updated but not
+	 * written down in file yet, then add_index_... in the above
+	 * loop will miss new objects that are just created or
+	 * referenced.
+	 */
+	add_index_objects_to_pending(revs, 0, &the_index);
+}
+
 void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 			    unsigned long mark_recent,
 			    struct progress *progress)
@@ -169,10 +196,6 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	revs->blob_objects = 1;
 	revs->tree_objects = 1;
 
-	/* Add all refs from the index file */
-	read_cache();
-	add_index_objects_to_pending(revs, 0, &the_index);
-
 	/* Add all external refs */
 	for_each_ref(add_one_ref, revs);
 
@@ -183,6 +206,8 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	if (mark_reflog)
 		add_reflogs_to_pending(revs, 0);
 
+	add_objects_from_worktree(revs);
+
 	cp.progress = progress;
 	cp.count = 0;
 
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 133b584..cba45c7 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -283,4 +283,13 @@ test_expect_success 'prune: handle alternate object database' '
 	git -C B prune
 '
 
+test_expect_success 'prune: handle index in multiple worktrees' '
+	git worktree add second-worktree &&
+	echo "new blob for second-worktree" >second-worktree/blob &&
+	git -C second-worktree add blob &&
+	git prune --expire=now &&
+	git -C second-worktree show :blob >actual &&
+	test_cmp second-worktree/blob actual
+'
+
 test_done
-- 
2.8.2.524.g6ff3d78

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] reachable.c: mark reachable detached HEAD from all worktrees
  2016-06-01 10:45 ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Nguyễn Thái Ngọc Duy
  2016-06-01 10:45   ` [PATCH 1/4] revision.c: move read_cache() out of add_index_objects_to_pending() Nguyễn Thái Ngọc Duy
  2016-06-01 10:45   ` [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees Nguyễn Thái Ngọc Duy
@ 2016-06-01 10:45   ` Nguyễn Thái Ngọc Duy
  2016-06-01 10:45   ` [PATCH 4/4] reachable.c: make reachable reflogs for all per-worktree reflogs Nguyễn Thái Ngọc Duy
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-01 10:45 UTC (permalink / raw)
  To: git
  Cc: Johannes Sixt, Jeff King, mhagger, dturner,
	Nguyễn Thái Ngọc Duy

Same reason as the previous commit, this prevents potential object
deletion that other worktrees need.

There is a slight change in behavior. The current implementation always
adds HEAD. The new one only adds _detached_ HEAD. But that's fine because
all normal refs should be already covered by for_each_ref() in
mark_reachable_objects().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 reachable.c      | 14 +++++++++++---
 t/t5304-prune.sh | 12 ++++++++++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/reachable.c b/reachable.c
index 8f67242..e5f9170 100644
--- a/reachable.c
+++ b/reachable.c
@@ -170,6 +170,13 @@ static void add_objects_from_worktree(struct rev_info *revs)
 				    worktree_git_path(wt, "index")) > 0)
 			add_index_objects_to_pending(revs, 0, &istate);
 		discard_index(&istate);
+
+		if (wt->is_detached) {
+			struct object *o;
+			o = parse_object_or_die(wt->head_sha1, "HEAD");
+			add_pending_object(revs, o, "");
+		}
+
 	}
 	free_worktrees(worktrees);
 
@@ -199,13 +206,14 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	/* Add all external refs */
 	for_each_ref(add_one_ref, revs);
 
-	/* detached HEAD is not included in the list above */
-	head_ref(add_one_ref, revs);
-
 	/* Add all reflog info */
 	if (mark_reflog)
 		add_reflogs_to_pending(revs, 0);
 
+	/*
+	 * Add all objects from the in-core index file and detached
+	 * HEAD which is not included in the list above
+	 */
 	add_objects_from_worktree(revs);
 
 	cp.progress = progress;
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index cba45c7..683bdb0 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -292,4 +292,16 @@ test_expect_success 'prune: handle index in multiple worktrees' '
 	test_cmp second-worktree/blob actual
 '
 
+test_expect_success 'prune: handle HEAD in multiple worktrees' '
+	git worktree add --detach third-worktree &&
+	echo "new blob for third-worktree" >third-worktree/blob &&
+	git -C third-worktree add blob &&
+	git -C third-worktree commit -m "third" &&
+	rm .git/worktrees/third-worktree/index &&
+	test_must_fail git -C third-worktree show :blob &&
+	git prune --expire=now &&
+	git -C third-worktree show HEAD:blob >actual &&
+	test_cmp third-worktree/blob actual
+'
+
 test_done
-- 
2.8.2.524.g6ff3d78

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] reachable.c: make reachable reflogs for all per-worktree reflogs
  2016-06-01 10:45 ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Nguyễn Thái Ngọc Duy
                     ` (2 preceding siblings ...)
  2016-06-01 10:45   ` [PATCH 3/4] reachable.c: mark reachable detached HEAD " Nguyễn Thái Ngọc Duy
@ 2016-06-01 10:45   ` Nguyễn Thái Ngọc Duy
  2016-06-01 15:51     ` Michael Haggerty
  2016-06-01 16:01   ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Jeff King
  2016-06-01 16:06   ` Junio C Hamano
  5 siblings, 1 reply; 25+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-01 10:45 UTC (permalink / raw)
  To: git
  Cc: Johannes Sixt, Jeff King, mhagger, dturner,
	Nguyễn Thái Ngọc Duy

Same reason as the previous commit, this is to avoid deleting objects
being referenced by per-worktree reflogs from other worktrees.
"logs/HEAD" is most important. "logs/refs/bisect" should not live long
enough to matter, but let's add it too for safety.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 reachable.c      |  7 ++++---
 revision.c       | 19 +++++++++++++++++++
 revision.h       |  3 +++
 t/t5304-prune.sh | 19 +++++++++++++++++++
 4 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/reachable.c b/reachable.c
index e5f9170..73915e0 100644
--- a/reachable.c
+++ b/reachable.c
@@ -156,7 +156,7 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
 				      FOR_EACH_OBJECT_LOCAL_ONLY);
 }
 
-static void add_objects_from_worktree(struct rev_info *revs)
+static void add_objects_from_worktree(struct rev_info *revs, int mark_reflog)
 {
 	struct worktree **worktrees, **p;
 
@@ -176,7 +176,8 @@ static void add_objects_from_worktree(struct rev_info *revs)
 			o = parse_object_or_die(wt->head_sha1, "HEAD");
 			add_pending_object(revs, o, "");
 		}
-
+		if (mark_reflog)
+			add_worktree_reflogs_to_pending(revs, 0, wt);
 	}
 	free_worktrees(worktrees);
 
@@ -214,7 +215,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	 * Add all objects from the in-core index file and detached
 	 * HEAD which is not included in the list above
 	 */
-	add_objects_from_worktree(revs);
+	add_objects_from_worktree(revs, mark_reflog);
 
 	cp.progress = progress;
 	cp.count = 0;
diff --git a/revision.c b/revision.c
index bbb6ff1..6a197a4 100644
--- a/revision.c
+++ b/revision.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "cache-tree.h"
 #include "bisect.h"
+#include "worktree.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -1245,6 +1246,24 @@ static int handle_one_reflog(const char *path, const struct object_id *oid,
 	return 0;
 }
 
+void add_worktree_reflogs_to_pending(struct rev_info *revs, unsigned flags,
+				     struct worktree *wt)
+{
+	struct all_refs_cb cb;
+	char *path;
+
+	cb.all_revs = revs;
+	cb.all_flags = flags;
+	path = xstrdup(worktree_git_path(wt, "logs/HEAD"));
+	if (file_exists(path))
+		handle_one_reflog(path, NULL, 0, &cb);
+	free(path);
+	path = xstrdup(worktree_git_path(wt, "logs/refs/bisect"));
+	if (file_exists(path))
+		handle_one_reflog(path, NULL, 0, &cb);
+	free(path);
+}
+
 void add_reflogs_to_pending(struct rev_info *revs, unsigned flags)
 {
 	struct all_refs_cb cb;
diff --git a/revision.h b/revision.h
index d06d098..9f3f148 100644
--- a/revision.h
+++ b/revision.h
@@ -30,6 +30,7 @@ struct log_info;
 struct string_list;
 struct saved_parents;
 struct index_state;
+struct worktree;
 
 struct rev_cmdline_info {
 	unsigned int nr;
@@ -272,6 +273,8 @@ extern void add_pending_sha1(struct rev_info *revs,
 
 extern void add_head_to_pending(struct rev_info *);
 extern void add_reflogs_to_pending(struct rev_info *, unsigned int flags);
+extern void add_worktree_reflogs_to_pending(struct rev_info *, unsigned int flags,
+					    struct worktree *);
 extern void add_index_objects_to_pending(struct rev_info *, unsigned int flags,
 					 const struct index_state *);
 
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 683bdb0..6b1c456 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -304,4 +304,23 @@ test_expect_success 'prune: handle HEAD in multiple worktrees' '
 	test_cmp third-worktree/blob actual
 '
 
+test_expect_success 'prune: handle HEAD reflog in multiple worktrees' '
+	(
+		cd third-worktree &&
+		git config core.logAllRefUpdates true &&
+		echo "HEAD{1} blob for third-worktree" >blob &&
+		git add blob &&
+		git commit -m "second commit in third" &&
+		cp blob expected &&
+		echo "HEAD{0} blob for third-worktree" >blob &&
+		git add blob &&
+		git commit -m "third commit in third" &&
+		git show HEAD@{1}:blob >actual &&
+		test_cmp expected actual
+	) &&
+	git prune --expire=now &&
+	git -C third-worktree show HEAD@{1}:blob >actual &&
+	test_cmp third-worktree/expected actual
+'
+
 test_done
-- 
2.8.2.524.g6ff3d78

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: git gc and worktrees
  2016-06-01  8:57     ` Michael Haggerty
@ 2016-06-01 15:15       ` Junio C Hamano
  2016-06-01 16:12         ` Michael Haggerty
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-06-01 15:15 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jeff King, Duy Nguyen, Johannes Sixt, Git Mailing List, David Turner

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I think reference stores are going to need two distinct types of
> reference iteration: one to iterate over the *logical* reference space
> of a single repo or worktree, and one to find all *local* references
> and/or reachability roots (e.g., when run within a linked repo).

This is hard reason about without defining *logical* and *local*.

For protecting necessary objects from pruning by anchoring them as
"reachable", and for avoiding unnecessary object transfer by showing
things via ".have" lines, we're better consider references and
reflogs that do not appear in the namespace visible from the current
worktree.  I wanted to come up with an example of wanting to iterate
over refs of the current worktree, but other than "what are my
branches including HEAD?" I didn't think of anything interesting.

> The current approach, where subtrees of the reference namespace
> are pasted ad-hoc into each other partly by manhandling the
> ref_cache and partly via the hack in `git_path()` is not, I think,
> a very good long-term design.

This I agree with.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] reachable.c: make reachable reflogs for all per-worktree reflogs
  2016-06-01 10:45   ` [PATCH 4/4] reachable.c: make reachable reflogs for all per-worktree reflogs Nguyễn Thái Ngọc Duy
@ 2016-06-01 15:51     ` Michael Haggerty
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Haggerty @ 2016-06-01 15:51 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git
  Cc: Johannes Sixt, Jeff King, dturner

On 06/01/2016 12:45 PM, Nguyễn Thái Ngọc Duy wrote:
> [...]
> +void add_worktree_reflogs_to_pending(struct rev_info *revs, unsigned flags,
> +				     struct worktree *wt)
> +{
> +	struct all_refs_cb cb;
> +	char *path;
> +
> +	cb.all_revs = revs;
> +	cb.all_flags = flags;
> +	path = xstrdup(worktree_git_path(wt, "logs/HEAD"));
> +	if (file_exists(path))
> +		handle_one_reflog(path, NULL, 0, &cb);
> +	free(path);
> +	path = xstrdup(worktree_git_path(wt, "logs/refs/bisect"));
> +	if (file_exists(path))
> +		handle_one_reflog(path, NULL, 0, &cb);
> +	free(path);
> +}

`refs/bisect` is not a single reference. It is a namespace that contains
references with names like `refs/bisect/bad` and
`refs/bisect/good-66106691a1b71e445fe5e4d6b8b043dffc7dfe4c`.

> [...]

Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Fix prune/gc problem with multiple worktrees
  2016-06-01 10:45 ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Nguyễn Thái Ngọc Duy
                     ` (3 preceding siblings ...)
  2016-06-01 10:45   ` [PATCH 4/4] reachable.c: make reachable reflogs for all per-worktree reflogs Nguyễn Thái Ngọc Duy
@ 2016-06-01 16:01   ` Jeff King
  2016-06-01 16:06   ` Junio C Hamano
  5 siblings, 0 replies; 25+ messages in thread
From: Jeff King @ 2016-06-01 16:01 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Johannes Sixt, mhagger, dturner

On Wed, Jun 01, 2016 at 05:45:15PM +0700, Nguyễn Thái Ngọc Duy wrote:

> This series makes sure that objects referenced by all worktrees are
> marked reachable so that we don't accidentally delete objects that are
> being used. Previously per-worktree references in index, detached HEAD
> or per-worktree reflogs come from current worktree only, not all
> worktrees.
> 
> The series deals with git-prune and git-gc specifically. I left out
> "git rev-list". It shares the same problem because it will only
> consider current worktree's HEAD, index and per-worktree reflogs. The
> problem is I am not sure if we simply just change, say
> --indexed-objects, to cover all indexes, or should we only do that
> with "--all-worktrees --indexed-objects". I guess this is up for
> discussion.

I don't think touching reachable.c is enough. You also need to make sure
that calling "git pack-objects --indexed-objects" will get them, too
(otherwise they will be either ejected loose or dropped completely
when --unpack-unreachable=2.weeks.ago or similar is used).

That code path uses rev-list internally. So I think you need something
like "--all-worktrees --indexed-objects", and you need to pass the new
option in from git-repack to pack-objects (or you need to simply make
"--indexed-objects" cover all worktrees by default).

-Peff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Fix prune/gc problem with multiple worktrees
  2016-06-01 10:45 ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Nguyễn Thái Ngọc Duy
                     ` (4 preceding siblings ...)
  2016-06-01 16:01   ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Jeff King
@ 2016-06-01 16:06   ` Junio C Hamano
  2016-06-02  9:53     ` Duy Nguyen
  5 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-06-01 16:06 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Johannes Sixt, Jeff King, mhagger, dturner

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This series makes sure that objects referenced by all worktrees are
> marked reachable so that we don't accidentally delete objects that are
> being used. Previously per-worktree references in index, detached HEAD
> or per-worktree reflogs come from current worktree only, not all
> worktrees.

I'll let this topic simmer on the list for now, instead of picking
it up immediately to 'pu', as Michael in $gmane/296068 makes me
wonder if we want to keep piling on the current "worktree ref
iterations are bolted on" or if we want to first clean it up, whose
natural fallout hopefully would eliminate the bug away.

Thanks.

> The series deals with git-prune and git-gc specifically. I left out
> "git rev-list". It shares the same problem because it will only
> consider current worktree's HEAD, index and per-worktree reflogs. The
> problem is I am not sure if we simply just change, say
> --indexed-objects, to cover all indexes, or should we only do that
> with "--all-worktrees --indexed-objects". I guess this is up for
> discussion.
>
> Nguyễn Thái Ngọc Duy (4):
>   revision.c: move read_cache() out of add_index_objects_to_pending()
>   reachable.c: mark reachable objects in index from all worktrees
>   reachable.c: mark reachable detached HEAD from all worktrees
>   reachable.c: make reachable reflogs for all per-worktree reflogs
>
>  reachable.c      | 47 +++++++++++++++++++++++++++++++++++++++++------
>  revision.c       | 34 +++++++++++++++++++++++++++-------
>  revision.h       |  7 ++++++-
>  t/t5304-prune.sh | 40 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 114 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: git gc and worktrees
  2016-06-01 15:15       ` Junio C Hamano
@ 2016-06-01 16:12         ` Michael Haggerty
  2016-06-01 19:39           ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Haggerty @ 2016-06-01 16:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Duy Nguyen, Johannes Sixt, Git Mailing List, David Turner

On 06/01/2016 05:15 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> I think reference stores are going to need two distinct types of
>> reference iteration: one to iterate over the *logical* reference space
>> of a single repo or worktree, and one to find all *local* references
>> and/or reachability roots (e.g., when run within a linked repo).
> 
> This is hard reason about without defining *logical* and *local*.

By "logical" I mean the references that appear to be available in the
specified repository. Roughly, "what reference names can I pass to `git
rev-parse`?" If you are in a linked repo, your "logical" references are
most of those in the main repo plus the "HEAD" and "refs/bisect/*" and
probably a couple other pseudo-refs that are stored in your worktree.

"Local" is probably a bad name. What I mean is that we have multiple
physical places to store references. In particular, in the case of
linked repos we have the main repo's references plus we have the
worktree's local references.

Most Git commands care only about the "logical" references in the repo.
But some commands (mostly having to do with reachability) need to know
about all roots, and therefore have to know about the union of all of
the references stored in each of the linked repositories.

Note that in the second case it is a little silly even talking about
"references", because references in different linked repos might have
the same names. We're really talking about reachability roots that might
have a little metadata connected with them, like "this SHA-1 came from
reference 'HEAD' in the 'test' worktree" or "this SHA-1 came from the
reflog of 'refs/heads/master' in the main repo".

I argue that the fundamental concept in terms of the implementation
should be the individual physical reference stores, and these should be
compounded together to form the logical reference collections and the
sets of reachability roots that are interesting at the UI level.

> For protecting necessary objects from pruning by anchoring them as
> "reachable", and for avoiding unnecessary object transfer by showing
> things via ".have" lines, we're better consider references and
> reflogs that do not appear in the namespace visible from the current
> worktree.  I wanted to come up with an example of wanting to iterate
> over refs of the current worktree, but other than "what are my
> branches including HEAD?" I didn't think of anything interesting.

It doesn't show up much at the UI level, but there are some examples, like

* git log --decorate
* git for-each-ref --contains
* gitk --all

> [...]

Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees
  2016-06-01 10:45   ` [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees Nguyễn Thái Ngọc Duy
@ 2016-06-01 18:13     ` Eric Sunshine
  2016-06-02  9:35       ` Duy Nguyen
  2016-06-01 18:57     ` David Turner
  1 sibling, 1 reply; 25+ messages in thread
From: Eric Sunshine @ 2016-06-01 18:13 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Git List, Johannes Sixt, Jeff King, Michael Haggerty, David Turner

On Wed, Jun 1, 2016 at 6:45 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> Current mark_reachable_objects() only marks objects from index from
> _current_ worktree as reachable instead of all worktrees. Because this
> function is used for pruning, there is a chance that objects referenced
> by other worktrees may be deleted. Fix that.
>
> Small behavior change in "one worktree" case, the index is read again
> from file. In the current implementation, if the_index is already
> loaded, the index file will not be read from file again. This adds some
> more cost to this operation, hopefully insignificant because
> reachability test is usually very expensive already.

Could this extra index read be avoided by taking advantage of 'struct
worktree::is_current'[1] and passing the already-loaded index to
add_index_objects_to_pending() if true?

Or, am I misunderstanding the issue?

[1]: http://article.gmane.org/gmane.comp.version-control.git/292194

> Reported-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> diff --git a/reachable.c b/reachable.c
> @@ -155,6 +156,32 @@ int add_unseen_recent_objects_to_traversal(struct > +static void add_objects_from_worktree(struct rev_info *revs)
> +{
> +       struct worktree **worktrees, **p;
> +
> +       worktrees = get_worktrees();
> +       for (p = worktrees; *p; p++) {
> +               struct worktree *wt = *p;
> +               struct index_state istate;
> +
> +               memset(&istate, 0, sizeof(istate));
> +               if (read_index_from(&istate,
> +                                   worktree_git_path(wt, "index")) > 0)
> +                       add_index_objects_to_pending(revs, 0, &istate);
> +               discard_index(&istate);
> +       }
> +       free_worktrees(worktrees);
> +
> +       /*
> +        * this is in case the index is already updated but not
> +        * written down in file yet, then add_index_... in the above
> +        * loop will miss new objects that are just created or
> +        * referenced.
> +        */
> +       add_index_objects_to_pending(revs, 0, &the_index);
> +}

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees
  2016-06-01 10:45   ` [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees Nguyễn Thái Ngọc Duy
  2016-06-01 18:13     ` Eric Sunshine
@ 2016-06-01 18:57     ` David Turner
  2016-06-02  9:37       ` Duy Nguyen
  1 sibling, 1 reply; 25+ messages in thread
From: David Turner @ 2016-06-01 18:57 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git
  Cc: Johannes Sixt, Jeff King, mhagger

On Wed, 2016-06-01 at 17:45 +0700, Nguyễn Thái Ngọc Duy wrote:
> Current mark_reachable_objects() only marks objects from index from
> _current_ worktree as reachable instead of all worktrees. Because
> this
> function is used for pruning, there is a chance that objects
> referenced
> by other worktrees may be deleted. Fix that.
> 
> Small behavior change in "one worktree" case, the index is read again
> from file. In the current implementation, if the_index is already
> loaded, the index file will not be read from file again. This adds
> some
> more cost to this operation, hopefully insignificant because
> reachability test is usually very expensive already.
> 
> Reported-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  reachable.c      | 33 +++++++++++++++++++++++++++++----
>  t/t5304-prune.sh |  9 +++++++++
>  2 files changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/reachable.c b/reachable.c
> index 15dbe60..8f67242 100644
> --- a/reachable.c
> +++ b/reachable.c
> @@ -9,6 +9,7 @@
>  #include "cache-tree.h"
>  #include "progress.h"
>  #include "list-objects.h"
> +#include "worktree.h"
>  
>  struct connectivity_progress {
>  	struct progress *progress;
> @@ -155,6 +156,32 @@ int
> add_unseen_recent_objects_to_traversal(struct rev_info *revs,
>  				      FOR_EACH_OBJECT_LOCAL_ONLY);
>  }
>  
> +static void add_objects_from_worktree(struct rev_info *revs)
> +{
> +	struct worktree **worktrees, **p;
> +
> +	worktrees = get_worktrees();
> +	for (p = worktrees; *p; p++) {
> +		struct worktree *wt = *p;
> +		struct index_state istate;
> +
> +		memset(&istate, 0, sizeof(istate));


Why not just struct index_state istate = {0}; ?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: git gc and worktrees
  2016-06-01 16:12         ` Michael Haggerty
@ 2016-06-01 19:39           ` Junio C Hamano
  2016-06-02  4:08             ` Michael Haggerty
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2016-06-01 19:39 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jeff King, Duy Nguyen, Johannes Sixt, Git Mailing List, David Turner

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I argue that the fundamental concept in terms of the implementation
> should be the individual physical reference stores, and these should be
> compounded together to form the logical reference collections and the
> sets of reachability roots that are interesting at the UI level.

That is very good in principle.  How does that principle translate
to the current setup (with possible enhancement with pluggable ref
backends) and multiple worktrees?  Let me try thinking it through
aloud.

 * Without pluggable ref backend or worktrees, we start from two
   "physical reference stores"; packed-refs file lists refs that
   will be covered (overridden) by loose refs in .git/refs/.
   Symbolic refs always being in loose falls out as a natural
   consequence that packed-refs file does not record symrefs.

 * Throw in multiple worktrees to the mix.  How?  Do we consider
   selected refs/ hierarchies (like refs/bisect/*) as separate
   physical store (even though it might be backed by the files in
   the same .git/refs/ filesystem hierarchy) and represent the
   "logical" view as an overlay across the traditional two types of
   physical reference stores?  That is:

   - loose refs in .git/HEAD, .git/refs/{bisect,...} for
     per-worktree part form one physical store.  If a ref is found
     here, that is what we use as a part of the logical view.

   - loose refs in .git/refs/{branches,tags,notes,...} for common
     part form one physical store.  For a ref that is not found
     above but is found here becomes a part of the logical view.

   - packed refs in .git/packed-refs is another physical store.  For
     a ref that is not found in the above two but is found here
     becomes a part of the logical view.

Up to this point, I am all for your "separate physical stores are
composited to give a logical view".  I can see how multi-worktree
world view fits within that framework.

 * With pluggable ref backend, we may gain yet another "physical
   reference store" possibility, e.g. one backed by lmdb.  If it
   supports symrefs, a repoitory may use lmdb backed reference store
   without the traditional two.

   But it is unclear how it would interact with the multi-worktree
   world order.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: git gc and worktrees
  2016-06-01 19:39           ` Junio C Hamano
@ 2016-06-02  4:08             ` Michael Haggerty
  2016-06-03 16:45               ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Haggerty @ 2016-06-02  4:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Duy Nguyen, Johannes Sixt, Git Mailing List, David Turner

On 06/01/2016 09:39 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> I argue that the fundamental concept in terms of the implementation
>> should be the individual physical reference stores, and these should be
>> compounded together to form the logical reference collections and the
>> sets of reachability roots that are interesting at the UI level.
> 
> That is very good in principle.  How does that principle translate
> to the current setup (with possible enhancement with pluggable ref
> backends) and multiple worktrees?  Let me try thinking it through
> aloud.
> 
>  * Without pluggable ref backend or worktrees, we start from two
>    "physical reference stores"; packed-refs file lists refs that
>    will be covered (overridden) by loose refs in .git/refs/.
>    Symbolic refs always being in loose falls out as a natural
>    consequence that packed-refs file does not record symrefs.
> 
>  * Throw in multiple worktrees to the mix.  How?  Do we consider
>    selected refs/ hierarchies (like refs/bisect/*) as separate
>    physical store (even though it might be backed by the files in
>    the same .git/refs/ filesystem hierarchy) and represent the
>    "logical" view as an overlay across the traditional two types of
>    physical reference stores?  That is:
> 
>    - loose refs in .git/HEAD, .git/refs/{bisect,...} for
>      per-worktree part form one physical store.  If a ref is found
>      here, that is what we use as a part of the logical view.
> 
>    - loose refs in .git/refs/{branches,tags,notes,...} for common
>      part form one physical store.  For a ref that is not found
>      above but is found here becomes a part of the logical view.
> 
>    - packed refs in .git/packed-refs is another physical store.  For
>      a ref that is not found in the above two but is found here
>      becomes a part of the logical view.

I think I would represent the logical store of a worktree repo as
follows. First, I would implement a cached_ref_store that introduces a
layer of caching around another ref_store. Then

    def get_files_ref_store(dir) {
        loose = create_cached_ref_store(get_loose_ref_store(dir))
        packed = create_cached_ref_store(get_packed_ref_store(dir))
        return create_files_ref_store(loose, packed)
    }

    common_ref_store = get_files_ref_store(common_dir)

    /*
     * I think we only allow loose refs in worktrees; otherwise
     * this could be an overlay_ref_store too. Actually, we might
     * want to omit the caching here.
     */
    local_ref_store = create_cached_ref_store(
            get_loose_ref_store(git_dir))

    logical_ref_store = create_worktree_ref_store(
        local_ref_store, common_ref_store)

Where worktree_ref_store does something like

    if (is_per_worktree_ref(refname))
        lookup in local_ref_store
    else
        lookup in common_ref_store

for reading, and uses a merge_ref_iterator with a select function that
does something similar for iterating.

The files_ref_store would do lookups by looking first in the
loose_ref_store then in the packed_ref_store, would use an
overlay_ref_iterator for iteration, and would know to do all writes in
the loose_ref_store (except for deletes, which also have to go to
packed_ref_store). It would have a special "pack-refs" operation,
specific to files_ref_store, that shuffles references between its two
backends.

Writing to a worktree_ref_store is a bit tricker, because we want to
allow ref_transactions to span worktree and common refs (though we
probably need to give up atomicity for any such transaction). The
worktree_ref_transaction_commit() method has to split the main
transaction into two sub-transactions, one for each of its component
ref_stores. I planned for this when designing split_under_lock and think
it is possible, though I admit I haven't implemented it yet.

One nice thing about this design is that you can skip the
worktree_ref_store layer and its overhead entirely for repositories that
are not linked. The decision can be made once, at instantiation time,
rather than every time a reference is looked up. See the pseudocode below.

> Up to this point, I am all for your "separate physical stores are
> composited to give a logical view".  I can see how multi-worktree
> world view fits within that framework.
> 
>  * With pluggable ref backend, we may gain yet another "physical
>    reference store" possibility, e.g. one backed by lmdb.  If it
>    supports symrefs, a repoitory may use lmdb backed reference store
>    without the traditional two.
> 
>    But it is unclear how it would interact with the multi-worktree
>    world order.

Since you could plug-and-play different ref_stores in the above scheme,
I don't see any problem here.

    def get_logical_ref_store() {
        local_ref_store = get_local_ref_store(git_dir)
        if (is_linked_repo) {
            common_ref_store = get_ref_store(common_dir)
            return worktree_ref_store(local_ref_store,
                                      common_ref_store)
        } else {
            return local_ref_store;
        }
    }

get_ref_store() would read the git config to decide what the ref store
to use for the specified repository, which itself might be an
lmdb_ref_store or an overlay_ref_store(loose_ref_store, packed_ref_store).

Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees
  2016-06-01 18:13     ` Eric Sunshine
@ 2016-06-02  9:35       ` Duy Nguyen
  0 siblings, 0 replies; 25+ messages in thread
From: Duy Nguyen @ 2016-06-02  9:35 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Johannes Sixt, Jeff King, Michael Haggerty, David Turner

On Thu, Jun 2, 2016 at 1:13 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Wed, Jun 1, 2016 at 6:45 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>> Current mark_reachable_objects() only marks objects from index from
>> _current_ worktree as reachable instead of all worktrees. Because this
>> function is used for pruning, there is a chance that objects referenced
>> by other worktrees may be deleted. Fix that.
>>
>> Small behavior change in "one worktree" case, the index is read again
>> from file. In the current implementation, if the_index is already
>> loaded, the index file will not be read from file again. This adds some
>> more cost to this operation, hopefully insignificant because
>> reachability test is usually very expensive already.
>
> Could this extra index read be avoided by taking advantage of 'struct
> worktree::is_current'[1] and passing the already-loaded index to
> add_index_objects_to_pending() if true?
>
> Or, am I misunderstanding the issue?
>
> [1]: http://article.gmane.org/gmane.comp.version-control.git/292194

Hah! I broke my worktree changes into many pieces and lost track of
them. I thought that one was still in some unsent series. Will use it.
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees
  2016-06-01 18:57     ` David Turner
@ 2016-06-02  9:37       ` Duy Nguyen
  0 siblings, 0 replies; 25+ messages in thread
From: Duy Nguyen @ 2016-06-02  9:37 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List, Johannes Sixt, Jeff King, Michael Haggerty

On Thu, Jun 2, 2016 at 1:57 AM, David Turner <dturner@twopensource.com> wrote:
>> +             struct index_state istate;
>> +
>> +             memset(&istate, 0, sizeof(istate));
>
>
> Why not just struct index_state istate = {0}; ?
>

My first thought was.. "hmm.. C99?" but then there are 24 of them in
the code base already. Will change.
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Fix prune/gc problem with multiple worktrees
  2016-06-01 16:06   ` Junio C Hamano
@ 2016-06-02  9:53     ` Duy Nguyen
  2016-06-02 11:26       ` Michael Haggerty
  0 siblings, 1 reply; 25+ messages in thread
From: Duy Nguyen @ 2016-06-02  9:53 UTC (permalink / raw)
  To: Junio C Hamano, Michael Haggerty
  Cc: Git Mailing List, Johannes Sixt, Jeff King, David Turner

(from patch 4/4 mail)

On Wed, Jun 1, 2016 at 10:51 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> +     path = xstrdup(worktree_git_path(wt, "logs/refs/bisect"));
>> +     if (file_exists(path))
>> +             handle_one_reflog(path, NULL, 0, &cb);
>> +     free(path);
>> +}
>
> `refs/bisect` is not a single reference. It is a namespace that contains
> references with names like `refs/bisect/bad` and
> `refs/bisect/good-66106691a1b71e445fe5e4d6b8b043dffc7dfe4c`.

Yeah I missed that. I'm not going to write another directory walker to
collect all logs/refs/bisect/*. I didn't add pending objects for
refs/bisect/* of other worktrees either. At that point waiting for the
new ref iterator makes more sense...

On Wed, Jun 1, 2016 at 11:06 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> This series makes sure that objects referenced by all worktrees are
>> marked reachable so that we don't accidentally delete objects that are
>> being used. Previously per-worktree references in index, detached HEAD
>> or per-worktree reflogs come from current worktree only, not all
>> worktrees.
>
> I'll let this topic simmer on the list for now, instead of picking
> it up immediately to 'pu', as Michael in $gmane/296068 makes me
> wonder if we want to keep piling on the current "worktree ref
> iterations are bolted on" or if we want to first clean it up, whose
> natural fallout hopefully would eliminate the bug away.

So what should be the way forward? My intention was having something
that can fix the problem for now, even if a bit hacky while waiting
for ref iterator to be ready, then convert to use it and clean things
up, because I don't how long ref-iterator would take and losing data
is serous enough that I'd like to fix it soon. If we go with "fix soon
then convert to ref-iterator later", then I will drop the
logs/bisect/* check, checking logs/HEAD alone should be good enough.
Otherwise I'll prepare a series on top of ref-iterator.
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Fix prune/gc problem with multiple worktrees
  2016-06-02  9:53     ` Duy Nguyen
@ 2016-06-02 11:26       ` Michael Haggerty
  2016-06-02 17:44         ` Junio C Hamano
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Haggerty @ 2016-06-02 11:26 UTC (permalink / raw)
  To: Duy Nguyen, Junio C Hamano
  Cc: Git Mailing List, Johannes Sixt, Jeff King, David Turner

On 06/02/2016 11:53 AM, Duy Nguyen wrote:
> (from patch 4/4 mail)
> 
> On Wed, Jun 1, 2016 at 10:51 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>> +     path = xstrdup(worktree_git_path(wt, "logs/refs/bisect"));
>>> +     if (file_exists(path))
>>> +             handle_one_reflog(path, NULL, 0, &cb);
>>> +     free(path);
>>> +}
>>
>> `refs/bisect` is not a single reference. It is a namespace that contains
>> references with names like `refs/bisect/bad` and
>> `refs/bisect/good-66106691a1b71e445fe5e4d6b8b043dffc7dfe4c`.
> 
> Yeah I missed that. I'm not going to write another directory walker to
> collect all logs/refs/bisect/*. I didn't add pending objects for
> refs/bisect/* of other worktrees either. At that point waiting for the
> new ref iterator makes more sense...
> 
> On Wed, Jun 1, 2016 at 11:06 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>>
>>> This series makes sure that objects referenced by all worktrees are
>>> marked reachable so that we don't accidentally delete objects that are
>>> being used. Previously per-worktree references in index, detached HEAD
>>> or per-worktree reflogs come from current worktree only, not all
>>> worktrees.
>>
>> I'll let this topic simmer on the list for now, instead of picking
>> it up immediately to 'pu', as Michael in $gmane/296068 makes me
>> wonder if we want to keep piling on the current "worktree ref
>> iterations are bolted on" or if we want to first clean it up, whose
>> natural fallout hopefully would eliminate the bug away.
> 
> So what should be the way forward? My intention was having something
> that can fix the problem for now, even if a bit hacky while waiting
> for ref iterator to be ready, then convert to use it and clean things
> up, because I don't how long ref-iterator would take and losing data
> is serous enough that I'd like to fix it soon. If we go with "fix soon
> then convert to ref-iterator later", then I will drop the
> logs/bisect/* check, checking logs/HEAD alone should be good enough.
> Otherwise I'll prepare a series on top of ref-iterator.

Fixing reachability via the index and detached HEADs feels relatively
important.

Fixing refs/bisect/* feels a bit less urgent, because (1) usually
bisections don't take very long, and (2) if the commits that one is
bisecting are not otherwise reachable anymore, then the bisection is
probably not interesting anymore. However, these are real references, so
if they get broken, the worktree will be seen as corrupt and recovery is
not super obvious (I guess most people would end up deleting the whole
worktree).

Fixing the reflogs for HEAD and (especially) refs/bisect/* in worktrees
feels even less important, because reflogs are not nearly as important
as current ref values, Git is relatively tolerant of broken reflog
entries, and there are easy ways to prune them if breakage occurs.

It's hard for me to predict when the ref-iterator stuff will be merged.
It is a big change, but so far the feedback seems pretty good. I can
tell you that pushing it and ref-stores forward is high on my priority list.

Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] Fix prune/gc problem with multiple worktrees
  2016-06-02 11:26       ` Michael Haggerty
@ 2016-06-02 17:44         ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2016-06-02 17:44 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Duy Nguyen, Git Mailing List, Johannes Sixt, Jeff King, David Turner

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Fixing reachability via the index and detached HEADs feels relatively
> important.
> ...

I agree with the order of importance above.  But "relatively" is a
very good keyword.  Just like bisection refs, what is in the index
and the commit detached HEAD points at are expected to be tentative.
As a part of still-experimental feature, I'd rather see our
bandwidth spent on fixing it the right way first time, instead of
piling on an unproven quick-fix as a band aid, having to rip it off
and fixing it properly later.

> It's hard for me to predict when the ref-iterator stuff will be merged.
> It is a big change, but so far the feedback seems pretty good. I can
> tell you that pushing it and ref-stores forward is high on my priority list.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: git gc and worktrees
  2016-06-02  4:08             ` Michael Haggerty
@ 2016-06-03 16:45               ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2016-06-03 16:45 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Jeff King, Duy Nguyen, Johannes Sixt, Git Mailing List, David Turner

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 06/01/2016 09:39 PM, Junio C Hamano wrote:
>> ...
> I think I would represent the logical store of a worktree repo as
> follows. First, ...
> ...
>> Up to this point, I am all for your "separate physical stores are
>> composited to give a logical view".  I can see how multi-worktree
>> world view fits within that framework.
>> 
>>  * With pluggable ref backend, we may gain yet another "physical
>>    reference store" possibility, e.g. one backed by lmdb.  If it
>>    supports symrefs, a repoitory may use lmdb backed reference store
>>    without the traditional two.
>> 
>>    But it is unclear how it would interact with the multi-worktree
>>    world order.
>
> Since you could plug-and-play different ref_stores in the above scheme,
> I don't see any problem here.
>
>     def get_logical_ref_store() {
>         local_ref_store = get_local_ref_store(git_dir)
>         if (is_linked_repo) {
>             common_ref_store = get_ref_store(common_dir)
>             return worktree_ref_store(local_ref_store,
>                                       common_ref_store)
>         } else {
>             return local_ref_store;
>         }
>     }
>
> get_ref_store() would read the git config to decide what the ref store
> to use for the specified repository, which itself might be an
> lmdb_ref_store or an overlay_ref_store(loose_ref_store, packed_ref_store).

Sounds all sensible.  Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-06-03 16:45 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-31  7:07 git gc and worktrees Johannes Sixt
2016-05-31 12:02 ` Duy Nguyen
2016-05-31 22:14   ` Jeff King
2016-06-01  7:00     ` Johannes Sixt
2016-06-01  8:57     ` Michael Haggerty
2016-06-01 15:15       ` Junio C Hamano
2016-06-01 16:12         ` Michael Haggerty
2016-06-01 19:39           ` Junio C Hamano
2016-06-02  4:08             ` Michael Haggerty
2016-06-03 16:45               ` Junio C Hamano
2016-06-01 10:45 ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Nguyễn Thái Ngọc Duy
2016-06-01 10:45   ` [PATCH 1/4] revision.c: move read_cache() out of add_index_objects_to_pending() Nguyễn Thái Ngọc Duy
2016-06-01 10:45   ` [PATCH 2/4] reachable.c: mark reachable objects in index from all worktrees Nguyễn Thái Ngọc Duy
2016-06-01 18:13     ` Eric Sunshine
2016-06-02  9:35       ` Duy Nguyen
2016-06-01 18:57     ` David Turner
2016-06-02  9:37       ` Duy Nguyen
2016-06-01 10:45   ` [PATCH 3/4] reachable.c: mark reachable detached HEAD " Nguyễn Thái Ngọc Duy
2016-06-01 10:45   ` [PATCH 4/4] reachable.c: make reachable reflogs for all per-worktree reflogs Nguyễn Thái Ngọc Duy
2016-06-01 15:51     ` Michael Haggerty
2016-06-01 16:01   ` [PATCH 0/4] Fix prune/gc problem with multiple worktrees Jeff King
2016-06-01 16:06   ` Junio C Hamano
2016-06-02  9:53     ` Duy Nguyen
2016-06-02 11:26       ` Michael Haggerty
2016-06-02 17:44         ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.