All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Speedup recursive by flushing index only once for all entries
@ 2007-01-04 10:47 Alex Riesen
  2007-01-04 12:33 ` Johannes Schindelin
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Riesen @ 2007-01-04 10:47 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Johannes Schindelin, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 585 bytes --]

Merge-recursive is very slow in repos with lots of files,
especially if lots of them change absolutely identically.
Updating index once after all of them changes speedups
merge quite noticable.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
---

Johannes, I remember suggesting to do index flush for all
entries instead for every entry. It is already quite time ago,
but ... was there any reasons for not doing this?
The patch speeds it up a lot and no wonder: index is 6Mb
here, and this is cygwin.

 merge-recursive.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

[-- Attachment #2: 0001-Speedup-recursive-by-flushing-index-only-once-for-all-entries.txt --]
[-- Type: text/plain, Size: 972 bytes --]

From d0e4d791cef8b307b32954a8b80c4aabd41755a9 Mon Sep 17 00:00:00 2001
From: Alex Riesen <raa.lkml@gmail.com>
Date: Thu, 4 Jan 2007 11:22:47 +0100
Subject: Speedup recursive by flushing index only once for all entries

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
---

 merge-recursive.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index bac16f5..4d3a2ce 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1083,9 +1083,6 @@ static int process_entry(const char *path, struct stage_data *entry,
 	} else
 		die("Fatal merge failure, shouldn't happen.");
 
-	if (cache_dirty)
-		flush_cache();
-
 	return clean_merge;
 }
 
@@ -1133,6 +1130,8 @@ static int merge_trees(struct tree *head,
 			if (!process_entry(path, e, branch1, branch2))
 				clean = 0;
 		}
+		if (cache_dirty)
+			flush_cache();
 
 		path_list_clear(re_merge, 0);
 		path_list_clear(re_head, 0);
-- 
1.5.0.rc0.g8bc4b-dirty


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-04 10:47 [PATCH] Speedup recursive by flushing index only once for all entries Alex Riesen
@ 2007-01-04 12:33 ` Johannes Schindelin
  2007-01-04 12:47   ` Alex Riesen
  0 siblings, 1 reply; 40+ messages in thread
From: Johannes Schindelin @ 2007-01-04 12:33 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Git Mailing List, Junio C Hamano

Hi,

On Thu, 4 Jan 2007, Alex Riesen wrote:

> Johannes, I remember suggesting to do index flush for all
> entries instead for every entry. It is already quite time ago,
> but ... was there any reasons for not doing this?

I wanted to be on the safe side, and eventually look through the code 
again for possible problems.

I think what you did is safe, since you moved the call from 
process_entry() to its sole caller, merge_trees().

However, I was wondering if the index has to be written at all. 
I expect the written index (except the last one, of course) to have no 
user...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-04 12:33 ` Johannes Schindelin
@ 2007-01-04 12:47   ` Alex Riesen
  2007-01-04 20:22     ` Junio C Hamano
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Riesen @ 2007-01-04 12:47 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Git Mailing List, Junio C Hamano

On 1/4/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> > Johannes, I remember suggesting to do index flush for all
> > entries instead for every entry. It is already quite time ago,
> > but ... was there any reasons for not doing this?
>
> I wanted to be on the safe side, and eventually look through the code
> again for possible problems.
>
> I think what you did is safe, since you moved the call from
> process_entry() to its sole caller, merge_trees().

Me too, just wondered why didn't we do this back then.
Anyway, my "monster-merge" and the builtin tests pass with
no visible problems.

> However, I was wondering if the index has to be written at all.
> I expect the written index (except the last one, of course) to have no
> user...

Good question...

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-04 12:47   ` Alex Riesen
@ 2007-01-04 20:22     ` Junio C Hamano
  2007-01-05 11:22       ` Alex Riesen
  0 siblings, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2007-01-04 20:22 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Johannes Schindelin, Git Mailing List, Junio C Hamano

"Alex Riesen" <raa.lkml@gmail.com> writes:

> Me too, just wondered why didn't we do this back then.
> Anyway, my "monster-merge" and the builtin tests pass with
> no visible problems.
>
>> However, I was wondering if the index has to be written at all.
>> I expect the written index (except the last one, of course) to have no
>> user...
>
> Good question...

That's most likely because you played safe, and started from the
Python version whose only way to manipulate the index and write
out a tree was to actually write the index out.

So let's step back a bit.

 * The top-level interface is merge() function that takes two
   heads and the ancestor, and is responsible for coming up with
   a merged tree in *result if it can.  The main() calls it to
   see if things merge cleanly, and wants the index populated
   with the final merge result, be it clean or unmerged.

 * merge() in addition to two heads and their ancestor takes
   call-depth because it does the "recursive" business.  When it
   operates with positive call-depth, it is coming up with a
   virtual commit that has the merge result tree of two merge
   bases.  In this case index_only is set to true and what is in
   the working tree does not matter (i.e. usual "your local
   modification will be clobbered, cannot merge" check should
   not apply [*1*]).  It calls setup_index() to read in the true
   index (for the outermost case) or a temporary index (from one
   of the ancestor trees in the recursive case) before passing
   control to the real workhorse, merge_trees().

   What merge() does before its call to setup_index() does not
   depend on what the index is (even in recursive case, because
   the real work done by merge_trees() in the recursively called
   merge() is preceded by the call to setup_index()).  When
   merge() returns, the index contents, both in-core and on the
   filesystem, matter only for the outermost call to merge().

   So that suggests that flush_cache() in main() needs to stay.

 * merge_trees() takes the two heads and the ancestor, and comes
   up with the merge result, using in-core index.  It calls
   git_merge_trees() which is 3-way "read-tree -m", and calls
   git_write_tree() with two purposes: (1) to see if the merge
   was cleanly done, and (2) to get the tree to be used as the
   virtual ancestor (in other words, for the outermost merge,
   writing of the tree is not important -- a tree is produced as
   an unused side effect of seeing if the merge was clean).  If
   it results in a clean merge for the outermost merge, the call
   to flush_cache() in main() is what matters -- the calling
   script of merge-recursive writes a tree out from the index
   written by that call.  I'll address the need for
   git_write_tree() to write the index shortly.

   If the call to git_merge_trees() results in conflicted merge,
   merge_trees() falls into the rename detection codepath.  Most
   of the work is done in process_renames(), which updates the
   index, and then remaining unmatched paths are dealt with by
   calling process_entry(), which also updates the index.  These
   index updates could all be done in-core without writing the
   resulting index file out; we should not discard nor re-read
   the index while they are processing one path at a time.

   In a sense, merge_trees() does what 3-way "read-tree -m"
   could have done if it were rename aware.

 * git_merge_trees() is a bit questionable.  It reads from the
   current_index_file which is the true index for the outermost
   merge or the temporary index populated from the 'head'
   parameter give to it earlier by merge().  I think this use of
   temporary index is not necessary.  In other words, we could
   start from an empty cache if index_only, I think.

   And I think the reason git_write_tree() writes the index out
   is because of this call to read_cache_from() at the beginning
   of git_merge_trees().  It is told to write a tree out of the
   current in-core index -- so I do not know why it needs to
   call read_cache_from() to begin with.

Given the above analysis, it seems to me that the current code
too heavily inherited the invariant that the in-core and on
filesystem index should match at every step from the original
Python version.  I think your patch goes in the right direction
to correct that, but it does not spell out the new invariant the
code assumes cleanly enough.  For example, I do not think you
would want the call to flush_cache() in process_renames() for
the same reason you took out the call from process_entry() --
they both make many calls to update_file() and remove_file() to
touch the index and the working tree.

How about making the invariants to be:

	upon entry of merge(), the in-core cache is populated as
	appropriate for the merge.  That is, it has the contents
	of the true index for the outermost one, and discarded
	for the virtual ancestor merge.

	upon exit from merge(), the in-core cache holds the
	merge result for that round of merge.  That is, it is
	suitable for flush_cache() to leave the final result for
	the outermost merge, and it is a merged index that wrote
	the virtual ancestor tree was written out from for inner
	merges.

The codeflow would then become like this:

	main() {
                hold_lock_file_for_update(lock, git_path("index"), 1);
                merge();
		write_cache() || close || commit_lock_file();
	}

	merge() {
        	while (more than one ancestor) {
			discard_cache();
                        merge(two ancestors using their common);
		}
		discard_cache();
		if (call_depth == 0) {
                	read_cache();
                        index_only = 0;
		}
		else
                	index_only = 1;
                merge_trees();
	}

        merge_trees() {
		if (up to date)
                	return;
		git_merge_trees();
		if (in-core index is unmerged) {
			process_renames();
		}
                if (index_only)
                       	git_write_tree();
        }

        git_merge_trees() {
        	unpack_trees();
        }

        git_write_tree() {
		if (stale cache tree)
                	cache_tree_update();
                lookup_tree();
	}

        process_renames(), process_entry() {
		call remove_file() and update_file() as needed,
		trusting that the caller set up the in-core
                index as appropriate and previous calls to these
                functions left the in-core index to correctly
                reflect what have been done so far.
        }

By the way, Alex, you seem to heavily work on Cygwin, and I am
interested in your experience with Shawn's sliding mmap() on
real projects, as I suspect Cygwin would be more heavily
impacted with any mmap() related changes.  You already said
performance is "bearable".  Does that mean it was better and
got worse but still bearable, or it was unusably slow but now it
is bearably usable?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-04 20:22     ` Junio C Hamano
@ 2007-01-05 11:22       ` Alex Riesen
  2007-01-07 16:31         ` Alex Riesen
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Riesen @ 2007-01-05 11:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Git Mailing List

On 1/4/07, Junio C Hamano <junkio@cox.net> wrote:
> >> However, I was wondering if the index has to be written at all.
> >> I expect the written index (except the last one, of course) to have no
> >> user...
> >
> > Good question...
>
> That's most likely because you played safe, and started from the
> Python version whose only way to manipulate the index and write
> out a tree was to actually write the index out.

Yes, it was because of that. Just didn't thought about when to
update index at all (never really had a time).

> So let's step back a bit.

Excellent analysis, thanks! I suspect heavily it will work as is.
Now, if only someone could find time to code it up...

> By the way, Alex, you seem to heavily work on Cygwin, and I am
> interested in your experience with Shawn's sliding mmap() on
> real projects, as I suspect Cygwin would be more heavily
> impacted with any mmap() related changes.  You already said
> performance is "bearable".  Does that mean it was better and
> got worse but still bearable, or it was unusably slow but now it
> is bearably usable?

It is usably slow: ~30 sec for a commit (I stopped using normal
commit, using update-index and simplified index commit now),
around minute for the recursive merges (if they are simple),
~10 sec for a hot-cache hard reset. Avoiding gitk whenever
possible.

Compared to my only linux system here:

2-3 times slower in diff-tree for 44K files, around 9k differences
(0.2 sec against 0.6 sec, it quickly adds up if you do it often,
like when merging (it's slower for really big merges, constrained
by CPU and memory), commiting, gitk).

The windows machine is a corporate Lenovo 2.66MHz/1Gb,SATA laptop.
The linux machine is 1.2MHz/384Mb, IDE noname notebook.

I somehow adapted myself to it though (reading mails, drinking coffee).

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-05 11:22       ` Alex Riesen
@ 2007-01-07 16:31         ` Alex Riesen
  2007-01-10 18:06           ` Junio C Hamano
  2007-01-10 19:28           ` Junio C Hamano
  0 siblings, 2 replies; 40+ messages in thread
From: Alex Riesen @ 2007-01-07 16:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Git Mailing List

Alex Riesen, Fri, Jan 05, 2007 12:22:39 +0100:
> >So let's step back a bit.
> 
> Excellent analysis, thanks! I suspect heavily it will work as is.
> Now, if only someone could find time to code it up...
> 

I'm sorry for asking (because I'm partly guilty in the mess the
merge-recursive is), but could you accept at least the patch which
started the thread? It's not as if it breaks something, or giving
wrong ideas, or anything. It's incomplete, but so long noone seem to
be able to find the time to finish the job, the patch will improve the
state of affairs a bit, will it not?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-07 16:31         ` Alex Riesen
@ 2007-01-10 18:06           ` Junio C Hamano
  2007-01-10 19:28           ` Junio C Hamano
  1 sibling, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-01-10 18:06 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Johannes Schindelin, Git Mailing List

fork0@t-online.de (Alex Riesen) writes:

> Alex Riesen, Fri, Jan 05, 2007 12:22:39 +0100:
>> >So let's step back a bit.
>> 
>> Excellent analysis, thanks! I suspect heavily it will work as is.
>> Now, if only someone could find time to code it up...
>> 
>
> I'm sorry for asking (because I'm partly guilty in the mess the
> merge-recursive is), but could you accept at least the patch which
> started the thread? It's not as if it breaks something, or giving
> wrong ideas, or anything. It's incomplete, but so long noone seem to
> be able to find the time to finish the job, the patch will improve the
> state of affairs a bit, will it not?

I'm hacking on this right now.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-07 16:31         ` Alex Riesen
  2007-01-10 18:06           ` Junio C Hamano
@ 2007-01-10 19:28           ` Junio C Hamano
  2007-01-10 22:11             ` Junio C Hamano
                               ` (3 more replies)
  1 sibling, 4 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-01-10 19:28 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git

This comes on top of yours.  

I'm reproducing all the merges in linux-2.6 history to make sure
the base one, yours and this produce the same result (the same
clean merge, or the same unmerged index and the same diff from
HEAD).  So far it is looking good.

-- >8 --
From: Junio C Hamano <junkio@cox.net>
Date: Wed, 10 Jan 2007 11:20:58 -0800
Subject: [PATCH] merge-recursive: do not use on-file index when not needed.

This revamps the merge-recursive implementation following the
outline in:

	Message-ID: <7v8xgileza.fsf@assigned-by-dhcp.cox.net>

There is no need to write out the index until the very end just
once from merge-recursive.  Also there is no need to write out
the resulting tree object for the simple case of merging with a
single merge base.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 merge-recursive.c |  169 ++++++++++++++--------------------------------------
 1 files changed, 46 insertions(+), 123 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index aab4c34..5237021 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -110,35 +110,6 @@ static void output_commit_title(struct commit *commit)
 	}
 }
 
-static const char *current_index_file = NULL;
-static const char *original_index_file;
-static const char *temporary_index_file;
-static int cache_dirty = 0;
-
-static int flush_cache(void)
-{
-	/* flush temporary index */
-	struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
-	int fd = hold_lock_file_for_update(lock, current_index_file, 1);
-	if (write_cache(fd, active_cache, active_nr) ||
-			close(fd) || commit_lock_file(lock))
-		die ("unable to write %s", current_index_file);
-	discard_cache();
-	cache_dirty = 0;
-	return 0;
-}
-
-static void setup_index(int temp)
-{
-	current_index_file = temp ? temporary_index_file: original_index_file;
-	if (cache_dirty) {
-		discard_cache();
-		cache_dirty = 0;
-	}
-	unlink(temporary_index_file);
-	discard_cache();
-}
-
 static struct cache_entry *make_cache_entry(unsigned int mode,
 		const unsigned char *sha1, const char *path, int stage, int refresh)
 {
@@ -167,9 +138,6 @@ static int add_cacheinfo(unsigned int mode, const unsigned char *sha1,
 		const char *path, int stage, int refresh, int options)
 {
 	struct cache_entry *ce;
-	if (!cache_dirty)
-		read_cache_from(current_index_file);
-	cache_dirty++;
 	ce = make_cache_entry(mode, sha1 ? sha1 : null_sha1, path, stage, refresh);
 	if (!ce)
 		return error("cache_addinfo failed: %s", strerror(cache_errno));
@@ -187,26 +155,6 @@ static int add_cacheinfo(unsigned int mode, const unsigned char *sha1,
  */
 static int index_only = 0;
 
-static int git_read_tree(struct tree *tree)
-{
-	int rc;
-	struct object_list *trees = NULL;
-	struct unpack_trees_options opts;
-
-	if (cache_dirty)
-		die("read-tree with dirty cache");
-
-	memset(&opts, 0, sizeof(opts));
-	object_list_append(&tree->object, &trees);
-	rc = unpack_trees(trees, &opts);
-	cache_tree_free(&active_cache_tree);
-
-	if (rc == 0)
-		cache_dirty = 1;
-
-	return rc;
-}
-
 static int git_merge_trees(int index_only,
 			   struct tree *common,
 			   struct tree *head,
@@ -216,11 +164,6 @@ static int git_merge_trees(int index_only,
 	struct object_list *trees = NULL;
 	struct unpack_trees_options opts;
 
-	if (!cache_dirty) {
-		read_cache_from(current_index_file);
-		cache_dirty = 1;
-	}
-
 	memset(&opts, 0, sizeof(opts));
 	if (index_only)
 		opts.index_only = 1;
@@ -236,39 +179,37 @@ static int git_merge_trees(int index_only,
 
 	rc = unpack_trees(trees, &opts);
 	cache_tree_free(&active_cache_tree);
-
-	cache_dirty = 1;
-
 	return rc;
 }
 
+static int unmerged_index(void)
+{
+	int i;
+	for (i = 0; i < active_nr; i++) {
+		struct cache_entry *ce = active_cache[i];
+		if (ce_stage(ce))
+			return 1;
+	}
+	return 0;
+}
+
 static struct tree *git_write_tree(void)
 {
 	struct tree *result = NULL;
 
-	if (cache_dirty) {
-		unsigned i;
-		for (i = 0; i < active_nr; i++) {
-			struct cache_entry *ce = active_cache[i];
-			if (ce_stage(ce))
-				return NULL;
-		}
-	} else
-		read_cache_from(current_index_file);
+	if (unmerged_index())
+		return NULL;
 
 	if (!active_cache_tree)
 		active_cache_tree = cache_tree();
 
 	if (!cache_tree_fully_valid(active_cache_tree) &&
-			cache_tree_update(active_cache_tree,
-				active_cache, active_nr, 0, 0) < 0)
+	    cache_tree_update(active_cache_tree,
+			      active_cache, active_nr, 0, 0) < 0)
 		die("error building trees");
 
 	result = lookup_tree(active_cache_tree->sha1);
 
-	flush_cache();
-	cache_dirty = 0;
-
 	return result;
 }
 
@@ -331,10 +272,7 @@ static struct path_list *get_unmerged(void)
 	int i;
 
 	unmerged->strdup_paths = 1;
-	if (!cache_dirty) {
-		read_cache_from(current_index_file);
-		cache_dirty++;
-	}
+
 	for (i = 0; i < active_nr; i++) {
 		struct path_list_item *item;
 		struct stage_data *e;
@@ -469,9 +407,6 @@ static int remove_file(int clean, const char *path, int no_wd)
 	int update_working_directory = !index_only && !no_wd;
 
 	if (update_cache) {
-		if (!cache_dirty)
-			read_cache_from(current_index_file);
-		cache_dirty++;
 		if (remove_file_from_cache(path))
 			return -1;
 	}
@@ -1105,9 +1040,7 @@ static int merge_trees(struct tree *head,
 		    sha1_to_hex(head->object.sha1),
 		    sha1_to_hex(merge->object.sha1));
 
-	*result = git_write_tree();
-
-	if (!*result) {
+	if (unmerged_index()) {
 		struct path_list *entries, *re_head, *re_merge;
 		int i;
 		path_list_clear(&current_file_set, 1);
@@ -1128,17 +1061,11 @@ static int merge_trees(struct tree *head,
 			if (!process_entry(path, e, branch1, branch2))
 				clean = 0;
 		}
-		if (cache_dirty)
-			flush_cache();
 
 		path_list_clear(re_merge, 0);
 		path_list_clear(re_head, 0);
 		path_list_clear(entries, 1);
 
-		if (clean || index_only)
-			*result = git_write_tree();
-		else
-			*result = NULL;
 	} else {
 		clean = 1;
 		printf("merging of trees %s and %s resulted in %s\n",
@@ -1146,6 +1073,8 @@ static int merge_trees(struct tree *head,
 		       sha1_to_hex(merge->object.sha1),
 		       sha1_to_hex((*result)->object.sha1));
 	}
+	if (index_only)
+		*result = git_write_tree();
 
 	return clean;
 }
@@ -1170,10 +1099,10 @@ static int merge(struct commit *h1,
 		 const char *branch1,
 		 const char *branch2,
 		 int call_depth /* =0 */,
-		 struct commit *ancestor /* =None */,
+		 struct commit_list *ca,
 		 struct commit **result)
 {
-	struct commit_list *ca = NULL, *iter;
+	struct commit_list *iter;
 	struct commit *merged_common_ancestors;
 	struct tree *mrtree;
 	int clean;
@@ -1182,10 +1111,10 @@ static int merge(struct commit *h1,
 	output_commit_title(h1);
 	output_commit_title(h2);
 
-	if (ancestor)
-		commit_list_insert(ancestor, &ca);
-	else
-		ca = reverse_commit_list(get_merge_bases(h1, h2, 1));
+	if (!ca) {
+		ca = get_merge_bases(h1, h2, 1);
+		ca = reverse_commit_list(ca);
+	}
 
 	output("found %u common ancestor(s):", commit_list_count(ca));
 	for (iter = ca; iter; iter = iter->next)
@@ -1211,6 +1140,7 @@ static int merge(struct commit *h1,
 		 * merge_trees has always overwritten it: the commited
 		 * "conflicts" were already resolved.
 		 */
+		discard_cache();
 		merge(merged_common_ancestors, iter->item,
 		      "Temporary merge branch 1",
 		      "Temporary merge branch 2",
@@ -1223,25 +1153,21 @@ static int merge(struct commit *h1,
 			die("merge returned no commit");
 	}
 
+	discard_cache();
 	if (call_depth == 0) {
-		setup_index(0 /* $GIT_DIR/index */);
+		read_cache();
 		index_only = 0;
-	} else {
-		setup_index(1 /* temporary index */);
-		git_read_tree(h1->tree);
+	} else
 		index_only = 1;
-	}
 
 	clean = merge_trees(h1->tree, h2->tree, merged_common_ancestors->tree,
 			    branch1, branch2, &mrtree);
 
-	if (!ancestor && (clean || index_only)) {
+	if (index_only) {
 		*result = make_virtual_commit(mrtree, "merged tree");
 		commit_list_insert(h1, &(*result)->parents);
 		commit_list_insert(h2, &(*result)->parents->next);
-	} else
-		*result = NULL;
-
+	}
 	return clean;
 }
 
@@ -1277,19 +1203,16 @@ static struct commit *get_ref(const char *ref)
 
 int main(int argc, char *argv[])
 {
-	static const char *bases[2];
+	static const char *bases[20];
 	static unsigned bases_count = 0;
 	int i, clean;
 	const char *branch1, *branch2;
 	struct commit *result, *h1, *h2;
+	struct commit_list *ca = NULL;
+	struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
+	int index_fd;
 
 	git_config(git_default_config); /* core.filemode */
-	original_index_file = getenv(INDEX_ENVIRONMENT);
-
-	if (!original_index_file)
-		original_index_file = xstrdup(git_path("index"));
-
-	temporary_index_file = xstrdup(git_path("mrg-rcrsv-tmp-idx"));
 
 	if (argc < 4)
 		die("Usage: %s <base>... -- <head> <remote> ...\n", argv[0]);
@@ -1313,18 +1236,18 @@ int main(int argc, char *argv[])
 	branch2 = better_branch_name(branch2);
 	printf("Merging %s with %s\n", branch1, branch2);
 
-	if (bases_count == 1) {
-		struct commit *ancestor = get_ref(bases[0]);
-		clean = merge(h1, h2, branch1, branch2, 0, ancestor, &result);
-	} else
-		clean = merge(h1, h2, branch1, branch2, 0, NULL, &result);
+	index_fd = hold_lock_file_for_update(lock, get_index_file(), 1);
 
-	if (cache_dirty)
-		flush_cache();
+	for (i = 0; i < bases_count; i++) {
+		struct commit *ancestor = get_ref(bases[i]);
+		ca = commit_list_insert(ancestor, &ca);
+	}
+	clean = merge(h1, h2, branch1, branch2, 0, ca, &result);
+
+	if (active_cache_changed &&
+	    (write_cache(index_fd, active_cache, active_nr) ||
+	     close(index_fd) || commit_lock_file(lock)))
+			die ("unable to write %s", get_index_file());
 
 	return clean ? 0: 1;
 }
-
-/*
-vim: sw=8 noet
-*/
-- 
1.4.4.4.gc3d6

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-10 19:28           ` Junio C Hamano
@ 2007-01-10 22:11             ` Junio C Hamano
  2007-01-10 23:07             ` Alex Riesen
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-01-10 22:11 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> This comes on top of yours.  
>
> I'm reproducing all the merges in linux-2.6 history to make sure
> the base one, yours and this produce the same result (the same
> clean merge, or the same unmerged index and the same diff from
> HEAD).  So far it is looking good.

Looks like this is good to go -- among the 2700+ merges I've
finished about 1000 of them and the results from the three
implementation exactly match.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-10 19:28           ` Junio C Hamano
  2007-01-10 22:11             ` Junio C Hamano
@ 2007-01-10 23:07             ` Alex Riesen
  2007-01-10 23:23               ` Linus Torvalds
  2007-01-11  0:34               ` Junio C Hamano
  2007-01-11  8:15             ` Johannes Schindelin
  2007-01-12 15:48             ` Sergey Vlasov
  3 siblings, 2 replies; 40+ messages in thread
From: Alex Riesen @ 2007-01-10 23:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 1/10/07, Junio C Hamano <junkio@cox.net> wrote:
> This comes on top of yours.
>
> I'm reproducing all the merges in linux-2.6 history to make sure
> the base one, yours and this produce the same result (the same
> clean merge, or the same unmerged index and the same diff from
> HEAD).  So far it is looking good.

Yep. Tried the monster merge on it: 1m15sec on that small laptop.

For whatever reason your patch left an "if (cache_dirty) flush_cache()",
that's after my patch + yours. Had it removed.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-10 23:07             ` Alex Riesen
@ 2007-01-10 23:23               ` Linus Torvalds
  2007-01-11  8:14                 ` Johannes Schindelin
  2007-01-11  9:02                 ` Alex Riesen
  2007-01-11  0:34               ` Junio C Hamano
  1 sibling, 2 replies; 40+ messages in thread
From: Linus Torvalds @ 2007-01-10 23:23 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, git



On Thu, 11 Jan 2007, Alex Riesen wrote:
> 
> Yep. Tried the monster merge on it: 1m15sec on that small laptop.

Is that supposed to be good? That still sounds really slow to me. What 
kind of nasty project are you doing? Is this the 44k file project, and 
under cygwin? Or is it that bad even under Linux?

			Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-10 23:07             ` Alex Riesen
  2007-01-10 23:23               ` Linus Torvalds
@ 2007-01-11  0:34               ` Junio C Hamano
  1 sibling, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-01-11  0:34 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, git

"Alex Riesen" <raa.lkml@gmail.com> writes:

> On 1/10/07, Junio C Hamano <junkio@cox.net> wrote:
>> This comes on top of yours.
>>
>> I'm reproducing all the merges in linux-2.6 history to make sure
>> the base one, yours and this produce the same result (the same
>> clean merge, or the same unmerged index and the same diff from
>> HEAD).  So far it is looking good.
>
> Yep. Tried the monster merge on it: 1m15sec on that small laptop.

Is that supposed to be a good news?  It sounds awfully slow.

> For whatever reason your patch left an "if (cache_dirty) flush_cache()",
> that's after my patch + yours. Had it removed.

That's because my copy of "your patch" has the fix-up I
suggested to remove the flush from process_renames() already --
the removal of that one and removal from process_entry() you did
logically belong to each other.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-10 23:23               ` Linus Torvalds
@ 2007-01-11  8:14                 ` Johannes Schindelin
  2007-01-11  9:03                   ` Alex Riesen
  2007-01-11  9:02                 ` Alex Riesen
  1 sibling, 1 reply; 40+ messages in thread
From: Johannes Schindelin @ 2007-01-11  8:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alex Riesen, Junio C Hamano, git

Hi,

On Wed, 10 Jan 2007, Linus Torvalds wrote:

> On Thu, 11 Jan 2007, Alex Riesen wrote:
> > 
> > Yep. Tried the monster merge on it: 1m15sec on that small laptop.
> 
> Is that supposed to be good? That still sounds really slow to me. What 
> kind of nasty project are you doing? Is this the 44k file project, and 
> under cygwin? Or is it that bad even under Linux?

This _is_ cygwin. And 1m15sec is actually very, very good, if you happen 
to know that it took more than 10 minutes(!) when we started our quest of 
inbuilding recursive merge.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-10 19:28           ` Junio C Hamano
  2007-01-10 22:11             ` Junio C Hamano
  2007-01-10 23:07             ` Alex Riesen
@ 2007-01-11  8:15             ` Johannes Schindelin
  2007-01-12 15:48             ` Sergey Vlasov
  3 siblings, 0 replies; 40+ messages in thread
From: Johannes Schindelin @ 2007-01-11  8:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alex Riesen, git

Hi,

On Wed, 10 Jan 2007, Junio C Hamano wrote:

> Subject: [PATCH] merge-recursive: do not use on-file index when not needed.
> 
> This revamps the merge-recursive implementation following the
> outline in:
> 
> 	Message-ID: <7v8xgileza.fsf@assigned-by-dhcp.cox.net>

Thank you very much! I know, it was my task and I punted, but my time is 
really scarce these days.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-10 23:23               ` Linus Torvalds
  2007-01-11  8:14                 ` Johannes Schindelin
@ 2007-01-11  9:02                 ` Alex Riesen
  2007-01-11 16:38                   ` Linus Torvalds
  1 sibling, 1 reply; 40+ messages in thread
From: Alex Riesen @ 2007-01-11  9:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On 1/11/07, Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > Yep. Tried the monster merge on it: 1m15sec on that small laptop.
>
> Is that supposed to be good? That still sounds really slow to me. What
> kind of nasty project are you doing? Is this the 44k file project, and
> under cygwin? Or is it that bad even under Linux?

It is that "bad" on a 384Mb linux laptop and 1.2GHz Celeron.
Yes, it is that 44k files project. The previous code finishes
that merge on that laptop in about 20 minutes, so it's defnitely
an improvement. My cygwin machine has a lot more memory (2Gb),
so I can't really compare them here.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11  8:14                 ` Johannes Schindelin
@ 2007-01-11  9:03                   ` Alex Riesen
  2007-01-11 12:11                     ` Alex Riesen
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Riesen @ 2007-01-11  9:03 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Junio C Hamano, git

On 1/11/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > Yep. Tried the monster merge on it: 1m15sec on that small laptop.
> >
> > Is that supposed to be good? That still sounds really slow to me. What
> > kind of nasty project are you doing? Is this the 44k file project, and
> > under cygwin? Or is it that bad even under Linux?
>
> This _is_ cygwin. And 1m15sec is actually very, very good, if you happen

No, this is linux, in a very constrained conditions. On cygwin I
haven't tried it yet.

> to know that it took more than 10 minutes(!) when we started our quest of
> inbuilding recursive merge.

Right.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11  9:03                   ` Alex Riesen
@ 2007-01-11 12:11                     ` Alex Riesen
  2007-01-11 20:37                       ` Junio C Hamano
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Riesen @ 2007-01-11 12:11 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Junio C Hamano, git

On 1/11/07, Alex Riesen <raa.lkml@gmail.com> wrote:
> > > > Yep. Tried the monster merge on it: 1m15sec on that small laptop.
> > >
> > > Is that supposed to be good? That still sounds really slow to me. What
> > > kind of nasty project are you doing? Is this the 44k file project, and
> > > under cygwin? Or is it that bad even under Linux?
> >
> > This _is_ cygwin. And 1m15sec is actually very, very good, if you happen
>
> No, this is linux, in a very constrained conditions. On cygwin I
> haven't tried it yet.

Well, tried it now. Strangely enough - there is almost no speedup with
Junio's patch on top of mine. I must have done something stupid,
but cannot find what. And the times reported by time on cygwin jump
wildly: from 29 to 38 sec!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11  9:02                 ` Alex Riesen
@ 2007-01-11 16:38                   ` Linus Torvalds
  2007-01-11 17:43                     ` Alex Riesen
  2007-01-11 20:23                     ` Junio C Hamano
  0 siblings, 2 replies; 40+ messages in thread
From: Linus Torvalds @ 2007-01-11 16:38 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, git



On Thu, 11 Jan 2007, Alex Riesen wrote:
> On 1/11/07, Linus Torvalds <torvalds@osdl.org> wrote:
> > >
> > > Yep. Tried the monster merge on it: 1m15sec on that small laptop.
> > 
> > Is that supposed to be good? That still sounds really slow to me. What
> > kind of nasty project are you doing? Is this the 44k file project, and
> > under cygwin? Or is it that bad even under Linux?
> 
> It is that "bad" on a 384Mb linux laptop and 1.2GHz Celeron.
> Yes, it is that 44k files project. The previous code finishes
> that merge on that laptop in about 20 minutes, so it's defnitely
> an improvement. My cygwin machine has a lot more memory (2Gb),
> so I can't really compare them here.

Ok. Junio, I'd suggest putting it into 1.5.0, then - it's a fairly simple 
thing, after all, and if it's the difference between 20 minutes and just 
over one minute, it clearly matters.

With 384MB of memory, and 44 thousand files, I bet the problem is just 
that the working set doesn't fit entirely in RAM. It probably caches 
*most* of it, but with inodes and directories being spread out on disk 
(and I assume there are more files in the actual working tree), so writing 
out a 6MB index file (or whatever) and then reading it back several times 
just ends up generating IO simply because 6MB is actually a noticeable 
chunk of memory in that situation.

(It also generates a ton of tree objects early, so the effect at run-time 
is probably much more than 6MB).

That said, I think we actually have another problem entirely:

Look at "write_cache()", Junio: isn't it leaking memory like mad?

Shouldn't we have something like this?

It's entirely possible that the _real_ problem with the "flush the index 
all the time" was that it just caused this bug: tons and tons of lost 
memory, causing git-merge-recursive to grow explosively (~6MB per 
cache flush, and a _lot_ of cache flushes), which on a 384MB machine 
quickly uses up memory and causes totally unnecessary swapping.

Of course, it's also entirely possible that I'm a complete retard, and 
just didn't see where the data buffer is still used or freed.

"Linus - complete retard or hero in shining armor? You decide!"

		Linus

---
diff --git a/read-cache.c b/read-cache.c
index 8ecd826..c54a611 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1010,7 +1010,7 @@ int write_cache(int newfd, struct cache_entry **cache, int entries)
 		if (data &&
 		    !write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sz) &&
 		    !ce_write(&c, newfd, data, sz))
-			;
+			free(data);
 		else {
 			free(data);
 			return -1;

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11 16:38                   ` Linus Torvalds
@ 2007-01-11 17:43                     ` Alex Riesen
  2007-01-11 18:02                       ` Linus Torvalds
  2007-01-11 20:23                     ` Junio C Hamano
  1 sibling, 1 reply; 40+ messages in thread
From: Alex Riesen @ 2007-01-11 17:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On 1/11/07, Linus Torvalds <torvalds@osdl.org> wrote:
> That said, I think we actually have another problem entirely:
>
> Look at "write_cache()", Junio: isn't it leaking memory like mad?

Unless it is used in some corner case not covered by tests - it
looks like it does leak memory like mad. With the patch the
memory usage for 44k-merge is more than halved!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11 17:43                     ` Alex Riesen
@ 2007-01-11 18:02                       ` Linus Torvalds
  2007-01-11 21:48                         ` Alex Riesen
  0 siblings, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2007-01-11 18:02 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, git



On Thu, 11 Jan 2007, Alex Riesen wrote:

> On 1/11/07, Linus Torvalds <torvalds@osdl.org> wrote:
> > That said, I think we actually have another problem entirely:
> > 
> > Look at "write_cache()", Junio: isn't it leaking memory like mad?
> 
> Unless it is used in some corner case not covered by tests - it
> looks like it does leak memory like mad. With the patch the
> memory usage for 44k-merge is more than halved!

Is that halving on _top_ of your and Junio's fixes to not flush 
unnecessarily?

Junio, I looked and looked, and that trivial one-liner definitely looks 
right to me. The pointer is not free'd by anybody else, and none of the 
things we call in to with it save it away, and they expect the caller to 
manage it.

And it does pass all the tests, although I don't know how much coverage 
they have in this area..

		Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11 16:38                   ` Linus Torvalds
  2007-01-11 17:43                     ` Alex Riesen
@ 2007-01-11 20:23                     ` Junio C Hamano
  2007-01-11 22:10                       ` Alex Riesen
  1 sibling, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2007-01-11 20:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alex Riesen, git

Linus Torvalds <torvalds@osdl.org> writes:

> That said, I think we actually have another problem entirely:
>
> Look at "write_cache()", Junio: isn't it leaking memory like mad?
>
> Shouldn't we have something like this?
>
> It's entirely possible that the _real_ problem with the "flush the index 
> all the time" was that it just caused this bug: tons and tons of lost 
> memory, causing git-merge-recursive to grow explosively (~6MB per 
> cache flush, and a _lot_ of cache flushes), which on a 384MB machine 
> quickly uses up memory and causes totally unnecessary swapping.

You are right -- there is absolutely no reason to retain this
memory.  It is a serialized representation of cache-tree data
only to be stored in the index, and no other user of this data
exists.  Thanks for spotting this.

Writing out 6MB per every path changed in a merge would still be
an unnecessary overhead over the one in 'next', so there is no
reason to replace 'next' with this single liner of yours, but I
am interested in seeing how much of the 20-minute vs 1-minute
difference is attributable to this leak, just out of curiosity.

Alex, if you have a chance, could you apply Linus's single-liner
on top of 'master', without either of the merge-recursive
patches in 'next', and see what kind of numbers you would get?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11 12:11                     ` Alex Riesen
@ 2007-01-11 20:37                       ` Junio C Hamano
  0 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-01-11 20:37 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Johannes Schindelin, Linus Torvalds, Junio C Hamano, git

"Alex Riesen" <raa.lkml@gmail.com> writes:

> Well, tried it now. Strangely enough - there is almost no speedup with
> Junio's patch on top of mine.

Mine is primarily meant to be a conceptual clean-up.  While it
does save unnecessary write-tree at the end, and it also saves
unnecessary write-cache/read-cache for the recursive part when
you have more than one merge base, I would not be surprised if
the effect of tons of unnecessary write-index, once per path
involved in the merge, which was what your patch removed, would
drawf anything else.

Since single merge base cases dominate in practice, you might
see improvements from the avoidance of the final write-tree but
probably would not see much benefit of write-cache/read-cache
avoidance unless your merge has many merge bases.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11 18:02                       ` Linus Torvalds
@ 2007-01-11 21:48                         ` Alex Riesen
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Riesen @ 2007-01-11 21:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds, Thu, Jan 11, 2007 19:02:52 +0100:
> > > That said, I think we actually have another problem entirely:
> > > 
> > > Look at "write_cache()", Junio: isn't it leaking memory like mad?
> > 
> > Unless it is used in some corner case not covered by tests - it
> > looks like it does leak memory like mad. With the patch the
> > memory usage for 44k-merge is more than halved!
> 
> Is that halving on _top_ of your and Junio's fixes to not flush 
> unnecessarily?

Yes.

> And it does pass all the tests, although I don't know how much coverage 
> they have in this area..

Quite a bit: criss-cross, renames, simple. My 44k-merge is just a
primitive two-head merge useful only as a stress test (it doesn't even
has any renames).

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11 20:23                     ` Junio C Hamano
@ 2007-01-11 22:10                       ` Alex Riesen
  2007-01-11 22:28                         ` Linus Torvalds
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Riesen @ 2007-01-11 22:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano, Thu, Jan 11, 2007 21:23:45 +0100:
> > That said, I think we actually have another problem entirely:
> >
> > Look at "write_cache()", Junio: isn't it leaking memory like mad?
> >
> > Shouldn't we have something like this?
> >
> > It's entirely possible that the _real_ problem with the "flush the index 
> > all the time" was that it just caused this bug: tons and tons of lost 
> > memory, causing git-merge-recursive to grow explosively (~6MB per 
> > cache flush, and a _lot_ of cache flushes), which on a 384MB machine 
> > quickly uses up memory and causes totally unnecessary swapping.
> 
> You are right -- there is absolutely no reason to retain this
> memory.  It is a serialized representation of cache-tree data
> only to be stored in the index, and no other user of this data
> exists.  Thanks for spotting this.
> 
> Writing out 6MB per every path changed in a merge would still be
> an unnecessary overhead over the one in 'next', so there is no
> reason to replace 'next' with this single liner of yours, but I
> am interested in seeing how much of the 20-minute vs 1-minute
> difference is attributable to this leak, just out of curiosity.
> 
> Alex, if you have a chance, could you apply Linus's single-liner
> on top of 'master', without either of the merge-recursive
> patches in 'next', and see what kind of numbers you would get?

With regard to speed: not noticable on the cygwin machine. The
384Mb-laptop liked it: moved into 40-50 sec range (it had real
problems (minutes) doing that merge without at least my first patch.
Because of the leak, as we now understand).

It must have been large leak, as I really have seen the memory usage
dropping down significantly.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11 22:10                       ` Alex Riesen
@ 2007-01-11 22:28                         ` Linus Torvalds
  2007-01-11 23:53                           ` Junio C Hamano
  2007-01-12  0:18                           ` Alex Riesen
  0 siblings, 2 replies; 40+ messages in thread
From: Linus Torvalds @ 2007-01-11 22:28 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, git



On Thu, 11 Jan 2007, Alex Riesen wrote:
> 
> It must have been large leak, as I really have seen the memory usage
> dropping down significantly.

I really think it was about 6MB (or whatever your index file size was) per 
every single resolved file. I think merge-recursive used to flush the 
index file every time it resolved something, and every flush would 
basically leak the whole buffer used to write the index.

Anyway, 40-50 sec on a fairly weak laptop for a 44k-file merge sounds like 
git doesn't have to be totally embarrassed. I'm not saying we shouldn't be 
able to do it faster, but it's at least _possible_ that a lot of the time 
spent is now spent doing real work (ie maybe you actually have a fair 
amount of file-level merging? Maybe it's 40-50 sec because there's some 
amount of real IO going on, and a fair amount of real work done too?)

			Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11 22:28                         ` Linus Torvalds
@ 2007-01-11 23:53                           ` Junio C Hamano
  2007-01-12  0:18                           ` Alex Riesen
  1 sibling, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-01-11 23:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alex Riesen, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 11 Jan 2007, Alex Riesen wrote:
>> 
>> It must have been large leak, as I really have seen the memory usage
>> dropping down significantly.
>
> I really think it was about 6MB (or whatever your index file size was) per 
> every single resolved file. I think merge-recursive used to flush the 
> index file every time it resolved something, and every flush would 
> basically leak the whole buffer used to write the index.

This does not change the conclusion "leaking is bad", but it was
not as bad as "the whole buffer used to write the index".

There are 10 4-byte ints, 20-bye SHA1, and a short plus pathname
and padding stored per a file and Alex has 44k files.  They are
not included in that *data the code was leaking, I would suspect
maybe a meg or so per path.

The version of merge-recursive in 'next' would not write out the
index at all until the very end once, so that makes this leak
somewhat irrelevant for that particular program ;-) but thanks
for the fix.

Here is the list of what I have queued for 'master' (I am
sending the list because it will be some time before I can push
them out):

Eric Wong (1):
      Avoid errors and warnings when attempting to do I/O on zero bytes

Junio C Hamano (5):
      Document git-init
      index-pack: write-or-die instead of unchecked write-in-full.
      config-set: check write-in-full returns in set_multivar
      git-rm: do not fail on already removed file.
      git-status: wording update to deal with deleted files.

Linus Torvalds (3):
      write-cache: do not leak the serialized cache-tree data.
      write_in_full: really write in full or return error on disk full.
      Better error messages for corrupt databases

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-11 22:28                         ` Linus Torvalds
  2007-01-11 23:53                           ` Junio C Hamano
@ 2007-01-12  0:18                           ` Alex Riesen
  1 sibling, 0 replies; 40+ messages in thread
From: Alex Riesen @ 2007-01-12  0:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds, Thu, Jan 11, 2007 23:28:21 +0100:
> > 
> > It must have been large leak, as I really have seen the memory usage
> > dropping down significantly.
> 
> I really think it was about 6MB (or whatever your index file size was) per 
> every single resolved file. I think merge-recursive used to flush the 
> index file every time it resolved something, and every flush would 
> basically leak the whole buffer used to write the index.

Looks like. Resulting merge has about 10 files changed, all the other
files are same, just got different ways on the branches to be merged.

> Anyway, 40-50 sec on a fairly weak laptop for a 44k-file merge sounds like 
> git doesn't have to be totally embarrassed. I'm not saying we shouldn't be 
> able to do it faster, but it's at least _possible_ that a lot of the time 
> spent is now spent doing real work (ie maybe you actually have a fair 
> amount of file-level merging? Maybe it's 40-50 sec because there's some 
> amount of real IO going on, and a fair amount of real work done too?)

Some work, definitely: git diff-tree branch1 branch2 lists 9042
differences, among them some on the same files.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-10 19:28           ` Junio C Hamano
                               ` (2 preceding siblings ...)
  2007-01-11  8:15             ` Johannes Schindelin
@ 2007-01-12 15:48             ` Sergey Vlasov
  2007-01-12 17:38               ` Alex Riesen
  2007-01-12 18:23               ` Junio C Hamano
  3 siblings, 2 replies; 40+ messages in thread
From: Sergey Vlasov @ 2007-01-12 15:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Alex Riesen

[-- Attachment #1: Type: text/plain, Size: 2846 bytes --]

On Wed, 10 Jan 2007 11:28:14 -0800 Junio C Hamano wrote:

> From: Junio C Hamano <junkio@cox.net>
> Date: Wed, 10 Jan 2007 11:20:58 -0800
> Subject: [PATCH] merge-recursive: do not use on-file index when not needed.
>
> This revamps the merge-recursive implementation following the
> outline in:
>
> 	Message-ID: <7v8xgileza.fsf@assigned-by-dhcp.cox.net>
>
> There is no need to write out the index until the very end just
> once from merge-recursive.  Also there is no need to write out
> the resulting tree object for the simple case of merging with a
> single merge base.
>
> Signed-off-by: Junio C Hamano <junkio@cox.net>

This commit broke t3401-rebase-partial.sh:

...
*   ok 3: rebase topic branch against new master and check git-am did not get halted

* expecting success: git-checkout -f my-topic-branch-merge &&
         git-rebase --merge master-merge &&
         test ! -d .git/.dotest-merge
First, rewinding head to replay your work on top of it...
HEAD is now at 5f97179... Add C.
Merging master-merge with my-topic-branch-merge~1
Merging:
5f97179 Add C.
1be2c8e Add B.
found 1 common ancestor(s):
0e8cba9 Add A.
.../git-rebase: line 82: 11517 Segmentation fault      git-merge-$strategy "$cmt^" -- "$hd" "$cmt"
Unknown exit code (139) from command: git-merge-recursive 1be2c8e0eba8a7a383d0403facb1c72c622c0939^ -- HEAD 1be2c8e0eba8a7a383d0403facb1c72c622c0939
* FAIL 4: rebase --merge topic branch that was partially merged upstream
        git-checkout -f my-topic-branch-merge &&
                 git-rebase --merge master-merge &&
                 test ! -d .git/.dotest-merge

* failed 1 among 4 test(s)

> @@ -1105,9 +1040,7 @@ static int merge_trees(struct tree *head,
>  		    sha1_to_hex(head->object.sha1),
>  		    sha1_to_hex(merge->object.sha1));
>
> -	*result = git_write_tree();

Previously *result was set here...

> -
> -	if (!*result) {
> +	if (unmerged_index()) {
>  		struct path_list *entries, *re_head, *re_merge;
>  		int i;
>  		path_list_clear(&current_file_set, 1);
> @@ -1128,17 +1061,11 @@ static int merge_trees(struct tree *head,
>  			if (!process_entry(path, e, branch1, branch2))
>  				clean = 0;
>  		}
> -		if (cache_dirty)
> -			flush_cache();
>
>  		path_list_clear(re_merge, 0);
>  		path_list_clear(re_head, 0);
>  		path_list_clear(entries, 1);
>
> -		if (clean || index_only)
> -			*result = git_write_tree();
> -		else
> -			*result = NULL;
>  	} else {
>  		clean = 1;
>  		printf("merging of trees %s and %s resulted in %s\n",
> @@ -1146,6 +1073,8 @@ static int merge_trees(struct tree *head,
>  		       sha1_to_hex(merge->object.sha1),
>  		       sha1_to_hex((*result)->object.sha1));

...and it is still used here - however, after the patch *result is
uninitialized at this point.

>  	}
> +	if (index_only)
> +		*result = git_write_tree();

Too late...

>
>  	return clean;
>  }



[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-12 15:48             ` Sergey Vlasov
@ 2007-01-12 17:38               ` Alex Riesen
  2007-01-12 20:37                 ` Sergey Vlasov
  2007-01-12 18:23               ` Junio C Hamano
  1 sibling, 1 reply; 40+ messages in thread
From: Alex Riesen @ 2007-01-12 17:38 UTC (permalink / raw)
  To: Sergey Vlasov; +Cc: Junio C Hamano, git

On 1/12/07, Sergey Vlasov <vsu@altlinux.ru> wrote:
> > Subject: [PATCH] merge-recursive: do not use on-file index when not needed.
>
> This commit broke t3401-rebase-partial.sh:
>
> ...
> *   ok 3: rebase topic branch against new master and check git-am did not get halted
>

Hmm... Can't reproduce. Do you have your own patches in the tree?
Or could you post your merge-recursive.c?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-12 15:48             ` Sergey Vlasov
  2007-01-12 17:38               ` Alex Riesen
@ 2007-01-12 18:23               ` Junio C Hamano
  2007-01-12 20:09                 ` [PATCH] merge-recursive: do not report the resulting tree object name Junio C Hamano
                                   ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-01-12 18:23 UTC (permalink / raw)
  To: Sergey Vlasov; +Cc: git

Sergey Vlasov <vsu@altlinux.ru> writes:

> On Wed, 10 Jan 2007 11:28:14 -0800 Junio C Hamano wrote:
>
>> This revamps the merge-recursive implementation following the
>> outline in:
>> ...
> This commit broke t3401-rebase-partial.sh:
> ...
> ...and it is still used here - however, after the patch *result is
> uninitialized at this point.

Very true.  This untested patch should fix it.

Note that this stops (relative to the older
version of merge-recursive that always wrote a tree even when it
was not needed) reporting the tree object name for outermost
merge, but I think that reporting was primarily meant for people
who are debugging merge-recursive and did not have a real
value.  We could even remove the whole printf(), which I tend to
prefer.

--
diff --git a/merge-recursive.c b/merge-recursive.c
index 5237021..40c12aa 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1066,15 +1066,17 @@ static int merge_trees(struct tree *head,
 		path_list_clear(re_head, 0);
 		path_list_clear(entries, 1);
 
-	} else {
+	}
+	else
 		clean = 1;
+
+	if (index_only) {
+		*result = git_write_tree();
 		printf("merging of trees %s and %s resulted in %s\n",
 		       sha1_to_hex(head->object.sha1),
 		       sha1_to_hex(merge->object.sha1),
 		       sha1_to_hex((*result)->object.sha1));
 	}
-	if (index_only)
-		*result = git_write_tree();
 
 	return clean;
 }

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH] merge-recursive: do not report the resulting tree object name
  2007-01-12 18:23               ` Junio C Hamano
@ 2007-01-12 20:09                 ` Junio C Hamano
  2007-01-12 23:36                   ` Johannes Schindelin
  2007-01-12 20:30                 ` [PATCH] Speedup recursive by flushing index only once for all entries Alex Riesen
  2007-01-12 21:07                 ` Sergey Vlasov
  2 siblings, 1 reply; 40+ messages in thread
From: Junio C Hamano @ 2007-01-12 20:09 UTC (permalink / raw)
  To: Sergey Vlasov; +Cc: git, Alex Riesen

It is not available in the outermost merge, and it is only
useful for debugging merge-recursive in the inner merges.

Sergey Vlasov noticed that the old code accesses an
uninitialized location.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 Junio C Hamano <junkio@cox.net> writes:

 > Very true.  This untested patch should fix it.
 >
 > Note that this stops (relative to the older
 > version of merge-recursive that always wrote a tree even when it
 > was not needed) reporting the tree object name for outermost
 > merge, but I think that reporting was primarily meant for people
 > who are debugging merge-recursive and did not have a real
 > value.  We could even remove the whole printf(), which I tend to
 > prefer.

 So I'd commit this -- I tested it this time, with

 	if (result) *result = NULL

 at the beginning of that function.

 merge-recursive.c |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 5237021..b4acbb7 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1066,13 +1066,10 @@ static int merge_trees(struct tree *head,
 		path_list_clear(re_head, 0);
 		path_list_clear(entries, 1);
 
-	} else {
-		clean = 1;
-		printf("merging of trees %s and %s resulted in %s\n",
-		       sha1_to_hex(head->object.sha1),
-		       sha1_to_hex(merge->object.sha1),
-		       sha1_to_hex((*result)->object.sha1));
 	}
+	else
+		clean = 1;
+
 	if (index_only)
 		*result = git_write_tree();
 
-- 
1.5.0.rc1.g397d

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-12 18:23               ` Junio C Hamano
  2007-01-12 20:09                 ` [PATCH] merge-recursive: do not report the resulting tree object name Junio C Hamano
@ 2007-01-12 20:30                 ` Alex Riesen
  2007-01-12 21:07                 ` Sergey Vlasov
  2 siblings, 0 replies; 40+ messages in thread
From: Alex Riesen @ 2007-01-12 20:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sergey Vlasov, git

Junio C Hamano, Fri, Jan 12, 2007 19:23:37 +0100:
> > ...and it is still used here - however, after the patch *result is
> > uninitialized at this point.
> 
> Very true.  This untested patch should fix it.
> 

I had to initialize mrtree of merge() with NULL to reproduce it.
Sneaky bastard...

> We could even remove the whole printf(), which I tend to prefer.

I agree. The merges of this kind a rare in comparison to simple ones.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-12 17:38               ` Alex Riesen
@ 2007-01-12 20:37                 ` Sergey Vlasov
  0 siblings, 0 replies; 40+ messages in thread
From: Sergey Vlasov @ 2007-01-12 20:37 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 767 bytes --]

On Fri, Jan 12, 2007 at 06:38:26PM +0100, Alex Riesen wrote:
> On 1/12/07, Sergey Vlasov <vsu@altlinux.ru> wrote:
> >> Subject: [PATCH] merge-recursive: do not use on-file index when not 
> >needed.
> >
> >This commit broke t3401-rebase-partial.sh:
> >
> >...
> >*   ok 3: rebase topic branch against new master and check git-am did not 
> >get halted
> >
> 
> Hmm... Can't reproduce. Do you have your own patches in the tree?

I had when I encountered the problem, but then retested with clean
'master' (4494c656e2e29c468c48c9c2b20595342056e9dc) and got the same
crash.  But since the problem is an uninitialized stack variable, you
can get anything depending on the phase of the moon from it (however,
Valgrind should still be able to catch it).

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Speedup recursive by flushing index only once for all entries
  2007-01-12 18:23               ` Junio C Hamano
  2007-01-12 20:09                 ` [PATCH] merge-recursive: do not report the resulting tree object name Junio C Hamano
  2007-01-12 20:30                 ` [PATCH] Speedup recursive by flushing index only once for all entries Alex Riesen
@ 2007-01-12 21:07                 ` Sergey Vlasov
  2 siblings, 0 replies; 40+ messages in thread
From: Sergey Vlasov @ 2007-01-12 21:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2363 bytes --]

On Fri, Jan 12, 2007 at 10:23:37AM -0800, Junio C Hamano wrote:
> Sergey Vlasov <vsu@altlinux.ru> writes:
> 
> > On Wed, 10 Jan 2007 11:28:14 -0800 Junio C Hamano wrote:
> >
> >> This revamps the merge-recursive implementation following the
> >> outline in:
> >> ...
> > This commit broke t3401-rebase-partial.sh:
> > ...
> > ...and it is still used here - however, after the patch *result is
> > uninitialized at this point.
> 
> Very true.  This untested patch should fix it.

BTW, the same code does not crash on another (x86_64) machine;
however, valgrind-3.2.1 complains:

==20571== Use of uninitialised value of size 8
==20571==    at 0x411FF2: sha1_to_hex (sha1_file.c:125)
==20571==    by 0x405D90: merge_trees (merge-recursive.c:1071)
==20571==    by 0x406044: merge (merge-recursive.c:1163)
==20571==    by 0x40641D: main (merge-recursive.c:1245)

After the patch valgrind does not complain anymore.

> Note that this stops (relative to the older
> version of merge-recursive that always wrote a tree even when it
> was not needed) reporting the tree object name for outermost
> merge, but I think that reporting was primarily meant for people
> who are debugging merge-recursive and did not have a real
> value.  We could even remove the whole printf(), which I tend to
> prefer.

If that printf() is just a debug output, we should definitely remove
it - the merge output is verbose enough already.

> diff --git a/merge-recursive.c b/merge-recursive.c
> index 5237021..40c12aa 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -1066,15 +1066,17 @@ static int merge_trees(struct tree *head,
>  		path_list_clear(re_head, 0);
>  		path_list_clear(entries, 1);
>  
> -	} else {
> +	}
> +	else
>  		clean = 1;
> +
> +	if (index_only) {
> +		*result = git_write_tree();

Hmm, can git_write_tree() return NULL at this point?  Does the code in
the if (unmerged_index()) {...} branch above resolve all unmerged
index entries?  It probably should, if I understand the
merge-recursive logic...

>  		printf("merging of trees %s and %s resulted in %s\n",
>  		       sha1_to_hex(head->object.sha1),
>  		       sha1_to_hex(merge->object.sha1),
>  		       sha1_to_hex((*result)->object.sha1));
>  	}
> -	if (index_only)
> -		*result = git_write_tree();
>  
>  	return clean;
>  }

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] merge-recursive: do not report the resulting tree object name
  2007-01-12 20:09                 ` [PATCH] merge-recursive: do not report the resulting tree object name Junio C Hamano
@ 2007-01-12 23:36                   ` Johannes Schindelin
  2007-01-13  0:32                     ` Junio C Hamano
  0 siblings, 1 reply; 40+ messages in thread
From: Johannes Schindelin @ 2007-01-12 23:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sergey Vlasov, git, Alex Riesen

Hi,

I like this patch. merge-recursive is very talkative, to the intimidating 
astonishment of unsuspecting users.

The real information is in the conflict markers now, which tell the user 
where what hunk came from. And the names in the markers are _really_ 
useful now.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] merge-recursive: do not report the resulting tree object name
  2007-01-12 23:36                   ` Johannes Schindelin
@ 2007-01-13  0:32                     ` Junio C Hamano
  2007-01-13  0:57                       ` Jakub Narebski
  2007-01-13  5:14                       ` Shawn O. Pearce
  0 siblings, 2 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-01-13  0:32 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I like this patch. merge-recursive is very talkative, to the intimidating 
> astonishment of unsuspecting users.

This is a smallish example:

        $ git merge jc/merge-base
     1	Trying really trivial in-index merge...
     2	fatal: Merge requires file-level merging
     3	Nope.
     4	Merging HEAD with jc/merge-base
     5	Merging:
     6	b60daf0 Make git-prune-packed a bit more chatty.
     7	5b75a55 Teach "git-merge-base --check-ancestry" about refs.
     8	found 1 common ancestor(s):
     9	1c23d79 Don't die in git-http-fetch when fetching packs.
    10	Auto-merging Makefile
    11	Auto-merging builtin-branch.c
    12	Auto-merging builtin-reflog.c
    13	CONFLICT (content): Merge conflict in builtin-reflog.c
    14	Auto-merging builtin.h
    15	Auto-merging git.c
    16	Removing merge-base.c
    17	Resolved 'builtin-reflog.c' using previous resolution.
    18	Automatic merge failed; fix conflicts and then commit the result.

Among these, I think lines 2..3 are somewhat confusing but I am
used to seeing them and do not mind them too much.

Lines 4..9 do not have any real information that helps the end
user (even though it would be a very good debugging aid for
merge-recursive developers).

Lines 10..16 are useful, but I think we probably should show
them only for outermost merges.

An multi-base example:

        $ git merge 82560983997c961d9deafe0074b787c8484c2e1d
     1	Merging HEAD with 82560983997c961d9deafe0074b787c8484c2e1d
     2	Merging:
     3	9ee93dc Merge for-each-ref to sync gitweb fully with 'next'...
     4	8256098 gitweb: Print commit message without title in commi...
     5	found 2 common ancestor(s):
     6	b2d3476 Gitweb - provide site headers and footers
     7	1259404 Merge branch 'maint'
     8	  Merging:
     9	  b2d3476 Gitweb - provide site headers and footers
    10	  1259404 Merge branch 'maint'
    11	  found 1 common ancestor(s):
    12	  128eead gitweb: document webserver configuration for comm...
    13	  Auto-merging Makefile
    14	  Auto-merging gitweb/gitweb.perl
    15	  CONFLICT (content): Merge conflict in gitweb/gitweb.perl
    16	Auto-merging gitweb/gitweb.perl
    17	Merge made by recursive.
    18	 gitweb/gitweb.css  |    2 +
    19	 gitweb/gitweb.perl |  165 ++++++++++++++++++++++++++++++++...
    20	 2 files changed, 117 insertions(+), 50 deletions(-)

I do not think we need to show 1..15 at all, perhaps without
"export GIT_MERGE_BASE_DEBUG=YesPlease".

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] merge-recursive: do not report the resulting tree object name
  2007-01-13  0:32                     ` Junio C Hamano
@ 2007-01-13  0:57                       ` Jakub Narebski
  2007-01-13 11:01                         ` Johannes Schindelin
  2007-01-13  5:14                       ` Shawn O. Pearce
  1 sibling, 1 reply; 40+ messages in thread
From: Jakub Narebski @ 2007-01-13  0:57 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

> I do not think we need to show 1..15 at all, perhaps without
> "export GIT_MERGE_BASE_DEBUG=YesPlease".

Or a -v/--verbose (or even -v -v -v) flag set.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] merge-recursive: do not report the resulting tree object name
  2007-01-13  0:32                     ` Junio C Hamano
  2007-01-13  0:57                       ` Jakub Narebski
@ 2007-01-13  5:14                       ` Shawn O. Pearce
  2007-01-13  7:03                         ` Junio C Hamano
  1 sibling, 1 reply; 40+ messages in thread
From: Shawn O. Pearce @ 2007-01-13  5:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

Junio C Hamano <junkio@cox.net> wrote:
>         $ git merge jc/merge-base
>      1	Trying really trivial in-index merge...
>      2	fatal: Merge requires file-level merging
>      3	Nope.
>      4	Merging HEAD with jc/merge-base
>      5	Merging:
>      6	b60daf0 Make git-prune-packed a bit more chatty.
>      7	5b75a55 Teach "git-merge-base --check-ancestry" about refs.
>      8	found 1 common ancestor(s):
>      9	1c23d79 Don't die in git-http-fetch when fetching packs.
>     10	Auto-merging Makefile
>     11	Auto-merging builtin-branch.c
>     12	Auto-merging builtin-reflog.c
>     13	CONFLICT (content): Merge conflict in builtin-reflog.c
>     14	Auto-merging builtin.h
>     15	Auto-merging git.c
>     16	Removing merge-base.c
>     17	Resolved 'builtin-reflog.c' using previous resolution.
>     18	Automatic merge failed; fix conflicts and then commit the result.
> 
> Among these, I think lines 2..3 are somewhat confusing but I am
> used to seeing them and do not mind them too much.

In my experience these lines scare new users.  And then they start
to ignore other "fatal:" messages from Git because they can safely
ignore this particular one.  Not good.  One reason I like my patch
that's in next.

> Lines 4..9 do not have any real information that helps the end
> user (even though it would be a very good debugging aid for
> merge-recursive developers).

I agree.  I've grown used to seeing them and read it for
entertainment.  Clearly I need to get out more.  They probably
should be relegated to a GIT_MERGE_OPTIONS environment variable
flag or to a command line parameter, as they are probably only
useful when debugging the application itself.
 
> Lines 10..16 are useful, but I think we probably should show
> them only for outermost merges.

Actually I think that only 13 is useful.  10-12,14-17 are
pretty useless messages in my mind.  I really don't care that
merge-recursive automatically merged these files, as in all cases but
the one reported by line 13 the merge was successful.  The diffstat
that is normally displayed by git-merge after a successful merge
shows you what files were modified by the other branch.  It also
often causes the output of merge-recursive to scroll off the screen,
making those messages even less useful.

> An multi-base example:
>     16	Auto-merging gitweb/gitweb.perl
>     17	Merge made by recursive.
>     18	 gitweb/gitweb.css  |    2 +
>     19	 gitweb/gitweb.perl |  165 ++++++++++++++++++++++++++++++++...
>     20	 2 files changed, 117 insertions(+), 50 deletions(-)
> 
> I do not think we need to show 1..15 at all, perhaps without
> "export GIT_MERGE_BASE_DEBUG=YesPlease".

Yes, I agree.  Except I'd say 1..16, for the reason stated above.

But then I would like a progress meter, showing % of files resolved,
to keep the user entertained.  Alex has 1 min+ merges.  1 minute
of absolutely no feedback is not very nice to a new user.

Maybe when I'm done hacking on git-describe performance improvements
I'll look at merge-recursive.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] merge-recursive: do not report the resulting tree object name
  2007-01-13  5:14                       ` Shawn O. Pearce
@ 2007-01-13  7:03                         ` Junio C Hamano
  0 siblings, 0 replies; 40+ messages in thread
From: Junio C Hamano @ 2007-01-13  7:03 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

"Shawn O. Pearce" <spearce@spearce.org> writes:

>> Among these, I think lines 2..3 are somewhat confusing but I am
>> used to seeing them and do not mind them too much.
>
> In my experience these lines scare new users.  And then they start
> to ignore other "fatal:" messages from Git because they can safely
> ignore this particular one.

I tend to agree; we could do something like the attached.

>> Lines 10..16 are useful, but I think we probably should show
>> them only for outermost merges.
>
> Actually I think that only 13 is useful.  10-12,14-17 are
> pretty useless messages in my mind.

I am not sure.  It is nice to view which paths have
content-level merges as it is more significant than path-level
merges.

I think the output from merge-recursive can be categorized into
5 verbosity levels:

 1. "CONFLICT", "Rename", "Adding here instead due to D/F conflict" (outermost)

 2. "Auto-merged successfully" (outermost)

 3. The first "Merging X with Y".

 4. outermost "Merging:\ntitle1\ntitle2".

 5. outermost "found N common ancestors\nancestor1\nancestor2\n..."
    and anything from inner merge.

I would prefer the default verbosity level to be 2 (that is,
show both 1 and 2); your "quieter" option would show only level
1, and somebody who is debugging reursive would ask for all
levels.

-- >8 --
[PATCH] Make 'trivial merge' attempt less verbose.

This replaces die() calls in unpack-trees with simple and quiet
exit(1) when we are trying trivial merges only and die() is
about the case that cannot trivially be merged (i.e. not a
serious corruption error but expected).  Also it makes the
nontrivial merges exit early.

And then this updates git-merge so that we do not have to say
"trying..." followed by "wonderful" or "nope".

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 git-merge.sh   |   36 ++++++++++++++++++++++--------------
 unpack-trees.c |   29 ++++++++++++++++++-----------
 2 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/git-merge.sh b/git-merge.sh
index 3eef048..6240a73 100755
--- a/git-merge.sh
+++ b/git-merge.sh
@@ -302,21 +302,29 @@ f,*)
 	# one common.  See if it is really trivial.
 	git var GIT_COMMITTER_IDENT >/dev/null || exit
 
-	echo "Trying really trivial in-index merge..."
 	git-update-index --refresh 2>/dev/null
-	if git-read-tree --trivial -m -u -v $common $head "$1" &&
-	   result_tree=$(git-write-tree)
-	then
-	    echo "Wonderful."
-	    result_commit=$(
-	        echo "$merge_msg" |
-	        git-commit-tree $result_tree -p HEAD -p "$1"
-	    ) || exit
-	    finish "$result_commit" "In-index merge"
-	    dropsave
-	    exit 0
-	fi
-	echo "Nope."
+	git-read-tree --trivial -m -u -v $common $head "$1"
+	case "$?" in
+	1)	: expected failure from non-trivial merge
+		;;
+	0)
+		if result_tree=$(git-write-tree)
+		then
+			echo "Trivially merged in index."
+			result_commit=$(
+				echo "$merge_msg" |
+				git-commit-tree $result_tree -p HEAD -p "$1"
+			) || exit
+			finish "$result_commit" "In-index merge"
+			dropsave
+			exit 0
+		fi
+		;;
+	*)
+		: This could be serious failure.
+		echo >&2 "Tried trivial merge but did not work; don't worry..."
+		;;
+	esac
 	;;
 *)
 	# An octopus.  If we can reach all the remote we are up to date.
diff --git a/unpack-trees.c b/unpack-trees.c
index 2e2232c..0fca83b 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -409,17 +409,17 @@ int unpack_trees(struct object_list *trees, struct unpack_trees_options *o)
 			return -1;
 	}
 
-	if (o->trivial_merges_only && o->nontrivial_merge)
-		die("Merge requires file-level merging");
-
 	check_updates(active_cache, active_nr, o);
 	return 0;
 }
 
 /* Here come the merge functions */
 
-static void reject_merge(struct cache_entry *ce)
+static void reject_merge(struct cache_entry *ce,
+			 struct unpack_trees_options *o)
 {
+	if (o->trivial_merges_only)
+		exit(1);
 	die("Entry '%s' would be overwritten by merge. Cannot merge.",
 	    ce->name);
 }
@@ -459,6 +459,8 @@ static void verify_uptodate(struct cache_entry *ce,
 	}
 	if (errno == ENOENT)
 		return;
+	if (o->trivial_merges_only)
+		exit(1);
 	die("Entry '%s' not uptodate. Cannot merge.", ce->name);
 }
 
@@ -473,15 +475,18 @@ static void invalidate_ce_path(struct cache_entry *ce)
  * is not tracked, unless it is ignored.
  */
 static void verify_absent(const char *path, const char *action,
-		struct unpack_trees_options *o)
+			  struct unpack_trees_options *o)
 {
 	struct stat st;
 
 	if (o->index_only || o->reset || !o->update)
 		return;
-	if (!lstat(path, &st) && !(o->dir && excluded(o->dir, path)))
+	if (!lstat(path, &st) && !(o->dir && excluded(o->dir, path))) {
+		if (o->trivial_merges_only)
+			exit(1);
 		die("Untracked working tree file '%s' "
 		    "would be %s by merge.", path, action);
+	}
 }
 
 static int merged_entry(struct cache_entry *merge, struct cache_entry *old,
@@ -617,7 +622,7 @@ int threeway_merge(struct cache_entry **stages,
 	/* #14, #14ALT, #2ALT */
 	if (remote && !df_conflict_head && head_match && !remote_match) {
 		if (index && !same(index, remote) && !same(index, head))
-			reject_merge(index);
+			reject_merge(index, o);
 		return merged_entry(remote, index, o);
 	}
 	/*
@@ -625,7 +630,7 @@ int threeway_merge(struct cache_entry **stages,
 	 * make sure that it matches head.
 	 */
 	if (index && !same(index, head)) {
-		reject_merge(index);
+		reject_merge(index, o);
 	}
 
 	if (head) {
@@ -677,6 +682,8 @@ int threeway_merge(struct cache_entry **stages,
 	}
 
 	o->nontrivial_merge = 1;
+	if (o->trivial_merges_only)
+		exit(1);
 
 	/* #2, #3, #4, #6, #7, #9, #11. */
 	count = 0;
@@ -743,11 +750,11 @@ int twoway_merge(struct cache_entry **src,
 		else {
 			/* all other failures */
 			if (oldtree)
-				reject_merge(oldtree);
+				reject_merge(oldtree, o);
 			if (current)
-				reject_merge(current);
+				reject_merge(current, o);
 			if (newtree)
-				reject_merge(newtree);
+				reject_merge(newtree, o);
 			return -1;
 		}
 	}
-- 
1.5.0.rc1.g120b

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] merge-recursive: do not report the resulting tree object name
  2007-01-13  0:57                       ` Jakub Narebski
@ 2007-01-13 11:01                         ` Johannes Schindelin
  0 siblings, 0 replies; 40+ messages in thread
From: Johannes Schindelin @ 2007-01-13 11:01 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Hi,

On Sat, 13 Jan 2007, Jakub Narebski wrote:

> Junio C Hamano wrote:
> 
> > I do not think we need to show 1..15 at all, perhaps without
> > "export GIT_MERGE_BASE_DEBUG=YesPlease".
> 
> Or a -v/--verbose (or even -v -v -v) flag set.

... and you'd pass them from git-pull to git-merge to git-merge-recursive? 
Three different programs parse the same option? And worse, the other 
strategies ignore the setting? What about strategies people implemented 
on _their_ side, which do not know about the "-v" flag? I don't think so.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2007-01-13 11:01 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-04 10:47 [PATCH] Speedup recursive by flushing index only once for all entries Alex Riesen
2007-01-04 12:33 ` Johannes Schindelin
2007-01-04 12:47   ` Alex Riesen
2007-01-04 20:22     ` Junio C Hamano
2007-01-05 11:22       ` Alex Riesen
2007-01-07 16:31         ` Alex Riesen
2007-01-10 18:06           ` Junio C Hamano
2007-01-10 19:28           ` Junio C Hamano
2007-01-10 22:11             ` Junio C Hamano
2007-01-10 23:07             ` Alex Riesen
2007-01-10 23:23               ` Linus Torvalds
2007-01-11  8:14                 ` Johannes Schindelin
2007-01-11  9:03                   ` Alex Riesen
2007-01-11 12:11                     ` Alex Riesen
2007-01-11 20:37                       ` Junio C Hamano
2007-01-11  9:02                 ` Alex Riesen
2007-01-11 16:38                   ` Linus Torvalds
2007-01-11 17:43                     ` Alex Riesen
2007-01-11 18:02                       ` Linus Torvalds
2007-01-11 21:48                         ` Alex Riesen
2007-01-11 20:23                     ` Junio C Hamano
2007-01-11 22:10                       ` Alex Riesen
2007-01-11 22:28                         ` Linus Torvalds
2007-01-11 23:53                           ` Junio C Hamano
2007-01-12  0:18                           ` Alex Riesen
2007-01-11  0:34               ` Junio C Hamano
2007-01-11  8:15             ` Johannes Schindelin
2007-01-12 15:48             ` Sergey Vlasov
2007-01-12 17:38               ` Alex Riesen
2007-01-12 20:37                 ` Sergey Vlasov
2007-01-12 18:23               ` Junio C Hamano
2007-01-12 20:09                 ` [PATCH] merge-recursive: do not report the resulting tree object name Junio C Hamano
2007-01-12 23:36                   ` Johannes Schindelin
2007-01-13  0:32                     ` Junio C Hamano
2007-01-13  0:57                       ` Jakub Narebski
2007-01-13 11:01                         ` Johannes Schindelin
2007-01-13  5:14                       ` Shawn O. Pearce
2007-01-13  7:03                         ` Junio C Hamano
2007-01-12 20:30                 ` [PATCH] Speedup recursive by flushing index only once for all entries Alex Riesen
2007-01-12 21:07                 ` Sergey Vlasov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.