All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] merge-recursive: replace string_list with hashmap
@ 2017-08-28 20:28 Kevin Willford
  2017-08-28 20:28 ` [PATCH 1/3] merge-recursive: fix memory leak Kevin Willford
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Kevin Willford @ 2017-08-28 20:28 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Kevin Willford

Code was using two string_lists, one for the directories and
one for the files.  The code never checks the lists independently
so we should be able to only use one list.  The string_list also
is a O(log n) for lookup and insertion.  Switching this to use a
hashmap will give O(1) which will save some time when there are
millions of paths that will be checked.

Also cleaned up a memory leak and method where the return was not
being used.

Kevin Willford (3):
  merge-recursive: fix memory leak
  merge-recursive: remove return value from get_files_dirs
  merge-recursive: change current file dir string_lists to hashmap

 merge-recursive.c | 68 +++++++++++++++++++++++++++++++++++++++----------------
 merge-recursive.h |  3 +--
 2 files changed, 49 insertions(+), 22 deletions(-)

-- 
2.14.1.329.g6edf0add19


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] merge-recursive: fix memory leak
  2017-08-28 20:28 [PATCH 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
@ 2017-08-28 20:28 ` Kevin Willford
  2017-08-28 22:42   ` Ben Peart
  2017-08-29  8:12   ` Jeff King
  2017-08-28 20:28 ` [PATCH 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
  2017-08-28 20:28 ` [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
  2 siblings, 2 replies; 15+ messages in thread
From: Kevin Willford @ 2017-08-28 20:28 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Kevin Willford

In merge_trees if process_renames or process_entry returns less
than zero, the method will just return and not free re_merge,
re_head, or entries.

This change cleans up the allocated variables before returning
to the caller.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
 merge-recursive.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1494ffdb82..033d7cd406 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1956,7 +1956,7 @@ int merge_trees(struct merge_options *o,
 		re_merge = get_renames(o, merge, common, head, merge, entries);
 		clean = process_renames(o, re_head, re_merge);
 		if (clean < 0)
-			return clean;
+			goto cleanup;
 		for (i = entries->nr-1; 0 <= i; i--) {
 			const char *path = entries->items[i].string;
 			struct stage_data *e = entries->items[i].util;
@@ -1964,8 +1964,10 @@ int merge_trees(struct merge_options *o,
 				int ret = process_entry(o, path, e);
 				if (!ret)
 					clean = 0;
-				else if (ret < 0)
-					return ret;
+				else if (ret < 0) {
+					clean = ret;
+					goto cleanup;
+				}
 			}
 		}
 		for (i = 0; i < entries->nr; i++) {
@@ -1975,6 +1977,7 @@ int merge_trees(struct merge_options *o,
 				    entries->items[i].string);
 		}
 
+cleanup:
 		string_list_clear(re_merge, 0);
 		string_list_clear(re_head, 0);
 		string_list_clear(entries, 1);
@@ -1982,6 +1985,9 @@ int merge_trees(struct merge_options *o,
 		free(re_merge);
 		free(re_head);
 		free(entries);
+
+		if (clean < 0)
+			return clean;
 	}
 	else
 		clean = 1;
-- 
2.14.1.329.g6edf0add19


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
  2017-08-28 20:28 [PATCH 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
  2017-08-28 20:28 ` [PATCH 1/3] merge-recursive: fix memory leak Kevin Willford
@ 2017-08-28 20:28 ` Kevin Willford
  2017-08-28 22:45   ` Ben Peart
  2017-08-29  8:17   ` Jeff King
  2017-08-28 20:28 ` [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
  2 siblings, 2 replies; 15+ messages in thread
From: Kevin Willford @ 2017-08-28 20:28 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Kevin Willford

The return value of the get_files_dirs call is never being used.
Looking at the history of the file and it was originally only
being used for debug output statements.  Also when
read_tree_recursive return value is non zero it is changed to
zero.  This leads me to believe that it doesn't matter if
read_tree_recursive gets an error.

Since the debug output has been removed and the caller isn't
checking the return value there is no reason to keep calulating
and returning a value.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
 merge-recursive.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 033d7cd406..d47f40ea81 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -328,15 +328,11 @@ static int save_files_dirs(const unsigned char *sha1,
 	return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
 }
 
-static int get_files_dirs(struct merge_options *o, struct tree *tree)
+static void get_files_dirs(struct merge_options *o, struct tree *tree)
 {
-	int n;
 	struct pathspec match_all;
 	memset(&match_all, 0, sizeof(match_all));
-	if (read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o))
-		return 0;
-	n = o->current_file_set.nr + o->current_directory_set.nr;
-	return n;
+	read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o);
 }
 
 /*
-- 
2.14.1.329.g6edf0add19


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap
  2017-08-28 20:28 [PATCH 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
  2017-08-28 20:28 ` [PATCH 1/3] merge-recursive: fix memory leak Kevin Willford
  2017-08-28 20:28 ` [PATCH 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
@ 2017-08-28 20:28 ` Kevin Willford
  2017-08-28 23:06   ` Ben Peart
                     ` (2 more replies)
  2 siblings, 3 replies; 15+ messages in thread
From: Kevin Willford @ 2017-08-28 20:28 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Kevin Willford

The code was using two string_lists, one for the directories and
one for the files.  The code never checks the lists independently
so we should be able to only use one list.  The string_list also
is a O(log n) for lookup and insertion.  Switching this to use a
hashmap will give O(1) which will save some time when there are
millions of paths that will be checked.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
 merge-recursive.c | 48 +++++++++++++++++++++++++++++++++++++-----------
 merge-recursive.h |  3 +--
 2 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index d47f40ea81..ef4fe5f09f 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -24,6 +24,26 @@
 #include "dir.h"
 #include "submodule.h"
 
+struct path_hashmap_entry {
+	struct hashmap_entry;
+	char path[FLEX_ARRAY];
+};
+
+static unsigned int (*pathhash)(const char *path);
+static int (*pathcmp)(const char *a, const char *b);
+
+static int path_hashmap_cmp(const void *cmp_data,
+			    const void *entry,
+			    const void *entry_or_key,
+			    const void *keydata)
+{
+	const struct path_hashmap_entry *a = entry;
+	const struct path_hashmap_entry *b = entry_or_key;
+	const char *key = keydata;
+
+	return pathcmp(a->path, key ? key : b->path);
+}
+
 static void flush_output(struct merge_options *o)
 {
 	if (o->buffer_output < 2 && o->obuf.len) {
@@ -314,15 +334,15 @@ static int save_files_dirs(const unsigned char *sha1,
 		struct strbuf *base, const char *path,
 		unsigned int mode, int stage, void *context)
 {
+	struct path_hashmap_entry *entry;
 	int baselen = base->len;
 	struct merge_options *o = context;
 
 	strbuf_addstr(base, path);
 
-	if (S_ISDIR(mode))
-		string_list_insert(&o->current_directory_set, base->buf);
-	else
-		string_list_insert(&o->current_file_set, base->buf);
+	FLEX_ALLOC_MEM(entry, path, base->buf, base->len);
+	hashmap_entry_init(entry, pathhash(entry->path));
+	hashmap_add(&o->current_file_dir_set, entry);
 
 	strbuf_setlen(base, baselen);
 	return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
@@ -642,6 +662,8 @@ static void add_flattened_path(struct strbuf *out, const char *s)
 
 static char *unique_path(struct merge_options *o, const char *path, const char *branch)
 {
+	struct path_hashmap_entry dummy;
+	struct path_hashmap_entry *entry;
 	struct strbuf newpath = STRBUF_INIT;
 	int suffix = 0;
 	size_t base_len;
@@ -650,14 +672,17 @@ static char *unique_path(struct merge_options *o, const char *path, const char *
 	add_flattened_path(&newpath, branch);
 
 	base_len = newpath.len;
-	while (string_list_has_string(&o->current_file_set, newpath.buf) ||
-	       string_list_has_string(&o->current_directory_set, newpath.buf) ||
+	hashmap_entry_init(&dummy, pathhash(newpath.buf));
+	while (hashmap_get(&o->current_file_dir_set, &dummy, newpath.buf) ||
 	       (!o->call_depth && file_exists(newpath.buf))) {
 		strbuf_setlen(&newpath, base_len);
 		strbuf_addf(&newpath, "_%d", suffix++);
+		hashmap_entry_init(&dummy, pathhash(newpath.buf));
 	}
 
-	string_list_insert(&o->current_file_set, newpath.buf);
+	FLEX_ALLOC_MEM(entry, path, newpath.buf, newpath.len);
+	hashmap_entry_init(entry, pathhash(entry->path));
+	hashmap_add(&o->current_file_dir_set, entry);
 	return strbuf_detach(&newpath, NULL);
 }
 
@@ -1941,8 +1966,7 @@ int merge_trees(struct merge_options *o,
 	if (unmerged_cache()) {
 		struct string_list *entries, *re_head, *re_merge;
 		int i;
-		string_list_clear(&o->current_file_set, 1);
-		string_list_clear(&o->current_directory_set, 1);
+		hashmap_init(&o->current_file_dir_set, path_hashmap_cmp, NULL, 512);
 		get_files_dirs(o, head);
 		get_files_dirs(o, merge);
 
@@ -1978,6 +2002,8 @@ int merge_trees(struct merge_options *o,
 		string_list_clear(re_head, 0);
 		string_list_clear(entries, 1);
 
+		hashmap_free(&o->current_file_dir_set, 1);
+
 		free(re_merge);
 		free(re_head);
 		free(entries);
@@ -2179,8 +2205,8 @@ void init_merge_options(struct merge_options *o)
 	if (o->verbosity >= 5)
 		o->buffer_output = 0;
 	strbuf_init(&o->obuf, 0);
-	string_list_init(&o->current_file_set, 1);
-	string_list_init(&o->current_directory_set, 1);
+	pathhash = ignore_case ? strihash : strhash;
+	pathcmp = ignore_case ? strcasecmp : strcmp;
 	string_list_init(&o->df_conflict_file_set, 1);
 }
 
diff --git a/merge-recursive.h b/merge-recursive.h
index 735343b413..80d69d1401 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -25,8 +25,7 @@ struct merge_options {
 	int show_rename_progress;
 	int call_depth;
 	struct strbuf obuf;
-	struct string_list current_file_set;
-	struct string_list current_directory_set;
+	struct hashmap current_file_dir_set;
 	struct string_list df_conflict_file_set;
 };
 
-- 
2.14.1.329.g6edf0add19


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] merge-recursive: fix memory leak
  2017-08-28 20:28 ` [PATCH 1/3] merge-recursive: fix memory leak Kevin Willford
@ 2017-08-28 22:42   ` Ben Peart
  2017-08-29  8:12   ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Ben Peart @ 2017-08-28 22:42 UTC (permalink / raw)
  To: Kevin Willford, git; +Cc: gitster, peff



On 8/28/2017 4:28 PM, Kevin Willford wrote:
> In merge_trees if process_renames or process_entry returns less
> than zero, the method will just return and not free re_merge,
> re_head, or entries.
> 
> This change cleans up the allocated variables before returning
> to the caller.
> 
> Signed-off-by: Kevin Willford <kewillf@microsoft.com>

Nice catch on the leaks.  Looks good to me.

> ---
>   merge-recursive.c | 12 +++++++++---
>   1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 1494ffdb82..033d7cd406 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -1956,7 +1956,7 @@ int merge_trees(struct merge_options *o,
>   		re_merge = get_renames(o, merge, common, head, merge, entries);
>   		clean = process_renames(o, re_head, re_merge);
>   		if (clean < 0)
> -			return clean;
> +			goto cleanup;
>   		for (i = entries->nr-1; 0 <= i; i--) {
>   			const char *path = entries->items[i].string;
>   			struct stage_data *e = entries->items[i].util;
> @@ -1964,8 +1964,10 @@ int merge_trees(struct merge_options *o,
>   				int ret = process_entry(o, path, e);
>   				if (!ret)
>   					clean = 0;
> -				else if (ret < 0)
> -					return ret;
> +				else if (ret < 0) {
> +					clean = ret;
> +					goto cleanup;
> +				}
>   			}
>   		}
>   		for (i = 0; i < entries->nr; i++) {
> @@ -1975,6 +1977,7 @@ int merge_trees(struct merge_options *o,
>   				    entries->items[i].string);
>   		}
>   
> +cleanup:
>   		string_list_clear(re_merge, 0);
>   		string_list_clear(re_head, 0);
>   		string_list_clear(entries, 1);
> @@ -1982,6 +1985,9 @@ int merge_trees(struct merge_options *o,
>   		free(re_merge);
>   		free(re_head);
>   		free(entries);
> +
> +		if (clean < 0)
> +			return clean;
>   	}
>   	else
>   		clean = 1;
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
  2017-08-28 20:28 ` [PATCH 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
@ 2017-08-28 22:45   ` Ben Peart
  2017-08-29  8:19     ` Jeff King
  2017-08-29  8:17   ` Jeff King
  1 sibling, 1 reply; 15+ messages in thread
From: Ben Peart @ 2017-08-28 22:45 UTC (permalink / raw)
  To: Kevin Willford, git; +Cc: gitster, peff



On 8/28/2017 4:28 PM, Kevin Willford wrote:
> The return value of the get_files_dirs call is never being used.
> Looking at the history of the file and it was originally only
> being used for debug output statements.  Also when
> read_tree_recursive return value is non zero it is changed to
> zero.  This leads me to believe that it doesn't matter if
> read_tree_recursive gets an error.
> 
> Since the debug output has been removed and the caller isn't
> checking the return value there is no reason to keep calulating
> and returning a value.
> 

Did a quick search and you're right, nothing is using the return value. 
No reason to spend time calculating it.  Looks good.

> Signed-off-by: Kevin Willford <kewillf@microsoft.com>
> ---
>   merge-recursive.c | 8 ++------
>   1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 033d7cd406..d47f40ea81 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -328,15 +328,11 @@ static int save_files_dirs(const unsigned char *sha1,
>   	return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
>   }
>   
> -static int get_files_dirs(struct merge_options *o, struct tree *tree)
> +static void get_files_dirs(struct merge_options *o, struct tree *tree)
>   {
> -	int n;
>   	struct pathspec match_all;
>   	memset(&match_all, 0, sizeof(match_all));
> -	if (read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o))
> -		return 0;
> -	n = o->current_file_set.nr + o->current_directory_set.nr;
> -	return n;
> +	read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o);
>   }
>   
>   /*
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap
  2017-08-28 20:28 ` [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
@ 2017-08-28 23:06   ` Ben Peart
  2017-08-29  8:41   ` Jeff King
  2017-09-06  3:35   ` Junio C Hamano
  2 siblings, 0 replies; 15+ messages in thread
From: Ben Peart @ 2017-08-28 23:06 UTC (permalink / raw)
  To: Kevin Willford, git; +Cc: gitster, peff



On 8/28/2017 4:28 PM, Kevin Willford wrote:
> The code was using two string_lists, one for the directories and
> one for the files.  The code never checks the lists independently
> so we should be able to only use one list.  The string_list also
> is a O(log n) for lookup and insertion.  Switching this to use a
> hashmap will give O(1) which will save some time when there are
> millions of paths that will be checked.
> 

Good observation that the two string lists were never used 
independently.  Perhaps the intent was to help the O(log n) issue by 
splitting them up?  A single hashmap is much faster.  Looks good.


> Signed-off-by: Kevin Willford <kewillf@microsoft.com>
> ---
>   merge-recursive.c | 48 +++++++++++++++++++++++++++++++++++++-----------
>   merge-recursive.h |  3 +--
>   2 files changed, 38 insertions(+), 13 deletions(-)
> 
> diff --git a/merge-recursive.c b/merge-recursive.c
> index d47f40ea81..ef4fe5f09f 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -24,6 +24,26 @@
>   #include "dir.h"
>   #include "submodule.h"
>   
> +struct path_hashmap_entry {
> +	struct hashmap_entry;
> +	char path[FLEX_ARRAY];
> +};
> +
> +static unsigned int (*pathhash)(const char *path);
> +static int (*pathcmp)(const char *a, const char *b);
> +
> +static int path_hashmap_cmp(const void *cmp_data,
> +			    const void *entry,
> +			    const void *entry_or_key,
> +			    const void *keydata)
> +{
> +	const struct path_hashmap_entry *a = entry;
> +	const struct path_hashmap_entry *b = entry_or_key;
> +	const char *key = keydata;
> +
> +	return pathcmp(a->path, key ? key : b->path);
> +}
> +
>   static void flush_output(struct merge_options *o)
>   {
>   	if (o->buffer_output < 2 && o->obuf.len) {
> @@ -314,15 +334,15 @@ static int save_files_dirs(const unsigned char *sha1,
>   		struct strbuf *base, const char *path,
>   		unsigned int mode, int stage, void *context)
>   {
> +	struct path_hashmap_entry *entry;
>   	int baselen = base->len;
>   	struct merge_options *o = context;
>   
>   	strbuf_addstr(base, path);
>   
> -	if (S_ISDIR(mode))
> -		string_list_insert(&o->current_directory_set, base->buf);
> -	else
> -		string_list_insert(&o->current_file_set, base->buf);
> +	FLEX_ALLOC_MEM(entry, path, base->buf, base->len);
> +	hashmap_entry_init(entry, pathhash(entry->path));
> +	hashmap_add(&o->current_file_dir_set, entry);
>   
>   	strbuf_setlen(base, baselen);
>   	return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
> @@ -642,6 +662,8 @@ static void add_flattened_path(struct strbuf *out, const char *s)
>   
>   static char *unique_path(struct merge_options *o, const char *path, const char *branch)
>   {
> +	struct path_hashmap_entry dummy;
> +	struct path_hashmap_entry *entry;
>   	struct strbuf newpath = STRBUF_INIT;
>   	int suffix = 0;
>   	size_t base_len;
> @@ -650,14 +672,17 @@ static char *unique_path(struct merge_options *o, const char *path, const char *
>   	add_flattened_path(&newpath, branch);
>   
>   	base_len = newpath.len;
> -	while (string_list_has_string(&o->current_file_set, newpath.buf) ||
> -	       string_list_has_string(&o->current_directory_set, newpath.buf) ||
> +	hashmap_entry_init(&dummy, pathhash(newpath.buf));
> +	while (hashmap_get(&o->current_file_dir_set, &dummy, newpath.buf) ||
>   	       (!o->call_depth && file_exists(newpath.buf))) {
>   		strbuf_setlen(&newpath, base_len);
>   		strbuf_addf(&newpath, "_%d", suffix++);
> +		hashmap_entry_init(&dummy, pathhash(newpath.buf));
>   	}
>   
> -	string_list_insert(&o->current_file_set, newpath.buf);
> +	FLEX_ALLOC_MEM(entry, path, newpath.buf, newpath.len);
> +	hashmap_entry_init(entry, pathhash(entry->path));
> +	hashmap_add(&o->current_file_dir_set, entry);
>   	return strbuf_detach(&newpath, NULL);
>   }
>   
> @@ -1941,8 +1966,7 @@ int merge_trees(struct merge_options *o,
>   	if (unmerged_cache()) {
>   		struct string_list *entries, *re_head, *re_merge;
>   		int i;
> -		string_list_clear(&o->current_file_set, 1);
> -		string_list_clear(&o->current_directory_set, 1);
> +		hashmap_init(&o->current_file_dir_set, path_hashmap_cmp, NULL, 512);
>   		get_files_dirs(o, head);
>   		get_files_dirs(o, merge);
>   
> @@ -1978,6 +2002,8 @@ int merge_trees(struct merge_options *o,
>   		string_list_clear(re_head, 0);
>   		string_list_clear(entries, 1);
>   
> +		hashmap_free(&o->current_file_dir_set, 1);
> +
>   		free(re_merge);
>   		free(re_head);
>   		free(entries);
> @@ -2179,8 +2205,8 @@ void init_merge_options(struct merge_options *o)
>   	if (o->verbosity >= 5)
>   		o->buffer_output = 0;
>   	strbuf_init(&o->obuf, 0);
> -	string_list_init(&o->current_file_set, 1);
> -	string_list_init(&o->current_directory_set, 1);
> +	pathhash = ignore_case ? strihash : strhash;
> +	pathcmp = ignore_case ? strcasecmp : strcmp;
>   	string_list_init(&o->df_conflict_file_set, 1);
>   }
>   
> diff --git a/merge-recursive.h b/merge-recursive.h
> index 735343b413..80d69d1401 100644
> --- a/merge-recursive.h
> +++ b/merge-recursive.h
> @@ -25,8 +25,7 @@ struct merge_options {
>   	int show_rename_progress;
>   	int call_depth;
>   	struct strbuf obuf;
> -	struct string_list current_file_set;
> -	struct string_list current_directory_set;
> +	struct hashmap current_file_dir_set;
>   	struct string_list df_conflict_file_set;
>   };
>   
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] merge-recursive: fix memory leak
  2017-08-28 20:28 ` [PATCH 1/3] merge-recursive: fix memory leak Kevin Willford
  2017-08-28 22:42   ` Ben Peart
@ 2017-08-29  8:12   ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-08-29  8:12 UTC (permalink / raw)
  To: Kevin Willford; +Cc: git, gitster

On Mon, Aug 28, 2017 at 02:28:27PM -0600, Kevin Willford wrote:

> In merge_trees if process_renames or process_entry returns less
> than zero, the method will just return and not free re_merge,
> re_head, or entries.
> 
> This change cleans up the allocated variables before returning
> to the caller.

Good catch. I suspect this function could stand to be refactored a bit.
For instance, pulling those inner bits of the conditional into a helper
would let us do:

  re_merge = get_renames(...);
  ... other setup ...

  clean = our_new_helper(re_merge, ...);

  string_clear(re_merge);
  ... other cleanup ...

  if (clean < 0)
	return clean;

without having to resort to a goto. But certainly I don't mind this much
more minimal change, which fixes the actual functionality problem.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
  2017-08-28 20:28 ` [PATCH 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
  2017-08-28 22:45   ` Ben Peart
@ 2017-08-29  8:17   ` Jeff King
  2017-08-29 15:58     ` Kevin Willford
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-08-29  8:17 UTC (permalink / raw)
  To: Kevin Willford; +Cc: git, gitster

On Mon, Aug 28, 2017 at 02:28:28PM -0600, Kevin Willford wrote:

> The return value of the get_files_dirs call is never being used.
> Looking at the history of the file and it was originally only
> being used for debug output statements.  Also when
> read_tree_recursive return value is non zero it is changed to
> zero.  This leads me to believe that it doesn't matter if
> read_tree_recursive gets an error.

Or that the function is buggy. :)

I'm tempted to say that we should probably die() when
read_tree_recursive fails. This should only happen if we fail to parse
the tree, or if our callback (save_files_dirs here) returns failure, and
the latter looks like it never happens.

> Since the debug output has been removed and the caller isn't
> checking the return value there is no reason to keep calulating
> and returning a value.

Agreed, and I'm happy to see dead code go.

Minor nit: s/calulating/calculating/ in your commit message.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
  2017-08-28 22:45   ` Ben Peart
@ 2017-08-29  8:19     ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-08-29  8:19 UTC (permalink / raw)
  To: Ben Peart; +Cc: Kevin Willford, git, gitster

On Mon, Aug 28, 2017 at 06:45:29PM -0400, Ben Peart wrote:

> > Since the debug output has been removed and the caller isn't
> > checking the return value there is no reason to keep calulating
> > and returning a value.
> > 
> 
> Did a quick search and you're right, nothing is using the return value. No
> reason to spend time calculating it.  Looks good.

This is actually one we can reasonably trust the compiler to check for
us. Though of course it doesn't hurt to look at the callers and make
sure that it's sane that they do not care about this return value (I
don't know merge-recursive.c very well, but the current return value
does seem kind of useless).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap
  2017-08-28 20:28 ` [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
  2017-08-28 23:06   ` Ben Peart
@ 2017-08-29  8:41   ` Jeff King
  2017-09-06  3:35   ` Junio C Hamano
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-08-29  8:41 UTC (permalink / raw)
  To: Kevin Willford; +Cc: git, gitster

On Mon, Aug 28, 2017 at 02:28:29PM -0600, Kevin Willford wrote:

> The code was using two string_lists, one for the directories and
> one for the files.  The code never checks the lists independently
> so we should be able to only use one list.  The string_list also
> is a O(log n) for lookup and insertion.  Switching this to use a
> hashmap will give O(1) which will save some time when there are
> millions of paths that will be checked.

Makes sense. I think when this code was written we didn't have the
generalized hashmap.c, which is why it relies on string lists.

> diff --git a/merge-recursive.c b/merge-recursive.c
> index d47f40ea81..ef4fe5f09f 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -24,6 +24,26 @@
>  #include "dir.h"
>  #include "submodule.h"
>  
> +struct path_hashmap_entry {
> +	struct hashmap_entry;
> +	char path[FLEX_ARRAY];
> +};

This seems like a good use of FLEX_ARRAY.

> +static unsigned int (*pathhash)(const char *path);
> +static int (*pathcmp)(const char *a, const char *b);

These global function pointers are sufficiently out of the ordinary that
it might be worth adding a comment about how they're intended to be
set and used.

I also suspect they could be stuffed into the merge_options struct along
with the hashmap itself, and then passed via the cmp_data pointer (which
is relatively new-ish; I won't be surprised if you wrote this patch
before that existed and forward-ported it).

> +static int path_hashmap_cmp(const void *cmp_data,
> +			    const void *entry,
> +			    const void *entry_or_key,
> +			    const void *keydata)
> +{
> +	const struct path_hashmap_entry *a = entry;
> +	const struct path_hashmap_entry *b = entry_or_key;
> +	const char *key = keydata;
> +
> +	return pathcmp(a->path, key ? key : b->path);
> +}

The hashmap interface makes its cmp functions a bit tricky to write, but
this all looks correct to me.

> @@ -642,6 +662,8 @@ static void add_flattened_path(struct strbuf *out, const char *s)
>  
>  static char *unique_path(struct merge_options *o, const char *path, const char *branch)
>  {
> +	struct path_hashmap_entry dummy;
> +	struct path_hashmap_entry *entry;
>  	struct strbuf newpath = STRBUF_INIT;
>  	int suffix = 0;
>  	size_t base_len;
> @@ -650,14 +672,17 @@ static char *unique_path(struct merge_options *o, const char *path, const char *
>  	add_flattened_path(&newpath, branch);
>  
>  	base_len = newpath.len;
> -	while (string_list_has_string(&o->current_file_set, newpath.buf) ||
> -	       string_list_has_string(&o->current_directory_set, newpath.buf) ||
> +	hashmap_entry_init(&dummy, pathhash(newpath.buf));
> +	while (hashmap_get(&o->current_file_dir_set, &dummy, newpath.buf) ||
>  	       (!o->call_depth && file_exists(newpath.buf))) {
>  		strbuf_setlen(&newpath, base_len);
>  		strbuf_addf(&newpath, "_%d", suffix++);
> +		hashmap_entry_init(&dummy, pathhash(newpath.buf));
>  	}

I think you can use hashmap_get_from_hash() here to avoid dealing with
"dummy" yourself.

> @@ -1941,8 +1966,7 @@ int merge_trees(struct merge_options *o,
>  	if (unmerged_cache()) {
>  		struct string_list *entries, *re_head, *re_merge;
>  		int i;
> -		string_list_clear(&o->current_file_set, 1);
> -		string_list_clear(&o->current_directory_set, 1);
> +		hashmap_init(&o->current_file_dir_set, path_hashmap_cmp, NULL, 512);

This hunk seems funny. string_list_clear() assumes the input is an
already-initialized string list, and we drop all of its entries. But
hashmap_init assumes we have an _uninitialized_ chunk of memory, and
throws away whatever was there.

Do we ever hit this code path when there's actually something in the
hashmap?

I'd think we would want to hashmap_init() at the same place we used to
string_list_init(). I.e., in init_merge_options.

> @@ -1978,6 +2002,8 @@ int merge_trees(struct merge_options *o,
>  		string_list_clear(re_head, 0);
>  		string_list_clear(entries, 1);
>  
> +		hashmap_free(&o->current_file_dir_set, 1);

Ah, I think this answers my question above. We only care about the
contents during this one bit of code, so we do an init/free pair. And
since there's no "clear" function for a hashmap that drops the entries
without resetting things like the comparison function, we _have_ to
call hashmap_init in the earlier hunk.

That does make me wonder if it would be sane to just declare the hashmap
local to this block, which would make its lifetime much more clear. I
guess having it in merge_options is convenient for passing it around,
though.

> @@ -2179,8 +2205,8 @@ void init_merge_options(struct merge_options *o)
> [...]

The rest of the patch looks like a straight-forward conversion. With the
understanding I have above, I think everything here is correct, but
there are a few minor bits it might be worth addressing to make the code
more clear.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
  2017-08-29  8:17   ` Jeff King
@ 2017-08-29 15:58     ` Kevin Willford
  2017-08-29 16:50       ` Jeff King
  2017-08-31 18:12       ` Stefan Beller
  0 siblings, 2 replies; 15+ messages in thread
From: Kevin Willford @ 2017-08-29 15:58 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster

> 
> On Mon, Aug 28, 2017 at 02:28:28PM -0600, Kevin Willford wrote:
> 
> > The return value of the get_files_dirs call is never being used.
> > Looking at the history of the file and it was originally only
> > being used for debug output statements.  Also when
> > read_tree_recursive return value is non zero it is changed to
> > zero.  This leads me to believe that it doesn't matter if
> > read_tree_recursive gets an error.
> 
> Or that the function is buggy. :)

That was one of my questions as well.  Should read_tree_recursive
be propagating a -1 and merge_trees be checking for that and bail
when the call to get_files_dirs return is < 0?  I made a commit with
this change and ran the tests and they all still passed so either this
return really doesn't matter or there are not sufficient tests covering
it.

I went with this change because it was not changing any of the
current functionality and if we find a case where it matters that
read_tree_recursive fails due to bad tree or something else we
can address it then.

> 
> I'm tempted to say that we should probably die() when
> read_tree_recursive fails. This should only happen if we fail to parse
> the tree, or if our callback (save_files_dirs here) returns failure, and
> the latter looks like it never happens.
> 
> > Since the debug output has been removed and the caller isn't
> > checking the return value there is no reason to keep calulating
> > and returning a value.
> 
> Agreed, and I'm happy to see dead code go.
> 
> Minor nit: s/calulating/calculating/ in your commit message.

When will that spell checker for git messages be ready? ;)

> 
> -Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
  2017-08-29 15:58     ` Kevin Willford
@ 2017-08-29 16:50       ` Jeff King
  2017-08-31 18:12       ` Stefan Beller
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-08-29 16:50 UTC (permalink / raw)
  To: Kevin Willford; +Cc: git, gitster

On Tue, Aug 29, 2017 at 03:58:00PM +0000, Kevin Willford wrote:

> > > The return value of the get_files_dirs call is never being used.
> > > Looking at the history of the file and it was originally only
> > > being used for debug output statements.  Also when
> > > read_tree_recursive return value is non zero it is changed to
> > > zero.  This leads me to believe that it doesn't matter if
> > > read_tree_recursive gets an error.
> > 
> > Or that the function is buggy. :)
> 
> That was one of my questions as well.  Should read_tree_recursive
> be propagating a -1 and merge_trees be checking for that and bail
> when the call to get_files_dirs return is < 0?  I made a commit with
> this change and ran the tests and they all still passed so either this
> return really doesn't matter or there are not sufficient tests covering
> it.

Right, in a normal flow I'd say that it should propagate the -1, etc.
But since the only possible error is parse_tree() failing, which happens
in a corrupt repository, I think it would be OK to just die(). That
saves having to deal with error propagation. It's not very graceful,
perhaps, but the important thing is that we don't quietly ignore the
error.

> I went with this change because it was not changing any of the
> current functionality and if we find a case where it matters that
> read_tree_recursive fails due to bad tree or something else we
> can address it then.

I think it's worth doing while we're thinking about it now, but it
certainly can be a separate patch.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
  2017-08-29 15:58     ` Kevin Willford
  2017-08-29 16:50       ` Jeff King
@ 2017-08-31 18:12       ` Stefan Beller
  1 sibling, 0 replies; 15+ messages in thread
From: Stefan Beller @ 2017-08-31 18:12 UTC (permalink / raw)
  To: Kevin Willford; +Cc: Jeff King, git, gitster

On Tue, Aug 29, 2017 at 8:58 AM, Kevin Willford <kewillf@microsoft.com> wrote:
>>
>> On Mon, Aug 28, 2017 at 02:28:28PM -0600, Kevin Willford wrote:
>>
>> > The return value of the get_files_dirs call is never being used.
>> > Looking at the history of the file and it was originally only
>> > being used for debug output statements.  Also when
>> > read_tree_recursive return value is non zero it is changed to
>> > zero.  This leads me to believe that it doesn't matter if
>> > read_tree_recursive gets an error.
>>
>> Or that the function is buggy. :)
>
> That was one of my questions as well.  Should read_tree_recursive
> be propagating a -1 and merge_trees be checking for that and bail
> when the call to get_files_dirs return is < 0?  I made a commit with
> this change and ran the tests and they all still passed so either this
> return really doesn't matter or there are not sufficient tests covering
> it.
>
> I went with this change because it was not changing any of the
> current functionality and if we find a case where it matters that
> read_tree_recursive fails due to bad tree or something else we
> can address it then.
>
>>
>> I'm tempted to say that we should probably die() when
>> read_tree_recursive fails. This should only happen if we fail to parse
>> the tree, or if our callback (save_files_dirs here) returns failure, and
>> the latter looks like it never happens.
>>
>> > Since the debug output has been removed and the caller isn't
>> > checking the return value there is no reason to keep calulating
>> > and returning a value.
>>
>> Agreed, and I'm happy to see dead code go.
>>
>> Minor nit: s/calulating/calculating/ in your commit message.
>
> When will that spell checker for git messages be ready? ;)

Different workflows need different setups apparently.
(me, as a heavy user of git-gui, likes the builtin spell checker,
though I need to find a way to train it to accept git lingo, such
as "submodule", or "gitlink")

Maybe:
https://tarasalenin.wordpress.com/2010/09/11/configure-git-gui-spell-checker-on-windows/

If you do not use git-gui... you are at the merci of your $EDITOR
($GIT_EDITOR, or core.editor) to have spell checking.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap
  2017-08-28 20:28 ` [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
  2017-08-28 23:06   ` Ben Peart
  2017-08-29  8:41   ` Jeff King
@ 2017-09-06  3:35   ` Junio C Hamano
  2 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-09-06  3:35 UTC (permalink / raw)
  To: Kevin Willford; +Cc: git, peff

I needed this squashed into before I could even compile the
resulting code.  Perhaps my compiler is stale?



merge-recursive.c:28:22: error: declaration does not declare anything [-Werror]
  struct hashmap_entry;
                      ^
merge-recursive.c:29:7: error: flexible array member in otherwise empty struct
  char path[FLEX_ARRAY];
       ^
---
 merge-recursive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index ef4fe5f09f..1cd35db3f0 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -25,7 +25,7 @@
 #include "submodule.h"
 
 struct path_hashmap_entry {
-	struct hashmap_entry;
+	struct hashmap_entry e;
 	char path[FLEX_ARRAY];
 };
 
-- 
2.14.1-546-g97ae4c876d


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-09-06  3:35 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-28 20:28 [PATCH 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
2017-08-28 20:28 ` [PATCH 1/3] merge-recursive: fix memory leak Kevin Willford
2017-08-28 22:42   ` Ben Peart
2017-08-29  8:12   ` Jeff King
2017-08-28 20:28 ` [PATCH 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
2017-08-28 22:45   ` Ben Peart
2017-08-29  8:19     ` Jeff King
2017-08-29  8:17   ` Jeff King
2017-08-29 15:58     ` Kevin Willford
2017-08-29 16:50       ` Jeff King
2017-08-31 18:12       ` Stefan Beller
2017-08-28 20:28 ` [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
2017-08-28 23:06   ` Ben Peart
2017-08-29  8:41   ` Jeff King
2017-09-06  3:35   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.