* [PATCH v3 0/3] merge-recursive: replace string_list with hashmap
@ 2017-09-07 16:25 Kevin Willford
2017-09-07 16:25 ` [PATCH v3 1/3] merge-recursive: fix memory leak Kevin Willford
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Kevin Willford @ 2017-09-07 16:25 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Kevin Willford
Code was using two string_lists, one for the directories and
one for the files. The code never checks the lists independently
so we should be able to only use one list. The string_list also
is a O(log n) for lookup and insertion. Switching this to use a
hashmap will give O(1) which will save some time when there are
millions of paths that will be checked.
Also cleaned up a memory leak and method where the return was not
being used.
Changes since last version:
1. Removed the function pointers and just check the ignore_case in the
compare and hash methods.
2. Added a comment about the hashmap and why it is getting initialized
and freed but not a local.
3. Use hashmap_get_from_hash and remove the dummy entry
Kevin Willford (3):
merge-recursive: fix memory leak
merge-recursive: remove return value from get_files_dirs
merge-recursive: change current file dir string_lists to hashmap
merge-recursive.c | 76 ++++++++++++++++++++++++++++++++++++++++---------------
merge-recursive.h | 3 +--
2 files changed, 57 insertions(+), 22 deletions(-)
--
2.14.1.329.gcdd497e120.dirty
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v3 1/3] merge-recursive: fix memory leak
2017-09-07 16:25 [PATCH v3 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
@ 2017-09-07 16:25 ` Kevin Willford
2017-09-07 16:25 ` [PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
2017-09-07 16:25 ` [PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
2 siblings, 0 replies; 4+ messages in thread
From: Kevin Willford @ 2017-09-07 16:25 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Kevin Willford
In merge_trees if process_renames or process_entry returns less
than zero, the method will just return and not free re_merge,
re_head, or entries.
This change cleans up the allocated variables before returning
to the caller.
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
merge-recursive.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/merge-recursive.c b/merge-recursive.c
index 1494ffdb82..033d7cd406 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1956,7 +1956,7 @@ int merge_trees(struct merge_options *o,
re_merge = get_renames(o, merge, common, head, merge, entries);
clean = process_renames(o, re_head, re_merge);
if (clean < 0)
- return clean;
+ goto cleanup;
for (i = entries->nr-1; 0 <= i; i--) {
const char *path = entries->items[i].string;
struct stage_data *e = entries->items[i].util;
@@ -1964,8 +1964,10 @@ int merge_trees(struct merge_options *o,
int ret = process_entry(o, path, e);
if (!ret)
clean = 0;
- else if (ret < 0)
- return ret;
+ else if (ret < 0) {
+ clean = ret;
+ goto cleanup;
+ }
}
}
for (i = 0; i < entries->nr; i++) {
@@ -1975,6 +1977,7 @@ int merge_trees(struct merge_options *o,
entries->items[i].string);
}
+cleanup:
string_list_clear(re_merge, 0);
string_list_clear(re_head, 0);
string_list_clear(entries, 1);
@@ -1982,6 +1985,9 @@ int merge_trees(struct merge_options *o,
free(re_merge);
free(re_head);
free(entries);
+
+ if (clean < 0)
+ return clean;
}
else
clean = 1;
--
2.14.1.329.gcdd497e120.dirty
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs
2017-09-07 16:25 [PATCH v3 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
2017-09-07 16:25 ` [PATCH v3 1/3] merge-recursive: fix memory leak Kevin Willford
@ 2017-09-07 16:25 ` Kevin Willford
2017-09-07 16:25 ` [PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
2 siblings, 0 replies; 4+ messages in thread
From: Kevin Willford @ 2017-09-07 16:25 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Kevin Willford
The return value of the get_files_dirs call is never being used.
Looking at the history of the file and it was originally only
being used for debug output statements. Also when
read_tree_recursive return value is non zero it is changed to
zero. This leads me to believe that it doesn't matter if
read_tree_recursive gets an error.
Since the debug output has been removed and the caller isn't
checking the return value there is no reason to keep calculating
and returning a value.
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
merge-recursive.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/merge-recursive.c b/merge-recursive.c
index 033d7cd406..d47f40ea81 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -328,15 +328,11 @@ static int save_files_dirs(const unsigned char *sha1,
return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
}
-static int get_files_dirs(struct merge_options *o, struct tree *tree)
+static void get_files_dirs(struct merge_options *o, struct tree *tree)
{
- int n;
struct pathspec match_all;
memset(&match_all, 0, sizeof(match_all));
- if (read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o))
- return 0;
- n = o->current_file_set.nr + o->current_directory_set.nr;
- return n;
+ read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o);
}
/*
--
2.14.1.329.gcdd497e120.dirty
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap
2017-09-07 16:25 [PATCH v3 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
2017-09-07 16:25 ` [PATCH v3 1/3] merge-recursive: fix memory leak Kevin Willford
2017-09-07 16:25 ` [PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
@ 2017-09-07 16:25 ` Kevin Willford
2 siblings, 0 replies; 4+ messages in thread
From: Kevin Willford @ 2017-09-07 16:25 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Kevin Willford
The code was using two string_lists, one for the directories and
one for the files. The code never checks the lists independently
so we should be able to only use one list. The string_list also
is a O(log n) for lookup and insertion. Switching this to use a
hashmap will give O(1) which will save some time when there are
millions of paths that will be checked.
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
merge-recursive.c | 56 ++++++++++++++++++++++++++++++++++++++++++++-----------
merge-recursive.h | 3 +--
2 files changed, 46 insertions(+), 13 deletions(-)
diff --git a/merge-recursive.c b/merge-recursive.c
index d47f40ea81..35af3761ba 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -24,6 +24,31 @@
#include "dir.h"
#include "submodule.h"
+struct path_hashmap_entry {
+ struct hashmap_entry e;
+ char path[FLEX_ARRAY];
+};
+
+static int path_hashmap_cmp(const void *cmp_data,
+ const void *entry,
+ const void *entry_or_key,
+ const void *keydata)
+{
+ const struct path_hashmap_entry *a = entry;
+ const struct path_hashmap_entry *b = entry_or_key;
+ const char *key = keydata;
+
+ if (ignore_case)
+ return strcasecmp(a->path, key ? key : b->path);
+ else
+ return strcmp(a->path, key ? key : b->path);
+}
+
+static unsigned int path_hash(const char *path)
+{
+ return ignore_case ? strihash(path) : strhash(path);
+}
+
static void flush_output(struct merge_options *o)
{
if (o->buffer_output < 2 && o->obuf.len) {
@@ -314,15 +339,15 @@ static int save_files_dirs(const unsigned char *sha1,
struct strbuf *base, const char *path,
unsigned int mode, int stage, void *context)
{
+ struct path_hashmap_entry *entry;
int baselen = base->len;
struct merge_options *o = context;
strbuf_addstr(base, path);
- if (S_ISDIR(mode))
- string_list_insert(&o->current_directory_set, base->buf);
- else
- string_list_insert(&o->current_file_set, base->buf);
+ FLEX_ALLOC_MEM(entry, path, base->buf, base->len);
+ hashmap_entry_init(entry, path_hash(entry->path));
+ hashmap_add(&o->current_file_dir_set, entry);
strbuf_setlen(base, baselen);
return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
@@ -642,6 +667,7 @@ static void add_flattened_path(struct strbuf *out, const char *s)
static char *unique_path(struct merge_options *o, const char *path, const char *branch)
{
+ struct path_hashmap_entry *entry;
struct strbuf newpath = STRBUF_INIT;
int suffix = 0;
size_t base_len;
@@ -650,14 +676,16 @@ static char *unique_path(struct merge_options *o, const char *path, const char *
add_flattened_path(&newpath, branch);
base_len = newpath.len;
- while (string_list_has_string(&o->current_file_set, newpath.buf) ||
- string_list_has_string(&o->current_directory_set, newpath.buf) ||
+ while (hashmap_get_from_hash(&o->current_file_dir_set,
+ path_hash(newpath.buf), newpath.buf) ||
(!o->call_depth && file_exists(newpath.buf))) {
strbuf_setlen(&newpath, base_len);
strbuf_addf(&newpath, "_%d", suffix++);
}
- string_list_insert(&o->current_file_set, newpath.buf);
+ FLEX_ALLOC_MEM(entry, path, newpath.buf, newpath.len);
+ hashmap_entry_init(entry, path_hash(entry->path));
+ hashmap_add(&o->current_file_dir_set, entry);
return strbuf_detach(&newpath, NULL);
}
@@ -1941,8 +1969,14 @@ int merge_trees(struct merge_options *o,
if (unmerged_cache()) {
struct string_list *entries, *re_head, *re_merge;
int i;
- string_list_clear(&o->current_file_set, 1);
- string_list_clear(&o->current_directory_set, 1);
+ /*
+ * Only need the hashmap while processing entries, so
+ * initialize it here and free it when we are done running
+ * through the entries. Keeping it in the merge_options as
+ * opposed to decaring a local hashmap is for convenience
+ * so that we don't have to pass it to around.
+ */
+ hashmap_init(&o->current_file_dir_set, path_hashmap_cmp, NULL, 512);
get_files_dirs(o, head);
get_files_dirs(o, merge);
@@ -1978,6 +2012,8 @@ int merge_trees(struct merge_options *o,
string_list_clear(re_head, 0);
string_list_clear(entries, 1);
+ hashmap_free(&o->current_file_dir_set, 1);
+
free(re_merge);
free(re_head);
free(entries);
@@ -2179,8 +2215,6 @@ void init_merge_options(struct merge_options *o)
if (o->verbosity >= 5)
o->buffer_output = 0;
strbuf_init(&o->obuf, 0);
- string_list_init(&o->current_file_set, 1);
- string_list_init(&o->current_directory_set, 1);
string_list_init(&o->df_conflict_file_set, 1);
}
diff --git a/merge-recursive.h b/merge-recursive.h
index 735343b413..80d69d1401 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -25,8 +25,7 @@ struct merge_options {
int show_rename_progress;
int call_depth;
struct strbuf obuf;
- struct string_list current_file_set;
- struct string_list current_directory_set;
+ struct hashmap current_file_dir_set;
struct string_list df_conflict_file_set;
};
--
2.14.1.329.gcdd497e120.dirty
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2017-09-07 16:26 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-07 16:25 [PATCH v3 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
2017-09-07 16:25 ` [PATCH v3 1/3] merge-recursive: fix memory leak Kevin Willford
2017-09-07 16:25 ` [PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
2017-09-07 16:25 ` [PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).