All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/3] git config cache & special querying api utilizing the cache
@ 2014-06-23 10:11 Tanay Abhra
  2014-06-23 10:11 ` [PATCH v3 1/3] string-list: add string_list initialiser helper functions Tanay Abhra
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Tanay Abhra @ 2014-06-23 10:11 UTC (permalink / raw)
  To: git; +Cc: Tanay Abhra, Ramkumar Ramachandra, Matthieu Moy

Hi,

[PATCH v3]: Added flag for NULL values that were causing segfaults in some cases.
	Added test-config for usage examples.
	Minor changes and corrections. Refer to discussion thread[5] for more details.
	Thanks to Matthieu, Jeff and Junio for their valuable suggestions.

[PATCH v2]:Changed the string_list to a struct instead of pointer to a struct.
	Added string-list initilization functions.
	Minor mistakes corrected acoording to review comments[4]. Thanks to
	Eric and Matthieu for their review.

[PATCH V1]:Most of the invaluable suggestions by Eric Sunshine, Torsten Bogershausen and
	Jeff King has been implemented[1]. Complete rewrite of config_cache*() family
	using git_config() as hook as suggested by Jeff. Thanks for the review.

[RFC V2]: Improved according to the suggestions by Eric Sunshine and Torsten Bogershausen.
	Added cache invalidation when config file is changed.[2]
	I am using git_config_set_multivar_in_file() as an update hook.

This is my first patch for this year's GSoC. My project is
"Git Config API improvements". The link of my proposal is appended below [3].

The aim of this patch series is to generate a cache for querying values from
the config files in a non-callback manner as the current method reads and
parses the config files every time a value is queried for.

The cache is generated from hooking the update_cache function to the current
parsing and callback mechanism in config.c. It is implemented as an hashmap
using the hashmap-api with variables and its corresponding values list as
its members. The values in the list are sorted in order of increasing priority.
The cache is initialised the first time when any of the new query functions is
called. It is invalidated by using git_config_set_multivar_in_file() as an
update hook.

We add two new functions to the config-api, git_config_get_string() and
git_config_get_string_multi() for querying in a non callback manner from
the cache.

[1] http://marc.info/?t=140172066200006&r=1&w=2
[2] http://git.661346.n2.nabble.com/RFC-PATCH-0-2-Git-config-cache-amp-special-querying-api-utilizing-the-cache-td7611691.html
[3] https://drive.google.com/file/d/0B4suZ-aHqDcnSUZJRXVTTnZUN1E/edit?usp=sharing
[4] http://thread.gmane.org/gmane.comp.version-control.git/251073/focus=251369
[5] http://thread.gmane.org/gmane.comp.version-control.git/251704/

Tanay Abhra (3):
  string-list: Add string_list initializer helper functions
  config: Add hashtable for config parsing & retrieval
  test-config

 .gitignore                             |   1 +
 Documentation/technical/api-config.txt |  26 ++++++
 Makefile                               |   1 +
 cache.h                                |   2 +
 config.c                               | 144 +++++++++++++++++++++++++++++++++
 string-list.c                          |  18 +++++
 string-list.h                          |   3 +
 test-config.c                          |  93 +++++++++++++++++++++
 8 files changed, 288 insertions(+)
 create mode 100644 test-config.c

-- 
1.9.0.GIT

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 1/3] string-list: add string_list initialiser helper functions
  2014-06-23 10:11 [PATCH v3 0/3] git config cache & special querying api utilizing the cache Tanay Abhra
@ 2014-06-23 10:11 ` Tanay Abhra
  2014-06-23 12:36   ` Torsten Bögershausen
  2014-06-23 10:11 ` [PATCH v3 2/3] config: add hashtable for config parsing & retrieval Tanay Abhra
  2014-06-23 10:11 ` [PATCH v3 3/3] test-config: add usage examples for non-callback query functions Tanay Abhra
  2 siblings, 1 reply; 35+ messages in thread
From: Tanay Abhra @ 2014-06-23 10:11 UTC (permalink / raw)
  To: git; +Cc: Tanay Abhra, Ramkumar Ramachandra, Matthieu Moy

The string-list API has STRING_LIST_INIT_* macros to be used
to define variables with initialisers, but lacks functions
to initialise an uninitialised piece of memory to be used as
a string-list at the run-time.
Introduce string_list_init_{dup,nodup}() functions for that.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
---
 string-list.c | 18 ++++++++++++++++++
 string-list.h |  3 +++
 2 files changed, 21 insertions(+)

diff --git a/string-list.c b/string-list.c
index aabb25e..8c3a4eb 100644
--- a/string-list.c
+++ b/string-list.c
@@ -1,6 +1,24 @@
 #include "cache.h"
 #include "string-list.h"
 
+void string_list_init_nodup(struct string_list *list)
+{
+	list->items = NULL;
+	list->nr = 0;
+	list->alloc = 0;
+	list->strdup_strings = 0;
+	list->cmp = NULL;
+}
+
+void string_list_init_dup(struct string_list *list)
+{
+	list->items = NULL;
+	list->nr = 0;
+	list->alloc = 0;
+	list->strdup_strings = 1;
+	list->cmp = NULL;
+}
+
 /* if there is no exact match, point to the index where the entry could be
  * inserted */
 static int get_entry_index(const struct string_list *list, const char *string,
diff --git a/string-list.h b/string-list.h
index dd5e294..c90d6c1 100644
--- a/string-list.h
+++ b/string-list.h
@@ -18,6 +18,9 @@ struct string_list {
 #define STRING_LIST_INIT_NODUP { NULL, 0, 0, 0, NULL }
 #define STRING_LIST_INIT_DUP   { NULL, 0, 0, 1, NULL }
 
+void string_list_init_nodup(struct string_list *list);
+void string_list_init_dup(struct string_list *list);
+
 void print_string_list(const struct string_list *p, const char *text);
 void string_list_clear(struct string_list *list, int free_util);
 
-- 
1.9.0.GIT

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 10:11 [PATCH v3 0/3] git config cache & special querying api utilizing the cache Tanay Abhra
  2014-06-23 10:11 ` [PATCH v3 1/3] string-list: add string_list initialiser helper functions Tanay Abhra
@ 2014-06-23 10:11 ` Tanay Abhra
  2014-06-23 11:55   ` Matthieu Moy
                     ` (4 more replies)
  2014-06-23 10:11 ` [PATCH v3 3/3] test-config: add usage examples for non-callback query functions Tanay Abhra
  2 siblings, 5 replies; 35+ messages in thread
From: Tanay Abhra @ 2014-06-23 10:11 UTC (permalink / raw)
  To: git; +Cc: Tanay Abhra, Ramkumar Ramachandra, Matthieu Moy

Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.

Add a hash table to cache all key-value pairs read from config files
(repo specific .git/config, user wide ~/.gitconfig and the global
/etc/gitconfig). Add two external functions `git_config_get_string` and
`git_config_get_string_multi` for querying in a non-callback manner from the
hash table. They support O(1) lookups once the hash table is constructed.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
---
 Documentation/technical/api-config.txt |  26 ++++++
 cache.h                                |   2 +
 config.c                               | 144 +++++++++++++++++++++++++++++++++
 3 files changed, 172 insertions(+)

diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
index 230b3a0..13f3f40 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -77,6 +77,32 @@ To read a specific file in git-config format, use
 `git_config_from_file`. This takes the same callback and data parameters
 as `git_config`.
 
+Querying For Specific Variables
+-------------------------------
+
+For programs wanting to query for specific variables in a non-callback
+manner, the config API provides two functions `git_config_get_string`
+and `git_config_get_string_multi`.They both read values from an internal
+cache generated previously from reading the config files.
+
+`git_config_get_string` takes two parameters,
+
+- a key string in canonical flat form for which the corresponding value
+  with the highest priority (i.e. value in the repo config will be
+  preferred over value in user wide config for the same variable) will
+  be retrieved.
+
+- a pointer to a string which will point to the retrieved value.
+
+`git_config_get_string` returns 0 for success, or -1 for no value found.
+
+`git_config_get_string_multi` returns a `string_list` containing all the
+values for the key passed as parameter, sorted in order of increasing
+priority (Note: NULL values are flagged as 1, check `util` for each
+'string_list_item` for flag value).
+
+See test-config.c for usage examples.
+
 Value Parsing Helpers
 ---------------------
 
diff --git a/cache.h b/cache.h
index cbe1935..fec0a63 100644
--- a/cache.h
+++ b/cache.h
@@ -1294,6 +1294,8 @@ extern int check_repository_format_version(const char *var, const char *value, v
 extern int git_env_bool(const char *, int);
 extern int git_config_system(void);
 extern int config_error_nonbool(const char *);
+extern int git_config_get_string(const char *, const char **);
+extern const struct string_list *git_config_get_string_multi(const char *);
 #if defined(__GNUC__)
 #define config_error_nonbool(s) (config_error_nonbool(s), const_error())
 #endif
diff --git a/config.c b/config.c
index a1aef1c..6200f36 100644
--- a/config.c
+++ b/config.c
@@ -9,6 +9,8 @@
 #include "exec_cmd.h"
 #include "strbuf.h"
 #include "quote.h"
+#include "hashmap.h"
+#include "string-list.h"
 
 struct config_source {
 	struct config_source *prev;
@@ -37,6 +39,141 @@ static struct config_source *cf;
 
 static int zlib_compression_seen;
 
+struct config_cache_entry {
+	struct hashmap_entry ent;
+	char *key;
+	struct string_list value_list;
+};
+
+static int hashmap_initialized;
+
+static int config_cache_add_value(const char *key, const char *value);
+
+static int config_cache_entry_cmp(const struct config_cache_entry *e1,
+				 const struct config_cache_entry *e2, const void *unused)
+{
+	return strcmp(e1->key, e2->key);
+}
+
+static void config_cache_init(struct hashmap *config_cache)
+{
+	hashmap_init(config_cache, (hashmap_cmp_fn)config_cache_entry_cmp, 0);
+}
+
+static int config_cache_callback(const char *key, const char *value, void *unused)
+{
+	config_cache_add_value(key, value);
+	return 0;
+}
+
+static struct hashmap *get_config_cache(void)
+{
+	static struct hashmap config_cache;
+	if (!hashmap_initialized) {
+		config_cache_init(&config_cache);
+		hashmap_initialized = 1;
+		git_config(config_cache_callback, NULL);
+	}
+	return &config_cache;
+}
+
+static void config_cache_free(void)
+{
+	struct hashmap *config_cache;
+	struct config_cache_entry *entry;
+	struct hashmap_iter iter;
+	config_cache = get_config_cache();
+	hashmap_iter_init(config_cache, &iter);
+	while ((entry = hashmap_iter_next(&iter))) {
+		free(entry->key);
+		string_list_clear(&entry->value_list, 1);
+	}
+	hashmap_free(config_cache, 1);
+	hashmap_initialized = 0;
+}
+
+static struct config_cache_entry *config_cache_find_entry(const char *key)
+{
+	struct hashmap *config_cache;
+	struct config_cache_entry k;
+	struct config_cache_entry *found_entry;
+	char *normalized_key;
+	int ret;
+	config_cache = get_config_cache();
+	ret = git_config_parse_key(key, &normalized_key, NULL);
+
+	if (ret)
+		return NULL;
+
+	hashmap_entry_init(&k, strhash(normalized_key));
+	k.key = normalized_key;
+	found_entry = hashmap_get(config_cache, &k, NULL);
+	free(normalized_key);
+	return found_entry;
+}
+
+static struct string_list *config_cache_get_value(const char *key)
+{
+	struct config_cache_entry *e = config_cache_find_entry(key);
+	return e ? &e->value_list : NULL;
+}
+
+static int config_cache_add_value(const char *key, const char *value)
+{
+	struct hashmap *config_cache;
+	struct config_cache_entry *e;
+	struct string_list_item *item;
+	int *boolean_null_flag;
+
+	config_cache = get_config_cache();
+	e = config_cache_find_entry(key);
+
+	boolean_null_flag = xcalloc(1, sizeof(*boolean_null_flag));
+
+	if (!e) {
+		e = xmalloc(sizeof(*e));
+		hashmap_entry_init(e, strhash(key));
+		e->key = xstrdup(key);
+		string_list_init_dup(&e->value_list);
+		hashmap_add(config_cache, e);
+	}
+	/*
+	 * If the variable had no value specified, the value will be NULL
+	 * (typically this means it should be interpreted as boolean true).
+	 * For such values, silently convert them to "BOOLEAN_NULL" to
+	 * store in hashmap and flag that string_list_item as 1.
+	 */
+	if (value == NULL) {
+		value = "BOOLEAN_NULL";
+		*boolean_null_flag = 1;
+	}
+	item = string_list_append(&e->value_list, value);
+	item->util = boolean_null_flag;
+
+	return 0;
+}
+
+int git_config_get_string(const char *key, const char **value)
+{
+	struct string_list *values;
+	int *flag;
+	values = config_cache_get_value(key);
+	if (!values)
+		return -1;
+	assert(values->nr > 0);
+	*value = values->items[values->nr - 1].string;
+	flag = values->items[values->nr - 1].util;
+	if (*flag)
+		*value = NULL;
+	return 0;
+}
+
+/* for NULL values, 'util' for each `string_list_item` is flagged as 1 */
+const struct string_list *git_config_get_string_multi(const char *key)
+{
+	return config_cache_get_value(key);
+}
+
 static int config_file_fgetc(struct config_source *conf)
 {
 	return fgetc(conf->u.file);
@@ -1708,6 +1845,13 @@ int git_config_set_multivar_in_file(const char *config_filename,
 	lock = NULL;
 	ret = 0;
 
+	/*
+	 * content of config file has changed, so invalidate the
+	 * config cache used by non-callback based query functions.
+	 */
+	if (hashmap_initialized)
+		config_cache_free();
+
 out_free:
 	if (lock)
 		rollback_lock_file(lock);
-- 
1.9.0.GIT

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 3/3] test-config: add usage examples for non-callback query functions
  2014-06-23 10:11 [PATCH v3 0/3] git config cache & special querying api utilizing the cache Tanay Abhra
  2014-06-23 10:11 ` [PATCH v3 1/3] string-list: add string_list initialiser helper functions Tanay Abhra
  2014-06-23 10:11 ` [PATCH v3 2/3] config: add hashtable for config parsing & retrieval Tanay Abhra
@ 2014-06-23 10:11 ` Tanay Abhra
  2014-06-25 11:19   ` Eric Sunshine
  2 siblings, 1 reply; 35+ messages in thread
From: Tanay Abhra @ 2014-06-23 10:11 UTC (permalink / raw)
  To: git; +Cc: Tanay Abhra, Ramkumar Ramachandra, Matthieu Moy

Add different usage examples for 'git_config_get_string' and
`git_config_get_string_multi`. They will serve as documentation
on how to query for config values in a non-callback manner.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
---
 .gitignore    |  1 +
 Makefile      |  1 +
 test-config.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+)
 create mode 100644 test-config.c

diff --git a/.gitignore b/.gitignore
index 42294e5..7677533 100644
--- a/.gitignore
+++ b/.gitignore
@@ -177,6 +177,7 @@
 /gitweb/static/gitweb.min.*
 /test-chmtime
 /test-ctype
+/test-config
 /test-date
 /test-delta
 /test-dump-cache-tree
diff --git a/Makefile b/Makefile
index 07ea105..9544efb 100644
--- a/Makefile
+++ b/Makefile
@@ -549,6 +549,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 
 TEST_PROGRAMS_NEED_X += test-chmtime
 TEST_PROGRAMS_NEED_X += test-ctype
+TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
diff --git a/test-config.c b/test-config.c
new file mode 100644
index 0000000..ff24cb8
--- /dev/null
+++ b/test-config.c
@@ -0,0 +1,93 @@
+#include "cache.h"
+#include "hashmap.h"
+#include "string-list.h"
+
+/*
+ * This program gives examples on how to use non-callback based query
+ * functions like git_config_get_string & git_config_get_string_multi.
+ *
+ * Reads stdin and prints result of command to stdout:
+ *
+ * print_all -> prints all the key-value pairs contained in the hashmap
+ *		also checks if all entries in the hashmap matches with
+ *		the content of config files
+ *
+ * get_value -> prints the value with highest priority for the entered key
+ *
+ * get_all_values -> prints all values for the given key in increasing order
+ *		     of priority
+ * Examples:
+ *
+ * To print the value with highest priority for key "foo.bAr Baz.rock":
+ * 	test-config get_value "foo.bAr Baz.rock"
+ *
+ */
+
+static const char *v;
+static const struct string_list *strptr;
+static int i;
+static int *flag;
+
+static int config_cache_callback(const char *key, const char *value, void *unused)
+{
+	strptr = git_config_get_string_multi(key);
+	if (strptr) {
+		for (i = 0; i < strptr->nr; i++)
+		{
+			v = strptr->items[i].string;
+			flag = strptr->items[i].util;
+			/* NULL values are flagged as 1 */
+			if (*flag == 1)
+				printf("%s\n", key);
+			else if (*flag == 0)
+				printf("%s=%s\n", key, v);
+			/* key-value pair printed so flag them as done */
+			*flag = -1;
+		}
+		return 0;
+	} else {
+		printf("%s\n", "Config hashmap inconsistent\n");
+		return -1;
+	}
+}
+
+ int main(int argc, char **argv)
+{
+	if (argc == 2 && !strcmp(argv[1], "print_all")) {
+		git_config(config_cache_callback, NULL);
+		return 0;
+	} else if (argc == 3 && !strcmp(argv[1], "get_value")) {
+		/* enter key in canonical form enclosed in quotes */
+		if (!git_config_get_string(argv[2], &v)) {
+			printf("%s\n", v);
+			return 0;
+		} else {
+			printf("%s\n", "Value not found for the entered key\n");
+			return -1;
+		}
+	} else if (argc == 3 && !strcmp(argv[1], "get_all_values")) {
+		/* enter key in canonical form enclosed in quotes */
+		strptr = git_config_get_string_multi(argv[2]);
+		if (strptr) {
+			for (i = 0; i < strptr->nr; i++)
+			{
+				v = strptr->items[i].string;
+				flag = strptr->items[i].util;
+				/* prints NULL values as "-" */
+				if (*flag)
+					printf("%s ", "-");
+				else
+					printf("%s ", v);
+			}
+			printf("\n");
+			return 0;
+		} else {
+			printf("%s\n", "Value not found for the entered key\n");
+			return -1;
+		}
+	}
+
+	fprintf(stderr, "%s: unknown function name: %s\n", argv[0],
+		argv[1] ? argv[1] : "(there was none)");
+	return 1;
+}
-- 
1.9.0.GIT

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 10:11 ` [PATCH v3 2/3] config: add hashtable for config parsing & retrieval Tanay Abhra
@ 2014-06-23 11:55   ` Matthieu Moy
  2014-06-24 12:06     ` Tanay Abhra
  2014-06-23 14:57   ` Ramsay Jones
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: Matthieu Moy @ 2014-06-23 11:55 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: git, Ramkumar Ramachandra

Tanay Abhra <tanayabh@gmail.com> writes:

> +/* for NULL values, 'util' for each `string_list_item` is flagged as 1 */

It's a void *, so just saying that it is flagged as 1 does not say how
it's encoded. How about "... is an 'int *' pointing to the value 1".

And actually, you can save one malloc by encoding the value directly in
the util pointer itself like

#define NULL_FLAG (void *)1

	item->util = NULL_FLAG;

I find this a bit ugly, but I think Git already uses this in a few
places (git grep 'void \*) *1' for examples).

Or you can use a pointer to a single static value:

static const int boolean_null_flag = 1; // Or even without = 1.

and then

	item->util = &boolean_null_flag;

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 1/3] string-list: add string_list initialiser helper functions
  2014-06-23 10:11 ` [PATCH v3 1/3] string-list: add string_list initialiser helper functions Tanay Abhra
@ 2014-06-23 12:36   ` Torsten Bögershausen
  2014-06-23 13:19     ` Tanay Abhra
  0 siblings, 1 reply; 35+ messages in thread
From: Torsten Bögershausen @ 2014-06-23 12:36 UTC (permalink / raw)
  To: Tanay Abhra, git; +Cc: Ramkumar Ramachandra, Matthieu Moy

On 2014-06-23 12.11, Tanay Abhra wrote:
> The string-list API has STRING_LIST_INIT_* macros to be used
> to define variables with initialisers, but lacks functions
> to initialise an uninitialised piece of memory to be used as
> a string-list at the run-time.
> Introduce string_list_init_{dup,nodup}() functions for that.
> 
> Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
> ---
>  string-list.c | 18 ++++++++++++++++++
>  string-list.h |  3 +++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/string-list.c b/string-list.c
> index aabb25e..8c3a4eb 100644
> --- a/string-list.c
> +++ b/string-list.c
> @@ -1,6 +1,24 @@
>  #include "cache.h"
>  #include "string-list.h"
>  
> +void string_list_init_nodup(struct string_list *list)
> +{
> +	list->items = NULL;
> +	list->nr = 0;
> +	list->alloc = 0;
> +	list->strdup_strings = 0;
> +	list->cmp = NULL;
> +}
> +
If we look at the definition below:
struct string_list {
	struct string_list_item *items;
	unsigned int nr, alloc;
	unsigned int strdup_strings:1;
	compare_strings_fn cmp; /* NULL uses strcmp() */
I think a simple memset() will be easier to read,
and it will be more future proof:
In case elements are added, the will have 0 or NULL automatically:

void string_list_init_nodup(struct string_list *list)
{
	memset (list, 0, sizeof(*list));
}
(But then I wonder if we need the function at all ?)

Or does it make sense to have a common function similar to this,
which covers both cases:

void string_list_init(struct string_list *list, int strdup_strings)
{
	memset (list, 0, sizeof(*list));
	list->strdup_strings = strdup_strings;
}

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 1/3] string-list: add string_list initialiser helper functions
  2014-06-23 12:36   ` Torsten Bögershausen
@ 2014-06-23 13:19     ` Tanay Abhra
  0 siblings, 0 replies; 35+ messages in thread
From: Tanay Abhra @ 2014-06-23 13:19 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, Ramkumar Ramachandra, Matthieu Moy

On 06/23/2014 05:36 AM, Torsten Bögershausen wrote:
> If we look at the definition below:
> struct string_list {
> 	struct string_list_item *items;
> 	unsigned int nr, alloc;
> 	unsigned int strdup_strings:1;
> 	compare_strings_fn cmp; /* NULL uses strcmp() */
> I think a simple memset() will be easier to read,
> and it will be more future proof:
> In case elements are added, the will have 0 or NULL automatically:

Yes, you are right. After sending the patch I saw that for string_list
initialization the codebase either uses xcalloc or memset and after that
marks the list as DUP or NODUP.

> void string_list_init_nodup(struct string_list *list)
> {
> 	memset (list, 0, sizeof(*list));
> }
> (But then I wonder if we need the function at all ?)
> 
> Or does it make sense to have a common function similar to this,
> which covers both cases:
> 
> void string_list_init(struct string_list *list, int strdup_strings)
> {
> 	memset (list, 0, sizeof(*list));
> 	list->strdup_strings = strdup_strings;
> }
> 

A common function would be much better as other API constructs as strbuf
have runtime init functions like the version you have shown above.

Thanks for the review.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 10:11 ` [PATCH v3 2/3] config: add hashtable for config parsing & retrieval Tanay Abhra
  2014-06-23 11:55   ` Matthieu Moy
@ 2014-06-23 14:57   ` Ramsay Jones
  2014-06-23 16:20     ` Tanay Abhra
  2014-06-23 23:25     ` Junio C Hamano
  2014-06-23 23:14   ` Junio C Hamano
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 35+ messages in thread
From: Ramsay Jones @ 2014-06-23 14:57 UTC (permalink / raw)
  To: Tanay Abhra, git; +Cc: Ramkumar Ramachandra, Matthieu Moy

On 23/06/14 11:11, Tanay Abhra wrote:

[snip]

> diff --git a/config.c b/config.c
> index a1aef1c..6200f36 100644
> --- a/config.c
> +++ b/config.c
> @@ -9,6 +9,8 @@
>  #include "exec_cmd.h"
>  #include "strbuf.h"
>  #include "quote.h"
> +#include "hashmap.h"
> +#include "string-list.h"
>  
>  struct config_source {
>  	struct config_source *prev;
> @@ -37,6 +39,141 @@ static struct config_source *cf;
>  
>  static int zlib_compression_seen;
>  
> +struct config_cache_entry {
> +	struct hashmap_entry ent;
> +	char *key;
> +	struct string_list value_list;
> +};
> +
> +static int hashmap_initialized;
> +
> +static int config_cache_add_value(const char *key, const char *value);
> +
> +static int config_cache_entry_cmp(const struct config_cache_entry *e1,
> +				 const struct config_cache_entry *e2, const void *unused)
> +{
> +	return strcmp(e1->key, e2->key);
> +}
> +
> +static void config_cache_init(struct hashmap *config_cache)
> +{
> +	hashmap_init(config_cache, (hashmap_cmp_fn)config_cache_entry_cmp, 0);
> +}
> +
> +static int config_cache_callback(const char *key, const char *value, void *unused)
> +{
> +	config_cache_add_value(key, value);
> +	return 0;
> +}
> +
> +static struct hashmap *get_config_cache(void)
> +{
> +	static struct hashmap config_cache;
> +	if (!hashmap_initialized) {
> +		config_cache_init(&config_cache);
> +		hashmap_initialized = 1;
> +		git_config(config_cache_callback, NULL);
> +	}
> +	return &config_cache;
> +}

[I have not been following this series at all (sorry I haven't had
the time to spare), so take these comments with a very big pinch of
salt! ie just ignore me if it's already been discussed etc. ;-) ]

The 'git config' command can be used to read arbitrary files (so long
as they conform to the config syntax). For example, see the --file and
--blob options to git-config. At present, I think only scripted commands
use this facility (eg git-submodule). Noting the singleton config_cache,
what happens when git-submodule becomes a C builtin, or indeed any other
C builtin wants to take advantage of the new code when processing a non-
standard config file?

> +
> +static void config_cache_free(void)
> +{
> +	struct hashmap *config_cache;
> +	struct config_cache_entry *entry;
> +	struct hashmap_iter iter;
> +	config_cache = get_config_cache();
> +	hashmap_iter_init(config_cache, &iter);
> +	while ((entry = hashmap_iter_next(&iter))) {
> +		free(entry->key);
> +		string_list_clear(&entry->value_list, 1);
> +	}
> +	hashmap_free(config_cache, 1);
> +	hashmap_initialized = 0;
> +}
> +
> +static struct config_cache_entry *config_cache_find_entry(const char *key)
> +{
> +	struct hashmap *config_cache;
> +	struct config_cache_entry k;
> +	struct config_cache_entry *found_entry;
> +	char *normalized_key;
> +	int ret;
> +	config_cache = get_config_cache();
> +	ret = git_config_parse_key(key, &normalized_key, NULL);
> +
> +	if (ret)
> +		return NULL;
> +
> +	hashmap_entry_init(&k, strhash(normalized_key));
> +	k.key = normalized_key;
> +	found_entry = hashmap_get(config_cache, &k, NULL);
> +	free(normalized_key);
> +	return found_entry;
> +}
> +
> +static struct string_list *config_cache_get_value(const char *key)
> +{
> +	struct config_cache_entry *e = config_cache_find_entry(key);
> +	return e ? &e->value_list : NULL;
> +}
> +
> +static int config_cache_add_value(const char *key, const char *value)
> +{
> +	struct hashmap *config_cache;
> +	struct config_cache_entry *e;
> +	struct string_list_item *item;
> +	int *boolean_null_flag;
> +
> +	config_cache = get_config_cache();
> +	e = config_cache_find_entry(key);
> +
> +	boolean_null_flag = xcalloc(1, sizeof(*boolean_null_flag));
> +
> +	if (!e) {
> +		e = xmalloc(sizeof(*e));
> +		hashmap_entry_init(e, strhash(key));
> +		e->key = xstrdup(key);

config_cache_find_entry() searches for (and hashes the) normalized_key.
Should you not be entering the normalized key here?

> +		string_list_init_dup(&e->value_list);
> +		hashmap_add(config_cache, e);
> +	}
> +	/*
> +	 * If the variable had no value specified, the value will be NULL
> +	 * (typically this means it should be interpreted as boolean true).
> +	 * For such values, silently convert them to "BOOLEAN_NULL" to
> +	 * store in hashmap and flag that string_list_item as 1.
> +	 */
> +	if (value == NULL) {
> +		value = "BOOLEAN_NULL";
> +		*boolean_null_flag = 1;
> +	}
> +	item = string_list_append(&e->value_list, value);
> +	item->util = boolean_null_flag;
> +
> +	return 0;
> +}
> +

[snip]

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 14:57   ` Ramsay Jones
@ 2014-06-23 16:20     ` Tanay Abhra
  2014-06-24 15:32       ` Ramsay Jones
  2014-06-23 23:25     ` Junio C Hamano
  1 sibling, 1 reply; 35+ messages in thread
From: Tanay Abhra @ 2014-06-23 16:20 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: git, Ramkumar Ramachandra, Matthieu Moy

On 06/23/2014 07:57 AM, Ramsay Jones wrote:
> On 23/06/14 11:11, Tanay Abhra wrote:
>> diff --git a/config.c b/config.c
>> index a1aef1c..6200f36 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -9,6 +9,8 @@
>>  #include "exec_cmd.h"
>>  #include "strbuf.h"
>>  #include "quote.h"
>> +#include "hashmap.h"
>> +#include "string-list.h"
>>  
>>  struct config_source {
>>  	struct config_source *prev;
>> @@ -37,6 +39,141 @@ static struct config_source *cf;
>>  
>>  static int zlib_compression_seen;
>>  
>> +struct config_cache_entry {
>> +	struct hashmap_entry ent;
>> +	char *key;
>> +	struct string_list value_list;
>> +};
>> +
>> +static int hashmap_initialized;
>> +
>> +static int config_cache_add_value(const char *key, const char *value);
>> +
>> +static int config_cache_entry_cmp(const struct config_cache_entry *e1,
>> +				 const struct config_cache_entry *e2, const void *unused)
>> +{
>> +	return strcmp(e1->key, e2->key);
>> +}
>> +
>> +static void config_cache_init(struct hashmap *config_cache)
>> +{
>> +	hashmap_init(config_cache, (hashmap_cmp_fn)config_cache_entry_cmp, 0);
>> +}
>> +
>> +static int config_cache_callback(const char *key, const char *value, void *unused)
>> +{
>> +	config_cache_add_value(key, value);
>> +	return 0;
>> +}
>> +
>> +static struct hashmap *get_config_cache(void)
>> +{
>> +	static struct hashmap config_cache;
>> +	if (!hashmap_initialized) {
>> +		config_cache_init(&config_cache);
>> +		hashmap_initialized = 1;
>> +		git_config(config_cache_callback, NULL);
>> +	}
>> +	return &config_cache;
>> +}
> 
> [I have not been following this series at all (sorry I haven't had
> the time to spare), so take these comments with a very big pinch of
> salt! ie just ignore me if it's already been discussed etc. ;-) ]
> 
> The 'git config' command can be used to read arbitrary files (so long
> as they conform to the config syntax). For example, see the --file and
> --blob options to git-config. At present, I think only scripted commands
> use this facility (eg git-submodule). Noting the singleton config_cache,
> what happens when git-submodule becomes a C builtin, or indeed any other
> C builtin wants to take advantage of the new code when processing a non-
> standard config file?
> 

This series was mainly to replace git_config() invocations around the codebase.
There are currently 111 git_config() invocations, each of which causes a file
reread whenever called. git_config() only feeds values from the standard config
files(i.e repo, user and global config).

For reading config values from specific files or blobs, there are three functions
git_config_with_options, git_config_from_file & git_config_from_blob which can be
easily used inside a C builtin or anywhere in the code.

The bulk of git_config_api calls are only for git_config(). For example,
git_config_from_file() has three hits only in entire codebase,
git_config_with_options() has 5 hits, so I concentrated on generating a cache
for the usual config files only. For other files, the callers can fall back on older
API functions like I had mentioned above.

Forgive me if I inferred your question incorrectly. More below.


>> +
>> +static void config_cache_free(void)
>> +{
>> +	struct hashmap *config_cache;
>> +	struct config_cache_entry *entry;
>> +	struct hashmap_iter iter;
>> +	config_cache = get_config_cache();
>> +	hashmap_iter_init(config_cache, &iter);
>> +	while ((entry = hashmap_iter_next(&iter))) {
>> +		free(entry->key);
>> +		string_list_clear(&entry->value_list, 1);
>> +	}
>> +	hashmap_free(config_cache, 1);
>> +	hashmap_initialized = 0;
>> +}
>> +
>> +static struct config_cache_entry *config_cache_find_entry(const char *key)
>> +{
>> +	struct hashmap *config_cache;
>> +	struct config_cache_entry k;
>> +	struct config_cache_entry *found_entry;
>> +	char *normalized_key;
>> +	int ret;
>> +	config_cache = get_config_cache();
>> +	ret = git_config_parse_key(key, &normalized_key, NULL);
>> +
>> +	if (ret)
>> +		return NULL;
>> +
>> +	hashmap_entry_init(&k, strhash(normalized_key));
>> +	k.key = normalized_key;
>> +	found_entry = hashmap_get(config_cache, &k, NULL);
>> +	free(normalized_key);
>> +	return found_entry;
>> +}
>> +
>> +static struct string_list *config_cache_get_value(const char *key)
>> +{
>> +	struct config_cache_entry *e = config_cache_find_entry(key);
>> +	return e ? &e->value_list : NULL;
>> +}
>> +
>> +static int config_cache_add_value(const char *key, const char *value)
>> +{
>> +	struct hashmap *config_cache;
>> +	struct config_cache_entry *e;
>> +	struct string_list_item *item;
>> +	int *boolean_null_flag;
>> +
>> +	config_cache = get_config_cache();
>> +	e = config_cache_find_entry(key);
>> +
>> +	boolean_null_flag = xcalloc(1, sizeof(*boolean_null_flag));
>> +
>> +	if (!e) {
>> +		e = xmalloc(sizeof(*e));
>> +		hashmap_entry_init(e, strhash(key));
>> +		e->key = xstrdup(key);
> 
> config_cache_find_entry() searches for (and hashes the) normalized_key.
> Should you not be entering the normalized key here?
> 

config_cache_add_value() is fed key-values pairs through the git_config()
callback mechanism, which normalises the key beforehand, so no need for
renormalising.

Thanks for the review. :)

Cheers,
Tanay Abhra.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 10:11 ` [PATCH v3 2/3] config: add hashtable for config parsing & retrieval Tanay Abhra
  2014-06-23 11:55   ` Matthieu Moy
  2014-06-23 14:57   ` Ramsay Jones
@ 2014-06-23 23:14   ` Junio C Hamano
  2014-06-24 12:21     ` Tanay Abhra
  2014-06-25 21:44   ` Karsten Blees
  2014-06-26 16:43   ` Matthieu Moy
  4 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2014-06-23 23:14 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: git, Ramkumar Ramachandra, Matthieu Moy

Tanay Abhra <tanayabh@gmail.com> writes:

> +static int hashmap_initialized;
> + ...
> +static struct hashmap *get_config_cache(void)
> +{
> +	static struct hashmap config_cache;
> +	if (!hashmap_initialized) {
> +		config_cache_init(&config_cache);
> +		hashmap_initialized = 1;

I find the arrangement of these two variables somewhat dubious at
the API design level.

If you are going to keep the singleton "config_cache" as a function
scope static, shouldn't the corresponding guard also be in the same
scope?

If you ever need to "uninitialize" to force re-read the file to the
in-core cache, such an uninitializer will need access to not just
the "is hashmap initialized?" boolean (which you do by having it as
a file-scope global like this patch does) but also the thing that
may need to be uninitialized (i.e. the hashmap that may already be
populated), but a function scope static variable config_cache does
not allow access from other places, so you end up calling this
function to initialize it if necessary only to get the pointer to
that structure in order to uninitialize it.

Sounds very twisted and ugly, doesn't it?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 14:57   ` Ramsay Jones
  2014-06-23 16:20     ` Tanay Abhra
@ 2014-06-23 23:25     ` Junio C Hamano
  2014-06-24  7:23       ` Tanay Abhra
                         ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: Junio C Hamano @ 2014-06-23 23:25 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Tanay Abhra, git, Ramkumar Ramachandra, Matthieu Moy

Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:

>> +static struct hashmap *get_config_cache(void)
>> +{
>> +	static struct hashmap config_cache;
>> +	if (!hashmap_initialized) {
>> +		config_cache_init(&config_cache);
>> +		hashmap_initialized = 1;
>> +		git_config(config_cache_callback, NULL);
>> +	}
>> +	return &config_cache;
>> +}
>
> [I have not been following this series at all (sorry I haven't had
> the time to spare), so take these comments with a very big pinch of
> salt! ie just ignore me if it's already been discussed etc. ;-) ]
>
> The 'git config' command can be used to read arbitrary files (so long
> as they conform to the config syntax). For example, see the --file and
> --blob options to git-config. At present, I think only scripted commands
> use this facility (eg git-submodule). Noting the singleton config_cache,
> what happens when git-submodule becomes a C builtin, or indeed any other
> C builtin wants to take advantage of the new code when processing a non-
> standard config file?

Yup, that is a very good point.  There needs an infrastructure to
tie a set of files (i.e. the standard one being the chain of
system-global /etc/gitconfig to repo-specific .git/config, and any
custom one that can be specified by the caller like submodule code)
to a separate hashmap; a future built-in submodule code would use
two hashmaps aka "config-caches", one to manage the usual
"configuration" and the other to manage the contents of the
.gitmodules file.

When we are operating across repositories (think "recursive
checkout"), we somehow have to deal with multiple sets of
configuration, one that applies to the top-level superproject and
others that apply to individual submodule repositories.

Even there is currently one caller, the "git config" command
implementation, if one is designing the new API, the design
shouldn't be tied to a singleton limitation, I would have to say, so
that future callers do not have to throw away everything done in
this series and start from scratch.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 23:25     ` Junio C Hamano
@ 2014-06-24  7:23       ` Tanay Abhra
  2014-06-25 18:21         ` Junio C Hamano
  2014-06-24  7:25       ` Tanay Abhra
  2014-06-24 15:57       ` Ramsay Jones
  2 siblings, 1 reply; 35+ messages in thread
From: Tanay Abhra @ 2014-06-24  7:23 UTC (permalink / raw)
  To: git; +Cc: git, Ramkumar Ramachandra, Matthieu Moy, Heiko Voigt

On 6/24/2014 4:55 AM, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:
>
>>> +static struct hashmap *get_config_cache(void)
>>> +{
>>> +	static struct hashmap config_cache;
>>> +	if (!hashmap_initialized) {
>>> +		config_cache_init(&config_cache);
>>> +		hashmap_initialized = 1;
>>> +		git_config(config_cache_callback, NULL);
>>> +	}
>>> +	return &config_cache;
>>> +}
>>
>> [I have not been following this series at all (sorry I haven't had
>> the time to spare), so take these comments with a very big pinch of
>> salt! ie just ignore me if it's already been discussed etc. ;-) ]
>>
>> The 'git config' command can be used to read arbitrary files (so long
>> as they conform to the config syntax). For example, see the --file and
>> --blob options to git-config. At present, I think only scripted commands
>> use this facility (eg git-submodule). Noting the singleton config_cache,
>> what happens when git-submodule becomes a C builtin, or indeed any other
>> C builtin wants to take advantage of the new code when processing a non-
>> standard config file?
>
> Yup, that is a very good point.  There needs an infrastructure to
> tie a set of files (i.e. the standard one being the chain of
> system-global /etc/gitconfig to repo-specific .git/config, and any
> custom one that can be specified by the caller like submodule code)
> to a separate hashmap; a future built-in submodule code would use
> two hashmaps aka "config-caches", one to manage the usual
> "configuration" and the other to manage the contents of the
> .gitmodules file.
>
> When we are operating across repositories (think "recursive
> checkout"), we somehow have to deal with multiple sets of
> configuration, one that applies to the top-level superproject and
> others that apply to individual submodule repositories.
>
> Even there is currently one caller, the "git config" command
> implementation, if one is designing the new API, the design
> shouldn't be tied to a singleton limitation, I would have to say, so
> that future callers do not have to throw away everything done in
> this series and start from scratch.
>

What changes should I implement in this series? Should I add new
user facing API adding to the singleton callers which are already in
this series.

"submodule lookup API" was intoduced by Heiko Voigt in [1], which is in
"cooking" now. His series already allows looking up local configuration
by passing in the null_sha1 for gitmodule or commit sha1. What else can
I add to my implementation, please suggest.

Thanks,
Tanay Abhra

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 23:25     ` Junio C Hamano
  2014-06-24  7:23       ` Tanay Abhra
@ 2014-06-24  7:25       ` Tanay Abhra
  2014-06-24 15:57       ` Ramsay Jones
  2 siblings, 0 replies; 35+ messages in thread
From: Tanay Abhra @ 2014-06-24  7:25 UTC (permalink / raw)
  To: Junio C Hamano, Ramsay Jones
  Cc: git, Ramkumar Ramachandra, Matthieu Moy, Heiko Voigt

On 6/24/2014 4:55 AM, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:
>
>>> +static struct hashmap *get_config_cache(void)
>>> +{
>>> +	static struct hashmap config_cache;
>>> +	if (!hashmap_initialized) {
>>> +		config_cache_init(&config_cache);
>>> +		hashmap_initialized = 1;
>>> +		git_config(config_cache_callback, NULL);
>>> +	}
>>> +	return &config_cache;
>>> +}
>>
>> [I have not been following this series at all (sorry I haven't had
>> the time to spare), so take these comments with a very big pinch of
>> salt! ie just ignore me if it's already been discussed etc. ;-) ]
>>
>> The 'git config' command can be used to read arbitrary files (so long
>> as they conform to the config syntax). For example, see the --file and
>> --blob options to git-config. At present, I think only scripted commands
>> use this facility (eg git-submodule). Noting the singleton config_cache,
>> what happens when git-submodule becomes a C builtin, or indeed any other
>> C builtin wants to take advantage of the new code when processing a non-
>> standard config file?
>
> Yup, that is a very good point.  There needs an infrastructure to
> tie a set of files (i.e. the standard one being the chain of
> system-global /etc/gitconfig to repo-specific .git/config, and any
> custom one that can be specified by the caller like submodule code)
> to a separate hashmap; a future built-in submodule code would use
> two hashmaps aka "config-caches", one to manage the usual
> "configuration" and the other to manage the contents of the
> .gitmodules file.
>
> When we are operating across repositories (think "recursive
> checkout"), we somehow have to deal with multiple sets of
> configuration, one that applies to the top-level superproject and
> others that apply to individual submodule repositories.
>
> Even there is currently one caller, the "git config" command
> implementation, if one is designing the new API, the design
> shouldn't be tied to a singleton limitation, I would have to say, so
> that future callers do not have to throw away everything done in
> this series and start from scratch.
>
Sorry for the repost, missed the link.

What changes should I implement in this series? Should I add new
user facing API adding to the singleton callers which are already in
this series.

"submodule lookup API" was intoduced by Heiko Voigt in [1], which is in
"cooking" now. His series already allows looking up local configuration
by passing in the null_sha1 for gitmodule or commit sha1. What else can
I add to my implementation, please suggest.

[1] http://thread.gmane.org/gmane.comp.version-control.git/250811

Thanks,
Tanay Abhra

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 11:55   ` Matthieu Moy
@ 2014-06-24 12:06     ` Tanay Abhra
  2014-06-25 20:25       ` Karsten Blees
  0 siblings, 1 reply; 35+ messages in thread
From: Tanay Abhra @ 2014-06-24 12:06 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git, Ramkumar Ramachandra

On 6/23/2014 5:25 PM, Matthieu Moy wrote:
> Tanay Abhra <tanayabh@gmail.com> writes:
>
>> +/* for NULL values, 'util' for each `string_list_item` is flagged as 1 */
>
> It's a void *, so just saying that it is flagged as 1 does not say how
> it's encoded. How about "... is an 'int *' pointing to the value 1".
>
> And actually, you can save one malloc by encoding the value directly in
> the util pointer itself like
>
> #define NULL_FLAG (void *)1
>
> 	item->util = NULL_FLAG;
>
> I find this a bit ugly, but I think Git already uses this in a few
> places (git grep 'void \*) *1' for examples).
>
> Or you can use a pointer to a single static value:
>
> static const int boolean_null_flag = 1; // Or even without = 1.
>
> and then
>
> 	item->util = &boolean_null_flag;
>

Thanks for the review. I will change the flag format to what you have
suggested. Instead of giving the users of the API the headache of
thinking about the flag, let me give you an alternative,

+const struct string_list *git_config_get_string_multi(const char *key)
+{
+	int i, *flag;
+	struct string_list *temp = config_cache_get_value(key);
+	if (!temp)
+		return NULL;
+	/* create a NODUP string list which can have NULL values */
+	struct string_list *l = xcalloc(1, sizeof(*l));
+	for(i=0; i < temp->nr; i++) {
+		flag = temp->items[i].util;
+		if (*flag)
+			string_list_append(l, NULL);
+		else
+			string_list_append(l, temp->items[i].string);
+	}
+	return l;
+}

Now the only headache for the user will be to free the string-list once
he is done with it. What do you think about this approach?

Also I have a doubt, when rewriting git_config() callers I saw a
curious case, when at the end of the callback they sometimes call
git_default_config().

For example in add.c,

static int add_config(const char *var, const char *value, void *cb)
{
	if (!strcmp(var, "add.ignoreerrors") ||
	    !strcmp(var, "add.ignore-errors")) {
		ignore_add_errors = git_config_bool(var, value);
		return 0;
	}
	return git_default_config(var, value, cb);
}

What should I do about this, should I change the default_config() calls
to git_config_get_string calls which I can easily do, if the values it 
is setting are not being set by any other callers.

For example in config.c,

static int git_default_core_config(const char *var, const char *value)
{
	/* This needs a better name */
	if (!strcmp(var, "core.filemode")) {
		trust_executable_bit = git_config_bool(var, value);
		return 0;
	}
	if (!strcmp(var, "core.trustctime")) {
		trust_ctime = git_config_bool(var, value);
		return 0;
	}
	if (!strcmp(var, "core.statinfo") ||
	    !strcmp(var, "core.checkstat")) {
--[snip]
variables like "core.filemode" or "core.statinfo" have been requested 
here only in the whole codebase, so I am safe to rewrite this function
with git_config_get_string, am I?

If not many callers (add_config...) behave like this. What should I
do with them?

Also, what are your views for the points Ramsay has raised.

Cheers,
Tanay Abhra.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 23:14   ` Junio C Hamano
@ 2014-06-24 12:21     ` Tanay Abhra
  2014-06-26 16:27       ` Matthieu Moy
  0 siblings, 1 reply; 35+ messages in thread
From: Tanay Abhra @ 2014-06-24 12:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ramkumar Ramachandra, Matthieu Moy

On 6/24/2014 4:44 AM, Junio C Hamano wrote:
> Tanay Abhra <tanayabh@gmail.com> writes:
>
>> +static int hashmap_initialized;
>> + ...
>> +static struct hashmap *get_config_cache(void)
>> +{
>> +	static struct hashmap config_cache;
>> +	if (!hashmap_initialized) {
>> +		config_cache_init(&config_cache);
>> +		hashmap_initialized = 1;
>
> I find the arrangement of these two variables somewhat dubious at
> the API design level.
>
> If you are going to keep the singleton "config_cache" as a function
> scope static, shouldn't the corresponding guard also be in the same
> scope?
>
> If you ever need to "uninitialize" to force re-read the file to the
> in-core cache, such an uninitializer will need access to not just
> the "is hashmap initialized?" boolean (which you do by having it as
> a file-scope global like this patch does) but also the thing that
> may need to be uninitialized (i.e. the hashmap that may already be
> populated), but a function scope static variable config_cache does
> not allow access from other places, so you end up calling this
> function to initialize it if necessary only to get the pointer to
> that structure in order to uninitialize it.
>
> Sounds very twisted and ugly, doesn't it?
>

Hmn, You are right. but I couldn't find a way for that without making
"config_cache" file-scope static.

For the config_cache_free(), would this change be enough?

+static void config_cache_free(void)
+{
+	struct hashmap *config_cache;
+	struct config_cache_entry *entry;
+	struct hashmap_iter iter;
+	if (!hashmap_initialized)
+		return;
+	config_cache = get_config_cache();
+	hashmap_iter_init(config_cache, &iter);
+	while ((entry = hashmap_iter_next(&iter))) {
+		free(entry->key);
+		string_list_clear(&entry->value_list, 1);
+	}
+	hashmap_free(config_cache, 1);
+	hashmap_initialized = 0;
+}
+

The idea was only initialize the hashmap only once,
as it is fed by git_config(). I ended up following
this approach where the map would be lazily generated whenever
a function asked for the map. Is there another way out?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 16:20     ` Tanay Abhra
@ 2014-06-24 15:32       ` Ramsay Jones
  2014-06-26 16:15         ` Matthieu Moy
  0 siblings, 1 reply; 35+ messages in thread
From: Ramsay Jones @ 2014-06-24 15:32 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: git, Ramkumar Ramachandra, Matthieu Moy

On 23/06/14 17:20, Tanay Abhra wrote:
> On 06/23/2014 07:57 AM, Ramsay Jones wrote:
>> On 23/06/14 11:11, Tanay Abhra wrote:
[snip]

>>> +static struct hashmap *get_config_cache(void)
>>> +{
>>> +	static struct hashmap config_cache;
>>> +	if (!hashmap_initialized) {
>>> +		config_cache_init(&config_cache);
>>> +		hashmap_initialized = 1;
>>> +		git_config(config_cache_callback, NULL);
>>> +	}
>>> +	return &config_cache;
>>> +}
>>
>> [I have not been following this series at all (sorry I haven't had
>> the time to spare), so take these comments with a very big pinch of
>> salt! ie just ignore me if it's already been discussed etc. ;-) ]
>>
>> The 'git config' command can be used to read arbitrary files (so long
>> as they conform to the config syntax). For example, see the --file and
>> --blob options to git-config. At present, I think only scripted commands
>> use this facility (eg git-submodule). Noting the singleton config_cache,
>> what happens when git-submodule becomes a C builtin, or indeed any other
>> C builtin wants to take advantage of the new code when processing a non-
>> standard config file?
>>
> 
> This series was mainly to replace git_config() invocations around the codebase.
> There are currently 111 git_config() invocations, each of which causes a file
> reread whenever called. git_config() only feeds values from the standard config
> files(i.e repo, user and global config).
> 
> For reading config values from specific files or blobs, there are three functions
> git_config_with_options, git_config_from_file & git_config_from_blob which can be
> easily used inside a C builtin or anywhere in the code.
> 
> The bulk of git_config_api calls are only for git_config(). For example,
> git_config_from_file() has three hits only in entire codebase,
> git_config_with_options() has 5 hits, so I concentrated on generating a cache
> for the usual config files only. For other files, the callers can fall back on older
> API functions like I had mentioned above.
> 
> Forgive me if I inferred your question incorrectly. More below.

Hmm, maybe. The "... take advantage of the new code" refers to the
possibility (or otherwise) of re-using your work to update these
"older API" functions to the new API style. (also, see Junio's response).
[In order to do this, I would have expected to see one hash table
for each file/blob, so the singleton object took me by surprise.]

An "out of scope for this project" is a perfectly acceptable
response (*particularly* since it is very late in the day to be
bringing this up!).

>>> +static struct config_cache_entry *config_cache_find_entry(const char *key)
>>> +{
>>> +	struct hashmap *config_cache;
>>> +	struct config_cache_entry k;
>>> +	struct config_cache_entry *found_entry;
>>> +	char *normalized_key;
>>> +	int ret;
>>> +	config_cache = get_config_cache();
>>> +	ret = git_config_parse_key(key, &normalized_key, NULL);
>>> +
>>> +	if (ret)
>>> +		return NULL;
>>> +
>>> +	hashmap_entry_init(&k, strhash(normalized_key));
>>> +	k.key = normalized_key;
>>> +	found_entry = hashmap_get(config_cache, &k, NULL);
>>> +	free(normalized_key);
>>> +	return found_entry;
>>> +}
>>> +
>>> +static struct string_list *config_cache_get_value(const char *key)
>>> +{
>>> +	struct config_cache_entry *e = config_cache_find_entry(key);
>>> +	return e ? &e->value_list : NULL;
>>> +}
>>> +
>>> +static int config_cache_add_value(const char *key, const char *value)
>>> +{
>>> +	struct hashmap *config_cache;
>>> +	struct config_cache_entry *e;
>>> +	struct string_list_item *item;
>>> +	int *boolean_null_flag;
>>> +
>>> +	config_cache = get_config_cache();
>>> +	e = config_cache_find_entry(key);
>>> +
>>> +	boolean_null_flag = xcalloc(1, sizeof(*boolean_null_flag));
>>> +
>>> +	if (!e) {
>>> +		e = xmalloc(sizeof(*e));
>>> +		hashmap_entry_init(e, strhash(key));
>>> +		e->key = xstrdup(key);
>>
>> config_cache_find_entry() searches for (and hashes the) normalized_key.
>> Should you not be entering the normalized key here?
>>
> 
> config_cache_add_value() is fed key-values pairs through the git_config()
> callback mechanism, which normalises the key beforehand, so no need for
> renormalising.

Ah, yes, I forgot that the parsing code does a tolower() at various
places while accumulating the key string. So the (potentially) non-
normalized keys come from the user via the new API functions and,
rather than putting code to normalize the key in each of those,
just do it once in config_cache_find_entry(). (Although, you could
possibly do that in config_cache_get_value()).  OK.

Hmm, maybe add a short comment to that effect? dunno.

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 23:25     ` Junio C Hamano
  2014-06-24  7:23       ` Tanay Abhra
  2014-06-24  7:25       ` Tanay Abhra
@ 2014-06-24 15:57       ` Ramsay Jones
  2014-06-25 18:13         ` Junio C Hamano
  2 siblings, 1 reply; 35+ messages in thread
From: Ramsay Jones @ 2014-06-24 15:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tanay Abhra, git, Ramkumar Ramachandra, Matthieu Moy

On 24/06/14 00:25, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:
> 
>>> +static struct hashmap *get_config_cache(void)
>>> +{
>>> +	static struct hashmap config_cache;
>>> +	if (!hashmap_initialized) {
>>> +		config_cache_init(&config_cache);
>>> +		hashmap_initialized = 1;
>>> +		git_config(config_cache_callback, NULL);
>>> +	}
>>> +	return &config_cache;
>>> +}
>>
>> [I have not been following this series at all (sorry I haven't had
>> the time to spare), so take these comments with a very big pinch of
>> salt! ie just ignore me if it's already been discussed etc. ;-) ]
>>
>> The 'git config' command can be used to read arbitrary files (so long
>> as they conform to the config syntax). For example, see the --file and
>> --blob options to git-config. At present, I think only scripted commands
>> use this facility (eg git-submodule). Noting the singleton config_cache,
>> what happens when git-submodule becomes a C builtin, or indeed any other
>> C builtin wants to take advantage of the new code when processing a non-
>> standard config file?
> 
> Yup, that is a very good point.  There needs an infrastructure to
> tie a set of files (i.e. the standard one being the chain of
> system-global /etc/gitconfig to repo-specific .git/config, and any
> custom one that can be specified by the caller like submodule code)
> to a separate hashmap; a future built-in submodule code would use
> two hashmaps aka "config-caches", one to manage the usual
> "configuration" and the other to manage the contents of the
> .gitmodules file.
> 

I had expected to see one hash table per file/blob, with the three
standard config hash tables linked together to implement the scope/
priority rules. (Well, these could be merged into one, as the current
code does, since that makes handling "multi" keys slightly easier).

ie when looking up a symbol in the .git/config, if not found then
follow the link to parent scope (~/.gitconfig) and search there ...

Of course, other uses (I think) represent only a single scope, so they
would not have any parent scopes to link to.

> When we are operating across repositories (think "recursive
> checkout"), we somehow have to deal with multiple sets of
> configuration, one that applies to the top-level superproject and
> others that apply to individual submodule repositories.
> 

Here, you could push the new .git/config onto a stack while linking
to the parent (~/.gitconfig) scope. So, the lookup code starts from
the stack top, etc, ...

> Even there is currently one caller, the "git config" command
> implementation, if one is designing the new API, the design
> shouldn't be tied to a singleton limitation, I would have to say, so
> that future callers do not have to throw away everything done in
> this series and start from scratch.
> .

I would like to stress that, despite the above, I have not given
any great thought to this project. So please take my comments with
a big pinch of salt. I was just scrolling through the patch in my
email client and was surprised to see the singleton ...

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 3/3] test-config: add usage examples for non-callback query functions
  2014-06-23 10:11 ` [PATCH v3 3/3] test-config: add usage examples for non-callback query functions Tanay Abhra
@ 2014-06-25 11:19   ` Eric Sunshine
  2014-06-26  8:40     ` Tanay Abhra
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Sunshine @ 2014-06-25 11:19 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: Git List, Ramkumar Ramachandra, Matthieu Moy

On Mon, Jun 23, 2014 at 6:11 AM, Tanay Abhra <tanayabh@gmail.com> wrote:
> Add different usage examples for 'git_config_get_string' and
> `git_config_get_string_multi`. They will serve as documentation
> on how to query for config values in a non-callback manner.

This is a good start, but it's not fully what Matthieu was suggesting
when he said that you should prove to other developers, by way of
reproducible tests, that your changes work. What he meant, was that
you should write a test-config program which exposes (as a runnable
command) the new config C API you've added, and then write tests which
exercise that API and implementation exhaustively.

For example, take a look at test-string-list.c and
t/t0063-string-list.sh. The C program does no checking itself. It
merely exposes the C API via command-line arguments, such as "split",
"filter", etc. The test script then employs that program to perform
the actual testing in a reproducible and (hopefully) exhaustive
fashion. Because t/t0063-string-list.sh is part of the test suite, the
string-list tests are run regularly by many developers. It's not just
something that someone might remember to run once in a while.

Contrary to your commit message and the comment in the program itself,
the purpose of test-config is not to serve as documentation or to
provide examples of usage. (Real documentation is better suited for
those purposes.) Instead, test-config should exist in support of a
real test script in t/ which is run regularly. The new script you add
to t/ should exercise the exposed C API as exhaustively as possible.
This means checking each possible state: for instance, (1) when a key
is absent, (2) when a value is boolean (NULL), (3) one non-boolean
(non-NULL) value, (4) multiple values, etc. Moreover, it should check
expected success _and_ expected failure modes. Check not only that it
returns expected values, but that it fails when appropriate.

More below.

> Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
> ---
>  .gitignore    |  1 +
>  Makefile      |  1 +
>  test-config.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 95 insertions(+)
>  create mode 100644 test-config.c
>
> diff --git a/.gitignore b/.gitignore
> index 42294e5..7677533 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -177,6 +177,7 @@
>  /gitweb/static/gitweb.min.*
>  /test-chmtime
>  /test-ctype
> +/test-config
>  /test-date
>  /test-delta
>  /test-dump-cache-tree
> diff --git a/Makefile b/Makefile
> index 07ea105..9544efb 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -549,6 +549,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
>
>  TEST_PROGRAMS_NEED_X += test-chmtime
>  TEST_PROGRAMS_NEED_X += test-ctype
> +TEST_PROGRAMS_NEED_X += test-config
>  TEST_PROGRAMS_NEED_X += test-date
>  TEST_PROGRAMS_NEED_X += test-delta
>  TEST_PROGRAMS_NEED_X += test-dump-cache-tree
> diff --git a/test-config.c b/test-config.c
> new file mode 100644
> index 0000000..ff24cb8
> --- /dev/null
> +++ b/test-config.c
> @@ -0,0 +1,93 @@
> +#include "cache.h"
> +#include "hashmap.h"
> +#include "string-list.h"
> +
> +/*
> + * This program gives examples on how to use non-callback based query
> + * functions like git_config_get_string & git_config_get_string_multi.

See above regarding the true purpose of test-config. A more accurate
description might be something like this:

    This program exposes the C API of the configuration mechanism
    as a set of simple commands in order to facilitate testing.

> + *
> + * Reads stdin and prints result of command to stdout:
> + *
> + * print_all -> prints all the key-value pairs contained in the hashmap
> + *             also checks if all entries in the hashmap matches with
> + *             the content of config files

Some comments:

This doesn't just print all values. It's doing validation on its own,
so the name 'print_all' is misleading.

This isn't part of the C API which you are exposing, so such a command
is somewhat out of place here. A script in t/ could perform similar
functionality via the exposed API.

Having said that, it probably doesn't hurt to do such internal
integrity checking, however, a better command name is warranted.
Perhaps 'check_integrity'. Moreover, a command which checks internal
integrity need not verbosely print all items. More useful, upon
success, would be to emit nothing at all or a simple "ok" or such. On
failure, emit enough information about detected problems to allow them
to be tracked down and fixed.

> + *
> + * get_value -> prints the value with highest priority for the entered key
> + *
> + * get_all_values -> prints all values for the given key in increasing order
> + *                  of priority

Since you're exposing the C API, it might make sense to name these
after the C functions. For instance, "get_string" and
"get_string_multi". (This can become important as the API is
expanded.)

> + * Examples:
> + *
> + * To print the value with highest priority for key "foo.bAr Baz.rock":
> + *     test-config get_value "foo.bAr Baz.rock"
> + *
> + */
> +
> +static const char *v;
> +static const struct string_list *strptr;
> +static int i;
> +static int *flag;

These variables are not shared between different functions in this
file, so they don't need to be global. Instead, you should declare
them as local variables where needed.

> +static int config_cache_callback(const char *key, const char *value, void *unused)

This name doesn't convey much. Since this is actually implementing
functionality of the 'show_all' command, perhaps show_all_cb() would
be better. Even better, given the comments above,
check_integrity_cb().

> +{
> +       strptr = git_config_get_string_multi(key);
> +       if (strptr) {
> +               for (i = 0; i < strptr->nr; i++)
> +               {

Style: { on same line as 'for'

> +                       v = strptr->items[i].string;
> +                       flag = strptr->items[i].util;
> +                       /* NULL values are flagged as 1 */
> +                       if (*flag == 1)
> +                               printf("%s\n", key);
> +                       else if (*flag == 0)
> +                               printf("%s=%s\n", key, v);
> +                       /* key-value pair printed so flag them as done */
> +                       *flag = -1;

Curious: It's not clear what the purpose of this -1 is. Is the
callback being invoked multiple times with the same key/value pair?

> +               }
> +               return 0;
> +       } else {
> +               printf("%s\n", "Config hashmap inconsistent\n");

Spelling out the actual inconsistency, rather than making a only
general statement, could help a programmer diagnose the problem.

> +               return -1;
> +       }
> +}
> +
> + int main(int argc, char **argv)
> +{
> +       if (argc == 2 && !strcmp(argv[1], "print_all")) {
> +               git_config(config_cache_callback, NULL);
> +               return 0;
> +       } else if (argc == 3 && !strcmp(argv[1], "get_value")) {
> +               /* enter key in canonical form enclosed in quotes */

What does this comment mean? Is it telling us what the code is doing
or is it an instruction to the person typing at the shell? The bit
about "canonical form" is something which you can document in the
comment block at the top of this file.

> +               if (!git_config_get_string(argv[2], &v)) {
> +                       printf("%s\n", v);

What happens if this is a boolean (NULL) value? Behavior of '%s' on a
NULL value is undefined. Better would be to emit some special,
recognizable token when NULL.

> +                       return 0;
> +               } else {
> +                       printf("%s\n", "Value not found for the entered key\n");

Although the key was specified on the command-line, this diagnostic
can be more helpful by mentioning which key was not found.

> +                       return -1;
> +               }
> +       } else if (argc == 3 && !strcmp(argv[1], "get_all_values")) {
> +               /* enter key in canonical form enclosed in quotes */

Ditto regarding comment.

> +               strptr = git_config_get_string_multi(argv[2]);
> +               if (strptr) {
> +                       for (i = 0; i < strptr->nr; i++)
> +                       {

Style: { on same line as 'for'

> +                               v = strptr->items[i].string;
> +                               flag = strptr->items[i].util;
> +                               /* prints NULL values as "-" */

Rather than "-", it might make more sense to choose a more unique and
identifiable representation which can be easily and reliably
recognized by the test script.

> +                               if (*flag)
> +                                       printf("%s ", "-");
> +                               else
> +                                       printf("%s ", v);
> +                       }
> +                       printf("\n");

Printing all items of a multi-part config value on a single line
delimited by spaces may not be the best output format. The values
themselves may contain whitespace. Taking into consideration that this
output should be easily consumable by a test script, it probably makes
more sees to print each value on its own line.

> +                       return 0;
> +               } else {
> +                       printf("%s\n", "Value not found for the entered key\n");

To be helpful, say which key was not found.

> +                       return -1;
> +               }
> +       }
> +
> +       fprintf(stderr, "%s: unknown function name: %s\n", argv[0],
> +               argv[1] ? argv[1] : "(there was none)");

This is a bit of a lie. This diagnostic will be triggered, not only
for an unrecognized function, but also if the number of arguments to a
function was incorrect.

> +       return 1;
> +}
> --
> 1.9.0.GIT

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-24 15:57       ` Ramsay Jones
@ 2014-06-25 18:13         ` Junio C Hamano
  2014-06-25 20:23           ` Karsten Blees
  2014-06-26 17:37           ` Matthieu Moy
  0 siblings, 2 replies; 35+ messages in thread
From: Junio C Hamano @ 2014-06-25 18:13 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Tanay Abhra, git, Ramkumar Ramachandra, Matthieu Moy

Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:

> On 24/06/14 00:25, Junio C Hamano wrote:
> ...
>> Yup, that is a very good point.  There needs an infrastructure to
>> tie a set of files (i.e. the standard one being the chain of
>> system-global /etc/gitconfig to repo-specific .git/config, and any
>> custom one that can be specified by the caller like submodule code)
>> to a separate hashmap; a future built-in submodule code would use
>> two hashmaps aka "config-caches", one to manage the usual
>> "configuration" and the other to manage the contents of the
>> .gitmodules file.
>> 
>
> I had expected to see one hash table per file/blob, with the three
> standard config hash tables linked together to implement the scope/
> priority rules. (Well, these could be merged into one, as the current
> code does, since that makes handling "multi" keys slightly easier).

Again, good point.  I think a rough outline of a design that take
both

 (1) we may have to read two or more separate sets of "config like
     things" (e.g. the contents from the normal config system and
     the contents from the .gitmodules file) and

 (2) we may have to read two or more files that make up a logically
     single set of "config-like things" (e.g. the "normal config
     system" reads from three separate files)

into account may look like this:

 * Each instance of in-core "config-like things" is expressed as a
   struct "config-set".

 * A "config-set" points at an ordered set of struct "config-file",
   each of which represents what was read and cached in-core from a
   file.

 * When we know or notice that a single file on the filesystem was
   modified, we do not have to invalidate the whole "config-set"
   that depends on the file; the "config-file" that corresponds to
   the file on the filesystem is invalidated instead.

 * The most generic API to read the values for a given key or
   enumerate the keys in a set of "config-like things" takes
   "config-set" as an argument, and reads from the ordered set of
   "config-file" to keep the established illusion that we read them
   all and accumulate, leading to "the last one wins" for single
   valued variables.

 * Because reading from the normal config system happens everywhere
   in the existing code, we will have one struct "config-set"
   instance, called "the_config_set", and have a parallel API of the
   most generic API above, that do not take the "config-set" as an
   explicit argument.  They operate on the_config_set singleton
   instance.


The implementation of the API function "git-config-get-string" may
look like this:

	int git_config_get_string(const char *key, const char **value)
	{
        	return git_configset_get_string(&the_config_set, key, value);
	}

which is the "thin-wrapper" for the more generic API to allow you
read from an arbitrary config_set, which may even look like this:

	#define git_config_get_string(k, v) \
        	git_configset_get_string(&the_config_set, (k), (v))


When the submodule script that uses "git config -f .gitmodules" is
converted into C, if the updated config API is ready, it may be able
to do something like these in a single program:

	const char *url;
	struct config_set *gm_config;

        /* read from $GIT_DIR/config */
        url = git_config_get_string("remote.origin.url");

        /*
         * allow us to read from .gitmodules; the parameters are
         * list of files that make up the configset, perhaps.
         */
	gm_config = git_configset_new(".gitmodules", NULL);


        if (!git_configset_get_bool(gm_config, "submodule.frotz.ignore")) {
		/* do a lot of stuff for the submodule */
                ...
	}

        /* when we are done with the configset */
        git_configset_clear(gm_config);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-24  7:23       ` Tanay Abhra
@ 2014-06-25 18:21         ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2014-06-25 18:21 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: git, Ramkumar Ramachandra, Matthieu Moy, Heiko Voigt

Tanay Abhra <tanayabh@gmail.com> writes:

> What changes should I implement in this series? Should I add new
> user facing API adding to the singleton callers which are already in
> this series.

I think the underlying data structures that represent what the
entire set of config data is needs to be refined.

We did this kind of conversion once, back when we made it possible
to handle more than one instances of in-core cache.  You may want to
study the series that implemented the transition to learn ideas from.

 - We used to have a single "struct cache_entry **cache", a single
   pair of "unsigned int cache_nr, cache_alloc", etc. and the first
   patch to introduce the feature was to define "struct index_state"
   that holds all of these "what the entire set of data that
   represent the status of the in-core cache is" variables.

 - We also used to have a set of functions that work on the
   singleton instance of the in-core cache (read_cache(),
   write_cache(), etc.).  We introduced a new set of API functions
   to take an explicit pointer to the "struct index_state" we want
   to work on, and made these old "assume the singleton" functions a
   set of thin wrappers that pass &the_index to the corresponding
   new API functions.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-25 18:13         ` Junio C Hamano
@ 2014-06-25 20:23           ` Karsten Blees
  2014-06-25 20:53             ` Junio C Hamano
  2014-06-26 17:37           ` Matthieu Moy
  1 sibling, 1 reply; 35+ messages in thread
From: Karsten Blees @ 2014-06-25 20:23 UTC (permalink / raw)
  To: Junio C Hamano, Ramsay Jones
  Cc: Tanay Abhra, git, Ramkumar Ramachandra, Matthieu Moy

Am 25.06.2014 20:13, schrieb Junio C Hamano:
> Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:
> 
>> On 24/06/14 00:25, Junio C Hamano wrote:
>> ...
>>> Yup, that is a very good point.  There needs an infrastructure to
>>> tie a set of files (i.e. the standard one being the chain of
>>> system-global /etc/gitconfig to repo-specific .git/config, and any
>>> custom one that can be specified by the caller like submodule code)
>>> to a separate hashmap; a future built-in submodule code would use
>>> two hashmaps aka "config-caches", one to manage the usual
>>> "configuration" and the other to manage the contents of the
>>> .gitmodules file.
>>>
>>
>> I had expected to see one hash table per file/blob, with the three
>> standard config hash tables linked together to implement the scope/
>> priority rules. (Well, these could be merged into one, as the current
>> code does, since that makes handling "multi" keys slightly easier).
> 
> Again, good point.  I think a rough outline of a design that take
> both
> 
>  (1) we may have to read two or more separate sets of "config like
>      things" (e.g. the contents from the normal config system and
>      the contents from the .gitmodules file) and
> 
>  (2) we may have to read two or more files that make up a logically
>      single set of "config-like things" (e.g. the "normal config
>      system" reads from three separate files)
> 
> into account may look like this:
> 
>  * Each instance of in-core "config-like things" is expressed as a
>    struct "config-set".
> 
>  * A "config-set" points at an ordered set of struct "config-file",
>    each of which represents what was read and cached in-core from a
>    file.

Is this additional complexity really necessary?

How would you handle included files? Split up the including file in before / after parts? I.e.

  repo-config-file[include-to-end]
  included-file
  repo-config-file[top-to-include]
  user-config-file
  ...

Looking up a single-valued key would then be O(n) (where n is the number of sruct config_file's in the config_set) rather than O(1).

Looking up a multi-valued key would involve joining values from all files, every time the value is looked up (dynamically allocating lists on the heap etc.).

The configuration is typically loaded once, followed by lots of lookups. So from a performance perspective, doing the merging at load time is sure better.

> 
>  * When we know or notice that a single file on the filesystem was
>    modified, we do not have to invalidate the whole "config-set"
>    that depends on the file; the "config-file" that corresponds to
>    the file on the filesystem is invalidated instead.
> 

What's the use case for this? Do you expect e.g. 'git gc' to detect changed depth/window size at run time and adjust the algorithm accordingly? Or do you just intend to cache parsed config data (the latter could be done by recording all involved file names and stats in the config-set and reloading the whole thing if any of the files change)?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-24 12:06     ` Tanay Abhra
@ 2014-06-25 20:25       ` Karsten Blees
  0 siblings, 0 replies; 35+ messages in thread
From: Karsten Blees @ 2014-06-25 20:25 UTC (permalink / raw)
  To: Tanay Abhra, Matthieu Moy; +Cc: git, Ramkumar Ramachandra

Am 24.06.2014 14:06, schrieb Tanay Abhra:
> On 6/23/2014 5:25 PM, Matthieu Moy wrote:
>> Tanay Abhra <tanayabh@gmail.com> writes:
>>
>>> +/* for NULL values, 'util' for each `string_list_item` is flagged as 1 */
>>
>> It's a void *, so just saying that it is flagged as 1 does not say how
>> it's encoded. How about "... is an 'int *' pointing to the value 1".
>>
>> And actually, you can save one malloc by encoding the value directly in
>> the util pointer itself like
>>
>> #define NULL_FLAG (void *)1
>>
>>     item->util = NULL_FLAG;
>>
>> I find this a bit ugly, but I think Git already uses this in a few
>> places (git grep 'void \*) *1' for examples).
>>
>> Or you can use a pointer to a single static value:
>>
>> static const int boolean_null_flag = 1; // Or even without = 1.
>>
>> and then
>>
>>     item->util = &boolean_null_flag;
>>
> 
> Thanks for the review. I will change the flag format to what you have
> suggested. Instead of giving the users of the API the headache of
> thinking about the flag, let me give you an alternative,
> 
> +const struct string_list *git_config_get_string_multi(const char *key)
> +{
> +    int i, *flag;
> +    struct string_list *temp = config_cache_get_value(key);
> +    if (!temp)
> +        return NULL;
> +    /* create a NODUP string list which can have NULL values */
> +    struct string_list *l = xcalloc(1, sizeof(*l));
> +    for(i=0; i < temp->nr; i++) {
> +        flag = temp->items[i].util;
> +        if (*flag)
> +            string_list_append(l, NULL);
> +        else
> +            string_list_append(l, temp->items[i].string);
> +    }
> +    return l;
> +}
> 

Returning a list that is 'half-owned' by the caller (i.e. list is, values are not) is kindof strange. Especially weird is that the caller needs to string_list_clear() _and_ free() the list. Why don't you store NULL values in the string_list in the cache instead of doing this flag magic? I.e.

  static int config_cache_add_value(const char *key, const char *value)
  {
  ...
    if (value == NULL)
      string_list_append_nodup(&e->value_list, NULL);
    else
      string_list_append(&e->value_list, value);

or even

    string_list_append_nodup(&e->value_list, value ? xstrdup(value) : NULL);

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-25 20:23           ` Karsten Blees
@ 2014-06-25 20:53             ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2014-06-25 20:53 UTC (permalink / raw)
  To: Karsten Blees
  Cc: Ramsay Jones, Tanay Abhra, git, Ramkumar Ramachandra, Matthieu Moy

Karsten Blees <karsten.blees@gmail.com> writes:

> Am 25.06.2014 20:13, schrieb Junio C Hamano:
>> Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:
>>> I had expected to see one hash table per file/blob, with the three
>>> standard config hash tables linked together to implement the scope/
>>> priority rules. (Well, these could be merged into one, as the current
>>> code does, since that makes handling "multi" keys slightly easier).
>> 
>> Again, good point....
>
> Is this additional complexity really necessary?

Nothing is *really* necessary ;-) and it is possible that the best
balance may be at "parse a single chain of files into a single
hashtable for a config-set, and if anything changes, re-read
everything from scratch".

The point Ramsay raised about being able to share the pre-parsed
$HOME/.gitconfig across multiple config-sets (one for the top-level
superproject, and the others for submodules, when having to work
across module boundaries) triggered this thought experiment (aka "I
am not married to the approach") to use one table per source.  If we
wanted to take advantage of updating a single file and not having to
re-read the whole thing, includes need to be handled a bit more
carefully than "one config-file for one source", as you noticed, and
a single source may have to be split into multiple pieces.

And it is possible that the complexity necessary to do these
correctly may make it not worth pursuing the approach.  Or it may
not.  I don't know at this point, and thinking these things through
to arrive at a good design is part of the GSoC project after all, so
I'd rather not to think it through to the end myself ;-).

> What's the use case for this? Do you expect e.g. 'git gc' to
> detect changed depth/window size at run time and adjust the
> algorithm accordingly?

I did write "detect" but I think a more realistic example is that we
do git-config-set internally and wish to see the effect inside the
same process (i.e. something like "pull --set-upstream" that sets
configuration variable for later invocations and also perform the
operation with the configuration in effect at the same time).

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 10:11 ` [PATCH v3 2/3] config: add hashtable for config parsing & retrieval Tanay Abhra
                     ` (2 preceding siblings ...)
  2014-06-23 23:14   ` Junio C Hamano
@ 2014-06-25 21:44   ` Karsten Blees
  2014-06-26 16:43   ` Matthieu Moy
  4 siblings, 0 replies; 35+ messages in thread
From: Karsten Blees @ 2014-06-25 21:44 UTC (permalink / raw)
  To: Tanay Abhra, git; +Cc: Ramkumar Ramachandra, Matthieu Moy

Am 23.06.2014 12:11, schrieb Tanay Abhra:

[...]

> +
> +static struct config_cache_entry *config_cache_find_entry(const char *key)
> +{
> +	struct hashmap *config_cache;
> +	struct config_cache_entry k;
> +	struct config_cache_entry *found_entry;
> +	char *normalized_key;
> +	int ret;
> +	config_cache = get_config_cache();
> +	ret = git_config_parse_key(key, &normalized_key, NULL);

[I didn't follow all previous discussion, so feel free to ignore me if this has been discussed already]

Is it really necessary to normalize keys on each lookup? The current git config code simply does a 'strcmp("lower-case-key", var)' so normalization shouldn't be necessary for the majority of callers. If a caller uses a non-constant key (e.g. as passed to 'git config --get <name>'), it would be the caller's responsibility to normalize.

[...]

> +int git_config_get_string(const char *key, const char **value)

I would have expected '..._get_string' to return a string. If the return value indicates whether something was found, it should probably be called '..._find_...'?

In the end, you want typed config functions that do type conversion and handle parse errors, at least for the common types bool/int/string... Thus the generic function that returns unparsed data should probably be called '..._value', not '..._string'?

A typical pull-style config API will also let you specify default values, e.g.:

  const char *git_config_get_string(const char *key, const char *default_value)
  {
    const char *value;
    if (!git_config_find_value(key, &value))
      return default_value;
    if (!value)
      config_error_nonbool();
    return value;
  }

  int git_config_get_bool(const char *key, int default_value)
  {
    const char *value;
    if (!git_config_find_value(key, &value))
      return default_value;
    return git_config_bool(key, value);
  }

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 3/3] test-config: add usage examples for non-callback query functions
  2014-06-25 11:19   ` Eric Sunshine
@ 2014-06-26  8:40     ` Tanay Abhra
  0 siblings, 0 replies; 35+ messages in thread
From: Tanay Abhra @ 2014-06-26  8:40 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Ramkumar Ramachandra, Matthieu Moy

Hi,

I thought about adding a test*.sh file after sending the series.
No worries, I will rectify it in the next patch.
Also, I have read all your comments.

Thanks for the review.

Cheers,
Tanay Abhra.

On 6/25/2014 4:49 PM, Eric Sunshine wrote:
> On Mon, Jun 23, 2014 at 6:11 AM, Tanay Abhra <tanayabh@gmail.com> wrote:
>> Add different usage examples for 'git_config_get_string' and
>> `git_config_get_string_multi`. They will serve as documentation
>> on how to query for config values in a non-callback manner.
> 
> This is a good start, but it's not fully what Matthieu was suggesting
> when he said that you should prove to other developers, by way of
> reproducible tests, that your changes work. What he meant, was that
> you should write a test-config program which exposes (as a runnable
> command) the new config C API you've added, and then write tests which
> exercise that API and implementation exhaustively.
> 
> For example, take a look at test-string-list.c and
> t/t0063-string-list.sh. The C program does no checking itself. It
> merely exposes the C API via command-line arguments, such as "split",
> "filter", etc. The test script then employs that program to perform
> the actual testing in a reproducible and (hopefully) exhaustive
> fashion. Because t/t0063-string-list.sh is part of the test suite, the
> string-list tests are run regularly by many developers. It's not just
> something that someone might remember to run once in a while.
> 
> Contrary to your commit message and the comment in the program itself,
> the purpose of test-config is not to serve as documentation or to
> provide examples of usage. (Real documentation is better suited for
> those purposes.) Instead, test-config should exist in support of a
> real test script in t/ which is run regularly. The new script you add
> to t/ should exercise the exposed C API as exhaustively as possible.
> This means checking each possible state: for instance, (1) when a key
> is absent, (2) when a value is boolean (NULL), (3) one non-boolean
> (non-NULL) value, (4) multiple values, etc. Moreover, it should check
> expected success _and_ expected failure modes. Check not only that it
> returns expected values, but that it fails when appropriate.
> 
> More below.
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-24 15:32       ` Ramsay Jones
@ 2014-06-26 16:15         ` Matthieu Moy
  0 siblings, 0 replies; 35+ messages in thread
From: Matthieu Moy @ 2014-06-26 16:15 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Tanay Abhra, git, Ramkumar Ramachandra

Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:

> Hmm, maybe. The "... take advantage of the new code" refers to the
> possibility (or otherwise) of re-using your work to update these
> "older API" functions to the new API style. (also, see Junio's response).

I agree that, while caching the usual config files is the most
important, allowing other users of the config code to use non-callback
functions would be nice too.

Not really for efficiency, but because I find it far more natural to ask
"what's the value of config key XYZ" to the code than to ask "can you
parse all config keys in the file, let me know, and I'll tell you later
which ones I'm interested in".

> [In order to do this, I would have expected to see one hash table
> for each file/blob, so the singleton object took me by surprise.]

The singleton could be fine as long as it is optional. We can have
an API looking like:

int git_config_get_string(const char *key, const char **value) {
	return git_config_get_string_from(the singleton config cache,
                                          key, value);
}

int git_config_get_string_from(struct hashmap *map, const char *key, const char **value);

(or take a structure representing "a set of config files" instead of
directly the hashmap)

Now, the good news is that such API can be written later, without
breaking the API. So I'd say it's fine to keep the code as-is for now,
and make it more general later. No strong opinion, though.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-24 12:21     ` Tanay Abhra
@ 2014-06-26 16:27       ` Matthieu Moy
  0 siblings, 0 replies; 35+ messages in thread
From: Matthieu Moy @ 2014-06-26 16:27 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: Junio C Hamano, git, Ramkumar Ramachandra

Tanay Abhra <tanayabh@gmail.com> writes:

> For the config_cache_free(), would this change be enough?
>
> +static void config_cache_free(void)
> +{
> +	struct hashmap *config_cache;
> +	struct config_cache_entry *entry;
> +	struct hashmap_iter iter;
> +	if (!hashmap_initialized)
> +		return;
> +	config_cache = get_config_cache();

That sounds better to me. And it justifies the different scopes: one can
ask whether the map is initialized from anywhere in the file, but can
only get the map after initialization, so it prevents mis-use of an
uninitialized pointer.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-23 10:11 ` [PATCH v3 2/3] config: add hashtable for config parsing & retrieval Tanay Abhra
                     ` (3 preceding siblings ...)
  2014-06-25 21:44   ` Karsten Blees
@ 2014-06-26 16:43   ` Matthieu Moy
  4 siblings, 0 replies; 35+ messages in thread
From: Matthieu Moy @ 2014-06-26 16:43 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: git, Ramkumar Ramachandra

Tanay Abhra <tanayabh@gmail.com> writes:

> +Querying For Specific Variables
> +-------------------------------
> +
> +For programs wanting to query for specific variables in a non-callback
> +manner, the config API provides two functions `git_config_get_string`
> +and `git_config_get_string_multi`.They both read values from an internal
> +cache generated previously from reading the config files.
> +
> +`git_config_get_string` takes two parameters,
> +
> +- a key string in canonical flat form for which the corresponding value
> +  with the highest priority (i.e. value in the repo config will be
> +  preferred over value in user wide config for the same variable) will
> +  be retrieved.
> +
> +- a pointer to a string which will point to the retrieved value.
> +
> +`git_config_get_string` returns 0 for success, or -1 for no value found.
> +
> +`git_config_get_string_multi` returns a `string_list` containing all the
> +values for the key passed as parameter, sorted in order of increasing
> +priority (Note: NULL values are flagged as 1, check `util` for each
> +'string_list_item` for flag value).

I think you need to add something like:

   Both functions return pointers to values owned by the config cache. The
   caller need not free them, but should not modify them.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-25 18:13         ` Junio C Hamano
  2014-06-25 20:23           ` Karsten Blees
@ 2014-06-26 17:37           ` Matthieu Moy
  2014-06-26 19:00             ` Junio C Hamano
  1 sibling, 1 reply; 35+ messages in thread
From: Matthieu Moy @ 2014-06-26 17:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ramsay Jones, Tanay Abhra, git, Ramkumar Ramachandra

Junio C Hamano <gitster@pobox.com> writes:

> When the submodule script that uses "git config -f .gitmodules" is
> converted into C, if the updated config API is ready, it may be able
> to do something like these in a single program:
>
> 	const char *url;
> 	struct config_set *gm_config;
>
>         /* read from $GIT_DIR/config */
>         url = git_config_get_string("remote.origin.url");
>
>         /*
>          * allow us to read from .gitmodules; the parameters are
>          * list of files that make up the configset, perhaps.
>          */
> 	gm_config = git_configset_new(".gitmodules", NULL);
>
>
>         if (!git_configset_get_bool(gm_config, "submodule.frotz.ignore")) {
> 		/* do a lot of stuff for the submodule */
>                 ...
> 	}
>
>         /* when we are done with the configset */
>         git_configset_clear(gm_config);

Isn't that a bit overkill? Why not just let the caller manage a hashmap
directly instead of a config_set? Your code could become

  struct hashmap *gm_config;
  config_cache_init(&gm_config);
  config_cache_read_from_file(".gitmodule", gm_config);
  /* possibly more calls to
     config_cache_read_from_file("some-other-file", ...). */
  
and then

  if (!git_configset_get_bool(gm_config, "submodule.frotz.ignore")) {
     ...
  

Perhaps a bit more complicated to use, but much simpler to implement.
Since there are very few callers, I'd say it's better to keep the
implementation simple.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-26 17:37           ` Matthieu Moy
@ 2014-06-26 19:00             ` Junio C Hamano
  2014-06-26 19:19               ` Karsten Blees
  2014-06-27  8:19               ` Matthieu Moy
  0 siblings, 2 replies; 35+ messages in thread
From: Junio C Hamano @ 2014-06-26 19:00 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Ramsay Jones, Tanay Abhra, git, Ramkumar Ramachandra

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> When the submodule script that uses "git config -f .gitmodules" is
>> converted into C, if the updated config API is ready, it may be able
>> to do something like these in a single program:
>>
>> 	const char *url;
>> 	struct config_set *gm_config;
>>
>>         /* read from $GIT_DIR/config */
>>         url = git_config_get_string("remote.origin.url");
>>
>>         /*
>>          * allow us to read from .gitmodules; the parameters are
>>          * list of files that make up the configset, perhaps.
>>          */
>> 	gm_config = git_configset_new(".gitmodules", NULL);
>>
>>
>>         if (!git_configset_get_bool(gm_config, "submodule.frotz.ignore")) {
>> 		/* do a lot of stuff for the submodule */
>>                 ...
>> 	}
>>
>>         /* when we are done with the configset */
>>         git_configset_clear(gm_config);
>
> Isn't that a bit overkill? Why not just let the caller manage a hashmap
> directly instead of a config_set?

Because I had an experience under my belt of a painful refactoring
of "the_index" which turned out to be not just a single array, I
simply suspect that the final data structure to represent a "set of
config-like things" will not be just a single hashmap, hence I do
prefer to have one layer of abstraction "struct config_set", which
would contain a hashmap and possibly more.  Doesn't "is the hashmap
initialized" bit belong there, for example?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-26 19:00             ` Junio C Hamano
@ 2014-06-26 19:19               ` Karsten Blees
  2014-06-26 21:21                 ` Junio C Hamano
  2014-06-27  8:19               ` Matthieu Moy
  1 sibling, 1 reply; 35+ messages in thread
From: Karsten Blees @ 2014-06-26 19:19 UTC (permalink / raw)
  To: Junio C Hamano, Matthieu Moy
  Cc: Ramsay Jones, Tanay Abhra, git, Ramkumar Ramachandra

Am 26.06.2014 21:00, schrieb Junio C Hamano:
> Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:
> 
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> When the submodule script that uses "git config -f .gitmodules" is
>>> converted into C, if the updated config API is ready, it may be able
>>> to do something like these in a single program:
>>>
>>> 	const char *url;
>>> 	struct config_set *gm_config;
>>>
>>>         /* read from $GIT_DIR/config */
>>>         url = git_config_get_string("remote.origin.url");
>>>
>>>         /*
>>>          * allow us to read from .gitmodules; the parameters are
>>>          * list of files that make up the configset, perhaps.
>>>          */
>>> 	gm_config = git_configset_new(".gitmodules", NULL);
>>>
>>>
>>>         if (!git_configset_get_bool(gm_config, "submodule.frotz.ignore")) {
>>> 		/* do a lot of stuff for the submodule */
>>>                 ...
>>> 	}
>>>
>>>         /* when we are done with the configset */
>>>         git_configset_clear(gm_config);
>>
>> Isn't that a bit overkill? Why not just let the caller manage a hashmap
>> directly instead of a config_set?
> 
> Because I had an experience under my belt of a painful refactoring
> of "the_index" which turned out to be not just a single array, I
> simply suspect that the final data structure to represent a "set of
> config-like things" will not be just a single hashmap, hence I do
> prefer to have one layer of abstraction "struct config_set", which
> would contain a hashmap and possibly more.  Doesn't "is the hashmap
> initialized" bit belong there, for example?

Would an additional

  int hashmap_is_initialized(constr struct hashmap *map)
  {
    return !!map->table;
  }

API help? (Note that hashmap_free() already does memset(0), so the usual notion of "zero memory means unitialized" applies).

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-26 19:19               ` Karsten Blees
@ 2014-06-26 21:21                 ` Junio C Hamano
  2014-06-27  8:19                   ` Karsten Blees
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2014-06-26 21:21 UTC (permalink / raw)
  To: Karsten Blees
  Cc: Matthieu Moy, Ramsay Jones, Tanay Abhra, git, Ramkumar Ramachandra

Karsten Blees <karsten.blees@gmail.com> writes:

>> Because I had an experience under my belt of a painful refactoring
>> of "the_index" which turned out to be not just a single array, I
>> simply suspect that the final data structure to represent a "set of
>> config-like things" will not be just a single hashmap, hence I do
>> prefer to have one layer of abstraction "struct config_set", which
>> would contain a hashmap and possibly more.  Doesn't "is the hashmap
>> initialized" bit belong there, for example?
>
> Would an additional
>
>   int hashmap_is_initialized(constr struct hashmap *map)
>   {
>     return !!map->table;
>   }
>
> API help? (Note that hashmap_free() already does memset(0), so the
> usual notion of "zero memory means unitialized" applies).

It may remove the need for the separate "hashmap_initialized" bit
that was implemented as a file-scope global in the patch.

I however am not convinced that it will be the _only_ thing other
than the hashmap we would need to use to keep track of the in-core
"set of config-like things", and usually a blanket statement "these
are the only thing we would ever need" tends not to hold for long,
so...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-26 21:21                 ` Junio C Hamano
@ 2014-06-27  8:19                   ` Karsten Blees
  0 siblings, 0 replies; 35+ messages in thread
From: Karsten Blees @ 2014-06-27  8:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Ramsay Jones, Tanay Abhra, git, Ramkumar Ramachandra

Am 26.06.2014 23:21, schrieb Junio C Hamano:
> Karsten Blees <karsten.blees@gmail.com> writes:
> 
>>> Because I had an experience under my belt of a painful refactoring
>>> of "the_index" which turned out to be not just a single array, I
>>> simply suspect that the final data structure to represent a "set of
>>> config-like things" will not be just a single hashmap, hence I do
>>> prefer to have one layer of abstraction "struct config_set", which
>>> would contain a hashmap and possibly more.  Doesn't "is the hashmap
>>> initialized" bit belong there, for example?
>>
>> Would an additional
>>
>>   int hashmap_is_initialized(constr struct hashmap *map)
>>   {
>>     return !!map->table;
>>   }
>>
>> API help? (Note that hashmap_free() already does memset(0), so the
>> usual notion of "zero memory means unitialized" applies).
> 
> It may remove the need for the separate "hashmap_initialized" bit
> that was implemented as a file-scope global in the patch.
> 

That the variable is redundant was all I was trying to say, sorry for
the misunderstanding. But reading through api-hashmap.txt again, I
found that documentation of the hashmap members is pretty vague, so
perhaps improving the docs would suffice (e.g. "hashmap.table != NULL
indicates the hashmap has been initialized.").

> I however am not convinced that it will be the _only_ thing other
> than the hashmap we would need to use to keep track of the in-core
> "set of config-like things", and usually a blanket statement "these
> are the only thing we would ever need" tends not to hold for long,
> so...
>

I think having an extra type for the configuration is a Good Thing
(even if the only member is a hashmap). Although I wouldn't call it
'config_set', as it conflicts with the verb 'set', resulting in
weird APIs ('config_set_get_value()', 'config_set_set_value()').
Perhaps 'config_map' or 'config_settings', or just 'config'?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-26 19:00             ` Junio C Hamano
  2014-06-26 19:19               ` Karsten Blees
@ 2014-06-27  8:19               ` Matthieu Moy
  2014-06-27 17:13                 ` Junio C Hamano
  1 sibling, 1 reply; 35+ messages in thread
From: Matthieu Moy @ 2014-06-27  8:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ramsay Jones, Tanay Abhra, git, Ramkumar Ramachandra

Junio C Hamano <gitster@pobox.com> writes:

> Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> When the submodule script that uses "git config -f .gitmodules" is
>>> converted into C, if the updated config API is ready, it may be able
>>> to do something like these in a single program:
>>>
>>> 	const char *url;
>>> 	struct config_set *gm_config;
>>>
>>>         /* read from $GIT_DIR/config */
>>>         url = git_config_get_string("remote.origin.url");
>>>
>>>         /*
>>>          * allow us to read from .gitmodules; the parameters are
>>>          * list of files that make up the configset, perhaps.
>>>          */
>>> 	gm_config = git_configset_new(".gitmodules", NULL);
>>>
>>>
>>>         if (!git_configset_get_bool(gm_config, "submodule.frotz.ignore")) {
>>> 		/* do a lot of stuff for the submodule */
>>>                 ...
>>> 	}
>>>
>>>         /* when we are done with the configset */
>>>         git_configset_clear(gm_config);
>>
>> Isn't that a bit overkill? Why not just let the caller manage a hashmap
>> directly instead of a config_set?
>
> Because I had an experience under my belt of a painful refactoring
> of "the_index" which turned out to be not just a single array, I
> simply suspect that the final data structure to represent a "set of
> config-like things" will not be just a single hashmap, hence I do
> prefer to have one layer of abstraction "struct config_set", which
> would contain a hashmap and possibly more.

OK, I guess I overinterpreted what you meant by "struct config_set". If
it's a thin abstraction layer on top of the hashmap (i.e. essentially
contain the hashmap, and possibly a few more metadata), then it
definitely makes sense.

> Doesn't "is the hashmap initialized" bit belong there, for example?

Yes.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/3] config: add hashtable for config parsing & retrieval
  2014-06-27  8:19               ` Matthieu Moy
@ 2014-06-27 17:13                 ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2014-06-27 17:13 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Ramsay Jones, Tanay Abhra, git, Ramkumar Ramachandra

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

>>> Isn't that a bit overkill? Why not just let the caller manage a hashmap
>>> directly instead of a config_set?
>>
>> Because I had an experience under my belt of a painful refactoring
>> of "the_index" which turned out to be not just a single array, I
>> simply suspect that the final data structure to represent a "set of
>> config-like things" will not be just a single hashmap, hence I do
>> prefer to have one layer of abstraction "struct config_set", which
>> would contain a hashmap and possibly more.
>
> OK, I guess I overinterpreted what you meant by "struct config_set". If
> it's a thin abstraction layer on top of the hashmap (i.e. essentially
> contain the hashmap, and possibly a few more metadata), then it
> definitely makes sense.

Yup, and I do not strongly mind the initialization sequence of "if
you want to overlay from a file, call *_from_file()" you outlined;
an initialiser that takes a list of file paths to read from was
merely an example and not meant to be the sole interface (it is
overly rigid to be one).

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2014-06-27 17:13 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-23 10:11 [PATCH v3 0/3] git config cache & special querying api utilizing the cache Tanay Abhra
2014-06-23 10:11 ` [PATCH v3 1/3] string-list: add string_list initialiser helper functions Tanay Abhra
2014-06-23 12:36   ` Torsten Bögershausen
2014-06-23 13:19     ` Tanay Abhra
2014-06-23 10:11 ` [PATCH v3 2/3] config: add hashtable for config parsing & retrieval Tanay Abhra
2014-06-23 11:55   ` Matthieu Moy
2014-06-24 12:06     ` Tanay Abhra
2014-06-25 20:25       ` Karsten Blees
2014-06-23 14:57   ` Ramsay Jones
2014-06-23 16:20     ` Tanay Abhra
2014-06-24 15:32       ` Ramsay Jones
2014-06-26 16:15         ` Matthieu Moy
2014-06-23 23:25     ` Junio C Hamano
2014-06-24  7:23       ` Tanay Abhra
2014-06-25 18:21         ` Junio C Hamano
2014-06-24  7:25       ` Tanay Abhra
2014-06-24 15:57       ` Ramsay Jones
2014-06-25 18:13         ` Junio C Hamano
2014-06-25 20:23           ` Karsten Blees
2014-06-25 20:53             ` Junio C Hamano
2014-06-26 17:37           ` Matthieu Moy
2014-06-26 19:00             ` Junio C Hamano
2014-06-26 19:19               ` Karsten Blees
2014-06-26 21:21                 ` Junio C Hamano
2014-06-27  8:19                   ` Karsten Blees
2014-06-27  8:19               ` Matthieu Moy
2014-06-27 17:13                 ` Junio C Hamano
2014-06-23 23:14   ` Junio C Hamano
2014-06-24 12:21     ` Tanay Abhra
2014-06-26 16:27       ` Matthieu Moy
2014-06-25 21:44   ` Karsten Blees
2014-06-26 16:43   ` Matthieu Moy
2014-06-23 10:11 ` [PATCH v3 3/3] test-config: add usage examples for non-callback query functions Tanay Abhra
2014-06-25 11:19   ` Eric Sunshine
2014-06-26  8:40     ` Tanay Abhra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.